
creating-pipeline-templates

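Use this skill when creating new pipeline templates (YAML + seed files) for DataGenFlow. It guides you through block selection, YAML authoring, seed file creation, and validation. It applies to any task involving the lib/templates/ directory, whether adding new template use cases or creating seed files that match pipeline variables.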

SKILL.md
---
name: creating-pipeline-templates
description: Use when creating new pipeline templates (YAML + seed files) for DataGenFlow. Guides through block selection, YAML authoring, seed file creation, and validation. Use for any task involving lib/templates/ directory, adding new template use cases, or creating seed files that match pipeline variables.
---

Creating Pipeline Templates

Templates are YAML definitions plus seed files stored in lib/templates/. TemplateRegistry (lib/templates/__init__.py) auto-discovers them on startup.

  • Template ID = filename without .yaml
  • Seed file: seed_<template_id>.json or seed_<template_id>.md
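The naming rule above can be sketched in a few lines. This is only an illustration of the convention as stated here, not the registry's actual code; `seed_candidates` is a hypothetical helper name.

```python
from pathlib import Path


def seed_candidates(template_path: str) -> tuple[str, list[str]]:
    """Return the template ID and the seed filenames the naming rule implies."""
    template_id = Path(template_path).stem  # filename without .yaml
    return template_id, [f"seed_{template_id}.json", f"seed_{template_id}.md"]


template_id, seeds = seed_candidates("lib/templates/qa_generation.yaml")
print(template_id)  # qa_generation
print(seeds)        # ['seed_qa_generation.json', 'seed_qa_generation.md']
```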

Template YAML Format

yaml
name: Template Display Name
description: What this template generates
blocks:
  - type: BlockClassName       # must match class name exactly
    config:
      param1: value1           # must match __init__ parameter names exactly
      user_prompt: "{{ var }}" # Jinja2 references to seed metadata
  - type: AnotherBlock
    config:
      field_name: generated

Seed File Format

JSON (most templates):

json
[
  {"repetitions": 3, "metadata": {"content": "input text here"}}
]
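Before wiring a JSON seed into a template, a quick shape check can catch structural slips. The sketch below assumes that every entry needs both keys and that `repetitions` should be a positive integer; neither rule is confirmed by the registry code, and `check_seed_entries` is a hypothetical helper.

```python
import json


def check_seed_entries(raw: str) -> list[str]:
    """Return a list of problems found in a JSON seed document (empty if none)."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        reps = entry.get("repetitions")
        # bool is a subclass of int in Python, so exclude it explicitly
        if not isinstance(reps, int) or isinstance(reps, bool) or reps < 1:
            problems.append(f"entry {i}: repetitions must be a positive integer")
        if not isinstance(entry.get("metadata"), dict):
            problems.append(f"entry {i}: metadata must be an object")
    return problems


seed = '[{"repetitions": 3, "metadata": {"content": "input text here"}}]'
print(check_seed_entries(seed))  # []
```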

Markdown (only for MarkdownMultiplierBlock as first block):

  • File: seed_<template_id>.md
  • Registry auto-wraps as [{"repetitions": 1, "metadata": {"file_content": "<content>"}}]
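To make the auto-wrap concrete, the following sketch builds the same structure from a Markdown string. It mirrors only the shape described above; how the registry actually reads and wraps the file is not shown here, and `wrap_markdown_seed` is a hypothetical name.

```python
def wrap_markdown_seed(content: str) -> list[dict]:
    """Wrap raw Markdown text in the single-entry seed structure."""
    return [{"repetitions": 1, "metadata": {"file_content": content}}]


wrapped = wrap_markdown_seed("# Title\n\nBody text")
print(wrapped[0]["metadata"]["file_content"].splitlines()[0])  # # Title
```

In a pipeline, the first block would then read the document from the `file_content` metadata key.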

Available Blocks

| Block | Category | Key Outputs | Notes |
|---|---|---|---|
| TextGenerator | generators | assistant, system, user | free-text via LLM |
| StructuredGenerator | generators | generated | JSON via LLM with schema |
| SemanticInfiller | generators | dynamic | complete skeleton records |
| StructureSampler | seeders | skeletons, _seed_samples | multiplier, must be first |
| MarkdownMultiplierBlock | seeders | content | multiplier, must be first |
| ValidatorBlock | validators | text, valid, assistant | text rules |
| JSONValidatorBlock | validators | valid, parsed_json | JSON parse + validate |
| DuplicateRemover | validators | generated_samples | embedding similarity |
| DiversityScore | metrics | diversity_score | lexical diversity |
| CoherenceScore | metrics | coherence_score | text coherence |
| RougeScore | metrics | rouge_score | ROUGE comparison |
| RagasMetrics | metrics | ragas_scores | RAGAS QA evaluation |
| FieldMapper | utilities | dynamic | Jinja2 field expressions |
| LangfuseBlock | observability | langfuse_trace_url | trace logging |

Common Pipeline Patterns

code
# simple generation + validation
StructuredGenerator → JSONValidatorBlock

# document processing (multiplier first)
MarkdownMultiplierBlock → TextGenerator → StructuredGenerator → JSONValidatorBlock

# data augmentation
StructureSampler → SemanticInfiller → DuplicateRemover

# generation + metrics
StructuredGenerator → FieldMapper → RagasMetrics
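The "multiplier first" constraint in these patterns is easy to check mechanically. The sketch below hard-codes the two multiplier blocks listed in the table and reports any that appear after position 0; `misplaced_multipliers` is a hypothetical helper, not part of DataGenFlow.

```python
# The two blocks the table marks as "multiplier, must be first".
MULTIPLIER_BLOCKS = {"StructureSampler", "MarkdownMultiplierBlock"}


def misplaced_multipliers(block_types: list[str]) -> list[str]:
    """Return multiplier blocks that appear anywhere other than position 0."""
    return [
        name
        for index, name in enumerate(block_types)
        if name in MULTIPLIER_BLOCKS and index != 0
    ]


print(misplaced_multipliers(["StructureSampler", "SemanticInfiller", "DuplicateRemover"]))  # []
print(misplaced_multipliers(["TextGenerator", "MarkdownMultiplierBlock"]))  # ['MarkdownMultiplierBlock']
```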

Step-by-Step Workflow

  1. Define use case — what data to generate, what fields in output, what seed input needed
  2. Choose blocks — pick from table above, wire outputs to inputs
  3. Write YAML in lib/templates/<template_id>.yaml
  4. Write seed file — match {{ variables }} in YAML to metadata keys
  5. Validate template loads:
    bash
    uv run python -c "
    from lib.templates import template_registry
    for t in template_registry.list_templates():
        print(f'{t[\"id\"]}: {t[\"name\"]}')
    "
    
  6. Check block params (if unsure about config keys):
    bash
    uv run python -c "
    from lib.blocks.registry import BlockRegistry
    registry = BlockRegistry()
    for name, cls in registry._blocks.items():
        schema = cls.get_schema()
        print(f'{name}: {list(schema.get(\"config_schema\", {}).get(\"properties\", {}).keys())}')
    "
    
  7. Test single execution:
    bash
    # create pipeline from template
    curl -s -X POST http://localhost:8000/api/pipelines/from_template/<template_id> | python -m json.tool
    # execute with seed
    curl -s -X POST http://localhost:8000/api/pipelines/<id>/execute \
      -H 'Content-Type: application/json' \
      -d '{"content": "test input"}' | python -m json.tool
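Step 4 (matching {{ variables }} to seed metadata keys) can be approximated before running anything. The sketch below uses a regular expression rather than Jinja2 itself, so it only recognizes the first identifier inside each `{{ ... }}` and ignores filters, attribute access, and control blocks. It also cannot tell seed variables from fields produced by earlier blocks, so treat its result as a list to review rather than a verdict. Both function names are hypothetical.

```python
import re

# First identifier inside {{ ... }}; a simplification of real Jinja2 parsing.
VARIABLE = re.compile(r"\{\{\s*([A-Za-z_]\w*)")


def referenced_variables(value) -> set[str]:
    """Recursively collect {{ variable }} names from strings, lists, and dicts."""
    if isinstance(value, str):
        return set(VARIABLE.findall(value))
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, list):
        found: set[str] = set()
        for item in value:
            found |= referenced_variables(item)
        return found
    return set()


def missing_seed_variables(blocks: list[dict], seed_entries: list[dict]) -> set[str]:
    """Return names referenced in block configs but absent from some seed's metadata."""
    needed = referenced_variables([block.get("config", {}) for block in blocks])
    missing: set[str] = set()
    for entry in seed_entries:
        missing |= needed - set(entry.get("metadata", {}))
    return missing


blocks = [{"type": "TextGenerator",
           "config": {"user_prompt": "Summarize {{ content }} for {{ audience }}"}}]
seeds = [{"repetitions": 1, "metadata": {"content": "test input"}}]
print(missing_seed_variables(blocks, seeds))  # {'audience'}
```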
    

Reference Templates

| Template | File | Pattern |
|---|---|---|
| JSON Generation | json_generation.yaml | StructuredGenerator → JSONValidator |
| Text Classification | text_classification.yaml | StructuredGenerator → JSONValidator |
| Q&A Generation | qa_generation.yaml | Multiplier → Text → Structured → JSONValidator |
| Data Augmentation | data_augmentation.yaml | Sampler → Infiller → DuplicateRemover |
| RAGAS Evaluation | ragas_evaluation.yaml | Structured → FieldMapper → RagasMetrics |

Common Mistakes

| Mistake | Fix |
|---|---|
| Block type doesn't match class name | Check lib/blocks/builtin/ for exact class names |
| Config key doesn't match __init__ param | Read block source, match parameter names |
| Missing seed variable referenced in prompt | Add the variable to seed metadata |
| MarkdownMultiplierBlock not first | Multiplier blocks must always be first |
| Seed file not named seed_<template_id>.* | Template ID must match: foo.yaml → seed_foo.json |

Checklist

  • YAML in lib/templates/ with correct block types and config keys
  • Seed file matching template ID with all referenced variables
  • Template loads via TemplateRegistry
  • Single execution produces expected output fields
  • Trace shows all blocks executed successfully
  • Seed file has 2-3 diverse examples

Related Skills

  • implementing-datagenflow-blocks — creating new block types
  • debugging-pipelines — troubleshooting template execution
  • testing-pipeline-templates — thorough end-to-end testing