Creating Pipeline Templates
Templates are YAML definitions + seed files in lib/templates/. Auto-discovered on startup by TemplateRegistry (lib/templates/__init__.py).
- Template ID = filename without .yaml
- Seed file: seed_<template_id>.json or seed_<template_id>.md
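The naming convention above can be sketched in a few lines of Python. This is only an illustration of the rule, not the registry's actual code; `template_id_for` and `seed_candidates` are hypothetical helper names.

```python
from pathlib import Path


def template_id_for(yaml_path: str) -> str:
    """Return the template ID: the YAML filename without its extension."""
    return Path(yaml_path).stem


def seed_candidates(yaml_path: str) -> list[str]:
    """Return the two seed filenames that would pair with this template.

    The order of the list is arbitrary; it does not imply any lookup priority.
    """
    template_id = template_id_for(yaml_path)
    return [f"seed_{template_id}.json", f"seed_{template_id}.md"]


print(template_id_for("lib/templates/qa_generation.yaml"))
print(seed_candidates("lib/templates/qa_generation.yaml"))
```

A helper like this is handy when renaming a template, since the seed file has to be renamed with it.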
Template YAML Format
yaml
name: Template Display Name
description: What this template generates
blocks:
  - type: BlockClassName  # must match class name exactly
    config:
      param1: value1  # must match __init__ parameter names exactly
      user_prompt: "{{ var }}"  # Jinja2 references to seed metadata
  - type: AnotherBlock
    config:
      field_name: generated
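Before relying on the registry to load a template, its overall shape can be sanity-checked once the YAML has been parsed into a dict. The sketch below works on an already-parsed dict and assumes that `name`, `description`, and `blocks` are all expected and that `config` may be omitted; those requirements are inferred from the example above rather than confirmed from the registry source. `check_template_shape` is a hypothetical helper.

```python
def check_template_shape(template: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the shape looks fine."""
    problems = []
    # Assumption: all three top-level keys are expected, as in the example format.
    for key in ("name", "description", "blocks"):
        if key not in template:
            problems.append(f"missing top-level key: {key}")

    blocks = template.get("blocks", [])
    if not isinstance(blocks, list):
        return problems + ["blocks must be a list"]

    for i, block in enumerate(blocks):
        if not isinstance(block, dict):
            problems.append(f"blocks[{i}] must be a mapping")
            continue
        if not isinstance(block.get("type"), str):
            problems.append(f"blocks[{i}] needs a string 'type'")
        # Assumption: a block without config is allowed and treated as empty.
        if not isinstance(block.get("config", {}), dict):
            problems.append(f"blocks[{i}].config must be a mapping")
    return problems


template = {
    "name": "Demo",
    "description": "Two-block example",
    "blocks": [
        {"type": "StructuredGenerator", "config": {"user_prompt": "{{ content }}"}},
        {"type": "JSONValidatorBlock", "config": {"field_name": "generated"}},
    ],
}
print(check_template_shape(template))
```

This does not verify that block types or config keys exist; that still requires checking against the block classes themselves.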
Seed File Format
JSON (most templates):
json
[
{"repetitions": 3, "metadata": {"content": "input text here"}}
]
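A seed file in this shape can be checked with the standard library alone. The sketch below is an assumption about what a reasonable check looks like, not the project's own validation; in particular, treating a missing `repetitions` as 1 is a guess, and `check_seed_entries` is a hypothetical helper.

```python
import json


def check_seed_entries(raw: str) -> list[dict]:
    """Parse seed JSON and verify each entry has the expected shape."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        raise ValueError("seed file must be a JSON array")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i}: must be an object")
        if not isinstance(entry.get("metadata"), dict):
            raise ValueError(f"entry {i}: 'metadata' must be an object")
        # Assumption: repetitions defaults to 1 when omitted.
        reps = entry.get("repetitions", 1)
        if isinstance(reps, bool) or not isinstance(reps, int) or reps < 1:
            raise ValueError(f"entry {i}: 'repetitions' must be a positive integer")
    return entries


seed_text = '[{"repetitions": 3, "metadata": {"content": "input text here"}}]'
print(len(check_seed_entries(seed_text)))
```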
Markdown (only for MarkdownMultiplierBlock as first block):
- File: seed_<template_id>.md
- Registry auto-wraps as [{"repetitions": 1, "metadata": {"file_content": "<content>"}}]
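The auto-wrapping described above amounts to roughly the following. This is a sketch of the resulting structure only; the registry's real implementation may differ, and `wrap_markdown_seed` is a hypothetical name.

```python
import json


def wrap_markdown_seed(markdown_text: str) -> list[dict]:
    """Wrap raw Markdown into the single-entry seed structure shown above."""
    return [{"repetitions": 1, "metadata": {"file_content": markdown_text}}]


seed = wrap_markdown_seed("# Title\n\nSome paragraph.")
print(json.dumps(seed, indent=2))
```

Because the whole file lands under the `file_content` key, that is the key a prompt or first block would read from.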
Available Blocks
| Block | Category | Key Outputs | Notes |
|---|---|---|---|
| TextGenerator | generators | assistant, system, user | free-text via LLM |
| StructuredGenerator | generators | generated | JSON via LLM with schema |
| SemanticInfiller | generators | dynamic | complete skeleton records |
| StructureSampler | seeders | skeletons, _seed_samples | multiplier, must be first |
| MarkdownMultiplierBlock | seeders | content | multiplier, must be first |
| ValidatorBlock | validators | text, valid, assistant | text rules |
| JSONValidatorBlock | validators | valid, parsed_json | JSON parse + validate |
| DuplicateRemover | validators | generated_samples | embedding similarity |
| DiversityScore | metrics | diversity_score | lexical diversity |
| CoherenceScore | metrics | coherence_score | text coherence |
| RougeScore | metrics | rouge_score | ROUGE comparison |
| RagasMetrics | metrics | ragas_scores | RAGAS QA evaluation |
| FieldMapper | utilities | dynamic | Jinja2 field expressions |
| LangfuseBlock | observability | langfuse_trace_url | trace logging |
Common Pipeline Patterns
code
# simple generation + validation
StructuredGenerator → JSONValidatorBlock

# document processing (multiplier first)
MarkdownMultiplierBlock → TextGenerator → StructuredGenerator → JSONValidatorBlock

# data augmentation
StructureSampler → SemanticInfiller → DuplicateRemover

# generation + metrics
StructuredGenerator → FieldMapper → RagasMetrics
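The "multiplier first" rule in these patterns is easy to check mechanically. The sketch below takes the two multiplier block names from the Available Blocks table; `misplaced_multipliers` is a hypothetical helper and not part of the project.

```python
# Blocks marked "multiplier, must be first" in the Available Blocks table.
MULTIPLIER_BLOCKS = {"StructureSampler", "MarkdownMultiplierBlock"}


def misplaced_multipliers(block_types: list[str]) -> list[str]:
    """Return multiplier blocks that appear anywhere except position 0."""
    return [
        name
        for position, name in enumerate(block_types)
        if position > 0 and name in MULTIPLIER_BLOCKS
    ]


good = ["MarkdownMultiplierBlock", "TextGenerator", "StructuredGenerator", "JSONValidatorBlock"]
bad = ["TextGenerator", "MarkdownMultiplierBlock", "JSONValidatorBlock"]
print(misplaced_multipliers(good))
print(misplaced_multipliers(bad))
```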
Step-by-Step Workflow
- Define use case: what data to generate, what fields in the output, what seed input is needed
- Choose blocks: pick from the table above, wire outputs to inputs
- Write YAML: lib/templates/<template_id>.yaml
- Write seed file: match {{ variables }} in YAML to metadata keys
- Validate template loads:
bash
uv run python -c "
from lib.templates import template_registry
for t in template_registry.list_templates():
    print(f'{t[\"id\"]}: {t[\"name\"]}')
"
- Check block params (if unsure about config keys):
bash
uv run python -c "
from lib.blocks.registry import BlockRegistry
registry = BlockRegistry()
for name, cls in registry._blocks.items():
    schema = cls.get_schema()
    print(f'{name}: {list(schema.get(\"config_schema\", {}).get(\"properties\", {}).keys())}')
"
- Test single execution:
bash
# create pipeline from template
curl -s -X POST http://localhost:8000/api/pipelines/from_template/<template_id> | python -m json.tool

# execute with seed
curl -s -X POST http://localhost:8000/api/pipelines/<id>/execute \
  -H 'Content-Type: application/json' \
  -d '{"content": "test input"}' | python -m json.tool
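The seed-matching step in the workflow above (every {{ variable }} in the YAML needs a matching metadata key) can be checked before running anything. The sketch below uses a plain regular expression rather than Jinja2 itself, so it only recognizes simple references such as `{{ content }}` and ignores filters, attribute access, and control blocks. `missing_seed_variables` is a hypothetical helper.

```python
import re

# Matches simple references like {{ name }}; not a full Jinja2 parser.
VARIABLE_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def missing_seed_variables(prompt: str, metadata: dict) -> set[str]:
    """Return variables referenced in the prompt but absent from seed metadata."""
    referenced = set(VARIABLE_PATTERN.findall(prompt))
    return referenced - set(metadata)


prompt = "Summarize {{ content }} in a {{ tone }} tone."
metadata = {"content": "input text here"}
print(missing_seed_variables(prompt, metadata))
```

An empty set means every simple reference in the prompt has a corresponding metadata key.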
Reference Templates
| Template | File | Pattern |
|---|---|---|
| JSON Generation | json_generation.yaml | StructuredGenerator → JSONValidator |
| Text Classification | text_classification.yaml | StructuredGenerator → JSONValidator |
| Q&A Generation | qa_generation.yaml | Multiplier → Text → Structured → JSONValidator |
| Data Augmentation | data_augmentation.yaml | Sampler → Infiller → DuplicateRemover |
| RAGAS Evaluation | ragas_evaluation.yaml | Structured → FieldMapper → RagasMetrics |
Common Mistakes
| Mistake | Fix |
|---|---|
| Block type doesn't match class name | Check lib/blocks/builtin/ for exact class names |
| Config key doesn't match __init__ param | Read block source, match parameter names |
| Missing seed variable referenced in prompt | Add the variable to seed metadata |
| MarkdownMultiplierBlock not first | Multiplier blocks must always be first |
| Seed file not named seed_<template_id>.* | Template ID must match: foo.yaml → seed_foo.json |
Checklist
- YAML in lib/templates/ with correct block types and config keys
- Seed file matching template ID with all referenced variables
- Template loads via TemplateRegistry
- Single execution produces expected output fields
- Trace shows all blocks executed successfully
- Seed file has 2-3 diverse examples
Related Skills
- implementing-datagenflow-blocks: creating new block types
- debugging-pipelines: troubleshooting template execution
- testing-pipeline-templates: thorough end-to-end testing