Debugging DataGenFlow Pipelines
Overview
Systematic debugging workflow for any DataGenFlow pipeline failure or unexpected output. This skill provides a structured four-phase process to identify and fix root causes rather than guessing at solutions.
Core Principle: Find the root cause before attempting fixes. Random fixes waste time and create new bugs.
When to Use
Use this skill when:
- •Pipeline execution fails with unclear errors
- •Pipeline produces "bad data" or unexpected output
- •Need to isolate which block is causing issues
- •LLM generates duplicates or poor quality content
- •Output has unexpected fields (metadata pollution)
- •Results are missing expected fields
- •Performance issues or slow execution
- •Integration test failures
When NOT to Use
Skip this skill for:
- •Simple configuration errors (typos in config)
- •Documentation lookup (how to use a specific block)
- •Feature requests (adding new functionality)
- •Questions about architecture (use codebase exploration instead)
The Four-Phase Debugging Process
Phase 1: Observe & Gather Evidence
Goal: Understand what's wrong and collect data
Steps:
- •Run the pipeline and capture full output
  - •Use pytest for tests: pytest tests/integration/test_X.py -v -s
  - •For API, check logs and response data
  - •Save the complete error message and stack trace
- •Identify what makes output "bad"
  - •Missing fields? (expected price but not in output)
  - •Wrong values? (all prices are 0)
  - •Extra fields? (input metadata leaking: samples, target_count)
  - •Duplicates? (similarity_score = 1.0, exact copies)
  - •Type errors? (expected dict, got list)
- •Check recent changes
  - •Run git diff to see what changed
  - •Review recent commits that might affect this pipeline
  - •Check if tests passed before the change
- •Review error messages completely
  - •Read the full stack trace, not just the last line
  - •Note file paths, line numbers, and error types
  - •Check for validation errors with detail context
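The "what makes output bad" checks can be made mechanical. The following is a minimal sketch and not part of DataGenFlow: diagnose_record and the EXPECTED schema are hypothetical names, and the field names are borrowed from the examples in this guide.

```python
from typing import Any


def diagnose_record(record: dict[str, Any], expected: dict[str, type]) -> dict[str, list[str]]:
    """Compare one output record against an expected field -> type mapping."""
    missing = sorted(k for k in expected if k not in record)
    extra = sorted(k for k in record if k not in expected)
    wrong_type = sorted(
        k for k, t in expected.items()
        if k in record and not isinstance(record[k], t)
    )
    return {"missing": missing, "extra": extra, "wrong_type": wrong_type}


# Hypothetical schema, for illustration only.
EXPECTED = {"category": str, "price": int, "description": str}

record = {
    "category": "electronics",
    "description": "...",
    "samples": [],
    "target_count": 10,
}
report = diagnose_record(record, EXPECTED)
print(report)
```

Running a check like this over every record turns "the output looks bad" into a concrete list of missing, extra, and mistyped fields to carry into Phase 2.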
Red Flags to Stop:
- •"I think I know the problem" (without evidence)
- •"Let me try changing X" (before tracing data flow)
- •Skipping logs because "error is obvious"
Phase 2: Trace Data Flow
Goal: Understand how data transforms through the pipeline
Steps:
- •Identify which blocks touch the problematic data
  - •Check pipeline definition (YAML or dict)
  - •List all blocks in execution order
  - •Note which blocks read/write the affected fields
- •Read block implementations
  - •Open lib/blocks/builtin/[block_name].py
  - •Review the execute() method
  - •Check what inputs it expects and outputs it returns
  - •Look for data transformations or filtering logic
- •Trace data transformation between blocks
  - •Check lib/workflow.py:_process_single_seed() for multiplier pipelines
  - •See how accumulated_state merges block outputs
  - •Identify where data gets added, modified, or removed
- •Check workflow execution flow
  - •Normal pipeline: lib/workflow.py:85-224
  - •Multiplier pipeline: lib/workflow.py:305-449
  - •Understand seed processing vs result filtering
Key Files to Check:
- •lib/workflow.py - Pipeline execution engine
- •lib/blocks/builtin/ - All block implementations
- •lib/entities/block_execution_context.py - Context passed between blocks
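One way to see where a field enters or changes is to record, after each block, which keys it added or overwrote. This is a sketch under simplifying assumptions: blocks are modeled as plain functions that take the current state and return a dict, and outputs are merged with dict.update(). The real blocks expose execute() and receive a context object, so trace_keys, sampler, and infiller are hypothetical stand-ins.

```python
from typing import Any, Callable

Block = Callable[[dict[str, Any]], dict[str, Any]]


def trace_keys(blocks: list[tuple[str, Block]], initial: dict[str, Any]) -> list[dict[str, Any]]:
    """Run blocks in order and record which keys each one adds or changes."""
    accumulated_state = dict(initial)
    trace = []
    for name, block in blocks:
        output = block(accumulated_state)
        added = sorted(k for k in output if k not in accumulated_state)
        changed = sorted(
            k for k in output
            if k in accumulated_state and accumulated_state[k] != output[k]
        )
        accumulated_state.update(output)  # assumed merge rule: later outputs win
        trace.append({"block": name, "added": added, "changed": changed})
    return trace


# Hypothetical stand-ins for real blocks.
def sampler(state: dict[str, Any]) -> dict[str, Any]:
    return {"category": "electronics"}


def infiller(state: dict[str, Any]) -> dict[str, Any]:
    return {"description": f"A {state['category']} item", "price": 449}


steps = trace_keys([("sampler", sampler), ("infiller", infiller)], {"target_count": 10})
for step in steps:
    print(step)
```

A key that never shows up under "added" for any block, yet appears in the final output, must have come from the initial data, which narrows the search to the merge logic rather than the blocks.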
Phase 3: Root Cause Analysis
Goal: Form a specific, testable hypothesis
Steps:
- •Form specific hypothesis
  - •Format: "I think X causes Y because Z"
  - •Example: "I think input metadata leaks to output because workflow.py line 323 merges all initial_data without filtering"
  - •Be specific, not vague
- •Don't assume - verify with evidence
  - •Read the actual code at the suspected line
  - •Check logs or traces confirming the behavior
  - •Look for similar patterns in other files
- •Use logs, traces, and execution results
  - •Check test output for actual vs expected values
  - •Review trace data showing block inputs/outputs
  - •Examine execution_time for performance issues
Red Flags:
- •"It's probably just..." (guessing)
- •"This usually means..." (pattern matching without verification)
- •Proposing fixes before understanding the cause
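For performance hypotheses, summing execution_time per block gives evidence instead of a guess. The sketch below assumes each trace entry is a dict with a "block" name and an "execution_time" in seconds. That shape is an assumption for illustration, and slowest_blocks is a hypothetical helper; adapt the keys to whatever the actual trace contains.

```python
from typing import Any


def slowest_blocks(trace: list[dict[str, Any]], top: int = 3) -> list[tuple[str, float]]:
    """Sum execution_time per block and return the slowest blocks first."""
    totals: dict[str, float] = {}
    for entry in trace:
        name = entry["block"]
        totals[name] = totals.get(name, 0.0) + entry["execution_time"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical trace data, for illustration only.
trace = [
    {"block": "StructureSampler", "execution_time": 0.25},
    {"block": "SemanticInfiller", "execution_time": 3.5},
    {"block": "SemanticInfiller", "execution_time": 2.5},
    {"block": "DuplicateRemover", "execution_time": 0.5},
]
ranking = slowest_blocks(trace)
print(ranking)
```

A hypothesis such as "the pipeline is slow because the LLM block dominates runtime" is then supported or refuted by the ranking rather than by intuition.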
Phase 4: Fix & Verify
Goal: Implement minimal fix targeting the root cause
Steps:
- •Make minimal fix
  - •Change only what's necessary to fix the root cause
  - •Don't refactor or "improve" surrounding code
  - •One logical change at a time
- •Run tests to verify fix
  - •Run the specific failing test
  - •Confirm that it now passes
  - •Run related tests to catch regressions
- •Check for side effects
  - •Did the fix break other tests?
  - •Are there related features that might be affected?
  - •Review the change for unintended consequences
- •If fix doesn't work
  - •Count: How many fixes have you tried?
  - •If < 3: Return to Phase 1, re-analyze with new information
  - •If ≥ 3: Question the architecture - might need design discussion
Success Criteria:
- •Tests pass
- •Root cause addressed (not just symptoms)
- •No new bugs introduced
- •Code follows project guidelines (KISS, minimal changes)
Common Pipeline Issues
| Issue Pattern | Where to Look | Typical Root Causes | Fix Pattern |
|---|---|---|---|
| Output has unexpected fields | lib/workflow.py data merging | Input metadata leaking to output | Filter initial_data_keys before returning results |
| Block returns wrong data type | Block's execute() method | Incorrect return type (dict vs list) | Fix block to return declared type |
| LLM generates poor quality | Block's prompt building | Unclear instructions, low temperature, copying examples | Improve prompt, add diversity instructions |
| LLM copying examples verbatim | SemanticInfiller prompt | Prompt doesn't emphasize creating NEW content | Add "do NOT copy" instruction to prompt |
| Pipeline crashes on specific input | Block's validation logic | Missing input validation or type checking | Add validation in block's execute() |
| Results missing fields | Block's output filtering or merging | Overly aggressive filtering or incorrect merge | Check field filtering logic |
| All duplicates flagged | DuplicateRemover threshold | Threshold too low or embedding model issues | Check similarity_threshold config |
| Metadata pollution | Workflow seed processing | Initial seed data not filtered from output | Use _filter_output_data() helper |
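The "All duplicates flagged" row is easier to reason about with numbers. The sketch below assumes duplicates are detected by comparing embedding vectors with cosine similarity against a threshold. The source only states that DuplicateRemover is embedding-based and has a similarity_threshold setting, so the metric and both function names here are illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def flag_duplicates(vectors: list[list[float]], threshold: float) -> list[bool]:
    """Flag each vector whose similarity to any earlier vector is >= threshold."""
    flags = []
    for i, vector in enumerate(vectors):
        is_duplicate = any(
            cosine_similarity(vector, vectors[j]) >= threshold for j in range(i)
        )
        flags.append(is_duplicate)
    return flags


# Toy 2-D "embeddings": pairwise similarities are roughly 0.8, 0.0, and 0.6.
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
low = flag_duplicates(embeddings, threshold=0.5)
high = flag_duplicates(embeddings, threshold=0.95)
print(low, high)
```

With the low threshold, every record after the first is flagged even though none are near-copies; with the high threshold, none are. Printing the actual similarity scores for a few flagged pairs is a quick way to tell a threshold problem from an embedding problem.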
Critical Files Reference
Pipeline Execution:
- •lib/workflow.py:85-224 - Normal pipeline execution flow
- •lib/workflow.py:305-449 - Multiplier pipeline (1→N expansion) with seed processing
- •lib/workflow.py:275-284 - _filter_output_data() helper (filters metadata from results)
Built-in Blocks:
- •lib/blocks/builtin/structure_sampler.py - Statistical sampling (multiplier block)
- •lib/blocks/builtin/semantic_infiller.py:59-109 - LLM prompt building
- •lib/blocks/builtin/semantic_infiller.py:146-165 - Metadata filtering in SemanticInfiller
- •lib/blocks/builtin/duplicate_remover.py - Embedding-based similarity detection
Core Infrastructure:
- •lib/entities/block_execution_context.py - Context passed between blocks
- •lib/blocks/base.py - BaseBlock interface all blocks inherit from
- •lib/entities/pipeline.py - ExecutionResult, Usage models
- •lib/template_renderer.py - Jinja2 template rendering
Tests:
- •tests/integration/ - Integration tests for end-to-end verification
- •tests/blocks/ - Unit tests for individual blocks
Debugging Checklist
Use this checklist to ensure systematic debugging:
Phase 1: Observe & Gather Evidence
□ Run pipeline and capture full output
□ Identify specific problem (what's wrong?)
□ Read error messages completely (full stack trace)
□ Check recent git changes (git diff, git log)
Phase 2: Trace Data Flow
□ Check which blocks are in the pipeline
□ Read those block implementations (execute methods)
□ Trace data flow through blocks (accumulated_state)
□ Understand workflow execution (normal vs multiplier)
Phase 3: Root Cause Analysis
□ Form specific hypothesis ("X causes Y because Z")
□ Verify hypothesis with evidence (code, logs, traces)
□ Don't assume - read actual code
□ Check for similar patterns elsewhere
Phase 4: Fix & Verify
□ Make minimal fix targeting root cause
□ Run tests to verify fix works
□ Check for unintended side effects
□ Follow project guidelines (KISS, simplicity)
Real-World Example: Data Augmentation Metadata Pollution
Problem Observed:
Pipeline output contained input configuration fields (samples, target_count, categorical_fields) mixed with generated data.
Phase 1 - Evidence:
// Expected output:
{"category": "electronics", "price": 449, "description": "...", "is_duplicate": false}
// Actual output:
{"category": "electronics", "price": 449, "description": "...",
"samples": [...], "target_count": 10, "categorical_fields": [...]} // ❌ Bad!
Phase 2 - Trace:
- •Traced workflow.py seed processing
- •Found merged_state = {**initial_data, **seed_data} at line 323
- •Merged state flows through all blocks
- •No filtering before returning results
Phase 3 - Root Cause:
Hypothesis: "Input metadata leaks to output because workflow.py merges all initial_data into accumulated_state without filtering configuration fields before returning results."
Phase 4 - Fix:
- •Added _filter_output_data() helper method
- •Track initial_data_keys at merge time
- •Filter those keys before returning ExecutionResult
- •Tests passed, metadata removed from output
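The shape of this fix can be sketched in a few lines. This is not the actual code in lib/workflow.py: merge_seed and filter_output_data are hypothetical functions that only illustrate the idea of recording initial_data_keys at merge time and dropping those keys before results are returned.

```python
from typing import Any


def merge_seed(initial_data: dict[str, Any], seed_data: dict[str, Any]) -> tuple[dict[str, Any], set[str]]:
    """Merge as in the example, and remember which keys came from the input."""
    merged_state = {**initial_data, **seed_data}
    initial_data_keys = set(initial_data)
    return merged_state, initial_data_keys


def filter_output_data(state: dict[str, Any], initial_data_keys: set[str]) -> dict[str, Any]:
    """Drop keys that are present only because of the initial input."""
    return {k: v for k, v in state.items() if k not in initial_data_keys}


# Illustrative input configuration and seed.
initial_data = {
    "samples": [{"category": "electronics"}],
    "target_count": 10,
    "categorical_fields": ["category"],
}
seed_data = {"category": "electronics"}

state, keys = merge_seed(initial_data, seed_data)
# Stand-in for what later blocks would add to the state.
state.update({"price": 449, "description": "...", "is_duplicate": False})

result = filter_output_data(state, keys)
print(result)
```

One trade-off of filtering by key name: if a block legitimately writes a field that shares a name with an input key, this sketch drops it as well, so a real implementation has to decide how to handle that overlap.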
Lessons:
- •Data flow tracing revealed the merge point
- •Minimal fix (filter helper) solved the root cause
- •No refactoring needed - targeted change only
Tips for Effective Debugging
- •Start with the simplest explanation
  - •Don't assume complex bugs when simple causes are more likely
  - •Check configuration before code logic
- •Use the scientific method
  - •Observe → Hypothesize → Test → Verify
  - •One variable at a time
- •Trust but verify
  - •Don't trust assumptions about what code does
  - •Read the actual implementation
- •Leverage existing patterns
  - •Look for similar working code in the codebase
  - •Compare broken vs working implementations
- •Document as you go
  - •Keep notes on what you've checked
  - •Record hypotheses and test results
  - •Helps if you need to ask for help
Related Skills
- •implementing-datagenflow-blocks - For understanding block structure and creation
- •address-pr-review - For evaluating whether debugging revealed design issues