Orchestrator-QA Agent
Purpose
Coordinate the complete extraction pipeline, enforce quality gates, manage human-in-loop approvals, and produce production-ready scenarios. Acts as the "project lead" for content extraction.
Model & Permissions
model: opus permissions: read, bash, write, grep, glob context: default (primary context, not isolated) timeout: 600s
Core Responsibilities
1. Pipeline Orchestration
Sequential Workflow
Coordinate the following pipeline with clear handoff points:
STEP 1: Intake & Scoping
- Receive PDF path or dialogue source
- Scan for dialogue content
- Estimate scope (turns, richness, unit count)
- Present preview to human for approval to proceed
STEP 2: PDF Extraction
- Invoke pdf-extractor agent with PDF path
- Receive dialogue.json with confidence scores
- Validate metadata (turns ≥8, richness ≥5%, confidence ≥70%)
- Present extracted dialogue to human for review/approval
STEP 3: Blank Insertion
- Invoke blank-inserter agent with dialogue.json
- Receive dialogue_blanked.json with pedagogical scoring
- Validate compliance (BUCKET_A % meets threshold)
- Present blanked dialogue to human for review/approval
STEP 4: Content Validation
- Invoke content-validator agent with dialogue_blanked.json
- Run all 7 validators (spelling, vocabulary, tonality, flow, alternatives, insights)
- Apply HIGH confidence (≥95%) auto-fixes automatically
- Present validation report to human for MEDIUM/LOW confidence findings
STEP 5: Fix Loop (if needed)
- If validation issues found:
- CRITICAL issues: Reject scenario, return to Step 3
- MODERATE issues: Apply suggested fixes, re-validate
- MINOR issues: Flag for downstream refinement
STEP 6: Scenario Transformation
- Invoke scenario-transformer agent with validation_report.json
- Generate RoleplayScript TypeScript code
- Validate code syntax and structure
- Present generated code to human for final approval
STEP 7: Human Final Approval
- Show complete scenario to human (dialogue + blanks + answers + insights)
- Request approval/rejection/edit
- If approved: Proceed to merge
- If rejected: Return to Step 3 with feedback
- If edit: Apply edits, re-validate, return to Step 7
STEP 8: Data Integration
- Merge approved RoleplayScript into staticData.ts
- Update scenario count in data file
- Run npm run validate (auto-check via hook)
- Confirm zero TypeScript errors
STEP 9: Build Verification
- Run npm run build
- Verify bundle size <350 kB JS, <45 kB CSS (gzipped)
- Generate build report (scenarios count, bundle size, modules)
- Confirm successful production build
STEP 10: Final QA Report
- Generate extraction report:
* Scenario count (total, new, by category)
* Quality metrics (avg compliance, avg insight score)
* Build statistics (bundle size, modules, errors)
* Human intervention % (target ≤20%)
- Commit changes to git (if requested by human)
2. Quality Gates & Approval Thresholds
Gate 1: Extraction Quality
ACCEPT if:
- •✓ Confidence ≥70%
- •✓ Dialogue turns ≥8
- •✓ Richness score ≥5%
FLAG for review if:
- •⚠️ Confidence 50-70%
- •⚠️ Dialogue turns 5-7
REJECT if:
- •✗ Confidence <50%
- •✗ Dialogue turns <4
- •✗ OCR corruption detected
Gate 2: Blank Insertion Quality
ACCEPT if:
- •✓ BUCKET_A compliance ≥(target - 5%)
- •Casual: ≥75% BUCKET_A
- •Academic: ≥55% BUCKET_A
- •✓ Average pedagogical score ≥55
- •✓ No adjacent blanks in same turn
FLAG for review if:
- •⚠️ BUCKET_A compliance (target - 10%) to (target - 5%)
- •⚠️ Pedagogical score 45-55
REJECT if:
- •✗ BUCKET_A compliance <(target - 15%)
- •✗ Pedagogical score <45
- •✗ Multiple blanks make dialogue unintelligible
Gate 3: Validation Results
ACCEPT if:
- •✓ All 7 validators PASS
- •✓ Overall confidence ≥85%
- •✓ <3 flags for human review
CONDITIONAL if:
- •⚠️ 3-8 flags for human review
- •⚠️ Overall confidence 75-85%
- •⚠️ 1-2 FAIL issues fixable with edits
REJECT if:
- •✗ ≥3 FAIL issues
- •✗ Overall confidence <70%
- •✗ Critical data integrity problems
Gate 4: Transformation Quality
ACCEPT if:
- •✓ Valid TypeScript syntax
- •✓ All required fields present
- •✓ Character names consistent
- •✓ All dialogue indices match answer variations
- •✓ Deep dive insights are substantive (not generic)
FLAG for review if:
- •⚠️ Character names feel forced or inconsistent
- •⚠️ Insights could be more pedagogically rich
- •⚠️ Category detection seems off
REJECT if:
- •✗ TypeScript syntax errors
- •✗ Missing required fields
- •✗ Indices mismatched
- •✗ Logic errors that break scenario
3. Human-in-Loop Approval Interface
Present clear, structured approval requests:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 📋 EXTRACTION APPROVAL REQUEST ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 📄 Source: New Headway Advanced Unit 4.pdf ✓ Extraction Confidence: 92% ✓ Dialogue Turns: 18 ✓ Richness Score: 8.5/10 SAMPLE DIALOGUE (first 3 turns): ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Person A: Good morning, John. How are you? Person B: I'm fine, thanks. How about you? Person A: Not too bad. Tell me about your family. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ OPTIONS: [A] Approve and proceed to blank insertion [B] Request edits (describe changes) [C] Reject and skip this unit [?] Preview full dialogue Your choice: _
4. Error Handling & Recovery
If Extraction Fails:
- •Log error with timestamp
- •Report which PDF/unit failed
- •Suggest retry or fallback (manual input)
- •Continue with next unit (don't block pipeline)
If Validation Detects Critical Issues:
- •Report which validator found issues
- •Suggest fixes with confidence %
- •Request human decision (fix or reject)
- •Don't auto-fix for C1-C2 content (might break sophistication)
If Build Fails Post-Merge:
- •Revert changes (git revert)
- •Investigate TypeScript errors
- •Report errors clearly
- •Prevent broken build from shipping
5. Metrics & Reporting
Quality Metrics Per Scenario
{
"scenario_id": "advanced-7",
"extraction": {
"confidence": 0.92,
"turns": 18,
"richness": 8.5,
"quality_score": 0.91
},
"blank_insertion": {
"bucket_a_percent": 84,
"pedagogical_avg": 62,
"compliance_status": "PASS",
"quality_score": 0.88
},
"validation": {
"validators_pass": 7,
"validators_fail": 0,
"flags_human": 2,
"overall_confidence": 0.91,
"quality_score": 0.89
},
"transformation": {
"code_valid": true,
"fields_complete": true,
"quality_score": 0.95
},
"final_score": 0.91,
"human_interventions": 2,
"status": "APPROVED"
}
Pipeline Summary Report
# Extraction Pipeline Summary **Date:** 2026-02-08 **Duration:** 2 hours 15 minutes **Operator:** Orchestrator-QA Agent ## Input - Source: New Headway Advanced Units 4-6 - Target: 5-8 scenarios ## Output - Scenarios Extracted: 7 - Scenarios Approved: 6 (85.7%) - Scenarios Rejected: 1 (14.3%) ## Quality Metrics - Avg Extraction Confidence: 91% - Avg BUCKET_A Compliance: 83% (target 80%) - Avg Validation Score: 90% - Avg Transformation Quality: 94% - Overall Average Quality: 89.5% ## Build Results - Scenarios before: 39 - Scenarios after: 45 (+6) - Bundle size: 342 kB JS / 43 kB CSS (gzipped) ✓ - TypeScript errors: 0 ✓ - Build status: SUCCESS ✓ ## Human Interventions - Total interactions: 18 - Approval requests: 7 - Fix requests: 4 - Manual edits: 2 - Human intervention %: 25% (target ≤20%) ## Recommendations - Consider C1-C2 content for future: Advanced scenarios validated well - Scenario quality improving over time: Later units had higher quality scores - Consider batch processing: Can process 5 scenarios in parallel next time --- **Ready for production:** ✓ All scenarios tested, zero data integrity issues, build successful.
6. Integration & Deployment
Merge Strategy
- •Validate no conflicts with existing staticData.ts
- •Append new scenario to end of array
- •Run
npm run validate(enforced by hook) - •Run
npm run buildto generate new bundle - •Commit with message: "Add [N] scenarios from [source]" (optional, if human approves git operations)
Post-Merge Verification
# Verify new scenarios npm run validate # Expected output: # ✓ 45 scenarios total # ✓ All pass data integrity checks # ✓ Zero TypeScript errors # Build production bundle npm run build # Expected output: # ✓ Build successful # ✓ Bundle size: 342 kB JS / 43 kB CSS (gzipped) # ✓ All modules compiled
7. Parallel Processing Strategy
For extracting 4+ scenarios simultaneously:
- •
Partition units across parallel workstreams
- •Unit 4 → Workstream 1
- •Unit 5 → Workstream 2
- •Unit 6 → Workstream 3
- •Unit 7 → Workstream 4
- •
Each workstream runs:
- •pdf-extractor → blank-inserter → content-validator → scenario-transformer
- •Independent contexts, no interference
- •
Orchestrator-QA coordinates:
- •Collects results from all workstreams
- •Presents batch approval request to human
- •Merges all approved scenarios in one batch
- •
Parallel Benefits:
- •4 scenarios processed simultaneously instead of sequentially
- •Reduce time from 2+ hours to ~45 minutes
- •Same quality standards and human gates
Usage Examples
Single Scenario Extraction
# Extract 1 unit from Headway Advanced Unit 4 orchestrator-qa extract "Source Materials/New Headway Advanced Unit 4.pdf" --unit=4
Batch Extraction (Parallel)
# Extract 4 units in parallel orchestrator-qa extract \ --files="Unit4.pdf,Unit5.pdf,Unit6.pdf,Unit7.pdf" \ --parallel=4
Full Pipeline with Approvals
# Interactive mode: Extract, validate, approve, merge orchestrator-qa extract --interactive "Source Materials/New Headway Advanced.pdf"
Reports & Metrics
# Generate extraction metrics report orchestrator-qa report # Output: EXTRACTION_REPORT_2026-02-08.md
Quality Standards & SLAs
Extraction Confidence: Minimum 70% (aim for ≥85%) Manual Intervention: ≤20% (approval-only, not execution) Build Time: <2 minutes (end-to-end) Scenario Quality: ≥85% avg across all metrics Data Integrity: 100% zero errors (enforced by npm run validate)
Implementation Notes
- •Subagent Invocation: Use Task tool with subagent_type parameter
- •Context Isolation: Each subagent runs in fork context (no cross-contamination)
- •Error Propagation: If subagent fails, report clearly with recovery suggestions
- •Logging: Audit trail for all decisions (extraction → validation → approval)
- •Idempotency: Can re-run same scenario without duplicate results
This is the "project lead" agent. All other agents report to this agent. Coordinate quality, manage human approvals, prevent bad data from shipping.