Tester Skill — Legal Drafting Agent System (Pytest + LangGraph)
🎯 Purpose
This skill is responsible for testing (NOT implementing) the hardened 18-step legal drafting LangGraph pipeline.
The Tester skill ensures the system is production-safe and court-grade by validating:
- workflow orchestration correctness
- deterministic gate enforcement (NO LLM bypass)
- hallucination prevention for facts and citations
- pause/resume stability
- parallel fan-out / fan-in correctness
- mistake DB anti-pollution staging + promotion logic
- end-to-end draft stability
This is a QA-only testing skill.
✅ Key Rule
This skill MUST NOT modify production code or DB schemas. It only runs tests and reports failures.
🧠 Scope of Testing
1. Workflow Orchestration (LangGraph)
Validate that the LangGraph pipeline correctly executes:
- correct node routing
- correct conditional edge selection
- correct parallel execution (fan-out/fan-in)
- correct resume after pause
2. Hallucination Safety Gates
Validate deterministic gates enforce:
- Fact Validation Gate blocks unverified facts
- Citation Validation Gate blocks unverified citations
- Context Merge blocks contradictions
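A minimal sketch of a gate-level test follows. The function fact_validation_gate is a hypothetical stand-in for the production Step 3 node (the real node is imported, never reimplemented, by this skill), and the state keys facts and rejected_facts are assumptions. Only source_doc_id comes from this document.

```python
from typing import Optional, TypedDict


class Fact(TypedDict):
    text: str
    source_doc_id: Optional[str]


def fact_validation_gate(state: dict) -> dict:
    """Hypothetical stand-in for the Step 3 gate: pure Python, no LLM call."""
    verified = [f for f in state["facts"] if f.get("source_doc_id")]
    rejected = [f for f in state["facts"] if not f.get("source_doc_id")]
    return {**state, "facts": verified, "rejected_facts": rejected}


def test_gate_blocks_fact_without_source():
    state = {"facts": [
        {"text": "Cheque dated 01-02-2024", "source_doc_id": "doc-1"},
        {"text": "Accused admitted liability", "source_doc_id": None},
    ]}
    out = fact_validation_gate(state)
    # Only the fact backed by a source document may pass the gate.
    assert [f["text"] for f in out["facts"]] == ["Cheque dated 01-02-2024"]
    assert [f["text"] for f in out["rejected_facts"]] == ["Accused admitted liability"]


def test_gate_is_deterministic():
    state = {"facts": [{"text": "x", "source_doc_id": "doc-1"}]}
    # Same input must always give the same output for a NO LLM step.
    assert fact_validation_gate(state) == fact_validation_gate(state)
```

The same shape should apply to the Citation Validation Gate, with the verification field swapped for whatever marks a citation as verified in the real schema.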
3. Mistake DB Anti-Pollution Safety
Validate:
- candidate mistake rules are inserted only into staging_rules
- promotion is blocked unless the rule appears in >= 3 distinct cases
- main DB (mistake_rules_main) is never written directly
- contradictory or case-specific rules are rejected
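The promotion threshold can be exercised against an in-memory SQLite database, as in the sketch below. The table names staging_rules and mistake_rules_main come from this document; the column names (rule_text, case_id) and the promote function are assumptions standing in for the real schema and the real Promotion Gate.

```python
import sqlite3

MIN_DISTINCT_CASES = 3  # threshold stated in the rule above


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory DB with an assumed two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE staging_rules (rule_text TEXT, case_id TEXT);
        CREATE TABLE mistake_rules_main (rule_text TEXT PRIMARY KEY);
    """)
    return conn


def promote(conn: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the promotion gate (Step 15/16)."""
    conn.execute(
        """
        INSERT OR IGNORE INTO mistake_rules_main (rule_text)
        SELECT rule_text FROM staging_rules
        GROUP BY rule_text
        HAVING COUNT(DISTINCT case_id) >= ?
        """,
        (MIN_DISTINCT_CASES,),
    )
    conn.commit()


def main_rules(conn: sqlite3.Connection) -> list:
    rows = conn.execute("SELECT rule_text FROM mistake_rules_main ORDER BY rule_text")
    return [r[0] for r in rows]


def test_repeats_in_same_case_do_not_promote():
    conn = make_db()
    rows = [("cite section number", "case-1")] * 3 + [("cite section number", "case-2")]
    conn.executemany("INSERT INTO staging_rules VALUES (?, ?)", rows)
    promote(conn)
    # Four rows, but only two distinct cases: promotion must be blocked.
    assert main_rules(conn) == []


def test_three_distinct_cases_promote():
    conn = make_db()
    rows = [("cite section number", f"case-{i}") for i in (1, 2, 3)]
    conn.executemany("INSERT INTO staging_rules VALUES (?, ?)", rows)
    promote(conn)
    assert main_rules(conn) == ["cite section number"]
```

Counting distinct case IDs rather than rows is the point of the first test: a rule repeated many times inside one case should not count as evidence from several cases.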
4. Output Quality Requirements
Validate final draft output includes:
- correct template structure
- prayers inserted correctly
- annexure references consistent
- verification clause present and localized
- placeholders remain for missing mandatory data ({{MISSING_FIELD}})
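The placeholder requirement lends itself to a small text check, sketched below. It assumes placeholders use upper snake case names inside double braces, in the style of {{MISSING_FIELD}}; the helper names and the sample field names are hypothetical.

```python
import re

# Assumed placeholder syntax: {{UPPER_SNAKE_CASE}}
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def placeholders_in(draft: str) -> set:
    """Return the set of placeholder names found in the draft text."""
    return set(PLACEHOLDER.findall(draft))


def fields_without_placeholder(draft: str, missing_fields: list) -> list:
    """Return missing mandatory fields that have no placeholder in the draft.

    A non-empty result suggests the drafting step invented a value
    (or silently dropped the field) instead of leaving a placeholder.
    """
    return sorted(set(missing_fields) - placeholders_in(draft))


def test_missing_fields_stay_as_placeholders():
    draft = (
        "The accused resides at {{ACCUSED_ADDRESS}} within the "
        "jurisdiction of {{COURT_NAME}}."
    )
    assert fields_without_placeholder(draft, ["ACCUSED_ADDRESS", "COURT_NAME"]) == []
    assert fields_without_placeholder(
        draft, ["ACCUSED_ADDRESS", "CHEQUE_NUMBER"]
    ) == ["CHEQUE_NUMBER"]
```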
5. Database Consistency
Validate DB audit trail correctness:
- each step output stored in agent_outputs
- stop/pause events logged in validation_reports
- draft versions stored correctly
- export history stored correctly
🏗️ Workflow Under Test (18-Step Pipeline)
The Tester skill must validate the full pipeline steps:
Step 0 → Raw Input Collection
Step 1 → Security + Normalization (NO LLM)
Step 2 → Supervisor Intake / Fact Extraction (LLM)
Step 3 → Fact Validation Gate (NO LLM)
Step 4A → Rule Classifier (NO LLM)
Step 4B → LLM Classifier (LLM)
Step 4C → Route Resolver (NO LLM)
Step 5 → Clarification Handler (STOP IF REQUIRED)
Step 6 → Mistake Rules Fetch (Main DB)
Step 7 → Template Pack Agent (LLM)
Step 8 → Parallel Agents: Compliance + Localization + Prayer
Step 9 → Optional Agents: Research + Citation
Step 10 → Citation Validation Gate (NO LLM)
Step 11 → Context Merge + Conflict Resolver (NO LLM)
Step 12 → Drafting Agent (LLM)
Step 13 → Quality Agent (LLM)
Step 14 → Store Candidate Rules (Staging DB)
Step 15 → Promotion Gate (NO LLM)
Step 16 → Update Main Mistake DB (NO LLM)
Step 17 → Promotion Logging (NO LLM)
Step 18 → Export Engine (NO LLM)
🧪 Required Testing Method: Pytest
Testing must be written using:
- pytest
- •LangGraph testing patterns
The Tester must validate:
A) Unit Tests (Node Level)
- validate each node input/output schema
- validate gate behavior deterministically
- validate that hard_stop conditions trigger pause
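A node-level test can call the node function directly with a hand-built state, since a LangGraph node is an ordinary function from state to a state update. The sketch below uses a hypothetical stand-in for the Step 5 Clarification Handler; the output keys other than hard_stop, and the list of mandatory fields, are assumptions.

```python
REQUIRED_OUTPUT_KEYS = {"hard_stop", "missing_fields"}  # assumed output schema
MANDATORY_FIELDS = ("jurisdiction", "court")            # assumed mandatory inputs


def clarification_handler(state: dict) -> dict:
    """Hypothetical stand-in for Step 5: flag a hard stop when data is missing."""
    missing = [k for k in MANDATORY_FIELDS if not state.get(k)]
    return {"hard_stop": bool(missing), "missing_fields": missing}


def test_output_schema():
    out = clarification_handler({"jurisdiction": "Delhi", "court": "Sessions Court"})
    # Schema check: exactly the expected keys, with the expected types.
    assert set(out) == REQUIRED_OUTPUT_KEYS
    assert isinstance(out["hard_stop"], bool)
    assert out["hard_stop"] is False


def test_missing_jurisdiction_triggers_hard_stop():
    out = clarification_handler({"court": "Sessions Court"})
    assert out == {"hard_stop": True, "missing_fields": ["jurisdiction"]}
```

In the real suite the node would be imported from the production package rather than defined in the test file.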
B) Integration Tests (Graph Level)
- validate routing decisions
- validate conditional edges
- validate fan-out/fan-in merge behavior
- validate pause/resume correctness
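Routing and fan-in logic can be tested without compiling the graph when they live in plain functions: a conditional edge is driven by a function that returns the next node's name, and parallel branch outputs are combined by a merge (reducer) function. The sketch below assumes both exist as importable functions; the names and the node labels pause and mistake_rules_fetch are hypothetical.

```python
from functools import reduce
from itertools import permutations


def route_after_clarification(state: dict) -> str:
    """Hypothetical routing function used by a conditional edge."""
    return "pause" if state.get("hard_stop") else "mistake_rules_fetch"


def merge_branch_outputs(left: dict, right: dict) -> dict:
    """Hypothetical fan-in reducer: each parallel agent writes its own key."""
    return {**left, **right}


def test_routing_decisions():
    assert route_after_clarification({"hard_stop": True}) == "pause"
    assert route_after_clarification({"hard_stop": False}) == "mistake_rules_fetch"


def test_fan_in_is_order_independent():
    branches = [
        {"compliance": ["limitation period checked"]},
        {"localization": ["court name localized"]},
        {"prayer": ["interim relief"]},
    ]
    # Parallel branches may finish in any order; the merged state must not care.
    results = [
        reduce(merge_branch_outputs, order, {}) for order in permutations(branches)
    ]
    assert all(r == results[0] for r in results)
    assert set(results[0]) == {"compliance", "localization", "prayer"}
```

Keying each branch's output by agent name is a design choice in this sketch; a list-appending reducer would make the merged result depend on completion order. Pause/resume itself needs the compiled graph and a checkpointer, which the LangGraph testing documentation covers.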
C) End-to-End Tests (Full Pipeline)
Run full pipeline on multiple Indian legal drafting scenarios and validate:
- output stability
- no hallucinated facts
- no hallucinated citations
- correct DB logging
- correct staging + promotion behavior
🇮🇳 Mandatory Indian Drafting Test Scenarios (Minimum 5)
The Tester must run the E2E pipeline for at least the following scenarios:
- Bail Application (Sessions Court / High Court)
- NI Act 138 Cheque Bounce Complaint (Magistrate Court)
- Divorce Petition (Family Court)
- Writ Petition (High Court)
- Civil Suit for Recovery (District/Civil Court)
Each scenario must validate:
- routing correctness
- STOP behavior if mandatory facts missing
- citation validation behavior
- prayer correctness
- annexure correctness
- export correctness
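The five scenarios and six per-scenario checks form a 5 x 6 matrix, which can be kept as data so that no combination is silently skipped. The sketch below is one possible layout; the scenario keys and check names are hypothetical labels for the items listed above.

```python
# Scenario key -> court, taken from the list of mandatory scenarios.
SCENARIOS = {
    "bail_application": "Sessions Court / High Court",
    "ni_act_138_complaint": "Magistrate Court",
    "divorce_petition": "Family Court",
    "writ_petition": "High Court",
    "civil_suit_recovery": "District/Civil Court",
}

# One label per validation required for every scenario.
CHECKS = (
    "routing",
    "stop_on_missing_facts",
    "citation_validation",
    "prayer",
    "annexure",
    "export",
)


def build_matrix() -> list:
    """Return every (scenario, check) pair that must have a test."""
    return [(scenario, check) for scenario in SCENARIOS for check in CHECKS]


def uncovered(executed: set) -> list:
    """Return (scenario, check) pairs that no executed test has covered."""
    return [pair for pair in build_matrix() if pair not in executed]
```

Under pytest, the list returned by build_matrix() could be passed to pytest.mark.parametrize so that each pair shows up as its own test case in the report.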
🛑 Hard Fail Conditions (Test Must Fail Immediately)
Tests must fail if:
- any step bypasses deterministic validation gates
- any unverified citation reaches FINAL_DRAFT
- any fact without source_doc_id is used in final draft
- workflow does not pause when jurisdiction is missing
- workflow writes directly into mistake_rules_main
- promotion happens without >= 3 case repetitions
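A gate bypass can be detected from the ordered list of nodes a run actually executed. The sketch below assumes such a trace is available to the test (for example, collected while streaming the graph) and that nodes carry the hypothetical names shown. It also assumes the Citation Validation Gate is only mandatory when the optional citation agent ran.

```python
DRAFT_NODE = "drafting_agent"  # hypothetical node name for Step 12


def gate_bypass_violations(trace: list) -> list:
    """Return the gates that should have run before drafting but did not."""
    if DRAFT_NODE not in trace:
        return []  # run paused or stopped before drafting: nothing to bypass
    seen = set(trace[: trace.index(DRAFT_NODE)])
    required = {"fact_validation_gate", "context_merge"}
    if "citation_agent" in seen:
        # Citations were produced, so they must be validated before drafting.
        required.add("citation_validation_gate")
    return sorted(required - seen)


def test_clean_run_has_no_violations():
    trace = [
        "fact_validation_gate",
        "citation_agent",
        "citation_validation_gate",
        "context_merge",
        "drafting_agent",
    ]
    assert gate_bypass_violations(trace) == []


def test_skipped_citation_gate_is_reported():
    trace = ["fact_validation_gate", "citation_agent", "context_merge", "drafting_agent"]
    assert gate_bypass_violations(trace) == ["citation_validation_gate"]
```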
📌 Expected Output of Tester Skill
The Tester skill must produce a structured report:
- pass/fail summary
- failed test names with reasons
- coverage summary (steps covered)
- safety violations detected
- promotion gate violations detected
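The report fields above map naturally onto a small data class that serializes to JSON. The sketch below is one possible shape; the class name, field names, and the overall_ok rule are assumptions, not a prescribed format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TesterReport:
    passed: int = 0
    failed: int = 0
    failures: dict = field(default_factory=dict)        # test name -> reason
    steps_covered: list = field(default_factory=list)   # coverage summary
    safety_violations: list = field(default_factory=list)
    promotion_gate_violations: list = field(default_factory=list)

    def record(self, test_name: str, ok: bool, reason: str = "") -> None:
        """Record one test outcome, keeping the reason for failures."""
        if ok:
            self.passed += 1
        else:
            self.failed += 1
            self.failures[test_name] = reason

    def overall_ok(self) -> bool:
        """True only if nothing failed and no violation was detected."""
        return not (
            self.failed or self.safety_violations or self.promotion_gate_violations
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A short usage example:

```python
report = TesterReport(steps_covered=["step_3", "step_10"])
report.record("test_gate_blocks_fact_without_source", ok=True)
report.record("test_pause_on_missing_jurisdiction", ok=False, reason="workflow did not pause")
print(report.to_json())
```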
🔗 Reference
LangGraph Testing Documentation: https://docs.langchain.com/oss/python/langgraph/test