Tester Skill — Legal Drafting Agent System (Pytest + LangGraph)

🎯 Purpose

This skill is responsible for testing (NOT implementing) the hardened 18-step legal drafting LangGraph pipeline.

The Tester skill ensures the system is production-safe and court-grade by validating:

•workflow orchestration correctness
•deterministic gate enforcement (NO LLM bypass)
•hallucination prevention for facts and citations
•pause/resume stability
•parallel fan-out / fan-in correctness
•mistake DB anti-pollution staging + promotion logic
•end-to-end draft stability

This is a QA-only testing skill.

✅ Key Rule

This skill MUST NOT modify production code or DB schemas. It only writes and runs tests and reports failures.

🧠 Scope of Testing

1. Workflow Orchestration (LangGraph)

Validate that the LangGraph pipeline correctly performs:

•node routing
•conditional edge selection
•parallel execution (fan-out/fan-in)
•resume after pause
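Conditional edges in LangGraph are plain Python functions that take the current state and return the name of the next node (they are registered with `StateGraph.add_conditional_edges`). Routing can therefore be tested without running the graph. This is a minimal sketch under stated assumptions: `route_after_rule_classifier`, the `rule_doc_type` state key, and the node names are hypothetical. The rule that Step 4B runs only when Step 4A finds no match is also an illustrative assumption, not something this spec defines.

```python
# Hypothetical conditional-edge function for the edge leaving Step 4A.
# In the real graph it would be registered with something like:
#   builder.add_conditional_edges("rule_classifier", route_after_rule_classifier)
def route_after_rule_classifier(state: dict) -> str:
    """Pick the next node after the deterministic Rule Classifier (Step 4A)."""
    if state.get("rule_doc_type") is not None:
        return "route_resolver"  # Step 4C: rules matched, skip the LLM
    return "llm_classifier"  # Step 4B: no rule matched, fall back to the LLM


# (state, expected next node) pairs; with pytest these would typically
# feed @pytest.mark.parametrize.
ROUTING_CASES = [
    ({"rule_doc_type": "bail_application"}, "route_resolver"),
    ({"rule_doc_type": "writ_petition"}, "route_resolver"),
    ({"rule_doc_type": None}, "llm_classifier"),
    ({}, "llm_classifier"),
]


def test_route_after_rule_classifier() -> None:
    for state, expected in ROUTING_CASES:
        assert route_after_rule_classifier(state) == expected, state
```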

2. Hallucination Safety Gates

Validate that the deterministic (NO LLM) gates enforce the following:

•Fact Validation Gate (Step 3) blocks unverified facts
•Citation Validation Gate (Step 10) blocks unverified citations
•Context Merge + Conflict Resolver (Step 11) blocks contradictions
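Because the gates are deterministic, the same input must always produce the same accept/reject decisions. This makes them straightforward to unit-test. The sketch below uses hypothetical stand-ins (`fact_validation_gate`, `citation_validation_gate`) so it runs on its own. A real suite would import the production Step 3 and Step 10 nodes instead, and their signatures and state keys may differ. The facts and citation IDs are placeholder sample data.

```python
from typing import Any


# Hypothetical stand-ins for the production gates. The Tester imports the
# real Step 3 and Step 10 nodes; these exist only so the example runs alone.
def fact_validation_gate(facts: list[dict[str, Any]]) -> dict[str, list]:
    """Step 3 stand-in: accept only facts that carry a source_doc_id."""
    return {
        "verified": [f for f in facts if f.get("source_doc_id")],
        "rejected": [f for f in facts if not f.get("source_doc_id")],
    }


def citation_validation_gate(citations: list[str], verified_index: set[str]) -> dict[str, list]:
    """Step 10 stand-in: accept only citations present in a verified index."""
    return {
        "verified": [c for c in citations if c in verified_index],
        "rejected": [c for c in citations if c not in verified_index],
    }


def test_fact_gate_blocks_unsourced_facts() -> None:
    facts = [
        {"text": "FIR registered", "source_doc_id": "doc-7"},
        {"text": "No prior record", "source_doc_id": None},  # unsourced
    ]
    out = fact_validation_gate(facts)
    assert [f["text"] for f in out["verified"]] == ["FIR registered"]
    assert [f["text"] for f in out["rejected"]] == ["No prior record"]


def test_citation_gate_blocks_unverified_citations() -> None:
    out = citation_validation_gate(["CIT-001", "CIT-999"], verified_index={"CIT-001"})
    assert out["verified"] == ["CIT-001"]
    assert out["rejected"] == ["CIT-999"]
```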

3. Mistake DB Anti-Pollution Safety

Validate:

•candidate mistake rules are inserted only into staging_rules
•promotion is blocked unless the rule appears in at least 3 distinct cases
•the main DB (mistake_rules_main) is never written directly; it is updated only through the Promotion Gate (Steps 15 and 16)
•contradictory or case-specific rules are rejected
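The staging and promotion rules can be exercised against a throwaway in-memory SQLite database. This is a sketch only. The table names come from this spec, but the column layout, `store_candidate_rule`, and `promotion_gate` are hypothetical stand-ins for the production Step 14 to 16 code.

```python
import sqlite3

PROMOTION_THRESHOLD = 3  # from the spec: at least 3 distinct cases


def make_db() -> sqlite3.Connection:
    """In-memory DB with hypothetical columns for the two mistake-rule tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staging_rules (rule_text TEXT NOT NULL, case_id TEXT NOT NULL);
        CREATE TABLE mistake_rules_main (rule_text TEXT PRIMARY KEY);
        """
    )
    return conn


def store_candidate_rule(conn: sqlite3.Connection, rule_text: str, case_id: str) -> None:
    """Step 14 stand-in: candidate rules are written to staging only."""
    conn.execute("INSERT INTO staging_rules VALUES (?, ?)", (rule_text, case_id))


def promotion_gate(conn: sqlite3.Connection, rule_text: str) -> bool:
    """Steps 15-16 stand-in: promote only after >= 3 distinct cases."""
    (n_cases,) = conn.execute(
        "SELECT COUNT(DISTINCT case_id) FROM staging_rules WHERE rule_text = ?",
        (rule_text,),
    ).fetchone()
    if n_cases < PROMOTION_THRESHOLD:
        return False
    conn.execute("INSERT OR IGNORE INTO mistake_rules_main VALUES (?)", (rule_text,))
    return True


def main_rule_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM mistake_rules_main").fetchone()[0]


def test_promotion_requires_three_distinct_cases() -> None:
    conn = make_db()
    rule = "State the date of the dishonour memo"  # sample rule text
    for case_id in ("case-1", "case-1", "case-2"):  # only 2 distinct cases
        store_candidate_rule(conn, rule, case_id)
    assert main_rule_count(conn) == 0  # staging writes never touch main
    assert promotion_gate(conn, rule) is False
    assert main_rule_count(conn) == 0

    store_candidate_rule(conn, rule, "case-3")  # third distinct case
    assert promotion_gate(conn, rule) is True
    assert main_rule_count(conn) == 1
```

Counting `DISTINCT case_id` is what makes the test meaningful: the same rule logged twice for one case must not count as two repetitions.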

4. Output Quality Requirements

Validate that the final draft output:

•follows the correct template structure
•has prayers inserted correctly
•has consistent annexure references
•contains a verification clause that is present and localized
•retains placeholders ({{MISSING_FIELD}}) for missing mandatory data
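Structural checks on the final draft can be written as a function that returns a list of problems, so one assertion reports every defect at once. The sketch below assumes a plain-text draft, annexure labels like `Annexure A-1`, and placeholders rendered as `{{FIELD_NAME}}` following the `{{MISSING_FIELD}}` pattern. `check_draft` and the heading names are hypothetical. Checking that the verification clause is localized depends on the forum and language, so this sketch leaves it out.

```python
import re

# Assumed placeholder convention: a missing mandatory field FOO appears as {{FOO}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")
# Assumed annexure label style, e.g. "Annexure A-1" or "Annexure P".
ANNEXURE_REF = re.compile(r"Annexure\s+([A-Z]+(?:-\d+)?)")


def check_draft(draft: str, required_headings: list[str],
                attached_annexures: list[str], missing_fields: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the draft passes."""
    problems = []
    last_pos = -1
    for heading in required_headings:  # headings must exist and keep template order
        pos = draft.find(heading)
        if pos == -1:
            problems.append(f"missing section: {heading}")
        elif pos < last_pos:
            problems.append(f"section out of order: {heading}")
        else:
            last_pos = pos
    # Every annexure cited in the body must actually be attached.
    for label in sorted(set(ANNEXURE_REF.findall(draft)) - set(attached_annexures)):
        problems.append(f"annexure referenced but not attached: {label}")
    # Missing mandatory data must stay visible as a placeholder, never be invented.
    present = set(PLACEHOLDER.findall(draft))
    for field in missing_fields:
        if field not in present:
            problems.append(f"no placeholder for missing field: {field}")
    return problems


SAMPLE_DRAFT = """FACTS
The applicant was arrested on {{ARREST_DATE}} (Annexure A-1).
PRAYER
It is prayed that the applicant be released on bail.
VERIFICATION
Verified at {{PLACE_OF_VERIFICATION}}.
"""
```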

5. Database Consistency

Validate DB audit trail correctness:

•each step output stored in agent_outputs
•stop/pause events logged in validation_reports
•draft versions stored correctly
•export history stored correctly
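Audit-trail checks reduce to queries against the run's rows. The table names below come from this spec, but the columns (`run_id`, `step`, `event`) and the helper functions are assumptions made for the sketch. The inserted rows are simulated sample data.

```python
import sqlite3

# Hypothetical audit schema: table names from the spec, columns assumed.
SCHEMA = """
CREATE TABLE agent_outputs (run_id TEXT, step TEXT, output TEXT);
CREATE TABLE validation_reports (run_id TEXT, step TEXT, event TEXT);
"""


def missing_step_outputs(conn: sqlite3.Connection, run_id: str,
                         executed_steps: list[str]) -> list[str]:
    """Return executed steps that have no agent_outputs row for this run."""
    logged = {step for (step,) in conn.execute(
        "SELECT step FROM agent_outputs WHERE run_id = ?", (run_id,))}
    return [s for s in executed_steps if s not in logged]


def stop_logged(conn: sqlite3.Connection, run_id: str) -> bool:
    """True if a STOP or PAUSE event was recorded in validation_reports."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM validation_reports "
        "WHERE run_id = ? AND event IN ('STOP', 'PAUSE')", (run_id,)).fetchone()
    return count > 0


def test_paused_run_has_complete_audit_trail() -> None:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    executed = ["0", "1", "2", "3", "4A", "4C"]
    # Simulated rows for a run that paused at Step 5.
    conn.executemany("INSERT INTO agent_outputs VALUES (?, ?, ?)",
                     [("run-1", s, "{}") for s in executed])
    conn.execute("INSERT INTO validation_reports VALUES ('run-1', '5', 'PAUSE')")
    assert missing_step_outputs(conn, "run-1", executed) == []
    assert missing_step_outputs(conn, "run-1", executed + ["5"]) == ["5"]
    assert stop_logged(conn, "run-1")
    assert not stop_logged(conn, "run-2")
```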

🏗️ Workflow Under Test (18-Step Pipeline)

The Tester skill must validate every step of the full pipeline:

Step 0 → Raw Input Collection
Step 1 → Security + Normalization (NO LLM)
Step 2 → Supervisor Intake / Fact Extraction (LLM)
Step 3 → Fact Validation Gate (NO LLM)
Step 4A → Rule Classifier (NO LLM)
Step 4B → LLM Classifier (LLM)
Step 4C → Route Resolver (NO LLM)
Step 5 → Clarification Handler (STOP IF REQUIRED)
Step 6 → Mistake Rules Fetch (Main DB)
Step 7 → Template Pack Agent (LLM)
Step 8 → Parallel Agents: Compliance + Localization + Prayer
Step 9 → Optional Agents: Research + Citation
Step 10 → Citation Validation Gate (NO LLM)
Step 11 → Context Merge + Conflict Resolver (NO LLM)
Step 12 → Drafting Agent (LLM)
Step 13 → Quality Agent (LLM)
Step 14 → Store Candidate Rules (Staging DB)
Step 15 → Promotion Gate (NO LLM)
Step 16 → Update Main Mistake DB (NO LLM)
Step 17 → Promotion Logging (NO LLM)
Step 18 → Export Engine (NO LLM)
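One way to prove that a NO LLM step never reaches a model is to hand it a fake client that fails on any call. This sketch assumes nodes receive their LLM client by injection (a hypothetical design; the same idea works with monkeypatching). `security_normalization` and `NODES` are hypothetical stand-ins. The set of deterministic steps is copied from the list above.

```python
# Steps marked (NO LLM) in the pipeline list above.
DETERMINISTIC_STEPS = {"1", "3", "4A", "4C", "10", "11", "15", "16", "17", "18"}


class ForbiddenLLM:
    """Fake LLM client: records any call, then fails the test."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def invoke(self, prompt: str) -> str:
        self.calls.append(prompt)
        raise AssertionError(f"deterministic step called the LLM with: {prompt!r}")


# Hypothetical Step 1 stand-in that receives its LLM client by injection.
def security_normalization(state: dict, llm: ForbiddenLLM) -> dict:
    """Collapse whitespace without touching the LLM."""
    return {**state, "raw_text": " ".join(state["raw_text"].split())}


# Hypothetical registry of node callables keyed by step id.
NODES = {"1": security_normalization}


def test_deterministic_nodes_never_call_llm() -> None:
    for step in sorted(DETERMINISTIC_STEPS.intersection(NODES)):
        llm = ForbiddenLLM()
        NODES[step]({"raw_text": "  Bail   application \n"}, llm)
        assert llm.calls == [], f"step {step} called the LLM"
```

Recording calls before raising matters: if a node swallows the exception, the non-empty `calls` list still fails the test.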

🧪 Required Testing Method: Pytest

Tests must be written using:

•pytest
•LangGraph testing patterns

The Tester must validate:

A) Unit Tests (Node Level)

•validate each node's input/output schema
•validate gate behavior deterministically
•validate that hard_stop conditions trigger a pause
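At node level, a hard stop can be checked from the node's returned state update alone. Verifying that the compiled graph actually pauses and later resumes belongs in the graph-level integration tests, which need a checkpointer. In this sketch, `clarification_handler` and `MANDATORY_FIELDS` are hypothetical. The mandatory fields listed for a bail application are illustrative, not a statement of procedural law.

```python
# Assumed mandatory fields per document type (illustrative, not from the spec).
MANDATORY_FIELDS = {
    "bail_application": ("jurisdiction", "accused_name", "fir_number"),
}


def clarification_handler(state: dict) -> dict:
    """Hypothetical Step 5 stand-in: flag a hard stop when mandatory data is missing."""
    required = MANDATORY_FIELDS.get(state["doc_type"], ("jurisdiction",))
    missing = [f for f in required if not state.get(f)]
    return {"hard_stop": bool(missing), "missing_fields": missing}


def test_missing_jurisdiction_triggers_hard_stop() -> None:
    out = clarification_handler(
        {"doc_type": "bail_application", "accused_name": "A", "fir_number": "12/2024"})
    assert out == {"hard_stop": True, "missing_fields": ["jurisdiction"]}


def test_complete_state_does_not_stop() -> None:
    out = clarification_handler(
        {"doc_type": "bail_application", "jurisdiction": "Sessions Court",
         "accused_name": "A", "fir_number": "12/2024"})
    assert out["hard_stop"] is False
    assert out["missing_fields"] == []
```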

B) Integration Tests (Graph Level)

•validate routing decisions
•validate conditional edges
•validate fan-out/fan-in merge behavior
•validate pause/resume correctness
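In LangGraph, writes from parallel branches to the same state key are combined by a reducer declared with `typing.Annotated` (for example `Annotated[list, operator.add]`). A test should not rely on the order in which branches finish, so a useful check is that the reducer produces the same merged state for every ordering. `merge_agent_outputs` is a hypothetical reducer for the Step 8 agents, not the production one.

```python
import itertools
from functools import reduce


def merge_agent_outputs(left: dict, right: dict) -> dict:
    """Hypothetical fan-in reducer: merge per-agent outputs keyed by agent name.

    Raises if two branches write the same agent key, which would mean one
    parallel agent silently overwrote another's output.
    """
    overlap = left.keys() & right.keys()
    if overlap:
        raise ValueError(f"conflicting writes for: {sorted(overlap)}")
    return {**left, **right}


BRANCH_OUTPUTS = [
    {"compliance": {"ok": True}},
    {"localization": {"language": "en-IN"}},
    {"prayer": {"reliefs": ["grant bail"]}},
]


def test_fan_in_is_order_independent() -> None:
    # Apply the reducer in every possible branch-completion order.
    results = [reduce(merge_agent_outputs, order, {})
               for order in itertools.permutations(BRANCH_OUTPUTS)]
    assert all(r == results[0] for r in results)
    assert set(results[0]) == {"compliance", "localization", "prayer"}


def test_fan_in_rejects_duplicate_writes() -> None:
    try:
        merge_agent_outputs({"prayer": 1}, {"prayer": 2})
    except ValueError:
        return
    raise AssertionError("duplicate branch write was not detected")
```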

C) End-to-End Tests (Full Pipeline)

Run the full pipeline on multiple Indian legal drafting scenarios and validate:

•output stability
•no hallucinated facts
•no hallucinated citations
•correct DB logging
•correct staging + promotion behavior

🇮🇳 Mandatory Indian Drafting Test Scenarios (Minimum 5)

The Tester must run the E2E pipeline for at least these scenarios:

•Bail Application (Sessions Court / High Court)
•Cheque Bounce Complaint under Section 138, NI Act (Magistrate Court)
•Divorce Petition (Family Court)
•Writ Petition (High Court)
•Civil Suit for Recovery (District/Civil Court)

Each scenario must validate:

•routing correctness
•STOP behavior when mandatory facts are missing
•citation validation behavior
•prayer correctness
•annexure correctness
•export correctness
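The mandatory scenarios and per-scenario checks can live in one data table, so adding a case is a one-line change. The sketch assumes a hypothetical result dict produced by running the pipeline (keys such as `route`, `prayers`, `export_path`) and hypothetical route labels. It also includes a deliberately incomplete bail scenario to exercise the STOP path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    doc_type: str      # hypothetical route label expected from Step 4C
    forum: str
    expect_stop: bool  # True when mandatory facts are deliberately omitted


SCENARIOS = [
    Scenario("bail", "bail_application", "Sessions Court", False),
    Scenario("bail_no_jurisdiction", "bail_application", "", True),
    Scenario("ni_act_138", "cheque_bounce_complaint", "Magistrate Court", False),
    Scenario("divorce", "divorce_petition", "Family Court", False),
    Scenario("writ", "writ_petition", "High Court", False),
    Scenario("civil_recovery", "recovery_suit", "District/Civil Court", False),
]


def scenario_problems(s: Scenario, result: dict) -> list[str]:
    """Compare one pipeline result (assumed dict shape) with a scenario's expectations."""
    if s.expect_stop:
        ok = result.get("hard_stop") and not result.get("final_draft")
        return [] if ok else [f"{s.name}: expected STOP before drafting"]
    problems = []
    if result.get("route") != s.doc_type:
        problems.append(f"{s.name}: routed to {result.get('route')!r}")
    if any(not c.get("verified") for c in result.get("citations", [])):
        problems.append(f"{s.name}: unverified citation in output")
    if not result.get("prayers"):
        problems.append(f"{s.name}: no prayers in draft")
    if set(result.get("annexures_referenced", [])) - set(result.get("annexures_attached", [])):
        problems.append(f"{s.name}: annexure mismatch")
    if not result.get("export_path"):
        problems.append(f"{s.name}: nothing exported")
    return problems
```

In pytest, `SCENARIOS` can drive `@pytest.mark.parametrize("scenario", SCENARIOS, ids=lambda s: s.name)`, with each test running the pipeline for its scenario and asserting that `scenario_problems(...)` returns an empty list.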

🛑 Hard Fail Conditions (Test Must Fail Immediately)

Tests must fail if:

•any step bypasses deterministic validation gates
•any unverified citation reaches FINAL_DRAFT
•any fact without a source_doc_id is used in the final draft
•the workflow does not pause when jurisdiction is missing
•the workflow writes directly into mistake_rules_main, bypassing the Promotion Gate
•promotion happens before the rule has appeared in at least 3 distinct cases
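The hard-fail conditions can be collected into a single checker that the E2E tests call on every final state. All state keys (`draft_citations`, `main_db_writes`, `promoted_rules`, and so on) and the `update_main_mistake_db` writer name are assumptions for this sketch. A real suite would read the equivalent data from the graph state and the audit tables.

```python
PROMOTION_THRESHOLD = 3
# Hypothetical: only the Step 16 node may write mistake_rules_main.
ALLOWED_MAIN_DB_WRITERS = {"update_main_mistake_db"}


def hard_fail_violations(state: dict) -> list[str]:
    """Return every hard-fail violation found in a final pipeline state."""
    violations = []
    for c in state.get("draft_citations", []):
        if not c.get("verified"):
            violations.append(f"unverified citation in FINAL_DRAFT: {c.get('text')}")
    for f in state.get("draft_facts", []):
        if not f.get("source_doc_id"):
            violations.append(f"fact without source_doc_id: {f.get('text')}")
    if not state.get("jurisdiction") and not state.get("paused"):
        violations.append("jurisdiction missing but workflow did not pause")
    for writer in state.get("main_db_writes", []):
        if writer not in ALLOWED_MAIN_DB_WRITERS:
            violations.append(f"direct write to mistake_rules_main by {writer}")
    for rule, case_ids in state.get("promoted_rules", {}).items():
        distinct = len(set(case_ids))
        if distinct < PROMOTION_THRESHOLD:
            violations.append(f"rule promoted after {distinct} distinct case(s): {rule}")
    return violations
```

In a test, `assert not violations, "\n".join(violations)` lists every violation in the failure message; running pytest with `-x` (`--exitfirst`) stops the session at the first failing test.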

📌 Expected Output of Tester Skill

The Tester skill must produce a structured report containing:

•pass/fail summary
•failed test names with reasons
•coverage summary (steps covered)
•safety violations detected
•promotion gate violations detected
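The report can be assembled from per-test records. This sketch assumes each test is tagged with the pipeline steps it covers and a category (`safety`, `promotion`, or `general`). `CaseResult` and `build_report` are hypothetical. Pytest's built-in `--junitxml` output is one possible source for the raw pass/fail records.

```python
from dataclasses import dataclass

# Step ids from the pipeline list: 0-3, 4A-4C, then 5-18.
ALL_STEPS = ("0", "1", "2", "3", "4A", "4B", "4C", *[str(n) for n in range(5, 19)])


@dataclass
class CaseResult:
    name: str
    passed: bool
    reason: str = ""
    steps: tuple[str, ...] = ()
    category: str = "general"  # "safety", "promotion", or "general"


def build_report(results: list[CaseResult]) -> dict:
    """Summarize results into the report sections required by this skill."""
    covered = {s for r in results for s in r.steps}
    failed = [r for r in results if not r.passed]
    return {
        "summary": {"passed": len(results) - len(failed), "failed": len(failed)},
        "failures": [{"test": r.name, "reason": r.reason} for r in failed],
        "coverage": {
            "covered": [s for s in ALL_STEPS if s in covered],
            "missing": [s for s in ALL_STEPS if s not in covered],
        },
        "safety_violations": [r.name for r in failed if r.category == "safety"],
        "promotion_gate_violations": [r.name for r in failed if r.category == "promotion"],
    }
```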

🔗 Reference

LangGraph Testing Documentation: https://docs.langchain.com/oss/python/langgraph/test