tester

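A court-grade QA testing skill that uses pytest to validate LangGraph workflow correctness, hallucination safety, citation integrity, DB promotion safety, and end-to-end pipeline stability.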

SKILL.md
---
name: tester
description: Court-grade QA testing skill responsible for validating LangGraph workflow correctness, hallucination safety, citation integrity, DB promotion safety, and end-to-end pipeline stability using pytest.
tools: Read, Grep, Glob, Bash
disallowedTools: Write, Edit
version: 1.0
owner: QA Engineering
---

Tester Skill — Legal Drafting Agent System (Pytest + LangGraph)

🎯 Purpose

This skill is responsible for testing (NOT implementing) the hardened 18-step legal drafting LangGraph pipeline.

The Tester skill ensures the system is production-safe and court-grade by validating:

  • workflow orchestration correctness
  • deterministic gate enforcement (NO LLM bypass)
  • hallucination prevention for facts and citations
  • pause/resume stability
  • parallel fan-out / fan-in correctness
  • mistake DB anti-pollution staging + promotion logic
  • end-to-end draft stability

This is a QA-only testing skill.


✅ Key Rule

This skill MUST NOT modify production code or DB schemas. It only runs tests and reports failures.


🧠 Scope of Testing

1. Workflow Orchestration (LangGraph)

Validate that the LangGraph pipeline handles each of the following correctly:

  • node routing
  • conditional edge selection
  • parallel execution (fan-out/fan-in)
  • resume after pause
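One way to test routing without running the whole graph is to exercise the routing function on its own, since a conditional edge is ultimately a plain function from state to the next node name. The sketch below is a stand-in, not the pipeline's real resolver: the state keys (`jurisdiction`, `doc_type`), the node names, and the function `route_after_classification` are all hypothetical.

```python
from typing import TypedDict


class RouteState(TypedDict, total=False):
    # Hypothetical state keys; the real pipeline's schema may differ.
    jurisdiction: str
    doc_type: str


def route_after_classification(state: RouteState) -> str:
    """Stand-in for a deterministic (no LLM) route resolver."""
    if not state.get("jurisdiction") or not state.get("doc_type"):
        return "clarification_handler"  # stop and ask for the missing data
    return "mistake_rules_fetch"  # continue down the pipeline


def test_missing_jurisdiction_routes_to_clarification():
    state: RouteState = {"doc_type": "bail_application"}
    assert route_after_classification(state) == "clarification_handler"


def test_complete_intake_routes_to_rules_fetch():
    state: RouteState = {"jurisdiction": "High Court", "doc_type": "writ_petition"}
    assert route_after_classification(state) == "mistake_rules_fetch"
```

pytest collects both `test_` functions as they are. The same assertions can later be repeated against the compiled graph to confirm that the edge is wired to this function.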

2. Hallucination Safety Gates

Validate that the deterministic gates enforce the following:

  • the Fact Validation Gate blocks unverified facts
  • the Citation Validation Gate blocks unverified citations
  • the Context Merge step blocks contradictions
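A gate test might look like the following minimal sketch. It assumes that a fact counts as "verified" when it carries a `source_doc_id`, and that a gate reports a `hard_stop` flag. The `GateResult` type, the `fact_validation_gate` function, and the sample facts are invented for illustration; the real gate lives in production code that this skill only reads.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    hard_stop: bool
    blocked: list = field(default_factory=list)


def fact_validation_gate(facts: list[dict]) -> GateResult:
    """Stand-in gate: a fact passes only if it names a source document."""
    blocked = [f["text"] for f in facts if not f.get("source_doc_id")]
    return GateResult(hard_stop=bool(blocked), blocked=blocked)


def test_gate_blocks_unsourced_fact():
    facts = [
        {"text": "Cheque was returned unpaid", "source_doc_id": "DOC-1"},
        {"text": "Accused admitted liability", "source_doc_id": None},
    ]
    result = fact_validation_gate(facts)
    assert result.hard_stop is True
    assert result.blocked == ["Accused admitted liability"]


def test_gate_is_deterministic():
    # Same input, same verdict on every call: no LLM in the loop.
    facts = [{"text": "Notice served", "source_doc_id": "DOC-2"}]
    first = fact_validation_gate(facts)
    assert all(fact_validation_gate(facts) == first for _ in range(5))
    assert first.hard_stop is False
```

The repeated-call check is a cheap proxy for determinism. It cannot prove that no LLM is involved, so it works best alongside a test that fails if any LLM client is invoked during the gate step.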

3. Mistake DB Anti-Pollution Safety

Validate that:

  • candidate mistake rules are inserted only into staging_rules
  • promotion is blocked unless the rule appears in >= 3 distinct cases
  • the main DB (mistake_rules_main) is never written to directly
  • contradictory or case-specific rules are rejected
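The promotion threshold can be checked against an in-memory SQLite database, as in this sketch. Only the table names `staging_rules` and `mistake_rules_main` come from the text above. The column names, the `is_promotable` helper, and the sample rule are assumptions, and the real system may use a different database engine.

```python
import sqlite3

PROMOTION_THRESHOLD = 3  # from the rule above: >= 3 distinct cases


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical columns; only the table names come from the skill text.
    conn.execute("CREATE TABLE staging_rules (rule_text TEXT, case_id TEXT)")
    conn.execute("CREATE TABLE mistake_rules_main (rule_text TEXT)")
    return conn


def is_promotable(conn: sqlite3.Connection, rule_text: str) -> bool:
    """Count distinct cases, so repeats within one case do not inflate the total."""
    (distinct_cases,) = conn.execute(
        "SELECT COUNT(DISTINCT case_id) FROM staging_rules WHERE rule_text = ?",
        (rule_text,),
    ).fetchone()
    return distinct_cases >= PROMOTION_THRESHOLD


def test_repeats_in_one_case_do_not_count():
    conn = make_db()
    rule = "State the cheque return memo date"
    conn.executemany(
        "INSERT INTO staging_rules VALUES (?, ?)",
        [(rule, "CASE-1"), (rule, "CASE-1"), (rule, "CASE-2")],
    )
    assert is_promotable(conn, rule) is False
    main_rows = conn.execute("SELECT COUNT(*) FROM mistake_rules_main").fetchone()[0]
    assert main_rows == 0  # staging writes must never touch the main table


def test_three_distinct_cases_allow_promotion():
    conn = make_db()
    rule = "State the cheque return memo date"
    conn.executemany(
        "INSERT INTO staging_rules VALUES (?, ?)",
        [(rule, f"CASE-{i}") for i in range(1, 4)],
    )
    assert is_promotable(conn, rule) is True
```

Using `COUNT(DISTINCT case_id)` rather than `COUNT(*)` is the point of the first test: three rows from only two cases must not satisfy the gate.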

4. Output Quality Requirements

Validate that the final draft:

  • follows the correct template structure
  • has prayers inserted correctly
  • keeps annexure references consistent
  • includes a localized verification clause
  • retains placeholders ({{MISSING_FIELD}}) for missing mandatory data
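The placeholder requirement can be checked with a small text scan. This sketch assumes that `{{MISSING_FIELD}}` stands for a family of `{{UPPER_SNAKE_CASE}}` tokens, one per missing field. That reading is an assumption, as are the helper names and the sample field names.

```python
import re

# Assumed token shape: double braces around an upper-snake-case field name.
PLACEHOLDER = re.compile(r"\{\{([A-Z0-9_]+)\}\}")


def unresolved_placeholders(draft: str) -> list[str]:
    """Return placeholder names still present in the draft, in order."""
    return PLACEHOLDER.findall(draft)


def fields_without_placeholder(draft: str, missing_fields: set[str]) -> list[str]:
    """List mandatory fields that are missing but have no placeholder."""
    present = set(unresolved_placeholders(draft))
    return sorted(missing_fields - present)


def test_missing_field_keeps_its_placeholder():
    draft = "The petitioner resides at {{PETITIONER_ADDRESS}}."
    assert unresolved_placeholders(draft) == ["PETITIONER_ADDRESS"]
    assert fields_without_placeholder(draft, {"PETITIONER_ADDRESS"}) == []


def test_silently_filled_field_is_reported():
    # FIR_NUMBER was never supplied, yet no placeholder marks the gap.
    draft = "The petitioner resides at {{PETITIONER_ADDRESS}}."
    missing = {"PETITIONER_ADDRESS", "FIR_NUMBER"}
    assert fields_without_placeholder(draft, missing) == ["FIR_NUMBER"]
```

A non-empty result from `fields_without_placeholder` suggests the drafting step either dropped the field or filled it with invented data, which is the case this requirement exists to catch.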

5. Database Consistency

Validate DB audit trail correctness:

  • each step output stored in agent_outputs
  • stop/pause events logged in validation_reports
  • draft versions stored correctly
  • export history stored correctly

🏗️ Workflow Under Test (18-Step Pipeline)

The Tester skill must validate every step of the pipeline (Steps 0 through 18, with Step 4 split into 4A, 4B, and 4C):

Step 0 → Raw Input Collection
Step 1 → Security + Normalization (NO LLM)
Step 2 → Supervisor Intake / Fact Extraction (LLM)
Step 3 → Fact Validation Gate (NO LLM)
Step 4A → Rule Classifier (NO LLM)
Step 4B → LLM Classifier (LLM)
Step 4C → Route Resolver (NO LLM)
Step 5 → Clarification Handler (STOP IF REQUIRED)
Step 6 → Mistake Rules Fetch (Main DB)
Step 7 → Template Pack Agent (LLM)
Step 8 → Parallel Agents: Compliance + Localization + Prayer
Step 9 → Optional Agents: Research + Citation
Step 10 → Citation Validation Gate (NO LLM)
Step 11 → Context Merge + Conflict Resolver (NO LLM)
Step 12 → Drafting Agent (LLM)
Step 13 → Quality Agent (LLM)
Step 14 → Store Candidate Rules (Staging DB)
Step 15 → Promotion Gate (NO LLM)
Step 16 → Update Main Mistake DB (NO LLM)
Step 17 → Promotion Logging (NO LLM)
Step 18 → Export Engine (NO LLM)


🧪 Required Testing Method: Pytest

Tests must be written using:

  • pytest
  • LangGraph testing patterns

The Tester must validate:

A) Unit Tests (Node Level)

  • validate each node's input/output schema
  • validate gate behavior deterministically
  • validate that hard_stop conditions trigger pause

B) Integration Tests (Graph Level)

  • validate routing decisions
  • validate conditional edges
  • validate fan-out/fan-in merge behavior
  • validate pause/resume correctness
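For the fan-out/fan-in check, one useful property is that the merged result does not depend on which parallel branch finishes first. The sketch below tests that property on a stand-in merge function. The function `merge_branch_outputs` and the shape of each branch output are hypothetical; the agent names follow Step 8 of the pipeline.

```python
from itertools import permutations

PARALLEL_AGENTS = ("compliance", "localization", "prayer")  # Step 8


def merge_branch_outputs(outputs: list[dict]) -> dict:
    """Stand-in fan-in: key each branch result by its agent name."""
    merged = {}
    for out in outputs:
        if out["agent"] in merged:
            raise ValueError(f"duplicate branch output: {out['agent']}")
        merged[out["agent"]] = out["result"]
    return merged


def test_fan_in_is_order_independent():
    outputs = [{"agent": name, "result": f"{name}-ok"} for name in PARALLEL_AGENTS]
    expected = merge_branch_outputs(outputs)
    # Every parallel branch must be represented after the merge.
    assert set(expected) == set(PARALLEL_AGENTS)
    # Any completion order must give the same merged state.
    for ordering in permutations(outputs):
        assert merge_branch_outputs(list(ordering)) == expected
```

With three branches this covers all six orderings. Against the real graph, the equivalent test would run the parallel step several times and compare the merged state across runs.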

C) End-to-End Tests (Full Pipeline)

Run the full pipeline on multiple Indian legal drafting scenarios and validate:

  • output stability
  • no hallucinated facts
  • no hallucinated citations
  • correct DB logging
  • correct staging + promotion behavior

🇮🇳 Mandatory Indian Drafting Test Scenarios (Minimum 5)

The Tester must run the E2E pipeline for at least the following scenarios:

  1. Bail Application (Sessions Court / High Court)
  2. Cheque Bounce Complaint under Section 138 of the NI Act (Magistrate Court)
  3. Divorce Petition (Family Court)
  4. Writ Petition (High Court)
  5. Civil Suit for Recovery (District/Civil Court)

Each scenario must validate:

  • routing correctness
  • STOP behavior when mandatory facts are missing
  • citation validation behavior
  • prayer correctness
  • annexure correctness
  • export correctness

🛑 Hard Fail Conditions (Test Must Fail Immediately)

Tests must fail if:

  • any step bypasses deterministic validation gates
  • any unverified citation reaches FINAL_DRAFT
  • any fact without source_doc_id is used in the final draft
  • workflow does not pause when jurisdiction is missing
  • workflow writes directly into mistake_rules_main
  • promotion happens before the rule has appeared in >= 3 distinct cases
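Several of these conditions can be checked in one pass over the state of a finished run. The following sketch assumes the final state exposes the keys shown (`verified_citations`, `final_draft_citations`, `final_draft_facts`, `jurisdiction`, `status`). All of those key names are hypothetical, and the citation identifiers are dummy values.

```python
def find_hard_failures(state: dict) -> list[str]:
    """Scan a finished run's state and name every hard-fail condition hit."""
    failures = []
    verified = set(state.get("verified_citations", []))
    for cite in state.get("final_draft_citations", []):
        if cite not in verified:
            failures.append(f"unverified citation in final draft: {cite}")
    for fact in state.get("final_draft_facts", []):
        if not fact.get("source_doc_id"):
            failures.append(f"fact without source_doc_id: {fact['text']}")
    if not state.get("jurisdiction") and state.get("status") != "paused":
        failures.append("workflow did not pause on missing jurisdiction")
    return failures


def test_clean_run_has_no_hard_failures():
    state = {
        "jurisdiction": "High Court",
        "status": "completed",
        "verified_citations": ["CITE-A"],
        "final_draft_citations": ["CITE-A"],
        "final_draft_facts": [{"text": "Notice served", "source_doc_id": "DOC-1"}],
    }
    assert find_hard_failures(state) == []


def test_each_violation_is_reported():
    state = {
        "status": "completed",  # no jurisdiction, yet the run did not pause
        "verified_citations": ["CITE-A"],
        "final_draft_citations": ["CITE-A", "CITE-B"],
        "final_draft_facts": [{"text": "Accused admitted liability"}],
    }
    failures = find_hard_failures(state)
    assert len(failures) == 3
    assert failures[0] == "unverified citation in final draft: CITE-B"
```

Returning a list rather than raising on the first hit lets one failing run report every violation at once. The test itself still fails as soon as the list is non-empty.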

📌 Expected Output of Tester Skill

The Tester skill must produce a structured report:

  • pass/fail summary
  • failed test names with reasons
  • coverage summary (steps covered)
  • safety violations detected
  • promotion gate violations detected
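The report could be modeled as a small data structure and serialized to JSON, as in this sketch. The class name `QAReport`, its field names, and the JSON layout are assumptions chosen to mirror the five bullets above; the skill text does not prescribe a format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class QAReport:
    passed: list[str] = field(default_factory=list)
    failed: dict[str, str] = field(default_factory=dict)  # test name -> reason
    steps_covered: list[str] = field(default_factory=list)
    safety_violations: list[str] = field(default_factory=list)
    promotion_violations: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """True only when nothing failed and no violation was recorded."""
        return not (self.failed or self.safety_violations or self.promotion_violations)

    def to_json(self) -> str:
        body = asdict(self)
        body["summary"] = {
            "passed": len(self.passed),
            "failed": len(self.failed),
            "ok": self.ok,
        }
        return json.dumps(body, indent=2, sort_keys=True)


report = QAReport(
    passed=["test_fan_in_is_order_independent"],
    failed={"test_gate_blocks_unsourced_fact": "hard_stop was False"},
    steps_covered=["3", "8"],
)
print(report.to_json())
```

The class name avoids the `Test` prefix on purpose, so pytest does not try to collect it as a test class.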

🔗 Reference

LangGraph Testing Documentation: https://docs.langchain.com/oss/python/langgraph/test