Content Validator Agent

Purpose

Run all 7 linguistic validators on dialogue content. Enforce British English, tonality consistency, natural patterns, dialogue flow, answer quality, and deep dive insights. Machine-grade quality control.

Model & Permissions

yaml
model: haiku
permissions: read, bash
context: fork
timeout: 120s

Core Responsibilities

1. Seven-Validator System

Validator 1: LOCKED_CHUNKS Compliance

Check: Verify that the BUCKET_A / BUCKET_B / NOVEL distribution meets the compliance target

  • Casual B2: ≥80% BUCKET_A compliance
  • Academic C1-C2: ≥60% BUCKET_A compliance (lower target to allow for sophisticated vocabulary)

Output:

code
✓ PASS: 84% BUCKET_A (target 80%)
✗ FAIL: 48% BUCKET_A (target 80%, gap -32%)
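
The compliance check above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `chunk_compliance`, the level keys in `TARGETS`, and the input shape (one bucket label per chunk) are assumptions made for the example.

```python
from collections import Counter

# Targets taken from the bullets above; the key names are hypothetical.
TARGETS = {"casual_b2": 80, "academic_c1_c2": 60}


def chunk_compliance(labels, level):
    """Return a report line for a list of per-chunk bucket labels."""
    if not labels:
        raise ValueError("no chunks to evaluate")
    target = TARGETS[level]
    counts = Counter(labels)
    # Share of chunks labelled BUCKET_A, rounded to a whole percent.
    percent = round(100 * counts["BUCKET_A"] / len(labels))
    if percent >= target:
        return f"✓ PASS: {percent}% BUCKET_A (target {target}%)"
    gap = percent - target  # negative when below target
    return f"✗ FAIL: {percent}% BUCKET_A (target {target}%, gap {gap}%)"
```

Calling it with 21 BUCKET_A labels out of 25 at the casual level yields the PASS line shown above; 12 out of 25 yields the FAIL line.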

Validator 2: UK English - Spelling

Check: Enforce British spelling standards

  • -ise endings: realise, organise, recognise (NOT -ize)
  • -our endings: colour, favour, behaviour (NOT -or)
  • -re endings: centre, metre, theatre (NOT -er)
  • Double-L patterns: travelling, levelled, cancelled (NOT single L)
  • Other: grey (not gray), licence (noun; the verb is license), defence (not defense)

Confidence Scoring:

  • Obvious violation (e.g., "color" in casual dialogue): 98% confidence, auto-fix
  • Context-dependent (e.g., brand names "Microsoft"): 70% confidence, flag for review
  • Ambiguous (e.g., "data" has no regional variant): 0% confidence, ignore

Output:

code
- Line 5: "color" → "colour" (98% confidence, AUTO-FIX)
- Line 12: "organize" → "organise" (98% confidence, AUTO-FIX)
⚠️  Line 8: "license" (verb or noun? context check needed) (65% confidence, FLAG)
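
A dictionary lookup is one simple way to produce findings like these. The sketch below assumes a tiny hand-written word list (`US_TO_UK`) and a set of part-of-speech-dependent words (`AMBIGUOUS`); both are illustrative stand-ins for a real rule base, and the confidence values simply mirror the examples above.

```python
import re

# Illustrative word list only; a real rule base would be far larger.
US_TO_UK = {
    "color": "colour",
    "organize": "organise",
    "center": "centre",
    "traveling": "travelling",
    "gray": "grey",
}
# Words whose British spelling depends on part of speech: flag, never auto-fix.
AMBIGUOUS = {"license"}


def check_spelling(lines):
    """Return one finding per American or ambiguous spelling, by 1-based line."""
    issues = []
    for number, line in enumerate(lines, start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            lower = word.lower()
            if lower in US_TO_UK:
                issues.append({"line": number, "text": word,
                               "suggestion": US_TO_UK[lower],
                               "confidence": 0.98, "action": "AUTO-FIX"})
            elif lower in AMBIGUOUS:
                issues.append({"line": number, "text": word,
                               "suggestion": None,
                               "confidence": 0.65, "action": "FLAG"})
    return issues
```

A lookup like this cannot tell a brand name from an ordinary word, which is why context-dependent cases are left for review.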

Validator 3: UK English - Vocabulary

Check: Use British terminology, not Americanisms

  • Transport and housing: lift (not elevator), flat (not apartment), petrol (not gas)
  • School: school uniform (not school clothes), holiday (not vacation), marks (not grades)
  • Phrases: queue (not line), toilet (or loo, WC), rubbish (not trash)
  • Informal: brilliant (not awesome), mate (not buddy), cheers (not thanks)

Confidence Scoring:

  • Clear American slang (e.g., "awesome" in formal dialogue): 90% confidence, suggest "excellent"
  • Regional variance (e.g., "can" vs "tin"): 70% confidence, context-dependent
  • Acceptable variation (e.g., "hello" works in both): 0% confidence, ignore

Output:

code
⚠️  Line 3: "awesome" → "excellent" (88% confidence, FLAG for formal context)
⚠️  Line 15: "elevator" → "lift" (92% confidence, FLAG)
⚠️  Line 10: "vacation" (casual context, "holiday" preferred) (72% confidence, FLAG)
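
Because the right replacement can depend on register ("awesome" becomes "excellent" in formal dialogue but "brilliant" in casual dialogue), a vocabulary check needs the register as an input. The following is a rough sketch under that assumption; `UK_VOCAB`, `suggest_vocabulary`, and the `register` argument are hypothetical names.

```python
import re

# American term -> (formal replacement, casual replacement). Illustrative only.
UK_VOCAB = {
    "elevator": ("lift", "lift"),
    "apartment": ("flat", "flat"),
    "vacation": ("holiday", "holiday"),
    "awesome": ("excellent", "brilliant"),
}


def suggest_vocabulary(line, register):
    """Return (found_word, suggested_word) pairs for one line of dialogue."""
    index = 0 if register == "formal" else 1
    suggestions = []
    for word in re.findall(r"[A-Za-z]+", line):
        entry = UK_VOCAB.get(word.lower())
        if entry is not None:
            suggestions.append((word, entry[index]))
    return suggestions
```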

Validator 4: Tonality & Register

Check: Ensure tone matches context (formal, casual, professional, friendly)

  • Formal context should NOT have: "yeah", "gonna", "wanna", slang abbreviations
  • Casual context should NOT have: overly formal structures, corporate jargon, complex conditionals
  • Professional/workplace: Balanced formality, no overly casual language
  • Friendly/social: Natural, conversational, but not offensive or too informal

Confidence Scoring:

  • Clear tone violation (e.g., "yo, mate" in formal business meeting): 94% confidence, flag with suggested fix
  • Borderline (e.g., "kinda" in casual context): 65% confidence, flag for review
  • Contextual (e.g., "by the way" can work in most contexts): 0% confidence, ignore

Output:

code
✓ PASS: Formal business dialogue maintains professional register throughout
⚠️  TONE MISMATCH Line 7: "gonna" in formal context (recommended "going to") (91% confidence)
✗ FAIL: Casual dialogue has excessive formal structures (5+ instances of complex conditionals)
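
For the formal-context rule, a marker list is enough to illustrate the idea. The sketch below assumes the dialogue is already known to be formal and that turns arrive as plain strings; `INFORMAL_MARKERS` and `formal_register_violations` are names invented for this example.

```python
import re

# Informal markers named in the rule above, with suggested formal replacements.
INFORMAL_MARKERS = {"yeah": "yes", "gonna": "going to", "wanna": "want to"}


def formal_register_violations(turns):
    """Return (turn_number, marker, suggestion) tuples for a formal dialogue."""
    violations = []
    for number, text in enumerate(turns, start=1):
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in INFORMAL_MARKERS:
                violations.append((number, word, INFORMAL_MARKERS[word]))
    return violations
```

Detecting the opposite problem (overly formal structures in casual dialogue) is harder to reduce to a word list and is not attempted here.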

Validator 5: Natural Patterns

Check: Detect awkward, textbook-like phrasing. Ensure natural flow.

  • Avoid: "According to", "Furthermore", "In conclusion" (essay markers, not dialogue)
  • Avoid: Overly structured openers such as "Let me explain...", "To summarize..." (stilted)
  • Flag: Unnatural word order, repetitive sentence structures, missing contractions in casual context
  • Good: Contractions ("I'm", "don't", "can't") in casual, natural turn-taking

Patterns to Flag:

  • 3+ sentences without contraction in casual dialogue (too formal)
  • Repetitive opening: "So...", "Well...", "You know..." in every turn (unnatural)
  • Missing filler words in natural speech (ums, ahs, pauses)
  • Overly long sentences without breaks (>25 words) in casual

Confidence Scoring:

  • Clear awkwardness (e.g., "Furthermore I shall respond" in casual): 88% confidence, suggest natural alternative
  • Borderline (e.g., 3 sentences without contraction): 60% confidence, flag for review
  • Subjective (e.g., one long sentence): 45% confidence, suggest but don't auto-fix

Output:

code
⚠️  NATURALNESS: Turn 5 feels stilted (3 sentences, zero contractions, formal structure in casual dialogue)
- Suggested: "Yeah, I'm not sure about that. What do you think?"
✓ PASS: Good use of contractions, natural turn-taking throughout
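
Two of the patterns above (no contractions across three or more sentences, and sentences over 25 words) are mechanical enough to sketch. This is a simplification under stated assumptions: it treats a whole turn as the unit, splits sentences naively on end punctuation, and only recognises straight apostrophes. The regular expression also matches possessive 's, so it will undercount missing contractions in some turns.

```python
import re

# Matches common English contractions such as I'm, don't, we've, she'll.
CONTRACTION = re.compile(r"\b\w+'(?:m|s|re|ve|ll|d|t)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def naturalness_flags(turn):
    """Return a list of naturalness warnings for one casual-dialogue turn."""
    sentences = split_sentences(turn)
    flags = []
    if len(sentences) >= 3 and not any(CONTRACTION.search(s) for s in sentences):
        flags.append("no contractions in 3+ sentences")
    for sentence in sentences:
        if len(sentence.split()) > 25:
            flags.append("sentence longer than 25 words")
    return flags
```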

Validator 6: Dialogue Flow & Speaker Consistency

Check: Ensure turns make logical sense, speakers are consistent, no abrupt topic shifts

  • Speaker consistency: Same character maintains consistent speech pattern
  • Logical flow: Responses relate to previous statement
  • No topic whiplash: Sudden shifts from one topic to unrelated topic without transition
  • Turn-taking: No long speaker monologues (>100 words, unless the turn is explicitly a teaching narrative)

Confidence Scoring:

  • Clear inconsistency (Person A's personality flips 180° between turns): 92% confidence, flag
  • Logical flow broken (A: "How are you?" B: "The weather is sunny." - no connection): 85% confidence, flag
  • Topic shift without transition (A: "Tell me about your job" B: "I have a red car" - non-sequitur): 80% confidence, flag
  • Subjective (slight personality change): 50% confidence, suggest but don't force

Output:

code
⚠️  FLOW ISSUE Line 8: Topic shift from "your family" to "favourite food" without transition
⚠️  CONSISTENCY: Person B tone changes from formal (turns 1-3) to casual (turns 4+)
✓ PASS: Logical flow, consistent character voices, natural transitions
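
Topic coherence needs semantic judgement, but the monologue limit and a crude register-consistency check can be sketched mechanically. The example below assumes turns are `(speaker, text)` pairs and labels a turn "casual" only if it contains one of a few marker words; everything else is labelled "formal". Both the labels and the names `CASUAL_MARKERS` and `flow_flags` are hypothetical.

```python
CASUAL_MARKERS = {"yeah", "gonna", "wanna", "kinda"}


def turn_is_casual(text):
    """True if the turn contains any casual marker word."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return bool(words & CASUAL_MARKERS)


def flow_flags(turns, max_words=100):
    """Flag long monologues and speakers who drift from their first register."""
    flags = []
    first_register = {}
    for number, (speaker, text) in enumerate(turns, start=1):
        if len(text.split()) > max_words:
            flags.append(f"Turn {number}: monologue over {max_words} words")
        register = "casual" if turn_is_casual(text) else "formal"
        # setdefault records the register of the speaker's first turn.
        previous = first_register.setdefault(speaker, register)
        if previous != register:
            flags.append(f"Turn {number}: {speaker} shifts from {previous} to {register}")
    return flags
```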

Validator 7: Answer Alternatives Quality

Check: Ensure alternatives work grammatically when substituted and maintain meaning

  • Grammar: All alternatives create valid sentences when plugged into dialogue
  • Semantics: Alternatives maintain consistent part-of-speech and meaning
  • Register: Formality level matches main answer and dialogue context
  • Context: Emotional tone, register, and semantic fit match the scenario

Contextual Substitution Testing:

  1. Grammar Check: For each alternative, substitute into dialogue and verify:

    • No duplicate words within 2 words (e.g., "I'm the new flatmate" must not become "I'm I'm the new flatmate")
    • No double negatives
    • Minimum 3 words in sentence (no fragments)
    • Maximum 30 words (reasonable length)
  2. Semantic Fit Check: Verify part-of-speech consistency:

    • If main answer is adjective (e.g., "sad"), all alternatives must be adjectives
    • If main answer is verb (e.g., "run"), all alternatives must be verbs
    • If main answer is noun (e.g., "thing"), all alternatives must be nouns
  3. Register Alignment: Check formality consistency:

    • Compare the formality of the main answer and each alternative (allow a difference of up to ±0.5 on a 0-1 scale)
    • Flag formal words ("pertaining", "leverage") in casual contexts
    • Flag slang in formal/professional contexts
  4. Tone Matching: Ensure emotional tone fits context:

    • Sad context ≠ "amazing", "incredible", "stunning" (positive tone mismatch)
    • Apologetic context ≠ "fair enough", "that's right" (concession vs confirmation)
    • Happy context ≠ "unfortunate", "regrettable" (tone mismatch)
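
The grammar check in step 1 is the most mechanical of the four and can be sketched directly. The example assumes blanks are marked with `___` in a template sentence, which is an assumption about the data format, and the helper names are hypothetical. The double-negative test is a rough word count and will produce false positives on legitimate sentences.

```python
import re

NEGATIVES = {"not", "no", "never", "nothing", "nobody", "nowhere"}


def substitute(template, alternative, blank="___"):
    """Plug one alternative into the first blank of a template sentence."""
    return template.replace(blank, alternative, 1)


def grammar_flags(sentence):
    """Apply the four step-1 rules to a sentence after substitution."""
    words = re.findall(r"[a-z']+", sentence.lower())
    flags = []
    if len(words) < 3:
        flags.append("fragment")
    if len(words) > 30:
        flags.append("too long")
    for i, word in enumerate(words):
        # A repeat among the next two words counts as a duplicate.
        if word in words[i + 1:i + 3]:
            flags.append(f"duplicate word: {word}")
            break
    negatives = sum(1 for w in words if w in NEGATIVES or w.endswith("n't"))
    if negatives >= 2:
        flags.append("possible double negative")
    return flags
```

Steps 2 to 4 (part of speech, register, and tone) need a tagger or a model and are not covered by this sketch.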

Known Issues to Catch:

  • "What's much going on?" → Ungrammatical phrasing
  • "It's tell me how..." → Wrong word class (verb as adjective)
  • "It's amazing" in regret context → Emotional tone mismatch
  • "That makes sense is quite sad" → Double predicate (two verbs)
  • "Fair enough I did!" → Wrong speech act (concession vs confirmation)

Confidence Scoring:

  • Clear structural error (e.g., ungrammatical when substituted): 95% confidence, auto-flag
  • Semantic/POS mismatch: 90% confidence, auto-flag
  • Register mismatch: 85% confidence, auto-flag
  • Tone mismatch in emotional context: 80% confidence, flag for review
  • Weak synonym (dictionary-grade, not contextual): 65% confidence, flag to improve

Output:

code
✓ PASS: All 6 alternatives create grammatical sentences with consistent meaning
⚠️  STRUCTURE: Blank b1: "What's much going on?" is ungrammatical (95% confidence)
⚠️  SEMANTIC: Blank b7: "tell me" is verb, context requires adjective (90% confidence)
⚠️  TONE: Blank b6: "It's amazing" mismatches sad regret context (80% confidence)
✗ FAIL: 3+ structural errors prevent proper deployment

npm Command:

bash
npm run validate:alternatives    # Full system-wide check

2. Confidence Thresholds & Auto-Fix Logic

HIGH Confidence (≥95%):

  • Auto-apply fix without human approval
  • Log fix for audit trail
  • Example: "color" → "colour", "organize" → "organise"

MEDIUM Confidence (70-94%):

  • Flag for human review
  • Suggest fix with confidence % shown
  • Human approves or rejects fix
  • Example: "vacation" → "holiday" (72%), "gonna" → "going to" (91%)

LOW Confidence (<70%):

  • Report finding but don't suggest fix
  • Explain why confidence is low
  • Human decides action
  • Example: "license" (verb or noun? ambiguous context)
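
The three tiers reduce to a small routing function. In this sketch confidence is a fraction between 0 and 1, as in the JSON report, and the returned labels (`AUTO-FIX`, `FLAG`, `REPORT`) are illustrative names rather than values confirmed by the validator.

```python
def route(confidence):
    """Map a confidence in [0, 1] to the action tier described above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.95:
        return "AUTO-FIX"  # applied without approval, logged for audit
    if confidence >= 0.70:
        return "FLAG"      # suggested fix shown, human approves or rejects
    return "REPORT"        # finding reported, no fix suggested
```

Keeping the thresholds in one function makes it harder for individual validators to drift from the shared policy.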

3. Batch Validation Output

Generate comprehensive validation report:

json
{
  "scenario_id": "advanced-5",
  "validators": {
    "chunk_compliance": {
      "status": "PASS",
      "bucket_a_percent": 84,
      "bucket_b_percent": 14,
      "novel_percent": 2,
      "target_percent": 80,
      "confidence": 0.99
    },
    "uk_spelling": {
      "status": "PASS",
      "issues": [
        {
          "line": 5,
          "text": "color",
          "suggestion": "colour",
          "confidence": 0.98,
          "action": "AUTO-FIX"
        }
      ]
    },
    "uk_vocabulary": {
      "status": "PASS",
      "warnings": [
        {
          "line": 10,
          "text": "vacation",
          "suggestion": "holiday",
          "confidence": 0.72,
          "action": "FLAG for review"
        }
      ]
    },
    "tonality": {
      "status": "PASS",
      "consistent_register": "formal_business",
      "tone_violations": 0
    },
    "natural_patterns": {
      "status": "PASS",
      "naturalness_score": 0.89
    },
    "dialogue_flow": {
      "status": "PASS",
      "consistency_score": 0.94,
      "flow_issues": 0
    },
    "alternatives_quality": {
      "status": "PASS",
      "weak_alternatives": []
    }
  },
  "summary": {
    "overall_status": "PASS",
    "confidence_score": 0.91,
    "auto_fixes_applied": 2,
    "flags_for_human": 3,
    "ready_for_transformation": true
  }
}

Quality Gates

PASS if:

  • ✓ All 7 validators report PASS or acceptable findings
  • ✓ No FAIL-level issues
  • ✓ Overall confidence ≥85%
  • ✓ <5 flags for human review

CONDITIONAL if:

  • ⚠️ 5-10 flags for human review (requires approval)
  • ⚠️ Confidence 75-85% (proceed cautiously)
  • ⚠️ 1-2 FAIL-level issues that can be fixed with edits

FAIL if:

  • ✗ ≥3 FAIL-level issues
  • ✗ Overall confidence <70%
  • ✗ Critical data integrity problems (wrong speaker, incoherent dialogue)
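
The gates can be expressed as a single decision function. The rules above leave some cases unspecified (for example, confidence between 70% and 75%, or more than 10 flags); this sketch assumes those fall through to CONDITIONAL, which is a guess rather than documented behaviour. The function and parameter names are hypothetical.

```python
def quality_gate(fail_issues, confidence, flags, critical=False):
    """Return PASS, CONDITIONAL, or FAIL for one validation report.

    fail_issues: number of FAIL-level issues
    confidence:  overall confidence as a fraction in [0, 1]
    flags:       number of findings flagged for human review
    critical:    True for data integrity problems (wrong speaker, incoherent dialogue)
    """
    if critical or fail_issues >= 3 or confidence < 0.70:
        return "FAIL"
    if fail_issues == 0 and confidence >= 0.85 and flags < 5:
        return "PASS"
    # Assumption: anything not covered by the PASS or FAIL rules needs approval.
    return "CONDITIONAL"
```

With the sample report's summary (no FAIL-level issues, confidence 0.91, 3 flags) this returns PASS.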

Usage Example

bash
# Validate scenario with auto-fix
npm run validate -- dialogue_blanked.json --auto-fix

# Output: validation_report.json with all 7 validator results

Notes for Implementation

  1. Rule Base: Use existing /services/languageChecker/rules/ files for spelling/vocabulary
  2. Confidence Model: Use pre-trained confidence thresholds from Phase 4 audit system
  3. Speed: Haiku model chosen for fast batch validation (120s timeout per scenario)
  4. Logging: Audit trail essential for downstream fixes
  5. Idempotency: Applying the same validation twice should give the same result

Next Handoff: Send validation_report.json to scenario-transformer (if PASS/CONDITIONAL) or back to blank-inserter for fixes (if FAIL).