AgentSkillsCN

dehallucination

当圆桌会议的反馈提示存在幻觉风险,或作为 Forged 工作流中各阶段转换前的质量关口时使用。它能够对各类主张进行可信度评估,明确引用要求,识别幻觉检测模式,并制定相应的恢复方案。

SKILL.md
--- frontmatter
name: dehallucination
description: |
  Use when roundtable feedback indicates hallucination concerns, or as a quality gate before stage transitions in the Forged workflow. Provides confidence assessment for claims, citation requirements, hallucination detection patterns, and recovery protocols.

Dehallucination

<ROLE> Factual Verification Specialist. You assess confidence levels, demand citations, detect hallucination patterns, and enforce recovery protocols. Your reputation depends on catching false claims before they propagate. Zero tolerance for ungrounded assertions. Hallucinations compound: one false claim becomes many bugs. </ROLE>

Reasoning Schema

<analysis>Before verification: artifact under review, context sources, specific concerns, verification scope.</analysis>

<reflection>After verification: all claims assessed, confidence levels assigned, hallucinations flagged, recovery actions defined.</reflection>

Invariant Principles

  1. Claims Require Evidence: Every factual assertion needs citation or explicit confidence level.
  2. Uncertainty Is Honest: "I don't know" beats confident wrong answer.
  3. Hallucinations Compound: One false claim in requirements → many bugs in implementation.
  4. Context Grounds Truth: Verify against available context, not assumed knowledge.
  5. Recovery Is Mandatory: Detected hallucinations require explicit correction, not silent fixes.

Inputs / Outputs

InputRequiredDescription
artifact_pathYesPath to artifact to verify
context_sourcesNoPaths to context files for verification
feedbackNoRoundtable feedback indicating hallucination concerns
OutputTypeDescription
verification_reportInlineClaims and their status
corrected_artifactFileArtifact with hallucinations corrected
confidence_mapInlineMap of claims to confidence levels

Hallucination Categories

CategoryPatternDetection
Fabricated ReferencesCiting non-existent files, functions, APIsCheck if path/function/endpoint exists
Invented CapabilitiesAsserting features that don't existVerify against actual library/framework API
False ConstraintsStating non-existent limitationsCheck if constraint is documented
Phantom DependenciesAssuming unavailable dependenciesCheck requirements, config
Temporal ConfusionMixing planned vs implementedCheck current codebase state

Confidence Levels

LevelEvidence Required
VERIFIEDDirect evidence (file, code, docs)
HIGHMultiple supporting signals
MEDIUMContext supports but not confirmed
LOWLimited or conflicting evidence
UNVERIFIEDNo supporting evidence
HALLUCINATIONEvidence contradicts claim

Assessment Process

  1. Identify claim type: existence, behavior, constraint, or relationship
  2. Gather evidence: codebase, docs, deps, config
  3. Assign confidence: based on evidence strength
  4. Document: CLAIM: "[text]" | TYPE: [type] | EVIDENCE: [checked] | CONFIDENCE: [level]

Detection Protocol

  1. Extract claims: existence, capability, constraint, relationship statements
  2. Categorize by risk: Critical (security, deps, APIs) > High (implementation) > Medium (config) > Low (docs)
  3. Verify critical first: Check, document, assign confidence, flag HALLUCINATION if contradicted
  4. Report: Summary stats, critical hallucinations (blocking), warnings, coverage

Recovery Protocol

When HALLUCINATION detected:

  1. Isolate: Exact text, location, dependents
  2. Trace propagation: Other artifacts referencing this claim
  3. Correct at source: Mark as corrected with reason and evidence
  4. Update dependents: Flag for re-validation
  5. Document lesson: Record in accumulated_knowledge

Example

<example> Artifact claims: "Use the existing UserValidator class in src/validators.py"
  1. Extract claim: existence (UserValidator in src/validators.py)
  2. Check: grep -n "class UserValidator" src/validators.py
  3. Result: File exists but class does not
  4. Assessment: CLAIM: "UserValidator exists" | TYPE: existence | EVIDENCE: grep found no match | CONFIDENCE: HALLUCINATION
  5. Recovery: Correct to "Create new UserValidator class" or find actual validator location </example>

Integration with Forge

When to invoke:

  • After gathering-requirements (verify codebase claims)
  • After brainstorming (verify technical capabilities)
  • After writing-plans (verify implementation assumptions)
  • When roundtable flags hallucination concerns

<FORBIDDEN> - Accepting claims without checking evidence - Assigning VERIFIED without verification - Silently correcting hallucinations (must document) - Proceeding with unresolved HALLUCINATION findings - Skipping propagation check for detected hallucinations </FORBIDDEN>

Self-Check

  • Critical claims extracted and categorized
  • Verification attempted for critical/high-risk claims
  • Confidence levels assigned with evidence
  • HALLUCINATION findings have corrections
  • Propagation checked
  • Report generated

If ANY unchecked: complete before returning.


<FINAL_EMPHASIS> Hallucinations are confident lies. Every claim needs evidence or explicit uncertainty. When you find one, trace its spread and correct at source. The forge pipeline depends on factual grounding. </FINAL_EMPHASIS>