Behavior Diagnosing
Analyze AI tooling output against its source instructions to identify why behavior deviated from intent, then generate targeted introspection questions for the misbehaving session.
Workflow
Phase 1: Receive Input
The user pastes unsatisfying output, optionally with a description of what they expected instead. Extract and hold:
- The actual output or behavior being questioned
- Any stated expectations ("it should have done X")
- Implicit expectations derivable from session context (what was the user working on?)
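As an illustration of what Phase 1 holds, here is a minimal sketch that assumes the extracted input is kept as a structured record. The names `DiagnosisInput` and `has_expectations` are hypothetical and not part of any real API. The helper only reports whether there is anything to compare the output against.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosisInput:
    """Hypothetical record of what Phase 1 extracts and holds."""
    actual_output: str  # the output or behavior being questioned
    stated_expectations: list[str] = field(default_factory=list)  # e.g. "it should have done X"
    implicit_expectations: list[str] = field(default_factory=list)  # inferred from what the user was working on

    def has_expectations(self) -> bool:
        """True if at least one stated or implicit expectation is available for comparison."""
        return bool(self.stated_expectations or self.implicit_expectations)
```

Keeping stated and implicit expectations separate lets the later phases tell which expectations the user actually voiced and which were inferred.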
Phase 2: Identify the Tooling
Determine which skill, agent, subagent, or command produced the output. Use session context — recently discussed files, working directory, conversation history — to infer this. If the tooling cannot be identified with confidence, ask the user.
Phase 3: Read Source Instructions
Load everything that defines the intended behavior:
- SKILL.md or agent markdown file
- All files in the references/ directory (if present)
- CLAUDE.md files in the plugin directory
- Any referenced configuration or templates
Read thoroughly. The quality of the analysis depends on understanding the full instruction set, not just the top-level file.
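The file-gathering part of Phase 3 can be sketched as follows. This assumes a layout where SKILL.md and references/ sit directly in the skill's directory and the plugin root is passed separately. `collect_instruction_files` is a hypothetical helper. It does not handle agent markdown files or "any referenced configuration or templates", because those can only be found by reading what the instructions themselves point to.

```python
from pathlib import Path


def collect_instruction_files(skill_dir: Path, plugin_dir: Path) -> list[Path]:
    """Hypothetical helper: gather the files that define one skill's intended behavior."""
    files: list[Path] = []
    skill_md = skill_dir / "SKILL.md"
    if skill_md.is_file():
        files.append(skill_md)
    refs = skill_dir / "references"
    if refs.is_dir():
        # Every file under references/, recursively, in a stable sorted order.
        files.extend(sorted(p for p in refs.rglob("*") if p.is_file()))
    # CLAUDE.md files anywhere in the plugin directory, including its root.
    files.extend(sorted(p for p in plugin_dir.rglob("CLAUDE.md") if p.is_file()))
    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(files))
```

Returning the top-level file first matches the reading order the phase implies. The references/ and CLAUDE.md files come after it, because they qualify the top-level instructions rather than replace them.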
Phase 4: Gap Analysis
Compare the actual output against the source instructions. For each instruction or behavioral rule, classify it:
- Followed: the output correctly implements this instruction
- Violated: the output contradicts or ignores this instruction
- Ambiguous: the instruction is vague enough that both the intended and actual behavior are valid interpretations
- Missing: the intended behavior has no corresponding instruction
Focus on violations and ambiguities — these are the diagnostic targets. For each, note the specific instruction passage and the corresponding output behavior. These pairs feed directly into Phase 5 and Phase 6.
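Below is a minimal sketch of the Phase 4 classification, assuming each finding is recorded as an (instruction passage, observed behavior) pair with one of the four statuses. `Status`, `Finding`, and `diagnostic_targets` are hypothetical names. The filter keeps only the pairs that move on to root cause analysis and question generation.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    FOLLOWED = "followed"
    VIOLATED = "violated"
    AMBIGUOUS = "ambiguous"
    MISSING = "missing"


@dataclass(frozen=True)
class Finding:
    instruction: str  # quoted instruction passage ("" when MISSING: no passage exists)
    observed: str     # the corresponding output behavior
    status: Status


def diagnostic_targets(findings: list[Finding]) -> list[Finding]:
    """Keep only violations and ambiguities, the pairs passed on to Phases 5 and 6."""
    return [f for f in findings if f.status in (Status.VIOLATED, Status.AMBIGUOUS)]
```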
Phase 5: Root Cause Analysis
Classify the likely cause(s) behind each violation or ambiguity:
| Category | Description |
|---|---|
| Instruction ambiguity | The instruction can be read multiple ways; the model chose a valid but unintended interpretation |
| Missing constraint | The intended behavior was never explicitly stated |
| Conflicting rules | Two instructions pull in opposite directions; the model resolved the conflict differently than intended |
| Over-broad scope | The instruction is too general, allowing the model to take unwanted liberties |
| Weak instruction | The instruction exists but lacks enforcement strength; model tendencies override it |
| Instruction burial | The instruction exists but is buried in dense text, reducing its salience |
| Context overflow | Too many instructions compete for attention; critical ones get deprioritized |
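The table can also be expressed as an enumeration, so that each finding carries one or more cause labels into Phase 6. This is a sketch only: `RootCause` and `label_causes` are hypothetical names. The labels mirror the Category column, and rendering them in table order keeps the output stable when a finding has several causes.

```python
from enum import Enum


class RootCause(Enum):
    INSTRUCTION_AMBIGUITY = "Instruction ambiguity"
    MISSING_CONSTRAINT = "Missing constraint"
    CONFLICTING_RULES = "Conflicting rules"
    OVER_BROAD_SCOPE = "Over-broad scope"
    WEAK_INSTRUCTION = "Weak instruction"
    INSTRUCTION_BURIAL = "Instruction burial"
    CONTEXT_OVERFLOW = "Context overflow"


def label_causes(causes: set[RootCause]) -> str:
    """Render one finding's cause(s) as a comma-separated string in table order."""
    order = list(RootCause)  # Enum iteration follows definition order
    return ", ".join(c.value for c in sorted(causes, key=order.index))
```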
Phase 6: Generate Introspection Questions
Invoke llm-author:prompt-engineering with:
- The violation/ambiguity pairs from Phase 4 (instruction passage + observed behavior)
- The root cause classifications from Phase 5
- The instruction to craft questions optimized for honest LLM self-reflection
The generated questions must:
- Quote the specific instruction passage being probed
- Ask the misbehaving session to describe its reasoning at the decision point where it diverged
- Probe one violation or ambiguity per question; compound questions dilute answers
- Provide enough context for meaningful answers without leading toward specific conclusions
Prepend a preamble to the questions that sets the behavioral frame for the answering session: answer honestly, with no excuses, no fixes, and no deflection.
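The question wording itself comes from llm-author:prompt-engineering. The sketch below does not replace that skill. It is a template-based illustration of the shape of the final block under stated assumptions: one preamble, then one numbered question per (passage, observed behavior) pair, each quoting the passage. `PREAMBLE` and `build_question_block` are hypothetical, and the template wording is illustrative only.

```python
PREAMBLE = (
    "Answer each question honestly. Do not make excuses, do not propose fixes, "
    "and do not deflect. Describe your reasoning as it actually was."
)


def build_question_block(pairs: list[tuple[str, str]]) -> str:
    """pairs: (quoted instruction passage, observed behavior); one question per pair."""
    lines = [PREAMBLE, ""]
    for i, (passage, observed) in enumerate(pairs, start=1):
        # Quote the passage verbatim and ask about the decision point without suggesting an answer.
        lines.append(
            f'{i}. Your instructions say: "{passage}" '
            f"At the point where you {observed}, what were you weighing, "
            "and how did you read that instruction?"
        )
    return "\n".join(lines)
```

Because each pair produces exactly one question, the one-violation-per-question rule holds by construction.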
Phase 7: Offer Clipboard
Ask the user whether they want the questions copied to the clipboard. If yes, pipe the full question block (preamble + questions) to pbcopy via Bash.
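For reference, here is a Python sketch of the same pipe-to-pbcopy step, equivalent to `printf '%s' "$BLOCK" | pbcopy` in Bash. pbcopy ships with macOS, so the sketch takes the command as a parameter and returns False when that command is not on PATH. `copy_to_clipboard` is a hypothetical helper, not part of the skill.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def copy_to_clipboard(text: str, command: Optional[Sequence[str]] = None) -> bool:
    """Pipe text to a clipboard command's stdin; return False if the command is unavailable."""
    cmd = list(command) if command else ["pbcopy"]  # pbcopy is the macOS clipboard tool
    if shutil.which(cmd[0]) is None:
        return False  # not installed: the caller can print the block for manual copying instead
    subprocess.run(cmd, input=text, text=True, check=True, stdout=subprocess.DEVNULL)
    return True
```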
Key Constraints
- Proceed directly to analysis when the intended behavior is clear from session context and source instructions. Only ask the user what's wrong when the gap genuinely cannot be determined.
- Output is diagnosis only: never propose fixes, rewrites, or improvements.
- State findings directly. If an instruction is poorly written, say so.
- Always quote specific instruction passages when identifying violations. "The skill says to do X" is insufficient; cite the actual text.
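The quoting rule can be checked mechanically. The sketch below assumes that "actual text" means a word-for-word match in which only whitespace and line wrapping may differ. `is_verbatim_quote` is a hypothetical helper that reports whether a cited passage really appears in the loaded source instructions.

```python
import re


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping in the source does not matter."""
    return re.sub(r"\s+", " ", text).strip()


def is_verbatim_quote(passage: str, source: str) -> bool:
    """True if passage appears word for word (modulo whitespace) in the source instructions."""
    p = _normalize(passage)
    return bool(p) and p in _normalize(source)
```

A paraphrase such as "never suggest fixes" fails the check even though its meaning is close. That is the intended behavior, because the rule asks for the actual text.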