Framework Audit Skill

Comprehensive governance audit for the academicOps framework.

NO RATIONALIZATION: An audit reports ALL discrepancies. Do NOT justify ignoring files as "generated", "acceptable", or "probably don't need to be tracked". Every gap is reported. The user decides what's acceptable - not the auditor.

Workflow Entry Point

IMMEDIATELY call TodoWrite with the following items, then work through each one:

code

TodoWrite(todos=[
  {content: "Phase 0: Run health metrics", status: "pending", activeForm: "Running health audit"},
  {content: "Phase 1: Structure audit - compare filesystem to INDEX.md", status: "pending", activeForm: "Auditing structure"},
  {content: "Phase 2: Reference graph - invoke Skill(skill='framework') then run link audit scripts", status: "pending", activeForm: "Building reference graph"},
  {content: "Phase 3: Skill content audit - check size and actionability", status: "pending", activeForm: "Auditing skill content"},
  {content: "Phase 4: Justification audit - check specs for file references", status: "pending", activeForm: "Auditing file justifications"},
  {content: "Phase 4b: Instruction justification - verify every instruction traces to framework/enforcement-map.md", status: "pending", activeForm: "Auditing instruction justifications"},
  {content: "Phase 5: Documentation accuracy - verify README.md flowchart vs hooks", status: "pending", activeForm: "Verifying documentation"},
  {content: "Phase 6: Regenerate indices - invoke Skill(skill='flowchart') for README.md flowchart", status: "pending", activeForm: "Regenerating indices"},
  {content: "Phase 7: Other updates", status: "pending", activeForm: "Finalizing updates"},
  {content: "Phase 8: Save audit report to $ACA_DATA/projects/aops/audit/YYYY-MM-DD-HHMMSS-audit.md", status: "pending", activeForm: "Persisting report"},
  {content: "Phase 8b: Transcript QA - scan recent sessions for hydration gaps and operational failures", status: "pending", activeForm: "Running transcript QA analysis"},
  {content: "Phase 9: Create tasks for actionable findings", status: "pending", activeForm: "Creating tasks"}
])

CRITICAL: Work through EACH phase in sequence. When a phase requires a skill, invoke it explicitly as shown below.

Specialized Workflows

Session Effectiveness Audit

Qualitative assessment of session transcripts to evaluate framework performance.

code

Skill(skill="audit", args="session-effectiveness /path/to/transcript.md")

Workflow defined in workflows/session-effectiveness.md.

Individual Scripts (Reference Only)

These scripts run individual checks. They are NOT a substitute for the full workflow:

bash

uv run python scripts/audit_framework_health.py -m  # Phase 0 health metrics
uv run python scripts/check_skill_line_count.py
uv run python scripts/check_broken_wikilinks.py
uv run python scripts/check_orphan_files.py
cd aops-core && uv run python -c "from lib.transcript_error_analyzer import scan_recent_sessions; print(scan_recent_sessions(hours=48).format_markdown())"  # Phase 8b transcript QA

Phase Instructions

Phase 0: Health Metrics

Run comprehensive health audit first:

bash

uv run python scripts/audit_framework_health.py \
  --output /tmp/health-$(date +%Y%m%d).json

This generates:

•/tmp/health-YYYYMMDD.json - Machine-readable metrics
•/tmp/health-YYYYMMDD.md - Human-readable report

Metrics tracked: Component counts, hook coverage, skill sizes, wikilink validation

→ Continue to Phase 1 (do not stop here)

Phase 1: Structure Audit

Compare filesystem to documentation:

•Scan filesystem: find . -type f -not -path "*/.git/*" -not -path "*/__pycache__/*" | sort
•Compare to INDEX.md: Flag missing or extra entries
•Check cross-references: Verify → references point to existing files
•Find broken wikilinks: Grep for [[...]] patterns, validate targets exist
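
The scan and wikilink steps above can be sketched in Python as follows. This is a minimal sketch, not the framework's `check_broken_wikilinks.py`: the function names are hypothetical, and it assumes a wikilink target resolves when it (or the target plus `.md`) equals a path relative to the scanned root. The framework's actual resolution rules may differ.

```python
import re
from pathlib import Path

# Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target only.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def scan_tree(root: Path) -> set[str]:
    """Collect relative file paths, skipping .git and __pycache__ like the find command."""
    skip = {".git", "__pycache__"}
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and not skip.intersection(p.relative_to(root).parts)
    }


def find_broken_wikilinks(text: str, existing: set[str]) -> list[str]:
    """Return wikilink targets in `text` that match no known path (assumed rule)."""
    broken = []
    for match in WIKILINK_RE.finditer(text):
        target = match.group(1).strip()
        if target not in existing and f"{target}.md" not in existing:
            broken.append(target)
    return broken
```

Every target returned is reported as a finding; the sketch does not decide whether a broken link is acceptable.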

Phase 2: Reference Graph & Link Audit

First: Invoke Skill(skill="framework") to load framework conventions for linking rules.

Then build reference graph and check linking conventions:

bash

# Generate graph for aops-core
uv run python aops-core/skills/audit/scripts/build_reference_map.py --root aops-core --output data/aops/reference-graph-core.json

# Find orphans in aops-core
uv run python aops-core/skills/audit/scripts/find_orphans.py --graph data/aops/reference-graph-core.json

# Or use the health script for wikilink/orphan checks
uv run python scripts/audit_framework_health.py -m

Linking rules to enforce (from framework skill):

•Skills via invocation (Skill(skill="x")), not file paths
•No backward links (children → parent)
•Parents must reference children
•Use wikilinks, not backticks for graph connectivity
•Full relative paths in wikilinks

Phase 3: Skill Content Audit

For each $AOPS/skills/*/SKILL.md:

•Size check: Must be <500 lines
•Actionability test: Each section must tell agents WHAT TO DO
•Content separation violations:
  •❌ Multi-paragraph "why" → move to spec
  •❌ Historical context → delete
  •❌ Reference material >20 lines → move to references/
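
The size check is mechanical and can be sketched as below. This is an illustration under assumptions, not the contents of `scripts/check_skill_line_count.py`; the function name and return shape are hypothetical. The actionability and content-separation checks still need a reader's judgment.

```python
from pathlib import Path

MAX_SKILL_LINES = 500  # limit stated in the size check above


def oversized_skills(skills_root: Path, limit: int = MAX_SKILL_LINES) -> list[tuple[str, int]]:
    """Return (path, line_count) for every */SKILL.md with `limit` lines or more."""
    results = []
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        line_count = len(skill_file.read_text(encoding="utf-8").splitlines())
        if line_count >= limit:  # "must be <500" means a 500-line file fails
            results.append((skill_file.as_posix(), line_count))
    return results
```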

Phase 4: Justification Audit (Files)

For each significant file in $AOPS/:

•Search specs: Grep $AOPS/specs/ for references
•Check core docs: JIT-INJECTION.md, README.md, INDEX.md
•Classify: Justified / Implicit / Orphan

Skip: __pycache__/, .git/, individual files within skills, tests, assets

Phase 4b: Instruction Justification Audit

Every behavioral instruction injected to agents must trace to framework/enforcement-map.md.

Unjustified instructions are bloat - they cost tokens and create confusion about what's actually enforced.

Sources to scan (files injected at SessionStart or via hooks):

•FRAMEWORK-PATHS.md - core instructions
•AXIOMS.md, HEURISTICS.md - principle statements
•skills/*/SKILL.md - skill-specific instructions
•commands/*.md - command instructions
•agents/*.md - agent instructions

What constitutes a "behavioral instruction":

•Imperative statements: "always do X", "never do Y", "you MUST", "you SHOULD"
•Conditional rules: "when X, do Y", "if X then Y"
•Workflow requirements: "invoke skill X first", "before doing X, check Y"

Validation process:

•Extract behavioral instructions from each source file (look for imperatives, MUSTs, SHOULDs, "always", "never", "before", "first")
•For each instruction, search enforcement-map.md for:
  •Direct reference to the instruction text
  •Reference to the source file + line number
  •Mapping to an axiom or heuristic that covers this instruction
•Classify each instruction:
  •Justified: Appears in enforcement-map.md with axiom/heuristic mapping
  •Implicit: Derives from a documented axiom/heuristic but not explicitly in enforcement-map.md
  •Orphan: No traceability - FLAG FOR REVIEW
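
A first pass over the validation process above might look like the following sketch. The function names are hypothetical, and the matching is deliberately crude: it flags any line containing one of the marker words and treats an instruction as traced when enforcement-map.md mentions its `file:line` location or quotes its text. Separating Justified from Implicit still requires reading the axioms and heuristics.

```python
import re

# Marker words taken from the extraction step above; matching is case-insensitive.
MARKERS = re.compile(r"\b(always|never|must|should|before|first)\b", re.IGNORECASE)


def extract_instructions(source_name: str, text: str) -> list[tuple[str, str]]:
    """Return (location, line) pairs for lines that look like behavioral instructions."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if MARKERS.search(line):
            found.append((f"{source_name}:{number}", line.strip()))
    return found


def is_traced(location: str, line: str, enforcement_map: str) -> bool:
    """True when the map cites the file:line location or quotes the instruction text."""
    # The lookahead stops "file.md:4" from matching inside "file.md:40".
    by_location = re.search(re.escape(location) + r"(?!\d)", enforcement_map)
    return bool(by_location) or line in enforcement_map


def triage(instructions: list[tuple[str, str]], enforcement_map: str) -> dict[str, list[str]]:
    """Split instruction locations into traced ones and orphan candidates."""
    result: dict[str, list[str]] = {"traced": [], "orphan_candidates": []}
    for location, line in instructions:
        key = "traced" if is_traced(location, line, enforcement_map) else "orphan_candidates"
        result[key].append(location)
    return result
```

Orphan candidates from this pass are only candidates: check each against AXIOMS.md and HEURISTICS.md before classifying it as Implicit or Orphan.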

Example orphan (discovered in session):

code

FRAMEWORK-PATHS.md:35 - "When working with session logs, always invoke Skill(skill='transcript') first"
→ NOT in enforcement-map.md
→ No axiom/heuristic reference
→ ORPHAN - needs justification or removal

Output format:

code

### Instruction Justification Status

**Justified** (N instructions):
- FRAMEWORK-PATHS.md:78 "NEVER hardcode paths" → [[axioms/dry-modular-explicit.md]]

**Implicit** (N instructions):
- skills/python-dev/SKILL.md:42 "use uv run" → derives from [[axioms/use-standard-tools.md]]

**Orphan** (N instructions) - REQUIRES ACTION:
- FRAMEWORK-PATHS.md:35 "invoke transcript skill first for session logs" → NO JUSTIFICATION
- commands/learn.md:56 "..." → NO JUSTIFICATION

Resolution for orphans:

•Create heuristic if rule is valuable
•Add to enforcement-map.md with axiom/heuristic mapping
•Or DELETE the instruction if it's not worth formalizing

Phase 5: Documentation Accuracy

Verify README.md flowchart reflects actual hook architecture:

•Parse Mermaid for hook names
•Compare to hooks/router.py dispatch mappings
•Compare to settings.json hook events
•Flag drift

Phase 6: Curate Index Files

Index files are root-level files for agent consumption. The auditing agent curates these using LLM judgment, not mechanical script generation.

Target files: INDEX.md, enforcement-map.md, WORKFLOWS.md, SKILLS.md, AXIOMS.md, HEURISTICS.md, docs/ENFORCEMENT.md, README.md (flowchart section).

Approach: For each index file, read the source materials, then write a curated index that accurately reflects current state. Use your judgment to:

•Prioritize what's most useful for agent routing and context
•Remove stale entries that no longer match the filesystem
•Add missing entries discovered during earlier audit phases
•Keep descriptions concise and actionable

Per-File Instructions

Index File	Sources	Key Judgment
AXIOMS.md	`axioms/*.md` files	Priority ordering, concise summaries
HEURISTICS.md	`heuristics/*.md` files	Priority ordering, concise summaries
SKILLS.md	`skills/*/SKILL.md` frontmatter, `commands/*.md`	Routing triggers, description accuracy
WORKFLOWS.md	`workflows/*.md`, `skills/*/workflows/*.md`	Decision tree accuracy, scope routing
INDEX.md	Filesystem scan of `$AOPS/`	File tree with accurate purpose annotations
enforcement-map.md	`hooks/*.py` "Enforces:" docstrings, `gate_config.py`	Axiom-to-hook mapping accuracy
docs/ENFORCEMENT.md	`specs/enforcement.md`, existing content	Mechanism ladder, root cause model
README.md (flowchart)	`hooks/router.py`, `gate_config.py`, `gates.py`	Invoke `Skill(skill="flowchart")` first. Mermaid accuracy

WORKFLOWS.md Curation

Source data: Each workflow file in workflows/*.md has YAML frontmatter with:

•id: Workflow identifier
•category: Workflow category (development, operations, routing, etc.)
•bases: Array of base patterns this workflow composes (e.g., [base-task-tracking, base-tdd])

Generation requirements:

•Preserve existing structure: Keep the decision tree, key distinctions, and project-specific sections
•Preserve annotations: Do NOT delete existing annotation comments - these contain design history
•Add Bases column: In workflow tables, include a "Bases" column showing which base patterns each workflow composes
•Extract from frontmatter: Read bases: field from each workflow's YAML frontmatter
•Handle missing bases: If a workflow lacks bases: in frontmatter, show "-" in the Bases column

Table format:

markdown

| Workflow            | When to Use                        | Bases                                    |
| ------------------- | ---------------------------------- | ---------------------------------------- |
| [[tdd-cycle]]       | Any testable code change           | task-tracking, tdd, verification, commit |
| [[debugging]]       | Cause unknown, investigating       | task-tracking, verification              |
| [[simple-question]] | Pure information, no modifications | -                                        |

Why this matters: The bases: metadata enables the hydrator to compose workflow steps rather than just listing options (see task aops-4f512f50).
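
Reading `bases:` and rendering a table row could be sketched as follows. This is a minimal illustration with hypothetical function names: it handles only the inline list form (`bases: [a, b]`) using the standard library, and it assumes the `base-` prefix is dropped for display, as the example table suggests. A real implementation would likely use a YAML parser.

```python
import re


def read_bases(workflow_text: str) -> list[str]:
    """Extract the `bases:` inline list from leading YAML frontmatter, if present."""
    match = re.match(r"---\n(.*?)\n---", workflow_text, re.DOTALL)
    if not match:
        return []
    bases_line = re.search(r"^bases:\s*\[(.*?)\]\s*$", match.group(1), re.MULTILINE)
    if not bases_line:
        return []
    return [item.strip() for item in bases_line.group(1).split(",") if item.strip()]


def table_row(workflow_id: str, when_to_use: str, bases: list[str]) -> str:
    """Render one workflow table row; an empty bases list is shown as "-"."""
    # Assumption: the "base-" prefix is display noise and can be stripped.
    shown = ", ".join(b.removeprefix("base-") for b in bases) or "-"
    return f"| [[{workflow_id}]] | {when_to_use} | {shown} |"
```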

enforcement-map.md Derivation

Hook-Axiom Declaration Convention: Every hook that enforces an axiom declares it in its module docstring:

python

"""
Hook description.

Enforces: current-state-machine (Current State Machine)
"""

Cross-reference validation:

•Parse all hooks for "Enforces:" declarations
•Compare against enforcement-map.md Axiom-Enforcement table
•Flag discrepancies (hook declares axiom not in map, map lists hook without declaration, etc.)
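
The cross-reference can be sketched with the standard `ast` module, as below. The function names are hypothetical, and parsing the Axiom-Enforcement table itself is left out because its layout is not specified here; the sketch assumes both sides have already been reduced to a hook-to-axioms mapping.

```python
import ast
import re

# Captures the axiom id that follows "Enforces:" at the start of a docstring line.
ENFORCES_RE = re.compile(r"^Enforces:\s*(\S+)", re.MULTILINE)


def declared_axioms(hook_source: str) -> set[str]:
    """Return axiom ids named on "Enforces:" lines of the module docstring."""
    docstring = ast.get_docstring(ast.parse(hook_source)) or ""
    return set(ENFORCES_RE.findall(docstring))


def cross_reference(declared: dict[str, set[str]], mapped: dict[str, set[str]]) -> list[str]:
    """Compare hook -> axioms from docstrings against hook -> axioms from the map."""
    problems = []
    for hook in sorted(declared.keys() | mapped.keys()):
        in_code = declared.get(hook, set())
        in_map = mapped.get(hook, set())
        for axiom in sorted(in_code - in_map):
            problems.append(f"{hook}: declares {axiom}, missing from map")
        for axiom in sorted(in_map - in_code):
            problems.append(f"{hook}: map lists {axiom}, no declaration")
    return problems
```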

README.md Flowchart

First: Invoke Skill(skill="flowchart") to load Mermaid conventions.

Regenerate the core loop flowchart from hooks/router.py dispatch mappings, gate_config.py gate definitions, and hooks/*.py implementations. Every gate in gate_config.py must be represented.

Generated File Header

Each curated index must include:

code

> **Curated by audit skill** - Regenerate with `Skill(skill="audit")`

Phase 7: Other Updates

•Fix README.md: Update tables including sub-workflows (see below)
•Report orphans: Flag for human review (do NOT auto-delete)
•Report violations: List with file:line refs

Sub-Workflow Extraction for README.md

Skills with multiple workflows/modes MUST have each sub-workflow documented separately in the Skills table.

Detection: For each skills/*/SKILL.md:

•Grep for ^## Workflow: or ^## Mode headers
•Check for workflows/ subdirectory with separate workflow files
•Check for ## Modes section listing multiple invocation patterns

Output format (add third column to Skills table):

markdown

| Skill            | Purpose                     | Sub-workflows                                |
| ---------------- | --------------------------- | -------------------------------------------- |
| session-insights | Session transcript analysis | Current (default), Batch, Issues             |
| audit            | Framework governance        | Full audit (default), Session effectiveness  |
| tasks            | Task lifecycle              | View/archive/create (default), Email capture |

Rules:

•Mark default workflow with "(default)"
•List workflows in order they appear in SKILL.md
•If only one workflow exists, leave sub-workflows column as "—"
•Extract workflow names from ## Workflow: X headers or workflows/*.md filenames
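
The header-based detection and the rules above can be sketched as follows. The function name is hypothetical, only `## Workflow: X` headers are handled (not `workflows/` directories or `## Modes` sections), and the default workflow is passed in by the caller because this document does not say how to detect it.

```python
import re
from typing import Optional

# Captures X from "## Workflow: X" headers, in document order.
WORKFLOW_HEADER_RE = re.compile(r"^## Workflow:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def sub_workflows_cell(skill_text: str, default: Optional[str] = None) -> str:
    """Build the Sub-workflows table cell for one SKILL.md."""
    names = WORKFLOW_HEADER_RE.findall(skill_text)
    if len(names) <= 1:
        return "—"  # placeholder required by the rules when only one workflow exists
    return ", ".join(f"{name} (default)" if name == default else name for name in names)
```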

Phase 8: Persist Report (MANDATORY)

Every audit MUST save a written report to $ACA_DATA/projects/aops/audit/.

bash

# Create directory if needed
mkdir -p "$ACA_DATA/projects/aops/audit"

# Generate timestamped filename (format: YYYY-MM-DD-HHMMSS-audit.md)
REPORT_PATH="$ACA_DATA/projects/aops/audit/$(date +%Y-%m-%d-%H%M%S)-audit.md"

Use the Write tool to save the complete audit report (see Report Format below) to $REPORT_PATH.

Report file MUST include:

•YAML frontmatter with date, duration, and summary stats
•All phase results from Phase 0-7
•Clear pass/fail status for each validation criterion

After writing, confirm: Audit report saved to: [path]
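
If the report is assembled in Python before being written, the path and frontmatter might be built as in this sketch. The function names and the frontmatter keys (`date`, `duration_minutes`, and the stat names) are illustrative assumptions; the authoritative template is the one referenced under Report Format.

```python
from datetime import datetime
from pathlib import Path


def report_path(aca_data: Path, now: datetime) -> Path:
    """Build the timestamped report path (YYYY-MM-DD-HHMMSS-audit.md)."""
    return aca_data / "projects" / "aops" / "audit" / f"{now:%Y-%m-%d-%H%M%S}-audit.md"


def frontmatter(now: datetime, duration_minutes: int, stats: dict[str, int]) -> str:
    """Render YAML frontmatter with date, duration, and summary stats (key names assumed)."""
    lines = ["---", f"date: {now:%Y-%m-%d}", f"duration_minutes: {duration_minutes}"]
    lines += [f"{key}: {value}" for key, value in stats.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"
```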

Phase 8b: Transcript QA Analysis

Quantitative scan of recent session transcripts to detect operational failures — hydration gaps, stuck patterns, tool failures — that structural auditing (Phases 0-7) cannot catch.

Distinction from Session Effectiveness (the session-effectiveness sub-workflow):

Aspect	Transcript QA (Phase 8b)	Session Effectiveness
Scope	Batch — all sessions in last 48h	Single session
Method	Mechanical extraction from JSONL errors	LLM qualitative analysis of full transcript
Output	Severity-weighted investigation queue	6-dimension evaluation report
Typical trigger	Every full audit	On-demand or session-end hook
Detects	Hydration gaps, stuck loops, tool crashes	Token waste, goal drift, sycophancy patterns

Run from aops-core/ directory:

bash

cd aops-core && uv run python -c "
from lib.transcript_error_analyzer import scan_recent_sessions
report = scan_recent_sessions(hours=48)
print(report.format_markdown())
"

This produces a severity-weighted investigation queue. Error categories: hydration_gap, exploration_miss, stuck_pattern, hook_denial, user_rejection, tool_failure.

Include in audit report:

•Summary line: "Transcript QA: N sessions scanned, M errors across K patterns"
•Top 5 issues by weighted score
•If no sessions or no errors: "Transcript QA: N sessions scanned, no issues detected."

Task creation criteria (applied in Phase 9):

•Recurring pattern: appears in 2+ sessions OR has weighted_score >= 6
•Critical stuck patterns: any stuck_pattern with repeat count >= 3
•High-severity hydration gaps: hydration_gap category with weighted_score >= 4
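
The three criteria above reduce to a small predicate. In this sketch the `Finding` class and its `session_count` and `repeat_count` fields are hypothetical stand-ins; only `weighted_score` and the category names come from this document, and the analyzer's real report objects may be shaped differently.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical container; field names other than weighted_score are assumptions.
    category: str
    weighted_score: float
    session_count: int
    repeat_count: int = 0


def needs_task(finding: Finding) -> bool:
    """Apply the Phase 8b task creation criteria to one finding."""
    recurring = finding.session_count >= 2 or finding.weighted_score >= 6
    critical_stuck = finding.category == "stuck_pattern" and finding.repeat_count >= 3
    severe_hydration = finding.category == "hydration_gap" and finding.weighted_score >= 4
    return recurring or critical_stuck or severe_hydration
```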

Phase 9: Create Tasks for Actionable Findings

Create tasks for findings that require human action.

For each finding from Phases 0-7 and Phase 8b that requires action:

•Classify finding type using the mapping below
•Create task with appropriate metadata via tasks MCP
•Track task IDs for summary

Finding Type → Issue Mapping

Finding Type	Priority	Issue Type	Labels
Broken wikilinks	P2	bug	audit,documentation
Orphan files	P3	chore	audit,cleanup
Skill >500 lines	P2	chore	audit,refactor
Explanatory content in skill	P2	chore	audit,refactor
Missing from INDEX.md	P3	chore	audit,documentation
Orphan instruction (no enforcement-map.md trace)	P2	bug	audit,governance
README.md flowchart drift	P2	bug	audit,documentation
Hook→Axiom mismatch	P2	bug	audit,governance
Recurring hydration gap (2+ sessions or score≥6)	P2	bug	audit,hydration
Critical stuck pattern (repeat≥3)	P1	bug	audit,hydration
Recurring tool failure (2+ sessions)	P3	bug	audit,hydration

Task Creation Pattern

python

mcp__plugin_aops-core_task_manager__create_task(
    title="[Finding Type]: [specific details]",
    type="task",
    priority=[1|2|3],
    tags=["audit", "[category]"],
    body="[context from audit]"
)

Skip Conditions

Do NOT create tasks for:

•Regenerated indices (Phase 6 actions) - already handled
•Pass status findings - no action needed
•Justified files (Phase 4) - no action needed
•Implicit files (Phase 4) - acceptable, no action needed

Output Summary

After creating tasks, add to audit report:

markdown

### Tasks Created

Created N tasks:

- ns-xxx: Broken wikilink: [[foo.md]] in bar.md
- ns-yyy: Orphan file: docs/old.md
- ns-zzz: Skill over limit: skills/big/SKILL.md

Report Format

See [[references/report-format]] for the complete report template and validation criteria.