Post-Mortem Skill
Quick Ref: Toolchain → Agent Synthesis → Knowledge Extraction. Output:
.agents/retros/*.md+.agents/learnings/*.md
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
Architecture: Tools do 95% of validation. Agents synthesize the 5%.
Execution Steps
Given /post-mortem [epic-id]:
Step 1: Run Toolchain Validation (MANDATORY FIRST)
Before anything else, run the toolchain:
./scripts/toolchain-validate.sh --gate 2>&1 | tee .agents/tooling/run.log TOOL_EXIT=$?
Check exit code:
- •
0= All tools pass → proceed - •
2= CRITICAL findings → STOP, report to user - •
3= HIGH findings only → proceed with warnings
Step 1a: Early Exit on CRITICAL Tool Failures
If TOOL_EXIT == 2:
STOP IMMEDIATELY. Report to user: "Toolchain validation found CRITICAL issues. Fix before post-mortem." Tool outputs in .agents/tooling/: - gitleaks.txt: <count> secret findings - ruff.txt: <count> lint errors - ... Run: cat .agents/tooling/<tool>.txt Fix the issues, then re-run /post-mortem
DO NOT dispatch agents. Tools found definitive problems.
Step 2: Identify What Was Completed
If epic ID provided: Use it directly.
If no epic ID: Find recently completed work:
bd list --status closed --since "7 days ago" 2>/dev/null | head -5
Or check recent git activity:
git log --oneline --since="24 hours ago" | head -10
Step 3: Build Mechanical Comparison Table
Generate checklist from plan before reading code:
If plan exists (.agents/plans/*.md):
1. Extract all TODO/deliverable items from plan 2. For each item: - Expected file: <path> - File exists: yes/no - Implementation matches spec: yes/no (cite file:line)
This creates ground truth for plan-compliance, not gestalt impression.
Write comparison table to prompt text for plan-compliance-expert agent.
Step 4: Triage Tool Findings and Synthesize
TRIAGE tool findings, not FIND issues. Tools already found issues.
Read the tool outputs:
cat .agents/tooling/gitleaks.txt cat .agents/tooling/semgrep.txt cat .agents/tooling/ruff.txt cat .agents/tooling/golangci-lint.txt cat .agents/tooling/radon.txt
4a. Triage Security Findings
For each finding from gitleaks/semgrep:
- •Read the cited file:line
- •Assess: true positive or false positive?
- •If true positive: severity (CRITICAL/HIGH/MEDIUM/LOW) and fix
Record in table:
| File:Line | Tool Finding | Verdict | Severity | Fix |
|---|
4b. Triage Code Quality Findings
For each finding from linters:
- •Read the cited file:line
- •Assess: worth fixing now, tech debt, or noise?
- •If worth fixing: suggest specific change
Record in table:
| File:Line | Tool Finding | Priority | Suggested Fix |
|---|
4c. Verify Plan Completion
Using the mechanical comparison table from Step 3:
For each plan item marked "no" or "partial":
- •Is this a real gap or scope change?
- •Should it be tracked as follow-up issue?
Record in table:
| Plan Item | Status | Gap Type | Action |
|---|
4d. Extract Verified Learnings
From the completed work and changed files:
Extract learnings that are VERIFIED (appeared in multiple places or have source citation):
For each learning:
- •ID: L-<date>-<N>
- •Category: technical/process/architecture
- •What: <1 sentence>
- •Source: <file:line or commit hash>
- •Verification: <how we know this is true>
DO NOT include confidence scores. Include source citations only.
Step 5a: Log Triage Decisions
For each TRUE_POS or FALSE_POS verdict from agents, log it for accuracy tracking:
# Log each triage decision ./scripts/log-triage-decision.sh "src/auth.go:42" "semgrep" "TRUE_POS" "security-reviewer" ./scripts/log-triage-decision.sh "tests/mock.py:15" "gitleaks" "FALSE_POS" "security-reviewer"
This enables accuracy tracking over time. Ground truth is added later when:
- •CI confirms (test pass/fail)
- •Production incident occurs
- •Human reviews the decision
View accuracy report:
./scripts/compute-triage-accuracy.sh
Step 5: Synthesize Results
Combine agent outputs:
- •Deduplicate findings by file:line
- •Sort by severity (CRITICAL → HIGH → MEDIUM → LOW)
- •Count verified vs disputed findings
Compute grade based on TOOL findings, not agent opinions:
Grade A: 0 critical tool findings Grade B: 0 critical, <5 high tool findings Grade C: 0 critical, 5-15 high tool findings Grade D: 1+ critical tool findings Grade F: Multiple critical, tests failing
Step 6: Request Human Approval (Gate 4)
Tool: AskUserQuestion
Parameters:
questions:
- question: "Post-mortem complete. Grade: <grade>. Tool findings triaged. Store learnings?"
header: "Gate 4"
options:
- label: "TEMPER & STORE"
description: "Learnings are good - lock and index"
- label: "ITERATE"
description: "Need another round of fixes"
multiSelect: false
Step 7: Write Post-Mortem Report
Write to: .agents/retros/YYYY-MM-DD-post-mortem-<topic>.md
Merge agent outputs:
- •Collect tables from all 4 agents
- •Deduplicate by file:line (within 5 lines tolerance)
- •Sort by severity (CRITICAL -> HIGH -> MEDIUM -> LOW)
- •List which agents agreed on each finding
# Post-Mortem: <Topic/Epic> **Date:** YYYY-MM-DD **Epic:** <epic-id or description> **Duration:** <how long> ## Toolchain Results | Tool | Status | Findings | |------|--------|----------| | gitleaks | PASS/FAIL | <count> | | ruff | PASS/FAIL | <count> | | pytest | PASS/FAIL | <count> | **Gate Status:** <PASS/BLOCKED> ## Triaged Findings ### True Positives (actionable) | File:Line | Issue | Severity | Status | |-----------|-------|----------|--------| ### False Positives (dismissed) | File:Line | Tool Claimed | Why Dismissed | |-----------|--------------|---------------| ## Plan Compliance <Mechanical comparison table from Step 3> ## Learnings Extracted | ID | Category | Learning | Source | |----|----------|----------|--------| **Note:** No confidence scores. Source citations only. ## Follow-up Issues <Issues created from findings> ## Knowledge Flywheel Status - **Learnings indexed:** <count from ao forge> - **Session provenance:** <session-id> - **ao forge status:** PASS/SKIP (not available)
Step 7a: Index Learnings via ao forge (Knowledge Flywheel)
If user approved TEMPER & STORE in Gate 4, index learnings into knowledge base:
# Check if ao CLI is available
if command -v ao &>/dev/null; then
# Index learnings from the retro/learnings directory
ao forge index .agents/learnings/ 2>&1 | tee -a .agents/tooling/ao-forge.log
AO_EXIT=$?
# Add provenance tracking - link learnings to this session
SESSION_ID=$(ao session id 2>/dev/null || echo "unknown")
echo "Provenance: session=$SESSION_ID, timestamp=$(date -Iseconds)" >> .agents/learnings/provenance.txt
# Check flywheel status
FLYWHEEL_STATUS=$(ao flywheel status --json 2>/dev/null || echo '{"indexed": 0}')
INDEXED_COUNT=$(echo "$FLYWHEEL_STATUS" | jq -r '.indexed // 0')
if [ $AO_EXIT -eq 0 ]; then
echo "Flywheel: Learnings indexed successfully"
else
echo "Flywheel: ao forge failed (exit $AO_EXIT) - learnings NOT indexed"
fi
else
echo "Flywheel: ao CLI not available - learnings written but NOT indexed"
echo " Install ao or run manually: ao forge index .agents/learnings/"
fi
Fallback: If ao is not available, learnings are still written to .agents/learnings/*.md but won't be searchable via ao search. The skill continues normally.
Step 8: Report to User
Tell the user:
- •Toolchain results (which tools ran, pass/fail)
- •Grade (based on tool findings)
- •Key triaged findings
- •Learnings extracted
- •Gate 4 decision
- •Flywheel status (learnings indexed count)
Key Differences from Previous Version
| Before | After |
|---|---|
| Agents find issues | Tools find issues, agents triage |
| Vague prompts | Checklist-driven prompts with tool output |
| Fake confidence scores | Source citations only |
| Always produces report | Gates on tool failures |
| 6 agents | 4 agents (focused on synthesis) |