Quality Review Workflow

Implements PROTO-ORG-7 from organon/protocols/PROTOCOLS.md. Semantic quality review that catches what automation cannot.

When to Use This Skill

Use this skill when:

•Reviewing new organon files — before merging, check semantic quality
•Weekly quality audits — periodic check beyond automated gates
•After significant methodology changes — verify meaning wasn't lost
•When something "feels off" — gut check against quality standards

Purpose: Automated gates check structure (frontmatter present, sections exist, references resolve). This workflow checks meaning (invariants testable, principles ordered correctly, heuristics covering real decisions).

Context Loading

•
Load quality standards:
- •Read book-llms/ETHOS.md (meta-organon constraints — what every organon must follow)
- •Read book-llms/patterns.md (anti-pattern tables — common mistakes to check for)
- •Read book-llms/workflow-authoring.md (workflow quality attributes)
•
Load project constraints:
- •Read CLAUDE.md (project-level guidance)

Steps

Step 1: Select Review Scope

Choose what to review:

•Single file — focused review of one organon file
•Directory — review all organon files in a scope (e.g., organon/domains/testing/)
•Project-wide — full quality audit of all organon files

Step 2: Run Automated Gates First

bash

cd packages/tools && npx organon verify
cd packages/tools && npx organon validate <path>

Fix any automated failures before proceeding. Semantic review is only meaningful when structural issues are resolved.

Step 3: Review Invariants

For each invariant in the target file(s), apply these filters:

Filter	Question	Pass Criteria
Testable?	Can you write a verification gate for this?	Yes — describes a checkable condition
Specific?	Does this constrain behavior concretely?	Yes — not a vague aspiration like "be good"
Enforceable?	Is the enforcement mechanism real?	Yes — references a specific tool, gate, or process
Non-redundant?	Does this say something the parent scope doesn't already say?	Yes — adds value beyond inheritance
Correctly IDed?	Does the ID follow the INV-SCOPE-N format?	Yes — matches scope and sequential numbering

Red flags:

•"Should be high quality" → too vague, rewrite
•"Enforced by: reviews" → too weak, specify which gate or tool
•Duplicates parent invariant verbatim → remove (inherited automatically)

Step 4: Review Principle Ordering

For each principle list, test the ordering:

•Conflict test: If principle 3 conflicts with principle 1, does 1 genuinely win? If not, reorder.
•Independence test: Are any two principles actually saying the same thing? If so, merge.
•Actionability test: Can an agent use this ordering to make a decision? If not, principles are too abstract.

Example of bad ordering:

code

1. Clarity over complexity
2. Simplicity over complexity

These overlap significantly — merge or differentiate.

Example of good ordering:

code

1. Correctness over performance
2. Clarity over brevity

These are distinct and genuinely prioritized — correctness beats clarity when they conflict.

Step 5: Review Identity Statements

For each IS/IS NOT section:

Check	Good Example	Bad Example
Specific?	"A semantic testing framework for tier-4 invariant verification"	"A good testing framework"
Distinguishing?	Could describe only this thing, not any similar thing	Could describe any testing framework
Boundary-defining?	IS NOT statements prevent scope creep	IS NOT statements are obvious negations

Red flags:

•"IS: well-designed and useful" → generic, could be anything
•"IS NOT: bad" → obvious negation, not a real boundary
•All IS statements could apply to competitors → not specific enough

Step 6: Review Heuristics

For each decision heuristic table:

Check	Question
Coverage	Do these cover the decisions that actually recur in this scope?
Missing decisions	Are there common decisions NOT in the table?
Actionability	Are the actions specific enough to follow without judgment?
Overlap	Do any heuristics give conflicting advice for the same situation?

How to find missing heuristics: Think about the last 5 times you worked in this scope. What decisions did you make? Are they in the table?

Step 7: Check Cross-File Terminology

Search for key terms across all organon files:

Use the Grep tool to search for key terms across all project files:

code

Grep pattern="workflow" path="." glob="*.md"
Grep pattern="skill" path="." glob="*.md"

Flag inconsistencies:

•"Skill" used as generic term (should be "workflow")
•Same concept with different names in different files
•Outdated terminology from previous methodology versions

Known terminology rules (from MEMORY.md):

•"Workflow" = generic term for agent binding layer (Layer 2)
•"Skill" = specific instance (Claude Code skills, Cursor rules)
•Never use "skill" as the generic term — always "workflow"

Step 8: Check Anti-Patterns

Compare content against anti-pattern tables:

•book-llms/ETHOS.md anti-patterns (~17 entries) — organon authoring mistakes
•book-llms/patterns.md anti-patterns (~17 entries) — broader project anti-patterns

For each anti-pattern, check: does the reviewed content exhibit this pattern?

Common anti-patterns found in reviews:

•Phantom Enforcement: invariant claims enforcement but mechanism doesn't exist
•Aspiration as Invariant: "should be" language in invariant (should be "must be")
•Context Overload: workflow loads too many files
•Missing Error Recovery: workflow has no failure/recovery table

Step 9: Generate Review Report

Structure findings by severity:

code

## Quality Review Report — [scope]

### Errors (must fix)
- [file:line] — [finding] — [fix]

### Warnings (should fix)
- [file:line] — [finding] — [fix]

### Suggestions (consider)
- [file:line] — [finding] — [suggestion]

### Summary
- Files reviewed: N
- Errors: N
- Warnings: N
- Suggestions: N

Step 10: Track Improvements

If fixes are made during the review:

bash

cd packages/tools && npx organon verify
cd packages/tools && npx organon health

Confirm no regressions. Health score should be equal to or higher.

Verification

• All automated gates pass (prerequisite for semantic review)
• Every invariant passes the "testable?" filter
• Principle ordering is defensible (1 genuinely beats 2 in conflicts)
• Identity statements are specific and distinguishing
• Heuristics cover actual recurring decisions
• No anti-pattern matches found
• Terminology is consistent across all files reviewed
• Review report generated with findings by severity

Error Recovery

Failure	Recovery Action
Invariant is untestable	Rewrite to be more specific. If it can't be made testable, move it to principles (aspirational, not invariant).
Principles not truly prioritized	Reorder with explicit trade-off reasoning. Document conflicts between adjacent principles.
Identity too generic	Add specifics: technology names, scope boundaries, unique characteristics that distinguish this from similar things.
Anti-pattern found in content	Apply the fix from the anti-pattern table. If the fix is unclear, read the full anti-pattern description in the source file.
Terminology inconsistent	Choose the canonical term (check MEMORY.md for established conventions). Grep-and-replace across all affected files.
Too many findings to fix	Prioritize errors over warnings over suggestions. Fix errors first, then create TODOs for the rest.
Review scope too large	Narrow to a single directory or file. Project-wide reviews should be broken into multiple focused sessions.