Librarian - Documentation Prose Architect

Analyze the prose content of documentation - what it says, how it explains concepts, where ideas are scattered or duplicated - to reorganize by semantic meaning, not just file structure.

Core Philosophy

The librarian examines PROSE, not just code.

The librarian reads documentation like a human reader would, understanding:

•What each file explains - The concepts, workflows, and ideas described in prose
•How explanations relate - Where the same idea is explained differently in multiple places
•Where prose contradicts - Conflicting explanations, not just conflicting numbers
•What's scattered vs. cohesive - Ideas that belong together split across files
•What's mixed vs. focused - Unrelated explanations crammed into one file

This is NOT about:

•Fixing broken links (that's a side effect)
•Checking operator consistency (that's mechanical)
•Enforcing line limits (that's a constraint)
•Validating code blocks (that's syntax)

This IS about:

•Understanding what each document is trying to explain
•Finding where explanations are incomplete, scattered, or contradictory
•Reorganizing prose so concepts flow logically
•Consolidating duplicate explanations into single authoritative sources

What "Examining Prose" Means

When you read a documentation file, ask:

code

PROSE ANALYSIS QUESTIONS:
├── UNDERSTANDING:
│   ├── What concept is this file trying to explain?
│   ├── Who is the intended reader?
│   ├── What should the reader know after reading this?
│   └── What does this file ASSUME the reader already knows?
│
├── QUALITY:
│   ├── Is the explanation complete?
│   ├── Is the explanation clear?
│   ├── Are there gaps in the logic?
│   └── Would a reader be confused by anything?
│
├── RELATIONSHIPS:
│   ├── What other concepts does this explanation reference?
│   ├── Are those concepts explained here or elsewhere?
│   ├── If elsewhere, is the reference clear?
│   └── Does this explanation contradict any other?
│
└── ORGANIZATION:
    ├── Does this explanation belong in this file?
    ├── Is this file trying to explain too many things?
    ├── Is this explanation scattered across multiple files?
    └── Would reorganizing improve understanding?

Prose Issues vs Mechanical Issues

Prose Issues (Focus Here)	Mechanical Issues (Secondary)
Same concept explained 3 different ways	Broken markdown link
Conflicting instructions for same scenario	Missing file reference
Workflow explanation split across 5 files	Line count over limit
Tutorial mixed with reference material	Inconsistent heading levels
Incomplete explanation assumes too much	Duplicate signal format
Confusing explanation with unclear logic	Wrong anchor tag

Always prioritize prose issues. Mechanical issues often fix themselves when prose is reorganized correctly.

Parameters

Parameter	Required	Default	Description
`path`	No	`.claude/docs`	Directory to analyze
`line_limit`	No	`500`	Maximum lines per file (constraint)
`mode`	No	`refactor`	`refactor`, `audit`, or `consistency`

Invocation

code

/librarian                           # Full refactor of .claude/docs
/librarian mode=audit                # Report only, no changes
/librarian mode=consistency          # Check for contradictions only
/librarian path=docs line_limit=400  # Custom path and stricter limit

Phase 1: Semantic Analysis (READ THE PROSE)

1.1 Build Concept Map

Read ALL documentation files as prose and extract:

code

CONCEPT EXTRACTION:
├── For each file, READ IT and identify:
│   ├── Primary concept (what is this file fundamentally ABOUT?)
│   │   → Not "what's in the filename" but "what does it explain"
│   │
│   ├── Key explanations (what does it teach the reader?)
│   │   → Summarize each major section's purpose
│   │
│   ├── Audience assumptions (what does it assume you know?)
│   │   → What would confuse a new reader?
│   │
│   ├── Terminology defined (what terms does it establish?)
│   │   → Look for definitions, not just usage
│   │
│   └── Dependencies (what must you read first?)
│       → What concepts are referenced but not explained?
│
├── Build concept index:
│   ├── concept → [files that EXPLAIN it in prose]
│   ├── concept → [files that MENTION it without explaining]
│   └── concept → [files that ASSUME reader knows it]
│
└── Identify concept clusters:
    ├── Concepts that form a coherent topic (should be together)
    ├── Concepts that build on each other (should be ordered)
    └── Concepts that conflict (should be reconciled)

1.2 Map Explanation Patterns

Understand HOW concepts are explained:

code

EXPLANATION PATTERN ANALYSIS:
├── Explanation styles found:
│   ├── Tutorial (step-by-step, "do this then this")
│   ├── Reference (lookup, "X means Y")
│   ├── Conceptual (understanding, "why X matters")
│   └── Example-driven (showing, "here's X in action")
│
├── Mixed styles in same file (potential split):
│   ├── File tries to be both tutorial AND reference
│   ├── File mixes conceptual overview with implementation detail
│   └── File combines quick-start with exhaustive reference
│
├── Incomplete explanations:
│   ├── Concept mentioned but never fully explained
│   ├── Workflow described but steps are vague
│   ├── Term used but never defined
│   └── "See X" but X doesn't exist
│
└── Redundant explanations:
    ├── Same concept explained in multiple files
    ├── Same workflow described with different wording
    ├── Same template shown in multiple places
    └── Same rules stated in different contexts

1.3 Detect Prose Fragmentation

Find explanations that are SCATTERED when they should be TOGETHER:

code

FRAGMENTATION DETECTION:
├── Same concept explained partially in multiple files:
│   Example: "Expert delegation" explained 30% in expert-delegation.md,
│            20% in escalation-specification.md, 50% in agent-conduct.md
│   → Consolidate into single authoritative explanation
│
├── Workflow explanation split across files:
│   Example: Steps 1-3 in file A, steps 4-6 in file B, step 7 missing
│   → Reunify or clearly link the sequence
│
├── Definition separated from context:
│   Example: Signal format in signals.md, usage rules in workflows.md
│   → Either consolidate or create clear cross-references
│
├── Examples separated from concepts:
│   Example: Concept explained in chapter.md, examples in examples.md
│   → Consider reunifying unless examples are extensive
│
└── Prerequisites scattered:
    Example: "You need X" in file A, "X works like..." in file C
    → Reader can't follow without hunting through multiple files

1.4 Detect Prose Mixing

Find explanations that are TOGETHER when they should be SEPARATE:

code

MIXING DETECTION:
├── Multiple unrelated concepts in one file:
│   Example: File covers state management AND signal formats AND recovery
│   → These serve different readers at different times - separate them
│
├── Mixed audiences:
│   Example: Developer how-to mixed with orchestrator implementation details
│   → Developers don't need orchestrator details - separate them
│
├── Mixed depths:
│   Example: Quick-start overview mixed with exhaustive reference
│   → Reader wants one or the other - separate them
│
├── Mixed purposes:
│   Example: "What is X" (conceptual) mixed with "How to X" (tutorial)
│   → These serve different needs - consider separating
│
└── Mixed time contexts:
    Example: Setup (do once) mixed with runtime (do repeatedly)
    → Reader needs these at different times - separate them

Phase 2: Consistency Analysis (CHECK THE PROSE)

2.1 Semantic Contradictions

Check for CONFLICTING EXPLANATIONS (not just conflicting numbers):

code

CONTRADICTION DETECTION:
├── Same concept explained differently:
│   Example: File A says "agents can delegate to experts"
│            File B says "agents must solve problems themselves"
│   → CRITICAL: Resolve which explanation is correct
│
├── Conflicting instructions:
│   Example: File A says "always signal before exiting"
│            File B says "exit immediately on failure"
│   → CRITICAL: Clarify when each applies
│
├── Conflicting definitions:
│   Example: File A defines "task" as single work item
│            File B uses "task" to mean entire workflow
│   → CRITICAL: Standardize terminology
│
├── Conflicting assumptions:
│   Example: File A assumes reader knows signals
│            File B assumes reader is learning signals
│   → These files serve different audiences - clarify or separate
│
└── Implicit contradictions:
    Example: File A implies X is always true
             File B describes a case where X is false
    → Make the exception explicit

2.2 Explanation Completeness

Check for GAPS in explanations:

code

GAP DETECTION:
├── Concepts referenced but never explained:
│   Example: "Use the signal format" - but what IS the signal format?
│   → Add explanation or clear reference
│
├── Workflows with missing steps:
│   Example: "1. Do X, 2. Do Y, 4. Do Z" - what's step 3?
│   → Complete the workflow
│
├── Assumptions stated but not justified:
│   Example: "Always use method A" - why? When is method B appropriate?
│   → Explain the reasoning
│
├── Edge cases ignored:
│   Example: "Process the input" - what if input is invalid?
│   → Address edge cases or state assumptions
│
└── Reader questions unanswered:
    Example: After reading, would a reader still wonder "but what about...?"
    → Anticipate and answer those questions

2.3 Referential Consistency

Check that references match reality:

code

REFERENTIAL CHECKS:
├── Link claims match content:
│   Example: Link says "see signal formats" but target explains workflows
│   → Fix the link or the description
│
├── Cross-references are bidirectional:
│   Example: A links to B, but B doesn't acknowledge A
│   → Add backlink if relationship is important
│
├── Hierarchies are consistent:
│   Example: Index claims file is about X, file is actually about Y
│   → Update index or clarify file's purpose
│
└── Navigation matches content organization:
    Example: Navigation implies reading order A→B→C but C should come first
    → Fix navigation to match logical flow

Phase 3: Reorganization Planning (PLAN PROSE CHANGES)

3.1 Consolidation Planning

Plan where to MERGE scattered explanations:

code

CONSOLIDATION PLANNING:
├── For each fragmented concept:
│   ├── Which file should be the AUTHORITATIVE source?
│   │   → Usually the most complete/accurate version
│   │
│   ├── What content needs to MOVE there?
│   │   → List specific prose sections, not just "merge files"
│   │
│   ├── What should REMAIN in source files?
│   │   → Brief mention + reference, or nothing?
│   │
│   └── How will readers find it?
│       → Update navigation, add redirects
│
└── Consolidation output format:
    CONSOLIDATE: "Expert Delegation"
    FROM: expert-delegation.md (lines 50-120), escalation-spec.md (lines 30-45)
    TO: expert-delegation.md
    REASON: Concept explained twice with slight variations
    ACTION: Merge prose, pick clearer wording, reference from other file

3.2 Separation Planning

Plan where to SPLIT mixed content:

code

SEPARATION PLANNING:
├── For each mixed file:
│   ├── What DISTINCT explanations does it contain?
│   │   → List them as separate topics
│   │
│   ├── Can they stand alone?
│   │   → Each new file should make sense independently
│   │
│   ├── How should they be named?
│   │   → Names should reflect the explanation, not arbitrary sections
│   │
│   └── How will they link to each other?
│       → Cross-references for related concepts
│
└── Separation output format:
    SEPARATE: "state-management.md"
    INTO:
      - state/fields.md: "What state fields exist" (lines 10-200)
      - state/updates.md: "When state changes" (lines 201-400)
      - state/recovery.md: "How to recover state" (lines 401-500)
    REASON: Three distinct topics serving different needs
    ACTION: Create focused files, add index, cross-reference

3.3 Movement Planning

Plan where to RELOCATE misplaced content:

code

MOVEMENT PLANNING:
├── Content in wrong file:
│   Example: Signal parsing details in workflow-overview.md
│   → Move to signals.md where it belongs conceptually
│
├── Content at wrong level:
│   Example: Implementation details in quick-start.md
│   → Move to detailed reference
│
├── Content for wrong audience:
│   Example: Orchestrator internals in developer-guide.md
│   → Move to orchestrator documentation
│
└── Movement output format:
    MOVE: "Signal parsing implementation"
    FROM: task-dispatch.md (lines 200-350)
    TO: signals/parsing.md
    REASON: Parsing details don't belong in dispatch workflow
    ACTION: Move prose, leave brief reference in source

3.4 Line Limit Compliance

AFTER semantic reorganization, check line limits:

code

LINE LIMIT ANALYSIS (Secondary to semantic concerns):
├── Files over limit AFTER semantic fixes:
│   ├── Is the content genuinely cohesive? → Exception may be justified
│   ├── Can it be split semantically? → Find natural boundaries in prose
│   └── Would splitting harm understanding? → Keep together if so
│
├── Line limits are a CONSTRAINT, not a GOAL:
│   ├── Never split mid-explanation to meet a limit
│   ├── Never refuse consolidation because result is large
│   ├── Semantic organization > line counts
│
└── Report files over limit with recommendations

Phase 4: Refactoring Execution

4.1 Execution Order

Execute changes prioritizing semantic correctness:

code

EXECUTION SEQUENCE:
1. BUILD CHANGE MANIFEST (before any edits)
   - List all files to be modified
   - Track all paths/anchors that will change
   - This enables reference integrity checking later

2. RESOLVE CONTRADICTIONS (highest priority)
   - Determine correct explanation
   - Update conflicting files to align
   - Document the resolution

3. CONSOLIDATE SCATTERED EXPLANATIONS
   - Merge prose into authoritative source
   - Replace duplicates with references
   - Preserve all unique information

4. SEPARATE MIXED CONTENT
   - Create focused files for distinct topics
   - Move prose to appropriate locations
   - Add navigation and cross-references

5. MOVE MISPLACED CONTENT
   - Relocate prose to correct conceptual homes
   - Update source to reference new location
   - Fix navigation paths

6. ADDRESS SIZE (only if needed after above)
   - Semantic splits only
   - Never split mid-explanation
   - Create navigation for split content

7. REFERENCE INTEGRITY (MANDATORY - See Phase 5)
   - Scan ENTIRE project for references to changed files
   - Update ALL broken references
   - Verify zero orphaned references remain
   - THIS STEP IS NOT OPTIONAL

8. VERIFY CONSISTENCY
   - Re-check for contradictions
   - Confirm explanations are complete
   - Validate all links work

4.2 Prose Preservation Rules

When moving/merging explanations:

code

PRESERVATION RULES:
├── NEVER lose explanations - all prose must survive reorganization
├── NEVER change meaning - preserve explanatory intent
├── PREFER clearer wording when merging duplicate explanations
├── PRESERVE examples that illustrate concepts
├── ADD context when moving content to new location
├── UPDATE references so readers can find moved content
└── DOCUMENT changes - what moved where and why

4.3 Navigation Maintenance

Ensure readers can find reorganized content:

code

NAVIGATION RULES:
├── Every explanation reachable within 1 level from entry points
├── Index files summarize what each file EXPLAINS
├── Related explanations linked bidirectionally
├── Clear reading paths for different audiences
├── No dead ends (every file leads somewhere)
└── Moved content findable from old locations (redirects or references)

Phase 5: Reference Integrity (MANDATORY)

This phase is NOT optional. Every librarian run that modifies files MUST complete this phase before finishing.

5.1 Build Change Manifest

Before any edits, track what will change:

code

CHANGE MANIFEST (build before editing):
├── Files to be modified:
│   ├── {file_path}: {what changes}
│   └── ...
│
├── Files to be moved/renamed:
│   ├── {old_path} → {new_path}
│   └── ...
│
├── Anchors to be changed:
│   ├── {file}#{old_anchor} → {file}#{new_anchor}
│   └── ...
│
└── Content relocated:
    ├── "{concept}" from {source} to {target}
    └── ...

5.2 Execute Reference Scan

After ALL edits are complete, scan the ENTIRE project for references:

bash

# Scan all markdown files in project for references to modified files
# For each file in CHANGE MANIFEST:

grep -r "old_filename\.md" .claude/ --include="*.md"
grep -r "old_anchor" .claude/ --include="*.md"

Scan locations (MANDATORY - do not skip any):

•.claude/docs/ - All documentation
•.claude/agents/ - Generated agent files
•.claude/skills/ - Skill definitions
•.claude/commands/ - Command definitions
•.claude/prompts/ - Prompt templates
•README.md, CLAUDE.md - Root documentation
•Any other .md files in the project

5.3 Update All References

For each broken reference found:

code

REFERENCE UPDATE PROCEDURE:
1. Read the file containing the broken reference
2. Identify the reference context (what is it linking to?)
3. Update to new location/anchor
4. Verify the new reference is valid
5. Save the file

Reference patterns to check:

•[text](path/to/file.md) - Markdown links
•[text](path/to/file.md#anchor) - Anchored links
•See [document](path) - Inline references
•**Reference**: [Name](path) - Formal references
•path/to/file.md - Plain path mentions

5.4 Verify Reference Integrity

After updating all references, VERIFY nothing is broken:

code

VERIFICATION PROCEDURE:

1. RE-SCAN for old paths/anchors
   - grep -r "{old_path}" .claude/ --include="*.md"
   - Result MUST be empty (zero matches)

2. VALIDATE new references exist
   - For each updated reference, confirm target file exists
   - For anchored links, confirm anchor exists in target

3. CHECK for orphaned files
   - Files that were sources of moved content
   - Ensure they either redirect or are deleted

4. REPORT results
   - List all references updated
   - Confirm zero broken references remain

5.5 Reference Integrity Output

Include in final report:

code

REFERENCE INTEGRITY CHECK
=========================

Change Manifest:
- {N} files modified
- {N} files moved/renamed
- {N} anchors changed

References Updated: {count}
- {file}: Updated {N} references to {target}
- ...

Verification:
✓ Zero references to old paths remain
✓ All new references validated
✓ No orphaned files

Scanned Locations:
✓ .claude/docs/ ({N} files)
✓ .claude/agents/ ({N} files)
✓ .claude/skills/ ({N} files)
✓ .claude/commands/ ({N} files)
✓ .claude/prompts/ ({N} files)

If ANY broken references remain, the librarian run is INCOMPLETE.

Output Formats

Audit Mode Output

code

LIBRARIAN AUDIT REPORT
======================

Directory: {path}
Files Analyzed: {count}

PROSE CONTRADICTIONS (must resolve):
------------------------------------
CRITICAL: {concept} explained inconsistently
  - {file_a}: "{explanation_a}" (lines {range})
  - {file_b}: "{explanation_b}" (lines {range})
  Resolution needed: {what needs to be decided}

FRAGMENTED EXPLANATIONS (should consolidate):
---------------------------------------------
Concept "{concept}" explained in pieces:
  - {file1}: Explains {aspect1} (lines {range})
  - {file2}: Explains {aspect2} (lines {range})
  - {file3}: Explains {aspect3} (lines {range})
  Recommendation: Consolidate to {target}, reference from others

MIXED CONTENT (should separate):
--------------------------------
File "{file}" mixes unrelated topics:
  - {topic_a}: {description} (lines {range})
  - {topic_b}: {description} (lines {range})
  Recommendation: Separate into {file_a}, {file_b}

INCOMPLETE EXPLANATIONS:
------------------------
- {file}: {concept} mentioned but not explained
- {file}: {workflow} missing steps {which}
- {file}: {term} used but never defined

SIZE ISSUES (after semantic fixes):
-----------------------------------
| File | Lines | Semantic Split Possible? |
|------|-------|--------------------------|
| {file} | {lines} | {yes/no - why} |

Run with mode=refactor to apply changes.

Refactor Mode Output

code

LIBRARIAN REFACTOR COMPLETE
===========================

CHANGE MANIFEST:
----------------
Files modified: {count}
Files moved/renamed: {count}
Anchors changed: {count}

CONTRADICTIONS RESOLVED: {count}
--------------------------------
1. {concept}: Chose {file}'s explanation as authoritative
   Updated: {other_files}
   Reason: {why this explanation is correct}

EXPLANATIONS CONSOLIDATED: {count}
----------------------------------
1. "{concept}" unified from {count} sources → {target}
   Prose merged: {line_count} lines
   Sources now reference: {target}

CONTENT SEPARATED: {count}
--------------------------
1. {source} split into:
   - {file1}: Explains {topic1}
   - {file2}: Explains {topic2}

CONTENT MOVED: {count}
----------------------
1. {topic} moved from {source} to {target}
   Reason: {why it belongs there}

REFERENCE INTEGRITY (MANDATORY):
--------------------------------
Locations scanned:
  ✓ .claude/docs/ ({N} files)
  ✓ .claude/agents/ ({N} files)
  ✓ .claude/skills/ ({N} files)
  ✓ .claude/commands/ ({N} files)
  ✓ .claude/prompts/ ({N} files)

References updated: {count}
  - {file}: {old_ref} → {new_ref}
  - ...

Verification:
  ✓ Re-scan found ZERO broken references
  ✓ All new references validated

PROSE VERIFICATION:
-------------------
✓ No contradicting explanations remain
✓ Each concept has single authoritative source
✓ All explanations complete and findable
✓ Navigation paths preserved

Concept Affinity Rules

When deciding what explanations belong together:

High Affinity (same file)

•Concept definition and its examples
•Workflow overview and its steps
•Rule and its exceptions
•Format specification and its usage

Medium Affinity (same directory, linked)

•Overview and detailed reference
•Tutorial and troubleshooting
•Concept and related concepts

Low Affinity (separate, cross-linked)

•Different roles' documentation
•Different phases of a process
•Conceptual vs implementation

Quality Checks

Before Refactoring

• All contradicting explanations identified
• Consolidation preserves all unique information
• Separations create coherent standalone topics
• Movements improve conceptual organization
• Change manifest built (list of all files/paths/anchors to change)

After Refactoring

• No contradictions remain
• Each concept explained in ONE place authoritatively
• Related concepts linked
• Readers can find what they need
• All prose preserved

Reference Integrity (MANDATORY)

• All project locations scanned for references to changed files
• All broken references updated to new locations
• Zero orphaned references confirmed via re-scan
• Verification report included in output

The librarian run is NOT complete until reference integrity is verified.

Error Handling

Issue	Action
Contradictions can't be resolved from prose alone	Flag for human decision
Explanation doesn't fit anywhere cleanly	Create new topic category
Consolidation would create huge file	Accept if cohesive, flag if not
Separation would fragment explanation	Keep together

Prerequisites

Requirement	Purpose
Read access to `path`	Read and understand documentation prose
Write access to `path`	Reorganize files (refactor mode)
Task tool	Parallel analysis agents to read files
Understanding	Actually READ the prose, don't just scan