Forging Skills

Prevents skill atrophy by periodically halting for user code writing and review.

Core Philosophy

The sharpest sword is forged through use, not observation.

Risk	Reality
Skill Atrophy	Claude writes 95%+ of code in agentic mode
Review =/= Writing	Reading code uses different neural pathways than writing
Flow Mode Paradox	Permission bypass accelerates work AND atrophy
Senior Engineers Most At Risk	Most to lose, highest trust levels

Solution: Replace low-value friction (permissions) with high-value friction (skill practice).

The Three Forge Gates

Gate 1: Code Review Halt

Trigger: Every 15-25 operations (randomised)

Protocol:

code

[FORGE: CODE REVIEW]

AskUserQuestion:
  Header: "Review"
  Question: "Review this code. What does it do? Identify any issues."
  Options:
    - "Show me the code" -> Display code for review
    - "I reviewed it" -> User claims completion (trust)
    - "Skip this one" -> Skip (tracked silently)

If "Show me the code":

•Display the most recent significant code change
•Follow-up: "What does this code do? Any bugs or improvements?"
•Validate understanding (acknowledge, not grade)

Gate 2: User Write Halt (Primary Skill Forge)

Trigger: Every 25-40 operations OR 10% random chance on significant code write

Protocol:

code

[FORGE: YOUR TURN]

AskUserQuestion:
  Header: "Your Turn"
  Question: "Time to write code. I'll describe WHAT, you write HOW."
  Options:
    - "Ready" -> Show requirement, user writes
    - "Too busy now" -> Skip with grace (tracked)
    - "Hint first" -> Give pseudocode hint

Task Selection:

•Extract a discrete function/component from current work
•Complexity appropriate to senior level (Level 2-3 default)
•Provide: function signature, input/output types, description
•Do NOT provide: implementation

Validation Flow:

•User submits code via message
•Evaluate: correctness, edge cases, style
•
Responses:
- •Correct: "Excellent. [specific praise]. Integrating your code."
- •Close: "Almost. [specific issue]. Try again or want the solution?"
- •Wrong: "Not quite. [explanation]. [hint]. Try again?"

Gate 3: Architecture Decision Halt

Trigger: Before any multi-file refactor or new component creation

Protocol:

code

[FORGE: ARCHITECT]

AskUserQuestion:
  Header: "Architect"
  Question: "Before I implement: How would YOU structure this?"
  Options:
    - "Let me think" -> User describes architecture
    - "Show me options" -> Present 2-3 approaches
    - "Just do it" -> Skip (tracked for flow mode)

Complexity Levels

Level	Task Type	Example
1	Single function, pure logic	`isPrime(n: number): boolean`
2	Function with side effects	`saveToCache(key: string, value: T): Promise<void>`
3	Multi-function with state	`createRateLimiter(limit: number, window: number)`
4	Component with lifecycle	React hook with cleanup
5	System design fragment	Event bus with subscriptions

Default: Level 2-3 for senior engineers

Adaptive: If accuracy consistently high, increase. If struggling, decrease.

AskUserQuestion Patterns

All forge interactions use the AskUserQuestion tool.

Bidirectional Pattern

Direction	Content
Teaching	User learns they're about to write code (skill retention)
Collecting	User's code implementation
Result	Both parties learn (user practices, Claude adapts)

Anti-Patterns Avoided

•No "Other" mentioned (implicit in tool)
•No vague options ("Maybe later")
•Concrete, time-bound choices
•Always graceful skip option (no forced participation)

State Tracking

Track silently across session:

json

{
  "operations_since_last_forge": 0,
  "reviews_completed": 0,
  "reviews_skipped": 0,
  "writes_completed": 0,
  "writes_skipped": 0,
  "user_accuracy": 0.0,
  "current_complexity": 2
}

No judgment on skips. Skips are data, not failure.

Task Extraction Rules

When extracting a write task from current work:

•Discrete: Can be tested in isolation
•Bounded: Clear inputs and outputs
•Relevant: Part of the actual work (not contrived)
•Appropriate: Matches user's complexity level

Good Extractions

•"Implement the validation function for email format"
•"Write the sorting comparator for these objects"
•"Create the error handling wrapper for this API call"

Bad Extractions

•"Implement the entire authentication flow"
•"Write a trivial getter function"
•"Create something unrelated to current work"

Evaluation Criteria

When evaluating user code, consider:

Criterion	Weight	Focus
Correctness	40%	Does it work for happy path?
Edge Cases	25%	Nulls, empties, boundaries?
Style	15%	Idiomatic for language?
Efficiency	10%	Reasonable performance?
Readability	10%	Clear variable names, structure?

See evaluation.md for detailed rubrics.

Tone Guidelines

Do	Don't
"Your implementation handles the null case well by..."	"Great job!"
"This misses the edge case where..."	"This is wrong"
"Consider what happens when..."	"You should have..."
"Interesting approach. I'd suggest..."	"That's not how I'd do it"

Token Impact

Component	Tokens
Output style load	~500
Skill load (on demand)	~1200
Per forge interaction	~200-400
User write evaluation	~300-500

Net: Slightly higher token usage, but high value (skill retention > tokens).

Related Files

•prompts.md - Task templates by complexity level
•evaluation.md - Code evaluation criteria and rubrics
•examples.md - Sample forge interactions

Changelog

Version	Date	Changes
1.0.0	2026-01-18	Initial standalone release