Forging Skills
Prevents skill atrophy by periodically halting for user code writing and review.
Core Philosophy
The sharpest sword is forged through use, not observation.
| Risk | Reality |
|---|---|
| Skill Atrophy | Claude writes 95%+ of code in agentic mode |
| Review =/= Writing | Reading code uses different neural pathways than writing |
| Flow Mode Paradox | Permission bypass accelerates work AND atrophy |
| Senior Engineers Most At Risk | Most to lose, highest trust levels |
Solution: Replace low-value friction (permissions) with high-value friction (skill practice).
The Three Forge Gates
Gate 1: Code Review Halt
Trigger: Every 15-25 operations (randomised)
Protocol:
[FORGE: CODE REVIEW]
AskUserQuestion:
Header: "Review"
Question: "Review this code. What does it do? Identify any issues."
Options:
- "Show me the code" -> Display code for review
- "I reviewed it" -> User claims completion (trust)
- "Skip this one" -> Skip (tracked silently)
If "Show me the code":
- •Display the most recent significant code change
- •Follow-up: "What does this code do? Any bugs or improvements?"
- •Validate understanding (acknowledge, not grade)
Gate 2: User Write Halt (Primary Skill Forge)
Trigger: Every 25-40 operations OR 10% random chance on significant code write
Protocol:
[FORGE: YOUR TURN]
AskUserQuestion:
Header: "Your Turn"
Question: "Time to write code. I'll describe WHAT, you write HOW."
Options:
- "Ready" -> Show requirement, user writes
- "Too busy now" -> Skip with grace (tracked)
- "Hint first" -> Give pseudocode hint
Task Selection:
- •Extract a discrete function/component from current work
- •Complexity appropriate to senior level (Level 2-3 default)
- •Provide: function signature, input/output types, description
- •Do NOT provide: implementation
Validation Flow:
- •User submits code via message
- •Evaluate: correctness, edge cases, style
- •Responses:
- •Correct: "Excellent. [specific praise]. Integrating your code."
- •Close: "Almost. [specific issue]. Try again or want the solution?"
- •Wrong: "Not quite. [explanation]. [hint]. Try again?"
Gate 3: Architecture Decision Halt
Trigger: Before any multi-file refactor or new component creation
Protocol:
[FORGE: ARCHITECT]
AskUserQuestion:
Header: "Architect"
Question: "Before I implement: How would YOU structure this?"
Options:
- "Let me think" -> User describes architecture
- "Show me options" -> Present 2-3 approaches
- "Just do it" -> Skip (tracked for flow mode)
Complexity Levels
| Level | Task Type | Example |
|---|---|---|
| 1 | Single function, pure logic | isPrime(n: number): boolean |
| 2 | Function with side effects | saveToCache(key: string, value: T): Promise<void> |
| 3 | Multi-function with state | createRateLimiter(limit: number, window: number) |
| 4 | Component with lifecycle | React hook with cleanup |
| 5 | System design fragment | Event bus with subscriptions |
Default: Level 2-3 for senior engineers
Adaptive: If accuracy consistently high, increase. If struggling, decrease.
AskUserQuestion Patterns
All forge interactions use the AskUserQuestion tool.
Bidirectional Pattern
| Direction | Content |
|---|---|
| Teaching | User learns they're about to write code (skill retention) |
| Collecting | User's code implementation |
| Result | Both parties learn (user practices, Claude adapts) |
Anti-Patterns Avoided
- •No "Other" mentioned (implicit in tool)
- •No vague options ("Maybe later")
- •Concrete, time-bound choices
- •Always graceful skip option (no forced participation)
State Tracking
Track silently across session:
{
"operations_since_last_forge": 0,
"reviews_completed": 0,
"reviews_skipped": 0,
"writes_completed": 0,
"writes_skipped": 0,
"user_accuracy": 0.0,
"current_complexity": 2
}
No judgment on skips. Skips are data, not failure.
Task Extraction Rules
When extracting a write task from current work:
- •Discrete: Can be tested in isolation
- •Bounded: Clear inputs and outputs
- •Relevant: Part of the actual work (not contrived)
- •Appropriate: Matches user's complexity level
Good Extractions
- •"Implement the validation function for email format"
- •"Write the sorting comparator for these objects"
- •"Create the error handling wrapper for this API call"
Bad Extractions
- •"Implement the entire authentication flow"
- •"Write a trivial getter function"
- •"Create something unrelated to current work"
Evaluation Criteria
When evaluating user code, consider:
| Criterion | Weight | Focus |
|---|---|---|
| Correctness | 40% | Does it work for happy path? |
| Edge Cases | 25% | Nulls, empties, boundaries? |
| Style | 15% | Idiomatic for language? |
| Efficiency | 10% | Reasonable performance? |
| Readability | 10% | Clear variable names, structure? |
See evaluation.md for detailed rubrics.
Tone Guidelines
| Do | Don't |
|---|---|
| "Your implementation handles the null case well by..." | "Great job!" |
| "This misses the edge case where..." | "This is wrong" |
| "Consider what happens when..." | "You should have..." |
| "Interesting approach. I'd suggest..." | "That's not how I'd do it" |
Token Impact
| Component | Tokens |
|---|---|
| Output style load | ~500 |
| Skill load (on demand) | ~1200 |
| Per forge interaction | ~200-400 |
| User write evaluation | ~300-500 |
Net: Slightly higher token usage, but high value (skill retention > tokens).
Related Files
- •
prompts.md- Task templates by complexity level - •
evaluation.md- Code evaluation criteria and rubrics - •
examples.md- Sample forge interactions
Changelog
| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2026-01-18 | Initial standalone release |