Skill Creator v2: Principle-Driven Skill Design

This skill guides you through creating skills that transfer expertise, not just list instructions.

The 4 Core Truths

Every skill you create must embody these principles:

Truth	What It Means	In Practice
Expertise Transfer, Not Instructions	Make Claude think like an expert, not follow steps	Teach mental models and decision frameworks, not checklists
Flow, Not Friction	Produce output, not intermediate documents	Go straight to deliverables - no "now write a plan" steps
Voice Matches Domain	Sound like a practitioner, not documentation	Use domain language naturally, avoid meta-commentary
Focused Beats Comprehensive	Constrain ruthlessly	Every section must justify its token cost

When to Use This Skill

Use this skill when:

•Creating a new skill from scratch
•A skill isn't performing well and needs rethinking
•You want to understand why certain patterns work
•You need to make tough trade-offs about what to include

Don't use this for:

•Quick iterations on working skills (use the standard skill-creator)
•Just packaging an existing skill (use packaging scripts directly)

The 10-Step Process

code

UNDERSTAND the problem
    ↓
EXPLORE Claude's failures  
    ↓
RESEARCH domain expertise
    ↓
SYNTHESIZE principles
    ↓
DRAFT the skill
    ↓
SELF-CRITIQUE rigorously
    ↓
ITERATE on feedback
    ↓
TEST on real scenarios
    ↓
FINALIZE structure
    ↓
PACKAGE for distribution

1. UNDERSTAND → What skill? What problem?

Start by crystallizing the core need:

•What specific capability gap exists? (Not "documentation" but "Claude rewrites the same 50-line parsing script every time")
•What does success look like? (Concrete examples of before/after)
•Who benefits and how? (Time saved? Quality improved? Consistency achieved?)

Good examples:

•"Engineers spend 20 minutes each time formatting API responses to match our schema"
•"Claude generates valid SQL but doesn't know our table relationships"
•"We need Claude to follow our 47-point brand guidelines without a wall of text"

Bad examples:

•"Make Claude better at X" (too vague)
•"Help with documents" (too broad)

Output: A crisp problem statement you could explain in 30 seconds.

2. EXPLORE → See where Claude fails without guidance

Critical step - don't skip this. You need to observe the failure mode, not imagine it.

Try the task without the skill:

•Give Claude a representative request
•Note where it struggles, hesitates, or produces suboptimal output
•Try 3-5 variations to see if it's consistent

Document:

•What did Claude do wrong?
•What knowledge was it missing?
•What did it waste time on?
•When did it ask for clarification vs. guess?

Example observations:

•"Claude generated working code but used pandas when our stack is polars"
•"Claude wrote a 200-line form instead of using our 10-line template"
•"Claude asked what format we wanted - it should know we always use ISO 8601"

Why this matters: You're designing for actual failure modes, not theoretical ones. This prevents over-specifying (wasting tokens) or under-specifying (skill doesn't help).

3. RESEARCH → Go deep on the domain

Now that you know what fails, understand why success looks like and what experts know.

For technical domains:

•What mental models do experts use?
•What are the key decision points?
•What patterns repeat across scenarios?
•What's stable vs. what varies?

For workflow domains:

•What's the expert's internal checklist?
•What do they check for quality?
•What shortcuts do they know?
•What mistakes do novices make?

Research sources:

•Interview domain experts
•Review high-quality examples
•Read practitioner documentation (not beginner tutorials)
•Analyze your own expert behavior

Output: A list of insights that would make Claude competent, not just capable.

4. SYNTHESIZE → Extract principles from research

Transform observations into teachable principles.

Pattern recognition:

•What do all good examples have in common?
•What varies based on context?
•What rules have exceptions? (Document both)
•What can be expressed as "if X then Y"?

Compression:

•Can 5 bullet points become 1 principle?
•Can 3 examples show a pattern?
•Can a decision tree replace prose?

Example synthesis:

code

Raw research:
- Example 1 uses indentation for hierarchy
- Example 2 uses bullet points for parallel items
- Example 3 uses numbered lists for sequences
- Example 4 combines all three appropriately

Synthesized principle:
"Match structure to meaning: indent for hierarchy, bullets for parallelism, numbers for sequence."

Output: Distilled principles that transfer expertise, not just information.

5. DRAFT → Write initial skill

Now you can draft. Structure using progressive disclosure:

code

skill-name/
├── SKILL.md          # Core workflow + principles (<500 lines)
│   ├── YAML frontmatter (name, description)
│   └── Essential instructions
├── references/       # Deep knowledge (loaded as needed)
│   ├── patterns.md
│   └── examples.md
├── scripts/          # Executable code
│   └── helper.py
└── assets/           # Templates, not documentation
    └── template.json

Frontmatter:

yaml

---
name: skill-name
description: >
  What the skill does + when to trigger it. Be specific about use cases.
  Good: "Process medical transcripts following HIPAA guidelines, including
  de-identification and structured output formatting."
  Bad: "Help with medical documents."
---

SKILL.md structure:

markdown

# Skill Name

[One paragraph: what problem this solves]

## Core Approach

[The mental model - how an expert thinks about this domain]

## When [Most Common Scenario]

[Direct instructions using imperative form]
[Include only essential decision points]
[Reference detailed guides in references/ as needed]

## When [Second Most Common Scenario]

[...]

## Quality Checks

[What good output looks like - help Claude self-evaluate]

Key drafting principles:

•Imperative mood: "Extract text" not "You should extract text"
•Example over explanation: Show one good example > describe in prose
•Decision points explicit: "If X then Y, otherwise Z"
•Offload detail: "See references/advanced.md" not "here's 500 words"
•Quality criteria: Help Claude know when it's done

6. SELF-CRITIQUE → Review against quality criteria

Now be ruthlessly critical. For each section, ask:

Expertise Transfer Test:

• Does this make Claude think like an expert or just act like one?
• Would a domain expert recognize this approach?
• Are we teaching patterns or prescribing steps?

Flow Test:

• Can Claude go straight to output?
• Do we force intermediate artifacts Claude doesn't need?
• Would an expert work this way?

Voice Test:

• Does this sound like domain documentation or AI instructions?
• Would a practitioner say "execute step 3" or just do it?
• Are we narrating process or enabling work?

Focus Test:

• Can we delete this section and still succeed? (If yes, delete it)
• Is this addressing observed failure modes? (If no, question it)
• Could this be a one-line reference instead?

Token Efficiency Test:

• Would this information surprise Claude? (If no, cut it)
• Is this stable knowledge vs. variable context?
• Should this be in SKILL.md or references/?

Progressive Disclosure Test:

• Do we force-load information Claude might not need?
• Can variations go in separate reference files?
• Are scripts documented or just executable?

Red flags:

•Phrases like "you should", "make sure to", "don't forget"
•Apologetic language: "This might seem complex but..."
•Meta-commentary: "The next step is to..."
•Over-specification: Dictating every detail when heuristics suffice
•Under-specification: Vague guidance on fragile operations

7. ITERATE → Fix gaps, get feedback, improve

Testing approaches:

•Yourself: Use the skill on real scenarios - note friction
•Others: Have someone else try it - watch where they struggle
•Claude: Have Claude use it - monitor for confusion or errors

Feedback loop:

code

Use skill → Note issue → Hypothesis on why → Update skill → Test again

Common improvements:

•Too much guidance: Claude over-thinks → Trust Claude more, delete text
•Too little guidance: Claude guesses wrong → Add decision framework
•Wrong abstraction: Scenarios don't map cleanly → Reorganize around real patterns
•Missing context: Claude lacks key info → Move knowledge from your head to references/

Iteration triggers:

•Claude asks questions that the skill should answer
•Output quality varies unexpectedly
•Claude ignores parts of the skill
•You find yourself adding ad-hoc instructions in chat

8. TEST → Use skill on a real scenario

Final validation with a realistic task:

•Pick a task that's representative but not used during design
•Use the skill as if you're a new user
•
Document:
- •Did it work on first try?
- •Where did Claude hesitate?
- •What did you need to clarify?
- •Was anything missing?
- •Was anything unused?

Success criteria:

•Claude produces quality output without ad-hoc guidance
•The skill handles expected variations naturally
•Token usage is reasonable (<2000 tokens loaded typically)
•Claude doesn't ask questions the skill should answer

9. FINALIZE → Codify into optimal structure

Polish for production:

SKILL.md:

•Remove TODOs and draft artifacts
•Verify all references exist and are linked
•Ensure imperative mood throughout
•Trim any remaining cruft

References:

•Organize by use case (not by type)
•Add grep-able patterns for large files
•Include only what Claude might need

Scripts:

•Test thoroughly
•Include docstrings (Claude might read them)
•Consider: should this be a reference instead?

Assets:

•Only include files that are used in output
•Remove examples/samples unless they're templates

Validation:

bash

# Use the packaging script - it validates automatically
scripts/package_skill.py /path/to/skill-folder

10. PACKAGE → Share the skill

Once validated:

bash

# Creates skill-name.skill file
scripts/package_skill.py /path/to/skill-folder /output/directory

The .skill file is a zip with a .skill extension containing your complete skill structure.

Common Skill Patterns

High-Level Guide with Deep References

When: Complex domain with many variations

SKILL.md:

markdown

## Core workflow

[Essential steps + decision points]

For detailed guidance:
- **Pattern library:** See references/patterns.md
- **Full examples:** See references/examples.md  
- **API reference:** See references/api.md

Benefit: SKILL.md stays focused; Claude loads depth only when needed

Script-Heavy with Minimal Instructions

When: Fragile operations requiring exact execution

SKILL.md:

markdown

## Rotating PDFs

Use scripts/rotate_pdf.py:

\`\`\`bash
python scripts/rotate_pdf.py input.pdf --angle 90 --output rotated.pdf
\`\`\`

The script handles edge cases and validation.

Benefit: Deterministic, token-efficient, no "reinventing the wheel"

Template-Based with Examples

When: Output follows a strict format

SKILL.md:

markdown

## Report format

ALWAYS use this structure:

# [Title]

## Executive Summary
[One paragraph]

## Key Findings  
- Finding 1 [with data]
- Finding 2 [with data]

## Recommendations
1. Specific action
2. Specific action

Benefit: Claude produces consistent output without guessing

Anti-Patterns to Avoid

Anti-Pattern	Why It's Bad	Fix
Novel-length SKILL.md	Wastes tokens, hard to maintain	Split into references/
Step-by-step recipes	Makes Claude mechanical, not thoughtful	Teach principles
Generic advice	Claude already knows this	Only include novel info
Assuming incompetence	Over-explains, wastes tokens	Trust Claude's base knowledge
README/CHANGELOG/etc	AI doesn't need meta-documentation	Delete these files
Loading everything upfront	Wastes context	Use progressive disclosure
Vague triggering	Skill loaded when not needed	Be specific in description

Degrees of Freedom Framework

Match instruction specificity to task fragility:

High Freedom (General Guidance)

•Use when: Multiple valid approaches exist
•Format: Principles + examples
•Example: "Write engaging product copy"

Medium Freedom (Preferred Patterns)

•Use when: Best practices exist but context varies
•Format: Decision framework + examples
•Example: "Structure SQL queries for readability"

Low Freedom (Exact Execution)

•Use when: Operations are fragile or compliance-critical
•Format: Scripts or strict templates
•Example: "Fill IRS tax forms"

Working with Existing Skills

To improve a skill:

•Use it on real tasks - note where it fails
•Check: Is this a skill problem or wrong use case?
•Run through EXPLORE phase again
•Apply targeted fixes (resist full rewrites)
•Test that fixes don't break existing use cases

To merge skills:

Only if they share 80%+ overlap. Otherwise keep separate - Claude can use multiple skills.

To split a skill:

When SKILL.md exceeds 500 lines or covers truly distinct workflows. Split at natural boundaries, update descriptions.

Quick Reference: Skill Creation Checklist

References

For detailed patterns:

•Workflow patterns: See the standard skill-creator's references/workflows.md
•Output patterns: See the standard skill-creator's references/output-patterns.md

Tools

Use the standard skill-creator scripts:

•scripts/init_skill.py - Initialize skill structure
•scripts/package_skill.py - Validate and package
•scripts/quick_validate.py - Check structure only

Final Wisdom

The best skill is the one that disappears. When Claude uses it, it should feel like Claude "just knows" how to do the task - not like it's following a manual.

If your skill makes Claude sound like it's reading instructions, you've created friction. If Claude sounds like a domain expert who happens to have expertise in this area, you've transferred expertise.

That's the difference.