Review-Multi
Overview
review-multi provides a systematic framework for conducting comprehensive, multi-dimensional reviews of Claude Code skills. It evaluates skills across 5 independent dimensions, combining automated validation with manual assessment to deliver objective quality scores and actionable improvement recommendations.
Purpose: Systematic skill quality assurance through multi-dimensional assessment
The 5 Review Dimensions:
- •Structure Review - YAML frontmatter, file organization, naming conventions, progressive disclosure
- •Content Review - Section completeness, clarity, examples, documentation quality
- •Quality Review - Pattern compliance, best practices, anti-pattern detection, code quality
- •Usability Review - Ease of use, learnability, real-world effectiveness, user satisfaction
- •Integration Review - Dependency documentation, data flow, component integration, composition
Automation Levels:
- •Structure: 95% automated (validate-structure.py)
- •Content: 40% automated, 60% manual assessment
- •Quality: 50% automated, 50% manual assessment
- •Usability: 10% automated, 90% manual testing
- •Integration: 30% automated, 70% manual review
Scoring System:
- •Scale: 1-5 per dimension (Excellent/Good/Acceptable/Needs Work/Poor)
- •Overall Score: Weighted average across dimensions
- •Grade: A/B/C/D/F mapping
- •Production Readiness: ≥4.5 ready, 4.0-4.4 ready with improvements, 3.5-3.9 needs work, <3.5 not ready
Value Proposition:
- •Objective: Evidence-based scoring using detailed rubrics (not subjective opinion)
- •Comprehensive: 5 dimensions cover all quality aspects
- •Efficient: Automation handles 10-95% of checks depending on dimension
- •Actionable: Specific, prioritized improvement recommendations
- •Consistent: Standardized checklists ensure repeatable results
- •Flexible: 3 review modes (Comprehensive, Fast Check, Custom)
Key Benefits:
- •Catch 70% of issues with fast automated checks
- •Reduce common quality issues by 30% using checklists
- •Ensure production readiness before deployment
- •Identify improvement opportunities systematically
- •Track quality improvements over time
- •Establish quality standards across skill ecosystem
When to Use
Use review-multi when:
- •Pre-Production Validation - Review new skills before deploying to production to catch issues early and ensure quality standards
- •Quality Assurance - Conduct systematic QA on skills to validate they meet ecosystem standards and user needs
- •Identifying Improvements - Discover specific, actionable improvements for existing skills through multi-dimensional assessment
- •Continuous Improvement - Regular reviews throughout development lifecycle, not just at end, to maintain quality
- •Production Readiness Assessment - Determine if skill is ready for production use with objective scoring and grade mapping
- •Skill Ecosystem Standards - Ensure consistency and quality across multiple skills using standardized review framework
- •Post-Update Validation - Review skills after major updates to ensure changes don't introduce issues or degrade quality
- •Learning and Improvement - Use review findings to learn patterns, improve future skills, and refine development practices
- •Team Calibration - Standardize quality assessment across multiple reviewers with objective rubrics
Don't Use When:
- •Quick syntax checks (use validate-structure.py directly)
- •In-progress drafts (wait until reasonably complete)
- •Experimental prototypes (not production-bound)
Prerequisites
Required:
- •Skill to review (in .claude/skills/[skill-name]/ format)
- •Time allocation based on review mode:
  - •Fast Check: 5-10 minutes
  - •Single Operation: 5-60 minutes (varies by dimension)
  - •Comprehensive Review: 1.5-2.5 hours
Optional:
- •Python 3.7+ (for automation scripts in Structure and Quality reviews)
- •PyYAML library (for YAML frontmatter validation)
- •Access to skill-under-review documentation
- •Familiarity with Claude Code skill patterns (see development-workflow/references/common-patterns.md)
Skills (no required dependencies, complementary):
- •development-workflow: Use review-multi after skill development
- •skill-updater: Apply review-multi recommendations
- •testing-validator: Combine with review-multi for full QA
Scoring System
The review-multi scoring system provides objective, consistent quality assessment across all skill dimensions.
Per-Dimension Scoring (1-5 Scale)
Each dimension is scored independently using a 1-5 integer scale:
5 - Excellent (Exceeds Standards)
- •All criteria met perfectly
- •Goes beyond minimum requirements
- •Exemplary quality that sets the bar
- •No issues or concerns identified
- •Can serve as example for others
4 - Good (Meets Standards)
- •Meets all critical criteria
- •1-2 minor, non-critical issues
- •Production-ready quality
- •Standard expected level
- •Small improvements possible
3 - Acceptable (Minor Improvements Needed)
- •Meets most criteria
- •3-4 issues, some may be critical
- •Usable but not optimal
- •Several improvements recommended
- •Can proceed with noted concerns
2 - Needs Work (Notable Issues)
- •Missing several criteria
- •5-6 issues, multiple critical
- •Not production-ready
- •Significant improvements required
- •Rework needed before deployment
1 - Poor (Significant Problems)
- •Fails most criteria
- •7+ issues, fundamentally flawed
- •Major quality concerns
- •Extensive rework required
- •Not viable in current state
Overall Score Calculation
The overall score is a weighted average of the 5 dimension scores:
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) +
(Usability × 0.15) + (Integration × 0.15)
Weight Rationale:
- •Content & Quality (25% each): Core skill value - what it does and how well
- •Structure (20%): Important foundation - organization and compliance
- •Usability & Integration (15% each): Supporting factors - user experience and composition
Example Calculations:
- •Scores (5, 4, 4, 3, 4) → Overall = (5×0.20 + 4×0.25 + 4×0.25 + 3×0.15 + 4×0.15) = 4.05 → Grade B
- •Scores (4, 5, 5, 4, 4) → Overall = (4×0.20 + 5×0.25 + 5×0.25 + 4×0.15 + 4×0.15) = 4.50 → Grade A
- •Scores (3, 3, 2, 3, 3) → Overall = (3×0.20 + 3×0.25 + 2×0.25 + 3×0.15 + 3×0.15) = 2.75 → Grade C
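The weighted average can be sketched in a few lines of Python. This is an illustration only, not part of the review-multi scripts: the names `WEIGHTS` and `overall_score` are hypothetical, and the sample input is the set of dimension scores shown for skill-researcher in the Comprehensive Review example (5, 5, 4, 5, 4).

```python
from typing import Dict

# Weights copied from the Overall formula; the dictionary keys are illustrative names.
WEIGHTS = {
    "structure": 0.20,
    "content": 0.25,
    "quality": 0.25,
    "usability": 0.15,
    "integration": 0.15,
}


def overall_score(scores: Dict[str, int]) -> float:
    """Return the weighted average of the five 1-5 dimension scores."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} score must be between 1 and 5")
    total = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    # Round to two decimals to hide floating-point noise.
    return round(total, 2)


example = {"structure": 5, "content": 5, "quality": 4, "usability": 5, "integration": 4}
print(overall_score(example))  # 4.6
```

Because the weights sum to 1.0, the result always stays within the 1.0-5.0 range of the inputs.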
Grade Mapping
Overall scores map to letter grades:
- •A (4.5-5.0): Excellent - Production ready, high quality
- •B (3.5-4.4): Good - Ready with minor improvements
- •C (2.5-3.4): Acceptable - Needs improvements before production
- •D (1.5-2.4): Poor - Requires significant rework
- •F (1.0-1.4): Failing - Major issues, not viable
Production Readiness Assessment
Based on overall score:
- •≥4.5 (Grade A): ✅ Production Ready - High quality, deploy with confidence
- •4.0-4.4 (Grade B+): ✅ Ready with Minor Improvements - Can deploy, address improvements in next iteration
- •3.5-3.9 (Grade B-): ⚠️ Needs Improvements - Address issues before production deployment
- •<3.5 (Grade C-F): ❌ Not Ready - Significant rework required before deployment
Decision Framework:
- •A Grade: Ship it - exemplary quality
- •B Grade (4.0+): Ship it - standard quality, note improvements for future
- •B- Grade (3.5-3.9): Hold - fix identified issues first
- •C-F Grade: Don't ship - substantial work needed
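As a rough sketch, the grade and readiness bands can be expressed as threshold checks. The function names are hypothetical, and one assumption is made explicit: each band is treated as a lower bound, so a score that falls between the printed ranges (for example 4.45) is assigned to the lower band.

```python
def grade(overall: float) -> str:
    """Map an overall score to a letter grade, treating each band as a lower bound."""
    if overall >= 4.5:
        return "A"
    if overall >= 3.5:
        return "B"
    if overall >= 2.5:
        return "C"
    if overall >= 1.5:
        return "D"
    return "F"


def readiness(overall: float) -> str:
    """Map an overall score to the production readiness label."""
    if overall >= 4.5:
        return "Production Ready"
    if overall >= 4.0:
        return "Ready with Minor Improvements"
    if overall >= 3.5:
        return "Needs Improvements"
    return "Not Ready"


for score in (4.6, 4.2, 3.7, 2.75):
    print(score, grade(score), readiness(score))
```

Note that Grade B spans two readiness outcomes, which is why the decision framework splits it at 4.0.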
Operations
Operation 1: Structure Review
Purpose: Validate file organization, naming conventions, YAML frontmatter compliance, and progressive disclosure
When to Use This Operation:
- •Always run first (fast automated check catches 70% of issues)
- •Before comprehensive review (quick validation of basics)
- •During development (continuous structure validation)
- •Quick quality checks (5-10 minute validation)
Automation Level: 95% automated via scripts/validate-structure.py
Process:
- •Run Structure Validation Script
  python3 scripts/validate-structure.py /path/to/skill [--json] [--verbose]
  Script checks YAML, file structure, naming, progressive disclosure
- •Review YAML Frontmatter
  - •Verify name field in kebab-case format
  - •Check description has 5+ trigger keywords naturally embedded
  - •Validate YAML syntax is correct
- •Verify File Structure
  - •Confirm SKILL.md exists
  - •Check references/ and scripts/ organization (if present)
  - •Verify README.md exists
- •Check Naming Conventions
  - •SKILL.md and README.md uppercase
  - •references/ files: lowercase-hyphen-case
  - •scripts/ files: lowercase-hyphen-case with extension
- •Validate Progressive Disclosure
  - •SKILL.md <1,500 lines (warn if >1,200)
  - •references/ files 300-800 lines each
  - •No monolithic files
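The naming and size rules in the steps above lend themselves to simple checks. The following is a minimal sketch and not the actual contents of validate-structure.py, which this document does not show; all function names are hypothetical, and the kebab-case pattern is an assumed reading of "lowercase-hyphen-case".

```python
import re
from pathlib import Path

# Assumed definition of lowercase-hyphen-case: lowercase letters or digits joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def is_kebab_case(stem: str) -> bool:
    """Return True if a name (without extension) is lowercase-hyphen-case."""
    return bool(KEBAB_CASE.match(stem))


def check_reference_names(skill_dir: Path) -> list:
    """Return names of files under references/ that break the naming convention."""
    refs = skill_dir / "references"
    if not refs.is_dir():
        return []
    return sorted(
        p.name for p in refs.iterdir() if p.is_file() and not is_kebab_case(p.stem)
    )


def skill_md_size_status(line_count: int) -> str:
    """Classify SKILL.md length: fail at 1,500 lines or more, warn above 1,200."""
    if line_count >= 1500:
        return "fail"
    if line_count > 1200:
        return "warn"
    return "ok"


print(is_kebab_case("my-guide"), is_kebab_case("MyGuide"))
print(skill_md_size_status(569), skill_md_size_status(1300))
```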
Validation Checklist:
- • YAML frontmatter present and valid syntax
- • name field in kebab-case format (e.g., skill-name)
- • description includes 5+ trigger keywords (naturally embedded)
- • SKILL.md file exists
- • File naming follows conventions (SKILL.md uppercase, references lowercase-hyphen)
- • Directory structure correct (references/, scripts/ if present)
- • SKILL.md size appropriate (<1,500 lines, ideally <1,200)
- • References organized by topic (if present)
- • No monolithic files (progressive disclosure maintained)
- • README.md present
Scoring Criteria:
- •5 - Excellent: All 10 checks pass, perfect compliance, exemplary structure
- •4 - Good: 8-9 checks pass, 1-2 minor non-critical issues (e.g., README missing but optional)
- •3 - Acceptable: 6-7 checks pass, 3-4 issues including some critical (e.g., YAML invalid but fixable)
- •2 - Needs Work: 4-5 checks pass, 5-6 issues with multiple critical (e.g., no SKILL.md, bad naming)
- •1 - Poor: ≤3 checks pass, 7+ issues, fundamentally flawed structure
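The pass-count part of this rubric can be sketched as a lookup; the same pattern recurs in the Scoring Criteria of the other four operations, so the table below includes their thresholds as well. The names `RUBRICS` and `score_from_checks` are hypothetical. The sketch looks only at how many checks pass; judging whether a failed check is critical is still left to the reviewer.

```python
# Minimum number of passed checks needed for scores 5, 4, 3 and 2 (anything lower scores 1).
# Thresholds are taken from the Scoring Criteria of each operation.
RUBRICS = {
    "structure": (10, 8, 6, 4),
    "content": (10, 8, 6, 4),
    "quality": (10, 8, 6, 4),
    "usability": (8, 6, 4, 2),
    "integration": (9, 7, 5, 3),
}


def score_from_checks(dimension: str, passed: int) -> int:
    """Convert a count of passed checklist items into a 1-5 dimension score."""
    thresholds = RUBRICS[dimension]
    total_checks = thresholds[0]
    if not 0 <= passed <= total_checks:
        raise ValueError(f"{dimension} has {total_checks} checks, got {passed}")
    for score, minimum in zip((5, 4, 3, 2), thresholds):
        if passed >= minimum:
            return score
    return 1


print(score_from_checks("structure", 9))    # 4
print(score_from_checks("usability", 7))    # 4
print(score_from_checks("integration", 4))  # 2
```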
Outputs:
- •Structure score (1-5)
- •Pass/fail status for each checklist item
- •List of issues found with severity (critical/warning/info)
- •Specific improvement recommendations with fix guidance
- •JSON report (if using script with --json flag)
Time Estimate: 5-10 minutes (mostly automated)
Example:
$ python3 scripts/validate-structure.py .claude/skills/todo-management
Structure Validation Report
===========================
Skill: todo-management
Date: 2025-11-06
✅ YAML Frontmatter: PASS
  - Name format: valid (kebab-case)
  - Trigger keywords: 8 found (target: 5+)
✅ File Structure: PASS
  - SKILL.md: exists
  - README.md: exists
  - references/: 3 files found
  - scripts/: 1 file found
✅ Naming Conventions: PASS
  - All files follow conventions
⚠️ Progressive Disclosure: WARNING
  - SKILL.md: 569 lines (good)
  - state-management-guide.md: 501 lines (good)
  - BUT: No Quick Reference section detected
Overall Structure Score: 4/5 (Good)
Issues: 1 warning (missing Quick Reference)
Recommendation: Add Quick Reference section to SKILL.md
Operation 2: Content Review
Purpose: Assess section completeness, content clarity, example quality, and documentation comprehensiveness
When to Use This Operation:
- •Evaluate documentation quality
- •Assess completeness of skill content
- •Review example quality and quantity
- •Validate information architecture
- •Check clarity and organization
Automation Level: 40% automated (section detection, example counting), 60% manual assessment
Process:
- •Check Section Completeness (automated + manual)
  - •Verify 5 core sections present: Overview, When to Use, Main Content (workflow/operations), Best Practices, Quick Reference
  - •Check optional sections: Prerequisites, Common Mistakes, Troubleshooting
  - •Assess if all necessary sections included
- •Assess Content Clarity (manual)
  - •Is content understandable?
  - •Is organization logical?
  - •Are explanations clear without being verbose?
  - •Is technical level appropriate for audience?
- •Evaluate Example Quality (automated count + manual quality)
  - •Count code/command examples (target: 5+)
  - •Check if examples are concrete (not abstract placeholders)
  - •Verify examples are executable/copy-pasteable
  - •Assess if examples help understanding
- •Review Documentation Completeness (manual)
  - •Is all necessary information present?
  - •Are there unexplained gaps?
  - •Is sufficient detail provided?
  - •Are edge cases covered?
- •Check Explanation Depth (manual)
  - •Not too brief (insufficient detail)?
  - •Not too verbose (unnecessary length)?
  - •Balanced depth for complexity?
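The automated part of the steps above (section detection and example counting) might look like the following sketch. It assumes SKILL.md is Markdown with `#`-style headings and fenced code blocks; the function names are hypothetical, and only the four core sections that have a fixed title are checked, since "Main Content" has no single heading to match.

```python
import re

# Core sections with a predictable heading; Main Content is left to manual review.
CORE_SECTIONS = ("Overview", "When to Use", "Best Practices", "Quick Reference")
FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the sketch self-contained


def find_headings(markdown: str) -> list:
    """Return heading titles (any level) found outside fenced code blocks."""
    headings, in_fence = [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        match = re.match(r"^#{1,6}\s+(.*\S)", line)
        if match and not in_fence:
            headings.append(match.group(1))
    return headings


def missing_core_sections(markdown: str) -> list:
    """Return the core section names that no heading mentions."""
    titles = [h.lower() for h in find_headings(markdown)]
    return [s for s in CORE_SECTIONS if not any(s.lower() in t for t in titles)]


def count_code_examples(markdown: str) -> int:
    """Count fenced code blocks (an opening and a closing fence make one example)."""
    fence_lines = sum(
        1 for line in markdown.splitlines() if line.lstrip().startswith(FENCE)
    )
    return fence_lines // 2


sample = "\n".join([
    "# Demo", "## Overview", "text", "## When to Use",
    FENCE + "bash", "# not a heading", "echo hi", FENCE,
    "## Best Practices",
])
print(missing_core_sections(sample), count_code_examples(sample))
```

Counting examples this way says nothing about whether they are concrete or helpful, which is why the quality half of this step stays manual.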
Validation Checklist:
- • Overview/Introduction section present
- • When to Use section present with 5+ scenarios
- • Main content (workflow steps OR operations OR reference material) complete
- • Best Practices section present
- • Quick Reference section present
- • 5+ code/command examples included
- • Examples are concrete (not abstract placeholders like "YOUR_VALUE_HERE")
- • Content clarity: readable and well-structured
- • Sufficient detail: not too brief
- • Not too verbose: concise without unnecessary length
Scoring Criteria:
- •5 - Excellent: All 10 checks pass, exceptional clarity, great examples, comprehensive documentation
- •4 - Good: 8-9 checks pass, good content with minor gaps or clarity issues
- •3 - Acceptable: 6-7 checks pass, some sections weak or missing, acceptable clarity
- •2 - Needs Work: 4-5 checks pass, multiple sections incomplete/unclear, poor examples
- •1 - Poor: ≤3 checks pass, major gaps, confusing content, few/no examples
Outputs:
- •Content score (1-5)
- •Section-by-section assessment (present/missing/weak)
- •Example quality rating and count
- •Specific content improvement recommendations
- •Clarity issues identified with examples
Time Estimate: 15-30 minutes (requires manual review)
Example:
Content Review: prompt-builder
==============================
Section Completeness: 9/10 ✅
  ✅ Overview: Present, clear explanation of purpose
  ✅ When to Use: 7 scenarios listed
  ✅ Main Content: 5-step workflow, well-organized
  ✅ Best Practices: 6 practices documented
  ✅ Quick Reference: Present
  ⚠️ Common Mistakes: Not present (optional but valuable)
Example Quality: 8/10 ✅
  - Count: 12 examples (exceeds target of 5+)
  - Concrete: Yes, all examples executable
  - Helpful: Yes, demonstrate key concepts
  - Minor: Could use 1-2 edge case examples
Content Clarity: 9/10 ✅
  - Well-organized logical flow
  - Clear explanations without verbosity
  - Technical level appropriate
  - Minor: Step 3 could be clearer (add diagram)
Documentation Completeness: 8/10 ✅
  - All workflow steps documented
  - Validation criteria clear
  - Minor gaps: Error handling not covered
Content Score: 4/5 (Good)
Primary Recommendation: Add Common Mistakes section
Secondary: Add error handling guidance to Step 3
Operation 3: Quality Review
Purpose: Evaluate pattern compliance, best practices adherence, anti-pattern detection, and code/script quality
When to Use This Operation:
- •Validate standards compliance
- •Check pattern implementation
- •Detect anti-patterns
- •Assess code quality (if scripts present)
- •Ensure best practices followed
Automation Level: 50% automated (pattern detection, anti-pattern checking), 50% manual assessment
Process:
- •Detect Architecture Pattern (automated + manual)
  - •Identify pattern type: workflow/task/reference/capabilities
  - •Verify pattern correctly implemented
  - •Check pattern consistency throughout skill
- •Validate Documentation Patterns (automated + manual)
  - •Verify 5 core sections present
  - •Check consistent structure across steps/operations
  - •Validate section formatting
- •Check Best Practices (manual)
  - •Validation checklists present and specific?
  - •Examples throughout documentation?
  - •Quick Reference available?
  - •Error cases considered?
- •Detect Anti-Patterns (automated + manual)
  - •Keyword stuffing (trigger keywords unnatural)?
  - •Monolithic SKILL.md (>1,500 lines, no progressive disclosure)?
  - •Inconsistent structure (each section different format)?
  - •Vague validation ("everything works")?
  - •Missing examples (too abstract)?
  - •Placeholders in production ("YOUR_VALUE_HERE")?
  - •Ignoring error cases (only happy path)?
  - •Over-engineering simple skills?
  - •Unclear dependencies?
  - •No Quick Reference?
- •Assess Code Quality (manual, if scripts present)
  - •Scripts well-documented (docstrings)?
  - •Error handling present?
  - •CLI interfaces clear?
  - •Code style consistent?
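A few of the anti-patterns listed above are mechanical enough to detect with text matching. The sketch below covers three of them (monolithic file, placeholders, vague validation wording) under stated assumptions: the placeholder pattern generalizes "YOUR_VALUE_HERE" to any `YOUR_...` token, the phrase list contains only the example given in this document, and all names are hypothetical rather than taken from the review-multi scripts.

```python
import re

# Assumed generalization of the "YOUR_VALUE_HERE" placeholder named in the checklist.
PLACEHOLDER = re.compile(r"\bYOUR_[A-Z0-9_]+\b")
# Only the phrase quoted in the document; extend as needed.
VAGUE_PHRASES = ("everything works",)


def detect_anti_patterns(text: str, max_lines: int = 1500) -> list:
    """Return human-readable findings for the anti-patterns this sketch can detect."""
    findings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        findings.append(f"monolithic file: {len(lines)} lines (limit {max_lines})")
    for number, line in enumerate(lines, start=1):
        for token in PLACEHOLDER.findall(line):
            findings.append(f"line {number}: placeholder {token}")
        lowered = line.lower()
        for phrase in VAGUE_PHRASES:
            if phrase in lowered:
                findings.append(f"line {number}: vague validation phrase '{phrase}'")
    return findings


sample = "Set api_key = YOUR_VALUE_HERE\nVerify everything works\nRun the tests"
for finding in detect_anti_patterns(sample):
    print(finding)
```

Anti-patterns such as keyword stuffing or over-engineering need judgment and are not attempted here, which matches the 50% manual share of this operation.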
Validation Checklist:
- • Architecture pattern correctly implemented (workflow/task/reference/capabilities)
- • Consistent structure across steps/operations (same format throughout)
- • Validation checklists present and specific (measurable, not vague)
- • Best practices section actionable (specific guidance)
- • No keyword stuffing (trigger keywords natural, contextual)
- • No monolithic SKILL.md (progressive disclosure used if >1,000 lines)
- • Examples are complete (no "YOUR_VALUE_HERE" placeholders in production)
- • Error cases considered (not just happy path documented)
- • Dependencies documented (if skill requires other skills)
- • Scripts well-documented (if present: docstrings, error handling, CLI help)
Scoring Criteria:
- •5 - Excellent: All 10 checks pass, exemplary quality, no anti-patterns, exceeds standards
- •4 - Good: 8-9 checks pass, high quality, meets all standards, minor deviations
- •3 - Acceptable: 6-7 checks pass, acceptable quality, some standard violations, 2-3 anti-patterns
- •2 - Needs Work: 4-5 checks pass, quality issues, multiple standard violations, 4-5 anti-patterns
- •1 - Poor: ≤3 checks pass, poor quality, significant problems, 6+ anti-patterns detected
Outputs:
- •Quality score (1-5)
- •Pattern compliance assessment (pattern detected, compliance level)
- •Anti-patterns detected (list with severity)
- •Best practices gaps identified
- •Code quality assessment (if scripts present)
- •Prioritized improvement recommendations
Time Estimate: 20-40 minutes (mixed automated + manual)
Example:
Quality Review: workflow-skill-creator
======================================
Pattern Compliance: ✅
  - Pattern Detected: Workflow-based
  - Implementation: Correct (5 sequential steps with dependencies)
  - Consistency: High (all steps follow same structure)
Documentation Patterns: ✅
  - 5 Core Sections: All present
  - Structure: Consistent across all 5 steps
  - Formatting: Proper heading levels
Best Practices Adherence: 8/10 ✅
  ✅ Validation checklists: Present and specific
  ✅ Examples throughout: 6 examples included
  ✅ Quick Reference: Present
  ⚠️ Error handling: Limited (only happy path in examples)
Anti-Pattern Detection: 1 detected ⚠️
  ✅ No keyword stuffing (15 natural keywords)
  ✅ No monolithic file (1,465 lines but has references/)
  ✅ Consistent structure
  ✅ Specific validation criteria
  ✅ Examples complete (no placeholders)
  ⚠️ Error cases: Only happy path documented
  ✅ Dependencies: Clearly documented
  ✅ Not over-engineered
Code Quality: N/A (no scripts)
Quality Score: 4/5 (Good)
Primary Issue: Limited error handling documentation
Recommendation: Add error case examples and recovery guidance
Operation 4: Usability Review
Purpose: Evaluate ease of use, learnability, real-world effectiveness, and user satisfaction through scenario testing
When to Use This Operation:
- •Test real-world usage
- •Assess user experience
- •Evaluate learnability
- •Measure effectiveness
- •Validate skill achieves stated purpose
Automation Level: 10% automated (basic checks), 90% manual testing
Process:
- •Test in Real-World Scenario
  - •Select appropriate use case from "When to Use" section
  - •Actually use the skill to complete task
  - •Document experience: smooth or friction?
  - •Note any confusion or difficulty
- •Assess Navigation/Findability
  - •Can you find needed information easily?
  - •Is information architecture logical?
  - •Are sections well-organized?
  - •Is Quick Reference helpful?
- •Evaluate Clarity
  - •Are instructions clear and actionable?
  - •Are steps easy to follow?
  - •Do examples help understanding?
  - •Is technical terminology explained?
- •Measure Effectiveness
  - •Does skill achieve stated purpose?
  - •Does it deliver promised value?
  - •Are outputs useful and complete?
  - •Would you use it again?
- •Assess Learning Curve
  - •How long to understand skill?
  - •How long to use effectively?
  - •Is learning curve reasonable for complexity?
  - •Are first-time users supported well?
Validation Checklist:
- • Skill tested in real-world scenario (actual usage, not just reading)
- • Users can find information easily (navigation clear, sections logical)
- • Instructions are clear and actionable (can follow without confusion)
- • Examples help understanding (concrete, demonstrate key concepts)
- • Skill achieves stated purpose (delivers promised value)
- • Learning curve reasonable (appropriate for skill complexity)
- • Error messages helpful (if applicable: clear, actionable guidance)
- • Overall user satisfaction high (would use again, recommend to others)
Scoring Criteria:
- •5 - Excellent: All 8 checks pass, excellent usability, easy to learn, highly effective, very satisfying
- •4 - Good: 6-7 checks pass, good usability, minor friction points, generally effective
- •3 - Acceptable: 4-5 checks pass, acceptable usability, some confusion/difficulty, moderately effective
- •2 - Needs Work: 2-3 checks pass, usability issues, frustrating or confusing, limited effectiveness
- •1 - Poor: ≤1 check passes, poor usability, hard to use, ineffective, unsatisfying
Outputs:
- •Usability score (1-5)
- •Scenario test results (success/partial/failure)
- •User experience assessment (smooth/acceptable/frustrating)
- •Specific usability improvements identified
- •Learning curve assessment
- •Effectiveness rating
Time Estimate: 30-60 minutes (requires actual testing)
Example:
Usability Review: skill-researcher
==================================
Real-World Scenario Test: ✅
  - Scenario: Research GitHub API integration patterns
  - Result: SUCCESS - Found 5 relevant sources, synthesized findings
  - Experience: Smooth, operations clearly explained
  - Time: 45 minutes (expected 60 min range)
Navigation/Findability: 9/10 ✅
  - Information easy to find
  - 5 operations clearly separated
  - Quick Reference table very helpful
  - Minor: Could use table of contents for long doc
Instruction Clarity: 9/10 ✅
  - Steps clear and actionable
  - Process well-explained
  - Examples demonstrate concepts
  - Minor: Web search query formulation could be clearer
Effectiveness: 10/10 ✅
  - Achieved purpose: Found patterns and synthesized
  - Delivered value: Comprehensive research in 45 min
  - Would use again: Yes, very helpful
Learning Curve: 8/10 ✅
  - Time to understand: 10 minutes
  - Time to use effectively: 15 minutes
  - Reasonable for complexity
  - First-time user: Some concepts need explanation (credibility scoring)
Error Handling: N/A (no errors encountered)
User Satisfaction: 9/10 ✅
  - Would use again: Yes
  - Would recommend: Yes
  - Overall experience: Very positive
Usability Score: 5/5 (Excellent)
Minor Improvement: Add brief explanation of credibility scoring concept
Operation 5: Integration Review
Purpose: Assess dependency documentation, data flow clarity, component integration, and composition patterns
When to Use This Operation:
- •Review workflow skills (that compose other skills)
- •Validate dependency documentation
- •Check integration clarity
- •Assess composition patterns
- •Verify cross-references valid
Automation Level: 30% automated (dependency checking, cross-reference validation), 70% manual assessment
Process:
- •Review Dependency Documentation (manual)
  - •Are required skills documented?
  - •Are optional/complementary skills mentioned?
  - •Is YAML dependencies field used (if applicable)?
  - •Are dependency versions noted (if relevant)?
- •Assess Data Flow Clarity (manual, for workflow skills)
  - •Is data flow between skills explained?
  - •Are inputs/outputs documented for each step?
  - •Do users understand how data moves?
  - •Are there diagrams or flowcharts (if helpful)?
- •Evaluate Component Integration (manual)
  - •How do component skills work together?
  - •Are integration points clear?
  - •Are there integration examples?
  - •Is composition pattern documented?
- •Verify Cross-References (automated + manual)
  - •Do internal links work (references to references/, scripts/)?
  - •Are external skill references correct?
  - •Are complementary skills mentioned?
- •Check Composition Patterns (manual, for workflow skills)
  - •Is composition pattern identified (sequential/parallel/conditional/etc.)?
  - •Is pattern correctly implemented?
  - •Are orchestration details provided?
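The automated share of cross-reference verification can be approximated by scanning SKILL.md for relative paths into references/ and scripts/ and checking that each file exists. This is a sketch under assumptions: the regular expression is one plausible way to spot such paths, it deliberately skips paths that belong to another skill (such as development-workflow/references/common-patterns.md), and the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches references/<file> or scripts/<file>, optionally prefixed with "./",
# but not when the path is part of a longer path into another directory.
LOCAL_REF = re.compile(r"(?<![\w/.-])(?:\./)?((?:references|scripts)/[\w.-]+\.\w+)")


def find_local_references(text: str) -> list:
    """Return the unique local file paths mentioned in the text, sorted."""
    return sorted(set(LOCAL_REF.findall(text)))


def broken_references(skill_dir: Path) -> list:
    """Return referenced paths from SKILL.md that do not exist inside skill_dir."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return [
        ref for ref in find_local_references(text) if not (skill_dir / ref).is_file()
    ]


doc = (
    "See references/guide.md and run scripts/check.py; "
    "also development-workflow/references/common-patterns.md."
)
print(find_local_references(doc))
```

Whether an external skill name is spelled correctly still needs a manual look, since this sketch only inspects files inside the skill directory.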
Validation Checklist:
- • Dependencies documented (if skill requires other skills)
- • YAML dependencies field correct (if used)
- • Data flow explained (for workflow skills: inputs/outputs clear)
- • Integration points clear (how component skills connect)
- • Component skills referenced correctly (names accurate, paths valid)
- • Cross-references valid (internal links work, external references correct)
- • Integration examples provided (if applicable: how to use together)
- • Composition pattern documented (if workflow: sequential/parallel/etc.)
- • Complementary skills mentioned (optional but valuable related skills)
Scoring Criteria:
- •5 - Excellent: All 9 checks pass (applicable ones), perfect integration documentation
- •4 - Good: 7-8 checks pass, good integration, minor gaps in documentation
- •3 - Acceptable: 5-6 checks pass, some integration unclear, missing details
- •2 - Needs Work: 3-4 checks pass, integration issues, poorly documented dependencies/flow
- •1 - Poor: ≤2 checks pass, poor integration, confusing or missing dependency documentation
Outputs:
- •Integration score (1-5)
- •Dependency validation results (required/optional/complementary documented)
- •Data flow clarity assessment (for workflow skills)
- •Integration clarity rating
- •Cross-reference validation results
- •Improvement recommendations
Time Estimate: 15-25 minutes (mostly manual)
Example:
Integration Review: development-workflow
========================================
Dependency Documentation: 10/10 ✅
  - Required Skills: None (workflow is standalone)
  - Component Skills: 5 clearly documented (skill-researcher, planning-architect, task-development, prompt-builder, todo-management)
  - Optional Skills: 3 complementary skills mentioned (review-multi, skill-updater, testing-validator)
  - YAML Field: Not used (not required, skills referenced in content)
Data Flow Clarity: 10/10 ✅ (Workflow Skill)
  - Data flow diagram present (skill → output → next skill)
  - Inputs/outputs for each step documented
  - Users understand how artifacts flow
  - Example:
      skill-researcher → research-synthesis.md → planning-architect
                                                        ↓
                         skill-architecture-plan.md → task-development
Component Integration: 10/10 ✅
  - Integration method documented for each step (Guided Execution)
  - Integration examples provided
  - Clear explanation of how skills work together
  - Process for using each component skill detailed
Cross-Reference Validation: ✅
  - Internal links valid (references/ files exist and reachable)
  - External skill references correct (all 5 component skills exist)
  - Complementary skills mentioned appropriately
Composition Pattern: 10/10 ✅ (Workflow Skill)
  - Pattern: Sequential Pipeline (with one optional step)
  - Correctly implemented (Step 1 → 2 → [3 optional] → 4 → 5)
  - Orchestration details provided
  - Clear flow diagram
Integration Score: 5/5 (Excellent)
Notes: Exemplary integration documentation for workflow skill
Review Modes
Comprehensive Review Mode
Purpose: Complete multi-dimensional assessment across all 5 dimensions with aggregate scoring
When to Use:
- •Pre-production validation (ensure skill ready for deployment)
- •Major skill updates (validate changes don't degrade quality)
- •Quality certification (establish baseline quality score)
- •Periodic quality audits (track quality over time)
Process:
- •Run All 5 Operations Sequentially
  - •Operation 1: Structure Review (5-10 min, automated)
  - •Operation 2: Content Review (15-30 min, manual)
  - •Operation 3: Quality Review (20-40 min, mixed)
  - •Operation 4: Usability Review (30-60 min, manual)
  - •Operation 5: Integration Review (15-25 min, manual)
- •Aggregate Scores
  - •Record score (1-5) for each dimension
  - •Calculate weighted overall score using formula
  - •Map overall score to grade (A/B/C/D/F)
- •Assess Production Readiness
  - •≥4.5: Production Ready
  - •4.0-4.4: Ready with minor improvements
  - •3.5-3.9: Needs improvements before production
  - •<3.5: Not ready, significant rework required
- •Compile Improvement Recommendations
  - •Aggregate issues from all dimensions
  - •Prioritize: Critical → High → Medium → Low
  - •Provide specific, actionable fixes
- •Generate Comprehensive Report
  - •Executive summary (overall score, grade, readiness)
  - •Per-dimension scores and findings
  - •Prioritized improvement list
  - •Detailed rationale for scores
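The recommendation-compilation step above amounts to merging findings from all dimensions and sorting them by priority. A minimal sketch follows; the `Recommendation` type and function names are hypothetical, and the sample items are the two recommendations from the skill-researcher report.

```python
from typing import List, NamedTuple

# Sort order from the process step: Critical → High → Medium → Low.
PRIORITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


class Recommendation(NamedTuple):
    priority: str
    dimension: str
    action: str


def prioritize(items: List[Recommendation]) -> List[Recommendation]:
    """Sort by priority; sorted() is stable, so input order is kept within a level."""
    return sorted(items, key=lambda item: PRIORITY_ORDER[item.priority])


def format_recommendations(items: List[Recommendation]) -> List[str]:
    """Render a numbered, prioritized list in the style of the report."""
    return [
        f"{index}. [{rec.priority}] {rec.action}"
        for index, rec in enumerate(prioritize(items), start=1)
    ]


findings = [
    Recommendation("Low", "usability", "Consider adding table of contents for long SKILL.md"),
    Recommendation("Medium", "quality", "Add error handling examples for web search failures"),
]
for line in format_recommendations(findings):
    print(line)
```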
Output:
- •Overall score (1.0-5.0 with one decimal)
- •Grade (A/B/C/D/F)
- •Production readiness assessment
- •Per-dimension scores (Structure, Content, Quality, Usability, Integration)
- •Comprehensive improvement recommendations (prioritized)
- •Detailed review report
Time Estimate: 1.5-2.5 hours total
Example Output:
Comprehensive Review Report: skill-researcher
=============================================
OVERALL SCORE: 4.6/5.0 - GRADE A
STATUS: ✅ PRODUCTION READY
Dimension Scores:
  - Structure: 5/5 (Excellent) - Perfect file organization
  - Content: 5/5 (Excellent) - Comprehensive, clear documentation
  - Quality: 4/5 (Good) - High quality, minor error handling gaps
  - Usability: 5/5 (Excellent) - Easy to use, highly effective
  - Integration: 4/5 (Good) - Well-documented dependencies
Production Readiness: READY - High quality, deploy with confidence
Recommendations (Priority Order):
  1. [Medium] Add error handling examples for web search failures
  2. [Low] Consider adding table of contents for long SKILL.md
Strengths:
  - Excellent structure and organization
  - Comprehensive coverage of 5 research operations
  - Strong usability with clear instructions
  - Good examples throughout
Overall: Exemplary skill, production-ready quality
Fast Check Mode
Purpose: Quick automated validation for rapid quality feedback during development
When to Use:
- •During development (continuous validation)
- •Quick quality checks (before detailed review)
- •Pre-commit validation (catch issues early)
- •Rapid iteration (fast feedback loop)
Process:
- •Run Automated Structure Validation
  python3 scripts/validate-structure.py /path/to/skill
- •Check Critical Issues
  - •YAML frontmatter valid?
  - •Required files present?
  - •Naming conventions followed?
  - •File sizes appropriate?
- •Generate Pass/Fail Report
  - •PASS: Critical checks passed, proceed to development
  - •FAIL: Critical issues found, fix before continuing
- •Provide Quick Fixes (if available)
  - •Specific commands to fix issues
  - •Examples of correct format
  - •References to documentation
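To illustrate what the critical checks above involve, here is a standard-library sketch of a pass/fail check for two of them: SKILL.md present, and a frontmatter block containing `name` and `description`. It is not the validate-structure.py implementation, and the function names are hypothetical. The frontmatter handling is a plain-text approximation that assumes simple `key: value` lines; a real syntax check would parse the block with a YAML library such as PyYAML.

```python
from pathlib import Path


def extract_frontmatter(text: str):
    """Return the raw frontmatter block, or None if the '---' delimiters are missing."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "\n".join(lines[1:index])
    return None


def fast_check(skill_dir: Path) -> list:
    """Return a list of critical issues; an empty list means PASS."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]
    issues = []
    frontmatter = extract_frontmatter(skill_md.read_text(encoding="utf-8"))
    if frontmatter is None:
        issues.append("YAML frontmatter block not found")
    else:
        # Naive key detection: text before the first colon on each line.
        keys = {
            line.split(":", 1)[0].strip()
            for line in frontmatter.splitlines()
            if ":" in line
        }
        for field in ("name", "description"):
            if field not in keys:
                issues.append(f"frontmatter field '{field}' is missing")
    return issues


issues = fast_check(Path(".claude/skills/my-skill"))
print("PASS" if not issues else "FAIL: " + "; ".join(issues))
```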
Output:
- •Pass/Fail status
- •Critical issues list (if failed)
- •Quick fixes or guidance
- •Score estimate (if passed)
Time Estimate: 5-10 minutes
Example Output:
$ python3 scripts/validate-structure.py .claude/skills/my-skill
Fast Check Report
=================
Skill: my-skill
❌ FAIL - Critical Issues Found
Critical Issues:
  1. YAML frontmatter: Invalid syntax (line 3: unexpected character)
  2. Naming convention: File "MyGuide.md" should be "my-guide.md"
Quick Fixes:
  1. Fix YAML: Remove trailing comma on line 3
  2. Rename file: mv references/MyGuide.md references/my-guide.md
Run full validation after fixes:
  python3 scripts/validate-structure.py .claude/skills/my-skill
Custom Review Mode
Purpose: Flexible review focusing on specific dimensions or concerns
When to Use:
- •Targeted improvements (focus on specific dimension)
- •Time constraints (can't do comprehensive review)
- •Specific concerns (e.g., only check usability)
- •Iterative improvements (focus on one dimension at a time)
Options:
- •Select Dimensions: Choose 1-5 operations to run
- •Adjust Thoroughness: Quick/Standard/Thorough per dimension
- •Focus Areas: Specify particular concerns (e.g., "check examples quality")
Process:
- •Define Custom Review Scope
  - •Which dimensions to review?
  - •How thorough for each?
  - •Any specific focus areas?
- •Run Selected Operations
  - •Execute chosen operations
  - •Apply thoroughness level
- •Generate Targeted Report
  - •Scores for selected dimensions only
  - •Focused findings
  - •Specific recommendations
Example Scenarios:
Scenario 1: Content-Focused Review
Custom Review: Content + Examples
- •Operations: Content Review only
- •Thoroughness: Thorough
- •Focus: Example quality and completeness
- •Time: 30 minutes
Scenario 2: Quick Quality Check
Custom Review: Structure + Quality (Fast)
- •Operations: Structure + Quality
- •Thoroughness: Quick
- •Focus: Pattern compliance, anti-patterns
- •Time: 15-20 minutes
Scenario 3: Workflow Integration Review
Custom Review: Integration Deep Dive
- •Operations: Integration Review only
- •Thoroughness: Thorough
- •Focus: Data flow, composition patterns
- •Time: 30 minutes
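A Custom Review scope (dimensions, thoroughness per dimension, focus areas) can be captured as a small validated record before any operation runs. This is a sketch only: `CustomReviewScope` and its field names are hypothetical and do not correspond to an interface in scripts/.

```python
from dataclasses import dataclass, field

DIMENSIONS = ("structure", "content", "quality", "usability", "integration")
THOROUGHNESS = ("quick", "standard", "thorough")


@dataclass
class CustomReviewScope:
    """Hypothetical container for a Custom Review definition."""

    # Maps each selected dimension to its thoroughness level.
    levels: dict[str, str]
    focus_areas: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= len(self.levels) <= len(DIMENSIONS):
            raise ValueError("select between 1 and 5 dimensions")
        for dimension, level in self.levels.items():
            if dimension not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dimension}")
            if level not in THOROUGHNESS:
                raise ValueError(f"unknown thoroughness: {level}")


# Scenario 2: Structure + Quality, both at Quick thoroughness.
scope = CustomReviewScope(
    levels={"structure": "quick", "quality": "quick"},
    focus_areas=["pattern compliance", "anti-patterns"],
)
print(sorted(scope.levels))
```

Validating the scope up front means a typo in a dimension name fails immediately instead of silently skipping a review.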
Best Practices
1. Self-Review First
Practice: Run Fast Check mode before requesting comprehensive review
Rationale: Automated checks catch 70% of structural issues in 5-10 minutes, allowing manual review to focus on higher-value assessment
Application: Always run validate-structure.py before detailed review
2. Use Checklists Systematically
Practice: Follow validation checklists item-by-item for each operation
Rationale: Research shows teams using checklists reduce common issues by 30% and ensure consistent results
Application: Print or display checklist, mark each item explicitly
3. Test in Real Scenarios
Practice: Conduct usability review with actual usage, not just documentation reading
Rationale: Real-world testing reveals hidden usability issues that documentation review misses
Application: For Usability Review, actually use the skill to complete a realistic task
4. Focus on Automation
Practice: Let scripts handle routine checks, focus manual effort on judgment-requiring assessment
Rationale: Automation reduces manual review time for routine checks by 70%
Application: Use scripts for Structure and partial Quality checks, manual for Content/Usability
5. Provide Actionable Feedback
Practice: Make improvement recommendations specific, prioritized, and actionable
Rationale: Vague feedback ("improve quality") is less valuable than specific guidance ("add error handling examples to Step 3")
Application: For each issue, specify: What, Why, How (to fix), Priority
6. Review Regularly
Practice: Conduct reviews throughout development lifecycle, not just at end
Rationale: Early reviews catch issues before they compound; rapid feedback maintains momentum (37% productivity increase)
Application: Fast Check during development, Comprehensive Review before production
7. Track Improvements
Practice: Document before/after scores to measure improvement over time
Rationale: Tracking demonstrates progress, identifies patterns, validates improvements
Application: Save review reports, compare scores across iterations
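Comparing saved reports comes down to a per-dimension difference between two score sets. A minimal sketch follows, assuming scores are stored as a mapping from dimension name to a 1-5 score. `score_deltas` is a hypothetical helper, and the sample scores are made up purely for illustration.

```python
def score_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-dimension change between two reviews (positive means improvement).

    Only dimensions present in both reviews are compared.
    """
    return {
        dimension: round(after[dimension] - before[dimension], 2)
        for dimension in before
        if dimension in after
    }


# Illustrative scores only, not results from a real review.
first_review = {"structure": 3, "content": 4, "usability": 2}
second_review = {"structure": 5, "content": 4, "usability": 3}

for dimension, delta in score_deltas(first_review, second_review).items():
    print(f"{dimension}: {delta:+}")
```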
8. Iterate Based on Findings
Practice: Use review findings to improve future skills, not just current skill
Rationale: Learnings compound; patterns identified in reviews improve entire skill ecosystem
Application: Document common issues, create guidelines, update templates
Common Mistakes
Mistake 1: Skipping Structure Review
Symptom: Spending time on detailed review only to discover fundamental structural issues
Cause: Assumption that structure is correct, eagerness to assess content
Fix: Always run Structure Review (Fast Check) first - takes 5-10 minutes, catches 70% of issues
Prevention: Make Fast Check mandatory first step in any review process
Mistake 2: Subjective Scoring
Symptom: Inconsistent scores, debate over ratings, difficulty justifying scores
Cause: Using personal opinion instead of rubric criteria
Fix: Use references/scoring-rubric.md - score based on specific criteria, not feeling
Prevention: Print rubric, refer to criteria for each score, document evidence
Mistake 3: Ignoring Usability
Symptom: Skill looks good on paper but difficult to use in practice
Cause: Skipping Usability Review (90% manual, time-consuming)
Fix: Actually test skill in real scenario - reveals hidden issues
Prevention: Allocate 30-60 minutes for usability testing; do not skip it before a production release
Mistake 4: No Prioritization
Symptom: Long list of improvements, unclear what to fix first, overwhelmed
Cause: Treating all issues equally without assessing impact
Fix: Prioritize issues: Critical (must fix) → High → Medium → Low (nice to have)
Prevention: Tag each issue with priority level during review
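Tagging issues with a priority makes the ordering mechanical. The sketch below combines the What/Why/How/Priority fields from Best Practice 5 with the Critical → High → Medium → Low order. `Issue` and `prioritize` are hypothetical names, and the sample issues are illustrative.

```python
from dataclasses import dataclass

# Lower rank sorts first: Critical -> High -> Medium -> Low.
PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Issue:
    what: str      # what is wrong
    why: str       # why it matters
    how: str       # how to fix it
    priority: str  # one of the PRIORITY_RANK keys


def prioritize(issues: list[Issue]) -> list[Issue]:
    """Return issues ordered by priority; ties keep their original order."""
    return sorted(issues, key=lambda issue: PRIORITY_RANK[issue.priority])


# Illustrative findings only.
findings = [
    Issue("Sparse examples", "Hard to learn", "Add error handling examples to Step 3", "medium"),
    Issue("Invalid YAML frontmatter", "Skill fails validation", "Remove trailing comma on line 3", "critical"),
    Issue("Inconsistent heading case", "Cosmetic", "Normalize headings", "low"),
]

for issue in prioritize(findings):
    print(f"[{issue.priority}] {issue.what}: {issue.how}")
```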
Mistake 5: Batch Reviews
Symptom: Discovering major issues late in development, costly rework
Cause: Waiting until end to review, accumulating issues
Fix: Review early and often - run Fast Check during development and re-review after each iteration
Prevention: Continuous validation, rapid feedback, catch issues when small
Mistake 6: Ignoring Patterns
Symptom: Repeating same issues across multiple skills
Cause: Treating each review in isolation, not learning from patterns
Fix: Track common issues, create guidelines, update development process
Prevention: Document patterns, share learnings, improve templates
Quick Reference
The 5 Operations
| Operation | Focus | Automation | Time | Key Output |
|---|---|---|---|---|
| Structure | YAML, files, naming, organization | 95% | 5-10m | Structure score, compliance report |
| Content | Completeness, clarity, examples | 40% | 15-30m | Content score, section assessment |
| Quality | Patterns, best practices, anti-patterns | 50% | 20-40m | Quality score, pattern compliance |
| Usability | Ease of use, effectiveness | 10% | 30-60m | Usability score, scenario test results |
| Integration | Dependencies, data flow, composition | 30% | 15-25m | Integration score, dependency validation |
Scoring Scale
| Score | Level | Meaning | Action |
|---|---|---|---|
| 5 | Excellent | Exceeds standards | Exemplary - use as example |
| 4 | Good | Meets standards | Production ready - standard quality |
| 3 | Acceptable | Minor improvements | Usable - note improvements |
| 2 | Needs Work | Notable issues | Not ready - significant improvements |
| 1 | Poor | Significant problems | Not viable - extensive rework |
Production Readiness
| Overall Score | Grade | Status | Decision |
|---|---|---|---|
| 4.5-5.0 | A | ✅ Production Ready | Ship it - high quality |
| 4.0-4.4 | B+ | ✅ Ready (minor improvements) | Ship - note improvements for next iteration |
| 3.5-3.9 | B- | ⚠️ Needs Improvements | Hold - fix issues first |
| 2.5-3.4 | C | ❌ Not Ready | Don't ship - substantial work needed |
| 1.5-2.4 | D | ❌ Not Ready | Don't ship - significant rework |
| 1.0-1.4 | F | ❌ Not Ready | Don't ship - major issues |
Review Modes
| Mode | Time | Use Case | Coverage |
|---|---|---|---|
| Fast Check | 5-10m | During development, quick validation | Structure only (automated) |
| Custom | Variable | Targeted review, specific concerns | Selected dimensions |
| Comprehensive | 1.5-2.5h | Pre-production, full assessment | All 5 dimensions + report |
Common Commands
```bash
# Fast structure validation
python3 scripts/validate-structure.py /path/to/skill

# Verbose output
python3 scripts/validate-structure.py /path/to/skill --verbose

# JSON output
python3 scripts/validate-structure.py /path/to/skill --json

# Pattern compliance check
python3 scripts/check-patterns.py /path/to/skill

# Generate review report
python3 scripts/generate-review-report.py review_data.json --output report.md

# Run comprehensive review
python3 scripts/review-runner.py /path/to/skill --mode comprehensive
```
Weighted Average Formula
Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) +
(Usability × 0.15) + (Integration × 0.15)
Weight Rationale:
- •Content & Quality (25% each): Core value
- •Structure (20%): Foundation
- •Usability & Integration (15% each): Supporting
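As a worked example, the formula and the Production Readiness table can be combined into two small functions. This is a sketch under stated assumptions: `overall_score` and `readiness` are hypothetical names, the result is rounded to two decimals, and a score that falls between two listed ranges (for example 4.45) is assigned to the lower band.

```python
WEIGHTS = {
    "structure": 0.20,
    "content": 0.25,
    "quality": 0.25,
    "usability": 0.15,
    "integration": 0.15,
}

# (minimum overall score, grade, status) from the Production Readiness table.
# Assumption: each band starts at its lower bound, so 4.45 maps to B+.
BANDS = [
    (4.5, "A", "Production Ready"),
    (4.0, "B+", "Ready (minor improvements)"),
    (3.5, "B-", "Needs Improvements"),
    (2.5, "C", "Not Ready"),
    (1.5, "D", "Not Ready"),
    (1.0, "F", "Not Ready"),
]


def overall_score(scores: dict[str, float]) -> float:
    """Weighted average of the five dimension scores, rounded to 2 decimals."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return round(sum(scores[name] * weight for name, weight in WEIGHTS.items()), 2)


def readiness(overall: float) -> tuple[str, str]:
    """Map an overall score to (grade, status)."""
    for minimum, grade, status in BANDS:
        if overall >= minimum:
            return grade, status
    raise ValueError("overall score must be at least 1.0")


# Illustrative scores: 1.00 + 1.00 + 1.00 + 0.45 + 0.60 = 4.05
scores = {"structure": 5, "content": 4, "quality": 4, "usability": 3, "integration": 4}
total = overall_score(scores)
print(total, readiness(total))
```

Here a weak Usability score (3) still yields a B+ overall because Usability carries only 15% of the weight, which is worth keeping in mind when reading the overall grade alone.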
For More Information
- •Structure details: references/structure-review-guide.md
- •Content details: references/content-review-guide.md
- •Quality details: references/quality-review-guide.md
- •Usability details: references/usability-review-guide.md
- •Integration details: references/integration-review-guide.md
- •Complete scoring rubrics: references/scoring-rubric.md
- •Report templates: references/review-report-template.md
For detailed guidance on each dimension, see reference files. For automation tools, see scripts/.