Review-Multi

Overview

review-multi provides a systematic framework for conducting comprehensive, multi-dimensional reviews of Claude Code skills. It evaluates skills across 5 independent dimensions, combining automated validation with manual assessment to deliver objective quality scores and actionable improvement recommendations.

Purpose: Systematic skill quality assurance through multi-dimensional assessment

The 5 Review Dimensions:

•Structure Review - YAML frontmatter, file organization, naming conventions, progressive disclosure
•Content Review - Section completeness, clarity, examples, documentation quality
•Quality Review - Pattern compliance, best practices, anti-pattern detection, code quality
•Usability Review - Ease of use, learnability, real-world effectiveness, user satisfaction
•Integration Review - Dependency documentation, data flow, component integration, composition

Automation Levels:

•Structure: 95% automated (validate-structure.py)
•Content: 40% automated, 60% manual assessment
•Quality: 50% automated, 50% manual assessment
•Usability: 10% automated, 90% manual testing
•Integration: 30% automated, 70% manual review

Scoring System:

•Scale: 1-5 per dimension (Excellent/Good/Acceptable/Needs Work/Poor)
•Overall Score: Weighted average across dimensions
•Grade: A/B/C/D/F mapping
•Production Readiness: ≥4.5 ready, 4.0-4.4 ready with improvements, 3.5-3.9 needs work, <3.5 not ready

Value Proposition:

•Objective: Evidence-based scoring using detailed rubrics (not subjective opinion)
•Comprehensive: 5 dimensions cover all quality aspects
•Efficient: Automation handles 30-95% of checks depending on dimension
•Actionable: Specific, prioritized improvement recommendations
•Consistent: Standardized checklists ensure repeatable results
•Flexible: 3 review modes (Comprehensive, Fast Check, Custom)

Key Benefits:

•Catch 70% of issues with fast automated checks
•Reduce common quality issues by 30% using checklists
•Ensure production readiness before deployment
•Identify improvement opportunities systematically
•Track quality improvements over time
•Establish quality standards across skill ecosystem

When to Use

Use review-multi when:

•
Pre-Production Validation - Review new skills before deploying to production to catch issues early and ensure quality standards
•
Quality Assurance - Conduct systematic QA on skills to validate they meet ecosystem standards and user needs
•
Identifying Improvements - Discover specific, actionable improvements for existing skills through multi-dimensional assessment
•
Continuous Improvement - Regular reviews throughout development lifecycle, not just at end, to maintain quality
•
Production Readiness Assessment - Determine if skill is ready for production use with objective scoring and grade mapping
•
Skill Ecosystem Standards - Ensure consistency and quality across multiple skills using standardized review framework
•
Post-Update Validation - Review skills after major updates to ensure changes don't introduce issues or degrade quality
•
Learning and Improvement - Use review findings to learn patterns, improve future skills, and refine development practices
•
Team Calibration - Standardize quality assessment across multiple reviewers with objective rubrics

Don't Use When:

•Quick syntax checks (use validate-structure.py directly)
•In-progress drafts (wait until reasonably complete)
•Experimental prototypes (not production-bound)

Prerequisites

Required:

•Skill to review (in .claude/skills/[skill-name]/ format)
•
Time allocation based on review mode:
- •Fast Check: 5-10 minutes
- •Single Operation: 15-60 minutes (varies by dimension)
- •Comprehensive Review: 1.5-2.5 hours

Optional:

•Python 3.7+ (for automation scripts in Structure and Quality reviews)
•PyYAML library (for YAML frontmatter validation)
•Access to skill-under-review documentation
•Familiarity with Claude Code skill patterns (see development-workflow/references/common-patterns.md)

Skills (no required dependencies, complementary):

•development-workflow: Use review-multi after skill development
•skill-updater: Apply review-multi recommendations
•testing-validator: Combine with review-multi for full QA

Scoring System

The review-multi scoring system provides objective, consistent quality assessment across all skill dimensions.

Per-Dimension Scoring (1-5 Scale)

Each dimension is scored independently using a 1-5 integer scale:

5 - Excellent (Exceeds Standards)

•All criteria met perfectly
•Goes beyond minimum requirements
•Exemplary quality that sets the bar
•No issues or concerns identified
•Can serve as example for others

4 - Good (Meets Standards)

•Meets all critical criteria
•1-2 minor, non-critical issues
•Production-ready quality
•Standard expected level
•Small improvements possible

3 - Acceptable (Minor Improvements Needed)

•Meets most criteria
•3-4 issues, some may be critical
•Usable but not optimal
•Several improvements recommended
•Can proceed with noted concerns

2 - Needs Work (Notable Issues)

•Missing several criteria
•5-6 issues, multiple critical
•Not production-ready
•Significant improvements required
•Rework needed before deployment

1 - Poor (Significant Problems)

•Fails most criteria
•7+ issues, fundamentally flawed
•Major quality concerns
•Extensive rework required
•Not viable in current state

Overall Score Calculation

The overall score is a weighted average of the 5 dimension scores:

code

Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) +
          (Usability × 0.15) + (Integration × 0.15)

Weight Rationale:

•Content & Quality (25% each): Core skill value - what it does and how well
•Structure (20%): Important foundation - organization and compliance
•Usability & Integration (15% each): Supporting factors - user experience and composition

Example Calculations:

•Scores (5, 4, 4, 3, 4) → Overall = (5×0.20 + 4×0.25 + 4×0.25 + 3×0.15 + 4×0.15) = 4.15 → Grade B
•Scores (4, 5, 5, 4, 4) → Overall = (4×0.20 + 5×0.25 + 5×0.25 + 4×0.15 + 4×0.15) = 4.55 → Grade A
•Scores (3, 3, 2, 3, 3) → Overall = (3×0.20 + 3×0.25 + 2×0.25 + 3×0.15 + 3×0.15) = 2.85 → Grade C

Grade Mapping

Overall scores map to letter grades:

•A (4.5-5.0): Excellent - Production ready, high quality
•B (3.5-4.4): Good - Ready with minor improvements
•C (2.5-3.4): Acceptable - Needs improvements before production
•D (1.5-2.4): Poor - Requires significant rework
•F (1.0-1.4): Failing - Major issues, not viable

Production Readiness Assessment

Based on overall score:

•≥4.5 (Grade A): ✅ Production Ready - High quality, deploy with confidence
•4.0-4.4 (Grade B+): ✅ Ready with Minor Improvements - Can deploy, address improvements in next iteration
•3.5-3.9 (Grade B-): ⚠️ Needs Improvements - Address issues before production deployment
•<3.5 (Grade C-F): ❌ Not Ready - Significant rework required before deployment

Decision Framework:

•A Grade: Ship it - exemplary quality
•B Grade (4.0+): Ship it - standard quality, note improvements for future
•B- Grade (3.5-3.9): Hold - fix identified issues first
•C-F Grade: Don't ship - substantial work needed

Operations

Operation 1: Structure Review

Purpose: Validate file organization, naming conventions, YAML frontmatter compliance, and progressive disclosure

When to Use This Operation:

•Always run first (fast automated check catches 70% of issues)
•Before comprehensive review (quick validation of basics)
•During development (continuous structure validation)
•Quick quality checks (5-10 minute validation)

Automation Level: 95% automated via scripts/validate-structure.py

Process:

•
Run Structure Validation Script
bash
```
python3 scripts/validate-structure.py /path/to/skill [--json] [--verbose]
```
Script checks YAML, file structure, naming, progressive disclosure
•
Review YAML Frontmatter
- •Verify name field in kebab-case format
- •Check description has 5+ trigger keywords naturally embedded
- •Validate YAML syntax is correct
•
Verify File Structure
- •Confirm SKILL.md exists
- •Check references/ and scripts/ organization (if present)
- •Verify README.md exists
•
Check Naming Conventions
- •SKILL.md and README.md uppercase
- •references/ files: lowercase-hyphen-case
- •scripts/ files: lowercase-hyphen-case with extension
•
Validate Progressive Disclosure
- •SKILL.md <1,500 lines (warn if >1,200)
- •references/ files 300-800 lines each
- •No monolithic files

Validation Checklist:

Scoring Criteria:

•5 - Excellent: All 10 checks pass, perfect compliance, exemplary structure
•4 - Good: 8-9 checks pass, 1-2 minor non-critical issues (e.g., README missing but optional)
•3 - Acceptable: 6-7 checks pass, 3-4 issues including some critical (e.g., YAML invalid but fixable)
•2 - Needs Work: 4-5 checks pass, 5-6 issues with multiple critical (e.g., no SKILL.md, bad naming)
•1 - Poor: ≤3 checks pass, 7+ issues, fundamentally flawed structure

Outputs:

•Structure score (1-5)
•Pass/fail status for each checklist item
•List of issues found with severity (critical/warning/info)
•Specific improvement recommendations with fix guidance
•JSON report (if using script with --json flag)

Time Estimate: 5-10 minutes (mostly automated)

Example:

bash

$ python3 scripts/validate-structure.py .claude/skills/todo-management

Structure Validation Report
===========================
Skill: todo-management
Date: 2025-11-06

✅ YAML Frontmatter: PASS
   - Name format: valid (kebab-case)
   - Trigger keywords: 8 found (target: 5+)

✅ File Structure: PASS
   - SKILL.md: exists
   - README.md: exists
   - references/: 3 files found
   - scripts/: 1 file found

✅ Naming Conventions: PASS
   - All files follow conventions

⚠️  Progressive Disclosure: WARNING
   - SKILL.md: 569 lines (good)
   - state-management-guide.md: 501 lines (good)
   - BUT: No Quick Reference section detected

Overall Structure Score: 4/5 (Good)
Issues: 1 warning (missing Quick Reference)
Recommendation: Add Quick Reference section to SKILL.md

Operation 2: Content Review

Purpose: Assess section completeness, content clarity, example quality, and documentation comprehensiveness

When to Use This Operation:

•Evaluate documentation quality
•Assess completeness of skill content
•Review example quality and quantity
•Validate information architecture
•Check clarity and organization

Automation Level: 40% automated (section detection, example counting), 60% manual assessment

Process:

•
Check Section Completeness (automated + manual)
- •Verify 5 core sections present: Overview, When to Use, Main Content (workflow/operations), Best Practices, Quick Reference
- •Check optional sections: Prerequisites, Common Mistakes, Troubleshooting
- •Assess if all necessary sections included
•
Assess Content Clarity (manual)
- •Is content understandable?
- •Is organization logical?
- •Are explanations clear without being verbose?
- •Is technical level appropriate for audience?
•
Evaluate Example Quality (automated count + manual quality)
- •Count code/command examples (target: 5+)
- •Check if examples are concrete (not abstract placeholders)
- •Verify examples are executable/copy-pasteable
- •Assess if examples help understanding
•
Review Documentation Completeness (manual)
- •Is all necessary information present?
- •Are there unexplained gaps?
- •Is sufficient detail provided?
- •Are edge cases covered?
•
Check Explanation Depth (manual)
- •Not too brief (insufficient detail)?
- •Not too verbose (unnecessary length)?
- •Balanced depth for complexity?

Validation Checklist:

Scoring Criteria:

•5 - Excellent: All 10 checks pass, exceptional clarity, great examples, comprehensive documentation
•4 - Good: 8-9 checks pass, good content with minor gaps or clarity issues
•3 - Acceptable: 6-7 checks pass, some sections weak or missing, acceptable clarity
•2 - Needs Work: 4-5 checks pass, multiple sections incomplete/unclear, poor examples
•1 - Poor: ≤3 checks pass, major gaps, confusing content, few/no examples

Outputs:

•Content score (1-5)
•Section-by-section assessment (present/missing/weak)
•Example quality rating and count
•Specific content improvement recommendations
•Clarity issues identified with examples

Time Estimate: 15-30 minutes (requires manual review)

Example:

code

Content Review: prompt-builder
==============================

Section Completeness: 9/10 ✅
✅ Overview: Present, clear explanation of purpose
✅ When to Use: 7 scenarios listed
✅ Main Content: 5-step workflow, well-organized
✅ Best Practices: 6 practices documented
✅ Quick Reference: Present
⚠️  Common Mistakes: Not present (optional but valuable)

Example Quality: 8/10 ✅
- Count: 12 examples (exceeds target of 5+)
- Concrete: Yes, all examples executable
- Helpful: Yes, demonstrate key concepts
- Minor: Could use 1-2 edge case examples

Content Clarity: 9/10 ✅
- Well-organized logical flow
- Clear explanations without verbosity
- Technical level appropriate
- Minor: Step 3 could be clearer (add diagram)

Documentation Completeness: 8/10 ✅
- All workflow steps documented
- Validation criteria clear
- Minor gaps: Error handling not covered

Content Score: 4/5 (Good)
Primary Recommendation: Add Common Mistakes section
Secondary: Add error handling guidance to Step 3

Operation 3: Quality Review

Purpose: Evaluate pattern compliance, best practices adherence, anti-pattern detection, and code/script quality

When to Use This Operation:

•Validate standards compliance
•Check pattern implementation
•Detect anti-patterns
•Assess code quality (if scripts present)
•Ensure best practices followed

Automation Level: 50% automated (pattern detection, anti-pattern checking), 50% manual assessment

Process:

•
Detect Architecture Pattern (automated + manual)
- •Identify pattern type: workflow/task/reference/capabilities
- •Verify pattern correctly implemented
- •Check pattern consistency throughout skill
•
Validate Documentation Patterns (automated + manual)
- •Verify 5 core sections present
- •Check consistent structure across steps/operations
- •Validate section formatting
•
Check Best Practices (manual)
- •Validation checklists present and specific?
- •Examples throughout documentation?
- •Quick Reference available?
- •Error cases considered?
•
Detect Anti-Patterns (automated + manual)
- •Keyword stuffing (trigger keywords unnatural)?
- •Monolithic SKILL.md (>1,500 lines, no progressive disclosure)?
- •Inconsistent structure (each section different format)?
- •Vague validation ("everything works")?
- •Missing examples (too abstract)?
- •Placeholders in production ("YOUR_VALUE_HERE")?
- •Ignoring error cases (only happy path)?
- •Over-engineering simple skills?
- •Unclear dependencies?
- •No Quick Reference?
•
Assess Code Quality (manual, if scripts present)
- •Scripts well-documented (docstrings)?
- •Error handling present?
- •CLI interfaces clear?
- •Code style consistent?

Validation Checklist:

Scoring Criteria:

•5 - Excellent: All 10 checks pass, exemplary quality, no anti-patterns, exceeds standards
•4 - Good: 8-9 checks pass, high quality, meets all standards, minor deviations
•3 - Acceptable: 6-7 checks pass, acceptable quality, some standard violations, 2-3 anti-patterns
•2 - Needs Work: 4-5 checks pass, quality issues, multiple standard violations, 4-5 anti-patterns
•1 - Poor: ≤3 checks pass, poor quality, significant problems, 6+ anti-patterns detected

Outputs:

•Quality score (1-5)
•Pattern compliance assessment (pattern detected, compliance level)
•Anti-patterns detected (list with severity)
•Best practices gaps identified
•Code quality assessment (if scripts present)
•Prioritized improvement recommendations

Time Estimate: 20-40 minutes (mixed automated + manual)

Example:

code

Quality Review: workflow-skill-creator
======================================

Pattern Compliance: ✅
- Pattern Detected: Workflow-based
- Implementation: Correct (5 sequential steps with dependencies)
- Consistency: High (all steps follow same structure)

Documentation Patterns: ✅
- 5 Core Sections: All present
- Structure: Consistent across all 5 steps
- Formatting: Proper heading levels

Best Practices Adherence: 8/10 ✅
✅ Validation checklists: Present and specific
✅ Examples throughout: 6 examples included
✅ Quick Reference: Present
⚠️ Error handling: Limited (only happy path in examples)

Anti-Pattern Detection: 1 detected ⚠️
✅ No keyword stuffing (15 natural keywords)
✅ No monolithic file (1,465 lines but has references/)
✅ Consistent structure
✅ Specific validation criteria
✅ Examples complete (no placeholders)
⚠️ Error cases: Only happy path documented
✅ Dependencies: Clearly documented
✅ Not over-engineered

Code Quality: N/A (no scripts)

Quality Score: 4/5 (Good)
Primary Issue: Limited error handling documentation
Recommendation: Add error case examples and recovery guidance

Operation 4: Usability Review

Purpose: Evaluate ease of use, learnability, real-world effectiveness, and user satisfaction through scenario testing

When to Use This Operation:

•Test real-world usage
•Assess user experience
•Evaluate learnability
•Measure effectiveness
•Validate skill achieves stated purpose

Automation Level: 10% automated (basic checks), 90% manual testing

Process:

•
Test in Real-World Scenario
- •Select appropriate use case from "When to Use" section
- •Actually use the skill to complete task
- •Document experience: smooth or friction?
- •Note any confusion or difficulty
•
Assess Navigation/Findability
- •Can you find needed information easily?
- •Is information architecture logical?
- •Are sections well-organized?
- •Is Quick Reference helpful?
•
Evaluate Clarity
- •Are instructions clear and actionable?
- •Are steps easy to follow?
- •Do examples help understanding?
- •Is technical terminology explained?
•
Measure Effectiveness
- •Does skill achieve stated purpose?
- •Does it deliver promised value?
- •Are outputs useful and complete?
- •Would you use it again?
•
Assess Learning Curve
- •How long to understand skill?
- •How long to use effectively?
- •Is learning curve reasonable for complexity?
- •Are first-time users supported well?

Validation Checklist:

• Skill tested in real-world scenario (actual usage, not just reading)
• Users can find information easily (navigation clear, sections logical)
• Instructions are clear and actionable (can follow without confusion)
• Examples help understanding (concrete, demonstrate key concepts)
• Skill achieves stated purpose (delivers promised value)
• Learning curve reasonable (appropriate for skill complexity)
• Error messages helpful (if applicable: clear, actionable guidance)
• Overall user satisfaction high (would use again, recommend to others)

Scoring Criteria:

•5 - Excellent: All 8 checks pass, excellent usability, easy to learn, highly effective, very satisfying
•4 - Good: 6-7 checks pass, good usability, minor friction points, generally effective
•3 - Acceptable: 4-5 checks pass, acceptable usability, some confusion/difficulty, moderately effective
•2 - Needs Work: 2-3 checks pass, usability issues, frustrating or confusing, limited effectiveness
•1 - Poor: ≤1 check passes, poor usability, hard to use, ineffective, unsatisfying

Outputs:

•Usability score (1-5)
•Scenario test results (success/partial/failure)
•User experience assessment (smooth/acceptable/frustrating)
•Specific usability improvements identified
•Learning curve assessment
•Effectiveness rating

Time Estimate: 30-60 minutes (requires actual testing)

Example:

code

Usability Review: skill-researcher
==================================

Real-World Scenario Test: ✅
- Scenario: Research GitHub API integration patterns
- Result: SUCCESS - Found 5 relevant sources, synthesized findings
- Experience: Smooth, operations clearly explained
- Time: 45 minutes (expected 60 min range)

Navigation/Findability: 9/10 ✅
- Information easy to find
- 5 operations clearly separated
- Quick Reference table very helpful
- Minor: Could use table of contents for long doc

Instruction Clarity: 9/10 ✅
- Steps clear and actionable
- Process well-explained
- Examples demonstrate concepts
- Minor: Web search query formulation could be clearer

Effectiveness: 10/10 ✅
- Achieved purpose: Found patterns and synthesized
- Delivered value: Comprehensive research in 45 min
- Would use again: Yes, very helpful

Learning Curve: 8/10 ✅
- Time to understand: 10 minutes
- Time to use effectively: 15 minutes
- Reasonable for complexity
- First-time user: Some concepts need explanation (credibility scoring)

Error Handling: N/A (no errors encountered)

User Satisfaction: 9/10 ✅
- Would use again: Yes
- Would recommend: Yes
- Overall experience: Very positive

Usability Score: 5/5 (Excellent)
Minor Improvement: Add brief explanation of credibility scoring concept

Operation 5: Integration Review

Purpose: Assess dependency documentation, data flow clarity, component integration, and composition patterns

When to Use This Operation:

•Review workflow skills (that compose other skills)
•Validate dependency documentation
•Check integration clarity
•Assess composition patterns
•Verify cross-references valid

Automation Level: 30% automated (dependency checking, cross-reference validation), 70% manual assessment

Process:

•
Review Dependency Documentation (manual)
- •Are required skills documented?
- •Are optional/complementary skills mentioned?
- •Is YAML dependencies field used (if applicable)?
- •Are dependency versions noted (if relevant)?
•
Assess Data Flow Clarity (manual, for workflow skills)
- •Is data flow between skills explained?
- •Are inputs/outputs documented for each step?
- •Do users understand how data moves?
- •Are there diagrams or flowcharts (if helpful)?
•
Evaluate Component Integration (manual)
- •How do component skills work together?
- •Are integration points clear?
- •Are there integration examples?
- •Is composition pattern documented?
•
Verify Cross-References (automated + manual)
- •Do internal links work (references to references/, scripts/)?
- •Are external skill references correct?
- •Are complementary skills mentioned?
•
Check Composition Patterns (manual, for workflow skills)
- •Is composition pattern identified (sequential/parallel/conditional/etc.)?
- •Is pattern correctly implemented?
- •Are orchestration details provided?

Validation Checklist:

• Dependencies documented (if skill requires other skills)
• YAML dependencies field correct (if used)
• Data flow explained (for workflow skills: inputs/outputs clear)
• Integration points clear (how component skills connect)
• Component skills referenced correctly (names accurate, paths valid)
• Cross-references valid (internal links work, external references correct)
• Integration examples provided (if applicable: how to use together)
• Composition pattern documented (if workflow: sequential/parallel/etc.)
• Complementary skills mentioned (optional but valuable related skills)

Scoring Criteria:

•5 - Excellent: All 9 checks pass (applicable ones), perfect integration documentation
•4 - Good: 7-8 checks pass, good integration, minor gaps in documentation
•3 - Acceptable: 5-6 checks pass, some integration unclear, missing details
•2 - Needs Work: 3-4 checks pass, integration issues, poorly documented dependencies/flow
•1 - Poor: ≤2 checks pass, poor integration, confusing or missing dependency documentation

Outputs:

•Integration score (1-5)
•Dependency validation results (required/optional/complementary documented)
•Data flow clarity assessment (for workflow skills)
•Integration clarity rating
•Cross-reference validation results
•Improvement recommendations

Time Estimate: 15-25 minutes (mostly manual)

Example:

code

Integration Review: development-workflow
========================================

Dependency Documentation: 10/10 ✅
- Required Skills: None (workflow is standalone)
- Component Skills: 5 clearly documented (skill-researcher, planning-architect, task-development, prompt-builder, todo-management)
- Optional Skills: 3 complementary skills mentioned (review-multi, skill-updater, testing-validator)
- YAML Field: Not used (not required, skills referenced in content)

Data Flow Clarity: 10/10 ✅ (Workflow Skill)
- Data flow diagram present (skill → output → next skill)
- Inputs/outputs for each step documented
- Users understand how artifacts flow
- Example:

skill-researcher → research-synthesis.md → planning-architect ↓ skill-architecture-plan.md → task-development

code


Component Integration: 10/10 ✅
- Integration method documented for each step (Guided Execution)
- Integration examples provided
- Clear explanation of how skills work together
- Process for using each component skill detailed

Cross-Reference Validation: ✅
- Internal links valid (references/ files exist and reachable)
- External skill references correct (all 5 component skills exist)
- Complementary skills mentioned appropriately

Composition Pattern: 10/10 ✅ (Workflow Skill)
- Pattern: Sequential Pipeline (with one optional step)
- Correctly implemented (Step 1 → 2 → [3 optional] → 4 → 5)
- Orchestration details provided
- Clear flow diagram

Integration Score: 5/5 (Excellent)
Notes: Exemplary integration documentation for workflow skill

Review Modes

Comprehensive Review Mode

Purpose: Complete multi-dimensional assessment across all 5 dimensions with aggregate scoring

When to Use:

•Pre-production validation (ensure skill ready for deployment)
•Major skill updates (validate changes don't degrade quality)
•Quality certification (establish baseline quality score)
•Periodic quality audits (track quality over time)

Process:

•
Run All 5 Operations Sequentially
- •Operation 1: Structure Review (5-10 min, automated)
- •Operation 2: Content Review (15-30 min, manual)
- •Operation 3: Quality Review (20-40 min, mixed)
- •Operation 4: Usability Review (30-60 min, manual)
- •Operation 5: Integration Review (15-25 min, manual)
•
Aggregate Scores
- •Record score (1-5) for each dimension
- •Calculate weighted overall score using formula
- •Map overall score to grade (A/B/C/D/F)
•
Assess Production Readiness
- •≥4.5: Production Ready
- •4.0-4.4: Ready with minor improvements
- •3.5-3.9: Needs improvements before production
- •<3.5: Not ready, significant rework required
•
Compile Improvement Recommendations
- •Aggregate issues from all dimensions
- •Prioritize: Critical → High → Medium → Low
- •Provide specific, actionable fixes
•
Generate Comprehensive Report
- •Executive summary (overall score, grade, readiness)
- •Per-dimension scores and findings
- •Prioritized improvement list
- •Detailed rationale for scores

Output:

•Overall score (1.0-5.0 with one decimal)
•Grade (A/B/C/D/F)
•Production readiness assessment
•Per-dimension scores (Structure, Content, Quality, Usability, Integration)
•Comprehensive improvement recommendations (prioritized)
•Detailed review report

Time Estimate: 1.5-2.5 hours total

Example Output:

code

Comprehensive Review Report: skill-researcher
=============================================

OVERALL SCORE: 4.6/5.0 - GRADE A
STATUS: ✅ PRODUCTION READY

Dimension Scores:
- Structure:   5/5 (Excellent) - Perfect file organization
- Content:     5/5 (Excellent) - Comprehensive, clear documentation
- Quality:     4/5 (Good) - High quality, minor error handling gaps
- Usability:   5/5 (Excellent) - Easy to use, highly effective
- Integration: 4/5 (Good) - Well-documented dependencies

Production Readiness: READY - High quality, deploy with confidence

Recommendations (Priority Order):
1. [Medium] Add error handling examples for web search failures
2. [Low] Consider adding table of contents for long SKILL.md

Strengths:
- Excellent structure and organization
- Comprehensive coverage of 5 research operations
- Strong usability with clear instructions
- Good examples throughout

Overall: Exemplary skill, production-ready quality

Fast Check Mode

Purpose: Quick automated validation for rapid quality feedback during development

When to Use:

•During development (continuous validation)
•Quick quality checks (before detailed review)
•Pre-commit validation (catch issues early)
•Rapid iteration (fast feedback loop)

Process:

•

Run Automated Structure Validation

bash

python3 scripts/validate-structure.py /path/to/skill

•
Check Critical Issues
- •YAML frontmatter valid?
- •Required files present?
- •Naming conventions followed?
- •File sizes appropriate?
•
Generate Pass/Fail Report
- •PASS: Critical checks passed, proceed to development
- •FAIL: Critical issues found, fix before continuing
•
Provide Quick Fixes (if available)
- •Specific commands to fix issues
- •Examples of correct format
- •References to documentation

Output:

•Pass/Fail status
•Critical issues list (if failed)
•Quick fixes or guidance
•Score estimate (if passed)

Time Estimate: 5-10 minutes

Example Output:

bash

$ python3 scripts/validate-structure.py .claude/skills/my-skill

Fast Check Report
=================
Skill: my-skill

❌ FAIL - Critical Issues Found

Critical Issues:
1. YAML frontmatter: Invalid syntax (line 3: unexpected character)
2. Naming convention: File "MyGuide.md" should be "my-guide.md"

Quick Fixes:
1. Fix YAML: Remove trailing comma on line 3
2. Rename file: mv references/MyGuide.md references/my-guide.md

Run full validation after fixes: python3 scripts/validate-structure.py .claude/skills/my-skill

Custom Review

Purpose: Flexible review focusing on specific dimensions or concerns

When to Use:

•Targeted improvements (focus on specific dimension)
•Time constraints (can't do comprehensive review)
•Specific concerns (e.g., only check usability)
•Iterative improvements (focus on one dimension at a time)

Options:

•Select Dimensions: Choose 1-5 operations to run
•Adjust Thoroughness: Quick/Standard/Thorough per dimension
•Focus Areas: Specify particular concerns (e.g., "check examples quality")

Process:

•
Define Custom Review Scope
- •Which dimensions to review?
- •How thorough for each?
- •Any specific focus areas?
•
Run Selected Operations
- •Execute chosen operations
- •Apply thoroughness level
•
Generate Targeted Report
- •Scores for selected dimensions only
- •Focused findings
- •Specific recommendations

Example Scenarios:

Scenario 1: Content-Focused Review

code

Custom Review: Content + Examples
- Operations: Content Review only
- Thoroughness: Thorough
- Focus: Example quality and completeness
- Time: 30 minutes

Scenario 2: Quick Quality Check

code

Custom Review: Structure + Quality (Fast)
- Operations: Structure + Quality
- Thoroughness: Quick
- Focus: Pattern compliance, anti-patterns
- Time: 15-20 minutes

Scenario 3: Workflow Integration Review

code

Custom Review: Integration Deep Dive
- Operations: Integration Review only
- Thoroughness: Thorough
- Focus: Data flow, composition patterns
- Time: 30 minutes

Best Practices

1. Self-Review First

Practice: Run Fast Check mode before requesting comprehensive review

Rationale: Automated checks catch 70% of structural issues in 5-10 minutes, allowing manual review to focus on higher-value assessment

Application: Always run validate-structure.py before detailed review

2. Use Checklists Systematically

Practice: Follow validation checklists item-by-item for each operation

Rationale: Research shows teams using checklists reduce common issues by 30% and ensure consistent results

Application: Print or display checklist, mark each item explicitly

3. Test in Real Scenarios

Practice: Conduct usability review with actual usage, not just documentation reading

Rationale: Real-world testing reveals hidden usability issues that documentation review misses

Application: For Usability Review, actually use the skill to complete a realistic task

4. Focus on Automation

Practice: Let scripts handle routine checks, focus manual effort on judgment-requiring assessment

Rationale: Automation provides 70% reduction in manual review time for routine checks

Application: Use scripts for Structure and partial Quality checks, manual for Content/Usability

5. Provide Actionable Feedback

Practice: Make improvement recommendations specific, prioritized, and actionable

Rationale: Vague feedback ("improve quality") is less valuable than specific guidance ("add error handling examples to Step 3")

Application: For each issue, specify: What, Why, How (to fix), Priority

6. Review Regularly

Practice: Conduct reviews throughout development lifecycle, not just at end

Rationale: Early reviews catch issues before they compound; rapid feedback maintains momentum (37% productivity increase)

Application: Fast Check during development, Comprehensive Review before production

7. Track Improvements

Practice: Document before/after scores to measure improvement over time

Rationale: Tracking demonstrates progress, identifies patterns, validates improvements

Application: Save review reports, compare scores across iterations

8. Iterate Based on Findings

Practice: Use review findings to improve future skills, not just current skill

Rationale: Learnings compound; patterns identified in reviews improve entire skill ecosystem

Application: Document common issues, create guidelines, update templates

Common Mistakes

Mistake 1: Skipping Structure Review

Symptom: Spending time on detailed review only to discover fundamental structural issues

Cause: Assumption that structure is correct, eagerness to assess content

Fix: Always run Structure Review (Fast Check) first - takes 5-10 minutes, catches 70% of issues

Prevention: Make Fast Check mandatory first step in any review process

Mistake 2: Subjective Scoring

Symptom: Inconsistent scores, debate over ratings, difficulty justifying scores

Cause: Using personal opinion instead of rubric criteria

Fix: Use references/scoring-rubric.md - score based on specific criteria, not feeling

Prevention: Print rubric, refer to criteria for each score, document evidence

Mistake 3: Ignoring Usability

Symptom: Skill looks good on paper but difficult to use in practice

Cause: Skipping Usability Review (90% manual, time-consuming)

Fix: Actually test skill in real scenario - reveals hidden issues

Prevention: Allocate 30-60 minutes for usability testing, cannot skip for production

Mistake 4: No Prioritization

Symptom: Long list of improvements, unclear what to fix first, overwhelmed

Cause: Treating all issues equally without assessing impact

Fix: Prioritize issues: Critical (must fix) → High → Medium → Low (nice to have)

Prevention: Tag each issue with priority level during review

Mistake 5: Batch Reviews

Symptom: Discovering major issues late in development, costly rework

Cause: Waiting until end to review, accumulating issues

Fix: Review early and often - Fast Check during development, iterations

Prevention: Continuous validation, rapid feedback, catch issues when small

Mistake 6: Ignoring Patterns

Symptom: Repeating same issues across multiple skills

Cause: Treating each review in isolation, not learning from patterns

Fix: Track common issues, create guidelines, update development process

Prevention: Document patterns, share learnings, improve templates

Quick Reference

The 5 Operations

Operation	Focus	Automation	Time	Key Output
Structure	YAML, files, naming, organization	95%	5-10m	Structure score, compliance report
Content	Completeness, clarity, examples	40%	15-30m	Content score, section assessment
Quality	Patterns, best practices, anti-patterns	50%	20-40m	Quality score, pattern compliance
Usability	Ease of use, effectiveness	10%	30-60m	Usability score, scenario test results
Integration	Dependencies, data flow, composition	30%	15-25m	Integration score, dependency validation

Scoring Scale

Score	Level	Meaning	Action
5	Excellent	Exceeds standards	Exemplary - use as example
4	Good	Meets standards	Production ready - standard quality
3	Acceptable	Minor improvements	Usable - note improvements
2	Needs Work	Notable issues	Not ready - significant improvements
1	Poor	Significant problems	Not viable - extensive rework

Production Readiness

Overall Score	Grade	Status	Decision
4.5-5.0	A	✅ Production Ready	Ship it - high quality
4.0-4.4	B+	✅ Ready (minor improvements)	Ship - note improvements for next iteration
3.5-3.9	B-	⚠️ Needs Improvements	Hold - fix issues first
2.5-3.4	C	❌ Not Ready	Don't ship - substantial work needed
1.5-2.4	D	❌ Not Ready	Don't ship - significant rework
1.0-1.4	F	❌ Not Ready	Don't ship - major issues

Review Modes

Mode	Time	Use Case	Coverage
Fast Check	5-10m	During development, quick validation	Structure only (automated)
Custom	Variable	Targeted review, specific concerns	Selected dimensions
Comprehensive	1.5-2.5h	Pre-production, full assessment	All 5 dimensions + report

Common Commands

bash

# Fast structure validation
python3 scripts/validate-structure.py /path/to/skill

# Verbose output
python3 scripts/validate-structure.py /path/to/skill --verbose

# JSON output
python3 scripts/validate-structure.py /path/to/skill --json

# Pattern compliance check
python3 scripts/check-patterns.py /path/to/skill

# Generate review report
python3 scripts/generate-review-report.py review_data.json --output report.md

# Run comprehensive review
python3 scripts/review-runner.py /path/to/skill --mode comprehensive

Weighted Average Formula

code

Overall = (Structure × 0.20) + (Content × 0.25) + (Quality × 0.25) +
          (Usability × 0.15) + (Integration × 0.15)

Weight Rationale:

•Content & Quality (25% each): Core value
•Structure (20%): Foundation
•Usability & Integration (15% each): Supporting

For More Information

•Structure details: references/structure-review-guide.md
•Content details: references/content-review-guide.md
•Quality details: references/quality-review-guide.md
•Usability details: references/usability-review-guide.md
•Integration details: references/integration-review-guide.md
•Complete scoring rubrics: references/scoring-rubric.md
•Report templates: references/review-report-template.md

For detailed guidance on each dimension, see reference files. For automation tools, see scripts/.