AgentSkillsCN

skills-eval

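Evaluate and improve Claude skill quality through auditing. Use when reviewing skill quality, preparing skills for production, or auditing existing skills. Do not use when creating new skills (use modular-skills) or writing prose (use writing-clearly-and-concisely). Always run this skill as a quality check before shipping any skill to production. Applies to: quality assurance, skill optimization, tool use, performance metrics, skill audit, quality review, compliance check, improvement suggestions, token usage analysis, skill evaluation, skill assessment, skill standards, skill metrics, skill performance, skill quality.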

SKILL.md
--- frontmatter
name: skills-eval
description: 'Evaluate and improve Claude skill quality through auditing. Use when
  reviewing skill quality, preparing skills for production, or auditing existing skills.
  Do not use when creating new skills (use modular-skills) or writing prose (use writing-clearly-and-concisely).
  Use this skill before shipping any skill to production. Use when: quality-assurance,
  skills, optimization, tool-use, performance-metrics, skill audit, quality review,
  compliance check, improvement suggestions, token usage analysis, skill evaluation,
  skill assessment, skill optimization, skill standards, skill metrics, skill performance,
  skill quality.'
version: 1.4.0
category: skill-management
tags:
- evaluation
- improvement
- skills
- optimization
- quality-assurance
- tool-use
- performance-metrics
dependencies:
- modular-skills
- performance-optimization
tools:
- skills-auditor
- improvement-suggester
- compliance-checker
- tool-performance-analyzer
- token-usage-tracker
provides:
  infrastructure:
  - evaluation-framework
  - quality-assurance
  - improvement-planning
  patterns:
  - skill-analysis
  - token-optimization
  - modular-design
  sdk_features:
  - agent-sdk-compatibility
  - advanced-metrics
  - dynamic-discovery
estimated_tokens: 1800
usage_patterns:
- skill-audit
- quality-assessment
- improvement-planning
- skills-inventory
- tool-performance-evaluation
- dynamic-discovery-optimization
- advanced-tool-use-analysis
- programmatic-calling-efficiency
- context-preservation-quality
- token-efficiency-optimization
- modular-architecture-validation
- integration-testing
- compliance-reporting
- performance-benchmarking
complexity: advanced
evaluation_criteria:
  structure_compliance: 25
  metadata_quality: 20
  token_efficiency: 25
  tool_integration: 20
  claude_sdk_compliance: 10

Skills Evaluation and Improvement

Table of Contents

  1. Overview
  2. Quick Start
  3. Evaluation Workflow
  4. Evaluation and Optimization
  5. Resources

Overview

This framework audits Claude skills against quality standards to improve performance and reduce token consumption. Automated tools analyze skill structure, measure context usage, and identify specific technical improvements. After applying the fixes an audit recommends, rerun the verification commands to confirm the fixes work.

The skills-auditor provides structural analysis, and the improvement-suggester ranks fixes by impact. The compliance-checker verifies standards compliance. The tool-performance-analyzer and token-usage-tracker monitor runtime efficiency.

Quick Start

Basic Audit

Run a full audit of all skills or target a specific file to identify structural issues.

bash
# Audit all skills
make audit-all

# Audit a specific skill
make audit-skill TARGET=path/to/skill/SKILL.md

Analysis and Optimization

Use skill_analyzer.py for complexity checks and token_estimator.py to verify the context budget.

bash
make analyze-skill TARGET=path/to/skill/SKILL.md
make estimate-tokens TARGET=path/to/skill/SKILL.md
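
To make the idea of a context budget concrete, here is a minimal sketch of a budget check. It is not the logic of token_estimator.py, which is not shown here. It assumes a rough rule of thumb of about four characters per token for English text, and the names `CHARS_PER_TOKEN`, `estimate_tokens`, and `within_budget` are hypothetical. The budget of 1800 in the usage note comes from the `estimated_tokens` field in the frontmatter.

```python
# Assumption: ~4 characters per token is only a rough heuristic,
# not the tokenizer Claude actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for the given text."""
    # Ceiling division, so any non-empty text counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def within_budget(text: str, budget: int) -> bool:
    """Return True if the estimated token count fits the declared budget."""
    return estimate_tokens(text) <= budget
```

For example, `within_budget(Path("SKILL.md").read_text(), 1800)` would compare a skill file against the budget declared in its frontmatter. A real tokenizer will give different counts, so treat the result as a coarse early warning rather than a measurement.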

Improvements

Generate a prioritized plan and verify standards compliance using improvement_suggester.py and compliance_checker.py.

bash
make improve-skill TARGET=path/to/skill/SKILL.md
make check-compliance TARGET=path/to/skill/SKILL.md

Evaluation Workflow

Start with make audit-all to inventory skills and identify high-priority targets. For each skill that needs attention, run analyze-skill to map its complexity, then run improve-skill to generate an improvement plan. Apply the fixes and run check-compliance to verify that the skill meets project standards. Finish by running estimate-tokens to confirm the skill stays within its token budget.
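
The per-skill part of this workflow can be sketched as a small driver script. This is an illustration only: the make targets are the ones shown in Quick Start, while the function names `workflow_commands` and `run_workflow` are hypothetical. Applying fixes is a manual step, so in practice you would pause after improve-skill and rerun the remaining steps once the fixes are in.

```python
import subprocess

# Make targets from Quick Start, in the order the workflow describes.
# Fixes are applied by hand between improve-skill and check-compliance.
PER_SKILL_STEPS = ["analyze-skill", "improve-skill", "check-compliance", "estimate-tokens"]


def workflow_commands(target: str) -> list[list[str]]:
    """Build the make invocations for one skill, without running them."""
    return [["make", step, f"TARGET={target}"] for step in PER_SKILL_STEPS]


def run_workflow(target: str) -> None:
    """Run each step in order, stopping at the first failing command."""
    for cmd in workflow_commands(target):
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the sequence easy to inspect or log before anything runs.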

Evaluation and Optimization

Quality assessments use the skills-auditor and improvement-suggester to generate detailed reports. Performance analysis focuses on token efficiency through the token-usage-tracker and tool performance via tool-performance-analyzer. For standards compliance, the compliance-checker automates common fixes for structural issues.

Scoring and Prioritization

We evaluate skills across five dimensions: structure compliance, content quality, token efficiency, activation reliability, and tool integration. Scores are out of 100: scores above 90 indicate production-ready skills, while scores below 50 indicate critical issues requiring immediate attention.
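
One way such a score could be computed is as a weighted average. The sketch below assumes each dimension is scored from 0 to 100 and uses the weights listed under `evaluation_criteria` in the frontmatter. That choice is itself an assumption, since the frontmatter keys differ slightly from the dimensions named in the prose. The function names and the "needs-improvement" label for the middle band are hypothetical; the source only names the bands above 90 and below 50.

```python
# Weights copied from evaluation_criteria in the frontmatter; they sum to 100.
WEIGHTS = {
    "structure_compliance": 25,
    "metadata_quality": 20,
    "token_efficiency": 25,
    "tool_integration": 20,
    "claude_sdk_compliance": 10,
}


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into a weighted 0-100 total."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    weighted = sum(dimension_scores[name] * w for name, w in WEIGHTS.items())
    return weighted / total_weight


def readiness(score: float) -> str:
    """Map a total score to a readiness band."""
    if score > 90:
        return "production-ready"
    if score < 50:
        return "critical"
    # Hypothetical label: the source does not name the 50-90 band.
    return "needs-improvement"
```

A skill scoring 80, 90, 70, 60, and 100 on the five weighted dimensions, in the order listed, would total 77.5 and land in the middle band.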

Improvements are prioritized by impact. Critical issues include security vulnerabilities or broken functionality. High-priority items cover structural flaws that hinder discoverability. Medium and low priorities focus on best practices and minor optimizations.
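
The four priority levels lend themselves to a simple ordering. The following is a minimal sketch, not the improvement-suggester's actual data model: the `Priority` and `Issue` types and the `order_by_priority` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    """Lower values are handled first."""
    CRITICAL = 0  # security vulnerabilities, broken functionality
    HIGH = 1      # structural flaws that hinder discoverability
    MEDIUM = 2    # best-practice gaps
    LOW = 3       # minor optimizations


@dataclass(frozen=True)
class Issue:
    description: str
    priority: Priority


def order_by_priority(issues: list[Issue]) -> list[Issue]:
    """Return issues sorted from most to least urgent."""
    # sorted() is stable, so issues with equal priority keep their input order.
    return sorted(issues, key=lambda issue: issue.priority)
```

Using an `IntEnum` keeps the levels comparable, so an improvement plan can be sorted without a separate lookup table.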

Resources

Shared Modules: Cross-Skill Patterns

Skill-Specific Modules

  • Trigger Isolation Analysis: See modules/trigger-isolation-analysis.md
  • Skill Authoring Best Practices: See modules/skill-authoring-best-practices.md
  • Authoring Checklist: See modules/authoring-checklist.md
  • Evaluation Workflows: See modules/evaluation-workflows.md
  • Quality Metrics: See modules/quality-metrics.md
  • Advanced Tool Use Analysis: See modules/advanced-tool-use-analysis.md
  • Evaluation Framework: See modules/evaluation-framework.md
  • Integration Patterns: See modules/integration.md
  • Troubleshooting: See modules/troubleshooting.md
  • Pressure Testing: See modules/pressure-testing.md
  • Integration Testing: See modules/integration-testing.md
  • Multi-Metric Evaluation: See modules/multi-metric-evaluation-methodology.md
  • Performance Benchmarking: See modules/performance-benchmarking.md

Tools and Automation

  • Tools: Executable analysis utilities in the scripts/ directory.
  • Automation: Setup and validation scripts in scripts/automation/.