Skill Review & Audit

Produce a systematic, multi-dimensional review of any skill directory (a SKILL.md plus optional scripts/, references/, assets/, and install metadata).

Outcomes

•A clear description of what the skill teaches and what it does not.
•A map of tooling + side effects the skill may cause when followed (commands, network, file writes, permissions).
•A risk assessment (security, privacy, safety, supply chain) with mitigations.
•A quality assessment (correctness, completeness, maintainability, UX) with prioritized improvements.
•Optional scoring using references/scoring-rubric.md.
•A report formatted using references/report-template.md.

Inputs To Request (If Missing)

•Skill identifier: name and/or filesystem path to the skill directory.
•Target agent environment (e.g. Codex CLI / Claude Code / other) and any constraints (offline, no web, sandboxed, etc.).
•Intended usage context (what kinds of user prompts should trigger it; what “done” looks like).

Workflow (Do In Order)

0) Scope The Review

•Confirm whether the review is (a) informational only or (b) includes proposing patches to the skill.
•Define what “safe enough” means for the target environment (network allowed? can write files? secrets present?).

1) Inventory & Provenance

•List the full directory tree and file sizes.
•Identify install/provenance files (common: .openskills.json, package.json, pyproject.toml, git submodule markers).
•
Record:
- •Skill root path
- •Total file count
- •Presence of scripts/, references/, assets/
- •Any external source URL + install timestamp (if present)
•Flag anything unexpected (executables, binaries, obfuscated blobs, huge files, symlinks pointing elsewhere).

Optional helper: run scripts/scan_skill.sh (read it first; it is intended to be read-only). Note: scan_skill.sh may surface sensitive strings (e.g., tokens, private keys) depending on the target directory. Treat its output as sensitive; redact before sharing.

2) Trigger Contract (Frontmatter Audit)

Read SKILL.md YAML frontmatter and assess:

•Name: unique, stable, correctly scoped (not overly broad).
•Description (primary trigger): includes concrete triggers/symptoms; avoids vague “does everything”.
•False positives/negatives: prompts it might match incorrectly vs fail to match.
•Overlap risk: collisions with other skills (same domain, similar trigger phrases).

Output: “Trigger Strength” rating + rewrite suggestions.

3) Capability Model (What It Teaches)

Extract and summarize:

•Core tasks it claims to support.
•Preconditions and assumptions (tech stack, tools installed, access levels).
•Deliverables (expected outputs, formats, artifacts).
•Anti-scope (“When NOT to use”) and limitations (explicit or missing).
•Degree-of-freedom: where it’s prescriptive vs heuristic.

If the skill includes references, don’t assume the main SKILL.md is complete—sample or selectively read reference files to confirm scope.

4) Tooling & Side-Effects Map

Build a table of everything the skill instructs the agent to do:

•Shell commands (including examples).
•Network access (curl/wget, HTTP clients, package installs, API calls).
•File system writes (what paths, destructive operations, deletes).
•Privilege/permissions (sudo, elevated access, credential usage).
•External dependencies (libraries, CLIs, SaaS).

For each, record: intent, required permissions, risk, and safe alternatives (sandbox, dry-run, allowlists).

5) Security / Privacy / Safety Risk Assessment

Use references/risk-taxonomy.md to assess:

•Prompt injection exposure (especially if the skill fetches external content).
•Command injection risks (string interpolation into shell; unsafe copy/paste patterns).
•Destructive operations (rm -rf, overwriting, migrations, irreversible actions).
•Secrets handling (API keys, env vars, logs, redaction).
•Supply-chain risks (install scripts, unpinned deps, untrusted sources).
•Data exfiltration (uploading files, telemetry, “paste logs here” patterns).

Output: severity × likelihood per risk + mitigations + “safe-by-default” recommendations.

6) Quality & Correctness Review

•Verify examples for internal consistency (missing imports, wrong prop precedence, mismatched ARIA ids, etc.).
•Check for missing edge cases (cancellation, cleanup, concurrency, accessibility, i18n).
•Check “progressive disclosure” quality: is SKILL.md lean and navigational, with details in references/?
•Check for outdated or unstable advice (versions, APIs likely to change); suggest pinning and dates.

7) Maintainability & Operational Fit

•Structure: clear headings, searchable keywords, minimal duplication.
•Update strategy: versioning, ownership, changelog expectations (even if no file).
•Testability: are scripts tested? is there a validation workflow?
•Portability: OS assumptions, shell assumptions, tool availability.

8) Improvement Plan (Prioritized)

Provide:

•Quick wins (low effort / high impact).
•Structural changes (refactor into references, add scripts, add checklists).
•Safety hardening (guardrails, confirmations, allowlists).
•“Definition of Done” for the next iteration.

9) Produce The Report

Use references/report-template.md and keep:

•Facts separated from recommendations
•Explicit uncertainty markers when you did not verify something
•Concrete examples (commands, paths, prompts) where useful

Red Flags — Stop And Re-check

•Only read SKILL.md and ignored scripts/ / references/.
•Listed risks without mapping concrete commands/side effects.
•No provenance/supply-chain notes.
•No severity/likelihood distinction (everything “risky”).
•Suggested running scripts you did not read.
•Gave recommendations without tying them to a specific observed gap.

Deep Checklist (Optional)

If you need a more exhaustive pass, use references/review-checklist.md and score with references/scoring-rubric.md.