File Categorization Skill

When to Use This Skill

•Processing files in integration pipelines
•Scanning directories for file organization
•Auto-routing files to appropriate locations
•Generating file inventory reports
•Validating repository structure

What This Skill Does

Analyzes each file's name, location, frontmatter, and content to assign it to one of five categories:

•Commands - Slash command definitions
•Agents - Agent configuration files
•Skills - Reusable workflow automation
•Documentation - General markdown documentation
•Other - Uncategorized files requiring manual review

Categorization Logic

Step 1: Filename Pattern Matching

Commands:

•Filename matches *command.md (which also covers *-command.md)
•Located in .claude/commands/ directory
•Filename uses verb-noun pattern (e.g., integration-scan.md)

Agents:

•Filename matches *agent.md (which also covers *-agent.md)
•Located in agents-templates/ directory
•Contains role-based names (architect, builder, validator, etc.)

Skills:

•Filename is SKILL.md or *-SKILL.md or *-skill.md
•Located in skills/*/ directories
•Contains workflow automation content

Documentation:

•Standard .md files
•Located in docs/ directory
•Contains reference or tutorial content
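
The filename and location rules above can be sketched as a small rule table. This is a minimal illustration, not the skill's actual implementation: `RULES`, `in_directory`, and `match_by_path` are hypothetical names, the directory check treats `skills/` as a prefix rather than requiring a subdirectory, and rule order decides ties.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Optional

# Illustrative rule table: (category, filename globs, directory marker).
# Earlier rows win when more than one rule would match.
RULES = [
    ("Command", ["*command.md"], ".claude/commands/"),
    ("Agent", ["*agent.md"], "agents-templates/"),
    ("Skill", ["SKILL.md", "*-SKILL.md", "*-skill.md"], "skills/"),
    ("Documentation", [], "docs/"),
]


def in_directory(posix_path: str, directory: str) -> bool:
    """True if `directory` appears as a leading or nested path segment."""
    return posix_path.startswith(directory) or f"/{directory}" in posix_path


def match_by_path(file_path: str) -> Optional[str]:
    """Return the first category whose filename glob or directory matches."""
    path = PurePosixPath(file_path)
    for category, globs, directory in RULES:
        # fnmatchcase keeps matching case-sensitive on every platform.
        if any(fnmatchcase(path.name, g) for g in globs):
            return category
        if in_directory(path.as_posix(), directory):
            return category
    return None
```

A file that matches nothing here returns `None` and is left for the later steps to classify.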

Step 2: Frontmatter Analysis

Read the YAML frontmatter (if present) and look for category-specific fields:

Command Indicators:

yaml

---
description: "..."
allowed-tools: [...]
author: "..."
version: "X.Y"
---

Skill Indicators:

yaml

---
name: skill-name
description: "..."
---

Agent Indicators (less structured, more prose; usually a markdown identity block rather than YAML frontmatter):

markdown

## Agent Identity
**Role**: [Agent Role]
**Version**: X.Y.Z
**Purpose**: [Purpose description]
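
As a rough sketch of this step, the top-level frontmatter keys can be pulled out with two regular expressions and compared against the indicators above. This is deliberately not a full YAML parser (a real pipeline would likely use one), and `frontmatter_keys` and `classify_by_frontmatter` are hypothetical helper names.

```python
import re
from typing import Optional

# A leading '---' line, the block body, and a closing '---' line.
FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
# Unindented "key:" at the start of a line; nested keys are ignored.
KEY_RE = re.compile(r"^([A-Za-z][\w-]*)[ \t]*:", re.MULTILINE)


def frontmatter_keys(content: str) -> set:
    """Return the top-level keys of a leading frontmatter block, if any."""
    match = FRONTMATTER_RE.match(content)
    if match is None:
        return set()
    return set(KEY_RE.findall(match.group(1)))


def classify_by_frontmatter(content: str) -> Optional[str]:
    """Apply the command, skill, and agent indicators in that order."""
    keys = frontmatter_keys(content)
    if {"allowed-tools", "version"} <= keys:
        return "Command"
    if "name" in keys and "allowed-tools" not in keys:
        return "Skill"
    # Agents carry a markdown identity block instead of YAML keys.
    if "## Agent Identity" in content:
        return "Agent"
    return None
```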

Step 3: Content Structure Analysis

Commands have:

•Workflow sections with numbered steps
•Bash command examples (prefixed with !)
•allowed-tools restrictions
•Usage examples

Agents have:

•Core Responsibilities section
•Allowed Tools and Permissions section
•Workflow Patterns section
•Context Management section

Skills have:

•"When to Use" section
•"What This Skill Does" section
•Step-by-step process descriptions
•Examples with real data

Documentation has:

•Standard markdown structure
•Tutorial or reference content
•No executable workflows
•Educational purpose
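
One possible way to check these structural signals is to collect the markdown headings and count how many expected sections appear. The sketch below assumes ATX-style `#` headings and treats any line starting with `!` plus a word as a command example; `SECTION_MARKERS` and `structure_hits` are illustrative names, and Documentation is left out because its signals are mostly the absence of the others.

```python
import re

HEADING_RE = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)
BANG_LINE_RE = re.compile(r"^[ \t]*!\w+", re.MULTILINE)

# Section names taken from the lists above, lowercased for comparison.
SECTION_MARKERS = {
    "Agent": {
        "core responsibilities",
        "allowed tools and permissions",
        "workflow patterns",
        "context management",
    },
    "Skill": {"when to use", "what this skill does"},
}


def structure_hits(content: str) -> dict:
    """Count expected sections per category, plus '!'-prefixed command lines."""
    found = [heading.lower() for heading in HEADING_RE.findall(content)]
    hits = {
        category: sum(any(h.startswith(m) for h in found) for m in markers)
        for category, markers in SECTION_MARKERS.items()
    }
    hits["Command"] = len(BANG_LINE_RE.findall(content))
    return hits
```

Prefix matching lets a heading such as "When to Use This Skill" count toward the "When to Use" marker.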

Step 4: Keyword Detection

Scan content for category-specific keywords:

Command Keywords:

•!bash, !git, !npm, etc. (shell commands)
•"allowed-tools"
•"Usage:", "Workflow:", "Steps:"
•Command-line patterns

Agent Keywords:

•"Core Responsibilities"
•"Workflow Patterns"
•"Context Management"
•"Orchestrator", "Sub-Agent"
•"Handoff", "Delegation"

Skill Keywords:

•"When to Use"
•"What This Skill Does"
•"Skill" in self-references
•Reusable workflow language

Documentation Keywords:

•"Introduction", "Overview", "Guide"
•"Tutorial", "Reference", "Best Practices"
•Educational/explanatory language
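
Keyword detection could be approximated with a case-insensitive substring count per category, as in the sketch below. The keyword lists come from this section; `KEYWORDS`, `keyword_scores`, and `top_category` are hypothetical names, and the softer signals ("command-line patterns", "reusable workflow language") are not modeled. Substring matching is crude: "Guide" also matches "guidelines", so these scores are best treated as weak evidence.

```python
from typing import Optional

KEYWORDS = {
    "Command": ["!bash", "!git", "!npm", "allowed-tools", "Usage:", "Workflow:", "Steps:"],
    "Agent": [
        "Core Responsibilities", "Workflow Patterns", "Context Management",
        "Orchestrator", "Sub-Agent", "Handoff", "Delegation",
    ],
    "Skill": ["When to Use", "What This Skill Does"],
    "Documentation": [
        "Introduction", "Overview", "Guide", "Tutorial", "Reference", "Best Practices",
    ],
}


def keyword_scores(content: str) -> dict:
    """Count how many distinct keywords of each category occur in the content."""
    lowered = content.lower()
    return {
        category: sum(1 for keyword in keywords if keyword.lower() in lowered)
        for category, keywords in KEYWORDS.items()
    }


def top_category(content: str) -> Optional[str]:
    """Return the highest-scoring category, or None when nothing matched."""
    scores = keyword_scores(content)
    best = max(scores, key=scores.get)  # ties resolve to the earliest category
    return best if scores[best] > 0 else None
```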

Categorization Algorithm

code

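function categorizeFile(filePath, content):
  category = null
  confidence = null
  reasoning = null

  // Phase 1: Filename and location
  if filename matches command patterns OR in .claude/commands/:
    category = "Command"
    confidence = "High"
    reasoning = "Filename or location matches command pattern"

  else if filename matches skill patterns OR in skills/*/:
    category = "Skill"
    confidence = "High"
    reasoning = "Filename or location matches skill pattern"

  else if filename matches agent patterns OR in agents-templates/:
    category = "Agent"
    confidence = "High"
    reasoning = "Filename or location matches agent pattern"

  else if in docs/:
    category = "Documentation"
    confidence = "Medium"
    reasoning = "Located in docs/"

  // Phase 2: Frontmatter analysis (refine; overrides Phase 1 on a match)
  frontmatter = extractYAML(content)
  if frontmatter contains "allowed-tools" AND "version":
    category = "Command"
    confidence = "High"
    reasoning = "Frontmatter has 'allowed-tools' and 'version'"

  else if frontmatter contains "name" (no allowed-tools):
    category = "Skill"
    confidence = "High"
    reasoning = "Frontmatter has 'name' and no 'allowed-tools'"

  // Phase 3: Content structure (if still uncertain)
  if confidence != "High":
    if content contains "## Agent Identity":
      category = "Agent"
      confidence = "High"
      reasoning = "Content has 'Agent Identity' section"

    else if content contains "## When to Use":
      category = "Skill"
      confidence = "Medium"
      reasoning = "Content has 'When to Use' section"

    else if content contains "!bash" OR "!git":
      category = "Command"
      confidence = "Medium"
      reasoning = "Content has shell command examples"

  // Phase 4: Fallback
  if category == null:
    category = "Other"
    confidence = "Low"
    reasoning = "Unable to determine category, manual review needed"

  return {category, confidence, reasoning}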
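
For readers who want something executable, the four phases might translate to Python roughly as follows. This is a sketch under stated assumptions: phase 1 uses the filename patterns from Step 1, the frontmatter reader is a regex-based key extractor rather than a YAML parser, and `categorize_file` and `_matches` are illustrative names. Error handling for malformed or conflicting input is not included.

```python
import re
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

FRONTMATTER_RE = re.compile(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", re.DOTALL)
KEY_RE = re.compile(r"^([A-Za-z][\w-]*)[ \t]*:", re.MULTILINE)

# (category, filename globs, directory marker, confidence), checked in order.
PHASE_1_RULES = [
    ("Command", ["*command.md"], ".claude/commands/", "High"),
    ("Skill", ["SKILL.md", "*-SKILL.md", "*-skill.md"], "skills/", "High"),
    ("Agent", ["*agent.md"], "agents-templates/", "High"),
    ("Documentation", [], "docs/", "Medium"),
]


def _matches(path, globs, directory):
    """Filename glob match, or `directory` present as a path segment."""
    posix = path.as_posix()
    in_dir = posix.startswith(directory) or f"/{directory}" in posix
    return in_dir or any(fnmatchcase(path.name, g) for g in globs)


def categorize_file(file_path, content):
    path = PurePosixPath(file_path)
    category, confidence, reasons = None, None, []

    # Phase 1: filename and location
    for name, globs, directory, level in PHASE_1_RULES:
        if _matches(path, globs, directory):
            category, confidence = name, level
            reasons.append(f"filename or location matches {name}")
            break

    # Phase 2: frontmatter keys refine (and may override) phase 1
    match = FRONTMATTER_RE.match(content)
    keys = set(KEY_RE.findall(match.group(1))) if match else set()
    if {"allowed-tools", "version"} <= keys:
        category, confidence = "Command", "High"
        reasons.append("frontmatter has 'allowed-tools' and 'version'")
    elif "name" in keys and "allowed-tools" not in keys:
        category, confidence = "Skill", "High"
        reasons.append("frontmatter has 'name'")

    # Phase 3: content structure, only while confidence is not High
    if confidence != "High":
        if "## Agent Identity" in content:
            category, confidence = "Agent", "High"
            reasons.append("content has '## Agent Identity'")
        elif "## When to Use" in content:
            category, confidence = "Skill", "Medium"
            reasons.append("content has '## When to Use'")
        elif "!bash" in content or "!git" in content:
            category, confidence = "Command", "Medium"
            reasons.append("content has shell command examples")

    # Phase 4: fallback
    if category is None:
        category, confidence = "Other", "Low"
        reasons.append("Unable to determine category, manual review needed")

    return {
        "category": category,
        "confidence": confidence,
        "reasoning": "; ".join(reasons),
    }
```

Collecting reasons in a list keeps every contributing signal visible in the final report, which helps when a later phase overrides an earlier one.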

Output Format

For each categorized file, return:

markdown

### [Filename]
- **Category**: [Command|Agent|Skill|Documentation|Other]
- **Confidence**: [High|Medium|Low]
- **Reasoning**: [Why this category was assigned]
- **Frontmatter**: [✅ Valid | ⚠️ Malformed | ❌ Missing]
- **Required Fields**: [List of found/missing fields]
- **Recommended Location**: [Target directory path]
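
A report entry in this format could be produced by a small formatter like the one below. `render_entry` and its parameters are hypothetical, and using ❌ for a missing required field is an assumption (the examples in this document only show fields that are present).

```python
def render_entry(filename, category, confidence, reasoning,
                 frontmatter_status, required_fields, recommended_location):
    """Format one categorization result as a markdown report entry.

    `required_fields` maps a field name to True (found) or False (missing).
    """
    fields = ", ".join(
        f"{name} {'✅' if present else '❌'}"
        for name, present in required_fields.items()
    ) or "N/A"
    return "\n".join([
        f"### {filename}",
        f"- **Category**: {category}",
        f"- **Confidence**: {confidence}",
        f"- **Reasoning**: {reasoning}",
        f"- **Frontmatter**: {frontmatter_status}",
        f"- **Required Fields**: {fields}",
        f"- **Recommended Location**: {recommended_location}",
    ])
```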

Example Usage

Example 1: Categorizing Integration File

Input:

code

File: USING-GIT-WORKTREES-SKILL.md
Content:
---
name: using-git-worktrees
description: Creates isolated git worktrees...
---

# Using Git Worktrees

## When to Use
...

Output:

markdown

### USING-GIT-WORKTREES-SKILL.md
- **Category**: Skill
- **Confidence**: High
- **Reasoning**: Filename matches skill pattern, frontmatter has 'name' field, content has "When to Use" section
- **Frontmatter**: ✅ Valid
- **Required Fields**: name ✅, description ✅
- **Recommended Location**: skills/using-git-worktrees/SKILL.md

Example 2: Categorizing Command File

Input:

code

File: integration-scan.md
Content:
---
description: "Scan and categorize incoming files"
allowed-tools: ["Read", "Bash(find)"]
author: "Claude Command and Control"
version: "1.0"
---

# Integration Scan

## Purpose
...

Output:

markdown

### integration-scan.md
- **Category**: Command
- **Confidence**: High
- **Reasoning**: Filename uses verb-noun pattern, frontmatter has 'allowed-tools' and 'version'
- **Frontmatter**: ✅ Valid
- **Required Fields**: description ✅, allowed-tools ✅, author ✅, version ✅
- **Recommended Location**: .claude/commands/integration-scan.md

Example 3: Uncategorizable File

Input:

code

File: notes.md
Content:
# Random Notes

Some thoughts about the project...

Output:

markdown

### notes.md
- **Category**: Other
- **Confidence**: Low
- **Reasoning**: No frontmatter, no structural indicators, generic content
- **Frontmatter**: ❌ Missing
- **Required Fields**: N/A
- **Recommended Location**: Manual review required

Integration with Commands

Used By

•/integration-scan - Primary categorization logic
•/integration-process - Determines target directory
•/integration-validate - Validates category-specific structure

Usage Pattern

markdown

# In integration-scan command

For each file in /INTEGRATION/incoming:
  1. Read file content
  2. Use file-categorization skill
  3. Extract category and confidence
  4. Include in scan report
  5. Mark for processing if High confidence
  6. Flag for review if Medium/Low confidence
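
The loop above might look like this in Python. It is only a sketch: `scan_incoming` is a hypothetical name, and `categorize` stands in for whatever function implements this skill, assumed to return a dict with `category` and `confidence` keys.

```python
from pathlib import Path


def scan_incoming(incoming_dir, categorize):
    """Categorize every file in a directory and split results by confidence."""
    report = {"process": [], "review": []}
    for path in sorted(Path(incoming_dir).iterdir()):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so one bad file cannot stop the scan.
        content = path.read_text(encoding="utf-8", errors="replace")
        result = categorize(path.name, content)
        # High confidence is marked for processing; Medium/Low goes to review.
        bucket = "process" if result["confidence"] == "High" else "review"
        report[bucket].append((path.name, result["category"], result["confidence"]))
    return report
```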

Category-Specific Validation Rules

Commands

•✅ MUST have: description, allowed-tools, author, version
•✅ SHOULD have: workflow steps, usage examples
•⚠️ Check: Tool permissions not overly broad

Agents

•✅ MUST have: Agent Identity, Core Responsibilities, Allowed Tools
•✅ SHOULD have: Workflow Patterns, Context Management
•⚠️ Check: Role clearly defined

Skills

•✅ MUST have: name, description, "When to Use"
•✅ SHOULD have: Examples, step-by-step process
•⚠️ Check: Examples use real data (not placeholders)

Documentation

•✅ MUST have: Clear title, structured content
•✅ SHOULD have: Table of contents, cross-references
•⚠️ Check: No executable workflows (should be in Command/Skill)
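
The MUST-have rules lend themselves to a table-driven check. The sketch below covers only the frontmatter fields and section names listed above; the SHOULD rules, the ⚠️ checks, and the Documentation rules need judgment and are not modeled. `REQUIRED` and `missing_requirements` are illustrative names.

```python
# MUST-have items per category, split into frontmatter keys and section names.
REQUIRED = {
    "Command": {
        "frontmatter": ["description", "allowed-tools", "author", "version"],
        "sections": [],
    },
    "Agent": {
        "frontmatter": [],
        "sections": ["Agent Identity", "Core Responsibilities", "Allowed Tools"],
    },
    "Skill": {
        "frontmatter": ["name", "description"],
        "sections": ["When to Use"],
    },
}


def missing_requirements(category, frontmatter_keys, content):
    """List the MUST-have fields and sections absent from a file."""
    rule = REQUIRED.get(category, {"frontmatter": [], "sections": []})
    missing = [key for key in rule["frontmatter"] if key not in frontmatter_keys]
    lowered = content.lower()
    # Section presence is a case-insensitive substring test, not a heading parse.
    missing += [s for s in rule["sections"] if s.lower() not in lowered]
    return missing
```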

Error Handling

Malformed Frontmatter

code

Issue: YAML syntax error
Action: Note in categorization output
Category: "Other" with reason "Invalid frontmatter"
Recommendation: Fix YAML before processing

Conflicting Indicators

code

Issue: Filename says "command" but structure says "skill"
Action: Confidence = "Medium"
Reasoning: "Filename and content indicators conflict"
Recommendation: Manual review

Missing Content

code

Issue: File is empty or too short (<100 chars)
Action: Category = "Other"
Confidence: "Low"
Reasoning: "Insufficient content for categorization"
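
Two of these cases are easy to express as guard functions, sketched below. `precheck` and `resolve_conflict` are hypothetical names. The source does not say which category to report when indicators conflict, so keeping the content-based category is an assumption made here for illustration.

```python
MIN_CONTENT_CHARS = 100  # threshold from the "Missing Content" rule


def precheck(content):
    """Return an 'Other' result for empty or very short files, else None."""
    if len(content.strip()) < MIN_CONTENT_CHARS:
        return {
            "category": "Other",
            "confidence": "Low",
            "reasoning": "Insufficient content for categorization",
        }
    return None


def resolve_conflict(filename_category, content_category):
    """Downgrade to Medium when filename and content indicators disagree."""
    if filename_category and content_category and filename_category != content_category:
        return {
            # Assumption: prefer what the content says over the filename.
            "category": content_category,
            "confidence": "Medium",
            "reasoning": "Filename and content indicators conflict",
        }
    return None
```

Running `precheck` before any other phase avoids spending effort on files that cannot be categorized anyway.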

Testing Recommendations

Test with:

•Typical files - Standard commands, agents, skills
•Edge cases - Mixed indicators, missing frontmatter
•Malformed files - Syntax errors, incomplete content
•Ambiguous files - Could fit multiple categories

Expected accuracy:

•High confidence: >95% correct
•Medium confidence: >80% correct
•Low confidence: Requires manual review
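
To compare a test run against these targets, accuracy can be computed per confidence level from a hand-labeled set. A minimal sketch, where `accuracy_by_confidence` is a hypothetical helper and each result is a `(confidence, predicted, expected)` tuple:

```python
from collections import defaultdict


def accuracy_by_confidence(results):
    """Return the fraction of correct predictions for each confidence level."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, predicted, expected in results:
        totals[confidence] += 1
        correct[confidence] += int(predicted == expected)
    return {level: correct[level] / totals[level] for level in totals}
```

With the targets above, a run would pass when the "High" value exceeds 0.95 and the "Medium" value exceeds 0.80.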

Version History

1.0 (2025-11-23)

•Initial file categorization skill
•Four-phase categorization algorithm
•Integration with scan/process commands
•Comprehensive validation rules

Skill Status: Production Ready
Accuracy Target: >95% for High confidence categorizations
Dependencies: None (standalone logic)