AgentSkillsCN

pattern-extractor

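Extract coding patterns and conventions from repository source code. Use this guide when you need to analyze codebase patterns, identify architectural decisions, or prepare for skill generation.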

SKILL.md
---
name: pattern-extractor
description: Extract coding patterns and conventions from repository source code. Use when analyzing codebase patterns, identifying architectural decisions, or preparing for skill generation.
metadata:
  phase: 2
  pipeline: assimilation
  version: 1.0.0
---

Pattern Extractor

Overview

Perform deep code analysis to extract recurring patterns, coding conventions, and architectural decisions. This is phase 2 of the assimilation pipeline: it consumes scan-report.json and produces patterns.json.


When to Use

Use this skill when:

  • Analyzing a scanned repository for patterns
  • Identifying architectural decisions in code
  • Extracting conventions for skill generation
  • Understanding how a codebase handles common concerns

Input

Required: .github/temp/scan-report.json (from repo-scanner)


Output

File: .github/temp/patterns.json


Execution Steps

Step 1: Validate Input

```bash
# Check scan-report.json exists
if [ ! -f ".github/temp/scan-report.json" ]; then
    echo "ERROR: scan-report.json not found. Run repo-scanner first."
    exit 1
fi
```

Step 2: Load Scan Report

Read scan-report.json to understand:

  • Language (determines grep patterns)
  • Framework (provides hints for expected patterns)
  • Domains (focus areas for analysis)
  • Entry points (key files to analyze)
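
One way to turn the report's language into the `--include` globs used in Step 4 is sketched below. It assumes scan-report.json has top-level `"language"` and `"framework"` string fields (the repo-scanner schema is not shown here) and matches text instead of parsing JSON, so treat it as a stopgap where `jq` is unavailable. `json_string_field` and `include_globs` are hypothetical helper names.

```bash
# Hypothetical helpers. ASSUMPTION: scan-report.json stores "language" as a
# top-level string field. Naive text matching, not a real JSON parser.

# Print the first string value stored under KEY in FILE.
json_string_field() {
  local file=$1 key=$2
  grep -o "\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" "$file" \
    | head -n 1 \
    | sed 's/^.*:[[:space:]]*"\(.*\)"$/\1/'
}

# Map a language name to the file globs used by the Step 4 greps.
include_globs() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    javascript | typescript) echo '*.js *.ts' ;;
    python) echo '*.py' ;;
    go) echo '*.go' ;;
    rust) echo '*.rs' ;;
    *) echo '' ;;  # unknown language: no globs
  esac
}

# Usage:
#   lang=$(json_string_field .github/temp/scan-report.json language)
#   include_globs "$lang"
```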

Step 3: Define Pattern Categories

| Category | Description | What to Look For |
|----------|-------------|------------------|
| architecture | Design patterns, structure | Middleware, MVC, Factory, Singleton, Repository |
| reliability | Error handling, resilience | Try/catch, retry logic, circuit breaker, graceful degradation |
| quality | Testing, code quality | Test patterns, assertions, mocking, coverage |
| security | Auth, validation | Authentication, authorization, input validation, sanitization |
| conventions | Project-specific style | Naming patterns, file organization, coding style |

Step 4: Run Semantic Grep

Use language-appropriate patterns to find code structures.

JavaScript/TypeScript Patterns

```bash
# Run from the repository root; "." is the search path
# Middleware pattern
grep -rn "middleware\|\.use(\|next()" --include="*.js" --include="*.ts" .

# Factory pattern
grep -rn "create[A-Z]\|factory\|Factory" --include="*.js" --include="*.ts" .

# Error handling
grep -rn "try.*{$\|\.catch(\|throw new" --include="*.js" --include="*.ts" .

# Async patterns (async and await usually sit on different lines)
grep -rn "async \|await \|Promise\.\|\.then(" --include="*.js" --include="*.ts" .

# Dependency injection
grep -rn "constructor.*private\|@Inject\|inject(" --include="*.js" --include="*.ts" .

# Event patterns
grep -rn "\.on(\|\.emit(\|EventEmitter\|addEventListener" --include="*.js" --include="*.ts" .

# Repository/DAO pattern
grep -rn "Repository\|findById\|findAll\|save(" --include="*.js" --include="*.ts" .
```

Python Patterns

```bash
# Decorator pattern (decorators sit on their own line above the def)
grep -rn "^[[:space:]]*@[A-Za-z_]" --include="*.py" .

# Context manager
grep -rn "with .* as \|__enter__\|__exit__" --include="*.py" .

# Error handling
grep -rn "try:\|except.*:\|raise " --include="*.py" .

# Async patterns
grep -rn "async def\|await \|asyncio" --include="*.py" .

# Dependency injection
grep -rn "@inject\|Depends(\|dependency" --include="*.py" .
```

Go Patterns

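```bash
# Interface pattern (interface types and methods with receivers)
# In basic regex, "(" and ")" are literal, so "func ([^)]*) " matches "func (r *T) "
grep -rn "type .* interface\|func ([^)]*) " --include="*.go" .

# Error handling
grep -rn "if err != nil\|return.*err\|errors\." --include="*.go" .

# Middleware pattern
grep -rn "func.*Handler\|http\.Handler\|middleware" --include="*.go" .
```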

Rust Patterns

```bash
# Result/Error handling (")?" and "?;" target the ? operator, not every "?")
grep -rn "Result<\|\.unwrap()\|\.expect(\|)?\|?;" --include="*.rs" .

# Trait implementations
grep -rn "impl.*for\|trait " --include="*.rs" .

# Pattern matching
grep -rn "match.*{\|=>" --include="*.rs" .
```
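
To feed Step 8 directly, grep hits can be rewritten into the `<file-path>:L<line-number>` evidence format. A minimal sketch, assuming it runs against a repository root so paths come out relative; `grep_evidence` is a hypothetical helper, and file names containing `:` are not handled.

```bash
# Hypothetical helper: run one Step 4 pattern under ROOT and print each hit
# as "<file-path>:L<line-number>" (paths relative to ROOT).
# Usage: grep_evidence ROOT PATTERN [grep options such as --include=*.py]
grep_evidence() {
  local root=$1 pattern=$2
  shift 2
  (cd "$root" && grep -rn "$@" -e "$pattern" .) \
    | sed -E 's#^\./##; s#^([^:]+):([0-9]+):.*#\1:L\2#' \
    | sort -u
}

# Example: evidence for Python error handling
#   grep_evidence . 'try:\|except.*:' --include='*.py'
```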

Step 5: Analyze Key Files

Select at most 20 files for deeper analysis, drawn from:

  1. Entry points from scan-report
  2. Files in detected domains
  3. Files with highest grep hit counts

For each file (max 10KB):

  • Read content
  • Identify patterns used
  • Note line numbers for evidence
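
The third source can be approximated by ranking files on total grep hits. A minimal sketch, assuming "max 10KB" means files larger than 10 KB are skipped rather than truncated, and that entry points and domain files (sources 1 and 2) are merged in separately; `rank_files` is a hypothetical helper.

```bash
# Hypothetical helper: list up to LIMIT files under ROOT, most grep hits first,
# skipping files over 10 KB (10240 bytes). Ties are broken by file name.
# Usage: rank_files ROOT PATTERN LIMIT [grep options such as --include=*.js]
rank_files() {
  local root=$1 pattern=$2 limit=$3
  shift 3
  # grep -c prints "path:count" for every searched file
  (cd "$root" && grep -rc "$@" -e "$pattern" .) \
    | sed 's#^\./##' \
    | awk -F: '$NF > 0 { print $NF, $1 }' \
    | while read -r hits file; do
        size=$(wc -c < "$root/$file")
        if [ "$size" -le 10240 ]; then
          printf '%s %s\n' "$hits" "$file"
        fi
      done \
    | sort -k1,1nr -k2,2 \
    | head -n "$limit" \
    | cut -d' ' -f2
}
```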

Step 6: Calculate Confidence Scores

For each detected pattern, calculate:

```
confidence = (
    frequency     × 0.30 +    # In how many files? (0-1)
    consistency   × 0.30 +    # Same implementation? (0-1)
    documentation × 0.20 +    # Mentioned in docs? (0-1)
    external      × 0.20      # Known pattern? (0-1)
)
```

Scoring Rules

| Component | Score 1.0 | Score 0.5 | Score 0.0 |
|-----------|-----------|-----------|-----------|
| Frequency | Found in >50% of relevant files | Found in 20-50% | Found in <20% |
| Consistency | Same structure everywhere | Minor variations | Inconsistent |
| Documentation | Explicitly documented | Commented | No docs |
| External | GoF/known pattern | Common practice | Novel pattern |
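
The weighted sum can be checked with a few lines of `awk`. This sketch rejects components outside 0.0-1.0 (as the Conventions section requires) and rounds to two decimals, which is an assumption about the precision patterns.json expects; `confidence` is a hypothetical helper.

```bash
# Hypothetical helper: confidence = 0.30*frequency + 0.30*consistency
#                                 + 0.20*documentation + 0.20*external
# Usage: confidence FREQUENCY CONSISTENCY DOCUMENTATION EXTERNAL
confidence() {
  awk -v f="$1" -v c="$2" -v d="$3" -v e="$4" 'BEGIN {
    # Every component must lie in [0, 1]
    if (f < 0 || f > 1 || c < 0 || c > 1 || d < 0 || d > 1 || e < 0 || e > 1) {
      print "ERROR: score component outside 0.0-1.0" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", f * 0.30 + c * 0.30 + d * 0.20 + e * 0.20
  }'
}

# Example: the Step 9 template scores
#   confidence 0.9 0.8 0.7 1.0   # prints 0.85
```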

Step 7: Extract Conventions

Analyze code for project-specific conventions:

| Convention | How to Detect |
|------------|---------------|
| Naming | Analyze variable/function names: camelCase, snake_case, PascalCase |
| File structure | Analyze directory layout: by-type, by-feature, flat |
| Import style | Analyze imports: absolute, relative, barrel files |
| Comment style | Analyze comments: JSDoc, docstrings, inline |
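
Naming can be estimated by counting identifier-like tokens per style. This is a rough heuristic sketch: the regular expressions and the `unknown` fallback (not one of the schema's values) are assumptions, and keywords and words inside strings are counted alongside real identifiers. `detect_naming` is a hypothetical helper.

```bash
# Hypothetical helper: print the most frequent identifier style under ROOT.
# Usage: detect_naming ROOT [grep options such as --include=*.py]
detect_naming() {
  local root=$1
  shift
  # One identifier-like token per line, file names suppressed
  (cd "$root" && grep -rhoE "$@" -e '[A-Za-z_][A-Za-z0-9_]*' .) | awk '
    /^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$/ { camel++;  next }
    /^[a-z][a-z0-9]*(_[a-z0-9]+)+$/     { snake++;  next }
    /^[A-Z][a-z0-9]+[A-Z][A-Za-z0-9]*$/ { pascal++; next }
    END {
      # Single-word tokens such as "def" match no style and are ignored
      best = "unknown"; max = 0
      if (camel + 0 > max)  { best = "camelCase";  max = camel + 0 }
      if (snake + 0 > max)  { best = "snake_case"; max = snake + 0 }
      if (pascal + 0 > max) { best = "PascalCase"; max = pascal + 0 }
      print best
    }'
}
```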

Step 8: Require Evidence

MUST have ≥2 evidence locations to include a pattern.

Evidence format: "<file-path>:L<line-number>" for a single line, or "<file-path>:L<start>-L<end>" for a range

Examples:

  • "src/middleware/auth.ts:L45"
  • "lib/router/index.js:L120-L145"
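
The ≥2 rule and the evidence format can be enforced before a pattern is written out. A minimal sketch; the regular expression accepts both the single-line and the `L<start>-L<end>` range forms in the examples above, and `has_enough_evidence` is a hypothetical helper.

```bash
# Hypothetical helper: succeed only if at least two arguments are well-formed
# evidence strings ("path:L45" or "path:L120-L145"); warn about the rest.
has_enough_evidence() {
  local valid=0 ev
  for ev in "$@"; do
    if printf '%s\n' "$ev" | grep -Eq '^[^:]+:L[0-9]+(-L[0-9]+)?$'; then
      valid=$((valid + 1))
    else
      echo "WARN: malformed evidence '$ev'" >&2
    fi
  done
  [ "$valid" -ge 2 ]
}
```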

Step 9: Generate Output

Create .github/temp/patterns.json:

```json
{
  "source": {
    "repo": "<repo-name>",
    "language": "<language>",
    "framework": "<framework>"
  },
  "patterns": [
    {
      "name": "<pattern-name>",
      "category": "<architecture|reliability|quality|security|conventions>",
      "confidence": 0.85,
      "scores": {
        "frequency": 0.9,
        "consistency": 0.8,
        "documentation": 0.7,
        "external": 1.0
      },
      "evidence": [
        "<file>:L<line>",
        "<file>:L<line>"
      ],
      "description": "<what the pattern does>",
      "keywords": ["<discovery-keyword>", "<another-keyword>"]
    }
  ],
  "conventions": {
    "naming": "<camelCase|snake_case|PascalCase|kebab-case>",
    "fileStructure": "<by-type|by-feature|flat|monorepo>",
    "importStyle": "<absolute|relative|barrel>",
    "commentStyle": "<jsdoc|docstring|inline|minimal>"
  },
  "metadata": {
    "extracted_at": "<ISO-timestamp>",
    "extractor_version": "1.0.0",
    "files_analyzed": <count>,
    "patterns_found": <count>
  }
}
```

Step 10: Report Completion

```
✅ Pattern extraction complete
   Patterns found: <count>
   Categories: architecture(<n>), reliability(<n>), quality(<n>), security(<n>), conventions(<n>)
   Files analyzed: <count>

   Report: .github/temp/patterns.json
```
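
The "Categories:" line can be filled in by counting `"category"` values in patterns.json. This sketch matches text rather than parsing JSON, so it assumes the file is laid out like the Step 9 template (one `"category"` key per line); `category_summary` is a hypothetical helper.

```bash
# Hypothetical helper: print "architecture(<n>), reliability(<n>), ..." for FILE.
# Counts lines holding a "category" key, so one key per line is assumed.
category_summary() {
  local file=$1 c n line=""
  for c in architecture reliability quality security conventions; do
    # grep -c prints 0 but exits 1 when nothing matches; keep going
    n=$(grep -c "\"category\":[[:space:]]*\"$c\"" "$file" || true)
    line="$line${line:+, }$c($n)"
  done
  printf '%s\n' "$line"
}
```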

Pattern Recognition Guide

Architecture Patterns

| Pattern | Indicators | Example Evidence |
|---------|------------|------------------|
| Middleware Chain | `use()`, `next()`, ordered handlers | Express `app.use()`, Koa middleware |
| MVC | `controllers/`, `models/`, `views/` dirs | Separate concerns by layer |
| Factory | `create*()` functions, `*Factory` classes | Object instantiation abstraction |
| Repository | `*Repository` classes, CRUD methods | Data access abstraction |
| Singleton | `getInstance()`, module-level instance | Single instance pattern |
| Dependency Injection | Constructor injection, `@Inject` | Inversion of control |

Reliability Patterns

| Pattern | Indicators | Example Evidence |
|---------|------------|------------------|
| Error-First Callback | `(err, result) =>` | Node.js callback convention |
| Try-Catch Wrapper | Consistent error boundaries | Centralized error handling |
| Retry Logic | Loop with delay, attempt counter | Network request retry |
| Circuit Breaker | State tracking, failure threshold | Prevent cascade failures |
| Graceful Degradation | Fallback values, default behavior | Service unavailable handling |
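
Step 4 has no grep for the retry, circuit-breaker, or fallback indicators in this table. One possible starting point is sketched below; the keyword list is an assumption rather than an exhaustive set, and `reliability_indicators` is a hypothetical helper that prints hits in the Step 8 evidence format.

```bash
# Hypothetical helper: case-insensitive search for retry, circuit-breaker and
# fallback identifiers under ROOT; prints "<file-path>:L<line-number>".
# Usage: reliability_indicators ROOT [grep options such as --include=*.ts]
reliability_indicators() {
  local root=$1
  shift
  (cd "$root" && grep -rniE "$@" \
      -e 'retr(y|ies)|backoff|circuit.?breaker|failure.?threshold|fallback' .) \
    | sed -E 's#^\./##; s#^([^:]+):([0-9]+):.*#\1:L\2#'
}
```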

Quality Patterns

| Pattern | Indicators | Example Evidence |
|---------|------------|------------------|
| Arrange-Act-Assert | Test structure with sections | Test organization |
| Mock/Stub | `jest.mock()`, `sinon.stub()` | Test isolation |
| Fixture Pattern | Shared test data setup | Test data management |
| Integration Test | Real dependencies, E2E | Full stack testing |

Security Patterns

| Pattern | Indicators | Example Evidence |
|---------|------------|------------------|
| Input Validation | Schema validation, sanitization | Prevent injection |
| Authentication | Login, token verification | Identity verification |
| Authorization | Role checks, permissions | Access control |
| Rate Limiting | Request counting, throttling | Abuse prevention |

Error Handling

| Condition | Action |
|-----------|--------|
| scan-report.json not found | FAIL with an error message |
| No patterns found | Produce an empty patterns array and warn the user |
| File read error | Skip the file, log a warning, continue |
| Confidence below 0.60 | Include, but mark as low-confidence |
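
The 0.60 rule keeps low-scoring patterns in the output but flags them. The Step 9 schema has no field for that flag, so the label text and where it is stored are assumptions; `confidence_label` is a hypothetical helper.

```bash
# Hypothetical helper: classify a confidence score against the 0.60 threshold.
# A score of exactly 0.60 is not "below" the threshold, so it counts as ok.
confidence_label() {
  awk -v s="$1" 'BEGIN {
    if (s < 0.60) print "low-confidence"
    else print "ok"
  }'
}
```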

Examples

Example Output: Express.js API

```json
{
  "source": {
    "repo": "my-api",
    "language": "JavaScript",
    "framework": "Express.js"
  },
  "patterns": [
    {
      "name": "middleware-chain",
      "category": "architecture",
      "confidence": 0.93,
      "scores": {
        "frequency": 1.0,
        "consistency": 0.9,
        "documentation": 0.8,
        "external": 1.0
      },
      "evidence": [
        "src/app.js:L15",
        "src/app.js:L22",
        "src/middleware/auth.js:L8"
      ],
      "description": "Request flows through ordered middleware functions with next() chaining",
      "keywords": ["middleware", "request processing", "express", "chain"]
    },
    {
      "name": "error-first-callback",
      "category": "reliability",
      "confidence": 0.77,
      "scores": {
        "frequency": 0.7,
        "consistency": 0.8,
        "documentation": 0.6,
        "external": 1.0
      },
      "evidence": [
        "src/services/db.js:L34",
        "src/services/file.js:L12"
      ],
      "description": "Callbacks receive error as first parameter for consistent error handling",
      "keywords": ["callback", "error handling", "async", "node"]
    }
  ],
  "conventions": {
    "naming": "camelCase",
    "fileStructure": "by-type",
    "importStyle": "relative",
    "commentStyle": "jsdoc"
  },
  "metadata": {
    "extracted_at": "2026-01-25T11:00:00Z",
    "extractor_version": "1.0.0",
    "files_analyzed": 18,
    "patterns_found": 6
  }
}
```

Conventions

Do:

  • Require ≥2 evidence locations for each pattern
  • Calculate all four confidence components
  • Include keywords for pattern discovery
  • Use relative file paths in evidence

Don't:

  • Include patterns with only 1 evidence location
  • Accept confidence scores outside 0.0-1.0
  • Include repo-specific variable names in descriptions
  • Skip the conventions section

Related Skills