
repo-scanner

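Deep-scan a repository's structure to extract metadata, detect frameworks, and identify domain boundaries. Use this guide when starting assimilation, analyzing a new codebase, or gathering project context.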

SKILL.md
---
name: repo-scanner
description: Deep scan repository structure to extract metadata, detect frameworks, and identify domain boundaries. Use when starting assimilation, analyzing a new codebase, or gathering project context.
metadata:
  phase: 1
  pipeline: assimilation
  version: 1.0.0
---

Repo Scanner

Overview

Scan a repository's structure to produce a comprehensive report without reading all source code. This is phase 1 of the assimilation pipeline: it gathers the metadata that subsequent phases consume.


When to Use

Use this skill when:

  • Starting assimilation of a new repository
  • Analyzing an unfamiliar codebase
  • Gathering project context before making changes
  • Detecting frameworks and tooling

Output

File: .github/temp/scan-report.json

The scan produces a structured report consumed by pattern-extractor (phase 2).


Execution Steps

Step 1: Create Temp Directory

bash
mkdir -p .github/temp

Step 2: Apply Ignore Patterns

MUST skip these directories entirely:

code
node_modules/    vendor/         .venv/          __pycache__/
dist/            build/          .next/          target/
coverage/        .git/           out/            .cache/
.turbo/          .parcel-cache/  .nuxt/          .output/

Record that these lockfiles exist, but do not analyze their content:

code
package-lock.json    yarn.lock       pnpm-lock.yaml
Cargo.lock           poetry.lock     go.sum
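The two lists above can be applied in a single directory walk. The following is a minimal sketch, not the skill's own implementation; `walk_repo`, `IGNORED_DIRS`, and `LOCKFILES` are hypothetical names introduced for illustration.

```python
import os
from pathlib import Path

# Directory names from the skip list above (trailing slashes dropped).
IGNORED_DIRS = {
    "node_modules", "vendor", ".venv", "__pycache__",
    "dist", "build", ".next", "target",
    "coverage", ".git", "out", ".cache",
    ".turbo", ".parcel-cache", ".nuxt", ".output",
}

# Lockfiles whose presence is recorded but whose content is never read.
LOCKFILES = {
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml",
    "Cargo.lock", "poetry.lock", "go.sum",
}


def walk_repo(root):
    """Return (files_to_analyze, lockfiles_found) as sorted relative paths."""
    root = Path(root)
    files, lockfiles = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            rel = (Path(dirpath) / name).relative_to(root).as_posix()
            (lockfiles if name in LOCKFILES else files).append(rel)
    return sorted(files), sorted(lockfiles)
```

Pruning during the walk, rather than filtering afterwards, means large trees such as node_modules/ are never traversed at all.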

Step 3: Prioritize Files

Read files in this order (most important first):

| Priority | File Types | Examples | Why |
| --- | --- | --- | --- |
| 1 | Config files | package.json, tsconfig.json, pyproject.toml, Cargo.toml, go.mod | Define project type |
| 2 | Entry points | main.ts, index.js, app.py, main.go, lib.rs, src/index.* | Core application logic |
| 3 | README | README.md, README.rst, README.txt | Project documentation |
| 4 | Shallow files | Files at depth 1-2 from root | High-level structure |
| 5 | Source files | src/**/*.{ts,js,py,go,rs} (skip test, spec) | Implementation |

File size limit: SHOULD skip files >10KB (likely generated or data files)
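One way to turn the priority table and the size limit into a reading order is sketched below. It rests on several assumptions that the text does not spell out: "depth 1-2" is read as a path with at most two components, ">10KB" as more than 10 * 1024 bytes, and test files are recognized by a simple name heuristic. All function and constant names are hypothetical.

```python
from pathlib import PurePosixPath

MAX_FILE_BYTES = 10 * 1024  # assumption: ">10KB" means more than 10 * 1024 bytes

CONFIG_FILES = {"package.json", "tsconfig.json", "pyproject.toml", "Cargo.toml", "go.mod"}
ENTRY_POINTS = {"main.ts", "index.js", "app.py", "main.go", "lib.rs"}
READMES = {"README.md", "README.rst", "README.txt"}
SOURCE_EXTS = {".ts", ".js", ".py", ".go", ".rs"}


def looks_like_test(name):
    """Heuristic only: matches names such as foo.test.ts, foo.spec.js, test_foo.py."""
    return name.startswith("test_") or any(
        tag in name for tag in (".test.", ".spec.", "_test.")
    )


def priority(rel_path):
    """Return 1 (read first) to 5, or None for files outside the table."""
    p = PurePosixPath(rel_path)
    if p.name in CONFIG_FILES:
        return 1
    if p.name in ENTRY_POINTS or (p.parent.as_posix() == "src" and p.stem == "index"):
        return 2
    if p.name in READMES:
        return 3
    if len(p.parts) <= 2:
        return 4
    if p.parts[0] == "src" and p.suffix in SOURCE_EXTS and not looks_like_test(p.name):
        return 5
    return None


def reading_order(files_with_sizes):
    """Sort (path, size) pairs by priority, dropping oversized and unranked files."""
    ranked = [
        (priority(path), path)
        for path, size in files_with_sizes
        if size <= MAX_FILE_BYTES and priority(path) is not None
    ]
    return [path for _, path in sorted(ranked)]
```

Files dropped by the size check would be the ones counted in files_skipped in the final report.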

Step 4: Detect Language

Check config files to determine primary language:

| Config File | Language |
| --- | --- |
| package.json | JavaScript/TypeScript |
| tsconfig.json | TypeScript |
| pyproject.toml, setup.py, requirements.txt | Python |
| Cargo.toml | Rust |
| go.mod | Go |
| pom.xml, build.gradle | Java |
| Gemfile | Ruby |
| composer.json | PHP |

Rule: Prefer config-based detection over file extension counting.
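Config-based detection can be sketched as an ordered lookup. The table does not say which language wins when several config files are present, so the ordering below is an assumption: tsconfig.json is checked before package.json, and the first match is returned. `detect_language` is a hypothetical name.

```python
from pathlib import Path

# Checked in order. tsconfig.json comes before package.json so that a
# TypeScript project is not reported as plain JavaScript (an assumption;
# the table does not define precedence).
LANGUAGE_BY_CONFIG = [
    ("tsconfig.json", "TypeScript"),
    ("package.json", "JavaScript"),
    ("pyproject.toml", "Python"),
    ("setup.py", "Python"),
    ("requirements.txt", "Python"),
    ("Cargo.toml", "Rust"),
    ("go.mod", "Go"),
    ("pom.xml", "Java"),
    ("build.gradle", "Java"),
    ("Gemfile", "Ruby"),
    ("composer.json", "PHP"),
]


def detect_language(root):
    """Return the language of the first config file found in the repo root."""
    root = Path(root)
    for filename, language in LANGUAGE_BY_CONFIG:
        if (root / filename).is_file():
            return language
    # No config file: the caller falls back to file extension analysis.
    return "Unknown"
```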

Step 5: Detect Framework

Check dependencies in config files:

JavaScript/TypeScript (package.json)

| Check dependencies for | Framework |
| --- | --- |
| next | Next.js |
| react | React |
| vue | Vue.js |
| @angular/core | Angular |
| express | Express.js |
| fastify | Fastify |
| @nestjs/core | NestJS |
| hono | Hono |
| koa | Koa |

Python (pyproject.toml / requirements.txt)

| Check for | Framework |
| --- | --- |
| django | Django |
| fastapi | FastAPI |
| flask | Flask |
| starlette | Starlette |
| tornado | Tornado |

Rust (Cargo.toml)

| Check dependencies for | Framework |
| --- | --- |
| actix-web | Actix |
| axum | Axum |
| rocket | Rocket |
| warp | Warp |

Go (go.mod)

| Check for | Framework |
| --- | --- |
| github.com/gin-gonic/gin | Gin |
| github.com/labstack/echo | Echo |
| github.com/gofiber/fiber | Fiber |
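For the package.json case, the lookup might look like the sketch below. Two choices here are assumptions rather than rules from this skill: the table order is treated as a priority order (so a Next.js project, which also depends on react, is reported as Next.js), and devDependencies are checked alongside dependencies. `detect_js_framework` is a hypothetical name; the other ecosystems would follow the same pattern with their own manifest parsers.

```python
import json

# (dependency name, framework label) in table order.
JS_FRAMEWORKS = [
    ("next", "Next.js"),
    ("react", "React"),
    ("vue", "Vue.js"),
    ("@angular/core", "Angular"),
    ("express", "Express.js"),
    ("fastify", "Fastify"),
    ("@nestjs/core", "NestJS"),
    ("hono", "Hono"),
    ("koa", "Koa"),
]


def detect_js_framework(package_json_text):
    """Return the first matching framework label, or None if nothing matches."""
    manifest = json.loads(package_json_text)
    # Assumption: devDependencies count too; drop that line to check
    # runtime dependencies only.
    deps = {
        **manifest.get("dependencies", {}),
        **manifest.get("devDependencies", {}),
    }
    for dep, framework in JS_FRAMEWORKS:
        if dep in deps:
            return framework
    return None
```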

Step 6: Detect Project Type

Infer project type from structure and config:

| Indicators | Type |
| --- | --- |
| bin in package.json, CLI-related deps | cli |
| main field, no bin, library deps | library |
| Framework detected (Next, Express, etc.) | web-app |
| Only test files, no src | test-suite |
| @types/* only, .d.ts files | types |
| Unclear | application |
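The table lists indicators but not how to resolve conflicts between them, so the decision chain below is only one possible reading. The precedence (bin first, then framework, then types, test-suite, library) and the simplified indicator checks are assumptions, and `detect_project_type` is a hypothetical name.

```python
def detect_project_type(manifest, framework, has_src, has_tests, only_type_decls=False):
    """Infer the project type from a parsed package.json and structural flags.

    manifest:        parsed package.json as a dict (empty dict if absent)
    framework:       result of framework detection, or None
    has_src:         True if the repo contains non-test source files
    has_tests:       True if the repo contains test files
    only_type_decls: True if the repo holds only .d.ts files / @types deps
    """
    if manifest.get("bin"):
        return "cli"
    if framework is not None:
        return "web-app"
    if only_type_decls:
        return "types"
    if has_tests and not has_src:
        return "test-suite"
    if manifest.get("main"):
        return "library"
    # Nothing conclusive: fall back to the catch-all type.
    return "application"
```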

Step 7: Identify Domain Boundaries

Known domain patterns:

code
api, auth, backend, frontend, core, common, shared,
services, handlers, controllers, models, views, routes,
components, hooks, utils, lib, pkg, internal, cmd,
modules, features, domains, entities, repositories

Algorithm:

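  1. List all top-level directories, plus the directories directly under a source root such as src/ (the examples in this document report domains like src/routes)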
  2. For each directory:
    • Count files (excluding test files)
    • Check if name matches known pattern
  3. Mark as domain if: matches_pattern AND file_count > 5
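The algorithm above can be sketched as follows. The sketch counts files recursively and uses a simple name heuristic for test files; both are assumptions, as are the names `find_domains`, `is_test_file`, and the `base` parameter, which lets the same function inspect either the repository root or a subdirectory such as src. For brevity it does not apply the Step 2 ignore list.

```python
from pathlib import Path

KNOWN_DOMAINS = {
    "api", "auth", "backend", "frontend", "core", "common", "shared",
    "services", "handlers", "controllers", "models", "views", "routes",
    "components", "hooks", "utils", "lib", "pkg", "internal", "cmd",
    "modules", "features", "domains", "entities", "repositories",
}
MIN_FILES = 5  # a domain needs strictly more than this many files


def is_test_file(name):
    """Heuristic only: matches names such as foo.test.ts, foo.spec.js, test_foo.py."""
    lowered = name.lower()
    return lowered.startswith("test_") or any(
        tag in lowered for tag in (".test.", ".spec.", "_test.")
    )


def find_domains(root, base="."):
    """Return domain entries for the directories directly under root/base."""
    root = Path(root)
    domains = []
    for entry in sorted((root / base).iterdir()):
        if not entry.is_dir() or entry.name not in KNOWN_DOMAINS:
            continue
        count = sum(
            1 for f in entry.rglob("*") if f.is_file() and not is_test_file(f.name)
        )
        if count > MIN_FILES:
            domains.append({
                "name": entry.name,
                "path": entry.relative_to(root).as_posix(),
                "files": count,
            })
    return domains
```

The returned dictionaries use the same keys as the domains entries in the report, with paths relative to the repository root.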

Step 8: Detect Tooling

Check package.json scripts or config files:

| Tool Type | How to Detect |
| --- | --- |
| Test | scripts.test, jest.config, vitest.config, pytest.ini |
| Lint | scripts.lint, .eslintrc, .pylintrc |
| Build | scripts.build, webpack.config, vite.config, tsconfig.json |
| Format | scripts.format, .prettierrc, .editorconfig, rustfmt.toml |
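A sketch of tooling detection for a repository root follows. It keeps commands and config evidence separate, because a config file alone does not reveal the command to run. Assumptions: npm is the package manager, config files are matched by filename prefix, and rustfmt.toml is grouped under format because rustfmt is a formatter. `detect_tools` and `CONFIG_PREFIXES` are hypothetical names.

```python
import json
from pathlib import Path

# Filename prefixes that hint at a tool being configured.
CONFIG_PREFIXES = {
    "test": ("jest.config", "vitest.config", "pytest.ini"),
    "lint": (".eslintrc", ".pylintrc"),
    "build": ("webpack.config", "vite.config", "tsconfig.json"),
    "format": (".prettierrc", ".editorconfig", "rustfmt.toml"),
}


def detect_tools(root):
    """Map each tool type to its npm command (if scripted) and matching configs."""
    root = Path(root)
    names = sorted(p.name for p in root.iterdir() if p.is_file())

    scripts = {}
    manifest = root / "package.json"
    if manifest.is_file():
        scripts = json.loads(manifest.read_text(encoding="utf-8")).get("scripts", {})

    tools = {}
    for kind, prefixes in CONFIG_PREFIXES.items():
        command = None
        if kind in scripts:
            # Assumption: npm; other package managers would use their own runner.
            command = "npm test" if kind == "test" else f"npm run {kind}"
        tools[kind] = {
            "command": command,
            "configs": [n for n in names if n.startswith(prefixes)],
        }
    return tools
```

Only the command value maps onto the tools section of the report; the configs list is extra evidence that a caller may use or discard.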

Step 9: Generate Report

Create .github/temp/scan-report.json:

json
{
  "name": "<repo-name>",
  "type": "<cli|library|web-app|application|types|test-suite>",
  "language": "<JavaScript|TypeScript|Python|Rust|Go|Java|Ruby|PHP|Unknown>",
  "framework": "<detected-framework-or-null>",
  "structure": {
    "type": "<flat|nested|monorepo>",
    "depth": <max-directory-depth>,
    "mainDirs": ["<top-level-dirs>"],
    "entryPoints": ["<detected-entry-files>"]
  },
  "domains": [
    {
      "name": "<domain-name>",
      "path": "<relative-path>",
      "files": <file-count>
    }
  ],
  "tools": {
    "test": "<test-command-or-null>",
    "lint": "<lint-command-or-null>",
    "build": "<build-command-or-null>",
    "format": "<format-command-or-null>"
  },
  "docs": ["<doc-files-found>"],
  "metadata": {
    "scanned_at": "<ISO-timestamp>",
    "scanner_version": "1.0.0",
    "files_analyzed": <count>,
    "files_skipped": <count>
  }
}
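Writing the report could look like the sketch below, which stamps the metadata block and serializes the result to the path given in this step. `write_report` is a hypothetical helper; it assumes the caller has already assembled the other top-level keys.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

SCANNER_VERSION = "1.0.0"


def write_report(root, report, files_analyzed, files_skipped):
    """Add the metadata block and write .github/temp/scan-report.json under root."""
    out_dir = Path(root) / ".github" / "temp"
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as Step 1

    full_report = dict(report)
    full_report["metadata"] = {
        # UTC timestamp in ISO 8601 form, e.g. 2026-01-25T10:30:00Z
        "scanned_at": datetime.now(timezone.utc)
        .isoformat(timespec="seconds")
        .replace("+00:00", "Z"),
        "scanner_version": SCANNER_VERSION,
        "files_analyzed": files_analyzed,
        "files_skipped": files_skipped,
    }

    path = out_dir / "scan-report.json"
    path.write_text(json.dumps(full_report, indent=2) + "\n", encoding="utf-8")
    return path
```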

Step 10: Report Completion

After generating the report, output:

code
✅ Scan complete: <repo-name>
   Language: <language>
   Framework: <framework>
   Type: <type>
   Domains: <count>
   Files analyzed: <count>
   
   Report: .github/temp/scan-report.json
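The completion message can be rendered straight from the report dictionary. In this sketch a null framework is printed as "none", which is an assumption, and `completion_message` is a hypothetical name.

```python
def completion_message(report, report_path=".github/temp/scan-report.json"):
    """Format the completion summary from a finished scan report."""
    framework = report["framework"] if report["framework"] is not None else "none"
    lines = [
        f"✅ Scan complete: {report['name']}",
        f"   Language: {report['language']}",
        f"   Framework: {framework}",
        f"   Type: {report['type']}",
        f"   Domains: {len(report['domains'])}",
        f"   Files analyzed: {report['metadata']['files_analyzed']}",
        "",
        f"   Report: {report_path}",
    ]
    return "\n".join(lines)
```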

Error Handling

| Condition | Action |
| --- | --- |
| No config files found | Set language="Unknown", continue with file extension analysis |
| Directory is empty | FAIL with error: "Empty directory, nothing to scan" |
| Permission denied | Skip file, log warning, continue |
| File too large (>10KB) | Skip file, increment files_skipped |

On failure, produce error report:

json
{
  "error": true,
  "message": "<error-description>",
  "phase": "repo-scanner",
  "timestamp": "<ISO-timestamp>"
}
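The fatal case from the table (an empty directory) and the error report shape can be sketched together. Where the error report is written is not stated here, so this sketch only builds the dictionary and leaves output to the caller; `error_report` and `check_scannable` are hypothetical names.

```python
from datetime import datetime, timezone
from pathlib import Path


def error_report(message):
    """Build the error report structure shown above."""
    return {
        "error": True,
        "message": message,
        "phase": "repo-scanner",
        "timestamp": datetime.now(timezone.utc)
        .isoformat(timespec="seconds")
        .replace("+00:00", "Z"),
    }


def check_scannable(root):
    """Return an error report if root is empty, otherwise None."""
    if not any(Path(root).iterdir()):
        return error_report("Empty directory, nothing to scan")
    return None
```

Returning a report instead of raising keeps the "never fail silently" rule easy to satisfy: the caller always has something to serialize.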

Examples

Example 1: Scanning an Express.js Project

Input: Repository with package.json containing express dependency

Output:

json
{
  "name": "my-api",
  "type": "web-app",
  "language": "JavaScript",
  "framework": "Express.js",
  "structure": {
    "type": "nested",
    "depth": 4,
    "mainDirs": ["src", "tests", "docs"],
    "entryPoints": ["src/index.js", "src/app.js"]
  },
  "domains": [
    { "name": "routes", "path": "src/routes", "files": 8 },
    { "name": "controllers", "path": "src/controllers", "files": 6 }
  ],
  "tools": {
    "test": "npm test",
    "lint": "npm run lint",
    "build": null,
    "format": null
  },
  "docs": ["README.md", "API.md"],
  "metadata": {
    "scanned_at": "2026-01-25T10:30:00Z",
    "scanner_version": "1.0.0",
    "files_analyzed": 45,
    "files_skipped": 3
  }
}

Example 2: Scanning a Python CLI

Input: Repository with pyproject.toml and click dependency

Output:

json
{
  "name": "my-cli",
  "type": "cli",
  "language": "Python",
  "framework": null,
  "structure": {
    "type": "flat",
    "depth": 2,
    "mainDirs": ["src", "tests"],
    "entryPoints": ["src/main.py", "src/cli.py"]
  },
  "domains": [
  ],
  "tools": {
    "test": "pytest",
    "lint": "ruff check .",
    "build": "python -m build",
    "format": null
  },
  "docs": ["README.md"],
  "metadata": {
    "scanned_at": "2026-01-25T10:35:00Z",
    "scanner_version": "1.0.0",
    "files_analyzed": 18,
    "files_skipped": 0
  }
}

Conventions

Do:

  • Check config files before inferring from extensions
  • Skip large files to avoid wasting context
  • Include metadata for debugging
  • Report both analyzed and skipped counts

Don't:

  • Read file contents of all source files (that's pattern-extractor's job)
  • Analyze node_modules, vendor, or other dependency directories
  • Fail silently; always produce a report or an error report
  • Include absolute paths in the report (use relative paths)

Related Skills