Codebase Recon
Build context fast using scripts. Read files only after scripts tell you WHERE to look.
Core Principle
Scripts extract structure. LLMs interpret meaning.
Wrong approach: Read files → Hope to find what you need → Miss things → Waste tokens
Right approach: Run targeted scripts → Get structured data → Read only relevant files → Complete picture
When to Use This Skill
Trigger recon when:
- Starting work on an unfamiliar codebase
- Planning a feature that touches multiple areas
- Debugging and need to trace data flow
- Asked "how does X work" or "where is Y used"
- Need to understand dependencies or impact of changes
The Toolbelt
Scripts organized by what you're trying to learn. Run the script, interpret the output, THEN decide what files to read.
1. PROJECT DISCOVERY — What kind of project is this?
Detect project type and structure:
```bash
# List manifest and config files (2 levels deep, skipping node_modules)
find . -maxdepth 2 -type f \( -name "*.json" -o -name "*.toml" -o -name "*.yaml" -o -name "*.yml" -o -name "Makefile" -o -name "Dockerfile" -o -name "*.mod" -o -name "*.lock" \) -not -path "*/node_modules/*" 2>/dev/null | head -30
# Quick file type census (the parentheses make -type f and the excludes apply to every -name)
find . -type f \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" -o -name "*.py" -o -name "*.go" -o -name "*.rs" -o -name "*.java" \) -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | sed 's/.*\.//' | sort | uniq -c | sort -rn
# Size of codebase (line count per extension; -exec ... + avoids the split totals of xargs wc)
for ext in ts tsx py go; do
  printf '%s\t' "$ext"
  find . -type f -name "*.$ext" -not -path "*/node_modules/*" -not -path "*/.git/*" -exec cat {} + 2>/dev/null | wc -l
done
```
Interpret: This tells you the tech stack, project structure, and scale before reading any code.
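To turn the manifest listing into a quick stack label, here is a minimal sketch. The function name `detect_stack` and the manifest-to-label mapping are assumptions for illustration, not an exhaustive detector, and it only checks the top level of the given directory.

```bash
# detect_stack (hypothetical helper): print one label per manifest found at the
# top level of DIR (default: current directory). The mapping is illustrative.
detect_stack() {
  local dir="${1:-.}"
  [ -f "$dir/package.json" ] && echo "node"
  [ -f "$dir/tsconfig.json" ] && echo "typescript"
  { [ -f "$dir/pyproject.toml" ] || [ -f "$dir/requirements.txt" ]; } && echo "python"
  [ -f "$dir/go.mod" ] && echo "go"
  [ -f "$dir/Cargo.toml" ] && echo "rust"
  { [ -f "$dir/pom.xml" ] || [ -f "$dir/build.gradle" ]; } && echo "jvm"
  return 0   # a missing manifest is not an error
}
```

Run `detect_stack .` from the repo root. Several labels at once (for example `node` and `python`) usually point to a polyglot repo or a monorepo.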
2. ENTRY POINTS — Where does execution start?
Find main entry points:
```bash
# package.json scripts (JS/TS)
grep -A 20 '"scripts"' package.json 2>/dev/null
# Main files
find . -maxdepth 3 -type f \( -name "main.*" -o -name "index.*" -o -name "app.*" -o -name "server.*" \) -not -path "*/node_modules/*" 2>/dev/null
# Python entry points (parenthesized so -type f applies to every -name)
find . -maxdepth 3 -type f \( -name "__main__.py" -o -name "main.py" -o -name "app.py" -o -name "wsgi.py" \) 2>/dev/null
# Go entry points
grep -rl "func main()" --include="*.go" . 2>/dev/null | head -10
```
Interpret: These are the files to read FIRST — they show the application's skeleton.
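The entry-point checks can be folded into one reusable function that prints a single sorted list. This is a sketch using the same filename heuristics; `find_entry_points` is a hypothetical helper, and the `vendor` exclusion assumes Go's conventional vendor directory.

```bash
# find_entry_points (hypothetical helper): collect likely entry files into one
# sorted list, skipping node_modules, .git and Go's vendor directory.
find_entry_points() {
  local root="${1:-.}"
  {
    # Generic main/index/app/server files plus Python-specific entry files
    find "$root" -maxdepth 3 -type f \
      \( -name 'main.*' -o -name 'index.*' -o -name 'app.*' -o -name 'server.*' \
         -o -name '__main__.py' -o -name 'wsgi.py' \) \
      -not -path '*/node_modules/*' -not -path '*/.git/*'
    # Go: any file declaring func main()
    grep -rl --include='*.go' --exclude-dir=vendor 'func main()' "$root"
  } 2>/dev/null | sort -u
}
```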
3. DEPENDENCY MAP — What depends on what?
External dependencies:
```bash
# JS/TS - direct dependencies (from "dependencies" to its closing brace; works whatever key follows it)
sed -n '/"dependencies"/,/}/p' package.json 2>/dev/null
# Python - requirements.txt, else PEP 621 ("dependencies = [...]" under [project]) or Poetry in pyproject.toml
cat requirements.txt 2>/dev/null || grep -A 30 -E '^dependencies *=|^\[tool\.poetry\.dependencies\]' pyproject.toml 2>/dev/null
# Go modules
grep -v "^//" go.mod 2>/dev/null
```
Internal import graph (who imports whom):
```bash
# JS/TS - find all imports of a specific module
grep -rn --include="*.ts" --include="*.tsx" --exclude-dir=node_modules "from ['\"].*modulename" . 2>/dev/null
# Most-imported relative specifiers (hot files); counts the literal string, so "./utils" and "../utils" are tallied separately
grep -roh --include="*.ts" --include="*.tsx" --exclude-dir=node_modules "from ['\"]\..*['\"]" . 2>/dev/null | sort | uniq -c | sort -rn | head -20
# Python imports (skip virtualenvs, which hold third-party .py files)
grep -rn --include="*.py" --exclude-dir=.venv --exclude-dir=venv "^from \|^import " . 2>/dev/null | head -30
```
Interpret: The most-imported files are usually the core abstractions (or shared utilities). Read those before anything else.
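Counting raw specifiers splits one module across several strings: `./utils` from `src/` and `../utils` from `src/api/` are the same file. Below is a sketch that resolves each relative specifier against the importing file's directory before counting. It assumes it is run from the repo root. `top_imports` is a hypothetical name, and path aliases such as `@/lib` are not resolved.

```bash
# top_imports (hypothetical helper): rank relative TS/TSX imports by the module
# path they resolve to. Run from the repo root so output paths are readable.
# Not covered: path aliases ("@/..."), side-effect imports ("import './x'").
top_imports() {
  local root="${1:-.}"
  grep -rnoE --include='*.ts' --include='*.tsx' --exclude-dir=node_modules \
      "from ['\"]\.{1,2}/[^'\"]+['\"]" "$root" 2>/dev/null |
  awk -F: '
    # Collapse "." and ".." segments purely as strings (no filesystem access).
    function norm(p,    n, parts, out, i, k, res) {
      n = split(p, parts, "/"); k = 0
      for (i = 1; i <= n; i++) {
        if (parts[i] == "" || parts[i] == ".") continue
        if (parts[i] == ".." && k > 0 && out[k] != "..") { k--; continue }
        out[++k] = parts[i]
      }
      res = ""
      for (i = 1; i <= k; i++) res = res (i > 1 ? "/" : "") out[i]
      return res
    }
    {
      # Each grep line is file:line:match, e.g. ./src/a.ts:3:from "./utils"
      hit  = substr($0, length($1) + length($2) + 3)
      spec = substr(hit, 7, length(hit) - 7)     # strip the from-prefix and both quotes
      dir  = $1; sub("/[^/]*$", "", dir)         # directory of the importing file
      print norm(dir "/" spec)
    }' | sort | uniq -c | sort -rn | head -20
}
```

Resolving the paths as strings (rather than with `realpath`) keeps the sketch portable between GNU and BSD systems.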
4. SYMBOL SEARCH — Where is X defined? Where is X used?
Find definition of a function/class/type:
```bash
# Find function definitions (JS/TS)
grep -rn --include="*.ts" --include="*.tsx" --exclude-dir=node_modules "function FUNCNAME\|const FUNCNAME\|export.*FUNCNAME\|class FUNCNAME" . 2>/dev/null
# Find function definitions (Python)
grep -rn --include="*.py" "def FUNCNAME\|class FUNCNAME" . 2>/dev/null
# Find all usages of a symbol (-w: whole word, so getUser does not match getUserById)
grep -rnw --include="*.ts" --include="*.tsx" --include="*.py" --exclude-dir=node_modules "SYMBOLNAME" . 2>/dev/null
```
Find type/interface definitions (TypeScript):
```bash
grep -rn "^export type\|^export interface\|^type \|^interface " --include="*.ts" --include="*.tsx" . 2>/dev/null | grep -v node_modules | head -30
```
Interpret: Now you know exactly which file(s) to read for a specific symbol.
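When a symbol has many usages, ranking files by how often they mention it tells you which file to open first. Here is a sketch. `usage_by_file` is a hypothetical helper, and `grep -c` counts matching lines, not individual occurrences.

```bash
# usage_by_file (hypothetical helper): count lines mentioning SYMBOL as a whole
# word (fixed string, -F) per file and list the heaviest users first.
usage_by_file() {
  local symbol="$1" root="${2:-.}"
  grep -rcwF --include='*.ts' --include='*.tsx' --include='*.py' \
       --exclude-dir=node_modules -- "$symbol" "$root" 2>/dev/null |
    grep -v ':0$' |          # drop files with no matches
    sort -t: -k2,2nr         # highest count first
}
```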
5. API SURFACE — What endpoints/routes exist?
REST endpoints:
```bash
# Express.js routes
grep -rn --include="*.ts" --include="*.js" --exclude-dir=node_modules "app\.\(get\|post\|put\|delete\|patch\)\|router\.\(get\|post\|put\|delete\|patch\)" . 2>/dev/null
# FastAPI/Flask routes (Python)
grep -rn --include="*.py" "@app\.\(get\|post\|put\|delete\|patch\|route\)\|@router\." . 2>/dev/null
# Next.js API routes (file-based)
find . -path "*/api/*" \( -name "*.ts" -o -name "*.js" \) -not -path "*/node_modules/*" 2>/dev/null
```
GraphQL:
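```bash
# Find schema definitions (parenthesized so the node_modules exclusion applies to both patterns)
find . \( -name "*.graphql" -o -name "*.gql" \) -not -path "*/node_modules/*" 2>/dev/null | head -10
# Find resolvers
grep -rn --include="*.ts" --include="*.js" --exclude-dir=node_modules "Query:\|Mutation:\|Resolver" . 2>/dev/null | head -20
```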
Interpret: This maps the external interface before you read implementation details.
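The Express grep prints raw source lines. Here is a sketch that reduces them to a `METHOD PATH file:line` table. `list_express_routes` is a hypothetical name, and only routes whose first argument is a quoted string literal are captured. Routes built from variables or template strings are missed.

```bash
# list_express_routes (hypothetical helper): print "METHOD PATH file:line" for
# app./router. calls whose first argument is a quoted string literal.
list_express_routes() {
  local root="${1:-.}"
  grep -rnE --include='*.ts' --include='*.js' --exclude-dir=node_modules \
       "(app|router)\.(get|post|put|delete|patch)\(['\"][^'\"]+" "$root" 2>/dev/null |
    # \1 = file:line, \3 = method, \4 = path literal
    sed -E "s/^([^:]+:[0-9]+):.*(app|router)\.(get|post|put|delete|patch)\(['\"]([^'\"]+).*/\3 \4 \1/" |
    awk '{ $1 = toupper($1); print }'
}
```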
6. DATA LAYER — How is data stored/accessed?
Database schemas:
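```bash
# Find migration directories (parenthesized so -type d applies to both names) and SQL files
find . -type d \( -name "migrations" -o -name "migrate" \) -not -path "*/node_modules/*" 2>/dev/null
find . -name "*.sql" -not -path "*/node_modules/*" 2>/dev/null | head -10
# Prisma schema (node_modules/.prisma may hold a generated copy, so skip it)
find . -name "schema.prisma" -not -path "*/node_modules/*" 2>/dev/null
# SQLAlchemy models (mapped_column is the SQLAlchemy 2.0 style)
grep -rln --include="*.py" "class.*Base\|Column(\|mapped_column(\|relationship(" . 2>/dev/null | head -10
# TypeORM/Sequelize models
grep -rln --include="*.ts" --exclude-dir=node_modules "@Entity\|@Column\|Model.init" . 2>/dev/null | head -10
```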
Interpret: Schema files define the data model — critical context for any data-touching feature.
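For projects with plain SQL migrations, you can extract the table inventory before opening any file. Here is a sketch. `list_tables` is hypothetical, and it only sees `CREATE TABLE` statements, so tables defined only through an ORM still need the model greps.

```bash
# list_tables (hypothetical helper): pull table names out of CREATE TABLE
# statements in .sql files (case-insensitive, optional IF NOT EXISTS and quotes).
list_tables() {
  local root="${1:-.}"
  grep -rhoiE --include='*.sql' --exclude-dir=node_modules \
       'create table( if not exists)? ["`]?[a-z0-9_.]+' "$root" 2>/dev/null |
    awk '{ print $NF }' |    # last word is the (possibly quoted) table name
    tr -d '"`' |
    sort -u
}
```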
7. CHANGE HISTORY — What's been touched recently?
Hot files (most frequently changed):
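```bash
# grep -v drops the blank separator lines that would otherwise top the count
git log --pretty=format: --name-only --since="3 months ago" 2>/dev/null | grep -v '^$' | sort | uniq -c | sort -rn | head -20
```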
Recent changes to a specific area:
```bash
git log --oneline --since="1 month ago" -- "path/to/directory" 2>/dev/null | head -20
```
Who knows this code best:
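```bash
# Pass HEAD explicitly: with no revision, git shortlog reads from stdin when stdin is not a terminal
git shortlog -sn HEAD -- "path/to/file" 2>/dev/null | head -5
```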
Interpret: Hot files often contain bugs or are under active development. Recent changes show current focus areas.
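Churn counts include files that were later deleted or renamed away, which are dead ends when you go to read them. Here is a sketch that filters those out. `hot_files` is a hypothetical helper. Because `git log --name-only` prints repo-relative paths, the helper switches to the repo root first.

```bash
# hot_files (hypothetical helper): rank files by commits touching them since
# SINCE (default "3 months ago"), keeping only files that still exist.
# The ( ... ) body runs in a subshell, so the cd does not leak to the caller.
hot_files() (
  since="${1:-3 months ago}"
  cd "$(git rev-parse --show-toplevel)" || exit 1
  git log --pretty=format: --name-only --since="$since" |
    grep -v '^$' | sort | uniq -c | sort -rn |
    while read -r count path; do
      # Skip paths deleted or renamed since they were changed
      [ -e "$path" ] && printf '%5d %s\n' "$count" "$path"
    done | head -20
)
```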
8. ERROR PATTERNS — Where are errors handled?
Find error handling:
```bash
# Try/catch blocks
grep -rn "try {" --include="*.ts" --include="*.tsx" . 2>/dev/null | grep -v node_modules | wc -l
# Custom error classes
grep -rn "extends Error\|class.*Error" --include="*.ts" --include="*.py" . 2>/dev/null | grep -v node_modules
# Error boundaries (React)
grep -rln "componentDidCatch\|ErrorBoundary" --include="*.tsx" --include="*.jsx" . 2>/dev/null
```
Find logging:
```bash
grep -rn "console\.\(log\|error\|warn\)\|logger\.\|logging\." --include="*.ts" --include="*.py" . 2>/dev/null | grep -v node_modules | head -20
```
Interpret: Error handling patterns show how the codebase expects to fail — useful for debugging.
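Listing custom error classes together with their parents gives a quick picture of the error taxonomy. Here is a sketch assuming error class names end in `Error`. `error_hierarchy` is a hypothetical name, and generic or multi-base declarations are skipped.

```bash
# error_hierarchy (hypothetical helper): print "Child -> Parent" for classes
# whose name ends in Error, covering TS "class X extends Y" and Python "class X(Y)".
error_hierarchy() {
  local root="${1:-.}"
  grep -rhoE --include='*.ts' --include='*.py' --exclude-dir=node_modules \
       'class [A-Za-z0-9_]*Error( extends [A-Za-z0-9_.]+|\([A-Za-z0-9_.]+\))' "$root" 2>/dev/null |
    sed -E 's/^class ([A-Za-z0-9_]+)( extends |\()([A-Za-z0-9_.]+)\)?$/\1 -> \3/' |
    sort -u
}
```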
9. TEST COVERAGE — What's tested?
Find test files:
```bash
find . -name "*.test.ts" -o -name "*.spec.ts" -o -name "test_*.py" -o -name "*_test.py" -o -name "*_test.go" 2>/dev/null | grep -v node_modules | head -20
```
Test to source ratio:
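```bash
# Count test files vs source files (files only; source excludes test names instead of any path containing "test")
echo "Test files:"; find . -type f \( -name "*.test.*" -o -name "*.spec.*" -o -name "test_*" -o -name "*_test.*" \) -not -path "*/node_modules/*" 2>/dev/null | wc -l
echo "Source files:"; find . -type f \( -name "*.ts" -o -name "*.py" \) -not -name "*.test.*" -not -name "*.spec.*" -not -name "test_*" -not -name "*_test.*" -not -path "*/node_modules/*" 2>/dev/null | wc -l
```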
Interpret: Test files show expected behavior and edge cases — often clearer than reading implementation.
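To see which modules lack tests, here is a sketch that pairs `foo.ts` with `foo.test.ts` or `foo.spec.ts` by base name. `untested_ts` is hypothetical, and the pairing is a heuristic. It misses tests named differently from their module, and two modules sharing a base name can mask each other.

```bash
# untested_ts (hypothetical helper): list .ts source files with no test file of
# the same base name (foo.ts vs foo.test.ts / foo.spec.ts) anywhere in the tree.
untested_ts() {
  local root="${1:-.}" tested f
  # Base names that have a test, e.g. src/foo.test.ts -> foo
  tested=$(find "$root" -type f \( -name '*.test.ts' -o -name '*.spec.ts' \) \
             -not -path '*/node_modules/*' 2>/dev/null |
           sed -e 's#.*/##' -e 's/\.test\.ts$//' -e 's/\.spec\.ts$//' | sort -u)
  find "$root" -type f -name '*.ts' \
       -not -name '*.test.ts' -not -name '*.spec.ts' -not -name '*.d.ts' \
       -not -path '*/node_modules/*' 2>/dev/null |
    while IFS= read -r f; do
      printf '%s\n' "$tested" | grep -qxF "$(basename "$f" .ts)" || printf '%s\n' "$f"
    done | sort
}
```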
10. CONFIG & ENVIRONMENT — How is it configured?
Find all config files:
```bash
find . -maxdepth 2 -name "*.config.*" -o -name ".env*" -o -name "*.toml" -o -name "*.yaml" -o -name "*.yml" 2>/dev/null | grep -v node_modules | head -20
```
Environment variables used:
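```bash
# Capture the variable name for os.environ[...], os.getenv(...) and os.environ.get(...), not just the call
grep -rohE --include="*.ts" --include="*.js" --include="*.py" --exclude-dir=node_modules "process\.env\.[A-Z0-9_]+|os\.environ\[[^]]+\]|os\.(getenv|environ\.get)\([^)]*\)" . 2>/dev/null | sort -u
```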
Interpret: Config files reveal deployment modes, feature flags, and integration points.
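A common follow-up is checking which variables the code reads but the sample env file never documents. Here is a sketch assuming the project keeps a `.env.example` file (a convention, not a standard; pass another path as the second argument). `undocumented_env` is a hypothetical helper, and it prints names only, never values.

```bash
# undocumented_env (hypothetical helper): env var names read in TS/JS/Python
# code but missing from the example env file (default: ROOT/.env.example).
undocumented_env() {
  local root="${1:-.}"
  local example="${2:-$root/.env.example}"   # separate statement so $root is already set
  local documented
  # Names declared in the example file (KEY=value or export KEY=value)
  documented=$(sed -e 's/^export //' -e 's/=.*//' "$example" 2>/dev/null)
  grep -rhoE --include='*.ts' --include='*.js' --include='*.py' --exclude-dir=node_modules \
       "process\.env\.[A-Z0-9_]+|os\.(environ\[|getenv\(|environ\.get\()[\"'][A-Z0-9_]+" \
       "$root" 2>/dev/null |
    grep -oE '[A-Z0-9_]+$' | sort -u |       # keep just the variable name
    while IFS= read -r name; do
      printf '%s\n' "$documented" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}
```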
Recon Workflow
When you need codebase context, follow this sequence:
1. Project Discovery (30 seconds)
   - Run project type detection
   - Get file census
   - Understand scale
2. Map the skeleton (1-2 minutes)
   - Find entry points
   - Map dependencies (most-imported files)
   - Identify API surface
3. Targeted deep-dive (as needed)
   - Symbol search for specific functions
   - Read only the files the scripts pointed you to
   - Use git history to understand evolution
4. Build a mental model
   - Entry point → Core abstractions → Data layer → External interfaces
   - Now you understand the codebase without reading every file
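Steps 1 and 2 of this workflow can be run in a single pass. Here is a sketch that only chains commands from the toolbelt. `recon_overview` is a hypothetical name, and its manifest list is trimmed to a few common files.

```bash
# recon_overview (hypothetical helper): print labeled sections for manifests,
# file census, entry points and recent churn. Runs in a subshell, so the cd
# does not affect the caller.
recon_overview() (
  cd "${1:-.}" || exit 1
  echo "== Manifests =="
  find . -maxdepth 2 -type f \( -name 'package.json' -o -name 'pyproject.toml' \
    -o -name 'go.mod' -o -name 'Cargo.toml' -o -name 'requirements.txt' \) \
    -not -path '*/node_modules/*' | sort
  echo "== File census =="
  find . -type f \( -name '*.ts' -o -name '*.tsx' -o -name '*.js' -o -name '*.py' -o -name '*.go' \) \
    -not -path '*/node_modules/*' -not -path '*/.git/*' |
    sed 's/.*\.//' | sort | uniq -c | sort -rn
  echo "== Entry points =="
  find . -maxdepth 3 -type f \( -name 'main.*' -o -name 'index.*' -o -name 'app.*' -o -name 'server.*' \) \
    -not -path '*/node_modules/*' | sort
  echo "== Hot files (3 months) =="
  # Prints nothing outside a git repository
  git log --pretty=format: --name-only --since='3 months ago' 2>/dev/null |
    grep -v '^$' | sort | uniq -c | sort -rn | head -10
)
```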
Script Selection Guide
| I need to understand... | Run these scripts |
|---|---|
| What kind of project this is | Project Discovery |
| Where execution starts | Entry Points |
| The core abstractions | Dependency Map (most-imported) |
| Where a function is defined | Symbol Search (definition) |
| What uses a function | Symbol Search (usages) |
| The API surface | API Surface scripts |
| The data model | Data Layer scripts |
| What's actively being worked on | Change History (hot files) |
| How errors are handled | Error Patterns |
| Expected behavior | Test Coverage (read tests) |
Anti-Patterns
| Don't do this | Do this instead |
|---|---|
| Read files hoping to find what you need | Run symbol search, then read specific files |
| Start with implementation details | Start with entry points and work outward |
| Read all files in a directory | Find most-imported files, read those first |
| Guess at project structure | Run project discovery first |
| Ignore test files | Read tests; they document expected behavior |
| Read code without history | Check git log for why the code changed |
Adapting Scripts
The scripts above are templates. Adapt them:
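- Replace file extensions for your stack
- Adjust `grep` patterns for your framework
- Add project-specific patterns (your error classes, your route patterns)
- Combine with `ripgrep` (`rg`) if available: it is much faster and, inside a git repository, respects `.gitignore` by default, so ignored directories such as `node_modules` are skipped without extra flags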
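A small wrapper that prefers ripgrep and falls back to grep keeps the same command usable on machines without `rg`. Here is a sketch. `search` is a hypothetical name, and the two tools' regex dialects (Rust regex vs. POSIX ERE) agree on simple patterns but not on every construct.

```bash
# search (hypothetical helper): search PATTERN under DIR (default ".") with rg
# when installed, else grep -rE with the usual exclusions. Both print
# file:line:text lines.
search() {
  local pattern="$1" dir="${2:-.}"
  if command -v rg >/dev/null 2>&1; then
    rg --no-heading --with-filename -n -e "$pattern" -- "$dir"
  else
    grep -rnE --exclude-dir=node_modules --exclude-dir=.git -e "$pattern" -- "$dir"
  fi
}
```

Usage: `search 'export (function|const) get[A-Z]' src` lists getter-style exports under `src` with either tool.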