Analyze Git Contributions
This skill analyzes a user's git contributions and intelligently groups them by functional components using AI-powered semantic analysis, producing a comprehensive markdown report.
Core Principles
- Semantic Grouping: Group commits by functionality and purpose, not directory structure
- Comprehensive Context: Include commit messages, file changes, statistics, and temporal patterns
- Actionable Output: Generate markdown reports that clearly communicate contribution themes
- Reusability: Work across different repositories and authors without modification
Workflow
Phase 1: Setup and Validation
- Determine repository path (use current directory or prompt for path)
- Validate git repository using scripts
- Detect or prompt for author name/email
- Verify author has commits in repository
Phase 2: Data Collection
Step 1: Count commits
- Run scripts/count-user-commits.sh to get the total commit count
- Store the count for decision logic
Step 2: Decide strategy based on count
If count < 500:
- Proceed with fast batch analysis
- Run fast-extract-commits.sh to gather all commit data in one pass
- Uses cached data if available (subsequent runs ~5 seconds)
If count 500-2000:
- Present user with options using AskUserQuestion:
  - Segmented analysis (Recommended) - Create multiple reports, one per time period (e.g., quarterly)
  - Intelligent sampling - Analyze 200 representative commits
  - Date range filter - Specify a time period (e.g., last 6 months)
  - Full analysis - Process all commits (may take longer)
If count > 2000:
- Inform user: "Found X commits. Please narrow the scope:"
- Options:
  - Segmented analysis (Recommended) - Multiple reports by time period
  - Date range filter - Specify a time period
  - Intelligent sampling - 200 commits across entire timeline
- Do not offer "full analysis" option (would hit token limits)
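The three tiers above reduce to a single comparison on the commit count. A minimal sketch, assuming the count is already available as an integer; the function name and the tier labels are illustrative and do not come from the skill's scripts:

```shell
# Map a commit count to one of the three tiers described above.
# Tier labels are hypothetical names used only for this example.
choose_strategy() {
  local count="$1"
  if [ "$count" -lt 500 ]; then
    echo "fast-batch"        # fewer than 500: analyze everything
  elif [ "$count" -le 2000 ]; then
    echo "user-choice"       # 500-2000: offer all four options
  else
    echo "require-scoping"   # more than 2000: no "full analysis" option
  fi
}

choose_strategy 342
choose_strategy 1379
choose_strategy 3450
```

The boundary values 500 and 2000 both fall in the middle tier, matching the "500-2000" wording.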
Step 3: Collect based on strategy
For full/fast analysis:
# Fast batch extraction with caching
bash ~/.claude/skills/analyze-git-contributions/scripts/fast-extract-commits.sh \
"$REPO_PATH" "$AUTHOR" --use-cache
# Output: JSONL format with all commit data including file changes
# Cached: Second runs complete in ~5 seconds
# First run: ~10-30 seconds depending on commit count
For segmented analysis:
- Run scripts/calculate-time-segments.sh <repo> <author> <target_commits_per_segment>
  - Script calculates date ranges and divides into segments
  - Target: 250-400 commits per segment (default: 300)
  - Returns: List of date ranges with commit counts
- For each segment:
  - Run fast-extract-commits.sh <repo> <author> --since=<start> --until=<end>
  - Analyze commits for that period (Phase 3)
  - Generate separate markdown file: git-contributions-analysis-2024-Q1.md
  - Include header: "Part X of Y - Period: YYYY-MM-DD to YYYY-MM-DD"
- Create index file listing all segments with links
- Each segment is analyzed independently (avoids token limits)
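The per-segment loop can be sketched as follows. This assumes the segment list arrives on stdin as start_date|end_date|commit_count lines (the format documented for calculate-time-segments.sh); the function name is hypothetical, and the sketch only builds the "Part X of Y" headers rather than running the extraction:

```shell
# Print one report header per segment.
# Input (stdin): "start_date|end_date|commit_count" lines.
print_segment_headers() {
  local -a segments=()
  local line
  while IFS= read -r line; do
    if [ -n "$line" ]; then
      segments+=("$line")
    fi
  done

  local total="${#segments[@]}" i start end count
  for i in "${!segments[@]}"; do
    # count is read so the third field does not leak into "end"
    IFS='|' read -r start end count <<< "${segments[$i]}"
    echo "Part $((i + 1)) of $total - Period: $start to $end"
  done
}

printf '2024-04-14|2025-04-13|398\n2025-04-14|2025-08-03|366\n' | print_segment_headers
```

All segments are read before any header is printed because the total (Y) is not known until the input ends.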
For intelligent sampling:
- Run scripts/sample-commits.sh <repo> <author> 200 to get commit hashes
- Then use fast-extract-commits.sh to extract details for sampled commits
- Add note in report: "Analysis based on 200 representative commits sampled across YYYY-MM-DD to YYYY-MM-DD"
- Proceed to Phase 3 with sampled dataset
For date range filter:
- Run fast-extract-commits.sh <repo> <author> --since=<date> --until=<date>
- Process filtered commits
- Add note in report: "Analysis for period: YYYY-MM-DD to YYYY-MM-DD (X commits out of Y total)"
- Proceed to Phase 3 with filtered dataset
Step 4: Token budget check (safety)
- Before processing commit details, estimate token usage
- Rule of thumb: ~65 tokens per commit on average
- If estimated tokens > 20,000, reduce sample size or warn user
- This prevents edge cases (commits with huge diffs)
- For segmented analysis, each segment is checked independently
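The safety check is plain arithmetic on the commit count. A minimal sketch using the two numbers from the list above (65 tokens per commit, 20,000-token threshold); the variable and function names are illustrative:

```shell
TOKENS_PER_COMMIT=65   # rule of thumb from the text
TOKEN_LIMIT=20000      # threshold from the text

# Estimated tokens for a given number of commits.
estimate_tokens() {
  echo $(( $1 * TOKENS_PER_COMMIT ))
}

# Succeeds (exit status 0) when the estimate is within the threshold.
within_budget() {
  [ "$(estimate_tokens "$1")" -le "$TOKEN_LIMIT" ]
}

for n in 200 300 400; do
  if within_budget "$n"; then
    echo "$n commits: ~$(estimate_tokens "$n") tokens, OK"
  else
    echo "$n commits: ~$(estimate_tokens "$n") tokens, reduce sample size or warn user"
  fi
done
```

An average-based estimate cannot catch a single commit with an unusually large diff, so treat the result as a first filter rather than a guarantee.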
Phase 3: AI Analysis
- Analyze commit semantics: Read commit messages, file changes, and timing patterns
- Identify functional components: Group commits by related functionality
  - Examples: "Authentication System", "Payment Processing", "UI/UX Improvements", "Bug Fixes", "Database Schema", "API Endpoints", "Test Coverage"
  - Look beyond directory structure - related changes across commits indicate a component
  - Consider temporal relationships (commits close in time often relate to same feature)
  - Examine file patterns (same files modified = related work)
- Generate component descriptions: Summarize each component's purpose and impact
- Identify contribution themes: High-level overview of major work areas
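The temporal-relationship hint can be turned into a mechanical pre-grouping step before semantic analysis. The sketch below is one possible approach, not part of the skill's scripts: it starts a new cluster whenever the gap between consecutive commits exceeds a chosen threshold. The function name, the threshold, and the input format are assumptions.

```shell
# Assign a cluster number to each commit. A new cluster starts whenever the
# gap to the previous commit exceeds max_gap seconds.
# Input (stdin): "<unix_timestamp> <hash>" lines, sorted oldest first.
cluster_by_time() {
  local max_gap="$1"
  awk -v gap="$max_gap" '
    NR == 1 || ($1 - prev) > gap { cluster++ }
    { print cluster, $2; prev = $1 }
  '
}

# Demo with synthetic timestamps and a 1000-second gap.
printf '100 a1\n200 b2\n5000 c3\n5050 d4\n' | cluster_by_time 1000

# Possible usage against a real repository (one-day gap):
#   git -C "$REPO_PATH" log --author="$AUTHOR" --format='%at %h' | sort -n | cluster_by_time 86400
```

Clusters produced this way are only candidates; commits that are close in time may still belong to different components.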
Phase 4: Report Generation
- Create markdown structure with metadata section
- Add summary of contribution themes
- For each component:
  - Component name and description
  - List commits with hash, date, subject
  - Show file changes with additions/deletions
- Add statistics section:
  - Total files changed
  - Total additions/deletions
  - Most active areas
- Write report to file or output directly
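The metadata section at the top of a report can be produced with a here-document. A minimal sketch; the field labels follow the report examples in this document, while the function name and argument order are illustrative:

```shell
# Print the metadata section of a report.
# Arguments: repo path, author, first date, last date, total commit count.
render_report_header() {
  local repo="$1" author="$2" from="$3" to="$4" total="$5"
  cat <<EOF
# Git Contributions Analysis

**Repository:** $repo
**Author:** $author
**Date Range:** $from to $to
**Total Commits:** $total
EOF
}

render_report_header /path/to/repo "Sean Reed" 2025-01-01 2026-01-25 342
```

Redirecting the output (for example `> git-contributions-analysis.md`) covers the "write to file" case; leaving it on stdout covers direct output.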
Usage Patterns
From Within Repository
# Run in current directory
/analyze-git-contributions
# Script auto-detects git config user.name
External Repository
# Specify path
/analyze-git-contributions /path/to/repo
# Or when prompted
# Agent asks: "Enter repository path:"
# User provides: /Users/seanreed/projects/my-app
Different Author
# Agent auto-detects from git config
# If not found or wrong author, agent asks:
# "Enter author name or email pattern:"
# User provides: "John Doe" or "john@example.com"
Output Structure Example
Full Analysis (< 500 commits)
# Git Contributions Analysis

**Repository:** /path/to/repo
**Author:** Sean Reed
**Date Range:** 2025-01-01 to 2026-01-25
**Total Commits:** 342
**Analysis Scope:** All commits

## Summary

Implemented comprehensive real-time communication system with WebSocket support, authentication mechanisms, and extensive test coverage. Major focus on payment integration and API stability.

## Component: Real-Time Communication System

**Description:** Built WebSocket-based real-time messaging with presence detection and reconnection logic.

**Commits (8):**
- `a1b2c3d` (2026-01-20) Add WebSocket connection manager
  - Files: websocket/manager.py (+245, -0), websocket/events.py (+120, -0)
- `e4f5g6h` (2026-01-19) Implement presence detection
  - Files: websocket/presence.py (+180, -0), tests/test_presence.py (+95, -0)

[... more commits ...]

## Component: Authentication & Security

**Description:** JWT-based authentication with session management and rate limiting.

**Commits (6):**
[... commits with details ...]

## Component: Bug Fixes & Maintenance

**Description:** Various bug fixes and code maintenance across the codebase.

**Commits (5):**
[... commits ...]

## Statistics

- Total files changed: 127
- Total additions: 3,450 lines
- Total deletions: 890 lines
- Most active areas: websocket/ (15 commits), auth/ (12 commits), api/ (10 commits)
Sampled Analysis
# Git Contributions Analysis

**Repository:** /path/to/repo
**Author:** Sean Reed
**Analysis Scope:** 200 sampled commits (out of 1,379 total)
**Sampling Strategy:** Time-stratified (evenly distributed across contribution timeline)
**Date Range:** 2024-01-15 to 2026-01-25
**Total Commits Analyzed:** 200

## Summary

[Analysis based on representative sample...]
Date-Filtered Analysis
# Git Contributions Analysis

**Repository:** /path/to/repo
**Author:** Sean Reed
**Analysis Scope:** Commits from 2025-06-01 to 2026-01-25
**Total Commits in Period:** 342
**Total Commits (All Time):** 1,379

## Summary

[Analysis of recent work...]
Segmented Analysis (Index File)
# Git Contributions Analysis - Index

**Repository:** /path/to/repo
**Author:** Sean Reed
**Total Commits:** 1,379
**Analysis Period:** 2024-01-15 to 2026-01-25
**Number of Segments:** 4

## Segments

### [Part 1: 2024-Q1-Q2 (Jan-Jun 2024)](git-contributions-analysis-2024-Q1-Q2.md)
- Period: 2024-01-15 to 2024-06-30
- Commits: 312
- Focus: Initial project setup, authentication system, core API

### [Part 2: 2024-Q3 (Jul-Sep 2024)](git-contributions-analysis-2024-Q3.md)
- Period: 2024-07-01 to 2024-09-30
- Commits: 298
- Focus: Payment integration, database migrations

### [Part 3: 2024-Q4 (Oct-Dec 2024)](git-contributions-analysis-2024-Q4.md)
- Period: 2024-10-01 to 2024-12-31
- Commits: 387
- Focus: WebSocket implementation, real-time features

### [Part 4: 2025 (Jan 2025)](git-contributions-analysis-2025.md)
- Period: 2025-01-01 to 2025-01-25
- Commits: 382
- Focus: Performance optimization, bug fixes, testing
Segmented Analysis (Individual Segment)
# Git Contributions Analysis - Part 1 of 4

**Repository:** /path/to/repo
**Author:** Sean Reed
**Period:** 2024-01-15 to 2024-06-30
**Commits in This Period:** 312
**Total Commits (All Time):** 1,379

[See index file](git-contributions-analysis-index.md) for all segments

## Summary

[Analysis for this time period...]
Error Handling
Invalid Repository
# If not a git repo
echo "Error: Not a git repository. Please specify a valid git repository path."
# Offer to navigate to correct directory or provide path
No Commits Found
# If author has no commits
echo "No commits found for author: $AUTHOR"
# List top 5 contributors and ask user to select:
# 1. John Doe (john@example.com) - 150 commits
# 2. Jane Smith (jane@example.com) - 87 commits
# ...
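One way to build the numbered contributor list is to reformat the summary output of git shortlog. The sketch below assumes the input is `git shortlog -sne` output (a padded count, a tab, then the name and email in angle brackets); the function name is hypothetical and the skill's scripts may produce this list differently.

```shell
# Turn `git shortlog -sne` output into a numbered contributor list.
# Input lines look like: "   150<TAB>John Doe <john@example.com>"
format_top_contributors() {
  local limit="${1:-5}"
  head -n "$limit" | awk -F'\t' '{
    count = $1 + 0                                   # drop the left padding
    name = $2;  sub(/ <[^>]*>$/, "", name)           # strip the trailing <email>
    email = $2; sub(/^.* </, "", email); sub(/>$/, "", email)
    printf "%d. %s (%s) - %d commits\n", NR, name, email, count
  }'
}

# Demo with synthetic shortlog output.
printf '   150\tJohn Doe <john@example.com>\n    87\tJane Smith <jane@example.com>\n' |
  format_top_contributors 5

# Possible usage against a real repository:
#   git -C "$REPO_PATH" shortlog -sne HEAD | format_top_contributors 5
```

Passing HEAD explicitly matters in scripts: without a revision argument and without a terminal on stdin, git shortlog reads log data from stdin instead of walking the history.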
Large Repository (500-2000 commits)
# If 500-2000 commits for author
echo "Found 1,379 commits for Sean Reed (spanning 2024-01-15 to 2026-01-25)."
echo ""
echo "How would you like to proceed?"
echo ""
echo "1. Segmented analysis (Recommended) - Create 4 reports, one per time period"
echo " (~345 commits each, organized chronologically)"
echo ""
echo "2. Intelligent sampling - Analyze 200 representative commits"
echo " evenly distributed across your contribution timeline"
echo ""
echo "3. Date range filter - Specify a time period"
echo " (e.g., last 6 months: --since='2025-07-01')"
echo ""
echo "4. Full analysis - Process all 1,379 commits"
echo " (may take longer)"
Very Large Repository (>2000 commits)
# If >2000 commits for author
echo "Found 3,450 commits for Sean Reed (spanning 2020-03-10 to 2026-01-25)."
echo ""
echo "To provide focused analysis, please narrow the scope:"
echo ""
echo "1. Segmented analysis (Recommended) - Create 11 reports, one per"
echo " time period (~314 commits each)"
echo ""
echo "2. Recent work - Last 6 months"
echo ""
echo "3. Recent work - Last year"
echo ""
echo "4. Custom date range - Specify dates"
echo ""
echo "5. Intelligent sampling - 200 commits across entire timeline"
Author Matching
# Multiple email addresses detected for same author
echo "Detected multiple identities for this author:"
echo " - Sean Reed <sean@work.com> - 250 commits"
echo " - Sean Reed <sean@personal.com> - 45 commits"
echo "Group all commits together? (yes/no)"
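The per-identity counts in that prompt can be derived from commit lines. A minimal sketch, assuming input in the hash|author_name|author_email|date|subject format documented for collect-user-commits.sh; the function name is illustrative:

```shell
# Count commits per "Name <email>" identity, most frequent first.
# Input (stdin): "hash|author_name|author_email|date|subject" lines.
summarize_identities() {
  awk -F'|' '
    { seen[$2 " <" $3 ">"]++ }
    END { for (id in seen) printf "%d\t%s\n", seen[id], id }
  ' |
    sort -rn |
    awk -F'\t' '{ printf " - %s - %d commits\n", $2, $1 }'
}

# Demo with three synthetic commits from two email addresses.
printf '%s\n' \
  'a1|Sean Reed|sean@work.com|2026-01-20|Add manager' \
  'b2|Sean Reed|sean@personal.com|2026-01-19|Fix typo' \
  'c3|Sean Reed|sean@work.com|2026-01-18|Add tests' |
  summarize_identities
```

More than one output line for the same name is the signal to ask whether the identities should be grouped.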
Script Usage
count-user-commits.sh
# Usage
./scripts/count-user-commits.sh <repo_path> <author_pattern>
# Example
./scripts/count-user-commits.sh /path/to/repo "Sean Reed"
# Output: Single number (commit count)
# 1379
# Exit codes
# 0 - Success
# 1 - Invalid repository
# 2 - No commits found
collect-user-commits.sh
# Usage
./scripts/collect-user-commits.sh <repo_path> <author_pattern> [since_date] [until_date]
# Example - All commits
./scripts/collect-user-commits.sh /path/to/repo "Sean Reed"
# Example - With date filter
./scripts/collect-user-commits.sh /path/to/repo "Sean Reed" "2025-01-01"
# Example - With date range
./scripts/collect-user-commits.sh /path/to/repo "Sean Reed" "2024-04-14" "2025-05-26"
# Output format (one line per commit)
# hash|author_name|author_email|date|subject
# a1b2c3d4e5f6|Sean Reed|sean@example.com|2026-01-20 14:30:00 -0800|Add WebSocket manager
# Date formats supported
# - YYYY-MM-DD (e.g., 2025-01-01)
# - @timestamp (e.g., @1704067200)
# Exit codes
# 0 - Success
# 1 - Invalid repository
# 2 - No commits found
sample-commits.sh
# Usage
./scripts/sample-commits.sh <repo_path> <author_pattern> <sample_size>
# Example
./scripts/sample-commits.sh /path/to/repo "Sean Reed" 200
# Output: Same format as collect-user-commits.sh (subset of commits)
# Uses time-stratified sampling for even distribution across timeline
# Strategy
# - Divides timeline into 10 buckets
# - Samples evenly from each bucket
# - Ensures coverage across entire contribution history
# - If total commits <= sample_size, returns all commits
# Exit codes
# 0 - Success
# 1 - Invalid repository or arguments
# 2 - No commits found
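The bucket strategy described above can be illustrated with a short awk program. This is a sketch of time-stratified sampling under stated assumptions, not the actual contents of sample-commits.sh: the input is assumed to be "<unix_timestamp> <hash>" lines sorted oldest first, each bucket receives an equal quota (sample_size divided by the bucket count), and picks within a bucket are evenly spaced. Sparse buckets can make the result smaller than the requested size.

```shell
# Time-stratified sampling sketch.
# Arguments: sample size, number of buckets (default 10).
# Input (stdin): "<unix_timestamp> <hash>" lines, sorted oldest first.
sample_stratified() {
  local size="$1" buckets="${2:-10}"
  awk -v size="$size" -v buckets="$buckets" '
    { ts[NR] = $1; line[NR] = $0 }
    END {
      # Small inputs are returned whole.
      if (NR <= size) { for (i = 1; i <= NR; i++) print line[i]; exit }

      span = ts[NR] - ts[1] + 1
      quota = int(size / buckets); if (quota < 1) quota = 1

      # Assign every commit to a time bucket (0 .. buckets-1).
      for (i = 1; i <= NR; i++) {
        b = int((ts[i] - ts[1]) * buckets / span)
        n[b]++; member[b, n[b]] = i
      }

      # Take evenly spaced commits from each non-empty bucket.
      for (b = 0; b < buckets; b++) {
        if (n[b] == 0) continue
        take = (n[b] < quota) ? n[b] : quota
        for (k = 0; k < take; k++)
          print line[member[b, int(k * n[b] / take) + 1]]
      }
    }
  '
}

# Demo: 20 synthetic commits, sample 10 across 10 buckets.
for i in {0..19}; do echo "$i c$i"; done | sample_stratified 10 10
```

Bucketing by time rather than by commit index is what keeps quiet periods represented: a burst of activity cannot consume more than its bucket's quota.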
calculate-time-segments.sh
# Usage
./scripts/calculate-time-segments.sh <repo_path> <author_pattern> <target_commits_per_segment>
# Example
./scripts/calculate-time-segments.sh /path/to/repo "Sean Reed" 300
# Output format (one line per segment)
# start_date|end_date|commit_count
# 2024-04-14|2025-04-13|398
# 2025-04-14|2025-08-03|366
# 2025-08-04|2025-10-17|295
# 2025-10-18|2025-12-03|320
# Algorithm
# - Calculates optimal time segments to keep commits per segment in target range
# - Uses adaptive boundaries based on commit density
# - Busy periods get shorter time segments, quiet periods get longer segments
# - Target range: target_commits ± 100 commits
# - Recommended target: 300 (safe range: 250-400)
# Exit codes
# 0 - Success
# 1 - Invalid repository or arguments
# 2 - No commits found
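Density-based boundaries can be approximated with a greedy pass over commit dates. The sketch below is a simplification and not the script's actual algorithm: it closes a segment at the first day boundary after the running count reaches the target, and it reports the first and last commit dates of each segment rather than contiguous calendar ranges. The function name and input format are assumptions.

```shell
# Greedy segmentation sketch.
# Argument: target commits per segment.
# Input (stdin): one YYYY-MM-DD date per commit, sorted oldest first.
# Output: "start_date|end_date|commit_count" lines.
segment_by_count() {
  local target="$1"
  awk -v target="$target" '
    NR == 1 { start = $1 }
    # Only split between days, and only once the target has been reached.
    $1 != last && count >= target {
      print start "|" last "|" count
      start = $1; count = 0
    }
    { count++; last = $1 }
    END { if (count > 0) print start "|" last "|" count }
  '
}

# Demo: 7 synthetic commits over 4 days, target of 3 per segment.
printf '%s\n' 2024-01-01 2024-01-01 2024-01-02 \
  2024-01-03 2024-01-03 2024-01-03 2024-01-04 | segment_by_count 3

# Possible usage against a real repository:
#   git -C "$REPO_PATH" log --author="$AUTHOR" --date=short --format=%ad | sort | segment_by_count 300
```

Because segments close on commit count, busy periods naturally produce shorter date ranges and quiet periods longer ones. The final segment may be well under the target; a fuller implementation would likely merge it into its neighbor.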
extract-commit-details.sh
# Usage
./scripts/extract-commit-details.sh <repo_path> <commit_hash>
# Example
./scripts/extract-commit-details.sh /path/to/repo a1b2c3d4e5f6
# Output format
# === COMMIT MESSAGE ===
# [full commit message]
# === FILES CHANGED ===
# filename|additions|deletions
# [one line per file]
# Exit codes
# 0 - Success
# 1 - Invalid commit hash
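The FILES CHANGED section is straightforward to total for the report's statistics. A minimal sketch, assuming every line after the === FILES CHANGED === marker is a filename|additions|deletions record; the function name is hypothetical:

```shell
# Sum additions and deletions from extract-commit-details.sh style output.
# Reads the last two fields so a "|" inside a filename does not shift columns.
sum_file_changes() {
  awk -F'|' '
    /^=== FILES CHANGED ===$/ { in_files = 1; next }
    /^=== /                   { in_files = 0 }
    in_files && NF >= 3 { files++; added += $(NF - 1); deleted += $NF }
    END { printf "files=%d additions=%d deletions=%d\n", files, added, deleted }
  '
}

# Demo with synthetic output for one commit.
printf '%s\n' \
  '=== COMMIT MESSAGE ===' \
  'Add WebSocket connection manager' \
  '=== FILES CHANGED ===' \
  'websocket/manager.py|245|0' \
  'websocket/events.py|120|0' | sum_file_changes
```

Non-numeric counts (git reports "-" for binary files in numstat output) evaluate to 0 in awk arithmetic, so such files are counted without affecting the line totals.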
fast-extract-commits.sh
# Usage
./scripts/fast-extract-commits.sh <repo_path> <author_pattern> [options]
# Options
# --since=DATE Only commits after this date
# --until=DATE Only commits before this date
# --use-cache Use cached data if available (default)
# --no-cache Force fresh extraction
# --output=FILE Write to file instead of stdout
# Example - Fast extraction with caching
./scripts/fast-extract-commits.sh /path/to/repo "Sean Reed"
# Example - Force fresh extraction
./scripts/fast-extract-commits.sh /path/to/repo "Sean Reed" --no-cache
# Example - With date filter
./scripts/fast-extract-commits.sh /path/to/repo "Sean Reed" --since="2025-01-01"
# Output format: JSONL (one JSON object per commit)
# {"hash":"...","author":"...","email":"...","date":"...","subject":"...","body":"...","files":[...]}
# Performance:
# - First run (173 commits): ~5-10 seconds
# - Cached run: ~1-2 seconds
# - Compared to original: 3-5x faster
# Exit codes
# 0 - Success
# 1 - Invalid repository
# 2 - No commits found
cache-manage.sh
# Usage
./scripts/cache-manage.sh <command> [repo_path]
# Commands
./scripts/cache-manage.sh status /path/to/repo # Show cache info
./scripts/cache-manage.sh clear /path/to/repo # Clear cache
./scripts/cache-manage.sh info /path/to/repo # Show metadata
# Example output for 'status':
# Cache Directory: /path/to/repo/.git-analysis-cache
# Cache File: /path/to/repo/.git-analysis-cache/commit-details.jsonl
# Size: 44K
# Commits cached: 54
# Last updated: 2026-01-25 13:17:22
# Cache location: <repo>/.git-analysis-cache/
# Recommended: Add .git-analysis-cache/ to .gitignore
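The .gitignore recommendation can be applied idempotently, so repeated runs never duplicate the entry. A small sketch; the function name is illustrative and this is not something cache-manage.sh is documented to do:

```shell
# Add the cache directory to <repo>/.gitignore unless it is already listed.
ensure_cache_ignored() {
  local repo="$1" entry=".git-analysis-cache/"
  local ignore_file="$repo/.gitignore"

  # -x: match the whole line, -F: literal string, -q: no output
  if grep -qxF "$entry" "$ignore_file" 2>/dev/null; then
    return 0
  fi

  # If the file exists and does not end with a newline, add one first
  # so the new entry starts on its own line.
  if [ -s "$ignore_file" ] && [ -n "$(tail -c 1 "$ignore_file")" ]; then
    echo >> "$ignore_file"
  fi
  echo "$entry" >> "$ignore_file"
}

# Possible usage:
#   ensure_cache_ignored "$REPO_PATH"
```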
batch-extract-commits.sh
# Usage (internal, used by fast-extract-commits.sh)
./scripts/batch-extract-commits.sh <repo_path> <author_pattern> [since] [until]
# Extracts all commit data in a single git command
# Output: JSONL format
# Much faster than calling extract-commit-details.sh per commit
# Example
./scripts/batch-extract-commits.sh /path/to/repo "Sean Reed"
# Exit codes
# 0 - Success
# 1 - Invalid repository
# 2 - No commits found
Implementation Details
Step 1: Detect Repository
# If no argument provided, use current directory
REPO_PATH=${1:-.}
# Validate it's a git repository
if ! git -C "$REPO_PATH" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
echo "Error: Not a git repository: $REPO_PATH"
# Ask user for correct path
fi
Step 2: Detect Author
# Try to auto-detect from git config
AUTHOR=$(git -C "$REPO_PATH" config user.name 2>/dev/null)
# If not found or need confirmation
if [ -z "$AUTHOR" ]; then
  # Ask user for author name or email
  # Use AskUserQuestion tool to prompt
  : # no-op placeholder: bash rejects an if-branch that contains only comments
fi
Step 3: Collect Data
# Run fast extraction with caching
commits_jsonl=$(bash ~/.claude/skills/analyze-git-contributions/scripts/fast-extract-commits.sh "$REPO_PATH" "$AUTHOR")
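# Check exit code (2 means no commits were found for this author)
if [ $? -eq 2 ]; then
  # No commits found - list top contributors
  : # no-op placeholder: bash rejects an if-branch that contains only comments
fi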
# The commits are now in JSONL format, ready for AI analysis
# Each line contains: hash, author, email, date, subject, body, files[]
# No need to run extract-commit-details.sh for each commit - it's all included!
# Can parse with jq if needed: echo "$commits_jsonl" | jq '.hash'
Step 4: AI Analysis
- Parse all commit data into structured format
- Analyze commit messages for semantic patterns
- Look for file co-modification patterns
- Identify temporal clusters (related commits close in time)
- Group commits into functional components
- Generate descriptions for each component
Step 5: Generate Report
- Create markdown with metadata
- Add summary section
- For each component, list commits with details
- Calculate and add statistics
- Write to file (default: git-contributions-analysis.md) or output directly
Reusability Features
Flexible Repository Input
- Current directory (default)
- Absolute path
- Relative path
- Validates and resolves to absolute path
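Resolving any of these inputs to an absolute path needs only cd and pwd. A minimal sketch with a hypothetical function name:

```shell
# Resolve a current-directory, relative, or absolute input to an absolute path.
# Prints the path on success; prints an error to stderr and returns 1 otherwise.
resolve_repo_path() {
  local input="${1:-.}"
  # The subshell keeps the caller's working directory unchanged.
  (cd "$input" 2>/dev/null && pwd -P) || {
    echo "Error: cannot access directory: $input" >&2
    return 1
  }
}

resolve_repo_path .
```

pwd -P also resolves symbolic links, so two different routes to the same repository yield the same string.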
Smart Author Detection
- Auto-detect from git config user.name
- Pattern matching for partial names
- Email-based filtering
- Case-insensitive matching
- Handle multiple email addresses
Output Options
- Generate markdown file in repository root
- Or output to stdout for piping
- Filename: git-contributions-analysis-{author}-{date}.md
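Filling in that filename pattern requires making the author string safe for a filename. The sketch below assumes a slug rule (lowercase, runs of other characters collapsed to a single hyphen) that this document does not specify; the function name is also illustrative.

```shell
# Build the report filename from an author string and a date.
# The slug rule is an assumption made for this example.
report_filename() {
  local author="$1" date="$2" slug
  slug="$(printf '%s' "$author" |
    tr '[:upper:]' '[:lower:]' |
    sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')"
  echo "git-contributions-analysis-${slug}-${date}.md"
}

report_filename "Sean Reed" "2026-01-25"
report_filename "john@example.com" "2026-01-25"
```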
Token Budget Management
Understanding Token Limits
- Read tool has a 25,000 token limit for file contents
- Commit details average ~65 tokens per commit
- Safe processing limit: ~230 commits (15,000 tokens)
- Leave ~10,000 tokens for AI analysis and context
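The ~230 figure follows from the other three numbers in the list. As a worked calculation (variable names are illustrative):

```shell
READ_LIMIT=25000        # Read tool limit from the text
RESERVED=10000          # headroom for AI analysis and context
TOKENS_PER_COMMIT=65    # average from the text

budget=$(( READ_LIMIT - RESERVED ))
max_commits=$(( budget / TOKENS_PER_COMMIT ))   # integer division rounds down

echo "Budget: $budget tokens, about $max_commits commits"
```

Rounding down is the conservative choice here, since the per-commit figure is only an average.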
Scaling Strategies
Tier 1: Auto-Process (<500 commits)
- Process all commits with full details
- No user intervention needed
- Estimated tokens: 500 × 65 = 32,500 tokens
- Safe due to incremental processing
Tier 2: User Choice (500-2000 commits)
- Present all four strategies to user
- Recommended: Segmented analysis for complete coverage
- Alternative: Sampling for quick overview
- Full analysis still available but may take longer
Tier 3: Require Scoping (>2000 commits)
- Require user to choose strategy
- Do not offer "process all" option
- Prevent token overflow
- Segmented analysis recommended
Token Budget Examples
Segmented Analysis (300 commits per segment):
- Per segment: 300 × 65 = 19,500 tokens
- Safely under 25,000 token limit
- Each segment generates separate report
- No accumulated token pressure
Intelligent Sampling (200 commits):
- Total: 200 × 65 = 13,000 tokens
- Well under limit with room for analysis
- Representative across timeline
Date Filtering (6 months, ~300 commits):
- Total: 300 × 65 = 19,500 tokens
- Safe for focused period analysis
Optional Enhancements (Future)
Branch Filtering
/analyze-git-contributions --branch=main
Multiple Authors
/analyze-git-contributions --authors="John Doe,Jane Smith"
Export Formats
- JSON for programmatic use
- HTML with interactive visualization
- CSV for spreadsheet analysis
Success Criteria
- ✅ Auto-discovers repository and author
- ✅ Efficiently collects git data using scripts
- ✅ AI produces semantically meaningful component groupings
- ✅ Markdown output is readable and well-organized
- ✅ Works across different repositories without modification
- ✅ Handles edge cases gracefully (no commits, large repos, binary files)
- ✅ Scripts have proper error handling and exit codes
- ✅ Reusable for different authors and repositories
- ✅ Scales to repositories with 500-10,000+ commits without token errors
- ✅ Provides user control over analysis scope (sampling, filtering, segmentation)
- ✅ Segmented analysis covers ALL commits across multiple manageable reports
- ✅ Maintains meaningful AI analysis quality across all strategies
- ✅ Token usage stays under limits for all analysis types
- ✅ Clear communication about what was analyzed (scope metadata)
Reference
For bash script patterns and error handling, refer to existing skills:
- ~/.claude/skills/reading-logs/SKILL.md - "Count first, filter/sample, then read" principle
- ~/.claude/skills/reading-logs/scripts/aggregate-errors.sh - Error handling patterns
- Standard git command patterns and output parsing
Core Principle (from reading-logs skill)
"Count first, filter/sample, then read" - Never process all data without checking volume first
This skill applies the same principle:
- Count commits before processing
- Choose appropriate strategy based on count
- Filter, sample, or segment as needed
- Process within token budget
- Maintain analysis quality throughout