Analyze Error Logs Skill
Purpose
Systematically analyze application logs to identify errors, understand patterns, and pinpoint root causes for debugging.
Log Analysis Process
1. Collect Relevant Logs
bash
# Recent application logs
tail -n 500 /var/log/app.log
# Filter by error level
grep -i "error\|exception\|fatal" /var/log/app.log
# Time-specific logs
grep "2024-02-09 14:" /var/log/app.log
# Multi-file search
find /var/log -name "*.log" -exec grep -l "OutOfMemory" {} \;
###2. Identify Error Patterns
Look for:
- •Repeated errors: Same error appearing multiple times
- •Error clusters: Errors occurring in rapid succession
- •Timing patterns: Errors at specific times (e.g., midnight batch jobs)
- •User patterns: Errors for specific users or actions
- •Cascading failures: One error triggering others
3. Extract Key Information
From each error, extract:
- •Timestamp: When did it occur?
- •Error type: Exception name, error code
- •Error message: Human-readable description
- •Stack trace: Code path that led to error
- •Context: User ID, request ID, session ID
- •Input data: What data caused the issue?
4. Common Error Patterns
Database Connection Errors
code
Error: ECONNREFUSED 127.0.0.1:5432 → Database not running or connection refused → Check database service status
Authentication Errors
code
Error: jwt expired → Token expiration issue → Check token TTL and refresh logic
Null/Undefined Errors
code
TypeError: Cannot read property 'id' of undefined → Missing data validation → Check data flow and null checks
Rate Limiting
code
Error: 429 Too Many Requests → Rate limit exceeded → Check rate limiter configuration
5. Correlation Analysis
bash
# Find all errors with same correlation ID
grep "correlation-id: abc-123" /var/log/app.log
# Timeline view of errors
grep "ERROR" /var/log/app.log | awk '{print $1, $2, $NF}'
6. Root Cause Identification
Ask:
- •What triggered this error? (User action, cron job, external event)
- •Why did it fail? (Missing data, invalid state, network issue)
- •Where in the code? (Stack trace shows exact location)
- •Has this happened before? (Search historical logs)
- •What changed recently? (Recent deployments, config changes)
Log Analysis Tools
- •grep/awk: Command-line log analysis
- •jq: JSON log parsing
- •Splunk/ELK: Enterprise log aggregation
- •CloudWatch/Stackdriver: Cloud-native logging
- •Sentry/Rollbar: Error tracking platforms
Example Analysis
Scenario: Users reporting login failures
code
Step 1: Search for login errors $ grep "login" /var/log/app.log | grep -i "error" Step 2: Identify pattern - All errors occur for OAuth login - Error message: "Invalid token" - Started at 2024-02-09 14:30:00 Step 3: Correlate with deployments $ git log --since="2024-02-09 14:00" --until="2024-02-09 15:00" - Found: OAuth token validation logic changed in recent deploy Step 4: Root cause - Recent deployment changed token validation - New validation is stricter, rejecting valid tokens - Solution: Revert validation change or fix validation logic
Best Practices
- •Structure logs: Use structured logging (JSON) for easier parsing
- •Add context: Include correlation IDs, user IDs, request IDs
- •Log levels: Use appropriate levels (DEBUG, INFO, WARN, ERROR, FATAL)
- •Centralize logs: Aggregate logs from all services
- •Set up alerts: Proactive notification for critical errors
Related Skills
- •
propose_fix- Propose solutions based on analysis