Log Analysis
Overview
Logs are a primary source of evidence for debugging and monitoring. Effective log analysis surfaces issues quickly and supports root cause analysis.
When to Use
- Troubleshooting errors
- Performance investigation
- Security incident analysis
- Auditing user actions
- Monitoring application health
Instructions
1. Structured Logging
javascript
// Good: structured logs (machine-readable).
// `logger` stands for any structured logger instance that accepts an object.
logger.info({
  level: 'INFO',
  timestamp: '2024-01-15T10:30:00Z',
  service: 'auth-service',
  user_id: '12345',
  action: 'user_login',
  status: 'success',
  duration_ms: 150,
  ip_address: '192.168.1.1'
});

// Bad: unstructured logs (hard to parse)
console.log('User 12345 logged in successfully in 150ms from 192.168.1.1');

// JSON format (Elasticsearch-friendly), written as an object literal
const exampleLogEntry = {
  "@timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "service": "api-gateway",
  "trace_id": "abc123",
  "message": "Database connection failed",
  "error": {
    "type": "ConnectionError",
    "code": "ECONNREFUSED"
  },
  "context": {
    "database": "users",
    "operation": "SELECT"
  }
};
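To make the structured format concrete, here is a minimal sketch of a logger that emits one JSON object per line. The names `createLogger` and `write` are hypothetical and not part of any particular logging library; the field names simply mirror the examples above.

```javascript
// Hypothetical helper, not a real library API: returns a logger that
// serializes every entry as a single JSON line.
function createLogger(service, write = (line) => console.log(line)) {
  const log = (level, message, fields = {}) => {
    // Caller-supplied fields are spread first so they cannot overwrite
    // the core fields (@timestamp, level, service, message).
    const entry = {
      ...fields,
      '@timestamp': new Date().toISOString(),
      level,
      service,
      message,
    };
    const line = JSON.stringify(entry);
    write(line);
    return line;
  };

  return {
    info: (message, fields) => log('INFO', message, fields),
    warn: (message, fields) => log('WARN', message, fields),
    error: (message, fields) => log('ERROR', message, fields),
  };
}

// Usage: one machine-readable line per event.
const authLogger = createLogger('auth-service');
authLogger.info('user_login', {
  user_id: '12345',
  status: 'success',
  duration_ms: 150,
});
```

Passing a custom `write` function keeps the sketch testable and lets the same logger target a file or a log shipper instead of the console.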
2. Log Levels & Patterns
yaml
Log Levels:
  DEBUG: Detailed diagnostic info
    - Variable values
    - Function entry/exit
    - Intermediate calculations
    - Use: Development only
  INFO: General informational messages
    - Startup/shutdown
    - User actions
    - Configuration changes
    - Use: Production (normal operations)
  WARN: Warning messages (potential issues)
    - Deprecated API usage
    - Performance degradation
    - Resource limits approaching
    - Use: Production (investigate soon)
  ERROR: Error conditions
    - Failed operations
    - Exceptions
    - Failed requests
    - Use: Production (action required)
  FATAL/CRITICAL: System unusable
    - Critical failures
    - Out of memory
    - Data corruption
    - Use: Production (immediate action)
---
Log Patterns:
  Request Logging:
    - Request ID (trace_id)
    - Method + Path
    - Status code
    - Duration
    - Request size / response size
  Error Logging:
    - Error type/code
    - Error message
    - Stack trace
    - Context (user_id, session_id)
    - Timestamp
  Business Events:
    - Event type
    - User involved
    - Impact/importance
    - Timestamp
    - Relevant context
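The level guidance above can be sketched as a simple severity threshold. This is an illustrative sketch only: the numeric values are arbitrary, and `shouldLog` and `minLevelFor` are hypothetical names. The policy of DEBUG in development and INFO and above in production follows the "Use" notes in the list.

```javascript
// Severity order follows the levels listed above; the numbers are arbitrary
// and only their relative order matters.
const LEVELS = new Map([
  ['DEBUG', 10],
  ['INFO', 20],
  ['WARN', 30],
  ['ERROR', 40],
  ['FATAL', 50],
]);

// Returns true when `level` is at or above the configured minimum level.
function shouldLog(level, minLevel) {
  if (!LEVELS.has(level) || !LEVELS.has(minLevel)) {
    throw new Error(`Unknown log level: ${level} / ${minLevel}`);
  }
  return LEVELS.get(level) >= LEVELS.get(minLevel);
}

// Hypothetical policy: DEBUG only outside production.
function minLevelFor(environment) {
  return environment === 'production' ? 'INFO' : 'DEBUG';
}

console.log(shouldLog('DEBUG', minLevelFor('production'))); // false
console.log(shouldLog('ERROR', minLevelFor('production'))); // true
```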
3. Log Analysis Tools
yaml
Log Aggregation:
  ELK Stack (Elasticsearch, Logstash, Kibana):
    - Logstash: Parse and process logs
    - Elasticsearch: Search and analyze
    - Kibana: Visualization and dashboards
    - Use: Large scale, complex queries
  Splunk:
    - Comprehensive log management
    - Real-time search and analysis
    - Dashboards and alerts
    - Use: Enterprise (expensive)
  CloudWatch (AWS):
    - Integrated with AWS services
    - Log Insights for querying
    - Dashboards
    - Use: AWS-based systems
  Datadog:
    - Application performance monitoring
    - Log management
    - Real-time alerts
    - Use: SaaS monitoring
---
Log Analysis Techniques:
  Grep/Awk:
    - grep "ERROR" app.log
    - awk '{print $1, $4}' app.log
  Filtering:
    - Filter by timestamp
    - Filter by service
    - Filter by error type
    - Filter by user
  Searching:
    - Search for error patterns
    - Search for user actions
    - Search trace IDs
    - Search IP addresses
  Aggregation:
    - Count occurrences
    - Group by error type
    - Calculate duration percentiles
    - Rate of errors over time
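The filtering and aggregation techniques listed above can be sketched in plain JavaScript over JSON-lines input. The sample entries are made up for illustration, the helper names (`parseLines`, `countBy`, `percentile`) are hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```javascript
// Sample JSON-lines input; the entries are invented for this sketch.
const rawLines = [
  '{"level":"INFO","service":"api-gateway","duration_ms":120}',
  '{"level":"ERROR","service":"api-gateway","duration_ms":900,"error":{"type":"ConnectionError"}}',
  'not json',
  '{"level":"INFO","service":"auth-service","duration_ms":300}',
  '{"level":"ERROR","service":"auth-service","duration_ms":1500,"error":{"type":"TimeoutError"}}',
  '{"level":"ERROR","service":"api-gateway","duration_ms":80,"error":{"type":"ConnectionError"}}',
];

// Parse each line as JSON and skip lines that are not valid JSON.
function parseLines(lines) {
  const entries = [];
  for (const line of lines) {
    try {
      entries.push(JSON.parse(line));
    } catch {
      // Malformed line: ignored in this sketch.
    }
  }
  return entries;
}

// Count occurrences grouped by the key that `keyFn` extracts.
function countBy(entries, keyFn) {
  const counts = new Map();
  for (const entry of entries) {
    const key = keyFn(entry);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Nearest-rank percentile: the smallest value with at least p% of the
// values at or below it.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

const entries = parseLines(rawLines);
const errors = entries.filter((e) => e.level === 'ERROR'); // filter by level
const errorsByType = countBy(errors, (e) => e.error.type); // group by error type
const durations = entries.map((e) => e.duration_ms);

console.log(errorsByType.get('ConnectionError')); // 2
console.log(percentile(durations, 50)); // 300
console.log(percentile(durations, 95)); // 1500
```

For large log volumes this in-memory approach does not scale; it is meant to show the operations that the aggregation tools above perform, not to replace them.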
4. Common Log Analysis Queries
yaml
Find errors in past hour:
  timestamp: last_1h AND level: ERROR

Track user activity:
  user_id: 12345 AND action: *

Find slow requests:
  duration_ms: >1000 AND level: INFO

Analyze error rate by service:
  level: ERROR | stats count by service

Find failed database operations:
  error.type: "DatabaseError" | stats count

Trace request flow:
  trace_id: "abc123" | sort by timestamp
---
Checklist:
  [ ] Structured logging implemented
  [ ] All errors logged with context
  [ ] Request IDs/trace IDs used
  [ ] Sensitive data not logged (passwords, tokens)
  [ ] Log levels used appropriately
  [ ] Log retention policy set
  [ ] Log sampling for high-volume events
  [ ] Alerts configured for errors
  [ ] Dashboards created
  [ ] Regular log review scheduled
  [ ] Log analysis tools accessible
  [ ] Team trained on querying logs
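The queries above are written in a generic pseudo-syntax, and the exact form depends on the tool. As a tool-independent illustration, the sketch below expresses three of them (error count by service, slow requests, and trace request flow) as plain JavaScript over already-parsed entries. The sample data and the function names are hypothetical.

```javascript
// Invented sample entries; field names follow the structured-log examples.
const logEntries = [
  { timestamp: '2024-01-15T10:30:02Z', level: 'ERROR', service: 'user-service', trace_id: 'abc123', duration_ms: 1200, message: 'Database connection failed' },
  { timestamp: '2024-01-15T10:30:00Z', level: 'INFO', service: 'api-gateway', trace_id: 'abc123', duration_ms: 40, message: 'Request received' },
  { timestamp: '2024-01-15T10:30:01Z', level: 'INFO', service: 'auth-service', trace_id: 'abc123', duration_ms: 150, message: 'Token verified' },
  { timestamp: '2024-01-15T10:31:00Z', level: 'ERROR', service: 'api-gateway', trace_id: 'def456', duration_ms: 2000, message: 'Upstream timeout' },
];

// level: ERROR | stats count by service
function errorCountByService(entries) {
  const counts = new Map();
  for (const entry of entries) {
    if (entry.level !== 'ERROR') continue;
    counts.set(entry.service, (counts.get(entry.service) ?? 0) + 1);
  }
  return counts;
}

// duration_ms: >threshold (the level condition is left to the caller)
function slowRequests(entries, thresholdMs) {
  return entries.filter((entry) => entry.duration_ms > thresholdMs);
}

// trace_id: "<id>" | sort by timestamp
function traceFlow(entries, traceId) {
  return entries
    .filter((entry) => entry.trace_id === traceId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

console.log(errorCountByService(logEntries).get('api-gateway')); // 1
console.log(slowRequests(logEntries, 1000).length); // 2
console.log(traceFlow(logEntries, 'abc123').map((e) => e.service));
// [ 'api-gateway', 'auth-service', 'user-service' ]
```

Sorting a trace by timestamp reconstructs the order in which services handled the request, which is why every service needs to log the same `trace_id`.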
Key Points
- Use structured JSON logging
- Include trace IDs for request tracking
- Use appropriate log levels (DEBUG/INFO/WARN/ERROR/FATAL)
- Never log sensitive data (passwords, tokens)
- Aggregate logs centrally
- Create dashboards for key metrics
- Alert on error rates and critical issues
- Set a log retention policy
- Search logs by trace ID for troubleshooting
- Review logs regularly for patterns
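One way to enforce the point about sensitive data is to redact known fields before an entry is written. The following is a minimal sketch, assuming entries are plain JSON-like objects; the deny-list `SENSITIVE_KEYS` and the `redact` function are hypothetical and would need to match the fields your services actually emit.

```javascript
// Hypothetical deny-list of field names, compared case-insensitively.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'api_key']);

// Returns a copy of `value` with sensitive fields replaced by a placeholder.
// Assumes plain JSON-like data (objects, arrays, primitives); the input is
// not modified.
function redact(value) {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? '[REDACTED]'
        : redact(inner);
    }
    return out;
  }
  return value;
}

const safeEntry = redact({
  user_id: '12345',
  action: 'user_login',
  password: 'hunter2',
  context: { Token: 'abc', attempts: 1 },
});
console.log(JSON.stringify(safeEntry));
// {"user_id":"12345","action":"user_login","password":"[REDACTED]","context":{"Token":"[REDACTED]","attempts":1}}
```

A deny-list only catches fields it knows about, so it complements rather than replaces the practice of not passing secrets to the logger in the first place.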