Log Analysis
Parse and analyze logs to identify errors, patterns, and issues.
Instructions
- Identify log source (journalctl, file, application)
- Establish time range of interest
- Filter for relevant entries (errors, specific service)
- Identify patterns and root causes
- Summarize findings with evidence
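The first three steps (time range, filter, then count) can be sketched for a plain-text log as a single filter. This assumes, hypothetically, that each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that error lines contain the word `ERROR`; `errors_in_window` is an illustrative name, not an existing tool.

```bash
# errors_in_window FILE START END  (hypothetical helper)
# Print ERROR lines whose timestamp falls in [START, END).
# Assumes lines begin with "YYYY-MM-DD HH:MM:SS", so comparing the first
# two fields as strings orders them chronologically.
errors_in_window() {
  local file=$1 start=$2 end=$3
  awk -v s="$start" -v e="$end" '
    {
      ts = $1 " " $2
      if (ts >= s && ts < e && /ERROR/) print
    }
  ' "$file"
}
```

For example, `errors_in_window app.log "2024-01-01 15:00:00" "2024-01-01 16:00:00" | wc -l` counts the errors in that hour. Plain string comparison works only because ISO-style timestamps sort lexicographically in chronological order; other formats need parsing first.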
Common log sources
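```bash
# Systemd journal
journalctl -u <service> --since "1 hour ago"
journalctl -p err --since today
journalctl -b                            # current boot

# System logs (file names vary by distribution)
tail -n 100 /var/log/syslog              # Debian/Ubuntu
tail -n 100 /var/log/messages            # RHEL/Fedora
tail -n 100 /var/log/auth.log            # authentication (Debian/Ubuntu)

# Application logs
tail -n 100 /var/log/nginx/error.log
tail -n 100 /var/log/apache2/error.log   # Apache on Debian/Ubuntu
ls /var/log/postgresql/                  # PostgreSQL on Debian/Ubuntu
```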
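Which system log file exists depends on the distribution, and some journal-only systems have neither `/var/log/syslog` nor `/var/log/messages`. A small sketch that picks the first readable candidate and otherwise points to the journal; `pick_log` is a hypothetical helper name.

```bash
# pick_log CANDIDATE...  (hypothetical helper)
# Print the first candidate file that exists and is readable.
# Returns 1 if none is readable (e.g. a journal-only system).
pick_log() {
  local f
  for f in "$@"; do
    if [ -r "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

# Example: prefer a flat syslog file, otherwise fall back to journalctl.
if log=$(pick_log /var/log/syslog /var/log/messages); then
  echo "Using $log"
else
  echo "No syslog file found; use journalctl instead"
fi
```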
Journalctl patterns
```bash
# Errors only (priority err and more severe), current boot
journalctl -p err -b

# Specific service within a time window
journalctl -u nginx --since "2024-01-01" --until "2024-01-02"

# Follow live
journalctl -f -u myapp

# Kernel messages (implies the current boot)
journalctl -k

# JSON output for parsing
journalctl -o json -u myapp | jq .

# Disk space used by the journal
journalctl --disk-usage
```
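To see which programs produce the most errors, journalctl's `-o short-iso` output can be aggregated on its identifier field. A minimal sketch, assuming lines look like `TIMESTAMP HOST IDENT[PID]: MESSAGE` (the usual short-iso layout); `count_by_ident` is a hypothetical helper name.

```bash
# count_by_ident  (hypothetical helper)
# Read journalctl short-iso lines on stdin and print "COUNT IDENT",
# most frequent first. Field 3 is e.g. "nginx[1234]:" or "kernel:".
count_by_ident() {
  awk '
    /^-- / { next }                # skip marker lines such as "-- No entries --"
    {
      id = $3
      sub(/:$/, "", id)            # drop the trailing colon
      sub(/\[[0-9]+\]$/, "", id)   # drop the "[PID]" suffix if present
      count[id]++
    }
    END { for (k in count) print count[k], k }
  ' | sort -k1,1nr -k2,2
}
```

Typical use: `journalctl -p err -b -o short-iso --no-pager | count_by_ident`. Reading from stdin keeps the function usable on saved journal exports as well as live output.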
Analysis patterns
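```bash
# Count errors by type (text between "ERROR: " and the next colon)
grep -oP 'ERROR: \K[^:]+' app.log | sort | uniq -c | sort -rn

# Client IPs with the most 4xx/5xx responses
# (combined log format: $1 = client IP, $9 = status code)
awk '$9 >= 400 { print $1 }' access.log | sort | uniq -c | sort -rn | head

# Time distribution of errors per hour (assumes ISO timestamps at line start)
grep "ERROR" app.log | grep -oP '^\d{4}-\d{2}-\d{2} \d{2}' | sort | uniq -c

# Context around a specific time (5 lines before and after each match)
grep -C5 "15:30:" error.log
```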
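The hourly distribution can be narrowed to per-minute buckets with a threshold so that bursts stand out. A sketch under the same assumption as the pipelines above (lines start with `YYYY-MM-DD HH:MM`); the name `error_spikes` and any threshold you pass are illustrative.

```bash
# error_spikes FILE THRESHOLD  (hypothetical helper)
# Print "COUNT YYYY-MM-DD HH:MM" for each minute with more than
# THRESHOLD lines containing ERROR, in chronological order.
error_spikes() {
  local file=$1 threshold=$2
  grep 'ERROR' "$file" |
    cut -c1-16 |                   # keep "YYYY-MM-DD HH:MM"
    sort | uniq -c |
    awk -v t="$threshold" '$1 > t { print $1, $2, $3 }'
}
```

Sorting before `uniq -c` guarantees correct counts even if the log is not strictly chronological; ISO timestamps keep the sorted output in time order.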
Common error patterns
| Pattern | Indicates |
|---|---|
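| OOM, "Killed" | Out of memory (often the kernel OOM killer terminating a process) |
| ENOSPC | No space left on device (disk full or inodes exhausted) |
| ECONNREFUSED | Service not running, or not listening on that address/port |
| ETIMEDOUT | No response in time: network path, firewall dropping packets, or overloaded host |
| Permission denied | File permissions/ownership, or SELinux policy |
| "too many open files" | File descriptor limit (`ulimit -n`) exhausted |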
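To quantify these patterns rather than just spot them, each one can be counted with `grep -c`. A sketch; the regular expressions are approximations of the table's patterns, not an exhaustive list of message variants, and `count_known_patterns` is a hypothetical name.

```bash
# count_known_patterns FILE  (hypothetical helper)
# Print "LABEL<TAB>N" for each pattern, where N is the number of
# matching lines (case-insensitive extended regex via grep -ciE).
count_known_patterns() {
  local file=$1 entry label re n
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  # Each entry is "LABEL:REGEX"; the regexes are illustrative approximations.
  local patterns=(
    'OOM:out of memory|oom-kill|killed process'
    'ENOSPC:enospc|no space left on device'
    'ECONNREFUSED:econnrefused|connection refused'
    'ETIMEDOUT:etimedout|timed out'
    'Permission denied:permission denied'
    'Too many open files:too many open files'
  )
  for entry in "${patterns[@]}"; do
    label=${entry%%:*}
    re=${entry#*:}
    # grep -c prints 0 (and exits 1) when nothing matches
    n=$(grep -ciE -- "$re" "$file") || true
    printf '%s\t%s\n' "$label" "$n"
  done
}
```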
Output format
```
## Summary
[Brief description of what was found]

## Errors Found
- [timestamp] [error message] (occurred N times)

## Root Cause Analysis
[Explanation of likely cause]

## Recommendations
1. [Action to fix]
2. [Preventive measure]
```
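The "Errors Found" list can be generated directly from a log so that every entry carries a count. A sketch assuming lines of the form `YYYY-MM-DD HH:MM:SS ERROR: message` (the same hypothetical format as the grep examples); it records the first occurrence of each distinct message. `errors_found` is a hypothetical name.

```bash
# errors_found FILE  (hypothetical helper)
# For each distinct message after "ERROR: ", print
# "- FIRST_TIMESTAMP MESSAGE (occurred N times)", most frequent first.
errors_found() {
  awk '
    /ERROR: / {
      msg = substr($0, index($0, "ERROR: ") + 7)
      if (!(msg in count)) first[msg] = $1 " " $2   # first occurrence
      count[msg]++
    }
    END {
      for (m in count)
        printf "%d\t- %s %s (occurred %d %s)\n", count[m], first[m], m,
          count[m], (count[m] == 1 ? "time" : "times")
    }
  ' "$1" | sort -t $'\t' -k1,1nr -k2,2 | cut -f2-
}
```

The count is prefixed as a sort key and then cut away, so the output drops straight into the template's bullet list.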
Rules
- MUST establish time range before analyzing
- MUST quantify error frequency (not just "found errors")
- MUST provide specific log excerpts as evidence
- Never expose sensitive data from logs (passwords, tokens); redact it in excerpts
- Always check whether errors in different services or log files correlate in time
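Two of these rules lend themselves to small helpers: masking secrets before quoting excerpts, and interleaving several logs chronologically to check time correlation. A sketch assuming GNU sed (for the case-insensitive `I` flag) and `YYYY-MM-DD HH:MM:SS` timestamps at line start; the key names matched by `redact` are examples, not a complete list, and both function names are hypothetical.

```bash
# redact  (hypothetical helper): mask values of common secret-bearing
# keys and bearer tokens on stdin. Requires GNU sed for the I flag.
redact() {
  sed -E \
    -e 's/(password|passwd|token|secret|api[_-]?key)=[^[:space:]&]+/\1=[REDACTED]/gI' \
    -e 's#(Bearer )[A-Za-z0-9._~+/-]+=*#\1[REDACTED]#gI'
}

# merge_logs FILE...  (hypothetical helper): interleave lines from several
# logs by timestamp (fields 1-2), tagging each line with its file name.
merge_logs() {
  local f
  for f in "$@"; do
    awk -v src="${f##*/}" '{ print $0 "  <" src ">" }' "$f"
  done | LC_ALL=C sort -s -k1,2
}
```

Usage: `merge_logs /var/log/nginx/error.log app.log | redact | less`. The stable sort (`-s`) keeps lines with identical timestamps in their original per-file order.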