Reading Logs
IRON LAW: Filter first, then read. Never open a large log file without narrowing it first.
Core Principles
- •Filter first - Search/filter before reading
- •Iterative narrowing - Start broad (severity), refine with patterns/time
- •Small context windows - Fetch 5-10 lines around matches, not entire files
- •Summaries over dumps - Present findings concisely, not raw output
Tool Strategy
1. Find Logs (Glob)
bash
**/*.log **/logs/** **/*.log.* # Rotated logs
2. Filter with Grep
bash
# Severity search grep -Ei "error|warn" app.log # Exclude noise grep -i "ERROR" app.log | grep -v "known-benign" # Context around matches grep -C 5 "ERROR" app.log # 5 lines before/after # Time window grep "2025-12-04T11:" app.log | grep "ERROR" # Count occurrences grep -c "connection refused" app.log
3. Chain with Bash
bash
# Recent only tail -n 2000 app.log | grep -Ei "error" # Top recurring grep -i "ERROR" app.log | sort | uniq -c | sort -nr | head -20
4. Read Last
Only after narrowing with Grep. Use context flags (-C, -A, -B) to grab targeted chunks.
Investigation Workflows
Single Incident
- •Get time window, error text, correlation IDs
- •Find logs covering that time (
Glob) - •Time-window grep:
grep "2025-12-04T11:" service.log | grep -i "timeout" - •Trace by ID:
grep "req-abc123" *.log - •Expand context:
grep -C 10 "req-abc123" app.log
Recurring Patterns
- •Filter by severity:
grep -Ei "error|warn" app.log - •Group and count:
grep -i "ERROR" app.log | sort | uniq -c | sort -nr | head - •Exclude known noise
- •Drill into top patterns with context
Red Flags
- •Opening >10MB file without filtering
- •Using Read before Grep
- •Dumping raw output without summarizing
- •Searching without time bounds on multi-day logs
Utility Scripts
For complex operations, use the scripts in scripts/:
bash
# Aggregate errors by frequency (normalizes timestamps/IDs) bash scripts/aggregate-errors.sh app.log "ERROR" 20 # Extract and group stack traces by type bash scripts/extract-stack-traces.sh app.log "NullPointer" # Parse JSON logs with jq filter bash scripts/parse-json-logs.sh app.log 'select(.level == "error")' # Show error distribution over time (hourly/minute buckets) bash scripts/timeline.sh app.log "ERROR" hour # Trace a request ID across multiple log files bash scripts/trace-request.sh req-abc123 logs/ # Find slow operations by duration bash scripts/slow-requests.sh app.log 1000 20
Output Format
- •State what you searched (files, patterns)
- •Provide short snippets illustrating the issue
- •Explain what likely happened and why
- •Suggest next steps