AgentSkillsCN

log-analyze

解析并分析系统日志与应用日志。当用户说“在日志中查找错误”、“分析日志”、“检查 journalctl”、“日志里有什么”、“从日志中进行调试”,或要求深入调查日志文件时,即可使用此技能。

SKILL.md
--- frontmatter
name: log-analyze
description: Parse and analyze system and application logs. Use when the user says "find errors in logs", "analyze logs", "check journalctl", "what's in the logs", "debug from logs", or asks to investigate log files.
allowed-tools: Bash, Read, Grep

Log Analysis

Parse and analyze logs to identify errors, patterns, and issues.

Instructions

  1. Identify log source (journalctl, file, application)
  2. Establish time range of interest
  3. Filter for relevant entries (errors, specific service)
  4. Identify patterns and root causes
  5. Summarize findings with evidence

Common log sources

bash
# Systemd journal
journalctl -u <service> --since "1 hour ago"
journalctl -p err --since today
journalctl -b  # current boot

# System logs
/var/log/syslog
/var/log/messages
/var/log/auth.log

# Application logs
/var/log/nginx/error.log
/var/log/apache2/error.log
/var/log/postgresql/

Journalctl patterns

bash
# Errors only
journalctl -p err -b

# Specific service with context
journalctl -u nginx --since "2024-01-01" --until "2024-01-02"

# Follow live
journalctl -f -u myapp

# Kernel messages
journalctl -k

# JSON output for parsing
journalctl -o json -u myapp | jq .

# Disk usage
journalctl --disk-usage

Analysis patterns

bash
# Count errors by type
grep -oP 'ERROR: \K[^:]+' app.log | sort | uniq -c | sort -rn

# Find IPs with most errors
grep "error" access.log | grep -oP '\d+\.\d+\.\d+\.\d+' | sort | uniq -c | sort -rn

# Time distribution of errors
grep "ERROR" app.log | grep -oP '^\d{4}-\d{2}-\d{2} \d{2}' | uniq -c

# Errors around a specific time
grep -A5 -B5 "15:30:" error.log

Common error patterns

PatternIndicates
OOM, "Killed"Out of memory
ENOSPCDisk full
ECONNREFUSEDService not running/listening
ETIMEDOUTNetwork/firewall issue
Permission deniedFile permissions or SELinux
"too many open files"ulimit exhausted

Output format

code
## Summary
[Brief description of what was found]

## Errors Found
- [timestamp] [error message] (occurred N times)

## Root Cause Analysis
[Explanation of likely cause]

## Recommendations
1. [Action to fix]
2. [Preventive measure]

Rules

  • MUST establish time range before analyzing
  • MUST quantify error frequency (not just "found errors")
  • MUST provide specific log excerpts as evidence
  • Never expose sensitive data from logs (passwords, tokens)
  • Always check for time correlation between errors