GASP Diagnostics
Enables comprehensive Linux system diagnostics using GASP's AI-optimized monitoring output. Actively fetches metrics from hosts and provides intelligent analysis with context-aware interpretation.
Fetching GASP Metrics
When the user mentions a host or requests a system check:
- •Fetch the metrics endpoint: web_fetch("http://{hostname}:8080/metrics")
- •Hostname formats supported:
  - •mDNS/local: accelerated.local, hyperion.local
  - •DNS names: proxmox1, dev-server, workstation
  - •IP addresses: 192.168.1.100
- •Default port: 8080 (unless the user specifies otherwise)
- •Error handling:
  - •Host unreachable: Inform the user and suggest checking whether GASP is running
  - •Port closed/refused: Suggest running systemctl status gasp on the host
  - •JSON parse error: GASP may not be installed, or the endpoint is wrong
  - •Timeout: Network issue or host down
- •Multi-host queries: If the user mentions multiple hosts, fetch each in sequence and compare
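The fetch and error-handling rules above can be sketched in Python. This is an illustration only: it uses the standard library's urllib in place of the web_fetch tool, and the helper names (build_metrics_url, classify_fetch_error, fetch_metrics) and the 5-second timeout are assumptions, not part of GASP.

```python
import json
import socket
import urllib.error
import urllib.request

DEFAULT_PORT = 8080  # default port stated in the text above


def build_metrics_url(hostname: str, port: int = DEFAULT_PORT) -> str:
    """Build the metrics URL for an mDNS name, DNS name, or IP address."""
    return f"http://{hostname}:{port}/metrics"


def classify_fetch_error(exc: Exception) -> str:
    """Map a fetch failure to the guidance in the error-handling list."""
    if isinstance(exc, urllib.error.HTTPError):
        # Assumption: an HTTP error status is treated like a wrong endpoint.
        return "JSON parse error: GASP may not be installed, or the endpoint is wrong"
    # urllib wraps low-level socket errors in URLError.reason; unwrap them.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, Exception):
        exc = exc.reason
    if isinstance(exc, ValueError):  # includes json.JSONDecodeError
        return "JSON parse error: GASP may not be installed, or the endpoint is wrong"
    if isinstance(exc, ConnectionRefusedError):
        return "Port closed/refused: try systemctl status gasp on the host"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "Timeout: network issue or host down"
    return "Host unreachable: check whether GASP is running"


def fetch_metrics(hostname: str, port: int = DEFAULT_PORT, timeout: float = 5.0):
    """Return (metrics, None) on success or (None, message) on failure."""
    try:
        with urllib.request.urlopen(build_metrics_url(hostname, port), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")), None
    except (OSError, ValueError) as exc:
        return None, classify_fetch_error(exc)
```

For multiple hosts, calling fetch_metrics once per hostname in a loop matches the "fetch each in sequence" rule.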
Quick Diagnosis Workflow
For any system check request:
- •Fetch metrics from specified host(s)
- •Check summary first: Look at summary.health and summary.concerns[]
- •Identify issues using metric correlations below
- •Report findings with severity and specific recommendations
Trigger Examples
These user messages should trigger this skill and active fetching:
- •"Check hyperion for me"
- •"What's going on with accelerated.local?"
- •"Is proxmox1 having issues?"
- •"Compare hyperion and proxmox1"
- •"Why is my system slow?" (fetch localhost)
- •"Diagnose 192.168.1.50"
- •"Check all my proxmox nodes"
Metric Interpretation
Health Summary
- •summary.health: Quick assessment
  - •"healthy": No action needed
  - •"degraded": Issues present but not critical
  - •"critical": Immediate attention required
- •summary.concerns[]: Pre-analyzed issues to investigate first
- •summary.recent_changes[]: Context for current state
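As a rough sketch, the summary-first step might look like the following. It assumes the metrics JSON nests these fields under a top-level "summary" object, as the dotted names suggest. The function name triage_summary and the fallback for unknown health values are hypothetical.

```python
# Meaning of each summary.health value, as listed above.
HEALTH_ACTIONS = {
    "healthy": "No action needed",
    "degraded": "Issues present but not critical",
    "critical": "Immediate attention required",
}


def triage_summary(metrics: dict) -> dict:
    """Pull out the health verdict, concerns, and recent changes first."""
    summary = metrics.get("summary", {})
    health = summary.get("health", "unknown")
    return {
        "health": health,
        # Fallback text is an assumption for values outside the three above.
        "action": HEALTH_ACTIONS.get(health, "Unknown health value; inspect metrics manually"),
        "concerns": list(summary.get("concerns", [])),
        "recent_changes": list(summary.get("recent_changes", [])),
    }
```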
CPU Analysis
Load ratio = load_avg_1m / cores:
- •< 0.7: Normal usage
- •0.7-1.0: Busy but healthy
- •1.0-2.0: Saturated (may cause slowness)
- •> 2.0: Severe overload
Key indicators:
- •trend: "increasing" is concerning even if current load is acceptable
- •baseline_load: Delta from baseline is more important than absolute value
- •top_processes[]: Check for unexpected CPU hogs
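The load-ratio bands above translate directly into a small classifier. This is a minimal sketch: the function name classify_load is hypothetical, and because the listed ranges share their boundary values, assigning exactly 1.0 to "busy but healthy" and exactly 2.0 to "saturated" is an assumption.

```python
from typing import Tuple


def classify_load(load_avg_1m: float, cores: int) -> Tuple[float, str]:
    """Return (load ratio, label) using the bands listed above."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    ratio = load_avg_1m / cores
    if ratio < 0.7:
        label = "normal"
    elif ratio <= 1.0:
        label = "busy but healthy"
    elif ratio <= 2.0:
        label = "saturated"
    else:
        label = "severe overload"
    return ratio, label
```

The label covers only the current value; trend and baseline_load still need to be checked separately, as the key indicators note.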
Memory Analysis
Red flags (priority order):
- •oom_kills_recent > 0: CRITICAL - system killed processes, find memory hog immediately
- •swap_used_mb > 0: Performance degradation in progress
- •pressure_pct > 5%: System struggling with memory contention
- •usage_percent > 90%: Getting close to limits
Important: Linux uses memory for cache, so high usage_percent alone is normal. Focus on pressure and swap.
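One way to apply the red flags in priority order is sketched below. The function takes the four values as plain arguments because the exact JSON layout is not specified here, and the name memory_red_flags is hypothetical.

```python
from typing import List, Tuple


def memory_red_flags(oom_kills_recent: int, swap_used_mb: float,
                     pressure_pct: float, usage_percent: float) -> List[Tuple[str, str]]:
    """Return (field, message) pairs for triggered flags, highest priority first."""
    checks = [
        (oom_kills_recent > 0, "oom_kills_recent",
         "CRITICAL: system killed processes, find the memory hog"),
        (swap_used_mb > 0, "swap_used_mb", "performance degradation in progress"),
        (pressure_pct > 5, "pressure_pct", "system struggling with memory contention"),
        (usage_percent > 90, "usage_percent", "getting close to limits"),
    ]
    return [(field, message) for hit, field, message in checks if hit]
```

If usage_percent is the only flag returned, the cache caveat above applies and the result should not be reported as a problem by itself.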
Disk I/O
Saturation indicators:
- •io_wait_ms > 10: Significant disk bottleneck
- •queue_depth consistently high: Disk can't keep up
- •High read_iops or write_iops with slow response: Disk performance issue
Storage capacity:
- •usage_percent > 90%: Running out of space
- •usage_percent > 95%: Critical - will cause failures soon
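The numeric disk thresholds can be checked the same way. This sketch covers only io_wait_ms and usage_percent, since queue_depth and IOPS have no stated numeric cutoff. The function name disk_findings is hypothetical.

```python
from typing import List


def disk_findings(io_wait_ms: float, usage_percent: float) -> List[str]:
    """Apply the numeric disk thresholds listed above."""
    findings = []
    if io_wait_ms > 10:
        findings.append("io_wait_ms > 10: significant disk bottleneck")
    # Report only the more severe capacity finding when both thresholds are crossed.
    if usage_percent > 95:
        findings.append("usage_percent > 95%: critical, will cause failures soon")
    elif usage_percent > 90:
        findings.append("usage_percent > 90%: running out of space")
    return findings
```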
Network
- •rx_bytes_per_sec / tx_bytes_per_sec: Check for unexpected traffic spikes
- •errors > 0 or drops > 0: Network hardware/configuration issue
- •Large number of time_wait connections: May indicate connection leak
Process Intelligence
- •zombie > 0: Process management bug (usually benign but indicates issue)
- •Processes in D state: Stuck in uninterruptible sleep (disk or kernel issue)
- •new_since_last[]: Check for unexpected process spawning
Systemd Services
- •units_failed > 0: Check failed_units[] array
- •recent_restarts[]: May indicate instability
Log Summary
- •errors_last_interval: Elevated error rate indicates problems
- •message_rate_per_min: Spikes suggest logging storm or serious issue
- •Review recent_errors[] for specific problems
Desktop Metrics (when present)
- •gpu.utilization_pct vs CPU: Identify GPU-bound vs CPU-bound workloads
- •gpu.temperature_c > 85: Thermal throttling likely
- •active_window: Provides context for resource usage
Common System Patterns
Development Workstation (Expected)
- •High memory usage from IDEs, browsers
- •Firefox/Chrome often in top memory consumers
- •Docker daemon using CPU/memory
- •VSCode, JetBrains IDEs in top processes
- •Baseline load: 10-30% of cores
Container Host (Expected)
- •Elevated baseline load (many processes)
- •dockerd/containerd in top processes
- •50-70% memory usage normal
- •Many processes in top list
Proxmox/Virtualization Host (Expected)
- •Baseline load proportional to VM count
- •Consistent low-level resource usage
- •~2GB overhead for Proxmox itself
- •Multiple QEMU/KVM processes
GPU Workload (Expected)
- •High GPU utilization with lower CPU
- •Significant GPU memory usage
- •Common for: rendering, ML inference, gaming
Multi-Host Analysis
When checking multiple hosts:
- •Fetch all hosts first (parallel thinking)
- •Compare baselines: Identify outliers
- •Look for correlations: Network event vs individual host issue
- •Check recent_changes: Migrations, deployments, package updates
- •Identify the odd one out: Which host differs from the pattern?
Example analysis pattern:
Host 1: Load 2.1/8 cores (26%), normal
Host 2: Load 7.8/8 cores (97%), ATTENTION NEEDED ← outlier
Host 3: Load 1.9/8 cores (24%), normal
Focus on Host 2 - investigate top_processes
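A simple way to find the odd one out is to compare each host's load ratio with the group median. The sketch below is one possible heuristic, not a GASP feature: the function name find_load_outliers and the factor of 2 are assumptions, and the median is only meaningful with three or more hosts.

```python
from statistics import median
from typing import Dict, List, Tuple


def find_load_outliers(hosts: Dict[str, Tuple[float, int]], factor: float = 2.0) -> List[str]:
    """Flag hosts whose load ratio exceeds `factor` times the group median.

    `hosts` maps a host name to (load_avg_1m, cores).
    """
    if not hosts:
        return []
    ratios = {name: load / cores for name, (load, cores) in hosts.items()}
    typical = median(ratios.values())
    return [name for name, ratio in ratios.items() if ratio > factor * typical]


# The three hosts from the example analysis above.
outliers = find_load_outliers({
    "host1": (2.1, 8),
    "host2": (7.8, 8),
    "host3": (1.9, 8),
})
print(outliers)  # ['host2']
```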
Diagnosis Strategies
"System is slow"
- •Check load ratio (CPU saturation?)
- •Check io_wait (disk bottleneck?)
- •Check memory pressure (swapping?)
- •Identify top consumer in relevant category
- •Assess if consumption is expected for that process
"High memory usage"
- •First: Check pressure_pct (real issue or just cache?)
- •Check swap_used_mb (actual problem?)
- •Find top memory consumers
- •Check process uptime (leak or normal?)
- •Compare to baseline (delta more important than absolute)
"Unexpected behavior"
- •Check recent_changes for clues
- •Review systemd failed units
- •Check recent_errors in logs
- •Look for new processes since last snapshot
- •Compare current metrics to baseline
Reporting Guidelines
When reporting findings:
- •Start with verdict: "Healthy", "Issue found", "Critical problem"
- •Be specific: Name the process/service causing issues
- •Provide context: Is this expected for this host type?
- •Give actionable recommendations: What should user do?
- •Include relevant metrics: Back up findings with data
Good example:
"Issue found on accelerated.local: Memory pressure at 8.2%. The postgres container started swapping 2 hours ago and is now using 12GB RAM (up from 4GB baseline). This likely indicates a query leak. Recommend checking recent queries and restarting the container."
Bad example:
"Memory usage is high. You might want to look into it."
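The structure of the good example (verdict, host, metric-backed findings, context, recommendation) can be captured in a small formatter. This is only an illustration of the ordering. The function name format_report and its parameters are hypothetical.

```python
from typing import List


def format_report(verdict: str, host: str, findings: List[str],
                  recommendation: str, context: str = "") -> str:
    """Assemble a report: verdict first, then findings, context, recommendation."""
    sentences = [finding.rstrip(".") + "." for finding in findings]
    if context:
        sentences.append(context.rstrip(".") + ".")
    sentences.append("Recommend " + recommendation.rstrip(".") + ".")
    return f"{verdict} on {host}: " + " ".join(sentences)
```

Each entry in findings should name the specific process or service and carry its metric, which is what separates the good example from the bad one.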
Advanced Diagnostics
For complex issues or when initial analysis is unclear, consult:
- •references/diagnostic-workflows.md - Detailed diagnostic procedures
- •references/common-patterns.md - Infrastructure-specific patterns
Using with Provided JSON
If user pastes GASP JSON instead of requesting a fetch:
- •Analyze the provided JSON using all guidance above
- •Don't attempt to fetch (data already provided)
- •Apply same interpretation and reporting guidelines
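Deciding between "analyze pasted JSON" and "fetch from a host" can be sketched as a parse attempt. The helper name parse_pasted_metrics is hypothetical, and treating only a JSON object as valid GASP output is an assumption.

```python
import json
from typing import Optional


def parse_pasted_metrics(text: str) -> Optional[dict]:
    """Return parsed metrics if `text` is a JSON object, else None.

    None means the input is not pasted GASP output, so the caller
    should fall back to fetching from the named host.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```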