<Use_When>
- User says "heartbeat", "system health", "check status", "is everything ok"
- User says "monitor", "what's running", "system check", "how's the system"
- User wants to set up periodic health monitoring on a cron schedule
- User asks "are there any errors?", "check CI", "check Sentry"
- User wants a morning briefing with system status, calendar, and alerts
- A long-running process needs periodic health verification
- User says "alert me if CPU goes above 80%" or similar threshold-based monitoring
</Use_When>
<Do_Not_Use_When>
- One-time quick check of a single metric -- use Bash directly (e.g., `df -h`, `top -l 1`)
- Monitoring a specific test run -- use Bash with the test command directly
- Sending a manual message to Telegram -- use telegram-control skill instead
- Checking memory database health -- use memory-mgr skill with sc_memory_stats
</Do_Not_Use_When>
<Why_This_Exists> Developers often miss slow-building problems: disk filling up, memory leaks in dev servers, failing CI pipelines they forgot about, new Sentry errors from last night's deploy, or calendar conflicts for upcoming meetings. Without proactive monitoring, these issues become emergencies. Heartbeat runs collectors on a schedule (or on-demand), evaluates against thresholds, and pushes alerts to Telegram so the developer knows about problems even when they are away from the terminal. It provides the "ops awareness" that solo developers lack. </Why_This_Exists>
<Execution_Policy>
- Run all 7 collectors in parallel for maximum speed (independent data sources)
- Timeout per collector: 15 seconds (prevent one slow collector from blocking the report)
- If a collector fails: log the error, mark that section as "unavailable", continue with others
- Max report frequency: no more than 1 full heartbeat per 5 minutes (prevent spam)
- Alert deduplication: same alert not re-sent within 30 minutes
- Critical alerts: always send to Telegram immediately
- Warning alerts: batch and send every 15 minutes or with next heartbeat
- Info level: include in report only, no Telegram notification
</Execution_Policy>
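The parallel-run and timeout rules above can be sketched in shell as follows. This is a minimal illustration, not SuperClaw's implementation: the `run_collector` helper and its `NAME.out` file layout are hypothetical, and a background watchdog stands in for a `timeout` command, which is not available on every system.

```bash
#!/bin/bash
# Illustrative only: run_collector and the NAME.out layout are assumptions.

# run_collector NAME TIMEOUT_SECS OUTDIR CMD...
# Writes CMD's stdout to OUTDIR/NAME.out. If CMD fails or is still running
# after TIMEOUT_SECS, the file contains the single word "unavailable".
run_collector() {
  local name="$1" secs="$2" outdir="$3"
  shift 3
  local out="$outdir/$name.out"

  "$@" >"$out" 2>/dev/null &
  local pid=$!

  # Watchdog: kill the collector if it outlives its time budget.
  ( sleep "$secs"; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  local watchdog=$!

  if ! wait "$pid" 2>/dev/null; then
    echo "unavailable" >"$out"
  fi

  # Stop the watchdog; ignore errors if it has already exited.
  kill "$watchdog" 2>/dev/null || true
  wait "$watchdog" 2>/dev/null || true
}

# Start collectors together, then wait for all of them.
outdir=$(mktemp -d)
run_collector disk 15 "$outdir" df -h / &
run_collector load 15 "$outdir" uptime &
wait
```

Because each collector writes to its own file, a slow or failing collector only affects its own section of the report.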
<Steps>
Collector 1 - System Metrics:
- Bash: `top -l 1 -s 0 | head -12` for CPU and memory
- Bash: `df -h /` for disk usage
- Bash: `uptime` for load average and uptime
- Bash: `sysctl hw.memsize` for total memory
- Output: CPU%, memory%, disk%, load average, uptime
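As a rough sketch, the raw command output can be reduced to bare numbers with small parsers like the ones below. The function names are hypothetical, and the input formats are assumptions based on typical macOS output (a `CPU usage: ... idle` line from `top`, `load averages:` from `uptime`), so they may need adjusting for other OS versions.

```bash
#!/bin/bash
# Illustrative parsers; names and expected input formats are assumptions.

# parse_cpu: read `top -l 1` output on stdin, print busy CPU % (100 - idle).
parse_cpu() {
  awk '/^CPU usage:/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^idle/) {
        idle = $(i - 1)
        sub(/%/, "", idle)
        printf "%.1f\n", 100 - idle
      }
  }'
}

# parse_load: read `uptime` output on stdin, print the 1-minute load average.
# Accepts both "load averages:" (macOS) and "load average:" (Linux).
parse_load() {
  sed -E 's/.*load averages?: *([0-9.]+).*/\1/'
}

# disk_percent [MOUNT]: used percentage of a mount point as a bare integer.
# -P forces the POSIX one-line-per-filesystem layout.
disk_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage on macOS:
#   top -l 1 -s 0 | parse_cpu
#   uptime | parse_load
#   disk_percent /
```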
Collector 2 - Dev Environment:
- Bash: `node --version`, `npm --version`, `python3 --version`
- Bash: `git status --porcelain | wc -l` for uncommitted changes
- Bash: `npm test 2>&1 | tail -5` or project test command (if configured)
- Bash: `npx tsc --noEmit 2>&1 | tail -10` for TypeScript errors
- Output: tool versions, uncommitted file count, test status, type errors
Collector 3 - GitHub CI:
- Bash: `gh run list --limit 5 --json status,conclusion,name,createdAt`
- Bash: `gh pr list --json number,title,mergeable,reviews`
- Output: recent CI run results, open PR status, merge conflicts
Collector 4 - Sentry Errors:
- Bash: `curl -s -H "Authorization: Bearer $SENTRY_AUTH_TOKEN" "https://sentry.io/api/0/projects/$SENTRY_ORG/$SENTRY_PROJECT/issues/?query=is:unresolved&limit=5"`
- Output: unresolved error count, top error titles, first/last seen
Collector 5 - Calendar:
- sc_osascript: query Calendar.app for today's events via AppleScript
- Script: `tell application "Calendar" to get summary of events of calendar "Work" whose start date >= (current date) and start date <= ((current date) + 1 * days)`
- Output: today's remaining events with times
Collector 6 - Process Health:
- Bash: `ps aux -r | head -20` for processes sorted by CPU usage (`-r` is the macOS/BSD sort-by-CPU flag)
- Bash: `lsof -i -P | grep LISTEN` for open ports
- Bash: check if key services are running (node, docker, postgres, etc.)
- Output: top CPU processes, listening ports, service status
Collector 7 - Custom Collectors:
- Read user-defined collector scripts from `~/superclaw/collectors/`
- Each script is a shell script that outputs JSON: `{"status":"ok|warn|critical","metric":value,"message":"..."}`
- Execute each script with 10-second timeout
- Output: custom metric results
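A minimal sketch of how the custom collector directory might be consumed is shown below. The helper names are hypothetical, the status is pulled out with `sed` rather than a real JSON parser, and the 10-second timeout is omitted for brevity, so treat this as an illustration of the contract and not as the actual runner.

```bash
#!/bin/bash
# Illustrative only: extract_status and run_custom_collectors are assumptions.

# extract_status: read one collector's JSON on stdin and print its status,
# or "unavailable" if the field is missing or not ok|warn|critical.
extract_status() {
  local status
  status=$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)
  case "$status" in
    ok|warn|critical) echo "$status" ;;
    *) echo "unavailable" ;;
  esac
}

# run_custom_collectors DIR: run every executable file in DIR and print
# one "name status" line per script.
run_custom_collectors() {
  local dir="$1" script name
  for script in "$dir"/*; do
    # Skip non-executables (and the literal glob when DIR is empty).
    [ -f "$script" ] && [ -x "$script" ] || continue
    name=$(basename "$script" .sh)
    echo "$name $("$script" 2>/dev/null | extract_status)"
  done
}

run_custom_collectors "$HOME/superclaw/collectors"
```

Treating malformed output as "unavailable" keeps one broken script from aborting the rest of the heartbeat.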
Phase 2 - Evaluate Against Thresholds: Compare collected metrics to alert rules
- Default thresholds (overridable in config):
  | Metric | Warning | Critical |
  |---|---|---|
  | CPU % | > 70% | > 90% |
  | Memory % | > 75% | > 90% |
  | Disk % | > 80% | > 95% |
  | Load Average | > 4.0 | > 8.0 |
  | Uncommitted Files | > 20 | > 50 |
  | Failed CI Runs | >= 1 | >= 3 |
  | Unresolved Sentry | >= 5 | >= 20 |
  | TypeScript Errors | >= 1 | >= 10 |
- Each metric evaluated independently
- Overall status = worst individual status (any critical -> overall critical)
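The evaluation rules above reduce to two small functions, sketched here under the assumption that each metric is already a plain number. The names `classify` and `worst_status` are hypothetical; the `gt`/`ge` switch mirrors the ">" and ">=" comparisons in the default thresholds, and `awk` handles fractional values such as load averages.

```bash
#!/bin/bash
# Illustrative only: classify and worst_status are hypothetical helpers.

# classify VALUE WARN CRITICAL [OP]: print ok, warn, or critical.
# OP is "gt" (strictly greater, the default) or "ge" (greater or equal).
classify() {
  awk -v v="$1" -v w="$2" -v c="$3" -v op="${4:-gt}" 'BEGIN {
    v += 0; w += 0; c += 0            # force numeric comparison
    hit_c = (op == "ge") ? (v >= c) : (v > c)
    hit_w = (op == "ge") ? (v >= w) : (v > w)
    print hit_c ? "critical" : (hit_w ? "warn" : "ok")
  }'
}

# worst_status STATUS...: print the most severe of the given statuses.
worst_status() {
  local worst=ok s
  for s in "$@"; do
    case "$s" in
      critical) worst=critical ;;
      warn) [ "$worst" = critical ] || worst=warn ;;
    esac
  done
  echo "$worst"
}

cpu=$(classify 23 70 90)        # CPU at 23%
ci=$(classify 1 1 3 ge)         # one failed CI run
echo "overall: $(worst_status "$cpu" "$ci")"   # prints "overall: warn"
```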
Phase 3 - Generate Report: Format results into a structured heartbeat report
- Format:

      === SuperClaw Heartbeat ===
      Time: 2026-02-12 10:30:00
      Overall: OK | WARN | CRITICAL

      [System]
      CPU: 23% (ok) | Memory: 61% (ok) | Disk: 45% (ok)
      Load: 1.2 | Uptime: 14d 3h

      [Dev Environment]
      Node: v22.1.0 | TypeScript Errors: 0 (ok)
      Uncommitted: 3 files | Tests: passing

      [GitHub CI]
      Last 5 runs: 4 passed, 1 failed
      Open PRs: 2 (1 needs review)

      [Sentry]
      Unresolved: 3 issues (warn)
      Top: "TypeError: Cannot read property 'id' of undefined" (12 events)

      [Calendar]
      14:00 - Team standup (30min)
      16:00 - Design review (1hr)

      [Processes]
      Top CPU: node (8.2%), postgres (3.1%), docker (2.4%)
      Listening: :3000 (node), :5432 (postgres), :6379 (redis)

      [Alerts]
      WARN: 1 failed CI run on main branch
      WARN: 3 unresolved Sentry issues

Phase 4 - Alert on Critical/Warn: Send notifications for threshold violations
- Critical: immediately send to Telegram via `sc_send_message`
- Warning: include in report, send to Telegram if configured for warn-level
- Info: include in report only
- Alert message format: "[CRITICAL] CPU at 92% on hostname | Heartbeat 10:30"
- Deduplicate: skip if same alert was sent within the last 30 minutes
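The 30-minute deduplication window and the alert message format can be sketched as below. Everything here is an assumption for illustration: the helper names, the one-timestamp-file-per-alert scheme, and the default state directory are hypothetical and not part of SuperClaw.

```bash
#!/bin/bash
# Illustrative only: helper names and the state directory are assumptions.

# format_alert LEVEL MESSAGE HOST TIME
# Produces e.g. "[CRITICAL] CPU at 92% on hostname | Heartbeat 10:30".
format_alert() {
  local level
  level=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  printf '[%s] %s on %s | Heartbeat %s\n' "$level" "$2" "$3" "$4"
}

# should_send KEY [WINDOW_SECS] [STATE_DIR]
# Returns 0 if KEY was not sent within the window (and records the send),
# 1 if it is a duplicate that should be skipped.
should_send() {
  local key="$1" window="${2:-1800}" dir="${3:-$HOME/superclaw/heartbeat/alerts}"
  local file now last
  mkdir -p "$dir"
  # One small file per alert key holds the epoch time of the last send.
  file="$dir/$(printf '%s' "$key" | tr -c 'A-Za-z0-9_-' '_')"
  now=$(date +%s)
  last=$(cat "$file" 2>/dev/null || echo 0)
  if [ $((now - last)) -lt "$window" ]; then
    return 1
  fi
  echo "$now" >"$file"
  return 0
}

# Usage:
#   if should_send "cpu-critical"; then
#     format_alert critical "CPU at 92%" "$(hostname)" "$(date +%H:%M)"
#   fi
```

Keying on the alert identity and not on the full message text means a CPU alert at 92% and one at 93% count as the same alert.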
Phase 5 - Send to Telegram (if configured):
- Tool: `sc_send_message` with channel="telegram"
- Full report sent as formatted text (truncated to 4096 chars if needed)
- Critical alerts sent as separate urgent messages
- Respect quiet hours if configured (no non-critical alerts between 22:00-08:00)
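The quiet-hours rule is easy to get wrong because the 22:00-08:00 window wraps past midnight. The sketch below shows one way to handle it, along with the 4096 truncation. The function names are hypothetical, the window end is assumed to be exclusive, and `head -c` counts bytes, so it may cut a multi-byte character where the real limit is in characters.

```bash
#!/bin/bash
# Illustrative only: all function names here are assumptions.

# to_minutes HH:MM: minutes since midnight (awk reads "08" as decimal 8).
to_minutes() {
  echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# in_quiet_hours NOW START END: returns 0 when NOW falls inside the window.
# START is inclusive and END exclusive; windows may wrap past midnight.
in_quiet_hours() {
  local now start end
  now=$(to_minutes "$1"); start=$(to_minutes "$2"); end=$(to_minutes "$3")
  if [ "$start" -le "$end" ]; then
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
  else
    [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
  fi
}

# should_notify LEVEL NOW START END: critical always goes out; other
# levels are held back during quiet hours.
should_notify() {
  [ "$1" = "critical" ] || ! in_quiet_hours "$2" "$3" "$4"
}

# truncate_report [MAX]: cap stdin at MAX bytes (default 4096).
truncate_report() {
  head -c "${1:-4096}"
}

if should_notify warn "$(date +%H:%M)" 22:00 08:00; then
  echo "send now"
else
  echo "hold until quiet hours end"
fi
```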
Phase 6 - Store Results for Trending:
- Write heartbeat result to `~/superclaw/heartbeat/history/YYYY-MM-DD-HH-mm.json`
- Keep last 7 days of history (auto-prune older files)
- Format: JSON with all collector results, thresholds, alert states
- Enables "show me CPU trend for the last week" queries
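Storing and pruning history could look roughly like the sketch below. The helper names are hypothetical and the JSON payload is taken as an opaque string. Note that `find -mtime +N` rounds file ages down to whole days, so files are removed once they are more than N full days old.

```bash
#!/bin/bash
# Illustrative only: history_path and store_result are hypothetical helpers.

# history_path DIR: path for the current run, named YYYY-MM-DD-HH-mm.json.
history_path() {
  echo "$1/$(date +%Y-%m-%d-%H-%M).json"
}

# store_result DIR JSON [RETENTION_DAYS]: write one heartbeat result, then
# delete history files older than the retention period (default 7 days).
store_result() {
  local dir="$1" json="$2" days="${3:-7}"
  mkdir -p "$dir"
  printf '%s\n' "$json" >"$(history_path "$dir")"
  find "$dir" -name '*.json' -type f -mtime +"$days" -delete
}

# Usage:
#   store_result "$HOME/superclaw/heartbeat/history" '{"overall":"ok"}'
```

Pruning on every write keeps the directory bounded without needing a separate cleanup job.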
Phase 7 - Schedule Next Run (if periodic monitoring requested):
- Tool: `sc_cron_add` with params:
  - name: "heartbeat" or user-specified name
  - schedule: cron expression (e.g., "*/30 * * * *" for every 30 minutes)
  - command: "/run heartbeat" (routes through channel router)
- Tool: `sc_cron_list` to verify the job was registered
</Steps>
<Tool_Usage>
Gateway & Messaging (2 tools):
- `sc_gateway_status` -- Check if OpenClaw gateway is running before attempting Telegram alerts; no params
- `sc_send_message` -- Send heartbeat report or alert to Telegram; params: channel (string, "telegram"), text (string, formatted report/alert)
Scheduling (2 tools):
- `sc_cron_add` -- Schedule periodic heartbeat runs; params: name (string), schedule (string, cron expression), command (string)
- `sc_cron_list` -- List active cron jobs to verify heartbeat schedule; no params
System Data (via Bash):
- `top -l 1 -s 0 | head -12` -- CPU and memory usage snapshot
- `df -h /` -- Disk usage for root volume
- `uptime` -- System uptime and load averages
- `ps aux -r | head -10` -- Top CPU-consuming processes (`-r` sorts by CPU on macOS; the GNU-only `--sort` flag is not accepted by macOS `ps`)
- `lsof -i -P | grep LISTEN` -- Listening network ports
GitHub Data (via Bash with gh CLI):
- `gh run list --limit 5 --json status,conclusion,name,createdAt` -- Recent CI runs
- `gh pr list --json number,title,mergeable` -- Open pull requests
Calendar Data (via SuperClaw):
- `sc_osascript` -- Query Calendar.app via AppleScript for today's events
Notification Fallback:
- `sc_notify` -- Send macOS notification if Telegram is unavailable; params: title, message
</Tool_Usage>
<Escalation_And_Stop_Conditions>
- Stop if all collectors fail -- system may be in a broken state, inform user to check manually
- Stop if sc_cron_add fails repeatedly -- OpenClaw cron subsystem may not be running
- Escalate if critical alerts persist across 3+ consecutive heartbeats -- problem is not self-resolving
- Escalate if disk usage is above 95% -- immediate user action required
- Escalate if GitHub CI has been failing for more than 24 hours -- may indicate a broken main branch
- Warn if Sentry collector fails with auth error -- token may have expired
- Warn if Calendar collector returns permission error -- Automation permission needed for Calendar.app
- Fallback to sc_notify (macOS notification) if Telegram gateway is unreachable for alerts
</Escalation_And_Stop_Conditions>
<Final_Checklist>
- [ ] All enabled collectors executed (failed ones marked as "unavailable", not blocking)
- [ ] Thresholds evaluated for every collected metric
- [ ] Report formatted with clear status indicators (ok/warn/critical)
- [ ] Critical alerts sent to Telegram immediately
- [ ] Warning alerts included in report and sent if configured
- [ ] Results stored to history file for trending
- [ ] Cron schedule verified if periodic monitoring was requested
- [ ] No collector timeout exceeded 15 seconds
</Final_Checklist>
Heartbeat configuration in ~/superclaw/superclaw.json:

    heartbeat:
      enabled: true
      defaultSchedule: "*/30 * * * *"  # Every 30 minutes
      quietHours:
        start: "22:00"
        end: "08:00"
        timezone: "America/New_York"
      alertChannel: "telegram"
      historyRetentionDays: 7
      collectors:
        system: true
        dev: true
        github: true
        sentry: false  # Requires SENTRY_AUTH_TOKEN
        calendar: true
        process: true
        custom: true
      thresholds:
        cpu_warn: 70
        cpu_critical: 90
        memory_warn: 75
        memory_critical: 90
        disk_warn: 80
        disk_critical: 95
        load_warn: 4.0
        load_critical: 8.0
        uncommitted_warn: 20
        uncommitted_critical: 50
        ci_failures_warn: 1
        ci_failures_critical: 3
        sentry_unresolved_warn: 5
        sentry_unresolved_critical: 20
        typescript_errors_warn: 1
        typescript_errors_critical: 10

Environment variables for external collectors:

    export SENTRY_AUTH_TOKEN="sntrys_..."
    export SENTRY_ORG="my-org"
    export SENTRY_PROJECT="my-project"
    export GITHUB_TOKEN="ghp_..."  # Usually set by gh CLI auth

Custom Collector Creation
Create executable scripts in ~/superclaw/collectors/:

    #!/bin/bash
    # ~/superclaw/collectors/docker-health.sh
    # Custom collector for Docker container health

    # Count running and exited containers (both are 0 if docker is unavailable)
    RUNNING=$(docker ps --format '{{.Names}}' 2>/dev/null | wc -l | tr -d ' ')
    STOPPED=$(docker ps -a --filter "status=exited" --format '{{.Names}}' 2>/dev/null | wc -l | tr -d ' ')

    # Check the higher threshold first so "critical" is reachable
    if [ "$STOPPED" -gt 5 ]; then
      STATUS="critical"
    elif [ "$STOPPED" -gt 2 ]; then
      STATUS="warn"
    else
      STATUS="ok"
    fi

    echo "{\"status\":\"$STATUS\",\"running\":$RUNNING,\"stopped\":$STOPPED,\"message\":\"$RUNNING running, $STOPPED stopped\"}"

Make it executable: `chmod +x ~/superclaw/collectors/docker-health.sh`
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| GitHub collector returns empty | gh CLI not authenticated | Run gh auth login |
| Sentry collector fails | Missing or expired token | Set SENTRY_AUTH_TOKEN env var |
| Calendar collector empty | No automation permission | Grant Terminal access to Calendar in System Settings > Privacy > Automation |
| Cron jobs don't fire | OpenClaw daemon not running | Run superclaw daemon start |
| Report too long for Telegram | Many alerts or verbose collectors | Report auto-truncates at 4096 chars; reduce collector verbosity |
| History files accumulating | Auto-prune not running | Manually: find ~/superclaw/heartbeat/history -type f -mtime +7 -delete |
| Process collector slow | Too many processes | Limit to top 10 by CPU; avoid ps aux without head |
Collector-Specific Notes
System Collector (macOS):
- Uses `top -l 1 -s 0` which takes a 1-second sample (not instantaneous)
- `vm_stat` provides more detailed memory breakdown if needed
- Disk check uses root volume `/` by default; add more mount points in config
GitHub Collector:
- Requires `gh` CLI installed and authenticated
- Rate limited to 5000 requests/hour with authenticated token
- Only checks the current repository (determined by git remote)
Sentry Collector:
- Requires SENTRY_AUTH_TOKEN with a scope that can read issues (Sentry's issue-listing endpoint expects event:read; project:read alone may be rejected)
- Queries unresolved issues only (resolved issues are excluded)
- Limited to 5 most recent issues to keep report concise
Calendar Collector:
- Uses AppleScript to query Calendar.app
- Only returns events for the current day
- Respects calendar visibility settings in Calendar.app
- May require "Full Disk Access" on some macOS versions
Common Patterns
Morning Briefing Pipeline:
1. Run full heartbeat (all 7 collectors)
2. Format as morning summary with calendar first
3. Send to Telegram with "Good morning" header
4. Schedule: sc_cron_add(name="morning-brief", schedule="0 8 * * 1-5", command="/run heartbeat")
CI Watcher:
1. Enable only GitHub collector
2. Set ci_failures_warn=1, ci_failures_critical=1
3. Schedule every 10 minutes: schedule="*/10 * * * *"
4. Immediate Telegram alert on any CI failure
Disk Space Guard:
1. Enable only system collector
2. Set disk_warn=80, disk_critical=90
3. Schedule hourly: schedule="0 * * * *"
4. Alert triggers cleanup recommendations