Works with infrastructure-monitor.sh script, systemd timer, ntfy.sh push notifications,
Infrastructure Monitoring Setup Skill
Complete setup and configuration of automated infrastructure monitoring with mobile push notifications and auto-recovery capabilities.
Quick Start
Quick setup for monitoring (5 minutes):
# 1. Create unique ntfy topic TOPIC="infra-$(openssl rand -hex 8)" echo "Your topic: $TOPIC" # 2. Add to .env echo "ALERT_ENABLED=true" >> /home/dawiddutoit/projects/network/.env echo "NTFY_SERVER=https://ntfy.sh" >> /home/dawiddutoit/projects/network/.env echo "NTFY_TOPIC=$TOPIC" >> /home/dawiddutoit/projects/network/.env echo "AUTO_RECOVER=true" >> /home/dawiddutoit/projects/network/.env # 3. Install systemd service sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.* /etc/systemd/system/ sudo systemctl daemon-reload sudo systemctl enable --now infrastructure-monitor.timer # 4. Test /home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
Then install ntfy app on phone and subscribe to your topic.
Table of Contents
- •When to Use This Skill
- •What This Skill Does
- •Instructions
- •3.1 Install ntfy Mobile App
- •3.2 Configure Monitoring in .env
- •3.3 Install Systemd Timer
- •3.4 Test Monitoring and Alerts
- •3.5 Configure Home Assistant Integration (Optional)
- •3.6 Verify Auto-Recovery
- •3.7 View Monitoring Logs
- •Supporting Files
- •Expected Outcomes
- •Requirements
- •Red Flags to Avoid
When to Use This Skill
Explicit Triggers:
- •"Setup monitoring"
- •"Configure mobile alerts"
- •"Enable auto-recovery"
- •"Setup ntfy notifications"
- •"Configure Home Assistant alerts"
Implicit Triggers:
- •Want to be notified of infrastructure failures
- •Need automated recovery for common issues
- •Infrastructure has been down without detection
- •Want proactive monitoring
Debugging Triggers:
- •"Why am I not getting alerts?"
- •"Is monitoring working?"
- •"How to test notifications?"
What This Skill Does
- •Mobile Alerts - Configures ntfy.sh push notifications to phone
- •Auto-Recovery - Enables automatic fixes for common failures
- •HA Integration - Optional Home Assistant notification integration
- •Systemd Service - Installs timer to run monitoring every 5 minutes
- •Tests Setup - Verifies notifications and recovery work
- •Logs Access - Shows how to view monitoring logs
- •Troubleshooting - Diagnoses alert delivery issues
Instructions
3.1 Install ntfy Mobile App
Install app:
- •Android: https://play.google.com/store/apps/details?id=io.heckel.ntfy
- •iOS: https://apps.apple.com/us/app/ntfy/id1625396347
Subscribe to topic:
- •Open ntfy app
- •Tap "+" to add subscription
- •Enter topic:
infra-YOUR-RANDOM-ID(you'll generate this in step 3.2) - •Server:
https://ntfy.sh - •Tap "Subscribe"
Note: You need the topic ID from step 3.2 before subscribing. Come back here after generating it.
3.2 Configure Monitoring in .env
Generate unique topic ID:
TOPIC="infra-$(openssl rand -hex 8)" echo "Your unique topic: $TOPIC"
Save this topic ID - you'll use it in the ntfy app.
Add monitoring configuration to .env:
# Navigate to project directory cd /home/dawiddutoit/projects/network # Add monitoring variables cat >> .env << EOF # Monitoring & Alerts ALERT_ENABLED=true NTFY_SERVER=https://ntfy.sh NTFY_TOPIC=$TOPIC AUTO_RECOVER=true EOF
Verify configuration:
grep -A4 "Monitoring & Alerts" /home/dawiddutoit/projects/network/.env
Expected:
# Monitoring & Alerts ALERT_ENABLED=true NTFY_SERVER=https://ntfy.sh NTFY_TOPIC=infra-a3f7d92b4c8e1f56 AUTO_RECOVER=true
Configuration options:
| Variable | Purpose | Default |
|---|---|---|
ALERT_ENABLED | Enable mobile push notifications | false |
NTFY_SERVER | ntfy.sh server URL | https://ntfy.sh |
NTFY_TOPIC | Unique topic for your alerts | None (required) |
AUTO_RECOVER | Enable automatic recovery | true |
To disable auto-recovery but keep alerts:
# Edit .env nano /home/dawiddutoit/projects/network/.env # Change: AUTO_RECOVER=false
3.3 Install Systemd Timer
Install systemd service and timer to run monitoring every 5 minutes:
# Copy service files sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.service /etc/systemd/system/ sudo cp /home/dawiddutoit/projects/network/systemd/infrastructure-monitor.timer /etc/systemd/system/ # Reload systemd sudo systemctl daemon-reload # Enable and start timer sudo systemctl enable infrastructure-monitor.timer sudo systemctl start infrastructure-monitor.timer
Verify timer is active:
# Check timer status systemctl list-timers infrastructure-monitor.timer # Check service status sudo systemctl status infrastructure-monitor.timer
Expected:
● infrastructure-monitor.timer - Run infrastructure monitoring every 5 minutes
Loaded: loaded (/etc/systemd/system/infrastructure-monitor.timer; enabled)
Active: active (waiting) since...
Timer configuration:
- •Runs every 5 minutes
- •Starts 1 minute after boot
- •Persistent (survives reboots)
3.4 Test Monitoring and Alerts
Test monitoring script:
# Run monitoring manually /home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
Expected output shows:
- •Docker containers checked
- •Tunnel connectivity tested
- •Service health verified
- •Network interface status
- •Alert sent to ntfy topic
Test alert delivery:
Within 30 seconds, you should receive push notification on phone with infrastructure status.
If no notification received:
Check ntfy topic subscription:
# Test sending to topic directly curl -d "Test from infrastructure monitoring" https://ntfy.sh/$TOPIC
If direct curl works but monitoring doesn't:
- •Check ALERT_ENABLED=true in .env
- •Verify NTFY_TOPIC matches app subscription
- •Check script has network access
3.5 Configure Home Assistant Integration (Optional)
Why use Home Assistant integration:
- •Centralized home automation alerts
- •Can trigger automations based on infrastructure status
- •Redundancy with ntfy.sh
- •Integration with existing HA notifications
Prerequisites:
- •Home Assistant running and accessible
- •HA mobile app installed (for notify.mobile_app_* service)
Step 1: Create Long-Lived Access Token
- •Go to Home Assistant: http://192.168.68.123:8123
- •Click your profile (bottom left)
- •Scroll to "Long-Lived Access Tokens"
- •Click "Create Token"
- •Name: "Infrastructure Monitoring"
- •Copy token (shown only once)
Step 2: Find Notification Service Name
- •In Home Assistant: Developer Tools → Services
- •Filter by "notify"
- •Find your mobile app service:
notify.mobile_app_your_phone
Step 3: Add to .env
# Edit .env nano /home/dawiddutoit/projects/network/.env # Add HA configuration HA_NOTIFICATIONS_ENABLED=true HA_BASE_URL=http://192.168.68.123:8123 HA_ACCESS_TOKEN=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9... HA_NOTIFY_SERVICE=notify.mobile_app_your_phone
Step 4: Test HA Notifications
# Run monitoring (should send to both ntfy and HA) /home/dawiddutoit/projects/network/scripts/infrastructure-monitor.sh
Check you receive notification in Home Assistant companion app.
Troubleshooting HA notifications:
# Test HA API access
curl -H "Authorization: Bearer YOUR_TOKEN" \
http://192.168.68.123:8123/api/
# Test notification service
curl -X POST \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"message": "Test from infrastructure monitoring"}' \
http://192.168.68.123:8123/api/services/notify/mobile_app_your_phone
3.6 Verify Auto-Recovery
Monitor logs to see auto-recovery in action:
# View live monitoring logs sudo journalctl -u infrastructure-monitor.service -f # Or check persistent log tail -f /var/log/infrastructure-monitor.log
Auto-recovery capabilities:
| Issue | Detection | Recovery Action |
|---|---|---|
| Stuck cloudflared | No registrations in 10 min | Restart cloudflared container |
| Docker network isolation | Ping fails between containers | Recreate bridge network |
| Inactive Ethernet | WiFi used instead of eth0 | Activate Ethernet connection |
| Service failures | HTTP health checks fail | Restart affected containers |
Test auto-recovery:
# Simulate stuck tunnel docker stop cloudflared # Wait 5 minutes (next monitoring run) # Check logs - should show tunnel restarted # Verify tunnel recovered docker ps | grep cloudflared docker logs cloudflared | grep "Registered tunnel"
3.7 View Monitoring Logs
View systemd service logs:
# Live monitoring logs sudo journalctl -u infrastructure-monitor.service -f # Last 50 lines sudo journalctl -u infrastructure-monitor.service -n 50 # Logs from today sudo journalctl -u infrastructure-monitor.service --since today # Logs with timestamps sudo journalctl -u infrastructure-monitor.service -o short-iso
View persistent log file:
# Live tail tail -f /var/log/infrastructure-monitor.log # Last 100 lines tail -100 /var/log/infrastructure-monitor.log # Search for errors grep -i error /var/log/infrastructure-monitor.log # Search for recoveries grep -i "recovered" /var/log/infrastructure-monitor.log
Check timer schedule:
# Show next run time systemctl list-timers infrastructure-monitor.timer # Show timer configuration systemctl cat infrastructure-monitor.timer
Monitoring controls:
# Stop monitoring temporarily sudo systemctl stop infrastructure-monitor.timer # Restart monitoring sudo systemctl start infrastructure-monitor.timer # Disable monitoring (survives reboot) sudo systemctl disable infrastructure-monitor.timer # Re-enable monitoring sudo systemctl enable infrastructure-monitor.timer
Supporting Files
| File | Purpose |
|---|---|
references/reference.md | Monitoring architecture, recovery strategies, ntfy.sh details |
examples/examples.md | Example configurations, alert formats, log outputs |
scripts/test-notifications.sh | Test script for alert delivery |
Expected Outcomes
Success:
- •ntfy app receives push notifications
- •Monitoring runs every 5 minutes
- •Auto-recovery fixes common failures within 5 minutes
- •Logs show monitoring activity
- •Home Assistant notifications working (if configured)
Partial Success:
- •Monitoring runs but alerts not received (check topic subscription)
- •Alerts received but auto-recovery disabled (set AUTO_RECOVER=true)
Failure Indicators:
- •No notifications received after 10 minutes
- •Timer not running (check systemctl status)
- •Script fails with errors (check logs)
- •HA notifications not working (check token/service name)
Requirements
- •Infrastructure server running Linux with systemd
- •Mobile device with ntfy app installed
- •Internet connectivity for ntfy.sh
- •.env file with monitoring configuration
- •Home Assistant (optional, for HA integration)
Red Flags to Avoid
- • Do not use public/guessable ntfy topic (security risk)
- • Do not share ntfy topic publicly (anyone can subscribe)
- • Do not disable monitoring without alternative alerting
- • Do not ignore persistent alerts (investigate root cause)
- • Do not run monitoring script too frequently (causes noise)
- • Do not commit .env with ntfy topic to git (privacy)
- • Do not use AUTO_RECOVER=false without manual monitoring
Notes
- •Monitoring checks run every 5 minutes via systemd timer
- •ntfy.sh is free and doesn't require account
- •Topic ID should be random and private (security by obscurity)
- •Auto-recovery attempts fixes before alerting as critical
- •Alert levels: 🔴 Critical (manual intervention), ⚠️ Warning (recovery in progress)
- •HA integration is optional and works alongside ntfy.sh
- •Logs persist across reboots at /var/log/infrastructure-monitor.log
- •Maximum detection time: 5 minutes (timer interval)
- •Monitoring survives server reboots (systemd timer enabled)
- •Use infrastructure-health-check skill for manual on-demand checks