Model Fallback Skill

Automatically switch to backup AI models when your primary model fails, times out, or degrades in performance.

Overview

This skill provides a robust fallback system for nanobot that:

•Proactively monitors model health (response times, error rates)
•Automatically switches to backup models when issues are detected
•Reacts to failures and triggers model changes
•Logs all events for analysis and debugging
•Works with any model configured in your nanobot config

Features

Feature	Description
Health Check Monitor	Runs periodically to test model performance
Automatic Fallback	Switches to next model in chain when thresholds exceeded
Reactive Trigger	Handles actual failures and crashes
Configurable Thresholds	Customize response time, error rate, and timeout limits
Cascading Fallback	Multiple backup models for maximum reliability
Comprehensive Logging	Track all health checks and fallback events

Installation

Quick Install

bash

cd ~/.nanobot/workspace/skills/model-fallback
./install.sh

Manual Install

•Copy the skill to your workspace:

bash

cp -r model-fallback ~/.nanobot/workspace/skills/

•Make the scripts executable:

bash

chmod +x ~/.nanobot/workspace/skills/model-fallback/scripts/*.py
chmod +x ~/.nanobot/workspace/skills/model-fallback/install.sh

•Configure nanobot by adding fallback settings to ~/.nanobot/config.json:

json

{
  "agents": {
    "defaults": {
      "model": "your-primary-model",
      "fallbackModels": [
        "backup-model-1",
        "backup-model-2",
        "backup-model-3"
      ],
      "fallback": {
        "enabled": true,
        "healthCheckInterval": 300,
        "maxResponseTime": 30,
        "maxTimeoutRate": 0.2,
        "maxErrorRate": 0.1
      }
    }
  }
}

•Start the health check monitor:

bash

python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py start

Configuration

Fallback Chain

Configure backup models in priority order. The first entry is the first fallback, the second entry is the second fallback, and so on. JSON does not allow comments, so keep the list free of annotations:

json

{
  "fallbackModels": [
    "openrouter/anthropic/claude-3.5-sonnet",
    "openrouter/glm-4.7",
    "openrouter/google/gemini-2.0-flash-exp"
  ]
}
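A minimal sketch of how a priority-ordered chain could be walked. It assumes the chain is the primary model followed by fallbackModels, and that selection stops at the end of the list instead of wrapping around. The function name next_model is hypothetical and is not taken from the skill's scripts.

```python
from typing import Optional, Sequence


def next_model(current: str, primary: str, fallbacks: Sequence[str]) -> Optional[str]:
    """Return the model that follows `current` in the chain, or None at the end."""
    chain = [primary, *fallbacks]
    if current not in chain:
        # Assumption: an unknown model is treated like the primary.
        return fallbacks[0] if fallbacks else None
    idx = chain.index(current)
    return chain[idx + 1] if idx + 1 < len(chain) else None


fallbacks = [
    "openrouter/anthropic/claude-3.5-sonnet",
    "openrouter/glm-4.7",
    "openrouter/google/gemini-2.0-flash-exp",
]

# The primary fails, so the first fallback is selected.
print(next_model("minimax/m2.5", "minimax/m2.5", fallbacks))
# prints "openrouter/anthropic/claude-3.5-sonnet"
```

Returning None at the end of the chain makes "no backup left" an explicit case the caller has to handle.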

Health Check Settings

Setting	Default	Description
`healthCheckInterval`	300 (5 min)	How often to check model health (seconds)
`maxResponseTime`	30	Maximum acceptable response time (seconds)
`maxTimeoutRate`	0.2 (20%)	Maximum timeout rate before switching
`maxErrorRate`	0.1 (10%)	Maximum error rate before switching
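The following sketch shows one way to read these settings with the documented defaults applied when a key is missing. The class and function names are hypothetical, and the range check on the two rates is an added assumption, not confirmed behaviour of the skill.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class FallbackSettings:
    # Defaults mirror the settings table.
    health_check_interval: int = 300   # seconds
    max_response_time: float = 30.0    # seconds
    max_timeout_rate: float = 0.2      # fraction between 0 and 1
    max_error_rate: float = 0.1        # fraction between 0 and 1


def load_settings(config: Dict[str, Any]) -> FallbackSettings:
    """Read agents.defaults.fallback, using the documented defaults for missing keys."""
    raw = config.get("agents", {}).get("defaults", {}).get("fallback", {})
    d = FallbackSettings()
    settings = FallbackSettings(
        health_check_interval=raw.get("healthCheckInterval", d.health_check_interval),
        max_response_time=raw.get("maxResponseTime", d.max_response_time),
        max_timeout_rate=raw.get("maxTimeoutRate", d.max_timeout_rate),
        max_error_rate=raw.get("maxErrorRate", d.max_error_rate),
    )
    # Assumption: rates are fractions, so 20 (meaning 20%) is rejected.
    for rate in (settings.max_timeout_rate, settings.max_error_rate):
        if not 0 <= rate <= 1:
            raise ValueError("rates must be fractions between 0 and 1")
    return settings


print(load_settings({"agents": {"defaults": {"fallback": {"maxResponseTime": 60}}}}))
```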

Example Config

json

{
  "agents": {
    "defaults": {
      "model": "minimax/m2.5",
      "fallbackModels": [
        "openrouter/anthropic/claude-3.5-sonnet",
        "openrouter/glm-4.7",
        "openrouter/google/gemini-2.0-flash-exp:free"
      ],
      "fallback": {
        "enabled": true,
        "healthCheckInterval": 300,
        "maxResponseTime": 30,
        "maxTimeoutRate": 0.2,
        "maxErrorRate": 0.1
      }
    }
  },
  "providers": {
    "minimax": {
      "apiKey": "YOUR_MINIMAX_API_KEY"
    },
    "openrouter": {
      "apiKey": "YOUR_OPENROUTER_API_KEY"
    }
  }
}

Usage

Start Health Check Monitor

bash

python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py start

Stop Health Check Monitor

bash

python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py stop

Check Fallback Status

bash

python3 ~/.nanobot/workspace/skills/model-fallback/scripts/fallback-trigger.py status

Manually Trigger Fallback

bash

# Test fallback with a specific reason
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/fallback-trigger.py trigger --reason "Manual test"

# Trigger fallback without reason
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/fallback-trigger.py trigger

View Logs

bash

# Health check logs
tail -f ~/.nanobot/logs/model-health.log

# Fallback event logs
tail -f ~/.nanobot/logs/model-fallback.log

# Combined logs
tail -f ~/.nanobot/logs/model-*.log

How It Works

Proactive Monitoring (Health Check)

code

Every healthCheckInterval seconds (300 by default, i.e. every 5 minutes):
  1. Send test prompt to current model
  2. Measure response time
  3. Check for errors/timeouts
  4. Calculate error/timeout rates
  5. If thresholds exceeded:
     - Log warning
     - Trigger fallback to next model
     - Restart nanobot if needed
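The steps above can be sketched as follows. This is an illustration under stated assumptions, not the code in health-check.py: rates are computed over a rolling window of recent checks (the window size of 10 is made up), a reply slower than maxResponseTime is counted as a timeout, and the send callable stands in for whatever client actually calls the model.

```python
import time
from collections import deque
from typing import Callable, Deque

OK, ERROR, TIMEOUT = "ok", "error", "timeout"


class HealthWindow:
    """Keeps the outcomes of the most recent checks (window size is an assumption)."""

    def __init__(self, size: int = 10) -> None:
        self.outcomes: Deque[str] = deque(maxlen=size)

    def record(self, outcome: str) -> None:
        self.outcomes.append(outcome)

    def rate(self, outcome: str) -> float:
        if not self.outcomes:
            return 0.0
        return sum(1 for o in self.outcomes if o == outcome) / len(self.outcomes)


def probe(send: Callable[[str], str], prompt: str, max_response_time: float) -> str:
    """Steps 1-3: send the test prompt, time it, classify the outcome."""
    start = time.monotonic()
    try:
        send(prompt)
    except TimeoutError:
        return TIMEOUT
    except Exception:
        return ERROR
    # Assumption: a reply slower than maxResponseTime counts as a timeout.
    return TIMEOUT if time.monotonic() - start > max_response_time else OK


def thresholds_exceeded(window: HealthWindow, max_timeout_rate: float,
                        max_error_rate: float) -> bool:
    """Steps 4-5: compare the observed rates with the configured limits."""
    return (window.rate(TIMEOUT) > max_timeout_rate
            or window.rate(ERROR) > max_error_rate)


window = HealthWindow(size=10)
for outcome in [OK] * 8 + [ERROR] * 2:
    window.record(outcome)
print(thresholds_exceeded(window, 0.2, 0.1))  # True: error rate 0.2 exceeds 0.1
```

Judging on a window instead of a single check keeps one slow reply from causing a model switch.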

Reactive Fallback

code

When failure detected:
  1. Identify current model
  2. Select next model in fallback chain
  3. Update config with new model
  4. Log the switch
  5. Trigger nanobot restart (via wrapper)
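A hedged sketch of steps 1 to 4, assuming the active model is stored in agents.defaults.model and the chain in agents.defaults.fallbackModels, as in the config shown earlier. The function switch_model is hypothetical. Writing through a temporary file and renaming it is a design choice of this sketch, not something the skill is known to do. Step 5 (the restart) is left out.

```python
import json
import logging
import os
import tempfile
from pathlib import Path
from typing import Optional

log = logging.getLogger("model-fallback")


def switch_model(config_path: Path, reason: str) -> Optional[str]:
    """Advance agents.defaults.model to the next fallback.

    Returns the new model, or None if the chain is exhausted.
    """
    config = json.loads(config_path.read_text())
    defaults = config["agents"]["defaults"]
    current = defaults["model"]                       # step 1
    chain = defaults.get("fallbackModels", [])

    # Step 2: a model that is not in the chain is treated as the primary.
    idx = chain.index(current) + 1 if current in chain else 0
    if idx >= len(chain):
        log.error("no fallback left after %s (%s)", current, reason)
        return None
    new_model = chain[idx]

    # Step 3: write a temp file, then rename it over the config, so an
    # interrupted write cannot leave a half-written config behind.
    defaults["model"] = new_model
    with tempfile.NamedTemporaryFile("w", dir=config_path.parent, delete=False) as tmp:
        json.dump(config, tmp, indent=2)
    os.replace(tmp.name, config_path)

    log.warning("switched %s -> %s (%s)", current, new_model, reason)  # step 4
    return new_model


# Demo against a throwaway config, not ~/.nanobot/config.json.
with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "config.json"
    path.write_text(json.dumps({"agents": {"defaults": {
        "model": "minimax/m2.5",
        "fallbackModels": ["openrouter/glm-4.7"],
    }}}))
    print(switch_model(path, "Manual test"))  # prints "openrouter/glm-4.7"
```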

Restart Integration

This skill works best with the nanobot wrapper script for automatic restarts:

bash

# Wrapper script monitors for restart trigger
~/nanobot-wrapper.sh

# Fallback script creates trigger file
touch /tmp/nanobot-restart
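The trigger-file handshake can be sketched as below. The wrapper's actual behaviour is not shown in this README, so the "consume" side (deleting the file once it has been seen) is an assumption, and both function names are hypothetical.

```python
import tempfile
from pathlib import Path


def request_restart(trigger: Path) -> None:
    """Fallback side: create the trigger file (equivalent to `touch`)."""
    trigger.touch()


def consume_restart(trigger: Path) -> bool:
    """Wrapper side: report whether a restart was requested and clear the request."""
    try:
        trigger.unlink()
        return True
    except FileNotFoundError:
        return False


# Demo in a temporary directory; the README uses /tmp/nanobot-restart.
with tempfile.TemporaryDirectory() as tmpdir:
    trigger = Path(tmpdir) / "nanobot-restart"
    print(consume_restart(trigger))  # False: nothing requested yet
    request_restart(trigger)
    print(consume_restart(trigger))  # True: request seen and cleared
    print(consume_restart(trigger))  # False: already handled
```

Deleting the file in the same step that detects it avoids restarting twice for a single request.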

Supported Models

This skill works with any model configured in your nanobot:

Minimax

•minimax/m2.5
•minimax/m1
•Other Minimax models

OpenRouter

•openrouter/anthropic/claude-3.5-sonnet
•openrouter/google/gemini-2.0-flash-exp
•openrouter/glm-4.7
•Any OpenRouter-hosted model

Other Providers

Add your provider to config.json and use the model identifier in your fallback chain.

Troubleshooting

Health Check Not Running

bash

# Check if process is running
ps aux | grep '[h]ealth-check'   # the brackets keep grep from matching its own process

# Restart manually
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py start

Fallback Not Triggering

•Check that the config has fallback.enabled set to true
•Verify that fallbackModels lists at least one backup model
•Check the logs for errors:

bash

tail -50 ~/.nanobot/logs/model-fallback.log

Model Not Switching

•Ensure nanobot is running with the wrapper script

•Check that the restart trigger exists:

bash

ls -la /tmp/nanobot-restart

•Verify that the config file is writable:

bash

ls -la ~/.nanobot/config.json

API Key Issues

Make sure your provider API keys are configured in ~/.nanobot/config.json:

json

{
  "providers": {
    "minimax": {
      "apiKey": "your-key-here"
    },
    "openrouter": {
      "apiKey": "your-key-here"
    }
  }
}
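A small sketch for spotting missing keys before a fallback is needed. It assumes the provider name is the first path segment of the model identifier (as in minimax/m2.5 and openrouter/glm-4.7) and treats the placeholder values used in this README as missing. The function name is hypothetical.

```python
from typing import Any, Dict, List

# Placeholder values that appear in this README's examples.
PLACEHOLDERS = ("YOUR_", "your-key-here")


def providers_missing_keys(config: Dict[str, Any]) -> List[str]:
    """List providers used by the model chain that have no usable apiKey."""
    defaults = config.get("agents", {}).get("defaults", {})
    models = [defaults.get("model"), *defaults.get("fallbackModels", [])]
    providers = config.get("providers", {})
    missing: List[str] = []
    for model in models:
        if not model:
            continue
        provider = model.split("/", 1)[0]  # assumption: first segment names the provider
        key = providers.get(provider, {}).get("apiKey", "")
        if (not key or key.startswith(PLACEHOLDERS)) and provider not in missing:
            missing.append(provider)
    return missing


config = {
    "agents": {"defaults": {
        "model": "minimax/m2.5",
        "fallbackModels": ["openrouter/glm-4.7"],
    }},
    "providers": {
        "minimax": {"apiKey": "sk-example"},
        "openrouter": {"apiKey": "your-key-here"},
    },
}
print(providers_missing_keys(config))  # prints ['openrouter']
```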

Logs

Health Check Log

Location: ~/.nanobot/logs/model-health.log

Contains:

•Health check timestamps
•Response times
•Error/timeout rates
•Model performance metrics

Fallback Log

Location: ~/.nanobot/logs/model-fallback.log

Contains:

•Fallback trigger events
•Model switches
•Reasons for switching
•Config updates

Advanced Usage

Custom Health Check Prompt

Edit health-check.py to customize the test prompt:

python

TEST_PROMPT = "Respond with 'OK' if you're working."
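If you change the prompt, any check on the reply has to change with it. The README does not say whether health-check.py inspects the reply text at all, so the following is only a sketch of what such a check could look like. The function reply_looks_healthy is hypothetical.

```python
import re

TEST_PROMPT = "Respond with 'OK' if you're working."


def reply_looks_healthy(reply: str, expected: str = "OK") -> bool:
    """True if the reply contains the expected token as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(expected) + r"\b"
    return re.search(pattern, reply, flags=re.IGNORECASE) is not None


print(reply_looks_healthy("OK"))            # True
print(reply_looks_healthy("I am broken"))   # False: "ok" inside "broken" is not a whole word
```

Matching on a whole word instead of a plain substring avoids false positives from words that merely contain the token.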

Adjusting Sensitivity

For less sensitive monitoring (fewer switches), raise the thresholds in the fallback block of your config:

json

{
  "maxResponseTime": 60,
  "maxTimeoutRate": 0.5,
  "maxErrorRate": 0.3
}

For more sensitive monitoring (faster switches), lower the thresholds in the fallback block:

json

{
  "maxResponseTime": 15,
  "maxTimeoutRate": 0.1,
  "maxErrorRate": 0.05
}
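To see how the profiles differ, the sketch below compares one set of measurements against each of them. The measurements are hypothetical values chosen for illustration, and "exceeded" is read as strictly greater than the limit, which is an assumption about how the skill compares values.

```python
from typing import Dict, List

PROFILES: Dict[str, Dict[str, float]] = {
    "less sensitive": {"maxResponseTime": 60, "maxTimeoutRate": 0.5, "maxErrorRate": 0.3},
    "default":        {"maxResponseTime": 30, "maxTimeoutRate": 0.2, "maxErrorRate": 0.1},
    "more sensitive": {"maxResponseTime": 15, "maxTimeoutRate": 0.1, "maxErrorRate": 0.05},
}

# Hypothetical measurements, keyed by the limit each one is compared against.
observed = {"maxResponseTime": 20.0, "maxTimeoutRate": 0.15, "maxErrorRate": 0.08}


def exceeded(measured: Dict[str, float], limits: Dict[str, float]) -> List[str]:
    """Names of the limits that the measurements exceed."""
    return [name for name, limit in limits.items() if measured[name] > limit]


for profile, limits in PROFILES.items():
    print(profile, exceeded(observed, limits))
# less sensitive []
# default []
# more sensitive ['maxResponseTime', 'maxTimeoutRate', 'maxErrorRate']
```

The same model behaviour is tolerated by the first two profiles and triggers a switch under the third.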

Multiple Instances

Run health checks for different models:

bash

# Check primary model
python3 health-check.py --model "minimax/m2.5"

# Check fallback model
python3 health-check.py --model "openrouter/anthropic/claude-3.5-sonnet"

Contributing

To improve this skill:

•Test with different model providers
•Add new health check metrics
•Improve error handling
•Add more configuration options
•Share your configurations!

License

This skill is provided as-is for use with nanobot. Feel free to modify and distribute.

Support

For issues or questions:

•Check logs in ~/.nanobot/logs/model-*.log
•Verify your config is valid JSON
•Ensure API keys are correct
•Check nanobot is running with wrapper script
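For the "valid JSON" check, a short sketch like the one below reports where parsing fails and whether the keys used in this README are present. The function config_problems is hypothetical, and the structural checks cover only the keys shown in the examples above.

```python
import json
from typing import List


def config_problems(text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    agents = config.get("agents")
    defaults = agents.get("defaults") if isinstance(agents, dict) else None
    if not isinstance(defaults, dict):
        return ["agents.defaults must be an object"]

    problems: List[str] = []
    if not isinstance(defaults.get("model"), str):
        problems.append("agents.defaults.model must be a string")
    fallbacks = defaults.get("fallbackModels")
    if not (isinstance(fallbacks, list) and fallbacks
            and all(isinstance(m, str) for m in fallbacks)):
        problems.append("agents.defaults.fallbackModels must be a non-empty list of strings")
    return problems


# A stray comment is enough to make the whole file invalid JSON.
print(config_problems('{"fallbackModels": ["a"]  # first fallback\n}'))
```

To check a real file, pass its contents in, for example `config_problems(Path("~/.nanobot/config.json").expanduser().read_text())` after importing `Path` from `pathlib`.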

Changelog

v1.0.0 (2025-02-12)

•Initial release
•Health check monitoring
•Automatic fallback
•Reactive trigger
•Comprehensive logging
•Installation script