Model Fallback Skill

Automatically switch to backup AI models when your primary model fails, times out, or degrades in performance.

Overview

This skill provides a robust fallback system for nanobot that:

  • Proactively monitors model health (response times, error rates)
  • Automatically switches to backup models when issues are detected
  • Reacts to failures and triggers model changes
  • Logs all events for analysis and debugging
  • Works with any model configured in your nanobot config

Features

| Feature | Description |
| --- | --- |
| Health Check Monitor | Runs periodically to test model performance |
| Automatic Fallback | Switches to next model in chain when thresholds exceeded |
| Reactive Trigger | Handles actual failures and crashes |
| Configurable Thresholds | Customize response time, error rate, and timeout limits |
| Cascading Fallback | Multiple backup models for maximum reliability |
| Comprehensive Logging | Track all health checks and fallback events |

Installation

Quick Install

bash
cd ~/.nanobot/workspace/skills/model-fallback
./install.sh

Manual Install

  1. Copy the skill to your workspace:

    bash
    cp -r model-fallback ~/.nanobot/workspace/skills/
    
  2. Make scripts executable:

    bash
    chmod +x ~/.nanobot/workspace/skills/model-fallback/scripts/*.py
    chmod +x ~/.nanobot/workspace/skills/model-fallback/install.sh
    
  3. Add the fallback settings to ~/.nanobot/config.json:

    json
    {
      "agents": {
        "defaults": {
          "model": "your-primary-model",
          "fallbackModels": [
            "backup-model-1",
            "backup-model-2",
            "backup-model-3"
          ],
          "fallback": {
            "enabled": true,
            "healthCheckInterval": 300,
            "maxResponseTime": 30,
            "maxTimeoutRate": 0.2,
            "maxErrorRate": 0.1
          }
        }
      }
    }
    
  4. Start the health check monitor:

    bash
    python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py start
    

Configuration

Fallback Chain

Configure backup models in priority order. The first entry is the first fallback, the second entry is the second fallback, and so on. JSON does not allow comments, so keep the list free of annotations:

json
{
  "fallbackModels": [
    "openrouter/anthropic/claude-3.5-sonnet",
    "openrouter/glm-4.7",
    "openrouter/google/gemini-2.0-flash-exp"
  ]
}
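As an illustration of what "priority order" could mean in practice, the sketch below walks the chain one step at a time. It is an assumption about the selection logic, not the skill's actual code: `next_fallback_model` is a hypothetical helper, and treating the primary model as the head of the chain is also assumed.

```python
from typing import Optional, Sequence


def next_fallback_model(current: str, primary: str,
                        fallbacks: Sequence[str]) -> Optional[str]:
    """Return the model that follows `current` in the chain, or None.

    Hypothetical helper: the chain is assumed to be the primary model
    followed by the fallbackModels list, in the order given.
    """
    chain = [primary, *fallbacks]
    if current not in chain:
        # Unknown model: start again from the first fallback, if any.
        return fallbacks[0] if fallbacks else None
    position = chain.index(current)
    if position + 1 < len(chain):
        return chain[position + 1]
    return None  # chain exhausted, nothing left to switch to


chain = ["openrouter/anthropic/claude-3.5-sonnet", "openrouter/glm-4.7"]
print(next_fallback_model("minimax/m2.5", "minimax/m2.5", chain))
```

Returning `None` when the chain is exhausted leaves the decision about what to do next (stay on the last model, alert, retry the primary) to the caller.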

Health Check Settings

| Setting | Default | Description |
| --- | --- | --- |
| healthCheckInterval | 300 (5 min) | How often to check model health (seconds) |
| maxResponseTime | 30 | Maximum acceptable response time (seconds) |
| maxTimeoutRate | 0.2 (20%) | Maximum timeout rate before switching |
| maxErrorRate | 0.1 (10%) | Maximum error rate before switching |
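To make the threshold settings concrete, here is a minimal sketch of how a set of recent check results might be compared against them. Everything in it is assumed for illustration: the `Sample` structure, the `threshold_violations` helper, the strict greater-than comparison, and the choice to compare only the latest response time against maxResponseTime. The skill's own scripts may compute these differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    """One health-check result (hypothetical structure)."""
    seconds: float          # measured response time
    error: bool = False     # the call returned an error
    timeout: bool = False   # the call timed out


def threshold_violations(samples: Sequence[Sample],
                         max_response_time: float = 30,
                         max_timeout_rate: float = 0.2,
                         max_error_rate: float = 0.1) -> List[str]:
    """Return the names of the settings that the samples violate."""
    if not samples:
        return []
    total = len(samples)
    timeout_rate = sum(s.timeout for s in samples) / total
    error_rate = sum(s.error for s in samples) / total
    violations = []
    # Assumption: only the most recent sample is checked for slowness.
    if samples[-1].seconds > max_response_time:
        violations.append("maxResponseTime")
    if timeout_rate > max_timeout_rate:
        violations.append("maxTimeoutRate")
    if error_rate > max_error_rate:
        violations.append("maxErrorRate")
    return violations
```

Under these assumptions and the default limits, one error in ten checks sits exactly at the 10% limit and does not trigger a switch, while two errors in ten do.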

Example Config

json
{
  "agents": {
    "defaults": {
      "model": "minimax/m2.5",
      "fallbackModels": [
        "openrouter/anthropic/claude-3.5-sonnet",
        "openrouter/glm-4.7",
        "openrouter/google/gemini-2.0-flash-exp:free"
      ],
      "fallback": {
        "enabled": true,
        "healthCheckInterval": 300,
        "maxResponseTime": 30,
        "maxTimeoutRate": 0.2,
        "maxErrorRate": 0.1
      }
    }
  },
  "providers": {
    "minimax": {
      "apiKey": "YOUR_MINIMAX_API_KEY"
    },
    "openrouter": {
      "apiKey": "YOUR_OPENROUTER_API_KEY"
    }
  }
}
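Before starting the monitor it can help to confirm that the file parses and that the keys shown above are present. The following is a rough sketch of such a check; `check_fallback_config` is a hypothetical helper and is not shipped with the skill. The key names come from the example config, while the specific rules (rates between 0 and 1, a non-empty fallback list) are assumptions.

```python
import json
from pathlib import Path
from typing import List


def check_fallback_config(path: Path) -> List[str]:
    """Return a list of problems found in the config (empty means OK)."""
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read config: {exc}"]
    if not isinstance(config, dict):
        return ["top level of the config must be a JSON object"]

    defaults = config.get("agents", {}).get("defaults", {})
    problems = []
    if not defaults.get("model"):
        problems.append("agents.defaults.model is missing")
    if not defaults.get("fallbackModels"):
        problems.append("agents.defaults.fallbackModels is empty")
    fallback = defaults.get("fallback", {})
    if fallback.get("enabled") is not True:
        problems.append("fallback.enabled is not true")
    for key in ("maxTimeoutRate", "maxErrorRate"):
        value = fallback.get(key)
        # Rates are fractions, so anything outside 0..1 is flagged.
        if value is not None and not (
                isinstance(value, (int, float)) and 0 <= value <= 1):
            problems.append(f"fallback.{key} must be between 0 and 1")
    return problems
```

Calling `check_fallback_config(Path.home() / ".nanobot" / "config.json")` and printing the result gives a quick sanity check; an empty list means none of these particular problems were found.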

Usage

Start Health Check Monitor

bash
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py start

Stop Health Check Monitor

bash
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py stop

Check Fallback Status

bash
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/fallback-trigger.py status

Manually Trigger Fallback

bash
# Test fallback with a specific reason
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/fallback-trigger.py trigger --reason "Manual test"

# Trigger fallback without reason
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/fallback-trigger.py trigger

View Logs

bash
# Health check logs
tail -f ~/.nanobot/logs/model-health.log

# Fallback event logs
tail -f ~/.nanobot/logs/model-fallback.log

# Combined logs
tail -f ~/.nanobot/logs/model-*.log

How It Works

Proactive Monitoring (Health Check)

code
Every healthCheckInterval seconds (default 300, i.e. 5 minutes):
  1. Send test prompt to current model
  2. Measure response time
  3. Check for errors/timeouts
  4. Calculate error/timeout rates
  5. If thresholds exceeded:
     - Log warning
     - Trigger fallback to next model
     - Restart nanobot if needed
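Steps 1 to 4 of this cycle can be sketched roughly as follows. This is not the code in health-check.py: `call_model` stands in for whatever function actually talks to the model, and the rolling window of recent results is an assumption, since the source does not say how the rates are calculated.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Union

TEST_PROMPT = "Respond with 'OK' if you're working."


def run_check(call_model: Callable[[str], str],
              history: Deque[str],
              max_response_time: float) -> Dict[str, Union[str, float]]:
    """Probe the model once, record the outcome, and return current rates.

    The status is "ok", "slow", "timeout" or "error". `history` is assumed
    to be a bounded deque holding the statuses of the most recent checks.
    """
    start = time.monotonic()
    try:
        call_model(TEST_PROMPT)          # step 1: send the test prompt
        status = "ok"
    except TimeoutError:                 # step 3: classify failures
        status = "timeout"
    except Exception:
        status = "error"
    seconds = time.monotonic() - start   # step 2: measure response time
    if status == "ok" and seconds > max_response_time:
        status = "slow"

    history.append(status)
    return {                             # step 4: rates over the window
        "status": status,
        "seconds": seconds,
        "timeout_rate": history.count("timeout") / len(history),
        "error_rate": history.count("error") / len(history),
    }
```

A caller would create the window once, for example `history = deque(maxlen=10)`, call `run_check` every healthCheckInterval seconds, and compare the returned rates with the configured limits to decide on step 5.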

Reactive Fallback

code
When failure detected:
  1. Identify current model
  2. Select next model in fallback chain
  3. Update config with new model
  4. Log the switch
  5. Trigger nanobot restart (via wrapper)
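Step 3 is the delicate part, because the config also holds API keys and a half-written file would break nanobot on restart. One way to do it safely, shown here as a sketch under assumptions rather than the skill's actual implementation, is to write the new config to a temporary file and rename it over the original. `switch_model` is a hypothetical helper; the `agents.defaults.model` key path is taken from the example config.

```python
import json
import os
import tempfile
from pathlib import Path


def switch_model(config_path: Path, new_model: str) -> str:
    """Set agents.defaults.model to `new_model`; return the previous model."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    defaults = config.setdefault("agents", {}).setdefault("defaults", {})
    previous = defaults.get("model", "")
    defaults["model"] = new_model

    # Write to a temporary file in the same directory, then rename it over
    # the original so a crash never leaves a half-written config behind.
    fd, tmp_name = tempfile.mkstemp(dir=config_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=2)
            tmp.write("\n")
        os.replace(tmp_name, config_path)
    except BaseException:
        os.unlink(tmp_name)  # clean up the temporary file on failure
        raise
    return previous
```

The returned previous model is convenient for the log entry in step 4, for example "switched from X to Y". Note that rewriting the file this way normalizes its indentation.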

Restart Integration

This skill works best with the nanobot wrapper script for automatic restarts:

bash
# Wrapper script monitors for restart trigger
~/nanobot-wrapper.sh

# Fallback script creates trigger file
touch /tmp/nanobot-restart
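The contents of nanobot-wrapper.sh are not shown here, so the following Python sketch only illustrates the general trigger-file pattern: a supervisor starts a process, polls for the trigger file, and restarts the process when the file appears. The `consume_trigger` and `supervise` names, the polling interval, and the command passed in are all assumptions.

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence

TRIGGER = Path("/tmp/nanobot-restart")


def consume_trigger(trigger: Path = TRIGGER) -> bool:
    """Return True and delete the trigger file if a restart was requested."""
    try:
        trigger.unlink()
    except FileNotFoundError:
        return False
    return True


def supervise(command: Sequence[str], trigger: Path = TRIGGER,
              poll_seconds: float = 2.0) -> None:
    """Run `command`, restarting it whenever the trigger file appears."""
    while True:
        process = subprocess.Popen(command)
        while process.poll() is None:
            if consume_trigger(trigger):
                process.terminate()   # stop the old instance
                process.wait()
                break                 # outer loop starts a fresh one
            time.sleep(poll_seconds)
        else:
            return  # the process exited on its own; stop supervising
```

Deleting the file as part of the check means each `touch` causes exactly one restart, and a trigger left over from an earlier run is consumed rather than causing a restart loop.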

Supported Models

This skill works with any model configured in your nanobot:

Minimax

  • minimax/m2.5
  • minimax/m1
  • Other Minimax models

OpenRouter

  • openrouter/anthropic/claude-3.5-sonnet
  • openrouter/google/gemini-2.0-flash-exp
  • openrouter/glm-4.7
  • Any OpenRouter-hosted model

Other Providers

Add your provider to config.json and use the model identifier in your fallback chain.

Troubleshooting

Health Check Not Running

bash
# Check if process is running (the [h] keeps grep from matching itself)
ps aux | grep '[h]ealth-check'

# Restart manually
python3 ~/.nanobot/workspace/skills/model-fallback/scripts/health-check.py start

Fallback Not Triggering

  1. Check that the config has fallback.enabled set to true
  2. Verify fallback models are listed
  3. Check logs for errors:
    bash
    tail -50 ~/.nanobot/logs/model-fallback.log
    

Model Not Switching

  1. Ensure nanobot is running with the wrapper script
  2. Check whether the restart trigger exists:
    bash
    ls -la /tmp/nanobot-restart
    
  3. Verify that the config file is writable:
    bash
    ls -la ~/.nanobot/config.json
    

API Key Issues

Make sure your provider API keys are configured in ~/.nanobot/config.json:

json
{
  "providers": {
    "minimax": {
      "apiKey": "your-key-here"
    },
    "openrouter": {
      "apiKey": "your-key-here"
    }
  }
}

Logs

Health Check Log

Location: ~/.nanobot/logs/model-health.log

Contains:

  • Health check timestamps
  • Response times
  • Error/timeout rates
  • Model performance metrics

Fallback Log

Location: ~/.nanobot/logs/model-fallback.log

Contains:

  • Fallback trigger events
  • Model switches
  • Reasons for switching
  • Config updates

Advanced Usage

Custom Health Check Prompt

Edit health-check.py to customize the test prompt:

python
TEST_PROMPT = "Respond with 'OK' if you're working."

Adjusting Sensitivity

For less sensitive monitoring (fewer switches):

json
{
  "maxResponseTime": 60,
  "maxTimeoutRate": 0.5,
  "maxErrorRate": 0.3
}

For more sensitive monitoring (faster switches):

json
{
  "maxResponseTime": 15,
  "maxTimeoutRate": 0.1,
  "maxErrorRate": 0.05
}

Multiple Instances

Run health checks for different models:

bash
# Check primary model
python3 health-check.py --model "minimax/m2.5"

# Check fallback model
python3 health-check.py --model "openrouter/anthropic/claude-3.5-sonnet"

Contributing

To improve this skill:

  1. Test with different model providers
  2. Add new health check metrics
  3. Improve error handling
  4. Add more configuration options
  5. Share your configurations!

License

This skill is provided as-is for use with nanobot. Feel free to modify and distribute.

Support

For issues or questions:

  • Check logs in ~/.nanobot/logs/model-*.log
  • Verify your config is valid JSON
  • Ensure API keys are correct
  • Check that nanobot is running with the wrapper script

Changelog

v1.0.0 (2025-02-12)

  • Initial release
  • Health check monitoring
  • Automatic fallback
  • Reactive trigger
  • Comprehensive logging
  • Installation script