Troubleshooting gpt-oss and vLLM Errors
When to Use This Skill
Invoke this skill when you encounter:
- •
openai_harmony.HarmonyErrormessages in any context - •gpt-oss tool calling failures or unexpected behavior
- •Token parsing errors with vLLM serving gpt-oss models
- •Users asking about gpt-oss compatibility with frameworks like llama-stack
Critical First Step: Identify Error Source
IMPORTANT: openai_harmony.HarmonyError messages originate from the vLLM server, NOT from client applications (like llama-stack, LangChain, etc.).
Error Source Identification
- •
Check the error origin:
- •If error contains
openai_harmony.HarmonyError, it's from vLLM's serving layer - •The client application is just reporting what vLLM returned
- •Do NOT search the client codebase for fixes
- •If error contains
- •
Correct investigation path:
- •Search vLLM GitHub issues and PRs
- •Check openai/harmony repository for parser issues
- •Review vLLM server configuration and startup flags
- •Examine HuggingFace model files (generation_config.json)
Common Error Patterns
Token Mismatch Errors
Error Pattern: Unexpected token X while expecting start token Y
Example: Unexpected token 12606 while expecting start token 200006
Meaning:
- •vLLM expects special Harmony format control tokens
- •Model is generating regular text tokens instead
- •Token 12606 = "comment" (indicates model generating reasoning text instead of tool calls)
Known Issues:
- •vLLM #22519: gpt-oss-20b tool_call token errors
- •vLLM #22515: Same error, fixed by updating generation_config.json
Fixes:
- •Update model files from HuggingFace (see reference/model-updates.md)
- •Verify vLLM server flags for tool calling
- •Check generation_config.json EOS tokens
Tool Calling Not Working
Symptoms:
- •Model describes tools in text but doesn't call them
- •Empty
tool_calls=[]arrays - •Tool responses in wrong format
Root Causes:
- •Missing vLLM server flags
- •Outdated model configuration files
- •Configuration mismatch between client and server
Configuration Requirements:
vLLM server must be started with:
--tool-call-parser openai --enable-auto-tool-choice
For demo tool server:
--tool-server demo
For MCP tool servers:
--tool-server ip-1:port-1,ip-2:port-2
Important: Only tool_choice='auto' is supported.
Investigation Workflow
- •
Identify the error message:
- •Copy the exact error text
- •Note any token IDs mentioned
- •
Search vLLM GitHub:
- •Use error text in issue search
- •Include "gpt-oss" and model size (20b/120b)
- •Check both open and closed issues
- •
Check model configuration:
- •Verify generation_config.json is current
- •Compare against latest HuggingFace version
- •Look for recent commits that updated config
- •
Review server configuration:
- •Check vLLM startup flags
- •Verify tool-call-parser settings
- •Confirm vLLM version compatibility
- •
Check vLLM version:
- •Many tool calling issues resolved in recent vLLM releases
- •Update to latest version if encountering errors
- •Check vLLM changelog for gpt-oss-specific fixes
Quick Reference
Key Resources
- •vLLM gpt-oss recipe: https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html
- •Common issues: See reference/known-issues.md
- •Model update procedure: See reference/model-updates.md
Diagnostic Commands
Check vLLM server health:
curl http://localhost:8000/health
List available models:
curl http://localhost:8000/v1/models
Check vLLM version:
pip show vllm
Progressive Disclosure
For detailed information:
- •Known GitHub issues: See reference/known-issues.md
- •Model file updates: See reference/model-updates.md
- •Tool calling configuration: See reference/tool-calling-setup.md
Validation Steps
After implementing fixes:
- •Test simple tool calling with single function
- •Verify Harmony format tokens in responses
- •Check for token mismatch errors in logs
- •Test multi-turn conversations with tools
- •Monitor for "unexpected token" errors
If errors persist:
- •Update vLLM to latest version
- •Check vLLM GitHub for recent fixes and PRs
- •Try different model variant (120b vs 20b)
- •Review vLLM logs for additional error context