vLLM Studio Backend Architecture

Name: vllm-studio-backend
Author: 0xSero

Overview

This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.

When To Use

•Modifying controller routes or run streaming.
•Debugging OpenAI-compatible endpoint behavior.
•Updating Pi-mono agent runtime or tool execution.
•Understanding how the inference processes and the LiteLLM gateway fit together.
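When changing controller routes, it can help to keep the whole route surface in one table. The sketch below is a minimal, framework-agnostic matcher, assuming plain method-plus-path dispatch. The three paths come from this skill; the route names (`listModels`, `chatCompletions`, `chatTurn`), the `POST` method on /chats/:id/turn, and `matchRoute` itself are hypothetical and are not the controller's actual code.

```typescript
// Hypothetical route table: the paths come from this skill; the names and
// the POST method on /chats/:id/turn are illustrative assumptions.
type RouteName = "listModels" | "chatCompletions" | "chatTurn";

interface RouteMatch {
  name: RouteName;
  params: Record<string, string>;
}

const ROUTES: ReadonlyArray<{ method: string; pattern: string; name: RouteName }> = [
  { method: "GET", pattern: "/v1/models", name: "listModels" },
  { method: "POST", pattern: "/v1/chat/completions", name: "chatCompletions" },
  { method: "POST", pattern: "/chats/:id/turn", name: "chatTurn" },
];

// Match a method + pathname against ROUTES; ":name" segments become params.
function matchRoute(method: string, pathname: string): RouteMatch | null {
  const segments = pathname.split("/").filter(Boolean);
  for (const route of ROUTES) {
    if (route.method !== method.toUpperCase()) continue;
    const pattern = route.pattern.split("/").filter(Boolean);
    if (pattern.length !== segments.length) continue;
    const params: Record<string, string> = {};
    const ok = pattern.every((part, i) => {
      if (part.startsWith(":")) {
        params[part.slice(1)] = decodeURIComponent(segments[i]);
        return true;
      }
      return part === segments[i];
    });
    if (ok) return { name: route.name, params };
  }
  return null; // no route matched: a caller would answer 404
}

// Logs the chatTurn match with params.id === "abc123".
console.log(matchRoute("POST", "/chats/abc123/turn"));
```

Pass only the pathname (strip any query string first). The real controller may register routes through an HTTP framework instead; treat this as a way to reason about path parameters such as `:id`, not as a drop-in replacement.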

Quick Start

•Read references/backend-architecture.md for the component map and data flow.
•Read references/openai-compat.md for /v1/models and /v1/chat/completions behavior.
•Read references/backend-commands.md for useful run/debug commands.
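The exact contract lives in references/openai-compat.md, but a quick way to confirm that /v1/models and /v1/chat/completions still return the standard OpenAI response shapes is a pair of structural checks. This is a minimal sketch, assuming non-streaming responses that follow the public OpenAI format; `isModelList` and `isChatCompletion` are hypothetical helpers, not part of the codebase.

```typescript
// Minimal contract checks for the OpenAI-compatible endpoints.
// Field names follow the public OpenAI response format; extra fields the
// controller may add are deliberately ignored.

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// GET /v1/models -> { object: "list", data: [{ id, object: "model", ... }] }
function isModelList(body: unknown): boolean {
  if (!isObject(body) || body.object !== "list") return false;
  const data = body.data;
  if (!Array.isArray(data)) return false;
  return data.every((m: unknown) => isObject(m) && typeof m.id === "string" && m.object === "model");
}

// Non-streaming POST /v1/chat/completions
// -> { object: "chat.completion", model, choices: [{ message: { role, content } }] }
function isChatCompletion(body: unknown): boolean {
  if (!isObject(body) || body.object !== "chat.completion") return false;
  if (typeof body.model !== "string") return false;
  const choices = body.choices;
  if (!Array.isArray(choices) || choices.length === 0) return false;
  return choices.every((c: unknown) => {
    if (!isObject(c)) return false;
    const msg = c.message;
    if (!isObject(msg) || msg.role !== "assistant") return false;
    // content may be null when the assistant message carries tool_calls
    return typeof msg.content === "string" || msg.content === null;
  });
}
```

Feed these the parsed JSON from a running controller, for example `isModelList(await (await fetch(baseUrl + "/v1/models")).json())`, where `baseUrl` is whatever address your deployment listens on. Streaming responses (`stream: true`) arrive as `chat.completion.chunk` events and would need a separate check.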

Core Guarantees

•Keep OpenAI-compatible endpoints stable (/v1/models, /v1/chat/completions).
•The /chat UI uses the controller run stream (/chats/:id/turn) and the Pi-mono runtime.
•Tool execution happens server-side (MCP + AgentFS + plan tools).
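Server-side tool execution inside an agent loop generally follows one pattern: call the model, run any requested tools on the server, append the results to the transcript, and repeat until the model answers without tool calls. The sketch below shows that pattern under stated assumptions; `runTurn`, `ToolCall`, `ModelFn`, and the message shapes are hypothetical and are not Pi-mono's actual API. MCP, AgentFS, and plan tools would all be registered in the same `tools` map.

```typescript
// Hypothetical shapes for illustration; these are NOT Pi-mono's real types.
interface ToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface ModelReply {
  text: string;
  toolCalls: ToolCall[];
}

type Message =
  | { role: "user"; content: string }
  | { role: "assistant"; content: string; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

type ToolHandler = (args: Record<string, unknown>) => Promise<string>;
type ModelFn = (transcript: Message[]) => Promise<ModelReply>;

// One chat turn: loop model -> tools -> model until no tool calls remain.
async function runTurn(
  model: ModelFn,
  tools: Map<string, ToolHandler>,
  history: Message[],
  maxSteps = 8,
): Promise<Message[]> {
  const transcript: Message[] = [...history];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(transcript);
    transcript.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    if (reply.toolCalls.length === 0) return transcript; // final answer, turn is done
    for (const call of reply.toolCalls) {
      const handler = tools.get(call.name);
      let content: string;
      try {
        content = handler
          ? await handler(call.args)
          : `error: unknown tool "${call.name}"`;
      } catch (err) {
        // Report failures back to the model instead of aborting the turn.
        content = `error: ${err instanceof Error ? err.message : String(err)}`;
      }
      transcript.push({ role: "tool", toolCallId: call.id, content });
    }
  }
  throw new Error(`turn exceeded ${maxSteps} model steps`);
}
```

Returning handler errors as tool results lets the model recover instead of failing the whole turn, and the `maxSteps` cap keeps a misbehaving model from looping forever.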

References

•references/backend-architecture.md
•references/openai-compat.md
•references/backend-commands.md