AgentSkillsCN

vllm-studio-backend

适用于 vLLM Studio 后端架构的开发与优化工作——包括控制器运行时、Pi-mono 代理循环、OpenAI 兼容端点、LiteLLM 网关、推理流程以及调试命令等环节。

SKILL.md
--- frontmatter
name: vllm-studio-backend
description: Use when working on vLLM Studio backend architecture (controller runtime, Pi-mono agent loop, OpenAI-compatible endpoints, LiteLLM gateway, inference process, and debugging commands).

vLLM Studio Backend Architecture

Overview

This skill explains how the backend is wired: controller runtime, OpenAI-compatible proxy, Pi-mono agent loop, LiteLLM gateway, and inference process management.

When To Use

  • Modifying controller routes or run streaming.
  • Debugging OpenAI-compatible endpoint behavior.
  • Updating Pi-mono agent runtime or tool execution.
  • Understanding how inference + LiteLLM fit together.

Quick Start

  • Read references/backend-architecture.md for the component map and data flow.
  • Read references/openai-compat.md for /v1/models and /v1/chat/completions behavior.
  • Read references/backend-commands.md for useful run/debug commands.

Core Guarantees

  • Keep OpenAI-compatible endpoints stable (/v1/models, /v1/chat/completions).
  • /chat UI uses controller run stream (/chats/:id/turn) and Pi-mono runtime.
  • Tool execution happens server-side (MCP + AgentFS + plan tools).

References

  • references/backend-architecture.md
  • references/openai-compat.md
  • references/backend-commands.md