AgentSkillsCN

autoscale

Manage Vast.ai autoscaling endpoints and worker groups in production. Use when setting up auto-scaling GPU inference, managing worker pools, or deploying services.

SKILL.md
--- frontmatter
name: autoscale
description: "Manage Vast.ai autoscaling endpoints and worker groups for production deployments. Use when setting up auto-scaling GPU inference, managing worker pools, or deploying services."
argument-hint: "[action]"
allowed-tools: Bash

Vast.ai Autoscaling & Endpoints

Manage production deployments with auto-scaling worker pools.

User Request

$ARGUMENTS

Concepts

  • Endpoint: A deployment target that manages load and scaling policy
  • Worker Group: A pool of instances (workers) tied to an endpoint, auto-scaled based on load

Endpoints

Create

bash
vastai create endpoint \
  --endpoint_name '<NAME>' \
  --target_util 0.9 \
  --max_workers 20 \
  --cold_workers 5 \
  --cold_mult 2.5 \
  --min_load 0.0

| Option | Description | Default |
| --- | --- | --- |
| --endpoint_name | Name for the endpoint | (required) |
| --target_util | Target utilization, 0–1 | 0.9 |
| --max_workers | Max workers | 20 |
| --cold_workers | Min cold/standby workers | 5 |
| --cold_mult | Cold capacity multiplier | 2.5 |
| --min_load | Minimum floor load (perf units/s) | 0.0 |
| --min_cold_load | Minimum cold load | 0.0 |
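
Because --target_util is documented as a value between 0 and 1, it can help to check it before calling the CLI. The following is a minimal sketch, not part of vastai itself: the helper names valid_util and endpoint_cmd are hypothetical, and the script only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# Sketch: validate target utilization, then print (not run) the create command.
# valid_util and endpoint_cmd are hypothetical helpers, not vastai features.

# Succeeds only if $1 is a non-negative decimal number in the range 0..1.
valid_util() {
  awk -v u="$1" 'BEGIN { exit !(u ~ /^[0-9]*\.?[0-9]+$/ && u >= 0 && u <= 1) }'
}

# Prints the create command for a name, target utilization, and max workers.
endpoint_cmd() {
  local name="$1" util="$2" max="$3"
  valid_util "$util" || { echo "target_util must be between 0 and 1" >&2; return 1; }
  printf "vastai create endpoint --endpoint_name '%s' --target_util %s --max_workers %s\n" \
    "$name" "$util" "$max"
}

endpoint_cmd my-endpoint 0.9 20
```

Printing the command first makes it easy to review the scaling policy before any endpoint is created; the printed line can then be pasted into a shell as-is.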

Manage

bash
vastai show endpoints
vastai update endpoint <ID> [--target_util 0.85 --max_workers 50 ...]
vastai delete endpoint <ID>
vastai get endpt-logs <ID> [--level 0-3 --tail N]

Worker Groups

Create

bash
vastai create workergroup \
  --template_hash '<HASH>' \
  --endpoint_name '<NAME>' \
  --test_workers 3 \
  --cold_workers 2 \
  --target_util 0.9 \
  --search_params 'gpu_name=RTX_4090 reliability>0.9'

| Option | Description |
| --- | --- |
| --template_hash | Template for worker instances |
| --template_id | Template ID (alternative) |
| --endpoint_name / --endpoint_id | Target endpoint |
| --test_workers | Workers for perf estimation |
| --cold_workers | Min cold workers |
| --target_util | Target utilization |
| --cold_mult | Cold capacity multiplier |
| --search_params | Search query for selecting machines |
| --gpu_ram | Estimated GPU RAM requirement |
| --launch_args | Extra args for instance creation |
| -n | Disable default search params |
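
The --search_params value contains spaces and a > character, so it has to be quoted: unquoted, the shell would treat reliability>0.9 as a redirection to a file named 0.9. The sketch below illustrates this by printing each argument on its own line; workergroup_args is a hypothetical helper, and nothing is sent to Vast.ai.

```bash
#!/usr/bin/env bash
# Sketch: show that a quoted search query reaches vastai as a single argument.
# workergroup_args is a hypothetical helper; it prints one argument per line.
workergroup_args() {
  local template_hash="$1" endpoint="$2" search="$3"
  # Rebuild the positional parameters as the full command line.
  set -- vastai create workergroup \
    --template_hash "$template_hash" \
    --endpoint_name "$endpoint" \
    --search_params "$search"
  printf '%s\n' "$@"
}

workergroup_args '<HASH>' '<NAME>' 'gpu_name=RTX_4090 reliability>0.9'
```

The last printed line is the whole search query, which confirms that the quoting keeps it together as one argument.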

Manage

bash
vastai show workergroups
vastai update workergroup <ID> [--target_util <UTIL> --cold_workers <N> ...]
vastai delete workergroup <ID>
vastai get wrkgrp-logs <ID> [--level 0-3 --tail N]
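
The two log commands differ only in their subcommand name, so a small wrapper can cover both. This is a sketch under stated assumptions: logs_cmd is a hypothetical helper, the default level of 1 and default tail of 100 lines are arbitrary choices made here (not vastai defaults), and the IDs are made up.

```bash
#!/usr/bin/env bash
# Sketch: print the log command for an endpoint or a worker group.
# logs_cmd is a hypothetical helper; its defaults (level 1, 100 lines)
# are illustrative choices, not values taken from the vastai CLI.
logs_cmd() {
  local kind="$1" id="$2" level="${3:-1}" lines="${4:-100}"
  local sub
  case "$kind" in
    endpoint)    sub="endpt-logs" ;;
    workergroup) sub="wrkgrp-logs" ;;
    *) echo "kind must be endpoint or workergroup" >&2; return 1 ;;
  esac
  # The documented range for --level is 0-3.
  case "$level" in
    [0-3]) ;;
    *) echo "level must be 0, 1, 2, or 3" >&2; return 1 ;;
  esac
  echo "vastai get $sub $id --level $level --tail $lines"
}

logs_cmd endpoint 123 2 50
logs_cmd workergroup 456
```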

Typical Setup Flow

  1. Create a template with your Docker image and config
  2. Create an endpoint with scaling policy
  3. Create a worker group linking the template to the endpoint
  4. Monitor with vastai show endpoints and vastai show workergroups
  5. Check logs with vastai get endpt-logs <ID> and vastai get wrkgrp-logs <ID>
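
Steps 2 to 4 above can be sketched as a single script. It assumes the template from step 1 already exists and that its hash is supplied through TEMPLATE_HASH; the variable names, the run wrapper, and the DRY_RUN switch are conventions invented for this example. With DRY_RUN left at its default of 1, the commands are only printed.

```bash
#!/usr/bin/env bash
# Sketch of steps 2-4. DRY_RUN defaults to 1, so commands are printed, not run.
# TEMPLATE_HASH and ENDPOINT_NAME are placeholders to replace with real values.
DRY_RUN="${DRY_RUN:-1}"
TEMPLATE_HASH="${TEMPLATE_HASH:-<HASH>}"
ENDPOINT_NAME="${ENDPOINT_NAME:-<NAME>}"

# Print the command in dry-run mode; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_flow() {
  # Step 2: endpoint with a scaling policy
  run vastai create endpoint --endpoint_name "$ENDPOINT_NAME" \
    --target_util 0.9 --max_workers 20
  # Step 3: worker group linking the template to the endpoint
  run vastai create workergroup --template_hash "$TEMPLATE_HASH" \
    --endpoint_name "$ENDPOINT_NAME" --test_workers 3
  # Step 4: monitor both resources
  run vastai show endpoints
  run vastai show workergroups
}

setup_flow
```

Step 5 is left out because the log commands need the IDs reported by the show commands. Setting DRY_RUN=0 makes the script execute the commands instead of printing them.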