AgentSkillsCN

acestep

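Generate music from text descriptions and lyrics using the ACE-Step API. Supports text-to-music, lyrics writing, audio continuation, and audio repainting. Use this skill when the user mentions music generation, song creation, music production, remixing, or audio continuation.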

SKILL.md
--- frontmatter
name: acestep
description: Use ACE-Step API to generate music from text descriptions and lyrics. Supports text-to-music, lyrics generation, audio continuation, and audio repainting. Use this skill when users mention generating music, creating songs, music production, remix, or audio continuation.

ACE-Step Music Generation — AI Integration

Use the ACE-Step V1.5 REST API for AI-driven music generation. These instructions apply to any AI assistant, agent framework, or orchestrator that can make HTTP calls.

Prerequisites

  • ACE-Step Docker container running in API mode (ACESTEP_MODE=api in .env)
  • API available at http://localhost:8501
  • Tools: curl and jq (for shell-based workflows)

Health Check

```bash
curl -s http://localhost:8501/health
# Should return: {"data":{"status":"ok","service":"ACE-Step API","version":"1.0"},...}
```

If the health check fails, the container may be running in Gradio mode or not running at all. Check with docker compose ps and verify that ACESTEP_MODE=api is set in .env.
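
A container that was just started may not answer /health right away. The sketch below retries the check a few times. It relies only on the /health response shape shown above; the function name acestep_wait_healthy and its defaults (10 attempts, 3 seconds apart) are hypothetical choices, not part of ACE-Step.

```bash
# Hypothetical helper (not part of ACE-Step): retry GET /health until
# data.status is "ok" or the attempts run out.
# Usage: acestep_wait_healthy [base_url] [attempts] [delay_seconds]
acestep_wait_healthy() {
  local base="${1:-http://localhost:8501}"
  local attempts="${2:-10}"
  local delay="${3:-3}"
  local i status
  for ((i = 1; i <= attempts; i++)); do
    # -f fails on HTTP errors; --max-time bounds each request.
    # jq prints data.status, or nothing if the field is missing or the body is not JSON.
    status=$(curl -sf --max-time 5 "$base/health" | jq -r '.data.status // empty' 2>/dev/null)
    if [[ "$status" == "ok" ]]; then
      echo "ACE-Step API is healthy at $base"
      return 0
    fi
    # Wait between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  echo "ACE-Step API not healthy at $base; check docker compose ps and ACESTEP_MODE=api in .env" >&2
  return 1
}
```

In a script, call it once before submitting work, for example acestep_wait_healthy || exit 1.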


Workflow

For user requests involving music generation, follow this workflow:

  1. Understand the request: what genre, mood, language, and vocal style does the user want?
  2. Consult the Music Creation Guide: use it to write captions and lyrics and to choose parameters
  3. Write a detailed caption: style, instruments, emotion, vocal characteristics, production quality
  4. Write complete lyrics with structure tags: [Verse], [Chorus], [Bridge], etc.
  5. Calculate parameters: duration (based on lyrics length), BPM (based on genre), key, and time signature
  6. Submit the task via POST /release_task
  7. Poll for results via POST /query_result until status is 1 (success) or 2 (failed)
  8. Download the audio via the URL in the result

Generation Modes

| Mode | When to Use | How |
|---|---|---|
| Caption (Recommended) | For vocal songs; write lyrics yourself first | prompt + lyrics + thinking: true |
| Simple/Description | Quick exploration; LM generates everything | sample_mode: true + sample_query |
| Random | Random generation for inspiration | POST /create_random_sample |

Always prefer Caption mode for the best results. Write the lyrics yourself rather than letting the LM generate them.
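
For the two non-Caption modes, a minimal sketch. It assumes Simple mode is submitted through the same POST /release_task endpoint (sample_mode and sample_query are listed among its parameters below) and that /create_random_sample accepts an empty JSON body, which is an unverified assumption. Both function names are hypothetical. Building the body with jq -n --arg keeps quotes in the description from breaking the JSON.

```bash
# Hypothetical helper: Simple/Description-mode body for POST /release_task.
# jq -n --arg escapes quotes and newlines in the description.
acestep_sample_payload() {
  jq -n --arg q "$1" '{sample_mode: true, sample_query: $q}'
}

# Hypothetical helper: Random mode.
# Assumption: the endpoint accepts an empty JSON object as its body.
acestep_random_sample() {
  local base="${1:-http://localhost:8501}"
  curl -s -X POST "$base/create_random_sample" \
    -H 'Content-Type: application/json' -d '{}' | jq '.data'
}

# Example (needs a running server):
#   curl -s -X POST http://localhost:8501/release_task \
#     -H 'Content-Type: application/json' \
#     -d "$(acestep_sample_payload 'upbeat synthwave instrumental for a night drive')" \
#     | jq -r '.data.task_id'
#   acestep_random_sample
```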


API Endpoints

All responses are wrapped: {"data": <payload>, "code": 200, "error": null, "timestamp": ...}

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /release_task | POST | Submit music generation task |
| /query_result | POST | Query task status (batch) |
| /v1/audio?path={path} | GET | Download audio file |
| /v1/models | GET | List available DiT models |
| /v1/stats | GET | Server statistics (queue, jobs, avg time) |
| /format_input | POST | LLM-enhanced caption/lyrics formatting |
| /create_random_sample | POST | Get random sample parameters |
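
Because every endpoint uses the same envelope, a small unwrapping step avoids repeating error checks. A sketch, assuming a successful call always carries code 200 and error null as in the envelope above; acestep_unwrap is a hypothetical name.

```bash
# Hypothetical helper: read an ACE-Step response envelope on stdin.
# Prints .data when code is 200 and error is null; otherwise reports the error and returns 1.
acestep_unwrap() {
  local body
  body=$(cat)
  # jq -e sets its exit status from the boolean result (and fails on invalid JSON).
  if printf '%s' "$body" | jq -e '.code == 200 and .error == null' >/dev/null 2>&1; then
    printf '%s' "$body" | jq '.data'
  else
    echo "ACE-Step request failed: $(printf '%s' "$body" | jq -c '.error' 2>/dev/null)" >&2
    return 1
  fi
}
```

For example, curl -s http://localhost:8501/v1/stats | acestep_unwrap prints only the statistics payload.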

Quick Example: Full Generation Flow

```bash
# 1. Submit task
TASK_ID=$(curl -s -X POST http://localhost:8501/release_task \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "Symphonic black metal, epic orchestral arrangements, blast beats, tremolo picking, aggressive male vocals, dark atmosphere",
    "lyrics": "[Intro - orchestral]\n\n[Verse 1 - aggressive]\nThrough frozen wastelands we march\nBeneath the blackened sky\nThe ancient ones await\nAs mortals fade and die\n\n[Chorus - powerful]\nWE ARE THE STORM\nWE ARE THE NIGHT\nRISING FROM DARKNESS\nINTO ETERNAL LIGHT\n\n[Outro - fade out]",
    "thinking": true,
    "param_obj": {
      "duration": 120,
      "bpm": 160,
      "key_scale": "D Minor",
      "time_signature": "4",
      "language": "en"
    }
  }' | jq -r '.data.task_id')

echo "Task: $TASK_ID"

# 2. Poll for result (repeat until status != 0)
curl -s -X POST http://localhost:8501/query_result \
  -H 'Content-Type: application/json' \
  -d "{\"task_id_list\": [\"$TASK_ID\"]}" | jq .

# 3. Download audio (use the file URL from the result)
# curl -o output.mp3 "http://localhost:8501/v1/audio?path=<path-from-result>"
```
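
Step 2 above checks the status once by hand. The loop below automates workflow step 7, polling until the status leaves 0. It assumes the response layout shown under Query Result Response; acestep_poll and its defaults (up to 120 attempts, 5 seconds apart, roughly 10 minutes in total) are hypothetical.

```bash
# Hypothetical helper: poll /query_result for one task until status != 0.
# Prints the final task object on success (status 1); returns 1 on failure (status 2) or timeout.
# Usage: acestep_poll TASK_ID [base_url] [max_attempts] [interval_seconds]
acestep_poll() {
  local task_id="$1"
  local base="${2:-http://localhost:8501}"
  local attempts="${3:-120}"
  local interval="${4:-5}"
  local i body status
  for ((i = 1; i <= attempts; i++)); do
    # jq -n --arg builds {"task_id_list": [id]} without manual escaping.
    body=$(curl -sf --max-time 30 -X POST "$base/query_result" \
      -H 'Content-Type: application/json' \
      -d "$(jq -n --arg id "$task_id" '{task_id_list: [$id]}')") || body=""
    # Empty when the request failed or the field is missing.
    status=$(printf '%s' "$body" | jq -r '.data[0].status // empty' 2>/dev/null)
    case "$status" in
      1) printf '%s' "$body" | jq '.data[0]'; return 0 ;;
      2) echo "Task $task_id failed" >&2; printf '%s' "$body" | jq '.data[0]' >&2; return 1 ;;
    esac
    # Status 0 (still processing) or a transient error: wait and retry.
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  echo "Gave up on task $task_id after $attempts attempts" >&2
  return 1
}
```

For example, acestep_poll "$TASK_ID" > task.json saves the finished task object, whose result field still needs decoding as described under Query Result Response.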

Request Parameters (/release_task)

Core Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | "" | Music style description (alias: caption) |
| lyrics | string | "" | Complete lyrics; pass ALL lyrics without omission. Use [inst] or [Instrumental] for instrumental sections |
| thinking | bool | false | Enable 5Hz LM for audio code generation (higher quality, recommended) |
| sample_mode | bool | false | Enable description-driven mode (LM generates everything) |
| sample_query | string | "" | Description for sample mode (alias: description, desc) |
| use_format | bool | false | Use LM to enhance caption/lyrics |
| model | string | - | DiT model name (use /v1/models to list) |
| batch_size | int | 1 | Number of audio files to generate (max 8) |

Music Attributes (in param_obj or top-level)

| Parameter | Type | Default | Description |
|---|---|---|---|
| duration | float | - | Duration in seconds (alias: audio_duration) |
| bpm | int | - | Tempo (30-300) |
| key_scale | string | "" | Key (e.g., "C Major", "D Minor") |
| time_signature | string | "" | Time signature ("2", "3", "4", "6" for 2/4, 3/4, 4/4, 6/8) |
| language | string | "en" | Vocal language (alias: vocal_language) |
| audio_format | string | "mp3" | Output format (mp3/wav/flac) |
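
Hand-escaping lyrics into a JSON string, as in the quick example, is error-prone. A sketch of a Caption-mode body builder that lets jq do the escaping and enforces the ranges from the table above (BPM 30-300, time signature 2, 3, 4, or 6). The function name, its argument order, and the conversion from "6/8"-style input to the numerator are hypothetical conveniences; thinking: true follows the tips at the end of this document.

```bash
# Hypothetical helper: build a Caption-mode /release_task body with attributes in param_obj.
# Args: caption lyrics duration_s bpm key_scale time_signature [language]
# time_signature may be "2/4", "3/4", "4/4", "6/8" or already "2", "3", "4", "6".
acestep_caption_payload() {
  local caption="$1" lyrics="$2" duration="$3" bpm="$4" key="$5" ts="$6" lang="${7:-en}"
  ts="${ts%%/*}"   # "6/8" -> "6", "4/4" -> "4"; a plain "4" is left unchanged
  case "$ts" in
    2|3|4|6) ;;
    *) echo "unsupported time signature: $6" >&2; return 1 ;;
  esac
  # Integer check first so bash arithmetic never sees junk or leading zeros.
  if ! [[ "$bpm" =~ ^[1-9][0-9]*$ ]] || ((bpm < 30 || bpm > 300)); then
    echo "bpm must be an integer in 30-300, got: $bpm" >&2
    return 1
  fi
  jq -n \
    --arg prompt "$caption" --arg lyrics "$lyrics" \
    --argjson duration "$duration" --argjson bpm "$bpm" \
    --arg key "$key" --arg ts "$ts" --arg lang "$lang" \
    '{prompt: $prompt, lyrics: $lyrics, thinking: true,
      param_obj: {duration: $duration, bpm: $bpm, key_scale: $key,
                  time_signature: $ts, language: $lang}}'
}

# Example (needs a running server and a lyrics.txt file):
#   curl -s -X POST http://localhost:8501/release_task \
#     -H 'Content-Type: application/json' \
#     -d "$(acestep_caption_payload 'Dream pop, shimmering guitars, airy female vocals' \
#           "$(cat lyrics.txt)" 150 92 'A Major' 4/4 en)" | jq -r '.data.task_id'
```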

Generation Control

| Parameter | Type | Default | Description |
|---|---|---|---|
| inference_steps | int | 8 | Diffusion steps (turbo: 1-20, base: 1-200) |
| guidance_scale | float | 7.0 | CFG scale (base model only) |
| seed | int | -1 | Random seed (-1 for random) |
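
A sketch that merges the generation-control fields into an existing request body while enforcing the step ranges from the table. The caller has to state whether the chosen model is a turbo or a base model; the name acestep_with_generation_control and its family argument are hypothetical, and guidance_scale is only added for base models because the table marks it as base-only.

```bash
# Hypothetical helper: merge generation-control fields into a JSON body read from stdin.
# Args: family (turbo|base) inference_steps seed [guidance_scale (default 7.0)]
# Step limits follow the table: turbo 1-20, base 1-200.
acestep_with_generation_control() {
  local family="$1" steps="$2" seed="$3" guidance="${4:-7.0}" max
  case "$family" in
    turbo) max=20 ;;
    base) max=200 ;;
    *) echo "family must be turbo or base, got: $family" >&2; return 1 ;;
  esac
  if ! [[ "$steps" =~ ^[1-9][0-9]*$ ]] || ((steps > max)); then
    echo "inference_steps for $family must be 1-$max, got: $steps" >&2
    return 1
  fi
  if [[ "$family" == base ]]; then
    jq --argjson s "$steps" --argjson seed "$seed" --argjson g "$guidance" \
      '. + {inference_steps: $s, seed: $seed, guidance_scale: $g}'
  else
    # guidance_scale is omitted: the table lists it as base model only.
    jq --argjson s "$steps" --argjson seed "$seed" \
      '. + {inference_steps: $s, seed: $seed}'
  fi
}
```

Passing an explicit seed instead of the default -1 makes it easy to record which seed produced a take.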

Audio Task Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| task_type | string | "text2music" | text2music / cover / repaint / continuation |
| src_audio_path | string | - | Source audio path (for continuation/repainting) |
| repainting_start | float | 0.0 | Repainting start position (seconds) |
| repainting_end | float | - | Repainting end position (seconds) |

Query Result Response

```json
{
  "data": [{
    "task_id": "xxx",
    "status": 1,
    "result": "[{\"file\":\"/v1/audio?path=...\",\"metas\":{\"bpm\":120,\"duration\":60,\"keyscale\":\"C Major\"}}]"
  }]
}
```

Status codes: 0 = processing, 1 = success, 2 = failed

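Important: the result field is a JSON-encoded string, not an array, so it must be parsed a second time. Once decoded, it is an array of result objects, each with a file field holding the download path (for example /v1/audio?path=...), which is appended to the API base URL.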
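
Because result is encoded twice, jq needs fromjson to reach the file entries. A sketch that turns a raw /query_result response into absolute download URLs and fetches them; both function names and the acestep_N.ext output naming are hypothetical, and the extension should match the audio_format you requested.

```bash
# Hypothetical helper: read a raw /query_result response on stdin and print
# one absolute download URL per generated file of each successful task.
acestep_result_urls() {
  local base="${1:-http://localhost:8501}"
  # fromjson decodes the result string into an array of {file, metas} objects.
  jq -r --arg base "$base" \
    '.data[] | select(.status == 1) | .result | fromjson | .[] | $base + .file'
}

# Hypothetical helper: download every file into the current directory.
# Args: [base_url] [extension (default mp3)]
acestep_download_all() {
  local base="${1:-http://localhost:8501}" ext="${2:-mp3}"
  local n=0 url
  # The while loop runs in a pipeline subshell, so "exit 1" only leaves that
  # subshell; its status becomes the function's status.
  acestep_result_urls "$base" | while IFS= read -r url; do
    n=$((n + 1))
    echo "Downloading $url -> acestep_$n.$ext"
    curl -sf -o "acestep_$n.$ext" "$url" || exit 1
  done
}

# Example (needs a running server; TASK_ID from the quick example):
#   curl -s -X POST http://localhost:8501/query_result \
#     -H 'Content-Type: application/json' \
#     -d "{\"task_id_list\": [\"$TASK_ID\"]}" | acestep_download_all http://localhost:8501 mp3
```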


Tips for AI Assistants

  1. Always use thinking: true. This enables the 5Hz LM for much better quality
  2. Write lyrics yourself. Don't rely on sample_mode for serious requests; write complete, well-structured lyrics with proper structure tags
  3. Be generous with duration. Too short is worse than too long. Calculate it from the lyrics length (3-5 seconds per line, plus the intro and outro)
  4. Match caption and lyrics. Instruments mentioned in the caption should appear as tags in the lyrics; don't let the two contradict each other
  5. Use uppercase for intensity. WE ARE THE CHAMPIONS generates louder, more powerful vocals than we are the champions
  6. Poll patiently. Generation can take 30 seconds to several minutes depending on duration and model settings; poll every 5-10 seconds
  7. Check the actual output. With thinking: true, the LM may enhance your caption/lyrics, so check the result JSON for what was actually used
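
Tip 3's rule of thumb can be turned into a quick estimate. A sketch that counts sung lines (non-blank lines that are not structure tags) and applies a per-line time plus intro/outro padding; the defaults of 4 seconds per line and 20 seconds of padding are hypothetical picks within the stated 3-5 second range.

```bash
# Hypothetical helper: estimate duration in seconds from lyrics on stdin.
# Args: [seconds_per_line (default 4)] [intro_outro_seconds (default 20)]
# Sung lines are non-blank lines that do not start with a structure tag like [Verse 1].
acestep_estimate_duration() {
  local per_line="${1:-4}" padding="${2:-20}"
  awk -v per="$per_line" -v pad="$padding" '
    /^[ \t]*$/ { next }     # skip blank lines
    /^[ \t]*\[/ { next }    # skip structure tags such as [Chorus - powerful]
    { lines++ }
    END { print lines * per + pad }
  '
}
```

For the quick example's lyrics (8 sung lines) the default estimate is 52 seconds, while that example requests 120; treat the estimate as a floor and round up, in line with tip 3.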

For detailed guidance on writing captions, lyrics, and choosing parameters, see music-creation-guide.md. For the complete API reference with all parameters and examples, see API.md.