ACE-Step Music Generation — AI Integration
Use ACE-Step V1.5 REST API for AI-driven music generation. This document provides instructions for any AI assistant, agent framework, or orchestrator that can make HTTP calls.
Prerequisites
- •ACE-Step Docker container running in API mode (
ACESTEP_MODE=apiin.env) - •API available at
http://localhost:8501 - •Tools:
curlandjq(for shell-based workflows)
Health Check
curl -s http://localhost:8501/health
# Should return: {"data":{"status":"ok","service":"ACE-Step API","version":"1.0"},...}
If health check fails, the container may be in gradio mode or not running. Check with docker compose ps and verify ACESTEP_MODE=api in .env.
Workflow
For user requests involving music generation, follow this workflow:
- •Understand the request — What genre, mood, language, vocal style does the user want?
- •Consult the Music Creation Guide — Use it to write captions, lyrics, and choose parameters
- •Write a detailed caption — Style, instruments, emotion, vocal characteristics, production quality
- •Write complete lyrics with structure tags —
[Verse],[Chorus],[Bridge], etc. - •Calculate parameters — Duration (based on lyrics length), BPM (based on genre), key, time signature
- •Submit the task via
POST /release_task - •Poll for results via
POST /query_resultuntilstatusis1(success) or2(failed) - •Download audio via the URL in the result
Generation Modes
| Mode | When to Use | How |
|---|---|---|
| Caption (Recommended) | For vocal songs — write lyrics yourself first | prompt + lyrics + thinking: true |
| Simple/Description | Quick exploration, LM generates everything | sample_mode: true + sample_query |
| Random | Random generation for inspiration | POST /create_random_sample |
Always prefer Caption mode for the best results. Write the lyrics yourself rather than letting the LM generate them.
API Endpoints
All responses are wrapped: {"data": <payload>, "code": 200, "error": null, "timestamp": ...}
| Endpoint | Method | Description |
|---|---|---|
/health | GET | Health check |
/release_task | POST | Submit music generation task |
/query_result | POST | Query task status (batch) |
/v1/audio?path={path} | GET | Download audio file |
/v1/models | GET | List available DiT models |
/v1/stats | GET | Server statistics (queue, jobs, avg time) |
/format_input | POST | LLM-enhanced caption/lyrics formatting |
/create_random_sample | POST | Get random sample parameters |
Quick Example: Full Generation Flow
# 1. Submit task
TASK_ID=$(curl -s -X POST http://localhost:8501/release_task \
-H 'Content-Type: application/json' \
-d '{
"prompt": "Symphonic black metal, epic orchestral arrangements, blast beats, tremolo picking, aggressive male vocals, dark atmosphere",
"lyrics": "[Intro - orchestral]\n\n[Verse 1 - aggressive]\nThrough frozen wastelands we march\nBeneath the blackened sky\nThe ancient ones await\nAs mortals fade and die\n\n[Chorus - powerful]\nWE ARE THE STORM\nWE ARE THE NIGHT\nRISING FROM DARKNESS\nINTO ETERNAL LIGHT\n\n[Outro - fade out]",
"thinking": true,
"param_obj": {
"duration": 120,
"bpm": 160,
"key_scale": "D Minor",
"time_signature": "4",
"language": "en"
}
}' | jq -r '.data.task_id')
echo "Task: $TASK_ID"
# 2. Poll for result (repeat until status != 0)
curl -s -X POST http://localhost:8501/query_result \
-H 'Content-Type: application/json' \
-d "{\"task_id_list\": [\"$TASK_ID\"]}" | jq .
# 3. Download audio (use the file URL from the result)
# curl -o output.mp3 "http://localhost:8501/v1/audio?path=<path-from-result>"
Request Parameters (/release_task)
Core Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
prompt | string | "" | Music style description (alias: caption) |
lyrics | string | "" | Complete lyrics — pass ALL lyrics without omission. Use [inst] or [Instrumental] for instrumental sections |
thinking | bool | false | Enable 5Hz LM for audio code generation (higher quality, recommended) |
sample_mode | bool | false | Enable description-driven mode (LM generates everything) |
sample_query | string | "" | Description for sample mode (alias: description, desc) |
use_format | bool | false | Use LM to enhance caption/lyrics |
model | string | - | DiT model name (use /v1/models to list) |
batch_size | int | 1 | Number of audio files to generate (max 8) |
Music Attributes (in param_obj or top-level)
| Parameter | Type | Default | Description |
|---|---|---|---|
duration | float | - | Duration in seconds (alias: audio_duration) |
bpm | int | - | Tempo (30-300) |
key_scale | string | "" | Key (e.g., "C Major", "D Minor") |
time_signature | string | "" | Time signature ("2", "3", "4", "6" for 2/4, 3/4, 4/4, 6/8) |
language | string | "en" | Vocal language (alias: vocal_language) |
audio_format | string | "mp3" | Output format (mp3/wav/flac) |
Generation Control
| Parameter | Type | Default | Description |
|---|---|---|---|
inference_steps | int | 8 | Diffusion steps (turbo: 1-20, base: 1-200) |
guidance_scale | float | 7.0 | CFG scale (base model only) |
seed | int | -1 | Random seed (-1 for random) |
Audio Task Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
task_type | string | "text2music" | text2music / cover / repaint / continuation |
src_audio_path | string | - | Source audio path (for continuation/repainting) |
repainting_start | float | 0.0 | Repainting start position (seconds) |
repainting_end | float | - | Repainting end position (seconds) |
Query Result Response
{
"data": [{
"task_id": "xxx",
"status": 1,
"result": "[{\"file\":\"/v1/audio?path=...\",\"metas\":{\"bpm\":120,\"duration\":60,\"keyscale\":\"C Major\"}}]"
}]
}
Status codes: 0 = processing, 1 = success, 2 = failed
Important: The result field is a JSON string that must be parsed. It contains an array of result objects, each with a file field containing the download URL.
Tips for AI Assistants
- •Always use
thinking: true— This enables the 5Hz LM for much better quality - •Write lyrics yourself — Don't rely on
sample_modefor serious requests. Write complete, well-structured lyrics with proper structure tags - •Be generous with duration — Too short is worse than too long. Calculate based on lyrics length (3-5 sec per line + intro/outro)
- •Match caption and lyrics — Instruments mentioned in caption should appear as tags in lyrics. Don't contradict yourself
- •Use uppercase for intensity —
WE ARE THE CHAMPIONSgenerates louder, more powerful vocals thanwe are the champions - •Poll patiently — Generation can take 30 seconds to several minutes depending on duration and model settings. Poll every 5-10 seconds
- •Check actual output — When
thinking: true, the LM may enhance your caption/lyrics. Check the result JSON for what was actually used
For detailed guidance on writing captions, lyrics, and choosing parameters, see music-creation-guide.md. For the complete API reference with all parameters and examples, see API.md.