Austn Tools Skill
Purpose
Access Austin's local GPU-powered AI services at austn.net for content generation:
- Text-to-Speech (Chatterbox TTS)
- Image Generation (ComfyUI)
- Background Removal
- Vector Tracing
- Audio Stem Separation
- 3D Tools
- MIDI Generation
Available Services
1. Text-to-Speech (/tts)
URL: https://austn.net/tts/new
Backend: Chatterbox TTS on local GPU
⚠️ CRITICAL CONSTRAINT: 40-second maximum duration
- Audio caps at 40 seconds regardless of text length
- For longer content: split into multiple clips with separate share links
- Estimate: ~100-120 words = ~40 seconds
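Splitting longer narration into clips can be sketched as below. This is a minimal sketch, not part of the service: `split_for_tts` is a hypothetical helper, and the 110-word budget is an assumed safety margin under the ~120-word estimate. It packs whole sentences greedily, so a single sentence longer than the budget is left intact and would still need manual trimming.

```python
import re

MAX_WORDS = 110  # assumption: a margin below the ~120-word / 40-second estimate


def split_for_tts(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Greedily pack whole sentences into clips of at most max_words words."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    clips: list[str] = []
    current: list[str] = []
    current_words = 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new clip when this sentence would exceed the budget.
        if current and current_words + n > max_words:
            clips.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += n
    if current:
        clips.append(" ".join(current))
    return clips
```

Each returned clip would then be submitted as its own TTS job, giving one share link per clip.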
Parameters:
| Field | Description | Default |
|---|---|---|
| text | Text to speak (keep under ~120 words) | Required |
| voice | Voice selection | "Default voice" |
| exaggeration | Emotional intensity (0-1) | 0.5 |
| cfg_weight | Voice adherence (0-1) | 1.0 |
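The defaults and ranges in the table can be checked before filling in the form. The sketch below is illustrative only: `TTSRequest` is a hypothetical container, not an API exposed by the site, and it assumes both sliders accept the full closed range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class TTSRequest:
    """Hypothetical holder for the TTS form fields and their documented defaults."""

    text: str
    voice: str = "Default voice"
    exaggeration: float = 0.5
    cfg_weight: float = 1.0

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("text is required")
        # Both sliders are documented as 0-1; reject anything outside that range.
        for name in ("exaggeration", "cfg_weight"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")
```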
Expression Tags (add inline to text):
- [laughter] - Laughing
- [giggle] - Giggling
- [sigh] - Sighing
- [gasp] - Gasping
- [whisper] - Whispering
- [cough] - Coughing
- [clear_throat] - Throat clearing
- [groan] - Groaning
- [humming] - Humming
- [UH], [UM] - Filler sounds
Example Text:
Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?
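A quick check for mistyped tags may save a wasted generation. The sketch below is an assumption-laden helper, not something the service provides: `unknown_tags` is a hypothetical name, and it assumes tags are case-sensitive and spelled exactly as listed above.

```python
import re

# Tags documented above; case sensitivity is an assumption.
KNOWN_TAGS = {
    "laughter", "giggle", "sigh", "gasp", "whisper", "cough",
    "clear_throat", "groan", "humming", "UH", "UM",
}


def unknown_tags(text: str) -> list[str]:
    """Return bracketed tags in text that are not in the documented list."""
    found = re.findall(r"\[([^\[\]]+)\]", text)
    return [tag for tag in found if tag not in KNOWN_TAGS]
```

An empty result means every bracketed tag in the text matches the documented list.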
2. Image Generation (/images)
URL: https://austn.net/images/ai_generate
Backend: ComfyUI on local GPU
Parameters:
| Field | Description | Default |
|---|---|---|
| prompt | Image description | Required |
| negative_prompt | What to avoid | "blurry, low quality, distorted" |
| seed | Reproducibility seed | Random |
| size | Image dimensions | 512x512 |
| batch_size | Number of images | 1 |
| publish | Show in gallery for 10 minutes | false |
3. Background Removal (/rembg)
URL: https://austn.net/rembg
Remove backgrounds from images.
4. Vector Tracing (/vtracer)
URL: https://austn.net/vtracer
Convert raster images to SVG vectors.
5. Audio Stems (/stems)
URL: https://austn.net/stems
Separate audio into vocal/instrument tracks.
6. 3D Tools (/3d)
URL: https://austn.net/3d
3D content generation.
7. MIDI Generation (/midi)
URL: https://austn.net/midi
Generate MIDI sequences.
Usage via Browser Automation
Since these are web UIs, use browser automation to interact:
TTS Generation
# 1. Navigate to TTS
navigate("https://austn.net/tts/new")
# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")
# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed
# 4. Click Generate Speech
click(generate_button)
# 5. Wait for audio, then download
Image Generation
# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")
# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")
# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch
# 4. Click Generate Image
click(generate_button)
# 5. Wait for result, download
Browser Automation Tips
Field Locations (approximate)
TTS Page (/tts/new):
- Text input: Center of page, large textarea
- Voice dropdown: Below text input
- Advanced options checkbox: Below voice dropdown
- Exaggeration slider: After checkbox expanded
- CFG Weight slider: Below exaggeration
- Generate button: Green button at bottom
Image Page (/images/ai_generate):
- Prompt textarea: Top of form
- Advanced options checkbox: Below prompt
- Negative prompt: First advanced field
- Seed input: Below negative prompt
- Size dropdown: Below seed
- Batch size dropdown: Below size
- Generate button: Green button at bottom
Downloading Results
- TTS: Audio player appears, right-click to save or use download button
- Images: Image appears in result area, right-click to save
Integration with Video Pipeline
These tools combine well for autonomous video creation:
- Script → Write narration text
- TTS → Generate voiceover audio
- Images → Generate visuals/thumbnails
- Combine → Use ffmpeg or video editor
Example Workflow
1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video
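The ffmpeg step in the workflow above could look like the following. This is one possible invocation, not the pipeline's actual command: `build_mux_command` is a hypothetical helper, and it assumes the screen recording is already an H.264-compatible file whose video stream can be copied without re-encoding.

```python
import shlex


def build_mux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg command that pairs a recorded video with a voiceover."""
    return [
        "ffmpeg", "-y",       # overwrite the output file if it exists
        "-i", video,          # input 0: screen recording
        "-i", audio,          # input 1: TTS voiceover
        "-map", "0:v:0",      # take video from the recording
        "-map", "1:a:0",      # take audio from the voiceover
        "-c:v", "copy",       # keep the video stream as-is (assumption)
        "-c:a", "aac",        # encode audio to AAC for MP4 containers
        "-shortest",          # stop at the shorter of the two inputs
        output,
    ]


if __name__ == "__main__":
    cmd = build_mux_command(
        "content/videos/session.mp4",
        "content/audio/narration.wav",
        "content/videos/final.mp4",
    )
    print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed; the file names in the example are placeholders.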
Output Locations
Save generated content to:
- Audio: content/audio/
- Images: content/images/
- Videos: content/videos/
Service Status & Dependencies
| Service | Backend | Requires Local GPU |
|---|---|---|
| TTS | Chatterbox TTS | Yes (but often available) |
| Images | ComfyUI | Yes - needs server running |
| Rembg | Python | Likely |
| VTracer | Rust | Likely |
| Stems | Demucs | Yes |
| 3D | Unknown | Yes |
| MIDI | Unknown | Yes |
Connection Details
- Services route to local GPU via Tailscale
- Image generation connects to 100.68.94.33:8188 (ComfyUI)
- If generation fails with "TCP connection" error, the backend server isn't running
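Whether the ComfyUI backend is up can be probed with a plain TCP connection attempt before starting an image job. This is a minimal sketch: `backend_reachable` is a hypothetical helper, the 3-second timeout is arbitrary, and the check presumably only works from a machine that can reach the Tailscale address.

```python
import socket

COMFYUI_HOST = "100.68.94.33"  # address from the connection details above
COMFYUI_PORT = 8188


def backend_reachable(host: str = COMFYUI_HOST, port: int = COMFYUI_PORT,
                      timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if backend_reachable() else "not reachable"
    print(f"ComfyUI backend is {state}")
```

A `False` result corresponds to the "TCP connection" failure described above; an open port does not by itself prove that ComfyUI is healthy.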
Verified Working (2026-02-02)
- ✅ TTS - Generated 8.4s audio in 6.9s
- ❌ Images - Failed (ComfyUI server not running)
Notes
- Services depend on Austin's local GPU being online
- No API keys needed; it's Austin's own infrastructure
- TTS offers a "Share Link" that lasts 7 days
- Gallery publish is optional and temporary (10 min)
- Large batches may take time depending on GPU load