Austn Tools Skill

Purpose

Access Austin's local GPU-powered AI services at austn.net for content generation:

•Text-to-Speech (Chatterbox TTS)
•Image Generation (ComfyUI)
•Background Removal
•Vector Tracing
•Audio Stem Separation
•And more

Available Services

1. Text-to-Speech (`/tts`)

URL: https://austn.net/tts/new Backend: Chatterbox TTS on local GPU

⚠️ CRITICAL CONSTRAINT: 40-second maximum duration

•Audio caps at 40 seconds regardless of text length
•For longer content: split into multiple clips with separate share links
•Estimate: ~100-120 words = ~40 seconds

Parameters:

Field	Description	Default
text	Text to speak (keep under ~120 words)	Required
voice	Voice selection	"Default voice"
exaggeration	Emotional intensity (0-1)	0.5
cfg_weight	Voice adherence (0-1)	1.0

Expression Tags (add inline to text):

•[laughter] - Laughing
•[giggle] - Giggling
•[sigh] - Sighing
•[gasp] - Gasping
•[whisper] - Whispering
•[cough] - Coughing
•[clear_throat] - Throat clearing
•[groan] - Groaning
•[humming] - Humming
•[UH], [UM] - Filler sounds

Example Text:

code

Hello! [sigh] This is austnomaton speaking. [laughter] Pretty wild, right?

2. Image Generation (`/images`)

URL: https://austn.net/images/ai_generate Backend: ComfyUI on local GPU

Parameters:

Field	Description	Default
prompt	Image description	Required
negative_prompt	What to avoid	"blurry, low quality, distorted"
seed	Reproducibility seed	Random
size	Image dimensions	512x512
batch_size	Number of images	1
publish	Show in gallery 10min	false

3. Background Removal (`/rembg`)

URL: https://austn.net/rembg Remove backgrounds from images.

4. Vector Tracing (`/vtracer`)

URL: https://austn.net/vtracer Convert raster images to SVG vectors.

5. Audio Stems (`/stems`)

URL: https://austn.net/stems Separate audio into vocal/instrument tracks.

6. 3D Tools (`/3d`)

URL: https://austn.net/3d 3D content generation.

7. MIDI Generation (`/midi`)

URL: https://austn.net/midi Generate MIDI sequences.

Usage via Browser Automation

Since these are web UIs, use browser automation to interact:

TTS Generation

python

# 1. Navigate to TTS
navigate("https://austn.net/tts/new")

# 2. Click text field and enter text
click(text_field)
type("Hello world! [laughter] This is a test.")

# 3. Optionally expand advanced options
click(advanced_options_checkbox)
# Adjust sliders if needed

# 4. Click Generate Speech
click(generate_button)

# 5. Wait for audio, then download

Image Generation

python

# 1. Navigate to image generator
navigate("https://austn.net/images/ai_generate")

# 2. Enter prompt
click(prompt_field)
type("A robot writing code in a cozy office, digital art")

# 3. Optionally set advanced options
click(advanced_options_checkbox)
# Set negative prompt, seed, size, batch

# 4. Click Generate Image
click(generate_button)

# 5. Wait for result, download

Browser Automation Tips

Field Locations (approximate)

TTS Page (/tts/new):

•Text input: Center of page, large textarea
•Voice dropdown: Below text input
•Advanced options checkbox: Below voice dropdown
•Exaggeration slider: After checkbox expanded
•CFG Weight slider: Below exaggeration
•Generate button: Green button at bottom

Image Page (/images/ai_generate):

•Prompt textarea: Top of form
•Advanced options checkbox: Below prompt
•Negative prompt: First advanced field
•Seed input: Below negative prompt
•Size dropdown: Below seed
•Batch size dropdown: Below size
•Generate button: Green button at bottom

Downloading Results

•TTS: Audio player appears, right-click to save or use download button
•Images: Image appears in result area, right-click to save

Integration with Video Pipeline

These tools combine well for autonomous video creation:

•Script → Write narration text
•TTS → Generate voiceover audio
•Images → Generate visuals/thumbnails
•Combine → Use ffmpeg or video editor

Example Workflow

code

1. Generate narration: /austn-tools tts "Welcome to austnomaton..."
2. Generate thumbnail: /austn-tools image "Robot mascot, friendly, digital art"
3. Record screen session with browser automation
4. Combine audio + video with ffmpeg
5. Export final video

Output Locations

Save generated content to:

•Audio: content/audio/
•Images: content/images/
•Videos: content/videos/

Service Status & Dependencies

Service	Backend	Requires Local GPU
TTS	Chatterbox TTS	Yes (but often available)
Images	ComfyUI	Yes - needs server running
Rembg	Python	Likely
VTracer	Rust	Likely
Stems	Demucs	Yes
3D	Unknown	Yes
MIDI	Unknown	Yes

Connection Details

•Services route to local GPU via Tailscale
•Image generation connects to 100.68.94.33:8188 (ComfyUI)
•If generation fails with "TCP connection" error, the backend server isn't running

Verified Working (2026-02-02)

•✅ TTS - Generated 8.4s audio in 6.9s
•❌ Images - Failed (ComfyUI server not running)

Notes

•Services depend on Austin's local GPU being online
•No API keys needed - it's Austin's own infrastructure
•TTS has "Share Link" that lasts 7 days
•Gallery publish is optional and temporary (10 min)
•Large batches may take time depending on GPU load