Media Generation
Image Generation
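```bash
uv run ~/.claude/skills/media-generation/scripts/generate_image.py \
  --prompt "description or editing instructions" \
  --filename "output.png" \
  [--input-image "source.png"] \
  [--resolution 1K|2K|4K]
```

Square brackets mark optional arguments. Pass --input-image to edit an existing image; the prompt then holds the editing instructions.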
Resolution
- 1K (default): also covers requests for "low res" or "1080p"
- 2K: also covers requests for "medium" or "2048"
- 4K: also covers requests for "high res", "hi-res", or "ultra"
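When a request describes the resolution in words instead of naming 1K, 2K, or 4K, the alias list above can be applied as a simple lookup. This is a minimal sketch. The helper name `resolve_resolution` and the substring-matching rule are illustrative assumptions and are not part of generate_image.py.

```python
from typing import Optional

# Hypothetical helper: alias lists mirror the Resolution section above.
# Substring matching is an assumption; 4K is checked first, then 2K, then 1K.
RESOLUTION_ALIASES = {
    "4K": ["4k", "high res", "hi-res", "ultra"],
    "2K": ["2k", "medium", "2048"],
    "1K": ["1k", "low res", "1080p"],
}


def resolve_resolution(request: Optional[str]) -> str:
    """Map a free-form request to a --resolution value, defaulting to 1K."""
    if not request:
        return "1K"
    text = request.lower()
    for value, aliases in RESOLUTION_ALIASES.items():
        if any(alias in text for alias in aliases):
            return value
    return "1K"


print(resolve_resolution("make it hi-res please"))  # 4K
print(resolve_resolution("2048 wide"))  # 2K
print(resolve_resolution(None))  # 1K
```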
Video Generation
```bash
uv run ~/.claude/skills/media-generation/scripts/generate_video.py \
  --prompt "video description" \
  --filename "output.mp4" \
  [--model veo-3.0-generate-preview] \
  [--negative "things to avoid"] \
  [--input-image "first-frame.png"]
```
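If you call the script from Python, for example in a batch job, you can build the argument list explicitly so that optional flags only appear when they are set. This sketch assumes the script path and flag names shown above. `build_video_cmd` is a hypothetical helper, and the script is not executed here; the command is only printed.

```python
import os
import shlex
from typing import List, Optional

# Script path as given in the usage line above
SCRIPT = os.path.expanduser("~/.claude/skills/media-generation/scripts/generate_video.py")


def build_video_cmd(
    prompt: str,
    filename: str,
    model: Optional[str] = None,
    negative: Optional[str] = None,
    input_image: Optional[str] = None,
) -> List[str]:
    """Build the uv command line; optional flags are appended only when given."""
    cmd = ["uv", "run", SCRIPT, "--prompt", prompt, "--filename", filename]
    if model:
        cmd += ["--model", model]
    if negative:
        cmd += ["--negative", negative]
    if input_image:
        cmd += ["--input-image", input_image]
    return cmd


cmd = build_video_cmd(
    "slow zoom in on a lighthouse at dusk, no talking, no dialogue",
    "lighthouse.mp4",
    negative="text overlays",
)
print(shlex.join(cmd))  # a shell-safe, copy-pasteable command line
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell-quoting problems when the prompt itself contains quotes or commas.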
Models
- veo-3.0-generate-001 (default): stable, video only
- veo-3.0-fast-generate-001: faster, lower cost
- veo-3.1-generate-preview: supports video extension and audio sync
- veo-3.1-fast-generate-preview: fast, with extension support
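The model list above suggests a simple selection rule: use a 3.1 model when the clip will be extended later, and a fast variant when cost or speed matters more than quality. This sketch encodes that rule. `pick_model` is hypothetical, and the rule is one reading of the list, not documented script behavior.

```python
def pick_model(needs_extend: bool = False, fast: bool = False) -> str:
    """Choose a Veo model ID from the Models list (illustrative rule only)."""
    if needs_extend:
        # Only the 3.1 models are listed as supporting video extension
        return "veo-3.1-fast-generate-preview" if fast else "veo-3.1-generate-preview"
    return "veo-3.0-fast-generate-001" if fast else "veo-3.0-generate-001"


print(pick_model())  # veo-3.0-generate-001
print(pick_model(fast=True))  # veo-3.0-fast-generate-001
print(pick_model(needs_extend=True))  # veo-3.1-generate-preview
```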
Prompting Tips
- Specify camera movements: "slow zoom in", "pan left", "close-up"
- Add "no talking, no dialogue" if the character shouldn't speak
- Describe the atmosphere: "rain outside", "purple mystical energy"
Note: Veo requires a paid API tier. Pricing is roughly $0.40 per second of generated video for the standard models and $0.15 per second for the fast models.
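The prompting tips above can be combined into a single comma-separated prompt string. This is a minimal sketch. `build_prompt` and its parameters are illustrative names and are not part of the scripts.

```python
from typing import Optional


def build_prompt(
    subject: str,
    camera: Optional[str] = None,
    atmosphere: Optional[str] = None,
    allow_speech: bool = True,
) -> str:
    """Join subject, camera movement, and atmosphere into one prompt (hypothetical helper)."""
    parts = [subject]
    if camera:
        parts.append(camera)
    if atmosphere:
        parts.append(atmosphere)
    if not allow_speech:
        # Tip from above: explicitly suppress speech when the character shouldn't talk
        parts.append("no talking, no dialogue")
    return ", ".join(parts)


print(build_prompt("a wizard reading by candlelight", camera="slow zoom in",
                   atmosphere="rain outside", allow_speech=False))
# a wizard reading by candlelight, slow zoom in, rain outside, no talking, no dialogue
```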
Music Video from Image + Audio
Overview
- Start with a character image and an audio track (e.g., from Suno)
- Transcribe the audio to get timestamps
- Generate clip 1 from the image (veo-3.1)
- Generate each subsequent clip by extending the previous one, which maintains continuity
- Stitch the clips together and overlay the audio with ffmpeg
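Each clip covers a fixed stretch of the song, so the number of clips and each clip's time window follow directly from the song length. This sketch assumes a constant clip length, using the 8-second figure from the cost estimate as the default. `plan_clips` is hypothetical.

```python
import math
from typing import List, Tuple


def plan_clips(song_seconds: float, clip_seconds: float = 8.0) -> List[Tuple[float, float]]:
    """Split [0, song_seconds) into consecutive (start, end) windows of clip_seconds.

    Assumes every clip has the same length; the last window is cut at the song's end.
    """
    count = math.ceil(song_seconds / clip_seconds)
    return [
        (i * clip_seconds, min((i + 1) * clip_seconds, song_seconds))
        for i in range(count)
    ]


windows = plan_clips(240.0)  # a 4-minute song
print(len(windows))  # 30
print(windows[0], windows[-1])  # (0.0, 8.0) (232.0, 240.0)
```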
Step 1: Transcribe audio for timing
```bash
whisper-ctranslate2 "song.mp3" --model large-v3 --output_dir /tmp --output_format srt
```
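The SRT file lists each transcribed line with a start time and an end time. Parsing it gives you the timestamps for lining clips up with the lyrics. This minimal sketch uses only the standard library. It assumes well-formed SRT blocks (an index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, then the text) separated by blank lines. The output file is assumed to be named after the input, i.e. /tmp/song.srt.

```python
import re
from typing import List, Tuple

TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
CUE = re.compile(TIME + r"\s*-->\s*" + TIME)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> List[Tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for every cue in an SRT string."""
    cues: List[Tuple[float, float, str]] = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = CUE.search(line)
            if match:
                g = match.groups()
                # Text is everything after the timing line, joined into one string
                cues.append((to_seconds(*g[:4]), to_seconds(*g[4:]),
                             " ".join(lines[i + 1:]).strip()))
                break
    return cues


sample = """1
00:00:01,000 --> 00:00:04,500
First line of the song

2
00:00:05,250 --> 00:00:09,000
Second line
"""
for start, end, words in parse_srt(sample):
    print(f"{start:.2f}-{end:.2f}: {words}")
# With the real transcript (assumed path):
# parse_srt(open("/tmp/song.srt", encoding="utf-8").read())
```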
Step 2: Generate first clip from image
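```python
import time
from pathlib import Path
from google import genai
from google.genai import types
client = genai.Client()  # picks up GEMINI_API_KEY from the environment
img_data = Path("character.jpg").read_bytes()  # example path: your character image
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",  # veo-3.1 is required for the extend feature
    image=types.Image(image_bytes=img_data, mime_type="image/jpeg"),
    prompt="character description, scene action, no talking",
)
while not operation.done:  # generation is asynchronous; poll until it finishes
    time.sleep(10)
    operation = client.operations.get(operation)
video1 = operation.result.generated_videos[0]
```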
Step 3: Extend from previous clip
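```python
# previous_video: video1 from step 2, or the result of the previous extend call
operation = client.models.generate_videos(
    model="veo-3.1-generate-preview",
    video=previous_video.video,  # pass the previous clip's Video object
    prompt="next scene description, continuous action, no talking",
)
while not operation.done:  # poll as in step 2
    time.sleep(10)
    operation = client.operations.get(operation)
next_video = operation.result.generated_videos[0]  # becomes previous_video next time
```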
Step 4: Stitch clips + add audio
```bash
# Create the concat list (one line per clip, in playback order)
printf "file 'clip_01.mp4'\nfile 'clip_02.mp4'\n..." > concat.txt

# Stitch the video clips without re-encoding
ffmpeg -f concat -safe 0 -i concat.txt -c copy combined.mp4

# Add the audio track: video from the first input, audio from the second
ffmpeg -i combined.mp4 -i song.mp3 -c:v copy -c:a aac -map 0:v -map 1:a final.mp4
```
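The printf line above requires you to list every clip by hand. For a long song it is easier to generate concat.txt from the clip count. This sketch assumes the clips are named clip_01.mp4, clip_02.mp4, and so on in playback order, as in the example. `write_concat_list` is a hypothetical helper.

```python
from pathlib import Path


def write_concat_list(num_clips: int, path: str = "concat.txt") -> str:
    """Write an ffmpeg concat-demuxer list for clip_01.mp4 .. clip_NN.mp4; return its text."""
    # Assumed naming scheme: two-digit, 1-based clip numbers
    lines = [f"file 'clip_{i:02d}.mp4'" for i in range(1, num_clips + 1)]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


print(write_concat_list(3), end="")
# file 'clip_01.mp4'
# file 'clip_02.mp4'
# file 'clip_03.mp4'
```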
Cost estimate
- ~8 sec per clip × $0.40/sec = $3.20 per clip
- 4-minute song ≈ 30 clips ≈ $96
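The arithmetic above works for any song length and model tier. This sketch uses the approximate per-second prices from the note under Prompting Tips ($0.40 standard, $0.15 fast). Prices change, so treat the constants as assumptions. A partial last clip is billed as a full clip, on the assumption that every generated clip has the same length.

```python
import math
from typing import Tuple

# Approximate USD per generated second (assumed; check current pricing)
PRICE_PER_SECOND = {"standard": 0.40, "fast": 0.15}


def estimate_cost(song_seconds: float, clip_seconds: float = 8.0,
                  tier: str = "standard") -> Tuple[int, float]:
    """Return (clip_count, total_usd) for covering a song with fixed-length clips."""
    clips = math.ceil(song_seconds / clip_seconds)
    total = clips * clip_seconds * PRICE_PER_SECOND[tier]
    return clips, round(total, 2)


print(estimate_cost(240))  # (30, 96.0)
print(estimate_cost(240, tier="fast"))  # (30, 36.0)
```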
Audio Generation
- Music: use Suno (external service)
- Speech: Gemini 2.5 TTS (Flash or Pro); script still TBD
API Key
The scripts read the key from the GEMINI_API_KEY environment variable; alternatively, pass --api-key KEY.
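A common way for a script to support both options is to prefer the flag and fall back to the environment variable. This sketch shows that pattern with argparse. It is an assumption about how the bundled scripts behave, not a copy of them. `resolve_api_key` is a hypothetical helper.

```python
import argparse
import os
from typing import List, Optional


def resolve_api_key(argv: Optional[List[str]] = None) -> str:
    """Return --api-key if given, else GEMINI_API_KEY; exit with an error if neither is set."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--api-key", help="Gemini API key (overrides GEMINI_API_KEY)")
    args, _unknown = parser.parse_known_args(argv)  # ignore the script's other flags
    key = args.api_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        parser.error("set GEMINI_API_KEY or pass --api-key KEY")  # exits with status 2
    return key


print(resolve_api_key(["--api-key", "example-key"]))  # example-key
```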