Skill: Music Video Storyboard & Production
Purpose
Generate complete music video storyboards, scene-by-scene AI image/video prompts, and assembly guides for full-length music videos using AI generation tools (Kling, Sora, Runway Gen-3, Hailuo, Luma).
Trigger
When the user wants a full music video or visual album, or when generating premium content for a high-performing track.
When to Make a Full Music Video
Only invest in full videos for tracks that justify it:
- Breakout tier (100+ plays): always
- High engagement (>30% like/play ratio): always
- Lead singles for album releases: always
- Persona flagship tracks: always
- Growing tier (20-50 plays): only if the genre benefits from visuals (J-rock, metal, cinematic)
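The criteria above can be sketched as a single decision function. This is a minimal illustration, not a fixed rule engine: the `Track` fields and the function name are hypothetical, and returning `False` for anything the list does not cover (for example 51-99 plays) is an assumption.

```python
from dataclasses import dataclass

# Genres the list names as benefiting from visuals.
VISUAL_GENRES = {"j-rock", "metal", "cinematic"}


@dataclass
class Track:
    """Hypothetical track record; field names are illustrative."""
    plays: int
    likes: int
    genre: str
    is_lead_single: bool = False
    is_flagship: bool = False


def justifies_full_video(track: Track) -> bool:
    """Return True when the track meets one of the listed criteria."""
    like_ratio = track.likes / track.plays if track.plays else 0.0
    if track.plays >= 100:            # Breakout tier
        return True
    if like_ratio > 0.30:             # High engagement
        return True
    if track.is_lead_single or track.is_flagship:
        return True
    if 20 <= track.plays <= 50:       # Growing tier: genre-dependent
        return track.genre.lower() in VISUAL_GENRES
    return False                      # Assumption: not covered means no
```

For example, `justifies_full_video(Track(plays=30, likes=3, genre="metal"))` passes on the Growing-tier genre rule, while the same numbers for a pop track do not.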
Storyboard Framework
Song Analysis (Pre-Production)
Before any visual planning, analyze the full song:
yaml
Song Analysis:
  title: "Trust in Yourself"
  duration: "2:18"
  bpm: 140
  structure:
    - intro: "0:00-0:12"         # instrumental build
    - verse1: "0:12-0:38"        # story setup
    - prechorus: "0:38-0:50"     # tension building
    - chorus1: "0:50-1:14"       # release, main hook
    - verse2: "1:14-1:34"        # deeper story
    - chorus2: "1:34-1:54"       # bigger, layered
    - bridge: "1:54-2:06"        # emotional shift
    - final_chorus: "2:06-2:18"  # climax + resolve
  emotional_arc: [tension, hope, release, power, transcendence]
  visual_keywords: [arena, spotlight, rising, crowd, triumph]
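Before storyboarding, it can help to confirm that the section timestamps are contiguous and add up to the track duration. The sketch below assumes sections arrive as `(name, "m:ss-m:ss")` pairs; `to_seconds` and `check_structure` are hypothetical helper names.

```python
def to_seconds(timestamp: str) -> int:
    """Convert an "m:ss" timestamp to whole seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def check_structure(sections: list[tuple[str, str]], duration: str) -> list[str]:
    """Return a list of problems; an empty list means the structure is clean."""
    problems = []
    cursor = 0  # where the next section is expected to start
    for name, span in sections:
        start_text, end_text = span.split("-")
        start, end = to_seconds(start_text), to_seconds(end_text)
        if start != cursor:
            problems.append(f"{name}: starts at {start}s, expected {cursor}s")
        if end <= start:
            problems.append(f"{name}: ends before it starts")
        cursor = end
    if cursor != to_seconds(duration):
        problems.append(f"sections end at {cursor}s, track is {to_seconds(duration)}s")
    return problems


SECTIONS = [
    ("intro", "0:00-0:12"), ("verse1", "0:12-0:38"),
    ("prechorus", "0:38-0:50"), ("chorus1", "0:50-1:14"),
    ("verse2", "1:14-1:34"), ("chorus2", "1:34-1:54"),
    ("bridge", "1:54-2:06"), ("final_chorus", "2:06-2:18"),
]
print(check_structure(SECTIONS, "2:18"))
```

The example analysis passes this check: its eight sections cover 0:00 to 2:18 with no gaps or overlaps.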
Scene Count Formula
code
Duration / Average scene length = Scene count
Under 2:30 → 8-12 scenes (10-15 sec each)
2:30-3:30 → 12-18 scenes (12-15 sec each)
3:30-5:00 → 18-25 scenes (12-15 sec each)
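The brackets above translate into a small lookup. This is a sketch under two assumptions: the duration is already in seconds, and tracks longer than 5:00 are rejected because the formula gives no bracket for them. The function name is hypothetical.

```python
def scene_count_range(duration_s: int) -> tuple[int, int]:
    """Map a track duration in seconds to the (min, max) scene count."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s < 150:       # under 2:30
        return (8, 12)
    if duration_s <= 210:      # 2:30-3:30
        return (12, 18)
    if duration_s <= 300:      # 3:30-5:00
        return (18, 25)
    raise ValueError("no bracket defined for tracks longer than 5:00")


low, high = scene_count_range(138)  # "Trust in Yourself", 2:18
print(low, high)
```

Dividing the duration by a candidate scene count gives the average scene length, which is a quick way to see how far a planned scene count sits from the 10-15 second target.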
Storyboard Template
For each scene:
yaml
Scene 3:
  timestamp: "0:38-0:50"
  song_section: "Pre-Chorus"
  duration: 12s
  description: "Camera follows protagonist walking through a neon-lit corridor. Walls pulse with the beat. Each step triggers a light ripple on the floor."
  emotional_beat: "Building anticipation"
  camera:
    shot_type: "tracking medium shot"
    movement: "steady forward dolly"
    angle: "eye level, slightly low"
  lighting: "Cyan side-light, purple overhead, dark shadows"
  subject: "Silhouetted figure walking toward bright light at end of corridor"
  props: "Glass walls with circuit patterns, floating data particles"
  color_palette: ["#0F172A", "#43BFE3", "#AB47C7"]
  transition_to_next: "Light burst fills frame → dissolve to chorus scene"
  ai_prompt: |
    A person walking through a futuristic glass corridor with neon cyan
    and purple lighting, circuit patterns on transparent walls, dark
    atmospheric, cinematic tracking shot, volumetric fog, cyberpunk
    aesthetic, 4K film quality, dramatic lighting
  generation_tool: "Kling 1.6 / Sora"
  generation_settings:
    aspect_ratio: "16:9"
    duration: "5s"
    motion_type: "camera forward dolly"
AI Video Generation Tools
Tool Comparison
| Tool | Strengths | Duration | Best For |
|---|---|---|---|
| Kling 1.6 | Consistent motion, good faces | 5-10s | Character scenes, narrative |
| Sora | Cinematic quality, complex scenes | 5-20s | Establishing shots, landscapes |
| Runway Gen-3 | Fast, good style consistency | 4-16s | Abstract, stylized footage |
| Hailuo | Free tier, decent quality | 5s | Quick tests, secondary footage |
| Luma Dream Machine | Good 3D, object rotation | 5s | Product shots, abstract |
| Stable Video | Image-to-video, consistent | 4-8s | Animated album covers |
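Because most tools cap clip length below a typical scene length, a scene often needs more than one generation. The sketch below takes the upper bound of each Duration cell in the table at face value (actual limits vary by plan and version) and estimates how many clips a scene needs; `clips_needed` is a hypothetical helper.

```python
import math

# Upper bound of the "Duration" column, in seconds (from the table above).
MAX_CLIP_SECONDS = {
    "Kling 1.6": 10,
    "Sora": 20,
    "Runway Gen-3": 16,
    "Hailuo": 5,
    "Luma Dream Machine": 5,
    "Stable Video": 8,
}


def clips_needed(scene_seconds: float, tool: str) -> int:
    """Minimum number of generations to cover a scene at normal speed."""
    if scene_seconds <= 0:
        raise ValueError("scene length must be positive")
    return math.ceil(scene_seconds / MAX_CLIP_SECONDS[tool])


for tool in ("Kling 1.6", "Sora", "Hailuo"):
    print(tool, clips_needed(12, tool))
```

Slowing footage down in the edit stretches a clip further, so treat the result as an upper estimate when slow motion is planned.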
Prompt Engineering for Video Gen
Structure: [Action] [Subject] [Environment] [Camera] [Style] [Quality]
code
A figure rising from darkness into golden light, arms spreading wide, in a massive amphitheater with glowing audience silhouettes, slow upward crane shot, cinematic color grading, epic film quality, dramatic volumetric lighting, 4K
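One way to keep prompts in this order is to assemble them from named parts. The helper below is a minimal sketch: `build_prompt` and its parameter names are hypothetical, and it simply joins the non-empty parts with commas in the order of the structure above.

```python
def build_prompt(
    action: str,
    subject: str,
    environment: str,
    camera: str,
    style: list[str],
    quality: list[str],
) -> str:
    """Join prompt parts in [Action] [Subject] [Environment] [Camera] [Style] [Quality] order."""
    parts = [action, subject, environment, camera, *style, *quality]
    # Skip empty parts so optional fields do not leave stray commas.
    return ", ".join(part.strip() for part in parts if part.strip())


prompt = build_prompt(
    action="A figure rising from darkness into golden light",
    subject="arms spreading wide",
    environment="in a massive amphitheater with glowing audience silhouettes",
    camera="slow upward crane shot",
    style=["cinematic color grading"],
    quality=["epic film quality", "dramatic volumetric lighting", "4K"],
)
print(prompt)
```

Keeping style and quality as lists makes it easy to reuse the same keywords across every scene in a video.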
Motion Keywords (add for dynamic scenes):
code
"slow motion", "time lapse", "tracking shot", "crane shot", "dolly zoom", "parallax", "camera orbit", "steadicam follow"
Quality Keywords:
code
"cinematic", "film grain", "anamorphic lens", "4K quality", "professional color grading", "shallow depth of field", "ARRI Alexa look", "35mm film"
Assembly Guide
Video Editor Timeline
After all scenes are generated, provide a complete timeline:
code
TIMELINE: "Trust in Yourself" (Full Music Video)
Audio Track: trust-in-yourself.wav (2:18, 44.1kHz)
Video Track 1 (Primary footage):
  [0:00-0:12] Scene 1: Arena at dawn (5s gen + 7s slow-mo)
  [0:12-0:38] Scene 2-4: Verse montage (3x 8s clips)
  [0:38-0:50] Scene 5: Corridor walk (12s gen)
  [0:50-1:14] Scene 6-8: Chorus performance (3x 8s clips)
  [1:14-1:34] Scene 9-10: Deeper narrative (2x 10s clips)
  [1:34-1:54] Scene 11-13: Big chorus (3x 7s clips)
  [1:54-2:06] Scene 14: Bridge, emotional shift (12s gen)
  [2:06-2:18] Scene 15: Final climax + fade (12s gen)
Overlay Track:
  [0:00-0:03] Title card fade in: "FrankX"
  [2:14-2:18] End card: "Trust in Yourself — Stream everywhere"
Effects:
  - Cross dissolves between all scenes (0.5s)
  - Speed ramp on chorus hits (0.8x → 1.2x)
  - Color grade: teal and orange cinematic
  - Vignette on verse sections
  - Light leak transitions on chorus entries
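The bracketed timestamps in a timeline like this can be derived from slot lengths rather than typed by hand, which avoids gaps and overlaps. A minimal sketch, assuming each entry is a `(label, seconds)` pair; `build_timeline` and `fmt` are hypothetical names.

```python
def fmt(seconds: int) -> str:
    """Format whole seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def build_timeline(scenes: list[tuple[str, int]]) -> list[str]:
    """Turn (label, length-in-seconds) pairs into "[start-end] label" lines."""
    lines = []
    cursor = 0  # running start time of the next slot
    for label, length in scenes:
        lines.append(f"[{fmt(cursor)}-{fmt(cursor + length)}] {label}")
        cursor += length
    return lines


for line in build_timeline([
    ("Scene 1: Arena at dawn", 12),
    ("Scene 2-4: Verse montage", 26),
    ("Scene 5: Corridor walk", 12),
]):
    print(line)
```

The lengths here are slot lengths on the timeline, not raw clip lengths; overlapping cross dissolves shorten the combined runtime of the clips inside a slot.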
Export Settings
yaml
YouTube:
  resolution: 3840x2160 (4K) or 1920x1080
  fps: 30 or 24 (cinematic)
  codec: H.264 or H.265
  bitrate: 20-50 Mbps
  audio: AAC 320kbps
TikTok/Reels (vertical cut):
  resolution: 1080x1920
  fps: 30
  duration: 30-60s (best moments)
Spotify Canvas:
  resolution: 720x1280 (9:16)
  duration: 3-8s loop
  format: MP4
  size: <10MB
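Two quick sanity checks are useful before export: that a resolution has the intended aspect ratio, and roughly how large the file will be at a given video bitrate. The sketch below is an approximation only; it ignores audio and container overhead and treats 1 MB as 10^6 bytes. Both function names are hypothetical.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a resolution to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def estimated_size_mb(bitrate_mbps: float, seconds: float) -> float:
    """Approximate video size: megabits per second * seconds / 8 bits per byte."""
    return bitrate_mbps * seconds / 8


print(aspect_ratio(3840, 2160))    # YouTube 4K
print(aspect_ratio(720, 1280))     # Spotify Canvas
print(estimated_size_mb(20, 138))  # 2:18 track at 20 Mbps
```

By the same arithmetic, an 8-second Canvas loop has to stay under about 10 Mbps to fit the 10MB limit.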
Visual Consistency Rules
Color Grading
- All scenes in one video share the same color grade
- Grade must match album cover palette
- Use LUT references: "Teal & Orange", "Cyberpunk Neon", "Dark Film Noir"
Character Consistency
When a character appears across scenes:
- Use the same reference image for all AI generations
- Specify consistent clothing, hair, and features
- If consistency fails, use silhouette/backlit approach
- Fall back to abstract/symbolic if needed
Style Consistency
Lock these per video:
- Aspect ratio (16:9 for YouTube, 9:16 for socials)
- Color temperature (warm vs cool)
- Film grain amount
- Lighting style (natural vs neon vs mixed)
- Render style (photorealistic vs stylized vs anime)
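A simple way to enforce the lock is to compare these settings across all scene records and report any that drift. This sketch assumes each scene is a plain dictionary; the key names in `LOCKED_KEYS` are hypothetical stand-ins for the five items above.

```python
# Hypothetical key names for the five locked settings listed above.
LOCKED_KEYS = (
    "aspect_ratio",
    "color_temperature",
    "film_grain",
    "lighting_style",
    "render_style",
)


def style_drift(scenes: list[dict]) -> dict[str, set]:
    """Return each locked key that takes more than one value across scenes."""
    drift = {}
    for key in LOCKED_KEYS:
        values = {scene.get(key) for scene in scenes}
        if len(values) > 1:
            drift[key] = values
    return drift


scenes = [
    {"aspect_ratio": "16:9", "film_grain": "light", "render_style": "photorealistic"},
    {"aspect_ratio": "16:9", "film_grain": "heavy", "render_style": "photorealistic"},
]
print(style_drift(scenes))
```

An empty result means every scene agrees on all locked settings; a scene that omits a key other scenes set is also reported as drift.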
Budget Tiers
Tier 1: Minimal ($0 — Tools Only)
- 4-6 AI-generated images animated with Ken Burns effect
- CapCut free tier for assembly
- 2-3 hours of work
Tier 2: Standard ($20-50 — Credits)
- 10-15 AI video clips from Kling/Runway
- Professional transitions and color grading
- 4-6 hours of work
Tier 3: Premium ($100-200 — Full Production)
- 20+ AI video clips, multiple takes per scene
- Custom title cards and motion graphics
- Professional audio mastering
- Multi-format export (YouTube + vertical)
- 8-12 hours of work
Quality Checklist
- Every scene has a clear emotional purpose
- Camera movements match song energy
- Transitions land on beat changes
- No scene exceeds 15 seconds without a cut
- Color palette is consistent throughout
- First 3 seconds are visually compelling
- End card includes artist name + streaming links
- Audio is properly synced (check lip sync if applicable)
- Exported in correct format for target platform
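Two of these checks are mechanical and can be scripted: the 15-second limit and beat-aligned cuts. The sketch below assumes a constant tempo with the first beat at 0:00, which real tracks may not satisfy; `overlong_clips` and `snap_to_beat` are hypothetical helpers.

```python
def overlong_clips(clip_lengths: list[float], limit: float = 15.0) -> list[int]:
    """Return the indexes of clips that run longer than the limit (seconds)."""
    return [i for i, length in enumerate(clip_lengths) if length > limit]


def snap_to_beat(cut_time: float, bpm: float) -> float:
    """Move a cut to the nearest beat, assuming constant tempo from 0:00."""
    beat_length = 60.0 / bpm
    return round(cut_time / beat_length) * beat_length


print(overlong_clips([12, 8, 8, 8, 16]))  # flags the last clip
print(snap_to_beat(38.0, 140))            # nearest beat to a cut at 0:38
```

At 140 BPM a beat lasts about 0.43 seconds, so timestamps rounded to whole seconds can sit a fraction of a beat away from the actual cut point.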