Skill: Music Video Storyboard & Production

Purpose

Generate complete music video storyboards, scene-by-scene AI image/video prompts, and assembly guides for full-length music videos using AI generation tools (Kling, Sora, Runway Gen-3, Hailuo, Luma).

Trigger

Use this skill when the user wants a full music video or visual album, or when generating premium content for a high-performing track.

When to Make a Full Music Video

Only invest in full videos for tracks that justify it:

  • Breakout tier (100+ plays) — always
  • High engagement (>30% like/play ratio) — always
  • Lead singles for album releases — always
  • Persona flagship tracks — always
  • Growing tier (20-50 plays) — only if genre benefits from visuals (J-rock, metal, cinematic)
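The criteria above can be sketched as a small decision function. This is only an illustration: the `TrackStats` container and its field names are hypothetical, and the source does not say what to do with tracks between 51 and 99 plays, so the sketch returns `False` for them unless another rule applies.

```python
from dataclasses import dataclass

# Genres the list above names as benefiting from visuals.
VISUAL_GENRES = {"j-rock", "metal", "cinematic"}


@dataclass
class TrackStats:
    """Hypothetical container; field names are illustrative assumptions."""
    plays: int
    likes: int
    genre: str = ""
    is_lead_single: bool = False
    is_persona_flagship: bool = False


def justifies_full_video(track: TrackStats) -> bool:
    """Apply the investment criteria listed above."""
    if track.plays >= 100:  # breakout tier
        return True
    if track.plays > 0 and track.likes / track.plays > 0.30:  # high engagement
        return True
    if track.is_lead_single or track.is_persona_flagship:
        return True
    # Growing tier: only when the genre benefits from visuals.
    if 20 <= track.plays <= 50:
        return track.genre.lower() in VISUAL_GENRES
    return False
```

For example, `justifies_full_video(TrackStats(plays=30, likes=3, genre="metal"))` returns `True`, while the same numbers with `genre="pop"` return `False`.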

Storyboard Framework

Song Analysis (Pre-Production)

Before any visual planning, analyze the full song:

yaml
Song Analysis:
  title: "Trust in Yourself"
  duration: "2:18"
  bpm: 140
  structure:
    - intro: "0:00-0:12"         # instrumental build
    - verse1: "0:12-0:38"        # story setup
    - prechorus: "0:38-0:50"     # tension building
    - chorus1: "0:50-1:14"       # release, main hook
    - verse2: "1:14-1:34"        # deeper story
    - chorus2: "1:34-1:54"       # bigger, layered
    - bridge: "1:54-2:06"        # emotional shift
    - final_chorus: "2:06-2:18"  # climax + resolve
  emotional_arc: [tension, hope, release, power, transcendence]
  visual_keywords: [arena, spotlight, rising, crowd, triumph]
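Before storyboarding, it can help to confirm that the section timestamps are contiguous and add up to the song length. The following is a minimal sketch, assuming `M:SS` timestamps and `M:SS-M:SS` ranges as in the analysis above; the function names are illustrative.

```python
def to_seconds(stamp: str) -> int:
    """Convert 'M:SS' to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_range(span: str) -> tuple[int, int]:
    """Convert 'M:SS-M:SS' to (start, end) in seconds."""
    start, end = span.split("-")
    return to_seconds(start), to_seconds(end)


def check_structure(sections: dict[str, str], duration: str) -> list[str]:
    """Return a list of problems: gaps, overlaps, or a wrong total length."""
    problems = []
    cursor = 0
    for name, span in sections.items():
        start, end = parse_range(span)
        if start != cursor:
            problems.append(f"{name} starts at {start}s, expected {cursor}s")
        if end <= start:
            problems.append(f"{name} has non-positive length")
        cursor = end
    total = to_seconds(duration)
    if cursor != total:
        problems.append(f"sections end at {cursor}s, song is {total}s")
    return problems


# Section ranges copied from the song analysis above.
structure = {
    "intro": "0:00-0:12",
    "verse1": "0:12-0:38",
    "prechorus": "0:38-0:50",
    "chorus1": "0:50-1:14",
    "verse2": "1:14-1:34",
    "chorus2": "1:34-1:54",
    "bridge": "1:54-2:06",
    "final_chorus": "2:06-2:18",
}
problems = check_structure(structure, "2:18")
```

An empty `problems` list means every section starts where the previous one ends and the last section ends at the stated duration.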

Scene Count Formula

code
Duration / Average scene length = Scene count

Under 2:30  → 8-12 scenes (10-15 sec each)
2:30-3:30   → 12-18 scenes (12-15 sec each)
3:30-5:00   → 18-25 scenes (12-15 sec each)
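As a rough sketch, the formula and the lookup table above could be expressed as follows. The function names are illustrative, the boundary handling at exactly 2:30 and 3:30 is an assumption, and the source gives no guidance beyond 5:00, so the lookup raises an error there.

```python
import math


def scene_count_bounds(duration_s: int, min_len_s: int, max_len_s: int) -> tuple[int, int]:
    """Fewest and most scenes that fit when each scene lasts min_len_s..max_len_s."""
    fewest = math.ceil(duration_s / max_len_s)
    most = duration_s // min_len_s
    return fewest, most


def recommended_scenes(duration_s: int) -> tuple[int, int]:
    """Look up the recommended scene range from the table above."""
    if duration_s < 150:       # under 2:30
        return (8, 12)
    if duration_s <= 210:      # 2:30-3:30
        return (12, 18)
    if duration_s <= 300:      # 3:30-5:00
        return (18, 25)
    raise ValueError("no recommendation given for songs longer than 5:00")
```

For the 2:18 example song (138 seconds), `recommended_scenes(138)` returns `(8, 12)`, while the raw formula with 10-15 second scenes, `scene_count_bounds(138, 10, 15)`, returns `(10, 13)`. The table is best read as a rule of thumb rather than an exact result of the formula.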

Storyboard Template

For each scene:

yaml
Scene 5:
  timestamp: "0:38-0:50"
  song_section: "Pre-Chorus"
  duration: 12s
  description: "Camera follows protagonist walking through a neon-lit corridor. Walls pulse with the beat. Each step triggers a light ripple on the floor."
  emotional_beat: "Building anticipation"
  camera:
    shot_type: "tracking medium shot"
    movement: "steady forward dolly"
    angle: "eye level, slightly low"
  lighting: "Cyan side-light, purple overhead, dark shadows"
  subject: "Silhouetted figure walking toward bright light at end of corridor"
  props: "Glass walls with circuit patterns, floating data particles"
  color_palette: ["#0F172A", "#43BFE3", "#AB47C7"]
  transition_to_next: "Light burst fills frame → dissolve to chorus scene"
  ai_prompt: |
    A person walking through a futuristic glass corridor with neon cyan
    and purple lighting, circuit patterns on transparent walls, dark
    atmospheric, cinematic tracking shot, volumetric fog, cyberpunk
    aesthetic, 4K film quality, dramatic lighting
  generation_tool: "Kling 1.6 / Sora"
  generation_settings:
    aspect_ratio: "16:9"
    duration: "5s"
    motion_type: "camera forward dolly"

AI Video Generation Tools

Tool Comparison

| Tool | Strengths | Duration | Best For |
|---|---|---|---|
| Kling 1.6 | Consistent motion, good faces | 5-10s | Character scenes, narrative |
| Sora | Cinematic quality, complex scenes | 5-20s | Establishing shots, landscapes |
| Runway Gen-3 | Fast, good style consistency | 4-16s | Abstract, stylized footage |
| Hailuo | Free tier, decent quality | 5s | Quick tests, secondary footage |
| Luma Dream Machine | Good 3D, object rotation | 5s | Product shots, abstract |
| Stable Video | Image-to-video, consistent | 4-8s | Animated album covers |
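Because most tools generate clips shorter than a typical scene, it is useful to estimate how many clips a scene needs. The sketch below takes the maximum clip lengths from the table above; those figures may not match the current limits of each tool, and the helper name is illustrative.

```python
import math

# Maximum clip length in seconds, taken from the comparison table above.
MAX_CLIP_SECONDS = {
    "Kling 1.6": 10,
    "Sora": 20,
    "Runway Gen-3": 16,
    "Hailuo": 5,
    "Luma Dream Machine": 5,
    "Stable Video": 8,
}


def clips_needed(scene_seconds: float, tool: str) -> int:
    """Minimum number of generated clips needed to cover one scene."""
    return math.ceil(scene_seconds / MAX_CLIP_SECONDS[tool])
```

A 12-second scene would need one Sora clip, two Kling 1.6 clips, or three Hailuo clips, before accounting for any overlap used by transitions.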

Prompt Engineering for Video Gen

Structure: [Action] [Subject] [Environment] [Camera] [Style] [Quality]

code
A figure rising from darkness into golden light, arms spreading wide,
in a massive amphitheater with glowing audience silhouettes,
slow upward crane shot, cinematic color grading, epic film quality,
dramatic volumetric lighting, 4K
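The six-part structure can be kept consistent across scenes with a small prompt builder. This is a minimal sketch: the `VideoPrompt` class and its fields are hypothetical and simply join the parts in the order given in the structure line.

```python
from dataclasses import dataclass, field


@dataclass
class VideoPrompt:
    """Hypothetical helper mirroring [Action] [Subject] [Environment] [Camera] [Style] [Quality]."""
    action: str
    subject: str
    environment: str
    camera: str
    style: list[str] = field(default_factory=list)
    quality: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty parts into one comma-separated prompt."""
        parts = [self.action, self.subject, self.environment, self.camera,
                 *self.style, *self.quality]
        return ", ".join(part.strip() for part in parts if part.strip())


prompt = VideoPrompt(
    action="rising from darkness into golden light",
    subject="a figure with arms spreading wide",
    environment="massive amphitheater with glowing audience silhouettes",
    camera="slow upward crane shot",
    style=["cinematic color grading"],
    quality=["4K"],
)
text = prompt.render()
```

Keeping style and quality as lists makes it easy to reuse the same keyword set for every scene in a video.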

Motion Keywords (add for dynamic scenes):

code
"slow motion", "time lapse", "tracking shot", "crane shot",
"dolly zoom", "parallax", "camera orbit", "steadicam follow"

Quality Keywords:

code
"cinematic", "film grain", "anamorphic lens", "4K quality",
"professional color grading", "shallow depth of field",
"ARRI Alexa look", "35mm film"

Assembly Guide

Video Editor Timeline

After all scenes are generated, provide a complete timeline:

code
TIMELINE: "Trust in Yourself" — Full Music Video

Audio Track: trust-in-yourself.wav (2:18, 44.1kHz)

Video Track 1 (Primary footage):
  [0:00-0:12] Scene 1: Arena at dawn (5s gen + 7s slow-mo)
  [0:12-0:38] Scene 2-4: Verse montage (3x ~9s clips)
  [0:38-0:50] Scene 5: Corridor walk (12s gen)
  [0:50-1:14] Scene 6-8: Chorus performance (3x 8s clips)
  [1:14-1:34] Scene 9-10: Deeper narrative (2x 10s clips)
  [1:34-1:54] Scene 11-13: Big chorus (3x 7s clips)
  [1:54-2:06] Scene 14: Bridge — emotional shift (12s gen)
  [2:06-2:18] Scene 15: Final climax + fade (12s gen)

Overlay Track:
  [0:00-0:03] Title card fade in: "FrankX"
  [2:14-2:18] End card: "Trust in Yourself — Stream everywhere"

Effects:
  - Cross dissolves between all scenes (0.5s)
  - Speed ramp on chorus hits (0.8x → 1.2x)
  - Color grade: teal and orange cinematic
  - Vignette on verse sections
  - Light leak transitions on chorus entries
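To place cuts and transitions on the beat, timeline boundaries can be snapped to the nearest beat. The sketch below assumes a constant tempo and a beat grid that starts at 0:00, which real recordings may not satisfy; the function name is illustrative.

```python
def snap_to_beat(time_s: float, bpm: float) -> float:
    """Return the time of the beat nearest to time_s, assuming beat 0 is at 0:00."""
    beat_index = round(time_s * bpm / 60)
    return beat_index * 60 / bpm


# Section boundaries from the timeline above, in seconds, at 140 BPM.
boundaries = [12, 38, 50, 74, 94, 114, 126]
snapped = [round(snap_to_beat(t, 140), 3) for t in boundaries]
```

At 140 BPM a beat lasts about 0.43 seconds, so a boundary such as 0:38 moves by at most a fraction of a second when snapped.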

Export Settings

yaml
YouTube:
  resolution: 3840x2160 (4K) or 1920x1080
  fps: 30 or 24 (cinematic)
  codec: H.264 or H.265
  bitrate: 20-50 Mbps
  audio: AAC 320kbps

TikTok/Reels (vertical cut):
  resolution: 1080x1920
  fps: 30
  duration: 30-60s (best moments)

Spotify Canvas:
  resolution: 720x1280 (9:16)
  duration: 3-8s loop
  format: MP4
  size: <10MB
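The YouTube settings above could be turned into an encoder command. The source does not name an export tool, so using FFmpeg here is an assumption, as are the function name and the default values; the sketch only builds the argument list and does not run it.

```python
def youtube_export_args(src: str, dst: str, height: int = 1080,
                        fps: int = 24, video_mbps: int = 20) -> list[str]:
    """Build an FFmpeg argument list matching the YouTube settings above."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, force an even width
        "-r", str(fps),               # 24 (cinematic) or 30
        "-c:v", "libx264",            # H.264 video
        "-b:v", f"{video_mbps}M",     # 20-50 Mbps per the settings above
        "-pix_fmt", "yuv420p",        # widely compatible pixel format
        "-c:a", "aac", "-b:a", "320k",
        dst,
    ]


args = youtube_export_args("master.mov", "youtube.mp4")
```

The list can be passed to `subprocess.run(args, check=True)` on a machine where FFmpeg is installed.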

Visual Consistency Rules

Color Grading

  • All scenes in one video share the same color grade
  • Grade must match album cover palette
  • Use LUT references: "Teal & Orange", "Cyberpunk Neon", "Dark Film Noir"

Character Consistency

When a character appears across scenes:

  1. Use the same reference image for all AI generations
  2. Specify consistent clothing, hair, and features
  3. If consistency fails, use a silhouette or backlit approach
  4. Fall back to abstract or symbolic imagery if needed

Style Consistency

Lock these per video:

  • Aspect ratio (16:9 for YouTube, 9:16 for socials)
  • Color temperature (warm vs cool)
  • Film grain amount
  • Lighting style (natural vs neon vs mixed)
  • Render style (photorealistic vs stylized vs anime)
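One way to enforce the lock is to record the five settings once and compare every scene against them. The key names and the dictionary layout below are illustrative assumptions, not part of the storyboard template.

```python
# Hypothetical keys for the five locked settings listed above.
LOCKED_KEYS = ("aspect_ratio", "color_temperature", "film_grain",
               "lighting_style", "render_style")


def style_drift(lock: dict, scenes: list[dict]) -> list[str]:
    """List every scene setting that differs from the locked value."""
    issues = []
    for index, scene in enumerate(scenes, start=1):
        for key in LOCKED_KEYS:
            if scene.get(key) != lock.get(key):
                issues.append(
                    f"scene {index}: {key}={scene.get(key)!r}, locked to {lock.get(key)!r}"
                )
    return issues


lock = {"aspect_ratio": "16:9", "color_temperature": "cool", "film_grain": "light",
        "lighting_style": "neon", "render_style": "photorealistic"}
scenes = [dict(lock), {**lock, "render_style": "anime"}]
issues = style_drift(lock, scenes)
```

Here the second scene is reported because its render style departs from the locked value.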

Budget Tiers

Tier 1: Minimal ($0 — Tools Only)

  • 4-6 AI-generated images animated with Ken Burns effect
  • CapCut free tier for assembly
  • 2-3 hours of work

Tier 2: Standard ($20-50 — Credits)

  • 10-15 AI video clips from Kling/Runway
  • Professional transitions and color grading
  • 4-6 hours of work

Tier 3: Premium ($100-200 — Full Production)

  • 20+ AI video clips, multiple takes per scene
  • Custom title cards and motion graphics
  • Professional audio mastering
  • Multi-format export (YouTube + vertical)
  • 8-12 hours of work

Quality Checklist

  • Every scene has a clear emotional purpose
  • Camera movements match song energy
  • Transitions land on beat changes
  • No scene exceeds 15 seconds without a cut
  • Color palette is consistent throughout
  • First 3 seconds are visually compelling
  • End card includes artist name + streaming links
  • Audio is properly synced (check lip sync if applicable)
  • Exported in correct format for target platform
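Some checklist items can be checked mechanically. As a minimal sketch, the 15-second rule can be tested against the timeline, assuming each timeline entry is cut into equally long clips; the tuple layout and function name are illustrative.

```python
def overlong_segments(timeline, limit_s: float = 15.0) -> list[str]:
    """Return labels of entries whose average shot length exceeds limit_s.

    Each entry is (label, start_s, end_s, clip_count).
    """
    return [label for label, start, end, clips in timeline
            if (end - start) / clips > limit_s]


# Entries transcribed from the example timeline, in seconds.
timeline = [
    ("Scene 1", 0, 12, 1),
    ("Scenes 2-4", 12, 38, 3),
    ("Scene 5", 38, 50, 1),
    ("Scenes 6-8", 50, 74, 3),
    ("Scenes 9-10", 74, 94, 2),
    ("Scenes 11-13", 94, 114, 3),
    ("Scene 14", 114, 126, 1),
    ("Scene 15", 126, 138, 1),
]
too_long = overlong_segments(timeline)
```

An empty result means no entry averages more than 15 seconds per shot; the other checklist items still need a manual review.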