Video Generation Skill

Creates animated video clips using Veo 3.1 multi-image reference mode with native audio.

Docs: https://ai.google.dev/gemini-api/docs/video?example=dialogue#reference-images

API Mode: Multi-Image References

Uses the `reference_images` parameter, not the `image` parameter. This is text-to-video generation in which the reference images guide visual consistency; none of them is used as the starting frame.

Constraint	Value	Notes
Max references	3	API enforced limit
Duration	Optional	API handles default (~8s)
`image` param	NOT USED	Incompatible with reference_images
Models	Both work	`veo-3.1-generate-preview` or `veo-3.1-fast-generate-preview`

Reference Strategy

Slot	Image	Purpose
1	Panel image	Scene composition
2	Character 1 sheet	Character consistency
3	Character 2 sheet	Second character (optional)
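The slot order above can be sketched as a small helper that puts the panel first, appends the character sheets, and rejects anything over the three-reference limit. The names `order_references` and `MAX_REFERENCES` are hypothetical and not part of the skill; this is only an illustration of the ordering rule.

```python
from pathlib import Path

MAX_REFERENCES = 3  # API-enforced limit from the constraints table


def order_references(panel: Path, character_sheets: list[Path]) -> list[Path]:
    """Return reference paths in slot order: panel first, then character sheets."""
    refs = [panel, *character_sheets]
    if len(refs) > MAX_REFERENCES:
        raise ValueError(
            f"{len(refs)} references given; at most {MAX_REFERENCES} are allowed"
        )
    return refs
```

Raising before the request is made keeps the failure local, rather than waiting for the API to reject the call.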

Method Signature

python

from pathlib import Path
from typing import Optional

async def generate_clip_with_references(
    self,
    keyframe_path: Path,                      # Panel image, used as reference #1
    reference_images: list[Path],             # Character sheets (references #2-3)
    dialogue: Optional[str] = None,           # Spoken line, for native audio
    story_context: Optional[str] = None,      # Scene description
    character_name: Optional[str] = None,     # Speaker name
    duration_seconds: Optional[int] = None,   # Optional (default: 8)
    clip_index: int = 0,
    next_panel_path: Optional[Path] = None,   # Ignored (incompatible with references)
) -> VideoClipResult: ...

Native Audio Prompting

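Veo 3.1 generates speech from dialogue written into the prompt. The prompt names the speaker in a DIALOGUE section and describes delivery and ambience in an AUDIO section: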

code

DIALOGUE:
Mochi says: "Wow, this is amazing!"

AUDIO:
- Character speaks the dialogue naturally with appropriate emotion
- Natural ambient sounds matching the scene
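A minimal sketch of how such a prompt might be assembled from the method's `story_context`, `dialogue`, and `character_name` arguments. The function name `build_prompt` and the fallback speaker label are assumptions for illustration; the skill's actual prompt builder may differ.

```python
from typing import Optional

# AUDIO section text taken from the prompt layout shown above
AUDIO_SECTION = (
    "AUDIO:\n"
    "- Character speaks the dialogue naturally with appropriate emotion\n"
    "- Natural ambient sounds matching the scene"
)


def build_prompt(
    story_context: str,
    dialogue: Optional[str] = None,
    character_name: Optional[str] = None,
) -> str:
    """Join the scene description with DIALOGUE and AUDIO sections when there is speech."""
    sections = [story_context.strip()]
    if dialogue:
        speaker = character_name or "Character"  # hypothetical fallback label
        sections.append(f'DIALOGUE:\n{speaker} says: "{dialogue.strip()}"')
        sections.append(AUDIO_SECTION)
    return "\n\n".join(sections)
```

With no dialogue the prompt is just the scene description, so silent clips do not carry speech instructions.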

Example Usage

python

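import asyncio
from pathlib import Path

from skills.generate_video import VideoGenerator

# Placeholder paths; substitute your own panel and character sheets.
panel_path = Path("panel.png")
character_sheet_1 = Path("character_1.png")
character_sheet_2 = Path("character_2.png")

async def main() -> None:
    video_gen = VideoGenerator()

    result = await video_gen.generate_clip_with_references(
        keyframe_path=panel_path,
        reference_images=[character_sheet_1, character_sheet_2],
        dialogue="Mochi: Look at this treasure map!",
        story_context="SCENE 1: Mochi discovers a mysterious map",
        character_name="Mochi",
    )

    # Video has native audio - no FFmpeg overlay needed!
    print(f"Generated: {result.video_path} ({result.duration_seconds}s)")

asyncio.run(main())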

API Call (Internal)

python

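from google import genai
from google.genai import types

client = genai.Client()  # Picks up the API key from the environment

# panel_ref, char1_ref, char2_ref are types.VideoGenerationReferenceImage objects,
# e.g. types.VideoGenerationReferenceImage(image=img, reference_type="asset").
# Do NOT pass image= when using reference_images.
operation = client.models.generate_videos(
    model="veo-3.1-fast-generate-preview",
    prompt=prompt,  # Includes dialogue for native audio
    config=types.GenerateVideosConfig(
        reference_images=[panel_ref, char1_ref, char2_ref],
        aspect_ratio="9:16",
    ),
)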

Aspect Ratio

Ratio	Use Case
`9:16`	TikTok, Reels, Shorts, Manga (default)
`16:9`	YouTube horizontal

Polling & Timeout

• Poll interval: 10 seconds
• Max wait: 5 minutes (300s)
• Typical generation: 30-60 seconds
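The polling policy above could look roughly like the loop below. It is a sketch written against injected callables so it runs without the SDK; `wait_until_done`, `refresh`, and `is_done` are hypothetical names, not part of the skill.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

POLL_INTERVAL_S = 10  # seconds between status checks
MAX_WAIT_S = 300      # give up after 5 minutes


def wait_until_done(
    operation: T,
    refresh: Callable[[T], T],
    is_done: Callable[[T], bool],
    poll_interval: float = POLL_INTERVAL_S,
    max_wait: float = MAX_WAIT_S,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Refresh the operation every poll_interval seconds until it is done.

    Raises TimeoutError once max_wait seconds have been spent sleeping.
    """
    waited = 0.0
    while not is_done(operation):
        if waited >= max_wait:
            raise TimeoutError(f"Operation not done after {max_wait:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
        operation = refresh(operation)
    return operation
```

With the google-genai client this would presumably be called as `wait_until_done(operation, client.operations.get, lambda op: bool(op.done))`. Injecting `sleep` keeps the loop testable without real delays.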

Error Handling

Error	Cause	Recovery
`400 INVALID_ARGUMENT`	>3 refs or `image`+`reference_images`	Pass at most 3 refs and omit `image`
`FileNotFoundError`	Panel/ref missing	Check paths
`TimeoutError`	Generation exceeded max wait (300s)	Retry
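Two of the recoveries in the table can be sketched in code: checking paths before the request, and retrying after a timeout. Both helpers below (`check_paths_exist`, `retry_on_timeout`) are hypothetical names, and the attempt count is an assumed default rather than a documented one.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def check_paths_exist(paths: list[Path]) -> None:
    """Raise FileNotFoundError for the first panel or reference image that is missing."""
    for path in paths:
        if not path.is_file():
            raise FileNotFoundError(f"Reference image not found: {path}")


async def retry_on_timeout(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 2,  # assumed default; tune to your quota and latency budget
) -> T:
    """Await make_call(), retrying on TimeoutError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await make_call()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
    raise AssertionError("unreachable")
```

A call might then be wrapped as `await retry_on_timeout(lambda: video_gen.generate_clip_with_references(...))`. Only `TimeoutError` is retried, since a `400 INVALID_ARGUMENT` will fail the same way on every attempt.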