Gemini Multimodal Tool
Use the ai-gem CLI tool for multimodal AI processing and image generation via Google's Gemini API.
Usage
bash
# Text queries
ai-gem "Write a haiku about Python programming"

# Analyze documents
ai-gem "Summarize this document" document.pdf

# Analyze images
ai-gem "What's in this image?" photo.jpg

# Process YouTube videos
ai-gem "Create a 5-point summary" "https://youtu.be/VIDEO_ID"

# Compare multiple files
ai-gem "Compare these files" file1.pdf file2.png

# Web search
ai-gem "Current AI news" --search

# Generate images (uses Nano Banana Pro by default)
ai-gem --image "A cute robot reading a book in a cozy library"
ai-gem --image "A landscape at sunset" --aspect-ratio 16:9
ai-gem --image "A cat wearing a hat" -o cat.png
ai-gem --image "Edit this to add sunglasses" reference.jpg

# Use alternative image model
ai-gem --image "A blue triangle" -m gemini-2.5-flash-image
Image Generation Options
- •--image/-i: Generate an image instead of text
- •--output/-o: Output file path (auto-generated if omitted)
- •--aspect-ratio/-a: Aspect ratio (1:1, 9:16, 16:9, etc.)
- •--model/-m: Override model (default: nano-banana-pro-preview)
- •Attachments serve as reference images for editing
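The options above can be wrapped in a small helper for repeated use. This is a minimal sketch, not part of ai-gem itself: the function name `gen_image` and the `DRY_RUN` and `AI_GEM_MODEL` variables are hypothetical, and it assumes `--aspect-ratio`, `-o`, and `-m` can be combined in a single call.

```shell
# Hypothetical wrapper around the image flags listed above.
# Assumption: --aspect-ratio, -o and -m may be combined in one ai-gem call.
# Usage: gen_image PROMPT OUTPUT_FILE [ASPECT_RATIO]
gen_image() (
  prompt=$1
  out=$2
  ratio=${3:-1:1}   # fall back to a square image when no ratio is given

  # Build the command as positional parameters so arguments with spaces survive.
  set -- ai-gem --image "$prompt" --aspect-ratio "$ratio" -o "$out"

  # Optional model override through the (hypothetical) AI_GEM_MODEL variable.
  if [ -n "${AI_GEM_MODEL:-}" ]; then
    set -- "$@" -m "$AI_GEM_MODEL"
  fi

  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of running it (quoting is not shown).
    echo "$*"
  else
    "$@"
  fi
)
```

With `DRY_RUN=1`, calling `gen_image "A cat" cat.png 16:9` only prints the command it would run, which is a cheap way to check the flags before spending an API call.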
Requirements
- •GEMINI_API_KEY environment variable must be set
- •The hamel package must be installed: pip install hamel
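Both requirements can be checked before the first call. The sketch below is illustrative only: `check_ai_gem_requirements` is a hypothetical helper, and it assumes that installing hamel places the `ai-gem` command on `PATH`.

```shell
# Hypothetical pre-flight check for the two requirements listed above.
check_ai_gem_requirements() {
  _status=0

  # Requirement 1: the API key must be present in the environment.
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set" >&2
    _status=1
  fi

  # Requirement 2 (assumption): installing hamel puts ai-gem on PATH.
  if ! command -v ai-gem >/dev/null 2>&1; then
    echo "ai-gem not found; try: pip install hamel" >&2
    _status=1
  fi

  return "$_status"
}
```

The function returns a non-zero status if either check fails, so it can guard a script with `check_ai_gem_requirements || exit 1`.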
Supported Input Types
- •PDFs
- •Images (PNG, JPEG, GIF, WebP)
- •Videos (MP4, etc.)
- •YouTube URLs
- •Plain text files
- •Multiple files for comparison