gemini-api-guides

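A comprehensive reference for the Google Gemini API. Consult this guide when building applications with: (1) Gemini models (Gemini 3 Pro, 2.5 Flash/Pro/Flash-Lite) for text and multimodal generation; (2) image generation (Imagen, Nano Banana), video generation (Veo 3.1), and music generation (Lyria); (3) function calling, structured outputs, and agentic workflows; (4) built-in tools: Google Search, Maps, Code Execution, URL Context, Computer Use, File Search; (5) the Live API for real-time voice/video streaming; (6) long context (1M+ tokens), embeddings, and document/audio/video understanding; (7) the Batch API, context caching, and safety settings. Triggers: gemini api, google ai, genai sdk, gemini model, veo, imagen, nano banana, lyria, live api, vertex ai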

SKILL.md
--- frontmatter
name: gemini-api-guides
description: |
  Comprehensive reference for Google's Gemini API. Use when building applications with:
  (1) Gemini models (Gemini 3 Pro, 2.5 Flash/Pro/Flash-Lite) for text and multimodal generation,
  (2) Image generation (Imagen, Nano Banana), video (Veo 3.1), music (Lyria),
  (3) Function calling, structured outputs, and agentic workflows,
  (4) Built-in tools: Google Search, Maps, Code Execution, URL Context, Computer Use, File Search,
  (5) Live API for real-time voice/video streaming,
  (6) Long context (1M+ tokens), embeddings, document/audio/video understanding,
  (7) Batch API, context caching, safety settings.
  Triggers: "gemini api", "google ai", "genai sdk", "gemini model", "veo", "imagen", "nano banana", "lyria", "live api", "vertex ai"

Gemini API Skill

Build AI applications with Google's Gemini models and tools.

Quick Start

Installation

bash
# Python
pip install google-genai

# JavaScript/Node.js
npm install @google/genai

# Go
go get google.golang.org/genai

Environment Setup

bash
export GEMINI_API_KEY="your-api-key"
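
The client constructors in this guide are called without an explicit key, so they depend on the variable above being present in the environment. A minimal sketch of failing early with a clear message when it is missing; the helper name `require_api_key` is hypothetical and not part of any SDK:

```python
import os


def require_api_key(env=None, name="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `require_api_key` is a hypothetical helper for illustration only.
    Pass a dict as `env` to check something other than os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a client")
    return key
```

Calling `require_api_key()` once at startup surfaces a missing key before the first request is made.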

Basic Usage

Python:

python
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your prompt here"
)
print(response.text)

JavaScript:

javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: "Your prompt here"
});
console.log(response.text);

REST:

bash
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"contents": [{"parts": [{"text": "Your prompt here"}]}]}'

Model Selection

| Model | Best For | Context Window |
| --- | --- | --- |
| Gemini 3 Pro | Most intelligent tasks, multimodal reasoning, agentic | See models-overview |
| Gemini 2.5 Pro | Complex reasoning, coding, extended thinking | 1M tokens |
| Gemini 2.5 Flash | Balanced performance, general tasks | 1M tokens |
| Gemini 2.5 Flash-Lite | High-volume, cost-sensitive, fastest | See models-overview |
| Imagen | High-fidelity image generation | N/A |
| Veo 3.1 | Video generation (8s, 720p/1080p with audio) | N/A |
| Nano Banana | Native image gen with Gemini 2.5 Flash | N/A |
| Nano Banana Pro | Native image gen with Gemini 3 Pro | N/A |

Reference Documentation Index

Getting Started

| Topic | File | Description |
| --- | --- | --- |
| Setup & Libraries | getting-started.md | API keys, SDK installation, OpenAI compatibility |

Models & Pricing

| Topic | File | Description |
| --- | --- | --- |
| Model Overview | models-overview.md | All models, capabilities, context windows |
| Pricing | api-pricing.md | Token costs, tool pricing |
| Rate Limits | rate-limits.md | RPM/TPM limits, quotas |
| Gemini 3 Guide | gemini-3.md | Gemini 3 specific features and best practices |
| Imagen | imagen.md | Image generation with Imagen model |
| Embeddings | embeddings.md | Text embeddings for search/RAG |
| Veo | veo.md | Video generation with Veo 3.1 (69K) |
| Lyria | lyria.md | Music generation with Lyria RealTime |
| Robotics | robotics.md | Gemini Robotics-ER 1.5 (42K) |

Core Capabilities

| Topic | File | Description |
| --- | --- | --- |
| Text Generation | text-generation.md | Text generation, system instructions (38K) |
| Image Gen (Nano Banana) | image-generation-gemini.md | Native image generation with Gemini (LARGE: 174K) |
| Image Understanding | image-understanding.md | Vision, image analysis |
| Video Understanding | video-understanding.md | Video analysis, timestamps |
| Document Understanding | document-understanding.md | PDF and document processing |
| Speech Generation | speech-generation.md | Text-to-speech (TTS) |
| Audio Understanding | audio-understanding.md | Audio analysis, transcription |

Advanced Features

| Topic | File | Description |
| --- | --- | --- |
| Thinking Mode | thinking.md | Extended reasoning capabilities |
| Thought Signatures | thought-signatures.md | EDGE CASE ONLY: Manual signature handling when NOT using official SDKs |
| Structured Outputs | structured-outputs.md | JSON schema responses |
| Function Calling | function-calling.md | Custom tool integration (54K) |
| Long Context | long-context.md | 1M+ token handling, context caching |

Tools

| Topic | File | Description |
| --- | --- | --- |
| Tools Overview | tools-overview.md | Built-in tools summary, agent frameworks |
| Google Search | google-search.md | Web search grounding |
| Google Maps | google-maps.md | Location-aware grounding |
| Code Execution | code-execution.md | Python code execution tool |
| URL Context | url-context.md | URL content extraction |
| Computer Use | computer-use.md | Browser automation (preview) (44K) |
| File Search | file-search.md | RAG with document indexing |

Live API (Real-time Streaming)

| Topic | File | Description |
| --- | --- | --- |
| Getting Started | live-api-getting-started.md | Low-latency voice/video interactions |
| Capabilities Guide | live-api-capabilities.md | Full capabilities and configurations (32K) |
| Tool Use | live-api-tools.md | Function calling & Search in Live API |
| Session Management | live-api-sessions.md | Session handling, time limits |
| Ephemeral Tokens | ephemeral-tokens.md | Short-lived auth for client-side WebSockets |

Guides

| Topic | File | Description |
| --- | --- | --- |
| Batch API | batch-api.md | Async processing at 50% cost (47K) |
| Files API | files-api.md | Upload and manage media files (49K) |
| Context Caching | context-caching.md | Implicit & explicit caching for cost savings |
| Media Resolution | media-resolution.md | Control token allocation for media |
| Tokens | tokens.md | Understand and count tokens |
| Prompt Design | prompt-design.md | Prompt strategies and best practices (47K) |
| Logs & Datasets | logs-datasets.md | Enable logging, view in AI Studio |
| Data Logging & Sharing | data-logging-sharing.md | Storage and management of API logs |
| Safety Settings | safety-settings.md | Adjust safety filters |
| Safety Guidance | safety-guidance.md | Best practices for safe AI use |

Troubleshooting & Migration

| Topic | File | Description |
| --- | --- | --- |
| Troubleshooting | troubleshooting.md | Diagnose and resolve common API issues (25K) |
| Vertex AI Comparison | vertex-ai-comparison.md | READ ONLY IF USER MENTIONS "VERTEX AI": Gemini Developer API vs Vertex AI differences |

Large Files - Search Patterns

For large reference files (>30K), use grep to find specific sections:

image-generation-gemini.md (174K):

bash
grep -n "## " references/image-generation-gemini.md  # List sections
grep -n "edit" references/image-generation-gemini.md  # Find editing info
grep -n "style" references/image-generation-gemini.md  # Find style transfer

veo.md (69K):

bash
grep -n "## " references/veo.md  # List sections
grep -n "audio" references/veo.md  # Find audio generation info

models-overview.md (67K):

bash
grep -n "gemini-3" references/models-overview.md
grep -n "context" references/models-overview.md

function-calling.md (54K):

bash
grep -n "## " references/function-calling.md
grep -n "parallel" references/function-calling.md  # Parallel function calls
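
Where grep is not available, the section listing can be approximated in Python. This is a sketch that mirrors `grep -n "## "`; the function name `list_sections` is an assumption, and the sample text stands in for a real reference file:

```python
def list_sections(text, marker="## "):
    """Return (line_number, line) pairs for lines containing the marker.

    Mirrors `grep -n "## "`: line numbers are 1-based and the match is a
    plain substring test, so "### " subheadings match as well.
    """
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


# Stand-in content; in practice, read the file text from the references folder.
sample = "# Title\n\n## Setup\ntext\n### Details\n## Usage\n"
for number, line in list_sections(sample):
    print(f"{number}:{line}")
```

Passing a different `marker`, such as `"parallel"`, reproduces the keyword searches shown above.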

Common Patterns

Multimodal Input (Image + Text)

python
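from google import genai
from google.genai import types

client = genai.Client()

# Read the image as raw bytes; image_path is a placeholder for your own file.
image_path = "path/to/image.jpg"
with open(image_path, "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        # Part.from_bytes wraps inline media; set mime_type to match the file.
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Describe this image",
    ],
)
print(response.text)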

Function Calling

python
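from google import genai
from google.genai import types

client = genai.Client()

# Declare the function schema the model is allowed to call.
tools = [
    types.Tool(function_declarations=[{
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }])
]

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Paris?",
    config=types.GenerateContentConfig(tools=tools)
)

# The model returns a function call rather than final text; you run it yourself.
print(response.function_calls)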

Google Search Grounding

python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What are the latest AI developments?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]
    )
)

Thinking Mode

python
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Solve this complex problem...",
    config=types.GenerateContentConfig(
        # The ThinkingConfig field is thinking_budget (a token count).
        thinking_config=types.ThinkingConfig(thinking_budget=10000)
    )
)

Streaming

python
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a story"
):
    print(chunk.text, end="")
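
When the full text is needed after streaming, the chunks can be accumulated while they are printed. A minimal sketch, assuming each chunk exposes a `text` attribute that may be empty or `None` for chunks that carry no text; `collect_stream` and the `SimpleNamespace` stand-ins are hypothetical and only simulate a stream so the logic can run offline:

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Join the text of all chunks, skipping chunks that carry no text.

    `on_text`, if given, is called with each piece as it arrives
    (for example, to print it incrementally).
    """
    pieces = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if not text:
            continue  # Nothing printable in this chunk.
        if on_text is not None:
            on_text(text)
        pieces.append(text)
    return "".join(pieces)


# Simulated stream; these objects are stand-ins, not SDK types.
fake_stream = [
    SimpleNamespace(text="Once "),
    SimpleNamespace(text=None),
    SimpleNamespace(text="upon a time"),
]
story = collect_stream(fake_stream, on_text=lambda t: print(t, end=""))
```

With the SDK, the iterable returned by `client.models.generate_content_stream(...)` would take the place of `fake_stream`.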

Key Concepts

Tool Execution Flow

Built-in tools (Google Search, Code Execution): Executed by Google

  1. Send prompt with tool config → Model executes tool → Response with grounded results

Custom tools (Function Calling): Executed by you

  1. Send prompt with function declarations → Model returns function call JSON
  2. You execute function, send result back → Model generates final response
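
Step 2 of the custom-tool flow can be sketched without any network calls. The example assumes the model's function call has already been reduced to a plain dict with `name` and `args` keys; the real SDK objects differ, and `get_weather`, `REGISTRY`, and `execute_function_call` are hypothetical names used only for illustration:

```python
def get_weather(location):
    """Stand-in implementation; a real app would call a weather service."""
    return {"location": location, "forecast": "sunny"}


# Map declared function names to the local callables that implement them.
REGISTRY = {"get_weather": get_weather}


def execute_function_call(call, registry=REGISTRY):
    """Run the function the model asked for and wrap the result.

    Returns a dict with the function name and its result, which is the
    information to send back to the model for the final response.
    """
    name = call["name"]
    if name not in registry:
        # Report unknown names back instead of crashing the conversation.
        return {"name": name, "response": {"error": f"unknown function: {name}"}}
    result = registry[name](**call.get("args", {}))
    return {"name": name, "response": {"result": result}}


# Simulated function call, shaped like the JSON the model returns in step 1.
model_call = {"name": "get_weather", "args": {"location": "Paris"}}
print(execute_function_call(model_call))
```

Keeping the dispatch in a registry means adding a tool is one function plus one dictionary entry, and unknown names are reported back rather than raising.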

Thought Signatures (Important)

  • If you use the official SDKs with the chat feature: thought signatures are handled automatically. No action is needed.
  • If you manage conversation history manually: read thought-signatures.md for the Gemini 3 Pro function calling requirements.

API Endpoints

| Endpoint | Purpose |
| --- | --- |
| /v1beta/models/{model}:generateContent | Standard generation |
| /v1beta/models/{model}:streamGenerateContent | Streaming |
| /v1beta/models/{model}:embedContent | Embeddings |
| /v1beta/models/{model}:countTokens | Token counting |

Base URL: https://generativelanguage.googleapis.com
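
The endpoint paths listed above combine with the base URL as follows. A small sketch; `endpoint_url` is a hypothetical helper, and the accepted method names are limited to the four listed in this section:

```python
BASE_URL = "https://generativelanguage.googleapis.com"

# Only the four methods listed in this section are accepted.
METHODS = {"generateContent", "streamGenerateContent", "embedContent", "countTokens"}


def endpoint_url(model, method, version="v1beta"):
    """Build the full REST URL for a model and method."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    return f"{BASE_URL}/{version}/models/{model}:{method}"


print(endpoint_url("gemini-2.5-flash", "generateContent"))
# → https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent
```

The printed URL matches the one used in the REST quick-start request.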