Gemini 3 Flash Prompting Guide
Expert-level reference for prompting Gemini 3 Flash effectively.
1. FUNDAMENTALS
Model Specifications
| Spec | Value |
|---|---|
| Context Window | 1M tokens input |
| Max Output | 64k tokens |
| Knowledge Cutoff | January 2025 |
| Default Temperature | 1.0 |
| Speed | 3x faster than Gemini 2.5 Pro |
Critical Rules
NEVER change temperature from 1.0
# WRONG - causes looping and degraded performance
temperature=0.7
# CORRECT - always use the default
temperature=1.0  # or omit entirely
Gemini 3's reasoning is tuned for the default temperature of 1.0. Lower values can cause:
- Looping behavior
- Degraded performance on math and reasoning tasks
- Unexpected outputs
Less is More
Gemini 3 Flash understands instructions better than previous models. Cut 30-50% of prompt verbosity compared to Gemini 2.x prompts.
Default Behavior is Concise
The model prioritizes direct, efficient answers. If you need conversational or chatty responses, request them explicitly:
You are a friendly, conversational assistant. Explain things in detail with a warm tone.
2. THINKING LEVEL CONFIGURATION
Parameter: thinking_level
DO NOT use thinking_budget; it is the legacy parameter from Gemini 2.5 models. Setting both thinking_budget and thinking_level in the same request returns an error.
Available Levels (Flash)
| Level | Use Case | Latency |
|---|---|---|
| minimal | Chat, high-throughput apps | Fastest |
| low | Simple instruction following | Fast |
| medium | Balanced reasoning | Moderate |
| high (default) | Complex reasoning, coding | Slowest |
Code Examples
Python (google-genai SDK)
from google import genai
from google.genai import types
client = genai.Client()
# Low latency for chat
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Hello, how are you?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.MINIMAL
        )
    )
)
# High reasoning for complex tasks
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this optimization problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.HIGH
        )
    )
)
REST API
POST https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent
{
  "contents": [{"role": "user", "parts": [{"text": "Your prompt"}]}],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "LOW"
    }
  }
}
Latency Optimization
Combine a low thinking level with a system instruction:
System: Think silently. Provide direct answers without showing reasoning.
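A sketch of the same idea as a REST request body, reusing the field names from the REST example above plus `systemInstruction`; `low_latency_request` is a hypothetical helper, not part of any SDK.

```python
import json


def low_latency_request(prompt: str) -> dict:
    """Build a REST generateContent body: LOW thinking plus a terse system instruction."""
    return {
        # systemInstruction carries the "think silently" guidance
        "systemInstruction": {
            "parts": [{"text": "Think silently. Provide direct answers without showing reasoning."}]
        },
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {"thinkingConfig": {"thinkingLevel": "LOW"}},
    }


# The model name goes in the URL (.../models/gemini-3-flash-preview:generateContent),
# so it is not part of the body.
print(json.dumps(low_latency_request("What is the capital of France?"), indent=2))
```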
3. PROMPT STRUCTURE
The Pattern
Role + Goal + Constraints + Examples + Output Format
Best Practices
Be Precise and Direct
# WRONG - verbose, persuasive
I would really appreciate it if you could please help me summarize this document. It would be amazing if you could make it concise and capture all the key points.
# CORRECT - direct
Summarize this document in 3 bullet points. Focus on key findings.
Use Consistent Delimiters
Choose ONE format - XML tags OR Markdown headers. Never mix.
<!-- XML Style -->
<role>You are a technical writer.</role>
<task>Summarize the following code.</task>
<constraints>
- Maximum 100 words
- Use bullet points
</constraints>
<input>
{code_here}
</input>
<!-- Markdown Style -->
## Role
You are a technical writer.
## Task
Summarize the following code.
## Constraints
- Maximum 100 words
- Use bullet points
## Input
{code_here}
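When prompts are assembled in code, a single renderer makes mixing impossible. A minimal sketch: the hypothetical `render_prompt` takes ordered sections and renders all of them as XML tags or all of them as Markdown headers.

```python
def render_prompt(sections: dict[str, str], style: str = "xml") -> str:
    """Render named prompt sections in ONE delimiter style.

    `sections` maps a section name (e.g. "role", "task") to its text;
    dict insertion order is kept.
    """
    if style == "xml":
        # <name> ... </name> around each section
        return "\n".join(f"<{name}>\n{text}\n</{name}>" for name, text in sections.items())
    if style == "markdown":
        # "## Name" header followed by the section text
        return "\n".join(f"## {name.title()}\n{text}" for name, text in sections.items())
    raise ValueError("style must be 'xml' or 'markdown'")


print(render_prompt({"role": "You are a technical writer.",
                     "task": "Summarize the following code."}, style="markdown"))
```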
Place Questions AFTER Context
When working with large documents or data, put the content first and the question last:
<document>
{large_document_content}
</document>
Based on the document above, what are the three main conclusions?
Anchor to Provided Context
Tie the question to the supplied context with phrases such as:
- Based on the information provided above...
- Using only the data in this document...
- According to the context given...
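Both rules (context first, question last, anchored to the context) fit in one small helper. A sketch assuming the document fits in a single prompt; `grounded_prompt` is a hypothetical name.

```python
def grounded_prompt(document: str, question: str) -> str:
    """Long document first, then the question, anchored to the document."""
    return (
        "<document>\n"
        f"{document}\n"
        "</document>\n"
        # The anchoring phrase points the model back at the supplied context
        f"Based on the document above, {question}"
    )


print(grounded_prompt("Q3 revenue rose 4% while costs stayed flat.",
                      "what are the three main conclusions?"))
```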
4. CONSTRAINT PLACEMENT
Critical: Negative Constraints Go at END
The model may drop negative constraints if they appear too early in complex prompts.
# WRONG - negative constraints at start
Do not use bullet points. Do not include examples. Summarize this document about machine learning in 200 words.
# CORRECT - negative constraints at end
Summarize this document about machine learning. Output must be exactly 200 words. Do not use bullet points. Do not include examples.
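If prompts are built from separate instruction strings, the placement rule can be enforced automatically. A rough sketch: the hypothetical `negatives_last` treats any instruction starting with "Do not", "Don't", "Never" or "Avoid" as a negative constraint. That prefix test is a heuristic, not something from the model documentation.

```python
NEGATIVE_PREFIXES = ("do not", "don't", "never", "avoid")


def negatives_last(instructions: list[str]) -> str:
    """Join instructions with negative constraints moved to the end.

    Relative order inside each group is preserved.
    """
    def is_negative(s: str) -> bool:
        return s.lower().startswith(NEGATIVE_PREFIXES)

    positive = [s for s in instructions if not is_negative(s)]
    negative = [s for s in instructions if is_negative(s)]
    return " ".join(positive + negative)


print(negatives_last([
    "Do not use bullet points.",
    "Summarize this document about machine learning.",
]))
```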
Avoid Blanket Negatives
# WRONG - causes over-indexing, model may refuse basic tasks
Do not infer anything. Do not guess. Do not make assumptions.
# CORRECT - specific instruction
Use only the information provided in the document. If the answer is not explicitly stated, respond with "Not found in document."
5. THOUGHT SIGNATURES (Function Calling)
What They Are
Encrypted representations of the model's internal reasoning. Required for maintaining context across multi-turn function calling.
Rules
- MANDATORY for function calling - missing signatures return 400 error
- Return signatures exactly as received - do not modify
- Multi-step calls: accumulate and return ALL signatures
- Official SDKs handle this automatically
Manual Handling (if not using SDK)
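# `response` is the parsed JSON body (a dict) of a REST generateContent
# call that ended in a function call.
model_content = response["candidates"][0]["content"]
# The signature sits on the part that carries the functionCall,
# which is not necessarily parts[0].
fc_part = next(p for p in model_content["parts"] if "functionCall" in p)
thought_signature = fc_part["thoughtSignature"]  # opaque; never edit it
# On the next turn, resend the history with the model turn unchanged:
next_request = {
    "contents": [
        # Previous turns, e.g. the original user prompt
        {"role": "user", "parts": [{"text": "..."}]},
        # Model turn exactly as received (functionCall + thoughtSignature)
        model_content,
        # Function result
        {
            "role": "user",
            "parts": [{"functionResponse": {...}}]
        }
    ]
}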
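If you manage REST history yourself, the simplest way to honor "return signatures exactly as received" is to append the model's content object untouched. A minimal sketch, assuming `response` is the parsed JSON body of a REST generateContent call; `append_model_turn`, `get_weather`, and the signature value are illustrative only.

```python
def append_model_turn(history: list[dict], response: dict) -> list[str]:
    """Append the model turn from a parsed REST response to `history`, unmodified.

    Keeping the content object as-is keeps every thoughtSignature in place.
    The signatures found are returned only for logging or debugging.
    """
    content = response["candidates"][0]["content"]
    history.append(content)  # no copying, reordering, or editing of parts
    return [part["thoughtSignature"] for part in content.get("parts", [])
            if "thoughtSignature" in part]


# Illustrative exchange; the function name and signature value are made up.
history = [{"role": "user", "parts": [{"text": "What's the weather in Paris?"}]}]
response = {"candidates": [{"content": {"role": "model", "parts": [{
    "functionCall": {"name": "get_weather", "args": {"city": "Paris"}},
    "thoughtSignature": "opaque-signature-1",
}]}}]}
signatures = append_model_turn(history, response)
# Next: append the functionResponse turn and send all of `history` again.
history.append({"role": "user", "parts": [{"functionResponse": {
    "name": "get_weather", "response": {"temp_c": 18}}}]})
```

In a multi-step chain, calling the helper after every model turn accumulates all signatures in `history` without any extra bookkeeping.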
SDK Auto-Handling
If using official SDKs (Python, Node, Java) with standard chat history, signatures are managed automatically. No manual handling needed.
6. FEW-SHOT PROMPTING
Why It Works
Few-shot prompts are more effective than zero-shot prompts for Gemini 3 Flash. Examples regulate:
- Output formatting
- Phrasing style
- Response scope
- Pattern matching
Best Practices
Show Patterns to Follow, NOT Anti-Patterns
# WRONG - showing what NOT to do
Input: The weather is nice today.
Output: DON'T say "The weather is pleasant" - that's wrong!
# CORRECT - showing what TO do
Input: The weather is nice today.
Output: Current conditions are favorable.
Input: I'm feeling happy.
Output: Emotional state is positive.
Use Clear Prefixes
Text: The cat sat on the mat.
Sentiment: Neutral
Text: I love this product!
Sentiment: Positive
Text: This is terrible service.
Sentiment: Negative
Text: {user_input}
Sentiment:
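Generating few-shot prompts from data keeps the prefixes identical across examples. A sketch with a hypothetical `few_shot_prompt` helper that defaults to the Text/Sentiment prefixes above and ends on an open output prefix for the model to complete.

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str,
                    input_prefix: str = "Text", output_prefix: str = "Sentiment") -> str:
    """Format (input, output) pairs with consistent prefixes, then the open query."""
    lines = []
    for text, label in examples:
        lines.append(f"{input_prefix}: {text}")
        lines.append(f"{output_prefix}: {label}")
    lines.append(f"{input_prefix}: {query}")
    lines.append(f"{output_prefix}:")  # left open for the model to fill in
    return "\n".join(lines)


print(few_shot_prompt([("I love this product!", "Positive"),
                       ("This is terrible service.", "Negative")],
                      "The package arrived on time."))
```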
Optimal Example Count
- Start with 2-3 examples
- Add more if output quality is inconsistent
- Too many examples can cause overfitting
- Experiment to find the sweet spot for your task
7. STRUCTURED OUTPUT
JSON Mode
import json
from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract the person's name and age from: John is 25 years old.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    )
)
data = json.loads(response.text)  # response.text holds the JSON string
Schema Validation
from google import genai
from google.genai import types
client = genai.Client()
schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The person's full name"
        },
        "age": {
            "type": "integer",
            "description": "The person's age in years"
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
            "description": "Overall sentiment of the text"
        }
    },
    "required": ["name", "age"]
}
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract info from: John, age 25, loves this product!",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema
    )
)
Key Points
- Use the description field in the schema to guide the model
- Property ordering is preserved (Gemini 2.5+)
- API validates output against schema before returning
- Complex schemas may cause 400 errors (long names, many optionals)
- Structured output guarantees syntax, NOT semantic correctness
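Because the schema guarantees structure rather than meaning, application-side checks are still worth running. A sketch against the name/age/sentiment schema in this section; the helper name and the 0 to 150 age range are assumptions chosen for illustration.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def check_person(raw: str) -> list[str]:
    """Return a list of problems with a JSON response shaped like the schema.

    These checks cover what the schema cannot express (non-empty strings,
    plausible ranges); an empty list means the response passed.
    """
    data = json.loads(raw)
    problems = []
    if not isinstance(data.get("name"), str) or not data["name"].strip():
        problems.append("name missing or empty")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        problems.append("age missing or out of range")
    if "sentiment" in data and data["sentiment"] not in ALLOWED_SENTIMENTS:
        problems.append("unexpected sentiment")
    return problems


print(check_person('{"name": "John", "age": 25, "sentiment": "positive"}'))
```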
8. CONTEXT CACHING
Two Types
| Type | Description | Discount |
|---|---|---|
| Implicit | Automatic, no setup | 90% when cache hits |
| Explicit | Manual control, guaranteed discount | 90% |
Maximizing Cache Hits
Place large, common content at BEGINNING of prompts
# GOOD - cacheable prefix
<system_context>
{large_static_content_here}
</system_context>
<user_query>
{variable_user_question}
</user_query>
Send similar prompts in short time windows
Implicit caching works best when requests share prefixes and arrive close together.
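A quick way to see how much two prompts share is to compare them directly. This sketch uses `os.path.commonprefix` on characters as a rough stand-in; caching matches on tokens, so treat the number as an approximation. The helper name and sample text are made up.

```python
import os


def cacheable_prompt(static_context: str, question: str) -> str:
    """Static content first (shared, cacheable prefix); variable question last."""
    return (
        f"<system_context>\n{static_context}\n</system_context>\n"
        f"<user_query>\n{question}\n</user_query>"
    )


manual = "Product manual text. " * 200  # stand-in for large static content
first = cacheable_prompt(manual, "How do I reset the device?")
second = cacheable_prompt(manual, "What does error E42 mean?")
shared = os.path.commonprefix([first, second])
print(f"{len(shared)} of {len(first)} characters are shared")
```

Putting the question before the static content would shrink the shared prefix to almost nothing, which is why the variable part goes last.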
Explicit Caching Code
from google import genai
from google.genai import types
client = genai.Client()
large_document = "..."  # your long, reusable text (must meet the minimum token count)
# Create a cache
cache = client.caches.create(
    model="gemini-3-flash-preview",
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role="user",
                parts=[types.Part(text=large_document)]
            )
        ],
        ttl="3600s"  # 1 hour
    )
)
# Use the cache
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Summarize the key points.",
    config=types.GenerateContentConfig(
        cached_content=cache.name
    )
)
Requirements
- Minimum 2,048 tokens to cache
- Supports: text, PDF, image, audio, video
- Set TTL based on content relevance (default: 1 hour)
- Storage cost = tokens x time
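The storage formula is easy to sanity-check in code. A sketch assuming the storage rate is quoted per 1M tokens per hour; the rate passed in is a placeholder, not a published price, and the 2,048-token floor is the figure stated above.

```python
MIN_CACHE_TOKENS = 2048  # minimum stated in this guide


def cache_storage_cost(tokens: int, ttl_hours: float,
                       price_per_million_token_hours: float) -> float:
    """Storage cost = tokens x time, with the rate quoted per 1M tokens per hour.

    The rate is a caller-supplied placeholder; look up current pricing.
    """
    if tokens < MIN_CACHE_TOKENS:
        raise ValueError(f"need at least {MIN_CACHE_TOKENS} tokens to cache, got {tokens}")
    return tokens / 1_000_000 * ttl_hours * price_per_million_token_hours


print(cache_storage_cost(500_000, ttl_hours=2, price_per_million_token_hours=1.0))
```

At a hypothetical rate of 1.00 per 1M token-hours, caching 500,000 tokens for 2 hours costs 0.5 x 2 x 1.00 = 1.00, so a long TTL on rarely used content is pure overhead.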
9. AGENTIC WORKFLOWS
Multi-Turn Context
Gemini 3 Flash tracks conversation context well, but:
- Restate key constraints every few turns
- Use thought signatures for function calling chains
- Consider explicit caching for repeated context
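One way to restate key constraints every few turns without editing stored history is to inject a reminder part when each request is built. A sketch over REST-style turns; `with_reminders` and its cadence parameter are hypothetical, and it skips user turns that carry function responses.

```python
def with_reminders(history: list[dict], constraints: str, every: int = 4) -> list[dict]:
    """Return a new history where every `every`-th user text turn starts with a reminder.

    Turns use the REST shape {"role": ..., "parts": [...]}; the input list
    and its turns are not modified.
    """
    result = []
    user_turns = 0
    for turn in history:
        is_user_text = turn["role"] == "user" and not any(
            "functionResponse" in part for part in turn["parts"])
        if is_user_text:
            user_turns += 1
            if user_turns % every == 0:
                reminder = {"text": f"Reminder: {constraints}"}
                turn = {**turn, "parts": [reminder, *turn["parts"]]}  # new dict
        result.append(turn)
    return result
```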
Interactions API (For Complex Agents)
from google import genai
client = genai.Client()
# Long-running background task
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Analyze this codebase and suggest improvements...",
    background=True  # Returns immediately with an interaction ID
)
# Check status later
result = client.interactions.get(interaction.id)
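Checking status later usually means a polling loop. A generic sketch: `fetch` and `is_done` are supplied by the caller, and the wiring shown in the comment (including status values such as "completed") is an assumption rather than confirmed Interactions API behavior.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T], is_done: Callable[[T], bool],
               interval: float = 5.0, timeout: float = 900.0) -> T:
    """Call fetch() until is_done(result) is true or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not finished after {timeout} seconds")
        time.sleep(interval)


# Hypothetical wiring (status names are assumptions):
# final = poll_until(lambda: client.interactions.get(interaction_id),
#                    lambda i: i.status in ("completed", "failed"))
```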
Agentic System Instructions
For agents that take actions, structure system prompts around:
## Reasoning Strategy
- Break complex tasks into steps
- Verify information before acting
- Assess risk before state-changing operations
## Execution Rules
- Prefer read operations over writes when uncertain
- Request clarification for ambiguous instructions
- Report blockers immediately
## Output Format
- Explain reasoning before actions
- Confirm completion of each step
10. SUPPORTED TOOLS
Built-in Tools
| Tool | Description | Notes |
|---|---|---|
| Google Search | Real-time web grounding | Billing starts Jan 5, 2026 |
| URL Context | Deep page analysis | Max 20 URLs per request |
| Code Execution | Run generated code | Enables Visual Thinking |
| File Search | Search uploaded files | - |
Combining Tools
from google import genai
from google.genai import types
client = genai.Client()
tools = [
    types.Tool(google_search=types.GoogleSearch()),
    types.Tool(url_context=types.UrlContext()),
    types.Tool(code_execution=types.ToolCodeExecution())
]
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Search for latest Python 3.13 features and write example code.",
    config=types.GenerateContentConfig(tools=tools)
)
NOT Supported (Yet)
- Maps grounding
- Computer use
- Combining built-in tools with custom function calling
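If REST `tools` lists are built dynamically, a guard can catch the unsupported combination before the request is sent. A sketch assuming the REST field names for tools (`google_search`, `url_context`, `code_execution`, `function_declarations`, or their camelCase forms); `check_tools` is hypothetical, and the restriction itself may lift as the model leaves preview.

```python
# REST JSON field names for tools; both spellings are checked.
BUILT_IN_TOOL_KEYS = {"google_search", "googleSearch", "url_context", "urlContext",
                      "code_execution", "codeExecution"}
CUSTOM_TOOL_KEYS = {"function_declarations", "functionDeclarations"}


def check_tools(tools: list[dict]) -> None:
    """Raise ValueError if built-in tools and custom functions share one request."""
    keys = {key for tool in tools for key in tool}
    if keys & BUILT_IN_TOOL_KEYS and keys & CUSTOM_TOOL_KEYS:
        raise ValueError("built-in tools cannot be combined with custom function declarations")


check_tools([{"google_search": {}}, {"url_context": {}}])  # passes silently
```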
QUICK REFERENCE: DO NOT
| Mistake | Why It's Wrong |
|---|---|
| temperature=0.7 | Causes looping, degraded reasoning |
| thinking_budget=1024 | Wrong param for Gemini 3 (use thinking_level) |
| Mix XML + Markdown | Confuses delimiter parsing |
| Negatives at start | Model drops early constraints |
| Verbose prompts | Gemini 3 needs less, not more |
| Skip thought signatures | 400 error on function calls |
| Over-engineer prompts | 2.x techniques often backfire |
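Some rows of this table can be checked mechanically before a request goes out. A sketch of two hypothetical lint helpers: one inspects a REST `generationConfig` dict, the other heuristically flags prompts that contain both XML-style tags and Markdown headers (a `#` comment at the start of a line would also trigger it).

```python
import re


def lint_generation_config(config: dict) -> list[str]:
    """Flag DO NOT-table settings in a REST generationConfig dict."""
    warnings = []
    temperature = config.get("temperature")
    if temperature is not None and temperature != 1.0:
        warnings.append(f"temperature={temperature}: keep the default 1.0 or omit it")
    if "thinkingBudget" in config.get("thinkingConfig", {}):
        warnings.append("thinkingBudget: use thinkingLevel for Gemini 3")
    return warnings


def uses_mixed_delimiters(prompt: str) -> bool:
    """Heuristic: True if a prompt contains both XML-style tags and Markdown headers."""
    has_xml = re.search(r"</?[A-Za-z_][\w-]*>", prompt) is not None
    has_markdown = re.search(r"^#{1,6} ", prompt, flags=re.MULTILINE) is not None
    return has_xml and has_markdown
```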
QUICK REFERENCE: OPTIMAL SETTINGS BY USE CASE
| Use Case | thinking_level | Notes |
|---|---|---|
| Chat/Q&A | minimal | Fastest response |
| Simple tasks | low | Good balance |
| Analysis | medium | Thoughtful responses |
| Complex reasoning | high | Maximum depth |
| Coding | high | Best accuracy |
| High throughput | minimal | Minimize latency |
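The table translates directly into a lookup. A sketch that produces a REST `generationConfig`; the use-case keys are made-up labels, and temperature is deliberately left out so it stays at the default.

```python
THINKING_LEVEL_BY_USE_CASE = {
    "chat": "minimal",
    "simple": "low",
    "analysis": "medium",
    "complex_reasoning": "high",
    "coding": "high",
    "high_throughput": "minimal",
}


def generation_config_for(use_case: str) -> dict:
    """Map a use case from the table to a REST generationConfig.

    Raises KeyError for unknown use cases; temperature is not set.
    """
    level = THINKING_LEVEL_BY_USE_CASE[use_case]
    return {"thinkingConfig": {"thinkingLevel": level.upper()}}


print(generation_config_for("analysis"))
```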