Gemini 3 Flash Prompting Guide

Expert-level reference for prompting Gemini 3 Flash effectively.

1. FUNDAMENTALS

Model Specifications

Spec	Value
Context Window	1M tokens input
Max Output	64k tokens
Knowledge Cutoff	January 2025
Default Temperature	1.0
Speed	3x faster than Gemini 2.5 Pro

Critical Rules

Keep temperature at its default of 1.0

code

# WRONG - can cause looping and degraded performance
temperature=0.7

# CORRECT - use the default
temperature=1.0  # or omit entirely

Gemini 3's reasoning is tuned for temperature 1.0. Lower values can cause:

•Looping behavior
•Degraded performance on math/reasoning
•Unexpected outputs

Less is More

Gemini 3 Flash understands instructions better than previous models. Cut 30-50% of the verbosity of your Gemini 2.x prompts.

Default Behavior is Concise

The model prioritizes direct, efficient answers. If you need conversational or chatty responses, request them explicitly:

code

You are a friendly, conversational assistant. Explain things in detail with a warm tone.

2. THINKING LEVEL CONFIGURATION

Parameter: `thinking_level`

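DO NOT use `thinking_budget`. It is the Gemini 2.5 parameter. Gemini 3 still accepts it for backward compatibility but may perform worse with it, and setting both `thinking_budget` and `thinking_level` in one request returns a 400 error.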

Available Levels (Flash)

Level	Use Case	Latency
`minimal`	Chat, high-throughput apps	Fastest
`low`	Simple instruction following	Fast
`medium`	Balanced reasoning	Moderate
`high` (default)	Complex reasoning, coding	Slowest

Code Examples

Python (google-genai SDK)

python

from google import genai
from google.genai import types

client = genai.Client()

# Low latency for chat
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Hello, how are you?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.MINIMAL
        )
    )
)

# High reasoning for complex tasks
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this optimization problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.HIGH
        )
    )
)

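REST API

Send the request to POST https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent. The model is named in the URL path, not in the request body.

json

{
  "contents": [{"role": "user", "parts": [{"text": "Your prompt"}]}],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "LOW"
    }
  }
}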
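Without an SDK, a request like the one above can be built with only the Python standard library. This is a minimal sketch that assumes the v1beta `generateContent` endpoint and the `x-goog-api-key` header used in Google's REST examples. `build_request` is a hypothetical helper: it checks the level against the four listed above and builds the request without sending it.

```python
import json
import urllib.request

# Thinking levels this guide lists for Gemini 3 Flash
THINKING_LEVELS = {"minimal", "low", "medium", "high"}
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(prompt: str, thinking_level: str, api_key: str,
                  model: str = "gemini-3-flash-preview") -> urllib.request.Request:
    """Build (but do not send) a generateContent request with a thinking level."""
    level = thinking_level.lower()
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {thinking_level!r}")
    body = {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # No temperature key, so the 1.0 default applies
        "generationConfig": {"thinkingConfig": {"thinkingLevel": level.upper()}},
    }
    return urllib.request.Request(
        f"{BASE_URL}/{model}:generateContent",  # the model is part of the URL path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


req = build_request("Your prompt", "low", api_key="YOUR_API_KEY")
# urllib.request.urlopen(req) would send it once a real key is set
```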

Latency Optimization

Combine a low thinking level with a system instruction such as:

code

System: Think silently. Provide direct answers without showing reasoning.

3. PROMPT STRUCTURE

The Pattern

code

Role + Goal + Constraints + Examples + Output Format
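Applying the pattern mechanically keeps every prompt in the same order. This is a minimal sketch. `build_prompt` and its parameter names are illustrative, and empty parts are skipped rather than left as blank sections.

```python
from typing import Optional, Sequence


def build_prompt(role: str, goal: str, constraints: Optional[Sequence[str]] = None,
                 examples: Optional[Sequence[str]] = None, output_format: str = "") -> str:
    """Assemble a prompt in Role + Goal + Constraints + Examples + Output Format order."""
    parts = [role, goal]
    if constraints:
        parts.append("\n".join(f"- {c}" for c in constraints))
    if examples:
        parts.append("\n\n".join(examples))
    parts.append(output_format)
    # One blank line between parts; empty parts are skipped
    return "\n\n".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    role="You are a technical writer.",
    goal="Summarize the following code.",
    constraints=["Maximum 100 words", "Use bullet points"],
    output_format="Return only the bullet list.",
)
```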

Best Practices

Be Precise and Direct

code

# WRONG - verbose, persuasive
I would really appreciate it if you could please help me summarize
this document. It would be amazing if you could make it concise
and capture all the key points.

# CORRECT - direct
Summarize this document in 3 bullet points. Focus on key findings.

Use Consistent Delimiters

Choose ONE format, XML tags OR Markdown headers, and use it for the whole prompt. Never mix the two.

xml

<!-- XML Style -->
<role>You are a technical writer.</role>
<task>Summarize the following code.</task>
<constraints>
- Maximum 100 words
- Use bullet points
</constraints>
<input>
{code_here}
</input>

markdown

<!-- Markdown Style -->
## Role
You are a technical writer.

## Task
Summarize the following code.

## Constraints
- Maximum 100 words
- Use bullet points

## Input
{code_here}
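When prompts are assembled from templates, a quick pre-flight check can flag ones that mix the two styles. This is a heuristic sketch, not a parser. It assumes that any `<tag>...</tag>` pair means XML style and that any line starting with one to six `#` characters means a Markdown header, so code comments inside an input can trigger a false positive.

```python
import re

XML_PAIR = re.compile(r"<([A-Za-z_][\w.-]*)>.*?</\1>", re.DOTALL)
MD_HEADER = re.compile(r"^#{1,6}\s+\S", re.MULTILINE)


def delimiter_style(prompt: str) -> str:
    """Classify a prompt as 'xml', 'markdown', 'mixed', or 'none'."""
    has_xml = bool(XML_PAIR.search(prompt))
    has_md = bool(MD_HEADER.search(prompt))
    if has_xml and has_md:
        return "mixed"
    if has_xml:
        return "xml"
    return "markdown" if has_md else "none"


print(delimiter_style("## Role\nYou are a technical writer.\n<input>{code_here}</input>"))
```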

Place Questions AFTER Context

When working with large documents or data, put the content first and the question last:

code

<document>
{large_document_content}
</document>

Based on the document above, what are the three main conclusions?

Anchor to Provided Context

Open the question with a phrase that ties the answer to the material you supplied:

code

Based on the information provided above...
Using only the data in this document...
According to the context given...
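Both habits, context first and an anchoring phrase before the question, fit in one small helper. This is a sketch. The `<document>` tag and the `grounded_question` name are illustrative choices. The question is expected to start in lowercase so the final line reads as one sentence.

```python
def grounded_question(document: str, question: str) -> str:
    """Put the long context first, then ask a question anchored to it."""
    return (
        "<document>\n"
        f"{document.strip()}\n"
        "</document>\n\n"
        f"Based on the document above, {question.strip()}"
    )


prompt = grounded_question("The study reports three effects.",
                           "what are the three main conclusions?")
```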

4. CONSTRAINT PLACEMENT

Critical: Negative Constraints Go at END

The model may drop negative constraints if they appear too early in complex prompts.

code

# WRONG - negative constraint at start
Do not use bullet points.
Do not include examples.
Summarize this document about machine learning in 200 words.

# CORRECT - negative constraints at end
Summarize this document about machine learning in 200 words.
Do not use bullet points.
Do not include examples.
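When prompts are assembled from parts, moving the negatives to the end can be automated. This sketch assumes a negative constraint is any line that starts with "Do not", "Don't", or "Never" (case-insensitive). Other lines keep their relative order, and blank lines are dropped.

```python
NEGATIVE_PREFIXES = ("do not", "don't", "never")


def negatives_last(prompt: str) -> str:
    """Move lines that start with a negative constraint to the end of the prompt."""
    lines = [line for line in prompt.splitlines() if line.strip()]

    def is_negative(line: str) -> bool:
        return line.strip().lower().startswith(NEGATIVE_PREFIXES)

    positive = [line for line in lines if not is_negative(line)]
    negative = [line for line in lines if is_negative(line)]
    return "\n".join(positive + negative)


print(negatives_last("Do not use bullet points.\n"
                     "Summarize this document about machine learning in 200 words."))
```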

Avoid Blanket Negatives

code

# WRONG - causes over-indexing, model may refuse basic tasks
Do not infer anything.
Do not guess.
Do not make assumptions.

# CORRECT - specific instruction
Use only the information provided in the document.
If the answer is not explicitly stated, respond with "Not found in document."

5. THOUGHT SIGNATURES (Function Calling)

What They Are

Encrypted representations of the model's internal reasoning. Required for maintaining context across multi-turn function calling.

Rules

•MANDATORY for function calling - missing signatures return a 400 error
•Return signatures exactly as received - do not modify
•Multi-step calls: accumulate and return ALL signatures
•Official SDKs handle this automatically

Manual Handling (if not using SDK)

python

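# Without an SDK you work with the parsed JSON of a REST generateContent
# response. response_json, history, and tool_result are placeholders.
model_content = response_json["candidates"][0]["content"]
call_part = next(p for p in model_content["parts"] if "functionCall" in p)

# The signature sits on the functionCall part; read it, never modify it
thought_signature = call_part.get("thoughtSignature")

# On the next turn, include it back by echoing the model turn unchanged:
next_request = {
    "contents": history + [
        # Model turn with its thoughtSignature returned exactly
        model_content,
        # Function result
        {
            "role": "user",
            "parts": [{
                "functionResponse": {
                    "name": call_part["functionCall"]["name"],
                    "response": tool_result
                }
            }]
        }
    ]
}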

SDK Auto-Handling

If using official SDKs (Python, Node, Java) with standard chat history, signatures are managed automatically. No manual handling needed.

6. FEW-SHOT PROMPTING

Why It Works

Few-shot examples are more effective than zero-shot for Gemini 3 Flash. They regulate:

•Output formatting
•Phrasing style
•Response scope
•Pattern matching

Best Practices

Show Patterns to Follow, NOT Anti-Patterns

code

# WRONG - showing what NOT to do
Input: The weather is nice today.
Output: DON'T say "The weather is pleasant" - that's wrong!

# CORRECT - showing what TO do
Input: The weather is nice today.
Output: Current conditions are favorable.

Input: I'm feeling happy.
Output: Emotional state is positive.

Use Clear Prefixes

code

Text: The cat sat on the mat.
Sentiment: Neutral

Text: I love this product!
Sentiment: Positive

Text: This is terrible service.
Sentiment: Negative

Text: {user_input}
Sentiment:
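Prefixed examples like these are easy to generate from data, which keeps the format identical across examples. This is a sketch. `few_shot_prompt` is an illustrative name, and its default prefixes match the example above.

```python
def few_shot_prompt(examples, query, in_prefix="Text", out_prefix="Sentiment"):
    """Format (input, output) pairs with fixed prefixes and end on an open output slot."""
    blocks = [f"{in_prefix}: {x}\n{out_prefix}: {y}" for x, y in examples]
    blocks.append(f"{in_prefix}: {query}\n{out_prefix}:")
    return "\n\n".join(blocks)


prompt = few_shot_prompt(
    [("The cat sat on the mat.", "Neutral"),
     ("I love this product!", "Positive"),
     ("This is terrible service.", "Negative")],
    "The delivery was late again.",
)
```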

Optimal Example Count

•Start with 2-3 examples
•Add more if output quality is inconsistent
•Too many examples can cause overfitting
•Experiment to find the sweet spot for your task

7. STRUCTURED OUTPUT

JSON Mode

python

from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract the person's name and age from: John is 25 years old.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    )
)

Schema Validation

python

from google import genai
from google.genai import types

client = genai.Client()

schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The person's full name"
        },
        "age": {
            "type": "integer",
            "description": "The person's age in years"
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
            "description": "Overall sentiment of the text"
        }
    },
    "required": ["name", "age"]
}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract info from: John, age 25, loves this product!",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema
    )
)

Key Points

•Use the description field in the schema to guide the model
•Property ordering is preserved (Gemini 2.5+)
•Responses are generated to conform to the schema's structure
•Complex schemas may cause 400 errors (long property names, many optional properties)
•Structured output guarantees syntax, NOT semantic correctness
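Structured output guarantees syntax but not meaning, so check values in application code before you trust them. This is a standard-library sketch written against the schema above. The 0-150 range on `age` is an illustrative business rule that the schema itself does not express.

```python
import json

SENTIMENTS = {"positive", "negative", "neutral"}


def parse_person(raw: str) -> dict:
    """Parse the model's JSON and enforce rules the schema cannot express."""
    data = json.loads(raw)
    errors = []
    if not isinstance(data.get("name"), str) or not data["name"].strip():
        errors.append("name must be a non-empty string")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 150:
        errors.append("age must be an integer between 0 and 150")
    if "sentiment" in data and data["sentiment"] not in SENTIMENTS:
        errors.append("sentiment must be positive, negative, or neutral")
    if errors:
        raise ValueError("; ".join(errors))
    return data


person = parse_person('{"name": "John", "age": 25, "sentiment": "positive"}')
```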

8. CONTEXT CACHING

Two Types

Type	Description	Discount
Implicit	Automatic, no setup	90% when cache hits
Explicit	Manual control, guaranteed discount	90%

Maximizing Cache Hits

Place large, common content at BEGINNING of prompts

code

# GOOD - cacheable prefix
<system_context>
{large_static_content_here}
</system_context>

<user_query>
{variable_user_question}
</user_query>

Send Similar Prompts in Short Time Windows

Implicit caching works best when requests share a prefix and arrive close together.
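The effect of ordering shows up when you build two requests from the same static content: everything before the first difference is a shared prefix. This is a sketch. `build_cached_prompt` is illustrative, and the service works in tokens rather than characters, so the character count here is only a proxy.

```python
import os


def build_cached_prompt(static_context: str, question: str) -> str:
    """Static content first (cacheable prefix), the variable question last."""
    return (
        f"<system_context>\n{static_context}\n</system_context>\n\n"
        f"<user_query>\n{question}\n</user_query>"
    )


manual = "Product manual text. " * 3  # stand-in for a large static document
a = build_cached_prompt(manual, "How do I reset the device?")
b = build_cached_prompt(manual, "What is the warranty period?")

shared = os.path.commonprefix([a, b])
print(f"{len(shared)} of {len(a)} characters are a shared prefix")
```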

Explicit Caching Code

python

from google import genai
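from google.genai import types

client = genai.Client()

# Create a cache (large_document is a placeholder for your own text)
cache = client.caches.create(
    model="gemini-3-flash-preview",
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role="user",
                parts=[types.Part(text=large_document)]
            )
        ],
        ttl="3600s"  # 1 hour
    )
)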

# Use the cache
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Summarize the key points.",
    config=types.GenerateContentConfig(
        cached_content=cache.name
    )
)

Requirements

•Minimum 2,048 tokens to cache
•Supports: text, PDF, image, audio, video
•Set TTL based on content relevance (default: 1 hour)
•Storage cost = tokens x time
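Storage is billed as tokens multiplied by time, so a rough estimate is simple arithmetic. This sketch takes the storage price as a parameter because rates differ by model and change over time; the 1.00 in the usage line is a placeholder, not a published price. It also enforces the 2,048-token minimum listed above.

```python
MIN_CACHE_TOKENS = 2048  # minimum listed above


def cache_storage_cost(tokens: int, ttl_seconds: float,
                       usd_per_million_token_hours: float) -> float:
    """Storage cost = (tokens / 1M) x hours x price per 1M tokens per hour."""
    if tokens < MIN_CACHE_TOKENS:
        raise ValueError(f"need at least {MIN_CACHE_TOKENS} tokens to cache, got {tokens}")
    hours = ttl_seconds / 3600
    return tokens / 1_000_000 * hours * usd_per_million_token_hours


# 500k tokens kept for the default 1 hour at a PLACEHOLDER rate of 1.00
cost = cache_storage_cost(500_000, 3600, 1.00)
print(f"${cost:.2f}")  # $0.50
```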

9. AGENTIC WORKFLOWS

Multi-Turn Context

Gemini 3 Flash tracks conversation context well, but:

•Restate key constraints every few turns
•Use thought signatures for function calling chains
•Consider explicit caching for repeated context
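One way to restate key constraints every few turns is to append a short reminder to the user message on a fixed cadence. This is a sketch. The every-third-turn cadence, the reminder wording, and the example rules are arbitrary choices to tune for your agent.

```python
def with_reminder(turn: int, message: str, constraints: list, every: int = 3) -> str:
    """Append the key constraints to every `every`-th user turn (turns count from 1)."""
    if constraints and every > 0 and turn % every == 0:
        reminder = "\n".join(f"- {c}" for c in constraints)
        return f"{message}\n\nReminder of the constraints:\n{reminder}"
    return message


rules = ["Answer in English", "Never delete files"]
print(with_reminder(3, "Step 3", rules))
```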

Interactions API (For Complex Agents)

python

from google import genai

client = genai.Client()

# Long-running background task (the Interactions API is in beta)
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Analyze this codebase and suggest improvements...",
    background=True  # Returns immediately with an interaction ID
)

# Check status later
result = client.interactions.get(interaction.id)
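Background work means checking back until the task finishes. The loop below is a generic sketch with capped exponential backoff. `fetch` stands in for a call such as `client.interactions.get`, and `is_done` decides when to stop; the exact status field and its values are assumptions you should confirm against the Interactions API reference. The usage line simulates the API with a hypothetical iterator of states.

```python
import time
from typing import Any, Callable


def poll(fetch: Callable[[], Any], is_done: Callable[[Any], bool],
         interval: float = 2.0, max_interval: float = 30.0,
         timeout: float = 600.0) -> Any:
    """Call fetch() until is_done(result) is true, backing off between attempts."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError("background task did not finish before the timeout")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped


# Hypothetical stand-in for the API: reports "completed" on the third check
states = iter(["in_progress", "in_progress", "completed"])
final = poll(lambda: next(states), lambda s: s == "completed", interval=0.01)
```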

Agentic System Instructions

For agents that take actions, structure system prompts around:

code

## Reasoning Strategy
- Break complex tasks into steps
- Verify information before acting
- Assess risk before state-changing operations

## Execution Rules
- Prefer read operations over writes when uncertain
- Request clarification for ambiguous instructions
- Report blockers immediately

## Output Format
- Explain reasoning before actions
- Confirm completion of each step
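Rules like "prefer read operations" and "assess risk before state-changing operations" can also be enforced in the agent's own code instead of relying on the prompt alone. This sketch gates tool dispatch so that a tool marked as state-changing runs only if a confirmation callback approves it. The `ToolSpec` registry, the `mutates` flag, and both tools are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolSpec:
    fn: Callable[..., object]
    mutates: bool = False  # True for writes, deletes, payments, and similar


def dispatch(name: str, args: dict, tools: dict,
             confirm: Callable[[str, dict], bool]) -> dict:
    """Run a tool by name, asking confirm() first when the tool changes state."""
    spec = tools.get(name)
    if spec is None:
        # Report the blocker instead of guessing at a substitute tool
        return {"error": f"unknown tool {name!r}"}
    if spec.mutates and not confirm(name, args):
        return {"error": f"{name} requires confirmation and was not approved"}
    return {"result": spec.fn(**args)}


def deny_all(name: str, args: dict) -> bool:
    """Stand-in confirmation policy: approve nothing."""
    return False


tools = {
    "read_file": ToolSpec(lambda path: f"contents of {path}"),
    "delete_file": ToolSpec(lambda path: f"deleted {path}", mutates=True),
}
print(dispatch("read_file", {"path": "a.txt"}, tools, deny_all))
# {'result': 'contents of a.txt'}
print(dispatch("delete_file", {"path": "a.txt"}, tools, deny_all))
# {'error': 'delete_file requires confirmation and was not approved'}
```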

10. SUPPORTED TOOLS

Built-in Tools

Tool	Description	Notes
Google Search	Real-time web grounding	Billing starts Jan 5, 2026
URL Context	Deep page analysis	Max 20 URLs per request
Code Execution	Run generated code	Enables Visual Thinking
File Search	Search uploaded files	-

Combining Tools

python

from google import genai
from google.genai import types

client = genai.Client()

tools = [
    types.Tool(google_search=types.GoogleSearch()),
    types.Tool(url_context=types.UrlContext()),
    types.Tool(code_execution=types.ToolCodeExecution())
]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Search for the latest Python 3.13 features and write example code.",
    config=types.GenerateContentConfig(tools=tools)
)

NOT Supported (Yet)

•Maps grounding
•Computer use
•Combining built-in tools with custom function calling
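The same tool combination can be written as a REST request body, with the limits above checked before sending. This sketch assumes the REST tool keys `google_search`, `url_context`, and `code_execution`, and it treats any `functionDeclarations` entry as custom function calling. The 20-URL cap comes from the table above, and URLs are found with a simple regular expression.

```python
import re

MAX_URLS = 20  # URL Context limit from the table above
BUILT_IN_KEYS = {"google_search", "url_context", "code_execution"}
FUNCTION_KEYS = {"functionDeclarations", "function_declarations"}


def tools_body(prompt: str, tools: list) -> dict:
    """Build a generateContent body after checking the tool limits listed above."""
    keys = {key for tool in tools for key in tool}
    if keys & BUILT_IN_KEYS and keys & FUNCTION_KEYS:
        raise ValueError("built-in tools cannot be combined with custom function calling")
    if "url_context" in keys:
        urls = re.findall(r"https?://\S+", prompt)
        if len(urls) > MAX_URLS:
            raise ValueError(f"{len(urls)} URLs found; URL Context accepts at most {MAX_URLS}")
    return {"contents": [{"role": "user", "parts": [{"text": prompt}]}], "tools": tools}


tools = [{"google_search": {}}, {"url_context": {}}, {"code_execution": {}}]
body = tools_body("Search for the latest Python 3.13 features and write example code.", tools)
```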

QUICK REFERENCE: DO NOT

Mistake	Why It's Wrong
`temperature=0.7`	Causes looping, degraded reasoning
`thinking_budget=1024`	Wrong param for Gemini 3 (use `thinking_level`)
Mix XML + Markdown	Confuses delimiter parsing
Negatives at start	Model drops early constraints
Verbose prompts	Gemini 3 needs less, not more
Skip thought signatures	400 error on function calls
Over-engineer prompts	2.x techniques often backfire
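The first two mistakes are mechanical, so you can catch them before a request goes out. This sketch inspects a REST-style `generationConfig` dict that uses camelCase keys, as in the REST example in section 2. `lint_generation_config` is a hypothetical helper; it returns warnings rather than rewriting the config.

```python
def lint_generation_config(config: dict) -> list:
    """Return warnings for settings this guide advises against on Gemini 3 Flash."""
    warnings = []
    temperature = config.get("temperature")
    if temperature is not None and temperature != 1.0:
        warnings.append(f"temperature={temperature}: keep the default 1.0 (or omit it)")
    thinking = config.get("thinkingConfig") or {}
    if "thinkingBudget" in thinking:
        if "thinkingLevel" in thinking:
            warnings.append("thinkingBudget and thinkingLevel together return a 400 error")
        else:
            warnings.append("thinkingBudget is the Gemini 2.5 control; use thinkingLevel")
    return warnings


print(lint_generation_config({"temperature": 0.7,
                              "thinkingConfig": {"thinkingBudget": 1024}}))
```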

QUICK REFERENCE: OPTIMAL SETTINGS BY USE CASE

Use Case	thinking_level	Notes
Chat/Q&A	`minimal`	Fastest response
Simple tasks	`low`	Good balance
Analysis	`medium`	Thoughtful responses
Complex reasoning	`high`	Maximum depth
Coding	`high`	Best accuracy
High throughput	`minimal`	Minimize latency
