AgentSkillsCN

gemini-prompting

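Expert reference for writing prompts for Gemini 3 Flash models via OpenRouter, configuring thinking levels, thought signatures, and context caching, or providing few-shot examples.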

SKILL.md
--- frontmatter
name: gemini-prompting
description: Expert reference for prompting Gemini 3 Flash models via OpenRouter. Use when writing Gemini prompts, configuring thinking levels, thought signatures, context caching, or few-shot examples.

Gemini 3 Flash Prompting Guide

Expert-level reference for prompting Gemini 3 Flash effectively.


1. FUNDAMENTALS

Model Specifications

| Spec | Value |
| --- | --- |
| Context Window | 1M tokens input |
| Max Output | 64k tokens |
| Knowledge Cutoff | January 2025 |
| Default Temperature | 1.0 |
| Speed | 3x faster than Gemini 2.5 Pro |

Critical Rules

NEVER change temperature from 1.0

code
# WRONG - causes looping and degraded performance
temperature=0.7

# CORRECT - always use default
temperature=1.0  # or omit entirely

Gemini 3's reasoning is optimized for temperature 1.0. Lower values cause:

  • Looping behavior
  • Degraded performance on math/reasoning
  • Unexpected outputs
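
One way to enforce this rule in application code is to reject non-default temperatures before a request is built. The sketch below assumes a hypothetical helper, `sampling_params`, which is not part of any SDK; it returns the keyword arguments to merge into a request.

```python
DEFAULT_TEMPERATURE = 1.0


def sampling_params(temperature=None):
    """Return sampling kwargs for a Gemini 3 request (hypothetical helper).

    Passing nothing returns an empty dict, so the API default of 1.0 applies.
    """
    if temperature is None:
        return {}
    if temperature != DEFAULT_TEMPERATURE:
        raise ValueError(
            f"temperature={temperature} deviates from the default; keep 1.0"
        )
    return {"temperature": DEFAULT_TEMPERATURE}
```

Failing loudly keeps a leftover `temperature=0.7` from an older prompt configuration from slipping through unnoticed.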

Less is More

Gemini 3 Flash understands instructions better than previous models. Cut 30-50% of prompt verbosity compared to Gemini 2.x prompts.

Default Behavior is Concise

The model prioritizes direct, efficient answers. If you need conversational or chatty responses, request them explicitly:

code
You are a friendly, conversational assistant. Explain things in detail with a warm tone.

2. THINKING LEVEL CONFIGURATION

Parameter: thinking_level

DO NOT use thinking_budget. It is the Gemini 2.5 parameter, and sending it together with thinking_level in the same request causes an error.

Available Levels (Flash)

| Level | Use Case | Latency |
| --- | --- | --- |
| minimal | Chat, high-throughput apps | Fastest |
| low | Simple instruction following | Fast |
| medium | Balanced reasoning | Moderate |
| high | Complex reasoning, coding | Slowest |
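
The parameter rule and the four levels above can be checked before a request is sent. This is a minimal sketch on a plain dict; `check_thinking_config` is a hypothetical helper, and treating a missing level as "use the API default" is an assumption.

```python
ALLOWED_LEVELS = {"minimal", "low", "medium", "high"}


def check_thinking_config(config):
    """Validate a thinking config dict before sending it (hypothetical helper)."""
    if "thinking_budget" in config:
        raise ValueError(
            "thinking_budget is the Gemini 2.5 parameter; use thinking_level"
        )
    level = config.get("thinking_level")
    if level is None:
        return {}  # assumption: leave the level to the API default
    level = level.lower()
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown thinking_level: {level!r}")
    return {"thinking_level": level}
```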

Code Examples

Python (google-genai SDK)

python
from google import genai
from google.genai import types

client = genai.Client()

# Low latency for chat
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Hello, how are you?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.MINIMAL
        )
    )
)

# High reasoning for complex tasks
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Solve this optimization problem...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_level=types.ThinkingLevel.HIGH
        )
    )
)

REST API

json
{
  "model": "gemini-3-flash-preview",
  "contents": [{"role": "user", "parts": [{"text": "Your prompt"}]}],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingLevel": "LOW"
    }
  }
}

Latency Optimization

Combine low thinking level with system instruction:

code
System: Think silently. Provide direct answers without showing reasoning.
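
As an illustration, the combination can be expressed as a REST-style request body. This is a sketch only: `low_latency_request` is a hypothetical helper, the `systemInstruction` field name is an assumption about the REST request shape, and the model name is left to the caller.

```python
import json

SILENT_INSTRUCTION = (
    "Think silently. Provide direct answers without showing reasoning."
)


def low_latency_request(prompt):
    """Build a request body pairing a low thinking level with the instruction."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        # Assumed field name for the system instruction in the REST body
        "systemInstruction": {"parts": [{"text": SILENT_INSTRUCTION}]},
        "generationConfig": {"thinkingConfig": {"thinkingLevel": "LOW"}},
    }


body = json.dumps(low_latency_request("What is the capital of France?"))
```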

3. PROMPT STRUCTURE

The Pattern

code
Role + Goal + Constraints + Examples + Output Format
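
A small builder can keep every prompt in this order. The sketch below is one possible shape, not a prescribed implementation; the function name and tag names are hypothetical, and empty sections are skipped.

```python
def build_prompt(role, goal, constraints=(), examples=(), output_format=""):
    """Assemble a prompt as Role + Goal + Constraints + Examples + Output Format."""
    sections = [
        ("role", role),
        ("goal", goal),
        ("constraints", "\n".join(f"- {c}" for c in constraints)),
        ("examples", "\n\n".join(examples)),
        ("output_format", output_format),
    ]
    # Emit only the sections that have content, always in the same order
    return "\n".join(
        f"<{tag}>\n{body}\n</{tag}>" for tag, body in sections if body
    )
```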

Best Practices

Be Precise and Direct

code
# WRONG - verbose, persuasive
I would really appreciate it if you could please help me summarize
this document. It would be amazing if you could make it concise
and capture all the key points.

# CORRECT - direct
Summarize this document in 3 bullet points. Focus on key findings.

Use Consistent Delimiters

Choose ONE format: XML tags OR Markdown headers. Never mix.

xml
<!-- XML Style -->
<role>You are a technical writer.</role>
<task>Summarize the following code.</task>
<constraints>
- Maximum 100 words
- Use bullet points
</constraints>
<input>
{code_here}
</input>

markdown
<!-- Markdown Style -->
## Role
You are a technical writer.

## Task
Summarize the following code.

## Constraints
- Maximum 100 words
- Use bullet points

## Input
{code_here}
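
A rough lint can flag prompts that mix the two styles. This is a heuristic sketch with hypothetical names: it treats a line starting with a tag as XML style and a line starting with `#` characters followed by a space as Markdown style, so user-supplied content such as Python comments can trigger false positives.

```python
import re

XML_TAG = re.compile(r"^\s*</?[A-Za-z_][\w-]*>", re.MULTILINE)
MD_HEADER = re.compile(r"^#{1,6} \S", re.MULTILINE)


def delimiter_styles(prompt):
    """Return the set of delimiter styles detected in a prompt."""
    styles = set()
    if XML_TAG.search(prompt):
        styles.add("xml")
    if MD_HEADER.search(prompt):
        styles.add("markdown")
    return styles


def uses_single_style(prompt):
    """True when the prompt sticks to at most one delimiter style."""
    return len(delimiter_styles(prompt)) <= 1
```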

Place Questions AFTER Context

When working with large documents or data, put the question after the material it refers to:

code
<document>
{large_document_content}
</document>

Based on the document above, what are the three main conclusions?

Anchor to Provided Context

code
Based on the information provided above...
Using only the data in this document...
According to the context given...

4. CONSTRAINT PLACEMENT

Critical: Negative Constraints Go at END

The model may drop negative constraints if they appear too early in complex prompts.

code
# WRONG - negative constraint at start
Do not use bullet points.
Do not include examples.
Summarize this document about machine learning in 200 words.

# CORRECT - negative constraints at end
Summarize this document about machine learning.
Output must be exactly 200 words.
Do not use bullet points.
Do not include examples.
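
If constraints are collected programmatically, a stable reordering step can move the negative ones to the end. This is a minimal sketch; the prefix list used to recognize a negative constraint is an assumption and would need tuning for real prompts.

```python
NEGATIVE_PREFIXES = ("do not", "don't", "never", "avoid")


def is_negative(constraint):
    """Heuristic: a constraint is negative if it opens with a prohibition."""
    return constraint.strip().lower().startswith(NEGATIVE_PREFIXES)


def order_constraints(constraints):
    """Keep relative order, but place negative constraints last."""
    positives = [c for c in constraints if not is_negative(c)]
    negatives = [c for c in constraints if is_negative(c)]
    return positives + negatives
```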

Avoid Blanket Negatives

code
# WRONG - causes over-indexing, model may refuse basic tasks
Do not infer anything.
Do not guess.
Do not make assumptions.

# CORRECT - specific instruction
Use only the information provided in the document.
If the answer is not explicitly stated, respond with "Not found in document."

5. THOUGHT SIGNATURES (Function Calling)

What They Are

Encrypted representations of the model's internal reasoning. Required for maintaining context across multi-turn function calling.

Rules

  1. MANDATORY for function calling - missing signatures return 400 error
  2. Return signatures exactly as received - do not modify
  3. Multi-step calls: accumulate and return ALL signatures
  4. Official SDKs handle this automatically

Manual Handling (if not using SDK)

python
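# When you receive a function call response
# (client and request arguments as in the earlier examples):
response = client.models.generate_content(...)

# Extract the thought signature from the part that carries the function call
parts = response.candidates[0].content.parts
thought_signature = next(p.thought_signature for p in parts if p.function_call)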

# On next turn, include it back:
next_request = {
    "contents": [
        # Previous turns with signatures preserved
        {
            "role": "model",
            "parts": [{
                "functionCall": {...},
                "thoughtSignature": thought_signature  # Return exactly
            }]
        },
        # Function result
        {
            "role": "user",
            "parts": [{"functionResponse": {...}}]
        }
    ]
}
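
For multi-step calls, the rule to accumulate and return all signatures can be sketched on plain request dicts. The helper names are hypothetical, and the dict keys follow the request shape shown above; the point is that model parts are stored unchanged, so every signature goes back exactly as received.

```python
import copy


def append_tool_round(history, model_parts, function_responses):
    """Append one model turn and its function results to the history.

    Model parts are deep-copied without modification, so any
    thoughtSignature they carry is preserved verbatim.
    """
    history.append({"role": "model", "parts": copy.deepcopy(model_parts)})
    history.append({
        "role": "user",
        "parts": [{"functionResponse": r} for r in function_responses],
    })
    return history


def collected_signatures(history):
    """List every signature currently held in the history, in order."""
    return [
        part["thoughtSignature"]
        for turn in history if turn["role"] == "model"
        for part in turn["parts"] if "thoughtSignature" in part
    ]
```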

SDK Auto-Handling

If using official SDKs (Python, Node, Java) with standard chat history, signatures are managed automatically. No manual handling needed.


6. FEW-SHOT PROMPTING

Why It Works

Few-shot examples are more effective than zero-shot for Gemini 3 Flash. They regulate:

  • Output formatting
  • Phrasing style
  • Response scope
  • Pattern matching

Best Practices

Show Patterns to Follow, NOT Anti-Patterns

code
# WRONG - showing what NOT to do
Input: The weather is nice today.
Output: DON'T say "The weather is pleasant" - that's wrong!

# CORRECT - showing what TO do
Input: The weather is nice today.
Output: Current conditions are favorable.

Input: I'm feeling happy.
Output: Emotional state is positive.

Use Clear Prefixes

code
Text: The cat sat on the mat.
Sentiment: Neutral

Text: I love this product!
Sentiment: Positive

Text: This is terrible service.
Sentiment: Negative

Text: {user_input}
Sentiment:
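
The prefix layout above is easy to generate from a list of labelled examples. A minimal sketch with a hypothetical helper name; the default prefixes mirror the sentiment example.

```python
def few_shot_prompt(examples, user_input, in_prefix="Text", out_prefix="Sentiment"):
    """Render (input, output) pairs with fixed prefixes, then the open query."""
    blocks = [
        f"{in_prefix}: {text}\n{out_prefix}: {label}"
        for text, label in examples
    ]
    # The final block leaves the output prefix empty for the model to complete
    blocks.append(f"{in_prefix}: {user_input}\n{out_prefix}:")
    return "\n\n".join(blocks)
```

Generating every example from the same template keeps the prefixes and spacing identical across examples.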

Optimal Example Count

  • Start with 2-3 examples
  • Add more if output quality is inconsistent
  • Too many examples can cause overfitting
  • Experiment to find the sweet spot for your task

7. STRUCTURED OUTPUT

JSON Mode

python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract the person's name and age from: John is 25 years old.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    )
)

Schema Validation

python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "description": "The person's full name"
        },
        "age": {
            "type": "integer",
            "description": "The person's age in years"
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "neutral"],
            "description": "Overall sentiment of the text"
        }
    },
    "required": ["name", "age"]
}

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Extract info from: John, age 25, loves this product!",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=schema
    )
)

Key Points

  • Use description field in schema to guide the model
  • Property ordering is preserved (Gemini 2.5+)
  • API validates output against schema before returning
  • Complex schemas may cause 400 errors (long names, many optionals)
  • Structured output guarantees syntax, NOT semantic correctness
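
Because structured output guarantees syntax but not meaning, it can help to run application-level checks on the parsed result. The sketch below works on the JSON text the model returned; the function name and the plausibility range for `age` are assumptions for illustration.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_person(raw_json):
    """Parse schema-shaped JSON and apply semantic checks on top of it."""
    data = json.loads(raw_json)
    if not data["name"].strip():
        raise ValueError("name is empty")
    if not 0 <= data["age"] <= 130:  # assumed plausibility range
        raise ValueError(f"implausible age: {data['age']}")
    sentiment = data.get("sentiment")
    if sentiment is not None and sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    return data
```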

8. CONTEXT CACHING

Two Types

| Type | Description | Discount |
| --- | --- | --- |
| Implicit | Automatic, no setup | 90% when cache hits |
| Explicit | Manual control, guaranteed discount | 90% |

Maximizing Cache Hits

Place large, common content at BEGINNING of prompts

code
# GOOD - cacheable prefix
<system_context>
{large_static_content_here}
</system_context>

<user_query>
{variable_user_question}
</user_query>

Send similar prompts in short time windows

Implicit caching works best when requests share prefixes and arrive close together.
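
To see how much of a prompt is actually shared between requests, the prefix layout can be built and measured locally. A sketch with hypothetical helper names; it measures shared characters, not tokens, so it is only a rough indicator of cacheability.

```python
import os


def cache_friendly_prompt(static_context, user_query):
    """Put the large static content first and the variable query last."""
    return (
        f"<system_context>\n{static_context}\n</system_context>\n\n"
        f"<user_query>\n{user_query}\n</user_query>"
    )


def shared_prefix_length(prompt_a, prompt_b):
    """Number of leading characters two prompts have in common."""
    return len(os.path.commonprefix([prompt_a, prompt_b]))
```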

Explicit Caching Code

python
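from google import genai
from google.genai import types

client = genai.Client()
# large_document: the static text to cache, loaded by the caller

# Create a cache; contents and TTL are passed through the config object
cache = client.caches.create(
    model="gemini-3-flash-preview",
    config=types.CreateCachedContentConfig(
        contents=[
            types.Content(
                role="user",
                parts=[types.Part(text=large_document)]
            )
        ],
        ttl="3600s"  # 1 hour
    )
)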

# Use the cache
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Summarize the key points.",
    config=types.GenerateContentConfig(
        cached_content=cache.name
    )
)

Requirements

  • Minimum 2,048 tokens to cache
  • Supports: text, PDF, image, audio, video
  • Set TTL based on content relevance (default: 1 hour)
  • Storage cost = tokens x time
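
The storage rule (tokens x time) and the minimum size can be turned into a quick estimate. This is a sketch only: the helper name is hypothetical, and the price per million token-hours is a caller-supplied input, not a published rate.

```python
MIN_CACHE_TOKENS = 2048  # minimum stated in the requirements above


def cache_storage_cost(token_count, ttl_seconds, price_per_million_token_hours):
    """Estimate storage cost as tokens x time for one cache entry."""
    if token_count < MIN_CACHE_TOKENS:
        raise ValueError(
            f"{token_count} tokens is below the {MIN_CACHE_TOKENS}-token minimum"
        )
    hours = ttl_seconds / 3600
    return token_count / 1_000_000 * hours * price_per_million_token_hours
```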

9. AGENTIC WORKFLOWS

Multi-Turn Context

Gemini 3 Flash tracks conversation context well, but:

  • Restate key constraints every few turns
  • Use thought signatures for function calling chains
  • Consider explicit caching for repeated context
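
Restating constraints every few turns can be automated in the message-building layer. A minimal sketch, assuming a hypothetical helper and an interval of three turns; the right interval depends on the task.

```python
def with_constraint_reminder(user_message, turn_index, constraints, every=3):
    """Append the key constraints to every `every`-th user message."""
    if turn_index > 0 and turn_index % every == 0:
        reminder = "Reminder of constraints:\n" + "\n".join(
            f"- {c}" for c in constraints
        )
        return f"{user_message}\n\n{reminder}"
    return user_message
```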

Interactions API (For Complex Agents)

python
from google import genai

client = genai.Client()

# Long-running background task
interaction = client.interactions.create(
    model="gemini-3-flash-preview",
    input="Analyze this codebase and suggest improvements...",
    background=True  # Returns immediately with interaction ID
)

# Check status later
result = client.interactions.get(interaction.id)

Agentic System Instructions

For agents that take actions, structure system prompts around:

code
## Reasoning Strategy
- Break complex tasks into steps
- Verify information before acting
- Assess risk before state-changing operations

## Execution Rules
- Prefer read operations over writes when uncertain
- Request clarification for ambiguous instructions
- Report blockers immediately

## Output Format
- Explain reasoning before actions
- Confirm completion of each step

10. SUPPORTED TOOLS

Built-in Tools

| Tool | Description | Notes |
| --- | --- | --- |
| Google Search | Real-time web grounding | Billing starts Jan 5, 2026 |
| URL Context | Deep page analysis | Max 20 URLs per request |
| Code Execution | Run generated code | Enables Visual Thinking |
| File Search | Search uploaded files | - |

Combining Tools

python
from google import genai
from google.genai import types
from google.genai.types import Tool, GoogleSearch, UrlContext, ToolCodeExecution

client = genai.Client()

tools = [
    Tool(google_search=GoogleSearch()),
    Tool(url_context=UrlContext()),
    Tool(code_execution=ToolCodeExecution())
]

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="Search for latest Python 3.13 features and write example code.",
    config=types.GenerateContentConfig(tools=tools)
)

NOT Supported (Yet)

  • Maps grounding
  • Computer use
  • Combining built-in tools with custom function calling
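
The last restriction can be caught before a request is sent. The sketch below works on plain tool labels; the helper name and the labels themselves are hypothetical, and any name outside the built-in set is treated as a custom function.

```python
BUILT_IN_TOOLS = {"google_search", "url_context", "code_execution", "file_search"}


def check_tool_mix(tool_names):
    """Reject requests that combine built-in tools with custom functions."""
    names = set(tool_names)
    built_in = names & BUILT_IN_TOOLS
    custom = names - BUILT_IN_TOOLS
    if built_in and custom:
        raise ValueError(
            f"cannot combine built-in tools {sorted(built_in)} "
            f"with custom functions {sorted(custom)}"
        )
    return sorted(names)
```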

QUICK REFERENCE: DO NOT

| Mistake | Why It's Wrong |
| --- | --- |
| temperature=0.7 | Causes looping, degraded reasoning |
| thinking_budget=1024 | Wrong param for Gemini 3 (use thinking_level) |
| Mix XML + Markdown | Confuses delimiter parsing |
| Negatives at start | Model drops early constraints |
| Verbose prompts | Gemini 3 needs less, not more |
| Skip thought signatures | 400 error on function calls |
| Over-engineer prompts | 2.x techniques often backfire |

QUICK REFERENCE: OPTIMAL SETTINGS BY USE CASE

| Use Case | thinking_level | Notes |
| --- | --- | --- |
| Chat/Q&A | minimal | Fastest response |
| Simple tasks | low | Good balance |
| Analysis | medium | Thoughtful responses |
| Complex reasoning | high | Maximum depth |
| Coding | high | Best accuracy |
| High throughput | minimal | Minimize latency |
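
The same mapping can live in code as a lookup. A small sketch; the use-case keys are hypothetical labels, and falling back to `medium` for unknown use cases is an assumption, not a recommendation from the guide.

```python
THINKING_LEVEL_BY_USE_CASE = {
    "chat": "minimal",
    "simple_task": "low",
    "analysis": "medium",
    "complex_reasoning": "high",
    "coding": "high",
    "high_throughput": "minimal",
}


def thinking_level_for(use_case, default="medium"):
    """Look up the suggested thinking level for a use case."""
    return THINKING_LEVEL_BY_USE_CASE.get(use_case, default)
```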
