Prompting best practices
This guide provides prompt engineering techniques for modern instruction-following models. Treat the guidance as general best practices that you can adapt to your model, tools, and evaluation setup.
General principles
Be explicit with your instructions
Models respond well to clear, explicit instructions. Being specific about your desired output can help enhance results. If you want "above and beyond" behavior, explicitly request it rather than relying on the model to infer this from vague prompts.
<section title="Example: Creating an analytics dashboard">Less effective:
Create an analytics dashboard
More effective:
Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics to create a fully-featured implementation.
Add context to improve performance
Providing context or motivation behind your instructions can help the model better understand your goals and deliver more targeted responses.
<section title="Example: Formatting preferences">Less effective:
NEVER use ellipses
More effective:
Your response will be read aloud by a text-to-speech engine, so never use ellipses since the text-to-speech engine will not know how to pronounce them.
Models can generalize from the explanation.
Be vigilant with examples & details
Models pay close attention to details and examples. Ensure that your examples align with the behaviors you want to encourage and minimize behaviors you want to avoid.
Long-horizon reasoning and state tracking
Many modern models handle long-horizon reasoning tasks better when you make state tracking explicit. Encourage incremental progress and clear checkpoints so the model can save state and continue work across sessions.
Context awareness and multi-window workflows
Some models can track their remaining context window (i.e. "token budget") throughout a conversation, which helps them manage long tasks more effectively.
Managing context limits:
If you are using an agent harness that compacts context or allows saving context to external files, add this information to your prompt so the model can behave accordingly. Otherwise, the model may try to wrap up work as it approaches the context limit. Below is an example prompt:
Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely from where you left off. Therefore, do not stop tasks early due to token budget concerns. As you approach your token budget limit, save your current progress and state to memory before the context window refreshes. Always be as persistent and autonomous as possible and complete tasks fully, even if the end of your budget is approaching. Never artificially stop any task early regardless of the context remaining.
Persistent memory tools pair naturally with context awareness for seamless context transitions.
Multi-context window workflows
For tasks spanning multiple context windows:
- •
Use a different prompt for the very first context window: Use the first context window to set up a framework (write tests, create setup scripts), then use future context windows to iterate on a todo-list.
- •
Have the model write tests in a structured format: Ask the model to create tests before starting work and keep track of them in a structured format (e.g.,
tests.json). This leads to better long-term ability to iterate. Remind the model of the importance of tests: "It is unacceptable to remove or edit tests because this could lead to missing or buggy functionality." - •
Set up quality of life tools: Encourage the model to create setup scripts (e.g.,
init.sh) to gracefully start servers, run test suites, and linters. This prevents repeated work when continuing from a fresh context window. - •
Starting fresh vs compacting: When a context window is cleared, consider starting with a brand new context window rather than using compaction. Some models are effective at discovering state from the local filesystem. In some cases, you may want to take advantage of this over compaction. Be prescriptive about how it should start:
- •"Call pwd; you can only read and write files in this directory."
- •"Review progress.txt, tests.json, and the git logs."
- •"Manually run through a fundamental integration test before moving on to implementing new features."
- •
Provide verification tools: As the length of autonomous tasks grows, the model needs to verify correctness without continuous human feedback. UI testing tools and scripted verification can help.
- •
Encourage complete usage of context: Prompt the model to efficiently complete components before moving on:
This is a very long task, so it may be beneficial to plan out your work clearly. It's encouraged to spend your entire output context working on the task - just make sure you don't run out of context with significant uncommitted work. Continue working systematically until you have completed this task.
State management best practices
- •Use structured formats for state data: When tracking structured information (like test results or task status), use JSON or other structured formats to help the model understand schema requirements
- •Use unstructured text for progress notes: Freeform progress notes work well for tracking general progress and context
- •Use git for state tracking: Git provides a log of what's been done and checkpoints that can be restored. Models often do better when they can read the commit history.
- •Emphasize incremental progress: Explicitly ask the model to keep track of its progress and focus on incremental work
// Structured state file (tests.json)
{
"tests": [
{"id": 1, "name": "authentication_flow", "status": "passing"},
{"id": 2, "name": "user_management", "status": "failing"},
{"id": 3, "name": "api_endpoints", "status": "not_started"}
],
"total": 200,
"passing": 150,
"failing": 25,
"not_started": 25
}
// Progress notes (progress.txt) Session 3 progress: - Fixed authentication token validation - Updated user model to handle edge cases - Next: investigate user_management test failures (test #2) - Note: Do not remove tests as this could lead to missing functionality
Communication style
Many modern models have a more concise and natural communication style compared to older generations:
- •More direct and grounded: Provides fact-based progress reports rather than self-celebratory updates
- •More conversational: Slightly more fluent and colloquial, less machine-like
- •Less verbose: May skip detailed summaries for efficiency unless prompted otherwise
This communication style accurately reflects what has been accomplished without unnecessary elaboration.
Guidance for specific situations
Balance verbosity
Modern models tend toward efficiency and may skip verbal summaries after tool calls, jumping directly to the next action. While this creates a streamlined workflow, you may prefer more visibility into its reasoning process.
If you want the model to provide updates as it works:
After completing a task that involves tool use, provide a quick summary of the work you've done.
Tool usage patterns
Instruction-following models benefit from explicit direction to use specific tools. If you say "can you suggest some changes," the model may provide suggestions rather than implementing them—even if making changes might be what you intended.
For the model to take action, be more explicit:
<section title="Example: Explicit instructions">Less effective (model will only suggest):
Can you suggest some changes to improve this function?
More effective (model will make the changes):
Change this function to improve its performance.
Or:
Make these edits to the authentication flow.
To make the model more proactive about taking action by default, you can add this to your system prompt:
<default_to_action> By default, implement changes rather than only suggesting them. If the user's intent is unclear, infer the most useful likely action and proceed, using tools to discover any missing details instead of guessing. Try to infer the user's intent about whether a tool call (e.g., file edit or read) is intended or not, and act accordingly. </default_to_action>
On the other hand, if you want the model to be more hesitant by default, less prone to jumping straight into implementations, and only take action if requested, you can steer this behavior with a prompt like the below:
<do_not_act_before_instructions> Do not jump into implementatation or changes files unless clearly instructed to make changes. When the user's intent is ambiguous, default to providing information, doing research, and providing recommendations rather than taking action. Only proceed with edits, modifications, or implementations when the user explicitly requests them. </do_not_act_before_instructions>
Tool usage and triggering
Some newer models are more responsive to the system prompt than previous models. If your prompts were designed to reduce undertriggering on tools or skills, these models may now overtrigger. The fix is to dial back any aggressive language. Where you might have said "CRITICAL: You MUST use this tool when...", you can use more normal prompting like "Use this tool when...".
Balancing autonomy and safety
Without guidance, some models may take actions that are difficult to reverse or affect shared systems, such as deleting files, force-pushing, or posting to external services. If you want the model to confirm before taking potentially risky actions, add guidance to your prompt:
Consider the reversibility and potential impact of your actions. You are encouraged to take local, reversible actions like editing files or running tests, but for actions that are hard to reverse, affect shared systems, or could be destructive, ask the user before proceeding. Examples of actions that warrant confirmation: - Destructive operations: deleting files or branches, dropping database tables, rm -rf - Hard to reverse operations: git push --force, git reset --hard, amending published commits - Operations visible to others: pushing code, commenting on PRs/issues, sending messages, modifying shared infrastructure When encountering obstacles, do not use destructive actions as a shortcut. For example, don't bypass safety checks (e.g. --no-verify) or discard unfamiliar files that may be in-progress work.
Overthinking and excessive thoroughness
Some models do significantly more upfront exploration than previous models, especially at higher reasoning settings. This initial work often helps to optimize the final results, but the model may gather extensive context or pursue multiple threads of research without being prompted. If your prompts previously encouraged the model to be more thorough, you should tune that guidance:
- •Replace blanket defaults with more targeted instructions. Instead of "Default to using [tool]," add guidance like "Use [tool] when it would enhance your understanding of the problem."
- •Remove over-prompting. Tools that undertriggered in previous models are likely to trigger appropriately now. Instructions like "If in doubt, use [tool]" will cause overtriggering.
- •Use reasoning settings as a fallback. If the model continues to be overly aggressive, use a lower reasoning setting or budget if available.
In some cases, a model may think extensively, which can inflate thinking tokens and slow down responses. If this behavior is undesirable, you can add explicit instructions to constrain its reasoning, or you can lower the reasoning setting to reduce overall thinking and token usage.
When you're deciding how to approach a problem, choose an approach and commit to it. Avoid revisiting decisions unless you encounter new information that directly contradicts your reasoning. If you're weighing two approaches, pick one and see it through. You can always course-correct later if the chosen approach fails.
Control the format of responses
There are a few ways that we have found to be particularly effective in steering output formatting:
- •
Tell the model what to do instead of what not to do
- •Instead of: "Do not use markdown in your response"
- •Try: "Your response should be composed of smoothly flowing prose paragraphs."
- •
Use XML format indicators
- •Try: "Write the prose sections of your response in <smoothly_flowing_prose_paragraphs> tags."
- •
Match your prompt style to the desired output
The formatting style used in your prompt may influence the model's response style. If you are still experiencing steerability issues with output formatting, we recommend matching your prompt style to your desired output style. For example, removing markdown from your prompt can reduce the volume of markdown in the output.
- •
Use detailed prompts for specific formatting preferences
For more control over markdown and formatting usage, provide explicit guidance:
<avoid_excessive_markdown_and_bullet_points> When writing reports, documents, technical explanations, analyses, or any long-form content, write in clear, flowing prose using complete paragraphs and sentences. Use standard paragraph breaks for organization and reserve markdown primarily for `inline code`, code blocks (```...```), and simple headings (###, and ###). Avoid using **bold** and *italics*. DO NOT use ordered lists (1. ...) or unordered lists (*) unless : a) you're presenting truly discrete items where a list format is the best option, or b) the user explicitly requests a list or ranking Instead of listing items with bullets or numbers, incorporate them naturally into sentences. This guidance applies especially to technical writing. Using prose instead of excessive formatting will improve user satisfaction. NEVER output a series of overly short bullet points. Your goal is readable, flowing text that guides the reader naturally through ideas rather than fragmenting information into isolated points. </avoid_excessive_markdown_and_bullet_points>
Research and information gathering
Many modern models demonstrate strong agentic search capabilities and can find and synthesize information from multiple sources effectively. For optimal research results:
- •
Provide clear success criteria: Define what constitutes a successful answer to your research question
- •
Encourage source verification: Ask the model to verify information across multiple sources
- •
For complex research tasks, use a structured approach:
Search for this information in a structured way. As you gather data, develop several competing hypotheses. Track your confidence levels in your progress notes to improve calibration. Regularly self-critique your approach and plan. Update a hypothesis tree or research notes file to persist information and provide transparency. Break down this complex research task systematically.
This structured approach allows the model to find and synthesize virtually any piece of information and iteratively critique its findings, no matter the size of the corpus.
Subagent orchestration
Many modern models demonstrate improved native subagent orchestration capabilities. These models can recognize when tasks would benefit from delegating work to specialized subagents and do so proactively without requiring explicit instruction.
To take advantage of this behavior:
- •Ensure well-defined subagent tools: Have subagent tools available and described in tool definitions
- •Let the model orchestrate naturally: The model will delegate appropriately without explicit instruction
- •Watch for overuse: Some models may overuse subagents in situations where a simpler, direct approach would suffice. For example, the model may spawn subagents for code exploration when a direct search is faster and sufficient.
If you're seeing excessive subagent use, add explicit guidance about when subagents are and aren't warranted:
Use subagents when tasks can run in parallel, require isolated context, or involve independent workstreams that don't need to share state. For simple tasks, sequential operations, single-file edits, or tasks where you need to maintain context across steps, work directly rather than delegating.
Model self-knowledge
If you would like the model to identify itself correctly in your application or use specific API strings:
The assistant is [model name], created by [organization]. The current model is [version].
For LLM-powered apps that need to specify model strings:
When an LLM is needed, please default to [model name] unless the user requests otherwise. The exact model string is [model-id].
Thinking sensitivity
When extended reasoning is disabled, some models are sensitive to the word "think" and its variants. Consider replacing "think" with alternatives like "consider" or "evaluate."
Leverage thinking & interleaved thinking capabilities
Many models offer reasoning capabilities that can be especially helpful for tasks involving reflection after tool use or complex multi-step reasoning. You can guide their reasoning for better results.
Some platforms support adaptive or automatic reasoning modes, while others use manual reasoning budgets. Choose the mode that best matches your task complexity.
You can guide the model's reasoning behavior:
After receiving tool results, carefully reflect on their quality and determine optimal next steps before proceeding. Use your thinking to plan and iterate based on this new information, and then take the best next action.
If you find the model reasoning more often than you'd like, which can happen with large or complex system prompts, add guidance to steer it:
Extended thinking adds latency and should only be used when it will meaningfully improve answer quality - typically for problems that require multi-step reasoning. When in doubt, respond directly.
If you are migrating between reasoning modes, update your configuration to match the new mode and tune the reasoning level based on task complexity.
Document creation
Many modern models excel at creating presentations, animations, and visual documents with strong instruction following. The models can produce polished, usable output on the first try in most cases.
For best results with document creation:
Create a professional presentation on [topic]. Include thoughtful design elements, visual hierarchy, and engaging animations where appropriate.
Improved vision capabilities
Many modern models have improved vision capabilities compared to older models. They perform better on image processing and data extraction tasks, particularly when there are multiple images present in context. These improvements carry over to computer use, where models can more reliably interpret screenshots and UI elements. You can also use these models to analyze videos by breaking them up into frames.
One effective technique is to provide a crop or zoom tool so the model can focus on relevant regions of an image.
Optimize parallel tool calling
Many models excel at parallel tool execution. They may:
- •Run multiple speculative searches during research
- •Read several files at once to build context faster
- •Execute bash commands in parallel (which can even bottleneck system performance)
This behavior is easily steerable. While the model has a high success rate in parallel tool calling without prompting, you can boost this to ~100% or adjust the aggression level:
<use_parallel_tool_calls> If you intend to call multiple tools and there are no dependencies between the tool calls, make all of the independent tool calls in parallel. Prioritize calling tools simultaneously whenever the actions can be done in parallel rather than sequentially. For example, when reading 3 files, run 3 tool calls in parallel to read all 3 files into context at the same time. Maximize use of parallel tool calls where possible to increase speed and efficiency. However, if some tool calls depend on previous calls to inform dependent values like the parameters, do NOT call these tools in parallel and instead call them sequentially. Never use placeholders or guess missing parameters in tool calls. </use_parallel_tool_calls>
Execute operations sequentially with brief pauses between each step to ensure stability.
Reduce file creation in agentic coding
Some models may create new files for testing and iteration purposes, particularly when working with code. This approach lets the model use files, especially scripts, as a temporary scratchpad before saving final output. Using temporary files can improve outcomes for agentic coding use cases.
If you'd prefer to minimize net new file creation, you can instruct the model to clean up after itself:
If you create any temporary new files, scripts, or helper files for iteration, clean up these files by removing them at the end of the task.
Overeagerness
Some models have a tendency to overengineer by creating extra files, adding unnecessary abstractions, or building in flexibility that wasn't requested. If you're seeing this undesired behavior, add specific guidance to keep solutions minimal.
For example:
Avoid over-engineering. Only make changes that are directly requested or clearly necessary. Keep solutions simple and focused: - Scope: Don't add features, refactor code, or make "improvements" beyond what was asked. A bug fix doesn't need surrounding code cleaned up. A simple feature doesn't need extra configurability. - Documentation: Don't add docstrings, comments, or type annotations to code you didn't change. Only add comments where the logic isn't self-evident. - Defensive coding: Don't add error handling, fallbacks, or validation for scenarios that can't happen. Trust internal code and framework guarantees. Only validate at system boundaries (user input, external APIs). - Abstractions: Don't create helpers, utilities, or abstractions for one-time operations. Don't design for hypothetical future requirements. The right amount of complexity is the minimum needed for the current task.
Frontend design
Many models excel at building complex, real-world web applications with strong frontend design. However, without guidance, models can default to generic patterns that create what users call the "AI slop" aesthetic. To create distinctive, creative frontends that surprise and delight:
Here's a system prompt snippet you can use to encourage better frontend design:
<frontend_aesthetics> You tend to converge toward generic, "on distribution" outputs. In frontend design, this creates what users call the "AI slop" aesthetic. Avoid this: make creative, distinctive frontends that surprise and delight. Focus on: - Typography: Choose fonts that are beautiful, unique, and interesting. Avoid generic fonts like Arial and Inter; opt instead for distinctive choices that elevate the frontend's aesthetics. - Color & Theme: Commit to a cohesive aesthetic. Use CSS variables for consistency. Dominant colors with sharp accents outperform timid, evenly-distributed palettes. Draw from IDE themes and cultural aesthetics for inspiration. - Motion: Use animations for effects and micro-interactions. Prioritize CSS-only solutions for HTML. Use Motion library for React when available. Focus on high-impact moments: one well-orchestrated page load with staggered reveals (animation-delay) creates more delight than scattered micro-interactions. - Backgrounds: Create atmosphere and depth rather than defaulting to solid colors. Layer CSS gradients, use geometric patterns, or add contextual effects that match the overall aesthetic. Avoid generic AI-generated aesthetics: - Overused font families (Inter, Roboto, Arial, system fonts) - Clichéd color schemes (particularly purple gradients on white backgrounds) - Predictable layouts and component patterns - Cookie-cutter design that lacks context-specific character Interpret creatively and make unexpected choices that feel genuinely designed for the context. Vary between light and dark themes, different fonts, different aesthetics. You still tend to converge on common choices (Space Grotesk, for example) across generations. Avoid this: it is critical that you think outside the box! </frontend_aesthetics>
You can also include a dedicated frontend design skill or style guide to reinforce the desired aesthetic.
Avoid focusing on passing tests and hard-coding
Models can sometimes focus too heavily on making tests pass at the expense of more general solutions, or may use workarounds like helper scripts for complex refactoring instead of using standard tools directly. To prevent this behavior and ensure robust, generalizable solutions:
Please write a high-quality, general-purpose solution using the standard tools available. Do not create helper scripts or workarounds to accomplish the task more efficiently. Implement a solution that works correctly for all valid inputs, not just the test cases. Do not hard-code values or create solutions that only work for specific test inputs. Instead, implement the actual logic that solves the problem generally. Focus on understanding the problem requirements and implementing the correct algorithm. Tests are there to verify correctness, not to define the solution. Provide a principled implementation that follows best practices and software design principles. If the task is unreasonable or infeasible, or if any of the tests are incorrect, please inform me rather than working around them. The solution should be robust, maintainable, and extendable.
Minimizing hallucinations in agentic coding
To encourage grounded answers and minimize hallucinations:
<investigate_before_answering> Never speculate about code you have not opened. If the user references a specific file, you MUST read the file before answering. Make sure to investigate and read relevant files BEFORE answering questions about the codebase. Never make any claims about code before investigating unless you are certain of the correct answer - give grounded and hallucination-free answers. </investigate_before_answering>
Migrating away from prefilled responses
Some platforms are moving away from prefilled responses as model instruction following improves. If prefills are deprecated in your stack, you can replace them with explicit instructions and schema-based outputs.
Here are common prefill scenarios and how to migrate away from them:
<section title="Controlling output formatting">Prefills have been used to force specific output formats like JSON/YAML, classification, and similar patterns where the prefill constrains the model to a particular structure.
Migration: Use structured outputs or tool-call schemas to constrain responses. Many models can reliably match complex schemas when told to, especially if implemented with retries. For classification tasks, use tools with enum fields or schema-based outputs.
</section> <section title="Eliminating preambles">Prefills like Here is the requested summary:\n were used to skip introductory text.
Migration: Use direct instructions in the system prompt: "Respond directly without preamble. Do not start with phrases like 'Here is...', 'Based on...', etc." Alternatively, direct the model to output within XML tags, use structured outputs, or use tool calling. If the occasional preamble slips through, strip it in post-processing.
</section> <section title="Avoiding bad refusals">Prefills were used to steer around unnecessary refusals.
Migration: Many models are better at appropriate refusals now. Clear prompting within the user message without prefill should be sufficient.
Prefills were used to continue partial completions, resume interrupted responses, or pick up where a previous generation left off.
Migration: Move the continuation to the user message, and include the final text from the interrupted response: "Your previous response was interrupted and ended with `[previous_response]`. Continue from where you left off." If this is part of error-handling or incomplete-response-handling and there is no UX penalty, retry the request.
</section> <section title="Context hydration and role consistency">Prefills were used to periodically ensure refreshed or injected context.
Migration: For very long conversations, inject what were previously prefilled-assistant reminders into the user turn. If context hydration is part of a more complex agentic system, consider hydrating via tools (expose or encourage use of tools containing context based on heuristics such as number of turns) or during context compaction.
</section>LaTeX output
Some models default to LaTeX for mathematical expressions, equations, and technical explanations. If you prefer plain text, add the following instructions to your prompt:
Format your response in plain text only. Do not use LaTeX, MathJax, or any markup notation such as \( \), $, or \frac{}{}. Write all math expressions using standard text characters (e.g., "/" for division, "*" for multiplication, and "^" for exponents).
Migration considerations
When migrating between model families or versions:
- •
Be specific about desired behavior: Consider describing exactly what you'd like to see in the output.
- •
Frame your instructions with modifiers: Adding modifiers that encourage the model to increase the quality and detail of its output can help better shape performance. For example, instead of "Create an analytics dashboard", use "Create an analytics dashboard. Include as many relevant features and interactions as possible. Go beyond the basics to create a fully-featured implementation."
- •
Request specific features explicitly: Animations and interactive elements should be requested explicitly when desired.
- •
Update reasoning configuration: If your platform exposes reasoning depth or budget, tune it to match task complexity.
- •
Migrate away from prefilled responses: Replace prefills with explicit instructions or schema-based outputs.
- •
Tune anti-laziness prompting: If your prompts previously encouraged the model to be more thorough or use tools more aggressively, dial back that guidance. Some newer models are significantly more proactive and may overtrigger on instructions that were needed for older models.