Prompt Engineering
Purpose
Design, evaluate, and version system prompts for LLM-powered features, including instruction structure, chain-of-thought patterns, output format constraints, and few-shot example selection.
Inputs
- •Feature requirements (what the LLM should do)
- •Input data format and examples
- •Desired output format and constraints
- •Quality requirements (accuracy, consistency, tone)
- •Cost and latency constraints (model selection guidance)
Process
Step 1: Define the Task Precisely
Before writing a prompt, articulate:
- •Input: What exactly does the model receive? (user message, context, data)
- •Output: What exactly should it produce? (classification, generation, extraction, transformation)
- •Constraints: What must it never do? (hallucinate facts, reveal system prompt, produce PII)
- •Edge cases: What happens with empty input, adversarial input, ambiguous input?
Step 2: Structure the System Prompt
Use a layered structure:
- •Identity and role: Who is the model in this context?
- •Task description: What it's being asked to do (one paragraph, precise)
- •Constraints and rules: Hard rules it must follow (numbered list)
- •Output format: Exact structure of the expected output (with template)
- •Few-shot examples: 2-3 input/output pairs showing ideal behavior
- •Edge case handling: What to do when uncertain, off-topic, or missing data
Step 3: Design Chain-of-Thought (if applicable)
For complex reasoning tasks:
- •Explicit CoT instruction: "Think step by step before answering"
- •Structured CoT: Define the reasoning steps (e.g., "1. Identify the entities, 2. Determine relationships, 3. Synthesize answer")
- •Hidden CoT: Instruct model to reason in a
<thinking>block, then provide the answer separately - •When to skip CoT: Classification tasks, simple extraction, and format conversion rarely benefit
Step 4: Design Output Format
Specify the exact output structure:
- •JSON output: Provide a schema with required fields, types, and constraints
- •Markdown output: Provide a template with section headers
- •Structured extraction: Define the fields and valid values explicitly
- •Validation: How will the output be parsed? Design the format for reliable parsing.
Step 5: Select and Craft Few-Shot Examples
Choose examples strategically:
- •Cover the range: Include examples representing different input types
- •Include edge cases: Show desired behavior for tricky inputs
- •Show the format: Examples should match the exact output format
- •Keep it minimal: 2-3 examples are usually enough; more can confuse
- •Order matters: Put the most representative example last (recency bias)
Step 6: Design Versioning Strategy
Plan for prompt evolution:
- •Version identifier: Semantic versioning (v1.0, v1.1, v2.0)
- •Storage: Version-controlled alongside code (not in a database, not hardcoded)
- •A/B testing: How to run two prompt versions simultaneously
- •Rollback: How to revert to a previous version quickly
- •Changelog: What changed and why, linked to eval results
Output Format
markdown
# Prompt Design: [Feature Name] ## Task Definition **Input:** [Description + example] **Output:** [Description + format] **Constraints:** [Hard rules] ## System Prompt (v1.0)
[Full system prompt text]
code
## Few-Shot Examples
### Example 1
**Input:** [Example input]
**Expected output:** [Example output]
### Example 2
**Input:** [Edge case input]
**Expected output:** [Edge case output]
## Chain-of-Thought Strategy
[Whether CoT is used, what the reasoning structure looks like]
## Output Schema
```json
{
"field1": "string (required) — description",
"field2": "number (optional) — description"
}
Prompt Versioning
| Version | Date | Change | Eval Score |
|---|---|---|---|
| v1.0 | [Date] | Initial | [Score] |
Model Recommendation
Recommended: [Model] at [temperature] Rationale: [Why this model for this task] Cost estimate: [$/1K requests]
code
## Quality Checks - [ ] System prompt has a clear role, task, constraints, format, and examples - [ ] Output format is designed for reliable programmatic parsing - [ ] Few-shot examples cover normal cases and at least one edge case - [ ] Constraints address prompt injection, hallucination, and off-topic input - [ ] Prompt is stored in version control with a changelog - [ ] Model and temperature are justified for the task requirements ## Evolution Notes <!-- Observations appended after each use -->