Prompt Scaffold
code
[ROLE] Who the model is / expertise [CONTEXT] Background, domain constraints [TASK] One clear instruction [FORMAT] Output structure with example [CONSTRAINTS] Boundaries, edge cases, what to avoid [EXAMPLES] Few-shot demonstrations (if needed)
Claude: use XML tags. GPT: markdown headers. Small models: minimal structure, maximum explicitness.
Dual-Model Adaptation
For Small Models (4B-8B Ollama)
- •Max 1000 tokens prompt length. Cut ruthlessly.
- •One task per prompt. No compound instructions.
- •Complete worked example is mandatory — the model mimics format.
- •Every enum value listed explicitly:
quality: one of "excellent", "good", "fair", "poor" - •No chain-of-thought unless tested (often hurts structured output).
- •Positive instructions: "Write X" not "Don't do Y."
- •Sandwich: format spec at start AND end.
For Large Models (GPT-5, Claude)
- •Multi-section prompts with nuanced instructions.
- •
<thinking>tags for chain-of-thought (Claude). - •Few-shot optional — large models generalize from descriptions.
- •Can handle "Don't X unless Y" conditional constraints.
- •System prompt for persona, user prompt for task.
Few-Shot Design
- •3-5 examples. More than 7 has diminishing returns.
- •Include at least one edge case example.
- •Show reasoning, not just input→output.
- •Order: easy → medium → edge case.
- •For small models: examples are the prompt. They define the contract.
Structured Output
- •Provide the exact schema with field descriptions.
- •Include a complete worked example (JSON, ready to copy).
- •Specify handling for missing/ambiguous fields.
- •For enums, list ALL valid values in the prompt.
- •Show good AND bad examples for critical fields:
code
## Dilemma ID Naming (CRITICAL) GOOD: `host_benevolent_or_self_serving` BAD: `host_motivation`
Defensive Patterns
- •Sandwich: Repeat critical instructions at start and end.
- •Validate → Feedback → Repair: Validate output, format structured errors, ask model to fix.
- •Discuss → Freeze → Serialize: Separate exploration from structured output generation.
- •Anti-pleasantry: "Do NOT end with 'Good luck!' or similar."
Systematic Testing
- •Define success criteria before iterating.
- •Test set of 10-20 diverse inputs including edge cases.
- •Change one thing at a time.
- •Track: prompt version, model, temperature, pass rate, failure modes.
- •Failure taxonomy: wrong format, hallucination, refusal, partial output, off-topic.