Codex Execution Plans (ExecPlans)

This document describes the requirements for an execution plan ("ExecPlan"), a design document that a coding agent can follow to deliver a working feature or system change. Treat the reader as a complete beginner to this repository: they have only the current working tree and the single ExecPlan file you provide. There is no memory of prior plans and no external context.

These plans are thorough design documents, and "living documents." As a user, you can use these documents to verify the approach before a long implementation process begins. The particular format described here has enabled agents to work for more than seven hours from a single prompt.

How to Use ExecPlans

When authoring an ExecPlan, follow this skill TO THE LETTER. If it is not in your context, refresh your memory by reading this entire file. Be thorough in reading (and re-reading) source material to produce an accurate specification. When creating a spec, start from the skeleton generated by scripts/init_plan.py and flesh it out as you do your research.

When implementing an ExecPlan, do not prompt the user for "next steps"; simply proceed to the next milestone. Keep all sections up to date, and add or split entries in the Progress list at every stopping point to affirmatively state the progress made and the next steps. Resolve ambiguities autonomously, and commit frequently.

When discussing an ExecPlan, record decisions in a log in the spec for posterity; it should be unambiguously clear why any change to the specification was made. ExecPlans are living documents, and it should always be possible to restart from ONLY the ExecPlan and no other work.

When researching a design with challenging requirements or significant unknowns, use milestones to implement proofs of concept, "toy implementations", etc., that validate whether the user's proposal is feasible. Read the source code of libraries by finding or acquiring them, research deeply, and include prototypes to guide a fuller implementation.

NON-NEGOTIABLE REQUIREMENTS

These requirements admit no exceptions:

•Every ExecPlan must be fully self-contained. Self-contained means that in its current form it contains all knowledge and instructions needed for a novice to succeed.
•Every ExecPlan is a living document. Contributors are required to revise it as progress is made, as discoveries occur, and as design decisions are finalized. Each revision must remain fully self-contained.
•Every ExecPlan must enable a complete novice to implement the feature end-to-end without prior knowledge of this repo.
•Every ExecPlan must produce a demonstrably working behavior, not merely code changes to "meet a definition."
•Every ExecPlan must define every term of art in plain language or not use it.

Purpose and Intent Come First

Purpose and intent come first. Begin by explaining, in a few sentences, why the work matters from a user's perspective: what someone can do after this change that they could not do before, and how to see it working. Then guide the reader through the exact steps to achieve that outcome, including what to edit, what to run, and what they should observe.

The agent executing your plan can list files, read files, search, run the project, and run tests. It does not know any prior context and cannot infer what you meant from earlier milestones. Repeat any assumption you rely on. Do not point to external blogs or docs; if knowledge is required, embed it in the plan itself in your own words. If an ExecPlan builds upon a prior ExecPlan and that file is checked in, incorporate it by reference. If it is not, you must include all relevant context from that plan.

Self-Containment and Plain Language

Self-containment and plain language are paramount. If you introduce a phrase that is not ordinary English ("daemon", "middleware", "RPC gateway", "filter graph"), define it immediately and remind the reader how it manifests in this repository (for example, by naming the files or commands where it appears). Do not say "as defined previously" or "according to the architecture doc." Include the needed explanation here, even if you repeat yourself.

Avoid Common Failure Modes

Avoid common failure modes:

•Do not rely on undefined jargon.
•Do not describe "the letter of a feature" so narrowly that the resulting code compiles but does nothing meaningful.
•Do not outsource key decisions to the reader. When ambiguity exists, resolve it in the plan itself and explain why you chose that path.
•Err on the side of over-explaining user-visible effects and under-specifying incidental implementation details.

The validator script (scripts/validate_plan.py) detects many failure patterns automatically. Run it before starting implementation and after major revisions. See references/failure-modes.md for detailed examples of what to avoid.

Anchor the Plan with Observable Outcomes

Anchor the plan with observable outcomes. State what the user can do after implementation, the commands to run, and the outputs they should see. Acceptance should be phrased as behavior a human can verify ("after starting the server, navigating to http://localhost:8080/health returns HTTP 200 with body OK") rather than internal attributes ("added a HealthCheck struct"). If a change is internal, explain how its impact can still be demonstrated (for example, by running tests that fail before and pass after, and by showing a scenario that uses the new behavior).
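As an illustration of behavior-based acceptance, the following is a minimal sketch of a check that a plan could embed. The `HealthHandler` server is a hypothetical stand-in for whatever system the plan actually changes, and it binds to an ephemeral local port rather than the 8080 used in the prose example; only the shape of the check (request a URL, compare status and body) is the point.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the server under test."""

    def do_GET(self):
        if self.path == "/health":
            body = b"OK"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # Silence per-request logging so the demo output stays clean.


def check_health(host, port):
    """Return (status, body) for GET /health, the observable outcome."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", "/health")
        response = connection.getresponse()
        return response.status, response.read().decode()
    finally:
        connection.close()


def run_demo():
    """Start the stand-in server on a free port, check it, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    port = server.server_address[1]
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        return check_health("127.0.0.1", port)
    finally:
        server.shutdown()
        server.server_close()


status, body = run_demo()
print(status, body)
```

A plan would state the expected result next to the command, so a novice can compare what they see against what the plan promises.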

Specify Repository Context Explicitly

Specify repository context explicitly. Name files with full repository-relative paths, name functions and modules precisely, and describe where new files should be created. If touching multiple areas, include a short orientation paragraph that explains how those parts fit together so a novice can navigate confidently. When running commands, show the working directory and exact command line. When outcomes depend on environment, state the assumptions and provide alternatives when reasonable.

Be Idempotent and Safe

Be idempotent and safe. Write the steps so they can be run multiple times without causing damage or drift. If a step can fail halfway, include how to retry or adapt. If a migration or destructive operation is necessary, spell out backups or safe fallbacks. Prefer additive, testable changes that can be validated as you go.
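One way to picture an idempotent step is a helper that makes a change only when it is missing, so that re-running it is harmless. This is a sketch under stated assumptions: `ensure_line`, the `config/settings.ini` path, and the `feature_enabled = true` line are all hypothetical names chosen for illustration.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` unless already present; return True if changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # Already applied: running the step again does nothing.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Avoid gluing the new line onto an existing last line without a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(f"{separator}{line}\n")
    return True


# Demonstration in a throwaway directory (hypothetical file name and setting).
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "config" / "settings.ini"
    first_run = ensure_line(target, "feature_enabled = true")
    second_run = ensure_line(target, "feature_enabled = true")
    final_text = target.read_text()

print(first_run, second_run, repr(final_text))
```

The second call reports that nothing changed, which is the property a plan step should have if it might be retried after a partial failure.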

Validation Is Not Optional

Validation is not optional. Include instructions to run tests, to start the system if applicable, and to observe it doing something useful. Describe comprehensive testing for any new features or capabilities. Include expected outputs and error messages so a novice can tell success from failure. Where possible, show how to prove that the change is effective beyond compilation (for example, through a small end-to-end scenario, a CLI invocation, or an HTTP request/response transcript). State the exact test commands appropriate to the project's toolchain and how to interpret their results.
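The idea of pairing an exact command with its expected output can be sketched as a small checker. The helper name `run_and_check` is hypothetical, and the command shown simply prints a word through the current Python interpreter; a real plan would substitute the project's own test command and the output it expects.

```python
import subprocess
import sys


def run_and_check(command, expected_substring):
    """Run `command`; succeed only on exit code 0 and the expected output."""
    result = subprocess.run(command, capture_output=True, text=True)
    succeeded = result.returncode == 0 and expected_substring in result.stdout
    return succeeded, result.stdout.strip()


# Placeholder command: a real plan would name the project's test runner here.
ok, output = run_and_check([sys.executable, "-c", "print('hello')"], "hello")
print(ok, output)
```

Checking both the exit code and the output guards against a command that exits cleanly but did not do what the plan describes.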

Capture Evidence

Capture evidence. When your steps produce terminal output, short diffs, or logs, include them inside the plan as indented examples. Keep them concise and focused on what proves success. If you need to include a patch, prefer file-scoped diffs or small excerpts that a reader can recreate by following your instructions rather than pasting large blobs.

Milestones

Milestones are narrative, not bureaucracy. If you break the work into milestones, introduce each with a brief paragraph that describes the scope, what will exist at the end of the milestone that did not exist before, the commands to run, and the acceptance you expect to observe. Keep it readable as a story: goal, work, result, proof.

Progress and milestones are distinct: milestones tell the story, progress tracks granular work. Both must exist. Never abbreviate a milestone merely for the sake of brevity, and do not leave out details that could be crucial to a future implementation.

Each milestone must be independently verifiable and incrementally implement the overall goal of the execution plan.

Living Document Sections

ExecPlans are living documents. As you make key design decisions, update the plan to record both the decision and the thinking behind it.

ExecPlans must contain and maintain a Progress section, a Surprises & Discoveries section, a Decision Log, and an Outcomes & Retrospective section. These are not optional.

Progress

Use a list with checkboxes to summarize granular steps. Every stopping point must be documented here, even if it requires splitting a partially completed task into two ("done" vs. "remaining"). This section must always reflect the actual current state of the work.

    - [x] (2025-01-20 13:00Z) Example completed step.
    - [ ] Example incomplete step.
    - [ ] Example partially completed step (completed: X; remaining: Y).

Use timestamps to measure rates of progress. Use scripts/parse_progress.py to extract and summarize progress when resuming work.
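To make the Progress line format concrete, here is a minimal sketch of how such lines could be parsed. It is not the implementation of scripts/parse_progress.py, whose internals are not shown in this document; the regular expression, `ProgressItem`, and `parse_progress` are assumptions based only on the example lines above.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Matches "- [x] (2025-01-20 13:00Z) text" with the timestamp optional.
PROGRESS_LINE = re.compile(
    r"^- \[(?P<mark>[ xX])\] "
    r"(?:\((?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2})Z\) )?"
    r"(?P<text>.+)$"
)


@dataclass
class ProgressItem:
    done: bool
    timestamp: Optional[datetime]
    text: str


def parse_progress(lines):
    """Turn checkbox lines into ProgressItem records, skipping other lines."""
    items = []
    for line in lines:
        match = PROGRESS_LINE.match(line.strip())
        if not match:
            continue
        stamp = match.group("stamp")
        when = (
            datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
            if stamp
            else None
        )
        items.append(
            ProgressItem(match.group("mark").lower() == "x", when, match.group("text"))
        )
    return items


SAMPLE = """\
- [x] (2025-01-20 13:00Z) Example completed step.
- [ ] Example incomplete step.
- [ ] Example partially completed step (completed: X; remaining: Y).
"""

items = parse_progress(SAMPLE.splitlines())
for item in items:
    print(item.done, item.timestamp, item.text)
```

Keeping the timestamp in a fixed UTC format is what makes this kind of mechanical extraction possible.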

Surprises & Discoveries

When you discover optimizer behavior, performance tradeoffs, unexpected bugs, or inverse/unapply semantics that shaped your approach, capture those observations in the Surprises & Discoveries section with short evidence snippets (test output is ideal).

    - Observation: jsonwebtoken.verify() throws on expired tokens instead of returning null
      Evidence: TokenExpiredError: jwt expired

Decision Log

Record every decision made while working on the plan in the format:

    - Decision: Use HS256 algorithm instead of RS256
      Rationale: Single-server deployment; no cross-service verification needed
      Date/Author: 2025-01-20

If you change course mid-implementation, document why in the Decision Log and reflect the implications in Progress. Plans are guides for the next contributor as much as checklists for you.

Outcomes & Retrospective

At completion of a major task or the full plan, write an Outcomes & Retrospective entry summarizing what was achieved, what remains, and lessons learned.

Prototyping Milestones and Parallel Implementations

It is acceptable—and often encouraged—to include explicit prototyping milestones when they de-risk a larger change. Examples: adding a low-level operator to a dependency to validate feasibility, or exploring two composition orders while measuring optimizer effects. Keep prototypes additive and testable. Clearly label the scope as "prototyping"; describe how to run and observe results; and state the criteria for promoting or discarding the prototype.

Prefer additive code changes followed by subtractions that keep tests passing. Parallel implementations (e.g., keeping an adapter alongside an older path during migration) are fine when they reduce risk or enable tests to continue passing during a large migration. Describe how to validate both paths and how to retire one safely with tests.
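Validating both paths during a migration can be as simple as running the old and new implementations over the same inputs and listing any disagreements. The sketch below uses two hypothetical functions, `slugify_legacy` and `slugify_new`, purely as stand-ins for an older path and its replacement.

```python
import re


def slugify_legacy(title: str) -> str:
    """Hypothetical older path, kept in place during the migration."""
    return "-".join(title.lower().split())


def slugify_new(title: str) -> str:
    """Hypothetical replacement path, added alongside the old one."""
    return re.sub(r"\s+", "-", title.strip().lower())


def find_disagreements(cases):
    """Return the inputs on which the two paths produce different results."""
    return [case for case in cases if slugify_legacy(case) != slugify_new(case)]


CASES = ["Add JWT Auth", "  Leading and trailing  ", "Tabs\tand\nnewlines", ""]
disagreements = find_disagreements(CASES)
print(disagreements)
```

An empty list is the evidence that the old path can be retired; a non-empty list names the exact inputs to investigate first.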

When working with multiple new libraries or feature areas, consider creating spikes that evaluate the feasibility of these features INDEPENDENTLY of one another, proving that the external library performs as expected and implements the features we need in isolation.

Formatting

Format and envelope are simple and strict. Each ExecPlan must be one Markdown file. Use proper Markdown syntax: use # and ## and so on, correct syntax for ordered and unordered lists, and two newlines after every heading.

Write in plain prose. Prefer sentences over lists. Avoid checklists, tables, and long enumerations unless brevity would obscure meaning. Checklists are permitted only in the Progress section, where they are mandatory. Narrative sections must remain prose-first.

When you need to show commands, transcripts, diffs, or code, present them as indented blocks (4 spaces) within the plan. Use indentation for clarity.
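If transcripts are captured programmatically, the four-space indentation can be applied with the standard library. This is a small sketch; `as_indented_block` is a hypothetical helper name, and the transcript is a placeholder.

```python
import textwrap


def as_indented_block(transcript: str) -> str:
    """Indent every non-blank line by four spaces for inclusion in a plan."""
    return textwrap.indent(transcript.strip("\n"), "    ")


block = as_indented_block("$ echo hello\nhello\n")
print(block)
```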

Creating an ExecPlan

Step 1: Initialize

Create a new plan with proper structure:

    python scripts/init_plan.py .plans/feature-name.md "Short Title Describing the Work"

Or with a purpose statement:

    python scripts/init_plan.py .plans/feature-name.md "Add JWT Auth" --purpose "Enable users to log in"

Step 2: Research

Before filling in the plan:

•List files and directories to understand current state
•Read relevant source files to understand existing patterns
•Search for related code, tests, and documentation
•Identify unknowns and risks
•If external libraries are involved, read their source or documentation

Step 3: Author

Fill in each section of the generated plan. Key requirements:

•Purpose must state observable user-visible behavior
•Context must name files with full repository-relative paths
•Every term of art must be defined with its manifestation in this repo
•Milestones must have specific acceptance criteria with commands and expected output

See references/authoring-guide.md for detailed requirements on each section.

Step 4: Validate

Before starting implementation, run the validator:

    python scripts/validate_plan.py .plans/feature-name.md

The validator checks for:

•Missing required sections (Purpose, Progress, Surprises, Decision Log, Outcomes, Context, Validation)
•External references ("as discussed", "see the architecture doc", "per the RFC")
•Unresolved placeholders ("TBD", "to be determined")
•Vague instructions ("as needed", "configure appropriately")
•Outsourced decisions ("use best judgment", "choose an appropriate")
•Vague acceptance ("tests should pass" without specifics, "should work correctly")
•Progress items without timestamps
•Jargon without definitions

Fix all errors. Review warnings. Re-validate until clean.
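To give a feel for this kind of check, the following sketch scans plan text for some of the phrases listed above. It is not the code of scripts/validate_plan.py, which this document does not show; the `PATTERNS` table and `find_failures` function are assumptions built only from the example phrases in the list.

```python
import re

# Phrase lists taken from the examples above; a real validator may use others.
PATTERNS = {
    "external reference": ["as discussed", "see the architecture doc", "per the RFC"],
    "unresolved placeholder": ["TBD", "to be determined"],
    "vague instruction": ["as needed", "configure appropriately"],
    "outsourced decision": ["use best judgment", "choose an appropriate"],
}


def find_failures(plan_text: str):
    """Return (line_number, category, phrase) for each flagged phrase."""
    findings = []
    for number, line in enumerate(plan_text.splitlines(), start=1):
        for category, phrases in PATTERNS.items():
            for phrase in phrases:
                pattern = rf"\b{re.escape(phrase)}\b"
                if re.search(pattern, line, flags=re.IGNORECASE):
                    findings.append((number, category, phrase))
    return findings


SAMPLE_PLAN = "Configure the cache as needed.\nRun the tests.\nTimeout: TBD\n"
findings = find_failures(SAMPLE_PLAN)
for finding in findings:
    print(finding)
```

Reporting the line number alongside the category makes each finding directly actionable when revising the plan.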

For strict validation (treats warnings as errors):

    python scripts/validate_plan.py .plans/feature-name.md --strict

Executing an ExecPlan

Check Progress

When resuming work, check where things left off:

    python scripts/parse_progress.py .plans/feature-name.md

This shows completed items with timestamps, remaining work, partially completed items, and what to do next.

Autonomous Execution

Once execution begins, DO NOT prompt the user for "next steps" or "should I continue." Simply proceed to the next milestone until completion or a blocking problem.

At Every Stopping Point

•Update Progress with completed steps (checked, with timestamps) and remaining steps (unchecked)
•If a step was partially completed, split it into "done" and "remaining"
•Record any discoveries in Surprises & Discoveries with evidence
•Record any decisions in Decision Log with rationale
•Commit if appropriate

Handling Ambiguity

Resolve ambiguities autonomously:

•Choose a path and document the rationale in the Decision Log
•Prefer additive, testable changes
•If two approaches are viable, note both and explain the choice
•If an approach fails, document why and pivot

Commit Discipline

Commit frequently:

•After each milestone completion
•After significant Progress updates
•Before risky operations
•With clear messages referencing the ExecPlan

Updating a Plan

When you revise a plan, you must ensure your changes are comprehensively reflected across all sections, including the living document sections, and you must write a note at the bottom of the plan describing the change and the reason why. ExecPlans must describe not just the what but the why for almost everything.

After revisions, re-run the validator to ensure no regressions:

    python scripts/validate_plan.py .plans/feature-name.md

The Bar

If you follow the guidance above, a single, stateless agent—or a human novice—can read your ExecPlan from top to bottom and produce a working, observable result. That is the bar:

SELF-CONTAINED, SELF-SUFFICIENT, NOVICE-GUIDING, OUTCOME-FOCUSED.

Bundled Scripts

`scripts/init_plan.py`

Creates a new ExecPlan with proper structure, all required sections, and timestamps:

    python scripts/init_plan.py <output.md> "Title" [--purpose "Purpose statement"]

Always use this instead of manually copying templates.

`scripts/validate_plan.py`

Validates plan structure and detects common failure modes:

    python scripts/validate_plan.py <plan.md>          # Normal mode
    python scripts/validate_plan.py <plan.md> --strict # Treat warnings as errors

Fix all errors before starting implementation. Re-validate after major revisions.

`scripts/parse_progress.py`

Extracts and summarizes the Progress section:

    python scripts/parse_progress.py <plan.md>        # Human-readable summary
    python scripts/parse_progress.py <plan.md> --json # Machine-readable output

Use when resuming work to see status and next steps.

Bundled References

•references/authoring-guide.md - Detailed requirements for each section of an ExecPlan
•references/failure-modes.md - Common failure patterns with detailed examples and fixes
•assets/TEMPLATE.md - Manual template (prefer scripts/init_plan.py instead)