Long-Running Agent Framework

Framework for enabling AI agents to work effectively across many context windows on complex tasks.

Core Problem

Long-running agents must work in discrete sessions where each new session begins with no memory of previous work. Without proper scaffolding, agents tend to:

•One-shot attempts - Try to complete everything at once, running out of context mid-implementation
•Premature completion - See partial progress and declare the job done
•Broken handoffs - Leave code in a broken or undocumented state between sessions

Two-Agent Solution

1. Initializer Agent (First Session Only)

Sets up the environment with all context future agents need:

•Create init.sh script for environment setup
•Generate comprehensive feature_list.json with all requirements
•Initialize claude-progress.txt for session logging
•Make initial git commit

See references/initializer-prompt.md for the full prompt template.
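
The setup steps above can be sketched in Python as follows. This is a minimal illustration, not the initializer prompt itself: the function name `scaffold_project`, the progress-log header text, and the stub body of init.sh are all assumptions made for the example. The initial git commit is left out and would still need to be made afterwards.

```python
import json
import stat
from pathlib import Path


def scaffold_project(root: Path, features: list[dict]) -> None:
    """Create the files the initializer agent leaves for later sessions.

    `features` is a list of dicts with category/description/steps keys;
    the layout mirrors the feature_list.json example in this document.
    """
    root.mkdir(parents=True, exist_ok=True)

    # feature_list.json: every feature starts out failing.
    entries = [{**feature, "passes": False} for feature in features]
    (root / "feature_list.json").write_text(
        json.dumps({"features": entries}, indent=2) + "\n"
    )

    # claude-progress.txt: starts nearly empty; each session appends to it.
    # The header line is an assumption, not a required format.
    (root / "claude-progress.txt").write_text("Progress log\n\n")

    # init.sh: hypothetical stub; the real commands are project-specific.
    init_script = root / "init.sh"
    init_script.write_text(
        "#!/usr/bin/env bash\n"
        "set -euo pipefail\n"
        "# TODO: install dependencies and start development servers\n"
    )
    # Mark the script executable for the owner so ./init.sh works.
    init_script.chmod(init_script.stat().st_mode | stat.S_IXUSR)
```

Forcing every entry to start with `passes` set to false reflects the rule that a feature is marked passing only after end-to-end verification.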

2. Coding Agent (Every Subsequent Session)

Makes incremental progress while maintaining clean state:

•Read progress files and git logs to get bearings
•Run basic tests to verify working state
•Work on ONE feature at a time
•Test end-to-end before marking complete
•Commit progress with descriptive messages
•Update progress file

See references/coding-prompt.md for the full prompt template.

Session Startup Sequence

Every coding agent session should begin with the following steps, in order:

1. pwd                              # Understand working directory
2. cat claude-progress.txt          # Read recent progress
3. cat feature_list.json            # Check feature status
4. git log --oneline -20            # Review recent commits
5. ./init.sh                        # Start dev environment
6. <run basic test>                 # Verify app works
7. <select next feature>            # Choose one failing feature
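
Step 7 can be sketched as a small helper. This is one possible selection policy, assuming that features are taken in file order and that a missing `passes` field counts as failing; neither is prescribed by the framework. The name `next_failing_feature` is hypothetical.

```python
import json
from typing import Optional


def next_failing_feature(feature_list_json: str) -> Optional[dict]:
    """Return the first feature that is not yet passing, or None if all pass.

    Takes the text of feature_list.json. "First in file order" is an
    assumed policy; the framework only says to choose one failing feature.
    """
    data = json.loads(feature_list_json)
    for feature in data["features"]:
        # A missing "passes" field is treated as failing (assumption).
        if not feature.get("passes", False):
            return feature
    return None
```

A session would call it with the contents of feature_list.json and work only on the returned feature. A `None` result means there are no failing features left.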

Key Files

feature_list.json

Comprehensive list of all features, each with a pass/fail status. It is stored as JSON rather than free-form text to discourage agents from rewriting or reorganizing entries inappropriately.

{
  "features": [
    {
      "category": "functional",
      "description": "User can create new chat",
      "steps": ["Navigate to main", "Click New Chat", "Verify creation"],
      "passes": false
    }
  ]
}

Template: assets/feature_list_template.json
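
A structural check can catch a malformed feature list before a session relies on it. The sketch below assumes every entry carries the same four fields as the example above, with the types shown there. The actual template may define more, and `validate_feature_list` is a hypothetical name.

```python
# Field names and types taken from the example entry in this document.
REQUIRED_FIELDS = {
    "category": str,
    "description": str,
    "steps": list,
    "passes": bool,
}


def validate_feature_list(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    features = data.get("features")
    if not isinstance(features, list):
        return ['top-level "features" must be a list']

    problems = []
    for index, feature in enumerate(features):
        if not isinstance(feature, dict):
            problems.append(f"feature {index}: entry must be an object")
            continue
        for name, expected_type in REQUIRED_FIELDS.items():
            if name not in feature:
                problems.append(f"feature {index}: missing field {name!r}")
            elif not isinstance(feature[name], expected_type):
                problems.append(
                    f"feature {index}: {name!r} should be {expected_type.__name__}"
                )
    return problems
```

Requiring `passes` to be a real boolean rejects values such as the string "false", which would otherwise be read as true.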

claude-progress.txt

Session-by-session log of work completed. Each entry includes:

•Session timestamp
•Features worked on
•Changes made
•Current state
•Next steps

Template: assets/progress_template.md
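
The entry fields listed above could be rendered as in the sketch below. The text layout is an assumption for illustration, since the real layout lives in the template file, and `format_progress_entry` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def format_progress_entry(
    features: list[str],
    changes: list[str],
    current_state: str,
    next_steps: list[str],
    timestamp: Optional[datetime] = None,
) -> str:
    """Render one session entry as plain text, ending with a blank line."""
    # Default to the current time; normalize to UTC so entries sort consistently.
    when = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)

    def bullets(items: list[str]) -> list[str]:
        # Show an explicit placeholder rather than an empty section.
        return [f"  - {item}" for item in items] or ["  - (none)"]

    lines = [
        f"Session: {when:%Y-%m-%d %H:%M} UTC",
        "Features worked on:",
        *bullets(features),
        "Changes made:",
        *bullets(changes),
        f"Current state: {current_state}",
        "Next steps:",
        *bullets(next_steps),
        "",
    ]
    return "\n".join(lines) + "\n"
```

Opening claude-progress.txt in append mode (`open(path, "a")`) and writing the returned string keeps earlier sessions' entries intact.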

init.sh

Environment setup script that:

•Installs dependencies
•Starts development servers
•Sets up any required services

Critical Rules

For Feature List

•Never remove features or edit their descriptions or test steps
•Change only the passes field
•Mark as passing ONLY after end-to-end verification
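
These rules can be enforced mechanically by comparing the feature list before and after a session. The sketch below assumes entries keep their original order, so a reordering is reported as an edit. The function name `illegal_feature_edits` is hypothetical.

```python
def illegal_feature_edits(before: dict, after: dict) -> list[str]:
    """Report changes to the feature list other than flipping "passes".

    Both arguments are parsed feature_list.json documents. Entries are
    compared position by position (assumes order is never changed).
    """
    old, new = before["features"], after["features"]
    problems = []

    if len(new) != len(old):
        problems.append(f"feature count changed from {len(old)} to {len(new)}")

    for index, (old_entry, new_entry) in enumerate(zip(old, new)):
        # Ignore "passes" on both sides; everything else must be identical.
        old_fixed = {k: v for k, v in old_entry.items() if k != "passes"}
        new_fixed = {k: v for k, v in new_entry.items() if k != "passes"}
        if old_fixed != new_fixed:
            problems.append(
                f"feature {index}: fields other than 'passes' were modified"
            )
    return problems
```

One way to obtain the "before" version is `git show HEAD:feature_list.json`, which prints the last committed copy of the file.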

For Progress Tracking

•Always commit before session end
•Write descriptive commit messages
•Update progress file with summary
•Leave the code in a mergeable state
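
The "always commit before session end" rule can be checked with `git status --porcelain`, which prints one line per changed or untracked path and nothing when the working tree is clean. The helper names below are hypothetical, and the sketch assumes git is installed and that `repo_dir` is inside a repository.

```python
import subprocess


def uncommitted_paths(porcelain_output: str) -> list[str]:
    """Parse `git status --porcelain` output into the listed paths.

    Each line is a two-character status code, a space, then the path
    (renames appear as "old -> new" and are returned as-is).
    """
    return [line[3:] for line in porcelain_output.splitlines() if line.strip()]


def working_tree_is_clean(repo_dir: str = ".") -> bool:
    """Return True when git reports no modified, staged, or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. outside a repository
    )
    return not uncommitted_paths(result.stdout)
```

A session wrapper could refuse to end while `working_tree_is_clean()` returns False, and could list the offending paths in the progress file.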

For Testing

•Use browser automation for web apps (Puppeteer MCP)
•Test as a human user would
•Verify end-to-end, not just unit tests
•Document any known limitations

Common Failure Modes & Solutions

Problem                            Solution
Agent one-shots entire project     Create a detailed feature list; work on one feature at a time
Declares victory too early         Check feature_list.json for features still failing
Leaves broken state                Run a basic test at session start and fix failures first
Marks features done prematurely    Require end-to-end browser testing
Wastes time figuring out setup     Read init.sh and follow established patterns

Adapting to Other Domains

This framework generalizes beyond web development. Key principles:

•Comprehensive task decomposition - Break work into testable units
•Progress persistence - Maintain state across sessions
•Incremental verification - Test after each change
•Clean handoffs - Leave work in resumable state