Long-Running Agent Framework
Framework for enabling AI agents to work effectively across many context windows on complex tasks.
Core Problem
Long-running agents must work in discrete sessions where each new session begins with no memory of previous work. Without proper scaffolding, agents tend to:
- •One-shot attempts - Try to complete everything at once, running out of context mid-implementation
- •Premature completion - See partial progress and declare the job done
- •Undocumented states - Leave code in broken or undocumented states between sessions
Two-Agent Solution
1. Initializer Agent (First Session Only)
Sets up the environment with all context future agents need:
- •Create
init.shscript for environment setup - •Generate comprehensive
feature_list.jsonwith all requirements - •Initialize
claude-progress.txtfor session logging - •Make initial git commit
See references/initializer-prompt.md for the full prompt template.
2. Coding Agent (Every Subsequent Session)
Makes incremental progress while maintaining clean state:
- •Read progress files and git logs to get bearings
- •Run basic tests to verify working state
- •Work on ONE feature at a time
- •Test end-to-end before marking complete
- •Commit progress with descriptive messages
- •Update progress file
See references/coding-prompt.md for the full prompt template.
Session Startup Sequence
Every coding agent session should begin:
1. pwd # Understand working directory 2. cat claude-progress.txt # Read recent progress 3. cat feature_list.json # Check feature status 4. git log --oneline -20 # Review recent commits 5. ./init.sh # Start dev environment 6. <run basic test> # Verify app works 7. <select next feature> # Choose one failing feature
Key Files
feature_list.json
Comprehensive list of all features with pass/fail status. Use JSON format to prevent inappropriate edits.
{
"features": [
{
"category": "functional",
"description": "User can create new chat",
"steps": ["Navigate to main", "Click New Chat", "Verify creation"],
"passes": false
}
]
}
Template: assets/feature_list_template.json
claude-progress.txt
Session-by-session log of work completed. Each entry includes:
- •Session timestamp
- •Features worked on
- •Changes made
- •Current state
- •Next steps
Template: assets/progress_template.md
init.sh
Environment setup script that:
- •Installs dependencies
- •Starts development servers
- •Sets up any required services
Critical Rules
For Feature List
- •Never remove or edit test descriptions
- •Only change
passesfield status - •Mark as passing ONLY after end-to-end verification
For Progress Tracking
- •Always commit before session end
- •Write descriptive commit messages
- •Update progress file with summary
- •Leave environment in mergeable state
For Testing
- •Use browser automation for web apps (Puppeteer MCP)
- •Test as a human user would
- •Verify end-to-end, not just unit tests
- •Document any known limitations
Common Failure Modes & Solutions
| Problem | Solution |
|---|---|
| Agent one-shots entire project | Create detailed feature list, work one at a time |
| Declares victory too early | Check feature_list.json for failing tests |
| Leaves broken state | Run basic test at session start, fix first |
| Marks features done prematurely | Require end-to-end browser testing |
| Wastes time figuring out setup | Read init.sh, use established patterns |
Adapting to Other Domains
This framework generalizes beyond web development. Key principles:
- •Comprehensive task decomposition - Break work into testable units
- •Progress persistence - Maintain state across sessions
- •Incremental verification - Test after each change
- •Clean handoffs - Leave work in resumable state