Tech Stack
Frontend
- •React + TypeScript + Vite
- •Tailwind CSS preferred over plain CSS files
- •Text editor: CodeMirror or TipTap recommended over plain textarea
- •react-markdown for LLM response rendering
- •vitest + @testing-library/react for testing
Backend
- •FastAPI + uv + sse-starlette (for real-time streaming)
- •Use pyproject.toml (modern, consolidates config)
- •CORS: Include all potential dev ports (5173, 5174, 3000) - Vite bounces to alternate ports
- •pytest + pytest-asyncio for testing
- •Ruff for linting
LLM Integration
- •claude-agent-sdk (Anthropic's official agent SDK)
- •Simple HTTP request/response preferred over WebSocket streaming
- •Client-side history management - frontend stores conversation, sends full history, backend stays stateless
- •Models: haiku (speed), opus (quality)
Makefile Conventions
Provide a Makefile with standardized commands:
.PHONY: help serve kill restart test lint
help: ## Show this help message
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-15s\033[0m %s\n", $$1, $$2}'
serve: ## Start both frontend and backend
@echo "Starting backend..."
cd backend && uv run uvicorn main:app --port 8000 &
@echo "Starting frontend..."
cd frontend && npm run dev &
@echo "Services started. Use 'make kill' to stop."
kill: ## Stop all services
@pkill -f "uvicorn main:app" 2>/dev/null || true
@pkill -f "vite" 2>/dev/null || true
@echo "Services stopped."
restart: kill serve ## Restart all services
test: ## Run all tests
cd backend && uv run pytest
cd frontend && npm test
lint: ## Run linters
cd backend && uv run ruff check .
cd frontend && npm run lint
Key principles:
- •
make helpas the default/documented entry point - shows all available commands - •
make servestarts full stack with single command - •
make killcleanly stops services - no orphan processes - •Disable hot reload during development - prevents test flakiness
- •Health endpoints return service identification for debugging:
@router.get("/health")
async def health():
return {"status": "healthy", "service": "your-service-name"}
SDK Gotchas (claude-agent-sdk)
Authentication
- •Local auth works automatically - no API key needed when running locally
- •Don't use raw
anthropic.Anthropic()- it requires explicit API key - •Import:
from claude_agent_sdk import query, ClaudeAgentOptions, ResultMessage
Two Usage Patterns
Pattern 1: Text-Only Mode (for simple completions)
options = ClaudeAgentOptions(
model="claude-sonnet-4-20250514",
system_prompt=system_prompt,
max_turns=1,
allowed_tools=[], # Disables tools, forces text output
)
result_text = ""
async for message in query(prompt=prompt, options=options):
if isinstance(message, ResultMessage):
result_text = message.result or ""
Pattern 2: Agent Mode (for file operations, testing, iteration)
options = ClaudeAgentOptions(
model="claude-sonnet-4-20250514",
system_prompt=system_prompt,
max_turns=None, # Unlimited turns
cwd=str(working_dir),
permission_mode="acceptEdits",
allowed_tools=["Read", "Glob", "Grep", "Write", "Edit", "WebFetch", "Bash"],
add_dirs=[str(working_dir)],
)
Tool Behavior (CRITICAL)
- •Tools enabled = different output: When tools are available, Claude may USE them rather than OUTPUT text
- •Example: "Create index.html" → Claude writes the file via Write tool,
ResultMessage.resultis empty - •To get text output, use
allowed_tools=[] - •To let Claude create files, enable tools + set
cwdandpermission_mode
Message Type Checking
from claude_agent_sdk import (
AssistantMessage, ResultMessage,
TextBlock, ToolUseBlock, ToolResultBlock
)
async for message in query(prompt=prompt, options=options):
if isinstance(message, AssistantMessage):
for block in message.content:
if isinstance(block, ToolUseBlock):
tool_name = block.name
tool_input = block.input or {}
elif isinstance(block, TextBlock):
text = block.text
elif isinstance(message, ResultMessage):
result = message.result
NEVER use attribute-based type checking like block.type == 'tool_use' - fails silently.
General Gotchas
- •Never break early from async generator - causes cancel scope errors
- •Consume entire generator even if you have the result
- •Empty result with tools - If
ResultMessage.resultis empty, Claude likely completed an action via tools
LLM Output Validation (Early Return Pattern)
When LLM output requires parsing, validate before proceeding:
# Parse structured output from LLM
analysis = await self._analyze_codebase(session_id, path, question)
# Log raw output for debugging
logger.info(
"Analysis complete",
session_id=session_id,
analysis_length=len(analysis),
analysis_preview=analysis[:500] if analysis else "EMPTY",
)
# Parse and validate
variants = parse_xml_tags(analysis, "variant")
logger.info("Parsed results", num_variants=len(variants))
# CRITICAL: Early return on validation failure
if len(variants) == 0:
logger.error("No variants found", analysis_output=analysis[:1000])
await activity_stream.emit(session_id, "session_error",
"Analysis failed to produce results. Check logs.")
return LaunchResponse(session_id=session_id, variants=[],
error="Analysis produced no results")
# Only proceed if validation passed
await activity_stream.emit(session_id, "analysis_complete",
f"Found {len(variants)} variants")
Key principle: Emit error event, then return early. Never continue a workflow with empty/invalid LLM output.
Parsing Structured LLM Output
When LLMs output structured formats (XML, JSON), use proper parsing libraries:
XML Output - Use ElementTree
DON'T use regex for XML parsing:
# Fragile - breaks on whitespace, special chars, nested tags
for match in re.finditer(r'<(\w+)>(.*?)</\1>', response):
axis = match.group(1)
content = match.group(2)
DO use xml.etree.ElementTree:
import xml.etree.ElementTree as ET
# Extract XML block from LLM response (may have surrounding text)
xml_match = re.search(r'<root>(.*?)</root>', response, re.DOTALL)
if not xml_match:
logger.warning("No XML block found")
return []
try:
root = ET.fromstring(f"<wrapper>{xml_match.group(1)}</wrapper>")
for item in root.findall('item'):
data = {
"id": item.get('id'), # Attributes
"name": item.findtext('name'), # Child element text
}
# Dynamic child parsing
for child in item:
if child.text:
data[child.tag] = child.text
except ET.ParseError as e:
logger.warning(f"XML parse error: {e}")
# Fallback to basic regex if needed
Benefits:
- •Proper tree traversal for nested structures
- •Automatic whitespace handling
- •Clear errors on malformed XML
- •Dynamic element discovery (no hardcoded tag names)
JSON Output
import json
# Clean markdown code blocks if present
clean = response.strip()
if clean.startswith("```"):
clean = re.sub(r"^```(?:json)?\n?", "", clean)
clean = re.sub(r"\n?```$", "", clean)
try:
data = json.loads(clean)
except json.JSONDecodeError as e:
logger.error(f"JSON parse error: {e}", raw_response=response[:500])
return None
Key principle: LLMs produce text, not data structures. Always parse with proper libraries, log raw output on failure.
Parallel Agent Operations
When running multiple agents in parallel with asyncio.gather:
results = await asyncio.gather(*build_tasks, return_exceptions=True)
successful = 0
failed = 0
for i, result in enumerate(results):
if isinstance(result, Exception):
failed += 1
logger.error("Build failed", variant_id=variants[i].id,
error=str(result), error_type=type(result).__name__)
variants[i].status = "error"
# CRITICAL: Emit to activity stream so errors are visible
await activity_stream.emit(session_id, "builder_error",
f"[{variants[i].id}] Build failed: {str(result)[:200]}",
details={"error": str(result)})
else:
successful += 1
variants[i] = result
logger.info("All builds completed", successful=successful, failed=failed)
Key principle: Errors in return_exceptions=True are silent by default. Emit them to SSE stream for visibility.
Structured Logging
Use structlog for LLM pipelines:
import structlog
structlog.configure(
processors=[
structlog.stdlib.filter_by_level,
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.format_exc_info,
structlog.dev.ConsoleRenderer() if sys.stdout.isatty()
else structlog.processors.JSONRenderer(),
],
wrapper_class=structlog.stdlib.BoundLogger,
logger_factory=structlog.stdlib.LoggerFactory(),
)
logger = structlog.get_logger()
What to log:
# Before LLM call
logger.info("Starting analysis", session_id=session_id, question=question[:100])
# After LLM call
logger.info("Analysis complete", result_length=len(result), has_expected_tags="<variant" in result)
# After parsing
logger.info("Parsed results", num_variants=len(variants))
# After parallel operations
logger.info("All builds completed", successful=successful_count, failed=failed_count)
# On errors
logger.error("Build failed", variant_id=variant.id, error=str(e), error_type=type(e).__name__)
Always include context IDs (session_id, variant_id, etc.) for log correlation.
Prompt Management
Store prompts as composable .txt files:
prompts/
├── system.txt # Base system prompt
├── format.txt # Output format instructions
├── examples.txt # Few-shot examples
└── shared/
└── json-output.txt # Reusable JSON formatting
Principles:
- •Load and compose at request time, not startup - allows changes without restart
- •Wrap chunks in XML tags when composing:
<system>...</system>,<format>...</format> - •Support template variables:
{user_input},{history} - •Name files by API route where used
Testing Philosophy
- •Test LLM apps end-to-end, not just frontend
- •Verify responses are correct, not just that requests succeed
- •Save logs to ./logs folder for debugging test failures
- •For agent testing: choose 2-3 typical scenarios initially (tokens cost money)
- •Exclude long-running agent tests from normal pytest runs
- •Use Playwright for E2E with network/console capture
Sample .gitignore
# Dependencies node_modules/ .pnpm-store/ # Build output dist/ build/ .vite/ # IDE .idea/ *.swp .vscode/ # Environment (NEVER commit API keys) .env .env.local .anthropic_key *.key # Logs logs/ *.log # Testing coverage/ .pytest_cache/ __pycache__/ # Cache .eslintcache .ruff_cache/ # Python .venv/ venv/ .uv/ # Generated *.tmp .annotations/
References
Load these as needed:
- •references/chat-interface.md - Message handling, input behavior, UI/UX patterns for chat applications
- •references/realtime-streaming.md - SSE patterns, early connection, phase-based activity UI
- •references/storage-patterns.md - File-based vs SQL, JSONL for logs, JSON for config