GPU Optimization Orchestrator V3
You are orchestrating N optimization agents for Kokkos CUDA GPU optimization.
Parameters
Extract parameters from $ARGUMENTS (space-separated):
- •N = First argument (default: 24)
- •CHUNK_SIZE = Second argument (default: 4)
- •BUILD_JOBS = Third argument (default: 4) - Compile with
-j{BUILD_JOBS} - •TIMEOUT = Fourth argument (default: 1800) - Per-agent timeout in seconds (30 min default)
- •LOG_DIR = Fifth argument (default:
./optim_logs) - Directory for agent logs
Example: /optim-orchestrator 24 4 4 1800 ./logs → 24 agents, chunks of 4, -j4, 30min timeout, custom logs
Key Changes in V3
- •Session ID: Unique timestamp per run for isolation
- •Build from Scratch: Always clean build directories before starting
- •Smart Monitoring: Progress checks every 60s, not on every action
- •Worktree Management: Reset worktrees to clean state at session start
Auto-detected
- •SESSION_ID: Unique timestamp for this run (e.g.,
20260123_143000) - •GPU_ARCH: Auto-detected with
nvidia-smi -L - •GPU_NAME: Auto-detected for reporting
Context
- •Repository: Current directory (must be subsetix_kokkos)
- •Target file:
experimental/include/experimental/subsetix/csr/set_algebra/optimized.hpp - •Baseline:
baseline.hpp(NEVER modify) - •Benchmark target: 3D Large (~5M rows)
Phase 0: Setup & Session Initialization
bash
# Get parameters
PARAMS=($ARGUMENTS)
N_AGENTS=${PARAMS[0]:-24}
CHUNK_SIZE=${PARAMS[1]:-4}
BUILD_JOBS=${PARAMS[2]:-4}
TIMEOUT=${PARAMS[3]:-1800}
LOG_DIR=${PARAMS[4]:-"./optim_logs"}
# Create session with unique ID
SESSION_ID=$(date +%Y%m%d_%H%M%S)
SESSION_LOG_DIR="$LOG_DIR/session_${SESSION_ID}"
mkdir -p "$SESSION_LOG_DIR"
# Detect GPU
GPU_INFO=$(nvidia-smi -L 2>/dev/null | head -1)
GPU_NAME=$(echo "$GPU_INFO" | sed 's/GPU 0: //;s/(UUID.*//;s/ *$//')
# Map GPU name to Kokkos architecture
case "$GPU_NAME" in
*RTX*40*|*4070*|*4060*) GPU_ARCH="ADA89" ;;
*RTX*30*|*3050*) GPU_ARCH="AMPERE86" ;;
*RTX*20*|*2080*) GPU_ARCH="TURING75" ;;
*A100*) GPU_ARCH="AMPERE80" ;;
*H100*) GPU_ARCH="HOPPER90" ;;
*) GPU_ARCH="AMPERE86" ;; # Default fallback
esac
# Session log file
LOG_FILE="$SESSION_LOG_DIR/orchestrator.log"
echo "=== GPU Optimization Orchestrator V3 ===" | tee "$LOG_FILE"
echo "Session ID: $SESSION_ID" | tee -a "$LOG_FILE"
echo "GPU: $GPU_NAME ($GPU_ARCH)" | tee -a "$LOG_FILE"
echo "Agents: $N_AGENTS | Chunk: $CHUNK_SIZE | Jobs: $BUILD_JOBS" | tee -a "$LOG_FILE"
echo "Session dir: $SESSION_LOG_DIR" | tee -a "$LOG_FILE"
echo "=====================================" | tee -a "$LOG_FILE"
Phase 1: Prepare Worktrees (Reset to Clean State)
bash
cd /home/sbstndbs/subsetix_kokkos
# Clean up worktrees from previous sessions
for i in $(seq -f "%02g" 1 $N_AGENTS); do
WORKTREE="../subsetix_kokkos_optimized_opt${i}"
# Remove existing worktree
if git worktree list | grep -q "optimized_opt${i}"; then
echo "Removing old worktree optimized_opt${i}..." | tee -a "$LOG_FILE"
git worktree remove "$WORKTREE" --force 2>&1 | tee -a "$LOG_FILE" || true
git branch -D "feature/optimized-opt${i}" 2>/dev/null || true
fi
# Create fresh worktree from current HEAD
echo "Creating fresh worktree optimized_opt${i}..." | tee -a "$LOG_FILE"
git worktree add "$WORKTREE" -b "feature/optimized-opt${i}-session-${SESSION_ID}" >> "$LOG_FILE" 2>&1
# Clean any existing builds
rm -rf "$WORKTREE/build-experimental-cuda"
# Verify optimized.hpp exists
if [ ! -f "$WORKTREE/experimental/include/experimental/subsetix/csr/set_algebra/optimized.hpp" ]; then
echo "ERROR: optimized.hpp not found in $WORKTREE" | tee -a "$LOG_FILE"
exit 1
fi
done
echo "✓ Prepared $N_AGENTS clean worktrees" | tee -a "$LOG_FILE"
Phase 2: Generate N Personas
For each agent, generate a random profile with 6 cursors:
python
import random
def generate_persona(agent_id: int) -> dict:
# Cursor 1: Risk (weighted)
risk = random.choices(
["Conservative", "Moderate", "Aggressive", "Experimental"],
weights=[0.25, 0.40, 0.25, 0.10]
)[0]
# Cursor 2: Expertise
expertise = random.choice([
"KokkosSpecialist", "GPUArchitect", "AlgorithmExpert",
"MemoryArchitect", "SystemsThinker", "ParallelismExpert",
"DataStructureSpecialist"
])
# Cursor 3: OptType
opt_type = random.choice([
"QuickWin", "KokkosPattern", "GPUHwSpecific",
"Algorithmic", "Structural", "MemoryLayout", "LatencyHiding"
])
# Cursor 4: Style (weighted)
style = random.choices(
["Analytical", "Experimental", "Incremental", "Hybrid"],
weights=[0.2, 0.3, 0.2, 0.3]
)[0]
# Cursor 5: Scope
scope = random.choice(["Local", "Regional", "Global"])
# Cursor 6: Innovation (weighted)
innovation = random.choices(
["Proven", "Novel", "Wild"],
weights=[0.4, 0.4, 0.2]
)[0]
return {
"agent_id": f"{agent_id:02d}",
"risk": risk,
"expertise": expertise,
"opt_type": opt_type,
"style": style,
"scope": scope,
"innovation": innovation
}
Save personas to $SESSION_LOG_DIR/personas.json
Phase 3: Launch Agents (Chunks)
Launch N agents in chunks of CHUNK_SIZE using the Task tool.
IMPORTANT - Build from Scratch: Each agent MUST start with a clean build:
bash
# Before building, ensure clean state
rm -rf build-experimental-cuda
cmake --preset experimental-cuda -DKokkos_ARCH_${GPU_ARCH}=ON
cmake --build --preset experimental-cuda -j${BUILD_JOBS}
SMART MONITORING:
- •Check progress every ~60 seconds (not on every action)
- •Look for heartbeat signals in logs
- •If no progress for 120s, consider agent stuck
TIMEOUT HANDLING:
- •Per-agent timeout is generous (default 30 min)
- •Check agent status periodically
- •If timeout exceeded, mark as failed and continue
AGENT TASK TEMPLATE:
text
You are optimization agent {agent_id} with persona:
{persona_json}
WORKTREE: /home/sbstndbs/subsetix_kokkos_optimized_opt{agent_id:02d}
TARGET: experimental/include/experimental/subsetix/csr/set_algebra/optimized.hpp
GPU: $GPU_ARCH
STEPS:
1. Read optimized.hpp and analyze according to your persona
2. Find ONE optimization matching your profile
3. Implement the optimization
4. Clean build: rm -rf build-experimental-cuda
5. Configure: cmake --preset experimental-cuda -DKokkos_ARCH_$GPU_ARCH=ON
6. Build: cmake --build --preset experimental-cuda -j$BUILD_JOBS
7. Test: ctest --preset experimental-cuda --output-on-failure
8. Return JSON result
Return ONLY this JSON format:
{
"agent_id": "{agent_id:02d}",
"persona": {...},
"optimization": {"name": "...", "description": "..."},
"status": "success|build_failed|tests_failed",
"log_file": "..."
}
Phase 4: Monitor and Collect
Smart monitoring loop:
bash
# Check progress every 60 seconds
while true; do
sleep 60
# Check for heartbeat (recent log activity)
for i in $(seq -f "%02g" 1 $N_AGENTS); do
LOG="$SESSION_LOG_DIR/agent_${i}.log"
if [ -f "$LOG" ]; then
LAST_ACTIVITY=$(stat -c %Y "$LOG")
NOW=$(date +%s)
ELAPSED=$((NOW - LAST_ACTIVITY))
# Alert if no activity for 2 minutes
if [ $ELAPSED -gt 120 ]; then
echo "⚠️ Agent $i appears stuck (no activity for ${ELAPSED}s)" | tee -a "$LOG_FILE"
fi
fi
done
# Check if all agents completed
PENDING=$(find "$SESSION_LOG_DIR" -name "agent_*_result.json" 2>/dev/null | wc -l)
if [ $PENDING -eq $N_AGENTS ]; then
break
fi
done
Phase 5: Final Summary
Save results to $SESSION_LOG_DIR/results.json:
json
{
"session_id": "$SESSION_ID",
"orchestrator": "v3",
"gpu_arch": "$GPU_ARCH",
"n_agents": $N_AGENTS,
"successful": 4,
"failed": 0,
"optimizations": [...],
"results": [
{"agent_id": "01", "status": "success", "optimization": {...}},
...
]
}
Return Format
json
{
"orchestrator": "v3",
"session_id": "$SESSION_ID",
"gpu_arch": "$GPU_ARCH",
"n_agents": $N_AGENTS,
"successful": N,
"failed": M,
"session_dir": "$SESSION_LOG_DIR",
"log_file": "$LOG_FILE",
"results_file": "$SESSION_LOG_DIR/results.json",
"next_steps": [
"Run benchmark specialist: /optim-benchmark $N_AGENTS $SESSION_ID",
"Run anti-triche: /optim-antitriche $N_AGENTS $SESSION_ID",
"Generate report: /optim-report $N_AGENTS $SESSION_LOG_DIR"
]
}
Session Directory Structure
code
$LOG_DIR/
└── session_20260123_143000/
├── orchestrator.log # Main orchestrator log
├── personas.json # Generated personas
├── results.json # Final results
├── agent_01.log # Individual agent logs
├── agent_01_result.json # Agent results
├── agent_02.log
├── agent_02_result.json
└── ...