GPU Optimization Pipeline - Full Automation V3

You are the pipeline orchestrator. You run the COMPLETE optimization workflow automatically.

Parameters

Extract parameters from $ARGUMENTS (space-separated):

•N = First argument (default: 24)
•CHUNK_SIZE = Second argument (default: 4)
•BUILD_JOBS = Third argument (default: 4)
•TIMEOUT = Fourth argument (default: 1800)
•LOG_DIR = Fifth argument (default: ./optim_logs)

Example: /optim-pipeline 24 4 4 1800 ./logs

Complete Workflow

You will execute ALL phases automatically:

bash

# Get parameters
PARAMS=($ARGUMENTS)
N_AGENTS=${PARAMS[0]:-24}
CHUNK_SIZE=${PARAMS[1]:-4}
BUILD_JOBS=${PARAMS[2]:-4}
TIMEOUT=${PARAMS[3]:-1800}
LOG_DIR=${PARAMS[4]:-"./optim_logs"}

# Additional pipeline parameters
REPEAT_COUNT=${PARAMS[5]:-10}
COOLDOWN=${PARAMS[6]:-2}
FILTER=${PARAMS[7]:-"3D_Large"}

echo "╔══════════════════════════════════════════════════════════╗"
echo "║     GPU OPTIMIZATION PIPELINE - FULL AUTOMATION          ║"
echo "╚══════════════════════════════════════════════════════════╝"
echo "Agents: $N_AGENTS | Chunk: $CHUNK_SIZE | Jobs: $BUILD_JOBS"
echo "Benchmark: $REPEAT_COUNT reps, ${COOLDOWN}s cooldown"
echo "Filter: $FILTER"
echo "══════════════════════════════════════════════════════════"
echo ""

Phase 1: Orchestrator

Launch optimization agents:

bash

echo "[1/4] 🚀 Launching $N_AGENTS optimization agents..."

# Launch orchestrator skill
ORCH_OUTPUT=$(optim-orchestrator $N_AGENTS $CHUNK_SIZE $BUILD_JOBS $TIMEOUT "$LOG_DIR")

# Extract session ID from output
SESSION_ID=$(echo "$ORCH_OUTPUT" | grep -oP 'Session ID: \K[0-9_]+' || echo "")

if [ -z "$SESSION_ID" ]; then
  echo "❌ Failed to get session ID from orchestrator"
  echo "$ORCH_OUTPUT"
  exit 1
fi

echo "✓ Session ID: $SESSION_ID"
echo ""

Phase 2: Smart Monitoring

Monitor agents with IMPROVED logic:

bash

echo "[2/4] ⏳ Monitoring agents (smart mode)..."

SESSION_LOG_DIR="$LOG_DIR/session_${SESSION_ID}"
START_TIME=$(date +%s)
MAX_WAIT=$((TIMEOUT + 300))  # Timeout + 5min grace period

while true; do
  NOW=$(date +%s)
  ELAPSED=$((NOW - START_TIME))

  # Count completed agents
  COMPLETED=0
  STUCK=0
  BUILD_FAILED=0

  for i in $(seq -f "%02g" 1 $N_AGENTS); do
    RESULT_FILE="$SESSION_LOG_DIR/agent_${i}_result.json"

    if [ -f "$RESULT_FILE" ]; then
      STATUS=$(cat "$RESULT_FILE" | jq -r '.status // "unknown"' 2>/dev/null)
      COMPLETED=$((COMPLETED + 1))

      if [ "$STATUS" = "build_failed" ]; then
        BUILD_FAILED=$((BUILD_FAILED + 1))
      fi
    else
      # Check if stuck (no log activity for 3 minutes)
      LOG="$SESSION_LOG_DIR/agent_${i}.log"
      if [ -f "$LOG" ]; then
        LAST_ACTIVITY=$(stat -c %Y "$LOG")
        LOG_AGE=$((NOW - LAST_ACTIVITY))

        # Stuck if no activity for 3min AND at least one build failed nearby
        if [ $LOG_AGE -gt 180 ] && [ $BUILD_FAILED -gt 0 ]; then
          STUCK=$((STUCK + 1))
        fi
      fi
    fi
  done

  # Progress bar (every 60s only)
  if [ $((ELAPSED % 60)) -eq 0 ]; then
    PROGRESS=$((COMPLETED * 100 / N_AGENTS))
    echo "[$(date +%H:%M:%S)] Progress: $COMPLETED/$N_AGENTS ($PROGRESS%) - Stuck: $STUCK - Build failed: $BUILD_FAILED"

    # Early abort if too many failures
    if [ $BUILD_FAILED -gt $((N_AGENTS / 2)) ]; then
      echo "⚠️  More than 50% builds failed, aborting..."
      break
    fi
  fi

  # Check completion
  if [ $COMPLETED -ge $N_AGENTS ]; then
    echo "✓ All agents completed!"
    break
  fi

  # Timeout check
  if [ $ELAPSED -gt $MAX_WAIT ]; then
    echo "⚠️  Timeout after ${ELAPSED}s"
    break
  fi

  sleep 10  # Check every 10s, print every 60s
done

echo ""

Phase 3: Benchmarks

Run sequential GPU benchmarks:

bash

echo "[3/4] 📊 Running GPU benchmarks..."

# Find worktrees with successful builds
SUCCESSFUL_AGENTS=()
for i in $(seq -f "%02g" 1 $N_AGENTS); do
  WORKTREE="/home/sbstndbs/subsetix_kokkos_optimized_opt${i}"
  if [ -d "$WORKTREE/build-experimental-cuda" ]; then
    RESULT_FILE="$SESSION_LOG_DIR/agent_${i}_result.json"
    if [ -f "$RESULT_FILE" ]; then
      STATUS=$(cat "$RESULT_FILE" | jq -r '.status // "unknown"' 2>/dev/null)
      if [ "$STATUS" = "success" ]; then
        SUCCESSFUL_AGENTS+=("$i")
      fi
    fi
  fi
done

echo "Found ${#SUCCESSFUL_AGENTS[@]} successful builds"

if [ ${#SUCCESSFUL_AGENTS[@]} -eq 0 ]; then
  echo "❌ No successful builds to benchmark"
  exit 1
fi

# Run benchmarks (using the skill)
optim-benchmark $N_AGENTS $SESSION_ID $REPEAT_COUNT $COOLDOWN "$FILTER"

echo "✓ Benchmarks complete"
echo ""

Phase 4: Anti-Triche + Report

bash

echo "[4/4] 🔍 Running anti-triche and generating report..."

# Anti-triche
optim-antitriche $N_AGENTS $SESSION_ID
echo "✓ Anti-triche complete"

# Generate report
optim-report $N_AGENTS $SESSION_ID

echo ""
echo "╔══════════════════════════════════════════════════════════╗"
echo "║              PIPELINE COMPLETE 🎉                       ║"
echo "╚══════════════════════════════════════════════════════════╝"
echo ""
echo "Results: $SESSION_LOG_DIR"
echo "Report: $SESSION_LOG_DIR/optimization_report_*.md"
echo ""

Return Format

json

{
  "pipeline": "full_automation",
  "session_id": "$SESSION_ID",
  "n_agents": $N_AGENTS,
  "completed": $COMPLETED,
  "successful_builds": ${#SUCCESSFUL_AGENTS[@]},
  "session_dir": "$SESSION_LOG_DIR",
  "report_path": "$SESSION_LOG_DIR/optimization_report_*.md"
}

Key Improvements

•Single command - All phases run automatically
•Smart monitoring - 10s checks, 60s progress updates, early abort on failures
•Successful-only benchmarks - Only benchmark agents that built successfully
•Clean output - Progress bar instead of spam

Return ONLY the final JSON summary.