Model Tracking Protocol
Version: 1.0.0 Purpose: MANDATORY tracking protocol for multi-model validation to prevent incomplete reviews Status: Production Ready
Overview
This skill defines the MANDATORY tracking protocol for multi-model validation. It provides templates and procedures that make proper tracking unforgettable.
The Problem This Solves:
Agents often launch multiple external AI models but fail to:
- •Create structured tracking tables before launch
- •Collect timing and performance data during execution
- •Document failures with error messages
- •Perform consensus analysis comparing model findings
- •Present results in a structured format
The Solution:
This skill provides MANDATORY checklists, templates, and protocols that ensure complete tracking. Missing ANY of these steps = INCOMPLETE review.
Table of Contents
- •MANDATORY Pre-Launch Checklist
- •Tracking Table Templates
- •Per-Model Status Updates
- •Failure Documentation Protocol
- •Consensus Analysis Requirements
- •Results Presentation Template
- •Common Failures and Prevention
- •Integration Examples
MANDATORY Pre-Launch Checklist
You MUST complete ALL items before launching ANY external models.
This is NOT optional. If you skip this, your multi-model validation is INCOMPLETE.
Checklist (Copy and Complete)
PRE-LAUNCH VERIFICATION (complete before Task calls): [ ] 1. SESSION_ID created: ________________________ [ ] 2. SESSION_DIR created: ________________________ [ ] 3. Tracking table written to: $SESSION_DIR/tracking.md [ ] 4. Start time recorded: SESSION_START=$(date +%s) [ ] 5. Model list confirmed (comma-separated): ________________________ [ ] 6. Per-model timing arrays initialized [ ] 7. Code context written to session directory [ ] 8. Tracking marker created: /tmp/.claude-multi-model-active If ANY item is unchecked, STOP and complete it before proceeding.
Why Pre-Launch Matters
Without pre-launch setup, you will:
- •Lose timing data (cannot calculate speed accurately)
- •Miss failed model details (no structured place to record)
- •Skip consensus analysis (no model list to compare)
- •Present incomplete results (no tracking table to populate)
Pre-Launch Script Template
CRITICAL CONSENSUS FIX APPLIED: Use file-based detection instead of environment variables.
#!/bin/bash
# Run this BEFORE launching any Task calls
# 1. Create unique session
SESSION_ID="review-$(date +%Y%m%d-%H%M%S)-$(head -c 4 /dev/urandom | xxd -p)"
SESSION_DIR="/tmp/${SESSION_ID}"
mkdir -p "$SESSION_DIR"
# 2. Record start time
SESSION_START=$(date +%s)
# 3. Create tracking table
cat > "$SESSION_DIR/tracking.md" << EOF
# Multi-Model Tracking
## Session Info
- Session ID: ${SESSION_ID}
- Started: $(date -u +%Y-%m-%dT%H:%M:%SZ)
- Models Requested: [FILL]
## Model Status
| Model | Agent ID | Status | Start | End | Duration | Issues | Quality | Notes |
|-------|----------|--------|-------|-----|----------|--------|---------|-------|
| [MODEL 1] | | pending | | | | | | |
| [MODEL 2] | | pending | | | | | | |
| [MODEL 3] | | pending | | | | | | |
## Failures
| Model | Failure Type | Error Message | Retry? |
|-------|--------------|---------------|--------|
## Consensus
| Issue | Model 1 | Model 2 | Model 3 | Agreement |
|-------|---------|---------|---------|-----------|
EOF
# 4. Initialize timing arrays
declare -A MODEL_START_TIMES
declare -A MODEL_END_TIMES
declare -A MODEL_STATUS
# 5. Create tracking marker file (CRITICAL FIX)
# This allows hooks to detect that tracking is active
echo "$SESSION_DIR" > /tmp/.claude-multi-model-active
echo "Pre-launch setup complete. Session: $SESSION_ID"
echo "Directory: $SESSION_DIR"
echo "Tracking table: $SESSION_DIR/tracking.md"
Strict Mode (Optional)
For stricter enforcement, set:
export CLAUDE_STRICT_TRACKING=true
When enabled, hooks will BLOCK execution if tracking is not set up, rather than just warning.
Tracking Table Templates
Template A: Simple Model Tracking (3-5 models)
| Model | Status | Time | Issues | Quality | Cost | |-------|--------|------|--------|---------|------| | claude-embedded | pending | - | - | - | FREE | | x-ai/grok-code-fast-1 | pending | - | - | - | - | | qwen/qwen3-coder:free | pending | - | - | - | FREE |
Update as each completes:
| Model | Status | Time | Issues | Quality | Cost | |-------|--------|------|--------|---------|------| | claude-embedded | success | 32s | 8 | 95% | FREE | | x-ai/grok-code-fast-1 | success | 45s | 6 | 87% | $0.002 | | qwen/qwen3-coder:free | timeout | - | - | - | - |
Template B: Detailed Model Tracking (6+ models)
## Model Execution Status ### Summary - Total Requested: 8 - Completed: 0 - In Progress: 0 - Failed: 0 - Pending: 8 ### Detailed Status | # | Model | Provider | Status | Start | Duration | Issues | Quality | Cost | Error | |---|-------|----------|--------|-------|----------|--------|---------|------|-------| | 1 | claude-embedded | Anthropic | pending | - | - | - | - | FREE | - | | 2 | x-ai/grok-code-fast-1 | X-ai | pending | - | - | - | - | - | - | | 3 | qwen/qwen3-coder:free | Qwen | pending | - | - | - | - | FREE | - | | 4 | google/gemini-3-pro | Google | pending | - | - | - | - | - | - | | 5 | openai/gpt-5.1-codex | OpenAI | pending | - | - | - | - | - | - | | 6 | mistralai/devstral | Mistral | pending | - | - | - | - | FREE | - | | 7 | deepseek/deepseek-r1 | DeepSeek | pending | - | - | - | - | - | - | | 8 | anthropic/claude-sonnet | Anthropic | pending | - | - | - | - | - | - |
Template C: Session-Based Tracking File
Create this file at $SESSION_DIR/tracking.md:
# Multi-Model Validation Tracking
Session: ${SESSION_ID}
Started: ${TIMESTAMP}
## Pre-Launch Verification
- [x] Session directory created: ${SESSION_DIR}
- [x] Tracking table initialized
- [x] Start time recorded: ${SESSION_START}
- [x] Model list: ${MODEL_LIST}
## Model Status
| Model | Status | Start | Duration | Issues | Quality |
|-------|--------|-------|----------|--------|---------|
| claude | pending | - | - | - | - |
| grok | pending | - | - | - | - |
| gemini | pending | - | - | - | - |
## Failures
(populated as failures occur)
## Consensus
(populated after all complete)
Update Protocol
As each model completes, IMMEDIATELY update:
- •Status:
pending->in_progress->success/failed/timeout - •Duration: Calculate from start time
- •Issues: Number of issues found
- •Quality: Percentage if calculable
- •Error: If failed, brief error message
DO NOT wait until all models finish. Update as each completes.
Per-Model Status Update Protocol
IMMEDIATELY After Each Model Completes
Do NOT wait until all models finish. Update tracking AS EACH COMPLETES.
Update Script
# Call this when each model completes
update_model_status() {
local model="$1"
local status="$2"
local issues="${3:-0}"
local quality="${4:-}"
local error="${5:-}"
local end_time=$(date +%s)
local start_time="${MODEL_START_TIMES[$model]}"
local duration=$((end_time - start_time))
# Update arrays
MODEL_END_TIMES["$model"]=$end_time
MODEL_STATUS["$model"]="$status"
# Log update to session tracking file
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) - Model: $model, Status: $status, Duration: ${duration}s" >> "$SESSION_DIR/execution.log"
# Update tracking table (append to tracking.md)
echo "| $model | $status | ${duration}s | $issues | ${quality:-N/A} | ${error:-} |" >> "$SESSION_DIR/tracking.md"
# Track performance in global statistics
if [[ "$status" == "success" ]]; then
track_model_performance "$model" "success" "$duration" "$issues" "$quality"
else
track_model_performance "$model" "$status" "$duration" 0 ""
fi
}
# Usage examples:
update_model_status "claude-embedded" "success" 8 95
update_model_status "x-ai/grok-code-fast-1" "success" 6 87
update_model_status "some-model" "timeout" 0 "" "Exceeded 120s limit"
update_model_status "other-model" "failed" 0 "" "API 500 error"
Status Values
| Status | Meaning | Action |
|---|---|---|
pending | Not started | Wait |
in_progress | Currently executing | Monitor |
success | Completed successfully | Collect results |
failed | Error during execution | Document error |
timeout | Exceeded time limit | Note timeout |
cancelled | User cancelled | Note cancellation |
Real-Time Progress Display
Show user progress as models complete:
Model Status (3/5 complete): ✓ claude-embedded (32s, 8 issues) ✓ x-ai/grok-code-fast-1 (45s, 6 issues) ✓ qwen/qwen3-coder:free (52s, 5 issues) ⏳ openai/gpt-5.1-codex (in progress, 60s elapsed) ⏳ google/gemini-3-pro (in progress, 48s elapsed)
Failure Documentation Protocol
EVERY failed model MUST be documented with:
- •Model name
- •Failure type (timeout, API error, parse error, etc.)
- •Error message (exact or summarized)
- •Whether retry was attempted
Failure Report Template
## Failed Models Report ### Model: x-ai/grok-code-fast-1 - **Failure Type:** API Error - **Error Message:** "500 Internal Server Error from OpenRouter" - **Retry Attempted:** Yes, 1 retry, same error - **Impact:** Review results based on 3/4 models instead of 4 - **Recommendation:** Check OpenRouter status, retry later ### Model: google/gemini-3-pro - **Failure Type:** Timeout - **Error Message:** "Exceeded 120s limit, response incomplete" - **Retry Attempted:** No, time constraints - **Impact:** Lost Gemini perspective, consensus based on remaining models - **Recommendation:** Extend timeout to 180s for this model
Failure Categorization
| Category | Common Causes | Recovery |
|---|---|---|
| Timeout | Model slow, large input, network latency | Retry with extended timeout |
| API Error | Provider down, rate limit, auth issue | Wait and retry, check API status |
| Parse Error | Malformed response, encoding issue | Retry, simplify prompt |
| Auth Error | Invalid API key, expired token | Check credentials |
| Context Limit | Input too large for model | Reduce context, split task |
| Rate Limit | Too many requests | Wait, implement backoff |
Failure Summary Table
Always include this in final results:
## Execution Summary | Metric | Value | |--------|-------| | Models Requested | 8 | | Successful | 5 (62.5%) | | Failed | 3 (37.5%) | ### Failed Models | Model | Failure | Recoverable? | Action | |-------|---------|--------------|--------| | grok-code-fast-1 | API 500 | Yes - retry later | Check OpenRouter status | | gemini-3-pro | Timeout | Yes - extend limit | Use 180s timeout | | deepseek-r1 | Auth Error | No - check key | Verify API key valid |
Writing Failures to Session Directory
# Document failure immediately when it occurs
document_failure() {
local model="$1"
local failure_type="$2"
local error_msg="$3"
local retry_attempted="${4:-No}"
cat >> "$SESSION_DIR/failures.md" << EOF
### Model: $model
- **Failure Type:** $failure_type
- **Error Message:** "$error_msg"
- **Retry Attempted:** $retry_attempted
- **Timestamp:** $(date -u +%Y-%m-%dT%H:%M:%SZ)
EOF
echo "Failure documented: $model ($failure_type)" >&2
}
# Usage:
document_failure "x-ai/grok-code-fast-1" "API Error" "500 Internal Server Error" "Yes, 1 retry"
Consensus Analysis Requirements
After ALL models complete (or max wait time), you MUST perform consensus analysis.
This is NOT optional. Even with 2 successful models, compare their findings.
Minimum Viable Consensus (2 models)
With only 2 models, consensus is simple:
- •AGREE: Both found the same issue
- •DISAGREE: Only one found the issue
| Issue | Model 1 | Model 2 | Consensus | |-------|---------|---------|-----------| | SQL injection | Yes | Yes | AGREE | | Missing validation | Yes | No | Model 1 only | | Weak hashing | No | Yes | Model 2 only |
Standard Consensus (3-5 models)
| Issue | Claude | Grok | Gemini | Agreement | |-------|--------|------|--------|-----------| | SQL injection | Yes | Yes | Yes | UNANIMOUS (3/3) | | Missing validation | Yes | Yes | No | STRONG (2/3) | | Rate limiting | Yes | No | No | DIVERGENT (1/3) |
Extended Consensus (6+ models)
For 6+ models, add summary statistics:
## Consensus Summary - **Unanimous Issues (100%):** 3 issues - **Strong Consensus (67%+):** 5 issues - **Majority (50%+):** 2 issues - **Divergent (<50%):** 4 issues ## Top 5 by Consensus 1. [6/6] SQL injection in search - FIX IMMEDIATELY 2. [6/6] Missing input validation - FIX IMMEDIATELY 3. [5/6] Weak password hashing - RECOMMENDED 4. [4/6] Missing rate limiting - CONSIDER 5. [3/6] Error handling gaps - INVESTIGATE
Consensus Analysis Script
# Perform consensus analysis on all model findings
analyze_consensus() {
local session_dir="$1"
local num_models="$2"
echo "## Consensus Analysis" > "$session_dir/consensus.md"
echo "" >> "$session_dir/consensus.md"
echo "Based on $num_models model reviews:" >> "$session_dir/consensus.md"
echo "" >> "$session_dir/consensus.md"
# Read all review files and extract issues
# (simplified - actual implementation would parse review markdown)
for review in "$session_dir"/*-review.md; do
echo "Processing: $review"
# Extract issues, compare, categorize by agreement level
done
# Calculate consensus levels
echo "### Consensus Levels" >> "$session_dir/consensus.md"
echo "" >> "$session_dir/consensus.md"
echo "- UNANIMOUS: All $num_models models agree" >> "$session_dir/consensus.md"
echo "- STRONG: ≥67% of models agree" >> "$session_dir/consensus.md"
echo "- MAJORITY: ≥50% of models agree" >> "$session_dir/consensus.md"
echo "- DIVERGENT: <50% of models agree" >> "$session_dir/consensus.md"
}
NO Consensus Analysis = INCOMPLETE Review
If you present results without a consensus comparison, your review is INCOMPLETE.
Minimum Requirements:
- •✅ Compare findings across ALL successful models
- •✅ Categorize by agreement level (unanimous, strong, majority, divergent)
- •✅ Prioritize issues by consensus + severity
- •✅ Document in
$SESSION_DIR/consensus.md
Results Presentation Template
Your final output MUST include ALL of these sections.
Required Output Format
## Multi-Model Review Complete
### Execution Summary
| Metric | Value |
|--------|-------|
| Session ID | review-20251224-143052-a3f2 |
| Session Directory | /tmp/review-20251224-143052-a3f2 |
| Models Requested | 5 |
| Successful | 4 (80%) |
| Failed | 1 (20%) |
| Total Duration | 68s (parallel) |
| Sequential Equivalent | 245s |
| Speedup | 3.6x |
### Model Performance
| Model | Time | Issues | Quality | Status | Cost |
|-------|------|--------|---------|--------|------|
| claude-embedded | 32s | 8 | 95% | Success | FREE |
| x-ai/grok-code-fast-1 | 45s | 6 | 87% | Success | $0.002 |
| qwen/qwen3-coder:free | 52s | 5 | 82% | Success | FREE |
| openai/gpt-5.1-codex | 68s | 7 | 89% | Success | $0.015 |
| mistralai/devstral | - | - | - | Timeout | - |
### Failed Models
| Model | Failure | Error |
|-------|---------|-------|
| mistralai/devstral | Timeout | Exceeded 120s limit |
### Top Issues by Consensus
1. **[UNANIMOUS]** SQL injection in search endpoint
- Flagged by: claude, grok, qwen, gpt-5 (4/4)
- Severity: CRITICAL
- Action: FIX IMMEDIATELY
2. **[UNANIMOUS]** Missing input validation
- Flagged by: claude, grok, qwen, gpt-5 (4/4)
- Severity: CRITICAL
- Action: FIX IMMEDIATELY
3. **[STRONG]** Weak password hashing
- Flagged by: claude, grok, gpt-5 (3/4)
- Severity: HIGH
- Action: RECOMMENDED
### Detailed Reports
- Session directory: /tmp/review-20251224-143052-a3f2
- Consolidated review: /tmp/review-20251224-143052-a3f2/consolidated-review.md
- Individual reviews: /tmp/review-20251224-143052-a3f2/{model}-review.md
- Tracking data: /tmp/review-20251224-143052-a3f2/tracking.md
- Consensus analysis: /tmp/review-20251224-143052-a3f2/consensus.md
### Statistics Saved
- Performance data logged to: ai-docs/llm-performance.json
Missing Section Detection
Before presenting, verify ALL sections are present:
verify_output_complete() {
local output="$1"
local required=(
"Execution Summary"
"Model Performance"
"Top Issues"
"Detailed Reports"
"Statistics"
)
local missing=()
for section in "${required[@]}"; do
if ! echo "$output" | grep -q "$section"; then
missing+=("$section")
fi
done
if [ ${#missing[@]} -gt 0 ]; then
echo "ERROR: Missing required sections: ${missing[*]}" >&2
return 1
fi
return 0
}
Checklist before presenting results:
- • Execution Summary (models requested/successful/failed)
- • Model Performance table (per-model times and quality)
- • Failed Models section (if any failed)
- • Top Issues by Consensus (prioritized list)
- • Detailed Reports (session directory, file paths)
- • Statistics confirmation (llm-performance.json updated)
Common Failures and Prevention
Failure 1: No Tracking Table Created
Symptom: Results presented as prose, not structured data
What went wrong:
"I ran 5 models. 3 succeeded and found various issues." (No table, no structure)
Prevention:
- •Always run pre-launch script FIRST
- •Create
$SESSION_DIR/tracking.mdbefore Task calls - •Populate table as models complete
Detection: SubagentStop hook warns if no tracking found
Failure 2: Timing Not Recorded
Symptom: "Duration: unknown" or missing speed stats
What went wrong:
# Launched models without recording start time Task: reviewer1 Task: reviewer2 # No SESSION_START, cannot calculate duration!
Prevention:
# ALWAYS do this first SESSION_START=$(date +%s) MODEL_START_TIMES["model1"]=$SESSION_START
Detection: Hook checks for timing data in output
Failure 3: Failed Models Not Documented
Symptom: "2 of 8 succeeded" with no failure details
What went wrong:
"Launched 8 models. 2 succeeded." (No info on why 6 failed)
Prevention:
# Immediately when model fails document_failure "model-name" "Timeout" "Exceeded 120s" "No"
Detection: Hook checks for failure section when success < total
Failure 4: No Consensus Analysis
Symptom: Individual model results listed without comparison
What went wrong:
"Model 1 found: A, B, C Model 2 found: B, D, E" (No comparison: which issues do they agree on?)
Prevention:
- •After all complete, ALWAYS run consolidation
- •Create consensus table comparing findings
- •Prioritize by agreement level
Detection: Hook checks for consensus keywords
Failure 5: Statistics Not Saved
Symptom: No record in ai-docs/llm-performance.json
What went wrong:
# Forgot to call tracking functions # No record of this session
Prevention:
# ALWAYS call these track_model_performance "model" "status" duration issues quality record_session_stats total success failed parallel sequential speedup
Detection: Hook checks file modification time
Prevention Checklist
Before presenting results, verify:
[ ] Tracking table exists at $SESSION_DIR/tracking.md [ ] Tracking table is populated with all model results [ ] All model times recorded (or "timeout"/"failed" noted) [ ] All failures documented in $SESSION_DIR/failures.md [ ] Consensus analysis performed in $SESSION_DIR/consensus.md [ ] Results match required output format [ ] Statistics saved to ai-docs/llm-performance.json [ ] Session directory contains all artifacts
Integration Examples
Example 1: Complete Multi-Model Review Workflow
#!/bin/bash
# Full multi-model review with complete tracking
# ============================================================================
# PHASE 1: PRE-LAUNCH (MANDATORY)
# ============================================================================
# 1. Create unique session
SESSION_ID="review-$(date +%Y%m%d-%H%M%S)-$(head -c 4 /dev/urandom | xxd -p)"
SESSION_DIR="/tmp/${SESSION_ID}"
mkdir -p "$SESSION_DIR"
# 2. Record start time
SESSION_START=$(date +%s)
# 3. Create tracking table
cat > "$SESSION_DIR/tracking.md" << EOF
# Multi-Model Validation Tracking
## Session: $SESSION_ID
Started: $(date -u +%Y-%m-%dT%H:%M:%SZ)
## Model Status
| Model | Status | Duration | Issues | Quality |
|-------|--------|----------|--------|---------|
EOF
# 4. Initialize timing arrays
declare -A MODEL_START_TIMES
declare -A MODEL_END_TIMES
# 5. Create tracking marker
echo "$SESSION_DIR" > /tmp/.claude-multi-model-active
# 6. Write code context
git diff > "$SESSION_DIR/code-context.md"
echo "Pre-launch complete. Session: $SESSION_ID"
# ============================================================================
# PHASE 2: MODEL EXECUTION (Parallel Task calls)
# ============================================================================
# Record start times for each model
MODEL_START_TIMES["claude-embedded"]=$(date +%s)
MODEL_START_TIMES["x-ai/grok-code-fast-1"]=$(date +%s)
MODEL_START_TIMES["qwen/qwen3-coder:free"]=$(date +%s)
# Launch all models in single message (parallel execution)
# (These would be actual Task calls in practice)
echo "Launching 3 models in parallel..."
# ============================================================================
# PHASE 3: RESULTS COLLECTION (as each completes)
# ============================================================================
# Update status immediately after each completes
update_model_status() {
local model="$1" status="$2" issues="${3:-0}" quality="${4:-}"
local end_time=$(date +%s)
local duration=$((end_time - MODEL_START_TIMES["$model"]))
echo "| $model | $status | ${duration}s | $issues | ${quality:-N/A} |" >> "$SESSION_DIR/tracking.md"
track_model_performance "$model" "$status" "$duration" "$issues" "$quality"
}
# Example completions
update_model_status "claude-embedded" "success" 8 95
update_model_status "x-ai/grok-code-fast-1" "success" 6 87
update_model_status "qwen/qwen3-coder:free" "timeout"
# ============================================================================
# PHASE 4: CONSENSUS ANALYSIS (MANDATORY)
# ============================================================================
# Consolidate and compare findings
echo "Performing consensus analysis..."
# (Would launch consolidation agent here)
# ============================================================================
# PHASE 5: STATISTICS & PRESENTATION
# ============================================================================
# Calculate session stats
PARALLEL_TIME=52 # max of all durations
SEQUENTIAL_TIME=129 # sum of all durations
SPEEDUP=2.5
# Record session
record_session_stats 3 2 1 "$PARALLEL_TIME" "$SEQUENTIAL_TIME" "$SPEEDUP"
# Present results
cat << RESULTS
## Multi-Model Review Complete
Session: $SESSION_ID
Directory: $SESSION_DIR
Models: 3 requested, 2 successful, 1 failed
See tracking table: $SESSION_DIR/tracking.md
See consensus: $SESSION_DIR/consensus.md
Statistics saved to: ai-docs/llm-performance.json
RESULTS
# Cleanup marker
rm -f /tmp/.claude-multi-model-active
Example 2: Minimal 2-Model Comparison
# Simplest viable multi-model validation # Pre-launch SESSION_ID="review-$(date +%s)" SESSION_DIR="/tmp/$SESSION_ID" mkdir -p "$SESSION_DIR" SESSION_START=$(date +%s) echo "$SESSION_DIR" > /tmp/.claude-multi-model-active # Launch echo "Launching Claude + Grok..." # Task: claude-embedded # Task: PROXY_MODE grok # Track track_model_performance "claude" "success" 32 8 95 track_model_performance "grok" "success" 45 6 87 # Consensus echo "Issues both found: SQL injection, missing validation" > "$SESSION_DIR/consensus.md" # Stats record_session_stats 2 2 0 45 77 1.7 # Cleanup rm -f /tmp/.claude-multi-model-active
Example 3: Handling Failures
# Multi-model with failure handling # Pre-launch (same as Example 1) # ... setup code ... # Launch 4 models # ... Task calls ... # Model 1: Success update_model_status "claude" "success" 32 8 95 # Model 2: Success update_model_status "grok" "success" 45 6 87 # Model 3: Timeout update_model_status "gemini" "timeout" document_failure "gemini" "Timeout" "Exceeded 120s limit" "No" # Model 4: API Error update_model_status "gpt5" "failed" document_failure "gpt5" "API Error" "500 from OpenRouter" "Yes, 1 retry" # Proceed with 2 successful models if [ "$SUCCESS_COUNT" -ge 2 ]; then echo "Proceeding with $SUCCESS_COUNT successful models" # Consensus with partial data else echo "ERROR: Only $SUCCESS_COUNT succeeded, need minimum 2" fi
Integration with Other Skills
With multi-model-validation
The multi-model-validation skill defines the execution patterns (4-Message Pattern, parallel execution, proxy mode). This skill (model-tracking-protocol) defines the tracking infrastructure.
Use together:
skills: orchestration:multi-model-validation, orchestration:model-tracking-protocol
Workflow:
- •Read
multi-model-validationfor execution patterns - •Read
model-tracking-protocolfor tracking setup - •Pre-launch (tracking protocol)
- •Execute (validation patterns)
- •Track (protocol updates)
- •Present (protocol templates)
With quality-gates
Use quality gates to ensure tracking is complete before proceeding:
# After tracking setup, verify completeness if [ ! -f "$SESSION_DIR/tracking.md" ]; then echo "QUALITY GATE FAILED: No tracking table" exit 1 fi # Before presenting results, verify all sections present verify_output_complete "$OUTPUT" || exit 1
With todowrite-orchestration
Track progress through multi-model phases:
TodoWrite: 1. Pre-launch setup (tracking protocol) 2. Launch models (validation patterns) 3. Collect results (tracking updates) 4. Consensus analysis (protocol requirement) 5. Present results (protocol template)
Quick Reference
File-Based Tracking Marker (CONSENSUS FIX)
Create marker after pre-launch setup:
echo "$SESSION_DIR" > /tmp/.claude-multi-model-active
Check if tracking active (in hooks):
if [[ -f /tmp/.claude-multi-model-active ]]; then SESSION_DIR=$(cat /tmp/.claude-multi-model-active) [[ -f "$SESSION_DIR/tracking.md" ]] && echo "Tracking active" fi
Remove marker when done:
rm -f /tmp/.claude-multi-model-active
Pre-Launch Commands
SESSION_ID="review-$(date +%Y%m%d-%H%M%S)-$(head -c 4 /dev/urandom | xxd -p)"
SESSION_DIR="/tmp/${SESSION_ID}"
mkdir -p "$SESSION_DIR"
SESSION_START=$(date +%s)
echo "$SESSION_DIR" > /tmp/.claude-multi-model-active
Tracking Commands
update_model_status "model" "status" issues quality document_failure "model" "type" "error" "retry" track_model_performance "model" "status" duration issues quality record_session_stats total success failed parallel sequential speedup
Verification Commands
verify_output_complete "$OUTPUT" [ -f "$SESSION_DIR/tracking.md" ] && echo "Tracking exists" [ -f ai-docs/llm-performance.json ] && echo "Statistics saved"
Summary
This skill provides MANDATORY tracking infrastructure for multi-model validation:
- •Pre-Launch Checklist - 8 items to complete before launching models
- •Tracking Tables - Templates for 3-5 models and 6+ models
- •Status Updates - Per-model completion tracking
- •Failure Documentation - Required format for all failures
- •Consensus Analysis - Comparing findings across models
- •Results Template - Required output format
- •Common Failures - Prevention strategies
- •Integration Examples - Complete workflows
Key Innovation: File-based tracking marker (/tmp/.claude-multi-model-active) allows hooks to detect active tracking without relying on environment variables.
Use this skill when: Running 2+ external AI models in parallel for validation, review, or consensus analysis.
Missing tracking = INCOMPLETE validation.