QuantConnect Validation Skill (Phase 5)
Purpose: Walk-forward validation for Phase 5 robustness testing before deployment.
Progressive Disclosure: This primer contains essentials only. Full details are available via python SCRIPTS/qc_validate.py --help.
When to Use This Skill
Load when:
- •Running /qc-validate command
- •Testing out-of-sample performance
- •Evaluating strategy robustness
- •Making deployment decisions
Tool: Use python SCRIPTS/qc_validate.py for walk-forward validation
Walk-Forward Validation Overview
Purpose: Detect overfitting and ensure strategy generalizes to new data.
Approach:
- •Training (in-sample): Develop/optimize on 80% of data
- •Testing (out-of-sample): Validate on remaining 20%
- •Compare: Measure performance degradation
Example (5-year backtest 2019-2023):
- •In-sample: 2019-2022 (4 years) - Training period
- •Out-of-sample: 2023 (1 year) - Testing period
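The 80/20 split above can be sketched as a simple date calculation. This is a minimal illustration that assumes the split is made by calendar days; the function name `split_date_range` is hypothetical, and qc_validate.py may compute its boundaries differently.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, train_fraction: float = 0.80):
    """Split the inclusive range [start, end] into in-sample and out-of-sample windows.

    Assumption: the split is by calendar days, not by trading days or bar count.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    total_days = (end - start).days + 1  # inclusive day count
    train_days = int(total_days * train_fraction)
    if train_days < 1 or train_days >= total_days:
        raise ValueError("date range too short to split")
    is_end = start + timedelta(days=train_days - 1)
    oos_start = is_end + timedelta(days=1)
    return (start, is_end), (oos_start, end)


in_sample, out_of_sample = split_date_range(date(2019, 1, 1), date(2023, 12, 31))
print("In-sample:    ", in_sample[0], "to", in_sample[1])
print("Out-of-sample:", out_of_sample[0], "to", out_of_sample[1])
```

For the 2019-2023 example, a day-based split lands within a day of the 2022/2023 year boundary, so the out-of-sample window is effectively calendar year 2023.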
Key Metrics
1. Performance Degradation
Formula: (IS Sharpe - OOS Sharpe) / IS Sharpe, expressed as a percentage (IS = in-sample, OOS = out-of-sample)
| Degradation | Quality | Decision |
|---|---|---|
| < 15% | Excellent | Deploy with confidence |
| 15-30% | Acceptable | Deploy but monitor |
| 30-40% | Concerning | Escalate to human |
| > 40% | Severe | Abandon (overfit) |
Key Insight: < 15% degradation indicates a robust strategy that generalizes well.
2. Robustness Score
Formula: OOS Sharpe / IS Sharpe
| Score | Quality | Interpretation |
|---|---|---|
| > 0.75 | High | Strategy robust across periods |
| 0.60-0.75 | Moderate | Acceptable but monitor |
| < 0.60 | Low | Strategy unstable |
Key Insight: > 0.75 indicates the strategy maintains its performance out-of-sample.
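The two formulas and their threshold tables can be sketched as follows. The function names are hypothetical, and how exact boundary values (for example exactly 30%) are bucketed is an assumption, since the tables leave it open. Note that for a positive IS Sharpe the two metrics are complementary: robustness equals 1 minus degradation.

```python
def degradation(is_sharpe: float, oos_sharpe: float) -> float:
    """(IS Sharpe - OOS Sharpe) / IS Sharpe, returned as a fraction (0.15 == 15%)."""
    if is_sharpe <= 0:
        # Assumption: the ratio is only meaningful for a positive in-sample Sharpe.
        raise ValueError("IS Sharpe must be positive")
    return (is_sharpe - oos_sharpe) / is_sharpe


def robustness(is_sharpe: float, oos_sharpe: float) -> float:
    """OOS Sharpe / IS Sharpe."""
    if is_sharpe <= 0:
        raise ValueError("IS Sharpe must be positive")
    return oos_sharpe / is_sharpe


def degradation_quality(d: float) -> str:
    """Map a degradation fraction to the quality labels in the table."""
    if d < 0.15:
        return "Excellent"
    if d <= 0.30:  # assumption: exactly 30% counts as Acceptable
        return "Acceptable"
    if d <= 0.40:  # assumption: exactly 40% counts as Concerning
        return "Concerning"
    return "Severe"


def robustness_quality(r: float) -> str:
    """Map a robustness score to the quality labels in the table."""
    if r > 0.75:
        return "High"
    if r >= 0.60:
        return "Moderate"
    return "Low"


d = degradation(1.5, 0.6)
r = robustness(1.5, 0.6)
print(f"Degradation: {d:.0%} ({degradation_quality(d)})")
print(f"Robustness:  {r:.2f} ({robustness_quality(r)})")
```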
Quick Usage
Run Walk-Forward Validation
# From hypothesis directory with iteration_state.json
python SCRIPTS/qc_validate.py run --strategy strategy.py
# Custom split ratio (default 80/20)
python SCRIPTS/qc_validate.py run --strategy strategy.py --split 0.70
What it does:
- •Reads project_id from iteration_state.json
- •Splits date range (80/20 default)
- •Runs in-sample backtest
- •Runs out-of-sample backtest
- •Calculates degradation and robustness
- •Saves results to PROJECT_LOGS/validation_result.json
Analyze Results
python SCRIPTS/qc_validate.py analyze --results PROJECT_LOGS/validation_result.json
Output:
- •Performance comparison table
- •Degradation percentage
- •Robustness assessment
- •Deployment recommendation
Decision Integration
After validation, the decision framework evaluates:
DEPLOY_STRATEGY (Deploy with confidence):
- •Degradation < 15% AND
- •Robustness > 0.75 AND
- •OOS Sharpe > 0.7
PROCEED_WITH_CAUTION (Deploy but monitor):
- •Degradation < 30% AND
- •Robustness > 0.60 AND
- •OOS Sharpe > 0.5
ABANDON_HYPOTHESIS (Too unstable):
- •Degradation > 40% OR
- •Robustness < 0.5 OR
- •OOS Sharpe < 0
ESCALATE_TO_HUMAN (Borderline):
- •Results don't clearly fit above criteria
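The criteria above can be sketched as a single function. This is an illustration only: the function name `decide` and the order in which the rules are checked are assumptions, and the actual decision framework may evaluate them differently. Degradation is passed as a fraction (0.15 for 15%).

```python
def decide(degradation: float, robustness: float, oos_sharpe: float) -> str:
    """Return one of the four Phase 5 decisions for a validation result."""
    # Any single failure condition is enough to abandon.
    if degradation > 0.40 or robustness < 0.5 or oos_sharpe < 0:
        return "ABANDON_HYPOTHESIS"
    # Strictest tier first: every DEPLOY result also satisfies the CAUTION tier.
    if degradation < 0.15 and robustness > 0.75 and oos_sharpe > 0.7:
        return "DEPLOY_STRATEGY"
    if degradation < 0.30 and robustness > 0.60 and oos_sharpe > 0.5:
        return "PROCEED_WITH_CAUTION"
    # Anything that fits none of the tiers is borderline.
    return "ESCALATE_TO_HUMAN"


print(decide(degradation=0.082, robustness=0.92, oos_sharpe=0.89))
print(decide(degradation=0.35, robustness=0.65, oos_sharpe=0.80))
```

The second call falls between the PROCEED_WITH_CAUTION and ABANDON_HYPOTHESIS thresholds, which is the kind of borderline result the ESCALATE_TO_HUMAN tier exists for.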
Best Practices
1. Time Splits
- •Standard: 80/20 (4 years training, 1 year testing)
- •Conservative: 70/30 (more OOS testing)
- •Very Conservative: 60/40 (extensive testing)
Minimum OOS period: 6 months (1 year preferred)
2. Never Peek at Out-of-Sample
CRITICAL RULE: Never adjust strategy based on OOS results.
- •OOS is for testing only
- •Adjusting based on OOS defeats validation purpose
- •If you adjust, OOS becomes in-sample
3. Check Trade Count
Both periods need sufficient trades:
- •In-sample: Minimum 30 trades (50+ preferred)
- •Out-of-sample: Minimum 10 trades (20+ preferred)
Too few trades = unreliable validation.
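A minimal sketch of the trade-count check, using the minimum and preferred thresholds listed above. The function names and the three status labels are hypothetical and chosen only for illustration.

```python
def rate_trade_count(trades: int, minimum: int, preferred: int) -> str:
    """Classify a trade count against a minimum and a preferred threshold."""
    if trades < minimum:
        return "insufficient"
    if trades < preferred:
        return "minimum"
    return "preferred"


def check_trade_counts(is_trades: int, oos_trades: int) -> dict:
    """Rate both periods: IS needs 30 (50+ preferred), OOS needs 10 (20+ preferred)."""
    return {
        "in_sample": rate_trade_count(is_trades, minimum=30, preferred=50),
        "out_of_sample": rate_trade_count(oos_trades, minimum=10, preferred=20),
    }


print(check_trade_counts(is_trades=120, oos_trades=3))
```

An "insufficient" rating in either period means the Sharpe comparison rests on too few trades to be trusted, whatever the degradation figure says.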
4. Compare Multiple Metrics
Don't just look at Sharpe:
- •Sharpe Ratio degradation
- •Max Drawdown increase
- •Win Rate change
- •Profit Factor degradation
- •Trade Count consistency
For a robust strategy, all metrics should degrade by similar amounts.
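One way to compare several metrics side by side is to compute the relative change of each from in-sample to out-of-sample. The sketch below assumes metrics are supplied as plain dictionaries; the metric keys and the function name `relative_changes` are hypothetical and are not the schema of validation_result.json.

```python
def relative_changes(is_metrics: dict, oos_metrics: dict) -> dict:
    """Relative change (OOS - IS) / |IS| for every metric present in both periods."""
    changes = {}
    for name, is_value in is_metrics.items():
        if name not in oos_metrics or is_value == 0:
            continue  # skip metrics that cannot be compared
        changes[name] = (oos_metrics[name] - is_value) / abs(is_value)
    return changes


# Illustrative values: Sharpe 0.97 -> 0.89, max drawdown 18% -> 22%.
in_sample = {"sharpe": 0.97, "max_drawdown": 0.18}
out_of_sample = {"sharpe": 0.89, "max_drawdown": 0.22}

changes = relative_changes(in_sample, out_of_sample)
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

With these numbers Sharpe falls by about 8% while max drawdown rises by about 22%, so a metric other than Sharpe can move noticeably even when Sharpe degradation looks small. Raw trade counts are better compared per year, because the two periods differ in length.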
Common Issues
Severe Degradation (> 40%)
Cause: Strategy overfit to in-sample period
Example:
- •IS Sharpe: 1.5 → OOS Sharpe: 0.6
- •Degradation: 60%
Decision: ABANDON_HYPOTHESIS
Fix for next hypothesis: Simplify the strategy (fewer parameters) and use a longer training period
Different Market Regimes
Cause: IS was bull market, OOS was bear market
Example:
- •2019-2022 (bull): Sharpe 1.2
- •2023 (bear): Sharpe -0.3
Decision: Not necessarily overfit, but not robust across regimes
Fix: Test across multiple regimes, add regime detection
Low Trade Count in OOS
Cause: Strategy stops trading in OOS period
Example:
- •IS: 120 trades → OOS: 3 trades
Decision: ESCALATE_TO_HUMAN (insufficient OOS data)
Integration with /qc-validate
The /qc-validate command workflow:
- •Read iteration_state.json for project_id and parameters
- •Load this skill for validation approach
- •Modify strategy for time splits (80/20)
- •Run in-sample and OOS backtests
- •Calculate degradation and robustness
- •Evaluate using decision framework
- •Update iteration_state.json with results
- •Git commit with validation summary
Reference Documentation
Need implementation details? All reference documentation is accessible via --help:
python SCRIPTS/qc_validate.py --help
That's the only way to access complete reference documentation.
Topics covered in --help:
- •Walk-forward validation methodology
- •Performance degradation thresholds
- •Monte Carlo validation techniques
- •PSR/DSR statistical metrics
- •Common errors and fixes
- •Phase 5 decision criteria
The primer above covers 90% of use cases. Use --help for edge cases and detailed analysis.
Related Skills
- •quantconnect - Core strategy development
- •quantconnect-backtest - Phase 3 backtesting (qc_backtest.py)
- •quantconnect-optimization - Phase 4 optimization (qc_optimize.py)
- •decision-framework - Decision thresholds
- •backtesting-analysis - Metric interpretation
Key Principles
- •OOS is sacred - Never adjust strategy based on OOS results
- •Degradation < 15% is excellent - Strategy generalizes well
- •Robustness > 0.75 is target - Maintains performance OOS
- •Trade count matters - Need sufficient trades in both periods
- •Multiple metrics - All should degrade similarly for robustness
Example Decision
In-Sample (2019-2022): Sharpe: 0.97, Drawdown: 18%, Trades: 142
Out-of-Sample (2023): Sharpe: 0.89, Drawdown: 22%, Trades: 38
Degradation: 8.2% (< 15%)
Robustness: 0.92 (> 0.75)
→ DEPLOY_STRATEGY (minimal degradation, high robustness)
Version: 2.0.0 (Progressive Disclosure)
Last Updated: November 13, 2025
Lines: ~190 (was 463)
Context Reduction: 59%