Notebook to Algorithm
Transforms trading strategy notebooks into production-ready code with automated validation and database design
When to Use
- Converting a backtested trading strategy from Jupyter to production code
- Need to preserve optimal parameters discovered during research
- Require validation that converted code matches notebook outputs
- Want database schema recommendations for production deployment
- Migrating from exploratory research to systematic trading
When NOT to Use
- Notebooks without trading/financial logic (use general refactoring tools)
- Live trading execution (this generates algorithms, not execution systems)
- Real-time market data integration (separate infrastructure concern)
- Strategies without clear outputs to validate against
Orchestrator
This skill coordinates the following sub-agents:
| Sub-Agent | Purpose | Invoked During |
|---|---|---|
| notebook-parser | Parse .ipynb structure, extract cells and outputs | Phase 1 |
| strategy-extractor | Identify trading logic, parameters, preprocessing | Phase 2 |
| code-generator | Generate Python/TypeScript modules from extracted logic | Phase 3 |
| validation-runner | Execute tests, compare outputs, identify discrepancies | Phase 4 |
| discrepancy-analyzer | Diagnose root causes, suggest fixes for mismatches | Phase 4 |
| database-designer | Recommend schema based on strategy data requirements | Phase 5 |
Workflows
Primary: Convert Strategy (Full)
Path: workflows/convert-strategy.md
Complete conversion with validation loop:
- Parse Notebook → Extract cells, outputs, structure
- Extract Strategy → Identify logic, parameters, preprocessing
- Generate Code → Create modular Python/TypeScript
- Validate Outputs → Run tests, compare to notebook
- Refine if Needed → Fix discrepancies, re-validate
- Design Database → Recommend production schema
- Package Deliverables → Final code, docs, mapping
Secondary: Quick Convert
Path: workflows/quick-convert.md
Fast conversion without validation loop:
- Parse and extract
- Generate code with defaults
- Skip validation (user responsibility)
- Basic database recommendations
Tertiary: Validate Only
Path: workflows/validate-only.md
For previously converted code:
- Load notebook and converted code
- Run comparison tests
- Report discrepancies
- Suggest fixes
Context Integration
Target User Profile
- Quantitative analysts converting research to production
- Solo traders automating strategies
- Trading teams standardizing workflows
Receives Context From
- User: Notebook path, target language preference
- Notebook: Trading logic, parameters, outputs
Shares Context With
- database-design: Schema recommendations
- python-development: Code generation patterns
Commands
/notebook-to-algorithm or /notebook-to-algorithm:convert
Full conversion workflow with validation.
Usage:
/notebook-to-algorithm path/to/strategy.ipynb

> Select target language (Python/TypeScript)
> Conversion runs with validation loop
> Receive production code + database schema
/notebook-to-algorithm:quick <notebook-path>
Quick conversion without validation loop.
Usage:
/notebook-to-algorithm:quick path/to/strategy.ipynb --lang python
/notebook-to-algorithm:validate <notebook-path> <code-path>
Validate existing converted code against notebook.
Usage:
/notebook-to-algorithm:validate strategy.ipynb generated_strategy/
/notebook-to-algorithm:schema <notebook-path>
Generate database schema recommendation only.
Usage:
/notebook-to-algorithm:schema strategy.ipynb
Implementation
Entry Point Logic
When /notebook-to-algorithm is invoked:
1. Parse command arguments (notebook path, options)
2. Initialize conversion state
3. Execute Phase 1: Parse notebook
4. Execute Phase 2: Extract strategy components
5. Execute Phase 3: Generate code
6. Execute Phase 4: Validation loop (max 3 iterations)
   - Run converted code
   - Compare outputs to notebook
   - If discrepancies: analyze, fix, repeat
7. Execute Phase 5: Database design
8. Package and report results
State Management
state:
  notebook_path: string
  target_language: python | typescript
  parsed_notebook: NotebookStructure | null
  extracted_strategy: StrategyComponents | null
  generated_code: GeneratedModules | null
  validation_results: ValidationReport[]
  iteration_count: 0
  max_iterations: 3
  database_schema: SchemaRecommendation | null
  errors: []
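The state listing above maps naturally onto a small data class. The following is a minimal sketch, not the skill's actual implementation: the class name `ConversionState` and the helper `can_retry` are hypothetical, and the structured types (`NotebookStructure` and so on) are stood in for by plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional


@dataclass
class ConversionState:
    """State carried across the five phases; field names follow the listing above."""
    notebook_path: str
    target_language: Literal["python", "typescript"] = "python"
    parsed_notebook: Optional[dict] = None      # stands in for NotebookStructure
    extracted_strategy: Optional[dict] = None   # stands in for StrategyComponents
    generated_code: Optional[dict] = None       # stands in for GeneratedModules
    validation_results: list = field(default_factory=list)  # ValidationReport[]
    iteration_count: int = 0
    max_iterations: int = 3
    database_schema: Optional[dict] = None      # stands in for SchemaRecommendation
    errors: list = field(default_factory=list)

    def can_retry(self) -> bool:
        """True while the validation loop still has iterations left (hypothetical helper)."""
        return self.iteration_count < self.max_iterations


state = ConversionState(notebook_path="path/to/strategy.ipynb")
```

Using `default_factory` for the two list fields keeps each conversion run from sharing a mutable default.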
Phase 1: Parse Notebook
Dispatch to: sub_agents/notebook-parser.md
Orchestrator Actions:
- Load .ipynb file as JSON
- Extract code cells with execution order
- Extract markdown cells for documentation
- Capture cell outputs (dataframes, metrics, plots)
- Build cell dependency graph
- Identify checkpoint outputs for validation
Notebook Structure:
notebook:
  metadata:
    kernel: python3
    language: python
  cells:
    - id: cell_1
      type: code
      source: "import pandas as pd..."
      outputs: [...]
      execution_order: 1
    - id: cell_2
      type: markdown
      source: "# Strategy Parameters"
  checkpoints:
    - cell_id: cell_5
      name: "preprocessed_data"
      type: dataframe
      shape: [1000, 5]
    - cell_id: cell_8
      name: "signals"
      type: dataframe
    - cell_id: cell_12
      name: "backtest_results"
      type: dict
Output: NotebookStructure stored in state
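The first two orchestrator actions of Phase 1 (load the `.ipynb` as JSON, extract code cells with execution order) can be sketched with the standard library alone. This is an illustration under assumptions: the function names `parse_notebook` and `code_cells_in_run_order` are hypothetical, the fallback ids `cell_N` are invented for notebooks whose cells carry no `id`, and checkpoint detection and the dependency graph are left out.

```python
import json


def parse_notebook(raw: str) -> dict:
    """Split an nbformat-4 notebook (given as JSON text) into normalized cells."""
    nb = json.loads(raw)
    cells = []
    for index, cell in enumerate(nb.get("cells", [])):
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append({
            "id": cell.get("id", f"cell_{index + 1}"),  # fallback id is an assumption
            "type": cell.get("cell_type"),
            "source": source,
            "outputs": cell.get("outputs", []),
            "execution_order": cell.get("execution_count"),
        })
    return {"metadata": nb.get("metadata", {}), "cells": cells}


def code_cells_in_run_order(parsed: dict) -> list:
    """Executed code cells sorted by execution count; unexecuted cells are dropped."""
    executed = [c for c in parsed["cells"]
                if c["type"] == "code" and c["execution_order"] is not None]
    return sorted(executed, key=lambda c: c["execution_order"])


# A tiny in-memory notebook whose cells were run out of document order.
raw = json.dumps({"nbformat": 4, "metadata": {}, "cells": [
    {"cell_type": "markdown", "source": ["# Strategy Parameters"]},
    {"cell_type": "code", "source": "SMA_SHORT = 20",
     "execution_count": 2, "outputs": []},
    {"cell_type": "code", "source": ["import pandas as pd\n"],
     "execution_count": 1, "outputs": []},
]})
parsed = parse_notebook(raw)
ordered = code_cells_in_run_order(parsed)
```

Sorting by `execution_count` rather than document position matters because a notebook's saved outputs reflect the order in which cells were actually run.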
Phase 2: Extract Strategy Components
Dispatch to: sub_agents/strategy-extractor.md
Orchestrator Actions:
- Analyze code cells for trading patterns
- Extract parameters (constants, config values)
- Identify data loading/preprocessing logic
- Identify signal generation logic
- Identify execution/backtest logic
- Flag visualization code (to exclude)
- Detect trading pitfalls (look-ahead bias, etc.)
Component Classification:
components:
  parameters:
    - name: SMA_SHORT
      value: 20
      cell_id: cell_3
      type: int
      category: indicator_param
    - name: STOP_LOSS
      value: 0.02
      cell_id: cell_3
      type: float
      category: risk_param
  data_loading:
    - cell_id: cell_1
      function: load_price_data
      inputs: [file_path]
      outputs: [df]
  preprocessing:
    - cell_id: cell_2
      function: clean_data
      inputs: [df]
      outputs: [df_clean]
  indicators:
    - cell_id: cell_4
      function: calculate_sma
      inputs: [df, period]
      outputs: [sma_series]
  signals:
    - cell_id: cell_5
      function: generate_signals
      inputs: [df, sma_short, sma_long]
      outputs: [signals_df]
  excluded:
    - cell_id: cell_10
      reason: visualization_only
    - cell_id: cell_11
      reason: exploratory_analysis
  warnings:
    - type: potential_look_ahead
      cell_id: cell_6
      description: "Uses .shift(-1), check if intentional"
Output: StrategyComponents stored in state
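One way the "extract parameters (constants, config values)" action could work is to walk each cell's syntax tree and keep top-level assignments of literals to UPPER_CASE names. The sketch below assumes that naming convention; `extract_parameters` is a hypothetical helper, and it does not attempt to infer the `category` field shown in the classification above.

```python
import ast


def extract_parameters(source: str, cell_id: str) -> list:
    """Collect top-level NAME = <literal> assignments whose name is UPPER_CASE."""
    params = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not isinstance(target, ast.Name) or not target.id.isupper():
            continue
        try:
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # right-hand side is not a plain literal, so not a constant
        params.append({
            "name": target.id,
            "value": value,
            "cell_id": cell_id,
            "type": type(value).__name__,
        })
    return params


cell_source = (
    "SMA_SHORT = 20\n"
    "STOP_LOSS = 0.02\n"
    "df = load_price_data(path)\n"   # lowercase name: skipped
    "WINDOW = compute_window()\n"    # not a literal: skipped
)
params = extract_parameters(cell_source, "cell_3")
```

Because `ast.parse` only parses and never executes the cell, this is safe to run on notebooks whose data files or dependencies are not available.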
Phase 3: Generate Code
Dispatch to: sub_agents/code-generator.md
Orchestrator Actions:
- Select code templates based on target language
- Map extracted components to modules
- Generate module files with proper structure
- Externalize parameters to config file
- Add logging, error handling, type hints
- Generate CLI interface
- Create notebook-to-code mapping document
Generated File Structure (Python):
generated_strategy/
├── strategy/
│   ├── __init__.py
│   ├── signals.py          # From cells 5-7
│   ├── indicators.py       # From cell 4
│   └── execution.py        # From cells 8-9
├── data/
│   ├── __init__.py
│   ├── loader.py           # From cell 1
│   └── preprocessing.py    # From cell 2
├── config/
│   ├── parameters.yaml     # Extracted from cell 3
│   └── settings.yaml       # System config
├── tests/
│   ├── test_signals.py
│   ├── test_parity.py      # Notebook comparison tests
│   └── fixtures/
│       └── reference_outputs.pkl
├── docs/
│   └── MAPPING.md          # Cell-to-code traceability
├── main.py                 # CLI entry point
├── requirements.txt
└── pyproject.toml
Generated File Structure (TypeScript):
generated_strategy/
├── src/
│   ├── strategy/
│   │   ├── signals.ts
│   │   ├── indicators.ts
│   │   └── execution.ts
│   ├── data/
│   │   ├── loader.ts
│   │   └── preprocessing.ts
│   ├── config/
│   │   └── parameters.ts
│   └── index.ts
├── tests/
│   ├── signals.test.ts
│   └── parity.test.ts
├── docs/
│   └── MAPPING.md
├── package.json
└── tsconfig.json
Mapping Document Format:
# Notebook to Code Mapping

## strategy.ipynb → generated_strategy/

| Notebook Cell | Generated File | Function/Class |
|---------------|----------------|----------------|
| Cell 1 | data/loader.py:5-25 | load_price_data() |
| Cell 2 | data/preprocessing.py:10-40 | clean_data() |
| Cell 3 | config/parameters.yaml | (config values) |
| Cell 4 | strategy/indicators.py:15-35 | calculate_sma() |
| Cell 5-6 | strategy/signals.py:20-60 | SignalGenerator.generate() |
| Cell 10 | (excluded) | visualization only |
Output: GeneratedModules stored in state
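The mapping document format shown above is regular enough to be rendered from a list of rows. A minimal sketch, assuming the rows have already been collected as `(cell, location, symbol)` tuples; `render_mapping` is a hypothetical helper name, not part of the skill.

```python
def render_mapping(notebook: str, target: str, rows: list) -> str:
    """Render cell-to-code traceability rows as a Markdown table."""
    lines = [
        "# Notebook to Code Mapping",
        "",
        f"## {notebook} → {target}",
        "",
        "| Notebook Cell | Generated File | Function/Class |",
        "|---------------|----------------|----------------|",
    ]
    for cell, location, symbol in rows:
        lines.append(f"| {cell} | {location} | {symbol} |")
    return "\n".join(lines) + "\n"


mapping_md = render_mapping(
    "strategy.ipynb",
    "generated_strategy/",
    [
        ("Cell 1", "data/loader.py:5-25", "load_price_data()"),
        ("Cell 3", "config/parameters.yaml", "(config values)"),
        ("Cell 10", "(excluded)", "visualization only"),
    ],
)
```

Listing excluded cells as rows keeps the mapping complete: every notebook cell is accounted for, including those that produce no code.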
Phase 4: Validation Loop
Dispatch to: sub_agents/validation-runner.md and sub_agents/discrepancy-analyzer.md
Orchestrator Actions:
while iteration_count < max_iterations:
    1. Run original notebook, capture checkpoint outputs
    2. Run generated code with same inputs
    3. Compare outputs at each checkpoint
    4. If all match within tolerance:
       - Mark validation passed
       - Break loop
    5. If discrepancies found:
       - Dispatch to discrepancy-analyzer
       - Receive diagnosis and fixes
       - Apply fixes to generated code
       - Increment iteration_count
       - Continue loop
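The loop above translates almost line for line into Python. In this sketch the two callables are hypothetical stand-ins: `run_and_compare` covers steps 1 to 3 and returns a list of discrepancies, and `analyze_and_fix` covers the discrepancy-analyzer dispatch and the patching of generated code.

```python
from typing import Callable


def validation_loop(
    run_and_compare: Callable[[], list],
    analyze_and_fix: Callable[[list], None],
    max_iterations: int = 3,
) -> dict:
    """Re-run the comparison until it is clean or the iteration budget is spent."""
    iteration_count = 0
    discrepancies = []
    while iteration_count < max_iterations:
        discrepancies = run_and_compare()
        if not discrepancies:
            # All checkpoints matched within tolerance.
            return {"status": "passed",
                    "iterations": iteration_count + 1,
                    "remaining": []}
        analyze_and_fix(discrepancies)
        iteration_count += 1
    return {"status": "failed",
            "iterations": iteration_count,
            "remaining": discrepancies}


# Toy stand-ins: the first run reports one mismatch, the second run is clean.
pending = [["signals: row 45 mismatch"], []]
fixes_seen = []
report = validation_loop(lambda: pending.pop(0), fixes_seen.append)

# A comparison that never converges exhausts the budget.
stuck = validation_loop(lambda: ["still wrong"], lambda found: None)
```

One consequence of the loop as written: fixes applied on the final iteration are never re-checked, so a "failed" report lists the discrepancies observed before the last fix, not after it.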
Validation Report:
validation:
  iteration: 1
  status: failed | passed
  checkpoints:
    - name: preprocessed_data
      status: passed
      notebook_shape: [1000, 5]
      generated_shape: [1000, 5]
    - name: signals
      status: failed
      discrepancy:
        type: value_mismatch
        location: "row 45, column 'signal'"
        expected: 1
        actual: 0
        root_cause: "Missing .fillna(0) in indicator calculation"
    - name: backtest_results
      status: skipped
      reason: "Depends on failed checkpoint"
  fixes_applied:
    - file: strategy/indicators.py
      line: 28
      change: "Added .fillna(0) to handle NaN values"
Tolerance Settings:
validation_config = {
    'numeric_rtol': 1e-6,             # Relative tolerance
    'numeric_atol': 1e-8,             # Absolute tolerance
    'allow_row_reorder': False,       # Strict row order
    'ignore_columns': ['timestamp'],  # Don't compare these
}
Output: ValidationReport[] stored in state
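To make the tolerance settings concrete, here is a dependency-free sketch of how a checkpoint comparison might apply them. It assumes the combined rule `|actual - expected| <= atol + rtol * |expected|` (the same form `numpy.isclose` uses), treats two NaNs as equal, and compares rows strictly by position, which corresponds to `allow_row_reorder: False`. The function names and the list-of-dicts row format are illustrative assumptions.

```python
import math

validation_config = {
    "numeric_rtol": 1e-6,
    "numeric_atol": 1e-8,
    "allow_row_reorder": False,
    "ignore_columns": ["timestamp"],
}


def values_match(expected: float, actual: float, cfg: dict) -> bool:
    """Tolerance check; exact equality and NaN-vs-NaN both count as a match."""
    if expected == actual:
        return True
    if math.isnan(expected) and math.isnan(actual):
        return True  # assumption: a NaN in both outputs is not a discrepancy
    limit = cfg["numeric_atol"] + cfg["numeric_rtol"] * abs(expected)
    return abs(actual - expected) <= limit


def compare_rows(expected: list, actual: list, cfg: dict) -> list:
    """Compare row dicts by position; return (row, column, expected, actual) tuples."""
    if len(expected) != len(actual):
        return [(None, "row_count", len(expected), len(actual))]
    mismatches = []
    for i, (exp_row, act_row) in enumerate(zip(expected, actual)):
        for column, exp_value in exp_row.items():
            if column in cfg["ignore_columns"]:
                continue
            act_value = act_row.get(column)
            numeric = (isinstance(exp_value, (int, float))
                       and isinstance(act_value, (int, float)))
            same = (values_match(exp_value, act_value, cfg) if numeric
                    else exp_value == act_value)
            if not same:
                mismatches.append((i, column, exp_value, act_value))
    return mismatches


notebook_rows = [
    {"timestamp": 1, "signal": 1, "sma": 100.0},
    {"timestamp": 2, "signal": 0, "sma": 101.0},
]
generated_rows = [
    {"timestamp": 9, "signal": 1, "sma": 100.00000001},  # within tolerance
    {"timestamp": 2, "signal": 1, "sma": 101.0},         # signal differs
]
mismatches = compare_rows(notebook_rows, generated_rows, validation_config)
```

The differing `timestamp` in the first row is not reported because that column is listed in `ignore_columns`.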
Phase 5: Database Design
Dispatch to: sub_agents/database-designer.md
Orchestrator Actions:
- Analyze data requirements from extracted components
- Identify time-series patterns
- Determine data volume expectations
- Generate schema recommendations
- Include indexing strategies
- Provide storage optimization tips
Schema Recommendation Format:
database_recommendation:
  engine: PostgreSQL + TimescaleDB
  rationale: "Time-series price data with ACID requirements"
  tables:
    - name: instruments
      purpose: "Store tradable instrument metadata"
      columns:
        - name: id
          type: SERIAL PRIMARY KEY
        - name: symbol
          type: VARCHAR(20) NOT NULL UNIQUE
        - name: instrument_type
          type: VARCHAR(50)
    - name: price_data
      purpose: "OHLCV time-series data"
      columns:
        - name: time
          type: TIMESTAMPTZ NOT NULL
        - name: instrument_id
          type: INTEGER REFERENCES instruments(id)
        - name: open
          type: NUMERIC(18, 8)
        - name: high
          type: NUMERIC(18, 8)
        - name: low
          type: NUMERIC(18, 8)
        - name: close
          type: NUMERIC(18, 8)
        - name: volume
          type: NUMERIC(24, 8)
      primary_key: [time, instrument_id]
      timescaledb:
        hypertable: true
        chunk_interval: "7 days"
        compression: true
        retention: "2 years"
    - name: strategy_parameters
      purpose: "Versioned strategy configuration"
      columns:
        - name: strategy_name
          type: VARCHAR(100)
        - name: version
          type: INTEGER
        - name: parameters
          type: JSONB
        - name: backtest_metrics
          type: JSONB
    - name: signals
      purpose: "Generated trading signals (append-only)"
      columns:
        - name: generated_at
          type: TIMESTAMPTZ
        - name: strategy_name   # required by the signals index below
          type: VARCHAR(100)
        - name: instrument_id
          type: INTEGER
        - name: signal_type
          type: VARCHAR(10)
        - name: strength
          type: NUMERIC(5, 4)
  indexes:
    - table: price_data
      columns: [instrument_id, time DESC]
      purpose: "Symbol + time range queries"
    - table: signals
      columns: [strategy_name, generated_at DESC]
      purpose: "Recent signals by strategy"
  storage_estimates:
    price_data: "~50MB per year per instrument (1-min data)"
    signals: "~10MB per year per strategy"
    total_first_year: "~500MB for 10 instruments"
  performance_tips:
    - "Use continuous aggregates for daily/weekly rollups"
    - "Enable compression for data older than 7 days"
    - "Partition signals by month if volume exceeds 10M rows"
Output: SchemaRecommendation stored in state, written to schema/database.sql
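How the recommendation becomes `schema/database.sql` is not specified here. As one possible approach, a table entry from the recommendation can be rendered to DDL with plain string formatting. In this sketch `render_create_table` is a hypothetical helper, the time column is assumed to be the first primary-key column, and `create_hypertable` is the TimescaleDB function for converting a regular table; compression and retention policies are left out.

```python
def render_create_table(table: dict) -> str:
    """Turn one entry of `tables` into a CREATE TABLE statement (plus hypertable call)."""
    # Column names such as "time", "open" and "close" are quoted defensively.
    column_lines = [f'    "{col["name"]}" {col["type"]}' for col in table["columns"]]
    if "primary_key" in table:
        keys = ", ".join(f'"{name}"' for name in table["primary_key"])
        column_lines.append(f"    PRIMARY KEY ({keys})")
    statements = [
        f'CREATE TABLE {table["name"]} (\n' + ",\n".join(column_lines) + "\n);"
    ]
    options = table.get("timescaledb", {})
    if options.get("hypertable"):
        # Assumption: the first primary-key column is the time dimension.
        time_column = table["primary_key"][0]
        interval = options.get("chunk_interval", "7 days")
        statements.append(
            f"SELECT create_hypertable('{table['name']}', '{time_column}', "
            f"chunk_time_interval => INTERVAL '{interval}');"
        )
    return "\n".join(statements)


price_data = {
    "name": "price_data",
    "columns": [
        {"name": "time", "type": "TIMESTAMPTZ NOT NULL"},
        {"name": "instrument_id", "type": "INTEGER REFERENCES instruments(id)"},
        {"name": "close", "type": "NUMERIC(18, 8)"},
    ],
    "primary_key": ["time", "instrument_id"],
    "timescaledb": {"hypertable": True, "chunk_interval": "7 days"},
}
sql = render_create_table(price_data)
```

String formatting is acceptable here only because the input is a trusted, generated recommendation; it is not a pattern for building SQL from user input.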
Error Handling
| Error | Phase | Resolution |
|---|---|---|
| Invalid notebook format | 1 | Report specific JSON parse error |
| No trading logic found | 2 | List detected patterns, ask for guidance |
| Circular cell dependencies | 2 | Show dependency graph, suggest resolution |
| Validation timeout | 4 | Save partial results, report timeout |
| Max iterations reached | 4 | Report remaining discrepancies, manual fix needed |
| Unsupported data types | 5 | Suggest alternative schema patterns |
Rollback Strategy:
rollback_actions:
  - action: preserve_notebook
    note: "Original notebook is never modified"
  - action: delete_generated
    path: generated_strategy/
    condition: "On critical failure before Phase 4"
  - action: keep_partial
    note: "Keep generated code even if validation fails"
Success Output
╔══════════════════════════════════════════════════════════════════════╗
║                   Strategy Converted Successfully!                   ║
╠══════════════════════════════════════════════════════════════════════╣
║                                                                      ║
║  Source: strategy.ipynb (15 cells, 342 lines)                        ║
║  Target: generated_strategy/ (Python)                                ║
║                                                                      ║
║  Components Extracted:                                               ║
║    • Parameters: 5 (SMA_SHORT, SMA_LONG, STOP_LOSS, ...)             ║
║    • Data functions: 2                                               ║
║    • Indicator functions: 3                                          ║
║    • Signal functions: 2                                             ║
║    • Excluded cells: 3 (visualization)                               ║
║                                                                      ║
║  Validation:                                                         ║
║    • Iterations: 2                                                   ║
║    • Checkpoints passed: 4/4                                         ║
║    • Output parity: VERIFIED                                         ║
║                                                                      ║
║  Files Generated:                                                    ║
║    • strategy/signals.py (89 lines)                                  ║
║    • strategy/indicators.py (45 lines)                               ║
║    • data/loader.py (32 lines)                                       ║
║    • config/parameters.yaml                                          ║
║    • tests/test_parity.py                                            ║
║    • docs/MAPPING.md                                                 ║
║    • schema/database.sql                                             ║
║                                                                      ║
║  Warnings:                                                           ║
║    ⚠ Potential look-ahead bias in cell 6 (review recommended)        ║
║                                                                      ║
║  Next Steps:                                                         ║
║    1. Review generated code in generated_strategy/                   ║
║    2. Run: pytest tests/ -v                                          ║
║    3. Review database schema in schema/database.sql                  ║
║    4. Deploy with: python main.py --config config/parameters.yaml    ║
║                                                                      ║
╚══════════════════════════════════════════════════════════════════════╝
Trading Pitfall Detection
The skill automatically detects common issues:
Look-Ahead Bias Detection
patterns_to_flag = [
    r'\.shift\(-\d+\)',  # Negative shift pulls future rows into the current row
    r'iloc\[-\d+\]',     # Negative index reads the last rows, which may be future data
    r'next_.*=',         # Variables named "next_*"
]
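Applied to a cell's source, the patterns above yield warnings in the same shape as the `potential_look_ahead` entry from Phase 2. The scanner below is a rough sketch: `scan_cell` is a hypothetical name, the `line` field is an addition for illustration, and stripping everything after `#` is a crude way to skip comments that will also truncate lines containing `#` inside a string.

```python
import re

# The same three patterns, each paired with a short explanation.
LOOK_AHEAD_PATTERNS = [
    (re.compile(r"\.shift\(-\d+\)"), "negative shift pulls future rows backward"),
    (re.compile(r"iloc\[-\d+\]"), "negative iloc index reads from the end of the frame"),
    (re.compile(r"next_.*="), "variable named next_*"),
]


def scan_cell(cell_id: str, source: str) -> list:
    """Return one warning per (line, pattern) hit in a cell's source."""
    warnings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude comment stripping (see note above)
        for pattern, reason in LOOK_AHEAD_PATTERNS:
            if pattern.search(code):
                warnings.append({
                    "type": "potential_look_ahead",
                    "cell_id": cell_id,
                    "line": line_no,
                    "description": reason,
                })
    return warnings


cell_source = (
    "df['ret'] = df['close'].shift(-1) / df['close'] - 1\n"
    "last = df.iloc[-1]\n"
    "sma = df['close'].rolling(20).mean()  # .shift(-1) not used here\n"
)
found = scan_cell("cell_6", cell_source)
```

These are heuristics, so hits are warnings for review rather than errors: `df.iloc[-1]` is harmless when it reads the latest completed bar, and `next_.*=` also matches comparisons such as `next_day == today`.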
Overfitting Indicators
if sharpe_ratio > 3.0:
    warn("Sharpe > 3 may indicate overfitting")
if parameter_count > 10:
    warn("Many parameters increase overfitting risk")
Hidden State Detection
# Flag global variable modifications
global_mutations = detect_global_writes(cell_ast)
if global_mutations:
    warn(f"Cell modifies global state: {global_mutations}")
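The snippet above calls `detect_global_writes` without defining it. One plausible reading, offered as an assumption rather than the skill's actual logic, is to report names that a function declares `global` and then rebinds. In-place mutation of a global object (for example `cache.append(x)`) is not caught by this sketch.

```python
import ast


def detect_global_writes(cell_ast: ast.AST) -> list:
    """Names that functions in the cell declare `global` and then assign to."""
    mutated = set()
    for func in ast.walk(cell_ast):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        declared, stored = set(), set()
        for node in ast.walk(func):
            if isinstance(node, ast.Global):
                declared.update(node.names)
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                stored.add(node.id)  # covers =, +=, for-targets, etc.
        mutated |= declared & stored
    return sorted(mutated)


cell_source = (
    "position = 0\n"
    "\n"
    "def on_bar(price):\n"
    "    global position\n"
    "    position += 1\n"
    "    scaled = price * 2\n"
    "    return scaled\n"
)
global_mutations = detect_global_writes(ast.parse(cell_source))
```

Top-level assignments are deliberately ignored: in a notebook every cell-level variable is a global, so flagging them all would produce noise rather than signal.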
Configuration
| Setting | Default | Override Flag |
|---|---|---|
| target_language | python | --lang=typescript |
| max_validation_iterations | 3 | --max-iter=N |
| numeric_tolerance | 1e-6 | --tolerance=N |
| include_tests | true | --no-tests |
| include_schema | true | --no-schema |
| verbose_mapping | false | --verbose-map |
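The settings table reads like a set of command-line options. The skill is invoked as a slash command and how it parses flags is not specified, so the following `argparse` sketch is only one way the defaults and override flags could be modeled; `build_parser` and the `dest` names are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the settings table: defaults plus one override flag per setting."""
    parser = argparse.ArgumentParser(prog="notebook-to-algorithm")
    parser.add_argument("notebook_path")
    parser.add_argument("--lang", dest="target_language",
                        choices=["python", "typescript"], default="python")
    parser.add_argument("--max-iter", dest="max_validation_iterations",
                        type=int, default=3)
    parser.add_argument("--tolerance", dest="numeric_tolerance",
                        type=float, default=1e-6)
    # store_false flags default to True, matching include_tests/include_schema.
    parser.add_argument("--no-tests", dest="include_tests", action="store_false")
    parser.add_argument("--no-schema", dest="include_schema", action="store_false")
    parser.add_argument("--verbose-map", dest="verbose_mapping", action="store_true")
    return parser


args = build_parser().parse_args(
    ["strategy.ipynb", "--lang=typescript", "--max-iter=5", "--no-tests"]
)
```

Both `--lang=typescript` and `--lang typescript` are accepted by `argparse`, so the `=` form shown in the table works unchanged.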
Orchestrator Agent
This skill has an associated orchestrator agent at .claude/agents/notebook-to-algorithm.md that coordinates the sub-agents. The orchestrator:
- Parses notebooks and extracts strategy components
- Generates Python/TypeScript code modules
- Runs validation loop with discrepancy analysis
- Designs database schema for production deployment
References
- references/CONTEXT.md - Enhanced skill context
- references/RESEARCH.md - Domain research and best practices
- sub_agents/*.md - Sub-agent documentation
- workflows/*.md - Workflow definitions