Notebook to Algorithm

Transforms trading strategy notebooks into production-ready code, with automated validation against the notebook's outputs and a database schema recommendation.

When to Use

•Converting a backtested trading strategy from a Jupyter notebook to production code
•Preserving the optimal parameters discovered during research
•Validating that converted code matches notebook outputs
•Getting database schema recommendations for production deployment
•Migrating from exploratory research to systematic trading

When NOT to Use

•Notebooks without trading/financial logic (use general refactoring tools)
•Live trading execution (this generates algorithms, not execution systems)
•Real-time market data integration (separate infrastructure concern)
•Strategies without clear outputs to validate against

Orchestrator

This skill coordinates the following sub-agents:

Sub-Agent	Purpose	Invoked During
`notebook-parser`	Parse .ipynb structure, extract cells and outputs	Phase 1
`strategy-extractor`	Identify trading logic, parameters, preprocessing	Phase 2
`code-generator`	Generate Python/TypeScript modules from extracted logic	Phase 3
`validation-runner`	Execute tests, compare outputs, identify discrepancies	Phase 4
`discrepancy-analyzer`	Diagnose root causes, suggest fixes for mismatches	Phase 4
`database-designer`	Recommend schema based on strategy data requirements	Phase 5

Workflows

Primary: Convert Strategy (Full)

Path: workflows/convert-strategy.md

Complete conversion with validation loop:

•Parse Notebook → Extract cells, outputs, structure
•Extract Strategy → Identify logic, parameters, preprocessing
•Generate Code → Create modular Python/TypeScript
•Validate Outputs → Run tests, compare to notebook
•Refine if Needed → Fix discrepancies, re-validate
•Design Database → Recommend production schema
•Package Deliverables → Final code, docs, mapping

Secondary: Quick Convert

Path: workflows/quick-convert.md

Fast conversion without validation loop:

•Parse and extract
•Generate code with defaults
•Skip validation (user responsibility)
•Basic database recommendations

Tertiary: Validate Only

Path: workflows/validate-only.md

For previously converted code:

•Load notebook and converted code
•Run comparison tests
•Report discrepancies
•Suggest fixes

Context Integration

Target User Profile

•Quantitative analysts converting research to production
•Solo traders automating strategies
•Trading teams standardizing workflows

Receives Context From

•User: Notebook path, target language preference
•Notebook: Trading logic, parameters, outputs

Shares Context With

•database-design: Schema recommendations
•python-development: Code generation patterns

Commands

`/notebook-to-algorithm` or `/notebook-to-algorithm:convert`

Full conversion workflow with validation.

Usage:

code

/notebook-to-algorithm path/to/strategy.ipynb
> Select target language (Python/TypeScript)
> Conversion runs with validation loop
> Receive production code + database schema

`/notebook-to-algorithm:quick <notebook-path>`

Quick conversion without validation loop.

Usage:

code

/notebook-to-algorithm:quick path/to/strategy.ipynb --lang python

`/notebook-to-algorithm:validate <notebook-path> <code-path>`

Validate existing converted code against notebook.

Usage:

code

/notebook-to-algorithm:validate strategy.ipynb generated_strategy/

`/notebook-to-algorithm:schema <notebook-path>`

Generate database schema recommendation only.

Usage:

code

/notebook-to-algorithm:schema strategy.ipynb

Implementation

Entry Point Logic

When /notebook-to-algorithm is invoked:

code

1. Parse command arguments (notebook path, options)
2. Initialize conversion state
3. Execute Phase 1: Parse notebook
4. Execute Phase 2: Extract strategy components
5. Execute Phase 3: Generate code
6. Execute Phase 4: Validation loop (max 3 iterations)
   - Run converted code
   - Compare outputs to notebook
   - If discrepancies: analyze, fix, repeat
7. Execute Phase 5: Database design
8. Package and report results

State Management

yaml

state:
  notebook_path: string
  target_language: python | typescript
  parsed_notebook: NotebookStructure | null
  extracted_strategy: StrategyComponents | null
  generated_code: GeneratedModules | null
  validation_results: ValidationReport[]
  iteration_count: 0
  max_iterations: 3
  database_schema: SchemaRecommendation | null
  errors: []
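
The state block above could be mirrored in code roughly as follows. This is a minimal sketch, not the skill's actual implementation: the class name `ConversionState` and the helper `can_retry` are hypothetical, and the structured results are typed as `Any` because `NotebookStructure`, `StrategyComponents` and the other types are not defined in this document.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


@dataclass
class ConversionState:
    """Hypothetical mirror of the YAML state block; field names follow the YAML."""
    notebook_path: str
    target_language: Literal["python", "typescript"] = "python"
    # Typed as Any: the real result types are not specified in this document.
    parsed_notebook: Optional[Any] = None
    extracted_strategy: Optional[Any] = None
    generated_code: Optional[Any] = None
    validation_results: list = field(default_factory=list)
    iteration_count: int = 0
    max_iterations: int = 3
    database_schema: Optional[Any] = None
    errors: list = field(default_factory=list)

    def can_retry(self) -> bool:
        """True while the validation loop still has iterations left."""
        return self.iteration_count < self.max_iterations
```

Using `default_factory` gives each state object its own `validation_results` and `errors` lists, so two conversions running side by side do not share mutable defaults.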

Phase 1: Parse Notebook

Dispatch to: sub_agents/notebook-parser.md

Orchestrator Actions:

•Load .ipynb file as JSON
•Extract code cells with execution order
•Extract markdown cells for documentation
•Capture cell outputs (dataframes, metrics, plots)
•Build cell dependency graph
•Identify checkpoint outputs for validation
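
The first steps in the list above (loading the file as JSON and pulling out cells with their execution order) can be sketched with the standard library alone. The function names `parse_notebook` and `code_cells_in_execution_order` are assumptions for illustration; the keys `cells`, `cell_type`, `source`, `outputs` and `execution_count` are the ones the notebook JSON format uses.

```python
import json
from pathlib import Path


def parse_notebook(path):
    """Load an .ipynb file and return its cells in a simplified form."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    cells = []
    for index, cell in enumerate(raw.get("cells", []), start=1):
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        cells.append({
            "id": f"cell_{index}",
            "type": cell.get("cell_type"),
            "source": source,
            "outputs": cell.get("outputs", []),
            # execution_count is None for cells that were never run.
            "execution_order": cell.get("execution_count"),
        })
    return cells


def code_cells_in_execution_order(cells):
    """Keep executed code cells, sorted by the order they were run."""
    executed = [c for c in cells
                if c["type"] == "code" and c["execution_order"] is not None]
    return sorted(executed, key=lambda c: c["execution_order"])
```

Sorting by execution count rather than position matters for notebooks whose cells were run out of order, which is common in exploratory research.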

Notebook Structure:

yaml

notebook:
  metadata:
    kernel: python3
    language: python
  cells:
    - id: cell_1
      type: code
      source: "import pandas as pd..."
      outputs: [...]
      execution_order: 1
    - id: cell_2
      type: markdown
      source: "# Strategy Parameters"
  checkpoints:
    - cell_id: cell_5
      name: "preprocessed_data"
      type: dataframe
      shape: [1000, 5]
    - cell_id: cell_8
      name: "signals"
      type: dataframe
    - cell_id: cell_12
      name: "backtest_results"
      type: dict

Output: NotebookStructure stored in state
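
One way to approximate the cell dependency graph mentioned in the actions above is to track which names each cell defines and reads. The sketch below is an assumption about how that might work, not the parser's documented behavior. Both function names are hypothetical, and the analysis over-approximates by treating function-local names as cell-level definitions.

```python
import ast


def names_defined_and_used(source):
    """Return (defined, used) name sets for one cell's source."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            (defined if isinstance(node.ctx, ast.Store) else used).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
    return defined, used


def build_dependency_graph(cells):
    """Map each cell id to the earlier cells whose names it reads.

    `cells` is a list of (cell_id, source) pairs in execution order.
    """
    last_definer = {}  # name -> id of the most recent cell that defined it
    graph = {}
    for cell_id, source in cells:
        defined, used = names_defined_and_used(source)
        graph[cell_id] = sorted({last_definer[n] for n in used if n in last_definer})
        for name in defined:
            last_definer[name] = cell_id
    return graph
```

Cells containing IPython magics such as `%matplotlib inline` are not valid Python and would need to be stripped before `ast.parse` is called.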

Phase 2: Extract Strategy Components

Dispatch to: sub_agents/strategy-extractor.md

Orchestrator Actions:

•Analyze code cells for trading patterns
•Extract parameters (constants, config values)
•Identify data loading/preprocessing logic
•Identify signal generation logic
•Identify execution/backtest logic
•Flag visualization code (to exclude)
•Detect trading pitfalls (look-ahead bias, etc.)

Component Classification:

yaml

components:
  parameters:
    - name: SMA_SHORT
      value: 20
      cell_id: cell_3
      type: int
      category: indicator_param
    - name: STOP_LOSS
      value: 0.02
      cell_id: cell_3
      type: float
      category: risk_param

  data_loading:
    - cell_id: cell_1
      function: load_price_data
      inputs: [file_path]
      outputs: [df]

  preprocessing:
    - cell_id: cell_2
      function: clean_data
      inputs: [df]
      outputs: [df_clean]

  indicators:
    - cell_id: cell_4
      function: calculate_sma
      inputs: [df, period]
      outputs: [sma_series]

  signals:
    - cell_id: cell_5
      function: generate_signals
      inputs: [df, sma_short, sma_long]
      outputs: [signals_df]

  excluded:
    - cell_id: cell_10
      reason: visualization_only
    - cell_id: cell_11
      reason: exploratory_analysis

  warnings:
    - type: potential_look_ahead
      cell_id: cell_6
      description: "Uses .shift(-1), check if intentional"

Output: StrategyComponents stored in state
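
Parameter extraction of the kind shown in the `parameters` list above could be approximated by scanning a cell for top-level `UPPER_CASE = <literal>` assignments. This is a sketch under that assumption; `extract_parameters` is a hypothetical name, and the `category` field (for example `indicator_param` versus `risk_param`) is left out because it cannot be inferred from syntax alone.

```python
import ast


def extract_parameters(source, cell_id):
    """Collect top-level UPPER_CASE = <literal> assignments from one cell."""
    params = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if not (isinstance(target, ast.Name) and target.id.isupper()):
            continue
        try:
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # Right-hand side is an expression, not a plain literal.
        params.append({
            "name": target.id,
            "value": value,
            "cell_id": cell_id,
            "type": type(value).__name__,
        })
    return params
```

For a cell containing `SMA_SHORT = 20` and `STOP_LOSS = 0.02`, this yields two entries typed `int` and `float`, matching the shape of the YAML above.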

Phase 3: Generate Code

Dispatch to: sub_agents/code-generator.md

Orchestrator Actions:

•Select code templates based on target language
•Map extracted components to modules
•Generate module files with proper structure
•Externalize parameters to config file
•Add logging, error handling, type hints
•Generate CLI interface
•Create notebook-to-code mapping document

Generated File Structure (Python):

code

generated_strategy/
├── strategy/
│   ├── __init__.py
│   ├── signals.py          # From cells 5-7
│   ├── indicators.py       # From cell 4
│   └── execution.py        # From cells 8-9
├── data/
│   ├── __init__.py
│   ├── loader.py           # From cell 1
│   └── preprocessing.py    # From cell 2
├── config/
│   ├── parameters.yaml     # Extracted from cell 3
│   └── settings.yaml       # System config
├── tests/
│   ├── test_signals.py
│   ├── test_parity.py      # Notebook comparison tests
│   └── fixtures/
│       └── reference_outputs.pkl
├── docs/
│   └── MAPPING.md          # Cell-to-code traceability
├── main.py                 # CLI entry point
├── requirements.txt
└── pyproject.toml

Generated File Structure (TypeScript):

code

generated_strategy/
├── src/
│   ├── strategy/
│   │   ├── signals.ts
│   │   ├── indicators.ts
│   │   └── execution.ts
│   ├── data/
│   │   ├── loader.ts
│   │   └── preprocessing.ts
│   ├── config/
│   │   └── parameters.ts
│   └── index.ts
├── tests/
│   ├── signals.test.ts
│   └── parity.test.ts
├── docs/
│   └── MAPPING.md
├── package.json
└── tsconfig.json

Mapping Document Format:

markdown

# Notebook to Code Mapping

## strategy.ipynb → generated_strategy/

| Notebook Cell | Generated File | Function/Class |
|---------------|----------------|----------------|
| Cell 1 | data/loader.py:5-25 | load_price_data() |
| Cell 2 | data/preprocessing.py:10-40 | clean_data() |
| Cell 3 | config/parameters.yaml | (config values) |
| Cell 4 | strategy/indicators.py:15-35 | calculate_sma() |
| Cell 5-6 | strategy/signals.py:20-60 | SignalGenerator.generate() |
| Cell 10 | (excluded) | visualization only |

Output: GeneratedModules stored in state

Phase 4: Validation Loop

Dispatch to: sub_agents/validation-runner.md and sub_agents/discrepancy-analyzer.md

Orchestrator Actions:

code

while iteration_count < max_iterations:
    1. Run original notebook, capture checkpoint outputs
    2. Run generated code with same inputs
    3. Compare outputs at each checkpoint
    4. If all match within tolerance:
        - Mark validation passed
        - Break loop
    5. If discrepancies found:
        - Dispatch to discrepancy-analyzer
        - Receive diagnosis and fixes
        - Apply fixes to generated code
        - Increment iteration_count
        - Continue loop
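
The loop above translates fairly directly into Python. In this sketch, `run_checks` and `apply_fixes` are hypothetical callables standing in for the validation-runner and discrepancy-analyzer sub-agents, whose real interfaces are not specified here.

```python
def run_validation_loop(run_checks, apply_fixes, max_iterations=3):
    """Re-run checks until they pass or the iteration budget is spent.

    run_checks() returns a list of discrepancies (empty means parity).
    apply_fixes(discrepancies) patches the generated code in place.
    Both callables are placeholders for the sub-agents.
    """
    reports = []
    iteration_count = 0
    while iteration_count < max_iterations:
        discrepancies = run_checks()
        passed = not discrepancies
        reports.append({
            "iteration": iteration_count + 1,
            "status": "passed" if passed else "failed",
            "discrepancies": discrepancies,
        })
        if passed:
            break
        apply_fixes(discrepancies)
        iteration_count += 1
    return reports
```

As in the pseudocode, fixes applied on the final iteration are not re-checked before the loop exits, so a caller may want one extra `run_checks()` call before reporting the remaining discrepancies.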

Validation Report:

yaml

validation:
  iteration: 1
  status: failed | passed
  checkpoints:
    - name: preprocessed_data
      status: passed
      notebook_shape: [1000, 5]
      generated_shape: [1000, 5]

    - name: signals
      status: failed
      discrepancy:
        type: value_mismatch
        location: "row 45, column 'signal'"
        expected: 1
        actual: 0
        root_cause: "Missing .fillna(0) in indicator calculation"

    - name: backtest_results
      status: skipped
      reason: "Depends on failed checkpoint"

  fixes_applied:
    - file: strategy/indicators.py
      line: 28
      change: "Added .fillna(0) to handle NaN values"

Tolerance Settings:

python

validation_config = {
    'numeric_rtol': 1e-6,      # Relative tolerance
    'numeric_atol': 1e-8,      # Absolute tolerance
    'allow_row_reorder': False, # Strict row order
    'ignore_columns': ['timestamp'],  # Don't compare these
}
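
To illustrate how these settings might be applied, the sketch below compares two tables represented as lists of row dicts using only the standard library. The helper names are hypothetical. It uses `math.isclose`, whose tolerance formula differs slightly from NumPy's `isclose`, and it does not implement `allow_row_reorder`; rows are always compared in strict order.

```python
import math

validation_config = {
    "numeric_rtol": 1e-6,
    "numeric_atol": 1e-8,
    "allow_row_reorder": False,   # Not handled below: order is always strict.
    "ignore_columns": ["timestamp"],
}


def values_match(expected, actual, rtol=1e-6, atol=1e-8):
    """Numbers compare within tolerance; everything else by equality."""
    numeric = (int, float)
    if isinstance(expected, numeric) and isinstance(actual, numeric):
        if math.isnan(expected) and math.isnan(actual):
            return True  # Treat NaN as matching NaN, unlike plain ==.
        return math.isclose(expected, actual, rel_tol=rtol, abs_tol=atol)
    return expected == actual


def rows_match(expected_rows, actual_rows, config=validation_config):
    """Compare two tables row by row, skipping ignored columns."""
    if len(expected_rows) != len(actual_rows):
        return False
    ignored = set(config["ignore_columns"])
    for exp, act in zip(expected_rows, actual_rows):
        for col in (set(exp) | set(act)) - ignored:
            if col not in exp or col not in act:
                return False  # Column present on one side only.
            if not values_match(exp[col], act[col],
                                config["numeric_rtol"], config["numeric_atol"]):
                return False
    return True
```

Matching NaN with NaN is a deliberate choice here: indicator warm-up periods usually produce NaN in both the notebook and the generated code, and those rows should not count as discrepancies.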

Output: ValidationReport[] stored in state

Phase 5: Database Design

Dispatch to: sub_agents/database-designer.md

Orchestrator Actions:

•Analyze data requirements from extracted components
•Identify time-series patterns
•Determine data volume expectations
•Generate schema recommendations
•Include indexing strategies
•Provide storage optimization tips

Schema Recommendation Format:

yaml

database_recommendation:
  engine: PostgreSQL + TimescaleDB
  rationale: "Time-series price data with ACID requirements"

  tables:
    - name: instruments
      purpose: "Store tradable instrument metadata"
      columns:
        - name: id
          type: SERIAL PRIMARY KEY
        - name: symbol
          type: VARCHAR(20) NOT NULL UNIQUE
        - name: instrument_type
          type: VARCHAR(50)

    - name: price_data
      purpose: "OHLCV time-series data"
      columns:
        - name: time
          type: TIMESTAMPTZ NOT NULL
        - name: instrument_id
          type: INTEGER REFERENCES instruments(id)
        - name: open
          type: NUMERIC(18, 8)
        - name: high
          type: NUMERIC(18, 8)
        - name: low
          type: NUMERIC(18, 8)
        - name: close
          type: NUMERIC(18, 8)
        - name: volume
          type: NUMERIC(24, 8)
      primary_key: [time, instrument_id]
      timescaledb:
        hypertable: true
        chunk_interval: "7 days"
        compression: true
        retention: "2 years"

    - name: strategy_parameters
      purpose: "Versioned strategy configuration"
      columns:
        - name: strategy_name
          type: VARCHAR(100)
        - name: version
          type: INTEGER
        - name: parameters
          type: JSONB
        - name: backtest_metrics
          type: JSONB

    - name: signals
      purpose: "Generated trading signals (append-only)"
      columns:
        - name: generated_at
          type: TIMESTAMPTZ
        - name: strategy_name
          type: VARCHAR(100)
        - name: instrument_id
          type: INTEGER
        - name: signal_type
          type: VARCHAR(10)
        - name: strength
          type: NUMERIC(5, 4)

  indexes:
    - table: price_data
      columns: [instrument_id, time DESC]
      purpose: "Symbol + time range queries"

    - table: signals
      columns: [strategy_name, generated_at DESC]
      purpose: "Recent signals by strategy"

  storage_estimates:
    price_data: "~50MB per year per instrument (1-min data)"
    signals: "~10MB per year per strategy"
    total_first_year: "~500MB for 10 instruments"

  performance_tips:
    - "Use continuous aggregates for daily/weekly rollups"
    - "Enable compression for data older than 7 days"
    - "Partition signals by month if volume exceeds 10M rows"

Output: SchemaRecommendation stored in state, written to schema/database.sql
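
The storage estimates above can be sanity-checked with simple arithmetic. The sketch below assumes roughly 100 bytes per uncompressed row, which is an illustrative guess rather than a measured value, and `estimate_price_data_mb` is a hypothetical helper. Under that assumption, the ~50MB figure corresponds to a market that trades around the clock; an exchange-hours market would need considerably less.

```python
def estimate_price_data_mb(rows_per_day, days_per_year, bytes_per_row=100):
    """Rough uncompressed size of one instrument-year of price rows, in MB.

    bytes_per_row=100 is an assumed average, not a measured figure.
    """
    return rows_per_day * days_per_year * bytes_per_row / 1_000_000


# Round-the-clock market at 1-minute bars: 1440 rows/day for 365 days.
round_the_clock = estimate_price_data_mb(1440, 365)   # about 52.6 MB

# Exchange-hours market: 390 one-minute bars per session, 252 sessions.
exchange_hours = estimate_price_data_mb(390, 252)     # about 9.8 MB
```

Actual usage depends on index overhead and on how well the compression setting performs for the data in question, so these numbers are best read as an order-of-magnitude check.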

Error Handling

Error	Phase	Resolution
Invalid notebook format	1	Report specific JSON parse error
No trading logic found	2	List detected patterns, ask for guidance
Circular cell dependencies	2	Show dependency graph, suggest resolution
Validation timeout	4	Save partial results, report timeout
Max iterations reached	4	Report remaining discrepancies, manual fix needed
Unsupported data types	5	Suggest alternative schema patterns

Rollback Strategy:

yaml

rollback_actions:
  - action: preserve_notebook
    note: "Original notebook is never modified"
  - action: delete_generated
    path: generated_strategy/
    condition: "On critical failure before Phase 4"
  - action: keep_partial
    note: "Keep generated code even if validation fails"

Success Output

code

╔══════════════════════════════════════════════════════════════════════╗
║  Strategy Converted Successfully!                                      ║
╠══════════════════════════════════════════════════════════════════════╣
║                                                                        ║
║  Source: strategy.ipynb (15 cells, 342 lines)                          ║
║  Target: generated_strategy/ (Python)                                  ║
║                                                                        ║
║  Components Extracted:                                                 ║
║    • Parameters: 5 (SMA_SHORT, SMA_LONG, STOP_LOSS, ...)              ║
║    • Data functions: 2                                                 ║
║    • Indicator functions: 3                                            ║
║    • Signal functions: 2                                               ║
║    • Excluded cells: 3 (visualization)                                 ║
║                                                                        ║
║  Validation:                                                           ║
║    • Iterations: 2                                                     ║
║    • Checkpoints passed: 4/4                                           ║
║    • Output parity: VERIFIED                                           ║
║                                                                        ║
║  Files Generated:                                                      ║
║    • strategy/signals.py (89 lines)                                    ║
║    • strategy/indicators.py (45 lines)                                 ║
║    • data/loader.py (32 lines)                                         ║
║    • config/parameters.yaml                                            ║
║    • tests/test_parity.py                                              ║
║    • docs/MAPPING.md                                                   ║
║    • schema/database.sql                                               ║
║                                                                        ║
║  Warnings:                                                             ║
║    ⚠ Potential look-ahead bias in cell 6 (review recommended)         ║
║                                                                        ║
║  Next Steps:                                                           ║
║    1. Review generated code in generated_strategy/                     ║
║    2. Run: pytest tests/ -v                                            ║
║    3. Review database schema in schema/database.sql                    ║
║    4. Deploy with: python main.py --config config/parameters.yaml      ║
║                                                                        ║
╚══════════════════════════════════════════════════════════════════════╝

Trading Pitfall Detection

The skill automatically detects common issues:

Look-Ahead Bias Detection

python

patterns_to_flag = [
    r'\.shift\(-\d+\)',           # Negative shift pulls future rows into the current row
    r'iloc\[-\d+\]',              # Negative index reads from the end of the data (possibly future rows)
    r'next_.*=',                  # Assignments to variables named "next_*"
]
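
A scanner built on these patterns might look like the following. This is a minimal sketch: `flag_look_ahead` is a hypothetical name, the warning shape follows the `warnings` entries shown in Phase 2, and the comment stripping is deliberately crude.

```python
import re

patterns_to_flag = [
    r'\.shift\(-\d+\)',
    r'iloc\[-\d+\]',
    r'next_.*=',
]


def flag_look_ahead(source, cell_id):
    """Return one warning per (line, pattern) match in a cell's source."""
    warnings_found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Crude comment strip; it also cuts at a '#' inside a string literal.
        code = line.split("#", 1)[0]
        for pattern in patterns_to_flag:
            if re.search(pattern, code):
                warnings_found.append({
                    "type": "potential_look_ahead",
                    "cell_id": cell_id,
                    "line": line_no,
                    "pattern": pattern,
                })
    return warnings_found
```

Regex matching of this kind produces false positives, for example a legitimate `iloc[-1]` used to read the latest bar, which is why the skill reports these as warnings for review rather than errors.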

Overfitting Indicators

python

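import warnings

def check_overfitting(sharpe_ratio, parameter_count):
    # Heuristic thresholds: a warning prompts review, it does not prove overfitting
    if sharpe_ratio > 3.0:
        warnings.warn("Sharpe > 3 may indicate overfitting")
    if parameter_count > 10:
        warnings.warn("Many parameters increase overfitting risk")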

Hidden State Detection

python

# Flag global variable modifications
global_mutations = detect_global_writes(cell_ast)
if global_mutations:
    warn(f"Cell modifies global state: {global_mutations}")
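
The snippet above leaves `detect_global_writes` undefined. One possible interpretation, offered here as an assumption rather than the skill's actual logic, is to report names that functions rebind through a `global` declaration.

```python
import ast


def detect_global_writes(cell_ast):
    """Names that functions in a cell rebind via a `global` declaration."""
    mutated = set()
    for func in ast.walk(cell_ast):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Names this function (or a function nested in it) declares global.
        declared = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Global):
                declared.update(node.names)
        # Any assignment to one of those names is a global write.
        for node in ast.walk(func):
            if (isinstance(node, ast.Name)
                    and isinstance(node.ctx, ast.Store)
                    and node.id in declared):
                mutated.add(node.id)
    return sorted(mutated)
```

This sketch does not catch in-place mutation of shared objects, such as `positions.append(order)` or `df['signal'] = ...` inside a function, which is another common source of hidden state in notebooks.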

Configuration

Setting	Default	Override Flag
target_language	python	`--lang=typescript`
max_validation_iterations	3	`--max-iter=N`
numeric_tolerance	1e-6	`--tolerance=N`
include_tests	true	`--no-tests`
include_schema	true	`--no-schema`
verbose_mapping	false	`--verbose-map`
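
If these settings were parsed with `argparse`, the defaults and override flags in the table could be wired up as below. How the skill actually reads its flags is not specified, so treat `build_parser` as a hypothetical sketch that only reproduces the table.

```python
import argparse


def build_parser():
    """Parser whose defaults and flags mirror the configuration table."""
    parser = argparse.ArgumentParser(prog="notebook-to-algorithm")
    parser.add_argument("notebook_path")
    parser.add_argument("--lang", dest="target_language",
                        choices=["python", "typescript"], default="python")
    parser.add_argument("--max-iter", dest="max_validation_iterations",
                        type=int, default=3)
    parser.add_argument("--tolerance", dest="numeric_tolerance",
                        type=float, default=1e-6)
    # store_false flags default to True, matching include_* = true.
    parser.add_argument("--no-tests", dest="include_tests",
                        action="store_false")
    parser.add_argument("--no-schema", dest="include_schema",
                        action="store_false")
    parser.add_argument("--verbose-map", dest="verbose_mapping",
                        action="store_true")
    return parser
```

Both `--lang=typescript` and `--lang typescript` are accepted by `argparse`, which covers the two spellings used in this document.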

Orchestrator Agent

This skill has an associated orchestrator agent at .claude/agents/notebook-to-algorithm.md that coordinates the sub-agents. The orchestrator:

•Parses notebooks and extracts strategy components
•Generates Python/TypeScript code modules
•Runs validation loop with discrepancy analysis
•Designs database schema for production deployment

References

•references/CONTEXT.md - Enhanced skill context
•references/RESEARCH.md - Domain research and best practices
•sub_agents/*.md - Sub-agent documentation
•workflows/*.md - Workflow definitions
