AgentSkillsCN

notebook-to-algorithm

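Converts Jupyter notebooks containing trading strategies into production-ready algorithms, with validation and database optimization recommendations.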

SKILL.md
---
name: notebook-to-algorithm
description: Converts Jupyter notebooks containing trading strategies into production-ready algorithms with validation and database recommendations
version: 1.0.0
author: liri-skills
tags: [trading, jupyter, code-generation, validation, quantitative-finance]
---

Notebook to Algorithm

Transforms trading strategy notebooks into production-ready code, with automated validation and database design recommendations.

When to Use

  • Converting a backtested trading strategy from Jupyter to production code
  • Preserving the optimal parameters discovered during research
  • Validating that converted code reproduces the notebook outputs
  • Getting database schema recommendations for production deployment
  • Migrating from exploratory research to systematic trading

When NOT to Use

  • Notebooks without trading/financial logic (use general refactoring tools)
  • Live trading execution (this generates algorithms, not execution systems)
  • Real-time market data integration (separate infrastructure concern)
  • Strategies without clear outputs to validate against

Orchestrator

This skill coordinates the following sub_agents:

| Sub-Agent | Purpose | Invoked During |
|-----------|---------|----------------|
| notebook-parser | Parse .ipynb structure, extract cells and outputs | Phase 1 |
| strategy-extractor | Identify trading logic, parameters, preprocessing | Phase 2 |
| code-generator | Generate Python/TypeScript modules from extracted logic | Phase 3 |
| validation-runner | Execute tests, compare outputs, identify discrepancies | Phase 4 |
| discrepancy-analyzer | Diagnose root causes, suggest fixes for mismatches | Phase 4 |
| database-designer | Recommend schema based on strategy data requirements | Phase 5 |

Workflows

Primary: Convert Strategy (Full)

Path: workflows/convert-strategy.md

Complete conversion with validation loop:

  1. Parse Notebook → Extract cells, outputs, structure
  2. Extract Strategy → Identify logic, parameters, preprocessing
  3. Generate Code → Create modular Python/TypeScript
  4. Validate Outputs → Run tests, compare to notebook
  5. Refine if Needed → Fix discrepancies, re-validate
  6. Design Database → Recommend production schema
  7. Package Deliverables → Final code, docs, mapping

Secondary: Quick Convert

Path: workflows/quick-convert.md

Fast conversion without validation loop:

  1. Parse and extract
  2. Generate code with defaults
  3. Skip validation (user responsibility)
  4. Basic database recommendations

Tertiary: Validate Only

Path: workflows/validate-only.md

For previously converted code:

  1. Load notebook and converted code
  2. Run comparison tests
  3. Report discrepancies
  4. Suggest fixes

Context Integration

Target User Profile

  • Quantitative analysts converting research to production
  • Solo traders automating strategies
  • Trading teams standardizing workflows

Receives Context From

  • User: Notebook path, target language preference
  • Notebook: Trading logic, parameters, outputs

Shares Context With

  • database-design: Schema recommendations
  • python-development: Code generation patterns

Commands

/notebook-to-algorithm or /notebook-to-algorithm:convert

Full conversion workflow with validation.

Usage:

code
/notebook-to-algorithm path/to/strategy.ipynb
> Select target language (Python/TypeScript)
> Conversion runs with validation loop
> Receive production code + database schema

/notebook-to-algorithm:quick <notebook-path>

Quick conversion without validation loop.

Usage:

code
/notebook-to-algorithm:quick path/to/strategy.ipynb --lang python

/notebook-to-algorithm:validate <notebook-path> <code-path>

Validate existing converted code against notebook.

Usage:

code
/notebook-to-algorithm:validate strategy.ipynb generated_strategy/

/notebook-to-algorithm:schema <notebook-path>

Generate database schema recommendation only.

Usage:

code
/notebook-to-algorithm:schema strategy.ipynb

Implementation

Entry Point Logic

When /notebook-to-algorithm is invoked:

code
1. Parse command arguments (notebook path, options)
2. Initialize conversion state
3. Execute Phase 1: Parse notebook
4. Execute Phase 2: Extract strategy components
5. Execute Phase 3: Generate code
6. Execute Phase 4: Validation loop (max 3 iterations)
   - Run converted code
   - Compare outputs to notebook
   - If discrepancies: analyze, fix, repeat
7. Execute Phase 5: Database design
8. Package and report results

State Management

yaml
state:
  notebook_path: string
  target_language: python | typescript
  parsed_notebook: NotebookStructure | null
  extracted_strategy: StrategyComponents | null
  generated_code: GeneratedModules | null
  validation_results: ValidationReport[]
  iteration_count: 0
  max_iterations: 3
  database_schema: SchemaRecommendation | null
  errors: []
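
A minimal sketch of how this state could be held in Python, assuming a dataclass. The class name `ConversionState` and the helper `can_retry` are hypothetical, and the structured types (`NotebookStructure`, `StrategyComponents`, and so on) are stood in for by plain dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Literal, Optional


@dataclass
class ConversionState:
    """Hypothetical container mirroring the state fields listed above."""

    notebook_path: str
    target_language: Literal["python", "typescript"] = "python"
    parsed_notebook: Optional[dict[str, Any]] = None
    extracted_strategy: Optional[dict[str, Any]] = None
    generated_code: Optional[dict[str, Any]] = None
    validation_results: list[dict[str, Any]] = field(default_factory=list)
    iteration_count: int = 0
    max_iterations: int = 3
    database_schema: Optional[dict[str, Any]] = None
    errors: list[str] = field(default_factory=list)

    def can_retry(self) -> bool:
        """True while the validation loop still has iterations left."""
        return self.iteration_count < self.max_iterations


state = ConversionState(notebook_path="path/to/strategy.ipynb")
```

Using `default_factory` for the list fields keeps each conversion run from sharing one mutable list.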

Phase 1: Parse Notebook

Dispatch to: sub_agents/notebook-parser.md

Orchestrator Actions:

  1. Load .ipynb file as JSON
  2. Extract code cells with execution order
  3. Extract markdown cells for documentation
  4. Capture cell outputs (dataframes, metrics, plots)
  5. Build cell dependency graph
  6. Identify checkpoint outputs for validation

Notebook Structure:

yaml
notebook:
  metadata:
    kernel: python3
    language: python
  cells:
    - id: cell_1
      type: code
      source: "import pandas as pd..."
      outputs: [...]
      execution_order: 1
    - id: cell_2
      type: markdown
      source: "# Strategy Parameters"
  checkpoints:
    - cell_id: cell_5
      name: "preprocessed_data"
      type: dataframe
      shape: [1000, 5]
    - cell_id: cell_8
      name: "signals"
      type: dataframe
    - cell_id: cell_12
      name: "backtest_results"
      type: dict

Output: NotebookStructure stored in state
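
The first three orchestrator actions can be sketched with the standard `json` module, assuming the notebook follows the nbformat v4 layout (`cells`, `cell_type`, `source`, `execution_count`, `outputs`). The function name `parse_notebook` and the fallback `cell_N` ids are illustrative choices, not part of the skill; dependency-graph construction and checkpoint selection are left out.

```python
import json
from typing import Any


def parse_notebook(raw: str) -> dict[str, Any]:
    """Parse nbformat-v4 JSON text into a simplified notebook structure."""
    nb = json.loads(raw)  # raises json.JSONDecodeError on invalid input
    cells = []
    for index, cell in enumerate(nb.get("cells", []), start=1):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        entry = {
            "id": cell.get("id", f"cell_{index}"),  # fallback id is an assumption
            "type": cell.get("cell_type"),
            "source": source,
        }
        if entry["type"] == "code":
            entry["outputs"] = cell.get("outputs", [])
            entry["execution_order"] = cell.get("execution_count")
        cells.append(entry)
    kernel = nb.get("metadata", {}).get("kernelspec", {}).get("name")
    return {"metadata": {"kernel": kernel}, "cells": cells}


sample = json.dumps({
    "metadata": {"kernelspec": {"name": "python3"}},
    "cells": [
        {"cell_type": "code", "execution_count": 1,
         "source": ["import pandas as pd\n", "SMA_SHORT = 20"], "outputs": []},
        {"cell_type": "markdown", "source": "# Strategy Parameters"},
    ],
})
parsed = parse_notebook(sample)
```

A `json.JSONDecodeError` raised here carries the line and column of the failure, which is the kind of specific parse error the error-handling table asks for.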


Phase 2: Extract Strategy Components

Dispatch to: sub_agents/strategy-extractor.md

Orchestrator Actions:

  1. Analyze code cells for trading patterns
  2. Extract parameters (constants, config values)
  3. Identify data loading/preprocessing logic
  4. Identify signal generation logic
  5. Identify execution/backtest logic
  6. Flag visualization code (to exclude)
  7. Detect trading pitfalls (look-ahead bias, etc.)

Component Classification:

yaml
components:
  parameters:
    - name: SMA_SHORT
      value: 20
      cell_id: cell_3
      type: int
      category: indicator_param
    - name: STOP_LOSS
      value: 0.02
      cell_id: cell_3
      type: float
      category: risk_param

  data_loading:
    - cell_id: cell_1
      function: load_price_data
      inputs: [file_path]
      outputs: [df]

  preprocessing:
    - cell_id: cell_2
      function: clean_data
      inputs: [df]
      outputs: [df_clean]

  indicators:
    - cell_id: cell_4
      function: calculate_sma
      inputs: [df, period]
      outputs: [sma_series]

  signals:
    - cell_id: cell_5
      function: generate_signals
      inputs: [df, sma_short, sma_long]
      outputs: [signals_df]

  excluded:
    - cell_id: cell_10
      reason: visualization_only
    - cell_id: cell_11
      reason: exploratory_analysis

  warnings:
    - type: potential_look_ahead
      cell_id: cell_6
      description: "Uses .shift(-1), check if intentional"

Output: StrategyComponents stored in state
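
Parameter extraction (action 2) might look roughly like the sketch below, which walks a cell's syntax tree with the standard `ast` module. It assumes parameters are top-level `UPPER_CASE = <literal>` assignments; that naming convention, the function name `extract_parameters`, and the omission of the `category` field are simplifications made here.

```python
import ast


def extract_parameters(source: str, cell_id: str) -> list[dict]:
    """Collect top-level UPPER_CASE = <literal> assignments from one cell."""
    params = []
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
            continue
        target = node.targets[0]
        if not (isinstance(target, ast.Name) and target.id.isupper()):
            continue
        try:
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # not a plain literal, e.g. a function call
        params.append({
            "name": target.id,
            "value": value,
            "cell_id": cell_id,
            "type": type(value).__name__,
        })
    return params


cell_3 = "SMA_SHORT = 20\nSTOP_LOSS = 0.02\ndf = load_price_data(path)\n"
params = extract_parameters(cell_3, "cell_3")
```

Because the source is only parsed and never executed, the call to `load_price_data` in the sample cell is skipped without needing that function to exist.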


Phase 3: Generate Code

Dispatch to: sub_agents/code-generator.md

Orchestrator Actions:

  1. Select code templates based on target language
  2. Map extracted components to modules
  3. Generate module files with proper structure
  4. Externalize parameters to config file
  5. Add logging, error handling, type hints
  6. Generate CLI interface
  7. Create notebook-to-code mapping document

Generated File Structure (Python):

code
generated_strategy/
├── strategy/
│   ├── __init__.py
│   ├── signals.py          # From cells 5-7
│   ├── indicators.py       # From cell 4
│   └── execution.py        # From cells 8-9
├── data/
│   ├── __init__.py
│   ├── loader.py           # From cell 1
│   └── preprocessing.py    # From cell 2
├── config/
│   ├── parameters.yaml     # Extracted from cell 3
│   └── settings.yaml       # System config
├── tests/
│   ├── test_signals.py
│   ├── test_parity.py      # Notebook comparison tests
│   └── fixtures/
│       └── reference_outputs.pkl
├── docs/
│   └── MAPPING.md          # Cell-to-code traceability
├── main.py                 # CLI entry point
├── requirements.txt
└── pyproject.toml

Generated File Structure (TypeScript):

code
generated_strategy/
├── src/
│   ├── strategy/
│   │   ├── signals.ts
│   │   ├── indicators.ts
│   │   └── execution.ts
│   ├── data/
│   │   ├── loader.ts
│   │   └── preprocessing.ts
│   ├── config/
│   │   └── parameters.ts
│   └── index.ts
├── tests/
│   ├── signals.test.ts
│   └── parity.test.ts
├── docs/
│   └── MAPPING.md
├── package.json
└── tsconfig.json

Mapping Document Format:

markdown
# Notebook to Code Mapping

## strategy.ipynb → generated_strategy/

| Notebook Cell | Generated File | Function/Class |
|---------------|----------------|----------------|
| Cell 1 | data/loader.py:5-25 | load_price_data() |
| Cell 2 | data/preprocessing.py:10-40 | clean_data() |
| Cell 3 | config/parameters.yaml | (config values) |
| Cell 4 | strategy/indicators.py:15-35 | calculate_sma() |
| Cell 5-6 | strategy/signals.py:20-60 | SignalGenerator.generate() |
| Cell 10 | (excluded) | visualization only |

Output: GeneratedModules stored in state
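
As an illustration of action 7, the mapping document format shown above could be rendered from a list of rows as follows. The function name `render_mapping` and the tuple layout are hypothetical.

```python
def render_mapping(notebook: str, target: str,
                   rows: list[tuple[str, str, str]]) -> str:
    """Render (cell, file, function) rows as the Markdown mapping document."""
    lines = [
        "# Notebook to Code Mapping",
        "",
        f"## {notebook} → {target}",
        "",
        "| Notebook Cell | Generated File | Function/Class |",
        "|---------------|----------------|----------------|",
    ]
    lines += [f"| {cell} | {path} | {name} |" for cell, path, name in rows]
    return "\n".join(lines) + "\n"


doc = render_mapping(
    "strategy.ipynb",
    "generated_strategy/",
    [
        ("Cell 1", "data/loader.py:5-25", "load_price_data()"),
        ("Cell 10", "(excluded)", "visualization only"),
    ],
)
```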


Phase 4: Validation Loop

Dispatch to: sub_agents/validation-runner.md and sub_agents/discrepancy-analyzer.md

Orchestrator Actions:

code
while iteration_count < max_iterations:
    1. Run original notebook, capture checkpoint outputs
    2. Run generated code with same inputs
    3. Compare outputs at each checkpoint
    4. If all match within tolerance:
        - Mark validation passed
        - Break loop
    5. If discrepancies found:
        - Dispatch to discrepancy-analyzer
        - Receive diagnosis and fixes
        - Apply fixes to generated code
        - Increment iteration_count
        - Continue loop
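
The control flow of this loop can be sketched in Python as below. The callbacks `compare` and `apply_fixes` are hypothetical stand-ins for the validation-runner and discrepancy-analyzer sub-agents, and the report shape is simplified.

```python
from typing import Callable


def run_validation_loop(
    compare: Callable[[], list[str]],
    apply_fixes: Callable[[list[str]], None],
    max_iterations: int = 3,
) -> dict:
    """Repeat compare -> fix until checkpoints match or iterations run out."""
    iteration_count = 0
    reports = []
    while iteration_count < max_iterations:
        failed = compare()  # names of checkpoints that did not match
        reports.append({"iteration": iteration_count + 1, "failed": failed})
        if not failed:
            return {"status": "passed", "reports": reports}
        apply_fixes(failed)
        iteration_count += 1
    return {"status": "failed", "reports": reports}


# Stub callbacks: the "signals" checkpoint fails once, then passes after a fix.
pending = ["signals"]
result = run_validation_loop(
    compare=lambda: list(pending),
    apply_fixes=lambda failed: pending.clear(),
)
```

As in the pseudocode, fixes applied on the final iteration are not re-validated, so a `failed` status means the remaining discrepancies need manual review.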

Validation Report:

yaml
validation:
  iteration: 1
  status: failed | passed
  checkpoints:
    - name: preprocessed_data
      status: passed
      notebook_shape: [1000, 5]
      generated_shape: [1000, 5]

    - name: signals
      status: failed
      discrepancy:
        type: value_mismatch
        location: "row 45, column 'signal'"
        expected: 1
        actual: 0
        root_cause: "Missing .fillna(0) in indicator calculation"

    - name: backtest_results
      status: skipped
      reason: "Depends on failed checkpoint"

  fixes_applied:
    - file: strategy/indicators.py
      line: 28
      change: "Added .fillna(0) to handle NaN values"

Tolerance Settings:

python
validation_config = {
    'numeric_rtol': 1e-6,      # Relative tolerance
    'numeric_atol': 1e-8,      # Absolute tolerance
    'allow_row_reorder': False, # Strict row order
    'ignore_columns': ['timestamp'],  # Don't compare these
}
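
One way these settings could be applied is sketched below in plain Python. It assumes the combined test `|actual - expected| <= atol + rtol * |expected|`, which is the formula `numpy.isclose` uses; the skill does not state which rule it applies. The helper names `values_match` and `compare_rows` are hypothetical, and rows are compared strictly in order, matching `allow_row_reorder: False`.

```python
validation_config = {
    "numeric_rtol": 1e-6,
    "numeric_atol": 1e-8,
    "allow_row_reorder": False,
    "ignore_columns": ["timestamp"],
}


def values_match(expected: float, actual: float,
                 rtol: float, atol: float) -> bool:
    """Combined relative/absolute tolerance check (numpy.isclose-style)."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def compare_rows(expected: list[dict], actual: list[dict],
                 config: dict) -> list[str]:
    """Return a description of every cell that differs beyond tolerance."""
    if len(expected) != len(actual):
        return [f"row count: expected {len(expected)}, actual {len(actual)}"]
    ignored = set(config["ignore_columns"])
    mismatches = []
    for i, (exp_row, act_row) in enumerate(zip(expected, actual)):
        for column, exp_value in exp_row.items():
            if column in ignored:
                continue
            if column not in act_row:
                mismatches.append(f"row {i}, column '{column}' missing")
            elif not values_match(exp_value, act_row[column],
                                  config["numeric_rtol"],
                                  config["numeric_atol"]):
                mismatches.append(f"row {i}, column '{column}'")
    return mismatches


expected = [{"timestamp": 1, "signal": 1.0}, {"timestamp": 2, "signal": 0.0}]
actual = [{"timestamp": 9, "signal": 1.0 + 5e-7}, {"timestamp": 2, "signal": 1.0}]
mismatches = compare_rows(expected, actual, validation_config)
```

Here the first row passes because the difference is within the relative tolerance and `timestamp` is ignored, while the second row is reported as a mismatch in `signal`.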

Output: ValidationReport[] stored in state


Phase 5: Database Design

Dispatch to: sub_agents/database-designer.md

Orchestrator Actions:

  1. Analyze data requirements from extracted components
  2. Identify time-series patterns
  3. Determine data volume expectations
  4. Generate schema recommendations
  5. Include indexing strategies
  6. Provide storage optimization tips

Schema Recommendation Format:

yaml
database_recommendation:
  engine: PostgreSQL + TimescaleDB
  rationale: "Time-series price data with ACID requirements"

  tables:
    - name: instruments
      purpose: "Store tradable instrument metadata"
      columns:
        - name: id
          type: SERIAL PRIMARY KEY
        - name: symbol
          type: VARCHAR(20) NOT NULL UNIQUE
        - name: instrument_type
          type: VARCHAR(50)

    - name: price_data
      purpose: "OHLCV time-series data"
      columns:
        - name: time
          type: TIMESTAMPTZ NOT NULL
        - name: instrument_id
          type: INTEGER REFERENCES instruments(id)
        - name: open
          type: NUMERIC(18, 8)
        - name: high
          type: NUMERIC(18, 8)
        - name: low
          type: NUMERIC(18, 8)
        - name: close
          type: NUMERIC(18, 8)
        - name: volume
          type: NUMERIC(24, 8)
      primary_key: [time, instrument_id]
      timescaledb:
        hypertable: true
        chunk_interval: "7 days"
        compression: true
        retention: "2 years"

    - name: strategy_parameters
      purpose: "Versioned strategy configuration"
      columns:
        - name: strategy_name
          type: VARCHAR(100)
        - name: version
          type: INTEGER
        - name: parameters
          type: JSONB
        - name: backtest_metrics
          type: JSONB

    - name: signals
      purpose: "Generated trading signals (append-only)"
      columns:
        - name: strategy_name
          type: VARCHAR(100)
        - name: generated_at
          type: TIMESTAMPTZ
        - name: instrument_id
          type: INTEGER
        - name: signal_type
          type: VARCHAR(10)
        - name: strength
          type: NUMERIC(5, 4)

  indexes:
    - table: price_data
      columns: [instrument_id, time DESC]
      purpose: "Symbol + time range queries"

    - table: signals
      columns: [strategy_name, generated_at DESC]
      purpose: "Recent signals by strategy"

  storage_estimates:
    price_data: "~50MB per year per instrument (1-min data)"
    signals: "~10MB per year per strategy"
    total_first_year: "~500MB for 10 instruments"

  performance_tips:
    - "Use continuous aggregates for daily/weekly rollups"
    - "Enable compression for data older than 7 days"
    - "Partition signals by month if volume exceeds 10M rows"

Output: SchemaRecommendation stored in state, written to schema/database.sql
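
A rough sketch of how one table entry from the recommendation might be turned into SQL for `schema/database.sql`. The function name `render_create_table` is hypothetical, and the TimescaleDB-specific settings (hypertable, compression, retention) are not rendered here because they need separate statements after the table exists.

```python
def render_create_table(table: dict) -> str:
    """Render one table entry as a CREATE TABLE statement."""
    lines = [f"    {col['name']} {col['type']}" for col in table["columns"]]
    if table.get("primary_key"):
        # Composite keys are emitted as a table-level constraint.
        lines.append(f"    PRIMARY KEY ({', '.join(table['primary_key'])})")
    body = ",\n".join(lines)
    return f"CREATE TABLE {table['name']} (\n{body}\n);"


# Abbreviated version of the price_data entry above.
price_data = {
    "name": "price_data",
    "columns": [
        {"name": "time", "type": "TIMESTAMPTZ NOT NULL"},
        {"name": "instrument_id", "type": "INTEGER REFERENCES instruments(id)"},
        {"name": "close", "type": "NUMERIC(18, 8)"},
    ],
    "primary_key": ["time", "instrument_id"],
}
ddl = render_create_table(price_data)
```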


Error Handling

| Error | Phase | Resolution |
|-------|-------|------------|
| Invalid notebook format | 1 | Report specific JSON parse error |
| No trading logic found | 2 | List detected patterns, ask for guidance |
| Circular cell dependencies | 2 | Show dependency graph, suggest resolution |
| Validation timeout | 4 | Save partial results, report timeout |
| Max iterations reached | 4 | Report remaining discrepancies, manual fix needed |
| Unsupported data types | 5 | Suggest alternative schema patterns |

Rollback Strategy:

yaml
rollback_actions:
  - action: preserve_notebook
    note: "Original notebook is never modified"
  - action: delete_generated
    path: generated_strategy/
    condition: "On critical failure before Phase 4"
  - action: keep_partial
    note: "Keep generated code even if validation fails"

Success Output

code
╔══════════════════════════════════════════════════════════════════════╗
║  Strategy Converted Successfully!                                      ║
╠══════════════════════════════════════════════════════════════════════╣
║                                                                        ║
║  Source: strategy.ipynb (15 cells, 342 lines)                          ║
║  Target: generated_strategy/ (Python)                                  ║
║                                                                        ║
║  Components Extracted:                                                 ║
║    • Parameters: 5 (SMA_SHORT, SMA_LONG, STOP_LOSS, ...)              ║
║    • Data functions: 2                                                 ║
║    • Indicator functions: 3                                            ║
║    • Signal functions: 2                                               ║
║    • Excluded cells: 3 (visualization)                                 ║
║                                                                        ║
║  Validation:                                                           ║
║    • Iterations: 2                                                     ║
║    • Checkpoints passed: 4/4                                           ║
║    • Output parity: VERIFIED                                           ║
║                                                                        ║
║  Files Generated:                                                      ║
║    • strategy/signals.py (89 lines)                                    ║
║    • strategy/indicators.py (45 lines)                                 ║
║    • data/loader.py (32 lines)                                         ║
║    • config/parameters.yaml                                            ║
║    • tests/test_parity.py                                              ║
║    • docs/MAPPING.md                                                   ║
║    • schema/database.sql                                               ║
║                                                                        ║
║  Warnings:                                                             ║
║    ⚠ Potential look-ahead bias in cell 6 (review recommended)         ║
║                                                                        ║
║  Next Steps:                                                           ║
║    1. Review generated code in generated_strategy/                     ║
║    2. Run: pytest tests/ -v                                            ║
║    3. Review database schema in schema/database.sql                    ║
║    4. Deploy with: python main.py --config config/parameters.yaml      ║
║                                                                        ║
╚══════════════════════════════════════════════════════════════════════╝

Trading Pitfall Detection

The skill automatically detects common issues:

Look-Ahead Bias Detection

python
patterns_to_flag = [
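    r'\.shift\(-\d+\)',           # Negative shift: pulls future rows into the current row
    r'iloc\[-\d+\]',              # Negative index: reads from the end of the frame, which may be future data
    r'next_.*=',                  # Variables named "next_*" often hold future values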
]
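
A minimal sketch of applying these patterns line by line to a cell's source. The function name `flag_look_ahead` and the shape of the returned warnings are assumptions, loosely modeled on the `warnings` entries in the Phase 2 component classification.

```python
import re

PATTERNS_TO_FLAG = [
    r"\.shift\(-\d+\)",
    r"iloc\[-\d+\]",
    r"next_.*=",
]


def flag_look_ahead(source: str, cell_id: str) -> list[dict]:
    """Return one warning per line that matches a look-ahead pattern."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for pattern in PATTERNS_TO_FLAG:
            if re.search(pattern, line):
                found.append({
                    "type": "potential_look_ahead",
                    "cell_id": cell_id,
                    "line": line_no,
                    "pattern": pattern,
                })
    return found


cell_6 = (
    "df['ret'] = df['close'].pct_change()\n"
    "df['target'] = df['ret'].shift(-1)\n"
    "signal = df['sma'].shift(1)\n"
)
found = flag_look_ahead(cell_6, "cell_6")
```

Only the `.shift(-1)` line is flagged; the positive shift on the last line looks backward and is left alone. Matches are heuristics for review, not proof of bias.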

Overfitting Indicators

python
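from warnings import warn

def check_overfitting(sharpe_ratio: float, parameter_count: int) -> None:
    if sharpe_ratio > 3.0:
        warn("Sharpe > 3 may indicate overfitting")
    if parameter_count > 10:
        warn("Many parameters increase overfitting risk")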

Hidden State Detection

python
# Flag global variable modifications
global_mutations = detect_global_writes(cell_ast)
if global_mutations:
    warn(f"Cell modifies global state: {global_mutations}")
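
The snippet above leaves `detect_global_writes` undefined. One possible implementation, sketched with the standard `ast` module, is shown below; it assumes that "global write" means a function declares a name `global` and then assigns to it. In-place mutation of module-level objects (for example `prices.append(x)`) would not be caught by this approach.

```python
import ast


def detect_global_writes(cell_ast: ast.AST) -> list[str]:
    """Names that functions in the cell declare `global` and then assign."""
    mutated = set()
    for func in ast.walk(cell_ast):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Collect every name the function declares as global.
        declared = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Global):
                declared.update(node.names)
        # Any assignment target (including +=) with one of those names counts.
        for node in ast.walk(func):
            if (isinstance(node, ast.Name)
                    and isinstance(node.ctx, ast.Store)
                    and node.id in declared):
                mutated.add(node.id)
    return sorted(mutated)


cell_source = """
position = 0

def on_bar(price):
    global position
    position += 1
    local_total = price * position
    return local_total
"""
global_mutations = detect_global_writes(ast.parse(cell_source))
```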

Configuration

| Setting | Default | Override Flag |
|---------|---------|---------------|
| target_language | python | --lang=typescript |
| max_validation_iterations | 3 | --max-iter=N |
| numeric_tolerance | 1e-6 | --tolerance=N |
| include_tests | true | --no-tests |
| include_schema | true | --no-schema |
| verbose_mapping | false | --verbose-map |
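
To show how the flags map onto the settings, here is a sketch using the standard `argparse` module. The skill is invoked as a slash command and the source does not say how its flags are parsed, so `argparse` and the function name `build_parser` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Map each override flag onto its setting name and default."""
    parser = argparse.ArgumentParser(prog="notebook-to-algorithm")
    parser.add_argument("notebook_path")
    parser.add_argument("--lang", dest="target_language",
                        choices=["python", "typescript"], default="python")
    parser.add_argument("--max-iter", dest="max_validation_iterations",
                        type=int, default=3)
    parser.add_argument("--tolerance", dest="numeric_tolerance",
                        type=float, default=1e-6)
    # store_false flags default to True and flip to False when given.
    parser.add_argument("--no-tests", dest="include_tests",
                        action="store_false")
    parser.add_argument("--no-schema", dest="include_schema",
                        action="store_false")
    parser.add_argument("--verbose-map", dest="verbose_mapping",
                        action="store_true")
    return parser


args = build_parser().parse_args(
    ["strategy.ipynb", "--lang=typescript", "--no-tests"]
)
```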

Orchestrator Agent

This skill has an associated orchestrator agent at .claude/agents/notebook-to-algorithm.md that coordinates the sub-agents. The orchestrator:

  • Parses notebooks and extracts strategy components
  • Generates Python/TypeScript code modules
  • Runs validation loop with discrepancy analysis
  • Designs database schema for production deployment

References

  • references/CONTEXT.md - Enhanced skill context
  • references/RESEARCH.md - Domain research and best practices
  • sub_agents/*.md - Sub-agent documentation
  • workflows/*.md - Workflow definitions