AgentSkillsCN

network-architecture-sizing

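CRITICAL: The Alpaca API is required for all OHLCV data. yfinance is never a valid fallback; never rely on it. Triggers: (1) any code that tries to fetch price or volume data with yfinance; (2) the crypto volume filter fails; (3) zero-volume bars are detected; (4) API key configuration problems; (5) a fallback behavior is proposed.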

SKILL.md
---
name: network-architecture-sizing
description: "PPO network architecture sizing for trading models. Trigger: (1) model files are unexpectedly small/large, (2) choosing hidden_dims for training, (3) balancing model capacity vs inference speed."
author: Claude Code
date: 2025-12-18
---

Network Architecture Sizing - Research Notes

Experiment Overview

| Item | Details |
|---|---|
| Date | 2025-12-18 |
| Goal | Understand the relationship between hidden_dims and model file size |
| Environment | Google Colab A100, PyTorch 2.x, NativePPOTrainer |
| Status | Documented |

Context

Training runs produced models of ~72 MB instead of the expected ~148 MB. Investigation showed that the hidden_dims configuration determines model size, and that the first layer dominates the total parameter count because its weight matrix has hidden_dims[0] × obs_dim entries.

Architecture Comparison

Model Size vs Architecture

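| Architecture | hidden_dims | Hidden Layers | Total Params | File Size | First Layer Weights |
|---|---|---|---|---|---|
| Large (v2.2) | (2048, 1024, 512, 256) | 4 | ~13.6M | ~148 MB | 2048 × obs_dim |
| Medium (v2.3) | (1024, 512, 256) | 3 | ~6.1M | ~72 MB | 1024 × obs_dim |
| Small | (512, 256, 128) | 3 | ~2.9M | ~35 MB (est.) | 512 × obs_dim |
| Tiny | (256, 128, 64) | 3 | ~1.4M | ~17 MB (est.) | 256 × obs_dim |

Total Params assume obs_dim = 5,300 and count the hidden layers only; Small and Tiny file sizes are estimates at ~12 bytes per parameter.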
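The parameter counts follow directly from the layer shapes. A minimal sketch that reproduces them, assuming a plain stack of Linear layers with biases, obs_dim = 5,300, actor/critic heads excluded, and 12 bytes per parameter in the checkpoint; trunk_params and checkpoint_mb are illustrative names, not functions from ppo_trainer_native.py:

```python
# Sketch: estimate trunk parameters and checkpoint size for each table row.
# Assumptions: obs_dim = 53 * 100, plain Linear layers with biases,
# actor/critic heads excluded, 12 bytes per parameter in the checkpoint.
OBS_DIM = 53 * 100

ARCHITECTURES = {
    "Large (v2.2)": (2048, 1024, 512, 256),
    "Medium (v2.3)": (1024, 512, 256),
    "Small": (512, 256, 128),
    "Tiny": (256, 128, 64),
}


def trunk_params(obs_dim, hidden_dims):
    """Weights + biases of Linear layers obs_dim -> hidden_dims[0] -> ... -> hidden_dims[-1]."""
    sizes = (obs_dim, *hidden_dims)
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def checkpoint_mb(params, bytes_per_param=12):
    """Approximate checkpoint size in MB (10**6 bytes)."""
    return params * bytes_per_param / 1e6


for name, dims in ARCHITECTURES.items():
    total = trunk_params(OBS_DIM, dims)
    first = OBS_DIM * dims[0] + dims[0]  # first Linear layer: weights + biases
    print(f"{name:14s} {total / 1e6:5.1f}M params  "
          f"first layer {first / total:.0%}  ~{checkpoint_mb(total):.0f} MB")
```

Under these assumptions the estimate for Medium (~73 MB) lines up with the observed ~72 MB, while Large comes out near 163 MB, above the observed ~148 MB, so treat the 12-byte rule as approximate.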

Why First Layer Dominates

With 53 features × 100 lookback = 5,300 input dimensions (shares are relative to the hidden-layer trunk, excluding the actor and critic heads):

  • Large: 2048 × 5300 = 10.9M params (~80% of network)
  • Medium: 1024 × 5300 = 5.4M params (~89% of network)
  • Small: 512 × 5300 = 2.7M params (~94% of network)

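Key insight: The first hidden layer's width has far more impact on model size than the widths of deeper layers, because every first-layer unit connects to all 5,300 inputs. The growth is linear, not exponential: each extra first-layer unit adds about 5,300 weights plus its outgoing connections.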
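The linear growth is easiest to see as the marginal cost of one extra unit in each hidden layer. A sketch under the same assumptions as the table (Linear layers with biases, obs_dim = 5,300, heads ignored); marginal_cost is an illustrative helper:

```python
# Marginal parameter cost of widening one hidden layer by a single unit.
# Assumptions: plain Linear layers with biases, heads ignored.


def trunk_params(obs_dim, hidden_dims):
    sizes = (obs_dim, *hidden_dims)
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def marginal_cost(obs_dim, hidden_dims, layer):
    """Extra parameters from adding one unit to hidden_dims[layer]."""
    wider = list(hidden_dims)
    wider[layer] += 1
    return trunk_params(obs_dim, tuple(wider)) - trunk_params(obs_dim, hidden_dims)


dims = (1024, 512, 256)
for i in range(len(dims)):
    print(f"+1 unit in layer {i}: +{marginal_cost(5300, dims, i):,} params")
```

For (1024, 512, 256) this gives 5,813, 1,281, and 513 extra parameters: a first-layer unit costs about 4.5x as much as a second-layer unit and about 11x as much as a last-layer unit, and that cost is the same for every unit added, which is why total size scales linearly with the first layer's width.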

Configuration Locations

Current defaults in ppo_trainer_native.py:

| Function | GPU Tier | hidden_dims |
|---|---|---|
| get_auto_config() | H100 | (1024, 512, 256) |
| get_auto_config() | A100 | (1024, 512, 256) |
| get_auto_config() | high (40GB+) | (512, 256, 128) |
| get_auto_config() | medium (20-40GB) | (512, 256, 128) |
| get_auto_config() | low (<20GB) | (256, 128, 64) |
| get_a100_config() | A100-80GB | (1024, 512, 256) |
| get_a100_config() | A100-40GB | (512, 256, 128) |
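The table can be read as a simple lookup. The sketch below mirrors the get_auto_config() rows; the GPU-name matching and the VRAM thresholds are assumptions about how the real function picks a tier, and pick_hidden_dims is hypothetical:

```python
# Illustration only: mirrors the documented get_auto_config() defaults.
# The name matching and VRAM thresholds are assumptions, not the real logic.
AUTO_CONFIG_HIDDEN_DIMS = {
    "H100": (1024, 512, 256),
    "A100": (1024, 512, 256),
    "high": (512, 256, 128),    # 40GB+
    "medium": (512, 256, 128),  # 20-40GB
    "low": (256, 128, 64),      # <20GB
}


def pick_hidden_dims(gpu_name, vram_gb):
    """Hypothetical tier selection: named GPUs first, then VRAM buckets."""
    name = gpu_name.upper()
    for tier in ("H100", "A100"):
        if tier in name:
            return AUTO_CONFIG_HIDDEN_DIMS[tier]
    if vram_gb >= 40:
        return AUTO_CONFIG_HIDDEN_DIMS["high"]
    if vram_gb >= 20:
        return AUTO_CONFIG_HIDDEN_DIMS["medium"]
    return AUTO_CONFIG_HIDDEN_DIMS["low"]
```

Two things stand out in the table itself: no get_auto_config() tier yields the 4-layer (2048, 1024, 512, 256) network, so ~148 MB models require an explicit override, and get_a100_config() uses (512, 256, 128) on an A100-40GB while the A100 tier of get_auto_config() uses (1024, 512, 256).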

Verified Workflow

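To use the larger architecture (~148 MB models):

```python
# NativePPOTrainer is assumed to be exported from the same module as get_auto_config.
from alpaca_trading.gpu.ppo_trainer_native import NativePPOTrainer, get_auto_config

config = get_auto_config(total_timesteps=200_000_000, training_mode='production')
config.hidden_dims = (2048, 1024, 512, 256)  # Override to the 4-layer large architecture

# `env` is your already-constructed trading environment (not shown).
trainer = NativePPOTrainer(env, config)
```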

To verify model architecture before training:

```python
# Estimate parameter count and checkpoint size before training.
obs_dim = 53 * 100  # 53 features × 100 lookback = 5300
hidden_dims = (2048, 1024, 512, 256)

# Hidden-layer trunk: weights + biases of each Linear layer
sizes = (obs_dim, *hidden_dims)
params = sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# Actor + critic heads: assumes each head starts with a hidden_dims[-1] × 64 layer;
# adjust to match NativeActorCritic.
params += hidden_dims[-1] * 64 * 2

print(f"Expected params: {params:,}")
# float32 (4 bytes) × 3: weights + Adam exp_avg + exp_avg_sq
print(f"Expected size: ~{params * 4 * 3 / 1024 / 1024:.0f} MiB")
```

To inspect existing model:

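```python
import torch

# weights_only=False is likely needed because the checkpoint stores a pickled
# config object; unpickling can execute code, so only load trusted files.
ckpt = torch.load('model.pt', map_location='cpu', weights_only=False)
print(f"hidden_dims: {ckpt['config'].hidden_dims}")
print(f"Total params: {sum(v.numel() for v in ckpt['policy_state_dict'].values()):,}")
```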
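When a checkpoint was saved without its config, the layer widths can still be read from the weight shapes. A sketch, assuming the policy is built from nn.Linear layers (which store weights as (out_features, in_features)) and that the trunk layers come before the actor/critic heads in the state dict; linear_layer_shapes is a hypothetical helper:

```python
def linear_layer_shapes(state_dict):
    """Return (name, in_features, out_features) for each 2-D '*weight' entry, in order.

    Values only need a .shape attribute, so torch tensors work directly.
    nn.Linear stores its weight as (out_features, in_features).
    """
    rows = []
    for name, tensor in state_dict.items():
        shape = tuple(tensor.shape)
        if len(shape) == 2 and name.endswith("weight"):
            out_features, in_features = shape
            rows.append((name, in_features, out_features))
    return rows


# Usage with a checkpoint loaded as above (assumes the trunk comes first):
# for name, n_in, n_out in linear_layer_shapes(ckpt['policy_state_dict']):
#     print(f"{name}: {n_in} -> {n_out}")
```

The first row's in_features should equal obs_dim (5,300 here), and its out_features is hidden_dims[0].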

Failed Attempts (Critical)

| Attempt | Why It Failed | Lesson Learned |
|---|---|---|
| Assuming all configs use the same architecture | Different GPU tiers have different defaults | Always check hidden_dims in the config before training |
| Only checking layer count | Layer count hid the real difference: 3-layer (1024, 512, 256) vs 4-layer (2048, 1024, 512, 256) differ 2x in first-layer width | First-layer width matters more than depth |
| Not saving the config with the model | Couldn't reproduce training | Always save the full config in the checkpoint |
| Using the large architecture on a small GPU | OOM errors | Match architecture to available VRAM |
| Assuming bigger = better | Overfitting on small datasets | Larger models need more data or regularization |
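The first and third lessons can be enforced with a small pre-flight check. This sketch assumes the checkpoint layout used above (a 'config' entry with a hidden_dims attribute); check_checkpoint is a hypothetical helper, not part of ppo_trainer_native.py:

```python
def check_checkpoint(ckpt, expected_hidden_dims):
    """Raise ValueError if the config is missing or hidden_dims differs."""
    config = ckpt.get("config")
    if config is None:
        raise ValueError("checkpoint has no 'config'; training cannot be reproduced")
    actual = tuple(config.hidden_dims)
    expected = tuple(expected_hidden_dims)
    if actual != expected:
        raise ValueError(f"hidden_dims mismatch: checkpoint {actual}, expected {expected}")
```

For example, call check_checkpoint(ckpt, (1024, 512, 256)) right after torch.load(...) and before building the network or resuming training.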

Performance Considerations

Larger Architecture (2048, 1024, 512, 256)

Pros:

  • Higher model capacity for complex patterns
  • Better for symbols with rich feature interactions
  • May capture longer-term dependencies

Cons:

  • 2x file size (~148 MB vs ~72 MB)
  • Slower inference (~1.5-2x)
  • Higher VRAM usage during training
  • More prone to overfitting with limited data

Smaller Architecture (1024, 512, 256)

Pros:

  • Faster inference (important for live trading)
  • Lower VRAM requirements
  • Faster training iterations
  • Better generalization on limited data

Cons:

  • May underfit complex market dynamics
  • Less capacity for feature interactions
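A rough way to compare the two options' inference cost is to count multiply-accumulates per observation. This sketch assumes obs_dim = 5,300 and ignores biases, activations, and heads; trunk_macs is illustrative. The count tracks compute and weight reads, not wall-clock latency, so a measured slowdown (such as the ~1.5-2x quoted above) can be smaller:

```python
# Per-observation multiply-accumulates (MACs) for the hidden-layer trunk.
# Assumptions: obs_dim = 5300; biases, activations, and heads ignored.
OBS_DIM = 5300


def trunk_macs(obs_dim, hidden_dims):
    sizes = (obs_dim, *hidden_dims)
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


large = trunk_macs(OBS_DIM, (2048, 1024, 512, 256))
medium = trunk_macs(OBS_DIM, (1024, 512, 256))
print(f"Large/Medium compute ratio: {large / medium:.2f}x")
```

It reports about 2.24x more work per observation for the large network, mostly from the wider first layer.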

Recommended Architecture by Use Case

| Use Case | Recommended hidden_dims | Rationale |
|---|---|---|
| Quick iteration/testing | (512, 256, 128) | Fast training, low memory |
| Standard production | (1024, 512, 256) | Balance of capacity and inference speed |
| Complex symbols (crypto) | (2048, 1024, 512, 256) | Capacity for higher-volatility patterns |
| Limited training data (<1 year) | (512, 256, 128) | Reduces overfitting |
| Extended training (500M+ steps) | (2048, 1024, 512, 256) | Capacity to keep learning over more steps |
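The table can be encoded as a helper for scripted runs. This is a hypothetical sketch: the precedence order (quick test, then limited data, then crypto or extended training) is an assumption, since the table does not say which row wins when several apply:

```python
# Hypothetical encoding of the use-case table; precedence order is an assumption.
SMALL = (512, 256, 128)
STANDARD = (1024, 512, 256)
LARGE = (2048, 1024, 512, 256)


def recommend_hidden_dims(*, quick_test=False, data_years=None,
                          is_crypto=False, total_timesteps=0):
    if quick_test:
        return SMALL                  # fast training, low memory
    if data_years is not None and data_years < 1:
        return SMALL                  # limited data: reduce overfitting
    if is_crypto or total_timesteps >= 500_000_000:
        return LARGE                  # more capacity
    return STANDARD                   # standard production
```

Putting the limited-data check before the crypto check means a crypto symbol with under a year of history still gets the small network, on the reasoning that overfitting risk outweighs capacity.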

Key Insights

  • First-layer width dominates model size - doubling the first layer roughly doubles total params
  • File size ≈ params × 12 bytes - float32 weights plus both Adam moment buffers, when the optimizer state is saved in the checkpoint
  • Current v2.3 defaults favor smaller models - optimized for speed over capacity
  • Architecture mismatch = inference failure - a checkpoint trained with one hidden_dims cannot be loaded into a network built with another (load_state_dict raises size-mismatch errors)
  • Always log hidden_dims - critical for reproducibility and debugging
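The 12-bytes-per-parameter rule also works in reverse: dividing a checkpoint's size by its parameter count hints at what was saved. A sketch; the thresholds are rough assumptions, and extra saved state (config, normalization statistics) adds some overhead:

```python
import os


def bytes_per_param(path, num_params):
    """Checkpoint file size divided by the model's parameter count."""
    return os.path.getsize(path) / num_params


def describe_checkpoint(bpp):
    """Rough reading of bytes per parameter (thresholds are assumptions)."""
    if bpp < 3:
        return "half-precision weights only"
    if bpp < 6:
        return "float32 weights only"
    if bpp < 14:
        return "float32 weights + Adam moments"
    return "more than weights + Adam; inspect the checkpoint keys"
```

For example, a ~72 MB file with Medium's ~6.1M parameters gives roughly 12 bytes per parameter, consistent with the optimizer state being saved alongside the weights.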

Diagnostic Commands

```python
import torch


# Compare two model architectures
def compare_models(path1, path2):
    # weights_only=False: checkpoints contain a pickled config; load trusted files only
    m1 = torch.load(path1, map_location='cpu', weights_only=False)
    m2 = torch.load(path2, map_location='cpu', weights_only=False)

    print(f"Model 1: {m1['config'].hidden_dims}")
    print(f"Model 2: {m2['config'].hidden_dims}")
    print(f"Params 1: {sum(v.numel() for v in m1['policy_state_dict'].values()):,}")
    print(f"Params 2: {sum(v.numel() for v in m2['policy_state_dict'].values()):,}")
```
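Equal parameter counts do not guarantee compatible checkpoints. A sketch that lists parameters whose shapes differ, plus keys present in only one state dict; any entry means loading one into the other's network with load_state_dict() (default strict=True) would raise an error. shape_mismatches is a hypothetical helper, and values only need a .shape attribute:

```python
def shape_mismatches(sd_a, sd_b):
    """Return (key, shape_in_a, shape_in_b) for every incompatible entry.

    A shape of None means the key is missing from that state dict.
    """
    problems = []
    for key in sorted(set(sd_a) | set(sd_b)):
        if key not in sd_a:
            problems.append((key, None, tuple(sd_b[key].shape)))
        elif key not in sd_b:
            problems.append((key, tuple(sd_a[key].shape), None))
        elif tuple(sd_a[key].shape) != tuple(sd_b[key].shape):
            problems.append((key, tuple(sd_a[key].shape), tuple(sd_b[key].shape)))
    return problems
```

After loading two checkpoints as in compare_models, shape_mismatches(m1['policy_state_dict'], m2['policy_state_dict']) shows exactly which layers differ, typically starting with the first trunk layer.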

References

  • alpaca_trading/gpu/ppo_trainer_native.py: Lines 1314, 1339, 1726, 1760
  • alpaca_trading/gpu/ppo_trainer_native.py: NativeActorCritic class (line 305)
  • CLAUDE.md: GPU Optimized Settings table