ml-config-system

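A dataclass-based hierarchical configuration system for machine learning experiments. Plain dataclasses with single inheritance and composable pieces. Use when: (1) setting up experiment configurations with Python dataclasses; (2) creating configs for supervised learning, reinforcement learning, or PPO training pipelines; (3) extending the hierarchy for new tasks or algorithms.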

SKILL.md
---
name: ml-config-system
description: |
  A dataclass-based hierarchical configuration system for ML experiments.
  Plain dataclasses with single inheritance and composable pieces.
  Use when: (1) Setting up experiment configurations using Python dataclasses,
  (2) Creating configs for SL, RL, or PPO training pipelines,
  (3) Extending the hierarchy for new tasks or algorithms.
---

ML Configuration System

Plain dataclass hierarchy for ML experiments. No mixins, no magic — just inheritance and composition.

Hierarchy

```
MLBaseConfig                    (name, seed, device, output_dir)
├── SLConfig                    (epochs, batch_size, lr, optimizer, scheduler, early stopping)
├── RLConfig                    (timesteps, gamma, num_envs, normalization)
│   └── PPOConfig               (clip_epsilon, gae_lambda, entropy_coef, value_coef)
└── (your task config inherits from any of these)
```
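The tree above can be sketched as plain dataclasses. This is a reduced illustration, not the template code: field names and defaults are copied from the Field Reference in this document, and only a few fields per class are shown.

```python
from dataclasses import dataclass, fields


@dataclass
class MLBaseConfig:
    name: str = "experiment"
    seed: int = 42
    device: str = "auto"
    output_dir: str = "output"


@dataclass
class RLConfig(MLBaseConfig):
    total_timesteps: int = 1_000_000
    gamma: float = 0.99
    learning_rate: float = 3e-4
    num_envs: int = 1


@dataclass
class PPOConfig(RLConfig):
    clip_epsilon: float = 0.2
    gae_lambda: float = 0.95
    entropy_coef: float = 0.01
    value_coef: float = 0.5


# A child config carries every field of its ancestors, base fields first.
cfg = PPOConfig(name="demo", gamma=0.995)
field_names = [f.name for f in fields(cfg)]
print(field_names[:4])                        # the four MLBaseConfig fields
print(cfg.seed, cfg.gamma, cfg.clip_epsilon)  # inherited, overridden, own default
```

Because every field has a default, subclasses can add fields freely without running into the dataclass rule that non-default fields may not follow default ones.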

Composable Pieces

Standalone dataclasses attached via fields — not part of the hierarchy:

| Piece | Fields | Purpose |
|---|---|---|
| `OutputDir` | `base_dir`, `save_config`, `timestamp_format`, `subdirs` | Timestamped run directory |
| `ConsoleLogging` | `enabled`, `filename`, `tee_to_console`, `separate_streams` | Console output capture |
| `Checkpointing` | `enabled`, `save_best`, `save_last`, `save_frequency`, `metric`, `mode`, filenames | Model saving |
| `TensorBoard` | `enabled`, `log_dir`, `flush_secs`, `log_interval` | Metric logging |

Add more composable pieces as needed (e.g., WandbConfig, EvalConfig).
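A minimal sketch of how a piece is attached, assuming reduced stand-ins for the real classes. `EvalConfig` is named above only as an example, so its fields here (`enabled`, `eval_interval`) and the `DemoConfig` class are hypothetical. The point is that pieces and mutable defaults go through `default_factory`, so each config instance gets its own copy.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OutputDir:  # reduced stand-in for the real piece
    base_dir: str = "output"
    # Mutable defaults such as dicts must be created per instance.
    subdirs: Dict[str, str] = field(
        default_factory=lambda: {"tensorboard": "tensorboard", "checkpoints": "checkpoints"}
    )


@dataclass
class EvalConfig:  # hypothetical piece; field names are illustrative only
    enabled: bool = True
    eval_interval: int = 10


@dataclass
class DemoConfig:  # hypothetical config used only for this example
    name: str = "experiment"
    output: OutputDir = field(default_factory=OutputDir)
    evaluation: EvalConfig = field(default_factory=EvalConfig)


a = DemoConfig()
b = DemoConfig()
a.output.subdirs["plots"] = "plots"   # change one instance only
print("plots" in b.output.subdirs)    # the other instance is unaffected
```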

Default Output Directory Structure

Every experiment creates a timestamped run directory:

```
{output.base_dir}/{config.name}_{YYYYMMDD_HHMMSS}/
├── config.json           # full config snapshot
├── console.log           # captured stdout/stderr
├── checkpoints/          # model weights
│   ├── model_best.pt
│   └── model_last.pt
└── tensorboard/          # tfevents files
    └── events.out.tfevents...
```

Created automatically by setup_output_dir(cfg).
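The following is a rough sketch of what a helper like `setup_output_dir` might do, based only on the layout and the `OutputDir` fields described in this document. The function name `setup_output_dir_sketch` and the `DemoConfig` class are hypothetical; the actual helper in `base_config_template.py` may behave differently.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Dict


@dataclass
class OutputDir:  # fields and defaults as listed in the Field Reference
    base_dir: str = "output"
    save_config: bool = True
    timestamp_format: str = "%Y%m%d_%H%M%S"
    subdirs: Dict[str, str] = field(
        default_factory=lambda: {"tensorboard": "tensorboard", "checkpoints": "checkpoints"}
    )


@dataclass
class DemoConfig:  # hypothetical config used only for this example
    name: str = "experiment"
    output: OutputDir = field(default_factory=OutputDir)


def setup_output_dir_sketch(cfg) -> Path:
    """Illustrative stand-in for setup_output_dir, not the template's code."""
    stamp = datetime.now().strftime(cfg.output.timestamp_format)
    run_dir = Path(cfg.output.base_dir) / f"{cfg.name}_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    for subdir in cfg.output.subdirs.values():
        (run_dir / subdir).mkdir(exist_ok=True)
    if cfg.output.save_config:
        # asdict() turns nested pieces into plain dicts, so json can write them.
        (run_dir / "config.json").write_text(json.dumps(asdict(cfg), indent=2))
    return run_dir


# Use a temporary base directory so the example leaves no files in the project.
cfg = DemoConfig(name="ppo_snake_v2", output=OutputDir(base_dir=tempfile.mkdtemp()))
run_dir = setup_output_dir_sketch(cfg)
created = sorted(p.name for p in run_dir.iterdir())
print(created)
```

In this sketch `console.log` does not appear yet, because it would be created by the console logging step rather than by directory setup.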

Files

code
ml-config-system/
    SKILL.md                    -- This file (overview + field reference)
    base_config_template.py     -- MLBaseConfig, composable pieces, helpers
    sl_config_template.py       -- SLConfig (supervised learning)
    rl_config_template.py       -- RLConfig, PPOConfig (reinforcement learning)

How to Create a Task-Specific Config

  1. Pick a parent class (SLConfig, RLConfig, PPOConfig, or MLBaseConfig)
  2. Inherit from it
  3. Add composable pieces as fields
  4. Add task-specific fields
  5. Override defaults as needed

```python
from dataclasses import dataclass, field

# Assumes the template files listed under "Files" are importable as modules.
from base_config_template import Checkpointing, ConsoleLogging, OutputDir, TensorBoard
from rl_config_template import PPOConfig


@dataclass
class MyTaskConfig(PPOConfig):
    # Override parent defaults
    name: str = "my_task"
    total_timesteps: int = 2_000_000
    num_envs: int = 8

    # Attach composable pieces (always include output + console)
    output: OutputDir = field(default_factory=OutputDir)
    console: ConsoleLogging = field(default_factory=ConsoleLogging)
    checkpointing: Checkpointing = field(default_factory=Checkpointing)
    tensorboard: TensorBoard = field(default_factory=TensorBoard)

    # Task-specific fields
    reward_scale: float = 1.0
    use_curriculum: bool = False
```
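Step 5 ("override defaults as needed") can happen at two levels: in the class body, as above, or per run through the constructor. The sketch below uses reduced stand-ins for `PPOConfig` and `Checkpointing` so that it runs on its own; the metric name `episode_reward` is a made-up example, not something the templates define.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpointing:  # reduced stand-in for the composable piece
    enabled: bool = True
    metric: str = "loss"
    mode: str = "min"


@dataclass
class PPOConfig:  # reduced stand-in for the parent class
    name: str = "experiment"
    total_timesteps: int = 1_000_000
    num_envs: int = 1


@dataclass
class MyTaskConfig(PPOConfig):
    name: str = "my_task"   # class-level override of a parent default
    num_envs: int = 8
    checkpointing: Checkpointing = field(default_factory=Checkpointing)
    reward_scale: float = 1.0


# Per-run overrides go through the constructor; a nested piece is replaced whole.
cfg = MyTaskConfig(
    total_timesteps=500_000,
    checkpointing=Checkpointing(metric="episode_reward", mode="max"),
)
print(cfg.name, cfg.num_envs, cfg.total_timesteps)
print(cfg.checkpointing.metric, cfg.checkpointing.mode)
```

For an RL task the piece-level override is worth considering, since the `Checkpointing` defaults (`metric="loss"`, `mode="min"`) read as tuned for supervised training.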

Setting Up a Run

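```python
# Assumes both helpers are among those provided by base_config_template.py.
from base_config_template import setup_console_logging, setup_output_dir

# Create config (MyTaskConfig as defined in the previous section)
cfg = MyTaskConfig(name="ppo_snake_v2")

# Set up output directory (creates timestamped dir + subdirs + saves config.json)
run_dir = setup_output_dir(cfg)
# -> output/ppo_snake_v2_20260221_143000/

# Set up console logging (captures stdout/stderr to console.log)
cleanup = setup_console_logging(cfg, run_dir)

# ... train ...

# Restore original stdout/stderr
cleanup()
```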

Saving and Loading

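```python
from dataclasses import asdict

# Assumes both helpers are among those provided by base_config_template.py.
from base_config_template import load_config, save_config

# Save (also done automatically by setup_output_dir)
save_config(cfg, "output/config.json")

# Load
cfg = load_config(MyTaskConfig, "output/config.json")

# Manual serialization
d = asdict(cfg)  # standard dataclasses.asdict; nested pieces become plain dicts
```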
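Because `asdict` converts nested pieces into plain dicts, a loader has to rebuild them into dataclass instances. The sketch below shows one way a function like `load_config` could do this; `from_dict_sketch` and `DemoConfig` are hypothetical, and the approach assumes field annotations are real classes rather than strings (that is, no `from __future__ import annotations`).

```python
import json
from dataclasses import asdict, dataclass, field, fields, is_dataclass


@dataclass
class Checkpointing:  # reduced stand-in for the composable piece
    enabled: bool = True
    metric: str = "loss"
    mode: str = "min"


@dataclass
class DemoConfig:  # hypothetical config used only for this example
    name: str = "experiment"
    seed: int = 42
    checkpointing: Checkpointing = field(default_factory=Checkpointing)


def from_dict_sketch(cls, data):
    """Rebuild cls from a plain dict, recursing into dataclass-typed fields."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # keep the default for fields missing from the file
        value = data[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict_sketch(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


cfg = DemoConfig(name="ppo_snake_v2", checkpointing=Checkpointing(metric="reward", mode="max"))
text = json.dumps(asdict(cfg))                       # what a config.json would hold
restored = from_dict_sketch(DemoConfig, json.loads(text))
print(restored == cfg)
```

Skipping missing keys means an older `config.json` still loads after new fields with defaults are added to a config class.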

Field Reference

MLBaseConfig

| Field | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `"experiment"` | Experiment name (used in run directory) |
| `seed` | `int` | `42` | Random seed |
| `device` | `str` | `"auto"` | "auto", "cpu", "cuda", "cuda:0" |
| `output_dir` | `str` | `"output"` | Output directory (fallback if no OutputDir piece) |
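The table does not say how `"auto"` is resolved. One plausible reading, assuming a PyTorch-based training loop (suggested by the `.pt` checkpoint names but not stated here), is "use CUDA when available, otherwise CPU". The helper name `resolve_device` is hypothetical.

```python
def resolve_device(device: str = "auto") -> str:
    """Map the config's device string to a concrete device name (assumed semantics)."""
    if device != "auto":
        return device  # explicit choices such as "cpu" or "cuda:0" pass through
    try:
        import torch  # optional dependency; fall back to CPU when it is missing
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device("cuda:0"))  # explicit value is returned unchanged
print(resolve_device("auto"))    # depends on the machine running the example
```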

OutputDir (composable)

| Field | Type | Default | Description |
|---|---|---|---|
| `base_dir` | `str` | `"output"` | Parent directory for all runs |
| `save_config` | `bool` | `True` | Save config.json to run directory |
| `timestamp_format` | `str` | `"%Y%m%d_%H%M%S"` | Timestamp format for directory naming |
| `subdirs` | `Dict[str, str]` | `{"tensorboard": "tensorboard", "checkpoints": "checkpoints"}` | Subdirectories to create |

ConsoleLogging (composable)

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `True` | Enable console capture |
| `filename` | `str` | `"console.log"` | Log file name in run directory |
| `separate_streams` | `bool` | `False` | Split stdout/stderr into separate files |
| `stdout_filename` | `str` | `"stdout.log"` | Stdout file (when separate_streams=True) |
| `stderr_filename` | `str` | `"stderr.log"` | Stderr file (when separate_streams=True) |
| `tee_to_console` | `bool` | `True` | Also print to terminal |
| `line_timestamps` | `bool` | `False` | Prefix each line with timestamp |
| `timestamp_format` | `str` | `"%H:%M:%S"` | Timestamp format for line prefixes |
| `flush_frequency` | `int` | `1` | Flush every N writes |

Checkpointing (composable)

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `True` | Enable checkpointing |
| `save_best` | `bool` | `True` | Save best model |
| `save_last` | `bool` | `True` | Save last model |
| `save_frequency` | `int` | `0` | Save every N epochs; 0 = disabled |
| `metric` | `str` | `"loss"` | Metric to track for best model |
| `mode` | `str` | `"min"` | "min" or "max" |
| `best_filename` | `str` | `"model_best.pt"` | Best model filename |
| `last_filename` | `str` | `"model_last.pt"` | Last model filename |
| `epoch_filename_format` | `str` | `"model_epoch_{epoch}.pt"` | Periodic save filename |
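To illustrate how these fields interact, the sketch below decides which checkpoint files are due in a given epoch. The helper `files_to_write` is hypothetical and only returns filenames; the actual saving logic is left to the training loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Checkpointing:  # fields and defaults as listed in the table above
    enabled: bool = True
    save_best: bool = True
    save_last: bool = True
    save_frequency: int = 0
    metric: str = "loss"
    mode: str = "min"
    best_filename: str = "model_best.pt"
    last_filename: str = "model_last.pt"
    epoch_filename_format: str = "model_epoch_{epoch}.pt"


def files_to_write(ckpt: Checkpointing, epoch: int, value: float,
                   best_so_far: Optional[float]) -> List[str]:
    """Return the checkpoint filenames due this epoch (hypothetical helper)."""
    if not ckpt.enabled:
        return []
    names = []
    # "min" treats lower values as better, "max" treats higher values as better.
    improved = best_so_far is None or (
        value < best_so_far if ckpt.mode == "min" else value > best_so_far
    )
    if ckpt.save_best and improved:
        names.append(ckpt.best_filename)
    if ckpt.save_last:
        names.append(ckpt.last_filename)
    if ckpt.save_frequency > 0 and epoch % ckpt.save_frequency == 0:
        names.append(ckpt.epoch_filename_format.format(epoch=epoch))
    return names


ckpt = Checkpointing(save_frequency=5)
improved_epoch = files_to_write(ckpt, epoch=5, value=0.30, best_so_far=0.42)
stale_epoch = files_to_write(ckpt, epoch=6, value=0.35, best_so_far=0.30)
print(improved_epoch)  # best, last, and the periodic epoch-5 file
print(stale_epoch)     # only the last-model file
```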

TensorBoard (composable)

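| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | `bool` | `True` | Enable TensorBoard logging |
| `log_dir` | `str` | `"tensorboard"` | Log directory (relative to run dir) |
| `flush_secs` | `int` | `120` | Flush interval in seconds |
| `log_interval` | `int` | `100` | Steps between log writes |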

SLConfig (extends MLBaseConfig)

| Field | Type | Default | Description |
|---|---|---|---|
| `num_epochs` | `int` | `100` | Training epochs |
| `batch_size` | `int` | `32` | Batch size |
| `learning_rate` | `float` | `1e-3` | Learning rate |
| `weight_decay` | `float` | `0.0` | L2 regularization |
| `optimizer` | `str` | `"Adam"` | "Adam", "AdamW", "SGD" |
| `scheduler` | `Optional[str]` | `None` | "cosine", "linear", "step", None |
| `scheduler_min_lr` | `float` | `1e-6` | Minimum LR for scheduler |
| `grad_clip_norm` | `Optional[float]` | `None` | Max gradient norm; None = disabled |
| `dropout` | `float` | `0.0` | Dropout rate |
| `early_stopping_patience` | `int` | `0` | Epochs without improvement; 0 = disabled |
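As an illustration of the `early_stopping_patience` semantics, the sketch below finds the epoch at which training would stop. It assumes the monitored quantity is a validation loss where lower is better; the function `stopping_epoch` is hypothetical and not part of the templates.

```python
from typing import Iterable, Optional


def stopping_epoch(losses: Iterable[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which training stops, or None if it never does."""
    if patience <= 0:
        return None  # 0 = early stopping disabled
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


losses = [0.9, 0.7, 0.71, 0.72, 0.73, 0.5]
print(stopping_epoch(losses, patience=3))  # stops after three stale epochs
print(stopping_epoch(losses, patience=0))  # disabled, so all epochs run
```

With `patience=3` the run stops at epoch 5 and never sees the improvement at epoch 6, which is the usual trade-off when choosing a small patience.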

RLConfig (extends MLBaseConfig)

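| Field | Type | Default | Description |
|---|---|---|---|
| `total_timesteps` | `int` | `1_000_000` | Total training timesteps |
| `gamma` | `float` | `0.99` | Discount factor |
| `learning_rate` | `float` | `3e-4` | Learning rate |
| `num_envs` | `int` | `1` | Parallel environments |
| `normalize_obs` | `bool` | `False` | Normalize observations |
| `normalize_reward` | `bool` | `False` | Normalize rewards |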

PPOConfig (extends RLConfig)

| Field | Type | Default | Description |
|---|---|---|---|
| `frames_per_batch` | `int` | `2048` | Frames per rollout batch |
| `num_epochs` | `int` | `10` | PPO epochs per batch |
| `mini_batch_size` | `int` | `64` | Mini-batch size |
| `clip_epsilon` | `float` | `0.2` | PPO clipping range |
| `gae_lambda` | `float` | `0.95` | GAE lambda |
| `normalize_advantage` | `bool` | `True` | Normalize advantages |
| `value_coef` | `float` | `0.5` | Value loss coefficient |
| `entropy_coef` | `float` | `0.01` | Entropy bonus coefficient |
| `max_grad_norm` | `float` | `0.5` | Gradient clipping norm |
| `target_kl` | `Optional[float]` | `None` | KL early stopping; None = disabled |
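The batch-related fields determine how much optimization a run performs. The sketch below works through the arithmetic for the defaults, assuming `frames_per_batch` counts frames summed over all environments (the table does not say whether it is per environment). The function `ppo_update_counts` is hypothetical.

```python
def ppo_update_counts(total_timesteps: int, frames_per_batch: int, num_epochs: int,
                      mini_batch_size: int, num_envs: int) -> dict:
    """Derive rollout and gradient-step counts from PPO config values."""
    num_batches = total_timesteps // frames_per_batch
    minibatches_per_epoch = frames_per_batch // mini_batch_size
    updates_per_batch = num_epochs * minibatches_per_epoch
    return {
        "num_batches": num_batches,
        "steps_per_env_per_batch": frames_per_batch // num_envs,
        "updates_per_batch": updates_per_batch,
        "total_updates": num_batches * updates_per_batch,
    }


# Defaults from the RLConfig and PPOConfig tables.
counts = ppo_update_counts(
    total_timesteps=1_000_000, frames_per_batch=2048,
    num_epochs=10, mini_batch_size=64, num_envs=1,
)
print(counts["num_batches"])        # 1_000_000 // 2048 = 488 rollout batches
print(counts["updates_per_batch"])  # 10 epochs * 32 mini-batches = 320
print(counts["total_updates"])      # 488 * 320 = 156160 gradient steps
```

Under the same assumption, raising `num_envs` to 8 leaves the update counts unchanged and shortens each environment's rollout to 256 steps per batch.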

Extending the Hierarchy

To add a new algorithm (e.g., SAC):

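```python
from dataclasses import dataclass

# Assumes rl_config_template.py is importable as a module.
from rl_config_template import RLConfig


@dataclass
class SACConfig(RLConfig):
    tau: float = 0.005              # soft update coefficient
    alpha: float = 0.2              # entropy temperature
    auto_alpha: bool = True         # auto-tune alpha
    buffer_size: int = 1_000_000    # replay buffer size
    batch_size: int = 256
    learning_rate: float = 3e-4     # same value as the RLConfig default
    num_epochs: int = 1             # gradient steps per env step
```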

To add a new composable piece:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WandbConfig:
    enabled: bool = False
    project: str = "my-project"
    entity: Optional[str] = None
    log_interval: int = 100
```