
east-py-datascience

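Data science and machine learning platform functions for the East language (TypeScript types). Use when writing East programs that need optimization (MADS, Optuna, SimAnneal, Scipy, Optimization, GoogleOr), machine learning (XGBoost, LightGBM, NGBoost, Torch MLP, Lightning, GP), ML utilities (Sklearn preprocessing, metrics, data splits), conformal prediction (MAPIE), or model explainability (SHAP). Use cases include: (1) writing East programs with @elaraai/east-py-datascience; (2) derivative-free optimization with MADS; (3) Bayesian optimization with Optuna; (4) discrete/combinatorial optimization with SimAnneal; (5) gradient boosting with XGBoost or LightGBM; (6) probabilistic predictions with NGBoost or GP; (7) neural networks with Torch MLP or Lightning; (8) data preprocessing and metrics with Sklearn; (9) conformal prediction intervals with MAPIE; (10) model explainability with Shap; (11) iterative coordinate descent with Optimization; (12) constraint programming, vehicle routing, LP/MIP, or graph algorithms with GoogleOr.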

SKILL.md
--- frontmatter
name: east-py-datascience
description: "Data science and machine learning platform functions for the East language (TypeScript types). Use when writing East programs that need optimization (MADS, Optuna, SimAnneal, Scipy, Optimization, GoogleOr), machine learning (XGBoost, LightGBM, NGBoost, Torch MLP, Lightning, GP), ML utilities (Sklearn preprocessing, metrics, splits), conformal prediction (MAPIE), or model explainability (SHAP). Triggers for: (1) Writing East programs with @elaraai/east-py-datascience, (2) Derivative-free optimization with MADS, (3) Bayesian optimization with Optuna, (4) Discrete/combinatorial optimization with SimAnneal, (5) Gradient boosting with XGBoost or LightGBM, (6) Probabilistic predictions with NGBoost or GP, (7) Neural networks with Torch MLP or Lightning, (8) Data preprocessing and metrics with Sklearn, (9) Conformal prediction intervals with MAPIE, (10) Model explainability with Shap, (11) Iterative coordinate descent with Optimization, (12) Constraint programming, vehicle routing, LP/MIP, or graph algorithms with GoogleOr."

East Data Science

Data science and machine learning platform functions for the East language. Provides optimization, ML models, preprocessing, and explainability.

Quick Start

typescript
import { East, FloatType, variant } from "@elaraai/east";
import { MADS } from "@elaraai/east-py-datascience";

// Define objective function
const objective = East.function([MADS.Types.VectorType], FloatType, ($, x) => {
    const x0 = $.let(x.get(0n));
    const x1 = $.let(x.get(1n));
    return $.return(x0.multiply(x0).add(x1.multiply(x1)));
});

// Optimize
const optimize = East.function([], MADS.Types.ResultType, $ => {
    const x0 = $.let([0.5, 0.5]);
    const bounds = $.let({ lower: [-1.0, -1.0], upper: [1.0, 1.0] });
    const config = $.let({
        max_bb_eval: variant('some', 100n),
        display_degree: variant('some', 0n),
        direction_type: variant('none', null),
        initial_mesh_size: variant('none', null),
        min_mesh_size: variant('none', null),
        seed: variant('some', 42n),
    });
    return $.return(MADS.optimize(objective, x0, bounds, variant('none', null), config));
});
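
The Quick Start objective is the two-dimensional sphere function, x0² + x1², whose minimum over the stated bounds is 0 at the origin. As a sanity check on what an optimizer result should approach, a plain TypeScript equivalent can be evaluated directly. This is a sketch outside the East runtime; `sphere` is a name invented for this example and is not part of the package.

```typescript
// Plain-TypeScript reference for the Quick Start objective (not East code).
// `sphere` is a hypothetical helper name used only in this sketch.
function sphere(x: number[]): number {
    // Sum of squares of every coordinate: x0*x0 + x1*x1 + ...
    return x.reduce((acc, xi) => acc + xi * xi, 0);
}

const atStart = sphere([0.5, 0.5]); // value at the Quick Start's initial point
const atOrigin = sphere([0, 0]);    // value at the global minimum
console.log(atStart, atOrigin);
```

A returned `f_best` noticeably above the value at the origin would suggest the evaluation budget (`max_bb_eval`) was too small for the problem.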

Decision Tree: Which Module to Use

code
Task → What do you need?
    │
    ├─ MADS (derivative-free continuous optimization)
    │   └─ .optimize()
    │
    ├─ Optuna (Bayesian hyperparameter tuning)
    │   └─ .optimize()
    │
    ├─ SimAnneal (discrete/combinatorial optimization)
    │   └─ .optimize(), .optimizePermutation(), .optimizeSubset()
    │
    ├─ ALNS (adaptive large neighborhood search)
    │   └─ .optimize([SolutionType], initial, objective, destroys, repairs, config)
    │   └─ Generic over solution type S - define your own struct
    │
    ├─ Optimization (iterative coordinate descent)
    │   └─ .iterative(objective, paramSpaces, config)
    │
    ├─ GoogleOr (Google OR-Tools)
    │   ├─ CP-SAT → .cpsatSolve(), .cpsatSolveAll()
    │   ├─ Routing → .routingSolve() (TSP, CVRP, VRPTW, VRPPD)
    │   ├─ Linear → .linearSolve() (LP, MIP)
    │   └─ Graph → .minCostFlow(), .maxFlow(), .assignment()
    │
    ├─ Scipy
    │   ├─ Optimization → .optimizeMinimize(), .optimizeMinimizeQuadratic(), .optimizeDualAnnealing()
    │   ├─ Statistics → .statsDescribe(), .statsPearsonr(), .statsSpearmanr(), .statsPercentile(), .statsIqr(), .statsMedian(), .statsMad(), .statsRobust()
    │   ├─ Curve Fitting → .curveFit()
    │   └─ Interpolation → .interpolate1dFit(), .interpolate1dPredict()
    │
    ├─ XGBoost (gradient boosting)
    │   ├─ Train → .trainRegressor(), .trainClassifier(), .trainQuantile()
    │   └─ Predict → .predict(), .predictClass(), .predictProba(), .predictQuantile()
    │
    ├─ LightGBM (fast gradient boosting)
    │   ├─ Train → .trainRegressor(), .trainClassifier()
    │   └─ Predict → .predict(), .predictClass(), .predictProba()
    │
    ├─ NGBoost (probabilistic gradient boosting)
    │   ├─ Train → .trainRegressor()
    │   └─ Predict → .predict(), .predictDist()
    │
    ├─ Torch (neural networks)
    │   ├─ Train → .mlpTrain(), .mlpTrainMulti()
    │   ├─ Predict → .mlpPredict(), .mlpPredictMulti()
    │   └─ Embeddings → .mlpEncode(), .mlpDecode()
    │
    ├─ Lightning (PyTorch Lightning neural networks)
    │   ├─ Train → .train(X, y, config, masks, group_weights, conditions)
    │   ├─ Predict → .predict(model, X, masks, conditions)
    │   ├─ Embeddings → .encode(), .decode(), .decodeConditional() (autoencoder only)
    │   ├─ Architectures:
    │   │   ├─ mlp: simple feedforward
    │   │   ├─ autoencoder: encoder → latent → decoder
    │   │   ├─ conv1d: 1D convolutional autoencoder (temporal)
    │   │   ├─ sequential: LSTM/GRU autoencoder (temporal)
    │   │   └─ transformer: attention-based autoencoder (temporal)
    │   ├─ Output modes:
    │   │   ├─ regression: MSE loss
    │   │   ├─ binary: BCE loss, per-position pos_weights (VectorType), masks
    │   │   └─ multi_head: N independent CE heads, per-head class_weights, masks
    │   ├─ Conditional generation: condition_dim in temporal architectures
    │   └─ Features: early stopping, gradient clipping, epoch callbacks, group_weights
    │
    ├─ GP (Gaussian Process regression)
    │   ├─ Train → .train()
    │   └─ Predict → .predict(), .predictStd()
    │
    ├─ MAPIE (conformal prediction intervals)
    │   ├─ Regression → .trainConformalRegressor(), .trainCQR()
    │   ├─ Classification → .trainConformalClassifier()
    │   ├─ Predict → .predictInterval(), .predictSet()
    │   └─ SHAP integration → .uncertaintyPredictorRegressor(), .uncertaintyPredictorClassifier()
    │
    ├─ Sklearn (preprocessing & metrics)
    │   ├─ Splitting (with stratification and rare class filtering) → .trainTestSplit(), .trainValTestSplit()
    │   ├─ Scaling → .standardScalerFit/Transform(), .minMaxScalerFit/Transform(), .robustScalerFit/Transform()
    │   ├─ Encoding → .labelEncoderFit/Transform/InverseTransform(), .ordinalEncoderFit/Transform()
    │   ├─ Class weights → .computeClassWeight()
    │   ├─ Regression metrics → .computeMetrics(), .computeMetricsMulti()
    │   ├─ Classification metrics → .computeClassificationMetrics(), .computeClassificationMetricsMulti()
    │   ├─ Probability metrics → .rocAucScore(), .logLoss(), .confusionMatrix()
    │   └─ Multi-target → .regressorChainTrain(), .regressorChainPredict()
    │
    └─ Shap (model explainability)
        ├─ Create → .treeExplainerCreate() (XGBoost only), .kernelExplainerCreate() (any model)
        ├─ Compute → .computeValues(), .featureImportance()
        └─ Supports → TreeExplainer: XGBoost; KernelExplainer: XGBoost, LightGBM, NGBoost, GP, Torch, RegressorChain, MAPIE
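
For readers unfamiliar with the technique named under Optimization, the idea of iterative coordinate descent can be sketched in plain TypeScript: vary one parameter at a time while holding the others fixed, keep any improvement, and repeat until a full sweep changes nothing. This is a generic illustration under assumed names (`coordinateDescent`, `ParamSpace`, `maxSweeps`), not the package's implementation; the real `Optimization.iterative` parameter-space and config types are described in the API reference and may differ.

```typescript
// Generic coordinate descent over discrete candidate values.
// Not the library's implementation; every name here is hypothetical.
type ParamSpace = number[]; // candidate values for one parameter (assumed non-empty)

function coordinateDescent(
    objective: (params: number[]) => number,
    spaces: ParamSpace[],
    maxSweeps = 10,
): { best: number[]; value: number } {
    // Start from the first candidate of every parameter.
    const best = spaces.map((s) => s[0]);
    let value = objective(best);
    for (let sweep = 0; sweep < maxSweeps; sweep++) {
        let improved = false;
        for (let i = 0; i < spaces.length; i++) {
            // Try each candidate for parameter i with all other parameters fixed.
            for (const candidate of spaces[i]) {
                const trial = best.slice();
                trial[i] = candidate;
                const v = objective(trial);
                if (v < value) {
                    value = v;
                    best[i] = candidate;
                    improved = true;
                }
            }
        }
        if (!improved) break; // no single-coordinate change lowers the objective
    }
    return { best, value };
}

// Minimize (a - 2)^2 + (b + 1)^2 over two small grids.
const result = coordinateDescent(
    ([a, b]) => (a - 2) ** 2 + (b + 1) ** 2,
    [[0, 1, 2, 3], [-2, -1, 0, 1]],
);
console.log(result);
```

Because each step only changes one coordinate, this approach can stall on objectives whose parameters interact strongly; the other optimizers in the tree (MADS, Optuna, SimAnneal) are the usual alternatives in that case.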

Common Types

| Type | Definition | Description |
|------|------------|-------------|
| `VectorType` | `ArrayType(FloatType)` | 1D array of floats (e.g., `[1.0, 2.0, 3.0]`) |
| `MatrixType` | `ArrayType(ArrayType(FloatType))` | 2D array of floats (e.g., `[[1.0, 2.0], [3.0, 4.0]]`) |
| `LabelVectorType` | `ArrayType(IntegerType)` | Class labels as integers (e.g., `[0n, 1n, 0n, 2n]`) |
| `ModelBlobType` | `BlobType` | Serialized model (opaque, pass to predict functions) |
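
To make the shapes above concrete, here is a rough plain-TypeScript analogue with a small consistency check. The aliases and `checkShapes` are assumptions made for illustration only; the East types themselves are `ArrayType(...)` values, not TypeScript aliases, and the package may validate inputs differently.

```typescript
// Plain-TypeScript analogues of the common types (illustrative assumptions only).
type Vector = number[];      // VectorType: 1D array of floats
type Matrix = number[][];    // MatrixType: 2D array of floats, one row per sample
type LabelVector = bigint[]; // LabelVectorType: integer class labels

// Hypothetical helper: true when every row of X has the same width
// and there is exactly one target (or label) per row.
function checkShapes(X: Matrix, y: Vector | LabelVector): boolean {
    const width = X.length > 0 ? X[0].length : 0;
    const rectangular = X.every((row) => row.length === width);
    return rectangular && X.length === y.length;
}

const features: Matrix = [[1.0, 2.0], [3.0, 4.0]];
const targets: Vector = [10.0, 20.0];
const shapesOk = checkShapes(features, targets);
const ragged = checkShapes([[1.0, 2.0], [3.0]], targets);
console.log(shapesOk, ragged);
```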

Reference Documentation

  • API Reference - Complete function signatures, types, and config options
  • Examples - Working code examples by use case

Available Modules

| Module | Import | Purpose |
|--------|--------|---------|
| MADS | `import { MADS } from "@elaraai/east-py-datascience"` | Derivative-free blackbox optimization |
| Optuna | `import { Optuna } from "@elaraai/east-py-datascience"` | Bayesian optimization (hyperparameter tuning) |
| SimAnneal | `import { SimAnneal } from "@elaraai/east-py-datascience"` | Simulated annealing (permutation/subset) |
| ALNS | `import { ALNS } from "@elaraai/east-py-datascience"` | Adaptive Large Neighborhood Search (generic over solution type) |
| Scipy | `import { Scipy } from "@elaraai/east-py-datascience"` | Statistics, optimization, interpolation |
| XGBoost | `import { XGBoost } from "@elaraai/east-py-datascience"` | Gradient boosting (regression/classification/quantile) |
| LightGBM | `import { LightGBM } from "@elaraai/east-py-datascience"` | Fast gradient boosting |
| NGBoost | `import { NGBoost } from "@elaraai/east-py-datascience"` | Probabilistic gradient boosting |
| Torch | `import { Torch } from "@elaraai/east-py-datascience"` | Neural networks (MLP) |
| Lightning | `import { Lightning } from "@elaraai/east-py-datascience"` | PyTorch Lightning neural networks |
| GP | `import { GP } from "@elaraai/east-py-datascience"` | Gaussian Process regression |
| MAPIE | `import { MAPIE } from "@elaraai/east-py-datascience"` | Conformal prediction intervals |
| Sklearn | `import { Sklearn } from "@elaraai/east-py-datascience"` | Preprocessing, metrics, data splitting |
| Shap | `import { Shap } from "@elaraai/east-py-datascience"` | Model explainability (SHAP values) |
| Optimization | `import { Optimization } from "@elaraai/east-py-datascience"` | Iterative coordinate descent optimization |
| GoogleOr | `import { GoogleOr } from "@elaraai/east-py-datascience"` | OR-Tools: CP-SAT, routing, LP/MIP, graph algorithms |

Accessing Types

typescript
import { MADS, Optuna, Sklearn, XGBoost, ALNS } from "@elaraai/east-py-datascience";

// Access types via Module.Types.TypeName
MADS.Types.VectorType          // ArrayType(FloatType)
MADS.Types.BoundsType          // StructType({ lower, upper })
MADS.Types.ResultType          // StructType({ x_best, f_best, ... })

Optuna.Types.ParamSpaceType    // Parameter definition
Optuna.Types.StudyResultType   // Optimization result

ALNS.Types.ConfigType          // ALNS configuration
ALNS.Types.ResultType          // Result with "S" placeholder for solution type

Sklearn.Types.SplitConfigType  // Train/test split config
XGBoost.Types.ModelBlobType    // Trained model

Common Patterns

Train and Predict

typescript
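// `Module` is a placeholder: substitute a model module such as XGBoost or GP.
// Function names vary by module (e.g. XGBoost.trainRegressor, GP.train); see the decision tree.

// 1. Prepare data
const X = $.let([[...], [...], ...]);   // feature matrix (MatrixType)
const y = $.let([...]);                 // targets (VectorType, or LabelVectorType for class labels)

// 2. Configure and train
const config = $.let({ /* options with variant('some', value) or variant('none', null) */ });
const model = $.let(Module.train(X, y, config));

// 3. Predict (X_test is a held-out matrix with the same columns as X)
const predictions = $.let(Module.predict(model, X_test));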
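
Config structs in this package spell out every option, wrapping each one in `variant('some', value)` or `variant('none', null)`, as the Quick Start shows. When many options are left unset, a small helper can remove the repetition. The sketch below is self-contained and uses a local stand-in for the option shape; the `{ type, value }` representation, `toOptions`, and `madsKeys` are assumptions for illustration, and a real East program would build the values with `variant` from `@elaraai/east` instead.

```typescript
// Local stand-in for an optional value; the object shape is an assumption,
// not the representation used by @elaraai/east.
type Option<T> = { type: "some"; value: T } | { type: "none"; value: null };

// Hypothetical helper: wrap every listed key, using 'none' for keys not supplied.
function toOptions<K extends string>(
    keys: readonly K[],
    partial: Partial<Record<K, unknown>>,
): Record<K, Option<unknown>> {
    const out: Record<string, Option<unknown>> = {};
    for (const key of keys) {
        const v: unknown = partial[key];
        const wrapped: Option<unknown> =
            v === undefined ? { type: "none", value: null } : { type: "some", value: v };
        out[key] = wrapped;
    }
    return out as Record<K, Option<unknown>>;
}

// Field names taken from the Quick Start's MADS config.
// Plain numbers are used here; the Quick Start uses bigint literals (100n, 42n).
const madsKeys = [
    "max_bb_eval", "display_degree", "direction_type",
    "initial_mesh_size", "min_mesh_size", "seed",
] as const;
const sketchConfig = toOptions(madsKeys, { max_bb_eval: 100, seed: 42 });
console.log(sketchConfig);
```

Listing the keys explicitly keeps the full set of fields visible, so an option added to a config type later is not silently omitted.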

Optimization

typescript
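// `Module` is a placeholder for an optimizer module (e.g. MADS).
// Argument lists differ by module: the Quick Start's MADS.optimize call passes an
// extra variant('none', null) argument between bounds and config.

// 1. Define objective function (VectorType comes from the module, e.g. MADS.Types.VectorType)
const objective = East.function([VectorType], FloatType, ($, x) => {
    // compute the objective value from x, then $.return(...) it
});

// 2. Set starting point, bounds, and config
const x0 = $.let([...]);
const bounds = $.let({ lower: [...], upper: [...] });
const config = $.let({ /* options */ });

// 3. Optimize
const result = $.let(Module.optimize(objective, x0, bounds, config));
// result.x_best, result.f_best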
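
Before handing a starting point and bounds to an optimizer, it can help to check that they are mutually consistent. The following is a plain-TypeScript sketch of such a pre-flight check; `Bounds` and `boundsProblems` are hypothetical names, and whether the package performs equivalent validation itself is not stated in this document.

```typescript
// Hypothetical pre-flight check in plain TypeScript (not East code).
interface Bounds {
    lower: number[];
    upper: number[];
}

// Returns a list of human-readable problems; an empty list means the inputs agree.
function boundsProblems(x0: number[], bounds: Bounds): string[] {
    const problems: string[] = [];
    if (bounds.lower.length !== x0.length || bounds.upper.length !== x0.length) {
        problems.push("x0, lower and upper must have the same length");
        return problems;
    }
    x0.forEach((xi, i) => {
        if (bounds.lower[i] > bounds.upper[i]) {
            problems.push(`lower[${i}] > upper[${i}]`);
        } else if (xi < bounds.lower[i] || xi > bounds.upper[i]) {
            problems.push(`x0[${i}] is outside its bounds`);
        }
    });
    return problems;
}

// The Quick Start's values pass; a starting point of 2.0 does not.
const quickStartBounds: Bounds = { lower: [-1.0, -1.0], upper: [1.0, 1.0] };
const okProblems = boundsProblems([0.5, 0.5], quickStartBounds);
const badProblems = boundsProblems([2.0, 0.5], quickStartBounds);
console.log(okProblems, badProblems);
```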