ML Pipeline Reference
Training pipeline (scripts/train.py)
1. Load data: Prometheus (preferred) or synthetic fallback
2. Validate: min_rows >= window_size * 5
3. Preprocess: DataPreprocessor.fit_transform() -- adds temporal features + scales
4. Window: WindowGenerator.create_sequences(stride=1) -- overlapping windows
5. Split: temporal 80/20 (last 20% = validation)
6. Train: LSTM Autoencoder (30 epochs, batch_size=32, early stopping patience=10)
7. Threshold: 95th percentile of validation reconstruction errors
8. Save: weights, config JSON, preprocessor joblib, threshold npy
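Steps 2, 4, 5 and 7 can be sketched without any dependencies. This is only an illustration of the described behaviour, not the code in scripts/train.py: the function names below are hypothetical, and the real pipeline presumably operates on arrays rather than Python lists.

```python
from statistics import quantiles

def validate_rows(rows, window_size):
    """Step 2: require at least window_size * 5 rows."""
    if len(rows) < window_size * 5:
        raise ValueError("not enough rows for the requested window size")

def create_sequences(rows, window_size, stride=1):
    """Step 4: overlapping windows of window_size consecutive rows."""
    last_start = len(rows) - window_size
    return [rows[i:i + window_size] for i in range(0, last_start + 1, stride)]

def temporal_split(items, train_fraction=0.8):
    """Step 5: keep time order; the last 20% becomes the validation set."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

def percentile_threshold(errors, pct=95):
    """Step 7: pct-th percentile, linearly interpolated between sorted values."""
    return quantiles(errors, n=100, method="inclusive")[pct - 1]

rows = list(range(100))            # stand-in for preprocessed rows
validate_rows(rows, window_size=10)
windows = create_sequences(rows, window_size=10)
train, val = temporal_split(windows)
```

Because the split is temporal rather than shuffled, the validation windows always come from the most recent part of the series, which is what the threshold in step 7 is computed on.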
Preprocessing details
Scaler mode: fixed_minmax (deterministic, data-independent)
Fixed bounds from config/data.yaml:
- request_rate: [0, 150], latency_p95: [0, 0.50], memory_usage: [0, 2B]
- error_rate: [0, 3.0], cpu_usage: [0, 0.15]
- Temporal features: [-1, 1] for sin/cos, [0, 1] for binary
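A minimal sketch of fixed-bounds min-max scaling, using the bounds listed above. The dictionary layout, the function name `scale_fixed`, the [0, 1] output range and the optional clipping are assumptions for illustration; they are not taken from config/data.yaml or DataPreprocessor. memory_usage is left out because its upper bound is only given in shorthand.

```python
# Bounds copied from the list above. The dict layout is an assumption,
# not the actual schema of config/data.yaml.
FIXED_BOUNDS = {
    "request_rate": (0.0, 150.0),
    "latency_p95": (0.0, 0.50),
    "error_rate": (0.0, 3.0),
    "cpu_usage": (0.0, 0.15),
}

def scale_fixed(name, value, clip=False):
    """Map value to [0, 1] using the fixed bounds for this metric.

    The result depends only on the configured bounds, never on the data.
    Clipping is a hypothetical option; the source does not say whether
    out-of-range values are clipped.
    """
    lo, hi = FIXED_BOUNDS[name]
    scaled = (value - lo) / (hi - lo)
    if clip:
        scaled = min(1.0, max(0.0, scaled))
    return scaled

half_load = scale_fixed("request_rate", 75.0)        # 0.5
spike = scale_fixed("error_rate", 6.0)               # 2.0, above the bound
spike_clipped = scale_fixed("error_rate", 6.0, clip=True)
```

Since nothing is fitted, the same input always maps to the same scaled value at training and inference time.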
Why not StandardScaler: StandardScaler memorizes the training data distribution (mean and standard deviation). New data drawn from a distribution with different parameters produces shifted z-scores, causing a 100% false positive rate. Fixed bounds eliminate this coupling between the scaler and the training data.
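The effect can be reproduced with a toy example. The numbers below are made up for illustration, and the z-score is computed by hand with the population standard deviation, which is what StandardScaler uses.

```python
from statistics import mean, pstdev

# Hypothetical request rates: live traffic has the same shape as the
# training data but a higher baseline.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
live = [v + 50.0 for v in train]

# Standardization: parameters are memorized from the training data.
mu, sigma = mean(train), pstdev(train)
z_live = [(v - mu) / sigma for v in live]

# Fixed min-max with the request_rate bounds [0, 150].
fixed_live = [v / 150.0 for v in live]
```

Every value in `z_live` is more than 16 standard deviations from the training mean, so a model trained on z-scores sees all of the live traffic as extreme. The fixed-bounds values stay between 0.4 and 0.46, well inside the expected range.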
Synthetic data formulas
Daily pattern: daily_factor = 0.5 + 0.4 * sin(2π * (hour - 8) / 24), which ranges from 0.1 at 2 AM to 0.9 at 2 PM.
| Metric | Formula | Min (2 AM) | Max (2 PM) |
|---|---|---|---|
| request_rate | 125 * factor + N(0, 3) | ~12.5 | ~112.5 |
| latency_p95 | 0.22 * factor + 0.215 + N(0, 0.015) | ~0.24 | ~0.41 |
| memory_usage | base_memory + N(0, base*0.03) | constant | constant |
| error_rate | 2.5 * factor + N(0, 0.05) | ~0.25 | ~2.25 |
| cpu_usage | 0.125 * factor + N(0, 0.005) | ~0.013 | ~0.113 |
These formulas were derived from mathematical analysis of the mock service and verified against live Prometheus queries.
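The table translates directly into a small generator. This is a sketch and not the project's synthetic data code: the function names are hypothetical, N(0, s) is assumed to mean a normal distribution with standard deviation s, and `base_memory` has to be supplied by the caller because its value is not given here.

```python
import math
import random

def daily_factor(hour):
    """Sinusoidal daily load factor: 0.1 at 2 AM, 0.9 at 2 PM."""
    return 0.5 + 0.4 * math.sin(2 * math.pi * (hour - 8) / 24)

def synthetic_sample(hour, base_memory, rng):
    """One synthetic row following the formulas in the table above."""
    f = daily_factor(hour)
    return {
        "request_rate": 125 * f + rng.gauss(0, 3),
        "latency_p95": 0.22 * f + 0.215 + rng.gauss(0, 0.015),
        "memory_usage": base_memory + rng.gauss(0, base_memory * 0.03),
        "error_rate": 2.5 * f + rng.gauss(0, 0.05),
        "cpu_usage": 0.125 * f + rng.gauss(0, 0.005),
    }

rng = random.Random(0)  # seeded so runs are repeatable
# base_memory=1.0e9 is a placeholder value, not the project's setting.
sample = synthetic_sample(hour=14, base_memory=1.0e9, rng=rng)
```

With the noise terms removed, the 2 AM and 2 PM values reproduce the Min and Max columns, for example 125 * 0.1 = 12.5 and 125 * 0.9 = 112.5 for request_rate.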
Key gotchas
- Train/inference parity: Both pipelines must use identical preprocessing. The saved preprocessor.joblib ensures this.
- Startup transient: ~9 min of false anomalies after a cold start (rate()[5m] warm-up).
- Window padding: If fewer than window_size data points are available, the window is padded with zeros. This degrades detection accuracy.
- Prometheus 11K limit: Prometheus rejects range queries that would return more than 11,000 points per series. The query step is auto-adjusted in PrometheusClient._adjust_step_if_needed().
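For the last point, a step adjustment could look roughly like the following. This is not the implementation of PrometheusClient._adjust_step_if_needed(); the function name, signature and rounding strategy are assumptions, and only the 11,000-point limit comes from Prometheus itself.

```python
import math

MAX_POINTS = 11_000  # Prometheus limit on points per series in a range query

def adjust_step(start_ts, end_ts, step_s, max_points=MAX_POINTS):
    """Return a step (in seconds) that keeps the query within the limit.

    The requested step is kept when it already fits; otherwise the
    smallest whole-second step that fits is returned.
    """
    span = end_ts - start_ts
    if span / step_s <= max_points:
        return step_s
    return math.ceil(span / max_points)

one_hour = adjust_step(0, 3600, 15)           # fits, stays at 15 s
one_week = adjust_step(0, 7 * 24 * 3600, 15)  # widened to 55 s
```

Widening the step lowers the resolution of the training data, so a long lookback at a coarse step and a short lookback at a fine step are not interchangeable.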