fabric-forecasting

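Build time-series forecasting pipelines on Microsoft Fabric, covering data preparation, profiling, clustering, feature engineering, and model training. Use this skill when implementing demand forecasting, training LightGBM/Prophet models, engineering time-series features, or deploying prediction pipelines on Fabric.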

SKILL.md
--- frontmatter
name: "fabric-forecasting"
description: 'Build time-series forecasting pipelines on Microsoft Fabric — data preparation, profiling, clustering, feature engineering, and model training. Use when implementing demand forecasting, training LightGBM/Prophet models, engineering time-series features, or deploying prediction pipelines on Fabric.'
metadata:
  author: "AgentX"
  version: "1.0.0"
  created: "2025-07-13"
  updated: "2025-07-13"
compatibility:
  languages: ["python", "pyspark", "sql"]
  frameworks: ["microsoft-fabric", "apache-spark", "lightgbm", "prophet", "optuna"]
  platforms: ["windows", "linux", "macos"]
prerequisites:
  - "Microsoft Fabric workspace with active capacity"
  - "Fabric MCP Server (ms-fabric-mcp-server)"
  - "Lakehouse with historical time-series data (12+ months recommended)"
  - "Python libraries: lightgbm, prophet, optuna, scikit-learn, plotly"

Fabric Forecasting

Time-series forecasting pipelines on Fabric — from raw data to trained models with profiling, clustering, and feature engineering.

When to Use

  • Building demand forecasting models (retail, supply chain, finance)
  • Forecasting across many series (products, stores, regions)
  • Classifying time-series patterns (regular, intermittent, lumpy, erratic)
  • Creating feature-engineered datasets for ML models
  • Training and tuning LightGBM, Prophet, or ensemble models on Fabric

Decision Tree

code
Need time-series forecasting on Fabric?
├─ Have historical data in Lakehouse?
│   ├─ Yes → Start at Phase 1 (Intake & Discovery)
│   └─ No → Use fabric-analytics skill to ingest data first
├─ Know the forecasting scenario?
│   ├─ Clear requirements → Start at Phase 2 (Scenario Interpretation)
│   └─ Need discovery → Start at Phase 1
├─ Have a customization plan?
│   └─ Yes → Start at Phase 4 (Notebook Generation)
├─ Which model?
│   ├─ Many series + external features → LightGBM ✅
│   ├─ Few series + strong seasonality → Prophet ✅
│   ├─ Intermittent demand → Specialized methods (Croston, SBA)
│   └─ Unsure → Profile data first (Phase 1-2), then decide
└─ Not forecasting?
    ├─ Ad-hoc analytics → Use fabric-analytics skill
    └─ Chat-based Q&A → Use fabric-data-agent skill

Pipeline Overview

The forecasting workflow has five phases. Phase 4 generates the pipeline itself, a sequence of five notebooks (NB01-NB05):

code
Phase 1: Intake ─→ Phase 2: Interpret ─→ Phase 3: Plan ─→ Phase 4: Notebooks ─→ Phase 5: Finalize
                                                                  │
                                                  ┌───────┬───────┼───────┬───────┐
                                                 NB01    NB02    NB03    NB04    NB05
                                                 Prep  Profile Cluster Feature  Train

Phase 1: Intake & Data Discovery

Goal: Understand the data and the user's forecasting scenario.

| Step | Action | Output |
|------|--------|--------|
| 1 | Gather inputs (workspace, lakehouse, table) | User requirements |
| 2 | Discover table schema and date range | Data inventory |
| 3 | Identify time column, target, ID columns | Column mapping |
| 4 | Calculate basic statistics (seasonality, trend) | Data profile |

Checkpoint: Present data summary, confirm column mapping.
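
Steps 2 to 4 can be sketched in plain Python for a single daily series. The `summarize_series` helper and its `(day, value)` row layout are hypothetical and chosen only for illustration; in a Fabric notebook the same checks would typically run against the Lakehouse table with Spark.

```python
from datetime import date, timedelta
from statistics import mean

def summarize_series(rows):
    """Summarize (day, value) rows for one daily series.

    Hypothetical helper: the row layout and the daily granularity are
    assumptions made for this sketch.
    """
    days = sorted(d for d, _ in rows)
    start, end = days[0], days[-1]
    expected = (end - start).days + 1            # calendar days in the range
    present = set(days)
    all_days = (start + timedelta(days=n) for n in range(expected))
    missing = [d for d in all_days if d not in present]
    return {
        "start": start,
        "end": end,
        "rows": len(rows),
        "missing_dates": missing,
        "mean_target": mean(v for _, v in rows),
    }

rows = [
    (date(2025, 1, 1), 10.0),
    (date(2025, 1, 2), 12.0),
    (date(2025, 1, 4), 14.0),    # 2025-01-03 is absent
]
profile = summarize_series(rows)
print(profile["start"], profile["end"], profile["missing_dates"])
```

A summary like this is the kind of evidence the checkpoint asks for: the date range and any gaps are visible before the column mapping is confirmed.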

Phase 2: Scenario Interpretation

Goal: Translate business requirements into forecasting parameters.

| Parameter | Examples |
|-----------|----------|
| Forecast horizon | 4 weeks, 12 months, 90 days |
| Time granularity | Daily, weekly, monthly |
| Number of series | 1 (single), 100s (multi-series), 10K+ (hierarchical) |
| Seasonality (period in days) | Weekly (7), Monthly (30), Yearly (365) |
| External factors | Promotions, holidays, weather, events |

Checkpoint: Confirm scenario parameters with user.
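
One way to hold these parameters is a small dataclass. This is a minimal sketch: `ForecastScenario` and its field names are hypothetical, and the two-season minimum mirrors the "Not enough data" entry under Common Errors rather than a documented setting.

```python
from dataclasses import dataclass, field

@dataclass
class ForecastScenario:
    """Hypothetical container for the Phase 2 parameters."""
    horizon: int                  # number of periods to forecast
    granularity: str              # "daily", "weekly", or "monthly"
    n_series: int
    season_length: int            # periods per seasonal cycle, e.g. 7 for weekly on daily data
    external_factors: list = field(default_factory=list)

    def min_history(self) -> int:
        """Periods of history needed to cover two full seasons."""
        return 2 * self.season_length

    def has_enough_history(self, n_periods: int) -> bool:
        return n_periods >= self.min_history()

scenario = ForecastScenario(
    horizon=28,
    granularity="daily",
    n_series=250,
    season_length=7,
    external_factors=["promotions", "holidays"],
)
print(scenario.min_history())
print(scenario.has_enough_history(10))
```

Keeping the parameters in one object makes the checkpoint concrete: the values shown to the user are the same ones the notebooks later read.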

Phase 3: Customization Planning

Goal: Determine which notebook customizations are needed based on the scenario.

Customization Risk Levels

| Risk | Type | Examples |
|------|------|----------|
| Low | Parameter substitution | Column names, table names, horizon, date format |
| Medium | Structural adaptation | Time aggregation, lag sizes, clustering params, skip sections |
| High | Algorithm/generative | Model changes, external regressors, custom metrics, new cells |

Rule: Low = auto-apply. Medium = explain and confirm. High = detailed proposal + approval.

Checkpoint: Present customization plan, get approval for medium/high risk changes.
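
The rule above maps each risk level to an action. A minimal sketch of that mapping follows; the `Risk` enum, `plan` function, and example change descriptions are hypothetical names used only to illustrate the rule.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Action per risk level, taken from the rule above.
ACTIONS = {
    Risk.LOW: "auto-apply",
    Risk.MEDIUM: "explain and confirm",
    Risk.HIGH: "detailed proposal + approval",
}

def needs_approval(risk: Risk) -> bool:
    """Medium and high risk changes wait for the user."""
    return risk is not Risk.LOW

def plan(changes):
    """Split (description, risk) pairs into auto-applied and pending lists."""
    auto = [c for c, r in changes if not needs_approval(r)]
    pending = [(c, ACTIONS[r]) for c, r in changes if needs_approval(r)]
    return auto, pending

auto, pending = plan([
    ("rename target column", Risk.LOW),
    ("aggregate daily to weekly", Risk.MEDIUM),
    ("add external regressors", Risk.HIGH),
])
print(auto)
print(pending)
```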

Phase 4: Notebook Generation (NB01-NB05)

Five notebooks form the pipeline, each building on the previous output:

| Notebook | Purpose | Input | Output |
|----------|---------|-------|--------|
| NB01: Data Preparation | Clean, fill gaps, handle missing values | Raw Lakehouse table | `{scenario}_prepared` |
| NB02: Profiling | Classify series (regular/lumpy/erratic/intermittent) | `_prepared` table | `{scenario}_profiled` |
| NB03: Clustering | Group similar series via K-Means | `_profiled` table | `{scenario}_clustered` |
| NB04: Feature Engineering | Lags, rolling stats, calendar features | `_clustered` table | `{scenario}_features` |
| NB05: Train & Tune | Train LightGBM, tune with Optuna | `_features` table | `{scenario}_forecasts` |

Checkpoint: After each notebook, validate output table before proceeding.
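
Because each notebook reads the table the previous one wrote, the chain of table names can be derived from the scenario name. The sketch below assumes the suffixes listed above; `table_chain`, `retail_demand`, and `sales_raw` are hypothetical names.

```python
# Suffix of each notebook's output table, in pipeline order.
STAGES = [
    ("NB01", "prepared"),
    ("NB02", "profiled"),
    ("NB03", "clustered"),
    ("NB04", "features"),
    ("NB05", "forecasts"),
]

def table_chain(scenario: str, raw_table: str):
    """Return (notebook, input_table, output_table) for each stage."""
    chain, current = [], raw_table
    for notebook, suffix in STAGES:
        output = f"{scenario}_{suffix}"
        chain.append((notebook, current, output))
        current = output           # the next notebook reads what this one wrote
    return chain

chain = table_chain("retail_demand", "sales_raw")
for step in chain:
    print(step)
```

A list like this also gives the per-notebook checkpoint a fixed target: after each run, the output table named in the tuple is the one to validate.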

Phase 5: Finalization & Delivery

Goal: Package deliverables and deploy to Fabric.

| Step | Action |
|------|--------|
| 1 | Upload notebooks to Fabric workspace |
| 2 | Attach default Lakehouse to each notebook |
| 3 | Execute notebooks in sequence |
| 4 | Validate forecast output quality |
| 5 | Generate completion report |

Core Concepts

Time-Series Classification

Profiling classifies each series to guide model selection:

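| Type | CV² | ADI | Characteristics | Model Approach |
|------|-----|-----|-----------------|----------------|
| Regular | Low | Low | Smooth demand, consistent | LightGBM, Prophet |
| Erratic | High | Low | Volatile but frequent | LightGBM with more features |
| Lumpy | High | High | Sporadic and variable | Croston, SBA |
| Intermittent | Low | High | Infrequent but stable | Croston, TSB |

  • CV² = squared coefficient of variation of non-zero demand sizes (demand variability)
  • ADI = Average Demand Interval: the average number of periods between non-zero demands (higher means less frequent demand)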
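
The four classes can be computed directly from a demand history. This is a minimal sketch, not the skill's NB02 implementation: the cutoffs ADI = 1.32 and CV² = 0.49 are the values commonly cited for this scheme and are an assumption here, since the skill does not state which thresholds it uses. `classify_series` is a hypothetical name.

```python
from statistics import mean, pstdev

# Commonly cited cutoffs; assumed here, the skill does not specify its thresholds.
ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49

def classify_series(demand):
    """Classify one demand history as regular, erratic, lumpy, or intermittent."""
    nonzero = [x for x in demand if x > 0]
    if not nonzero:
        return "no demand"
    adi = len(demand) / len(nonzero)                # average periods per non-zero demand
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2    # variability of non-zero sizes
    if adi < ADI_CUTOFF:
        return "regular" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"

print(classify_series([10, 12, 11, 9, 10, 11]))        # frequent, stable sizes
print(classify_series([1, 20, 3, 30, 2, 25]))          # frequent, volatile sizes
print(classify_series([0, 0, 5, 0, 0, 5, 0, 0, 5]))    # sparse, stable sizes
print(classify_series([0, 0, 1, 0, 0, 20, 0, 0, 3]))   # sparse, volatile sizes
```

Series with no non-zero demand get their own label here because neither statistic is defined for them; how the pipeline treats such series is a separate decision.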

Feature Engineering Patterns

| Feature Type | Examples | When to Use |
|--------------|----------|-------------|
| Lags | lag_7, lag_14, lag_28 | Always (capture autocorrelation) |
| Rolling stats | rolling_mean_7, rolling_std_14 | Always (smooth noise) |
| Calendar | day_of_week, month, is_weekend | When weekly/monthly seasonality is present |
| Holiday | is_holiday, days_to_holiday | Retail, service industries |
| External | temperature, promo_flag | When external data is available |
| Interaction | product_category × month | When patterns differ by group |
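
The lag and rolling features above can be sketched for a single series in plain Python. `add_lag_and_rolling` is a hypothetical helper, not the NB04 code. The rolling mean here is computed over values before the current row only, which is a design choice of this sketch to keep the target out of its own features.

```python
from statistics import mean

def add_lag_and_rolling(values, lags=(7, 14), window=7):
    """Build lag and rolling-mean features for one series.

    Rolling means use only values before the current row, so the target
    does not leak into its own features. Rows without enough history get None.
    """
    rows = []
    for t, y in enumerate(values):
        row = {"y": y}
        for lag in lags:
            row[f"lag_{lag}"] = values[t - lag] if t >= lag else None
        row[f"rolling_mean_{window}"] = (
            mean(values[t - window:t]) if t >= window else None
        )
        rows.append(row)
    return rows

series = list(range(1, 22))           # 21 days of toy data: 1, 2, ..., 21
features = add_lag_and_rolling(series)
print(features[14])
```

The first rows carry `None` for features that need more history than is available; a training notebook would usually drop those rows or rely on the model's handling of missing values.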

Model Selection Guide

| Scenario | Recommended Model | Reason |
|----------|-------------------|--------|
| Many series (100+) | LightGBM | Scales well, handles features |
| Few series (1-10) with strong seasonality | Prophet | Built-in seasonality decomposition |
| Intermittent demand | Croston / SBA | Designed for sparse data |
| Ensemble approach | LightGBM + Prophet blend | Often higher accuracy, at the cost of more complexity |
| Need explainability | LightGBM (SHAP) | Built-in feature importance, plus per-prediction SHAP attributions |
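
The guide can be read as a small routing function. The sketch below is one possible encoding, with `recommend_model` as a hypothetical name; cases the guide does not cover (for example, 11 to 99 series) return a prompt to profile further instead of a guess.

```python
def recommend_model(n_series: int, demand_type: str, strong_seasonality: bool) -> str:
    """Map the guide's scenarios to a model family.

    Hypothetical helper; cases the guide does not cover return a prompt
    to profile further rather than a guess.
    """
    if demand_type in ("intermittent", "lumpy"):
        return "Croston / SBA"
    if n_series >= 100:
        return "LightGBM"
    if n_series <= 10 and strong_seasonality:
        return "Prophet"
    return "profile data first, then decide"

print(recommend_model(500, "regular", False))
print(recommend_model(3, "regular", True))
print(recommend_model(40, "erratic", False))
```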

Notebook Conventions

Cell Organization

python
# Every notebook follows this pattern:
# 1. Title cell (markdown) with scenario name and timestamp
# 2. Import/setup cell
# 3. Configuration cell (all parameters in one place)
# 4. Processing cells with markdown explanations
# 5. Validation cells with data quality checks
# 6. Summary cell with output statistics

Customization Markers

python
# CUSTOMIZED: Changed lag window from 7 to 14 based on bi-weekly seasonality
# (create_lag_features is assumed to be a helper defined earlier in the notebook)
lag_features = create_lag_features(df, lags=[7, 14, 21, 28])

# DEFAULT: Using standard clustering parameters
n_clusters = 5

Validation After Each Notebook

python
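# Standard validation pattern
# Assumes `spark` is the notebook's session and that scenario, target_col,
# and date_col were set in the configuration cell.
from pyspark.sql import functions as F

output_df = spark.read.table(f"{scenario}_prepared")
row_count = output_df.count()
null_count = output_df.filter(F.col(target_col).isNull()).count()
# A single aggregation returns both ends of the date range
date_range = output_df.agg(F.min(date_col), F.max(date_col)).collect()[0]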

print(f"✅ Output table: {scenario}_prepared")
print(f"   Rows: {row_count:,}")
print(f"   Nulls in target: {null_count}")
print(f"   Date range: {date_range[0]} to {date_range[1]}")

Livy Session Management

Same rules as fabric-analytics and fabric-data-agent:

code
1. Check for existing sessions FIRST (reuse idle sessions)
2. Create only if none exist (cold start: 3-6+ minutes)
3. Never close sessions unless explicitly requested
4. Use naming: forecasting-{scenario}-{timestamp}
5. Validate all code via Livy before including in final notebooks
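
Rules 1, 2, and 4 can be sketched as two small functions. Everything here is a stand-in: `pick_session`, the session dictionaries, and the `create` callable are hypothetical and do not represent a real Livy client API, and the timestamp format is an assumption. The sketch creates a session when no idle one is found, which is one reading of rule 2.

```python
from datetime import datetime, timezone

def session_name(scenario: str, now: datetime) -> str:
    """Build a name following forecasting-{scenario}-{timestamp}."""
    return f"forecasting-{scenario}-{now:%Y%m%d%H%M%S}"

def pick_session(sessions, create):
    """Reuse the first idle session; call `create` only when none is idle.

    `sessions` is a list of dicts with "id" and "state" keys and `create`
    is a zero-argument callable. Both are stand-ins for whatever the
    Livy client in use returns; they are not a real API.
    """
    for s in sessions:
        if s["state"] == "idle":
            return s
    return create()

existing = [{"id": 1, "state": "busy"}, {"id": 2, "state": "idle"}]
chosen = pick_session(existing, create=lambda: {"id": 3, "state": "starting"})
print(chosen["id"])
print(session_name("retail_demand", datetime(2025, 7, 13, 9, 30, tzinfo=timezone.utc)))
```

Checking before creating matters because of the cold-start cost noted in rule 2: reusing an idle session avoids a wait of several minutes.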

Error Handling

Retry Protocol

code
Attempt 1 → Execute via Livy
  ↓ (on failure)
Attempt 2 → Diagnose error, apply fix, retry
  ↓ (on failure)
Attempt 3 → Try alternative approach
  ↓ (on failure)
Escalate to user with error details + options:
  A) Suggested fix
  B) Skip this cell and continue
  C) User provides guidance
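
The protocol above amounts to a bounded loop over up to three code variants, followed by escalation. In this sketch `run_with_retries` and `fake_execute` are hypothetical; `execute` stands in for whatever call submits code to Livy.

```python
def run_with_retries(execute, strategies):
    """Try each strategy in order; escalate when all fail.

    `execute` runs one code string and raises on failure. `strategies`
    holds up to three code variants: the original, a fixed version, and
    an alternative approach. Both are hypothetical stand-ins for Livy calls.
    """
    errors = []
    for attempt, code in enumerate(strategies, start=1):
        try:
            return {"status": "ok", "attempt": attempt, "result": execute(code)}
        except Exception as exc:             # keep the error for the escalation message
            errors.append(f"attempt {attempt}: {exc}")
    return {
        "status": "escalate",
        "errors": errors,
        "options": [
            "A) Suggested fix",
            "B) Skip this cell and continue",
            "C) User provides guidance",
        ],
    }

def fake_execute(code):
    """Toy executor: only the string 'good' succeeds."""
    if code != "good":
        raise RuntimeError(f"failed on {code!r}")
    return "done"

recovered = run_with_retries(fake_execute, ["bad", "good", "alt"])
escalated = run_with_retries(fake_execute, ["bad", "worse", "alt"])
print(recovered)
print(escalated["status"], len(escalated["errors"]))
```

Collecting every error message, not just the last one, gives the user the full history when the run is escalated.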

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| Not enough data | < 2 full seasons of history | Reduce forecast horizon or aggregate to coarser grain |
| Too many nulls | Missing dates in time series | Fill gaps in NB01 (forward fill or interpolation) |
| Memory error | Too many series × features | Reduce feature set or process in batches |
| Optuna timeout | Hyperparameter search too long | Reduce n_trials or use early stopping |
| Cluster imbalance | One cluster gets 90% of series | Adjust n_clusters or try different algorithm |
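
The forward-fill remedy for missing dates can be sketched for one daily series. `forward_fill_daily` is a hypothetical helper and assumes daily granularity; in NB01 the equivalent step would normally be written against the Lakehouse table.

```python
from datetime import date, timedelta

def forward_fill_daily(observed):
    """Return a gap-free daily series, carrying the last value forward.

    `observed` maps date -> value. Forward fill suits stock-like measures;
    for sales counts a missing day may really mean zero, so choose per dataset.
    """
    start, end = min(observed), max(observed)
    filled, last = {}, None
    day = start
    while day <= end:
        last = observed.get(day, last)    # keep the previous value when the day is absent
        filled[day] = last
        day += timedelta(days=1)
    return filled

observed = {date(2025, 3, 1): 5, date(2025, 3, 2): 7, date(2025, 3, 5): 4}
filled = forward_fill_daily(observed)
print([filled[d] for d in sorted(filled)])
```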

Output Artifacts

code
run/{scenario_name}_{YYYYMMDD}/
├── Fabric 01 DataPreparation.ipynb
├── Fabric 02 ProfilingIntermittent.ipynb
├── Fabric 03 Clustering.ipynb
├── Fabric 04 FeatureEngineering.ipynb
├── Fabric 05 TrainTestSelectTune.ipynb
├── completion_report.md
└── requirements.txt (if new dependencies)
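
The folder layout above can be generated from the scenario name and run date. A minimal sketch, with `artifact_paths` and `retail_demand` as hypothetical names; the optional `requirements.txt` is left out.

```python
from datetime import date
from pathlib import PurePosixPath

NOTEBOOKS = [
    "Fabric 01 DataPreparation.ipynb",
    "Fabric 02 ProfilingIntermittent.ipynb",
    "Fabric 03 Clustering.ipynb",
    "Fabric 04 FeatureEngineering.ipynb",
    "Fabric 05 TrainTestSelectTune.ipynb",
]

def artifact_paths(scenario_name: str, run_date: date):
    """List the expected files under run/{scenario_name}_{YYYYMMDD}/."""
    root = PurePosixPath("run") / f"{scenario_name}_{run_date:%Y%m%d}"
    return [root / name for name in NOTEBOOKS + ["completion_report.md"]]

paths = artifact_paths("retail_demand", date(2025, 7, 13))
print(paths[0])
print(paths[-1])
```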

Anti-Patterns

  • Skip profiling: Treating all series identically → wrong model for intermittent data
  • Too many lags: 100+ lag features → overfitting, slow training
  • No train/test split: Evaluating on training data → inflated accuracy
  • Ignore data quality: Missing dates, duplicates → biased forecasts
  • Fixed parameters: Using defaults without tuning → suboptimal accuracy
  • No validation checkpoints: Running all notebooks blindly → catching errors too late
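
The train/test anti-pattern is avoided by holding out the most recent periods and scoring only on them. The sketch below uses a naive last-value baseline purely to have something to score; `time_based_split` and `mae` are hypothetical helpers, not the NB05 implementation.

```python
def time_based_split(rows, horizon):
    """Hold out the last `horizon` rows of a chronologically sorted series."""
    if horizon <= 0 or horizon >= len(rows):
        raise ValueError("horizon must be between 1 and len(rows) - 1")
    return rows[:-horizon], rows[-horizon:]

def mae(actual, predicted):
    """Mean absolute error over paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

series = [10, 12, 11, 13, 12, 14, 13, 15]
train, test = time_based_split(series, horizon=2)
naive = [train[-1]] * len(test)       # baseline: repeat the last training value
print(train, test, mae(test, naive))
```

Splitting by position instead of at random keeps future values out of training, which is the point of the anti-pattern: a random split on time-series data would let the model see periods that come after the ones it is tested on.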

Boundaries

Always Do

  • Gather inputs (workspace, lakehouse, table, scenario) before starting
  • Profile data before choosing models
  • Validate output after each notebook
  • Get user approval for medium/high risk customizations
  • Generate all code as reproducible notebooks
  • Use timestamped output folders
  • Document all customization decisions in completion report

Ask First

  • Structural changes (adding/removing notebooks, changing flow)
  • Algorithm changes (swapping LightGBM for another model)
  • New dependencies not in original requirements
  • Skipping entire notebooks or major sections
  • High-risk generative customizations

Never Do

  • Proceed without required inputs
  • Save notebook code that hasn't been validated via Livy
  • Overwrite original template notebooks
  • Hardcode credentials or connection strings
  • Assume column names without schema verification
  • Skip user approval for medium/high risk changes

Reference Index

| Document | Description |
|----------|-------------|
| references/model-selection-guide.md | Detailed model comparison and hyperparameter tuning |
| references/feature-engineering-catalog.md | Complete feature engineering patterns and formulas |

Asset Templates

| File | Description |
|------|-------------|
| assets/completion-report-template.md | Cross-phase handover document template |
| assets/notebook-config-template.py | Standard configuration cell for all notebooks |