fabric-performance-monitoring

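Monitor and optimize Microsoft Fabric capacity, Spark compute, and workload performance. Use when asked to check capacity utilization, diagnose throttling (HTTP 430), monitor Spark VCore consumption, analyze CU usage, review Monitoring Hub jobs, query capacity health through the Fabric REST API, generate performance reports, tune Spark resource profiles, investigate concurrency limits, or optimize Fabric SKU sizing. Supports PowerShell, T-SQL, and REST API workflows.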

SKILL.md
---
name: fabric-performance-monitoring
description: Monitor and optimize Microsoft Fabric capacity, Spark compute, and workload performance. Use when asked to check capacity utilization, diagnose throttling (HTTP 430), monitor Spark VCore consumption, analyze CU usage, review Monitoring Hub jobs, query Fabric REST APIs for capacity health, generate performance reports, tune Spark resource profiles, investigate concurrency limits, or optimize Fabric SKU sizing. Supports PowerShell, T-SQL, and REST API workflows.
license: Complete terms in LICENSE.txt
---

Microsoft Fabric Performance Monitoring

Toolkit for monitoring, diagnosing, and optimizing Microsoft Fabric capacity and workload performance across Spark, Data Warehouse, Lakehouse, and Pipeline workloads.

When to Use This Skill

  • Checking Fabric capacity utilization or CU consumption
  • Diagnosing throttling errors (HTTP 430 / TooManyRequestsForCapacity)
  • Monitoring Spark VCore usage and concurrency limits
  • Querying Fabric REST APIs for capacity and workspace health
  • Generating capacity performance reports
  • Tuning Spark resource profiles (readHeavy, writeHeavy, balanced)
  • Investigating job failures in the Monitoring Hub
  • Analyzing autoscale billing vs capacity-based billing
  • Reviewing background vs interactive operation patterns
  • Planning capacity SKU sizing or rightsizing

Prerequisites

  • PowerShell 7+ with Az.Fabric module installed
  • Microsoft Entra ID app registration with Fabric API permissions
  • Fabric Capacity Admin or Workspace Admin role
  • Fabric Capacity Metrics app installed (for visual monitoring)

Core Concepts

Capacity Units and Spark VCores

One Capacity Unit (CU) equals two Apache Spark VCores. Fabric capacity is shared across all workspaces assigned to it, and Spark VCores are shared among notebooks, Spark job definitions, and lakehouses within those workspaces.
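As a rough illustration of the CU-to-VCore rule, the sketch below derives a VCore count from an F-SKU name. The function name `Get-SparkVCoreCount` is hypothetical, and it assumes the number in the SKU name is the CU count (F64 = 64 CUs); it does not account for any bursting the platform may apply.

```powershell
function Get-SparkVCoreCount {
    param(
        [Parameter(Mandatory)]
        [ValidatePattern('^F\d+$')]
        [string] $SkuName
    )
    # Assumption: the digits after "F" are the capacity's CU count.
    $capacityUnits = [int]$SkuName.Substring(1)
    # One CU equals two Spark VCores.
    return $capacityUnits * 2
}

Get-SparkVCoreCount -SkuName 'F64'   # 128
```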

Operation Types

Fabric classifies operations as interactive (on-demand, like DAX queries) or background (scheduled, like refreshes and Spark jobs). Background operations are smoothed over a 24-hour period. All Spark operations are background operations.
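To make smoothing concrete, here is a worked sketch of how a background job's cost spreads across the 24-hour window. The function name `Get-SmoothedCuRate` is hypothetical, and the even-spread model is a simplification of what the Capacity Metrics app reports.

```powershell
function Get-SmoothedCuRate {
    param(
        [Parameter(Mandatory)] [double] $CuSecondsConsumed,
        [double] $SmoothingWindowHours = 24
    )
    # Spread the total CU-seconds evenly across the smoothing window.
    $windowSeconds = $SmoothingWindowHours * 3600
    return $CuSecondsConsumed / $windowSeconds
}

# A Spark job that holds 64 CUs for one hour consumes 64 * 3600 CU-seconds.
$rate = Get-SmoothedCuRate -CuSecondsConsumed (64 * 3600)
[Math]::Round($rate, 2)   # 2.67
```

Under this model the job counts as roughly 2.67 CUs of sustained load for the next 24 hours rather than a one-hour spike of 64 CUs.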

Throttling Behavior

When capacity is fully utilized, new Spark jobs are rejected with HTTP 430 (TooManyRequestsForCapacity). With queueing enabled, pipeline-triggered and scheduled jobs instead enter a FIFO queue and start automatically when capacity becomes available. Interactive notebook runs are not queued.
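For callers that submit work through REST from their own scripts, one way to cope with HTTP 430 is to back off and retry. The sketch below is a minimal example, not part of this skill's scripts: `Invoke-WithCapacityRetry` is a hypothetical name, the delays are arbitrary, and it assumes the action uses `Invoke-RestMethod` or `Invoke-WebRequest` on PowerShell 7, where a failed call throws an exception that carries the HTTP response.

```powershell
function Invoke-WithCapacityRetry {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 5,
        [int] $InitialDelaySeconds = 30
    )
    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Read the HTTP status code if the exception exposes a response.
            $status = $null
            if ($_.Exception.PSObject.Properties['Response'] -and $_.Exception.Response) {
                $status = [int]$_.Exception.Response.StatusCode
            }
            # Rethrow anything that is not a 430, or a 430 on the last attempt.
            if ($status -ne 430 -or $attempt -eq $MaxAttempts) { throw }
            Start-Sleep -Seconds $delay
            # Double the wait each time, capped at 10 minutes.
            $delay = [Math]::Min($delay * 2, 600)
        }
    }
}

# Example with a hypothetical URI and headers:
# Invoke-WithCapacityRetry -Action { Invoke-RestMethod -Uri $jobUri -Method Post -Headers $headers }
```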

Capacity SKU Limits

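| SKU  | Spark VCores | Queue Limit |
|------|--------------|-------------|
| F2   | 4            | 4           |
| F4   | 8            | 4           |
| F8   | 16           | 8           |
| F16  | 32           | 16          |
| F32  | 64           | 32          |
| F64  | 128          | 64          |
| F128 | 256          | 128         |
| F256 | 512          | 256         |
| F512 | 1024         | 512         |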
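The limits above can be turned into a simple lookup for capacity planning. The sketch below is a deliberately simplified model: `Get-SparkAdmissionDecision` is a hypothetical name, the run/queue/reject rule is an assumption for illustration, and real admission also depends on bursting, job type, and whether queueing is enabled.

```powershell
# Limits copied from the SKU table above (base VCores, no bursting).
$FabricSkuLimits = @{
    F2   = @{ SparkVCores = 4;    QueueLimit = 4 }
    F4   = @{ SparkVCores = 8;    QueueLimit = 4 }
    F8   = @{ SparkVCores = 16;   QueueLimit = 8 }
    F16  = @{ SparkVCores = 32;   QueueLimit = 16 }
    F32  = @{ SparkVCores = 64;   QueueLimit = 32 }
    F64  = @{ SparkVCores = 128;  QueueLimit = 64 }
    F128 = @{ SparkVCores = 256;  QueueLimit = 128 }
    F256 = @{ SparkVCores = 512;  QueueLimit = 256 }
    F512 = @{ SparkVCores = 1024; QueueLimit = 512 }
}

function Get-SparkAdmissionDecision {
    param(
        [Parameter(Mandatory)] [string] $SkuName,
        [Parameter(Mandatory)] [int] $VCoresInUse,
        [Parameter(Mandatory)] [int] $VCoresRequested,
        [int] $JobsQueued = 0
    )
    $limits = $FabricSkuLimits[$SkuName]
    if (-not $limits) { throw "Unknown SKU: $SkuName" }

    # Enough free VCores: the job can start immediately.
    if ($VCoresInUse + $VCoresRequested -le $limits.SparkVCores) { return 'Run' }
    # Otherwise it waits, as long as the queue still has room.
    if ($JobsQueued -lt $limits.QueueLimit) { return 'Queue' }
    # Queue is full: the caller would see HTTP 430.
    return 'Reject'
}

Get-SparkAdmissionDecision -SkuName 'F64' -VCoresInUse 120 -VCoresRequested 16   # Queue
```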

Spark Resource Profiles

Fabric supports predefined Spark resource profiles for workload optimization. Available profiles: readHeavy, writeHeavy, balanced. New workspaces default to writeHeavy. Under writeHeavy, V-Order is disabled by default and must be enabled manually if read performance matters.

Step-by-Step Workflows

Workflow 1: Capacity Health Check

Run the capacity health check script to retrieve current capacity status, SKU details, and state.

```powershell
./scripts/Get-FabricCapacityHealth.ps1 -SubscriptionId "<sub-id>" -ResourceGroupName "<rg>" -CapacityName "<name>"
```

See capacity-health-reference.md for detailed API response schemas and interpretation guidance.
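The contents of Get-FabricCapacityHealth.ps1 are not shown here. As a hedged sketch of one way such a check could work, the helpers below build the Azure Resource Manager path for a Microsoft.Fabric capacity and summarize the JSON it returns. Both function names are hypothetical, and the API version and field names (`sku.name`, `properties.state`) are assumptions to verify against the ARM reference.

```powershell
function Get-FabricCapacityResourcePath {
    param(
        [Parameter(Mandatory)] [string] $SubscriptionId,
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $CapacityName,
        [string] $ApiVersion = '2023-11-01'   # assumed API version
    )
    "/subscriptions/$SubscriptionId/resourceGroups/$ResourceGroupName" +
        "/providers/Microsoft.Fabric/capacities/${CapacityName}?api-version=$ApiVersion"
}

function ConvertTo-CapacityHealthSummary {
    param([Parameter(Mandatory)] [string] $Json)
    $capacity = $Json | ConvertFrom-Json
    [pscustomobject]@{
        Name     = $capacity.name
        Sku      = $capacity.sku.name
        State    = $capacity.properties.state
        # Assumption: a running capacity reports the state "Active".
        IsActive = ($capacity.properties.state -eq 'Active')
    }
}

# Usage, assuming an authenticated Az session (Connect-AzAccount):
# $path = Get-FabricCapacityResourcePath -SubscriptionId '<sub-id>' -ResourceGroupName '<rg>' -CapacityName '<name>'
# $response = Invoke-AzRestMethod -Path $path -Method GET
# ConvertTo-CapacityHealthSummary -Json $response.Content
```

Keeping the parsing separate from the network call makes the summary logic easy to exercise against saved responses.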

Workflow 2: Spark Concurrency Analysis

Run the Spark concurrency analyzer to check active sessions, queued jobs, and throttling status.

```powershell
./scripts/Get-FabricSparkConcurrency.ps1 -WorkspaceId "<workspace-id>"
```
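The analyzer script itself is not reproduced here. As a sketch of the kind of summary it might produce, the function below counts Spark sessions by state. `Get-SparkSessionStateSummary` is a hypothetical name, and the `state` field and sample values are assumptions about the shape of the session records, for example those returned by the workspace Livy sessions endpoint of the Fabric REST API.

```powershell
function Get-SparkSessionStateSummary {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Sessions
    )
    # Count sessions per state, most common state first.
    $Sessions |
        Group-Object -Property state |
        Sort-Object -Property Count -Descending |
        ForEach-Object { [pscustomobject]@{ State = $_.Name; Count = $_.Count } }
}

# Illustrative records only; real responses carry more fields.
$sample = @(
    [pscustomobject]@{ id = 'a'; state = 'InProgress' }
    [pscustomobject]@{ id = 'b'; state = 'InProgress' }
    [pscustomobject]@{ id = 'c'; state = 'NotStarted' }
)
Get-SparkSessionStateSummary -Sessions $sample

# Fetching live data might look like this (endpoint and token handling are assumptions):
# $uri = "https://api.fabric.microsoft.com/v1/workspaces/$WorkspaceId/spark/livySessions"
# $sessions = (Invoke-RestMethod -Uri $uri -Headers @{ Authorization = "Bearer $token" }).value
```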

Workflow 3: Monitoring Hub Job Audit

Run the job audit script to retrieve recent job executions, durations, and failure details.

```powershell
./scripts/Get-FabricJobHistory.ps1 -WorkspaceId "<workspace-id>" -HoursBack 24
```
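To illustrate what a `-HoursBack` filter could do, the sketch below keeps job instances that started inside the window and computes their durations. `Get-RecentJobSummary` is a hypothetical name. The field names (`startTimeUtc`, `endTimeUtc`, `status`, `failureReason`) are assumed to follow the job instance objects of the Fabric REST job scheduler API, which lists instances per item, so a real script would also need item IDs.

```powershell
function Get-RecentJobSummary {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $JobInstances,
        [int] $HoursBack = 24,
        [datetime] $NowUtc = [datetime]::UtcNow
    )
    $styles  = [System.Globalization.DateTimeStyles]'AssumeUniversal, AdjustToUniversal'
    $culture = [cultureinfo]::InvariantCulture
    $cutoff  = $NowUtc.ToUniversalTime().AddHours(-$HoursBack)

    foreach ($job in $JobInstances) {
        # Timestamps without an offset are treated as UTC.
        $start = [datetime]::Parse([string]$job.startTimeUtc, $culture, $styles)
        if ($start -lt $cutoff) { continue }

        # Jobs that are still running have no end time yet.
        $durationMinutes = $null
        if ($job.endTimeUtc) {
            $end = [datetime]::Parse([string]$job.endTimeUtc, $culture, $styles)
            $durationMinutes = [Math]::Round(($end - $start).TotalMinutes, 1)
        }

        [pscustomobject]@{
            Id              = $job.id
            Status          = $job.status
            StartUtc        = $start
            DurationMinutes = $durationMinutes
            FailureMessage  = $job.failureReason.message
        }
    }
}

# Fetching instances might look like this (endpoint is an assumption to verify):
# $uri = "https://api.fabric.microsoft.com/v1/workspaces/$WorkspaceId/items/$ItemId/jobs/instances"
# $jobs = (Invoke-RestMethod -Uri $uri -Headers @{ Authorization = "Bearer $token" }).value
# Get-RecentJobSummary -JobInstances $jobs -HoursBack 24 | Where-Object Status -eq 'Failed'
```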

Workflow 4: Generate Performance Report

Use the performance report template to query the SQL analytics endpoint for Lakehouse operation metrics, then generate a summary with the report generator.

Workflow 5: Autoscale vs Capacity Cost Analysis

See cost-analysis-reference.md for guidance on comparing autoscale billing vs capacity-based models using Azure Cost Management.

Troubleshooting

| Symptom | Likely Cause | Resolution |
|---------|--------------|------------|
| HTTP 430 errors | Capacity fully utilized | Scale up the SKU, cancel idle sessions, enable queueing |
| Jobs stuck in queue | All VCores consumed | Check the Monitoring Hub, stop idle notebooks |
| Slow Spark startup | Custom pool with cold start | Switch to a starter pool for quick sessions |
| High CU consumption | Inefficient queries or unoptimized code | Review the Capacity Metrics app, optimize DAX/Spark |
| Unexpected autoscale charges | Spark jobs billed independently | Check Azure Cost Analysis, filtering on the Autoscale meter |
| V-Order disabled | writeHeavy profile active | Manually enable V-Order if read performance is needed |

References