Microsoft Fabric Spark Compute remediate

Structured diagnostic workflows for resolving Apache Spark compute issues in Microsoft Fabric Data Engineering and Data Science workloads.

When to Use This Skill

•Spark jobs fail with HTTP 430 throttling or TooManyRequestsForCapacity errors
•Notebook or Spark job sessions are slow to start (>30 seconds)
•Environment publishing fails or hangs
•Autoscale is not scaling up or down as expected
•Jobs are queued indefinitely or expiring after 24 hours
•Custom pool creation fails or pools are undersized
•Library installation causes session startup delays
•Capacity appears exhausted despite low job counts
•VNet/Private Link provisioning adds unexpected delays
•Burst factor or job-level bursting behavior is unclear

Prerequisites

•Workspace Admin or Capacity Admin role in Microsoft Fabric
•Access to the Monitoring Hub for active Spark sessions
•Access to Workspace Settings > Data Engineering/Science
•Knowledge of current Fabric capacity SKU (F2 through F2048)

Quick Diagnostic: Identify Your Issue

Start here. Match your symptom to a diagnostic path:

Symptom	Diagnostic Path
HTTP 430 error	See Throttling and Concurrency
Jobs stuck in queue	See Throttling and Concurrency
Slow session startup	See Session and Environment
Environment publish fails	See Session and Environment
Autoscale not working	See Pool Configuration
Pool sizing questions	See Pool Configuration
Library conflicts	See Session and Environment
VNet delay on first job	See Session and Environment

Core Concepts

Capacity Unit to VCore Mapping

Every Fabric capacity SKU provides Spark VCores at a fixed ratio with a 3x burst factor:

1 Capacity Unit = 2 Spark VCores

For an F64 SKU: 64 CU x 2 = 128 base VCores, with 3x burst = 384 max Spark VCores.

Job Admission Model

Fabric Spark uses optimistic job admission: jobs are admitted based on their minimum core requirement (determined by the pool's minimum node setting). Jobs start with minimum nodes and scale up toward maximum nodes as cores become available. If no cores are available for the minimum requirement, the job is rejected or queued.

Two Pool Types

•Starter Pools: Pre-warmed, medium nodes only, 5-10 second startup, always available
•Custom Pools: User-configured node sizes (Small through XX-Large), 2-5 minute cold start, full flexibility

remediate Workflows

Workflow 1: HTTP 430 Throttling

•Confirm the error: HTTP Response code 430: This Spark job can't be run because you have hit a Spark compute or API rate limit
•Open the Monitoring Hub and count active Spark sessions
•Calculate your capacity's max VCores: SKU CU × 2 × 3 (burst) = max VCores
•Compare active usage against max VCores
•Resolve by canceling idle sessions, upgrading SKU, or enabling job queueing for pipeline/scheduler jobs

See throttling-and-concurrency.md for the full SKU limits table and queue configuration.

Workflow 2: Slow Session Startup

•Determine pool type (Starter vs Custom)
•If Starter Pool with no custom libraries: expect 5-10 seconds; if slower, check capacity utilization
•If custom libraries or Spark properties are attached via environment: expect 30 seconds to 5 minutes
•If using non-Medium node size: Starter Pool fast-start is unavailable; expect 2-5 minutes (on-demand)
•If Private Link is enabled and this is the first job: expect 10-15 minute VNet provisioning delay

See session-and-environment.md for detailed diagnosis.

Workflow 3: Environment Publishing Failure

•Check if another Publish action is already in progress (only one at a time)
•Verify library compatibility with the selected Spark runtime version
•If runtime was recently changed, remove incompatible libraries and republish
•If Private Link is enabled, the first publish may trigger VNet provisioning (10-15 min delay)
•Review the error notification for specific failure details

See session-and-environment.md for resolution steps.

Available Scripts

Run the Spark capacity calculator to quickly determine VCore limits, max nodes, and queue limits for any Fabric SKU.

powershell

# Calculate capacity for F64 SKU
./scripts/Get-FabricSparkCapacity.ps1 -SkuSize 64

# Compare multiple SKUs
./scripts/Get-FabricSparkCapacity.ps1 -SkuSize 64 -CompareWith 128,256

Key Decision Points

When to Use Starter Pools vs Custom Pools

Use Starter Pools when: you need fast startup (<10s), workloads fit Medium nodes (8 VCores, 64 GB), and you have no heavy library dependencies.

Use Custom Pools when: you need Large/X-Large/XX-Large nodes for memory-intensive workloads, you need precise control over min/max node counts, or you need to limit autoscale behavior.

When to Enable Job-Level Bursting

Enable (default) when: you run large batch jobs that benefit from consuming all available burst VCores and concurrency is low.

Disable when: you have a multi-tenant environment with many concurrent users and fairness across teams matters more than single-job throughput.

Admin Portal path: Capacity Settings > Data Engineering/Science > Disable Job-Level Bursting toggle.

References

•Throttling and Concurrency Guide — SKU limits, queue sizes, HTTP 430 resolution
•Session and Environment Guide — Startup delays, publishing, libraries, VNet
•Pool Configuration Guide — Node sizing, autoscale, custom pool setup, billing