AgentSkillsCN

fabric-rest-api-perf-remediate

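Diagnoses and resolves Microsoft Fabric REST API performance issues, including HTTP 429 throttling, long running operation (LRO) timeouts, pagination bottlenecks, and slow API response times. Use when remediating Fabric API latency, capacity throttling, Retry-After handling, operation polling failures, bulk API call optimization, or automating Fabric REST API diagnostics with PowerShell. Covers api.fabric.microsoft.com endpoint performance, Entra ID token acquisition latency, and Fabric capacity SKU rate limits.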

SKILL.md
---
name: fabric-rest-api-perf-remediate
description: Diagnose and resolve Microsoft Fabric REST API performance issues including HTTP 429 throttling, long running operation (LRO) timeouts, pagination bottlenecks, and slow API response times. Use when remediating Fabric API latency, capacity throttling, Retry-After handling, operation polling failures, bulk API call optimization, or automating Fabric REST API diagnostics with PowerShell. Covers api.fabric.microsoft.com endpoint performance, Entra ID token acquisition delays, and Fabric capacity SKU rate limits.
license: Complete terms in LICENSE.txt
---

Microsoft Fabric REST API Performance Remediation

Structured diagnostic workflows and automation scripts for identifying and resolving performance bottlenecks in Microsoft Fabric REST API integrations.

When to Use This Skill

  • API calls to api.fabric.microsoft.com are slow or timing out
  • Receiving HTTP 429 (Too Many Requests) responses with Retry-After headers
  • Long running operation (LRO) polling is inefficient or stalling
  • Paginated API responses are taking too long to enumerate
  • Bulk workspace or item operations exceed capacity throttle limits
  • Entra ID token acquisition is adding unexpected latency
  • Spark job submissions return HTTP 430 (TooManyRequestsForCapacity)
  • Need to benchmark Fabric REST API throughput for a given capacity SKU

Prerequisites

  • PowerShell 7+ with Invoke-RestMethod support
  • Microsoft Entra ID app registration with appropriate Fabric scopes
  • A valid Bearer token or MSAL-based authentication flow
  • Access to at least one Fabric workspace

Diagnostic Decision Tree

Determine the root cause category before applying a fix:

```text
API Call Slow or Failing?
├── HTTP 429 returned?
│   ├── YES → Throttling. See §1 Throttling Diagnosis
│   └── NO  → Continue
├── HTTP 430 returned?
│   ├── YES → Capacity exhausted. See §2 Capacity Rate Limits
│   └── NO  → Continue
├── HTTP 202 + LRO stalling?
│   ├── YES → Polling issue. See §3 LRO Optimization
│   └── NO  → Continue
├── Large result sets slow?
│   ├── YES → Pagination. See §4 Pagination Tuning
│   └── NO  → Continue
└── Token acquisition slow?
    ├── YES → Auth latency. See §5 Token Acquisition Performance
    └── NO  → General latency. See §6 Baseline Benchmarking
```

§1 Throttling Diagnosis (HTTP 429)

Fabric throttles calls per user and per API within a time window. When a caller exceeds the limit, the API returns HTTP 429 with a Retry-After header giving the wait time in seconds.

Diagnosis Steps:

  1. Run the throttle diagnostic script to measure your current request rate against throttle limits
  2. Capture Retry-After header values to understand cooldown periods
  3. Review call patterns for burst behavior vs. steady-state
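Steps 2 and 3 can be approximated offline once you log a send time (in seconds) and the response headers for each request. The sketch below is a minimal illustration under that assumption; `peak_requests_per_window`, `burst_ratio`, and `retry_after_seconds` are hypothetical helper names, and the window size is your choice, since Fabric does not publish its throttle window.

```python
from collections import deque
from typing import Deque, Mapping, Optional, Sequence


def peak_requests_per_window(timestamps: Sequence[float], window_s: float) -> int:
    """Largest number of requests inside any sliding window of window_s seconds."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    window: Deque[float] = deque()
    peak = 0
    for t in sorted(timestamps):
        window.append(t)
        # Evict requests outside the window (t - window_s, t].
        while window[0] <= t - window_s:
            window.popleft()
        peak = max(peak, len(window))
    return peak


def burst_ratio(timestamps: Sequence[float], window_s: float) -> float:
    """Peak per-window count divided by the average per-window count.

    Values near 1 suggest steady traffic; large values suggest bursts."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if len(timestamps) < 2:
        return 1.0
    span_s = max(timestamps) - min(timestamps)
    windows = max(span_s / window_s, 1.0)
    average = len(timestamps) / windows
    return peak_requests_per_window(timestamps, window_s) / average


def retry_after_seconds(headers: Mapping[str, str]) -> Optional[int]:
    """Retry-After in seconds, or None if absent or not an integer.

    HTTP also allows an HTTP-date here; this sketch does not parse that form."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return int(value)
            except ValueError:
                return None
    return None
```

A high burst ratio that coincides with 429 responses points to burst behavior rather than steady-state overuse.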

Resolution Patterns:

| Pattern | Description |
|---|---|
| Exponential backoff | Wait at least the Retry-After interval, then add random jitter to avoid a thundering herd |
| Request batching | Group related calls to reduce total API invocations |
| Caller isolation | Use separate service principals for independent workloads |
| Rate limiter | Throttle requests client-side with a token bucket before sending them |
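A minimal sketch of the exponential-backoff pattern, assuming each response is modeled as a `(status_code, headers)` tuple so it can be wired to any HTTP client; the same loop maps onto a PowerShell retry script. The delay is never shorter than Retry-After, the exponential term wins when it is larger, and random jitter is added on top. `base_s`, `cap_s`, `jitter_s`, and `max_attempts` are illustrative defaults, not Fabric-documented values.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape: (HTTP status code, response headers).
Response = Tuple[int, Mapping[str, str]]


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Retry-After in seconds (case-insensitive header lookup), or None."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                return None
    return None


def backoff_delay(attempt: int, retry_after: Optional[float], base_s: float = 1.0,
                  cap_s: float = 120.0, jitter_s: float = 5.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    rng = rng if rng is not None else random.Random()
    exponential = min(cap_s, base_s * (2 ** attempt))
    floor = retry_after if retry_after is not None else 0.0
    # Honor the server's Retry-After first, then spread callers apart with jitter.
    return max(floor, exponential) + rng.uniform(0.0, jitter_s)


def call_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, headers = send()
        if status != 429 or attempt + 1 >= max_attempts:
            return status, headers
        sleep(backoff_delay(attempt, parse_retry_after(headers)))
        attempt += 1
```

Injecting `sleep` keeps the loop testable without real waits; in production the default `time.sleep` applies.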

Key Facts:

  • Every Fabric admin and core public API call is throttled
  • Throttle window and limits are per-user, per-API (not published explicitly)
  • The Retry-After value is in seconds (commonly 30-60s)
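Because the limits are per-user, per-API and unpublished, a client-side rate limiter has to be tuned empirically: start from a conservative rate and adjust based on observed 429s. A minimal token-bucket sketch follows; `rate` (tokens per second) and `capacity` (largest allowed burst) are assumptions you calibrate, and the injectable `clock` and `sleep` exist only to make it testable.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()
        self._lock = threading.Lock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self) -> bool:
        """Take one token if available, without waiting."""
        with self._lock:
            self._refill()
            if self._tokens >= 1:
                self._tokens -= 1
                return True
            return False

    def acquire(self, sleep: Callable[[float], None] = time.sleep) -> None:
        """Wait until a token is available, then take it."""
        while True:
            with self._lock:
                self._refill()
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                wait_s = (1 - self._tokens) / self.rate
            sleep(wait_s)  # sleep outside the lock so other threads can proceed
```

Call `acquire()` immediately before each request; one bucket per caller and API matches the per-user, per-API scope described above.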

See throttling-deep-dive.md for implementation patterns.


§2 Capacity Rate Limits (HTTP 430)

Spark jobs and compute-bound operations have a separate throttle tied to the Fabric capacity SKU. When the max queue limit is reached, new jobs return HTTP 430.

Capacity Queue Limits:

| SKU | Queue limit |
|---|---|
| F2 / F4 | 4 |
| F8 | 8 |
| F16 | 16 |
| F32 | 32 |
| F64 (P1) | 64 |
| F128 (P2) | 128 |
| F256 (P3) | 256 |
| F512 (P4) | 512 |
| F1024 | 1024 |
| F2048 | 2048 |
| Trial | Not supported |

Resolution:

  1. Cancel active Spark jobs via the Monitoring Hub
  2. Upgrade to a larger capacity SKU
  3. Enable optimistic job admission for higher concurrency
  4. Implement client-side queue management before submitting jobs
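Step 4 can be sketched as a local admission queue in front of job submission. This sketch conservatively caps all outstanding jobs (running plus queued) at the SKU queue limit from the table above, which keeps the queued count under the limit at the cost of possibly leaving capacity idle. It only sees jobs submitted through this client, so other users of the same capacity can still fill the queue. `SparkAdmissionQueue` and `QUEUE_LIMITS` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List, Set

# Queue limits from the SKU table in this section; P SKUs map to their F equivalents.
QUEUE_LIMITS: Dict[str, int] = {
    "F2": 4, "F4": 4, "F8": 8, "F16": 16, "F32": 32,
    "F64": 64, "P1": 64, "F128": 128, "P2": 128, "F256": 256, "P3": 256,
    "F512": 512, "P4": 512, "F1024": 1024, "F2048": 2048,
}


class SparkAdmissionQueue:
    """Holds Spark jobs locally so outstanding jobs never exceed the SKU queue limit."""

    def __init__(self, sku: str) -> None:
        if sku not in QUEUE_LIMITS:
            raise ValueError(f"no queue limit for SKU {sku!r} (Trial: not supported)")
        self.limit = QUEUE_LIMITS[sku]
        self.outstanding: Set[str] = set()
        self.pending: Deque[str] = deque()

    def submit(self, job_id: str) -> List[str]:
        """Queue a job locally; return job IDs that may be submitted to Fabric now."""
        self.pending.append(job_id)
        return self._release()

    def complete(self, job_id: str) -> List[str]:
        """Free the slot of a finished or cancelled job; return newly released job IDs."""
        self.outstanding.discard(job_id)
        return self._release()

    def _release(self) -> List[str]:
        released: List[str] = []
        while self.pending and len(self.outstanding) < self.limit:
            job_id = self.pending.popleft()
            self.outstanding.add(job_id)
            released.append(job_id)
        return released
```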

§3 Long Running Operation (LRO) Optimization

Many Fabric APIs return HTTP 202 Accepted with three critical headers:

  • Location — polling URL (Get Operation State endpoint)
  • x-ms-operation-id — operation GUID for constructing polling URLs
  • Retry-After — seconds to wait before first poll
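A sketch of turning those three headers into a polling handle. It prefers Location and falls back to building a URL from x-ms-operation-id only when Location is missing. The fallback URL shape and the 5-second default wait are assumptions for illustration, not values taken from this document.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed URL shape for the Get Operation State endpoint (fallback only).
OPERATION_STATE_URL = "https://api.fabric.microsoft.com/v1/operations/{operation_id}"


@dataclass
class LroHandle:
    poll_url: str
    operation_id: Optional[str]
    first_wait_s: float


def lro_handle_from_headers(headers: Mapping[str, str],
                            default_wait_s: float = 5.0) -> LroHandle:
    """Build a polling handle from the headers of an HTTP 202 response."""
    h = {name.lower(): value for name, value in headers.items()}
    operation_id = h.get("x-ms-operation-id")
    if h.get("location"):
        poll_url = h["location"]  # preferred: use the server-provided URL as-is
    elif operation_id:
        poll_url = OPERATION_STATE_URL.format(operation_id=operation_id)
    else:
        raise ValueError("202 response has neither Location nor x-ms-operation-id")
    try:
        first_wait_s = float(h.get("retry-after", default_wait_s))
    except ValueError:
        first_wait_s = default_wait_s
    return LroHandle(poll_url, operation_id, first_wait_s)
```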

Common Performance Issues:

| Issue | Symptom | Fix |
|---|---|---|
| Aggressive polling | Hundreds of GET calls, wastes quota | Honor Retry-After, use exponential backoff |
| Ignoring Location header | Building URLs manually, missing result endpoint | Use Location header directly; it transitions from State to Result when complete |
| Not checking for result | Polling succeeds but result never fetched | After Succeeded status, call Get Operation Result |
| Missing failure handling | Stuck in infinite poll loop | Check for Failed and Skipped statuses |

LRO Status Values: Succeeded, Failed, Skipped, Completed
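A polling loop that addresses the issues in the table: it waits the initial Retry-After, doubles the interval up to a cap (never going below a fresh Retry-After), stops on any of the statuses listed above, and returns the Location header so the caller can fetch the result after Succeeded. It assumes a hypothetical `get(url)` callable returning `(json_body, headers)` with the state in a `status` field; `max_wait_s` and `timeout_s` are illustrative.

```python
import time
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

# Hypothetical poll function: url -> (JSON body as dict, response headers).
Get = Callable[[str], Tuple[Dict[str, Any], Mapping[str, str]]]

# Terminal statuses as listed in this section; any other status keeps polling.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Skipped", "Completed"}


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    for key, value in headers.items():
        if key.lower() == name:
            return value
    return None


def poll_lro(poll_url: str, get: Get, first_wait_s: float, max_wait_s: float = 60.0,
             timeout_s: float = 3600.0,
             sleep: Callable[[float], None] = time.sleep) -> Tuple[str, Optional[str], int]:
    """Poll until a terminal status. Returns (status, result_location, poll_count)."""
    wait, waited, polls = first_wait_s, 0.0, 0
    while True:
        sleep(wait)
        waited += wait
        body, headers = get(poll_url)
        polls += 1
        status = str(body.get("status", ""))
        if status in TERMINAL_STATUSES:
            # After Succeeded, Location may point at the Get Operation Result endpoint.
            return status, _header(headers, "location"), polls
        if waited >= timeout_s:
            raise TimeoutError(f"operation still {status!r} after {polls} polls")
        wait = min(max_wait_s, wait * 2)
        retry_after = _header(headers, "retry-after")
        if retry_after is not None and retry_after.strip().isdigit():
            wait = max(wait, float(retry_after))
```

The returned poll count doubles as a simple polling-efficiency metric.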

Run the LRO polling benchmark script to profile your polling efficiency.

See lro-patterns.md for complete polling implementation patterns.


§4 Pagination Tuning

Fabric paginated APIs return continuationToken and continuationUri in response bodies. Performance degrades when consuming large result sets sequentially.

Optimization Strategies:

  1. Use continuationUri directly rather than rebuilding URLs with continuationToken
  2. Process pages concurrently when downstream logic allows
  3. Implement early termination when the target item is found
  4. Cache intermediate results for retry resilience
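Strategies 1 and 3 fit naturally into a lazy generator: it follows continuationUri exactly as returned and stops requesting pages as soon as the caller stops iterating. The sketch assumes a hypothetical `fetch(url)` that returns the parsed JSON page, with items under a `value` key.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical page fetcher: url -> parsed JSON page (dict).
Fetch = Callable[[str], Dict[str, Any]]


def walk_items(first_url: str, fetch: Fetch) -> Iterator[Dict[str, Any]]:
    """Lazily yield items across pages, following continuationUri as returned."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("value", [])
        # Use the server-built URI directly instead of rebuilding it from continuationToken.
        url = page.get("continuationUri")


def find_first(first_url: str, fetch: Fetch,
               predicate: Callable[[Dict[str, Any]], bool]) -> Optional[Dict[str, Any]]:
    """Early termination: no further pages are requested once a match is found."""
    return next((item for item in walk_items(first_url, fetch) if predicate(item)), None)
```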

Template: Use the pagination walker template for efficient enumeration.
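For strategy 2, page fetches still have to be sequential, because each request needs the continuationUri from the previous response, but item processing can overlap with fetching. A sketch with a thread pool, assuming a hypothetical `fetch(url)` that returns a page with items under `value` and a thread-safe `process(item)`; `workers=4` is arbitrary.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional


def enumerate_and_process(first_url: str, fetch: Callable[[str], Dict[str, Any]],
                          process: Callable[[Dict[str, Any]], Any],
                          workers: int = 4) -> List[Any]:
    """Fetch pages in order while a thread pool processes items already received.

    Results come back in the original item order."""
    futures: List["Future[Any]"] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        url: Optional[str] = first_url
        while url:
            page = fetch(url)
            for item in page.get("value", []):
                futures.append(pool.submit(process, item))
            url = page.get("continuationUri")
        return [future.result() for future in futures]
```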


§5 Token Acquisition Performance

Slow Entra ID token acquisition adds latency to every API call chain.

Diagnosis:

  1. Measure token acquisition time separately from API call time
  2. Check if tokens are being acquired per-request instead of cached
  3. Verify token lifetime and refresh logic
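Steps 1 and 2 can be checked by timing the token call and the API call separately on each request. If token time stays high on every request after the first, tokens are probably being acquired per request instead of served from a cache. `profile_call_chain` and `looks_uncached` are hypothetical helpers, `get_token` and `call_api` stand in for your own auth and HTTP code, and the 5 ms threshold is an illustrative assumption.

```python
import time
from typing import Callable, Dict, List


def profile_call_chain(get_token: Callable[[], str], call_api: Callable[[str], object],
                       requests: int = 5) -> Dict[str, List[float]]:
    """Time token acquisition and the API call separately for each request (ms)."""
    token_ms: List[float] = []
    api_ms: List[float] = []
    for _ in range(requests):
        start = time.perf_counter()
        token = get_token()
        after_token = time.perf_counter()
        call_api(token)
        end = time.perf_counter()
        token_ms.append((after_token - start) * 1000.0)
        api_ms.append((end - after_token) * 1000.0)
    return {"token_ms": token_ms, "api_ms": api_ms}


def looks_uncached(token_ms: List[float], cached_threshold_ms: float = 5.0) -> bool:
    """Heuristic: every token call after the first is slower than a cache lookup should be."""
    later = token_ms[1:]
    return bool(later) and min(later) > cached_threshold_ms
```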

Optimization:

| Technique | Impact |
|---|---|
| Token caching | Eliminate redundant auth round-trips |
| MSAL token cache serialization | Persist tokens across process restarts |
| Certificate-based auth | Faster than client secret for service principals |
| Reduce scope requests | Request only needed scopes per call |
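Token caching can be sketched as a small wrapper that refreshes a few minutes before expiry. MSAL client applications keep their own in-memory token cache, so with MSAL the main fix is usually reusing one application instance instead of creating one per request; the wrapper below is for token sources that do not cache. It assumes a hypothetical `fetch()` returning `(access_token, expires_in_seconds)`, and the 300-second refresh skew is an illustrative choice.

```python
import threading
import time
from typing import Callable, Optional, Tuple

# Hypothetical token fetcher: () -> (access_token, expires_in_seconds).
Fetch = Callable[[], Tuple[str, float]]


class CachedToken:
    """Thread-safe cache that refetches only when the token is near expiry."""

    def __init__(self, fetch: Fetch, skew_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._skew_s = skew_s
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._lock = threading.Lock()

    def get(self) -> str:
        with self._lock:
            # Refresh early so a token never expires mid-request.
            if self._token is None or self._clock() >= self._expires_at - self._skew_s:
                token, expires_in = self._fetch()
                self._token = token
                self._expires_at = self._clock() + expires_in
            return self._token
```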

§6 Baseline Benchmarking

Before remediating, establish a performance baseline.

Run the baseline benchmark script to capture:

  • Token acquisition latency (ms)
  • Simple GET endpoint response time (ms)
  • Paginated enumeration throughput (items/sec)
  • LRO polling round-trip time (ms)
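A sketch of summary statistics for those measurements, assuming you collect raw latency samples in milliseconds. It reports median and p95 rather than the mean, because a few throttled or cold-start calls can skew an average. `latency_stats`, `benchmark`, and `items_per_second` are hypothetical helpers, and `runs=10` is arbitrary.

```python
import statistics
import time
from typing import Callable, Dict, List


def latency_stats(samples_ms: List[float]) -> Dict[str, float]:
    """min / median / p95 / max of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # 19 cut points split the data into 20 groups; the last one is the 95th percentile.
    # method="inclusive" keeps p95 within the observed range.
    cut_points = statistics.quantiles(samples_ms, n=20, method="inclusive")
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "p95": cut_points[-1],
        "max": max(samples_ms),
    }


def benchmark(fn: Callable[[], object], runs: int = 10) -> Dict[str, float]:
    """Call fn() `runs` times and summarize wall-clock latency in ms."""
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return latency_stats(samples)


def items_per_second(item_count: int, elapsed_s: float) -> float:
    """Enumeration throughput for a paginated walk."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return item_count / elapsed_s
```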

Compare results against expected ranges in the baseline reference.


Quick Reference: HTTP Status Codes

| Code | Meaning | Action |
|---|---|---|
| 200 | Success | Process response |
| 201 | Created (completed synchronously) | Read the created resource from the response body |
| 202 | Accepted (LRO started) | Begin polling via the Location header |
| 400 | Bad request | Validate request body/parameters |
| 401 | Unauthorized | Refresh token, check scopes |
| 403 | Forbidden | Verify workspace/item permissions |
| 404 | Not found | Confirm workspace/item IDs |
| 429 | Throttled | Wait Retry-After seconds, then retry |
| 430 | Capacity exhausted | Reduce concurrent jobs or scale SKU |
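The table can be encoded as a dispatch map so a script reacts consistently to each code. The action names and the retry policy below are illustrative assumptions: 429 and 430 are treated as retryable after waiting, and 401 gets exactly one retry after a token refresh.

```python
from typing import Dict

# Illustrative action labels for the rows of the status-code table.
ACTIONS: Dict[int, str] = {
    200: "process_response",
    201: "handle_created",
    202: "poll_location_header",
    400: "validate_request",
    401: "refresh_token_and_check_scopes",
    403: "verify_permissions",
    404: "confirm_ids",
    429: "wait_retry_after_then_retry",
    430: "reduce_jobs_or_scale_sku",
}


def action_for(status: int) -> str:
    return ACTIONS.get(status, "unexpected_status")


def should_retry(status: int, attempt: int, max_attempts: int = 5) -> bool:
    """Whether to resend after attempt number `attempt` (0-based) returned `status`."""
    if attempt + 1 >= max_attempts:
        return False
    if status == 401:
        return attempt == 0  # one retry, after refreshing the token
    return status in (429, 430)
```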

Remediation

| Problem | Likely cause | Resolution |
|---|---|---|
| All calls slow (>2 s) | Token not cached | Implement MSAL token caching |
| Intermittent 429s | Burst pattern | Add a token-bucket rate limiter |
| LRO never completes | Operation failed silently | Check for Failed status in poll response |
| Pagination returns duplicates | Stale continuationToken | Always use the fresh continuationUri from the latest response |
| 430 on Spark submit | Capacity queue full | Check Monitoring Hub, scale SKU, or wait |
| Token acquisition >3 s | Network/DNS issue | Test connectivity to login.microsoftonline.com |
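For the last row, name resolution can be timed separately from the HTTPS round trip using only the standard library. This minimal sketch measures DNS alone; if resolution is fast while token calls stay slow, the cause more likely lies elsewhere (TLS, proxy, or the token endpoint itself). The helper name is hypothetical.

```python
import socket
import time


def dns_lookup_ms(host: str = "login.microsoftonline.com", port: int = 443) -> float:
    """Time a single name resolution for `host`, in milliseconds.

    Results may come from an OS or resolver cache, so repeat the call and
    compare the first (cold) and later (warm) timings."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return (time.perf_counter() - start) * 1000.0
```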

References