AgentSkillsCN

fabric-data-factory-perf-remediate

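Diagnose and resolve performance issues in Microsoft Fabric Data Factory pipelines. Use when pipelines run slowly, copy activities time out, dataflows stall, activities are stuck, throughput is low, capacity is throttled, or jobs queue indefinitely. Covers copy activity tuning (parallelCopies, DIU, ITO, partitioning), pipeline monitoring via Monitoring Hub and workspace monitoring, Spark job queue management, capacity SKU limits, error code resolution, and dataflow optimization. Keywords include Fabric pipeline slow, copy activity performance, Data Factory throttling, pipeline timeout, activity stuck, TooManyRequestsForCapacity, HTTP 430, pipeline troubleshooting, dataflow performance, copy parallelism, intelligent throughput optimization.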

SKILL.md
---
name: fabric-data-factory-perf-remediate
description: Diagnose and resolve Microsoft Fabric Data Factory pipeline performance issues. Use when pipelines are slow, copy activities time out, dataflows stall, activities are stuck, throughput is low, capacity is throttled, or jobs queue indefinitely. Covers copy activity tuning (parallelCopies, DIU, ITO, partitioning), pipeline monitoring via Monitoring Hub and workspace monitoring, Spark job queueing, capacity SKU limits, error code resolution, and dataflow optimization. Keywords include Fabric pipeline slow, copy activity performance, Data Factory throttling, pipeline timeout, activity stuck, TooManyRequestsForCapacity, HTTP 430, pipeline troubleshoot, dataflow performance, copy parallelism, intelligent throughput optimization.
license: Complete terms in LICENSE.txt
---

Microsoft Fabric Data Factory Performance Remediation

Systematic approach to diagnosing and resolving performance issues in Microsoft Fabric Data Factory pipelines, copy activities, and dataflows.

When to Use This Skill

  • Pipeline execution takes longer than expected
  • Copy activities are slow or appear stuck
  • Activities show "Not Started" status for extended periods
  • Capacity throttling errors (HTTP 430, TooManyRequestsForCapacity)
  • Throughput is lower than expected for copy operations
  • Dataflow Gen2 refresh is slow or timing out
  • Pipeline monitoring shows performance degradation over time
  • Need to optimize parallelism, DIU, or partitioning settings

Prerequisites

  • Access to a Microsoft Fabric workspace with the Contributor role or higher
  • Familiarity with the Fabric Monitoring Hub
  • Understanding of Fabric capacity SKUs and their limits
  • PowerShell 7+ for running diagnostic scripts

Diagnostic Workflow

Step 1: Identify the Bottleneck Category

Determine which category your issue falls into:

| Category | Symptoms | Start Here |
|---|---|---|
| Copy Activity Slow | Low throughput, long transfer duration | copy-activity-tuning.md |
| Pipeline Stuck | Activity shows In Progress with no movement | pipeline-stuck-resolution.md |
| Capacity Throttling | HTTP 430 errors, jobs queued | capacity-throttling-guide.md |
| Dataflow Slow | Dataflow Gen2 refresh takes too long | dataflow-optimization.md |
| Spark Job Queue | Jobs stuck in "Not Started" status | capacity-throttling-guide.md |
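
The routing in the table above can be sketched as a small lookup. This is an illustrative Python sketch, not part of the skill's scripts: the `suggest_guide` helper and its keyword lists are assumptions made up for the example; only the category names and guide file names come from the table.

```python
# Hypothetical helper: maps a free-text symptom to the reference guide
# listed in the table. The keyword lists are illustrative assumptions.
GUIDES = [
    ("Capacity Throttling", ("430", "toomanyrequestsforcapacity", "throttl"),
     "capacity-throttling-guide.md"),
    ("Spark Job Queue", ("not started", "queued"),
     "capacity-throttling-guide.md"),
    ("Pipeline Stuck", ("stuck", "in progress"),
     "pipeline-stuck-resolution.md"),
    ("Dataflow Slow", ("dataflow",),
     "dataflow-optimization.md"),
    ("Copy Activity Slow", ("copy", "throughput"),
     "copy-activity-tuning.md"),
]


def suggest_guide(symptom: str):
    """Return (category, guide) for the first matching rule, or None."""
    text = symptom.lower()
    for category, keywords, guide in GUIDES:
        if any(keyword in text for keyword in keywords):
            return category, guide
    return None


print(suggest_guide("Copy activity has low throughput"))
```

Rules are checked in order, so the more specific throttling and queueing symptoms are matched before the generic copy-activity keywords.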

Step 2: Collect Diagnostics

Run the diagnostic script to gather baseline metrics:

```powershell
./scripts/Get-FabricPipelineDiagnostics.ps1 -WorkspaceId "<guid>" -PipelineName "MyPipeline"
```

Or manually collect from the Monitoring Hub:

  1. Open the Fabric portal and navigate to the Monitoring Hub
  2. Filter by pipeline name and time range
  3. Select the run details (glasses icon) for the slow run
  4. Capture the Duration Breakdown for copy activities
  5. Note the queue time, transfer time, and pre/post-copy script duration
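
Once the durations are captured, a quick calculation shows which stage dominates the run and therefore which guide to open first. The following is a minimal Python sketch; the `dominant_stage` helper, the stage names, and the sample numbers are all made up for illustration and do not come from a real run.

```python
def dominant_stage(breakdown_seconds: dict) -> tuple:
    """Return the longest stage and its share of the total duration in percent."""
    total = sum(breakdown_seconds.values())
    if total <= 0:
        raise ValueError("breakdown must contain positive durations")
    stage = max(breakdown_seconds, key=breakdown_seconds.get)
    return stage, round(100 * breakdown_seconds[stage] / total, 1)


# Sample values typed in by hand from a Duration Breakdown (illustrative only).
run = {"queue": 420, "pre_copy_script": 15, "transfer": 130, "post_copy_script": 5}
stage, share = dominant_stage(run)
print(f"{stage} accounts for {share}% of the run")
```

A run dominated by queue time points toward capacity throttling, while a run dominated by transfer time points toward copy activity tuning.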

Step 3: Apply Targeted Fixes

Based on the bottleneck category, apply the appropriate optimization from the reference guides.

Quick Fixes for Common Issues

Copy Activity Running Slowly

  1. Set Intelligent Throughput Optimization to Maximum (or a custom value from 4 to 256)
  2. Configure Degree of Copy Parallelism based on source type
  3. Enable Partition Option for SQL sources (Dynamic Range or Physical)
  4. Pre-calculate partition upper/lower bounds to avoid overhead
  5. Enable Staging when sink is Fabric Warehouse
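
To illustrate what pre-calculated partition bounds buy you, the sketch below splits a known key range into contiguous sub-ranges that parallel readers could each take. It is a conceptual Python example only: `partition_ranges` is a hypothetical helper, and it does not claim to reproduce how Fabric divides a Dynamic Range internally.

```python
def partition_ranges(lower: int, upper: int, partitions: int) -> list:
    """Split the inclusive range [lower, upper] into contiguous inclusive sub-ranges."""
    if partitions < 1 or upper < lower:
        raise ValueError("need partitions >= 1 and upper >= lower")
    span = upper - lower + 1
    partitions = min(partitions, span)  # never create empty partitions
    base, extra = divmod(span, partitions)
    ranges, start = [], lower
    for i in range(partitions):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


print(partition_ranges(1, 10, 3))
```

Supplying the lower and upper bounds up front means the minimum and maximum of the partition column do not have to be discovered at run time, which is the overhead step 4 refers to.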

Pipeline Activity Stuck

  1. Cancel the stuck activity and retry
  2. Check source/sink connectivity and credentials
  3. Verify Fabric capacity is not in throttled state
  4. Check whether the activity payload exceeds the 896 KB limit
  5. Check for connection timeout or network interruption
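
The payload check in step 4 can be approximated offline. The Python sketch below measures the serialized size of an activity configuration; it assumes that 1 KB means 1024 bytes and that compact UTF-8 JSON is a reasonable proxy for what the service measures, neither of which is confirmed by this guide. The function names are hypothetical.

```python
import json

# Assumption: the 896 KB limit is counted in units of 1024 bytes.
PAYLOAD_LIMIT_BYTES = 896 * 1024


def payload_size_bytes(activity_config: dict) -> int:
    """Size of the config as compact UTF-8 JSON (an approximation only)."""
    return len(json.dumps(activity_config, separators=(",", ":")).encode("utf-8"))


def exceeds_payload_limit(activity_config: dict) -> bool:
    return payload_size_bytes(activity_config) > PAYLOAD_LIMIT_BYTES


small = {"name": "CopyOrders", "parameters": {"table": "dbo.Orders"}}
print(payload_size_bytes(small), exceeds_payload_limit(small))
```

If the check trips, the usual culprits are large inline parameter values, which can be moved to a lookup or a file reference instead of being passed in the activity definition.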

Capacity Throttling (HTTP 430)

  1. Check current Spark concurrency against SKU limits
  2. Cancel unnecessary active Spark jobs via Monitoring Hub
  3. Consider upgrading to a larger capacity SKU
  4. Distribute pipeline trigger times to avoid burst load
  5. Use job queueing for non-interactive Spark workloads
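
Distributing trigger times (step 4) can be planned with simple arithmetic. This Python sketch spreads a set of pipelines evenly across a time window; `stagger_start_times` and the pipeline names are hypothetical, and the resulting times would still have to be entered into each pipeline's schedule by hand or through your own tooling.

```python
from datetime import datetime, timedelta


def stagger_start_times(pipelines: list, window_start: datetime, window_minutes: int) -> dict:
    """Assign evenly spaced start times within the window, one per pipeline."""
    if not pipelines:
        return {}
    step = timedelta(minutes=window_minutes) / len(pipelines)
    return {name: window_start + i * step for i, name in enumerate(pipelines)}


schedule = stagger_start_times(
    ["LoadOrders", "LoadCustomers", "LoadInventory", "LoadFinance"],  # example names
    datetime(2025, 1, 1, 2, 0),
    60,
)
for name, start in schedule.items():
    print(name, start.strftime("%H:%M"))
```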

Dataflow Gen2 Performance

  1. Reduce data volume with query folding and filters
  2. Avoid unnecessary data type conversions
  3. Minimize the number of transformation steps
  4. Use staging for large datasets
  5. Check for connector-specific throttling

Capacity SKU Quick Reference

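| SKU | Max Spark Cores | Queue Limit | Equivalent Power BI |
|---|---|---|---|
| F2 | Limited | 4 | - |
| F4 | Limited | 4 | - |
| F8 | Limited | 8 | - |
| F16 | Limited | 16 | - |
| F32 | Limited | 32 | - |
| F64 | Standard | 64 | P1 |
| F128 | Standard | 128 | P2 |
| F256 | Standard | 256 | P3 |
| F512 | Standard | 512 | P4 |
| F1024 | Large | 1024 | - |
| F2048 | Large | 2048 | - |
| Trial | P1 equiv | N/A (no queue) | P1 |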
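
For scripted checks, the queue limits can be kept in a lookup and compared against the number of jobs currently queued. The Python sketch below transcribes the Queue Limit column from the table above as-is; `queue_headroom` is a hypothetical helper, and the numbers should be verified against current Microsoft documentation before being relied on.

```python
# Queue limits transcribed from the table; None means no queue (Trial).
QUEUE_LIMITS = {
    "F2": 4, "F4": 4, "F8": 8, "F16": 16, "F32": 32, "F64": 64,
    "F128": 128, "F256": 256, "F512": 512, "F1024": 1024, "F2048": 2048,
    "Trial": None,
}


def queue_headroom(sku: str, queued_jobs: int):
    """Remaining queue slots for the SKU, or None when the SKU has no queue."""
    limit = QUEUE_LIMITS[sku]
    if limit is None:
        return None
    return limit - queued_jobs


print(queue_headroom("F64", 60))
```

A headroom at or below zero suggests that further submissions are likely to be rejected rather than queued.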

Copy Activity Performance Settings Reference

| Setting | Property | Range | Recommendation |
|---|---|---|---|
| Intelligent Throughput Optimization | dataIntegrationUnits | Auto, Standard (64), Balanced (128), Maximum (256), Custom (4-256) | Start with Auto, increase for large datasets |
| Degree of Copy Parallelism | parallelCopies | 1-256 | Auto for most; limit to 32 for Fabric Warehouse sink |
| Partition Option | Source settings | None, Physical, Dynamic Range | Use Dynamic Range for large SQL tables |
| Enable Staging | enableStaging | true/false | Required for Fabric Warehouse sink |
| Source Retry Count | sourceRetryCount | Integer | Set 2-3 for transient failures |
| Fault Tolerance | enableSkipIncompatibleRow | true/false | Enable for non-critical loads |
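
The ranges and recommendations in this table lend themselves to a lightweight pre-flight check. The following Python sketch is an assumption-laden example: `check_copy_settings` is a hypothetical helper, the settings are passed as a flat dictionary rather than the real pipeline JSON structure, and only the property names and limits shown in the table are used.

```python
def check_copy_settings(settings: dict, sink_is_warehouse: bool = False) -> list:
    """Return a list of warnings for values outside the table's ranges."""
    warnings = []
    parallel = settings.get("parallelCopies")
    if parallel is not None and not 1 <= parallel <= 256:
        warnings.append("parallelCopies must be between 1 and 256")
    diu = settings.get("dataIntegrationUnits")
    # Named presets such as "Auto" are strings and are not range-checked here.
    if isinstance(diu, int) and not 4 <= diu <= 256:
        warnings.append("custom dataIntegrationUnits must be between 4 and 256")
    if sink_is_warehouse:
        if parallel is not None and parallel > 32:
            warnings.append("limit parallelCopies to 32 for a Fabric Warehouse sink")
        if not settings.get("enableStaging", False):
            warnings.append("enableStaging is required for a Fabric Warehouse sink")
    return warnings


print(check_copy_settings({"parallelCopies": 64, "enableStaging": False},
                          sink_is_warehouse=True))
```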

Error Code Quick Reference

| Error | Meaning | Action |
|---|---|---|
| HTTP 430 | Capacity compute limit reached | Reduce concurrent jobs or upgrade SKU |
| Payload too large | Activity config exceeds 896 KB | Reduce parameter sizes |
| TooManyRequestsForCapacity | Spark compute or API rate limit | Cancel active jobs or wait |
| Connection timeout | Source/sink unreachable | Check network, credentials, firewall |
| Deflate64 unsupported | Compression format not supported | Re-compress with deflate algorithm |
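
Where the recommended action is to wait, a caller that submits jobs programmatically can back off and retry instead of failing immediately. The Python sketch below shows exponential backoff in general terms; `CapacityThrottled`, `run_with_backoff`, and the `submit` callable are hypothetical stand-ins, not part of any Fabric SDK, and the delay values are arbitrary starting points.

```python
import time


class CapacityThrottled(Exception):
    """Hypothetical stand-in for an HTTP 430 / TooManyRequestsForCapacity response."""


def run_with_backoff(submit, max_attempts: int = 5, base_delay: float = 30.0,
                     sleep=time.sleep):
    """Call submit(); on throttling, wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except CapacityThrottled:
            if attempt == max_attempts:
                raise  # give up and surface the throttling error
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real waiting. Backoff only helps with transient bursts; sustained throttling still calls for fewer concurrent jobs or a larger SKU.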

Monitoring Setup

Enable workspace monitoring for ongoing performance analysis:

  1. Go to Workspace Settings > Monitoring
  2. Add a Monitoring Eventhouse and enable Log workspace activity
  3. Query the ItemJobEventLogs table with KQL for pipeline-level insights

Example KQL query that counts pipeline job events by status, as a starting point for spotting failure trends:

```kql
ItemJobEventLogs
| where ItemKind == "Pipeline"
| summarize count() by JobStatus
```

See workspace-monitoring-setup.md for detailed configuration.

References

External Resources