fabric-data-agent-perf-remediate

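Diagnose and resolve Microsoft Fabric Data Agent performance issues, including slow query generation, capacity throttling (HTTP 430), Spark session startup delays, KQL/SQL/DAX query timeouts, data source misconfiguration, example query validation failures, resource profile tuning, VOrder optimization, autotune settings, and Lakehouse table maintenance. Use when asked to troubleshoot Fabric Data Agent response times, fix agent query accuracy, debug Operations Agent execution efficiency, resolve capacity SKU limits, optimize Spark compute for agents, or diagnose Data Agent data source connection issues.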

SKILL.md
--- frontmatter
name: fabric-data-agent-perf-remediate
description: >-
  Diagnose and resolve Microsoft Fabric Data Agent performance issues including slow query
  generation, capacity throttling (HTTP 430), Spark session startup delays, KQL/SQL/DAX query
  timeouts, data source misconfiguration, example query validation failures, resource profile
  tuning, VOrder optimization, autotune settings, and Lakehouse table maintenance. Use when
  asked to troubleshoot Fabric Data Agent response times, fix agent query accuracy, debug
  Operations Agent playbook performance, resolve capacity SKU limits, optimize Spark compute
  for agents, or diagnose Data Agent data source connection issues.
license: Complete terms in LICENSE.txt

Fabric Data Agent Performance Remediation

Systematic toolkit for diagnosing and resolving performance issues in Microsoft Fabric Data Agents, Operations Agents, and their underlying Spark compute and data source infrastructure.

When to Use This Skill

  • Data Agent responses are slow or timing out
  • Agent-generated SQL/KQL/DAX queries return errors or produce incorrect results
  • Spark session startup takes longer than expected (>10 seconds)
  • Capacity throttling errors (HTTP 430) during agent workloads
  • Operations Agent playbooks fail to execute or trigger actions
  • Data source connections are unreliable or stale
  • Example queries fail validation against schema
  • Resource profiles need tuning for agent-specific workloads
  • Table maintenance (VOrder, bin-compaction, vacuum) is overdue
  • Autoscale Billing configuration needs optimization

Prerequisites

  • Access: Fabric workspace Contributor or Admin role
  • Tools: PowerShell 7+, Fabric REST API access (Microsoft Entra token)
  • Endpoints: https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items
  • Monitoring: Access to Fabric Admin Portal and Azure Cost Analysis
  • Optional: Azure subscription admin for SKU resizing and Autoscale Billing
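As a quick check that the token and endpoint listed above work together, the request can be built in a few lines. This is a minimal sketch, assuming a Microsoft Entra access token has already been obtained and is available as a string; the function name `build_items_request` and the placeholder workspace ID are illustrative and are not part of this skill's scripts.

```python
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_items_request(workspace_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists items in a workspace."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real workspace ID and Entra token.
    req = build_items_request("00000000-0000-0000-0000-000000000000", "<token>")
    print(req.get_method(), req.full_url)
    # To send the request, pass it to urllib.request.urlopen and parse the JSON body.
```

Building the request separately from sending it keeps the URL and header easy to inspect before any network call is made.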

Diagnostic Decision Tree

Agent performance issue reported
├── Slow response times?
│   ├── First query after idle? → Spark session startup (see Workflow 1)
│   ├── All queries slow? → Resource profile mismatch (see Workflow 2)
│   └── Intermittent slowness? → Capacity throttling (see Workflow 3)
├── Incorrect query results?
│   ├── Wrong tables/columns? → Data source instructions (see Workflow 4)
│   ├── SQL syntax errors? → Example query validation (see Workflow 5)
│   └── Stale data? → Lakehouse table maintenance (see Workflow 6)
├── Agent not responding?
│   ├── HTTP 430 errors? → Concurrency limits (see Workflow 3)
│   ├── Connection failures? → Data source config (see Workflow 4)
│   └── Operations Agent stuck? → Playbook/action config (see Workflow 7)
└── Cost concerns?
    ├── Unexpected charges? → Autoscale Billing audit (see Workflow 8)
    └── Over-provisioned? → SKU right-sizing (see Workflow 8)
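The tree above can also be expressed as a lookup table, which may be convenient when triaging from a script. The sketch below is illustrative only: the symptom keys are hypothetical labels chosen for this example, and the workflow numbers are transcribed from the tree.

```python
# Hypothetical symptom labels mapped to the workflow numbers in the tree above.
DECISION_TREE = {
    "slow_first_query_after_idle": 1,
    "slow_all_queries": 2,
    "slow_intermittent": 3,
    "wrong_tables_or_columns": 4,
    "sql_syntax_errors": 5,
    "stale_data": 6,
    "http_430": 3,
    "connection_failures": 4,
    "operations_agent_stuck": 7,
    "unexpected_charges": 8,
    "over_provisioned": 8,
}


def route(symptom: str) -> int:
    """Return the workflow number for a symptom label, or raise if unknown."""
    try:
        return DECISION_TREE[symptom]
    except KeyError:
        raise ValueError(f"Unknown symptom: {symptom!r}") from None


if __name__ == "__main__":
    for symptom in ("slow_first_query_after_idle", "http_430"):
        print(f"{symptom} -> Workflow {route(symptom)}")
```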

Step-by-Step Workflows

Workflow 1: Spark Session Startup Delays

Symptoms: First query takes 2-5 minutes, subsequent queries are fast.

| Scenario | Expected Startup |
| --- | --- |
| Default, no custom libraries | 5-10 seconds |
| Default + library dependencies | 35 seconds - 5 minutes |
| High regional traffic | 2-5 minutes |
| Private Links / Managed VNet | 2-5 minutes |
| Network security + libraries | 2.5-10 minutes |

Resolution steps: See workflow-spark-startup.md

Workflow 2: Resource Profile Optimization

Symptoms: Consistently slow reads or writes across all agent queries.

| Profile | Use Case | VOrder | OptimizeWrite |
| --- | --- | --- | --- |
| writeHeavy | High-frequency ingestion (default) | Disabled | Disabled |
| readHeavyForPBI | Power BI dashboard queries | Enabled | Enabled (1GB bin) |
| readHeavyForSpark | Interactive Spark analytics | Disabled | Enabled (128MB bin) |
| custom | User-defined workload tuning | Configurable | Configurable |
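To make the profile table easier to act on, the sketch below encodes the three predefined profiles and returns the Spark property pair that would select one. It assumes the profile is chosen through a property named `spark.fabric.resourceProfile`; that name is an assumption here and should be verified against current Fabric documentation. The helper `profile_setting` is hypothetical.

```python
# Settings per predefined profile, transcribed from the table above.
PROFILES = {
    "writeHeavy": {"v_order": False, "optimize_write": False, "bin_size": None},
    "readHeavyForPBI": {"v_order": True, "optimize_write": True, "bin_size": "1GB"},
    "readHeavyForSpark": {"v_order": False, "optimize_write": True, "bin_size": "128MB"},
}

# Assumption: property name to verify against current Fabric documentation.
PROFILE_PROPERTY = "spark.fabric.resourceProfile"


def profile_setting(profile: str) -> tuple[str, str]:
    """Return the (property, value) pair that would select a predefined profile."""
    if profile not in PROFILES:
        raise ValueError(f"Unknown predefined profile: {profile!r}")
    return PROFILE_PROPERTY, profile


if __name__ == "__main__":
    key, value = profile_setting("readHeavyForSpark")
    # In a Fabric notebook, this pair would typically be applied with
    # spark.conf.set(key, value); no Spark session is created here.
    print(f"{key}={value}")
```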

Resolution steps: See workflow-resource-profiles.md

Workflow 3: Capacity Throttling (HTTP 430)

Symptoms: TooManyRequestsForCapacity errors, jobs queued or rejected.

| SKU | Spark VCores | Queue Limit |
| --- | --- | --- |
| F2 | 4 | 4 |
| F8 | 16 | 8 |
| F64 | 128 | 64 |
| F128 | 256 | 128 |
| F256 | 512 | 256 |

Formula: 1 Capacity Unit = 2 Spark VCores
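As a worked example of the formula, the sketch below derives the Spark VCores for an F SKU. It assumes the number in the SKU name equals its Capacity Units (so F64 has 64 CU), which matches the rows of the table above; the function name `spark_vcores` is illustrative.

```python
def spark_vcores(sku: str) -> int:
    """Return Spark VCores for an F SKU, using 1 Capacity Unit = 2 Spark VCores."""
    name = sku.strip().upper()
    if not name.startswith("F") or not name[1:].isdigit():
        raise ValueError(f"Expected an F SKU such as 'F64', got {sku!r}")
    capacity_units = int(name[1:])  # assumption: the SKU number is the CU count
    return capacity_units * 2


if __name__ == "__main__":
    for sku in ("F2", "F8", "F64", "F128", "F256"):
        print(sku, spark_vcores(sku))
```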

Resolution steps: See workflow-capacity-throttling.md

Workflow 4: Data Source Configuration Issues

Symptoms: Agent queries wrong tables, returns irrelevant results, or fails to connect.

Data Agents use three configuration layers that all affect query quality:

  1. Agent Instructions — Global routing rules (which data source for which topic)
  2. Data Source Instructions — Schema context, table descriptions, column details
  3. Example Queries — Few-shot SQL/KQL/DAX examples for query generation

Resolution steps: See workflow-data-source-config.md

Workflow 5: Example Query Validation

Symptoms: Agent ignores example queries, generates incorrect SQL/KQL syntax.

Key rule: The Fabric Data Agent only uses queries that contain valid SQL/KQL syntax AND match the schema of the selected tables. Queries that fail validation are silently ignored.
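The two conditions in this rule (valid syntax and a schema match) can be approximated locally before uploading examples. The sketch below is not how Fabric or `Test-ExampleQueries.ps1` performs validation; it swaps in an in-memory SQLite database as a stand-in, so it only covers SQL that SQLite can parse and will reject T-SQL-specific syntax and all KQL/DAX. The function name and sample schema are hypothetical.

```python
import sqlite3


def validate_examples(schema_ddl: list[str], queries: list[str]) -> dict[str, bool]:
    """Map each query to True if it compiles against the given schema in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema_ddl:
            conn.execute(ddl)  # create empty tables that mirror the schema
        results = {}
        for query in queries:
            try:
                # EXPLAIN compiles the statement without running the query itself.
                conn.execute(f"EXPLAIN {query}")
                results[query] = True
            except (sqlite3.Error, sqlite3.Warning):
                results[query] = False
        return results
    finally:
        conn.close()


if __name__ == "__main__":
    schema = ["CREATE TABLE sales (id INTEGER, amount REAL)"]
    examples = [
        "SELECT amount FROM sales",  # valid syntax, matches schema
        "SELECT price FROM sales",   # valid syntax, unknown column
        "SELEC amount FROM sales",   # syntax error
    ]
    for query, ok in validate_examples(schema, examples).items():
        print("OK  " if ok else "FAIL", query)
```

Because failed examples are silently ignored by the agent, a local pre-check like this surfaces the failures that would otherwise go unnoticed.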

Resolution steps: See workflow-example-queries.md

Workflow 6: Lakehouse Table Maintenance

Symptoms: Queries over Delta tables are slow, tables suffer from the small-file problem, or statistics are stale.

Three maintenance operations are available via the REST API:

| Operation | Purpose |
| --- | --- |
| Bin-compaction | Consolidate small files into optimal sizes |
| V-Order | Optimize Parquet layout for read performance |
| Vacuum | Remove unreferenced files older than retention |
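A maintenance request for these operations might be assembled as in the sketch below. The URL path, the `jobType=TableMaintenance` query parameter, and the `executionData` field names reflect the Lakehouse table maintenance API as understood at the time of writing and are assumptions to verify against the current REST reference. The helper `build_maintenance_job` and the sample IDs are hypothetical, and nothing is sent over the network.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_maintenance_job(workspace_id, lakehouse_id, table_name,
                          v_order=True, vacuum_retention=None):
    """Return (url, body) for a table maintenance job; does not send anything."""
    url = (
        f"{FABRIC_API}/workspaces/{workspace_id}"
        f"/lakehouses/{lakehouse_id}/jobs/instances?jobType=TableMaintenance"
    )
    # Assumed field names; check them against the current API reference.
    execution_data = {
        "tableName": table_name,
        "optimizeSettings": {"vOrder": v_order},  # bin-compaction plus V-Order
    }
    if vacuum_retention is not None:
        # Retention is passed through unchanged, e.g. a d.hh:mm:ss string.
        execution_data["vacuumSettings"] = {"retentionPeriod": vacuum_retention}
    return url, {"executionData": execution_data}


if __name__ == "__main__":
    url, body = build_maintenance_job(
        "<workspaceId>", "<lakehouseId>", "orders", vacuum_retention="7.00:00:00"
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```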

Resolution steps: See workflow-table-maintenance.md

Workflow 7: Operations Agent Debugging

Symptoms: Operations Agent not triggering actions, playbook failures.

Operations Agent definition requires: goals, instructions, at least one KustoDatabase data source, at least one PowerAutomateAction, and shouldRun: true.
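The requirements above lend themselves to a simple pre-flight check. The sketch below assumes a flat dictionary layout with `dataSources` and `actions` lists whose entries carry a `type` key; that layout is hypothetical and chosen only for illustration, so the key names should be adapted to the actual definition file.

```python
def check_operations_agent(definition: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not definition.get("goals"):
        problems.append("missing goals")
    if not definition.get("instructions"):
        problems.append("missing instructions")
    # Hypothetical layout: lists of dicts, each with a "type" key.
    sources = definition.get("dataSources", [])
    if not any(s.get("type") == "KustoDatabase" for s in sources):
        problems.append("no KustoDatabase data source")
    actions = definition.get("actions", [])
    if not any(a.get("type") == "PowerAutomateAction" for a in actions):
        problems.append("no PowerAutomateAction")
    if definition.get("shouldRun") is not True:
        problems.append("shouldRun is not true")
    return problems


if __name__ == "__main__":
    example = {
        "goals": "Keep ingestion latency low",
        "instructions": "Alert when latency exceeds the threshold",
        "dataSources": [{"type": "KustoDatabase"}],
        "actions": [{"type": "PowerAutomateAction"}],
        "shouldRun": False,
    }
    print(check_operations_agent(example))
```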

Resolution steps: See workflow-operations-agent.md

Workflow 8: Autoscale Billing and SKU Right-Sizing

Symptoms: Unexpected costs, capacity contention between Spark and other workloads.

Resolution steps: See workflow-autoscale-billing.md

Available Scripts

| Script | Purpose |
| --- | --- |
| Get-FabricAgentDiagnostics.ps1 | Collect agent config, capacity state, and Spark metrics |
| Test-ExampleQueries.ps1 | Validate example queries against data source schema |
| Invoke-TableMaintenance.ps1 | Run bin-compaction, V-Order, and vacuum on Lakehouse tables |

Remediation Quick Reference

| Error | Likely Cause | First Action |
| --- | --- | --- |
| HTTP 430 | Capacity VCore limit | Check Monitoring Hub for active jobs |
| Query timeout | Resource profile mismatch | Switch to readHeavyForSpark |
| Wrong columns | Missing data source instructions | Update schema descriptions |
| Ignored examples | Invalid SQL/KQL syntax | Validate with Test-ExampleQueries.ps1 |
| 2-5 min startup | Private Links or high traffic | Check workspace networking config |
| Stale results | Missing table maintenance | Run bin-compaction + V-Order |
| Agent not running | shouldRun: false | Check Operations Agent definition |
| Autotune disabled | Runtime > 1.2 or HC mode | Verify Fabric Runtime version |

References