fabric-network-remediate

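Diagnose and resolve Microsoft Fabric network performance issues, including connectivity failures, latency, private endpoints, managed VNets, outbound access protection, gateway diagnostics, OneLake endpoint routing, service tag configuration, firewall allowlisting, DNS resolution, and Spark session startup delays. Use it when remediating Fabric networking problems, slow Spark jobs, connection timeouts, private link errors, managed private endpoint approvals, or capacity throttling.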

SKILL.md
--- frontmatter
name: fabric-network-remediate
description: Diagnose and resolve Microsoft Fabric network performance issues including connectivity failures, latency, private endpoints, managed VNets, outbound access protection, gateway diagnostics, OneLake endpoint routing, service tag configuration, firewall allowlisting, DNS resolution, and Spark session startup delays. Use when remediating Fabric networking problems, slow Spark jobs, connection timeouts, private link errors, managed private endpoint approval, or capacity throttling.
license: Complete terms in LICENSE.txt

Microsoft Fabric Network Performance Remediation

Systematic toolkit for diagnosing and resolving network performance issues across Microsoft Fabric workloads including Spark, OneLake, Data Warehouse, Pipelines, and Dataflows.

When to Use This Skill

  • Fabric Spark sessions take longer than expected to start (>10 seconds)
  • Connection timeouts to external data sources from notebooks or pipelines
  • Managed private endpoint status shows Pending or Failed
  • DNS resolution returns public IPs instead of private IPs
  • Outbound access protection blocks required dependencies (PyPI, Conda)
  • On-premises data gateway connectivity failures
  • OneLake API calls returning 403 or timeout errors
  • Capacity throttling errors (HTTP 430)
  • Dataflow Gen2 staging failures behind firewalls
  • Cross-workspace environment attachment failures due to network mismatch

Prerequisites

  • PowerShell 7+ with Az module installed (Install-Module Az -Scope CurrentUser)
  • Fabric Admin or Workspace Admin role for network configuration changes
  • Azure portal access for Private Link Service and DNS zone management
  • Network access to run nslookup, Test-NetConnection, and Resolve-DnsName

Step-by-Step Workflows

Workflow 1: Diagnose Spark Session Startup Delays

Spark startup times vary based on networking configuration. Consult the reference table:

| Scenario | Typical Startup Time |
| --- | --- |
| Default settings, no libraries | 5-10 seconds |
| Default settings + library dependencies | 5-10 sec + 30 sec-5 min |
| High traffic in region, no libraries | 2-5 minutes |
| High traffic + library dependencies | 2-5 min + 30 sec-5 min |
| Network security (Private Links/VNet) | 2-5 minutes |
| Network security + library dependencies | 2-5 min + 30 sec-5 min |

Run the diagnostic script for automated assessment:

```powershell
.\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType SparkStartup
```

When Private Links or Managed VNets are enabled, Starter Pools are unavailable and Fabric must create clusters on demand, adding 2-5 minutes to session start time.
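The reference table can also be read as a simple calculation: a base range that depends on the networking situation, plus an extra range when library dependencies are installed. The following is a minimal sketch of that reading, not part of the skill's scripts. The function names are hypothetical, and treating "high traffic" and "network security" together as the same 2-5 minute base is an assumption, since the table does not list that combination.

```python
def expected_startup_seconds(network_secured: bool,
                             high_traffic: bool,
                             has_libraries: bool) -> tuple[int, int]:
    """Return the (min, max) typical startup time in seconds from the table."""
    if network_secured or high_traffic:
        # 2-5 minutes: Starter Pools unavailable or region under load.
        # Assumption: both conditions together still fall in this range.
        low, high = 120, 300
    else:
        # 5-10 seconds with default settings.
        low, high = 5, 10
    if has_libraries:
        # Library dependencies add 30 seconds to 5 minutes.
        low += 30
        high += 300
    return low, high


def is_slower_than_expected(observed_seconds: float,
                            network_secured: bool,
                            high_traffic: bool,
                            has_libraries: bool) -> bool:
    """True when the observed startup exceeds the upper end of the typical range."""
    _, high = expected_startup_seconds(network_secured, high_traffic, has_libraries)
    return observed_seconds > high
```

For example, a 4-minute startup in a workspace with Private Links and no libraries falls inside the typical range, so `is_slower_than_expected(240, True, False, False)` returns `False`, while the same delay with default settings would be flagged.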

Workflow 2: Validate Managed Private Endpoint Connectivity

  1. Navigate to Fabric workspace Settings > Network security
  2. Under Managed private endpoints, verify Status shows Approved
  3. If Pending or Failed, see private-endpoint-remediate.md
  4. Validate DNS routing from a Fabric Notebook:
```bash
nslookup sqlserver.corp.contoso.com
```

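Confirm the returned IP is in a private range (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16), not a public address. Note that only 172.16.x.x through 172.31.x.x are private; other 172.x.x.x addresses are public.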

  5. Run the automated validation:
```powershell
.\scripts\Test-FabricNetworkHealth.ps1 -WorkspaceId "<workspace-id>" -CheckType PrivateEndpoint
```
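The private-range check in step 4 is easy to get wrong by eye, so it can help to automate it in a notebook cell. The sketch below uses only the Python standard library. The function names are illustrative and not part of the skill's scripts, and the check is limited to the RFC 1918 IPv4 ranges.

```python
import ipaddress
import socket

# RFC 1918 private IPv4 ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """True when the address falls inside one of the RFC 1918 ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


def resolve_ipv4(hostname: str) -> list[str]:
    """Resolve a hostname to its IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def resolves_privately(hostname: str) -> bool:
    """True when every resolved IPv4 address is in a private range."""
    addresses = resolve_ipv4(hostname)
    return bool(addresses) and all(is_rfc1918(a) for a in addresses)
```

Calling `resolves_privately("sqlserver.corp.contoso.com")` from a Fabric notebook would then return `True` only if DNS routes that name to a private address. A failed lookup raises `socket.gaierror`, which is itself a useful signal that the name does not resolve from that network.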

Workflow 3: Configure Firewall Allowlisting

Fabric requires specific endpoints and service tags. Run the firewall audit script:

```powershell
.\scripts\Test-FabricNetworkHealth.ps1 -CheckType FirewallEndpoints
```

For the complete endpoint reference, see firewall-endpoints.md.

Key service tags for Azure Firewall / NSG rules:

| Tag | Purpose | Direction |
| --- | --- | --- |
| PowerBI | Fabric core services | Both |
| DataFactory | Pipeline operations | Both |
| PowerQueryOnline | Dataflow processing | Both |
| SQL | Warehouse connectivity | Outbound |
| EventHub | Real-Time Analytics | Outbound |
| KustoAnalytics | Real-Time Analytics | Both |
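When drafting firewall or NSG rules, it can be convenient to hold the table above as data and ask which tags need a rule in a given direction. The following is a small illustrative sketch, not an official rule generator. The dictionary mirrors the table, with the first tag written as `PowerBI` because Azure service tag identifiers contain no spaces, and the function name is hypothetical.

```python
# Service tags from the table above: tag -> (purpose, direction).
SERVICE_TAGS = {
    "PowerBI": ("Fabric core services", "Both"),
    "DataFactory": ("Pipeline operations", "Both"),
    "PowerQueryOnline": ("Dataflow processing", "Both"),
    "SQL": ("Warehouse connectivity", "Outbound"),
    "EventHub": ("Real-Time Analytics", "Outbound"),
    "KustoAnalytics": ("Real-Time Analytics", "Both"),
}


def tags_for(direction: str) -> list[str]:
    """Return tags that need a rule for 'Inbound' or 'Outbound' traffic."""
    if direction not in ("Inbound", "Outbound"):
        raise ValueError("direction must be 'Inbound' or 'Outbound'")
    return [
        tag
        for tag, (_purpose, tag_direction) in SERVICE_TAGS.items()
        if tag_direction in (direction, "Both")
    ]
```

Under this reading, `tags_for("Inbound")` lists only the four tags marked Both, while `tags_for("Outbound")` lists all six.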

Workflow 4: Troubleshoot Outbound Access Protection

When outbound access protection is enabled, public repositories (PyPI, Conda) are blocked. To install libraries in secured environments:

  1. Prepare a requirements.txt on a machine with internet access
  2. Download packages and dependencies using pip:
```bash
pip download -r requirements.txt -d ./packages
```
  3. Upload packages as custom libraries in the Fabric Environment
  4. See outbound-access-guide.md for detailed steps
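Before uploading, it may be worth confirming that every entry in requirements.txt actually produced a file in the download folder. The sketch below is one way to do that under stated assumptions: it handles only simple name-based requirement lines (not URLs or local paths), it compares top-level requirements only (not transitive dependencies), and all function names are hypothetical.

```python
import re
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a project name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(requirements_text: str) -> set[str]:
    """Extract project names from simple requirements.txt lines."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r or -e
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(normalize(match.group(0)))
    return names


def downloaded_names(filenames) -> set[str]:
    """Extract project names from wheel and source archive file names."""
    names = set()
    for filename in filenames:
        if filename.endswith(".whl"):
            # Wheel names start with {distribution}-{version}-...
            names.add(normalize(filename.split("-", 1)[0]))
        elif filename.endswith((".tar.gz", ".zip")):
            stem = re.sub(r"\.(tar\.gz|zip)$", "", filename)
            names.add(normalize(stem.rsplit("-", 1)[0]))
    return names


def missing_packages(requirements_text: str, filenames) -> list[str]:
    """Return required project names with no matching downloaded file."""
    return sorted(requirement_names(requirements_text) - downloaded_names(filenames))
```

A typical call would pass `Path("requirements.txt").read_text()` and `[p.name for p in Path("./packages").iterdir()]`; an empty result suggests every listed requirement has a corresponding archive.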

Workflow 5: Resolve Capacity Throttling (HTTP 430)

When all Spark VCores are consumed, new jobs receive HTTP 430 errors. Formula: 1 Capacity Unit = 2 Spark VCores.

  1. Check current utilization in the Monitoring Hub
  2. Cancel idle or stuck Spark sessions
  3. Consider upgrading the capacity SKU if throttling is sustained
  4. Enable queueing for pipeline and Spark Job Definition workloads

For queue limits by SKU, see capacity-throttling.md.
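The formula above (1 Capacity Unit = 2 Spark VCores) makes it possible to estimate whether a new job fits in the remaining capacity. The sketch below is a rough illustration only: it assumes the number in an F SKU name equals its Capacity Units (for example, F64 has 64 CUs), it ignores any bursting or queueing behavior, and the function names are hypothetical.

```python
SPARK_VCORES_PER_CU = 2  # from the formula: 1 Capacity Unit = 2 Spark VCores


def spark_vcores(capacity_units: int) -> int:
    """Spark VCores provided by a given number of Capacity Units."""
    return capacity_units * SPARK_VCORES_PER_CU


def vcores_for_sku(sku: str) -> int:
    """Spark VCores for an F SKU name such as 'F64' (assumes the number is the CU count)."""
    if not sku.upper().startswith("F") or not sku[1:].isdigit():
        raise ValueError(f"unrecognized SKU name: {sku!r}")
    return spark_vcores(int(sku[1:]))


def would_exceed_capacity(capacity_units: int,
                          vcores_in_use: int,
                          vcores_requested: int) -> bool:
    """True when the request would push usage past the base VCore allotment."""
    return vcores_in_use + vcores_requested > spark_vcores(capacity_units)
```

For an F64 capacity this gives 128 Spark VCores, so a job requesting 16 VCores while 120 are already in use would exceed the base allotment and is a candidate for the HTTP 430 path described above.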

Remediation Quick Reference

| Symptom | Likely Cause | First Action |
| --- | --- | --- |
| Spark startup >2 min | Private Link/VNet enabled | Expected; Starter Pools unavailable |
| Connection timeout from Spark | Firewall blocking Fabric subnet | Open required ports (1433 for SQL) |
| DNS resolves to public IP | Private DNS zone not linked | Add A record pointing to private IP |
| MPE status = Failed | PLS rejected or deleted | Re-create MPE, verify PLS exists |
| HTTP 430 error | Capacity VCores exhausted | Cancel jobs or upgrade SKU |
| PyPI install blocked | Outbound access protection | Upload packages as custom libraries |
| Cross-workspace env fails | Network settings mismatch | Ensure same capacity and network config |
| OneLake API 403 | Endpoint URL validation | Use *.dfs.fabric.microsoft.com |
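For the connection-timeout row, a quick TCP reachability test from a notebook can show whether a port such as 1433 is open before digging into firewall rules. The sketch below is a rough Python counterpart to `Test-NetConnection` using only the standard library. The function name is hypothetical, and a successful connect only shows that the TCP port is reachable, not that authentication or the service itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """True when a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_connect("sqlserver.corp.contoso.com", 1433)` returning `False` from a Fabric notebook would point toward a blocked port or a DNS problem rather than an issue inside the database.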

References