
fabric-data-agent

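Build, configure, and validate conversational data agents on Microsoft Fabric Lakehouses using the Data Agent SDK. Use this skill when creating Fabric data agents, configuring few-shot examples, managing Livy sessions, or validating agent responses against Lakehouse data.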

SKILL.md
--- frontmatter
name: "fabric-data-agent"
description: 'Build, configure, and validate conversational data agents on Microsoft Fabric Lakehouses using the Data Agent SDK. Use when creating Fabric data agents, configuring few-shot examples, managing Livy sessions, or validating agent responses against Lakehouse data.'
metadata:
  author: "AgentX"
  version: "1.0.0"
  created: "2025-07-13"
  updated: "2025-07-13"
compatibility:
  languages: ["python", "sql", "pyspark"]
  frameworks: ["microsoft-fabric", "fabric-data-agent-sdk"]
  platforms: ["windows", "linux", "macos"]
prerequisites:
  - "Microsoft Fabric workspace with active capacity"
  - "Fabric MCP Server (ms-fabric-mcp-server)"
  - "fabric-data-agent-sdk (pre-installed in Fabric environment)"
  - "Lakehouse with populated Delta tables"

Fabric Data Agent

Build conversational data agents that answer natural language questions against Fabric Lakehouses.

When to Use

  • Creating a natural language query interface over Lakehouse data
  • Building self-service analytics agents for business users
  • Configuring a Data Agent with table selection, joins, and measures
  • Validating agent accuracy against known metrics
  • Generating reproducible notebooks for agent provisioning

Decision Tree

Building a Fabric Data Agent?
├─ First time with this Lakehouse?
│   └─ Start at Phase 1 (Plan) — discover schema and relationships
├─ Have an implementation plan already?
│   └─ Start at Phase 2 (Create) — build and publish the agent
├─ Agent exists but needs validation?
│   └─ Start at Phase 3 (Validate) — test against expected metrics
├─ Need to modify an existing agent?
│   ├─ Schema changes → Re-run Phase 1 (new plan)
│   └─ Config tweaks → Phase 2 only (update agent)
└─ Not sure if Data Agent is right tool?
    ├─ Users need ad-hoc SQL → Use Warehouse + SQL endpoint instead
    ├─ Users need dashboards → Use Semantic Model + Power BI
    └─ Users need chat-based Q&A → Data Agent ✅

Workflow Overview

Data Agent development follows a 3-phase workflow with checkpoint stops between phases:

Phase 1: Plan ──checkpoint──→ Phase 2: Create ──checkpoint──→ Phase 3: Validate

Critical Rule: Execute ONE phase per conversation turn to prevent context rot. Use the completion report as a handover document between phases.

Phase 1: Plan

Goal: Analyze the Lakehouse to produce a comprehensive implementation plan.

| Step | Action | Output |
|------|--------|--------|
| 1 | Gather inputs (workspace, lakehouse name, scope) | User requirements |
| 2 | Discover tables and row counts via SQL endpoint | Table inventory |
| 3 | Identify primary/foreign keys and relationships | Relationship map |
| 4 | Calculate baseline metrics for validation | Expected values |
| 5 | Generate implementation plan document | implementation_plan.md |

Checkpoint: Present plan summary, get user approval before proceeding.
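
Step 2 (table inventory) could be sketched as a small generator for the T-SQL sent to the SQL endpoint. This is only an illustration: the helper names `quote_identifier`, `sql_literal`, and `row_count_query` are hypothetical, the `dbo` schema is an assumption, and the table names are borrowed from the datasource example elsewhere in this document. How the statement is executed is out of scope here.

```python
def quote_identifier(name: str) -> str:
    """Quote a T-SQL identifier with brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def sql_literal(value: str) -> str:
    """Quote a string as a T-SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def row_count_query(tables: list, schema: str = "dbo") -> str:
    """Build one T-SQL statement returning (table_name, row_count) per table.

    The schema default "dbo" is an assumption, not something the skill specifies.
    """
    if not tables:
        raise ValueError("at least one table is required")
    selects = [
        f"SELECT {sql_literal(table)} AS table_name, COUNT_BIG(*) AS row_count "
        f"FROM {quote_identifier(schema)}.{quote_identifier(table)}"
        for table in tables
    ]
    return "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    # Hypothetical table names, matching the datasource example in this document.
    print(row_count_query(["fact_sales", "dim_product"]))
```

Producing one statement for all tables keeps the inventory to a single round trip, which matters when each query against the endpoint has noticeable latency.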

Phase 2: Create

Goal: Execute the plan to create and publish a configured Data Agent.

| Step | Action | Output |
|------|--------|--------|
| 1 | Initialize the Data Agent Management Client | SDK connection |
| 2 | Create agent with name and description | Agent instance |
| 3 | Configure instructions (system prompt for the agent) | Agent knowledge |
| 4 | Add Lakehouse datasources and select tables | Data bindings |
| 5 | Add few-shot examples for common queries | Query templates |
| 6 | Publish the agent | Live agent |
| 7 | Generate reproducible notebook | agent_creation.ipynb |

Checkpoint: Confirm agent is published and accessible before validation.

Phase 3: Validate

Goal: Verify agent accuracy against the metrics from Phase 1.

| Step | Action | Output |
|------|--------|--------|
| 1 | Initialize the Query Client | SDK connection |
| 2 | Execute test queries from the plan | Query responses |
| 3 | Compare actual vs expected values | Accuracy metrics |
| 4 | Generate validation report | validation_report.md |
| 5 | Generate reproducible notebook | agent_validation.ipynb |

Checkpoint: Present accuracy results and recommendations.
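
Step 3 (comparing actual against expected values) might look like the sketch below. It assumes the agent's answers have already been parsed into numbers. The `MetricCheck` class, the `accuracy` helper, and the 1% relative tolerance are illustrative choices, not part of the Data Agent SDK or of this skill.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricCheck:
    """One expected-vs-actual comparison taken from the Phase 1 baseline."""

    name: str
    expected: float
    actual: Optional[float]  # None when the agent gave no numeric answer
    rel_tol: float = 0.01    # assumed tolerance; tune per metric

    @property
    def passed(self) -> bool:
        if self.actual is None:
            return False
        return math.isclose(
            self.actual, self.expected, rel_tol=self.rel_tol, abs_tol=1e-9
        )


def accuracy(checks: list) -> float:
    """Fraction of checks that passed; 0.0 when there are no checks."""
    if not checks:
        return 0.0
    return sum(check.passed for check in checks) / len(checks)


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the comparison.
    checks = [
        MetricCheck("total_sales", expected=1000.0, actual=1005.0),
        MetricCheck("order_count", expected=250.0, actual=300.0),
    ]
    for check in checks:
        print(check.name, "PASS" if check.passed else "FAIL")
    print(f"accuracy: {accuracy(checks):.0%}")
```

A relative tolerance absorbs rounding in the agent's natural language answer; counts that must match exactly can set `rel_tol=0.0`.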

Core Concepts

Data Agent Architecture

User Question (natural language)
        ↓
Data Agent (instructions + knowledge)
        ↓
SQL Generation (against Lakehouse SQL endpoint)
        ↓ 
Query Execution
        ↓
Natural Language Answer

Agent Configuration Components

| Component | Purpose | Example |
|-----------|---------|---------|
| Instructions | System prompt guiding agent behavior | "You are a sales analytics assistant..." |
| Datasources | Lakehouse bindings with table selection | Bronze_LH: [fact_sales, dim_product, ...] |
| Few-shot examples | Query-answer pairs for accuracy | Q: "Total sales?" → SQL: SELECT SUM(amount)... |
| Knowledge | Additional context documents | Business rules, glossary, KPIs |

Table Selection Strategy

| Include | Exclude |
|---------|---------|
| Gold-layer fact and dimension tables | Bronze/Silver raw tables |
| Tables with clear business meaning | System/metadata tables |
| Tables with documented relationships | Temp/staging tables |
| Tables referenced in business KPIs | Large log/event tables |
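
If the Lakehouse follows a prefix naming convention, the include/exclude split can be approximated mechanically before a human reviews it. The sketch below assumes such a convention: the prefixes (`fact_`, `dim_`, `tmp_`, `stg_`, `sys_`, `log_`) and the `select_tables` helper are hypothetical and should be replaced with whatever the workspace actually uses.

```python
# Assumed naming convention; adjust to the Lakehouse being analyzed.
INCLUDE_PREFIXES = ("fact_", "dim_")
EXCLUDE_PREFIXES = ("tmp_", "stg_", "sys_", "log_")


def select_tables(tables: list) -> tuple:
    """Split table names into (selected, skipped) using the prefix rules.

    A table is selected only if it matches an include prefix and no
    exclude prefix; everything else is skipped for manual review.
    """
    selected, skipped = [], []
    for name in tables:
        lowered = name.lower()
        if lowered.startswith(EXCLUDE_PREFIXES):
            skipped.append(name)
        elif lowered.startswith(INCLUDE_PREFIXES):
            selected.append(name)
        else:
            skipped.append(name)
    return selected, skipped


if __name__ == "__main__":
    selected, skipped = select_tables(
        ["fact_sales", "dim_product", "stg_orders", "audit_trail"]
    )
    print("selected:", selected)
    print("skipped:", skipped)
```

Names alone cannot tell whether a table has clear business meaning or documented relationships, so the skipped list is a prompt for review rather than a final verdict.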

Few-Shot Example Best Practices

Few-shot examples teach the agent how to generate correct SQL:

SQL Syntax (Fabric SQL Endpoint = T-SQL)

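| Correct (T-SQL) | Incorrect (Spark SQL) |
|-----------------|-----------------------|
| SELECT TOP 10 | LIMIT 10 |
| DATEPART(QUARTER, date) | QUARTER(date) |
| FORMAT(date, 'yyyy-MM') | DATE_FORMAT(date, 'yyyy-MM') |

CAST(value AS DATE) and COALESCE(col, default) are valid in both dialects. CONVERT(DATE, value) and ISNULL(col, default) are the T-SQL-specific equivalents; either form works against the SQL endpoint.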

Example Quality Checklist

  • Covers the top 10 most common business questions
  • Uses correct T-SQL syntax (not Spark SQL)
  • Includes aggregations (SUM, COUNT, AVG)
  • Includes time-based filters (YTD, MTD, date ranges)
  • Includes joins across fact and dimension tables
  • All examples validated against SQL endpoint before adding
  • Covers edge cases (nulls, empty results, large numbers)
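
The "correct T-SQL syntax" item on the checklist can be partly automated with a heuristic lint that flags Spark SQL constructs before an example is added. The sketch below is deliberately simple: the `lint_few_shot_sql` helper and its pattern list are illustrative, it does not parse SQL, and it can misfire on keywords that appear inside string literals or comments. It complements validation against the SQL endpoint and does not replace it.

```python
import re

# (label, pattern, suggested T-SQL replacement) for Spark SQL constructs
# that the T-SQL endpoint rejects. The list is illustrative, not exhaustive.
SPARK_ONLY_PATTERNS = [
    ("LIMIT", re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE), "use SELECT TOP n"),
    ("QUARTER()", re.compile(r"\bQUARTER\s*\(", re.IGNORECASE), "use DATEPART(QUARTER, ...)"),
    ("DATE_FORMAT()", re.compile(r"\bDATE_FORMAT\s*\(", re.IGNORECASE), "use FORMAT(date, 'yyyy-MM')"),
]


def lint_few_shot_sql(sql: str) -> list:
    """Return one finding per Spark-only construct detected in the SQL text."""
    findings = []
    for label, pattern, hint in SPARK_ONLY_PATTERNS:
        if pattern.search(sql):
            findings.append(f"{label}: {hint}")
    return findings


if __name__ == "__main__":
    # order_date is a hypothetical column used only for illustration.
    candidate = "SELECT QUARTER(order_date), SUM(amount) FROM fact_sales LIMIT 10"
    for finding in lint_few_shot_sql(candidate):
        print(finding)
```

Note that `DATEPART(QUARTER, date)` is not flagged, because the pattern only matches `QUARTER` when it is called as a function.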

Livy Session Management

When using Livy for SDK operations (Phase 2 & 3):

1. Always check for existing sessions FIRST
2. Reuse idle sessions (state: idle → reuse)
3. Create only if none exist (cold start: 3-6+ minutes)
4. Never close sessions unless explicitly requested
5. Use timestamped session names: data-agent-{lakehouse}-{timestamp}
6. Check session status before submitting statements
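
The rules above amount to a small decision procedure, sketched here as a pure function over a list of session records. The record shape (dicts with `id` and `state`) and the state names follow Apache Livy's conventions and are an assumption about what the session listing returns. The `plan_session` and `session_name` helpers, the timestamp format, and the choice to ignore sessions in a terminal state are illustrative readings, not something this skill specifies.

```python
from datetime import datetime, timezone
from typing import Optional

# States in which a session is still starting or working (Apache Livy naming).
PENDING_STATES = {"not_started", "starting", "busy"}


def session_name(lakehouse: str, now: Optional[datetime] = None) -> str:
    """Build a name of the form data-agent-{lakehouse}-{timestamp}."""
    now = now or datetime.now(timezone.utc)
    return f"data-agent-{lakehouse}-{now:%Y%m%d-%H%M%S}"


def plan_session(sessions: list, lakehouse: str, now: Optional[datetime] = None) -> dict:
    """Decide whether to reuse, wait for, or create a session.

    Never returns a "close" action: existing sessions are left alone.
    """
    # Rule 2: reuse an idle session if there is one.
    for session in sessions:
        if session.get("state") == "idle":
            return {"action": "reuse", "session_id": session["id"]}
    # A session that is starting or busy is cheaper to wait for than a cold start.
    for session in sessions:
        if session.get("state") in PENDING_STATES:
            return {"action": "wait", "session_id": session["id"]}
    # Rule 3: create only when nothing usable exists.
    return {"action": "create", "name": session_name(lakehouse, now)}


if __name__ == "__main__":
    existing = [{"id": 1, "state": "busy"}, {"id": 2, "state": "idle"}]
    print(plan_session(existing, "Bronze_LH"))
    print(plan_session([], "Bronze_LH"))
```

Keeping the decision separate from the HTTP calls makes it easy to check the status again (rule 6) right before submitting a statement by re-running the function on a fresh listing.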

Error Handling

Retry Protocol

Attempt 1 → Execute operation
  ↓ (on failure)
Attempt 2 → Diagnose error, apply fix, retry
  ↓ (on failure)
Attempt 3 → Try alternative approach
  ↓ (on failure)
Escalate to user with error details + options
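
One way to encode this protocol is as an ordered list of callables, one per attempt, where each receives the previous error so it can diagnose before retrying. The `run_with_protocol` function and the `EscalationRequired` exception below are hypothetical names for this sketch; the actual diagnosis and alternative strategies depend on the operation being retried.

```python
from typing import Callable, Optional, Sequence


class EscalationRequired(Exception):
    """Raised when every attempt failed; carries the errors for the user."""

    def __init__(self, errors: list):
        super().__init__(f"all {len(errors)} attempts failed")
        self.errors = errors


def run_with_protocol(attempts: Sequence[Callable[[Optional[Exception]], object]]):
    """Run attempts in order, passing each one the previous failure (or None).

    Typical usage supplies three callables: execute, diagnose-and-retry,
    and an alternative approach. The first success is returned.
    """
    errors = []
    previous: Optional[Exception] = None
    for attempt in attempts:
        try:
            return attempt(previous)
        except Exception as exc:  # broad on purpose: any failure moves to the next attempt
            errors.append(exc)
            previous = exc
    raise EscalationRequired(errors)


if __name__ == "__main__":
    def execute(previous):
        raise TimeoutError("session not ready")

    def fix_and_retry(previous):
        # A real implementation would inspect `previous` and adjust before retrying.
        return f"recovered after {type(previous).__name__}"

    def alternative(previous):
        return "alternative path"

    print(run_with_protocol([execute, fix_and_retry, alternative]))
```

Collecting every error, rather than only the last one, gives the escalation message enough detail to present the user with options.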

Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| Agent creation fails | SDK initialization issue | Verify workspace access and SDK version |
| Table not found in agent | Table not in selected scope | Re-add datasource with correct table list |
| Query returns wrong results | Incorrect few-shot SQL syntax | Validate SQL against endpoint first |
| Session timeout | Livy cold start | Increase timeout, reuse existing sessions |
| Permission denied | Workspace role insufficient | Need Contributor or higher role |

Output Artifacts

All output goes to timestamped folders:

run/{timestamp}_{lakehouse}/
├── implementation_plan.md       # Phase 1 output
├── agent_creation.ipynb          # Phase 2 reproducible notebook
├── agent_validation.ipynb        # Phase 3 reproducible notebook
├── validation_report.md          # Phase 3 accuracy results
└── completion_report.md          # Cross-phase handover document
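
A minimal sketch of creating such a folder without ever overwriting an earlier run is shown below. The timestamp format (`YYYYMMDD_HHMMSS`, UTC) is an assumption, since the layout above only says `{timestamp}`, and the `run_folder` and `create_run_folder` helpers are hypothetical names.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# File names taken from the layout above.
ARTIFACT_NAMES = (
    "implementation_plan.md",
    "agent_creation.ipynb",
    "agent_validation.ipynb",
    "validation_report.md",
    "completion_report.md",
)


def run_folder(lakehouse: str, root: Path = Path("run"), now: Optional[datetime] = None) -> Path:
    """Return the path run/{timestamp}_{lakehouse} without touching the disk."""
    now = now or datetime.now(timezone.utc)
    return root / f"{now:%Y%m%d_%H%M%S}_{lakehouse}"


def create_run_folder(lakehouse: str, root: Path = Path("run"), now: Optional[datetime] = None) -> Path:
    """Create the run folder, failing loudly if it already exists."""
    folder = run_folder(lakehouse, root, now)
    folder.mkdir(parents=True, exist_ok=False)  # FileExistsError instead of overwriting
    return folder


if __name__ == "__main__":
    folder = run_folder("Bronze_LH")
    for name in ARTIFACT_NAMES:
        print(folder / name)
```

Passing `exist_ok=False` turns an accidental collision into an error, which matches the rule that output folders are never overwritten.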

Anti-Patterns

  • Skip planning phase: Creating agents without understanding schema → poor accuracy
  • Use Bronze tables: Raw data with duplicates/nulls → unreliable answers
  • Spark SQL in few-shots: Agent generates invalid SQL → query failures
  • No validation: Deploying without testing → users lose trust quickly
  • Monolithic instructions: Long, unfocused system prompts → agent confusion
  • Too many tables: Adding all tables → slow queries, irrelevant joins

Boundaries

Always Do

  • Gather workspace and lakehouse inputs before starting
  • Discover and verify schema before creating agent
  • Validate all few-shot SQL against the SQL endpoint
  • Get user approval at checkpoint between phases
  • Generate reproducible notebooks for all SDK operations
  • Use timestamped output folders (never overwrite)
  • Document decisions in completion report

Ask First

  • Creating new Data Agents (confirm name and scope)
  • Running expensive queries on large tables
  • Modifying existing agent configurations
  • Any operation that affects production data

Never Do

  • Proceed without required inputs (workspace, lakehouse)
  • Execute modifications without user approval
  • Delete Data Agents without confirmation
  • Hardcode credentials or connection strings
  • Assume table relationships without verification
  • Include unvalidated SQL in few-shot examples
  • Close Livy sessions that were already open

Reference Index

| Document | Description |
|----------|-------------|
| references/agent-sdk-patterns.md | Data Agent SDK code patterns and API reference |
| references/instruction-templates.md | System prompt templates for different domains |

Asset Templates

| File | Description |
|------|-------------|
| assets/completion-report-template.md | Cross-phase handover document template |
| assets/sample-few-shot-queries.sql | Example few-shot query templates |