AgentSkillsCN

ai-agent-development

借助 Microsoft Foundry 和 Agent Framework 构建生产就绪的 AI 代理。涵盖代理架构设计、模型选型、编排调度、追踪记录与评估验证。

SKILL.md
--- frontmatter
name: ai-agent-development
description: 'Build production-ready AI agents with Microsoft Foundry and Agent Framework. Covers agent architecture, model selection, orchestration, tracing, and evaluation.'

AI Agent Development

Purpose: Build production-ready AI agents with Microsoft Foundry and Agent Framework.
Scope: Agent architecture, model selection, orchestration, observability, evaluation.


Quick Start

Installation

Python (Recommended):

bash
pip install agent-framework-azure-ai --pre  # --pre required during preview

.NET:

bash
dotnet add package Microsoft.Agents.AI.AzureAI --prerelease
dotnet add package Microsoft.Agents.AI.Workflows --prerelease

Model Selection

Top Production Models (Microsoft Foundry):

ModelBest ForContextCost/1M
gpt-5.2Enterprise agents, structured outputs200K/100KTBD
gpt-5.1-codex-maxAgentic coding workflows272K/128K$3.44
claude-opus-4-5Complex agents, coding, computer use200K/64K$10
gpt-5.1Multi-step reasoning200K/100K$3.44
o3Advanced reasoning200K/100K$3.5

Deploy Model: Ctrl+Shift+PAI Toolkit: Deploy Model


Agent Patterns

Single Agent

python
from agent_framework.openai import OpenAIChatClient

client = OpenAIChatClient(
    model="gpt-5.1",
    api_key=os.getenv("FOUNDRY_API_KEY"),
    endpoint=os.getenv("FOUNDRY_ENDPOINT")
)

agent = {
    "name": "Assistant",
    "instructions": "You are a helpful assistant.",
    "tools": []  # Add tools as needed
}

response = await client.chat(
    messages=[{"role": "user", "content": "Hello"}],
    agent=agent
)

Multi-Agent Orchestration

python
from agent_framework.workflows import SequentialWorkflow

researcher = {"name": "Researcher", "instructions": "Gather information."}
writer = {"name": "Writer", "instructions": "Write based on research."}

workflow = SequentialWorkflow(
    agents=[researcher, writer],
    handoff_strategy="on_completion"
)

result = await workflow.run(query="Write about AI agents")

Advanced Patterns: Search github.com/microsoft/agent-framework for:

  • Group Chat, Concurrent, Conditional, Loop
  • Human-in-the-Loop, Reflection, Fan-out/Fan-in
  • MCP, Multimodal, Custom Executors

Observability (Tracing)

Setup OpenTelemetry

python
from agent_framework.observability import configure_otel_providers

# Before running agent - must open trace viewer first!
configure_otel_providers(
    vs_code_extension_port=4317,  # AI Toolkit gRPC port
    enable_sensitive_data=True
)

Open Trace Viewer: Ctrl+Shift+PAI Toolkit: Open Trace Viewer

⚠️ CRITICAL: Open trace viewer BEFORE running your agent.


Evaluation

Workflow

  1. Upload dataset (JSONL)
  2. Define evaluators (built-in or custom)
  3. Create evaluation
  4. Run evaluation
  5. Analyze results

Prerequisites

bash
pip install "azure-ai-projects>=2.0.0b2"

Built-in Evaluators

Agent Evaluators:

  • builtin.intent_resolution - Intent correctly identified?
  • builtin.task_adherence - Instructions followed?
  • builtin.task_completion - Task completed end-to-end?
  • builtin.tool_call_accuracy - Tools used correctly?
  • builtin.tool_selection - Right tools chosen?

Quality Evaluators:

  • builtin.coherence - Natural text flow?
  • builtin.fluency - Grammar correct?
  • builtin.groundedness - Claims substantiated? (RAG)
  • builtin.relevance - Answers key points? (RAG)

Evaluation Example

python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
from openai.types.eval_create_params import DataSourceConfigCustom
from openai.types.evals.create_eval_jsonl_run_data_source_param import (
    CreateEvalJSONLRunDataSourceParam, SourceFileID
)

endpoint = os.getenv("FOUNDRY_PROJECT_ENDPOINT")
model_deployment = os.getenv("MODEL_DEPLOYMENT_NAME")

with (
    DefaultAzureCredential() as credential,
    AIProjectClient(endpoint=endpoint, credential=credential) as project_client,
    project_client.get_openai_client() as openai_client,
):
    # 1. Upload Dataset
    dataset = project_client.datasets.upload_file(
        name="eval-data",
        version="1",
        file_path="data.jsonl"
    )

    # 2. Define Data Schema
    data_source_config = DataSourceConfigCustom({
        "type": "custom",
        "item_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "response": {"type": "string"}
            },
            "required": ["query", "response"]
        },
        "include_sample_schema": True
    })

    # 3. Define Evaluators
    testing_criteria = [
        {
            "type": "azure_ai_evaluator",
            "name": "coherence",
            "evaluator_name": "builtin.coherence",
            "data_mapping": {
                "query": "{{item.query}}", 
                "response": "{{item.response}}"
            },
            "initialization_parameters": {"deployment_name": model_deployment}
        }
    ]

    # 4. Create Evaluation
    evaluation = openai_client.evals.create(
        name="agent-eval",
        data_source_config=data_source_config,
        testing_criteria=testing_criteria
    )

    # 5. Run Evaluation
    run = openai_client.evals.runs.create(
        eval_id=evaluation.id,
        name="eval-run",
        data_source=CreateEvalJSONLRunDataSourceParam(
            type="jsonl", 
            source=SourceFileID(type="file_id", id=dataset.id)
        )
    )

    # 6. Wait for Completion
    while run.status not in ["completed", "failed"]:
        run = openai_client.evals.runs.retrieve(run_id=run.id, eval_id=evaluation.id)
        time.sleep(3)

    print(f"Report: {run.report_url}")

Custom Evaluators

Code-based (objective metrics):

python
code_evaluator = project_client.evaluators.create_version(
    name="response_length_check",
    evaluator_version={
        "name": "response_length_check",
        "definition": {
            "type": "CODE",
            "code_text": """
def grade(sample, item):
    length = len(item.get("response", ""))
    return 1.0 if 100 <= length <= 500 else 0.5
""",
            # ... schema omitted for brevity
        }
    }
)

Prompt-based (subjective metrics):

python
prompt_evaluator = project_client.evaluators.create_version(
    name="friendliness_check",
    evaluator_version={
        "name": "friendliness_check",
        "definition": {
            "type": "PROMPT",
            "prompt_text": """
Rate friendliness (1-5):
Query: {{query}}
Response: {{response}}

Output JSON: {"result": <int>, "reason": "<text>"}
""",
            # ... schema omitted for brevity
        }
    }
)

Best Practices

Development

DO:

  • Plan agent architecture before coding (Research → Design → Implement)
  • Use Microsoft Foundry models for production
  • Implement tracing from day one
  • Test with evaluation datasets before deployment
  • Use structured outputs for reliable agent responses
  • Implement error handling and retry logic
  • Version your agents and track changes

DON'T:

  • Hardcode API keys or endpoints
  • Skip tracing setup (critical for debugging)
  • Deploy without evaluation
  • Use GitHub models in production (free tier has limits)
  • Ignore token limits and context windows
  • Mix agent logic with business logic

Security

  • Store credentials in environment variables or Azure Key Vault
  • Validate all tool inputs and outputs
  • Implement rate limiting for agent APIs
  • Log agent actions for audit trails
  • Use role-based access control (RBAC) for Foundry resources
  • Review OWASP Top 10 for AI: owasp.org/AI-Security-and-Privacy-Guide

Performance

  • Cache model responses when appropriate
  • Use batch processing for multiple requests
  • Monitor token usage and costs
  • Implement timeout handling
  • Use async/await for I/O operations
  • Consider model size vs. latency tradeoffs

Monitoring

  • Track key metrics: latency, success rate, token usage, cost
  • Set up alerts for failures and anomalies
  • Use structured logging with context
  • Integrate with Azure Monitor / Application Insights
  • Review traces regularly for optimization opportunities

Production Checklist

Development

  • Agent architecture documented
  • Model selected and deployed
  • Tools/plugins implemented and tested
  • Error handling with retries
  • Structured outputs configured
  • No hardcoded secrets

Observability

  • OpenTelemetry tracing enabled
  • Trace viewer tested
  • Structured logging implemented
  • Metrics collection configured

Evaluation

  • Evaluation dataset created
  • Evaluators defined (built-in + custom)
  • Evaluation runs passing
  • Results meet quality thresholds

Security & Compliance

  • Credentials in Key Vault/env vars
  • Input validation implemented
  • RBAC configured
  • Audit logging enabled
  • OWASP AI Top 10 reviewed

Operations

  • Health checks implemented
  • Rate limiting configured
  • Monitoring alerts set up
  • Deployment strategy defined
  • Rollback plan documented
  • Cost monitoring enabled

Resources

Official Documentation:

AI Toolkit:

  • Model Catalog: Ctrl+Shift+PAI Toolkit: Model Catalog
  • Trace Viewer: Ctrl+Shift+PAI Toolkit: Open Trace Viewer
  • Playground: Ctrl+Shift+PAI Toolkit: Model Playground

Security:


Related: AGENTS.md for agent behavior guidelines • Skills.md for general production practices

Last Updated: January 17, 2026