Troubleshoot Foundry Hosted Agents
Use this skill when users are experiencing errors or issues with hosted agents.
For creating agents, see the foundry-hosted-agents-create skill.
For testing agents, see the foundry-hosted-agents-test skill.
For deploying agents, see the foundry-hosted-agents-deploy skill.
WHEN USER REPORTS AN ERROR - START HERE:
Step 1: Check Agent Status
az cognitiveservices agent status \
--account-name <account> \
--project-name <project> \
--name <agent-name> \
--agent-version 1
Step 2: Check Agent Logs
az cognitiveservices agent logs show \
--account-name <account> \
--project-name <project> \
--name <agent-name> \
--agent-version 1
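If you run these two checks often, a small Python wrapper can build and run them. This is a minimal sketch. It assumes `az` is on PATH and that the `az cognitiveservices agent` subcommands take exactly the flags shown above. `agent_cmd` and `run_az` are hypothetical helper names, not part of any SDK.

```python
import shutil
import subprocess


def agent_cmd(subcommand: list[str], account: str, project: str,
              name: str, version: str = "1") -> list[str]:
    """Build an `az cognitiveservices agent <subcommand>` argument list with the flags used above."""
    return [
        "az", "cognitiveservices", "agent", *subcommand,
        "--account-name", account,
        "--project-name", project,
        "--name", name,
        "--agent-version", version,
    ]


def run_az(cmd: list[str]) -> str:
    """Run an az command and return stdout; raises if az is missing or exits non-zero."""
    az_path = shutil.which(cmd[0])
    if az_path is None:
        raise RuntimeError("Azure CLI (az) was not found on PATH")
    result = subprocess.run([az_path, *cmd[1:]], capture_output=True, text=True, check=True)
    return result.stdout


# Usage with hypothetical names:
# print(run_az(agent_cmd(["status"], "my-account", "my-project", "my-agent")))
# print(run_az(agent_cmd(["logs", "show"], "my-account", "my-project", "my-agent")))
```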
Step 3: Match Error to Solution Below
WHEN USER SEES: "Azure AI project endpoint is required"
Cause
agent.yaml sets PROJECT_ENDPOINT from the wrong variable, so the agent starts without a project endpoint.
Solution
In agent.yaml, use ${AZURE_AI_PROJECT_ENDPOINT} (the azd variable), NOT ${PROJECT_ENDPOINT}:
environment_variables:
  - name: PROJECT_ENDPOINT
    value: ${AZURE_AI_PROJECT_ENDPOINT}  # ✓ Correct
    # value: ${PROJECT_ENDPOINT}         # ✗ Wrong
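A quick way to catch this before deploying is to scan agent.yaml for the wrong reference. This sketch uses a regular expression rather than a YAML parser and skips full-line comments. `find_wrong_refs` and the `WRONG_REFS` mapping are hypothetical.

```python
import re

# Variables agent.yaml should not reference as ${...}, mapped to the azd variable to use instead.
WRONG_REFS = {"PROJECT_ENDPOINT": "AZURE_AI_PROJECT_ENDPOINT"}


def find_wrong_refs(yaml_text: str) -> list[tuple[int, str, str]]:
    """Return (line number, variable found, variable to use) for each bad ${VAR} reference."""
    problems = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # ignore commented-out lines such as the "✗ Wrong" example
        for var in re.findall(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", line):
            if var in WRONG_REFS:
                problems.append((lineno, var, WRONG_REFS[var]))
    return problems


# Usage (hypothetical path):
# with open("agent.yaml", encoding="utf-8") as f:
#     for lineno, found, use in find_wrong_refs(f.read()):
#         print(f"line {lineno}: replace ${{{found}}} with ${{{use}}}")
```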
WHEN USER SEES: "PROJECT_ENDPOINT environment variable is required"
Cause
The agent was created with az cognitiveservices agent create without passing the environment variables the agent code requires.
Solution
Add the --env flag with the required variables:
az cognitiveservices agent create \
--account-name <account> \
--project-name <project> \
--name <agent-name> \
--source . \
--registry <acr-name> \
--env PROJECT_ENDPOINT=https://<account>.services.ai.azure.com/api/projects/<project> MODEL_DEPLOYMENT_NAME=gpt-4.1 \
--show-logs
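If you script deployments, building the `--env` arguments from a dict lets you fail before calling Azure when a required variable is missing. A sketch: `project_endpoint` reproduces the URL pattern from the example above, and both helper names are hypothetical.

```python
REQUIRED_ENV = ("PROJECT_ENDPOINT", "MODEL_DEPLOYMENT_NAME")


def project_endpoint(account: str, project: str) -> str:
    """Project endpoint in the same form as the --env example above."""
    return f"https://{account}.services.ai.azure.com/api/projects/{project}"


def env_args(env: dict[str, str]) -> list[str]:
    """Build `--env KEY=VALUE ...`, raising if a required variable is missing or empty."""
    missing = [key for key in REQUIRED_ENV if not env.get(key)]
    if missing:
        raise ValueError(f"missing required env vars: {', '.join(missing)}")
    return ["--env", *(f"{key}={value}" for key, value in env.items())]


# Usage (hypothetical): append *env_args({...}) to the `az cognitiveservices agent create` argument list.
```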
WHEN USER SEES: "AcrPullUnauthorized" or Container Pull Errors
Cause
The project's managed identity doesn't have permission to pull from the container registry.
Solution
Grant AcrPull role:
# Get project managed identity
PROJECT_IDENTITY=$(az cognitiveservices account project show \
--name <foundry-account> \
--resource-group <resource-group> \
--project-name <project-name> \
--query identity.principalId -o tsv)
# Get ACR resource ID
ACR_ID=$(az acr show --name <acr-name> --resource-group <resource-group> --query id -o tsv)
# Grant AcrPull
az role assignment create \
--assignee $PROJECT_IDENTITY \
--role "AcrPull" \
--scope $ACR_ID
WHEN USER SEES: 403 Error, "Model access denied", or Authorization Errors
Cause
The project's managed identity doesn't have the Azure AI User role on the Foundry account.
Solution
Grant Azure AI User role:
# Get project managed identity
PROJECT_IDENTITY=$(az cognitiveservices account project show \
--name <foundry-account> \
--resource-group <resource-group> \
--project-name <project-name> \
--query identity.principalId -o tsv)
# Get Foundry account resource ID
FOUNDRY_ID=$(az cognitiveservices account show \
--name <foundry-account> \
--resource-group <resource-group> \
--query id -o tsv)
# Grant Azure AI User
az role assignment create \
--assignee $PROJECT_IDENTITY \
--role "Azure AI User" \
--scope $FOUNDRY_ID
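Both grants (AcrPull and Azure AI User) follow the same pattern, so a script can generate them together. This sketch uses `--assignee-object-id` with `--assignee-principal-type ServicePrincipal` instead of `--assignee`; both are documented `az role assignment create` flags, and passing the object ID directly skips the Microsoft Graph lookup that `--assignee` performs. The helper names are hypothetical.

```python
def role_assignment_cmd(principal_id: str, role: str, scope: str) -> list[str]:
    """Build `az role assignment create` for a managed identity (a service principal)."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


def required_grants(principal_id: str, acr_id: str, foundry_id: str) -> list[list[str]]:
    """Both grants from this guide: AcrPull on the registry, Azure AI User on the Foundry account."""
    return [
        role_assignment_cmd(principal_id, "AcrPull", acr_id),
        role_assignment_cmd(principal_id, "Azure AI User", foundry_id),
    ]


# Usage (hypothetical): run each list with subprocess.run(cmd, check=True),
# passing the PROJECT_IDENTITY, ACR_ID and FOUNDRY_ID values obtained above.
```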
WHEN USER SEES: "AuthenticationError" During Local Testing
Cause
User is not logged into Azure CLI.
Solution
az login
az account show  # Verify you're logged in
If using a specific subscription:
az account set --subscription <subscription-id>
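To verify the login from a script (for example before local testing), parse `az account show -o json`. A sketch assuming the usual output fields (`id`, `name`, `tenantId`, `user.name`); the helper names are hypothetical.

```python
import json
import subprocess
from typing import Optional


def parse_account(json_text: str) -> dict[str, str]:
    """Pick the fields of `az account show -o json` that matter when debugging auth."""
    data = json.loads(json_text)
    return {
        "subscription_id": data.get("id", ""),
        "subscription_name": data.get("name", ""),
        "tenant_id": data.get("tenantId", ""),
        "user": data.get("user", {}).get("name", ""),
    }


def current_account() -> Optional[dict[str, str]]:
    """Return the active account, or None if az is missing or you are not logged in."""
    try:
        out = subprocess.run(["az", "account", "show", "-o", "json"],
                             capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_account(out)
```

If `current_account()` returns None, run `az login`; if it returns the wrong subscription, use `az account set`.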
WHEN USER SEES: Agent Status "Failed" or "Unhealthy"
Diagnosis
Check the logs for the specific error:
az cognitiveservices agent logs show \
--account-name <account> \
--project-name <project> \
--name <agent-name> \
--agent-version 1
Common Causes
| Log Message | Cause | Solution |
|---|---|---|
| PROJECT_ENDPOINT is required | Missing env var | Redeploy with --env flag |
| Model not found | Wrong model name | Check MODEL_DEPLOYMENT_NAME matches deployed model |
| Import error | Missing dependency | Add to requirements.txt and redeploy |
| Connection refused | Agent crashed on startup | Check main.py for errors |
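When the logs are long, a small matcher can apply the table automatically. A sketch: the regular expressions are loose, hypothetical renderings of the table's log messages (`import ?error` also matches Python's `ImportError`, and `ModuleNotFoundError` is added because that is what Python raises for a missing package), so treat a match as a hint, not a diagnosis.

```python
import re

# (pattern, cause, solution) rows adapted from the "Common Causes" table.
KNOWN_ISSUES = [
    (r"PROJECT_ENDPOINT.*required", "Missing env var", "Redeploy with --env flag"),
    (r"model not found", "Wrong model name", "Check MODEL_DEPLOYMENT_NAME matches deployed model"),
    (r"import ?error|ModuleNotFoundError", "Missing dependency", "Add to requirements.txt and redeploy"),
    (r"connection refused", "Agent crashed on startup", "Check main.py for errors"),
]


def triage(log_text: str) -> list[tuple[str, str]]:
    """Return (cause, solution) for every known pattern found in the log text."""
    return [
        (cause, fix)
        for pattern, cause, fix in KNOWN_ISSUES
        if re.search(pattern, log_text, re.IGNORECASE)
    ]
```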
Restart Agent
az cognitiveservices agent stop \
--account-name <account> \
--project-name <project> \
--name <agent-name> \
--agent-version 1
az cognitiveservices agent start \
--account-name <account> \
--project-name <project> \
--name <agent-name> \
--agent-version 1
WHEN USER SEES: "Connection refused" or Port 8088 Issues (Local)
Cause
The agent is not running, or port 8088 is blocked or already in use by another process.
Solution
Check if port is in use:
lsof -i:8088
Kill existing process:
lsof -ti:8088 | xargs kill -9
Restart agent:
python main.py # Or for azd projects: python src/<agent-name>/main.py
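`lsof` is not available on Windows. A stdlib-only alternative is to attempt a TCP connection to the port. This sketch assumes the agent listens on IPv4 loopback (`127.0.0.1`), and `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int = 8088, host: str = "127.0.0.1") -> bool:
    """True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        return s.connect_ex((host, port)) == 0
```

If `port_in_use()` returns True before you start the agent, another process holds the port; stop it first.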
WHEN USER SEES: "Invalid connection string" for App Insights
Cause
Application Insights connection string is not set or invalid.
Impact
This is usually NOT a critical error. The agent will work without App Insights, but you lose valuable observability.
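Before creating or changing resources, you can check whether the connection string you have is well formed. This sketch assumes the documented `InstrumentationKey=<guid>;IngestionEndpoint=<url>;...` format. It only checks shape, not whether the key belongs to a live resource, and `parse_connection_string` is a hypothetical helper.

```python
import uuid


def parse_connection_string(conn: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict; raise ValueError if it looks invalid."""
    parts = {}
    for segment in filter(None, (s.strip() for s in conn.split(";"))):
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        parts[key.strip()] = value.strip()
    ikey = parts.get("InstrumentationKey")
    if not ikey:
        raise ValueError("InstrumentationKey is missing")
    uuid.UUID(ikey)  # raises ValueError if the key is not a GUID
    return parts
```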
Solution
Step 1: Check whether the project has an AppInsights connection (needed for auto-injection)
# If this returns a result, connection string should be auto-injected
az rest --method GET \
--url "https://management.azure.com/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<account>/projects/<project>/connections?api-version=2025-06-01" \
--query "value[?properties.category=='AppInsights'].name" -o tsv
If an AppInsights connection exists: the connection string should be auto-injected. Try redeploying the agent.
If there is NO AppInsights connection: continue with Step 2 to find and connect Application Insights.
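To script this check, call the same endpoint with `az rest` but without `--query`, then filter the JSON in Python (`json.loads` on the command output). A sketch assuming the response has the `value[].properties.category` shape that the JMESPath query above relies on; both helper names are hypothetical.

```python
def connections_url(sub: str, rg: str, account: str, project: str,
                    api_version: str = "2025-06-01") -> str:
    """ARM URL for listing project connections, as used in the az rest call above."""
    return (
        f"https://management.azure.com/subscriptions/{sub}/resourceGroups/{rg}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account}"
        f"/projects/{project}/connections?api-version={api_version}"
    )


def appinsights_connections(response: dict) -> list[str]:
    """Names of connections whose category is AppInsights (same filter as the --query above)."""
    return [
        conn.get("name", "")
        for conn in response.get("value", [])
        if conn.get("properties", {}).get("category") == "AppInsights"
    ]
```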
Step 2: Find Application Insights resources
# Check resource group first
az resource list --resource-type "Microsoft.Insights/components" \
--resource-group <resource-group> \
--query "[].{name:name, id:id}" -o table
# If not found, search entire subscription
az resource list --resource-type "Microsoft.Insights/components" \
--query "[].{name:name, resourceGroup:resourceGroup, id:id}" -o table
Step 3a: If App Insights exists - Create project connection (RECOMMENDED)
# Set variables
SUBSCRIPTION_ID="<subscription-id>"
RESOURCE_GROUP="<resource-group>"
ACCOUNT_NAME="<foundry-account>"
PROJECT_NAME="<project>"
APPINSIGHTS_NAME="<app-insights-name>"
CONNECTION_NAME="${APPINSIGHTS_NAME}-connection"
# Get App Insights resource ID and connection string
APPINSIGHTS_ID=$(az monitor app-insights component show \
--app $APPINSIGHTS_NAME \
--resource-group $RESOURCE_GROUP \
--query id -o tsv)
CONN_STRING=$(az monitor app-insights component show \
--app $APPINSIGHTS_NAME \
--resource-group $RESOURCE_GROUP \
--query connectionString -o tsv)
# Create JSON body file (avoids shell escaping issues)
cat > /tmp/appinsights-connection.json << EOF
{
"properties": {
"authType": "ApiKey",
"category": "AppInsights",
"credentials": {
"key": "${CONN_STRING}"
},
"group": "ServicesAndApps",
"isDefault": true,
"metadata": {
"ApiType": "Azure",
"ResourceId": "${APPINSIGHTS_ID}"
},
"target": "${APPINSIGHTS_ID}"
}
}
EOF
# Create the connection
az rest --method PUT \
--url "https://management.azure.com/subscriptions/${SUBSCRIPTION_ID}/resourceGroups/${RESOURCE_GROUP}/providers/Microsoft.CognitiveServices/accounts/${ACCOUNT_NAME}/projects/${PROJECT_NAME}/connections/${CONNECTION_NAME}?api-version=2025-06-01" \
--body @/tmp/appinsights-connection.json
# Redeploy agent (connection string will be auto-injected)
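If you would rather not depend on heredoc variable expansion, the same request body can be built in Python, where `json.dump` takes care of quoting the connection string. A sketch: the field values are copied from the JSON above, and the helper names are hypothetical.

```python
import json


def appinsights_connection_body(appinsights_id: str, conn_string: str) -> dict:
    """Same body as /tmp/appinsights-connection.json above, built as a dict."""
    return {
        "properties": {
            "authType": "ApiKey",
            "category": "AppInsights",
            "credentials": {"key": conn_string},
            "group": "ServicesAndApps",
            "isDefault": True,
            "metadata": {"ApiType": "Azure", "ResourceId": appinsights_id},
            "target": appinsights_id,
        }
    }


def write_body(path: str, appinsights_id: str, conn_string: str) -> None:
    """Write the body to disk for `az rest --body @<path>`."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(appinsights_connection_body(appinsights_id, conn_string), f, indent=2)


# Usage (hypothetical values):
# write_body("/tmp/appinsights-connection.json", appinsights_id, conn_string)
```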
Step 3b: If NO App Insights exists - Create one first
az monitor app-insights component create \
--app <app-insights-name> \
--location <location> \
--resource-group <resource-group> \
--kind web \
--application-type web
# Then create the connection (Step 3a)
Step 4: Verify observability is working
Check startup logs for: Observability setup completed with provided exporters
WHEN USER SEES: Remote Test Not Working (No Response)
Cause
Usually one of:
- Wrong API being used
- Missing `extra_body` parameter
- Wrong agent name
Solution
Use the correct API pattern for deployed agents:
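import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
# Same value passed via --env, e.g. https://<account>.services.ai.azure.com/api/projects/<project>
PROJECT_ENDPOINT = os.environ["PROJECT_ENDPOINT"]
project_client = AIProjectClient(
    endpoint=PROJECT_ENDPOINT,
    credential=DefaultAzureCredential(),
)
# Must use get_openai_client()
openai_client = project_client.get_openai_client()
# Create a conversation to hold the turns
conversation = openai_client.conversations.create()
# Must include extra_body
response = openai_client.responses.create(
    conversation=conversation.id,
    extra_body={"agent": {"name": "<agent-name>", "type": "agent_reference"}},  # Required!
    input="Hello!",
    store=True,
)
print(response.output_text)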
Common mistakes:
- Using `AgentsClient` instead of `AIProjectClient.get_openai_client()`
- Forgetting `extra_body={"agent": {...}}`
- Agent name doesn't match the `name` field in agent.yaml
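To rule out the name mismatch, compare the name you pass in `extra_body` with agent.yaml before calling the agent. A sketch assuming agent.yaml declares the agent name as a top-level `name:` key; it uses a regular expression rather than a YAML parser, and the helper names are hypothetical.

```python
import re
from typing import Optional


def agent_reference(name: str) -> dict:
    """The extra_body payload from the example above: call a deployed agent by name."""
    return {"agent": {"name": name, "type": "agent_reference"}}


def yaml_agent_name(yaml_text: str) -> Optional[str]:
    """Read the top-level `name:` from agent.yaml text (indented keys are ignored)."""
    match = re.search(r"^name:[ \t]*[\"']?([^\"'\s#]+)", yaml_text, re.MULTILINE)
    return match.group(1) if match else None


def check_agent_name(name: str, yaml_text: str) -> None:
    """Raise if the name used in extra_body differs from the one declared in agent.yaml."""
    declared = yaml_agent_name(yaml_text)
    if declared != name:
        raise ValueError(f"extra_body uses {name!r} but agent.yaml declares {declared!r}")


# Usage (hypothetical):
# with open("agent.yaml", encoding="utf-8") as f:
#     check_agent_name("my-agent", f.read())
# extra_body = agent_reference("my-agent")
```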
WHEN USER ASKS TO VERIFY ROLE ASSIGNMENTS:
Check All Role Assignments for Project Identity
# Get project managed identity
PROJECT_IDENTITY=$(az cognitiveservices account project show \
--name <foundry-account> \
--resource-group <resource-group> \
--project-name <project-name> \
--query identity.principalId -o tsv)
# List all roles
az role assignment list \
--assignee $PROJECT_IDENTITY \
--query "[].{Role:roleDefinitionName, Scope:scope}" \
-o table
Expected Roles
| Role | Scope |
|---|---|
| AcrPull | Container Registry |
| Azure AI User | Foundry Account |
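To check the result programmatically, run the same `az role assignment list --assignee $PROJECT_IDENTITY` with `-o json` (no `--query`) and pass the parsed list to the helper. A sketch: it assumes each entry carries `roleDefinitionName` and `scope`, matches scope by provider segment, and will report a role granted at a broader scope (resource group or subscription) as missing even though it is inherited. `missing_roles` is a hypothetical helper.

```python
# Expected roles from the table above, keyed to the provider segment their scope should contain.
EXPECTED_ROLES = {
    "AcrPull": "/providers/Microsoft.ContainerRegistry/registries/",
    "Azure AI User": "/providers/Microsoft.CognitiveServices/accounts/",
}


def missing_roles(assignments: list[dict]) -> list[str]:
    """Return expected roles that have no assignment at a matching resource scope."""
    missing = []
    for role, provider in EXPECTED_ROLES.items():
        found = any(
            a.get("roleDefinitionName") == role
            and provider.lower() in a.get("scope", "").lower()
            for a in assignments
        )
        if not found:
            missing.append(role)
    return missing
```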
COMPLETE TROUBLESHOOTING CHECKLIST:
For Local Testing Issues
- Azure CLI logged in: `az account show`
- `.env` file exists with `PROJECT_ENDPOINT` and `MODEL_DEPLOYMENT_NAME`
- Virtual environment activated: `source .venv/bin/activate`
- Dependencies installed: `pip install -r requirements.txt`
- No other process on port 8088: `lsof -i:8088`
- Agent started successfully: `python main.py`
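The `.env` item can be checked before starting the agent. This sketch uses a deliberately simple parser (no `export` prefixes, no multi-line values); a library such as python-dotenv handles more cases. The helper names are hypothetical.

```python
from pathlib import Path

REQUIRED = ("PROJECT_ENDPOINT", "MODEL_DEPLOYMENT_NAME")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments and stripping quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env(path: str = ".env") -> list[str]:
    """Required keys that are absent or empty; all of them if the file does not exist."""
    env_file = Path(path)
    if not env_file.is_file():
        return list(REQUIRED)
    env = parse_env_text(env_file.read_text(encoding="utf-8"))
    return [key for key in REQUIRED if not env.get(key)]
```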
For Deployment Issues
- ACR connected to Foundry project
- AcrPull role granted to project identity
- Azure AI User role granted to project identity
- `--env` includes `PROJECT_ENDPOINT` and `MODEL_DEPLOYMENT_NAME`
- Model deployment exists and name matches
- Dockerfile and requirements.txt are correct
- (Optional) `APPLICATIONINSIGHTS_CONNECTION_STRING` included for observability
For Remote Testing Issues
- Agent status is "Running": `az cognitiveservices agent status ...`
- Using `AIProjectClient.get_openai_client()` (not `AgentsClient`)
- Including `extra_body={"agent": {...}}`
- Agent name matches agent.yaml exactly
- Azure CLI logged in: `az login`
For Observability Issues
- Application Insights exists: `az resource list --resource-type "Microsoft.Insights/components" --resource-group <rg>`
- Agent deployed with `APPLICATIONINSIGHTS_CONNECTION_STRING`
- Startup logs show: `Observability setup completed with provided exporters`
- Telemetry appearing: `az monitor app-insights query --app <name> --resource-group <rg> --analytics-query 'traces | take 5'`
WHEN USER ASKS TO DIAGNOSE WITH APPLICATION INSIGHTS:
Query Agent Request Logs
az monitor app-insights query \
--app <app-insights-name> \
--resource-group <resource-group> \
--analytics-query 'traces | where timestamp > ago(30m) | where message has "CreateResponse" or message has "Error" or message has "Exception" | project timestamp, message, severityLevel | order by timestamp desc | take 30' \
-o json
Query for Errors Only
az monitor app-insights query \
--app <app-insights-name> \
--resource-group <resource-group> \
--analytics-query 'traces | where timestamp > ago(1h) | where severityLevel >= 3 | project timestamp, message | order by timestamp desc | take 50' \
-o json
Query Model Call Performance
az monitor app-insights query \
--app <app-insights-name> \
--resource-group <resource-group> \
--analytics-query 'dependencies | where timestamp > ago(1h) | where name has "chat" | summarize avgDuration=avg(duration), count=count() by name' \
-o json
Query Failed Dependencies
az monitor app-insights query \
--app <app-insights-name> \
--resource-group <resource-group> \
--analytics-query 'dependencies | where timestamp > ago(1h) | where success == false | project timestamp, name, duration, resultCode | order by timestamp desc' \
-o json
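With `-o json`, these queries return result tables. This sketch assumes the Application Insights query response shape (`tables[].columns[].name` and `tables[].rows`) and converts one table into one dict per row; the helper names are hypothetical.

```python
import json


def rows_as_dicts(result: dict, table_index: int = 0) -> list[dict]:
    """Turn one result table ({'columns': [...], 'rows': [...]}) into a list of dicts."""
    table = result["tables"][table_index]
    names = [col["name"] for col in table["columns"]]
    return [dict(zip(names, row)) for row in table["rows"]]


def load_query_output(json_text: str) -> list[dict]:
    """Parse the stdout of `az monitor app-insights query ... -o json`."""
    return rows_as_dicts(json.loads(json_text))


# Usage (hypothetical): for row in load_query_output(stdout): print(row["timestamp"], row["message"])
```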