ff-diagnostics

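Diagnose and debug FireFoundry issues by coordinating entity graph inspection, telemetry analysis, log searching, and source code correlation. Use this skill when investigating failures, tracing requests, or debugging agent bundle behavior.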

SKILL.md
--- frontmatter
name: ff-diagnostics
description: Diagnose and debug FireFoundry issues by coordinating entity graph inspection, telemetry analysis, log searching, and source code correlation. Use when investigating failures, tracing requests, or debugging agent bundle behavior.
version: 1.1.0
tags: [firefoundry, diagnostics, debugging, orchestration]
skills: ff-eg-read, ff-telemetry-read, ff-cli

FireFoundry Diagnostics Skill

Orchestrate diagnostics across the FireFoundry platform. This skill routes you to the appropriate mode file based on your starting point.

Prerequisites

Required

  • ff-eg-read CLI installed (auto-configures from .env)
  • ff-telemetry-read CLI installed (auto-configures from .env)
  • Access to local log files (./logs/)

Optional

  • Azure MCP server for App Insights queries
  • ff-cli for cluster operations
  • kubectl / helm for direct cluster access

Tools auto-configure from environment variables or .env files. For connection issues, see the tool-specific configuration modes.


Decision Flowchart

Route to the correct mode file based on what you have:

code
What do you have?
│
├─► Entity ID (failed, stuck, or misbehaving)
│   └─► Load: modes/entity-graph.md
│       Start: ff-eg-read node get <entity-id>
│
├─► Error message from logs
│   └─► Load: modes/logs-local.md
│       Start: grep "<error-pattern>" logs/*.log
│       Then: Extract entity_id from breadcrumbs, continue with entity path
│
├─► Broker request ID or telemetry trace
│   └─► Load: modes/telemetry.md
│       Start: ff-telemetry-read trace get <broker-request-id>
│
├─► LLM or bot failure
│   └─► Load: modes/telemetry.md (for LLM request details)
│       Then: modes/agent-bundle-source.md (for bot/prompt correlation)
│
├─► Pod not starting, OOMKilled, cluster issues
│   └─► Load: modes/cluster.md
│       Start: kubectl get pods -n <namespace>
│
├─► Mysterious behavior (works sometimes, logs missing, impossible values)
│   └─► Load: modes/mysterious-failures.md
│       Covers: async issues, race conditions, memory leaks, FFError patterns
│
├─► Need to find where a log message originates in code
│   ├─► Agent bundle code → Load: modes/agent-bundle-source.md
│   └─► Platform service code → Load: modes/platform-service-source.md
│
└─► Azure App Insights queries (platform-level, 5-10 min delay)
    └─► Load: modes/logs-azure.md

IF/THEN Quick Reference

code
IF you have entity_id:
  → ff-eg-read node get <id> | jq '{status, error}'
  → ff-eg-read node progress <id>  # for runnable entities
  → Extract breadcrumbs for cross-system correlation

ELSE IF you have error message:
  → grep -h "<pattern>" logs/*.log | jq '.properties.breadcrumbs[0].entity_id'
  → Continue with entity_id path

ELSE IF you have broker_request_id:
  → ff-telemetry-read trace get <id>
  → Extract entity from breadcrumbs

ELSE IF entity stuck in "Waiting":
  → ff-eg-read node progress <id> | jq '.[] | select(.type == "WAITING")'
  → Check expected input type, verify external trigger

ELSE IF no logs appearing:
  → Load modes/mysterious-failures.md (infinite loop, process exit, async issues)
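
The error-message path above can also be scripted, which helps when log files mix JSON lines with plain-text fragments that would make jq fail. The following is a minimal Python sketch, assuming each log line is a Winston JSON object that carries `properties.breadcrumbs` as described in this document. The function name `entity_ids_for_pattern` and the sample lines are hypothetical, and the pattern is a Python regular expression, which differs slightly from grep's default syntax.

```python
import json
import re
from typing import Iterable, List


def entity_ids_for_pattern(lines: Iterable[str], pattern: str) -> List[str]:
    """Return unique entity IDs taken from the first breadcrumb of matching lines.

    Roughly mirrors: grep -h "<pattern>" | jq '.properties.breadcrumbs[0].entity_id'
    """
    regex = re.compile(pattern)
    found: List[str] = []
    for line in lines:
        if not regex.search(line):
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as stack-trace fragments
        if not isinstance(record, dict):
            continue
        breadcrumbs = (record.get("properties") or {}).get("breadcrumbs") or []
        if breadcrumbs:
            entity_id = breadcrumbs[0].get("entity_id")
            if entity_id and entity_id not in found:
                found.append(entity_id)
    return found


# Hypothetical sample lines; real logs would be read from ./logs/*.log
sample = [
    '{"level":"error","message":"LLM timeout","properties":{"breadcrumbs":'
    '[{"entity_type":"ReportReviewWorkflowEntity",'
    '"entity_id":"5f3c35ef-e28b-4d1a-b9d5-2e8148d54ec1",'
    '"correlation_id":"279f4ee6-4cc4-4880-9736-4c64c5ab39be"}]}}',
    '{"level":"info","message":"started","properties":{"breadcrumbs":[]}}',
    "not json: LLM timeout",
]
print(entity_ids_for_pattern(sample, "LLM timeout"))
```

Each returned ID can then be fed into the entity_id path (ff-eg-read node get <id>).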

Correlation Matrix

How to search across systems using the same identifier:

| Identifier | Entity Graph | Telemetry | Logs |
|---|---|---|---|
| entity_id | `ff-eg-read node get <id>` | `ff-telemetry-read trace by-breadcrumb <type> <id>` | `grep "<id>" logs/*.log` |
| entity_type | `ff-eg-read search nodes-scoped --condition '{"entity_type":{"$eq":"<type>"}}'` | `ff-telemetry-read trace by-breadcrumb <type> <id>` | `grep "<type>" logs/*.log` |
| correlation_id | - | - | `grep "<corr_id>" logs/*.log` |
| broker_request_id | - | `ff-telemetry-read trace get <id>` | - |
| llm_request_id | - | `ff-telemetry-read llm get <id>` | - |
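
To make the matrix easier to apply during an investigation, it can be encoded as a lookup that prints the commands for whichever identifier is at hand. This is only a sketch: the command templates are copied from the matrix, while the `MATRIX` dictionary, the `commands_for` helper, and the system keys are hypothetical names introduced here. The output is meant for copy and paste, so no shell escaping is applied.

```python
from typing import Dict

# Command templates copied from the correlation matrix.
# Systems marked "-" in the matrix are simply omitted.
MATRIX: Dict[str, Dict[str, str]] = {
    "entity_id": {
        "entity_graph": "ff-eg-read node get <id>",
        "telemetry": "ff-telemetry-read trace by-breadcrumb <type> <id>",
        "logs": 'grep "<id>" logs/*.log',
    },
    "entity_type": {
        "entity_graph": "ff-eg-read search nodes-scoped --condition "
                        "'{\"entity_type\":{\"$eq\":\"<type>\"}}'",
        "telemetry": "ff-telemetry-read trace by-breadcrumb <type> <id>",
        "logs": 'grep "<type>" logs/*.log',
    },
    "correlation_id": {"logs": 'grep "<corr_id>" logs/*.log'},
    "broker_request_id": {"telemetry": "ff-telemetry-read trace get <id>"},
    "llm_request_id": {"telemetry": "ff-telemetry-read llm get <id>"},
}


def commands_for(kind: str, **values: str) -> Dict[str, str]:
    """Fill the <placeholders> for one identifier kind.

    Placeholders without a supplied value are left as-is so the gap stays visible.
    """
    filled: Dict[str, str] = {}
    for system, template in MATRIX[kind].items():
        for name, value in values.items():
            template = template.replace(f"<{name}>", value)
        filled[system] = template
    return filled


for system, command in commands_for(
    "entity_id",
    id="5f3c35ef-e28b-4d1a-b9d5-2e8148d54ec1",
    type="ReportReviewWorkflowEntity",
).items():
    print(f"{system}: {command}")
```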

Field Paths in Each System

| Concept | Entity Graph Field | Telemetry Field | Log Field |
|---|---|---|---|
| Entity ID | `id` | `breadcrumbs[].entity_id` | `properties.breadcrumbs[].entity_id` |
| Entity Type | `entity_type` | `breadcrumbs[].entity_type` | `properties.breadcrumbs[].entity_type` |
| Correlation ID | - | `breadcrumbs[].correlation_id` | `properties.breadcrumbs[].correlation_id` |
| Status | `status` | `status` | `level` (error/warn/info) |
| Timestamp | `created_at`, `updated_at` | `started_at`, `completed_at` | `timestamp` |
| Error | `error` | `error_message` | `message` (when level=error) |
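
Because the same concept lives at a different path in each system, a small accessor layer can keep cross-system comparisons honest. The sketch below follows the field paths in the table; the function names, the system labels, and the sample records are hypothetical, and the real payloads may carry additional nesting that is not shown in this document.

```python
from typing import Any, Dict, List


def entity_ids(record: Dict[str, Any], system: str) -> List[str]:
    """Collect entity IDs from one record using the per-system field path."""
    if system == "entity_graph":
        return [record["id"]] if "id" in record else []
    if system == "telemetry":
        crumbs = record.get("breadcrumbs") or []
    elif system == "logs":
        crumbs = (record.get("properties") or {}).get("breadcrumbs") or []
    else:
        raise ValueError(f"unknown system: {system}")
    return [c["entity_id"] for c in crumbs if "entity_id" in c]


def error_of(record: Dict[str, Any], system: str) -> Any:
    """Return the error value for one record, or None when there is none."""
    if system == "entity_graph":
        return record.get("error")
    if system == "telemetry":
        return record.get("error_message")
    if system == "logs":
        # In logs, the message only counts as an error when level is "error".
        return record.get("message") if record.get("level") == "error" else None
    raise ValueError(f"unknown system: {system}")


# Hypothetical records shaped after the table above.
node = {"id": "e-1", "status": "Failed", "error": "boom"}
trace = {"status": "error", "error_message": "boom",
         "breadcrumbs": [{"entity_id": "e-1", "entity_type": "T"}]}
log = {"level": "error", "message": "boom",
       "properties": {"breadcrumbs": [{"entity_id": "e-1"}]}}

for system, record in (("entity_graph", node), ("telemetry", trace), ("logs", log)):
    print(system, entity_ids(record, system), error_of(record, system))
```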

Key Concepts

Breadcrumbs

Breadcrumbs are the correlation thread linking entities, telemetry, and logs. They are automatically injected by the SDK via AsyncLocalStorage.

json
{
  "breadcrumbs": [
    {
      "entity_type": "ReportReviewWorkflowEntity",
      "entity_id": "5f3c35ef-e28b-4d1a-b9d5-2e8148d54ec1",
      "correlation_id": "279f4ee6-4cc4-4880-9736-4c64c5ab39be"
    }
  ]
}

  • entity_type - The class name of the entity
  • entity_id - Unique identifier (UUID) for the entity instance
  • correlation_id - Links related operations across a single execution flow

Multiple breadcrumbs indicate nested entity calls (parent → child).
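
When a record carries several breadcrumbs, printing them as a chain makes the nesting easier to read. The sketch below assumes the list is ordered from outermost parent to innermost child; the source only states that multiple breadcrumbs mean parent → child nesting, so verify the ordering against real data. The helper name `describe_chain` and the second breadcrumb in the sample are hypothetical.

```python
from typing import Dict, List


def describe_chain(breadcrumbs: List[Dict[str, str]]) -> str:
    """Render breadcrumbs as 'Type(short-id) -> Type(short-id)'.

    Assumes list order is parent first, child last (an assumption, see lead-in).
    """
    return " -> ".join(
        f'{crumb["entity_type"]}({crumb["entity_id"][:8]})' for crumb in breadcrumbs
    )


# The first entry is the example from this document; the second is a
# hypothetical child entity added only to illustrate nesting.
sample_breadcrumbs = [
    {
        "entity_type": "ReportReviewWorkflowEntity",
        "entity_id": "5f3c35ef-e28b-4d1a-b9d5-2e8148d54ec1",
        "correlation_id": "279f4ee6-4cc4-4880-9736-4c64c5ab39be",
    },
    {
        "entity_type": "HypotheticalChildEntity",
        "entity_id": "0a1b2c3d-0000-0000-0000-000000000000",
        "correlation_id": "279f4ee6-4cc4-4880-9736-4c64c5ab39be",
    },
]
print(describe_chain(sample_breadcrumbs))
```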

Entity Types

| Type | Description | Key Diagnostic |
|---|---|---|
| Workflow | Orchestrates multi-step processes | Check child entity statuses |
| Runnable | Single execution, ID = idempotency key | Check progress envelopes |
| Waitable | Runnable that pauses for external input | Check WAITING envelope, input delivery |
| Bot | Stateless AI processor | Check telemetry for LLM/tool calls |

Diagnostic Flow (Overview)

code
1. ENTITY GRAPH → Identify entity, get state and breadcrumbs
2. TELEMETRY → Trace requests (broker → LLM → tool calls)
3. LOGS → Search by entity_id or correlation_id
4. SOURCE CODE → Correlate logs to code location
5. CLUSTER (if needed) → Pod status, resource issues

Mode Files

| Mode | When to Load | Content |
|---|---|---|
| entity-graph.md | Have entity ID, investigating entity state | Entity queries, progress envelopes, relationships |
| telemetry.md | Tracing requests, LLM failures, tool calls | Broker/LLM/tool queries, trace hierarchy |
| logs-local.md | Searching local Winston JSON logs | grep/jq patterns, log structure, filtering |
| logs-azure.md | Platform-level logs (5-10 min delay) | KQL queries, App Insights via MCP |
| cluster.md | Pod issues, deployments, Kong gateway | kubectl patterns, namespace structure |
| agent-bundle-source.md | Correlating diagnostics to agent bundle code | run_impl, progress envelopes, bot patterns |
| platform-service-source.md | Internal platform service debugging | Provider patterns, route mapping |
| source-correlation.md | General log-to-source correlation | Finding log origins in code |
| mysterious-failures.md | Edge cases, async issues, "works sometimes" | FFError, race conditions, memory, event loop |

See Also