querying-mlflow-metrics

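Fetches aggregated trace metrics (such as token usage, latency, trace counts, and quality evaluation results) from an MLflow tracking server. The skill is triggered when a user asks to show metrics, analyze token usage, view LLM costs, monitor usage trends, or query trace statistics.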

SKILL.md
--- frontmatter
name: querying-mlflow-metrics
description: Fetches aggregated trace metrics (token usage, latency, trace counts, quality evaluations) from MLflow tracking servers. Triggers on requests to show metrics, analyze token usage, view LLM costs, check usage trends, or query trace statistics.

MLflow Metrics

Run scripts/fetch_metrics.py to query metrics from an MLflow tracking server.

Examples

Token usage summary:

bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM,AVG

Output: AVG: 223.91 SUM: 7613

Hourly token trend (last 24h):

bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m total_tokens -a SUM \
    -t 3600 --start-time="-24h" --end-time=now

Output: Time-bucketed token sums per hour
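To make the time-bucketed output easier to interpret, the following is a minimal sketch of what summing values into fixed-size buckets looks like. It assumes buckets are aligned by flooring each epoch-millisecond timestamp to a multiple of the interval; how the MLflow server actually aligns buckets is not stated here, and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def bucket_start(timestamp_ms: int, interval_seconds: int) -> int:
    """Floor an epoch-millisecond timestamp to the start of its bucket."""
    interval_ms = interval_seconds * 1000
    return timestamp_ms - (timestamp_ms % interval_ms)


def sum_by_bucket(points, interval_seconds):
    """Sum (timestamp_ms, value) pairs per bucket.

    Returns a dict mapping bucket start (epoch ms) to the summed value.
    """
    totals = defaultdict(int)
    for timestamp_ms, value in points:
        totals[bucket_start(timestamp_ms, interval_seconds)] += value
    return dict(totals)


# Hypothetical per-trace token counts (timestamp in epoch ms, total_tokens).
points = [
    (1_700_000_000_000, 120),
    (1_700_000_900_000, 80),   # 15 minutes later, same hourly bucket
    (1_700_003_700_000, 50),   # falls into the next hourly bucket
]

hourly = sum_by_bucket(points, 3600)  # 3600 matches "-t 3600" above
for start_ms, total in sorted(hourly.items()):
    print(start_ms, total)
```

With `-t 86400` the same logic would produce daily buckets instead.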

Average and P95 latency by trace name:

bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m latency -a AVG,P95 -d trace_name

Trace count by status (the basis for an error rate):

bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -m trace_count -a COUNT -d trace_status
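The command above returns counts per status rather than a rate, so the rate has to be derived from those counts. A minimal sketch follows, assuming the counts have already been collected into a dictionary keyed by status. The status labels `OK` and `ERROR`, the sample numbers, and the helper name are illustrative assumptions, not output copied from the script.

```python
def error_rate(counts_by_status: dict, error_status: str = "ERROR") -> float:
    """Return the share of traces whose status equals error_status.

    counts_by_status maps a status label to its trace count.
    An empty input yields 0.0 rather than dividing by zero.
    """
    total = sum(counts_by_status.values())
    if total == 0:
        return 0.0
    return counts_by_status.get(error_status, 0) / total


# Hypothetical counts, as they might be read off the grouped output.
counts = {"OK": 95, "ERROR": 5}
rate = error_rate(counts)
print(f"error rate: {rate:.1%}")
```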

Quality scores by evaluator (assessments):

bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_value -a AVG,P50 -d assessment_name

Output: Average and median scores for each evaluator (e.g., correctness, relevance)

Assessment count by name:

bash
python scripts/fetch_metrics.py -s http://localhost:5000 -x 1 -v ASSESSMENTS \
    -m assessment_count -a COUNT -d assessment_name

JSON output: Add -o json to any command.
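JSON output is mainly useful when another program consumes the result. The sketch below assembles the command line from the flags shown in the examples and parses the script's standard output with `json.loads`. The helper names are hypothetical, the structure of the returned JSON is not documented in this file, and the sketch assumes the script prints only JSON to stdout when `-o json` is set.

```python
import json
import subprocess
import sys


def build_command(server, experiment_ids, metric, aggregations,
                  dimension=None, view=None):
    """Assemble an argv list for scripts/fetch_metrics.py with JSON output."""
    cmd = [
        sys.executable, "scripts/fetch_metrics.py",
        "-s", server,
        "-x", ",".join(str(i) for i in experiment_ids),  # comma-separated IDs
        "-m", metric,
        "-a", ",".join(aggregations),
        "-o", "json",
    ]
    if view:
        cmd += ["-v", view]        # e.g. SPANS or ASSESSMENTS
    if dimension:
        cmd += ["-d", dimension]   # e.g. trace_name
    return cmd


def fetch_json(cmd):
    """Run the command and parse stdout as JSON (raises on a non-zero exit)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


cmd = build_command("http://localhost:5000", [1], "total_tokens", ["SUM", "AVG"])
print(" ".join(cmd))
# data = fetch_json(cmd)  # requires a reachable MLflow tracking server
```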

Arguments

| Arg | Required | Description |
|---|---|---|
| -s, --server | Yes | MLflow server URL |
| -x, --experiment-ids | Yes | Experiment IDs (comma-separated) |
| -m, --metric | Yes | trace_count, latency, input_tokens, output_tokens, total_tokens |
| -a, --aggregations | Yes | COUNT, SUM, AVG, MIN, MAX, P50, P95, P99 |
| -d, --dimensions | No | Group by: trace_name, trace_status |
| -t, --time-interval | No | Bucket size in seconds (3600=hourly, 86400=daily) |
| --start-time | No | -24h, -7d, now, ISO 8601, or epoch ms |
| --end-time | No | Same formats as start-time |
| -o, --output | No | table (default) or json |
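The accepted time formats all resolve to a single point in time. The following sketch shows one way such values could be normalized to epoch milliseconds. It is not the script's actual parser: the function name is hypothetical, only the `h` and `d` suffixes shown above are handled, and ISO 8601 strings without a timezone are assumed to be UTC.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"h": "hours", "d": "days"}


def to_epoch_ms(value: str, now: Optional[datetime] = None) -> int:
    """Normalize 'now', '-24h', '-7d', ISO 8601, or epoch ms to epoch ms."""
    now = now or datetime.now(timezone.utc)
    if value == "now":
        moment = now
    elif value.isdigit():
        return int(value)  # already epoch milliseconds
    else:
        match = re.fullmatch(r"-(\d+)([hd])", value)
        if match:
            amount, unit = int(match.group(1)), match.group(2)
            moment = now - timedelta(**{_UNITS[unit]: amount})
        else:
            # Note: a trailing "Z" is only accepted by fromisoformat
            # on Python 3.11 and later.
            moment = datetime.fromisoformat(value)
            if moment.tzinfo is None:
                moment = moment.replace(tzinfo=timezone.utc)  # assumption: UTC
    return round(moment.timestamp() * 1000)


# A fixed reference time keeps the example reproducible.
reference = datetime(2024, 1, 2, tzinfo=timezone.utc)
for text in ["now", "-24h", "-7d", "2024-01-01T00:00:00+00:00", "1704067200000"]:
    print(text, "->", to_epoch_ms(text, now=reference))
```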

For SPANS metrics (span_count, latency), add -v SPANS. For ASSESSMENTS metrics (assessment_value and assessment_count, grouped by assessment_name in the examples above), add -v ASSESSMENTS.

See references/api_reference.md for filter syntax and full API details.