vLLM-Ascend Profiler Diff Workflow

Overview

This skill provides a structured workflow to compare two Ascend Profiler runs, quantify performance deltas, and optionally trace findings back to source and PRs. Conclusions should be driven by aligned evidence across multiple dimensions.

Required Inputs

•Profile A directory (baseline): path to its ASCEND_PROFILER_OUTPUT directory
•Profile B directory (regression): path to its ASCEND_PROFILER_OUTPUT directory

Optional Inputs (Strongly Recommended)

•Benchmark JSON A/B (throughput, TTFT, TPOT alignment)
•Source repo path (vLLM-Ascend) for git blame attribution
•A/B version tags/commits

Workflow (Execute in Order)

1) Directory Sanity Check

Verify that each profile directory contains the following files, and note any that are missing:

•step_trace_time.csv
•op_statistic.csv
•api_statistic.csv
•kernel_details.csv (or equivalent)
•operator_details.csv (if PyTorch op stats exist)
•trace_view.json (may be large)

Output a presence table for A/B and whether deeper analysis is possible.
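The presence check can be sketched in a few lines of Python. This is a minimal sketch, assuming the files sit directly inside each profile directory under exactly the names listed above; the helper names presence_table and format_table are hypothetical.

```python
from pathlib import Path

# File names taken from the checklist above; "or equivalent" variants are not handled.
EXPECTED_FILES = [
    "step_trace_time.csv",
    "op_statistic.csv",
    "api_statistic.csv",
    "kernel_details.csv",
    "operator_details.csv",
    "trace_view.json",
]


def presence_table(dir_a, dir_b):
    """Return one (file name, present in A, present in B) row per expected file."""
    a, b = Path(dir_a), Path(dir_b)
    return [(name, (a / name).is_file(), (b / name).is_file())
            for name in EXPECTED_FILES]


def format_table(rows):
    """Render the rows as a fixed-width text table."""
    lines = [f"{'file':<24}{'A':<6}{'B':<6}"]
    for name, in_a, in_b in rows:
        lines.append(f"{name:<24}{'yes' if in_a else 'no':<6}{'yes' if in_b else 'no':<6}")
    return "\n".join(lines)
```

Calling print(format_table(presence_table(path_a, path_b))) gives the A/B table; a row with "no" on either side marks a dimension that cannot be compared.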

2) Overall Metrics Alignment

If benchmark JSON is provided, extract and compare:

•Throughput (tokens/s or req/s)
•TTFT (time to first token)
•TPOT (time per output token)

State whether the regression appears mainly in prefill or decode.

If benchmark JSON is not provided, use step-level timing trends from step_trace_time.csv for a directional judgment.
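When benchmark JSON is available, the comparison reduces to relative deltas per metric. The sketch below is illustrative only: the key names (output_throughput, mean_ttft_ms, mean_tpot_ms) are assumptions modeled on vLLM's serving benchmark output and should be adjusted to whatever the actual files contain, and the prefill/decode call is a rough heuristic (TTFT growth points at prefill, TPOT growth at decode), not a definitive rule.

```python
import json

# Assumed key names; replace with the keys present in your benchmark files.
METRIC_KEYS = ("output_throughput", "mean_ttft_ms", "mean_tpot_ms")


def load_metrics(path, keys=METRIC_KEYS):
    """Read the selected metrics from a benchmark JSON file, skipping absent keys."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {k: float(data[k]) for k in keys if k in data}


def relative_deltas(a, b):
    """Percent change from A to B for metrics present in both (nonzero baseline)."""
    return {k: (b[k] - a[k]) / a[k] * 100.0
            for k in a if k in b and a[k] != 0}


def dominant_phase(deltas, ttft_key="mean_ttft_ms", tpot_key="mean_tpot_ms"):
    """Heuristic: the latency metric with the larger percent increase wins."""
    ttft = deltas.get(ttft_key, 0.0)
    tpot = deltas.get(tpot_key, 0.0)
    if ttft <= 0 and tpot <= 0:
        return "neither"
    if ttft > tpot:
        return "prefill"
    if tpot > ttft:
        return "decode"
    return "both"
```

The result is only a starting hypothesis; the stage-level and kernel-level steps that follow should confirm or contradict it.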

3) Stage-Level Attribution

From step_trace_time.csv, compare stage breakdowns:

•Computing
•Free (Host wait)
•Communication/Memcpy/Stage fields if present

Provide A/B totals and an evidence-based conclusion.
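One way to produce the A/B totals is to sum every numeric column of step_trace_time.csv and rank the stages by absolute delta. This is a sketch under assumptions: the file is taken to be a plain CSV with a header row, the "Step" index column is skipped by name, and the stage column names are whatever the profiler emitted (none are hard-coded). The function names are hypothetical.

```python
import csv


def stage_totals(path, skip=("Step",)):
    """Sum each numeric column of a step_trace_time.csv-style file."""
    totals = {}
    # utf-8-sig tolerates an optional byte-order mark at the start of the file.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            for col, raw in row.items():
                if col is None or col.strip() in skip:
                    continue
                try:
                    value = float(raw)
                except (TypeError, ValueError):
                    continue  # non-numeric or missing cell
                key = col.strip()
                totals[key] = totals.get(key, 0.0) + value
    return totals


def compare_stages(a, b):
    """Return (stage, total A, total B, B - A) rows, largest absolute delta first."""
    rows = []
    for stage in sorted(set(a) | set(b)):
        va, vb = a.get(stage, 0.0), b.get(stage, 0.0)
        rows.append((stage, va, vb, vb - va))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows
```

If the two runs captured a different number of steps, totals are not directly comparable; dividing by the step count first is a reasonable adjustment.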

4) Kernel/Op Dimension

From op_statistic.csv and kernel_details.csv:

•List TopN by total time, call count, and average time
•Compare new hotspots or large deltas on shared items
•Relate changes to stage-level shifts where possible

Provide a data-backed assessment of whether kernel/op shifts contribute to the regression.
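The TopN comparison can be sketched as an aggregation per operator name followed by a ranked join. The column names used as defaults here ("OP Type", "Count", "Total Time(us)") are assumptions about the op_statistic.csv header and are passed as parameters so they can be changed; the helper names are hypothetical.

```python
import csv
from collections import defaultdict


def load_totals(path, key_col="OP Type", time_col="Total Time(us)", count_col="Count"):
    """Aggregate total time and call count per name; column names are assumptions."""
    totals = defaultdict(lambda: [0.0, 0])
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            entry = totals[row[key_col]]
            entry[0] += float(row[time_col])
            entry[1] += int(float(row[count_col]))
    return {name: (t, c) for name, (t, c) in totals.items()}


def top_deltas(a, b, n=10):
    """Rank names by absolute change in total time between A and B."""
    rows = []
    for name in set(a) | set(b):
        ta, ca = a.get(name, (0.0, 0))
        tb, cb = b.get(name, (0.0, 0))
        rows.append({
            "name": name,
            "total_a": ta, "total_b": tb, "delta": tb - ta,
            "count_a": ca, "count_b": cb,
            "avg_a": ta / ca if ca else 0.0,
            "avg_b": tb / cb if cb else 0.0,
        })
    # Ties are broken by name so the ordering is deterministic.
    rows.sort(key=lambda r: (-abs(r["delta"]), r["name"]))
    return rows[:n]
```

A row with total_a equal to zero is a hotspot that exists only in B. Comparing avg_a with avg_b against count_a with count_b separates "each call got slower" from "the op is called more often".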

5) API Dimension

From api_statistic.csv, compare A/B across dimensions:

•TopN APIs (total time, count, average)
•Newly appearing or significantly changed APIs
•Alignment with stage-level changes (Computing/Free/Communication, etc.)

Only conclude when the data aligns; otherwise proceed to other dimensions.
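Flagging "newly appearing or significantly changed" APIs amounts to a set difference plus a percent-change threshold. The sketch below assumes the per-API totals have already been loaded into plain dicts mapping API name to total time; the 20% default threshold and the function name classify_changes are arbitrary illustrative choices, not values from the profiler.

```python
def classify_changes(a, b, threshold_pct=20.0):
    """Split names into new, removed, and significantly changed between A and B.

    a, b: dicts mapping name -> total time (same unit in both).
    """
    new = sorted(set(b) - set(a))
    removed = sorted(set(a) - set(b))
    changed = {}
    for name in set(a) & set(b):
        if a[name] == 0:
            continue  # percent change is undefined for a zero baseline
        pct = (b[name] - a[name]) / a[name] * 100.0
        if abs(pct) >= threshold_pct:
            changed[name] = pct
    return {"new": new, "removed": removed, "changed": changed}
```

An API that appears only in B and a matching rise in the Free or Communication stage would be aligned evidence; a flagged API with no stage-level counterpart is better treated as a lead for the next dimension.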

6) PyTorch Operator Dimension (Optional)

If operator_details.csv exists, compare A/B across dimensions:

•TopN operators (total time, count, average)
•Newly appearing or significantly changed operators
•Consistency with stage-level or API-level shifts

Treat findings as signals, not definitive causes.

7) Large Trace Scan (trace_view.json)

When trace_view.json is too large to open in a viewer or load into memory, stream or scan it to estimate:

•Event counts
•Total duration (if available)
•Time span (max(ts) - min(ts))
•Distribution histogram (e.g., 50 bins)

Use the temporal distribution to infer whether changes concentrate in prefill/initialization or are evenly spread during decode.
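A streaming scan can be done with the standard library alone. The following is a sketch under stated assumptions: trace_view.json is taken to be a top-level JSON array of event objects in the Chrome trace event style (with "ts", optional "dur", and "ph" fields), metadata events (ph == "M") are ignored, and the file is read twice so that memory use stays constant. The function names are hypothetical.

```python
import json


def iter_events(path, chunk_size=1 << 20):
    """Yield objects from a top-level JSON array without loading the whole file."""
    decoder = json.JSONDecoder()
    buf, started = "", False
    with open(path, encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_size)
            eof = not chunk
            buf += chunk
            if not started:
                idx = buf.find("[")
                if idx == -1:
                    if eof:
                        return
                    continue
                buf, started = buf[idx + 1:], True
            pos = 0
            while True:
                # Skip separators between array elements.
                while pos < len(buf) and buf[pos] in " \t\r\n,":
                    pos += 1
                if pos >= len(buf):
                    break
                if buf[pos] == "]":
                    return  # end of the array
                try:
                    obj, pos = decoder.raw_decode(buf, pos)
                except json.JSONDecodeError:
                    break  # element is incomplete; read more data
                yield obj
            buf = buf[pos:]
            if eof:
                return


def _timed(path, chunk_size):
    """Yield (ts, dur) for non-metadata events that carry a timestamp."""
    for ev in iter_events(path, chunk_size):
        if isinstance(ev, dict) and ev.get("ph") != "M" and "ts" in ev:
            yield float(ev["ts"]), float(ev.get("dur") or 0)


def scan_trace(path, bins=50, chunk_size=1 << 20):
    """Return event count, summed duration, time span, and a ts histogram."""
    count, total_dur = 0, 0.0
    ts_min, ts_max = float("inf"), float("-inf")
    for ts, dur in _timed(path, chunk_size):  # pass 1: totals and range
        count += 1
        total_dur += dur
        ts_min, ts_max = min(ts_min, ts), max(ts_max, ts)
    hist = [0] * bins
    if count == 0:
        return {"count": 0, "total_dur": 0.0, "span": 0.0, "histogram": hist}
    span = ts_max - ts_min
    for ts, _ in _timed(path, chunk_size):  # pass 2: fill the histogram
        idx = int((ts - ts_min) / span * bins) if span > 0 else 0
        hist[min(idx, bins - 1)] += 1
    return {"count": count, "total_dur": total_dur, "span": span, "histogram": hist}
```

Comparing the two histograms bin by bin shows whether the extra events in B cluster near the start of the capture or are spread across it. Note that summed "dur" double-counts nested or overlapping events, so it is an upper bound rather than wall time.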

8) Source Attribution (If Repo Provided)

Search the repo for plausible trigger points (explicit/implicit/observer logic), then identify top 3 suspects (file, function, rationale).

9) PR/Commit Attribution

Use git blame for suspect lines and git show to identify PR numbers when available.

Output an evidence chain: trigger point → commit hash → PR link → change intent.
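The blame-to-PR step can be scripted. This sketch assumes git is on PATH, that suspect lines are known by file and line range, and that the PR number appears in the commit subject as "(#NNNN)", which is a common squash-merge convention but not guaranteed for every commit. The helper names are hypothetical; the git options used (blame --porcelain, -L, -C) are standard.

```python
import re
import subprocess

# Porcelain header: <40-hex sha> <original line> <final line> [<group size>]
HEADER = re.compile(r"^([0-9a-f]{40}) \d+ (\d+)")
PR_REF = re.compile(r"\(#(\d+)\)")


def parse_blame_porcelain(text):
    """Return ({final line number: sha}, {sha: commit subject}) from porcelain output."""
    line_to_commit, summaries = {}, {}
    current = None
    for line in text.split("\n"):
        m = HEADER.match(line)
        if m:
            current = m.group(1)
            line_to_commit[int(m.group(2))] = current
        elif current and line.startswith("summary "):
            # Source lines are tab-prefixed, so they never match this branch.
            summaries[current] = line[len("summary "):]
    return line_to_commit, summaries


def extract_pr_number(summary):
    """Return the last '(#NNNN)' reference in a commit subject, or None."""
    refs = PR_REF.findall(summary)
    return int(refs[-1]) if refs else None


def blame_range(repo, path, start, end, rev="HEAD"):
    """Run git blame on a line range and map each line to (sha, PR number or None)."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--porcelain", "-L", f"{start},{end}", rev, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    line_to_commit, summaries = parse_blame_porcelain(out)
    return {line: (sha, extract_pr_number(summaries.get(sha, "")))
            for line, sha in line_to_commit.items()}
```

When the subject carries no PR reference, the commit hash alone still anchors the evidence chain, and git show on that hash gives the change intent.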

10) Final Deliverables (Always Provide)

•Summary (3–6 lines):
  - prefill vs decode impact
  - dominant dimension(s) based on evidence
  - key metric magnitudes and direction
•Evidence tables:
  - step_trace_time breakdown
  - api_statistic dimension comparison
  - op/kernel TopN deltas
•Trace distribution stats (if trace_view.json exists)
•Source + PR attribution (if repo provided)
•Actionable mitigations:
  - configuration/env workarounds
  - isolate or reduce high-cost paths
  - possible revert/cherry-pick options

User Prompt Template

Ask the user to provide:

Please execute the “vLLM-Ascend Profiler Diff Workflow”.

Profile A (baseline): <ASCEND_PROFILER_OUTPUT path>
Profile B (regression): <ASCEND_PROFILER_OUTPUT path>

(Optional) Benchmark JSON A: <path or empty>
(Optional) Benchmark JSON B: <path or empty>

(Optional) Source repo path: <path to vllm-ascend checkout or empty>
(Optional) Version/Tag: A=<tag/commit>, B=<tag/commit>

Focus: prefill / decode / both
Output: final conclusion + evidence tables + PR link if caused by code change