AgentSkillsCN

perf-workflow

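Profile perf.data files with perf report and perf annotate, mapping hotspots to source and assembly in C/C++/PostgreSQL codebases, especially for O3 builds where -s can mislead. Use this skill when the user asks for performance hotspot analysis, perf report/annotate commands, or mapping cycles to source and assembly in C/C++/PostgreSQL code.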

SKILL.md
--- frontmatter
name: perf-workflow
description: Perf profiling workflow for analyzing `perf.data` files with `perf report` and `perf annotate`, mapping hotspots to source/assembly (especially with O3 where `-s` can mislead). Use when users ask for perf hotspot analysis, perf report/annotate commands, or mapping cycles to source/asm in C/C++/PostgreSQL codebases.

Perf Workflow

Overview

Analyze perf profiles end-to-end: summarize hotspots with perf report, drill down with perf annotate -l, then map to source and explain the exact hot statements and accessed data structures.

Workflow

Step 0: Locate input and binaries

  • By default, use test/profile_output/index_latest_profile_perf and test/profile_output/query_latest_profile_perf.
  • If the defaults are missing, fall back to any perf.data (or symlinks like *_latest_profile_perf) and note the binary/shared object shown in perf report.
  • If symbols look stripped or source paths are wrong, plan to use --symfs, --buildid-dir, or perf buildid-cache --add.
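The lookup order above can be sketched as a small helper. This is a minimal sketch, not part of perf: `find_profile` is a hypothetical function name, and the search root is an assumption you would adjust to your repo layout.

```bash
#!/usr/bin/env bash
# find_profile DEFAULT_PATH [SEARCH_ROOT]
# Hypothetical helper: print the default profile if it exists, otherwise
# the first perf.data or *_latest_profile_perf file found under SEARCH_ROOT.
# Prints nothing when no candidate exists.
find_profile() {
    local default_path="$1" search_root="${2:-.}"
    if [ -e "$default_path" ]; then
        printf '%s\n' "$default_path"
        return 0
    fi
    # -L follows symlinks, so symlinked *_latest_profile_perf files count as files.
    find -L "$search_root" \( -name 'perf.data' -o -name '*_latest_profile_perf' \) \
        -type f 2>/dev/null | sort | head -n 1
}
```

For example, `find_profile test/profile_output/query_latest_profile_perf test/profile_output` returns the default when present and otherwise the first fallback candidate in sorted order.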

Step 1: Summarize hotspots

Run a non-interactive report to get top symbols and call chains:

bash
perf report -i path/to/perf.data --stdio --no-children

Capture the top hot symbols (percent, shared object, symbol) and the relevant call path.
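One way to capture those fields is to filter the text report with awk. The sketch below assumes the default column layout of `perf report --stdio` (overhead, command, shared object, then a `[.]` or `[k]` marker followed by the symbol); the layout can differ across perf versions and sort keys, so treat the parsing as an assumption to verify against your own output. `top_symbols` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# top_symbols [MIN_PERCENT]
# Hypothetical helper: read `perf report --stdio` text on stdin and print
# "percent<TAB>shared object<TAB>symbol" for rows at or above MIN_PERCENT.
# Assumes rows look like: "  42.10%  cmd  dso  [.] symbol".
top_symbols() {
    awk -v limit="${1:-1.0}" '
        $1 ~ /^[0-9.]+%$/ {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^\[.\]$/) {          # symbol-type marker such as [.] or [k]
                    pct = $1; sub(/%$/, "", pct)
                    if (pct + 0 < limit + 0) next
                    sym = $(i + 1)              # symbol may contain spaces (C++ signatures)
                    for (j = i + 2; j <= NF; j++) sym = sym " " $j
                    printf "%s\t%s\t%s\n", pct, $(i - 1), sym
                    next
                }
            }
        }'
}
```

Typical use would be `perf report -i path/to/perf.data --stdio --no-children | top_symbols 1`. Call-chain lines are skipped because their first field is not a bare percentage.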

If the user explicitly asks to compare latest vs old, run the same report on both and compare:

  • Percent deltas for top symbols
  • New symbols entering/exiting the top set
  • Any shift between shared objects (e.g., postgres vs vector.so)
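The comparison can be sketched with awk over two symbol lists. This assumes each list is a tab-separated file of `percent`, `shared object`, `symbol` (a hypothetical intermediate format, not something perf emits directly), and `compare_symbols` is an illustrative name.

```bash
#!/usr/bin/env bash
# compare_symbols OLD.tsv NEW.tsv
# Hypothetical helper: each input line is "percent<TAB>dso<TAB>symbol".
# Prints a signed percent delta for symbols present in both files, "new"
# for symbols only in NEW, and "gone" for symbols only in OLD.
# Keys include the shared object, so a symbol that moves between objects
# shows up as one "gone" row plus one "new" row.
# Note: assumes OLD.tsv is non-empty (the NR == FNR idiom needs that).
compare_symbols() {
    awk -F '\t' '
        NR == FNR { old[$2 "\t" $3] = $1; next }
        {
            key = $2 "\t" $3
            seen[key] = 1
            if (key in old) printf "%+.2f\t%s\n", $1 - old[key], key
            else            printf "new\t%s\n", key
        }
        END {
            for (key in old) if (!(key in seen)) printf "gone\t%s\n", key
        }' "$1" "$2"
}
```

Rows for the new file come out in input order; the order of "gone" rows is unspecified, so pipe through `sort` if a stable listing matters.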

Step 2: Annotate hotspots with source mapping

O3 can break -s source correlation; use -l to get line-level mapping, then read the assembly around the hot lines.

bash
perf annotate -i path/to/perf.data --stdio -l --symbol <symbol> --percent-limit 1

Prefer --symbol over full-file annotate to keep output focused.
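When several symbols need annotating, the same command can be looped. The sketch below is an assumption-laden convenience wrapper: `annotate_symbols` and the `PERF` override variable are hypothetical names introduced here, and the perf flags are exactly those shown above.

```bash
#!/usr/bin/env bash
# annotate_symbols PERF_DATA
# Hypothetical helper: read symbol names on stdin (one per line) and run
# the per-symbol annotate command for each, with a header between them.
# PERF can be overridden (e.g. PERF=echo) to preview commands without running perf.
annotate_symbols() {
    local data="$1" sym
    while IFS= read -r sym; do
        [ -n "$sym" ] || continue        # skip blank lines
        printf '=== %s ===\n' "$sym"
        # </dev/null keeps the command from consuming the symbol list on stdin.
        "${PERF:-perf}" annotate -i "$data" --stdio -l --symbol "$sym" \
            --percent-limit 1 </dev/null
    done
}
```

For example, `printf '%s\n' <symbol_a> <symbol_b> | annotate_symbols path/to/perf.data` produces one annotated section per symbol.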

Step 3: Annotate hotspots (comparison flow)

If the user explicitly asks to compare latest vs old, run annotate on the same symbols and compare:

  • Which source lines or basic blocks gained/lost percent
  • Whether the hot block moved to a different inlined path
  • Any difference in the memory access pattern (loads/stores) visible in assembly

Step 4: Map to source and explain data access

  • Use rg to find the symbol definition.
  • Use nl -ba (or equivalent) to show the hot source lines by number.
  • Identify accessed data structures/arrays by matching the assembly loads/stores to source statements.

bash
rg -n "<symbol>" -S .
nl -ba path/to/file.c | sed -n 'START,ENDp'
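To avoid computing START and END by hand, the line window can be derived from the hot line reported by perf annotate -l. This is a minimal sketch; `show_lines` is a hypothetical helper name and the default context of 3 lines is an arbitrary choice.

```bash
#!/usr/bin/env bash
# show_lines FILE LINE [CONTEXT]
# Hypothetical helper: print numbered source lines from LINE-CONTEXT to
# LINE+CONTEXT, clamping the start at line 1.
show_lines() {
    local file="$1" line="$2" ctx="${3:-3}"
    local start=$(( line > ctx ? line - ctx : 1 ))
    local end=$(( line + ctx ))
    # nl -ba numbers every line, including blank ones, so numbers match the file.
    nl -ba "$file" | sed -n "${start},${end}p"
}
```

For example, `show_lines path/to/file.c 120 5` prints lines 115 through 125 with their numbers.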

Step 5: Report results

Summarize each hotspot with:

  • Symbol + percent from perf report
  • Source file:line from perf annotate -l
  • The key statement(s) causing cost
  • The data accessed (arrays, structs, pointer-chasing)
  • Any lock/spin or memory access patterns seen in assembly

Practical Notes

  • Use -l for O3 builds; -s often mismatches inlined/optimized code.
  • If perf annotate shows only assembly, verify debug symbols and build-id cache.
  • When hot symbols live in shared objects (e.g., vector.so), point to the matching source file in the repo, not just the .so.