
perf-mem-asm-top

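Ranks memory-access instructions in perf profiles using perf report/annotate, combining weighted no-children percentages with pagefault attribution. Use this skill when asked to list the hottest load/store instructions, map them to data structures, or weight instruction-level percentages by each symbol's no-children overhead in perf.data.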

SKILL.md
---
name: perf-mem-asm-top
description: Rank top memory-access instructions from perf profiles with weighted no-children percentages and pagefault attribution using perf report/annotate. Use when asked to list top load/store instructions, map them to data structures, or weight instruction-local percentages by symbol no-children overhead in perf.data.
---

Perf Mem Asm Top

Overview

Rank the top memory-access instructions by weighted cycles and pagefaults using perf report and perf annotate, then map each instruction to the source-level data structure it touches.

Workflow

1) Locate perf.data

Prefer these defaults when present:

  • test/profile_output/query_latest_profile_perf
  • test/profile_output/index_latest_profile_perf

If neither exists, pick the relevant *.perf.data file and note the binary/DSO that perf report shows for it.
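The lookup order above can be sketched in Python as follows. This is a minimal illustration, not part of the bundled script: the function name `find_profiles` is hypothetical, and the fallback simply lists every `*.perf.data` file so that the relevant one can still be chosen by hand.

```python
from pathlib import Path
from typing import List

# Default locations named in this skill, in priority order.
DEFAULT_PROFILES = [
    "test/profile_output/query_latest_profile_perf",
    "test/profile_output/index_latest_profile_perf",
]


def find_profiles(root: str = ".") -> List[Path]:
    """Return existing default profiles, or all *.perf.data files as a fallback.

    Hypothetical helper: choosing the "relevant" fallback file is left to the caller.
    """
    base = Path(root)
    defaults = [base / rel for rel in DEFAULT_PROFILES if (base / rel).exists()]
    if defaults:
        return defaults
    # No default present: list candidates in a stable (sorted) order.
    return sorted(base.rglob("*.perf.data"))
```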

2) Get no-children symbol overhead

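Use --no-children so each symbol's percentage is its own (self) overhead rather than an accumulated total that also counts its callees, and --group so events recorded as a group are shown side by side. This avoids duplicate or mismatched symbol percentages: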

bash
perf report -i path/to/perf.data --stdio --group --no-children
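To feed these percentages into later steps, the text output can be parsed into a symbol-to-overhead map. The sketch below assumes the default column layout (percentage columns, command, shared object, then a `[.]` or `[k]` marker before the symbol); the layout varies with perf version and sort keys, so treat the regular expression as a starting point. `parse_report` is a hypothetical helper, and the sample rows (including the `app` and `libapp.so` names and all numbers) are made up purely to exercise the parser.

```python
import re
from typing import Dict, List

# One or more leading percentage columns (one per event under --group),
# then any columns, then the privilege marker such as "[.]" or "[k]",
# then the symbol name (which may contain spaces for C++ symbols).
ROW = re.compile(r"^\s*((?:\d+\.\d+%\s+)+).*?\[[.kugH]\]\s+(\S.*?)\s*$")


def parse_report(text: str) -> Dict[str, List[float]]:
    """Map symbol -> list of no-children percentages, one per event column."""
    overhead: Dict[str, List[float]] = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # header and comment lines
        m = ROW.match(line)
        if m is None:
            continue
        pcts = [float(p.rstrip("%")) for p in m.group(1).split()]
        prev = overhead.get(m.group(2))
        # A symbol can appear in several rows (different command/DSO): sum them.
        overhead[m.group(2)] = pcts if prev is None else [a + b for a, b in zip(prev, pcts)]
    return overhead


# Illustrative input only; not real measurements.
SAMPLE = """\
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#
    41.20%  app      libapp.so      [.] hnsw_search_with_context
    17.50%  app      libapp.so      [.] HalfvecL2SquaredDistanceF32HalfBatch4Avx512f
"""
```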

3) Rank top memory instructions (weighted)

Use the bundled script to:

  • run perf annotate -l per symbol
  • extract load/store instructions
  • weight local instruction % by symbol no-children %
  • report pagefault-triggering instructions

bash
python3 .codex/skills/perf-mem-asm-top/scripts/perf_mem_top.py -i path/to/perf.data --auto-top 5 --topn 20

For explicit symbols:

bash
python3 .codex/skills/perf-mem-asm-top/scripts/perf_mem_top.py -i path/to/perf.data --symbols hnsw_search_with_context HalfvecL2SquaredDistanceF32HalfBatch4Avx512f
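The extraction and weighting steps can be illustrated with a small sketch. It is not the code in perf_mem_top.py: the helper names are hypothetical, the weighting formula (local % times symbol no-children % divided by 100) is an assumption about what "weight by" means here, and the memory-operand test is a heuristic for x86-64 AT&T syntax that misses implicit accesses (push, pop, ret, string instructions) and parenthesis-free operands such as `%fs:0x28`.

```python
import re
from typing import Dict, List, Tuple

# AT&T-syntax memory operands look like disp(%base,%index,scale).
MEM_OPERAND = re.compile(r"\([^()]*\)")
# These mnemonics take a memory-style operand without accessing memory.
NO_ACCESS_PREFIXES = ("lea", "nop")


def touches_memory(insn: str) -> bool:
    """Heuristic: does this disassembly line have an explicit memory operand?"""
    parts = insn.split(None, 1)
    if len(parts) < 2:
        return False
    mnemonic, operands = parts
    if mnemonic.startswith(NO_ACCESS_PREFIXES):
        return False
    # Drop "<symbol+offset>" annotations, which may contain parentheses.
    operands = re.sub(r"<.*>", "", operands)
    return MEM_OPERAND.search(operands) is not None


def rank_weighted(
    hits: List[Tuple[str, str, float]],   # (symbol, instruction, local %)
    symbol_pct: Dict[str, float],         # symbol -> no-children %
    topn: int = 20,
) -> List[Tuple[float, str, str, float]]:
    """Return (weighted %, symbol, instruction, local %) rows, hottest first."""
    rows = [
        (local * symbol_pct.get(sym, 0.0) / 100.0, sym, insn, local)
        for sym, insn, local in hits
        if touches_memory(insn)
    ]
    rows.sort(key=lambda r: r[0], reverse=True)
    return rows[:topn]
```

Under this assumed formula, an instruction with 50% of the samples inside a symbol that holds 40% no-children overhead is weighted at 20% of the whole profile, which makes instructions from different symbols directly comparable.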

4) Map to source and data structures

For each hot instruction line emitted by the script, open the referenced source and identify the data structure or array:

bash
rg -n "<symbol>" -S .
nl -ba path/to/file.c | sed -n 'START,ENDp'

Use the inline source comments from perf annotate (e.g., // hnswlib.c:1901) to jump to the exact line.
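As a rough Python equivalent of the `nl | sed` step, the sketch below pulls the file and line out of a trailing `// file:line` comment and renders a numbered window around it. Both helper names are hypothetical, and the pattern assumes the comment is the last thing on the line, as in the example above.

```python
import re
from typing import List, Optional, Tuple

# Trailing "// file:line" comment, e.g. "// hnswlib.c:1901".
SRC_REF = re.compile(r"//\s*(\S+?):(\d+)\s*$")


def source_ref(annotate_line: str) -> Optional[Tuple[str, int]]:
    """Return (file, line) from a trailing source comment, or None."""
    m = SRC_REF.search(annotate_line)
    return (m.group(1), int(m.group(2))) if m else None


def window(lines: List[str], lineno: int, context: int = 3) -> List[str]:
    """Numbered lines around lineno (1-based), formatted like `nl -ba`."""
    start = max(1, lineno - context)
    end = min(len(lines), lineno + context)
    return [f"{n:6d}\t{lines[n - 1]}" for n in range(start, end + 1)]
```

Typical use would be to read the referenced file with `Path(name).read_text().splitlines()` and pass the result to `window` together with the parsed line number.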

5) Report results

For each instruction, summarize:

  • weighted cycles % and local %
  • symbol name
  • source file:line
  • data structure accessed (array/struct/pointer chase)
  • pagefault notes (minor/major) when present
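One possible way to keep these fields together and print them consistently is a small record type. This is only a sketch: the `Finding` class, its field names, and the output layout are assumptions for illustration, not a format the bundled script emits.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    weighted_pct: float   # share of the whole profile
    local_pct: float      # share within the symbol
    symbol: str
    source: str           # "file:line"
    structure: str        # e.g. array, struct field, pointer chase
    pagefaults: str = ""  # e.g. "minor" or "major"; empty when none


def format_finding(f: Finding) -> str:
    """Render one report line; the pagefault note is appended only when set."""
    line = (
        f"{f.weighted_pct:6.2f}% weighted ({f.local_pct:.2f}% local)  "
        f"{f.symbol}  {f.source}  {f.structure}"
    )
    if f.pagefaults:
        line += f"  [pagefaults: {f.pagefaults}]"
    return line
```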

Resources

scripts/

  • scripts/perf_mem_top.py: Extract and rank memory-access instructions with weighted cycles and pagefault attribution.