Perf Mem Asm Top
Overview
Rank the top memory-access instructions by weighted cycles and pagefaults using perf report and perf annotate, then map each instruction to the source-level data structure it touches.
Workflow
1) Locate perf.data
Prefer these defaults when present:
- •test/profile_output/query_latest_profile_perf
- •test/profile_output/index_latest_profile_perf
If neither exists, pick the relevant *.perf.data file and note the binary/DSO shown by perf report.
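The lookup order above can be sketched in Python. This is a minimal illustration, not part of the bundled script: the function name locate_perf_data and the tie-break rule (newest *.perf.data wins) are assumptions.

```python
from pathlib import Path
from typing import Optional

# Default profile locations named in this document, in order of preference.
DEFAULT_PROFILES = (
    "test/profile_output/query_latest_profile_perf",
    "test/profile_output/index_latest_profile_perf",
)


def locate_perf_data(root: str = ".") -> Optional[Path]:
    """Return the first default profile that exists, else a *.perf.data fallback."""
    base = Path(root)
    for rel in DEFAULT_PROFILES:
        candidate = base / rel
        if candidate.exists():
            return candidate
    # Assumption: when several fallbacks exist, the most recently modified wins.
    fallbacks = sorted(base.rglob("*.perf.data"), key=lambda p: p.stat().st_mtime)
    return fallbacks[-1] if fallbacks else None
```

The returned path is what would be passed to perf report -i; a None result means no profile was found under the given root.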
2) Get no-children symbol overhead
Use grouped, no-children output so that each symbol shows only its own (self) overhead, which avoids duplicate or mismatched symbol percentages:
perf report -i path/to/perf.data --stdio --group --no-children
3) Rank top memory instructions (weighted)
Use the bundled script to:
- •run perf annotate -l per symbol
- •extract load/store instructions
- •weight local instruction % by symbol no-children %
- •report pagefault-triggering instructions
python3 .codex/skills/perf-mem-asm-top/scripts/perf_mem_top.py -i path/to/perf.data --auto-top 5 --topn 20
For explicit symbols:
python3 .codex/skills/perf-mem-asm-top/scripts/perf_mem_top.py -i path/to/perf.data --symbols hnsw_search_with_context HalfvecL2SquaredDistanceF32HalfBatch4Avx512f
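The weighting step listed above amounts to scaling each instruction's local percentage by its symbol's no-children percentage. The sketch below illustrates that arithmetic only; it is not taken from perf_mem_top.py, and the names HotInstruction and rank_weighted are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class HotInstruction(NamedTuple):
    symbol: str
    instruction: str
    local_pct: float     # share of the symbol's samples on this instruction
    weighted_pct: float  # estimated share of all samples on this instruction


def rank_weighted(
    symbol_pct: Dict[str, float],
    local_rows: Iterable[Tuple[str, str, float]],
    topn: int = 20,
) -> List[HotInstruction]:
    """symbol_pct maps symbol -> no-children %; local_rows holds
    (symbol, instruction, local %) tuples taken from perf annotate."""
    ranked = [
        HotInstruction(sym, insn, local, local * symbol_pct[sym] / 100.0)
        for sym, insn, local in local_rows
        if sym in symbol_pct  # skip instructions from symbols outside the report
    ]
    ranked.sort(key=lambda row: row.weighted_pct, reverse=True)
    return ranked[:topn]
```

Under this scheme, an instruction holding 25% of the samples inside a symbol that accounts for 40% of the profile weighs 10% overall, which makes instructions comparable across symbols.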
4) Map to source and data structures
For each hot instruction line emitted by the script, open the referenced source and identify the data structure or array:
rg -n "<symbol>" -S .
nl -ba path/to/file.c | sed -n 'START,ENDp'
Use the inline source comments from perf annotate (e.g., // hnswlib.c:1901) to jump to the exact line.
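Extracting the file and line from such a comment, and turning it into the START,END range for sed, can be sketched as follows. The regular expression assumes the comment has the shape shown in the example (// file:line); the helper names source_ref and line_window are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a trailing comment such as "// hnswlib.c:1901".
SOURCE_REF = re.compile(r"//\s*(?P<path>[^\s:]+):(?P<line>\d+)")


def source_ref(annotated_line: str) -> Optional[Tuple[str, int]]:
    """Return (path, line number) from an annotated line, or None if absent."""
    match = SOURCE_REF.search(annotated_line)
    if match is None:
        return None
    return match.group("path"), int(match.group("line"))


def line_window(line_no: int, context: int = 5) -> Tuple[int, int]:
    """Return START, END values to substitute into sed -n 'START,ENDp'."""
    return max(1, line_no - context), line_no + context
```

With the default context of 5 lines, line 1901 yields the range 1896,1906, enough to see the loop or struct access around the hot instruction.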
5) Report results
For each instruction, summarize:
- •weighted cycles % and local %
- •symbol name
- •source file:line
- •data structure accessed (array/struct/pointer chase)
- •pagefault notes (minor/major) when present
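One way to keep these fields together is a small record type with a one-line formatter. This is only a sketch of a possible report shape; the class InstructionReport and its field names are assumptions, not the output format of the bundled script.

```python
from dataclasses import dataclass


@dataclass
class InstructionReport:
    weighted_pct: float
    local_pct: float
    symbol: str
    source: str               # "file:line"
    data_structure: str       # e.g. array, struct field, pointer chase
    pagefault_note: str = ""  # "minor", "major", or empty when none observed

    def summary(self) -> str:
        """Render the record as a single report line."""
        text = (
            f"{self.weighted_pct:.2f}% weighted ({self.local_pct:.2f}% local) "
            f"{self.symbol} at {self.source}: {self.data_structure}"
        )
        if self.pagefault_note:
            text += f" [pagefaults: {self.pagefault_note}]"
        return text
```

Leaving pagefault_note empty omits the bracketed suffix, so lines without pagefault data stay short.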
Resources
scripts/
- •scripts/perf_mem_top.py: Extract and rank memory-access instructions with weighted cycles and pagefault attribution.