Profiling with CPU Sampler
Time-based sampling profiler that identifies WHERE your program spends execution time. Shows wall-clock time distribution across functions with compilation tier breakdown.
When to Use This Skill
Use FIRST when investigating any performance issue:
- •Identify hot functions consuming most time
- •Verify code is compiling (tier distribution)
- •Measure time spent in interpreter vs compiled code
- •Generate flame graphs for visualization
Quick Start
bash
# Basic profiling <launcher> --cpusampler <program> # With tier breakdown (RECOMMENDED) <launcher> --cpusampler --cpusampler.ShowTiers=true <program> # Skip warmup (recommended for steady-state analysis) <launcher> --cpusampler --cpusampler.Delay=5000 --cpusampler.ShowTiers=true <program>
⚠️ REQUIRED: Fermi Verification (Every Tool Invocation)
Before running:
- • Pre-calculate: Expected hot functions (1-5 names), T0/T1/T2 split
- • Smoke test:
<launcher> --cpusampler -c 'print 1;'→ Verify output format
After running:
- • Validate: Actual vs estimate within 1 order of magnitude? YES / NO
- • If NO: STOP - Debug tool before proceeding (run
--help:cpusampler, test on known-good input) - • Save output:
tool-outputs/cpu-sampler-[benchmark].txt
Gate: All boxes checked? → Proceed to analysis
Key Options
| Option | Description | Recommended Value |
|---|---|---|
--cpusampler.ShowTiers=true | Show T0/T1/T2 breakdown | Always use |
--cpusampler.Delay=<ms> | Skip warmup | 2000-10000 |
--cpusampler.Period=<ms> | Sample interval | 10 (default) |
--cpusampler.Output=calltree | Output format | Default: histogram |
--cpusampler.OutputFile=<file> | Save to file | For later analysis |
--cpusampler.SampleInternal=true | Include internal frames | For Truffle debugging |
Understanding Output
Tier Columns
| Tier | Meaning | Target |
|---|---|---|
| T0 | Interpreter | <10% for hot functions |
| T1 | First-tier compiled | Transitional |
| T2 | Fully optimized | >80% for hot functions |
Sample Output
code
Sampling Histogram. Recorded 412 samples with period 10ms. Self Time: Time spent in function (excluding callees) Total Time: Time in function including callees Name || Total Time || Self Time || T0 | T1 | T2 queens || 1850ms 88.0% || 1850ms 88.0% || 5.2% | 3.1% | 91.7% hasConflict || 250ms 11.9% || 250ms 11.9% || 8.8% | 4.2% | 87.0%
Interpretation Guidelines
Good Performance
- •✅ Hot functions show >80% T2 time
- •✅ <10% T0 (interpreter) time
- •✅ Time concentrated in expected hot functions
Performance Problems
- •⚠️ >30% T0 time → Compilation issues
- •⚠️ High T1 but low T2 → Optimization barriers
- •⚠️ Time in unexpected functions → Algorithm issues
Integration with Other Skills
Next steps based on findings:
| Finding | Next Skill |
|---|---|
| High T0 time | tracing-compilation-events |
| Optimization barriers | detecting-performance-warnings |
| Memory issues suspected | profiling-memory-allocations |
| Inlining problems | tracing-inlining-decisions |
Related Skills
- •
tracing-execution-counts- Execution frequency (not time) - •
detecting-performance-warnings- Find optimization barriers - •
tracing-compilation-events- Compilation behavior - •
establishing-benchmark-baseline- Set up benchmarks first
Reference
bash
# Full help <launcher> --help:cpusampler
See PATTERNS.md for common problem patterns and solutions.