Low-Latency Audit
A deep compliance audit in which the AI reads perf report text and source code directly.
Shell scripts collect the data (perf record/report); the AI does the analysis. This audit:
- Reads perf report text to understand what code is hot (sample percentages, call chains)
- Reads source code to understand function boundaries, templates, lambdas
- Identifies banned patterns in hot code with function-level precision
- Produces two outputs: an identification report (what is hot, what patterns exist) and fix suggestions (how to address findings, for human review)
When to Use
- After implementing hot/warm path code, as the final compliance gate
- Before marking implementation tasks that touch HOT/WARM code as complete
- When spec latency claims seem ungrounded: cross-check them against data
- Periodic review, to catch drift as code evolves
When NOT to Use
- Spec quality review: use `/spec-review` instead
- Running benchmarks: run them first, then use this audit to interpret the results
Inputs
- No argument: audit all hot/warm path code in the project
- File path: audit a specific file
- `--spec`: also audit spec files for latency claim compliance
Workflow
Phase 1: Profiling Data Check
- Check that `profile-results/perf-reports/` exists and has `.txt` files
- If missing: warn "No profiling data: run `tools/profile-hot-path.sh`". Hot paths cannot be identified without data. Do NOT fall back to directory-based guessing.
- If present: read the perf report text files
- Check freshness: compare the report timestamp against the most recent `.h`/`.cpp` modification
- If stale: warn "Profiling data may be outdated, consider re-running"
- If `profile-results/flamegraph.svg` exists, reference it for visual analysis
Phase 2: Hot-Path Identification (AI-Driven)
Read perf report text files in profile-results/perf-reports/. Each file contains:
- Default view: symbol-level profiling (comm, dso, symbol with sample percentages)
- Source file attribution: which source files contribute to hot code
For each perf report:
- Identify project source files in the "Source file attribution" section
- Read those source files using the Read tool
- Cross-reference symbols from the default view with source code to identify specific hot functions
- For each hot function, check for banned patterns
- Classify findings:
  - Banned pattern found: report file:line, pattern type, sample context
  - Design constraint: pattern exists due to external requirements (e.g., exchange JSON protocol requires float parsing); report it as a finding, do NOT filter it out
- Coverage gap detection: identify modules/elements that have no corresponding benchmark. These represent blind spots in hot-path coverage. Recommend writing benchmarks.
Key principle: report ALL findings. Do not judge whether a finding is "acceptable" or "fixable" during identification. That is a separate concern for the fix suggestions section.
Do NOT call check-hot-path.sh — that is a freshness reminder, not an analysis tool.
Phase 3: Fix Suggestions
For each finding from Phase 2, provide a fix suggestion:
- Actionable fix: concrete code change (e.g., "convert float price to int64_t tick units")
- Design constraint: explain why the pattern exists, suggest isolation strategies (e.g., "move float→int conversion to a dedicated function, minimize hot-path exposure")
- No fix needed: explain why (e.g., "compiler intrinsic, not actual floating point")
Fix suggestions are for human review — they are recommendations, not actions to take.
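As an illustration of the "convert float price to int64_t tick units" suggestion, the sketch below parses a decimal price string straight into integer ticks without going through `double`. The function name `parse_price_ticks` and the scale of 8 decimal places are hypothetical choices, not taken from the project. The sketch assumes non-negative ASCII input, truncates extra fractional digits, and does not check for overflow.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical scale: 8 decimal places, so 1 unit == 100'000'000 ticks.
inline constexpr int kScaleDigits = 8;

// Parses a non-negative decimal string such as "123.45" into ticks.
// Returns -1 on malformed input. Uses no float, no heap, no exceptions.
// Overflow is NOT checked in this sketch.
constexpr std::int64_t parse_price_ticks(std::string_view text) noexcept {
    std::int64_t ticks = 0;
    int frac_digits = 0;
    bool seen_dot = false;
    bool seen_digit = false;
    for (char c : text) {
        if (c == '.') {
            if (seen_dot) return -1;  // a second '.' is malformed
            seen_dot = true;
            continue;
        }
        if (c < '0' || c > '9') return -1;
        seen_digit = true;
        if (seen_dot) {
            if (frac_digits == kScaleDigits) continue;  // truncate extra digits
            ++frac_digits;
        }
        ticks = ticks * 10 + (c - '0');
    }
    if (!seen_digit) return -1;
    // Pad the fraction with zeros up to the fixed scale.
    for (; frac_digits < kScaleDigits; ++frac_digits) ticks *= 10;
    return ticks;
}
```

Returning a sentinel instead of throwing keeps the function compatible with the HOT-zone rules on exceptions; a real implementation would also need sign handling and overflow checks.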
Phase 4: Spec Compliance (with --spec)
For each HOT/WARM element described in specs:
- Latency budget is explicit and grounded
  - Every stage has a ns/us budget
  - Budget is labeled: [M]easured, [D]esign estimate, or [T]heoretical
  - No unqualified claims ("fast", "low latency", "approximately")
- Zone classification is consistent
  - Element classified as HOT/WARM/COLD in spec matches profiling data
  - If spec says HOT but the source file doesn't appear in perf reports: investigate
- Data structures fit target cache level
  - Hot-path structs (<=64 bytes) should fit L1
  - Per-exchange book data should fit L2 or use prefetch
  - Total SHM footprint should fit L3 or use huge pages
- I/O model matches zone
  - HOT: no blocking syscalls (verify tier-specific behavior)
  - WARM: non-blocking allowed, no mutexes
  - COLD/CONSTRAINED: anything goes
Phase 5: Design Compliance
- CRTP verification: hot-path polymorphism uses CRTP, not virtual
- Integer arithmetic: prices/quantities use int64_t, band walk uses __int128
- Cache layout audit: for every struct on HOT/WARM path, check:
  - `sizeof` and `alignas`: documented and appropriate for use case
  - Cross-thread atomic fields on separate cache lines (`alignas(64)`): no false sharing
  - `static_assert(sizeof(...))` present: guards against silent struct growth
  - Struct fits target cache level: <=64B for L1-hot, <=1KB for L2, prefetch for larger
  - Producer/consumer fields isolated (e.g., write_pos and read_pos on different cache lines)
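The cache layout checks above can be sketched with two small types. `Quote` and `RingIndices` are hypothetical names used only for illustration, and the 64-byte line size is assumed (matching the `alignas(64)` convention in the checklist) rather than queried from the hardware.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; not detected at build time in this sketch.
inline constexpr std::size_t kCacheLine = 64;

// Hypothetical hot-path record: 4 fields x 8 bytes = 32 bytes.
struct Quote {
    std::int64_t price_ticks;
    std::int64_t quantity;
    std::uint64_t sequence;
    std::uint64_t exchange_ts_ns;
};
// Guards against silent struct growth past one cache line.
static_assert(sizeof(Quote) <= kCacheLine, "Quote must fit in one cache line");

// Hypothetical single-producer/single-consumer indices. The producer
// writes write_pos and the consumer writes read_pos, so each atomic is
// placed on its own cache line to avoid false sharing.
struct RingIndices {
    alignas(kCacheLine) std::atomic<std::uint64_t> write_pos{0};
    alignas(kCacheLine) std::atomic<std::uint64_t> read_pos{0};
};
static_assert(alignof(RingIndices) == kCacheLine, "unexpected alignment");
static_assert(sizeof(RingIndices) == 2 * kCacheLine, "padding lost or struct grew");
static_assert(offsetof(RingIndices, read_pos) - offsetof(RingIndices, write_pos) >= kCacheLine,
              "write_pos and read_pos share a cache line");
```

Because every check is a `static_assert`, a layout regression fails the build instead of surfacing later as a latency change in profiling data.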
Phase 6: Measurement Verification
- Cross-reference spec performance claims with benchmark results and profiling data
- For each claim: find the corresponding benchmark in `test-reports/`
- Classify: MEETS | EXCEEDS | MISSES | NO_DATA
- Flag any claim without measured backing
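The four-way classification can be sketched as a small pure function. This document does not define where MEETS ends and EXCEEDS begins, so the 20% margin below is purely an assumption for illustration, as are the names `Verdict` and `classify` and the use of a negative value to mean "no benchmark found".

```cpp
#include <cstdint>

enum class Verdict { MEETS, EXCEEDS, MISSES, NO_DATA };

// claimed_ns:  latency budget stated in the spec.
// measured_ns: benchmark result; a negative value stands for "no data".
constexpr Verdict classify(std::int64_t claimed_ns, std::int64_t measured_ns) noexcept {
    if (measured_ns < 0) return Verdict::NO_DATA;
    if (measured_ns > claimed_ns) return Verdict::MISSES;
    // Hypothetical rule: at least 20% under budget counts as EXCEEDS.
    // Integer form of measured <= 0.8 * claimed, avoiding floating point.
    if (measured_ns * 5 <= claimed_ns * 4) return Verdict::EXCEEDS;
    return Verdict::MEETS;
}
```

For example, with a 1000 ns claim, a 900 ns measurement would classify as MEETS and a 1200 ns measurement as MISSES under this assumed rule.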
Zone Model (Reference)
| Zone | Latency | Rules |
|---|---|---|
| HOT | <10us | 0 alloc, 0 syscall, 0 indirect call, 0 float, 0 exception |
| WARM | <500us | same as HOT + larger sequential scans with prefetch |
| COLD | ms-level | STL, exceptions, heap alloc, virtual all OK |
| CONSTRAINED | imposed | accept cost, isolate from HOT/WARM (gRPC, HTTP/2, TLS) |
Banned Patterns (HOT/WARM)
| Category | Patterns |
|---|---|
| Indirect calls | virtual, std::function, std::any, dynamic_cast, typeid |
| Heap allocation | new, delete, malloc, make_shared, make_unique, std::string ctor, vector::push_back |
| System calls | clock_gettime, gettimeofday, std::cout, printf |
| Floating point | double, float, stod, atof |
| Exceptions | throw, try, catch, .at() |
| Blocking | std::mutex, std::lock_guard, pthread_mutex_lock |
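For the indirect-calls row, the usual replacement for `virtual` is CRTP static dispatch, which Phase 5 checks for. The following is a minimal sketch; `Handler`, `CountingHandler`, `feed`, and `last_price_after_feed` are hypothetical names, not project code.

```cpp
#include <cstdint>

// CRTP base: the derived type is a template parameter, so on_tick()
// resolves handle() at compile time instead of through a vtable.
template <typename Derived>
struct Handler {
    constexpr void on_tick(std::int64_t price_ticks) noexcept {
        static_cast<Derived*>(this)->handle(price_ticks);
    }
};

// Hypothetical handler that records the last price and a message count.
struct CountingHandler : Handler<CountingHandler> {
    std::int64_t last_price = 0;
    std::uint64_t count = 0;
    constexpr void handle(std::int64_t price_ticks) noexcept {
        last_price = price_ticks;
        ++count;
    }
};

// Generic hot-path code is templated on the concrete handler type.
template <typename H>
constexpr void feed(Handler<H>& h, const std::int64_t* prices, int n) noexcept {
    for (int i = 0; i < n; ++i) h.on_tick(prices[i]);
}

// Usage: feed three prices and return the last one seen.
constexpr std::int64_t last_price_after_feed() {
    CountingHandler h{};
    const std::int64_t prices[] = {10, 20, 30};
    feed(h, prices, 3);
    return h.last_price;
}
static_assert(last_price_after_feed() == 30, "handler should see the final price");
```

The trade-off is that every call site must know the concrete type at compile time, so runtime-selected behavior has to be resolved outside the HOT/WARM path.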
Output Format
```markdown
# Low-Latency Audit Report

**Scope**: [all code | specific file]
**Date**: YYYY-MM-DD HH:MM
**Profiling data**: [present, dated YYYY-MM-DD | missing | stale]

## Summary

| Category | Findings | Coverage gaps |
|----------|----------|---------------|
| Banned patterns | N | - |
| Spec compliance | N | - |
| Design compliance | N | - |
| Measurement gaps | - | N |

## Hot-Path Identification

### File: path/to/file.h (X.X% samples in benchmark_name)

1. **line:N** — `float` in `FeedElement::parse_price()`
   Context: exchange sends JSON with float prices, conversion required
2. **line:M** — `std::string` ctor in `FeedElement::on_message()`
   Context: WebSocket frame handling

## Fix Suggestions

1. **file.h:N** — `float` → convert to int64_t tick units at parse boundary
   Type: design constraint — float→int conversion unavoidable, minimize scope
2. **file.h:M** — `std::string` → use std::string_view or pre-allocated buffer
   Type: actionable fix

## Coverage Gaps

- AggregatorElement: no benchmark exists
- GrpcBridgeElement: COLD zone, benchmark not required
```
Principles
- Identification ≠ optimization: report all findings without judging fixability. Fix suggestions are separate.
- AI reads perf report text, shell runs perf: clear separation of concerns
- Measured > estimated > ungrounded: prefer benchmark data over theoretical claims
- Re-profile after changes: new modules or optimizations invalidate old profiling data