low-latency-audit

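Deep compliance audit for hot/warm path code. The AI reads the perf report text line by line, together with the source code, for function-level precision. It identifies hot paths, reports banned patterns, and suggests fixes. Identification and optimization are separate concerns.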

SKILL.md
---
name: low-latency-audit
description: >
  Deep compliance audit for hot/warm path code. AI reads perf report text
  + source code for function-level precision. Identifies hot paths, reports
  banned patterns, and suggests fixes. Identification and optimization are
  separate concerns.
---

Low-Latency Audit

Deep compliance audit driven by AI reading perf report text + source code directly.

Shell scripts collect data (perf record/report). AI does the analysis. This audit:

  • Reads perf report text to understand what code is hot (sample percentages, call chains)
  • Reads source code to understand function boundaries, templates, lambdas
  • Identifies banned patterns in hot code with function-level precision
  • Produces two outputs: identification report (what is hot, what patterns exist) and fix suggestions (how to address findings, for human review)

When to Use

  • After implementing hot/warm path code — final compliance gate
  • Before marking any implementation task that touches HOT/WARM code as complete
  • When spec latency claims seem ungrounded — cross-check with data
  • Periodic review — catch drift as code evolves

When NOT to Use

  • Spec quality review — use /spec-review instead
  • Running benchmarks — run them first, then use this to interpret

Inputs

  • No argument: audit all hot/warm path code in the project
  • File path: audit a specific file
  • --spec: also audit spec files for latency claim compliance

Workflow

Phase 1: Profiling Data Check

  1. Check profile-results/perf-reports/ exists and has .txt files
  2. If missing: warn "No profiling data — run tools/profile-hot-path.sh". Cannot identify hot paths without data. Do NOT fall back to directory-based guessing.
  3. If present: read the perf report text files
  4. Check freshness: compare the perf report timestamps against the most recent .h/.cpp modification time
  5. If stale: warn "Profiling data may be outdated — consider re-running"
  6. If profile-results/flamegraph.svg exists, reference it for visual analysis

Phase 2: Hot-Path Identification (AI-Driven)

Read perf report text files in profile-results/perf-reports/. Each file contains:

  • Default view: symbol-level profiling (comm, dso, symbol with sample percentages)
  • Source file attribution: which source files contribute to hot code

For each perf report:

  1. Identify project source files in the "Source file attribution" section
  2. Read those source files using the Read tool
  3. Cross-reference symbols from the default view with source code to identify specific hot functions
  4. For each hot function, check for banned patterns
  5. Classify findings:
    • Banned pattern found: report file:line, pattern type, sample context
    • Design constraint: pattern exists due to external requirements (e.g., exchange JSON protocol requires float parsing) — report as finding, do NOT filter out
  6. Coverage gap detection: identify modules/elements that have no corresponding benchmark. These represent blind spots in hot-path coverage. Recommend writing benchmarks.

Key principle: report ALL findings. Do not judge whether a finding is "acceptable" or "fixable" during identification. That is a separate concern for the fix suggestions section.

Do NOT call check-hot-path.sh — that is a freshness reminder, not an analysis tool.
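
To make the "default view" concrete, here is a hedged sketch of how one symbol-level row could be split into its fields. It assumes the single-overhead-column layout that `perf report --stdio` prints (percentage, command, shared object, a privilege marker such as `[.]`, then the symbol) and a command name without spaces; reports recorded with call graphs may carry an extra "Children" column that this sketch does not handle. `PerfRow` and `parse_perf_row` are hypothetical names. The function uses `double` and `std::string` freely because it is cold tooling, not audited code.

```cpp
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>

struct PerfRow {
    double overhead_pct;
    std::string comm;
    std::string dso;
    std::string symbol;  // may contain spaces, e.g. template arguments
};

// Parses one data row of the symbol-level view; returns std::nullopt for
// blank lines, '#' comment/header lines, and anything that does not match.
std::optional<PerfRow> parse_perf_row(const std::string& line) {
    const auto first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line[first] == '#') return std::nullopt;

    std::istringstream in(line);
    std::string pct, comm, dso, level;
    if (!(in >> pct >> comm >> dso >> level)) return std::nullopt;
    if (level.size() != 3 || level.front() != '[' || level.back() != ']')
        return std::nullopt;

    char* end = nullptr;
    const double value = std::strtod(pct.c_str(), &end);
    if (end == pct.c_str() || *end != '%') return std::nullopt;

    // The symbol is the rest of the line, trimmed of trailing whitespace.
    std::string symbol;
    std::getline(in >> std::ws, symbol);
    const auto last = symbol.find_last_not_of(" \t\r");
    if (last == std::string::npos) return std::nullopt;
    symbol.erase(last + 1);

    return PerfRow{value, comm, dso, symbol};
}
```

The symbol string is what gets cross-referenced against the source files in step 3; the percentage supplies the "sample context" reported with each finding.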

Phase 3: Fix Suggestions

For each finding from Phase 2, provide a fix suggestion:

  • Actionable fix: concrete code change (e.g., "convert float price to int64_t tick units")
  • Design constraint: explain why the pattern exists, suggest isolation strategies (e.g., "move float→int conversion to a dedicated function, minimize hot-path exposure")
  • No fix needed: explain why (e.g., "compiler intrinsic, not actual floating point")

Fix suggestions are for human review — they are recommendations, not actions to take.
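
As an illustration of the "convert float price to int64_t tick units" suggestion, the sketch below parses a decimal string straight into integer ticks, with no `double`, no heap allocation, and no exceptions. `parse_ticks` and its `decimals` parameter are hypothetical; a real feed handler would also need to decide how to treat signs and exponent notation, which this sketch simply rejects.

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <string_view>

// Parses an unsigned decimal such as "123.45" into ticks of size 10^-decimals.
// Returns std::nullopt on malformed input, excess fractional digits, or overflow.
std::optional<std::int64_t> parse_ticks(std::string_view text, int decimals) noexcept {
    constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
    std::int64_t value = 0;
    int frac_digits = 0;
    bool seen_point = false;
    bool seen_digit = false;

    for (const char c : text) {
        if (c == '.') {
            if (seen_point) return std::nullopt;  // second decimal point
            seen_point = true;
            continue;
        }
        if (c < '0' || c > '9') return std::nullopt;
        if (seen_point && ++frac_digits > decimals) return std::nullopt;
        const std::int64_t digit = c - '0';
        if (value > (kMax - digit) / 10) return std::nullopt;  // would overflow
        value = value * 10 + digit;
        seen_digit = true;
    }
    if (!seen_digit) return std::nullopt;

    // Pad missing fractional digits: "123.4" with decimals = 2 becomes 12340.
    for (; frac_digits < decimals; ++frac_digits) {
        if (value > kMax / 10) return std::nullopt;
        value *= 10;
    }
    return value;
}
```

Keeping this conversion in one function at the parse boundary is the isolation strategy the design-constraint bullet describes: everything downstream sees only `int64_t`.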

Phase 4: Spec Compliance (with --spec)

For each HOT/WARM element described in specs:

  1. Latency budget is explicit and grounded

    • Every stage has a ns/us budget
    • Budget is labeled: [M]easured, [D]esign estimate, or [T]heoretical
    • No unqualified claims ("fast", "low latency", "approximately")
  2. Zone classification is consistent

    • The HOT/WARM/COLD zone assigned to each element in the spec matches the profiling data
    • If spec says HOT but source file doesn't appear in perf reports — investigate
  3. Data structures fit target cache level

    • Hot-path structs should be <=64 bytes (one cache line on typical x86-64) so they stay L1-resident
    • Per-exchange book data should fit L2 or use prefetch
    • Total SHM footprint should fit L3 or use huge pages
  4. I/O model matches zone

    • HOT: no blocking syscalls (verify tier-specific behavior)
    • WARM: non-blocking allowed, no mutexes
    • COLD/CONSTRAINED: anything goes
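
The latency-budget check in item 1 is mechanical enough to sketch. The example below flags a spec line that lacks a ns/us figure, lacks a grounding label, or contains one of the unqualified words listed above. The exact tag spelling (`[M]`, `[D]`, `[T]`) and the names `ClaimCheck` and `check_claim` are assumptions for illustration; adjust them to whatever the specs actually use.

```cpp
#include <regex>
#include <string>

struct ClaimCheck {
    bool has_budget;      // a number followed by ns or us
    bool has_label;       // [M], [D], or [T]
    bool has_vague_word;  // "fast", "low latency", "approximately"
    bool ok() const { return has_budget && has_label && !has_vague_word; }
};

ClaimCheck check_claim(const std::string& line) {
    static const std::regex budget(R"(\d+(\.\d+)?\s*(ns|us)\b)");
    static const std::regex label(R"(\[(M|D|T)\])");
    static const std::regex vague(R"(\b(fast|low latency|approximately)\b)",
                                  std::regex::icase);
    return {std::regex_search(line, budget),
            std::regex_search(line, label),
            std::regex_search(line, vague)};
}
```

For example, `check_claim("parse stage: <2us [M]").ok()` is true, while a line such as "parse stage is fast" fails on all three counts.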

Phase 5: Design Compliance

  1. CRTP verification: hot-path polymorphism uses CRTP, not virtual
  2. Integer arithmetic: prices/quantities use int64_t, band walk uses __int128
  3. Cache layout audit: for every struct on HOT/WARM path, check:
    • sizeof and alignas — documented and appropriate for use case
    • Cross-thread atomic fields on separate cache lines (alignas(64)) — no false sharing
    • static_assert(sizeof(...)) present — guards against silent struct growth
    • Struct fits target cache level: <=64B for L1-hot, <=1KB for L2, prefetch for larger
    • Producer/consumer fields isolated (e.g., write_pos and read_pos on different cache lines)
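
A compliant layout for the cache checks above might look like the following sketch. The 64-byte line size is an assumption (typical for x86-64), and `RingHeader` and `BookLevel` are hypothetical types, not taken from the audited project.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

inline constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

// Hypothetical SPSC ring header: each cross-thread atomic gets its own line.
struct RingHeader {
    alignas(kCacheLine) std::atomic<std::uint64_t> write_pos{0};  // producer-owned
    alignas(kCacheLine) std::atomic<std::uint64_t> read_pos{0};   // consumer-owned
};

static_assert(alignof(RingHeader) == kCacheLine, "RingHeader must be line-aligned");
static_assert(sizeof(RingHeader) == 2 * kCacheLine,
              "RingHeader grew: re-check the layout");
static_assert(offsetof(RingHeader, read_pos) - offsetof(RingHeader, write_pos)
                  >= kCacheLine,
              "write_pos and read_pos share a cache line (false sharing)");

// Hypothetical L1-hot struct: padded to exactly one cache line.
struct alignas(kCacheLine) BookLevel {
    std::int64_t price_ticks;
    std::int64_t quantity;
    std::uint32_t order_count;
    std::uint32_t flags;
};

static_assert(sizeof(BookLevel) == kCacheLine,
              "BookLevel no longer fits one cache line");
```

The `static_assert` lines are what the audit looks for: they turn silent struct growth or an accidental shared line into a compile error.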

Phase 6: Measurement Verification

  1. Cross-reference spec performance claims with benchmark results and profiling data
  2. For each claim: find corresponding benchmark in test-reports/
  3. Classify: MEETS | EXCEEDS | MISSES | NO_DATA
  4. Flag any claim without measured backing
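
The four-way classification can be expressed as a small function. The document does not define where MEETS ends and EXCEEDS begins, so the 20% margin below is purely an assumption, as are the names `Verdict` and `classify`. Latencies are kept as integer nanoseconds.

```cpp
#include <cstdint>
#include <optional>

enum class Verdict { Meets, Exceeds, Misses, NoData };

// Assumption: EXCEEDS means the measurement is at least `margin_pct` percent
// under budget; MEETS means within budget but inside that margin.
Verdict classify(std::int64_t budget_ns,
                 std::optional<std::int64_t> measured_ns,
                 std::int64_t margin_pct = 20) {
    if (!measured_ns) return Verdict::NoData;  // no benchmark backs the claim
    if (*measured_ns > budget_ns) return Verdict::Misses;
    // measured <= budget * (100 - margin) / 100, evaluated in integers
    if (*measured_ns * 100 <= budget_ns * (100 - margin_pct)) return Verdict::Exceeds;
    return Verdict::Meets;
}
```

Every `NoData` result corresponds to a claim that step 4 should flag.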

Zone Model (Reference)

```
HOT         <10us     0 alloc, 0 syscall, 0 indirect call, 0 float, 0 exception
WARM        <500us    same as HOT + larger sequential scans with prefetch
COLD        ms-level  STL, exceptions, heap alloc, virtual all OK
CONSTRAINED imposed   accept cost, isolate from HOT/WARM (gRPC, HTTP/2, TLS)
```

Banned Patterns (HOT/WARM)

| Category | Patterns |
|----------|----------|
| Indirect calls | virtual, std::function, std::any, dynamic_cast, typeid |
| Heap allocation | new, delete, malloc, make_shared, make_unique, std::string ctor, vector::push_back |
| System calls | clock_gettime, gettimeofday, std::cout, printf |
| Floating point | double, float, stod, atof |
| Exceptions | throw, try, catch, .at() |
| Blocking | std::mutex, std::lock_guard, pthread_mutex_lock |
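
For the indirect-call row, the usual replacement for `virtual` is the CRTP form that Phase 5 verifies. The sketch below is illustrative only: `Element`, `SpreadElement`, and `drive` are hypothetical names, not the project's classes.

```cpp
#include <cstdint>
#include <type_traits>

// The base resolves the handler at compile time through static_cast,
// so there is no vtable and no indirect call.
template <typename Derived>
class Element {
public:
    std::int64_t on_tick(std::int64_t price_ticks) noexcept {
        return static_cast<Derived*>(this)->handle(price_ticks);
    }
};

class SpreadElement : public Element<SpreadElement> {
public:
    std::int64_t handle(std::int64_t price_ticks) noexcept {
        last_ = price_ticks;
        return price_ticks + 1;  // placeholder logic for the example
    }
    std::int64_t last() const noexcept { return last_; }

private:
    std::int64_t last_ = 0;
};

// Generic hot-path driver: the concrete element type is a template parameter.
template <typename E>
std::int64_t drive(Element<E>& element, std::int64_t price_ticks) noexcept {
    return element.on_tick(price_ticks);
}

static_assert(!std::is_polymorphic_v<SpreadElement>,
              "hot-path element must not have virtual functions");
```

The `is_polymorphic_v` assertion gives the audit a compile-time guard: adding a `virtual` member to the element later breaks the build instead of silently introducing a vtable.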

Output Format

```markdown
# Low-Latency Audit Report

**Scope**: [all code | specific file]
**Date**: YYYY-MM-DD HH:MM
**Profiling data**: [present, dated YYYY-MM-DD | missing | stale]

## Summary

| Category | Findings | Coverage gaps |
|----------|----------|---------------|
| Banned patterns | N | - |
| Spec compliance | N | - |
| Design compliance | N | - |
| Measurement gaps | - | N |

## Hot-Path Identification

### File: path/to/file.h (X.X% samples in benchmark_name)
1. **line:N** - `float` in `FeedElement::parse_price()`
   Context: exchange sends JSON with float prices, conversion required
2. **line:M** - `std::string` ctor in `FeedElement::on_message()`
   Context: WebSocket frame handling

## Fix Suggestions

1. **file.h:N** - `float` → convert to int64_t tick units at parse boundary
   Type: design constraint (float→int conversion unavoidable, minimize scope)
2. **file.h:M** - `std::string` → use std::string_view or pre-allocated buffer
   Type: actionable fix

## Coverage Gaps
- AggregatorElement: no benchmark exists
- GrpcBridgeElement: COLD zone, benchmark not required
```

Principles

  • Identification ≠ optimization: report all findings without judging fixability. Fix suggestions are separate.
  • AI reads perf report text, shell runs perf: clear separation of concerns
  • Measured > estimated > ungrounded: prefer benchmark data over theoretical claims
  • Re-profile after changes: new modules or optimizations invalidate old profiling data