
optimizing-critical-paths

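Use when code runs "too slowly", needs "optimization", has performance issues, "takes too long", "stutters", "runs sluggishly", times out, hits out-of-memory (OOM) errors, shows high CPU or memory usage, or "cannot scale".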

SKILL.md
---
name: optimizing-critical-paths
description: Use when code is "too slow", needs "optimization", has performance issues, "takes forever", "hangs", "laggy", timeouts, OOM errors, high CPU/memory, or "doesn't scale".
---

Optimize Critical Paths

Overview

Core Principle: Clean design and high performance are compatible. Simpler code usually runs faster because there are fewer special cases to check and fewer layer crossings.

Key Insight: Don't optimize based on intuition—measure first. Intuitions about performance are unreliable, even for experienced developers.

When to Use

  • Code is "too slow" or has performance issues
  • Optimizing hot paths or critical sections
  • Analyzing system bottlenecks
  • Choosing between equally clean alternatives
  • Symptom keywords: "optimize", "make faster", "too slow", "performance", "bottleneck", "takes forever", "hangs", "laggy", "timeout", "OOM", "memory issues", "high CPU", "doesn't scale"

When NOT to Use

  • Creating new modules (see: designing-deep-modules)
  • Evaluating design quality (see: reviewing-module-design)
  • Simplifying without performance focus (see: simplifying-complexity)
  • Improving clarity (see: improving-code-clarity)
  • Modifying behavior (see: maintaining-design-quality)

The Simplicity-Performance Relationship

| Myth | Reality |
| --- | --- |
| "Performance requires complexity" | Simpler code usually runs faster |
| "Clean design sacrifices speed" | Clean design and high performance are compatible |
| "Optimization means adding code" | Optimization often means removing code |

Why simplicity improves performance:

  • Fewer special cases = no code to check for those cases
  • Deep classes = more work per call, fewer layer crossings
  • Each layer crossing adds overhead
  • Complicated code does extraneous or redundant work
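
The layer-crossing point can be illustrated with a minimal Python sketch. All function names here are hypothetical, and the size of the difference depends on the language and runtime, so treat the timing helper as a way to check the claim on your own system rather than as a guaranteed result.

```python
import timeit

# Hypothetical shallow design: three layers, two of which only forward the call.
def handle_request(raw: str) -> int:
    return _validate(raw)

def _validate(raw: str) -> int:
    return _convert(raw)

def _convert(raw: str) -> int:
    return int(raw.strip())

# Hypothetical deeper design: same behavior, one layer.
def handle_request_direct(raw: str) -> int:
    return int(raw.strip())

def time_call(func, arg: str, repeats: int = 5, number: int = 100_000) -> float:
    """Return the best observed per-call time in seconds."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeats, number=number)
    return min(timings) / number

if __name__ == "__main__":
    layered = time_call(handle_request, " 8080 ")
    direct = time_call(handle_request_direct, " 8080 ")
    print(f"layered: {layered * 1e9:.0f} ns/call, direct: {direct * 1e9:.0f} ns/call")
```

Both versions return the same value; only the number of calls on the path differs.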

Expensive Operations Reference

Know these costs when choosing between alternatives:

| Operation | Cost | Context |
| --- | --- | --- |
| Network (datacenter) | 10–50 μs | Tens of thousands of instructions |
| Network (wide-area) | 10–100 ms | Millions of instructions |
| Disk I/O | 5–10 ms | Millions of instructions |
| Flash storage | 10–100 μs | Thousands of instructions |
| Dynamic memory allocation | Significant | malloc/new, freeing, GC overhead |
| Cache miss | Few hundred cycles | Often determines overall performance |

Performance Optimization Workflow

Stage 1: Measurement First (MANDATORY GATE)

```
BEFORE making any performance changes:

1. MEASURE existing system behavior
   - Where does the system spend most time?
   - Not just "system is slow": identify specific locations

2. IDENTIFY small number of very specific places
   - With ideas for improvement
   - Focus on what matters most

3. ESTABLISH baseline
   - You'll need this to verify improvements
```

What counts as valid measurement:

  • Actual profiling data (timing, call counts, memory usage)
  • Multiple runs to account for variance
  • Specific hotspot identification, not just "it's slow"
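
One possible way to collect this kind of data in Python is sketched below, using only the standard library. `slow_sum_of_squares` is a hypothetical stand-in for the code under investigation, and the run count is an arbitrary assumption; other profilers or languages would follow the same shape.

```python
import cProfile
import io
import pstats
import statistics
import time

def slow_sum_of_squares(n: int) -> int:
    """Hypothetical workload standing in for the code under investigation."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def measure_baseline(func, *args, runs: int = 5) -> dict:
    """Time several runs so that variance is visible, not just one number."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }

def profile_hotspots(func, *args, top: int = 5) -> str:
    """Return a cProfile report of the functions with the most cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()

if __name__ == "__main__":
    print(measure_baseline(slow_sum_of_squares, 200_000))
    print(profile_hotspots(slow_sum_of_squares, 200_000))
```

Keeping the baseline dictionary lets you re-run the same measurement after a change and compare like with like.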

What does NOT count:

  • "User said it's slow" (user perception ≠ bottleneck location)
  • Pattern-matching against the Expensive Operations table
  • "This is obviously expensive" (intuition)
  • "I'll measure after I make the change" (confirmation bias)

⚠️ Stage 1 is a GATE, not a suggestion. You cannot proceed to Stage 2 without completing measurement. If measurement is genuinely impossible, document why and what proxy you're using.

Performance Dimensions: When measuring, identify WHICH dimension is the problem:

  • Throughput: Operations per second
  • Latency: Time per operation
  • Memory: Peak/average usage, allocation rate
  • CPU: Utilization percentage

Different problems require different solutions. A memory optimization won't fix a latency problem.
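
As a rough illustration of measuring each dimension separately, the Python sketch below reports latency, throughput, CPU time, and peak Python-level memory for a hypothetical workload (`build_table`). It assumes a single-process, standard-library setup; `tracemalloc` only sees allocations made through Python's allocator, so native memory would need a different tool.

```python
import time
import tracemalloc

def build_table(n: int) -> list:
    """Hypothetical workload: allocates a list of n small strings."""
    return [str(i) for i in range(n)]

def measure_dimensions(func, arg, operations: int = 20) -> dict:
    """Report latency, throughput, CPU time, and peak memory for func(arg)."""
    # Time with tracing off, because tracemalloc slows allocation-heavy code.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    for _ in range(operations):
        func(arg)
    cpu_s = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start

    # Measure peak traced allocation for a single call in a separate pass.
    tracemalloc.start()
    func(arg)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "latency_s": elapsed / operations,
        "throughput_ops_per_s": operations / elapsed,
        "cpu_s": cpu_s,
        "peak_bytes": peak_bytes,
    }

if __name__ == "__main__":
    for name, value in measure_dimensions(build_table, 50_000).items():
        print(f"{name}: {value}")
```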

Stage 2: Look for Fundamental Fixes

```
FIRST, check for fundamental fixes (preferred over code tweaks):

□ Can you add a cache?
□ Can you use a different algorithm? (e.g., balanced tree vs. list)
□ Can you bypass layers? (e.g., kernel bypass for networking)

IF fundamental fix exists → implement using standard design techniques
IF NOT → proceed to critical path redesign
```
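
Two of the checklist items can be sketched in Python as follows. This is illustrative only: the functions are hypothetical, and the algorithm example swaps in a hash set rather than the balanced tree mentioned above, since a set is the idiomatic Python choice for membership tests. Whether either change helps still has to be confirmed by the Stage 1 measurements.

```python
from functools import lru_cache

# Algorithm change: "item in list" scans the list on every test,
# while "item in set" uses hashing.
def count_known_list(items: list, known: list) -> int:
    return sum(1 for item in items if item in known)

def count_known_set(items: list, known: list) -> int:
    known_set = set(known)  # one-time conversion cost
    return sum(1 for item in items if item in known_set)

# Cache: memoize a pure function so repeated arguments skip recomputation.
@lru_cache(maxsize=1024)
def expensive_lookup(key: int) -> int:
    """Hypothetical pure computation standing in for a costly call."""
    return sum(i * key for i in range(10_000))

if __name__ == "__main__":
    print(count_known_list([1, 2, 3, 9], [2, 3, 4]))
    print(count_known_set([1, 2, 3, 9], [2, 3, 4]))
    expensive_lookup(3)
    expensive_lookup(3)  # served from the cache
    print(expensive_lookup.cache_info())
```

A cache is only safe here because `expensive_lookup` is pure; caching a function with side effects or changing inputs would alter behavior.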

Stage 3: Critical Path Redesign (Last Resort)

```
ONLY when no fundamental fix is available:

1. ASK: What is the smallest amount of code for the common case?

2. DISREGARD existing code structure entirely
   - Imagine writing a new method that implements JUST the critical path

3. IGNORE special cases in current code
   - Consider only data needed for critical path
   - Choose most convenient data structure

4. DEFINE "the ideal"
   - The simplest and fastest code assuming complete redesign freedom
   - Even if not practically achievable, it's your target

5. DESIGN the rest of the class around these critical paths
   - Apply design principles from other skills
```
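
A small Python sketch of the idea, under the assumption (which you would need to confirm by measurement) that most inputs need no escaping. The function names and the set of escaped characters are hypothetical; the point is that the common case is one test and a return, with special cases moved off the critical path.

```python
import re

# One precompiled pattern answers "does any special case apply?" in a single test.
_NEEDS_ESCAPE = re.compile(r'[&<>"]')

_REPLACEMENTS = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def escape_text(text: str) -> str:
    # Critical path: assumed common case, nothing to escape, return unchanged.
    if _NEEDS_ESCAPE.search(text) is None:
        return text
    # Special cases are handled off the critical path.
    return _escape_slow(text)

def _escape_slow(text: str) -> str:
    """Replace each special character with its entity."""
    return _NEEDS_ESCAPE.sub(lambda match: _REPLACEMENTS[match.group(0)], text)

if __name__ == "__main__":
    print(escape_text("hello"))
    print(escape_text('a<b & "c"'))
```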

After Making Changes

```
1. RE-MEASURE to verify measurable performance difference

2. EVALUATE the tradeoff:
   - Did changes provide significant speedup (with data)? → Keep
   - Did changes make system simpler AND at least as fast? → Keep
   - Neither? → BACK THEM OUT
```

⚠️ "Simpler" alone is not enough. You must verify the simpler version is at least as fast. Don't assume simpler = faster without measurement.
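
The keep-or-back-out rule can be written down as a small Python helper. This is a sketch, not a prescribed policy: `min_speedup` is a hypothetical threshold for "significant", and the inputs are assumed to be timings collected with the same methodology before and after the change.

```python
def decide(baseline_s: float, candidate_s: float, is_simpler: bool,
           min_speedup: float = 1.10) -> str:
    """Apply the keep/back-out rule to two measured runtimes in seconds.

    min_speedup is a hypothetical threshold for a "significant" speedup;
    pick a value that exceeds the run-to-run noise in your measurements.
    """
    speedup = baseline_s / candidate_s
    if speedup >= min_speedup:
        return "keep: significant speedup"
    if is_simpler and speedup >= 1.0:
        return "keep: simpler and at least as fast"
    return "back out"

if __name__ == "__main__":
    print(decide(baseline_s=1.00, candidate_s=0.50, is_simpler=False))
    print(decide(baseline_s=1.00, candidate_s=1.00, is_simpler=True))
    print(decide(baseline_s=1.00, candidate_s=1.20, is_simpler=True))
```

Note that a simpler but slower candidate is backed out, which matches the warning above.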


Anti-Rationalization Table

| Rationalization | Counter |
| --- | --- |
| "User said it's slow, that's my measurement" | User perception ≠ bottleneck location. Measure to find WHERE. |
| "Looking at the table, this is obviously expensive" | Pattern-matching isn't profiling. Measure actual time. |
| "I'll make the change then measure to verify" | Confirmation bias. Measure FIRST to find the real bottleneck. |
| "Setting up profiling is too complex" | If you can't measure, you can't verify improvement. Do the work. |
| "This scope is too small to measure" | Micro-optimizations without measurement add complexity for nothing. |
| "I checked the fundamental fix checklist" | Checklist is for ideas AFTER measurement shows the bottleneck. |
| "The code is simpler now, so it's faster" | Simpler doesn't automatically mean faster. Verify with measurement. |
| "I found a red flag pattern" | Red flags are descriptive, not prescriptive. Measure if it's actually slow. |
| "I already profiled extensively" | Share the data. Without data, it's still intuition. |

Red Flags

| Red Flag | Symptom | Performance Impact |
| --- | --- | --- |
| Death by Thousand Cuts | Many small inefficiencies everywhere | System 5–10x slower; no single fix helps |
| Pass-Through Methods | Method with identical signature to caller | Unnecessary layer crossing overhead |
| Shallow Layers | Multiple layers providing same abstraction | Each call adds overhead |
| Repeated Special Cases | Same conditions checked multiple times | Redundant work on every call |
| Premature Optimization | Optimizing without measurement | Adds complexity without verified benefit |
| Intuition-Based Changes | "This should be faster" without data | Unreliable even for experts |
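
To make the "Repeated Special Cases" row concrete, here is a hypothetical Python sketch in which the same conditions are tested in both the caller and the helper, followed by a version that validates once. The data shape (dicts with `price` and optional `qty`) is an assumption for illustration. Spotting this pattern is not evidence that it is the bottleneck; measure before changing it.

```python
from typing import Optional

def _item_cost_checked(item: Optional[dict]) -> float:
    # Re-checks conditions the caller has already tested.
    if item is None or "price" not in item:
        return 0.0
    return item["price"] * item.get("qty", 1)

def order_total_repeated(items: list) -> float:
    total = 0.0
    for item in items:
        if item is None or "price" not in item:  # checked here...
            continue
        total += _item_cost_checked(item)  # ...and checked again inside
    return total

def order_total_once(items: list) -> float:
    # Validate once at the boundary, then run the loop without re-checking.
    valid = [item for item in items if item is not None and "price" in item]
    return sum(item["price"] * item.get("qty", 1) for item in valid)

if __name__ == "__main__":
    sample = [{"price": 2.0, "qty": 3}, None, {"qty": 1}, {"price": 1.5}]
    print(order_total_repeated(sample), order_total_once(sample))
```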

Performance-Aware Development

Default approach during normal development:

  1. Develop awareness of fundamentally expensive operations (see reference table)
  2. Choose naturally efficient alternatives when equally clean options exist
  3. If performance turns out to be a problem, optimize later
  4. Exception: Clear evidence performance is critical → implement faster approach immediately

When to Optimize Immediately

| Situation | Action |
| --- | --- |
| Clear evidence performance is critical | Implement faster approach now |
| Faster design adds only small, hidden complexity | May be worthwhile |
| Faster design adds a lot of complexity OR complicates interfaces | Start simple, optimize later |

Consolidation Techniques

When optimizing critical paths, look for ways to consolidate:

| Technique | Example |
| --- | --- |
| Encode multiple conditions in single value | Variable that is 0 when any of several special cases apply |
| Single test for multiple cases | Replace 6 individual checks with 1 combined check |
| Combine layers into single method | Critical path handled in one method, not three |
| Merge variables | Combine multiple values into single structure |
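
The first two techniques can be sketched in Python as follows. The `Packet` class, its three special cases, and the placeholder slow path are all hypothetical; the sketch only shows a single field that is 0 when any special case applies, so the critical path needs one test instead of three.

```python
class Packet:
    """Hypothetical message type with three rarely-set special cases."""

    def __init__(self, payload: bytes, compressed: bool = False,
                 encrypted: bool = False, fragmented: bool = False) -> None:
        self.payload = payload
        self.compressed = compressed
        self.encrypted = encrypted
        self.fragmented = fragmented
        # Encoded once at construction: 0 when any special case applies, else 1.
        self.fast_path = 0 if (compressed or encrypted or fragmented) else 1

def process(packet: Packet) -> bytes:
    # A single test on the critical path replaces three separate checks.
    if packet.fast_path:
        return packet.payload
    return _process_special(packet)

def _process_special(packet: Packet) -> bytes:
    """Placeholder slow path; real handling of each case is out of scope."""
    data = packet.payload
    if packet.compressed:
        data = b"[decompressed]" + data
    if packet.encrypted:
        data = b"[decrypted]" + data
    if packet.fragmented:
        data = b"[reassembled]" + data
    return data

if __name__ == "__main__":
    print(process(Packet(b"abc")))
    print(process(Packet(b"abc", compressed=True)))
```

The cost moves to construction time, which is a reasonable trade only if packets are processed at least as often as they are created and the special cases are rare.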

Quick Reference

```
PERFORMANCE OPTIMIZATION PRIORITY:

1. MEASURE (MANDATORY GATE)
   - Actual profiling data, not intuition
   - Identify which dimension: throughput, latency, memory, CPU
   - Establish baseline before any changes
   - No measurement = no optimization

2. FUNDAMENTAL FIX - Preferred approach
   - Cache? Better algorithm? Bypass layers?
   - Only consider AFTER measurement shows the bottleneck

3. CRITICAL PATH - Last resort
   - What's the minimum code for common case?
   - Disregard existing structure
   - Define "the ideal"

4. VERIFY - After changes
   - Re-measure with same methodology
   - Faster with data? Keep
   - Simpler AND at least as fast? Keep
   - Neither? BACK OUT

THE RULE:
Measure → Identify → Fix → Verify
Never skip steps. Never assume.
```

References: aposd-foundations for complexity symptoms