AgentSkillsCN

performance

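Analyze and improve performance: profile, locate bottlenecks, optimize code, and instrument code with observability (logging, metrics, tracing). Use when the user asks about performance problems, slow code, bottlenecks, profiling, or optimization, or wants to add observability to code.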

SKILL.md
---
name: performance
description: "Analyze and improve performance: profile, find bottlenecks, optimize, and instrument code with observability (logging, metrics, tracing). Use when the user asks about performance, slow code, bottlenecks, profiling, optimization, or adding observability."
triggers:
  - "/perf"
  - "performance"
  - "slow"
  - "bottleneck"
  - "profile"
  - "optimize"
  - "speed up"
  - "benchmark"
  - "add logging"
  - "add metrics"
  - "add tracing"
  - "instrument"
  - "observability"
---

Performance Skill

Core Philosophy

"Measure first; optimize where it matters."

Find real bottlenecks with profiling or benchmarks, then improve. Avoid premature or speculative optimization.


Protocol

1. Measure

  • Profile: Use language/runtime profilers (e.g. Node: --inspect / Chrome DevTools; Python: cProfile, py-spy; Go: pprof; Rust: cargo flamegraph).
  • Benchmark: Add or run benchmarks for the hot path (e.g. benchmark.js, pytest-benchmark, go test -bench, cargo bench).
  • Baseline: Record current metrics (time, memory, throughput) so improvements are verifiable.
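
A baseline can be captured with nothing more than Node's built-in perf_hooks. The following is a minimal sketch, not a replacement for a real benchmark library: `hotPath` and `measure` are hypothetical names, and the warm-up and run counts are arbitrary assumptions.

```javascript
const { performance } = require("node:perf_hooks");

// Hypothetical hot-path function; substitute the code under investigation.
function hotPath(n) {
  let sum = 0;
  for (let i = 0; i < n; i++) sum += i % 7;
  return sum;
}

// Run fn repeatedly and return timing statistics in milliseconds.
function measure(fn, { warmup = 5, runs = 30 } = {}) {
  for (let i = 0; i < warmup; i++) fn(); // let the JIT settle before timing
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    runs,
    minMs: samples[0],
    medianMs: samples[Math.floor(runs / 2)],
    maxMs: samples[runs - 1],
  };
}

// Record the baseline so later runs have something to compare against.
const baseline = measure(() => hotPath(100_000));
console.log(JSON.stringify({ label: "baseline", ...baseline }));
```

Reporting the median rather than the mean keeps a single slow run (GC pause, noisy neighbor) from distorting the baseline.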

2. Identify Bottlenecks

  • Hot spots: Where the profiler shows most time or allocations.
  • N+1 / redundant work: Repeated queries, duplicate computation, unnecessary allocations.
  • Algorithm/design: Wrong data structure, O(n²) where O(n) is possible, blocking I/O on hot path.
  • I/O: Disk, network, or DB; consider caching, batching, or async.

Focus on the top one or two bottlenecks; avoid scattering small optimizations.
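
The "O(n²) where O(n) is possible" case often looks like the pair below. This is an illustrative sketch with hypothetical function names; it assumes the items are primitive values other than NaN, for which `===` and `Set` membership agree.

```javascript
// O(n^2): every element is compared against every later element.
function hasDuplicateQuadratic(items) {
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) return true;
    }
  }
  return false;
}

// O(n): a Set provides constant-time membership checks on average.
function hasDuplicateLinear(items) {
  const seen = new Set();
  for (const item of items) {
    if (seen.has(item)) return true;
    seen.add(item);
  }
  return false;
}

const ids = ["a", "b", "c", "a"];
console.log(hasDuplicateQuadratic(ids), hasDuplicateLinear(ids)); // true true
```

The rewrite only pays off if the profiler shows this function on the hot path; for small inputs the quadratic version may be just as fast.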

3. Optimize

  • Algorithm/data structure: Fix the dominant cost first.
  • Caching: Add only where there’s measurable gain and clear invalidation.
  • I/O: Batch, pool, async, or reduce round-trips.
  • Allocations: Reduce in hot loops (reuse, pool, or avoid unnecessary copies) when the profiler shows pressure.

Preserve correctness and readability; add a short comment or test for non-obvious optimizations.
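
As one illustration of reducing round-trips, the sketch below replaces an N+1 lookup pattern with a single batched fetch. `fakeDb` is a hypothetical in-memory stand-in that only counts calls; a real database client would be asynchronous and have its own batch API.

```javascript
// Hypothetical in-memory stand-in for a database; it counts round-trips.
const fakeDb = {
  roundTrips: 0,
  users: new Map([[1, "Ada"], [2, "Grace"], [3, "Linus"]]),
  getUser(id) {
    this.roundTrips++;
    return this.users.get(id);
  },
  getUsers(ids) {
    this.roundTrips++;
    return new Map(ids.map((id) => [id, this.users.get(id)]));
  },
};

const orders = [
  { id: "o1", userId: 1 },
  { id: "o2", userId: 2 },
  { id: "o3", userId: 1 },
  { id: "o4", userId: 3 },
];

// N+1 pattern: one lookup per order.
function namesPerOrder(orderList) {
  return orderList.map((o) => fakeDb.getUser(o.userId));
}

// Batched: deduplicate the IDs, fetch once, then join in memory.
function namesBatched(orderList) {
  const ids = [...new Set(orderList.map((o) => o.userId))];
  const byId = fakeDb.getUsers(ids);
  return orderList.map((o) => byId.get(o.userId));
}

fakeDb.roundTrips = 0;
const slow = namesPerOrder(orders);
const slowTrips = fakeDb.roundTrips;

fakeDb.roundTrips = 0;
const fast = namesBatched(orders);
const fastTrips = fakeDb.roundTrips;

console.log({ slowTrips, fastTrips }); // { slowTrips: 4, fastTrips: 1 }
```

Both functions return the same names in the same order, which is the property a regression test for this optimization should pin down.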

4. Verify

  • Re-run profile or benchmarks; confirm improvement and no regression elsewhere.
  • Run the full test suite.
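
Verification can be sketched as two checks: the optimized code still agrees with the original, and the measured time actually changed. The functions below are hypothetical examples, and the trial and run counts are assumptions; this complements the full test suite rather than replacing it.

```javascript
const { performance } = require("node:perf_hooks");

// Reference (original) and candidate (optimized) implementations; both hypothetical.
function sumOfSquaresReference(xs) {
  return xs.map((x) => x * x).reduce((a, b) => a + b, 0);
}
function sumOfSquaresOptimized(xs) {
  let total = 0;
  for (let i = 0; i < xs.length; i++) total += xs[i] * xs[i];
  return total;
}

// Correctness: both implementations must agree on many random integer inputs.
function agreeOnRandomInputs(reference, candidate, trials = 200) {
  for (let t = 0; t < trials; t++) {
    const len = Math.floor(Math.random() * 50);
    const xs = Array.from({ length: len }, () => Math.floor(Math.random() * 1000));
    if (reference(xs) !== candidate(xs)) return false;
  }
  return true;
}

// Median wall-clock time of fn over several runs, in milliseconds.
function medianMs(fn, runs = 21) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(runs / 2)];
}

const input = Array.from({ length: 50_000 }, (_, i) => i % 1000);
const report = {
  outputsAgree: agreeOnRandomInputs(sumOfSquaresReference, sumOfSquaresOptimized),
  beforeMs: medianMs(() => sumOfSquaresReference(input)),
  afterMs: medianMs(() => sumOfSquaresOptimized(input)),
};
console.log(JSON.stringify(report));
```

If `afterMs` is not clearly lower than `beforeMs` across repeated runs, the change has not earned its added complexity.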

5. Observability & Instrumentation

Add observability to understand production behavior and diagnose issues:

Logging

| Principle | Implementation |
| --- | --- |
| Structured logs | Use JSON format with consistent fields (timestamp, level, message, context) |
| Log levels | ERROR (failures), WARN (degraded), INFO (business events), DEBUG (troubleshooting) |
| Correlation IDs | Pass request ID through all services; include in every log line |
| What to log | Request/response summaries, errors with stack traces, business events, slow operations |
| What NOT to log | Secrets, PII, full payloads (unless debug), high-volume low-value events |
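```javascript
const pino = require("pino"); // any structured logger with this call shape works
const logger = pino();
const requestId = "req-123"; // placeholder values for illustration
const userId = "user-42";

// Good: structured, contextual
logger.info(
  { requestId, userId, action: "checkout", itemCount: 3 },
  "Checkout started"
);

// Bad: unstructured, no context
console.log("checkout started");
```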
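
Correlation IDs and the "what NOT to log" rule can be combined in one small wrapper. The sketch below uses only Node built-ins; the `x-request-id` header name, the `SENSITIVE_KEYS` list, and the helper names are assumptions for illustration, and a production setup would more likely use the redaction and child-logger features of its logging library.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");
const { randomUUID } = require("node:crypto");

const requestContext = new AsyncLocalStorage();
const SENSITIVE_KEYS = new Set(["password", "token", "authorization"]); // assumed list

// Build one structured record, redacting sensitive fields and attaching the request ID.
function buildRecord(level, fields, message) {
  const safe = {};
  for (const [key, value] of Object.entries(fields)) {
    safe[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : value;
  }
  const store = requestContext.getStore() || {};
  return {
    timestamp: new Date().toISOString(),
    level,
    message,
    requestId: store.requestId,
    ...safe,
  };
}

// Emit the record as one JSON line and return it for inspection.
function log(level, fields, message) {
  const record = buildRecord(level, fields, message);
  console.log(JSON.stringify(record));
  return record;
}

// Entry point: reuse an upstream request ID when present, otherwise create one.
function handleRequest(headers, handler) {
  const requestId = headers["x-request-id"] || randomUUID();
  return requestContext.run({ requestId }, handler);
}

const record = handleRequest({ "x-request-id": "req-123" }, () =>
  log("info", { userId: "user-42", action: "checkout", password: "hunter2" }, "Checkout started")
);
```

Because the request ID lives in `AsyncLocalStorage`, code deep in the call stack can log with the right ID without threading it through every function signature.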

Metrics

| Metric Type | Use For | Examples |
| --- | --- | --- |
| Counter | Events that only increase | http_requests_total, errors_total, orders_placed |
| Gauge | Values that go up/down | active_connections, queue_depth, cache_size |
| Histogram | Distributions (latency, size) | request_duration_seconds, response_size_bytes |

Key metrics to instrument:

  • Request rate, error rate, duration (RED method)
  • Saturation (queue depth, connection pool usage)
  • Business metrics (signups, purchases, API calls by endpoint)
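
To make the three metric types and the RED method concrete, here is a hand-rolled, in-memory sketch. It is not the API of prom-client or any other library: the class and function names are hypothetical, the bucket bounds are arbitrary, the histogram keeps per-bucket counts rather than the cumulative counts Prometheus exposes, and `instrument` only handles synchronous handlers.

```javascript
class Counter {
  constructor() { this.value = 0; }
  inc(amount = 1) {
    if (amount < 0) throw new RangeError("counters only increase");
    this.value += amount;
  }
}

class Gauge {
  constructor() { this.value = 0; }
  inc(amount = 1) { this.value += amount; }
  dec(amount = 1) { this.value -= amount; }
}

class Histogram {
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    // One slot per bound plus a final overflow slot for larger values.
    this.bucketCounts = new Array(this.upperBounds.length + 1).fill(0);
    this.sum = 0;
    this.count = 0;
  }
  observe(value) {
    let i = this.upperBounds.findIndex((bound) => value <= bound);
    if (i === -1) i = this.upperBounds.length;
    this.bucketCounts[i]++;
    this.sum += value;
    this.count++;
  }
}

const requestsTotal = new Counter();
const errorsTotal = new Counter();
const inFlight = new Gauge();
const durationSeconds = new Histogram([0.05, 0.1, 0.5, 1]);

// Wrap a synchronous handler to record rate, errors, and duration (RED).
function instrument(handler) {
  return (...args) => {
    requestsTotal.inc();
    inFlight.inc();
    const start = process.hrtime.bigint();
    try {
      return handler(...args);
    } catch (err) {
      errorsTotal.inc();
      throw err;
    } finally {
      inFlight.dec();
      durationSeconds.observe(Number(process.hrtime.bigint() - start) / 1e9);
    }
  };
}

const ok = instrument(() => "ok");
const failing = instrument(() => { throw new Error("boom"); });
ok();
try { failing(); } catch (_) { /* expected failure, counted in errorsTotal */ }
console.log({ requests: requestsTotal.value, errors: errorsTotal.value, inFlight: inFlight.value });
```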

Tracing

For distributed systems, add tracing to follow requests across services:

| Concept | Purpose |
| --- | --- |
| Trace | End-to-end journey of a request |
| Span | Single operation within a trace (DB query, HTTP call, function) |
| Context propagation | Pass trace context (trace ID and parent span ID) in headers between services |

When to add spans:

  • External calls (HTTP, gRPC, DB, cache, queue)
  • Significant internal operations (batch processing, complex calculations)
  • Entry points (API handlers, queue consumers)
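
The relationship between traces, spans, and context propagation can be sketched without any tracing library. The code below is a simplified illustration, not the OpenTelemetry API: the helper names are hypothetical, the header follows the W3C `traceparent` layout (`version-traceid-parentid-flags`), and the parser skips some validity rules of that specification, such as rejecting all-zero IDs.

```javascript
const { randomBytes } = require("node:crypto");

const finishedSpans = []; // stand-in for an exporter

// Start a span; a child inherits the trace ID and records its parent's span ID.
function startSpan(name, parent) {
  return {
    name,
    traceId: parent ? parent.traceId : randomBytes(16).toString("hex"),
    spanId: randomBytes(8).toString("hex"),
    parentSpanId: parent ? parent.spanId : null,
    startNs: process.hrtime.bigint(),
  };
}

function endSpan(span) {
  span.durationNs = process.hrtime.bigint() - span.startNs;
  finishedSpans.push(span);
}

// Serialize the span's context into a traceparent header value (sampled flag set).
function toTraceparent(span) {
  return `00-${span.traceId}-${span.spanId}-01`;
}

// Parse an incoming header into a parent context, or null when absent or malformed.
function fromTraceparent(header) {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$/.exec(header || "");
  return match ? { traceId: match[1], spanId: match[2] } : null;
}

// Service A: an entry-point span plus a child span around an outgoing call.
const entry = startSpan("GET /checkout");
const outgoing = startSpan("HTTP POST payments", entry);
const headers = { traceparent: toTraceparent(outgoing) };
endSpan(outgoing);
endSpan(entry);

// Service B: continue the same trace from the incoming header.
const remoteParent = fromTraceparent(headers.traceparent);
const serverSpan = startSpan("POST /payments", remoteParent);
endSpan(serverSpan);

console.log(finishedSpans.map((s) => `${s.name} trace=${s.traceId}`).join("\n"));
```

All three spans share one trace ID, which is what lets a tracing backend stitch the two services' work into a single end-to-end view.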

Instrumentation by Ecosystem

| Ecosystem | Logging | Metrics | Tracing |
| --- | --- | --- | --- |
| Node | pino, winston | prom-client | @opentelemetry/sdk-node |
| Python | structlog, logging | prometheus_client | opentelemetry-sdk |
| Go | zap, zerolog | prometheus/client_golang | go.opentelemetry.io/otel |
| Rust | tracing, slog | prometheus crate | opentelemetry crate |

6. Commands (by ecosystem)

| Ecosystem | Profile | Benchmark |
| --- | --- | --- |
| Node | node --inspect, Chrome DevTools | benchmark, built-in perf_hooks |
| Python | python -m cProfile, py-spy | pytest-benchmark, timeit |
| Go | go test -cpuprofile, pprof | go test -bench |
| Rust | cargo flamegraph, perf | cargo bench, criterion |

Checklist

  • Bottleneck identified with data (profile or benchmark), not a guess.
  • Change targets the hot path or dominant cost.
  • Improvement measured; tests still pass.
  • Trade-offs (e.g. readability, memory) noted when relevant.
  • Observability added: structured logging, key metrics, tracing for distributed calls.
  • No sensitive data in logs or metrics; correlation IDs propagated.