AgentSkillsCN

build-jvm-analysis

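JVM runtime analysis. Use when: analyzing JVM performance, tuning garbage collection, debugging memory leaks, or finding dead code with production data. Not for general Java development or syntax questions.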

SKILL.md
--- frontmatter
name: build-jvm-analysis
description: "JVM runtime analysis. Use when: profiling JVM performance, tuning GC, debugging memory leaks, finding dead code with production data. Not for general Java development or syntax questions."

JVM Analysis

Patterns for analyzing, optimizing, and debugging JVM applications — both static and runtime.

Philosophy

  1. Measure before tuning — profile first, optimize based on evidence
  2. Static + runtime complement — static finds structural issues, runtime finds behavioral issues
  3. Production-safe tooling — low overhead profilers that won't crash or slow production
  4. Understand tool limitations — safepoint bias, entry point coverage, what tools can't see
  5. Right tool for the job — no single tool does everything

Static Analysis

Dead Code Detection

code
Finding unused code?
├── Have production traffic → Runtime analysis (Scavenger)
├── Need static-only analysis →
│   ├── Simple → ProGuard -printusage
│   └── Custom analysis → SootUp or ProGuard Core
└── Just bug patterns → SpotBugs, Error Prone

| Tool | Type | What It Does | When to Use |
| --- | --- | --- | --- |
| Scavenger | Runtime | Tracks actual usage in production | Have production data |
| ProGuard -printusage | Static | Lists unreachable from entry points | Know your entry points |
| SootUp | Library | Call graphs, data flow analysis | Building custom analysis |
| ProGuard Core | Library | Bytecode analysis primitives | Building custom tools |
| Dead Code Agent | Runtime | Tracks class loading | Quick prototype |

Key insight: No turnkey "run this, get dead code" CLI exists. You either:

  • Deploy runtime agents (needs production) — Scavenger
  • Configure static analysis (needs entry points) — ProGuard
  • Build custom tooling (needs engineering) — SootUp

SootUp for Code Analysis

SootUp provides call graph construction and analysis primitives. Useful for:

  • Building reachability analysis (find what's called from entry points)
  • Data flow analysis (track how values propagate)
  • Dependency mapping (what calls what)

(Karakaya et al., TACAS 2024)

Gotcha: SootUp is a library, not a tool. You build analysis on top of it.
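
The reachability analysis you would build on top of such a library boils down to a graph walk. The sketch below deliberately avoids the SootUp API and assumes the call graph has already been exported as a plain map from caller to callees; the `Reachability` class and the method-name strings are illustrative only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class Reachability {
    // Breadth-first walk over caller -> callees edges, starting at the entry points.
    static Set<String> reachable(Map<String, List<String>> callGraph, Set<String> entryPoints) {
        Set<String> seen = new HashSet<>(entryPoints);
        Deque<String> queue = new ArrayDeque<>(entryPoints);
        while (!queue.isEmpty()) {
            String method = queue.poll();
            for (String callee : callGraph.getOrDefault(method, List.of())) {
                if (seen.add(callee)) { // add returns false if the method was already visited
                    queue.add(callee);
                }
            }
        }
        return seen;
    }

    // Methods that appear in the graph but cannot be reached from any entry point.
    static Set<String> unreachable(Map<String, List<String>> callGraph, Set<String> entryPoints) {
        Set<String> all = new TreeSet<>(callGraph.keySet());
        callGraph.values().forEach(all::addAll);
        all.removeAll(reachable(callGraph, entryPoints));
        return all;
    }
}
```

The result is only as good as the entry-point list: reflection, dependency injection, and framework callbacks need to be added as entry points or they will be reported as unreachable.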

Bug Detection (Not Dead Code)

ToolFocusIntegration
SpotBugsBug patterns (400+)Gradle/Maven, CI
Error ProneCompile-time checksjavac plugin
NullAwayNull safetyError Prone plugin

Runtime Analysis

Profiler Selection

code
Need production profiling?
├── YES → CPU or memory?
│   ├── CPU → Need flame graphs?
│   │   ├── YES → async-profiler
│   │   └── NO → JFR (built-in, zero config)
│   └── Memory → JFR (allocation profiling)
└── NO (development only) → VisualVM or IntelliJ Profiler

| Profiler | Overhead | Safepoint-Free | Output | Tradeoff | Best For |
| --- | --- | --- | --- | --- | --- |
| async-profiler | ~2% CPU | Yes | Flame graphs, JFR | Requires native agent attachment | Production CPU/allocation |
| JFR + JMC | ~1-2% | Partial (improved Java 16+) | Binary events | Less granular CPU data | Continuous monitoring |
| VisualVM | 5-10% | No | Various | Safepoint bias distorts results | Development only |
| IntelliJ Profiler | ~2% | Yes (uses async-profiler) | Flame graphs | IDE dependency | IDE-integrated |

(InfoQ 2025)

Why safepoint-free matters: JVM can only safely inspect threads at safepoints. JVMTI-based profilers (VisualVM, hprof) miss code between safepoints, skewing flame graphs toward safepoint-heavy code. async-profiler uses AsyncGetCallTrace to sample anytime (Wakart 2016).
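
Because JFR is built into the JDK, a recording can also be started from inside the application through the `jdk.jfr` API. The following is a minimal sketch, assuming a JDK 11+ runtime with the `jdk.jfr` module available; the `JfrSketch` class, the 20 ms sampling period, and the temp-file location are illustrative choices, not recommendations.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;

class JfrSketch {
    // Records CPU execution samples while the workload runs, then dumps them to a temp file.
    static Path record(Runnable workload) {
        try (Recording recording = new Recording()) {
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
            recording.start();
            workload.run();
            recording.stop();
            Path output = Files.createTempFile("sketch-", ".jfr");
            recording.dump(output);
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the execution samples stored in a recording file.
    static long countSamples(Path recordingFile) {
        try {
            return RecordingFile.readAllEvents(recordingFile).stream()
                    .filter(e -> e.getEventType().getName().equals("jdk.ExecutionSample"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting `.jfr` file can be opened in JMC. A very short workload may produce few or no samples, so the count is best treated as a sanity check rather than a measurement.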

Garbage Collector Selection

code
Heap size?
├── < 4 GB → G1 (default since JDK 9)
│   WHY: ZGC/Shenandoah overhead not justified; G1's region-based collection efficient at this scale
├── 4-32 GB → Latency-sensitive?
│   ├── YES → ZGC or Shenandoah
│   │   WHY: Concurrent marking/compaction keeps pauses <10ms regardless of heap size
│   └── NO → G1
│       WHY: G1's mixed collections handle this range well; simpler tuning
└── > 32 GB → ZGC (generational, JDK 21+)
    WHY: ZGC's concurrent compaction scales linearly; G1 pauses grow with heap

| Collector | Pause Target | Heap Size | Tradeoff | JDK | Best For |
| --- | --- | --- | --- | --- | --- |
| G1GC | 200ms (tunable) | Any | Pauses scale with heap | 9+ default | General workloads |
| ZGC | <1ms | Large (100GB+) | ~15% throughput cost vs G1 | 15+ prod, 21+ gen | Latency-critical |
| Shenandoah | <10ms | Large | Higher CPU for barriers | 12+ (Red Hat) | Low-latency, older JDKs |
| Parallel | Max throughput | Medium | Stop-the-world only | All | Batch processing |

Why the thresholds:

  • <4GB: G1's region-based approach (2048 regions default) works well. ZGC's colored pointers and load barriers add overhead not justified at small scale (Oracle GC Tuning Guide).
  • 4-32GB: The "compressed OOPs" boundary. Above 32GB, object pointers expand from 4 to 8 bytes, increasing memory footprint ~20%. ZGC handles this better (Shipilev 2019).
  • >32GB: G1's pause times grow with live set size during mixed collections. ZGC's concurrent compaction maintains <1ms regardless (ZGC wiki, Oracle).
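
The 32GB figure for compressed OOPs follows from simple arithmetic: a 32-bit reference can encode 2^32 offsets, and each offset is scaled by the object alignment, which defaults to 8 bytes. A small sketch of that calculation (the `CompressedOops` class is illustrative only):

```java
class CompressedOops {
    // Largest heap addressable with 32-bit compressed references:
    // 2^32 distinct offsets, each scaled by the object alignment in bytes.
    static long maxCompressedHeapBytes(int objectAlignmentBytes) {
        return (1L << 32) * objectAlignmentBytes;
    }
}
```

With the default 8-byte alignment this gives 2^35 bytes, i.e. 32 GiB. In practice the JVM switches to uncompressed references slightly below that limit, so heaps just over the boundary can hold fewer objects than heaps just under it.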

Key insight: ZGC generational (JDK 21+) closes the throughput gap — concurrent minor collections reduce allocation pressure (JEP 439).

(Oracle GC Tuning Guide, Shipilev JVM Anatomy Quarks, JEP 439)
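
The decision tree above can be encoded directly, which is handy for startup scripts or deployment checks. This is a sketch of the tree as written in this section, not an official sizing rule; the `GcChooser` class and its return strings are hypothetical.

```java
class GcChooser {
    // Mirrors the heap-size decision tree: <4 GB, 4-32 GB, >32 GB.
    static String recommend(double heapGb, boolean latencySensitive) {
        if (heapGb < 4) {
            return "G1"; // concurrent collectors' overhead is not justified at this scale
        }
        if (heapGb <= 32) {
            return latencySensitive ? "ZGC or Shenandoah" : "G1";
        }
        return "ZGC"; // generational ZGC on JDK 21+
    }
}
```

Treat the output as a starting point for measurement: the philosophy of this document still applies, so confirm the choice with GC logs from a representative workload.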

Heap Dump Analysis

code
OOM or suspected leak?
├── Capture dump → -XX:+HeapDumpOnOutOfMemoryError
├── Analyze → Eclipse MAT or HeapHero
│   ├── Run "Leak Suspects" report
│   ├── Check retained heap (not just shallow)
│   └── Path to GC Roots (exclude weak refs)
└── Fix → Collections holding references, static fields, caches without eviction
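
For the "caches without eviction" case, one common fix is to bound the cache. A minimal sketch using `LinkedHashMap` in access order follows; the `BoundedCache` name is hypothetical, and the class is not thread-safe, so a concurrent service would need external synchronization or a dedicated caching library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Least-recently-used cache that never holds more than maxEntries entries.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true keeps entries in least-recently-used order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }
}
```

In a heap dump, an unbounded version of such a map typically shows up as a single collection with a very large retained heap.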

Thread Dump Analysis

code
Application hanging or slow?
├── Capture → jstack -l <pid> (or jcmd <pid> Thread.print)
├── Take 3-5 dumps seconds apart
├── Analyze:
│   ├── BLOCKED threads → lock contention
│   ├── WAITING on same monitor → bottleneck
│   └── Same stack across dumps → stuck thread
└── Tools: FastThread.io, TDA, or manual grep
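
The "same stack across dumps" check can be automated. The sketch below takes in-process snapshots through `ThreadMXBean` and reports threads whose stack is identical in every snapshot; the `StuckThreads` class and the string rendering of stacks are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class StuckThreads {
    // One snapshot: "name#id" -> full stack trace rendered as a single string.
    static Map<String, String> snapshot() {
        Map<String, String> stacks = new HashMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            stacks.put(info.getThreadName() + "#" + info.getThreadId(),
                    Arrays.toString(info.getStackTrace()));
        }
        return stacks;
    }

    // Threads whose non-empty stack is identical in every snapshot.
    static Set<String> unchangedAcross(List<Map<String, String>> snapshots) {
        Set<String> stuck = new TreeSet<>();
        if (snapshots.isEmpty()) {
            return stuck;
        }
        for (Map.Entry<String, String> first : snapshots.get(0).entrySet()) {
            boolean same = !first.getValue().equals("[]");
            for (Map<String, String> later : snapshots) {
                same &= first.getValue().equals(later.get(first.getKey()));
            }
            if (same) {
                stuck.add(first.getKey());
            }
        }
        return stuck;
    }
}
```

Idle pool threads parked on a queue also keep the same stack, so the result is a list of candidates to inspect, not a verdict.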

Production Gotchas

Safepoint Bias

  • Trap: Traditional profilers (VisualVM, hprof) only sample at safepoints
  • Impact: Misleading flame graphs — hot spots skewed to safepoint-heavy code
  • Detection: Compare async-profiler output vs traditional profiler
  • Fix: Use async-profiler or JFR (Java 16+)

Why it happens: JVM can only safely inspect thread state at safepoints — points where all threads are known to be in a consistent state. JVMTI's GetStackTrace requires this. Safepoints occur at method returns, loop back-edges, and allocation. Tight loops without allocations may run millions of cycles between safepoints, becoming invisible (Wakart 2015).

(Wakart 2015, async-profiler docs)

DebugNonSafepoints Flag

  • Trap: Even async-profiler needs -XX:+DebugNonSafepoints for accurate frame resolution
  • Impact: Inlined methods may not appear in profiles
  • Fix: Start JVM with flag, or attach agent early
bash
# At JVM start (recommended)
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -agentpath:/path/to/libasyncProfiler.so ...

# Late attach works but misses already-compiled methods
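
A service can verify at startup that it was launched with the flag by inspecting its own JVM arguments. This is a minimal sketch, assuming the flag is passed on the command line; the `ProfilingFlags` class is hypothetical, and the check does not detect cases where a tool enables the flag after startup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class ProfilingFlags {
    // Scans JVM arguments; the last occurrence of the flag wins, as on the command line.
    static boolean debugNonSafepointsEnabled(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+DebugNonSafepoints")) {
                enabled = true;
            } else if (arg.equals("-XX:-DebugNonSafepoints")) {
                enabled = false;
            }
        }
        return enabled;
    }

    // Convenience overload that reads the arguments of the running JVM.
    static boolean debugNonSafepointsEnabled() {
        return debugNonSafepointsEnabled(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Logging the result once at startup makes it easy to tell later whether a profile taken from that process can be trusted for inlined frames.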

Container Memory Limits (OOMKilled)

  • Trap: JVM uses more memory than -Xmx (native memory, metaspace, stacks, codecache)
  • Impact: Kubernetes kills pod with OOMKilled even when heap looks fine
  • Detection: kubectl describe pod shows OOMKilled; native memory tracking shows usage
  • Fix: Budget 25-30% of container memory for non-heap
bash
# Good: percentage-based, container-aware
java -XX:MaxRAMPercentage=75.0 -XX:+UseContainerSupport ...

# Budget breakdown for 2GB container:
# Heap: ~1.5GB (75%)
# Metaspace: ~100MB (default MaxMetaspaceSize unbounded, set explicitly)
# Thread stacks: ~100MB (100 threads × 1MB default Linux stack)
# CodeCache: ~50MB (240MB reserved, typically uses ~50MB)
# Native/JNI: ~150MB buffer (JDBC drivers, compression libs, etc.)

Why 25-30%: Empirical guidance from production incidents. Exact overhead depends on workload — NMT (Native Memory Tracking) gives precise breakdown for your app (Schatzl, Oracle GC team). Spring Boot apps with web frameworks often need closer to 30%; minimal services can use 20% (Datadog 2024).

(JEP 345, Datadog JVM Container Best Practices)
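
The budget arithmetic above is easy to get wrong by hand, so here is a small sketch that reproduces it. The `ContainerBudget` class and its parameters are hypothetical; the per-component numbers are estimates you would replace with values from NMT for your own application.

```java
class ContainerBudget {
    // Heap size in MB that -XX:MaxRAMPercentage yields for a given container limit.
    static long heapMb(long containerMb, double maxRamPercentage) {
        return Math.round(containerMb * maxRamPercentage / 100.0);
    }

    // Memory left over for metaspace, thread stacks, code cache, and native allocations.
    static long nonHeapMb(long containerMb, double maxRamPercentage) {
        return containerMb - heapMb(containerMb, maxRamPercentage);
    }

    // True if the estimated non-heap components fit into what the heap leaves free.
    static boolean fits(long containerMb, double maxRamPercentage, long metaspaceMb,
                        int threads, long stackMbPerThread, long codeCacheMb, long nativeMb) {
        long needed = metaspaceMb + threads * stackMbPerThread + codeCacheMb + nativeMb;
        return needed <= nonHeapMb(containerMb, maxRamPercentage);
    }
}
```

For the 2GB example, `heapMb(2048, 75.0)` is 1536 and the remaining 512 MB covers the roughly 400 MB of estimated non-heap usage. Tripling the thread count to 300 pushes the estimate to 600 MB, which no longer fits.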

Heap Dump Performance Impact

  • Trap: Heap dumps pause the JVM during capture
  • Impact: Pause duration depends on heap size and live objects — observed 1-3 seconds per GB in production (SAP Memory Analyzer docs, Eclipse MAT wiki)
  • Fix: Use continuous profiling (JFR) for allocation tracking; reserve dumps for post-mortem or during maintenance windows

Why the pause: Full GC + heap traversal + I/O. The JVM must walk all live objects to write the dump. Parallel GC can speed traversal but I/O often dominates (Eclipse MAT FAQ).

JMH Microbenchmark Pitfalls

  • Trap: Dead code elimination, constant folding, insufficient warmup
  • Impact: Benchmarks show 10x faster than production
  • Detection: Results too good to be true; -XX:+PrintCompilation shows unexpected inlining
  • Fix: Use Blackhole.consume(), @State objects, sufficient warmup
java
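import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class ComputeBenchmark {
    double x = 42.0; // non-final state field, so the input cannot be constant-folded

    // Wrong: the result is discarded, so the JIT may remove the call
    @Benchmark
    public void bad() {
        compute();
    }

    // Correct: Blackhole prevents dead code elimination
    @Benchmark
    public void good(Blackhole bh) {
        bh.consume(compute());
    }

    // Placeholder workload; substitute the code under test
    double compute() { return Math.log(x); }
}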

Tool Selection by Scenario

| Scenario | Primary Tool | Alternative |
| --- | --- | --- |
| Production CPU profile | async-profiler | JFR |
| Allocation hotspots | JFR | async-profiler -e alloc |
| Memory leak | Heap dump + MAT | HeapHero |
| Deadlock | jstack -l | JMC thread analysis |
| GC issues | GC logs + GCViewer | JFR |
| Container sizing | NMT + metrics | VisualVM (dev) |
| Microbenchmarks | JMH | (no alternative) |

Quick Reference

Profiling Commands

bash
# async-profiler CPU flame graph (2.x launcher; 3.x ships asprof with the same options)
./profiler.sh -d 30 -f flamegraph.html <pid>

# JFR recording (low overhead while recording, roughly 1-2% with default settings)
jcmd <pid> JFR.start duration=60s filename=recording.jfr

# Heap dump
jcmd <pid> GC.heap_dump /path/to/dump.hprof

# Thread dump
jcmd <pid> Thread.print > threads.txt

# Native memory tracking
java -XX:NativeMemoryTracking=summary ...
jcmd <pid> VM.native_memory summary

GC Flags

bash
# G1 (default, balanced)
-XX:+UseG1GC -XX:MaxGCPauseMillis=200

# ZGC (ultra-low latency; generational mode needs -XX:+ZGenerational on JDK 21-22, default from JDK 23)
-XX:+UseZGC

# Shenandoah (low-latency, older JDKs)
-XX:+UseShenandoahGC

# Diagnostics
-Xlog:gc*:file=gc.log:time,tags
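
To confirm from inside the process which collector the flags actually selected, the standard management beans can be queried. A minimal sketch follows; the `GcStats` class and the output format are illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStats {
    // One line per collector bean: name, number of collections, accumulated time.
    static List<String> summarize() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }
}
```

The bean names identify the active collector, and the counters give a coarse view of GC activity. For pause-level detail, the unified GC log configured above remains the better source.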

Container Flags

bash
# Production container setup
-XX:+UseContainerSupport \
-XX:MaxRAMPercentage=75.0 \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/dumps/ \
-XX:+ExitOnOutOfMemoryError

Specialized References

Load reference based on context:

| Detected | Load |
| --- | --- |
| Profiling, flame graphs, sampling | profiling.md |
| GC tuning, pause times, heap sizing | gc.md |
| Memory leaks, thread dumps, OOM | debugging.md |
| Kubernetes, containers, cgroups | containers.md |

Obsolete Patterns

| Obsolete | Replacement | Why |
| --- | --- | --- |
| -XX:+PrintGCDetails | -Xlog:gc* | Unified logging (JDK 9+) |
| VisualVM for production | async-profiler / JFR | Safepoint bias, overhead |
| Manual -Xmx in containers | MaxRAMPercentage | Container-aware |
| jmap -heap | jcmd GC.heap_info | jcmd preferred |
| hprof | JFR | hprof agent removed in JDK 9 |
| CMS | G1 or ZGC | CMS removed in JDK 14 |

Anti-Patterns

| Don't | Do | Why |
| --- | --- | --- |
| Profile with default VisualVM in prod | Use async-profiler or JFR | Safepoint bias, overhead |
| Set -Xmx = container limit | Leave 25-30% for non-heap | OOMKilled by cgroup |
| Trust microbenchmarks naively | Use JMH properly | JIT optimizations mislead |
| Tune GC without measuring | Profile first, tune second | Premature optimization |
| Use -XX:+PrintGCDetails (JDK 9+) | Use unified logging -Xlog:gc* | Old flags deprecated |
| Ignore safepoint bias | Check -XX:+DebugNonSafepoints | Hidden hot spots |