JVM Analysis
Patterns for analyzing, optimizing, and debugging JVM applications — both static and runtime.
Philosophy
- •Measure before tuning — profile first, optimize based on evidence
- •Static + runtime complement — static finds structural issues, runtime finds behavioral issues
- •Production-safe tooling — low overhead profilers that won't crash or slow production
- •Understand tool limitations — safepoint bias, entry point coverage, what tools can't see
- •Right tool for the job — no single tool does everything
Static Analysis
Dead Code Detection
Finding unused code? ├── Have production traffic → Runtime analysis (Scavenger) ├── Need static-only analysis → │ ├── Simple → ProGuard -printusage │ └── Custom analysis → SootUp or ProGuard Core └── Just bug patterns → SpotBugs, Error Prone
| Tool | Type | What It Does | When to Use |
|---|---|---|---|
| Scavenger | Runtime | Tracks actual usage in production | Have production data |
| ProGuard -printusage | Static | Lists unreachable from entry points | Know your entry points |
| SootUp | Library | Call graphs, data flow analysis | Building custom analysis |
| ProGuard Core | Library | Bytecode analysis primitives | Building custom tools |
| Dead Code Agent | Runtime | Tracks class loading | Quick prototype |
Key insight: No turnkey "run this, get dead code" CLI exists. You either:
- •Deploy runtime agents (needs production) — Scavenger
- •Configure static analysis (needs entry points) — ProGuard
- •Build custom tooling (needs engineering) — SootUp
SootUp for Code Analysis
SootUp provides call graph construction and analysis primitives. Useful for:
- •Building reachability analysis (find what's called from entry points)
- •Data flow analysis (track how values propagate)
- •Dependency mapping (what calls what)
(Karakaya et al., TACAS 2024)
Gotcha: SootUp is a library, not a tool. You build analysis on top of it.
Bug Detection (Not Dead Code)
| Tool | Focus | Integration |
|---|---|---|
| SpotBugs | Bug patterns (400+) | Gradle/Maven, CI |
| Error Prone | Compile-time checks | javac plugin |
| NullAway | Null safety | Error Prone plugin |
Runtime Analysis
Profiler Selection
Need production profiling? ├── YES → CPU or memory? │ ├── CPU → Need flame graphs? │ │ ├── YES → async-profiler │ │ └── NO → JFR (built-in, zero config) │ └── Memory → JFR (allocation profiling) └── NO (development only) → VisualVM or IntelliJ Profiler
| Profiler | Overhead | Safepoint-Free | Output | Tradeoff | Best For |
|---|---|---|---|---|---|
| async-profiler | ~2% CPU | Yes | Flame graphs, JFR | Requires native agent attachment | Production CPU/allocation |
| JFR + JMC | ~1-2% | Partial (improved Java 16+) | Binary events | Less granular CPU data | Continuous monitoring |
| VisualVM | 5-10% | No | Various | Safepoint bias distorts results | Development only |
| IntelliJ Profiler | ~2% | Yes (uses async-profiler) | Flame graphs | IDE dependency | IDE-integrated |
(InfoQ 2025)
Why safepoint-free matters: JVM can only safely inspect threads at safepoints. JVMTI-based profilers (VisualVM, hprof) miss code between safepoints, skewing flame graphs toward safepoint-heavy code. async-profiler uses AsyncGetCallTrace to sample anytime (Wakart 2016).
Garbage Collector Selection
Heap size?
├── < 4 GB → G1 (default since JDK 9)
│ WHY: ZGC/Shenandoah overhead not justified; G1's region-based collection efficient at this scale
├── 4-32 GB → Latency-sensitive?
│ ├── YES → ZGC or Shenandoah
│ │ WHY: Concurrent marking/compaction keeps pauses <10ms regardless of heap size
│ └── NO → G1
│ WHY: G1's mixed collections handle this range well; simpler tuning
└── > 32 GB → ZGC (generational, JDK 21+)
WHY: ZGC's concurrent compaction scales linearly; G1 pauses grow with heap
| Collector | Pause Target | Heap Size | Tradeoff | JDK | Best For |
|---|---|---|---|---|---|
| G1GC | 200ms (tunable) | Any | Pauses scale with heap | 9+ default | General workloads |
| ZGC | <1ms | Large (100GB+) | ~15% throughput cost vs G1 | 15+ prod, 21+ gen | Latency-critical |
| Shenandoah | <10ms | Large | Higher CPU for barriers | 12+ (Red Hat) | Low-latency, older JDKs |
| Parallel | Max throughput | Medium | Stop-the-world only | All | Batch processing |
Why the thresholds:
- •<4GB: G1's region-based approach (2048 regions default) works well. ZGC's colored pointers and load barriers add overhead not justified at small scale (Oracle GC Tuning Guide).
- •4-32GB: The "compressed OOPs" boundary. Above 32GB, object pointers expand from 4 to 8 bytes, increasing memory footprint ~20%. ZGC handles this better (Shipilev 2019).
- •>32GB: G1's pause times grow with live set size during mixed collections. ZGC's concurrent compaction maintains <1ms regardless (ZGC wiki, Oracle).
Key insight: ZGC generational (JDK 21+) closes the throughput gap — concurrent minor collections reduce allocation pressure (JEP 439).
(Oracle GC Tuning Guide, Shipilev JVM Anatomy Quarks, JEP 439)
Heap Dump Analysis
OOM or suspected leak? ├── Capture dump → -XX:+HeapDumpOnOutOfMemoryError ├── Analyze → Eclipse MAT or HeapHero │ ├── Run "Leak Suspects" report │ ├── Check retained heap (not just shallow) │ └── Path to GC Roots (exclude weak refs) └── Fix → Collections holding references, static fields, caches without eviction
Thread Dump Analysis
Application hanging or slow? ├── Capture → jstack -l <pid> (or jcmd <pid> Thread.print) ├── Take 3-5 dumps seconds apart ├── Analyze: │ ├── BLOCKED threads → lock contention │ ├── WAITING on same monitor → bottleneck │ └── Same stack across dumps → stuck thread └── Tools: FastThread.io, TDA, or manual grep
Production Gotchas
Safepoint Bias
- •Trap: Traditional profilers (VisualVM, hprof) only sample at safepoints
- •Impact: Misleading flame graphs — hot spots skewed to safepoint-heavy code
- •Detection: Compare async-profiler output vs traditional profiler
- •Fix: Use async-profiler or JFR (Java 16+ with JEP 376)
Why it happens: JVM can only safely inspect thread state at safepoints — points where all threads are known to be in a consistent state. JVMTI's GetStackTrace requires this. Safepoints occur at method returns, loop back-edges, and allocation. Tight loops without allocations may run millions of cycles between safepoints, becoming invisible (Wakart 2015).
(Wakart 2015, async-profiler docs)
DebugNonSafepoints Flag
- •Trap: Even async-profiler needs
-XX:+DebugNonSafepointsfor accurate frame resolution - •Impact: Inlined methods may not appear in profiles
- •Fix: Start JVM with flag, or attach agent early
# At JVM start (recommended) java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -agentpath:/path/to/libasyncProfiler.so ... # Late attach works but misses already-compiled methods
Container Memory Limits (OOMKilled)
- •Trap: JVM uses more memory than
-Xmx(native memory, metaspace, stacks, codecache) - •Impact: Kubernetes kills pod with OOMKilled even when heap looks fine
- •Detection:
kubectl describe podshows OOMKilled; native memory tracking shows usage - •Fix: Budget 25-30% of container memory for non-heap
# Good: percentage-based, container-aware java -XX:MaxRAMPercentage=75.0 -XX:+UseContainerSupport ... # Budget breakdown for 2GB container: # Heap: ~1.5GB (75%) # Metaspace: ~100MB (default MaxMetaspaceSize unbounded, set explicitly) # Thread stacks: ~100MB (100 threads × 1MB default Linux stack) # CodeCache: ~50MB (240MB reserved, typically uses ~50MB) # Native/JNI: ~150MB buffer (JDBC drivers, compression libs, etc.)
Why 25-30%: Empirical guidance from production incidents. Exact overhead depends on workload — NMT (Native Memory Tracking) gives precise breakdown for your app (Schatzl, Oracle GC team). Spring Boot apps with web frameworks often need closer to 30%; minimal services can use 20% (Datadog 2024).
(JEP 345, Datadog JVM Container Best Practices)
Heap Dump Performance Impact
- •Trap: Heap dumps pause the JVM during capture
- •Impact: Pause duration depends on heap size and live objects — observed 1-3 seconds per GB in production (SAP Memory Analyzer docs, Eclipse MAT wiki)
- •Fix: Use continuous profiling (JFR) for allocation tracking; reserve dumps for post-mortem or during maintenance windows
Why the pause: Full GC + heap traversal + I/O. The JVM must walk all live objects to write the dump. Parallel GC can speed traversal but I/O often dominates (Eclipse MAT FAQ).
JMH Microbenchmark Pitfalls
- •Trap: Dead code elimination, constant folding, insufficient warmup
- •Impact: Benchmarks show 10x faster than production
- •Detection: Results too good to be true;
-XX:+PrintCompilationshows unexpected inlining - •Fix: Use
Blackhole.consume(),@Stateobjects, sufficient warmup
// Wrong: JIT may eliminate this
@Benchmark
public void bad() {
compute(); // No side effects, may be removed
}
// Correct: Blackhole prevents DCE
@Benchmark
public void good(Blackhole bh) {
bh.consume(compute());
}
Tool Selection by Scenario
| Scenario | Primary Tool | Alternative |
|---|---|---|
| Production CPU profile | async-profiler | JFR |
| Allocation hotspots | JFR | async-profiler --alloc |
| Memory leak | Heap dump + MAT | HeapHero |
| Deadlock | jstack -l | JMC thread analysis |
| GC issues | GC logs + GCViewer | JFR |
| Container sizing | NMT + metrics | VisualVM (dev) |
| Microbenchmarks | JMH | (no alternative) |
Quick Reference
Profiling Commands
# async-profiler CPU flame graph ./profiler.sh -d 30 -f flamegraph.html <pid> # JFR recording (no overhead until dump) jcmd <pid> JFR.start duration=60s filename=recording.jfr # Heap dump jcmd <pid> GC.heap_dump /path/to/dump.hprof # Thread dump jcmd <pid> Thread.print > threads.txt # Native memory tracking java -XX:NativeMemoryTracking=summary ... jcmd <pid> VM.native_memory summary
GC Flags
# G1 (default, balanced) -XX:+UseG1GC -XX:MaxGCPauseMillis=200 # ZGC (ultra-low latency, JDK 21+ generational default) -XX:+UseZGC # Shenandoah (low-latency, older JDKs) -XX:+UseShenandoahGC # Diagnostics -Xlog:gc*:file=gc.log:time,tags
Container Flags
# Production container setup -XX:+UseContainerSupport \ -XX:MaxRAMPercentage=75.0 \ -XX:+HeapDumpOnOutOfMemoryError \ -XX:HeapDumpPath=/dumps/ \ -XX:+ExitOnOutOfMemoryError
Specialized References
Load reference based on context:
| Detected | Load |
|---|---|
| Profiling, flame graphs, sampling | profiling.md |
| GC tuning, pause times, heap sizing | gc.md |
| Memory leaks, thread dumps, OOM | debugging.md |
| Kubernetes, containers, cgroups | containers.md |
Obsolete Patterns
| Obsolete | Replacement | Why |
|---|---|---|
-XX:+PrintGCDetails | -Xlog:gc* | Unified logging (JDK 9+) |
| VisualVM for production | async-profiler / JFR | Safepoint bias, overhead |
Manual -Xmx in containers | MaxRAMPercentage | Container-aware |
jmap -heap | jcmd GC.heap_info | jcmd preferred |
| hprof | JFR | hprof removed JDK 9+ |
| CMS | G1 or ZGC | CMS removed JDK 14 |
Anti-Patterns
| Don't | Do | Why |
|---|---|---|
| Profile with default VisualVM in prod | Use async-profiler or JFR | Safepoint bias, overhead |
Set -Xmx = container limit | Leave 25-30% for non-heap | OOMKilled by cgroup |
| Trust microbenchmarks naively | Use JMH properly | JIT optimizations mislead |
| Tune GC without measuring | Profile first, tune second | Premature optimization |
Use -XX:+PrintGCDetails (JDK 9+) | Use unified logging -Xlog:gc* | Old flags deprecated |
| Ignore safepoint bias | Check -XX:+DebugNonSafepoints | Hidden hot spots |