Java Performance Profiling (JFR + async-profiler workflow)

Intent

This skill gives you a repeatable profiling loop:

•Confirm the symptom and isolate a reproducible scenario
•Capture evidence with the lowest-risk tool first (JFR)
•Analyze: CPU hot paths, allocation hotspots, lock contention, thread states
•Make a small fix
•Re-measure and produce a concise report

Goal: ship performance fixes with confidence, not guesswork.

Scope

In scope

•CPU profiling (sampling)
•Allocation profiling (where memory is allocated)
•Lock contention analysis
•Thread state analysis (runnable/blocked/waiting)
•JFR capture and analysis
•async-profiler-style flamegraph workflow (conceptual + practical guidance)
•Verification via A/B load tests

Out of scope

•Microbenchmark-only optimization (use JMH skill if needed)
•OS kernel perf tuning
•Full distributed tracing diagnosis (covered by observability skill)

When to use

Triggers:

•High CPU usage in production
•p95/p99 latency spikes
•throughput regression after release
•frequent GC due to allocation storms
•lock contention symptoms (threads blocked, long waits)

Required inputs (context to attach in Cursor)

•The hot endpoints / services
•Recent changes (PR diff)
•Baseline metrics dashboards (latency, CPU, GC, error rate)
•Deployment constraints:
  - can you run JFR in prod?
  - can you attach profilers?
  - can you reproduce in staging?

Safety-first tool selection

Use tools in this order:

•Metrics + logs + traces to locate where to profile
•JFR for low-overhead profiling designed for production use
•async-profiler for deeper CPU/alloc/lock signals when allowed
•Heap dump / GC deep dive if a memory leak is suspected

Procedure (step-by-step)

Step 1 — Write a profiling question (avoid fishing)

Examples:

•“Which methods consume the most CPU during /search at p99?”
•“Where are we allocating the most objects per request?”
•“Which lock is contended during peak traffic?”

Deliverable: a single profiling question + success metric.

Step 2 — Reproduce and minimize noise

•Use a stable load scenario (same input size, same dataset, same concurrency)
•Pin down:
  - request path
  - concurrency level
  - duration
•Confirm the symptom exists under the scenario.

Deliverable: “Repro recipe” (commands, parameters, expected symptom).
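
A repro recipe can also live as code, so every run uses the same concurrency and duration. The sketch below is a hypothetical closed-loop driver (ReproDriver and its methods are illustrative names, not an existing tool): it calls a Runnable standing in for one request from a fixed number of threads until a deadline, then reports nearest-rank percentiles. For real HTTP endpoints a dedicated load tool is usually the better choice; this only shows which knobs the recipe has to pin down.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical fixed-concurrency, fixed-duration driver for a repro recipe. */
public final class ReproDriver {
    private ReproDriver() {}

    /**
     * Calls `request` from `concurrency` threads in a closed loop until
     * `durationMillis` has elapsed; returns one latency sample (ns) per call.
     */
    public static List<Long> run(Runnable request, int concurrency, long durationMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        List<Future<List<Long>>> workers = new ArrayList<>();
        try {
            for (int i = 0; i < concurrency; i++) {
                workers.add(pool.submit(() -> {
                    List<Long> local = new ArrayList<>();
                    // Compare against the deadline by subtraction, as System.nanoTime() requires.
                    while (System.nanoTime() - deadline < 0) {
                        long start = System.nanoTime();
                        request.run();
                        local.add(System.nanoTime() - start);
                    }
                    return local;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> worker : workers) {
                all.addAll(worker.get());
            }
            return all;
        } catch (ExecutionException e) {
            throw new IllegalStateException("request failed during the load run", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    /** Nearest-rank percentile, p in (0, 100]. */
    public static long percentile(List<Long> samples, double p) {
        if (samples.isEmpty() || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(rank - 1);
    }
}
```

A closed loop like this understates tail latency when the service stalls (coordinated omission), so treat its p99 as a relative A/B signal rather than an absolute SLO number.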

Step 3 — Capture JFR (preferred baseline)

Start a bounded recording:

•short duration (30–300s)
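•include CPU sampling, allocation, lock, and thread events (the JDK's bundled profile settings collect more detail than default, at somewhat higher overhead)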

Capture methods vary:

•jcmd <pid> JFR.start ...
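•JDK Mission Control or the jdk.jfr API (depending on environment); the jfr command-line tool inspects existing recordings (jfr summary, jfr print) rather than starting them
•JVM startup flags (-XX:StartFlightRecording) in a controlled environment (not the first choice)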

Always store:

•recording file name
•time window
•workload parameters
•build/version hash

Deliverable: a JFR recording + metadata.
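
When the workload runs inside a test harness you control, the jdk.jfr API (JDK 11+) can start and dump a recording without jcmd. A minimal sketch, assuming the workload is already bounded (for example, the fixed-duration run from the repro recipe); BoundedJfrCapture is a hypothetical helper name, while Configuration, Recording, and RecordingState are standard jdk.jfr types.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import jdk.jfr.RecordingState;

/** Hypothetical helper: programmatic JFR capture around a bounded workload. */
public final class BoundedJfrCapture {
    private BoundedJfrCapture() {}

    /**
     * Records `workload` with the JDK's bundled "profile" settings and dumps the
     * data to `destination`. The workload itself must be bounded, which keeps
     * the recording short.
     */
    public static Path capture(String name, Runnable workload, Path destination)
            throws IOException, ParseException {
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(profile)) {
            recording.setName(name); // e.g. service + build hash, so the file is traceable
            recording.setToDisk(true);
            recording.start();
            try {
                workload.run();
            } finally {
                if (recording.getState() == RecordingState.RUNNING) {
                    recording.stop();
                }
            }
            // Writes a standard .jfr file that JMC and the jfr tool can open.
            recording.dump(destination);
        }
        return destination;
    }
}
```

Putting the build hash in the recording name and file name covers part of the metadata list above; the time window and workload parameters still need to be written down next to the file.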

Step 4 — Analyze JFR: a structured checklist

When opening the recording in JDK Mission Control (JMC) or other tooling, check:

CPU / Execution

•hottest methods and call stacks
•suspicious loops
•high-cost serialization/deserialization
•regex backtracking hotspots
•logging overhead (string concat, JSON encode)
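
For a quick text view of CPU hot paths outside JMC, the jdk.jfr.consumer API can read the recording directly. This sketch (hypothetical HotMethods) counts jdk.ExecutionSample events by their innermost frame, which approximates "self" CPU per method. It only sees threads running Java code; threads sitting in native code are sampled separately as jdk.NativeMethodSample and will not show up here.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: ranks methods by "self" CPU samples in a JFR file. */
public final class HotMethods {
    private HotMethods() {}

    public static Map<String, Long> selfSamples(Path jfrFile) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        // Stream events instead of readAllEvents so large recordings fit in memory.
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                    continue;
                }
                RecordedStackTrace stack = event.getStackTrace();
                if (stack == null || stack.getFrames().isEmpty()) {
                    continue;
                }
                // Frame 0 is the innermost frame: the method executing when sampled.
                RecordedMethod method = stack.getFrames().get(0).getMethod();
                counts.merge(method.getType().getName() + "." + method.getName(), 1L, Long::sum);
            }
        }
        return counts;
    }

    /** The `limit` keys with the highest counts, highest first; ties broken by name. */
    public static List<String> top(Map<String, Long> counts, int limit) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```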

Allocation

•top allocation sites (per request)
•big object arrays, large maps/lists
•repeated parsing of the same data
•missing caching for derived results
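
Allocation sites can be ranked the same way from jdk.ObjectAllocationSample events (JDK 16+), whose weight field is JFR's estimate of the bytes each sample stands for. A sketch under those assumptions; AllocationSites is a hypothetical name, and the per-request figure needs the request count from your load run.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedMethod;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: estimated bytes per (allocating method -> allocated class). */
public final class AllocationSites {
    private AllocationSites() {}

    public static Map<String, Long> bytesBySite(Path jfrFile) throws IOException {
        Map<String, Long> bytes = new HashMap<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!"jdk.ObjectAllocationSample".equals(event.getEventType().getName())) {
                    continue;
                }
                String site = "<no stack>";
                RecordedStackTrace stack = event.getStackTrace();
                if (stack != null && !stack.getFrames().isEmpty()) {
                    RecordedMethod m = stack.getFrames().get(0).getMethod();
                    site = m.getType().getName() + "." + m.getName();
                }
                RecordedClass allocated = event.getClass("objectClass");
                String type = allocated == null ? "<unknown>" : allocated.getName();
                // "weight" is the sampler's estimate of bytes represented by this sample.
                bytes.merge(site + " -> " + type, event.getLong("weight"), Long::sum);
            }
        }
        return bytes;
    }

    /** Normalizes totals to (integer) bytes per request for a run of `requests` requests. */
    public static Map<String, Long> perRequest(Map<String, Long> totals, long requests) {
        if (requests <= 0) throw new IllegalArgumentException("requests must be positive");
        Map<String, Long> out = new HashMap<>();
        totals.forEach((site, total) -> out.put(site, total / requests));
        return out;
    }
}
```

Frame 0 is often a JDK helper such as Arrays.copyOf; when it is, look one or two frames further up for the application code responsible.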

Locks

•monitors and contended locks
•synchronized blocks with I/O inside
•thread parking patterns
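
Contended monitors and parking appear as jdk.JavaMonitorEnter and jdk.ThreadPark events, recorded only when the wait exceeds the threshold in the JFR settings. This hypothetical ContendedLocks helper totals waiting time per monitor class or park-blocker class, which is usually enough to name the lock to look at first.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import jdk.jfr.consumer.RecordedClass;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: total blocked/parked time per lock class in a JFR file. */
public final class ContendedLocks {
    private ContendedLocks() {}

    public static Map<String, Duration> blockedTime(Path jfrFile) throws IOException {
        Map<String, Duration> total = new HashMap<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                String type = event.getEventType().getName();
                String key;
                if ("jdk.JavaMonitorEnter".equals(type)) {
                    key = "monitor " + nameOf(event.getClass("monitorClass"));
                } else if ("jdk.ThreadPark".equals(type)) {
                    key = "park " + nameOf(event.getClass("parkedClass"));
                } else {
                    continue;
                }
                total.merge(key, event.getDuration(), Duration::plus);
            }
        }
        return total;
    }

    // parkedClass is absent when LockSupport.park was called without a blocker.
    private static String nameOf(RecordedClass c) {
        return c == null ? "<none>" : c.getName();
    }

    /** The `limit` longest-waiting entries as "key N ms", longest first. */
    public static List<String> topByTime(Map<String, Duration> total, int limit) {
        return total.entrySet().stream()
                .sorted(Map.Entry.<String, Duration>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Duration>comparingByKey()))
                .limit(limit)
                .map(e -> e.getKey() + " " + e.getValue().toMillis() + " ms")
                .collect(Collectors.toList());
    }
}
```

Idle pool workers parked while waiting for work tend to dominate the park entries; filter those out before reading the ranking as contention.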

Thread states

•many threads blocked on DB pool?
•many runnable threads competing for CPU?
•many waiting threads due to missing timeouts?
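
The three thread-state questions can also be checked against a live process. A minimal sketch using the standard java.lang.management.ThreadMXBean: it counts threads per Thread.State and groups non-runnable threads by the class of the lock or condition they wait on. ThreadStates is a hypothetical name; it sees platform threads only, and a single snapshot is noisy, so take several a few seconds apart under the same load.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: live thread-state snapshot via ThreadMXBean. */
public final class ThreadStates {
    private ThreadStates() {}

    /** Count of live platform threads per Thread.State. */
    public static Map<Thread.State, Integer> histogram() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Non-runnable threads grouped as "STATE on lock-class" (e.g. a pool's wait condition). */
    public static Map<String, Integer> waitersByLockClass() {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info == null
                    || info.getThreadState() == Thread.State.RUNNABLE
                    || info.getLockInfo() == null) {
                continue; // running, or waiting without a lock (e.g. Thread.sleep)
            }
            String key = info.getThreadState() + " on " + info.getLockInfo().getClassName();
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }
}
```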

Deliverable: “Top 3 findings” with evidence pointers (screen captures or notes).

Step 5 — Use async-profiler-style flamegraphs when permitted

If you can attach a profiler in staging or a safe prod window:

•Generate a CPU flamegraph for the same scenario
•Optionally, allocation or lock flamegraphs (async-profiler provides alloc and lock events; other tools vary)

Rules:

•keep capture windows short
•run during controlled load
•avoid collecting sensitive data
•get explicit approval for production attaches

Deliverable: flamegraph(s) + short interpretation.
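
When attaching async-profiler is not allowed, a CPU flamegraph can still be drawn from the JFR recording. The sketch below (hypothetical FoldedStacks) turns jdk.ExecutionSample stacks into the folded-stack text format, one "root;...;leaf count" line per distinct stack, which flamegraph.pl and similar renderers accept and which matches async-profiler's collapsed output.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

/** Hypothetical helper: JFR CPU samples -> folded stacks for flamegraph tools. */
public final class FoldedStacks {
    private FoldedStacks() {}

    /** Leaf-first stacks of "Class.method" names, one per jdk.ExecutionSample event. */
    public static List<List<String>> cpuStacks(Path jfrFile) throws IOException {
        List<List<String>> stacks = new ArrayList<>();
        try (RecordingFile file = new RecordingFile(jfrFile)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                RecordedStackTrace trace = event.getStackTrace();
                if (!"jdk.ExecutionSample".equals(event.getEventType().getName()) || trace == null) {
                    continue;
                }
                List<String> frames = new ArrayList<>();
                for (RecordedFrame frame : trace.getFrames()) {
                    frames.add(frame.getMethod().getType().getName() + "." + frame.getMethod().getName());
                }
                stacks.add(frames);
            }
        }
        return stacks;
    }

    /** Folds leaf-first stacks into sorted "root;...;leaf count" lines. */
    public static List<String> fold(List<List<String>> leafFirstStacks) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys give stable, diffable output
        for (List<String> stack : leafFirstStacks) {
            if (stack.isEmpty()) {
                continue;
            }
            List<String> rootFirst = new ArrayList<>(stack);
            Collections.reverse(rootFirst);
            counts.merge(String.join(";", rootFirst), 1L, Long::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((stack, n) -> lines.add(stack + " " + n));
        return lines;
    }
}
```

Write the lines to a file and render them with the flamegraph tool of your choice; the sorted output also makes baseline-vs-after diffs readable.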

Step 6 — Convert findings into hypotheses and fixes

For each finding, produce:

•Hypothesis: “X is expensive because Y”
•Fix idea: “Change A to B”
•Expected impact: “Reduce allocations by ~N per request” or “Reduce CPU in method M”
•Risk: correctness risk and rollback plan

Prefer small fixes:

•avoid “rewrite everything”
•isolate changes
•add micro-level regression tests when appropriate

Deliverable: a small PR with a focused perf fix.
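
A typical small, isolated fix from this loop: a CPU profile shows Pattern.compile under a validation method because String.matches compiles its regex on every call. The hypothetical OrderIdCheck below shows the before and after; the pattern and method names are illustrative.

```java
import java.util.regex.Pattern;

/** Hypothetical before/after for a "regex recompiled on every request" finding. */
public final class OrderIdCheck {
    private OrderIdCheck() {}

    // Before: String.matches(regex) compiles a new Pattern on every call.
    static boolean isValidBefore(String id) {
        return id.matches("ORD-[0-9]{8}");
    }

    // After: compile once. Pattern is immutable and thread-safe;
    // Matcher is not, so a new Matcher is created per call.
    private static final Pattern ORDER_ID = Pattern.compile("ORD-[0-9]{8}");

    static boolean isValidAfter(String id) {
        return ORDER_ID.matcher(id).matches();
    }
}
```

Keeping both versions briefly and asserting that they agree on a handful of inputs is the kind of micro-level regression test this step calls for; remove the old version once the PR is verified.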

Step 7 — Verify with the same workload and report

Re-run the same test:

•baseline vs new
•compare p95/p99, throughput, CPU, allocation rate, GC time, errors

Report template:

•Context (service/version)
•Workload
•Baseline metrics
•Changes
•After metrics
•Evidence (JFR/flamegraph deltas)
•Risk / rollback plan

Deliverable: perf report + artifacts in references/ folder (or attachment store).
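
The baseline and after numbers are easiest to trust when both runs go through the same summary code, so p95/p99 and throughput are computed identically. A sketch with hypothetical names (AbReport, RunSummary), assuming per-request latencies were collected over a known run length; records need JDK 16+.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical report helper: baseline-vs-after summary for the same workload. */
public final class AbReport {
    private AbReport() {}

    /** p95/p99 in the samples' unit, plus requests per second. */
    public record RunSummary(long p95, long p99, double throughputPerSec) {}

    public static RunSummary summarize(long[] latencies, double durationSeconds) {
        if (latencies.length == 0 || durationSeconds <= 0) {
            throw new IllegalArgumentException("need samples and a positive duration");
        }
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        return new RunSummary(nearestRank(sorted, 95), nearestRank(sorted, 99),
                sorted.length / durationSeconds);
    }

    /** Relative change in percent; negative means the "after" value is lower. */
    public static double percentChange(double baseline, double after) {
        if (baseline == 0) throw new IllegalArgumentException("baseline is zero");
        return (after - baseline) / baseline * 100.0;
    }

    /** One report line, formatted with Locale.ROOT so decimals always use '.'. */
    public static String line(String metric, double baseline, double after) {
        return String.format(Locale.ROOT, "%s: %.1f -> %.1f (%+.1f%%)",
                metric, baseline, after, percentChange(baseline, after));
    }

    private static long nearestRank(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }
}
```

For example, AbReport.line("p99 ms", 200, 150) yields "p99 ms: 200.0 -> 150.0 (-25.0%)".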

Outputs / Artifacts

•Repro recipe
•JFR recording + metadata
•Optional flamegraphs
•“Top 3 findings” summary
•PR with measured improvements
•Report template filled

Definition of Done (DoD)

• Profiling question defined and answered with evidence
• Recording captured and stored with metadata
• Fix is small and has rollback plan
• Re-measurement confirms improvement or documents that nothing changed
• No new correctness regressions (tests passed)
• Report written (baseline vs after)

Common failure modes & fixes

•Symptom: flamegraph shows “native” or “unknown”
  - Cause: missing symbols, container restrictions
  - Fix: use JFR; ensure correct permissions; profile in staging
•Symptom: results not reproducible
  - Cause: workload not controlled, noisy environment
  - Fix: stabilize inputs, duration, concurrency; repeat runs
•Symptom: you optimize the wrong thing
  - Cause: not tying profiling to p95/p99 paths
  - Fix: start from metrics/traces; profile where pain exists
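
For the "results not reproducible" case, it helps to quantify noise before comparing runs. A minimal sketch (hypothetical RunNoise): compute the coefficient of variation of one metric across repeated runs of the same scenario, and call a change real only when it clearly exceeds that spread. The factor k is a rule of thumb you choose, not a statistical test.

```java
/** Hypothetical helper: run-to-run noise check for repeated runs of one scenario. */
public final class RunNoise {
    private RunNoise() {}

    /** Sample coefficient of variation (stddev / mean) of one metric across runs. */
    public static double coefficientOfVariation(double[] runs) {
        if (runs.length < 2) throw new IllegalArgumentException("need at least two runs");
        double mean = 0;
        for (double r : runs) mean += r;
        mean /= runs.length;
        if (mean == 0) throw new IllegalArgumentException("mean is zero");
        double squares = 0;
        for (double r : runs) squares += (r - mean) * (r - mean);
        // n - 1 in the denominator: sample standard deviation.
        return Math.sqrt(squares / (runs.length - 1)) / mean;
    }

    /** Rule of thumb: the relative change must exceed k times the noise level. */
    public static boolean exceedsNoise(double baseline, double after, double cv, double k) {
        return Math.abs(after - baseline) / Math.abs(baseline) > k * cv;
    }
}
```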

Guardrails (What NOT to do)

•Do NOT profile production with high-overhead tooling without explicit approval.
•Do NOT “optimize” without measurements.
•Do NOT micro-optimize before fixing algorithmic or I/O bottlenecks.
•Do NOT commit profiler configs that leak secrets or sensitive paths.

References (primary)

•Java Flight Recorder (JDK tooling): https://docs.oracle.com/en/java/javase/21/jfapi/using-java-flight-recorder.html
•jcmd and diagnostics overview: https://docs.oracle.com/en/java/javase/21/troubleshoot/diagnostic-tools.html
•jfr command (Oracle tool reference): https://docs.oracle.com/en/java/javase/21/docs/specs/man/jfr.html
•Flight Recorder (OpenJDK JEP 328): https://openjdk.org/jeps/328
•async-profiler (project): https://github.com/jvm-profiling-tools/async-profiler