Describe the bug
SIGSEGV crash in JfrConcurrentLinkedListHost::remove(), called from JfrTraceIdKlassQueue::enqueue(), while recording a JFR deoptimization event. The crash occurs when a JFR recording is being started (via async-profiler's JFR output) while the JVM is under load and other threads are writing JFR events.
The faulting address 0x0000000800000080 suggests a corrupted pointer in the JFR concurrent linked list: RAX holds 0x0000000800000000 and the faulting access dereferences it at offset 0x80.
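The relationship between the register value and the faulting address can be checked with a few lines of Java. This is only an arithmetic sketch: the two constants are copied from the report, and the class and method names (FaultAddress, displacement) are made up for illustration.

```java
public class FaultAddress {
    // Values copied from the report: RAX and the faulting address.
    static final long RAX = 0x0000000800000000L;
    static final long FAULT_ADDR = 0x0000000800000080L;

    // Byte distance from the base register to the faulting address.
    static long displacement(long base, long fault) {
        return fault - base;
    }

    public static void main(String[] args) {
        long d = displacement(RAX, FAULT_ADDR);
        System.out.printf("displacement = 0x%x (%d bytes)%n", d, d);
        // Number of set bits in RAX and the position of the lowest one.
        System.out.println("RAX bit count = " + Long.bitCount(RAX)
                + ", lowest set bit = " + Long.numberOfTrailingZeros(RAX));
    }
}
```

The displacement is 0x80, matching the offset quoted above, and RAX is a round value with a single bit set (bit 35).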
Crash stack:
V [libjvm.so+0x933664] JfrConcurrentLinkedListHost<JfrConcurrentQueue<JfrBuffer, JfrCHeapObj>, HeadNode, JfrCHeapObj>::remove(JfrBuffer*, JfrBuffer const*, JfrBuffer*, bool)+0x64
V [libjvm.so+0x996508] JfrTraceIdKlassQueue::enqueue(Klass const*)+0x2e8
V [libjvm.so+0x97ffd3] JfrStackTrace::record(JavaThread*, frame const&, int)+0x103
V [libjvm.so+0x9802a0] JfrStackTrace::record(JavaThread*, int)+0x50
V [libjvm.so+0x980a30] JfrStackTraceRepository::record(Thread*, int)+0xa0
V [libjvm.so+0x6e908e] JfrEvent<EventDeoptimization>::write_event()+0xfe
V [libjvm.so+0x6e5e49] Deoptimization::uncommon_trap_inner(JavaThread*, int)+0xef9
V [libjvm.so+0x6e7c9c] Deoptimization::uncommon_trap(JavaThread*, int, int)+0x1c
This appears related to JDK-8323631 and JDK-8326334, both of which are fixed in JDK 23 but have not been backported to JDK 21.
To Reproduce
- Start a second JFR recording via async-profiler (one.profiler.AsyncProfiler) with JFR output enabled
- The crash occurs within ~1 second of async-profiler loading and starting the new JFR recording, when a thread hits a deoptimization and the JVM tries to record the EventDeoptimization JFR event
Timeline from hs_err (elapsed seconds):
- 260.188s: async-profiler native library loaded
- 260.732s: JFR loads SecuritySupport$SecureRecorderListener (new recording setup)
- 260.770s: JFR loads WriteableUserPath (recording output configuration)
- 260.982s: JFR internal classes being JIT-compiled for first time
- 260.991s: SIGSEGV in JfrConcurrentLinkedListHost::remove()
The crash appears to be a race between JFR recording initialization on the JFR Recorder Thread (which was in _thread_in_native state at crash time) and concurrent JFR event writing from application threads.
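The scenario can be approximated without async-profiler by starting two overlapping recordings through the standard jdk.jfr API while worker threads keep the JVM busy. This is a hedged sketch, not a confirmed reproducer: the original crash was triggered through async-profiler, the busy loop is not guaranteed to cause deoptimizations, and the class name, method name, thread count, and timings are all assumptions chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import jdk.jfr.Recording;
import jdk.jfr.RecordingState;

public class SecondRecordingSketch {
    static volatile long sink; // published result so the loops do real work

    // Starts `workers` busy threads, then starts two overlapping recordings.
    // Returns true if the second recording was started and stopped cleanly.
    static boolean runOnce(int workers, long pauseMillis) throws Exception {
        AtomicBoolean stop = new AtomicBoolean(false);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                long x = 0;
                while (!stop.get()) {
                    x += Long.hashCode(x) + 1;
                }
                sink = x;
            });
            threads[i].start();
        }
        try (Recording first = new Recording(); Recording second = new Recording()) {
            // Enable the event type seen in the crash stack, with stack traces.
            first.enable("jdk.Deoptimization").withStackTrace();
            second.enable("jdk.Deoptimization").withStackTrace();
            first.start();
            Thread.sleep(pauseMillis);
            second.start(); // second recording starts while the first is active
            Thread.sleep(pauseMillis);
            second.stop();
            first.stop();
            return second.getState() == RecordingState.STOPPED;
        } finally {
            stop.set(true);
            for (Thread t : threads) {
                t.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second recording stopped cleanly: " + runOnce(4, 500));
    }
}
```

Running runOnce in a loop on an affected build would be one way to probe for the race, though a clean run does not show that the race is absent.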
Expected behavior
Starting a new JFR recording while the JVM is running should not crash the JVM. Multiple concurrent JFR recordings should be safe.
Screenshots
N/A
Platform information
- OS: Amazon Linux 2023.5.20240903
- CPU: Intel Xeon Platinum 8275CL @ 3.00GHz, 8 cores, 14G RAM
- Version: Corretto-21.0.6.7.1 (build 21.0.6+7-LTS)
- GC: Parallel GC (-XX:+UseParallelGC)
- Heap: -Xmx2048m -XX:MetaspaceSize=512m
Additional context
- The fix for JDK-8323631 (JfrTypeSet::write_klass can enqueue a CLD klass that is unloading) was applied to JDK 23 and backported to 22/22.0.1, but not to JDK 21.
- JDK-8326334 (JFR failed assert(used(klass)) failed: invariant) explicitly lists 21-pool-oracle as affected, is fixed in JDK 23, but also has no JDK 21 backport.
- Both bugs involve the same JFR klass tracking/enqueuing subsystem (JfrTraceIdKlassQueue, JfrTypeSet::write_klass) that our crash hits.
- We verified against the Corretto 21 develop branch (21.0.11.9.1) — neither fix is present.
- The hs_err log is attached (sanitized).
hs_err_pid17232_sanitized.log