
Corretto-21.0.6.7.1 crashes due to JDK-8323631 and JDK-8326334 #154

@carojkov

Description


Describe the bug

SIGSEGV crash in JfrConcurrentLinkedListHost::remove(), called from JfrTraceIdKlassQueue::enqueue() while recording a JFR deoptimization event (EventDeoptimization). The crash occurs when a JFR recording is started (via async-profiler's JFR output) while the JVM is under load and other threads are writing JFR events.

The faulting address 0x0000000800000080 suggests a corrupted pointer in the JFR concurrent linked list: RAX=0x0000000800000000, and the faulting access dereferences it at offset 0x80.

Crash stack:

V  [libjvm.so+0x933664]  JfrConcurrentLinkedListHost<JfrConcurrentQueue<JfrBuffer, JfrCHeapObj>, HeadNode, JfrCHeapObj>::remove(JfrBuffer*, JfrBuffer const*, JfrBuffer*, bool)+0x64
V  [libjvm.so+0x996508]  JfrTraceIdKlassQueue::enqueue(Klass const*)+0x2e8
V  [libjvm.so+0x97ffd3]  JfrStackTrace::record(JavaThread*, frame const&, int)+0x103
V  [libjvm.so+0x9802a0]  JfrStackTrace::record(JavaThread*, int)+0x50
V  [libjvm.so+0x980a30]  JfrStackTraceRepository::record(Thread*, int)+0xa0
V  [libjvm.so+0x6e908e]  JfrEvent<EventDeoptimization>::write_event()+0xfe
V  [libjvm.so+0x6e5e49]  Deoptimization::uncommon_trap_inner(JavaThread*, int)+0xef9
V  [libjvm.so+0x6e7c9c]  Deoptimization::uncommon_trap(JavaThread*, int, int)+0x1c

This appears related to JDK-8323631 and JDK-8326334, both of which are fixed in JDK 23 but have not been backported to JDK 21.

To Reproduce

  1. Start a second JFR recording via async-profiler (one.profiler.AsyncProfiler) with JFR output enabled.
  2. Within about 1 second of async-profiler loading and starting the new JFR recording, the JVM crashes when a thread hits a deoptimization and the JVM tries to record the EventDeoptimization JFR event.

Timeline from hs_err (elapsed seconds):

  • 260.188s: async-profiler native library loaded
  • 260.732s: JFR loads SecuritySupport$SecureRecorderListener (new recording setup)
  • 260.770s: JFR loads WriteableUserPath (recording output configuration)
  • 260.982s: JFR internal classes JIT-compiled for the first time
  • 260.991s: SIGSEGV in JfrConcurrentLinkedListHost::remove()

The crash appears to be a race between JFR recording initialization on the JFR Recorder Thread (which was in _thread_in_native state at crash time) and concurrent JFR event writing from application threads.
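The steps above can be approximated without async-profiler. The Java sketch below is a hypothetical stand-in, not the reporter's setup: it assumes the JDK's built-in jdk.jfr.Recording API is close enough to async-profiler's JFR output. It keeps one recording running, drives worker threads through a call site that is likely (not guaranteed) to deoptimize, and repeatedly starts a second recording with jdk.Deoptimization stack traces enabled. The class name, the Shape workload, the thread count, and the 200 ms window are all illustrative assumptions, and the sketch is not confirmed to reproduce the crash.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import jdk.jfr.Recording;

public class JfrConcurrentStartRepro {
    // Hypothetical workload: the call site warms up on Square only, then starts
    // seeing Circle, which may hit an uncommon trap and deoptimize compiled code.
    interface Shape { double area(); }
    record Square(double side) implements Shape { public double area() { return side * side; } }
    record Circle(double radius) implements Shape { public double area() { return Math.PI * radius * radius; } }

    static volatile double sink;

    static double area(Shape s) { return s.area(); }

    // Keeps a first recording active, runs busy worker threads, and repeatedly
    // starts/stops a second recording. Returns how many second recordings started.
    public static int run(int rounds) throws InterruptedException {
        AtomicBoolean stop = new AtomicBoolean(false);
        int started = 0;
        try (Recording first = new Recording()) {
            // Stack traces on deoptimization events exercise JfrStackTraceRepository::record,
            // the path seen in the crash stack.
            first.enable("jdk.Deoptimization").withStackTrace();
            first.start();

            Thread[] workers = new Thread[4];
            for (int i = 0; i < workers.length; i++) {
                workers[i] = new Thread(() -> {
                    long n = 0;
                    while (!stop.get()) {
                        // Square-only warm-up, then an occasional Circle.
                        Shape s = (n > 5_000_000 && (n & 1023) == 0) ? new Circle(1) : new Square(2);
                        n++;
                        sink = area(s);
                    }
                });
                workers[i].start();
            }

            for (int r = 0; r < rounds; r++) {
                // A second recording is started while the first is active and workers are running.
                try (Recording second = new Recording()) {
                    second.enable("jdk.Deoptimization").withStackTrace();
                    second.start();
                    started++;
                    Thread.sleep(200);
                    second.stop();
                }
            }

            stop.set(true);
            for (Thread w : workers) {
                w.join();
            }
            first.stop();
        }
        return started;
    }

    public static void main(String[] args) throws InterruptedException {
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println("second recordings started: " + run(rounds));
    }
}
```

To mirror the reported environment, one could launch it with the flags listed under Platform information, e.g. `java -XX:+UseParallelGC -Xmx2048m -XX:MetaspaceSize=512m JfrConcurrentStartRepro.java 50`. On a build without the race, the program simply prints the count and exits.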

Expected behavior

Starting a new JFR recording while the JVM is running should not crash the JVM. Multiple concurrent JFR recordings should be safe.

Screenshots

N/A

Platform information

  • OS: Amazon Linux 2023.5.20240903
  • CPU: Intel Xeon Platinum 8275CL @ 3.00GHz, 8 cores, 14 GB RAM
  • Version: Corretto-21.0.6.7.1 (build 21.0.6+7-LTS)
  • GC: Parallel GC (-XX:+UseParallelGC)
  • Heap: -Xmx2048m -XX:MetaspaceSize=512m

Additional context

  • The fix for JDK-8323631 (JfrTypeSet::write_klass can enqueue a CLD klass that is unloading) was applied to JDK 23 and backported to 22/22.0.1, but not to JDK 21.
  • JDK-8326334 (JFR failed assert(used(klass)) failed: invariant) explicitly lists 21-pool-oracle as affected, is fixed in JDK 23, but also has no JDK 21 backport.
  • Both bugs involve the same JFR klass tracking/enqueuing subsystem (JfrTraceIdKlassQueue, JfrTypeSet::write_klass) that our crash hits.
  • We checked the Corretto 21 develop branch (21.0.11.9.1): neither fix is present.
  • The hs_err log is attached (sanitized).

hs_err_pid17232_sanitized.log
