Skip to content

test: remove timing race in serialization timeout test#283

Merged
heruan merged 2 commits into
mainfrom
fix/flaky-serialization-timeout-test
Apr 23, 2026
Merged

test: remove timing race in serialization timeout test#283
heruan merged 2 commits into
mainfrom
fix/flaky-serialization-timeout-test

Conversation

@heruan

@heruan heruan commented Apr 23, 2026

Copy link
Copy Markdown
Member

Summary

Fixes a rare flake in SerializationDebugRequestHandlerTest.handleRequest_serializationTimeout_timeoutReported observed on CI (run 24824699112) where the test reported [SERIALIZATION_FAILED, NOT_SERIALIZABLE_CLASSES] instead of the expected [NOT_SERIALIZABLE_CLASSES, SERIALIZATION_TIMEOUT].

Root cause

The test had SlowSerialization.writeObject sleep 2000 ms and set the timeout to 100 ms. The expectation is that the 100 ms timeout fires first, producing SERIALIZATION_TIMEOUT.

The flow is:

  1. Main thread: trySerialize() schedules async serialization, returns
  2. Main thread: waitForCompletion(100ms) — awaits serializationCompletedLatch
  3. Background: runs Thread.sleep(2000) then fails on non-serializable fields

If the main thread pauses (GC, JIT, loaded CI) for a couple of seconds between steps 1 and 2, the background task can reach its failure path and count the latch down via whenSerialized.accept(null) before the main thread enters waitForCompletion. The 100 ms await then returns immediately with completed=true, timeout() is never called, and only SERIALIZATION_FAILED ends up in the outcomes.

Fix

Replace Thread.sleep(2000) with a test-controlled CountDownLatch that writeObject awaits unbounded. The test releases the latch in a finally block after asserting. This guarantees the latch stays uncounted until the assertion is done, so the 100 ms timeout always fires regardless of main-thread pauses.

Test plan

  • mvn test -Dtest=SerializationDebugRequestHandlerTest#handleRequest_serializationTimeout_timeoutReported passes 5/5 runs locally
  • Full SerializationDebugRequestHandlerTest class (13 tests) passes

Use a test-controlled CountDownLatch instead of Thread.sleep so the
100 ms timeout always fires before the serializer finishes, even on
slow or loaded runners where a main-thread pause between
trySerialize() and waitForCompletion() could previously let the
background path complete first, hiding the expected
SERIALIZATION_TIMEOUT outcome.
@github-actions

github-actions Bot commented Apr 23, 2026

Copy link
Copy Markdown

Test Results

 35 files  ±0   35 suites  ±0   30s ⏱️ ±0s
179 tests ±0  178 ✅ ±0   1 💤 ±0  0 ❌ ±0 
236 runs  ±0  203 ✅ ±0  33 💤 ±0  0 ❌ ±0 

Results for commit 1577b3f. ± Comparison against base commit ee76d12.

♻️ This comment has been updated with latest results.

Use a test-controlled CountDownLatch to block session1's serialization
until session2 has completed, replacing the Thread.sleep+delay pattern
that assumed a specific timing relationship between two concurrent
serializations. On loaded CI runners the timing could drift enough
that session1 finished first, flipping the expected ordering.

Also wait explicitly for session1 to reach the started state before
scheduling session2, so the 'started' order is deterministic.
@heruan heruan self-assigned this Apr 23, 2026
@heruan heruan merged commit 66bd0e3 into main Apr 23, 2026
4 checks passed
@heruan heruan deleted the fix/flaky-serialization-timeout-test branch April 23, 2026 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants