Cancel TotalTimeoutHandler scheduled timeout on channel close by codexcoder21 · Pull Request #491 · libp2p/jvm-libp2p

codexcoder21 · 2026-06-17T03:03:23Z

Problem

TotalTimeoutHandler (installed by the multistream Negotiator to bound the time a stream may spend in protocol negotiation) cancels its scheduled timeout only in handlerRemoved.

When a substream (MuxChannel) is closed before negotiation completes, application code never removes its TotalTimeoutHandler. The handler is removed only when the substream's pipeline is torn down, which happens during the channel's deferred deregistration, a regular task submitted to the channel's event loop. Cancellation of the negotiation-timeout task is therefore gated on that deferred task actually running.

Under a burst of substreams that open and abort mid-negotiation (a reconnect / negotiation-abort herd) on a CPU-constrained event loop, those deferred deregistration tasks are starved. handlerRemoved never fires, so each scheduled negotiation-timeout ScheduledFutureTask — which captures the entire closed substream pipeline (MuxChannel, Negotiator$ResponderHandler, the negotiation codecs, and their ChannelHandlerContexts) — stays pinned in the event loop's scheduled-task queue until its (10s) timeout elapses. When closes outpace timeout expiry these pipelines accumulate without bound until OutOfMemoryError.

This is a retention-after-close leak: the number of concurrently live substreams stays bounded, but closed substreams are not reclaimed. A heap dump from a memory-constrained node (128 MB heap) under such churn shows tens of thousands of pending TotalTimeoutHandler ScheduledFutureTasks, each rooting a closed MuxChannel / Negotiator$ResponderHandler pipeline.
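
The retention mechanism can be illustrated with a minimal, stdlib-only Java model. This is not jvm-libp2p or Netty code: PinnedTimeoutSketch, FakePipeline, and pendingAfterChurn are hypothetical names, and a ScheduledThreadPoolExecutor with remove-on-cancel enabled stands in for the event loop's scheduled-task queue. That policy is an assumption, chosen to mirror the behaviour described above, where cancelling the task releases what it captured.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PinnedTimeoutSketch {
    /** Illustrative stand-in for a closed substream pipeline (hypothetical type). */
    static final class FakePipeline {
        final byte[] buffers = new byte[1024];
    }

    /**
     * Opens and immediately "closes" the given number of substreams, each with a
     * 10 s timeout task, and returns how many tasks are still queued afterwards.
     */
    static int pendingAfterChurn(int substreams, boolean cancelOnClose) {
        ScheduledThreadPoolExecutor loop = new ScheduledThreadPoolExecutor(1);
        // Assumption: a cancelled task is removed from the queue right away.
        loop.setRemoveOnCancelPolicy(true);
        try {
            for (int i = 0; i < substreams; i++) {
                FakePipeline pipeline = new FakePipeline();
                // The lambda captures the pipeline, so the queued task keeps it reachable.
                ScheduledFuture<?> timeout = loop.schedule(
                    () -> { pipeline.buffers[0] = 1; }, 10, TimeUnit.SECONDS);
                // The substream is closed here; only cancellation unpins the pipeline.
                if (cancelOnClose) {
                    timeout.cancel(false);
                }
            }
            return loop.getQueue().size();
        } finally {
            loop.shutdownNow();
        }
    }
}
```

In this model, pendingAfterChurn(1000, false) leaves all 1000 tasks (and their captured pipelines) queued, while pendingAfterChurn(1000, true) leaves none.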

Fix

Register the timeout cancellation on the channel's close future as well, via a listener added in handlerAdded. The close future completes while the channel is closing — independent of event-loop backlog or channel state — so the scheduled task is cancelled and its captured pipeline released promptly even when the deferred deregistration is starved. The listener is removed in handlerRemoved (and in cancel) so it does not linger on the normal negotiation-success path.
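
The shape of the fix can be sketched in plain Java under stated assumptions. This is a model, not the actual TotalTimeoutHandler change: a CompletableFuture stands in for the channel's close future, TimeoutGuard and cancelledByClose are hypothetical names, and because CompletableFuture has no listener removal, a flag models removing the listener in handlerRemoved / cancel.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class CloseFutureCancelSketch {
    /** Hypothetical stand-in for the handler: owns one scheduled timeout. */
    static final class TimeoutGuard {
        private final ScheduledFuture<?> timeout;
        // Models "listener still registered"; cleared on removal or cancel.
        private volatile boolean listening = true;

        TimeoutGuard(ScheduledExecutorService loop, CompletableFuture<Void> closeFuture,
                     Runnable onTimeout, long timeoutMillis) {
            // Corresponds to handlerAdded: schedule the timeout...
            this.timeout = loop.schedule(onTimeout, timeoutMillis, TimeUnit.MILLISECONDS);
            // ...and also cancel it when the close future completes.
            closeFuture.whenComplete((ignored, error) -> {
                if (listening) {
                    cancel();
                }
            });
        }

        /** Corresponds to handlerRemoved on the normal negotiation-success path. */
        void handlerRemoved() {
            cancel();
        }

        void cancel() {
            listening = false;     // stop reacting to a later close
            timeout.cancel(false); // drop the scheduled task
        }

        boolean isCancelled() {
            return timeout.isCancelled();
        }
    }

    /** Closes the "channel" without ever calling handlerRemoved. */
    static boolean cancelledByClose() {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            CompletableFuture<Void> closeFuture = new CompletableFuture<>();
            TimeoutGuard guard = new TimeoutGuard(loop, closeFuture, () -> { }, 10_000);
            closeFuture.complete(null);
            return guard.isCancelled();
        } finally {
            loop.shutdownNow();
        }
    }
}
```

Here cancelledByClose() returns true: completing the close future cancels the timeout even though handlerRemoved is never invoked, which is the property the fix relies on.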

channelInactive is not a viable cancellation point: AbstractChildChannel does not fire channelInactive for a child channel that is closed while still in the OPEN state — the common case for an aborted mid-negotiation substream. Instrumentation over a churn run measured channelInactive firing only 141 times across 9.77M substreams, whereas the close-future listener cancelled all 9.77M scheduled tasks and held the heap flat.

Tests

TotalTimeoutHandlerTest (red → green):

closing the channel without removing the handler must cancel the scheduled timeout — fails before this change (the timeout still fires and closes the context) and passes after;
a sanity case asserting the timeout still fires when neither close nor removal occurs.

Verified locally: ./gradlew :libp2p:test --tests "io.libp2p.etc.util.netty.TotalTimeoutHandlerTest" (2 passed), spotlessCheck, and detekt all green.

🤖 Generated with Claude Code

TotalTimeoutHandler (installed by the multistream Negotiator to bound negotiation time) cancelled its scheduled timeout only in handlerRemoved. A substream (MuxChannel) that closes before negotiation completes is not removed by application code, so handlerRemoved depends on the pipeline being destroyed during the channel's deferred deregister, which runs as a regular task on the channel's event loop.

When that event loop is backlogged (e.g. a reconnect / negotiation-abort herd on a CPU-constrained host) the deferred deregister is starved, handlerRemoved never fires, and the scheduled timeout task, which captures the whole closed substream pipeline (MuxChannel + Negotiator$ResponderHandler + codecs), stays pinned in the event loop's scheduled-task queue until the timeout elapses. Under sustained churn these closed-but-pinned pipelines accumulate unbounded and exhaust the heap.

Also cancel the timeout via a listener on the channel's close future, which completes while the channel is closing regardless of event-loop backlog or channel state. channelInactive is insufficient: AbstractChildChannel does not fire it for a channel closed while still in the OPEN state, the common case for an aborted mid-negotiation substream. The listener is removed in handlerRemoved/cancel so it does not linger on the negotiation-success path.

Adds TotalTimeoutHandlerTest: closing the channel without removing the handler must cancel the timeout (red before this change, green after), plus a sanity case that the timeout still fires when neither close nor removal occurs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codexcoder21 wants to merge 1 commit into libp2p:develop from CodexCoder21Organization:upstream-cancel-totaltimeouthandler-on-close
