Skip to content

[rb] fix silent hang on oversized WebSocket frames#17710

Open
Chandan25sharma wants to merge 4 commits into
SeleniumHQ:trunkfrom
Chandan25sharma:trunk
Open

[rb] fix silent hang on oversized WebSocket frames#17710
Chandan25sharma wants to merge 4 commits into
SeleniumHQ:trunkfrom
Chandan25sharma:trunk

Conversation

@Chandan25sharma

@Chandan25sharma Chandan25sharma commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Added MAX_FRAME_SIZE = 100 * 1024 * 1024 (100 MB) constant
In initialize: bumps WebSocket.max_frame_size to MAX_FRAME_SIZE only when the current global is lower, so user-configured values are never overridden
In attach_socket_listener: after the inner while (frame = incoming_frame.next) loop exits, checks incoming_frame.error? and raises the stored error — this breaks the outer read loop instead of spinning at 100% CPU
websocket_connection.rbs

Added MAX_FRAME_SIZE: Integer to the type signature
websocket_connection_spec.rb (new)

Tests that MAX_FRAME_SIZE is 100 MB
Tests that initialize bumps the global frame-size limit when it's below MAX_FRAME_SIZE
Tests that initialize leaves a higher value untouched

WebSocket.max_frame_size (websocket-ruby default 20 MB) is bumped to
100 MB in WebSocketConnection#initialize so typical DevTools payloads
pass through.  For frames that still exceed the limit, websocket-ruby
rescues TooLong internally and returns nil from incoming_frame.next,
leaving oversized bytes buffered and causing the read loop to spin at
~100% CPU.  After the inner while loop we now check incoming_frame.error?
and raise the stored error so the outer loop exits cleanly and the
connection is logged and closed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@selenium-ci selenium-ci added the C-rb Ruby Bindings label Jun 23, 2026
@qodo-code-review

Copy link
Copy Markdown
Contributor

PR Summary by Qodo

Fix Ruby WebSocket hang on oversized frames (raise TooLong + bump max size)
🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

Grey Divider

Description

• Increase websocket-ruby max_frame_size to allow large DevTools payloads.
• Surface oversized-frame errors to stop the listener loop from CPU-spinning.
• Add unit coverage and update Ruby RBS signature for the new constant.
Diagram

graph TD
  Client["DevTools client"] --> WSC["WebSocketConnection"] --> IF["Incoming frame parser"] --> ERR{{"Oversized frame error"}}
  WSC --> CFG["WebSocket.max_frame_size"]
  WSC --> SOCK["TCP/SSL socket"]
  WSC --> LOG["WebDriver logger"]
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Make max frame size configurable (option/env)
  • ➕ Avoids a hard-coded 100MB policy choice for all users
  • ➕ Lets deployments with strict memory constraints keep lower limits
  • ➖ Adds API surface and documentation burden
  • ➖ Still relies on a global websocket-ruby setting, so it’s not truly per-connection
2. Patch/upgrade websocket-ruby to raise TooLong directly
  • ➕ Removes the need for Selenium-side workaround logic
  • ➕ Potentially fixes other clients of websocket-ruby too
  • ➖ Requires coordination with upstream gem release cadence
  • ➖ May be harder to support across Selenium’s supported Ruby/gem matrix
3. Detect the spin by tracking read progress and breaking when next returns nil repeatedly
  • ➕ Avoids relying on websocket-ruby’s error?/error API shape
  • ➕ Can guard against other parser-stall scenarios
  • ➖ More stateful/heuristic logic; risk of false positives under normal backpressure
  • ➖ Doesn’t fix the root cause; still leaves error reporting indirect

Recommendation: The current approach is appropriate for a targeted fix: bump the global frame limit only when below the needed threshold (reducing surprise for users who increased it) and explicitly raise stored parser errors to terminate the listener thread cleanly. If oversized payload needs vary by user/environment, consider a follow-up to make MAX_FRAME_SIZE configurable, but keep the error-raising behavior regardless to prevent the 100% CPU spin.

Files changed (3) +74 / -0

Bug fix (1) +9 / -0
websocket_connection.rbIncrease frame limit and raise stored parser errors to avoid listener spin +9/-0

Increase frame limit and raise stored parser errors to avoid listener spin

• Introduces a 100MB MAX_FRAME_SIZE constant and conditionally raises websocket-ruby’s global WebSocket.max_frame_size during initialization. After draining frames, the listener now checks incoming_frame.error? and raises the stored error so oversized frames terminate the loop instead of spinning indefinitely.

rb/lib/selenium/webdriver/common/websocket_connection.rb

Tests (1) +63 / -0
websocket_connection_spec.rbAdd unit tests for frame size bump behavior +63/-0

Add unit tests for frame size bump behavior

• Adds specs asserting MAX_FRAME_SIZE is 100MB and that initialize raises WebSocket.max_frame_size when below the threshold while not lowering it when already higher. Stubs network setup to keep tests isolated from real sockets.

rb/spec/unit/selenium/webdriver/common/websocket_connection_spec.rb

Documentation (1) +2 / -0
websocket_connection.rbsExpose MAX_FRAME_SIZE in the WebSocketConnection RBS signature +2/-0

Expose MAX_FRAME_SIZE in the WebSocketConnection RBS signature

• Adds the MAX_FRAME_SIZE constant declaration to the RBS type definition to match the Ruby implementation.

rb/sig/lib/selenium/webdriver/common/websocket_connection.rbs

Chandan25sharma and others added 3 commits June 23, 2026 16:46
…> ambiguity

IEventStream<T> implements both IAsyncEnumerable<T> and IAsyncDisposable.
TaskAsyncEnumerableExtensions ships a ConfigureAwait overload for each
interface, so a plain stream.ConfigureAwait(bool) call on an IEventStream<T>
variable produces CS0121 (ambiguous call).

Add EventStreamExtensions.ConfigureAwait<T>(IEventStream<T>, bool) which
explicitly routes to the IAsyncEnumerable<T> overload (the behaviour callers
need for await foreach).  Because the new overload is more specific than the
two BCL overloads, the compiler resolves to it without any cast on the call
site.

Fixes: SeleniumHQ#17696

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Problem
-------
IEventStream<T> inherits both IAsyncEnumerable<T> and IAsyncDisposable.
System.Linq.Async ships a ConfigureAwait overload for each of those
interfaces in TaskAsyncEnumerableExtensions:

  ConfigureAwait<T>(IAsyncEnumerable<T>, bool)   -> ConfiguredCancelableAsyncEnumerable<T>
  ConfigureAwait(IAsyncDisposable, bool)          -> ConfiguredAsyncDisposable

Both overloads match when the receiver is an IEventStream<T>, so the
compiler emits CS0121 ("The call is ambiguous between ...").  Users had
to add an explicit cast to either interface to work around this, which
is both surprising and hard to discover.

Fix
---
Add EventStreamExtensions.ConfigureAwait<T>(this IEventStream<T>, bool)
that explicitly delegates to the IAsyncEnumerable<T> overload.  Because
a more-derived receiver type takes priority over the two BCL overloads,
the compiler now resolves the call unambiguously without any cast at the
call site.  The IAsyncEnumerable<T> variant is the correct choice here
since callers use ConfigureAwait to control context capture inside
`await foreach` loops, not during disposal.

Testing
-------
EventStreamExtensionsTests verifies at the type level that the return
type is ConfiguredCancelableAsyncEnumerable<T> (not ConfiguredAsyncDisposable)
and at the behavioural level that events flow through a
ConfigureAwait(false) enumeration correctly.

Fixes: SeleniumHQ#17696

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +22 to +25
namespace OpenQA.Selenium.BiDi;

public static class EventStreamExtensions
{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Public class missing xml summary 📘 Rule violation ✧ Quality

EventStreamExtensions is a new public class but it has no XML documentation comment with a
<summary> immediately preceding the declaration. This violates the requirement that all public API
members include <summary> docs for tooling and consumers.
Agent Prompt
## Issue description
A new public C# type (`EventStreamExtensions`) was added without an XML documentation comment containing a non-empty `<summary>`.

## Issue Context
Compliance requires XML docs with `<summary>` for all public API members.

## Fix Focus Areas
- dotnet/src/webdriver/BiDi/EventStreamExtensions.cs[22-25]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Comment on lines +151 to +153
# websocket-ruby rescues TooLong internally and returns nil from next;
# raise the stored error so the loop exits instead of spinning at 100% cpu
raise incoming_frame.error if incoming_frame.error?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

3. Listener failure causes timeouts 🐞 Bug ☼ Reliability

When attach_socket_listener raises incoming_frame.error, the surrounding rescue only logs and
exits the listener thread without closing the socket or surfacing the error to waiting callers.
After the listener exits, send_cmd can still successfully write but then waits until its 30s
timeout because responses are never enqueued.
Agent Prompt
## Issue description
The listener thread can terminate on `incoming_frame.error`, but the error is only logged; callers waiting in `send_cmd` are not notified and may block until `Wait` times out.

## Issue Context
- `send_cmd` writes to the socket and then waits for `messages.delete(id)`.
- `attach_socket_listener` rescues and logs but does not close the socket or store/propagate the failure.

## Fix Focus Areas
- rb/lib/selenium/webdriver/common/websocket_connection.rb[104-118]
- rb/lib/selenium/webdriver/common/websocket_connection.rb[131-158]

### Suggested fix
1. In `attach_socket_listener`, when an error occurs (either `incoming_frame.error?` path or in the rescue), store it on the connection (e.g., `@listener_error = e`) and transition the connection to a closed/closing state.
2. Close the underlying socket in that error path so subsequent `send_cmd` writes fail fast.
3. In `send_cmd`’s wait loop, check `@listener_error` (or `@closing`) and raise immediately instead of waiting for the timeout.

Example sketch:
- In listener rescue: set `@listener_error ||= e`; set `@closing = true`; `socket.close rescue nil`.
- In `send_cmd` wait block: `raise @listener_error if @listener_error` before/alongside checking `messages.delete(id)`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-rb Ruby Bindings

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants