Bound CDP macrotask drains so commands aren't queued behind page work#2405
Open
navidemad wants to merge 1 commit intolightpanda-io:mainfrom
Open
Bound CDP macrotask drains so commands aren't queued behind page work#2405navidemad wants to merge 1 commit intolightpanda-io:mainfrom
navidemad wants to merge 1 commit intolightpanda-io:mainfrom
Conversation
Pages with sustained JS activity (Angular RAF chains, change detection on heavy SPAs) hold the V8 thread inside `Browser.runMacrotasks` for many seconds at a time. Because `Runner._tick` only polls the WebSocket *after* the drain returns, queued CDP commands sit in the kernel buffer for the duration — every session-scoped command stalled 14–20s on the reproducer in lightpanda-io#2402, even `Runtime.evaluate('1+1')`. Thread an optional monotonic-clock deadline through `Browser.runMacrotasks` → `Env.runMacrotasks` / `Env.pumpMessageLoop` → `Scheduler.run` / `Scheduler.runQueue`. Inner loops check the deadline after each task and yield back to the caller when it elapses, leaving still-ready tasks in the queue. `Runner._tick` sets a 50ms deadline in CDP mode and `null` (unbounded) in fetch mode, preserving existing fetch behavior. Other `pumpMessageLoop` callers (worker context, Context.deinit) and the single direct `scheduler.run` call in ScriptManagerBase pass `null`. Adds two unit tests on `Scheduler.run` covering the no-deadline drain and the elapsed-deadline yield behavior. Refs lightpanda-io#2402
Contributor
Author
|
Verified against the issue's
Every session-scoped command moves from "never returns" to ~2.75 s. The per-command latency floor matches the duration of a single Angular change-detection callback on this page — exactly the caveat about long single tasks I called out in the body. Sub-second response on pages like this needs V8 Full probe outputUnpatched: Patched: |
This was referenced May 9, 2026
Closed
Collaborator
|
I think #2393 does the job, while being much simpler. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this fixes
On pages with sustained JS activity (Angular SPAs in change-detection /
requestAnimationFramechains), every session-scoped CDP command —Runtime.evaluate,DOM.getDocument,DOM.getOuterHTML— stalls for 14–20 seconds inservemode.fetch --dump htmlof the same page works fine because--wait-mscaps it; CDP has no equivalent ceiling.The cleanest signal in the issue's reproducer is that
Runtime.evaluate('1+1')— a zero-cost roundtrip — takes 14.7 seconds. Whatever the command is, it isn't slow because of what it does; it's slow because it can't be read off the WebSocket. See #2402 for the full trace.Root cause
Runner._tick(CDP mode,.html/.completebranch) callsbrowser.runMacrotasks()before yielding to socket I/O viahttp_client.tick(...).Browser.runMacrotasksdrains three loops back-to-back, none of which yield:env.runMacrotasks()→Scheduler.runQueue'swhile (queue.peek())— every ready user-scheduled task (timers,setTimeout,requestAnimationFrame, …)env.pumpMessageLoop()→while (v8__Platform__PumpMessageLoop(...)) {}— every V8 platform taskenv.runMicrotasks()— every queued microtaskOn the OPSWAT page this drain runs for 14–20 s before returning. CDP commands sent during that window sit in the kernel WebSocket buffer the whole time. Once the drain finishes, the very next
http_client.tick(0)reads them and they execute in ~0 ms — confirming the bottleneck is the drain, not the command.What this PR changes
Threads an optional monotonic-clock deadline through the drain chain. Inner loops check it between tasks and yield to the caller when it has elapsed, leaving still-ready tasks queued for the next pass.
The hot edit is in
Runner.zig:The deadline is plumbed through
Browser.runMacrotasks→Env.runMacrotasks/Env.pumpMessageLoop→Scheduler.run/Scheduler.runQueueasdeadline_ms: ?u64. All five gain the parameter.nullpreserves the existing unbounded behavior — that's what every non-CDP caller passes (workerLocal.runMacrotasks,Context.deinit, the directscheduler.runinScriptManagerBase, andRunner._tickin fetch mode).Why 50 ms
It's a trade-off between page progress and CDP responsiveness. The deadline is checked between tasks, so:
Hard-coded as
CDP_MACROTASK_BUDGET_MSinRunner.zig. Could be exposed as aserveflag later — the issue reporter is comfortable with anything well under 1 s, so a follow-up is fine.Caveats
fetch --dumpreturns the page in 5s #2402, individual change-detection callbacks run ~2.75 s, so the per-command latency floor lands there even with the budget. Driving sub-second response on those pages needs V8RequestInterrupt(ask make: fix help w/ linux #3 in the issue), which is the right long-term answer.env.runMicrotasks) are not bounded. Bounding them requiresRequestInterrupt. They're typically cheap.Lightpanda.stopJs/--terminate-msequivalent), make: fix help w/ linux #3 (V8RequestInterrupt), Install: add build and test instructions #4 (msToNextMacrotaskexposure) are separate features and not addressed here.Where to focus review
The new parameter on
Browser.runMacrotasks/Env.runMacrotasks/Scheduler.run/Scheduler.runQueueis mechanical — each had exactly one caller before this PR.Env.pumpMessageLoophad three callers, and the two non-Browserones run in different lifecycles. They both passnull(i.e. no behavior change), but I'd appreciate a second read on:src/browser/js/Local.zig:111— worker context, called fromScriptManagerBase,WorkerGlobalScope,Workersrc/browser/js/Context.zig:229— context deinit, drains residual platform tasks beforeMicrotaskQueuedeletionBoth should be unchanged. But worker / shutdown paths are exactly the kind of edge case where it would be easy to miss something.
Test plan
make test— 523/523 pass, including two newScheduler.rununit testsScheduler.run(null)drains all 50 queued tasksScheduler.run(1)(deadline already in the past) runs exactly 1 task, leaves the rest queued; a follow-upScheduler.run(null)drains themfetch --dumpreturns the page in 5s #2402 — every session-scoped command moves from "TIMEOUT 20 s" (currentmain) to ~2.75 s (this PR). See the verification comment below for the side-by-side probe output.Refs #2402