This repository was archived by the owner on May 9, 2026. It is now read-only.
CI release-readiness audit#4
Draft
czyt wants to merge 11 commits into
Draft
Conversation
|
Note Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported. |
GitHub Actions surfaced a self-deadlock in BrowserSession startup: StartWithOptions held the session mutex while launch() called Stop() on connect failure, and Stop() tried to lock the same mutex again. Extracting stopLocked() lets the launch failure path clean up without re-locking, and the new regression test locks that cleanup behavior in place.\n\nConstraint: Keep BrowserSession startup/cleanup semantics unchanged outside the failure path\nRejected: Remove the browser Goal Runs E2E from CI | would hide a real release blocker instead of fixing it\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: Any future BrowserSession cleanup called from a locked startup path should use stopLocked-style internal helpers rather than Stop() to avoid lock re-entry\nTested: go test -count=1 ./pkg/tools -run 'TestBrowserSessionStartWithModeAutoCleansUpAfterLaunchFailure|TestBrowserSessionStartWithModeAutoFallsBackToLaunch'; go test -count=1 ./pkg/tools; go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess|TestGoalRunRecoveryAcrossRealProcessRestart|TestDaemonBackedGoalRunRecoveryAcrossRealProcessRestart'\nNot-tested: Full GitHub Actions rerun not yet observed in this local step
GitHub Actions exposed a browser-session startup deadlock in the Goal Run browser E2E. The fix keeps cleanup on the locked failure path internal via stopLocked(), and also clears stale cancel state when Chrome discovery fails before launch. Two focused regressions now lock both failure modes.\n\nConstraint: Preserve normal browser-session startup behavior while fixing only failure-path cleanup\nRejected: Remove the browser E2E from CI | that would hide a real release blocker instead of resolving it\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: Any BrowserSession failure path entered while StartWithOptions holds the mutex must avoid calling lock-taking public cleanup methods\nTested: go test -count=1 ./pkg/tools -run 'TestBrowserSessionStartWithModeAutoCleansUpAfterLaunchFailure|TestBrowserSessionLaunchWithoutChromeCleansUpCancel|TestBrowserSessionStartWithModeAutoFallsBackToLaunch'; go test -count=1 ./pkg/tools; go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess|TestGoalRunRecoveryAcrossRealProcessRestart|TestDaemonBackedGoalRunRecoveryAcrossRealProcessRestart'\nNot-tested: Post-fix GitHub Actions run not yet observed at commit time
The CI blocker fix needed one more hardening step: the no-Chrome cleanup regression must not depend on host environment. BrowserSession now accepts an internal findChromeFn hook, and the test injects an empty result so the chrome-not-found cleanup path is validated deterministically.\n\nConstraint: Keep the change internal to BrowserSession and avoid changing public browser tool behavior\nRejected: Rely on CI/local machine Chrome availability to test the no-Chrome path | environment-dependent and flaky\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: When startup paths need environment-specific branches tested, inject the discovery primitive instead of depending on machine state\nTested: go test -count=1 ./pkg/tools -run 'TestBrowserSessionStartWithModeAutoCleansUpAfterLaunchFailure|TestBrowserSessionLaunchWithoutChromeCleansUpCancel|TestBrowserSessionStartWithModeAutoFallsBackToLaunch'; go test -count=1 ./pkg/tools; go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess|TestGoalRunRecoveryAcrossRealProcessRestart|TestDaemonBackedGoalRunRecoveryAcrossRealProcessRestart'\nNot-tested: post-fix GitHub Actions rerun not yet observed at commit time
The CI browser regression was still failing because Chrome startup only slept for a fixed two seconds and then attempted a single DevTools attach. On GitHub runners that proved too early, producing transient 'could not resolve IP for 127.0.0.1' failures. BrowserSession now retries the debug attach until the timeout budget expires, and the retry helper is covered by a focused regression.\n\nConstraint: Preserve local fast path while making CI browser startup resilient to slower runner readiness\nRejected: Increase fixed sleep to an arbitrary larger constant | slower and still flaky compared with bounded retry\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: Any future browser startup readiness check should prefer bounded retries over fixed sleeps when coordinating with external processes\nTested: go test -count=1 ./pkg/tools -run 'TestBrowserSessionConnectWithRetryEventuallySucceeds|TestBrowserSessionStartWithModeAutoCleansUpAfterLaunchFailure|TestBrowserSessionLaunchWithoutChromeCleansUpCancel|TestBrowserSessionStartWithModeAutoFallsBackToLaunch'; go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
The browser Goal Run E2E was still flaky in GitHub Actions because navigate() subscribed to DOMContentLoaded after Page.Navigate, which allowed fast page loads to outrun the listener and hang the wait. Navigation now subscribes before Page.Navigate and uses a bounded wait context when receiving the DOMContentLoaded event.\n\nConstraint: Keep navigation semantics unchanged while eliminating CI timing races\nRejected: Longer fixed sleeps in the E2E flow | slower and less reliable than fixing the event ordering in the tool\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: Any future page-load wait in BrowserTool should subscribe before triggering navigation to avoid post-event listeners hanging indefinitely\nTested: go test -count=1 ./pkg/tools -run 'TestBrowserSessionConnectWithRetryEventuallySucceeds|TestBrowserSessionStartWithModeAutoCleansUpAfterLaunchFailure|TestBrowserSessionLaunchWithoutChromeCleansUpCancel|TestBrowserSessionStartWithModeAutoFallsBackToLaunch'; go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
The CI browser regression was still hanging inside BrowserTool.navigate because waiting on DOMContentLoaded via the CDP event stream remained fragile on GitHub runners. The navigation path now uses a bounded readyState poll after Page.Navigate, which is more resilient to event timing races while preserving the existing behavior contract.\n\nConstraint: Keep the browser tool behavior stable for existing callers while removing CI-specific page-load hangs\nRejected: Keep relying on DOMContentLoaded event subscription | still flaky under fast/slow runner timing differences\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: Prefer bounded state polling over event-stream waits for page-load completion when the event ordering is not guaranteed across environments\nTested: go test -count=1 ./pkg/tools; go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
The browser regression in CI had advanced from startup hangs to stale DOM node lookups when BrowserTool used click+type on the create dialog inputs. The test now uses execute_script to set input and textarea values directly and dispatch input events, which avoids stale-node timing races while still exercising the browser-visible Goal Run flow.\n\nConstraint: Keep the test browser-level without broadening BrowserTool just for one workflow\nRejected: Add more retries around low-level click/type in the test | still vulnerable to stale node handles on dynamic dialog DOM\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: For dynamic form UIs in browser-tool E2E tests, prefer direct value+input-event updates over low-level focus/click typing when DOM replacement is possible\nTested: go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
The CI browser regression had narrowed to the create dialog render timing: the one-shot script assumed all three form fields were already present and could throw an uncaught error on slower runners. The test now waits briefly after opening the dialog and sets each field with its own guarded script, which keeps the browser-level flow stable without widening the production surface.\n\nConstraint: Keep the browser regression focused on user-visible flow, not on micro-timing details of the dialog renderer\nRejected: Add larger arbitrary sleeps everywhere in the flow | slower and less disciplined than targeting the dialog render gap directly\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: For browser-tool E2E over dialog-driven UIs, treat dialog field availability as asynchronous and guard each critical selector independently\nTested: go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
The latest CI failure showed the create dialog inputs can still lag behind the click that opens the Goal Run form. The browser E2E now waits asynchronously for all required selectors to appear before trying to set values, which keeps the test aligned with the actual UI lifecycle instead of assuming a fixed render delay.\n\nConstraint: Keep the browser regression user-flow oriented while tolerating slower CI dialog rendering\nRejected: Increase the fixed post-click sleep again | less precise and still prone to timing races\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: When browser E2E depends on dynamic dialog fields, wait for the concrete selectors rather than relying on generic delays\nTested: go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
The CI browser regression showed that even when the create dialog button was clicked, the dialog container and its fields did not become available on the same schedule. The test now waits first for the dialog shell, then waits for each required field during hydration, which matches the UI lifecycle more closely and reduces timing-related false failures.\n\nConstraint: Keep the browser regression aligned with actual UI readiness states instead of assuming synchronous dialog hydration\nRejected: Keep only a single shared field wait with a short deadline | still too sensitive to staggered dialog rendering in CI\nConfidence: high\nScope-risk: narrow\nReversibility: clean\nDirective: For browser E2E on modal UIs, separate “dialog is open” readiness from “all fields are hydrated” readiness when CI rendering order matters\nTested: go test -count=1 ./pkg/webui -run 'TestGoalRunBrowserToolManualFlowE2E|TestManualOnlyGoalRunHTTPFlowCompletesOnFreshProcess'\nNot-tested: Latest GitHub Actions rerun not yet observed at commit time
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Automated CI audit branch for release-readiness verification. Empty commit to trigger GitHub Actions.