Keep provider diagnostics useful when discovery is slow #1724
Run provider diagnostics and model refresh in parallel, capture failures as diagnostic rows, and add ACP phase probes so one slow or broken step does not hide the rest of the report.
| Filename | Overview |
|---|---|
| packages/server/src/server/agent/providers/acp-agent.ts | Adds buildACPProbeDiagnosticRows for per-phase ACP diagnostics; refactors spawnProcess into spawnTransport + initializeTransport; introduces onSpawned callback for early probe reference capture in fetchCatalog. Probe cleanup on timeout is functional but has a window where probe is null if the outer timeout fires during spawnTransport. |
| packages/server/src/server/agent/provider-snapshot-manager.ts | Refactors getProviderDiagnostic to run base diagnostic and snapshot refresh in parallel; adds diagnosticTimeoutMs; converts unknown-provider and probe failures to diagnostic rows instead of thrown errors. |
| packages/server/src/server/agent/providers/generic-acp-agent.ts | Extends getDiagnostic to include ACP phase rows; errors during launch resolution become rows instead of aborting the diagnostic. Adds diagnosticPhaseTimeoutMs option. |
| packages/server/src/server/agent/providers/generic-acp-agent.diagnostic.test.ts | Replaces static-info-only test with live ACP probe tests using a fake ACP agent script; covers success, hung session/new, catalog timeout with process-exit verification, and missing launcher path. |
| packages/server/src/server/agent/provider-snapshot-manager.test.ts | Adds comprehensive tests for timeout, error, and parallel-start scenarios in getProviderDiagnostic; adds withEnv helper to address prior feedback about env-var management in test bodies. |
| packages/server/src/executable-resolution/executable-resolution.test.ts | Adds two tests for the exists guard fix; one goes through findExecutable (correct), the other imports windowsExecutableResolution directly from windows.ts and uses vi.fn for call-count assertion. |
| packages/server/src/executable-resolution/windows.ts | Bug fix: adds an exists check in findFirstProbeable before probing each candidate, preventing false-positive probes on fabricated absolute paths. |
| packages/server/src/server/agent/providers/diagnostic-utils.ts | Exports truncateForDiagnostic for use in acp-agent.ts; no behavioral changes. |
| packages/client/src/daemon-client.ts | Doubles all provider-related client timeouts to align with new server-side parallel diagnostics and increased refresh window. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Client
participant Manager as ProviderSnapshotManager
participant BaseDiag as getBaseProviderDiagnostic
participant SnapRefresh as refreshDiagnosticSnapshotEntry
participant ACPDiag as buildACPProbeDiagnosticRows
participant Fetch as fetchCatalog
Client->>Manager: getProviderDiagnostic(provider)
Manager->>BaseDiag: start (no await)
Manager->>SnapRefresh: start (no await)
Manager->>Manager: Promise.all([base, snapshot])
par Base diagnostic
BaseDiag->>ACPDiag: "getDiagnostic() -> buildACPProbeDiagnosticRows"
ACPDiag-->>ACPDiag: "spawnTransport -> ACP spawn row"
ACPDiag-->>ACPDiag: "initializeTransport -> ACP initialize row"
ACPDiag-->>ACPDiag: "session/new -> ACP session row"
ACPDiag-->>BaseDiag: phase rows (or timeout error row)
and Snapshot refresh
SnapRefresh->>Fetch: "refreshSnapshotForCwd -> fetchCatalog"
Fetch-->>Fetch: spawnProcess (onSpawned captures probe ref)
Fetch-->>SnapRefresh: catalog or timeout error
SnapRefresh-->>Manager: ProviderSnapshotEntry
end
Manager-->>Client: "{ provider, diagnostic: baseDiag + Models + Status }"
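The ACP phase rows in the diagram can be pictured with a small sketch. It is not the PR's `buildACPProbeDiagnosticRows`: the `PhaseRow` and `Phase` shapes, `settleWithin`, and `probePhases` are hypothetical names, and stopping at the first failed phase is an assumption made here because later phases depend on earlier ones.

```typescript
// Hypothetical shapes; the PR's real row and transport types are not shown here.
interface PhaseRow {
  label: string;
  value: string;
}

interface Phase {
  label: string;
  run: () => Promise<void>;
}

type Outcome = "done" | "timeout";

// Resolve with "timeout" if `work` has not settled after `ms` milliseconds.
function settleWithin(work: Promise<void>, ms: number): Promise<Outcome> {
  return new Promise<Outcome>((resolve, reject) => {
    const timer = setTimeout(() => resolve("timeout"), ms);
    work.then(
      () => { clearTimeout(timer); resolve("done"); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Run phases in order and record one row per phase that was attempted.
async function probePhases(phases: Phase[], phaseTimeoutMs: number): Promise<PhaseRow[]> {
  const rows: PhaseRow[] = [];
  for (const phase of phases) {
    try {
      const outcome = await settleWithin(phase.run(), phaseTimeoutMs);
      if (outcome === "timeout") {
        rows.push({ label: phase.label, value: `Timed out after ${phaseTimeoutMs}ms` });
        break; // assumption: later phases cannot run without this one
      }
      rows.push({ label: phase.label, value: "ok" });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      rows.push({ label: phase.label, value: `Error: ${message}` });
      break;
    }
  }
  return rows;
}
```

With phases labelled "ACP spawn", "ACP initialize", and "ACP session", a hung `session/new` would leave two "ok" rows followed by a timeout row, which is the visibility the phase rows are meant to provide.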
Reviews (3): Last reviewed commit: "fix(server): avoid probing missing windo..."
Addressed the follow-ups in e622ca1: fetchCatalog now owns the ACP probe as soon as the child is spawned, so cleanup is no longer skipped during initialize/session timeouts; the missing-launcher diagnostic test uses an isolated temp dir for Windows; and env restoration moved into a withEnv helper.
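A rough sketch of the ownership idea, assuming a callback named `onSpawned` as in the review summary. Everything else (`ProbeHandle`, `SpawnOptions`, `fetchWithCleanup`, and closing the probe on success as well as on failure) is hypothetical and only illustrates why capturing the handle early lets a timeout still clean up.

```typescript
// Hypothetical handle standing in for the spawned ACP child process.
interface ProbeHandle {
  close(): void;
}

interface SpawnOptions {
  // Called as soon as the child exists, before initialize or session/new.
  onSpawned?: (probe: ProbeHandle) => void;
}

function fetchWithCleanup<T>(
  run: (options: SpawnOptions) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const owned: ProbeHandle[] = [];
  return new Promise<T>((resolve, reject) => {
    let settled = false;
    // Settle exactly once, closing every probe captured so far.
    const finish = (settle: () => void): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      for (const probe of owned) probe.close();
      settle();
    };
    const timer = setTimeout(
      () => finish(() => reject(new Error(`catalog fetch timed out after ${timeoutMs}ms`))),
      timeoutMs,
    );
    run({ onSpawned: (probe) => { owned.push(probe); } }).then(
      (value) => finish(() => resolve(value)),
      (error) => finish(() => reject(error)),
    );
  });
}
```

In this sketch a timeout that fires before `onSpawned` is called has nothing to close, which mirrors the window the review summary notes for timeouts during `spawnTransport`.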
Fixed the Windows server-test failure in cdda4df: Windows executable resolution now skips fabricated absolute-path .exe/.cmd candidates unless the file actually exists, so missing ACP launchers report as unresolved instead of a phantom .cmd path. |
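The exists guard can be sketched as follows. The signature is hypothetical (the real `findFirstProbeable` in `windows.ts` is not shown in this thread); `exists` and `probe` are injected here so the example stays self-contained.

```typescript
// Hypothetical signature; only the ordering (exists check, then probe) reflects the fix.
async function findFirstProbeable(
  candidates: string[],
  exists: (path: string) => boolean,
  probe: (path: string) => Promise<boolean>,
): Promise<string | null> {
  for (const candidate of candidates) {
    // Never probe a path that is not on disk, so a fabricated
    // `.exe`/`.cmd` candidate cannot be reported as resolved.
    if (!exists(candidate)) continue;
    if (await probe(candidate)) return candidate;
  }
  return null;
}
```

When none of the candidates exist, the function returns `null` without probing, which corresponds to the launcher being reported as unresolved rather than as a phantom `.cmd` path.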
Linked issue
Refs #1475
Type of change
What does this PR do
Provider diagnostics now keep returning useful information even when model discovery or a provider probe is slow, errors, or times out.
The diagnostic request starts model refresh and provider-specific probes in parallel, reports failures as diagnostic rows instead of dropping the whole sheet, and adds ACP phase rows so the slow or broken phase is visible.
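A minimal sketch of that behaviour, under stated assumptions: `DiagnosticRow`, `withTimeout`, `rowsOrError`, and `collectDiagnostic` are hypothetical names, and the real `getProviderDiagnostic` may structure its rows and timeouts differently. The point is only that both steps start before either is awaited and that a failure becomes a row instead of a thrown error.

```typescript
// Hypothetical row shape; the real diagnostic types are not shown in this description.
interface DiagnosticRow {
  label: string;
  value: string;
}

// Reject if `work` does not settle within `ms`; the timer is cleared either way.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Convert a failure or timeout into a row so the rest of the report survives.
async function rowsOrError(
  label: string,
  work: () => Promise<DiagnosticRow[]>,
  timeoutMs: number,
): Promise<DiagnosticRow[]> {
  try {
    return await withTimeout(work(), timeoutMs, label);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return [{ label, value: `Error: ${message}` }];
  }
}

async function collectDiagnostic(
  base: () => Promise<DiagnosticRow[]>,
  refresh: () => Promise<DiagnosticRow[]>,
  timeoutMs: number,
): Promise<DiagnosticRow[]> {
  // Both calls are issued before Promise.all awaits them, so they run in parallel.
  const [baseRows, modelRows] = await Promise.all([
    rowsOrError("Provider", base, timeoutMs),
    rowsOrError("Models", refresh, timeoutMs),
  ]);
  return [...baseRows, ...modelRows];
}
```

Because `rowsOrError` never rejects, `Promise.all` cannot short-circuit on the first failure, so a hung model refresh costs at most one timeout and still leaves the base rows in the result.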
How did you verify it
Local verification:
npx vitest run packages/server/src/server/agent/provider-snapshot-manager.test.ts --bail=1
npx vitest run packages/server/src/server/agent/providers/generic-acp-agent.diagnostic.test.ts --bail=1
npm run typecheck
npm run lint
npm run format
Checklist
npm run typecheck passes
npm run lint passes
npm run format ran (Biome)