Skip to content

Keep provider diagnostics useful when discovery is slow#1724

Merged
boudra merged 4 commits into
mainfrom
fix/provider-diagnostics-resilient
Jun 25, 2026
Merged

Keep provider diagnostics useful when discovery is slow#1724
boudra merged 4 commits into
mainfrom
fix/provider-diagnostics-resilient

Conversation

@boudra

@boudra boudra commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Linked issue

Refs #1475

Type of change

  • Bug fix
  • New feature (with prior issue + design alignment)
  • Refactor / code improvement
  • Docs

What does this PR do

Provider diagnostics now keep returning useful information even when model discovery or a provider probe is slow, errors, or times out.

The diagnostic request starts model refresh and provider-specific probes in parallel, reports failures as diagnostic rows instead of dropping the whole sheet, and adds ACP phase rows so the slow or broken phase is visible.

How did you verify it

Local verification:

  • npx vitest run packages/server/src/server/agent/provider-snapshot-manager.test.ts --bail=1
  • npx vitest run packages/server/src/server/agent/providers/generic-acp-agent.diagnostic.test.ts --bail=1
  • npm run typecheck
  • npm run lint
  • npm run format

Checklist

  • One focused change. Unrelated cleanups split out.
  • npm run typecheck passes
  • npm run lint passes
  • npm run format ran (Biome)
  • UI changes include screenshots or video for every affected platform (N/A, server/client diagnostics only)
  • Tests added or updated where it made sense

boudra added 2 commits June 25, 2026 17:12
Run provider diagnostics and model refresh in parallel, capture failures as diagnostic rows, and add ACP phase probes so one slow or broken step does not hide the rest of the report.
@greptile-apps

greptile-apps Bot commented Jun 25, 2026

Copy link
Copy Markdown

Greptile Summary

Provider diagnostics now return useful information even when model discovery or an ACP probe is slow, errors, or times out. The snapshot refresh and the provider-specific diagnostic probe run in parallel, each with its own timeout; failures in either leg produce diagnostic rows rather than aborting the whole response.

  • getProviderDiagnostic launches getBaseProviderDiagnostic and refreshDiagnosticSnapshotEntry concurrently via Promise.all, so a slow catalog fetch no longer blocks the provider probe. Unknown providers and probe failures return as formatted diagnostic text instead of thrown errors.
  • acp-agent.ts is refactored into spawnTransport + initializeTransport phases; buildACPProbeDiagnosticRows emits a timed row for each ACP lifecycle phase (spawn, initialize, session/new, cleanup). fetchCatalog gains an onSpawned callback to capture the child reference before initialization completes.
  • Client timeouts are doubled across the board to match the expanded server-side windows (diagnostic endpoint: 30 s \u2192 180 s).

Confidence Score: 5/5

Safe to merge; the parallel diagnostic flow is well-bounded by independent timeouts, errors are isolated to diagnostic rows, and the refactored ACP process lifecycle is covered by targeted integration tests.

Both the snapshot refresh and the provider probe are timeout-guarded independently, error paths convert to diagnostic text rather than propagating, and the onSpawned callback closes the largest cleanup gap in the ACP fetch path. Integration tests cover the live ACP lifecycle including hung-session timeout and process-exit verification.

The buildACPProbeDiagnosticRows method in acp-agent.ts and the fetchCatalog onSpawned interaction are the most complex paths; the cleanup sequencing around the spawnTransport phase is worth a second read.

Important Files Changed

Filename Overview
packages/server/src/server/agent/providers/acp-agent.ts Adds buildACPProbeDiagnosticRows for per-phase ACP diagnostics; refactors spawnProcess into spawnTransport + initializeTransport; introduces onSpawned callback for early probe reference capture in fetchCatalog. Probe cleanup on timeout is functional but has a window where probe is null if the outer timeout fires during spawnTransport.
packages/server/src/server/agent/provider-snapshot-manager.ts Refactors getProviderDiagnostic to run base diagnostic and snapshot refresh in parallel; adds diagnosticTimeoutMs; converts unknown-provider and probe failures to diagnostic rows instead of thrown errors.
packages/server/src/server/agent/providers/generic-acp-agent.ts Extends getDiagnostic to include ACP phase rows; errors during launch resolution become rows instead of aborting the diagnostic. Adds diagnosticPhaseTimeoutMs option.
packages/server/src/server/agent/providers/generic-acp-agent.diagnostic.test.ts Replaces static-info-only test with live ACP probe tests using a fake ACP agent script; covers success, hung session/new, catalog timeout with process-exit verification, and missing launcher path.
packages/server/src/server/agent/provider-snapshot-manager.test.ts Adds comprehensive tests for timeout, error, and parallel-start scenarios in getProviderDiagnostic; adds withEnv helper to address prior feedback about env-var management in test bodies.
packages/server/src/executable-resolution/executable-resolution.test.ts Adds two tests for the exists guard fix; one goes through findExecutable (correct), the other imports windowsExecutableResolution directly from windows.ts and uses vi.fn for call-count assertion.
packages/server/src/executable-resolution/windows.ts Bug fix: adds an exists check in findFirstProbeable before probing each candidate, preventing false-positive probes on fabricated absolute paths.
packages/server/src/server/agent/providers/diagnostic-utils.ts Exports truncateForDiagnostic for use in acp-agent.ts; no behavioral changes.
packages/client/src/daemon-client.ts Doubles all provider-related client timeouts to align with new server-side parallel diagnostics and increased refresh window.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client
    participant Manager as ProviderSnapshotManager
    participant BaseDiag as getBaseProviderDiagnostic
    participant SnapRefresh as refreshDiagnosticSnapshotEntry
    participant ACPDiag as buildACPProbeDiagnosticRows
    participant Fetch as fetchCatalog

    Client->>Manager: getProviderDiagnostic(provider)
    Manager->>BaseDiag: start (no await)
    Manager->>SnapRefresh: start (no await)
    Manager->>Manager: Promise.all([base, snapshot])

    par Base diagnostic
        BaseDiag->>ACPDiag: "getDiagnostic() -> buildACPProbeDiagnosticRows"
        ACPDiag-->>ACPDiag: "spawnTransport -> ACP spawn row"
        ACPDiag-->>ACPDiag: "initializeTransport -> ACP initialize row"
        ACPDiag-->>ACPDiag: "session/new -> ACP session row"
        ACPDiag-->>BaseDiag: phase rows (or timeout error row)
    and Snapshot refresh
        SnapRefresh->>Fetch: "refreshSnapshotForCwd -> fetchCatalog"
        Fetch-->>Fetch: spawnProcess (onSpawned captures probe ref)
        Fetch-->>SnapRefresh: catalog or timeout error
        SnapRefresh-->>Manager: ProviderSnapshotEntry
    end

    Manager-->>Client: "{ provider, diagnostic: baseDiag + Models + Status }"
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant Client
    participant Manager as ProviderSnapshotManager
    participant BaseDiag as getBaseProviderDiagnostic
    participant SnapRefresh as refreshDiagnosticSnapshotEntry
    participant ACPDiag as buildACPProbeDiagnosticRows
    participant Fetch as fetchCatalog

    Client->>Manager: getProviderDiagnostic(provider)
    Manager->>BaseDiag: start (no await)
    Manager->>SnapRefresh: start (no await)
    Manager->>Manager: Promise.all([base, snapshot])

    par Base diagnostic
        BaseDiag->>ACPDiag: "getDiagnostic() -> buildACPProbeDiagnosticRows"
        ACPDiag-->>ACPDiag: "spawnTransport -> ACP spawn row"
        ACPDiag-->>ACPDiag: "initializeTransport -> ACP initialize row"
        ACPDiag-->>ACPDiag: "session/new -> ACP session row"
        ACPDiag-->>BaseDiag: phase rows (or timeout error row)
    and Snapshot refresh
        SnapRefresh->>Fetch: "refreshSnapshotForCwd -> fetchCatalog"
        Fetch-->>Fetch: spawnProcess (onSpawned captures probe ref)
        Fetch-->>SnapRefresh: catalog or timeout error
        SnapRefresh-->>Manager: ProviderSnapshotEntry
    end

    Manager-->>Client: "{ provider, diagnostic: baseDiag + Models + Status }"
Loading

Reviews (3): Last reviewed commit: "fix(server): avoid probing missing windo..." | Re-trigger Greptile

Comment thread packages/server/src/server/agent/providers/acp-agent.ts
@boudra

boudra commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the follow-ups in e622ca1: fetchCatalog now owns the ACP probe as soon as the child is spawned so cleanup is not skipped during initialize/session timeouts, the missing-launcher diagnostic test uses an isolated temp dir for Windows, and env restoration moved into a withEnv helper.

@boudra

boudra commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator Author

Fixed the Windows server-test failure in cdda4df: Windows executable resolution now skips fabricated absolute-path .exe/.cmd candidates unless the file actually exists, so missing ACP launchers report as unresolved instead of a phantom .cmd path.

@boudra boudra merged commit 2cf041c into main Jun 25, 2026
15 checks passed
@boudra boudra deleted the fix/provider-diagnostics-resilient branch June 25, 2026 11:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant