perf(storage): speed up connection load and clarify in-memory wrapping#726
Conversation
…tion Rename the on-read transform helpers from migrateToV2/migrateToV3 to wrapV1AsV2/wrapV2AsCurrent and update trace messages to state explicitly that wrapping an older stored item into the current StoredItem shape is a pure in-memory operation that is recomputed on every read and never persisted. This removes the confusion of seeing 'migration' in the trace on every load when nothing is actually being migrated to disk.
Previously fixFolderConnectionStrings, cleanupDuplicateConnectionStringParameters and cleanupInvalidConnectionStrings each re-read every storage zone via getItems(), so startup cleanup performed ~6 full zone reads (3 cleaners x 2 zones). Each getItems() does a SecretStorage round-trip per item, which on Remote-WSL crosses the WSL2 <-> Windows boundary and dominates load time. resolvePostMigrationErrors now loads each zone exactly once and threads the same in-memory array through all three cleaners, cutting zone reads from ~6 to 2. The cleaners operate on disjoint item sets (folders vs. connections vs. invalid/unparseable connections) so a single read per zone is safe. Telemetry counters are now aggregated by the orchestrator.
Add a STORAGE_CLEANUP_VERSION marker ('0.8.1', the patch this ships with)
persisted to globalState once resolvePostMigrationErrors completes. On every
subsequent load, if the stored marker matches the current cleanup-schema
version, the whole cleanup pass (folder fixups, duplicate-param cleanup,
invalid-connection removal, orphan sweep) is skipped instead of re-scanning
both zones each session.
The marker doubles as a guarantee: any install carrying '0.8.1' is known to
have completed every one-time format upgrade and corruption cleanup up to that
release. The constant is only bumped when a new one-time step is added, in
which case existing installs re-run the pass exactly once. The marker is
written only after a successful run, so interrupted runs safely retry.
Adds cleanupSkipped/cleanupVersion telemetry for visibility.
getItems awaited ext.secretStorage.get() once per item inside a for-loop, so N items cost N sequential round trips to the main process. Under Remote-WSL each round trip also crosses the WSL2 <-> Windows boundary, making this the dominant component of connection load time. Collect item metadata from globalState first (synchronous, in-memory), then dispatch all secret reads together with Promise.all so the round trips pipeline over the shared RPC channel. Total wait drops from ~N round trips to ~1. This is safe because VS Code serializes secret access per-key rather than globally and caches the decrypted store after the first read, so reads for distinct item keys do not block one another. Behavior and return shape are unchanged.
Remove migrateFromAzureDatabases, getMongoMigrationApi, the MongoConnectionMigrationApi interface, the attempts-counter globalState key, and the gating call in getStorageService (plus now-unused apiUtils, vscode, l10n and isVCoreAndRURolloutEnabled imports). This path imported MongoDB cluster connections from the Azure Databases (ms-azuretools.vscode-cosmosdb) extension on first storage access. It shipped several releases ago, is self-limiting (only acts when that extension is installed and still holds un-imported connections), and otherwise added pure startup overhead on every load (extension lookup + cross-extension API negotiation) behind an attempts<20 counter. The remaining population is effectively flat. Unlike the on-disk format readers (wrapV1AsV2 etc., kept), dropping this only affects a narrow set of users who never opened the extension, and they have a clear workaround: re-add the connection. The read-time format wrapping is unaffected.
There was a problem hiding this comment.
Pull request overview
This PR reduces connection-load latency (notably under Remote-WSL) by cutting down SecretStorage round-trips and by avoiding repeated post-migration cleanup work on every activation. It also clarifies that the legacy-record “migration” work is an in-memory wrap performed at read time (not a persisted migration), and removes the legacy Azure Databases cross-extension import path.
Changes:
- Parallelize SecretStorage reads in
Storage.getItems()withPromise.all()to avoid N sequential RPC round trips. - Refactor startup cleanup to load each storage zone once, and gate the cleanup pass behind a persisted “cleanup completed” marker.
- Rename legacy “migration” helpers and traces to reflect in-memory wrapping; remove the Azure Databases import + related l10n keys; add a design write-up doc.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/services/storageService.ts | Dispatches SecretStorage reads concurrently in getItems() to reduce latency. |
| src/services/connectionStorageService.ts | Gates and refactors startup cleanup; clarifies in-memory wrapping; removes Azure Databases import path. |
| l10n/bundle.l10n.json | Removes localization entries that were only used by deleted Azure Databases import code. |
| docs/ai-and-plans/PRs/726-storage-load-optimization.md | Adds a rationale/design note documenting the performance work and tradeoffs. |
Previously the orphan cleanup ran fire-and-forget while the cleanup-completed marker was written immediately. An interrupted or failed orphan pass would leave orphans behind and the marker would suppress every future cleanup attempt, hiding affected items from the tree until the user reinstalled the extension or manually cleared globalState. Awaiting the orphan pass ensures the marker is only written on a successful run. callWithTelemetryAndErrorHandling still swallows exceptions, so a thrown orphan-cleanup error simply prevents the marker write and the next activation retries the whole pass. Addresses Copilot review thread PRRT_kwDOODtcO86GkTae on PR #726.
… dirty reads Concurrent activation-path callers (tree providers, URI handler, stats hook) could race the storage-service bootstrap. The previous code assigned _storageService before awaiting cleanup, so a second caller arriving in the same tick would see the field truthy, return immediately, and read dirty data (duplicate-param connections, invalid CS rows, orphans) while cleanup was still running in parallel. The cache then served that snapshot to later readers for up to TTL. Now getStorageService caches a single in-flight bootstrap promise; all external callers await it. _storageService is published only after resolvePostMigrationErrors and collectStorageStats complete, so no consumer ever observes a partially-bootstrapped service. To avoid re-entrant deadlock, the cleanup helpers (resolvePostMigrationErrors, cleanupOrphanedItems, collectStorageStats, and the inner cleaners) now receive the Storage instance as a parameter instead of calling getStorageService. A new private loadAllItemsFromService helper is shared between getAllItems and the cleanup helpers.
…nErrors The three cleaners in resolvePostMigrationErrors share a single pre-cleanup items snapshot. They are currently safe because they operate on disjoint categories (folders vs. connections vs. invalid/unparseable), but a future cleaner that depends on the in-memory state of a previous cleaner would silently see stale data. Make the invariant explicit so the next person extending the cleanup pass either preserves it or rebuilds the items array between cleaners (cheap now thanks to the per-workspace cache).
… changes Local push/delete invalidate the per-workspace cache, but a SecretStorage mutation from a second VS Code window (or any other extension writing through the same SecretStorage) was not observed. The first window would keep serving the stale snapshot for up to GET_ITEMS_CACHE_TTL_MS. Subscribe to SecretStorage.onDidChange and drop the cache entry for the affected workspace when a secret in this storage namespace changes. Unrelated secret churn is ignored.
|
Fix: serialize storage-service bootstrap behind a shared promise — ea95be1 Concurrent activation-path callers (tree providers, URI handler, stats hook) could race the lazy After this commit, To avoid re-entrant deadlock, the cleanup helpers ( |
|
Fix: invalidate Local This commit subscribes |
|
Doc: document cleaner-pass invariant in The three cleaners in This commit adds an explicit invariant comment so the next person extending the cleanup pass either preserves the disjoint-category contract or rebuilds the |
…ons doc - Move 726-storage-load-optimization.md into 726-storage-load-optimization/ subfolder. - Add review-and-resolutions.md capturing each review finding and what was done about it (commit SHAs, decisions, deferrals).
STORAGE_CLEANUP_VERSION was a semver string ('0.8.1') even though it has
nothing to do with the extension version — it's a cleanup-schema contract
marker. Using a semver here invited drift: future contributors could assume
it tracks package.json and either fail to bump it on a new cleanup step
(silent skip) or bump it on every release (unnecessary re-runs).
Switch to a plain integer counter starting at 1, and update the read-side
type to accept either number (current) or the legacy string ('0.8.1') so
existing installs simply fail the equality check and re-run the cleanup
once — harmless because every cleaner is idempotent.
Also surface the previous marker value via telemetry so we can observe how
many installs cross the legacy->integer boundary.
Addresses Copilot review thread PRRT_kwDOODtcO86GkTa4 on PR #726.
…-time Two reviewer-flagged contract gaps in storageService.ts that don't need code changes but do need accurate docs/comments so future maintainers understand the trade-offs: 1. getItems defensive copy is shallow: nested 'properties' is shared with the cached snapshot. Audit of every current consumer shows no problematic mutator (wrapV2AsCurrent is idempotent, updateParentId immediately awaits save which invalidates the cache). Cloning properties via structuredClone on every read would cost CPU on the activation critical path this cache exists to optimize. Tighten the JSDoc to spell out the contract — item and secrets are fresh, properties is shared, do not mutate. 2. getOrLoadItems timestamps the cache entry at creation, not resolution. A slow SecretStorage load shrinks (or, pathologically, eliminates) the post-resolution reuse window. In practice concurrent in-flight callers are coalesced via the shared promise, so the duplicate-load risk only matters after first resolution. Add an inline comment documenting the trade-off and noting that bumping GET_ITEMS_CACHE_TTL_MS is the cheap escape hatch if telemetry ever shows the window collapsing. Addresses Copilot review threads PRRT_kwDOODtcO86Gk_WK and PRRT_kwDOODtcO86Gk_Vd on PR #726.
The 'Commits: 7 on top of main' line was already inaccurate and would continue to drift on every push. The commit list is one click away on the PR page; keeping a frozen count in a checked-in doc only invites further drift. Addresses Copilot review thread PRRT_kwDOODtcO86GkTbL on PR #726.
Add a format census to connectionStorage.stats so we can decide when the read-time v1/v2 wrapping code in fromStorageItem is safe to remove. Per zone (clusters, emulators): <zone>_v1Items, <zone>_v2Items, <zone>_v3Items, <zone>_unknownVersionItems Aggregate across zones: v1Items, v2Items, v3Items, unknownVersionItems hasLegacyItems = 'true' | 'false' The counts are taken from the raw StorageItem.version field before wrapping (loadAllItemsFromService strips the version). 'unknown' covers anything that doesn't match a known version, including future formats that the service skips today. Once hasLegacyItems drops to ~0% across an entire release window in telemetry, wrapV1AsV2 and wrapV2AsCurrent become deletable along with the v1/v2 cases in fromStorageItem.
|
Feat: per-version item census in stats telemetry — e7f2169 Added a format census to the Per zone (
Aggregate across zones:
The counts are taken from the raw Plus the previously-mentioned Removal criteria: once |
…p refactor - Reset _bootstrap alongside _storageService in beforeEach and mid-test resets so cleanup re-runs per test (was reusing the cached promise) - Pass mockStorage explicitly to cleanupOrphanedItems() calls in orphans tests — signature now requires a storageService argument
✅ Code Quality Checks
This comment is updated automatically on each push. |
📦 Build Size Report
Download artifact · updated automatically on each push. |
Why
Connection storage code showed "migration" running in traces on every load, and connection loading was slow on Remote-WSL where each SecretStorage read crosses the WSL2↔Windows boundary. Investigation found the cost came from (a) re-reading every storage zone multiple times during startup cleanup, (b) running that cleanup on every session, (c) reading item secrets one-at-a-time, and (d) independent activation-time consumers each re-reading the same zone back-to-back.
What changed (one concern per commit)
migrateToV2/migrateToV3→wrapV1AsV2/wrapV2AsCurrentand fix trace text. Wrapping an old stored item into the current shape is a pure in-memory transform recomputed on every read; it was never persisted, so "migration" in the trace was misleading.STORAGE_CLEANUP_VERSIONmarker is persisted after a successful cleanup pass; subsequent loads skip the whole pass. The marker only bumps when a new one-time step is added.await secretStorage.get()loop withPromise.all, pipelining round trips over the shared RPC channel (~N round trips → ~1).ms-azuretools.vscode-cosmosdb. It's self-limiting, the remaining population is flat, and it added startup overhead on every load. Unlike on-disk format readers (kept), dropping it only affects users who never opened the extension, with a clear workaround (re-add the connection).StorageImpl.getItems(). Concurrent callers share one in-flight read (request coalescing), and callers within a short TTL reuse the resolved snapshot instead of issuing a fresh read. Every caller gets a defensive copy;push/deleteinvalidate immediately and failed reads are evicted so they retry. AddsstorageService.test.ts(6 tests).Performance impact
In our test environments, these changes reduce redundant secret-store reads during activation by well over half (roughly 60–80% depending on the connection profile). Because each avoided read is a round trip to secure storage, this is a noticeable speed-up in connection load time — with the largest gains on remote setups (e.g. Remote-WSL), where every saved read also avoids a cross-boundary round trip.
Beyond the read count, the previous behavior re-read the same storage zones repeatedly spread across several seconds of startup (cleanup ran on every session and independent consumers each triggered their own reads). These changes collapse that into a single pass per zone, so storage is no longer being re-hammered well after activation begins.
Validation
npm run l10n,npm run prettier-fix,npm run lint— cleannpx jest --no-coverage— 2001 tests / 102 suites pass (addedstorageService.test.ts, 6 tests)npm run build— cleanA detailed write-up of the reasoning is in
docs/ai-and-plans/PRs/.