perf(storage): speed up connection load and clarify in-memory wrapping by tnaum-ms · Pull Request #726 · microsoft/vscode-documentdb

tnaum-ms · 2026-06-02T21:02:59Z

Why

Connection storage code showed "migration" running in traces on every load, and connection loading was slow on Remote-WSL where each SecretStorage read crosses the WSL2↔Windows boundary. Investigation found the cost came from (a) re-reading every storage zone multiple times during startup cleanup, (b) running that cleanup on every session, (c) reading item secrets one-at-a-time, and (d) independent activation-time consumers each re-reading the same zone back-to-back.

What changed (one concern per commit)

refactor(storage): clarify read-time wrapping is in-memory, not migration — Rename migrateToV2/migrateToV3 → wrapV1AsV2/wrapV2AsCurrent and fix trace text. Wrapping an old stored item into the current shape is a pure in-memory transform recomputed on every read; it was never persisted, so "migration" in the trace was misleading.
perf(storage): load each zone once for post-migration cleanup — The three cleaners each re-read every zone (~6 zone reads). Now each zone is loaded once and the in-memory array is threaded through all cleaners (2 reads).
perf(storage): gate startup cleanup behind a one-time version marker — A STORAGE_CLEANUP_VERSION marker is persisted after a successful cleanup pass; subsequent loads skip the whole pass. The marker only bumps when a new one-time step is added.
perf(storage): read item secrets concurrently in getItems — Replace the per-item sequential await secretStorage.get() loop with Promise.all, pipelining round trips over the shared RPC channel (~N round trips → ~1).
chore(storage): drop one-time Azure Databases connection import — Remove the cross-extension import from ms-azuretools.vscode-cosmosdb. It's self-limiting, the remaining population is flat, and it added startup overhead on every load. Unlike on-disk format readers (kept), dropping it only affects users who never opened the extension, with a clear workaround (re-add the connection).
chore(l10n): drop strings for removed Azure Databases import — Mechanical follow-up so the localization bundle doesn't carry dead keys.
perf(storage): coalesce and briefly cache getItems during activation — Add a short-lived, per-workspace read-through cache to StorageImpl.getItems(). Concurrent callers share one in-flight read (request coalescing), and callers within a short TTL reuse the resolved snapshot instead of issuing a fresh read. Every caller gets a defensive copy; push/delete invalidate immediately and failed reads are evicted so they retry. Adds storageService.test.ts (6 tests).

Performance impact

In our test environments, these changes reduce redundant secret-store reads during activation by well over half (roughly 60–80% depending on the connection profile). Because each avoided read is a round trip to secure storage, this is a noticeable speed-up in connection load time — with the largest gains on remote setups (e.g. Remote-WSL), where every saved read also avoids a cross-boundary round trip.

Beyond the read count, the previous behavior re-read the same storage zones repeatedly spread across several seconds of startup (cleanup ran on every session and independent consumers each triggered their own reads). These changes collapse that into a single pass per zone, so storage is no longer being re-hammered well after activation begins.

Validation

npm run l10n, npm run prettier-fix, npm run lint — clean
npx jest --no-coverage — 2001 tests / 102 suites pass (added storageService.test.ts, 6 tests)
npm run build — clean

A detailed write-up of the reasoning is in docs/ai-and-plans/PRs/.

…tion Rename the on-read transform helpers from migrateToV2/migrateToV3 to wrapV1AsV2/wrapV2AsCurrent and update trace messages to state explicitly that wrapping an older stored item into the current StoredItem shape is a pure in-memory operation that is recomputed on every read and never persisted. This removes the confusion of seeing 'migration' in the trace on every load when nothing is actually being migrated to disk.

Previously fixFolderConnectionStrings, cleanupDuplicateConnectionStringParameters and cleanupInvalidConnectionStrings each re-read every storage zone via getItems(), so startup cleanup performed ~6 full zone reads (3 cleaners x 2 zones). Each getItems() does a SecretStorage round-trip per item, which on Remote-WSL crosses the WSL2 <-> Windows boundary and dominates load time. resolvePostMigrationErrors now loads each zone exactly once and threads the same in-memory array through all three cleaners, cutting zone reads from ~6 to 2. The cleaners operate on disjoint item sets (folders vs. connections vs. invalid/unparseable connections) so a single read per zone is safe. Telemetry counters are now aggregated by the orchestrator.

Add a STORAGE_CLEANUP_VERSION marker ('0.8.1', the patch this ships with) persisted to globalState once resolvePostMigrationErrors completes. On every subsequent load, if the stored marker matches the current cleanup-schema version, the whole cleanup pass (folder fixups, duplicate-param cleanup, invalid-connection removal, orphan sweep) is skipped instead of re-scanning both zones each session. The marker doubles as a guarantee: any install carrying '0.8.1' is known to have completed every one-time format upgrade and corruption cleanup up to that release. The constant is only bumped when a new one-time step is added, in which case existing installs re-run the pass exactly once. The marker is written only after a successful run, so interrupted runs safely retry. Adds cleanupSkipped/cleanupVersion telemetry for visibility.

getItems awaited ext.secretStorage.get() once per item inside a for-loop, so N items cost N sequential round trips to the main process. Under Remote-WSL each round trip also crosses the WSL2 <-> Windows boundary, making this the dominant component of connection load time. Collect item metadata from globalState first (synchronous, in-memory), then dispatch all secret reads together with Promise.all so the round trips pipeline over the shared RPC channel. Total wait drops from ~N round trips to ~1. This is safe because VS Code serializes secret access per-key rather than globally and caches the decrypted store after the first read, so reads for distinct item keys do not block one another. Behavior and return shape are unchanged.

Remove migrateFromAzureDatabases, getMongoMigrationApi, the MongoConnectionMigrationApi interface, the attempts-counter globalState key, and the gating call in getStorageService (plus now-unused apiUtils, vscode, l10n and isVCoreAndRURolloutEnabled imports). This path imported MongoDB cluster connections from the Azure Databases (ms-azuretools.vscode-cosmosdb) extension on first storage access. It shipped several releases ago, is self-limiting (only acts when that extension is installed and still holds un-imported connections), and otherwise added pure startup overhead on every load (extension lookup + cross-extension API negotiation) behind an attempts<20 counter. The remaining population is effectively flat. Unlike the on-disk format readers (wrapV1AsV2 etc., kept), dropping this only affects a narrow set of users who never opened the extension, and they have a clear workaround: re-add the connection. The read-time format wrapping is unaffected.

Copilot

Pull request overview

This PR reduces connection-load latency (notably under Remote-WSL) by cutting down SecretStorage round-trips and by avoiding repeated post-migration cleanup work on every activation. It also clarifies that the legacy-record “migration” work is an in-memory wrap performed at read time (not a persisted migration), and removes the legacy Azure Databases cross-extension import path.

Changes:

Parallelize SecretStorage reads in Storage.getItems() with Promise.all() to avoid N sequential RPC round trips.
Refactor startup cleanup to load each storage zone once, and gate the cleanup pass behind a persisted “cleanup completed” marker.
Rename legacy “migration” helpers and traces to reflect in-memory wrapping; remove the Azure Databases import + related l10n keys; add a design write-up doc.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
src/services/storageService.ts	Dispatches SecretStorage reads concurrently in `getItems()` to reduce latency.
src/services/connectionStorageService.ts	Gates and refactors startup cleanup; clarifies in-memory wrapping; removes Azure Databases import path.
l10n/bundle.l10n.json	Removes localization entries that were only used by deleted Azure Databases import code.
docs/ai-and-plans/PRs/726-storage-load-optimization.md	Adds a rationale/design note documenting the performance work and tradeoffs.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Previously the orphan cleanup ran fire-and-forget while the cleanup-completed marker was written immediately. An interrupted or failed orphan pass would leave orphans behind and the marker would suppress every future cleanup attempt, hiding affected items from the tree until the user reinstalled the extension or manually cleared globalState. Awaiting the orphan pass ensures the marker is only written on a successful run. callWithTelemetryAndErrorHandling still swallows exceptions, so a thrown orphan-cleanup error simply prevents the marker write and the next activation retries the whole pass. Addresses Copilot review thread PRRT_kwDOODtcO86GkTae on PR #726.

… dirty reads Concurrent activation-path callers (tree providers, URI handler, stats hook) could race the storage-service bootstrap. The previous code assigned _storageService before awaiting cleanup, so a second caller arriving in the same tick would see the field truthy, return immediately, and read dirty data (duplicate-param connections, invalid CS rows, orphans) while cleanup was still running in parallel. The cache then served that snapshot to later readers for up to TTL. Now getStorageService caches a single in-flight bootstrap promise; all external callers await it. _storageService is published only after resolvePostMigrationErrors and collectStorageStats complete, so no consumer ever observes a partially-bootstrapped service. To avoid re-entrant deadlock, the cleanup helpers (resolvePostMigrationErrors, cleanupOrphanedItems, collectStorageStats, and the inner cleaners) now receive the Storage instance as a parameter instead of calling getStorageService. A new private loadAllItemsFromService helper is shared between getAllItems and the cleanup helpers.

…nErrors The three cleaners in resolvePostMigrationErrors share a single pre-cleanup items snapshot. They are currently safe because they operate on disjoint categories (folders vs. connections vs. invalid/unparseable), but a future cleaner that depends on the in-memory state of a previous cleaner would silently see stale data. Make the invariant explicit so the next person extending the cleanup pass either preserves it or rebuilds the items array between cleaners (cheap now thanks to the per-workspace cache).

… changes Local push/delete invalidate the per-workspace cache, but a SecretStorage mutation from a second VS Code window (or any other extension writing through the same SecretStorage) was not observed. The first window would keep serving the stale snapshot for up to GET_ITEMS_CACHE_TTL_MS. Subscribe to SecretStorage.onDidChange and drop the cache entry for the affected workspace when a secret in this storage namespace changes. Unrelated secret churn is ignored.

tnaum-ms · 2026-06-03T08:20:33Z

Fix: serialize storage-service bootstrap behind a shared promise — ea95be1

Concurrent activation-path callers (tree providers, URI handler, stats hook) could race the lazy getStorageService() init: the field was assigned before awaiting cleanup, so a second caller arriving in the same tick saw it truthy, returned immediately, and read dirty data (duplicate-param connections, invalid CS rows, orphans) while cleanup was still running. The new getItems cache then served that snapshot for up to TTL.

After this commit, getStorageService() caches a single in-flight bootstrap promise; all external callers await it. _storageService is published only after resolvePostMigrationErrors and collectStorageStats complete, so no consumer ever observes a partially-bootstrapped service.

To avoid re-entrant deadlock, the cleanup helpers (resolvePostMigrationErrors, cleanupOrphanedItems, collectStorageStats, and the inner cleaners) now receive the Storage instance as a parameter instead of calling getStorageService themselves. A new private loadAllItemsFromService helper is shared between getAllItems and the cleanup helpers.

tnaum-ms · 2026-06-03T08:20:33Z

Fix: invalidate getItems cache on cross-window SecretStorage changes — 2acace9

Local push/delete invalidate the per-workspace cache, but a SecretStorage mutation from a second VS Code window (or any other extension writing through the same SecretStorage namespace) was not observed. The first window kept serving the stale snapshot for up to GET_ITEMS_CACHE_TTL_MS (10 s) — realistic for users who keep the DocumentDB tree open in two windows on the same profile.

This commit subscribes StorageImpl to SecretStorage.onDidChange and drops the cache entry for the affected workspace when a changed key belongs to this storage namespace. Unrelated secret churn is ignored. The subscription is registered via ext.context.subscriptions so it's disposed with the extension.

tnaum-ms · 2026-06-03T08:20:33Z

Doc: document cleaner-pass invariant in resolvePostMigrationErrors — d9dd080

The three cleaners in resolvePostMigrationErrors share a single pre-cleanup items snapshot loaded once per zone. They're currently safe because they operate on disjoint categories (folders / connections / invalid CS), but a future cleaner that depends on the in-memory state mutated by a previous cleaner would silently see stale data — saves/deletes are persisted, but the in-memory items are still the pre-cleanup snapshot.

This commit adds an explicit invariant comment so the next person extending the cleanup pass either preserves the disjoint-category contract or rebuilds the items array between cleaners (cheap now thanks to the per-workspace cache).

…ons doc - Move 726-storage-load-optimization.md into 726-storage-load-optimization/ subfolder. - Add review-and-resolutions.md capturing each review finding and what was done about it (commit SHAs, decisions, deferrals).

STORAGE_CLEANUP_VERSION was a semver string ('0.8.1') even though it has nothing to do with the extension version — it's a cleanup-schema contract marker. Using a semver here invited drift: future contributors could assume it tracks package.json and either fail to bump it on a new cleanup step (silent skip) or bump it on every release (unnecessary re-runs). Switch to a plain integer counter starting at 1, and update the read-side type to accept either number (current) or the legacy string ('0.8.1') so existing installs simply fail the equality check and re-run the cleanup once — harmless because every cleaner is idempotent. Also surface the previous marker value via telemetry so we can observe how many installs cross the legacy->integer boundary. Addresses Copilot review thread PRRT_kwDOODtcO86GkTa4 on PR #726.

…-time Two reviewer-flagged contract gaps in storageService.ts that don't need code changes but do need accurate docs/comments so future maintainers understand the trade-offs: 1. getItems defensive copy is shallow: nested 'properties' is shared with the cached snapshot. Audit of every current consumer shows no problematic mutator (wrapV2AsCurrent is idempotent, updateParentId immediately awaits save which invalidates the cache). Cloning properties via structuredClone on every read would cost CPU on the activation critical path this cache exists to optimize. Tighten the JSDoc to spell out the contract — item and secrets are fresh, properties is shared, do not mutate. 2. getOrLoadItems timestamps the cache entry at creation, not resolution. A slow SecretStorage load shrinks (or, pathologically, eliminates) the post-resolution reuse window. In practice concurrent in-flight callers are coalesced via the shared promise, so the duplicate-load risk only matters after first resolution. Add an inline comment documenting the trade-off and noting that bumping GET_ITEMS_CACHE_TTL_MS is the cheap escape hatch if telemetry ever shows the window collapsing. Addresses Copilot review threads PRRT_kwDOODtcO86Gk_WK and PRRT_kwDOODtcO86Gk_Vd on PR #726.

The 'Commits: 7 on top of main' line was already inaccurate and would continue to drift on every push. The commit list is one click away on the PR page; keeping a frozen count in a checked-in doc only invites further drift. Addresses Copilot review thread PRRT_kwDOODtcO86GkTbL on PR #726.

Add a format census to connectionStorage.stats so we can decide when the read-time v1/v2 wrapping code in fromStorageItem is safe to remove. Per zone (clusters, emulators): <zone>_v1Items, <zone>_v2Items, <zone>_v3Items, <zone>_unknownVersionItems Aggregate across zones: v1Items, v2Items, v3Items, unknownVersionItems hasLegacyItems = 'true' | 'false' The counts are taken from the raw StorageItem.version field before wrapping (loadAllItemsFromService strips the version). 'unknown' covers anything that doesn't match a known version, including future formats that the service skips today. Once hasLegacyItems drops to ~0% across an entire release window in telemetry, wrapV1AsV2 and wrapV2AsCurrent become deletable along with the v1/v2 cases in fromStorageItem.

tnaum-ms · 2026-06-03T08:44:14Z

Feat: per-version item census in stats telemetry — e7f2169

Added a format census to the connectionStorage.stats activation event so we can decide when the read-time v1/v2 wrapping code in fromStorageItem (i.e. wrapV1AsV2 / wrapV2AsCurrent) becomes safe to delete.

Per zone (clusters, emulators):

<zone>_v1Items, <zone>_v2Items, <zone>_v3Items, <zone>_unknownVersionItems

Aggregate across zones:

v1Items, v2Items, v3Items, unknownVersionItems
hasLegacyItems = 'true' | 'false'

The counts are taken from the raw StorageItem.version field before wrapping (loadAllItemsFromService strips the version). unknownVersionItems covers anything that doesn't match a known version, including future formats that the service skips today.

Plus the previously-mentioned previousCleanupVersion property on resolvePostMigrationErrors (added in 12f5163) tracks how many installs cross the legacy '0.8.1' → integer-counter marker boundary.

Removal criteria: once hasLegacyItems = 'false' across an entire release window in telemetry, wrapV1AsV2, wrapV2AsCurrent, and the v1/v2 branches in fromStorageItem become safe to delete.

…ration in tests

…p refactor - Reset _bootstrap alongside _storageService in beforeEach and mid-test resets so cleanup re-runs per test (was reusing the cached promise) - Pass mockStorage explicitly to cleanupOrphanedItems() calls in orphans tests — signature now requires a storageService argument

github-actions · 2026-06-03T09:30:28Z

✅ Code Quality Checks

Check	Status	How to fix
Localization (`l10n`)	✅ Passed
ESLint	✅ Passed
Prettier formatting	✅ Passed

This comment is updated automatically on each push.

github-actions · 2026-06-03T09:34:44Z

📦 Build Size Report

Metric	Base (`main`)	PR	Delta
VSIX (`vscode-documentdb-0.8.0.vsix`)	7.60 MB	7.60 MB	⬇️ -0 KB (-0.0%)
Webview bundle (`views.js`)	5.88 MB	5.88 MB	✅ 0 KB (0.0%)

Download artifact · updated automatically on each push.

tnaum-ms added 6 commits June 2, 2026 22:43

chore(l10n): drop strings for removed Azure Databases import

c7de792

Copilot AI review requested due to automatic review settings June 2, 2026 21:02

tnaum-ms requested a review from a team as a code owner June 2, 2026 21:03

Copilot started reviewing on behalf of tnaum-ms June 2, 2026 21:03 View session

tnaum-ms added 2 commits June 2, 2026 23:04

docs(pr): add PR #726 storage load optimization summary

5d935c7

style(storage): apply prettier formatting

8739120

Copilot AI reviewed Jun 2, 2026

View reviewed changes

Comment thread src/services/connectionStorageService.ts

Comment thread src/services/connectionStorageService.ts Outdated

Comment thread docs/ai-and-plans/PRs/726-storage-load-optimization.md Outdated

perf(storage): coalesce and briefly cache getItems during activation

c7d3a18

tnaum-ms requested a review from Copilot June 2, 2026 21:50

Copilot started reviewing on behalf of tnaum-ms June 2, 2026 21:50 View session

tnaum-ms added this to DocumentDB for VSCode: Release Plan Jun 2, 2026

tnaum-ms added this to the 0.8.1 milestone Jun 2, 2026

tnaum-ms moved this to In progress in DocumentDB for VSCode: Release Plan Jun 2, 2026

Copilot AI reviewed Jun 2, 2026

View reviewed changes

Comment thread src/services/storageService.ts

Comment thread src/services/storageService.ts

tnaum-ms added 4 commits June 3, 2026 10:14

tnaum-ms added 4 commits June 3, 2026 10:22

tnaum-ms mentioned this pull request Jun 3, 2026

Investigation: can the legacy v1/v2 storage-wrapping code be removed based on telemetry? #729

Open

5 tasks

tnaum-ms moved this from In progress to In review in DocumentDB for VSCode: Release Plan Jun 3, 2026

tnaum-ms added 2 commits June 3, 2026 11:09

fix(storage): add missing comments for subscriptions and event regist…

f684d15

…ration in tests

tnaum-ms added a commit that referenced this pull request Jun 3, 2026

docs: add PR #726 storage startup perf to v0.8.1 release notes

5cf8d5f

tnaum-ms mentioned this pull request Jun 3, 2026

docs: release notes and changelog for v0.8.1 #727

Merged

sajeetharan approved these changes Jun 3, 2026

View reviewed changes

tnaum-ms merged commit 4233592 into main Jun 3, 2026
8 checks passed

tnaum-ms deleted the dev/tnaum/storeage-optimization branch June 3, 2026 09:35

github-project-automation Bot moved this from In review to Done in DocumentDB for VSCode: Release Plan Jun 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(storage): speed up connection load and clarify in-memory wrapping#726

perf(storage): speed up connection load and clarify in-memory wrapping#726
tnaum-ms merged 20 commits into
mainfrom
dev/tnaum/storeage-optimization

tnaum-ms commented Jun 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

tnaum-ms commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed (one concern per commit)

Performance impact

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

tnaum-ms commented Jun 3, 2026

Uh oh!

github-actions Bot commented Jun 3, 2026

✅ Code Quality Checks

Uh oh!

github-actions Bot commented Jun 3, 2026

📦 Build Size Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tnaum-ms commented Jun 2, 2026 •

edited

Loading