perf(tree): throttle background document-count fetches#685

Merged
tnaum-ms merged 12 commits into main from dev/tnaum/limit-concurrent-counts
May 28, 2026

Conversation

@tnaum-ms

Collaborator

Summary

When a database node is expanded in the tree, every CollectionItem fires its own estimateDocumentCount request in parallel (fire-and-forget, from DatabaseItem.getChildren). For databases with many collections this produces a burst of concurrent requests that:

  • opens many sockets in the MongoDB driver connection pool (default maxPoolSize: 100), and
  • competes with foreground operations (queries, the collection view) for pool slots and server resources.

This PR adds a small in-house concurrency limiter and applies it to the background count fetches so this work stays unobtrusive.

Changes

  • New utility src/utils/concurrencyLimiter.ts exposing createConcurrencyLimiter({ concurrency, interTaskDelayMs }). Caps in-flight tasks and optionally inserts a delay before dispatching the next queued task after one completes.
  • CollectionItem.fetchAndUpdateCount now runs through a per-cluster limiter (keyed by clusterId) with concurrency = 5 and interTaskDelayMs = 250. Each cluster gets its own pool so different clusters never share a queue.

Behaviour: tree expansion still returns immediately and descriptions still fill in as counts arrive, but at most 5 count requests are in flight per cluster and the next dispatch waits 250 ms after a completion. UX impact is negligible (counts trickle in slightly later for the 6th and later collections), while pool and server load from background counts drops sharply.
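To make the per-cluster cap concrete, here is a minimal sketch of how a limiter keyed by clusterId could look. It is an illustration under assumptions, not the code in src/utils/concurrencyLimiter.ts: createConcurrencyLimiter is named in the description, but its body here, the limitersByCluster map, and getCountLimiter are hypothetical, and the 250 ms pacing is left out to keep the sketch short.

```typescript
// Sketch only: the names mirror the PR description, the bodies are assumptions.
type Task<T> = () => Promise<T>;
type Limiter = <T>(task: Task<T>) => Promise<T>;

function createConcurrencyLimiter(options: { concurrency: number }): Limiter {
    // Clamp to a positive integer; non-finite input falls back to 1.
    const concurrency = Number.isFinite(options.concurrency)
        ? Math.max(1, Math.floor(options.concurrency))
        : 1;
    let active = 0;
    const waiters: Array<() => void> = [];

    const release = (): void => {
        const next = waiters.shift();
        if (next) {
            // Hand the slot straight to the next waiter; `active` stays unchanged.
            next();
        } else {
            active--;
        }
    };

    return async <T>(task: Task<T>): Promise<T> => {
        if (active >= concurrency) {
            // Park until a finishing task hands over its slot (FIFO).
            await new Promise<void>((resolve) => {
                waiters.push(() => resolve());
            });
        } else {
            active++;
        }
        try {
            return await task();
        } finally {
            release();
        }
    };
}

// Hypothetical registry: one limiter per cluster, so clusters never share a queue.
const limitersByCluster = new Map<string, Limiter>();

function getCountLimiter(clusterId: string): Limiter {
    let limiter = limitersByCluster.get(clusterId);
    if (!limiter) {
        limiter = createConcurrencyLimiter({ concurrency: 5 });
        limitersByCluster.set(clusterId, limiter);
    }
    return limiter;
}
```

A caller would wrap the count request as getCountLimiter(clusterId)(() => fetchCount()), where fetchCount stands for whatever issues estimateDocumentCount. The sixth and later calls for the same cluster wait in the queue until a slot frees up.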

Why not p-limit?

p-limit is the de-facto standard for this and the obvious first choice. It is, however, pure ESM since v4.0.0. This extension is built and bundled as CommonJS (tsconfig module: "commonjs", webpack-bundled dist/extension.js, VS Code extension host loads via require). Using current p-limit in CJS code requires either:

  1. Pin to p-limit@3.1.0 (last CJS release, Oct 2020). Still works but stale, and we would still need to wrap it to add the inter-task delay used by low-priority background work.
  2. Use dynamic import('p-limit') at every call site. Awkward in synchronous code paths and adds an async boundary at module load.
  3. Migrate the whole codebase to ESM. Touches webpack output format, tsconfig, jest/ts-jest, every relative import (ESM requires .js extensions), eslint config. Not worth it for one dependency.

The in-house implementation is ~80 lines including JSDoc, has no runtime dependency, fits the CJS bundle, and gives us a clean place to add the interTaskDelayMs knob (which p-limit does not provide). If the extension ever migrates to ESM we can drop this and swap to p-limit mechanically.

Test plan

  • npm run prettier-fix, npm run lint, npm run build all clean.
  • npx jest --no-coverage: 1900/1900 tests pass. (Three test suites were SIGKILL'd by the OS in CI-like conditions; unrelated to this change.)
  • No user-facing strings changed, so no npm run l10n needed.

Follow-ups (out of scope)

  • Apply the same limiter to other tree-driven background loads (e.g. lazy index counts) once they exist.
  • Add cancellation on tree collapse via the limiter's queue (would need a clearQueue() method similar to p-limit's).

Copilot AI review requested due to automatic review settings May 28, 2026 08:10
@tnaum-ms tnaum-ms requested a review from a team as a code owner May 28, 2026 08:10

Copilot AI left a comment

Contributor

Pull request overview

This PR reduces load spikes caused by tree expansion by introducing a small in-house concurrency limiter and routing background per-collection estimateDocumentCount calls through a per-cluster limiter, keeping the work low-priority and less disruptive to foreground operations.

Changes:

  • Added createConcurrencyLimiter({ concurrency, interTaskDelayMs }) utility to cap in-flight async tasks and optionally pace dispatch.
  • Applied a per-clusterId limiter (concurrency 5, 250ms delay) to CollectionItem.fetchAndUpdateCount background count fetches.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/utils/concurrencyLimiter.ts | Adds a reusable promise concurrency limiter with optional inter-task pacing. |
| src/tree/documentdb/CollectionItem.ts | Routes background document count fetches through a per-cluster limiter to reduce request bursts. |

Cap concurrent estimateDocumentCount calls per cluster and add a small
delay between dispatches so lazy tree metadata loads do not monopolize
the MongoDB driver connection pool or burst the server.

- Add createConcurrencyLimiter() in src/utils/concurrencyLimiter.ts
  (in-house, CJS-friendly alternative to p-limit).
- Wrap fetchAndUpdateCount() in CollectionItem with a per-cluster
  limiter (concurrency=5, interTaskDelayMs=250).

Drop interBatchDelayMs from the CollectionItem fetch. The batch-then-rest
pattern was overkill: the slowest count in a batch held up the next batch,
producing visible 'first N, then M' gaps in the tree. A plain semaphore
(concurrency: 5) keeps the pipe smoothly busy without ever exceeding the
cap, which is the right shape for slow background work where individual
task latencies vary.

The interTaskDelayMs and interBatchDelayMs knobs remain available on the
limiter for callers that genuinely want trickle or burst-rest behaviour.

Also clarify the sort-then-enqueue contract in DatabaseItem and expand
the concurrencyLimiter JSDoc with mode-selection guidance and examples.

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

tnaum-ms added 4 commits May 28, 2026 13:15
Remove interTaskDelayMs and interBatchDelayMs from ConcurrencyLimiterOptions
and from the implementation. Both were unused by the only caller (per-cluster
document-count fetches use a plain semaphore) and the delayed dispatch paths
had two known bugs:

- F1: the timer-based release path decremented 'active' before the delay,
  so new callers could observe 'active < concurrency' and start during the
  delay window. When the queued waiter resumed it also incremented 'active',
  exceeding the configured cap.
- F7: interBatchDelayMs documentation described batch semantics, but the
  implementation refilled one slot per completion, behaving like continuous
  refill. The batch delay almost never fired.

We can re-add a pacing knob later with proper slot-reservation semantics and
tests if a real use case appears.
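For reference, the slot-reservation idea mentioned above could be sketched as follows. This is a hypothetical illustration, not the removed implementation and not a planned API: createPacedLimiter and its options are assumed names. The point is that the finishing task keeps its slot counted in `active` for the whole delay, so a newcomer arriving during the delay window still sees the limiter as full, which avoids the F1 overshoot.

```typescript
// Hypothetical sketch of pacing with slot reservation (names are assumptions).
interface PacedLimiterOptions {
    concurrency: number;
    interTaskDelayMs: number;
}

function createPacedLimiter(options: PacedLimiterOptions) {
    const concurrency = Number.isFinite(options.concurrency)
        ? Math.max(1, Math.floor(options.concurrency))
        : 1;
    const delayMs = Number.isFinite(options.interTaskDelayMs)
        ? Math.max(0, options.interTaskDelayMs)
        : 0;
    let active = 0;
    const waiters: Array<() => void> = [];

    const release = (): void => {
        const next = waiters.shift();
        if (next === undefined) {
            // Nobody is waiting: free the slot immediately.
            active--;
            return;
        }
        // A waiter exists: keep the slot reserved (`active` is NOT decremented)
        // and hand it over only after the delay has elapsed.
        setTimeout(next, delayMs);
    };

    return async function run<T>(task: () => Promise<T>): Promise<T> {
        if (active >= concurrency) {
            // Queue up; the slot arrives already counted in `active`.
            await new Promise<void>((resolve) => {
                waiters.push(() => resolve());
            });
        } else {
            active++;
        }
        try {
            return await task();
        } finally {
            release();
        }
    };
}
```

The buggy variant described in F1 corresponds to decrementing `active` before the timer and incrementing it again when the waiter resumes. The sketch never does either for a handed-over slot.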
Math.floor(NaN) is NaN, and Math.max(1, NaN) is also NaN. With concurrency
set to NaN, 'active >= concurrency' is always false, so the limiter
silently stops limiting. Guard with Number.isFinite and fall back to 1.

The only current caller passes a literal 5, so this is pure hardening of an
exported utility.
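The arithmetic behind this fix is easy to check in isolation. The helper below is a minimal sketch; normalizeConcurrency is a hypothetical name and the actual guard in the limiter may be written inline or differently.

```typescript
// Hypothetical helper illustrating the guard described in the commit message.
function normalizeConcurrency(value: number): number {
    // NaN and +/-Infinity are not finite: fall back to the safest cap.
    if (!Number.isFinite(value)) {
        return 1;
    }
    // Floor fractional values, then clamp to at least 1.
    return Math.max(1, Math.floor(value));
}

// Without the guard, NaN propagates through both calls.
const unguarded = Math.max(1, Math.floor(NaN));

// Every comparison against NaN is false, so `active >= concurrency`
// would never block a caller.
const wouldBlock = 1000 >= unguarded;
```

With the guard in place, normalizeConcurrency(NaN) and normalizeConcurrency(Infinity) both yield 1, and ordinary inputs such as 5 or 2.9 yield 5 and 2.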
…ks (N5)

The release path resumes the next waiter inside a try/catch. The current
body cannot throw, but a future change (telemetry, logging, an extra
callback) could. If release ever threw, the queued waiter would never be
resumed and the limiter would deadlock for the lifetime of the process.
Swallowing here is the right tradeoff: a misbehaving callback should not
wedge the whole limiter.
@tnaum-ms

Collaborator Author

N5 (defensive dispatch guard): addressed in 7635a56.

Action: wrapped the waiter-resume step inside release() in a try/catch that swallows.

Reason: the current body cannot throw, but if a future change ever adds a throwing call (telemetry, logging, etc.) inside release, the queued waiter would never be resumed and the limiter would deadlock for the rest of the process lifetime. The cost of the guard is a few lines and one extra try block. The benefit is that a misbehaving callback can never wedge the limiter.

tnaum-ms added 2 commits May 28, 2026 13:47
Covers:
- concurrency cap is never exceeded for synchronous and asynchronous task
  shapes
- concurrency is clamped to at least 1 (0 and negative values)
- fractional concurrency is floored
- FIFO dispatch order matches enqueue order
- a rejected task releases its slot and queued tasks proceed
- the cap is preserved across mixed success / rejection workloads
- non-finite concurrency (NaN, +/-Infinity) falls back to 1 instead of
  silently disabling the limit

These tests would have caught the F1 and F5 issues before review.

Before this change, when the user refreshed, collapsed, or re-expanded a
database, DatabaseItem.getChildren constructed fresh CollectionItem
instances. The old instances were dropped from the tree but their queued
or in-flight estimateDocumentCount work continued, eventually writing to
documentCount on the stale instance and firing notifyChildrenChanged on
ids that no longer mattered. That work also competed with foreground
operations for connection pool slots, which is what this PR is supposed
to prevent.

Approach: DatabaseItem maintains a monotonic expansionGeneration counter,
bumped on each getChildren call. The current generation value is captured
into a closure that is handed to CollectionItem as isCurrent(). The
CollectionItem checks isCurrent() twice:

  1. At dispatch time inside the limiter task: if stale, return null
     without issuing the estimateDocumentCount request at all.
  2. After the await returns: if stale, do not write back documentCount
     and do not call notifyChildrenChanged.

The constructor parameter defaults to () => true so any direct callers of
CollectionItem are unaffected.
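The generation-token approach can be sketched in a few lines. This is a simplified illustration, not the code in DatabaseItem.ts or CollectionItem.ts: the class names DatabaseNode and CollectionNode and the fetchCount parameter are hypothetical, and the limiter is omitted, so the first check here runs directly before the request rather than inside a limiter task.

```typescript
// Simplified, hypothetical model of the generation token.
class CollectionNode {
    documentCount: number | undefined;
    readonly name: string;
    private readonly isCurrent: () => boolean;

    // Defaulting to () => true keeps direct callers unaffected.
    constructor(name: string, isCurrent: () => boolean = () => true) {
        this.name = name;
        this.isCurrent = isCurrent;
    }

    async fetchAndUpdateCount(fetchCount: (name: string) => Promise<number>): Promise<void> {
        // Check 1: skip the request entirely if this node is already stale.
        if (!this.isCurrent()) {
            return;
        }
        const count = await fetchCount(this.name);
        // Check 2: the node may have gone stale while the request was in flight.
        if (!this.isCurrent()) {
            return;
        }
        this.documentCount = count;
    }
}

class DatabaseNode {
    private expansionGeneration = 0;

    getChildren(names: string[]): CollectionNode[] {
        // Bump the counter and capture this expansion's value in a closure.
        const generation = ++this.expansionGeneration;
        const isCurrent = (): boolean => generation === this.expansionGeneration;
        return names.map((name) => new CollectionNode(name, isCurrent));
    }
}
```

Calling getChildren a second time invalidates every closure handed out by the first call, without any signal plumbing or change to the limiter.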
@tnaum-ms

Collaborator Author

F2 (stale-tree-item guard for queued document counts): addressed in 38a78e1.

Action: DatabaseItem now keeps a monotonic expansionGeneration counter that is bumped on every getChildren call. The current value is captured into an isCurrent() closure passed to each CollectionItem. The collection's background fetch checks isCurrent() twice: once at dispatch time inside the limiter task (skip the request entirely if stale), and once after the await returns (skip the writeback and the tree refresh notification).

Reason: before this change, queued or in-flight count work on stale CollectionItem instances would run, hit the server, and write to the now-dropped instance's documentCount field. That defeats the throttling intent (work piles up across refreshes) and consumes connection pool slots that should serve foreground operations. The generation token is the cheapest fix that addresses both: no signal plumbing, no limiter API change, default () => true keeps direct callers of CollectionItem unaffected.

The inner task we hand to the document-count limiter previously captured
`this`, transitively pinning the CollectionItem (and through it the
TreeCluster, DatabaseItemModel, and CollectionItemModel) until the queued
work either ran or the outer chain completed.

Hoist clusterId, dbName, collName, and the isCurrent closure into local
variables before the await. The inner task now captures only those few
strings plus the small isCurrent closure. The outer async frame still
references `this` (it is an instance method), but that frame completes
shortly after the post-await stale check runs.

Behaviour is unchanged.
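A rough sketch of the hoisting pattern follows. It assumes a simplified CollectionNode class and passes the limiter and an estimate function in as parameters; both are hypothetical stand-ins, and the real fetchAndUpdateCount signature may differ.

```typescript
// Hypothetical types standing in for the limiter and the count request.
type Limiter = <T>(task: () => Promise<T>) => Promise<T>;
type Estimate = (clusterId: string, dbName: string, collName: string) => Promise<number>;

class CollectionNode {
    documentCount: number | undefined;
    private readonly clusterId: string;
    private readonly dbName: string;
    private readonly collName: string;
    private readonly isCurrent: () => boolean;

    constructor(clusterId: string, dbName: string, collName: string, isCurrent: () => boolean = () => true) {
        this.clusterId = clusterId;
        this.dbName = dbName;
        this.collName = collName;
        this.isCurrent = isCurrent;
    }

    async fetchAndUpdateCount(limit: Limiter, estimate: Estimate): Promise<void> {
        // Hoist everything the queued task needs into locals.
        const clusterId = this.clusterId;
        const dbName = this.dbName;
        const collName = this.collName;
        const isCurrent = this.isCurrent;

        // The queued closure references only the locals above, never `this`.
        const count = await limit(async () => {
            if (!isCurrent()) {
                return null;
            }
            return estimate(clusterId, dbName, collName);
        });

        if (count === null || !isCurrent()) {
            return;
        }
        this.documentCount = count;
    }
}
```

Because isCurrent is stored as a plain closure field rather than a method, copying it into a local does not lose any `this` binding.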
@tnaum-ms

Collaborator Author

N2 (capture primitives in queued limiter closure): addressed in f9380b7.

Action: hoisted clusterId, dbName, collName, and the isCurrent closure into locals at the top of fetchAndUpdateCount and used those locals inside the limiter task. The inner closure no longer references this.

Reason: the closure handed to the limiter is alive for as long as the await is pending. Previously it captured this, transitively pinning the TreeCluster, DatabaseItemModel, and CollectionItemModel. With this change the queued task pins only a few strings plus the small isCurrent closure. Behavior is unchanged.

tnaum-ms added 2 commits May 28, 2026 13:52
… (F8)

DatabaseItem: shorten the comment above the alphabetical sort. The old
wording promised counts would 'populate predictably from the top of the
visible list downward'. With concurrency > 1, request latency variance
makes completion order non-deterministic even though dispatch order is
FIFO. State only what is true: sorting fixes the dispatch (request) order;
completion order may still differ.

CollectionItem: remove the reference to DOCUMENT_COUNT_CONCURRENCY (the
constant was inlined in an earlier commit) and trim the surrounding prose.
The limit value '5' is now stated directly in the comment so it matches
the code.

CollectionItem and DatabaseItem each carried their own copy of an
escapeMarkdown helper. There is already an exported version in
src/webviews/utils/escapeMarkdown.ts with its own tests. Replace both
local copies with imports of the shared util.

The shared regex escapes a slightly larger set of characters (adds
<, >, &) which is strictly safer for MarkdownString tooltips. Behavior
on tooltip text containing only the previously-handled characters is
unchanged.

Two other duplicates remain in DocumentDBClusterItem.ts and
PlaygroundHoverProvider.ts; those files are outside this PR's scope and
can be consolidated in a follow-up.
@tnaum-ms

Collaborator Author

N4 (consolidate duplicated escapeMarkdown): addressed in 81ceabc.

Action: replaced the local escapeMarkdown copies in CollectionItem.ts and DatabaseItem.ts with imports from src/webviews/utils/escapeMarkdown.ts (which already has tests).

Reason: identical helper, two copies, no upside. The shared util's regex covers a slightly larger character set (adds <, >, &), which is strictly safer for MarkdownString tooltips. Two duplicates remain (DocumentDBClusterItem.ts, PlaygroundHoverProvider.ts); those are outside this PR's scope and noted in the commit for a follow-up.

@github-actions

Contributor

✅ Code Quality Checks

| Check | Status | How to fix |
| --- | --- | --- |
| Localization (l10n) | ✅ Passed | |
| ESLint | ✅ Passed | |
| Prettier formatting | ✅ Passed | |

This comment is updated automatically on each push.

@github-actions

Contributor

📦 Build Size Report

| Metric | Base (main) | PR | Delta |
| --- | --- | --- | --- |
| VSIX (vscode-documentdb-0.8.0.vsix) | 7.53 MB | 7.53 MB | ⬆️ +0 KB (+0.0%) |
| Webview bundle (views.js) | 5.88 MB | 5.88 MB | ✅ 0 KB (0.0%) |

Download artifact · updated automatically on each push.

@tnaum-ms tnaum-ms merged commit 1199e68 into main May 28, 2026
8 checks passed
@tnaum-ms tnaum-ms deleted the dev/tnaum/limit-concurrent-counts branch May 28, 2026 13:15
3 participants