perf(tree): throttle background document-count fetches#685
Conversation
There was a problem hiding this comment.
Pull request overview
This PR reduces load spikes caused by tree expansion by introducing a small in-house concurrency limiter and routing background per-collection estimateDocumentCount calls through a per-cluster limiter, keeping the work low-priority and less disruptive to foreground operations.
Changes:
- Added
createConcurrencyLimiter({ concurrency, interTaskDelayMs })utility to cap in-flight async tasks and optionally pace dispatch. - Applied a per-
clusterIdlimiter (concurrency 5, 250ms delay) toCollectionItem.fetchAndUpdateCountbackground count fetches.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/utils/concurrencyLimiter.ts |
Adds a reusable promise concurrency limiter with optional inter-task pacing. |
src/tree/documentdb/CollectionItem.ts |
Routes background document count fetches through a per-cluster limiter to reduce request bursts. |
Cap concurrent estimateDocumentCount calls per cluster and add a small delay between dispatches so lazy tree metadata loads do not monopolize the MongoDB driver connection pool or burst the server. - Add createConcurrencyLimiter() in src/utils/concurrencyLimiter.ts (in-house, CJS-friendly alternative to p-limit). - Wrap fetchAndUpdateCount() in CollectionItem with a per-cluster limiter (concurrency=5, interTaskDelayMs=250).
f64db45 to
d18180e
Compare
Drop interBatchDelayMs from the CollectionItem fetch. The batch-then-rest pattern was overkill: the slowest count in a batch held up the next batch, producing visible 'first N, then M' gaps in the tree. A plain semaphore (concurrency: 5) keeps the pipe smoothly busy without ever exceeding the cap, which is the right shape for slow background work where individual task latencies vary. The interTaskDelayMs and interBatchDelayMs knobs remain available on the limiter for callers that genuinely want trickle or burst-rest behaviour. Also clarify the sort-then-enqueue contract in DatabaseItem and expand the concurrencyLimiter JSDoc with mode-selection guidance and examples.
Remove interTaskDelayMs and interBatchDelayMs from ConcurrencyLimiterOptions and from the implementation. Both were unused by the only caller (per-cluster document-count fetches use a plain semaphore) and the delayed dispatch paths had two known bugs: - F1: the timer-based release path decremented 'active' before the delay, so new callers could observe 'active < concurrency' and start during the delay window. When the queued waiter resumed it also incremented 'active', exceeding the configured cap. - F7: interBatchDelayMs documentation described batch semantics, but the implementation refilled one slot per completion, behaving like continuous refill. The batch delay almost never fired. We can re-add a pacing knob later with proper slot-reservation semantics and tests if a real use case appears.
Math.floor(NaN) is NaN, and Math.max(1, NaN) is also NaN. With concurrency set to NaN, 'active >= concurrency' is always false, so the limiter silently stops limiting. Guard with Number.isFinite and fall back to 1. The only current caller passes a literal 5, so this is pure hardening of an exported utility.
…ks (N5) The release path resumes the next waiter inside a try/catch. The current body cannot throw, but a future change (telemetry, logging, an extra callback) could. If release ever threw, the queued waiter would never be resumed and the limiter would deadlock for the lifetime of the process. Swallowing here is the right tradeoff: a misbehaving callback should not wedge the whole limiter.
|
N5 (defensive dispatch guard): addressed in 7635a56. Action: wrapped the waiter-resume step inside Reason: the current body cannot throw, but if a future change ever adds a throwing call (telemetry, logging, etc.) inside |
Covers: - concurrency cap is never exceeded for synchronous and asynchronous task shapes - concurrency is clamped to at least 1 (0 and negative values) - fractional concurrency is floored - FIFO dispatch order matches enqueue order - a rejected task releases its slot and queued tasks proceed - the cap is preserved across mixed success / rejection workloads - non-finite concurrency (NaN, +/-Infinity) falls back to 1 instead of silently disabling the limit These tests would have caught the F1 and F5 issues before review.
Before this change, when the user refreshed, collapsed, or re-expanded a
database, DatabaseItem.getChildren constructed fresh CollectionItem
instances. The old instances were dropped from the tree but their queued
or in-flight estimateDocumentCount work continued, eventually writing to
documentCount on the stale instance and firing notifyChildrenChanged on
ids that no longer mattered. That work also competed with foreground
operations for connection pool slots, which is what this PR is supposed
to prevent.
Approach: DatabaseItem maintains a monotonic expansionGeneration counter,
bumped on each getChildren call. The current generation value is captured
into a closure that is handed to CollectionItem as isCurrent(). The
CollectionItem checks isCurrent() twice:
1. At dispatch time inside the limiter task: if stale, return null
without issuing the estimateDocumentCount request at all.
2. After the await returns: if stale, do not write back documentCount
and do not call notifyChildrenChanged.
The constructor parameter defaults to () => true so any direct callers of
CollectionItem are unaffected.
|
F2 (stale-tree-item guard for queued document counts): addressed in 38a78e1. Action: Reason: before this change, queued or in-flight count work on stale |
The inner task we hand to the document-count limiter previously captured `this`, transitively pinning the CollectionItem (and through it the TreeCluster, DatabaseItemModel, and CollectionItemModel) until the queued work either ran or the outer chain completed. Hoist clusterId, dbName, collName, and the isCurrent closure into local variables before the await. The inner task now captures only those few strings plus the small isCurrent closure. The outer async frame still references `this` (it is an instance method) but that frame is short once the post-await stale-check fires. Behaviour is unchanged.
|
N2 (capture primitives in queued limiter closure): addressed in f9380b7. Action: hoisted Reason: the closure handed to the limiter is alive for as long as the await is pending. Previously it captured |
… (F8) DatabaseItem: shorten the comment above the alphabetical sort. The old wording promised counts would 'populate predictably from the top of the visible list downward'. With concurrency > 1, request latency variance makes completion order non-deterministic even though dispatch order is FIFO. State only what is true: sorting fixes the dispatch (request) order; completion order may still differ. CollectionItem: remove the reference to DOCUMENT_COUNT_CONCURRENCY (the constant was inlined in an earlier commit) and trim the surrounding prose. The limit value '5' is now stated directly in the comment so it matches the code.
CollectionItem and DatabaseItem each carried their own copy of an escapeMarkdown helper. There is already an exported version in src/webviews/utils/escapeMarkdown.ts with its own tests. Replace both local copies with imports of the shared util. The shared regex escapes a slightly larger set of characters (adds <, >, &) which is strictly safer for MarkdownString tooltips. Behavior on tooltip text containing only the previously-handled characters is unchanged. Two other duplicates remain in DocumentDBClusterItem.ts and PlaygroundHoverProvider.ts; those files are outside this PR's scope and can be consolidated in a follow-up.
|
N4 (consolidate duplicated escapeMarkdown): addressed in 81ceabc. Action: replaced the local Reason: identical helper, two copies, no upside. The shared util's regex covers a slightly larger character set (adds |
✅ Code Quality Checks
This comment is updated automatically on each push. |
📦 Build Size Report
Download artifact · updated automatically on each push. |
Summary
When a database node is expanded in the tree, every
CollectionItemfires its ownestimateDocumentCountrequest in parallel (fire-and-forget, fromDatabaseItem.getChildren). For databases with many collections this produces a burst of concurrent requests that:maxPoolSize: 100), andThis PR adds a small in-house concurrency limiter and applies it to the background count fetches so this work stays unobtrusive.
Changes
src/utils/concurrencyLimiter.tsexposingcreateConcurrencyLimiter({ concurrency, interTaskDelayMs }). Caps in-flight tasks and optionally inserts a delay before dispatching the next queued task after one completes.CollectionItem.fetchAndUpdateCountnow runs through a per-cluster limiter (keyed byclusterId) withconcurrency = 5andinterTaskDelayMs = 250. Each cluster gets its own pool so different clusters never share a queue.Behaviour: tree expansion still returns immediately, descriptions still fill in as counts arrive, but at most 5 count requests are in flight per cluster and the next dispatch waits 250 ms after a completion. UX impact is negligible (counts trickle in slightly later for the 6th and later collections), pool/server impact is dramatically reduced.
Why not
p-limit?p-limitis the de-facto standard for this and the obvious first choice. It is, however, pure ESM since v4.0.0. This extension is built and bundled as CommonJS (tsconfigmodule: "commonjs", webpack-bundleddist/extension.js, VS Code extension host loads viarequire). Using currentp-limitin CJS code requires either:p-limit@3.1.0(last CJS release, Oct 2020). Still works but stale, and we would still need to wrap it to add the inter-task delay used by low-priority background work.import('p-limit')at every call site. Awkward in synchronous code paths and adds an async boundary at module load.tsconfig, jest/ts-jest, every relative import (ESM requires.jsextensions), eslint config. Not worth it for one dependency.The in-house implementation is ~80 lines including JSDoc, has no runtime dependency, fits the CJS bundle, and gives us a clean place to add the
interTaskDelayMsknob (whichp-limitdoes not provide). If the extension ever migrates to ESM we can drop this and swap top-limitmechanically.Test plan
npm run prettier-fix,npm run lint,npm run buildall clean.npx jest --no-coverage: 1900/1900 tests pass. (Three test suites were SIGKILL'd by the OS in CI-like conditions; unrelated to this change.)npm run l10nneeded.Follow-ups (out of scope)
clearQueue()method similar top-limit's).