Surfaced during the review for #685 (the document-count throttling work). This one looks like it slipped previous reviews and deserves a deliberate look.
The document-count loop was an unbounded burst (bad for the server, fast for the user).
This LLM schema-sampling loop is fully serial (gentle on the server, but total latency grows linearly with the number of collections).
Neither extreme is ideal. On a database with many collections, the LLM "generate query" flow walks them one at a time before it can even start prompting the model. The user sees a long delay with no parallelism, while the server can clearly handle a handful of concurrent sampling requests (we already proved this with the count limiter at concurrency 5).
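A middle ground between the two extremes could look like a small worker pool. The sketch below is only an illustration under assumptions: mapWithConcurrency is a hypothetical helper, not the ConcurrencyLimiter from #685, and the real signature of client.getSampleDocuments is not shown in this issue, so the task is passed in as a plain callback.

```typescript
// Hypothetical helper (not from the codebase): run `task` over `items`
// with at most `limit` calls in flight, preserving result order.
async function mapWithConcurrency<T, R>(
    items: readonly T[],
    limit: number,
    task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
    if (!Number.isInteger(limit) || limit < 1) {
        throw new RangeError('limit must be a positive integer');
    }

    const results = new Array<R>(items.length);
    let next = 0; // index of the next unclaimed item

    // Each worker claims the next index until the list is exhausted.
    // Claiming is safe without locks because JavaScript runs this
    // synchronously between awaits.
    const worker = async (): Promise<void> => {
        while (next < items.length) {
            const index = next++;
            results[index] = await task(items[index], index);
        }
    };

    const workerCount = Math.min(limit, items.length);
    // If any task rejects, the returned promise rejects; tasks already
    // in flight are not cancelled.
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

With limit set to 1 this degenerates to the current serial loop, and with limit equal to the number of collections it becomes the unbounded burst that #685 fixed, so the same helper can be used to compare both extremes against something like 5.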
What to investigate
How many collections do real users hit this path with? (telemetry, or estimate from supported workloads)
What is the cost-per-call of getSampleDocuments? Does it scale with collection size, or is it bounded?
Should this use the shared ConcurrencyLimiter introduced in #685 (perf(tree): throttle background document-count fetches), or does the LLM flow need its own tuning (e.g. higher concurrency, no inter-batch delay, since this is a foreground user-initiated action)?
Is there a "fast path" for small databases where we can fan out fully, and a throttled path for large ones?
Should we expose a setting, or pick conservative defaults?
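To make the "fast path versus throttled path" question concrete, here is one possible shape for the decision, written as a pure function so it is easy to test. Everything in it is an assumption for discussion: the names SamplingPolicy and chooseSamplingConcurrency are hypothetical, the fast-path threshold of 10 is a placeholder, and only the throttled value of 5 comes from the count limiter mentioned above.

```typescript
// Hypothetical policy object; these fields could later be backed by a
// user-facing setting or stay as hard-coded conservative defaults.
interface SamplingPolicy {
    /** Fan out fully when the database has at most this many collections (placeholder value). */
    fastPathMaxCollections: number;
    /** Cap on in-flight sampling calls for larger databases. */
    throttledConcurrency: number;
}

const defaultSamplingPolicy: SamplingPolicy = {
    fastPathMaxCollections: 10, // assumption, to be replaced by telemetry-backed data
    throttledConcurrency: 5, // same value the count limiter was validated with
};

// Returns how many sampling calls may run at once for a database
// with `collectionCount` collections.
function chooseSamplingConcurrency(
    collectionCount: number,
    policy: SamplingPolicy = defaultSamplingPolicy,
): number {
    if (collectionCount <= 0) {
        return 0; // nothing to sample
    }
    if (collectionCount <= policy.fastPathMaxCollections) {
        return collectionCount; // small database: sample everything at once
    }
    return Math.min(policy.throttledConcurrency, collectionCount);
}
```

Note that this shape has a discontinuity at the threshold (10 collections fan out fully, 11 are capped at 5). Whether that is acceptable, or whether a single cap is simpler, is part of what the investigation should decide.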
Acceptance criteria for this investigation
A short write-up of the chosen approach (issue comment or design note).
A follow-up PR (or a decision to not change anything, with rationale).
Location
src/commands/llmEnhancedCommands/queryGenerationCommands.ts: the schema-sampling loop around lines 199-218 iterates collections with for...of + await and calls client.getSampleDocuments(...) once per collection.
The shape of the problem
This is the opposite of the bug fixed in #685: that loop fired every request at once, while this one waits for each request to finish before starting the next.
References
#685 introduced src/utils/concurrencyLimiter.ts with both per-task and per-batch delay knobs. The same primitive is reusable here.
The limiter is keyed by clusterId in the count case; this flow may want a per-call-site key instead.
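The keying question can be illustrated with a toy registry. This is a sketch under stated assumptions, not the API of src/utils/concurrencyLimiter.ts, which is not shown in this issue: SketchLimiter, limiterFor, and the key format are all hypothetical, and the delay knobs are left out to keep the example short.

```typescript
// Hypothetical stand-in for the real limiter: caps in-flight tasks only.
class SketchLimiter {
    private active = 0;
    private readonly waiting: Array<() => void> = [];
    private readonly maxConcurrent: number;

    constructor(maxConcurrent: number) {
        this.maxConcurrent = maxConcurrent;
    }

    async run<T>(task: () => Promise<T>): Promise<T> {
        if (this.active >= this.maxConcurrent) {
            // Park until a finishing task hands its slot over.
            await new Promise<void>((resolve) => this.waiting.push(resolve));
        } else {
            this.active++;
        }
        try {
            return await task();
        } finally {
            const next = this.waiting.shift();
            if (next) {
                next(); // slot is transferred, so `active` stays unchanged
            } else {
                this.active--;
            }
        }
    }
}

// One limiter per key. Callers that share a key share a budget.
const limiters = new Map<string, SketchLimiter>();

function limiterFor(key: string, maxConcurrent: number): SketchLimiter {
    let limiter = limiters.get(key);
    if (!limiter) {
        limiter = new SketchLimiter(maxConcurrent);
        limiters.set(key, limiter);
    }
    return limiter;
}

// Keyed by cluster only: background counts and LLM sampling would compete
// for the same slots.
const countLimiter = limiterFor('cluster-a', 5);
// Keyed by cluster plus call site (hypothetical key format): the foreground
// LLM flow gets its own budget and its own tuning.
const samplingLimiter = limiterFor('cluster-a:llmSchemaSampling', 8);
```

The trade-off is that per-call-site keys protect the foreground flow from being queued behind background work, but the server then sees the sum of both budgets rather than a single cap per cluster.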