perf: investigate schema sampling fan-out in LLM query generation #686

@tnaum-ms

Context

Surfaced during the review of #685 (the document-count throttling work). This one appears to have slipped past previous reviews and deserves a deliberate look.

Location

src/commands/llmEnhancedCommands/queryGenerationCommands.ts: the schema-sampling loop around lines 199-218 iterates over collections with for...of + await and calls client.getSampleDocuments(...) once per collection.

The shape of the problem

This is the opposite of the bug fixed in #685:

  • The document-count loop was an unbounded burst (bad for the server, fast for the user).
  • This LLM schema-sampling loop is fully serial (gentle on the server, but linear in the number of collections).

Neither extreme is ideal. On a database with many collections, the LLM "generate query" flow walks them one at a time before it can even start prompting the model. The user sees a long delay with no parallelism, while the server can clearly handle a handful of concurrent sampling requests (we already proved this with the count limiter at concurrency 5).
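
To make the two shapes concrete, here is a minimal sketch of the serial loop next to a bounded alternative. It is not the code in queryGenerationCommands.ts: `SampleFn`, `sampleSerially`, and `sampleWithLimit` are hypothetical names, and `SampleFn` only stands in for client.getSampleDocuments(...), whose real signature may differ.

```typescript
// Hypothetical stand-in for client.getSampleDocuments(...); the real signature may differ.
type SampleFn<T> = (collectionName: string) => Promise<T>;

// Serial shape as described above: one await per collection, so latency adds up.
async function sampleSerially<T>(collections: string[], sample: SampleFn<T>): Promise<T[]> {
    const results: T[] = [];
    for (const name of collections) {
        results.push(await sample(name));
    }
    return results;
}

// Bounded shape: at most `limit` calls in flight, results kept in input order.
async function sampleWithLimit<T>(
    collections: string[],
    sample: SampleFn<T>,
    limit: number,
): Promise<T[]> {
    if (!Number.isInteger(limit) || limit < 1) {
        throw new RangeError('limit must be a positive integer');
    }
    const results = new Array<T>(collections.length);
    let next = 0;

    // Each worker claims the next unprocessed index until the list is exhausted.
    async function worker(): Promise<void> {
        while (next < collections.length) {
            const index = next++;
            results[index] = await sample(collections[index]);
        }
    }

    const workerCount = Math.min(limit, collections.length);
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

With `limit` set to 1 the bounded version behaves like the serial loop. Note that this sketch rejects on the first failed call while the other workers keep draining the list; whether the LLM flow should fail fast or skip a collection it cannot sample is a separate decision.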

What to investigate

  1. How many collections do real users hit this path with? (telemetry, or estimate from supported workloads)
  2. What is the cost-per-call of getSampleDocuments? Does it scale with collection size, or is it bounded?
  3. Should this use the shared ConcurrencyLimiter introduced in #685 (perf(tree): throttle background document-count fetches), or does the LLM flow need its own tuning (e.g. higher concurrency, no inter-batch delay, since this is a foreground user-initiated action)?
  4. Is there a "fast path" for small databases where we can fan out fully, and a throttled path for large ones?
  5. Should we expose a setting, or pick conservative defaults?
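
For question 4, one possible shape is a small policy function that picks the concurrency from the collection count. This is only a sketch: `SamplingPolicy`, `chooseConcurrency`, and the threshold of 10 are hypothetical placeholders that the telemetry from question 1 would need to confirm. The throttled value of 5 mirrors the concurrency already used by the count limiter.

```typescript
// Hypothetical policy shape; neither field exists in the codebase today.
interface SamplingPolicy {
    // At or below this many collections, fan out fully (placeholder value).
    fastPathMaxCollections: number;
    // Above the threshold, cap in-flight sampling calls at this number.
    throttledConcurrency: number;
}

const DEFAULT_POLICY: SamplingPolicy = {
    fastPathMaxCollections: 10, // assumption, to be replaced by measured data
    throttledConcurrency: 5, // same value the count limiter uses
};

// Returns how many sampling calls may run at once for a database of this size.
function chooseConcurrency(
    collectionCount: number,
    policy: SamplingPolicy = DEFAULT_POLICY,
): number {
    if (collectionCount <= 0) {
        return 0; // nothing to sample
    }
    if (collectionCount <= policy.fastPathMaxCollections) {
        return collectionCount; // fast path: every collection at once
    }
    return policy.throttledConcurrency; // throttled path for large databases
}
```

If question 5 lands on exposing a setting, the two fields above are the natural candidates; otherwise they stay internal constants.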

Acceptance criteria for this investigation

  • A short write-up of the chosen approach (issue comment or design note).
  • A follow-up PR (or a decision to not change anything, with rationale).

References

  • PR #685 (perf(tree): throttle background document-count fetches) introduces src/utils/concurrencyLimiter.ts with both per-task and per-batch delay knobs. The same primitive is reusable here.
  • The limiter is keyed per cluster via clusterId in the count case; this flow may want a per-call-site key instead.
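
The keying question can be illustrated with a tiny registry. This does not use the real API from src/utils/concurrencyLimiter.ts, which is not reproduced in this issue. `SimpleLimiter`, `getLimiter`, and the key strings are hypothetical, and the delay knobs are left out.

```typescript
// Minimal stand-in limiter, NOT the ConcurrencyLimiter from #685.
class SimpleLimiter {
    private readonly maxConcurrent: number;
    private active = 0;
    private readonly waiting: Array<() => void> = [];

    constructor(maxConcurrent: number) {
        this.maxConcurrent = maxConcurrent;
    }

    async run<T>(task: () => Promise<T>): Promise<T> {
        if (this.active >= this.maxConcurrent) {
            // All slots busy: wait until a finishing task hands its slot over.
            await new Promise<void>((resolve) => this.waiting.push(resolve));
        } else {
            this.active++;
        }
        try {
            return await task();
        } finally {
            const next = this.waiting.shift();
            if (next) {
                next(); // pass the slot directly to the next waiter
            } else {
                this.active--;
            }
        }
    }
}

// One limiter per key; the key decides who shares a concurrency budget.
const limiters = new Map<string, SimpleLimiter>();

function getLimiter(key: string, maxConcurrent: number): SimpleLimiter {
    let limiter = limiters.get(key);
    if (!limiter) {
        limiter = new SimpleLimiter(maxConcurrent);
        limiters.set(key, limiter);
    }
    return limiter;
}

// Hypothetical key choices:
//   getLimiter(`cluster:${clusterId}`, 5)     shared with other work on that cluster
//   getLimiter('llm-schema-sampling', 5)      private to this call site
```

With a shared per-cluster key, foreground sampling would queue behind background count fetches on the same cluster. With a per-call-site key the two budgets are independent, so the server can see up to the sum of both limits in flight.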
