perf: investigate schema sampling fan-out in LLM query generation #686

## Context

Surfaced during the review of #685 (the document-count throttling work). It appears to have slipped past previous reviews and deserves a deliberate look.

## Location

[src/commands/llmEnhancedCommands/queryGenerationCommands.ts](https://github.com/microsoft/vscode-documentdb/blob/main/src/commands/llmEnhancedCommands/queryGenerationCommands.ts): the schema-sampling loop around lines 199-218 iterates over collections with `for...of` + `await` and calls `client.getSampleDocuments(...)` once per collection.

## The shape of the problem

This is the *opposite* of the bug fixed in #685:

- The document-count loop was an unbounded burst (bad for the server, fast for the user).
- This LLM schema-sampling loop is fully serial (gentle on the server, but total latency grows linearly with the number of collections).

Neither extreme is ideal. On a database with many collections, the LLM "generate query" flow walks them one at a time before it can even start prompting the model. The user sees a long delay with no parallelism, while the server can clearly handle a handful of concurrent sampling requests (we already proved this with the count limiter at concurrency 5).
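To make the two extremes and the middle ground concrete, here is a minimal sketch. It does not reproduce the code in `queryGenerationCommands.ts`: `SampleClient`, `sampleSerially`, and `sampleBounded` are hypothetical names, and the `getSampleDocuments(collection, limit)` signature is an assumption made only so the example compiles. It also does not use the `ConcurrencyLimiter` from #685, whose API is not shown here.

```typescript
// Hypothetical stand-in for the real client; the parameter list is assumed.
interface SampleClient {
  getSampleDocuments(collection: string, limit: number): Promise<unknown[]>;
}

// The serial shape described in this issue: one awaited call per collection.
async function sampleSerially(
  client: SampleClient,
  collections: string[],
  limit: number,
): Promise<unknown[][]> {
  const results: unknown[][] = [];
  for (const name of collections) {
    results.push(await client.getSampleDocuments(name, limit));
  }
  return results;
}

// Bounded alternative: a small pool of workers pulls from a shared index,
// so at most `concurrency` calls are in flight at any time.
// Results are stored by position, so output order matches `collections`.
async function sampleBounded(
  client: SampleClient,
  collections: string[],
  limit: number,
  concurrency: number,
): Promise<unknown[][]> {
  const results: unknown[][] = new Array(collections.length);
  let next = 0;

  const worker = async (): Promise<void> => {
    while (next < collections.length) {
      const index = next++; // safe: no await between the check and the increment
      results[index] = await client.getSampleDocuments(collections[index], limit);
    }
  };

  const workerCount = Math.max(1, Math.min(concurrency, collections.length));
  // If any call rejects, Promise.all rejects; calls already in flight still finish.
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `concurrency` set to 1 the bounded version degrades to the serial behaviour, which makes it easy to compare the two under the same harness.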

## What to investigate

1. How many collections do real users hit this path with? (telemetry, or estimate from supported workloads)
2. What is the cost-per-call of `getSampleDocuments`? Does it scale with collection size, or is it bounded?
3. Should this use the shared `ConcurrencyLimiter` introduced in #685, or does the LLM flow need its own tuning (e.g. higher concurrency, no inter-batch delay, since this is a foreground user-initiated action)?
4. Is there a "fast path" for small databases where we can fan out fully, and a throttled path for large ones?
5. Should we expose a setting, or pick conservative defaults?
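Questions 2 and 4 can be explored with very little scaffolding. The sketch below is only a starting point for the investigation: `FanOutPolicy`, `pickConcurrency`, and `timeCall` are hypothetical names, and the threshold values are placeholders rather than measured or recommended numbers.

```typescript
// Hypothetical policy shape for the "fast path vs. throttled path" question.
interface FanOutPolicy {
  fastPathMaxCollections: number; // at or below this count, fan out fully
  throttledConcurrency: number; // cap applied above the threshold
}

// Placeholder values only; the investigation is meant to replace them.
const examplePolicy: FanOutPolicy = {
  fastPathMaxCollections: 8,
  throttledConcurrency: 5,
};

// Returns how many sampling calls may be in flight for a given database size.
function pickConcurrency(collectionCount: number, policy: FanOutPolicy): number {
  if (collectionCount <= 0) {
    return 1;
  }
  if (collectionCount <= policy.fastPathMaxCollections) {
    return collectionCount; // small database: full fan-out
  }
  return Math.min(policy.throttledConcurrency, collectionCount);
}

// Wraps any async call and reports wall-clock time in milliseconds,
// e.g. to see whether the sampling cost grows with collection size.
async function timeCall<T>(
  fn: () => Promise<T>,
): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, elapsedMs: Date.now() - start };
}
```

Wrapping the existing sampling call in `timeCall` against collections of different sizes would give a first answer to question 2 without changing behaviour.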

## Acceptance criteria for this investigation

- A short write-up of the chosen approach (issue comment or design note).
- A follow-up PR (or a decision not to change anything, with rationale).

## References

- PR #685: introduces `src/utils/concurrencyLimiter.ts` with both per-task and per-batch delay knobs. The same primitive is reusable here.
- The limiter is keyed per cluster via `clusterId` in the count case; this flow may want a per-call-site key instead.
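The keying question matters because callers that share a key share one concurrency budget. The sketch below illustrates that with a stand-in limiter; it is not the `ConcurrencyLimiter` from #685, and `SimpleLimiter`, `limiterFor`, and `samplingKey` are hypothetical names used only for illustration.

```typescript
// Minimal stand-in limiter: at most `maxConcurrent` tasks run at once.
class SimpleLimiter {
  private readonly maxConcurrent: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(maxConcurrent: number) {
    this.maxConcurrent = Math.max(1, maxConcurrent);
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Wait until a finishing task hands its slot over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // slot passes directly to the next waiter; `active` is unchanged
      } else {
        this.active--;
      }
    }
  }
}

const limiters = new Map<string, SimpleLimiter>();

// Returns the limiter registered under `key`, creating it on first use.
// `maxConcurrent` only takes effect when the limiter is first created.
function limiterFor(key: string, maxConcurrent: number): SimpleLimiter {
  let limiter = limiters.get(key);
  if (!limiter) {
    limiter = new SimpleLimiter(maxConcurrent);
    limiters.set(key, limiter);
  }
  return limiter;
}

// Per-call-site key: sampling gets its own budget instead of competing
// with other work keyed on the bare cluster id.
function samplingKey(clusterId: string): string {
  return `${clusterId}:llm-schema-sampling`;
}
```

Keying on the bare `clusterId` would make sampling and counting compete for the same slots, which protects the server but can delay the foreground action. A call-site key keeps the two budgets independent, at the cost of a higher combined load on the cluster.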
