feat: AI model transparency for Query Insights panel#690

Merged
tnaum-ms merged 38 commits into main from dev/tnaum/query-insights-model-transparency
May 29, 2026

Conversation

@tnaum-ms

Collaborator

Summary

Adds model transparency and cost-neutral disclosure to the Query Insights AI feature. Users now see which model processed their request and are informed upfront that this feature uses a utility model (a Copilot tier that does not count against the premium request quota).


What was changed

Model disclosure pipeline (backend → webview)

  • copilotService.ts: CopilotResponse now carries modelUsed (the id of the selected LanguageModelChat instance).
  • indexAdvisorCommands.ts → QueryInsightsAIService.ts → transformations.ts: modelUsed is threaded through each layer and surfaced to the webview via QueryInsightsStage3Response.
  • collectionViewRouter.ts (Stage 3 router): emits aiModelDisclosed telemetry property when modelUsed is present.

Cost-neutral disclosure UI

  • Pre-invocation card (GetPerformanceInsightsCard): a persistent info row beneath the action buttons reads "No additional cost for most GitHub Copilot subscribers. [Learn more about the utility model used.]" — shown before the user clicks, and during loading.
  • Post-response byline (QueryInsightsTab): after a successful Stage 3 response, two lines appear:
    1. "No additional cost for most GitHub Copilot subscribers. [Learn more about the utility model used.]"
    2. "Powered by {modelId} via GitHub Copilot" — concrete model attribution.
  • Both "Learn more" links open https://aka.ms/vscode-documentdb-copilot-utility-model (⚠️ slug must be registered before shipping).

Token usage tracking (trace/telemetry only, not in UI)

  • copilotService.ts: parallel countTokens calls measure prompt tokens, response tokens, total, and utilization percentage. All measurements are emitted to telemetry and written to the trace output channel (formatTokenCount helper for compact K/M notation).
  • CopilotTokenUsage interface added; CopilotResponse carries an optional usage field. The measurements flow all the way to the indexOptimization telemetry event.
  • Token counts are intentionally not rendered in the UI (see design decisions below).
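
The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the code in copilotService.ts: TokenCountingModel is a hypothetical stand-in for the two vscode.LanguageModelChat members involved (countTokens and maxInputTokens), the CopilotTokenUsage fields are assumed from the telemetry measurement names in this PR, and the rounding in formatTokenCount is a guess at what "compact K/M notation" means.

```typescript
// Hypothetical stand-in for the two vscode.LanguageModelChat members used here,
// so the sketch runs outside the extension host.
interface TokenCountingModel {
    readonly maxInputTokens: number;
    countTokens(text: string): PromiseLike<number>;
}

// Field names mirror the telemetry measurements named in this PR; the real
// interface shape is an assumption.
interface CopilotTokenUsage {
    promptTokens?: number;
    responseTokens?: number;
    totalTokens?: number;
    maxInputTokens?: number;
    promptUtilizationPct?: number;
}

// Best-effort count: a failed countTokens call yields undefined instead of throwing.
async function safeCount(model: TokenCountingModel, text: string): Promise<number | undefined> {
    try {
        return await model.countTokens(text);
    } catch {
        return undefined;
    }
}

// Derives total and utilization only when the inputs they depend on are present.
function buildUsage(
    promptTokens: number | undefined,
    responseTokens: number | undefined,
    maxInputTokens: number,
): CopilotTokenUsage {
    const usage: CopilotTokenUsage = { promptTokens, responseTokens, maxInputTokens };
    if (promptTokens !== undefined && responseTokens !== undefined) {
        usage.totalTokens = promptTokens + responseTokens;
    }
    if (promptTokens !== undefined && maxInputTokens > 0) {
        // Percentage of the input window used by the prompt, rounded to one decimal.
        usage.promptUtilizationPct = Math.round((promptTokens / maxInputTokens) * 1000) / 10;
    }
    return usage;
}

// Counts prompt and response tokens in parallel after the request has completed.
async function measureUsage(
    model: TokenCountingModel,
    prompt: string,
    response: string,
): Promise<CopilotTokenUsage> {
    const [promptTokens, responseTokens] = await Promise.all([
        safeCount(model, prompt),
        safeCount(model, response),
    ]);
    return buildUsage(promptTokens, responseTokens, model.maxInputTokens);
}

// Compact notation for trace output; truncates to one decimal place.
function formatTokenCount(count: number): string {
    if (count < 1000) {
        return String(count);
    }
    if (count < 1000000) {
        return `${Math.floor(count / 100) / 10}K`;
    }
    return `${Math.floor(count / 100000) / 10}M`;
}
```

With this sketch, formatTokenCount(2400) yields 2.4K, and a failed count on either side leaves totalTokens unset rather than reporting a partial sum.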

Model selection tracing

  • selectBestModel: traces each candidate model as requested / accepted / rejected so diagnostics show the full selection chain.
  • sendMessage: records modelPreferenceChain, modelsAvailable, modelSelectionOutcome, and modelsAvailableCount in telemetry.
  • dumpModelMetadata: traces all stable model fields (id, vendor, family, version, name, maxInputTokens) plus any additional own enumerable non-function properties — diagnostic only.

Model fallback chain

  • promptTemplates.ts: FALLBACK_MODELS extended with copilot-utility as the final fallback after gpt-4o and gpt-4o-mini.

Minor fixes

  • Link components inside Text size={200} rows now carry style={{ fontSize: tokens.fontSizeBase200, lineHeight: tokens.lineHeightBase200 }} to override the default 14px Fluent v9 fui-Link class.

Design decisions

Why credits used are NOT shown

GitHub Copilot's billing model assigns a credit cost per model request. The stable VS Code Language Model API (vscode.lm) does not expose pricing or credit data. A proposed API (vscode.proposed.languageModelPricing.d.ts) exists in the VS Code source but:

  • It is a proposed (pre-release) API — subject to breaking changes without notice.
  • Shipping an extension that depends on proposed APIs requires special opt-in (enabledApiProposals in package.json) and is not permitted for extensions published to the Marketplace without Microsoft sign-off.

Decision: credits are not surfaced in the UI. The extension stays entirely on stable VS Code APIs. If the pricing API graduates to stable in a future VS Code release, this feature can be revisited. A GitHub issue has been filed to track this. Token counts (via countTokens, which is stable) are captured in trace/telemetry as a proxy for cost awareness without making any binding cost claim to users.

Why token counts are not in the UI

countTokens gives a token count, not a cost. Displaying raw token numbers to users risks misleading them (tokens ≠ credits, and the conversion ratio is model-dependent and may change). The appropriate audience for token numbers is telemetry and diagnostic traces. Keeping them out of the UI avoids a support burden around questions like "why did this cost 2400 tokens?".

Why "most GitHub Copilot subscribers" not "all"

Copilot utility model access is documented as included for subscribers, but enterprise agreements and custom billing arrangements can differ. "Most" is a deliberate hedge that is accurate without overpromising.

Why aka.ms/vscode-documentdb-copilot-utility-model and not the learn.microsoft.com index-advisor URL

The index-advisor docs page covers the feature end-to-end; it does not specifically explain the utility model tier or its billing implications. A dedicated aka.ms redirect allows the docs team to point this link at the most relevant GitHub Copilot pricing/model-tier page without a code change.


Commits (newest first)

Hash Message
0a012266 chore: regenerate l10n bundle
2f481b8c feat(ui): refine cost-neutral disclosure wording and split post-response byline
43b260b4 chore: regenerate l10n bundle
71c5aaea fix(ui): match Link font size to parent Text size in disclosure rows
c41e9d2b feat: enhance token usage tracking and model metadata logging in Copilot service
7b822b68 wip: feat(ui): add Utility Model badge to AI Performance Insights card and update powered-by text
8fa152fe chore: regenerate l10n bundle
7c077e8c feat(ui): expand Powered-by byline with icon, cost-neutral wording, and token usage
54b3d32e feat: capture token usage from Copilot responses and emit measurements
aa3791d5 feat: log AI model selection chain (requested/accepted/rejected)
272d21b1 chore: regenerate l10n bundle
39006b5b feat(ui): add post-response Powered-by byline + shared learn-more handler
67cd3349 feat(ui): add cost-neutral disclosure row to AI Performance Insights card
0477b4e3 feat: surface AI model id to Query Insights webview
9046c857 chore: add copilot-utility as final AI fallback model

Pre-merge checklist

  • Register https://aka.ms/vscode-documentdb-copilot-utility-model in the Microsoft URL shortener
  • Verify disclosure wording with GitHub Copilot billing docs team
  • Squash the wip: commit (7b822b68) before merge
  • Run full CI

tnaum-ms added 15 commits May 28, 2026 11:14
Append the Copilot internal 'copilot-utility' alias to the Query Insights AI fallback chain so the feature falls back to whichever chat model CAPI marks as is_chat_fallback when gpt-4o and gpt-4o-mini are unavailable. This keeps the AI Performance Insights flow on a model intended to be cost-neutral for GitHub Copilot subscribers.
Propagate the language model id returned by CopilotService through AIOptimizationResponse, transformAIResponseForUI, and QueryInsightsStage3Response so the webview can disclose which model actually produced the AI Performance Insights response. Also record the disclosed id under the aiModelDisclosed telemetry property on the Stage 3 router event for correlation with UI exposure.
…card

Adds a small InfoRegular + Text caption beneath the action buttons in GetPerformanceInsightsCard explaining that the feature uses a utility model intended to be cost-neutral for GitHub Copilot subscribers, plus an inline Learn more link that reuses the existing onLearnMore callback. A new optional modelHint prop (defaults to 'GPT-4o') labels the model so the disclosure can adapt if the preferred model ever changes.
…dler

Renders a small caption beneath the AI suggestions list once Stage 3 succeeds, surfacing the actual model id returned by CopilotService so users see when fallbacks (gpt-4o-mini, copilot-utility) kick in. The doc URL and openUrl call are extracted into a single handleLearnMore callback so the brand card button, the cost-disclosure row, and the new byline all open the same page.
Picks up the new strings introduced by the AI Performance Insights model-transparency UI ('Uses a utility model …', 'Powered by {0} via GitHub Copilot.', 'Learn more').
Refactors CopilotService.selectBestModel to emit a structured trace of the model selection process: it logs the available models from VS Code, the requested preference chain, accepted/rejected status per preferred id, and the final selection. When the Copilot vendor returns no models at all the no-models-available branch is logged as a warning so users debugging 'AI insights unavailable' see the root cause without enabling verbose logging in the LM API itself. Adds three telemetry properties (modelPreferenceChain, modelsAvailable, modelSelectionOutcome) and one measurement (modelsAvailableCount) on the copilot.sendMessage event to allow offline monitoring of how often each fallback level is hit.
Adds a CopilotTokenUsage type and computes prompt/response/context-window token counts client-side via LanguageModelChat.countTokens after each request. Counts are best-effort: failures fall back to undefined so telemetry never blocks the user flow. The usage object is propagated through OptimizationResult, AIOptimizationResponse, transformAIResponseForUI, and QueryInsightsStage3Response so the webview can surface it. Telemetry measurements (promptTokens, responseTokens, totalTokens, maxInputTokens, promptUtilizationPct) are emitted on three events for offline monitoring: copilot.sendMessage, indexOptimization (inside optimizeQuery), and the Stage 3 router.
…nd token usage

Aligns the post-response byline with the pre-invocation card: prefixes the line with InfoRegular, mirrors the 'utility model intended to be cost-neutral for GitHub Copilot subscribers' wording, and appends a localised token-usage summary built from QueryInsightsStage3Response.usage. The token summary degrades gracefully across three cases (prompt + response + utilisation %, prompt + response only, prompt only) and is omitted entirely when countTokens did not return anything.
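
The three-case degradation in that commit could look something like the sketch below. Everything here is illustrative: formatUsageSummary and UsageLike are hypothetical names, and the English strings are placeholders for the localised strings in the l10n bundle, whose exact wording is not reproduced in this thread.

```typescript
// Hypothetical subset of the usage object forwarded to the webview.
interface UsageLike {
    promptTokens?: number;
    responseTokens?: number;
    promptUtilizationPct?: number;
}

// Returns a summary string, or undefined when countTokens returned nothing usable.
function formatUsageSummary(usage: UsageLike | undefined): string | undefined {
    if (!usage || usage.promptTokens === undefined) {
        return undefined; // omit the summary entirely
    }
    const { promptTokens, responseTokens, promptUtilizationPct } = usage;
    if (responseTokens !== undefined && promptUtilizationPct !== undefined) {
        // Case 1: prompt + response + utilisation %
        return `Used ${promptTokens} prompt + ${responseTokens} response tokens (${promptUtilizationPct}% of context)`;
    }
    if (responseTokens !== undefined) {
        // Case 2: prompt + response only
        return `Used ${promptTokens} prompt + ${responseTokens} response tokens`;
    }
    // Case 3: prompt only
    return `Used ${promptTokens} prompt tokens`;
}
```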
Picks up the strings added by the model-selection trace logging, token-usage measurement, and the expanded Powered-by byline ('[Copilot] Available models...', 'Used {0} prompt + {1} response tokens...', etc.).
Fluent v9 Link does not inherit font size from its parent Text component. Both the pre-invocation cost-neutral disclosure row in GetPerformanceInsightsCard and the post-response Powered-by byline in QueryInsightsTab rendered the 'Learn more' link at the default 14px instead of the surrounding Text size={200} 12px. Fixed by adding an explicit inline style with tokens.fontSizeBase200 and tokens.lineHeightBase200 to each Link, overriding the fui-Link class.
…nse byline

- Pre-invocation disclosure: 'No additional cost for most GitHub Copilot subscribers. Learn more about the utility model used.' (link goes to dedicated utility model doc page via aka.ms slug)
- Post-response byline split into two lines: cost-neutral disclosure (line 1) + 'Powered by {model} via GitHub Copilot' attribution (line 2)
- Add onLearnMoreUtilityModel prop to GetPerformanceInsightsCard; remove unused modelHint prop
- Add utilityModelUrl constant and handleLearnMoreUtilityModel callback in QueryInsightsTab
@tnaum-ms tnaum-ms linked an issue May 28, 2026 that may be closed by this pull request
4 tasks
@tnaum-ms tnaum-ms added this to the 0.8.1 milestone May 28, 2026
tnaum-ms added 3 commits May 28, 2026 14:19
New user-manual page docs/user-manual/ai-utility-model.md explains what utility models are in the GitHub Copilot context, which models the extension uses and in what order, what each billing tier means for users (paid, Free, enterprise, and the June 2026 usage-based billing transition), how prompts are optimised to stay cost-neutral, and how to find the model attribution in the post-response byline. Registered in the user manual index under a new AI Features section. Links to the canonical GitHub Copilot billing and plans documentation.
@tnaum-ms tnaum-ms force-pushed the dev/tnaum/query-insights-model-transparency branch from 990cf71 to 3ecdb79 Compare May 28, 2026 14:29
@tnaum-ms tnaum-ms marked this pull request as ready for review May 28, 2026 14:30
@tnaum-ms tnaum-ms requested a review from a team as a code owner May 28, 2026 14:30
Copilot AI review requested due to automatic review settings May 28, 2026 14:30

Copilot AI left a comment

Contributor


Pull request overview

This PR adds transparency around the AI model used by the Query Insights “AI Performance Insights” feature and introduces cost-neutral disclosure messaging, while also enhancing diagnostic/telemetry signals for model selection and token usage.

Changes:

  • Thread modelUsed (and best-effort token usage metrics) from the Copilot service through the optimization pipeline to the webview and telemetry.
  • Add pre-invocation and post-response UI disclosures including a utility-model “Learn more” link and a “Powered by {modelId}” byline.
  • Add diagnostics for model selection (requested/accepted/rejected) and token usage counting/tracing; extend fallback model chain to include copilot-utility.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

File Description
src/webviews/documentdb/collectionView/types/queryInsights.ts Extends Stage 3 webview response type with modelUsed and usage.
src/webviews/documentdb/collectionView/components/queryInsightsTab/QueryInsightsTab.tsx Adds disclosure links/handlers and renders the post-response “Powered by” byline.
src/webviews/documentdb/collectionView/components/queryInsightsTab/components/optimizationCards/custom/GetPerformanceInsightsCard.tsx Adds persistent cost-neutral disclosure row and a utility-model learn-more callback.
src/webviews/documentdb/collectionView/collectionViewRouter.ts Mirrors modelUsed and token usage into Stage 3 telemetry properties/measurements.
src/utils/formatTokenCount.ts New helper to compact-format token counts for trace output.
src/services/copilotService.ts Adds token usage measurement, model selection tracing, metadata dumping, and telemetry fields.
src/services/ai/types.ts Threads modelUsed and usage through AI optimization response types.
src/services/ai/QueryInsightsAIService.ts Carries modelUsed/usage from optimization results into parsed response.
src/documentdb/queryInsights/transformations.ts Surfaces modelUsed/usage into the UI transform result.
src/commands/llmEnhancedCommands/promptTemplates.ts Extends fallback model chain to include copilot-utility.
src/commands/llmEnhancedCommands/indexAdvisorCommands.ts Propagates token usage and records token usage measurements in telemetry.
l10n/bundle.l10n.json Regenerates localization bundle with new UI/trace strings.
docs/user-manual/ai-utility-model.md Adds documentation for model selection and billing/cost rationale.
docs/index.md Adds the new AI documentation page to the docs index.
docs/ai-and-plans/PRs/690-ai-model-transparency.md Adds internal PR notes describing rationale, pipeline, and decisions.

Comment thread src/webviews/documentdb/collectionView/types/queryInsights.ts
Comment thread docs/user-manual/ai-utility-model.md Outdated
…pt-4o -> copilot-utility

Addresses PR #690 review Low #5. The 'We fall back gracefully' section
still advertised GPT-4o -> GPT-4o-mini -> copilot-utility, but the
implemented chain in promptTemplates.ts is GPT-4.1 -> GPT-4o ->
copilot-utility. Update the manual so the cost-disclosure page does not
ship a stale fallback policy.
@tnaum-ms

Collaborator Author

Re: Low finding #5 (docs disagree with implemented fallback chain) — addressed in 125a234.

docs/user-manual/ai-utility-model.md advertised GPT-4o → GPT-4o-mini → copilot-utility, but the implemented chain in promptTemplates.ts is gpt-4.1 → gpt-4o → copilot-utility. Updated the cost-disclosure page to match the code so we don't ship a stale fallback policy on the page users are most likely to consult.

tnaum-ms added 4 commits May 28, 2026 17:50
Addresses GitHub Copilot reviewer comment 3318538081 on PR #690. The
JSDoc on QueryInsightsStage3Response.usage said the values are
'Surfaced in the post-response byline', but the byline component renders
only the model display name. The token-usage fields are forwarded purely
so the extension host can mirror them onto telemetry alongside Stage-3
properties without a second event. Rewrote the JSDoc to say so.
Addresses GitHub Copilot reviewer comment 3318538176 on PR #690. The
inline comment on the cost-neutral disclosure row claimed the 'Learn
more' link 'reuses onLearnMore so the doc URL stays in one place'. In
fact the disclosure-row link is wired to onLearnMoreUtilityModel (the
utility-model cost-disclosure page) and is intentionally separate from
the feature's general onLearnMore. Rewrote the comment to reflect that
the two URLs are kept separate by design.
Addresses GitHub Copilot reviewer comment 3318538227 on PR #690. The
JSDoc on aiInsightsDocsUrl claimed the brand-card 'Learn more' button,
the cost-disclosure row, and the byline all open the same page. They do
not: only the brand-card button uses aiInsightsDocsUrl; the disclosure
row uses utilityModelUrl, and the byline currently has no link. Rewrote
the JSDoc to explain the two URLs are intentionally separate.
…y size limit

Addresses PR #690 review Additional Low finding. The
'modelsAvailable' property previously joined every available
LanguageModelChat.id verbatim with commas. With long ids like
'copilot-gpt-4o-mini-2024-07-18' and Copilot extensions surfacing 10+
models, the property routinely exceeded downstream telemetry size caps
and got truncated, hiding the very data the field was meant to expose.

Switch to LanguageModelChat.family (well-known short names like
'gpt-4o'), dedupe with a Set, sort for stable ordering, cap to 8
entries, and append a '+N-more' suffix when truncated so analytics can
still see that the list was capped.
@tnaum-ms

Collaborator Author

Re: Additional Low finding (unbounded modelsAvailable telemetry property) — addressed in 4e7c17d.

The modelsAvailable property previously joined every LanguageModelChat.id verbatim with commas. With long ids like copilot-gpt-4o-mini-2024-07-18 and Copilot extensions returning 10+ models, the property routinely exceeded downstream telemetry property-size caps and got truncated, hiding the very data the field was meant to expose. Now:

  • Use LanguageModelChat.family (short well-known names like gpt-4o) instead of opaque ids.
  • Dedupe via Set, sort for stable ordering.
  • Cap at 8 entries; append a +N-more suffix when truncated so analytics still sees the list was capped.
  • Keep modelsAvailableCount as a measurement for the true count.
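
A rough sketch of the capped property, under stated assumptions: summarizeAvailableFamilies and ModelLike are hypothetical names, and the comma separator before the +N-more suffix is a guess at the final format.

```typescript
// Hypothetical stand-in for the one LanguageModelChat field used here.
interface ModelLike {
    family: string;
}

const MAX_FAMILIES = 8; // cap stated in the comment above

// Builds a bounded, order-stable value for the modelsAvailable telemetry property.
function summarizeAvailableFamilies(models: readonly ModelLike[], cap: number = MAX_FAMILIES): string {
    // Dedupe via Set, then sort so the same model set always produces the same string.
    const families = Array.from(new Set(models.map((m) => m.family))).sort();
    if (families.length <= cap) {
        return families.join(',');
    }
    // Signal truncation so analytics can tell the list was capped.
    const hidden = families.length - cap;
    return `${families.slice(0, cap).join(',')},+${hidden}-more`;
}
```

The true number of models still travels separately as the modelsAvailableCount measurement, so the cap loses names but not the count.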

Addresses PR #690 review Additional Low finding. dumpModelMetadata
emitted the same static metadata block to the trace output channel on
every Copilot request (per-message tokens + own-property enumeration).
For a given LanguageModelChat.id the metadata is static for the
lifetime of the extension host, so emit it at most once per id. Keeps
the trace stream readable without losing first-seen visibility into
what the runtime exposes.
@tnaum-ms

Collaborator Author

Re: Additional Low finding (dumpModelMetadata runs on every request) — addressed in 5f51b3d.

dumpModelMetadata emitted the same static stable-fields block and own-property enumeration to the trace output channel on every Copilot request. For a given LanguageModelChat.id the metadata is static for the extension-host lifetime, so it's now memoised: the first time we see a new id we dump it; subsequent calls early-return. Keeps the trace stream readable without losing first-seen visibility.
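
The memoisation could be sketched along these lines. dumpModelMetadataOnce, ModelMetadataLike, and the trace line format are illustrative assumptions; only the list of stable fields and the once-per-id behaviour come from this PR.

```typescript
// Hypothetical stand-in for the stable LanguageModelChat fields named in the PR.
interface ModelMetadataLike {
    id: string;
    vendor: string;
    family: string;
    version: string;
    name: string;
    maxInputTokens: number;
}

const STABLE_FIELDS = ['id', 'vendor', 'family', 'version', 'name', 'maxInputTokens'] as const;

// Ids already dumped during this extension-host lifetime.
const dumpedModelIds = new Set<string>();

// Traces metadata the first time an id is seen; returns false on repeat calls.
function dumpModelMetadataOnce(model: ModelMetadataLike, trace: (line: string) => void): boolean {
    if (dumpedModelIds.has(model.id)) {
        return false; // early return keeps the trace stream readable
    }
    dumpedModelIds.add(model.id);

    for (const key of STABLE_FIELDS) {
        trace(`[Copilot] model.${key} = ${String(model[key])}`);
    }

    // Any additional own enumerable non-function properties, diagnostic only.
    const stable = new Set<string>(STABLE_FIELDS);
    const bag = model as unknown as Record<string, unknown>;
    for (const key of Object.keys(bag)) {
        const value = bag[key];
        if (stable.has(key) || typeof value === 'function') {
            continue;
        }
        trace(`[Copilot] model.${key} (extra) = ${String(value)}`);
    }
    return true;
}
```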

Addresses PR #690 review Additional Nit finding. When
selectChatModels returns an empty array because the user dismissed
VS Code's one-time language-model access consent prompt, the previous
error message only suggested checking Copilot install/subscription
status, leading users on a wild goose chase. Add explicit mention of
the consent prompt so users know to re-run the feature to re-trigger
it.
@tnaum-ms

Collaborator Author

Re: Additional Nit finding (no-model error message doesn't mention consent) — addressed in 65bb97c.

selectChatModels returns an empty array when the user has dismissed VS Code's one-time language-model access consent prompt. The previous error message only nudged users toward checking Copilot install/subscription, leading them on a wild goose chase. The message now explicitly mentions the consent prompt and notes that re-running the feature will re-trigger it.

Addresses PR #690 review Additional Nit finding. The cancellation
trace inside CopilotService used the '[Query Insights AI]' prefix used
by the index-advisor caller, even though CopilotService is shared
across query generation and any future AI feature. Use '[Copilot]'
inside the service so the prefix is consistent with the other service-
internal traces (token usage, model metadata) and so the message stays
accurate when query generation or another caller is the source.
@tnaum-ms

Collaborator Author

Re: Additional Nit finding (inconsistent trace prefix) — addressed in dfdb0a1.

The cancellation trace inside CopilotService used [Query Insights AI], the index-advisor caller's prefix, even though the service is shared across query generation and any future AI feature. Switched the service-internal trace to [Copilot] so it's consistent with the other service-internal traces (token usage, model metadata) and stays accurate regardless of which feature triggered the request. Caller-side [Query Insights AI] / [Query Generation] prefixes are unchanged.

Adds a 'Review-feedback follow-up' section to the PR design doc
covering the significant changes made after the initial push to
address PR #690 review findings:

- CopilotResponse split into modelId / modelFamily / modelDisplayName
- Per-feature model constants + featureSource telemetry plumbing
- Manual softened: post-response token measurement, display-name
  byline prose, fallback chain update
- modelsAvailable telemetry capped and deduped
- dumpModelMetadata memoised per model id
- Consent-aware no-model error message
- Service-internal trace prefix unified to [Copilot]

Comment-only / wording-only fixes are deliberately not re-listed
here — they are recorded in the per-fix PR comments.
@tnaum-ms

Collaborator Author

Re: PR description update (PRs/690-ai-model-transparency.md) — addressed in 9964d21.

Added a "Review-feedback follow-up" section to the PR design doc covering the significant code/contract/doc changes from this review pass (CopilotResponse split, per-feature constants + featureSource telemetry, manual softening, fallback-chain update, modelsAvailable cap/dedupe, dumpModelMetadata memoisation, consent-aware error message, unified trace prefix). Comment-only and wording-only fixes are intentionally not re-listed in the design doc — they live in their per-fix PR comments and reviewer-thread replies.

tnaum-ms added 2 commits May 28, 2026 17:58
Picks up the two user-facing string changes from this review pass:

- Updated 'No suitable language model' error message to mention the
  language-model access consent prompt.
- Renamed the cancellation trace key from
  '[Query Insights AI] Copilot call cancelled during streaming' to
  '[Copilot] Call cancelled during streaming' to match the unified
  service-internal trace prefix.
Resolves PR #690 review High finding #1.

Renames the preferred/fallback model surface so the unit of selection is
unambiguously LanguageModelChat.family rather than LanguageModelChat.id:

  promptTemplates.ts:
    INDEX_OPTIMIZATION_PREFERRED_MODEL    -> INDEX_OPTIMIZATION_PREFERRED_FAMILY
    INDEX_OPTIMIZATION_FALLBACK_MODELS    -> INDEX_OPTIMIZATION_FALLBACK_FAMILIES
    QUERY_GENERATION_PREFERRED_MODEL      -> QUERY_GENERATION_PREFERRED_FAMILY
    QUERY_GENERATION_FALLBACK_MODELS      -> QUERY_GENERATION_FALLBACK_FAMILIES

  CopilotMessageOptions:
    preferredModel  -> preferredFamily
    fallbackModels  -> fallbackFamilies

  CopilotService:
    getPreferredModels -> getPreferredFamilies
    selectBestModel matcher: m.id === preferredId -> m.family === preferredFamily

  OptimizeQueryContext:
    preferredModel  -> preferredFamily
    fallbackModels  -> fallbackFamilies

Why family, not id:

  - LanguageModelChat.id is documented as opaque and can change between
    Copilot extension versions (or carry date-stamped suffixes like
    'copilot-gpt-4o-mini-2024-07-18').
  - LanguageModelChat.family is the documented stable well-known name and
    is what the official VS Code LM API examples use.
  - copilot-utility safely matches via family too: verified against the
    Copilot Chat extension source that alias entries are registered with
    the alias string used as BOTH id and family. So gpt-4.1, gpt-4o, and
    copilot-utility all match via family with no special-case needed.

Before this commit, selectBestModel matched on m.id === preferredId, so
'gpt-4.1' / 'gpt-4o' chain entries never matched (real ids are
'copilot-gpt-4.1' / 'copilot-gpt-4o') and the code silently fell through
to availableModels[0]. The copilot-utility entry coincidentally matched
because aliases register the alias string as the id, which masked the
bug in casual testing.

Warning-toast checks in indexAdvisorCommands.ts / queryGenerationCommands.ts
now also compare strictly on family. The earlier defensive '|| modelId ===
...' branch was removed once aliases were confirmed to register family
alongside id — it could not fire in practice for any entry in our chain.

Trace messages updated to say 'family' instead of 'model' in the selection
log so the output channel reflects what is actually being matched.

PR design doc (docs/ai-and-plans/PRs/690-ai-model-transparency.md) extended
with a 'Family-based model selection' section recording the rationale and
the alias-registration evidence.
@tnaum-ms

Collaborator Author

Re: High finding #1 (family-vs-id model matching) — fix landed in 0f098aa.

Took the direction recorded in the earlier research note and made family-based selection unambiguous throughout the contract:

  • promptTemplates.ts:
    • INDEX_OPTIMIZATION_PREFERRED_MODEL → INDEX_OPTIMIZATION_PREFERRED_FAMILY
    • INDEX_OPTIMIZATION_FALLBACK_MODELS → INDEX_OPTIMIZATION_FALLBACK_FAMILIES
    • QUERY_GENERATION_PREFERRED_MODEL → QUERY_GENERATION_PREFERRED_FAMILY
    • QUERY_GENERATION_FALLBACK_MODELS → QUERY_GENERATION_FALLBACK_FAMILIES
  • CopilotMessageOptions.preferredModel → preferredFamily; fallbackModels → fallbackFamilies.
  • CopilotService.selectBestModel now matches m.family === preferredFamily. The id-based matcher is gone.
  • Warning-toast checks in indexAdvisorCommands.ts / queryGenerationCommands.ts also compare strictly on family — the earlier defensive || modelId === ... branch was dropped once aliases were confirmed to register family alongside id (it could not fire in practice for any entry in our chain).
  • Trace messages updated to say family instead of model in the selection log so the output channel reflects what is actually being matched.

Why family

LanguageModelChat.id is documented as opaque and can change between Copilot extension versions or carry date-stamped suffixes like copilot-gpt-4o-mini-2024-07-18. LanguageModelChat.family is the well-known stable name and is what the official VS Code LM API examples use:

const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot', family: 'gpt-4o' });

Why this is safe for copilot-utility

Confirmed directly against the Copilot Chat extension source (microsoft/vscode-copilot-chat, src/extension/conversation/vscode-node/languageModelAccess.ts): alias entries are registered with the alias string used as both id and family. So gpt-4.1, gpt-4o, and copilot-utility all match via family with no special-case needed.

The bug this closes

The old m.id === preferredId matcher never matched gpt-4.1 / gpt-4o (real ids are copilot-gpt-4.1 / copilot-gpt-4o) and silently fell through to availableModels[0]. The copilot-utility entry coincidentally matched because aliases register the alias string as the id, which masked the bug in casual testing.
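
To make the difference concrete, here is a minimal sketch of both matchers side by side. It is not the code in CopilotService: ChatModelLike is a hypothetical stand-in for vscode.LanguageModelChat, selectByIdLegacy only approximates the old behaviour, and the trace wording is invented for illustration.

```typescript
// Hypothetical stand-in for the LanguageModelChat fields involved in selection.
interface ChatModelLike {
    id: string;
    family: string;
}

interface SelectionResult {
    model: ChatModelLike | undefined;
    trace: string[];
}

// Family-based selection: walks the preference chain and records each decision.
function selectBestModel(
    available: readonly ChatModelLike[],
    preferredFamilies: readonly string[],
): SelectionResult {
    const trace: string[] = [];
    for (const family of preferredFamilies) {
        const match = available.find((m) => m.family === family);
        if (match) {
            trace.push(`family ${family}: accepted (${match.id})`);
            return { model: match, trace };
        }
        trace.push(`family ${family}: rejected (not available)`);
    }
    // Nothing in the chain matched: fall through to the first available model, if any.
    const fallback: ChatModelLike | undefined = available[0];
    trace.push(fallback ? `no preferred family available, using ${fallback.id}` : 'no models available');
    return { model: fallback, trace };
}

// Approximation of the old matcher, kept only to show the silent fall-through.
function selectByIdLegacy(
    available: readonly ChatModelLike[],
    preferredIds: readonly string[],
): ChatModelLike | undefined {
    for (const id of preferredIds) {
        const match = available.find((m) => m.id === id);
        if (match) {
            return match;
        }
    }
    return available[0];
}

// Example inputs: real ids carry a vendor prefix, families do not.
const exampleModels: ChatModelLike[] = [
    { id: 'copilot-gpt-4o-mini', family: 'gpt-4o-mini' },
    { id: 'copilot-gpt-4o', family: 'gpt-4o' },
];
const exampleChain = ['gpt-4.1', 'gpt-4o', 'copilot-utility'];
```

With these inputs the family matcher rejects gpt-4.1 and accepts gpt-4o, while the legacy id matcher finds nothing in the chain and returns the first available model, which here happens to be gpt-4o-mini.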

The PR design doc (docs/ai-and-plans/PRs/690-ai-model-transparency.md) now has a Family-based model selection section that records the rationale and the alias-registration evidence for future reference. The 5-step PR checklist passes (l10n regenerated, prettier clean, lint clean, 1984/1984 jest, build clean).

…annel warnings

The preferred-model-not-used warning was previously surfaced via
vscode.window.showWarningMessage as a notification toast in both
indexAdvisorCommands.ts and queryGenerationCommands.ts. The fallback
to the next available family is automatic and there is nothing the
user can act on, so a popup only adds confusion.

Surface the same information as ext.outputChannel.warn entries with
'[Query Insights AI]' / '[Query Generation]' prefixes, including
both the requested family and the actually-used family alongside the
display name. The data is still captured in telemetry via
modelSelectionOutcome on the shared sendMessage event for analytics
follow-up.
@tnaum-ms

Collaborator Author

Follow-up: demoted "preferred model not used" warnings to output-channel entries in b650f0b.

Both indexAdvisorCommands.ts and queryGenerationCommands.ts previously surfaced the "not using preferred model" warning as a vscode.window.showWarningMessage notification toast. The fallback to the next available family is fully automatic and there is nothing the user can act on, so the popup was adding confusion rather than value.

The same information is now an ext.outputChannel.warn line (with the existing [Query Insights AI] / [Query Generation] prefixes), recording both the requested family and the actually-used family alongside the display name. The data is still captured in telemetry via modelSelectionOutcome on the shared sendMessage event for analytics follow-up.

5-step PR checklist passes: l10n regenerated, prettier clean, lint clean, 1984/1984 jest, build clean.

…cing section

Restructures and rewrites the manual page to match the conversational
second-person style of the other user-manual pages. Key changes:

- Retitles page 'Model and Pricing' (was 'Model and Billing')
- Removes the standalone 'What is a utility model?' intro section;
  the utility-model concept is now woven into 'Which model does the
  extension use?' as a natural first paragraph
- Renames 'How we optimize prompts for the utility model' to 'How we
  keep prompts lean' and tightens the prose
- Adds a note about fallback diagnostics going to the output channel
  (not a popup) in the fallback section
- Updates 'Which model was actually used?' to mention the copilot-utility
  byline case and removes raw API property names from body text
- Adds a dedicated 'Pricing' section covering: what GitHub documents
  about 0x multiplier models, a per-plan table, the 'most subscribers'
  hedge explanation, and a note on billing evolution — all sourced to
  official GitHub docs
- Fixes header breadcrumb dash to — to match other pages

docs(manual): drop technical section, remove dashes, add testing note

docs(manual): restructure ai-utility-model for all AI features + pricing first

docs(manual): improve clarity and formatting in AI utility model documentation
@tnaum-ms tnaum-ms force-pushed the dev/tnaum/query-insights-model-transparency branch from 69eebd3 to 9acb245 Compare May 29, 2026 07:58
@github-actions

Contributor

✅ Code Quality Checks

Check | Status | How to fix
Localization (l10n) | ✅ Passed |
ESLint | ✅ Passed |
Prettier formatting | ✅ Passed |

This comment is updated automatically on each push.

@github-actions

Contributor

📦 Build Size Report

Metric | Base (main) | PR | Delta
VSIX (vscode-documentdb-0.8.0.vsix) | 7.53 MB | 7.54 MB | ⬆️ +2 KB (+0.0%)
Webview bundle (views.js) | 5.88 MB | 5.88 MB | ⬆️ +1 KB (+0.0%)

Download artifact · updated automatically on each push.

@tnaum-ms tnaum-ms merged commit 413dbcd into main May 29, 2026
8 checks passed
@tnaum-ms tnaum-ms deleted the dev/tnaum/query-insights-model-transparency branch May 29, 2026 08:07
@tnaum-ms tnaum-ms linked an issue May 29, 2026 that may be closed by this pull request

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: show which LLM model is used by AI features in the UI
Track LLM token usage

3 participants