feat: AI model transparency for Query Insights panel #690
Append the Copilot internal 'copilot-utility' alias to the Query Insights AI fallback chain so the feature falls back to whichever chat model CAPI marks as is_chat_fallback when gpt-4o and gpt-4o-mini are unavailable. This keeps the AI Performance Insights flow on a model intended to be cost-neutral for GitHub Copilot subscribers.
Propagate the language model id returned by CopilotService through AIOptimizationResponse, transformAIResponseForUI, and QueryInsightsStage3Response so the webview can disclose which model actually produced the AI Performance Insights response. Also record the disclosed id under the aiModelDisclosed telemetry property on the Stage 3 router event for correlation with UI exposure.
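Here is a minimal sketch of how an optional model id can travel from the AI response to the webview and onto the Stage 3 telemetry event. The type shapes are trimmed-down, hypothetical stand-ins for AIOptimizationResponse and QueryInsightsStage3Response. The helper stage3TelemetryProperties is illustrative and is not the router's actual code.

```typescript
// Hypothetical, trimmed-down shapes: the real interfaces carry many more fields.
interface AIOptimizationResponse {
    suggestions: string[];
    modelUsed?: string; // id reported by CopilotService, when available
}

interface QueryInsightsStage3Response {
    suggestions: string[];
    modelUsed?: string;
}

// Pass the model id through unchanged so the webview can disclose it.
function transformAIResponseForUI(ai: AIOptimizationResponse): QueryInsightsStage3Response {
    return { suggestions: ai.suggestions, modelUsed: ai.modelUsed };
}

// Mirror the disclosed id onto the Stage 3 router event, only when it is present.
function stage3TelemetryProperties(response: QueryInsightsStage3Response): Record<string, string> {
    const properties: Record<string, string> = {};
    if (response.modelUsed) {
        properties.aiModelDisclosed = response.modelUsed;
    }
    return properties;
}
```

The field is optional at every layer. An older or failed response therefore produces neither a disclosure nor a telemetry property, and nothing needs special-casing.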
…card Adds a small InfoRegular + Text caption beneath the action buttons in GetPerformanceInsightsCard explaining that the feature uses a utility model intended to be cost-neutral for GitHub Copilot subscribers, plus an inline Learn more link that reuses the existing onLearnMore callback. A new optional modelHint prop (defaults to 'GPT-4o') labels the model so the disclosure can adapt if the preferred model ever changes.
…dler Renders a small caption beneath the AI suggestions list once Stage 3 succeeds, surfacing the actual model id returned by CopilotService so users see when fallbacks (gpt-4o-mini, copilot-utility) kick in. The doc URL and openUrl call are extracted into a single handleLearnMore callback so the brand card button, the cost-disclosure row, and the new byline all open the same page.
Picks up the new strings introduced by the AI Performance Insights model-transparency UI ('Uses a utility model …', 'Powered by {0} via GitHub Copilot.', 'Learn more').
Refactors CopilotService.selectBestModel to emit a structured trace of the model selection process: it logs the available models from VS Code, the requested preference chain, accepted/rejected status per preferred id, and the final selection. When the Copilot vendor returns no models at all, the no-models-available branch is logged as a warning so users debugging 'AI insights unavailable' see the root cause without enabling verbose logging in the LM API itself. Adds three telemetry properties (modelPreferenceChain, modelsAvailable, modelSelectionOutcome) and one measurement (modelsAvailableCount) on the copilot.sendMessage event to allow offline monitoring of how often each fallback level is hit.
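The selection trace described above can be sketched as a pure function. This is an illustration, not CopilotService's code. ChatModelInfo is a local structural subset of vscode.LanguageModelChat, declared so the sketch runs outside VS Code. The outcome labels are hypothetical values for modelSelectionOutcome. Matching is done by family, the approach this PR adopts later for High finding #1.

```typescript
// Structural subset of vscode.LanguageModelChat, declared locally (assumption).
interface ChatModelInfo {
    id: string;
    family: string;
}

// Hypothetical labels for the modelSelectionOutcome telemetry property.
type SelectionOutcome = 'preferred' | 'fallback' | 'firstAvailable' | 'noModels';

interface SelectionResult {
    model: ChatModelInfo | undefined;
    outcome: SelectionOutcome;
    trace: string[];
}

// Walk the preference chain in order; the first family that is available wins.
function selectBestModel(available: ChatModelInfo[], chain: string[]): SelectionResult {
    const trace: string[] = [];
    trace.push(`Available models: ${available.map((m) => m.id).join(', ') || '(none)'}`);
    trace.push(`Requested chain: ${chain.join(' -> ')}`);

    if (available.length === 0) {
        // In the extension this branch is surfaced as a warning rather than a trace.
        trace.push('No models available from the Copilot vendor');
        return { model: undefined, outcome: 'noModels', trace };
    }

    for (let i = 0; i < chain.length; i++) {
        const match = available.find((m) => m.family === chain[i]);
        if (match) {
            trace.push(`accepted: ${chain[i]} (${match.id})`);
            return { model: match, outcome: i === 0 ? 'preferred' : 'fallback', trace };
        }
        trace.push(`rejected: ${chain[i]} (not available)`);
    }

    // Nothing in the chain matched: use whatever VS Code listed first.
    trace.push(`selected first available: ${available[0].id}`);
    return { model: available[0], outcome: 'firstAvailable', trace };
}
```

The function returns the trace lines instead of writing them to an output channel. This keeps selection testable and lets the caller decide which lines deserve warning level.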
Adds a CopilotTokenUsage type and computes prompt/response/context-window token counts client-side via LanguageModelChat.countTokens after each request. Counts are best-effort: failures fall back to undefined so telemetry never blocks the user flow. The usage object is propagated through OptimizationResult, AIOptimizationResponse, transformAIResponseForUI, and QueryInsightsStage3Response so the webview can surface it. Telemetry measurements (promptTokens, responseTokens, totalTokens, maxInputTokens, promptUtilizationPct) are emitted on three events for offline monitoring: copilot.sendMessage, indexOptimization (inside optimizeQuery), and the Stage 3 router.
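A best-effort measurement like the one below keeps a failing countTokens call from affecting the request. Some details are assumptions:

- The CopilotTokenUsage field names are modelled on the telemetry measurement names.
- The utilisation percentage is rounded to a whole number.
- CountTokens stands in for LanguageModelChat.countTokens so the sketch runs without VS Code.

```typescript
// Field names are assumptions modelled on the telemetry measurement names.
interface CopilotTokenUsage {
    promptTokens?: number;
    responseTokens?: number;
    totalTokens?: number;
    maxInputTokens?: number;
    promptUtilizationPct?: number;
}

// Stand-in for LanguageModelChat.countTokens, which resolves to a number.
type CountTokens = (text: string) => Promise<number>;

async function measureTokenUsage(
    countTokens: CountTokens,
    prompt: string,
    response: string,
    maxInputTokens: number | undefined,
): Promise<CopilotTokenUsage | undefined> {
    // Count both sides in parallel; a rejection only loses that one number.
    const [promptResult, responseResult] = await Promise.allSettled([countTokens(prompt), countTokens(response)]);
    const promptTokens = promptResult.status === 'fulfilled' ? promptResult.value : undefined;
    const responseTokens = responseResult.status === 'fulfilled' ? responseResult.value : undefined;

    if (promptTokens === undefined && responseTokens === undefined) {
        return undefined; // nothing measured, so telemetry simply omits the fields
    }

    const totalTokens =
        promptTokens !== undefined && responseTokens !== undefined ? promptTokens + responseTokens : undefined;
    // Guard against a missing or zero context window before dividing.
    const promptUtilizationPct =
        promptTokens !== undefined && maxInputTokens ? Math.round((promptTokens / maxInputTokens) * 100) : undefined;

    return { promptTokens, responseTokens, totalTokens, maxInputTokens, promptUtilizationPct };
}
```

Promise.allSettled is used instead of Promise.all, so one failed count does not discard the other.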
…nd token usage Aligns the post-response byline with the pre-invocation card: prefixes the line with InfoRegular, mirrors the 'utility model intended to be cost-neutral for GitHub Copilot subscribers' wording, and appends a localised token-usage summary built from QueryInsightsStage3Response.usage. The token summary degrades gracefully across three cases (prompt + response + utilisation %, prompt + response only, prompt only) and is omitted entirely when countTokens did not return anything.
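A small formatter can express the three-case degradation of the token summary. The English templates are hypothetical placeholders, not the localised strings in l10n/bundle.l10n.json. The input shape mirrors the assumed usage fields.

```typescript
// Hypothetical English templates; the extension routes its strings through l10n.
function formatTokenSummary(
    usage: { promptTokens?: number; responseTokens?: number; promptUtilizationPct?: number } | undefined,
): string | undefined {
    if (!usage || usage.promptTokens === undefined) {
        return undefined; // countTokens returned nothing: omit the summary entirely
    }
    const { promptTokens, responseTokens, promptUtilizationPct } = usage;
    if (responseTokens !== undefined && promptUtilizationPct !== undefined) {
        return `Used ${promptTokens} prompt + ${responseTokens} response tokens (${promptUtilizationPct}% of the context window).`;
    }
    if (responseTokens !== undefined) {
        return `Used ${promptTokens} prompt + ${responseTokens} response tokens.`;
    }
    return `Used ${promptTokens} prompt tokens.`;
}
```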
Picks up the strings added by the model-selection trace logging, token-usage measurement, and the expanded Powered-by byline ('[Copilot] Available models...', 'Used {0} prompt + {1} response tokens...', etc.).
…d and update powered-by text
Fluent v9 Link does not inherit font size from its parent Text component. Both the pre-invocation cost-neutral disclosure row in GetPerformanceInsightsCard and the post-response Powered-by byline in QueryInsightsTab rendered the 'Learn more' link at the default 14px instead of the surrounding Text size={200} 12px. Fixed by adding an explicit inline style with tokens.fontSizeBase200 and tokens.lineHeightBase200 to each Link, overriding the fui-Link class.
…nse byline
- Pre-invocation disclosure: 'No additional cost for most GitHub Copilot subscribers. Learn more about the utility model used.' (link goes to dedicated utility model doc page via aka.ms slug)
- Post-response byline split into two lines: cost-neutral disclosure (line 1) + 'Powered by {model} via GitHub Copilot' attribution (line 2)
- Add onLearnMoreUtilityModel prop to GetPerformanceInsightsCard; remove unused modelHint prop
- Add utilityModelUrl constant and handleLearnMoreUtilityModel callback in QueryInsightsTab
…st disclosure and token tracking
New user-manual page docs/user-manual/ai-utility-model.md explains what utility models are in the GitHub Copilot context, which models the extension uses and in what order, what each billing tier means for users (paid, Free, enterprise, and the June 2026 usage-based billing transition), how prompts are optimised to stay cost-neutral, and how to find the model attribution in the post-response byline. Registered in the user manual index under a new AI Features section. Links to the canonical GitHub Copilot billing and plans documentation.
990cf71 to 3ecdb79
Pull request overview
This PR adds transparency around the AI model used by the Query Insights “AI Performance Insights” feature and introduces cost-neutral disclosure messaging, while also enhancing diagnostic/telemetry signals for model selection and token usage.
Changes:
- Thread `modelUsed` (and best-effort token usage metrics) from the Copilot service through the optimization pipeline to the webview and telemetry.
- Add pre-invocation and post-response UI disclosures, including a utility-model "Learn more" link and a "Powered by {modelId}" byline.
- Add diagnostics for model selection (requested/accepted/rejected) and token usage counting/tracing; extend the fallback model chain to include `copilot-utility`.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/webviews/documentdb/collectionView/types/queryInsights.ts | Extends Stage 3 webview response type with modelUsed and usage. |
| src/webviews/documentdb/collectionView/components/queryInsightsTab/QueryInsightsTab.tsx | Adds disclosure links/handlers and renders the post-response “Powered by” byline. |
| src/webviews/documentdb/collectionView/components/queryInsightsTab/components/optimizationCards/custom/GetPerformanceInsightsCard.tsx | Adds persistent cost-neutral disclosure row and a utility-model learn-more callback. |
| src/webviews/documentdb/collectionView/collectionViewRouter.ts | Mirrors modelUsed and token usage into Stage 3 telemetry properties/measurements. |
| src/utils/formatTokenCount.ts | New helper to compact-format token counts for trace output. |
| src/services/copilotService.ts | Adds token usage measurement, model selection tracing, metadata dumping, and telemetry fields. |
| src/services/ai/types.ts | Threads modelUsed and usage through AI optimization response types. |
| src/services/ai/QueryInsightsAIService.ts | Carries modelUsed/usage from optimization results into parsed response. |
| src/documentdb/queryInsights/transformations.ts | Surfaces modelUsed/usage into the UI transform result. |
| src/commands/llmEnhancedCommands/promptTemplates.ts | Extends fallback model chain to include copilot-utility. |
| src/commands/llmEnhancedCommands/indexAdvisorCommands.ts | Propagates token usage and records token usage measurements in telemetry. |
| l10n/bundle.l10n.json | Regenerates localization bundle with new UI/trace strings. |
| docs/user-manual/ai-utility-model.md | Adds documentation for model selection and billing/cost rationale. |
| docs/index.md | Adds the new AI documentation page to the docs index. |
| docs/ai-and-plans/PRs/690-ai-model-transparency.md | Adds internal PR notes describing rationale, pipeline, and decisions. |
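The table lists a new formatTokenCount helper that compacts token counts for trace output. Its exact rules are not described in this PR, so the sketch below uses assumed rules: values below 1,000 print as-is, and larger values get one decimal plus a K or M suffix.

```typescript
// Hypothetical rules: plain integers below 1,000; one decimal plus K or M above.
function formatTokenCount(count: number): string {
    if (!Number.isFinite(count) || Math.abs(count) < 1_000) {
        return String(count);
    }
    const sign = count < 0 ? '-' : '';
    const abs = Math.abs(count);

    // Round to one decimal before choosing the unit, so 999,950 prints as "1M", not "1000K".
    const thousands = Math.round(abs / 100) / 10;
    if (thousands < 1_000) {
        return `${sign}${thousands}K`;
    }
    const millions = Math.round(abs / 100_000) / 10;
    return `${sign}${millions}M`;
}
```

Under these rules, formatTokenCount(1234) gives "1.2K" and formatTokenCount(128000) gives "128K".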
…pt-4o -> copilot-utility Addresses PR #690 review Low #5. The 'We fall back gracefully' section still advertised GPT-4o -> GPT-4o-mini -> copilot-utility, but the implemented chain in promptTemplates.ts is GPT-4.1 -> GPT-4o -> copilot-utility. Update the manual so the cost-disclosure page does not ship a stale fallback policy.
Re: Low finding #5 (docs disagree with implemented fallback chain) — addressed in 125a234.
Addresses GitHub Copilot reviewer comment 3318538081 on PR #690. The JSDoc on QueryInsightsStage3Response.usage said the values are 'Surfaced in the post-response byline', but the byline component renders only the model display name. The token-usage fields are forwarded purely so the extension host can mirror them onto telemetry alongside Stage-3 properties without a second event. Rewrote the JSDoc to say so.
Addresses GitHub Copilot reviewer comment 3318538176 on PR #690. The inline comment on the cost-neutral disclosure row claimed the 'Learn more' link 'reuses onLearnMore so the doc URL stays in one place'. In fact the disclosure-row link is wired to onLearnMoreUtilityModel (the utility-model cost-disclosure page) and is intentionally separate from the feature's general onLearnMore. Rewrote the comment to reflect that the two URLs are kept separate by design.
Addresses GitHub Copilot reviewer comment 3318538227 on PR #690. The JSDoc on aiInsightsDocsUrl claimed the brand-card 'Learn more' button, the cost-disclosure row, and the byline all open the same page. They do not: only the brand-card button uses aiInsightsDocsUrl; the disclosure row uses utilityModelUrl, and the byline currently has no link. Rewrote the JSDoc to explain the two URLs are intentionally separate.
…y size limit Addresses PR #690 review Additional Low finding. The 'modelsAvailable' property previously joined every available LanguageModelChat.id verbatim with commas. With long ids like 'copilot-gpt-4o-mini-2024-07-18' and Copilot extensions surfacing 10+ models, the property routinely exceeded downstream telemetry size caps and got truncated, hiding the very data the field was meant to expose. Switch to LanguageModelChat.family (well-known short names like 'gpt-4o'), dedupe with a Set, sort for stable ordering, cap to 8 entries, and append a '+N-more' suffix when truncated so analytics can still see that the list was capped.
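The capping rule in the commit message can be sketched as follows: family names, Set-based dedupe, sorted order, at most 8 entries, and a '+N-more' suffix. The comma separator and the function name summarizeAvailableFamilies are assumptions.

```typescript
const MAX_FAMILIES = 8; // cap named in the commit message

// Hypothetical helper producing the value of the modelsAvailable telemetry property.
function summarizeAvailableFamilies(models: ReadonlyArray<{ family: string }>): string {
    // Families are short, well-known names; dedupe them, then sort for stable output.
    const families = Array.from(new Set(models.map((m) => m.family))).sort();
    if (families.length <= MAX_FAMILIES) {
        return families.join(',');
    }
    const shown = families.slice(0, MAX_FAMILIES);
    // Keep a marker so analytics can tell the list was truncated, and by how much.
    return `${shown.join(',')},+${families.length - MAX_FAMILIES}-more`;
}
```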
Re: Additional Low finding (unbounded …
Addresses PR #690 review Additional Low finding. dumpModelMetadata emitted the same static metadata block to the trace output channel on every Copilot request (per-message tokens + own-property enumeration). For a given LanguageModelChat.id the metadata is static for the lifetime of the extension host, so emit it at most once per id. Keeps the trace stream readable without losing first-seen visibility into what the runtime exposes.
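Here is a sketch of per-id memoisation for the metadata dump. It assumes a module-level Set is an acceptable cache for the lifetime of the extension host. Several names are illustrative:

- ModelLike is a local structural subset of vscode.LanguageModelChat.
- The function name dumpModelMetadataOnce is hypothetical.
- The trace line format is made up for this example.

The loop shows one way to pick up extra own enumerable, non-function properties.

```typescript
// Structural subset of vscode.LanguageModelChat (assumption for this sketch).
interface ModelLike {
    id: string;
    vendor: string;
    family: string;
    version: string;
    name: string;
    maxInputTokens: number;
}

const dumpedModelIds = new Set<string>();

// Returns the lines to trace, or [] when this id has already been dumped.
function dumpModelMetadataOnce(model: ModelLike): string[] {
    if (dumpedModelIds.has(model.id)) {
        return [];
    }
    dumpedModelIds.add(model.id);

    const lines = [
        `[Copilot] Model ${model.id}: vendor=${model.vendor}, family=${model.family}, ` +
            `version=${model.version}, name=${model.name}, maxInputTokens=${model.maxInputTokens}`,
    ];
    const known = new Set(['id', 'vendor', 'family', 'version', 'name', 'maxInputTokens']);
    // Any extra own enumerable, non-function properties the runtime happens to expose.
    for (const [key, value] of Object.entries(model)) {
        if (!known.has(key) && typeof value !== 'function') {
            lines.push(`[Copilot]   ${key}=${JSON.stringify(value)}`);
        }
    }
    return lines;
}
```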
Re: Additional Low finding (…
Addresses PR #690 review Additional Nit finding. When selectChatModels returns an empty array because the user dismissed VS Code's one-time language-model access consent prompt, the previous error message only suggested checking Copilot install/subscription status, leading users on a wild goose chase. Add explicit mention of the consent prompt so users know to re-run the feature to re-trigger it.
Re: Additional Nit finding (no-model error message doesn't mention consent) — addressed in 65bb97c.
Addresses PR #690 review Additional Nit finding. The cancellation trace inside CopilotService used the '[Query Insights AI]' prefix used by the index-advisor caller, even though CopilotService is shared across query generation and any future AI feature. Use '[Copilot]' inside the service so the prefix is consistent with the other service-internal traces (token usage, model metadata) and so the message stays accurate when query generation or another caller is the source.
Re: Additional Nit finding (inconsistent trace prefix): addressed in dfdb0a1. The cancellation trace inside …
Adds a 'Review-feedback follow-up' section to the PR design doc covering the significant changes made after the initial push to address PR #690 review findings:
- CopilotResponse split into modelId / modelFamily / modelDisplayName
- Per-feature model constants + featureSource telemetry plumbing
- Manual softened: post-response token measurement, display-name byline prose, fallback chain update
- modelsAvailable telemetry capped and deduped
- dumpModelMetadata memoised per model id
- Consent-aware no-model error message
- Service-internal trace prefix unified to [Copilot]

Comment-only and wording-only fixes are deliberately not re-listed here; they are recorded in the per-fix PR comments.
Re: PR description update (PRs/690-ai-model-transparency.md): addressed in 9964d21. Added a "Review-feedback follow-up" section to the PR design doc covering the significant code/contract/doc changes from this review pass (CopilotResponse split, per-feature constants + featureSource telemetry, manual softening, fallback-chain update, modelsAvailable cap/dedupe, dumpModelMetadata memoisation, consent-aware error message, unified trace prefix). Comment-only and wording-only fixes are intentionally not re-listed in the design doc; they live in their per-fix PR comments and reviewer-thread replies.
Picks up the two user-facing string changes from this review pass:
- Updated the 'No suitable language model' error message to mention the language-model access consent prompt.
- Renamed the cancellation trace key from '[Query Insights AI] Copilot call cancelled during streaming' to '[Copilot] Call cancelled during streaming' to match the unified service-internal trace prefix.
Resolves PR #690 review High finding #1. Renames the preferred/fallback model surface so the unit of selection is unambiguously LanguageModelChat.family rather than LanguageModelChat.id:

promptTemplates.ts:
- INDEX_OPTIMIZATION_PREFERRED_MODEL -> INDEX_OPTIMIZATION_PREFERRED_FAMILY
- INDEX_OPTIMIZATION_FALLBACK_MODELS -> INDEX_OPTIMIZATION_FALLBACK_FAMILIES
- QUERY_GENERATION_PREFERRED_MODEL -> QUERY_GENERATION_PREFERRED_FAMILY
- QUERY_GENERATION_FALLBACK_MODELS -> QUERY_GENERATION_FALLBACK_FAMILIES

CopilotMessageOptions:
- preferredModel -> preferredFamily
- fallbackModels -> fallbackFamilies

CopilotService:
- getPreferredModels -> getPreferredFamilies
- selectBestModel matcher: m.id === preferredId -> m.family === preferredFamily

OptimizeQueryContext:
- preferredModel -> preferredFamily
- fallbackModels -> fallbackFamilies

Why family, not id:
- LanguageModelChat.id is documented as opaque and can change between Copilot extension versions (or carry date-stamped suffixes like 'copilot-gpt-4o-mini-2024-07-18').
- LanguageModelChat.family is the documented stable well-known name and is what the official VS Code LM API examples use.
- copilot-utility safely matches via family too: verified against the Copilot Chat extension source that alias entries are registered with the alias string used as BOTH id and family. So gpt-4.1, gpt-4o, and copilot-utility all match via family with no special case needed.

Before this commit, selectBestModel matched on m.id === preferredId, so the 'gpt-4.1' / 'gpt-4o' chain entries never matched (real ids are 'copilot-gpt-4.1' / 'copilot-gpt-4o') and the code silently fell through to availableModels[0]. The copilot-utility entry coincidentally matched because aliases register the alias string as the id, which masked the bug in casual testing.

Warning-toast checks in indexAdvisorCommands.ts / queryGenerationCommands.ts now also compare strictly on family. The earlier defensive '|| modelId === ...' branch was removed once aliases were confirmed to register family alongside id; it could not fire in practice for any entry in our chain.

Trace messages now say 'family' instead of 'model' in the selection log, so the output channel reflects what is actually being matched.

The PR design doc (docs/ai-and-plans/PRs/690-ai-model-transparency.md) is extended with a 'Family-based model selection' section recording the rationale and the alias-registration evidence.
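The id-versus-family bug is easy to reproduce with sample data shaped like the values quoted in the commit message: ids such as 'copilot-gpt-4o', and aliases that register the same string as both id and family. This is a standalone illustration, not the extension's matcher. With id matching, the gpt-4.1 and gpt-4o entries are skipped and only the alias matches.

```typescript
interface ChatModelIds {
    id: string;
    family: string;
}

// Sample data shaped like the ids and families quoted in the commit message.
const available: ChatModelIds[] = [
    { id: 'copilot-gpt-4.1', family: 'gpt-4.1' },
    { id: 'copilot-gpt-4o', family: 'gpt-4o' },
    { id: 'copilot-utility', family: 'copilot-utility' }, // alias: id and family are the same string
];

const chain = ['gpt-4.1', 'gpt-4o', 'copilot-utility'];

// Old matcher: compares each chain entry against the opaque id.
const byId = (wanted: string) => available.find((m) => m.id === wanted);
// New matcher: compares against the stable, well-known family name.
const byFamily = (wanted: string) => available.find((m) => m.family === wanted);

// First chain entry that resolves to a model under each matcher.
const firstById = chain.map(byId).find((m) => m !== undefined);
const firstByFamily = chain.map(byFamily).find((m) => m !== undefined);
```

Here firstById resolves to the copilot-utility alias even though gpt-4.1 is available. Casual testing would easily miss this kind of silent downgrade. firstByFamily picks copilot-gpt-4.1 as intended.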
Re: High finding #1 (family-vs-id model matching) — fix landed in 0f098aa. Took the direction recorded in the earlier research note and made family-based selection unambiguous throughout the contract:
Why family
```typescript
const [model] = await vscode.lm.selectChatModels({ vendor: 'copilot', family: 'gpt-4o' });
```

Why this is safe for …
Confirmed directly against the Copilot Chat extension source (…
The bug this closes
The old …
The PR design doc (…
…annel warnings The preferred-model-not-used warning was previously surfaced via vscode.window.showWarningMessage as a notification toast in both indexAdvisorCommands.ts and queryGenerationCommands.ts. The fallback to the next available family is automatic and there is nothing the user can act on, so a popup only adds confusion. Surface the same information as ext.outputChannel.warn entries with '[Query Insights AI]' / '[Query Generation]' prefixes, including both the requested family and the actually-used family alongside the display name. The data is still captured in telemetry via modelSelectionOutcome on the shared sendMessage event for analytics follow-up.
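Here is a minimal sketch of the demoted warning. It assumes a sink with a warn(message) method, such as vscode.LogOutputChannel. The helper name and message wording are hypothetical. The extension's own ext.outputChannel wiring and l10n strings are not shown.

```typescript
// Minimal stand-in for the warn() method of vscode.LogOutputChannel.
interface WarnSink {
    warn(message: string): void;
}

// Hypothetical helper; the wording is illustrative, not the shipped string.
function reportPreferredFamilyNotUsed(
    sink: WarnSink,
    prefix: '[Query Insights AI]' | '[Query Generation]',
    requestedFamily: string,
    usedFamily: string,
    usedDisplayName: string,
): void {
    // Strict family comparison: nothing to report when the preferred family was used.
    if (usedFamily === requestedFamily) {
        return;
    }
    sink.warn(
        `${prefix} Preferred model family "${requestedFamily}" was not available; ` +
            `used "${usedFamily}" (${usedDisplayName}) instead.`,
    );
}
```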
Follow-up: demoted "preferred model not used" warnings to output-channel entries in b650f0b. Both …
Same information is now an …
5-step PR checklist passes: l10n regenerated, prettier clean, lint clean, 1984/1984 jest, build clean.
…cing section Restructures and rewrites the manual page to match the conversational second-person style of the other user-manual pages. Key changes:
- Retitles the page 'Model and Pricing' (was 'Model and Billing')
- Removes the standalone 'What is a utility model?' intro section; the utility-model concept is now woven into 'Which model does the extension use?' as a natural first paragraph
- Renames 'How we optimize prompts for the utility model' to 'How we keep prompts lean' and tightens the prose
- Adds a note in the fallback section that fallback diagnostics go to the output channel (not a popup)
- Updates 'Which model was actually used?' to mention the copilot-utility byline case and removes raw API property names from body text
- Adds a dedicated 'Pricing' section covering what GitHub documents about 0x multiplier models, a per-plan table, the 'most subscribers' hedge explanation, and a note on billing evolution, all sourced to official GitHub docs
- Changes the header breadcrumb dash to an em dash to match other pages

docs(manual): drop technical section, remove dashes, add testing note
docs(manual): restructure ai-utility-model for all AI features + pricing first
docs(manual): improve clarity and formatting in AI utility model documentation
69eebd3 to 9acb245
✅ Code Quality Checks
📦 Build Size Report
Summary
Adds model transparency and cost-neutral disclosure to the Query Insights AI feature. Users now see which model processed their request and are informed upfront that this feature uses a utility model (a Copilot tier that does not count against the premium request quota).
What was changed
Model disclosure pipeline (backend → webview)
- `copilotService.ts`: `CopilotResponse` now carries `modelUsed` (the `id` of the selected `LanguageModelChat` instance).
- `indexAdvisorCommands.ts` → `QueryInsightsAIService.ts` → `transformations.ts`: `modelUsed` is threaded through each layer and surfaced to the webview via `QueryInsightsStage3Response`.
- `collectionViewRouter.ts` (Stage 3 router): emits the `aiModelDisclosed` telemetry property when `modelUsed` is present.

Cost-neutral disclosure UI
- … (`GetPerformanceInsightsCard`): a persistent info row beneath the action buttons reads "No additional cost for most GitHub Copilot subscribers. [Learn more about the utility model used.]". It is shown before the user clicks and during loading.
- … (`QueryInsightsTab`): after a successful Stage 3 response, two lines appear: …
- … `https://aka.ms/vscode-documentdb-copilot-utility-model` (…

Token usage tracking (trace/telemetry only, not in UI)
- `copilotService.ts`: parallel `countTokens` calls measure prompt tokens, response tokens, total, and utilization percentage. All measurements are emitted to telemetry and written to the trace output channel (using the `formatTokenCount` helper for compact K/M notation).
- `CopilotTokenUsage` interface added; `CopilotResponse` carries an optional `usage` field. The measurements flow all the way to the `indexOptimization` telemetry event.

Model selection tracing
- `selectBestModel`: traces each candidate model as requested / accepted / rejected so diagnostics show the full selection chain.
- `sendMessage`: records `modelPreferenceChain`, `modelsAvailable`, `modelSelectionOutcome`, and `modelsAvailableCount` in telemetry.
- `dumpModelMetadata`: traces all stable model fields (`id`, `vendor`, `family`, `version`, `name`, `maxInputTokens`) plus any additional own enumerable non-function properties (diagnostic only).

Model fallback chain
- `promptTemplates.ts`: `FALLBACK_MODELS` extended with `copilot-utility` as the final fallback after `gpt-4o` and `gpt-4o-mini`.

Minor fixes
- `Link` components inside `Text size={200}` rows now carry `style={{ fontSize: tokens.fontSizeBase200, lineHeight: tokens.lineHeightBase200 }}` to override the default 14px Fluent v9 `fui-Link` class.

Design decisions
Why credits used are NOT shown
GitHub Copilot's billing model assigns a credit cost per model request. The stable VS Code Language Model API (`vscode.lm`) does not expose pricing or credit data. A proposed API (`vscode.proposed.languageModelPricing.d.ts`) exists in the VS Code source, but:
- … `enabledApiProposals` in `package.json`) and is not permitted for extensions published to the Marketplace without Microsoft sign-off.

Decision: credits are not surfaced in the UI. The extension stays entirely on stable VS Code APIs. If the pricing API graduates to stable in a future VS Code release, this feature can be revisited. A GitHub issue has been filed to track this. Token counts (via `countTokens`, which is stable) are captured in trace/telemetry as a proxy for cost awareness without making any binding cost claim to users.

Why token counts are not in the UI

`countTokens` gives a token count, not a cost. Displaying raw token numbers to users risks misleading them: tokens ≠ credits, and the conversion ratio is model-dependent and may change. The appropriate audience for token numbers is telemetry and diagnostic traces. Keeping them out of the UI avoids a support burden around questions like "why did this cost 2400 tokens?".

Why "most GitHub Copilot subscribers" not "all"
Copilot utility model access is documented as included for subscribers, but enterprise agreements and custom billing arrangements can differ. "Most" is a deliberate hedge that is accurate without overpromising.
Why `aka.ms/vscode-documentdb-copilot-utility-model` and not the learn.microsoft.com index-advisor URL

The index-advisor docs page covers the feature end-to-end, but it does not specifically explain the utility model tier or its billing implications. A dedicated `aka.ms` redirect allows the docs team to point this link at the most relevant GitHub Copilot pricing/model-tier page without a code change.

Commits (newest first)
- 0a012266
- 2f481b8c
- 43b260b4
- 71c5aaea
- c41e9d2b
- 7b822b68
- 8fa152fe
- 7c077e8c
- 54b3d32e
- aa3791d5
- 272d21b1
- 39006b5b
- 67cd3349
- 0477b4e3
- 9046c857

Pre-merge checklist
- … `https://aka.ms/vscode-documentdb-copilot-utility-model` in the Microsoft URL shortener
- … `wip:` commit (7b822b68) before merge