From 45b6aae1ab47b8ef039e1d509f64ee5c30abce3b Mon Sep 17 00:00:00 2001 From: zhihuan Date: Tue, 9 Jun 2026 18:07:00 +0800 Subject: [PATCH 1/6] docs(spec): add Bicep-less Foundry agent init design spec MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Design spec for RFC #8065 — make `azd ai agent init` Bicep-less by default with the `azure.ai.agents` extension owning provisioning. Adopts the compromise of explicit `infra.provider: azure.ai.agents` in azure.yaml (per PR #7482's custom provisioning provider framework), deferring service-host-driven auto-routing to v0.3+. Covers in-memory synthesis, on-disk reuse after eject, brownfield `resourceId:` flow, 5-step validation pipeline, and the small Core changes required to surface `uses` / `runtime` on the extension-facing `ServiceConfig`. --- docs/specs/bicepless-foundry/spec.md | 503 +++++++++++++++++++++++++++ 1 file changed, 503 insertions(+) create mode 100644 docs/specs/bicepless-foundry/spec.md diff --git a/docs/specs/bicepless-foundry/spec.md b/docs/specs/bicepless-foundry/spec.md new file mode 100644 index 00000000000..aec12c2f5f5 --- /dev/null +++ b/docs/specs/bicepless-foundry/spec.md @@ -0,0 +1,503 @@ +# Bicep-less `azd ai agent init` via Extension-Owned Provisioning + +## Problem + +`azd ai agent init` clones `Azure-Samples/azd-ai-starter-basic` into every new +project, dropping ~300 lines of conditional Bicep (`shouldCreateAcr`, +`useExistingAiProject ? X : Y`, 30+ outputs) on the developer's disk before they +write any agent code. The starter Bicep lives in a sample repo on `main`, so +every shipped extension version reads from the same template — slimming or +tailoring the template breaks every project initialized by every prior extension +build. The on-disk template is bloated; there is no in-place fix. + +See RFC [#8065](https://github.com/Azure/azure-dev/issues/8065) for the full +problem statement. 
+ +## Solution + +Move infrastructure templates from `Azure-Samples/azd-ai-starter-basic` into +the `azure.ai.agents` extension binary. `azd ai agent init` produces only +`azure.yaml` and an agent code project — no `infra/` directory. At provision +time, the extension's own provisioning provider synthesizes Bicep in memory +from `azure.yaml` and applies it. `azd ai agent init --infra` ejects on demand: +the same synthesis writes Bicep to `./infra/`, and subsequent provisions read +from disk. + +The mechanism is the **custom provisioning provider** capability merged in +[PR #7482](https://github.com/Azure/azure-dev/pull/7482). The extension +registers itself by name; the developer declares it in `azure.yaml` as +`infra.provider: azure.ai.agents`. + +## Scope + +**In scope:** + +- Bicep-less default behavior for `azd ai agent init` +- `azd ai agent init --infra` eject command +- Embedded templates inside the `azure.ai.agents` extension +- Retiring `Azure-Samples/azd-ai-starter-basic` as the init target +- Schema updates to allow extension-named providers in `infra.provider` + +**Out of scope:** + +- Unified `azure.yaml` schema for `azure.ai.project` / `azure.ai.agent` host + kinds — [#7962](https://github.com/Azure/azure-dev/issues/7962) +- `azd ai agent add` and incremental composition — + [#8049](https://github.com/Azure/azure-dev/issues/8049) +- Service-host-driven provider auto-routing (removes the explicit + `infra.provider:` declaration) — RFC #8065 Core Ask #1 + +## Activation + +| Trigger | Behavior | +| --------------------------------------------- | ---------------------------------------------------------------------------------------------- | +| `azd ai agent init` (default) | Write `azure.yaml` + agent code project. No `./infra/`. `azure.yaml` includes `infra.provider: azure.ai.agents`. | +| `azd ai agent init --infra` | Same as default, plus synthesize and write Bicep to `./infra/`. Project starts on-disk. 
| +| `azd ai agent init --infra` (existing project, no `./infra/`) | Synthesize current `azure.yaml` and write `./infra/`. Do not re-prompt or touch agent code. Refuse if `./infra/` already exists. | +| `azd provision` (no `./infra/`) | Extension synthesizes Bicep in memory, applies via ARM SDK. | +| `azd provision` (with `./infra/`) | Extension reads from `./infra/` instead of synthesizing. Same ARM-side output. | + +## Architecture + +``` +cli/azd/extensions/azure.ai.agents/ + internal/cmd/init.go ← gen azure.yaml; gen --infra path + internal/cmd/listen.go ← register provider via WithProvisioningProvider + internal/project/provisioning.go ← FoundryProvisioningProvider implementation + internal/synthesis/ ← in-memory Bicep generation from azure.yaml + synthesizer.go ← top-level: ServiceConfig → template files + project.bicep.tmpl ← embedded template: Foundry project + deps + agent.bicep.tmpl ← embedded template: ACR (if container agents) + *.tmpl ← other embedded templates + internal/deploy/ + bicep_runner.go ← ARM SDK deployment wrapper + parameters.go ← parameter resolution (env vars, prompts) + +cli/azd/pkg/ ← Core changes (small) + project/mapper_registry.go ← +Uses, +Runtime on ServiceConfig→proto + project/service_config.go ← +Runtime AppServiceRuntime field + infra/provisioning/provider.go ← (no change needed) + +cli/azd/grpc/proto/ + models.proto ← +runtime, +uses on ServiceConfig message + +schemas/v1.0/azure.yaml.json ← relax infra.provider enum → examples +``` + +## Provider Resolution (Verified Against Code) + +The extension's provider plugs into the existing IoC-registered factory. 
+Provider selection logic (`cli/azd/pkg/infra/provisioning/manager.go:505-540`) +is unchanged: + +```go +providerKey := m.options.Provider // from azure.yaml infra.provider +if providerKey == NotSpecified { + defaultProvider, _ := m.defaultProvider() // returns "bicep" + providerKey = defaultProvider +} +err = m.serviceLocator.ResolveNamed(string(providerKey), &provider) +``` + +Built-in providers register at `cli/azd/pkg/azd/default.go:79-87`. Extension +providers register at runtime via `RegisterProvisioningProviderRequest` +(`cli/azd/internal/grpcserver/provisioning_service.go:138-152`) into the same +`*ioc.NestedContainer`. From the resolver's perspective `bicep` and +`azure.ai.agents` are equivalent keys. + +`ParseProvider` was relaxed in PR #7482 to accept any string +(`cli/azd/pkg/infra/provisioning/provisioning.go:53-57`). + +## Explicit `infra.provider:` Declaration + +The RFC ideal is service-host-driven auto-routing — the extension is picked +because `host: azure.ai.agent` is present, not because `infra.provider:` is +declared. Verified gap (`cli/azd/pkg/project/importer.go:288-358`): +`ProjectInfrastructure` never inspects `service.Host` to pick a provisioning +provider. The Aspire branch (the only service-driven precedent) hard-codes +Bicep. The compose branch keys off `len(Resources)>0` and also hard-codes +Bicep. + +Adding service-host auto-routing requires a net-new branch in +`ProjectInfrastructure` plus a registry of which hosts map to which extension +providers. We defer that work and ship v0.2 with an explicit declaration: + +```yaml +infra: + provider: azure.ai.agents + +services: + foundry-project: + host: azure.ai.project + config: { ... } + my-agent: + host: azure.ai.agent + uses: [foundry-project] + runtime: { stack: python, version: "3.13" } + config: { ... } +``` + +**Cost:** developers see an extension name in the `infra.provider:` slot +historically used for IaC engines (`bicep`, `terraform`). 
This is a real +concept leak — `azure.ai.agents` is not an IaC engine, it's a domain extension. +Documented; tracked for v0.3+. + +**What it buys us:** all of PR #7482's plumbing works as-is. No Core changes +to `ProjectInfrastructure`. No new auto-route signal to design. + +## On-Disk Reuse (Post-Eject Behavior) + +`azure.yaml` is **never mutated by eject**. The extension's provider decides +internally whether to synthesize or read from disk: + +```go +// FoundryProvisioningProvider.Deploy(ctx) +if exists("./infra/main.bicep") { + templates = readFromDisk("./infra/") +} else { + templates = synthesizeFromYAML(serviceConfig) +} +return deployTemplates(ctx, templates) +``` + +The developer sees one `infra.provider: azure.ai.agents` declaration that +holds across both modes. Eject is a pure file-write operation; `azure.yaml` +stays clean. + +Verified: all Core sites that read `./infra/` tolerate a missing directory: + +| Site | Behavior when `./infra/` is absent | +| ------------------------------------ | ----------------------------------------------- | +| `importer.go:323` (`pathHasModule`) | Returns false → continues to fallthrough | +| `project.go:187` (`hooksFromInfraModule`) | Returns empty → no hooks merged | +| `manager.go:121` (`azdFileShareUploadOperations`) | Missing dir → no operations | +| `importer.go:304` (`detectProviderFromFiles`) | Only runs when `Provider == NotSpecified`; with our explicit declaration, never executes | + +## In-Memory Synthesis + +The extension owns the Bicep deployment pipeline. 
Composition:
+
+```
+ServiceConfig (from azure.yaml)
+        │
+        ▼
+synthesis.Synthesizer
+        │  - validates azure.yaml against extension schemas
+        │  - merges defaults
+        │  - selects templates based on services (ACR only if container agent)
+        │  - resolves ${VAR} from azd env
+        ▼
+[]TemplateFile (main.bicep, modules/*.bicep, main.parameters.json)
+        │
+        ▼
+deploy.BicepRunner
+        │  - resolves remaining parameters (prompts, env)
+        │  - calls ARM REST: deployments.CreateOrUpdate
+        │  - streams progress via grpcbroker.ProgressFunc
+        │  - captures outputs
+        ▼
+ProvisioningDeployResult (back to azd Core via gRPC)
+```
+
+The extension does **not** delegate deployment to Core's Bicep provider
+(no such delegation API exists today — verified via
+`cli/azd/grpc/proto/deployment.proto`, which only exposes
+`GetDeployment`/`GetDeploymentContext` to extensions, not "deploy this
+template"). The extension reimplements the deploy step using Azure SDK
+`armresources.DeploymentsClient`. This is intentional for v0.2 — future Core
+work could expose a shared Bicep-deploy API to avoid drift.
+
+## Validation Pipeline
+
+Synthesis runs only on a valid `azure.yaml`. Order, all before Bicep is
+generated:
+
+1. **Schema validation** — each `azure.ai.*` service's `config:` block against
+   its JSON schema. Failures: `services.foundry-project.config.deployments[0].sku: required`.
+2. **Service graph invariants** — exactly one `azure.ai.project` service;
+   every `azure.ai.agent` `uses:` exactly one project; no cycles.
+3. **Deploy-mode invariant** — each `azure.ai.agent` has exactly one of
+   `runtime:` or `docker:`. Both = error. Neither = error.
+4. **Env reference resolution** — every `${VAR}` in `config:` blocks must
+   resolve from the azd environment.
+5. **Brownfield consistency** — if `resourceId:` is set on the project, it
+   must be a syntactically valid Foundry project ARM resource ID (existence
+   check at deploy time).
+
+All five run on every `provision`, `preview`, and `init --infra`.
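
Steps 2 and 3 of the validation pipeline (service graph and deploy-mode invariants) could be sketched roughly as follows. This is an illustrative sketch only, not the extension's implementation: the `Service` type, its fields, and `validateGraph` are hypothetical stand-ins for whatever the extension derives from `azure.yaml`, and the cycle check is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Service is a hypothetical, simplified view of one azure.yaml service entry.
type Service struct {
	Host       string   // "azure.ai.project" or "azure.ai.agent"
	Uses       []string // names of services listed under uses:
	HasRuntime bool     // true when a runtime: block is present
	HasDocker  bool     // true when a docker: block is present
}

// validateGraph checks the service graph and deploy-mode invariants and
// returns every violation joined into one error (nil when all checks pass).
func validateGraph(services map[string]Service) error {
	// Sort names so the error order is deterministic across runs.
	names := make([]string, 0, len(services))
	for name := range services {
		names = append(names, name)
	}
	sort.Strings(names)

	var errs []error
	projects := 0
	for _, name := range names {
		if services[name].Host == "azure.ai.project" {
			projects++
		}
	}
	if projects != 1 {
		errs = append(errs, fmt.Errorf("expected exactly one azure.ai.project service, found %d", projects))
	}

	for _, name := range names {
		svc := services[name]
		if svc.Host != "azure.ai.agent" {
			continue
		}
		// Count how many of the agent's uses: entries point at a project service.
		used := 0
		for _, dep := range svc.Uses {
			if target, ok := services[dep]; ok && target.Host == "azure.ai.project" {
				used++
			}
		}
		if used != 1 {
			errs = append(errs, fmt.Errorf("services.%s.uses: must reference exactly one project", name))
		}
		// Both set or neither set is an error; exactly one must be present.
		if svc.HasRuntime == svc.HasDocker {
			errs = append(errs, fmt.Errorf("services.%s: exactly one of runtime or docker is required", name))
		}
	}
	return errors.Join(errs...)
}

func main() {
	services := map[string]Service{
		"foundry-project": {Host: "azure.ai.project"},
		"my-agent":        {Host: "azure.ai.agent", Uses: []string{"foundry-project"}, HasRuntime: true, HasDocker: true},
	}
	if err := validateGraph(services); err != nil {
		fmt.Println(err)
	}
}
```

Collecting all violations before returning, rather than failing on the first, lets a single `provision` or `init --infra` run report every problem in `azure.yaml` at once.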
+ +## Brownfield: Existing Foundry Projects + +Today: `USE_EXISTING_AI_PROJECT` and `AZURE_AI_PROJECT_ID` env vars; starter +Bicep branches on them. + +After: explicit field on the project service. + +```yaml +services: + foundry-project: + host: azure.ai.project + resourceId: ${AZURE_AI_PROJECT_ID} # presence → existing-project mode + config: + toolboxes: { ... } +``` + +Synthesizer behavior when `resourceId:` is set: + +- Omits the Foundry project ARM resource from generated Bicep. +- Generates references to wire `AZURE_AI_PROJECT_ENDPOINT`, + `AZURE_AI_PROJECT_ID`, `AZURE_RESOURCE_GROUP`, tenant/subscription/location. +- Still synthesizes ARM-backed children (e.g., additional model deployments) + declared under `config:`. +- Routes data-plane resources to the existing project's deploy verb. + +The `useExistingAiProject` ternary collapses to a single field-presence check +at synthesis time. + +## Eject Command (`azd ai agent init --infra`) + +Infra-only operation. Four contexts: + +| Context | Behavior | +| ----------------------------------------- | ---------------------------------------------------------------------------------------------- | +| Empty directory | Run init normally + write `./infra/` from synthesis. | +| Existing Bicep-less azd agent project | Synthesize current `azure.yaml`; write `./infra/`. Do not re-prompt; do not touch agent code. Do not modify `azure.yaml` (`infra.provider:` stays `azure.ai.agents`). | +| Existing on-disk project (`./infra/` exists) | Refuse to overwrite. Print: *"`./infra/` already exists. To regenerate from `azure.yaml`, delete the `infra/` directory and run the command again."* | +| Not an azd agent project | Refuse: "no `azure.ai.*` services found in `azure.yaml`; nothing to eject." | + +Eject is **all-or-nothing for the whole project**. No partial mode where some +agents synthesize and others sit on disk. + +Regenerating requires the user to delete `./infra/` themselves and re-run +`azd ai agent init --infra`. 
Rationale: no new flag surface, no special +overwrite logic, no implicit destruction of user-owned files. The user +explicitly removes the old `./infra/` (which is a git-tracked operation +they're responsible for), then asks for fresh synthesis. + +Example output: + +``` +> azd ai agent init --infra + +Generating infrastructure files from azure.yaml... + + Created infra/main.bicep + Created infra/main.parameters.json + Created infra/modules/foundry-project.bicep + Created infra/modules/acr.bicep + +Future provisions will read from ./infra/. + +Next steps: + azd provision Apply changes +``` + +Example output (refused): + +``` +> azd ai agent init --infra + +Error: ./infra/ already exists. + +If you want to regenerate from azure.yaml, delete the infra directory +and run the command again. +``` + +Example output: + +``` +> azd ai agent init --infra + +Generating infrastructure files from azure.yaml... + + Created infra/main.bicep + Created infra/main.parameters.json + Created infra/modules/foundry-project.bicep + Created infra/modules/acr.bicep + +Future provisions will read from ./infra/. + +Next steps: + azd provision Apply changes +``` + +## Post-Eject CLI Behavior + +CLI commands keep modifying `azure.yaml` after eject. Drift risk: `azure.yaml` +declares something requiring a new ARM resource (e.g., second container agent +needing ACR), but on-disk Bicep doesn't have it. 
+
+| Command class | Bicep-less project | On-disk project (post-eject) |
+| --- | --- | --- |
+| Modifies data-plane only (`add tool`, `add toolbox`) | Apply normally | Apply normally — nothing in Bicep changes |
+| Modifies `azure.yaml` requiring new ARM resources | Apply; next `provision` synthesizes the new resources | Apply to `azure.yaml` and warn: "your project uses on-disk Bicep; delete `./infra/` and run `azd ai agent init --infra` to regenerate, or edit `infra/` manually" |
+| Eject (`init --infra`) | Allowed | Refused — user must delete `./infra/` and re-run |
+
+CLI never silently patches user-owned Bicep.
+
+## Core Changes Required
+
+Small, mechanical. All ride alongside `azure.ai.agents` extension work.
+
+### 1. Surface `uses` and `runtime` to extensions (RFC Core Ask #2)
+
+Today: `cli/azd/pkg/project/mapper_registry.go:148` drops `Uses` when
+mapping `ServiceConfig` to proto. `Runtime` is on `AppServiceProps` only, not
+on `ServiceConfig`.
+
+Changes:
+
+| File | Change |
+| --- | --- |
+| `cli/azd/pkg/project/service_config.go` | Add `Runtime AppServiceRuntime \`yaml:"runtime,omitempty"\`` |
+| `cli/azd/grpc/proto/models.proto` | Add `runtime` (typed) and `uses` (repeated string) to `ServiceConfig` |
+| `cli/azd/pkg/project/mapper_registry.go:148-161` | Populate both fields in forward + reverse mappers |
+| `schemas/v1.0/azure.yaml.json` | Allow `runtime:` at the service level (reuse existing schema shape at lines 1477-1489) |
+
+Extension reads `serviceConfig.Uses` and `serviceConfig.Runtime` from typed
+proto fields instead of re-parsing `additional_properties` Struct.
+
+### 2. Relax `infra.provider` enum in schemas
+
+| File | Change |
+| --- | --- |
+| `schemas/v1.0/azure.yaml.json:44-52` | Change `enum: ["bicep","terraform"]` → `examples: [...]` |
+| `schemas/alpha/azure.yaml.json:44-52` | Same |
+
+Without this, `infra.provider: azure.ai.agents` fails IDE schema validation
+despite being runtime-valid.
+
+### 3. (Optional, deferred) Auto-install for `provisioning-provider` extensions
+
+Today: `cli/azd/cmd/auto_install.go:511-578` auto-installs extensions for
+unknown `service-target-provider` host kinds. No equivalent for
+`provisioning-provider`. Tracked as `#7502`.
+
+Acceptable to defer — developers writing `infra.provider: azure.ai.agents`
+have opted in explicitly. `azd ai agent init` force-installs the extension at
+init time anyway. The failure mode is `git clone` + `azd up` on a fresh
+machine where the README is the install instruction.
+
+## Extension Changes Required
+
+### Schemas
+
+Two schemas owned by the extension (per #7962):
+
+- `azure.ai.agent.json` — agent runtime config block (already exists; trimmed
+  per #7962)
+- `azure.ai.project.json` — project-scoped data-plane state (new per #7962)
+
+Both `additionalProperties: true` for forward-compatibility with future
+resources (eval datasets, vector indexes).
+
+### Embedded templates
+
+`cli/azd/extensions/azure.ai.agents/internal/synthesis/*.tmpl` — Go-embedded
+Bicep templates, versioned with the extension. Templates are tailored: ACR
+only included when at least one agent has a `docker:` block; monitoring only
+when explicitly added via `azd ai agent add monitoring` (per #8049).
+
+Replaces today's `Azure-Samples/azd-ai-starter-basic` Bicep entirely. The
+slimming is safe because the templates ship inside the extension version —
+changing them only affects projects on the new extension, not every project
+from every prior build.
+ +### Provider implementation + +`internal/project/provisioning.go` implements +`azdext.ProvisioningProvider` (`cli/azd/pkg/azdext/provisioning_manager.go:23-36`). +Registered via `WithProvisioningProvider("azure.ai.agents", factory)` in +`internal/cmd/listen.go`. + +Method behaviors: + +| Method | Implementation | +| ----------------- | ------------------------------------------------------------------------------ | +| `Initialize` | Validate `azure.yaml` (5-step pipeline above); resolve env vars | +| `State` | Query ARM for last deployment; return outputs | +| `Deploy` | If `./infra/` exists, read from disk; else synthesize. Apply via ARM SDK. | +| `Preview` | Same as Deploy with `validationOnly` mode; return diff summary | +| `Destroy` | Delete resource group or use deployment stacks | +| `EnsureEnv` | Prompt for required env vars (subscription, location) if missing | +| `Parameters` | Return parameter list from synthesized/on-disk template | +| `PlannedOutputs` | Return output list from synthesized/on-disk template | + +## Stability Contract + +Synthesis output is best-effort stable within a minor extension version +(`0.2.x`). Same `azure.yaml` → semantically identical Bicep. Across minors, +the output may change; documented in the changelog with recommendation to run +`azd provision --preview` after upgrades. + +## Telemetry + +| Field | Values | Where emitted | +| ------------------------------ | ---------------------------- | ------------------------------------- | +| `provision.synthesis_source` | `embedded` \| `on_disk` | `Deploy()` start | +| `init.infra_flag` | `true` \| `false` | `azd ai agent init` start | + +Lets us measure eject rate and confirm the Bicep-less default sticks. + +## Downstream Impact + +- **`Azure-Samples/azd-ai-starter-basic`** — retired as init target. Repo stays + as reference. Sample README points at extension. +- **Other AZD samples that embed agent definitions** — `azd init -t ` + unchanged. 
Those samples bring their own `infra/` and the extension respects + them. Only default `azd ai agent init` (no `-t`) goes Bicep-less. +- **Foundry Toolkit (VS Code)** — reads `azure.yaml`; absence of `./infra/` + is normal, not corruption. No new files to parse. +- **Migration** — existing `0.1.x` projects already have `infra/` on disk; + they stay on the on-disk path. No action needed. +- **Documentation** — new doc explaining Bicep-less default, eject command, + stability contract. Migration guide for 0.1.x users (no action; everything + keeps working). + +## Risks + +| Risk | Mitigation | +| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | +| `infra.provider: azure.ai.agents` confuses developers | Documented in extension README; v0.3+ removes the declaration via service-host auto-routing | +| Extension's Bicep deployment drifts from Core's | Pin to specific ARM SDK version; integration tests vs. Core's bicep provider for parity | +| Synthesis output changes between minor versions | Changelog notes; `azd provision --preview` recommended after upgrade | +| Brownfield projects with custom Bicep edits hit eject + drift | Eject is opt-in; first-time eject just writes synthesized Bicep, no merge logic | +| Auto-install gap (#7502) bites a teammate cloning the repo | README install instruction; v0.3+ delivers auto-install | + +## Open Questions + +1. Should the extension's `Deploy()` warn when both `./infra/` exists and + `azure.yaml` config has changed since last eject? (Drift detection.) +2. Do we expose a `--preview-bicep` flag that prints synthesized Bicep + without applying, for debugging? Or rely on `--infra` + diff? +3. Schema branch for typed `host: azure.ai.agent` / `azure.ai.project` + validation (per #7962) — does it land in this RFC's PRs or #7962's? 
+
+## Test Plan
+
+- Unit: synthesizer determinism (same input → byte-equal output)
+- Unit: validation pipeline error paths (all five steps)
+- Unit: `ResolveNamed("azure.ai.agents")` returns extension provider
+- Integration: `azd ai agent init` produces no `./infra/`
+- Integration: `azd provision` succeeds with synthesized templates
+- Integration: `azd ai agent init --infra` writes `./infra/`; next
+  `azd provision` reads from disk (verified via extension log)
+- Integration: brownfield `resourceId:` skips ARM project creation
+- E2E: `init` → `provision` → `deploy` → `down` on a single-agent project
+- E2E: `init --infra` → manual edit of `infra/main.bicep` → `provision`
+  applies the edit
+- Regression: existing `0.1.x` projects with on-disk Bicep continue to work
+  (extension reads `./infra/` like today)
+
+## References
+
+- RFC [#8065](https://github.com/Azure/azure-dev/issues/8065) — original
+- Issue [#7962](https://github.com/Azure/azure-dev/issues/7962) — unified
+  schema (dependency)
+- Issue [#8049](https://github.com/Azure/azure-dev/issues/8049) — incremental
+  composition (parallel)
+- PR [#7482](https://github.com/Azure/azure-dev/pull/7482) — custom
+  provisioning provider framework (merged)
+- Issue [#7502](https://github.com/Azure/azure-dev/issues/7502) — auto-install
+  for provisioning providers (deferred dependency)
+- Reference: [therealjohn/foundry-azd-config-preview](https://github.com/therealjohn/foundry-azd-config-preview/blob/main/REFERENCE.md) — target `azure.yaml` shape

From f61c45b1426b67561f42275052c3e47d3a6a4b24 Mon Sep 17 00:00:00 2001
From: Zhijie Huang
Date: Wed, 10 Jun 2026 14:17:28 +0800
Subject: [PATCH 2/6] docs(spec): tighten Bicep-less spec; fix code-citation errors

Review-pass revisions on docs/specs/bicepless-foundry/spec.md:

- Fix Core Changes #1: Uses already exists on ServiceConfig
  (service_config.go:58) and the v1 schema (azure.yaml.json:234);
  the gap is proto-only. Runtime remains the larger gap.
Rewrote the section narrative and trimmed the file-change table accordingly. - Fix On-Disk Reuse table: azdFileShareUploadOperations is at infra/provisioning/manager.go:125, not :121. Disambiguated all four rows with full paths and 'call'/'gate' suffix. - Fix Core Changes mapper range: mapper_registry.go:102-162 (was :139-161, which was a sub-slice). - Remove all v0.2/v0.3+/0.1.x/0.2.x version markers; the spec doesn't own a release schedule. - Trim Problem, Solution, Provider Resolution, Explicit Declaration, Brownfield, In-Memory Synthesis, and Embedded Templates sections; remove duplicate eject example; consolidate Post-Eject trade paragraph. - Open Questions: drop --preview-bicep entry, add one-line proposals to the remaining two. --- docs/specs/bicepless-foundry/spec.md | 232 ++++++++++++--------------- 1 file changed, 106 insertions(+), 126 deletions(-) diff --git a/docs/specs/bicepless-foundry/spec.md b/docs/specs/bicepless-foundry/spec.md index aec12c2f5f5..30ba02dad93 100644 --- a/docs/specs/bicepless-foundry/spec.md +++ b/docs/specs/bicepless-foundry/spec.md @@ -2,13 +2,12 @@ ## Problem -`azd ai agent init` clones `Azure-Samples/azd-ai-starter-basic` into every new -project, dropping ~300 lines of conditional Bicep (`shouldCreateAcr`, -`useExistingAiProject ? X : Y`, 30+ outputs) on the developer's disk before they -write any agent code. The starter Bicep lives in a sample repo on `main`, so -every shipped extension version reads from the same template — slimming or -tailoring the template breaks every project initialized by every prior extension -build. The on-disk template is bloated; there is no in-place fix. +`azd ai agent init` clones `Azure-Samples/azd-ai-starter-basic` into every +new project, dropping ~300 lines of conditional Bicep (`shouldCreateAcr`, +`useExistingAiProject ? X : Y`, 30+ outputs) on the developer's disk before +they write any agent code. 
The starter Bicep lives in a sample repo on +`main`, so slimming it would break every project initialized by every prior +extension build. See RFC [#8065](https://github.com/Azure/azure-dev/issues/8065) for the full problem statement. @@ -18,15 +17,12 @@ problem statement. Move infrastructure templates from `Azure-Samples/azd-ai-starter-basic` into the `azure.ai.agents` extension binary. `azd ai agent init` produces only `azure.yaml` and an agent code project — no `infra/` directory. At provision -time, the extension's own provisioning provider synthesizes Bicep in memory -from `azure.yaml` and applies it. `azd ai agent init --infra` ejects on demand: -the same synthesis writes Bicep to `./infra/`, and subsequent provisions read -from disk. - -The mechanism is the **custom provisioning provider** capability merged in -[PR #7482](https://github.com/Azure/azure-dev/pull/7482). The extension -registers itself by name; the developer declares it in `azure.yaml` as -`infra.provider: azure.ai.agents`. +time, the extension's own provisioning provider (registered via the +[PR #7482](https://github.com/Azure/azure-dev/pull/7482) framework) +synthesizes Bicep in memory from `azure.yaml` and applies it. +`azd ai agent init --infra` ejects on demand: the same synthesis writes Bicep +to `./infra/`, and subsequent provisions read from disk. The developer opts +in by declaring `infra.provider: azure.ai.agents` in `azure.yaml`. 
## Scope @@ -74,26 +70,28 @@ cli/azd/extensions/azure.ai.agents/ parameters.go ← parameter resolution (env vars, prompts) cli/azd/pkg/ ← Core changes (small) - project/mapper_registry.go ← +Uses, +Runtime on ServiceConfig→proto - project/service_config.go ← +Runtime AppServiceRuntime field + project/service_runtime.go ← NEW: ServiceRuntime type (no Stack enum) + project/service_config.go ← +Runtime *ServiceRuntime (Uses already present) + project/mapper_registry.go ← +Uses, +Runtime in ServiceConfig↔proto mappers infra/provisioning/provider.go ← (no change needed) cli/azd/grpc/proto/ - models.proto ← +runtime, +uses on ServiceConfig message + models.proto ← +uses, +runtime (typed) on ServiceConfig message -schemas/v1.0/azure.yaml.json ← relax infra.provider enum → examples +schemas/v1.0/azure.yaml.json ← +runtime under services. (uses already present); + relax infra.provider enum → examples ``` -## Provider Resolution (Verified Against Code) +## Provider Resolution The extension's provider plugs into the existing IoC-registered factory. -Provider selection logic (`cli/azd/pkg/infra/provisioning/manager.go:505-540`) -is unchanged: +Provider selection (`cli/azd/pkg/infra/provisioning/manager.go:505-540`) is +unchanged: ```go providerKey := m.options.Provider // from azure.yaml infra.provider if providerKey == NotSpecified { - defaultProvider, _ := m.defaultProvider() // returns "bicep" + defaultProvider, _ := m.defaultProvider() // "bicep" providerKey = defaultProvider } err = m.serviceLocator.ResolveNamed(string(providerKey), &provider) @@ -102,11 +100,9 @@ err = m.serviceLocator.ResolveNamed(string(providerKey), &provider) Built-in providers register at `cli/azd/pkg/azd/default.go:79-87`. Extension providers register at runtime via `RegisterProvisioningProviderRequest` (`cli/azd/internal/grpcserver/provisioning_service.go:138-152`) into the same -`*ioc.NestedContainer`. From the resolver's perspective `bicep` and -`azure.ai.agents` are equivalent keys. 
- -`ParseProvider` was relaxed in PR #7482 to accept any string -(`cli/azd/pkg/infra/provisioning/provisioning.go:53-57`). +container. `bicep` and `azure.ai.agents` are equivalent keys to the resolver. +`ParseProvider` (`cli/azd/pkg/infra/provisioning/provisioning.go:53-57`) was +relaxed in PR #7482 to accept any string. ## Explicit `infra.provider:` Declaration @@ -114,13 +110,12 @@ The RFC ideal is service-host-driven auto-routing — the extension is picked because `host: azure.ai.agent` is present, not because `infra.provider:` is declared. Verified gap (`cli/azd/pkg/project/importer.go:288-358`): `ProjectInfrastructure` never inspects `service.Host` to pick a provisioning -provider. The Aspire branch (the only service-driven precedent) hard-codes -Bicep. The compose branch keys off `len(Resources)>0` and also hard-codes -Bicep. +provider. The Aspire branch (the only service-driven precedent) and the +compose branch both hard-code Bicep. Adding service-host auto-routing requires a net-new branch in -`ProjectInfrastructure` plus a registry of which hosts map to which extension -providers. We defer that work and ship v0.2 with an explicit declaration: +`ProjectInfrastructure` plus a host→extension registry. We defer that and +ship an explicit declaration: ```yaml infra: @@ -137,13 +132,10 @@ services: config: { ... } ``` -**Cost:** developers see an extension name in the `infra.provider:` slot -historically used for IaC engines (`bicep`, `terraform`). This is a real -concept leak — `azure.ai.agents` is not an IaC engine, it's a domain extension. -Documented; tracked for v0.3+. - -**What it buys us:** all of PR #7482's plumbing works as-is. No Core changes -to `ProjectInfrastructure`. No new auto-route signal to design. 
+**Trade:** developers see an extension name in the `infra.provider:` slot +historically used for IaC engines (`bicep`, `terraform`) — a real concept +leak we accept to reuse PR #7482's plumbing as-is, with no Core changes to +`ProjectInfrastructure`. Revisit once service-host auto-routing lands. ## On-Disk Reuse (Post-Eject Behavior) @@ -166,12 +158,12 @@ stays clean. Verified: all Core sites that read `./infra/` tolerate a missing directory: -| Site | Behavior when `./infra/` is absent | -| ------------------------------------ | ----------------------------------------------- | -| `importer.go:323` (`pathHasModule`) | Returns false → continues to fallthrough | -| `project.go:187` (`hooksFromInfraModule`) | Returns empty → no hooks merged | -| `manager.go:121` (`azdFileShareUploadOperations`) | Missing dir → no operations | -| `importer.go:304` (`detectProviderFromFiles`) | Only runs when `Provider == NotSpecified`; with our explicit declaration, never executes | +| Site | Behavior when `./infra/` is absent | +| --------------------------------------------------------------------- | ------------------------------------------------- | +| `cli/azd/pkg/project/importer.go:323` (`pathHasModule` call) | Returns false → continues to fallthrough | +| `cli/azd/pkg/project/project.go:187` (`hooksFromInfraModule` call) | Returns empty → no hooks merged | +| `cli/azd/pkg/infra/provisioning/manager.go:125` (`azdFileShareUploadOperations` call) | Missing dir → no operations | +| `cli/azd/pkg/project/importer.go:304` (`detectProviderFromFiles` gate) | Only runs when `Provider == NotSpecified`; with our explicit declaration, never executes | ## In-Memory Synthesis @@ -199,13 +191,11 @@ deploy.BicepRunner ProvisioningDeployResult (back to azd Core via gRPC) ``` -The extension does **not** delegate deployment to Core's Bicep provider -(no such delegation API exists today — verified via -`cli/azd/grpc/proto/deployment.proto`, which only exposes 
-`GetDeployment`/`GetDeploymentContext` to extensions, not "deploy this -template"). The extension reimplements the deploy step using Azure SDK -`armresources.DeploymentsClient`. This is intentional for v0.2 — future Core -work could expose a shared Bicep-deploy API to avoid drift. +The extension does **not** delegate deployment to Core's Bicep provider — no +such delegation API exists today (`cli/azd/grpc/proto/deployment.proto` +exposes only `GetDeployment`/`GetDeploymentContext`). The extension +reimplements the deploy step using `armresources.DeploymentsClient`; a future +Core API could expose a shared Bicep-deploy path to avoid drift. ## Validation Pipeline @@ -228,10 +218,9 @@ All five run on every `provision`, `preview`, and `init --infra`. ## Brownfield: Existing Foundry Projects -Today: `USE_EXISTING_AI_PROJECT` and `AZURE_AI_PROJECT_ID` env vars; starter -Bicep branches on them. - -After: explicit field on the project service. +Replaces today's `USE_EXISTING_AI_PROJECT` / `AZURE_AI_PROJECT_ID` env vars +(which the starter Bicep branches on) with an explicit field on the project +service: ```yaml services: @@ -242,17 +231,12 @@ services: toolboxes: { ... } ``` -Synthesizer behavior when `resourceId:` is set: - -- Omits the Foundry project ARM resource from generated Bicep. -- Generates references to wire `AZURE_AI_PROJECT_ENDPOINT`, - `AZURE_AI_PROJECT_ID`, `AZURE_RESOURCE_GROUP`, tenant/subscription/location. -- Still synthesizes ARM-backed children (e.g., additional model deployments) - declared under `config:`. -- Routes data-plane resources to the existing project's deploy verb. - -The `useExistingAiProject` ternary collapses to a single field-presence check -at synthesis time. 
+When `resourceId:` is set, synthesis omits the Foundry project ARM resource; +generates references to wire `AZURE_AI_PROJECT_ENDPOINT`, +`AZURE_AI_PROJECT_ID`, `AZURE_RESOURCE_GROUP`, tenant/subscription/location; +still synthesizes ARM-backed children (e.g., additional model deployments); +and routes data-plane resources to the existing project's deploy verb. The +`useExistingAiProject` ternary collapses to a single field-presence check. ## Eject Command (`azd ai agent init --infra`) @@ -265,14 +249,10 @@ Infra-only operation. Four contexts: | Existing on-disk project (`./infra/` exists) | Refuse to overwrite. Print: *"`./infra/` already exists. To regenerate from `azure.yaml`, delete the `infra/` directory and run the command again."* | | Not an azd agent project | Refuse: "no `azure.ai.*` services found in `azure.yaml`; nothing to eject." | -Eject is **all-or-nothing for the whole project**. No partial mode where some -agents synthesize and others sit on disk. - -Regenerating requires the user to delete `./infra/` themselves and re-run -`azd ai agent init --infra`. Rationale: no new flag surface, no special -overwrite logic, no implicit destruction of user-owned files. The user -explicitly removes the old `./infra/` (which is a git-tracked operation -they're responsible for), then asks for fresh synthesis. +Eject is **all-or-nothing for the whole project** — no partial mode where +some agents synthesize and others sit on disk. To regenerate, the user +deletes `./infra/` themselves and re-runs the command. No `--force`, no +implicit destruction of user-owned files. Example output: @@ -292,7 +272,7 @@ Next steps: azd provision Apply changes ``` -Example output (refused): +Refused: ``` > azd ai agent init --infra @@ -303,24 +283,6 @@ If you want to regenerate from azure.yaml, delete the infra directory and run the command again. ``` -Example output: - -``` -> azd ai agent init --infra - -Generating infrastructure files from azure.yaml... 
- - Created infra/main.bicep - Created infra/main.parameters.json - Created infra/modules/foundry-project.bicep - Created infra/modules/acr.bicep - -Future provisions will read from ./infra/. - -Next steps: - azd provision Apply changes -``` - ## Post-Eject CLI Behavior CLI commands keep modifying `azure.yaml` after eject. Drift risk: `azure.yaml` @@ -335,27 +297,46 @@ needing ACR), but on-disk Bicep doesn't have it. CLI never silently patches user-owned Bicep. +**Accepted trade.** Post-eject, the user-driven `rm -rf ./infra/ && azd ai +agent init --infra` flow throws away any hand-edits the user made. We pick +this over auto-diff/merge (which would re-introduce silent rewrites of +user-owned files) and over refusing `add` post-eject (which would gut the +CLI for ejected projects). Auto-merge is future work, out of scope here. + ## Core Changes Required Small, mechanical. All ride alongside `azure.ai.agents` extension work. ### 1. Surface `uses` and `runtime` to extensions (RFC Core Ask #2) -Today: `cli/azd/pkg/project/mapper_registry.go:148` drops `Uses` when -mapping `ServiceConfig` to proto. `Runtime` is on `AppServiceProps` only, not -on `ServiceConfig`. +`Uses` already exists on the core `ServiceConfig` +(`cli/azd/pkg/project/service_config.go:58`) and in the v1 schema under +`services.` (`schemas/v1.0/azure.yaml.json:234`). The gap is proto-only: +`models.proto`'s `ServiceConfig` message (`cli/azd/grpc/proto/models.proto:87-100`) +has no `uses` field, so extensions can't read it from typed proto. + +`Runtime` is a bigger gap — it doesn't exist on `ServiceConfig` at all. It +lives only on `AppServiceProps` (`cli/azd/pkg/project/resources.go:283`, +compose side). `AppServiceRuntime` hard-restricts `Stack` to `node`/`python` +(`schemas/v1.0/azure.yaml.json:1490-1493`) — too narrow for Foundry agents, +so a new neutral `ServiceRuntime` type is added rather than reused. 
Changes: -| File | Change | -| ------------------------------------------------- | ------------------------------------------------------------------- | -| `cli/azd/pkg/project/service_config.go` | Add `Runtime AppServiceRuntime \`yaml:"runtime,omitempty"\`` | -| `cli/azd/grpc/proto/models.proto` | Add `runtime` (typed) and `uses` (repeated string) to `ServiceConfig` | -| `cli/azd/pkg/project/mapper_registry.go:148-161` | Populate both fields in forward + reverse mappers | -| `schemas/v1.0/azure.yaml.json` | Allow `runtime:` at the service level (reuse existing schema shape at lines 1477-1489) | +| File | Change | +| ------------------------------------------------- | --------------------------------------------------------------------------------------- | +| `cli/azd/pkg/project/service_runtime.go` (new) | Define `type ServiceRuntime struct { Stack string; Version string }` — no Stack enum | +| `cli/azd/pkg/project/service_config.go` | Add `Runtime *ServiceRuntime \`yaml:"runtime,omitempty"\`` (Uses already present at :58) | +| `cli/azd/grpc/proto/models.proto` | Add `repeated string uses = 13` and `ServiceRuntime runtime = 14` to `ServiceConfig` | +| `cli/azd/pkg/project/mapper_registry.go:102-162` | Populate `Uses` and `Runtime` in forward + reverse `ServiceConfig`↔proto mappers | +| `schemas/v1.0/azure.yaml.json` (services branch) | Add typed `runtime` under `services.` (uses already at :234). Distinct from `appServiceResource.runtime` at lines 1477-1493, which stays as-is. | Extension reads `serviceConfig.Uses` and `serviceConfig.Runtime` from typed -proto fields instead of re-parsing `additional_properties` Struct. +proto fields instead of `additional_properties`. + +> **Note for #7962:** that RFC assumes `services..uses` and +> `services..runtime` exist. `uses` already does; this spec adds +> `runtime`. ### 2. Relax `infra.provider` enum in schemas @@ -396,12 +377,8 @@ resources (eval datasets, vector indexes). 
`cli/azd/extensions/azure.ai.agents/internal/synthesis/*.tmpl` — Go-embedded Bicep templates, versioned with the extension. Templates are tailored: ACR only included when at least one agent has a `docker:` block; monitoring only -when explicitly added via `azd ai agent add monitoring` (per #8049). - -Replaces today's `Azure-Samples/azd-ai-starter-basic` Bicep entirely. The -slimming is safe because the templates ship inside the extension version — -changing them only affects projects on the new extension, not every project -from every prior build. +when explicitly added via `azd ai agent add monitoring` (per #8049). Replaces +`Azure-Samples/azd-ai-starter-basic` Bicep entirely. ### Provider implementation @@ -425,8 +402,8 @@ Method behaviors: ## Stability Contract -Synthesis output is best-effort stable within a minor extension version -(`0.2.x`). Same `azure.yaml` → semantically identical Bicep. Across minors, +Synthesis output is best-effort stable within a patch extension version. +Same `azure.yaml` → semantically identical Bicep. Across minor versions, the output may change; documented in the changelog with recommendation to run `azd provision --preview` after upgrades. @@ -448,30 +425,33 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. them. Only default `azd ai agent init` (no `-t`) goes Bicep-less. - **Foundry Toolkit (VS Code)** — reads `azure.yaml`; absence of `./infra/` is normal, not corruption. No new files to parse. -- **Migration** — existing `0.1.x` projects already have `infra/` on disk; - they stay on the on-disk path. No action needed. -- **Documentation** — new doc explaining Bicep-less default, eject command, - stability contract. Migration guide for 0.1.x users (no action; everything - keeps working). +- **Migration** — projects created by prior extension versions already have + `infra/` on disk; they stay on the on-disk path. No action needed. 
+- **Documentation** — new doc covering Bicep-less default, eject command, and + stability contract. ## Risks | Risk | Mitigation | | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | -| `infra.provider: azure.ai.agents` confuses developers | Documented in extension README; v0.3+ removes the declaration via service-host auto-routing | +| `infra.provider: azure.ai.agents` confuses developers | Documented in extension README; removed once service-host auto-routing lands | | Extension's Bicep deployment drifts from Core's | Pin to specific ARM SDK version; integration tests vs. Core's bicep provider for parity | | Synthesis output changes between minor versions | Changelog notes; `azd provision --preview` recommended after upgrade | | Brownfield projects with custom Bicep edits hit eject + drift | Eject is opt-in; first-time eject just writes synthesized Bicep, no merge logic | -| Auto-install gap (#7502) bites a teammate cloning the repo | README install instruction; v0.3+ delivers auto-install | +| Auto-install gap (#7502) bites a teammate cloning the repo | README install instruction until #7502 lands | ## Open Questions 1. Should the extension's `Deploy()` warn when both `./infra/` exists and `azure.yaml` config has changed since last eject? (Drift detection.) -2. Do we expose a `--preview-bicep` flag that prints synthesized Bicep - without applying, for debugging? Or rely on `--infra` + diff? -3. Schema branch for typed `host: azure.ai.agent` / `azure.ai.project` + **Proposal:** no detection — matches "on-disk Bicep is the source of + truth"; CLI `add` commands already warn at the entry point. Revisit when + auto-merge lands. +2. Schema branch for typed `host: azure.ai.agent` / `azure.ai.project` validation (per #7962) — does it land in this RFC's PRs or #7962's? + **Proposal:** #7962 owns the schema branches, since it defines the field + shapes. 
Extension validates against its own embedded schema at runtime, so + IDE schema lag during the gap is cosmetic. ## Test Plan @@ -486,8 +466,8 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. - E2E: `init` → `provision` → `deploy` → `down` on a single-agent project - E2E: `init --infra` → manual edit of `infra/main.bicep` → `provision` applies the edit -- Regression: existing `0.1.x` projects with on-disk Bicep continue to work - (extension reads `./infra/` like today) +- Regression: projects created by prior extension versions with on-disk Bicep + continue to work (extension reads `./infra/` like today) ## References From c95581db500aad54d3681c8c21a583f5a1a02422 Mon Sep 17 00:00:00 2001 From: Zhijie Huang Date: Wed, 10 Jun 2026 14:25:59 +0800 Subject: [PATCH 3/6] docs(spec): address Copilot review (Preview What-If, pathHasModule precision) - Preview: replace 'Same as Deploy with validationOnly mode' with 'ARM What-If, mirrors Core's Bicep provider'. validationOnly hits ARM's /validate endpoint and returns template-validity errors; What-If hits /whatIf and returns a real change diff. Core's bicep provider uses WhatIfDeployToResourceGroup (cli/azd/pkg/infra/scope.go:132); the spec now matches. - pathHasModule row: clarify that os.ReadDir returns NotExist on missing ./infra/, and the caller's 'err == nil && moduleExists' guard is what falls through. Prior wording 'returns false' was imprecise. 
--- docs/specs/bicepless-foundry/spec.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/specs/bicepless-foundry/spec.md b/docs/specs/bicepless-foundry/spec.md index 30ba02dad93..605eb92f819 100644 --- a/docs/specs/bicepless-foundry/spec.md +++ b/docs/specs/bicepless-foundry/spec.md @@ -160,7 +160,7 @@ Verified: all Core sites that read `./infra/` tolerate a missing directory: | Site | Behavior when `./infra/` is absent | | --------------------------------------------------------------------- | ------------------------------------------------- | -| `cli/azd/pkg/project/importer.go:323` (`pathHasModule` call) | Returns false → continues to fallthrough | +| `cli/azd/pkg/project/importer.go:323` (`pathHasModule` call) | `os.ReadDir` returns NotExist → caller's `err == nil && moduleExists` guard falls through | | `cli/azd/pkg/project/project.go:187` (`hooksFromInfraModule` call) | Returns empty → no hooks merged | | `cli/azd/pkg/infra/provisioning/manager.go:125` (`azdFileShareUploadOperations` call) | Missing dir → no operations | | `cli/azd/pkg/project/importer.go:304` (`detectProviderFromFiles` gate) | Only runs when `Provider == NotSpecified`; with our explicit declaration, never executes | @@ -394,7 +394,7 @@ Method behaviors: | `Initialize` | Validate `azure.yaml` (5-step pipeline above); resolve env vars | | `State` | Query ARM for last deployment; return outputs | | `Deploy` | If `./infra/` exists, read from disk; else synthesize. Apply via ARM SDK. | -| `Preview` | Same as Deploy with `validationOnly` mode; return diff summary | +| `Preview` | Synthesize (or read from disk), then call ARM What-If; return diff summary. Mirrors Core's Bicep provider. 
| | `Destroy` | Delete resource group or use deployment stacks | | `EnsureEnv` | Prompt for required env vars (subscription, location) if missing | | `Parameters` | Return parameter list from synthesized/on-disk template | From bca1d2262177c1032a86b0b94057cd04b5c6d15c Mon Sep 17 00:00:00 2001 From: Zhijie Huang Date: Wed, 10 Jun 2026 15:28:30 +0800 Subject: [PATCH 4/6] docs(spec): align with Foundry azure.yaml reference (microsoft.foundry host, nested agents) Adopts the consolidated YAML shape from therealjohn/foundry-azd-config-preview/REFERENCE.md and trims the spec accordingly. Shape changes: - Host: azure.ai.project + azure.ai.agent (two services, uses: link) -> microsoft.foundry (single service with nested agents[]). - Provider name: azure.ai.agents -> microsoft.foundry (matches host kind, reads like an engine next to bicep/terraform). - Brownfield signal: resourceId (ARM ID) -> endpoint (URL, matches Portal/CLI UX). - Deploy modes: added image: as third option alongside docker:/runtime:. Scope tightening: - azd deploy explicitly out of scope (agent code push and data-plane reconciliation are deploy's job). - Data-plane fields (connections, toolboxes, skills, routines, agent-level tools/skill, $ref) silently ignored by synthesizer; new field-skip table makes this explicit. - Coexistence with non-Foundry services out of scope; infra.layers[] noted as escape hatch. Core changes collapsed: - Removed "Surface uses/runtime to extensions" as a Core ask. With nested agents[], the runtime is inside the service body (read via additional_properties); no proto/struct/mapper plumbing needed. - Down to two Core changes: relax infra.provider enum, and the deferred auto-install (#7502). Validation pipeline rewritten to match the new invariants. Per-agent deploy-mode check now allows exactly one of docker/runtime/image. Brownfield validation checks endpoint URL shape. Foundry server-side templating syntax pass-through made explicit. 
--- docs/specs/bicepless-foundry/spec.md | 319 ++++++++++++++++----------- 1 file changed, 195 insertions(+), 124 deletions(-) diff --git a/docs/specs/bicepless-foundry/spec.md b/docs/specs/bicepless-foundry/spec.md index 605eb92f819..35845e6818c 100644 --- a/docs/specs/bicepless-foundry/spec.md +++ b/docs/specs/bicepless-foundry/spec.md @@ -22,7 +22,13 @@ time, the extension's own provisioning provider (registered via the synthesizes Bicep in memory from `azure.yaml` and applies it. `azd ai agent init --infra` ejects on demand: the same synthesis writes Bicep to `./infra/`, and subsequent provisions read from disk. The developer opts -in by declaring `infra.provider: azure.ai.agents` in `azure.yaml`. +in by declaring `infra.provider: microsoft.foundry` in `azure.yaml`. + +The `azure.yaml` shape is fixed by the +[Foundry `azure.yaml` reference](https://github.com/therealjohn/foundry-azd-config-preview/blob/main/REFERENCE.md): +a single `host: microsoft.foundry` service per project, with nested +`agents:`, `deployments:`, `connections:`, `toolboxes:`, etc. This spec +only changes how that file is *provisioned*; it does not redesign the YAML. ## Scope @@ -33,21 +39,32 @@ in by declaring `infra.provider: azure.ai.agents` in `azure.yaml`. - Embedded templates inside the `azure.ai.agents` extension - Retiring `Azure-Samples/azd-ai-starter-basic` as the init target - Schema updates to allow extension-named providers in `infra.provider` +- ARM-backed synthesis only: Foundry project + model deployments + ACR + (when needed for container agents) **Out of scope:** -- Unified `azure.yaml` schema for `azure.ai.project` / `azure.ai.agent` host - kinds — [#7962](https://github.com/Azure/azure-dev/issues/7962) +- **`azd deploy`** — agent code push, data-plane reconciliation (connections, + toolboxes, skills, routines, agent definitions) are all the deploy verb's + job, not provisioning's. This spec ends at "ARM resources are in place." 
+- `$ref:` resolution, `skills:`, `routines:`, agent-level `tools:`/`skill:` — + data-plane state. Synthesizer reads them only to skip them; `azd deploy` + reconciles them via Foundry APIs. +- Unified `azure.yaml` schema for `host: microsoft.foundry` — + [#7962](https://github.com/Azure/azure-dev/issues/7962) - `azd ai agent add` and incremental composition — [#8049](https://github.com/Azure/azure-dev/issues/8049) - Service-host-driven provider auto-routing (removes the explicit `infra.provider:` declaration) — RFC #8065 Core Ask #1 +- Coexistence with non-Foundry services in the same project — users with + mixed projects use `infra.layers[]` to scope `microsoft.foundry` to their + Foundry services; mixed-provider auto-routing is a future spec. ## Activation | Trigger | Behavior | | --------------------------------------------- | ---------------------------------------------------------------------------------------------- | -| `azd ai agent init` (default) | Write `azure.yaml` + agent code project. No `./infra/`. `azure.yaml` includes `infra.provider: azure.ai.agents`. | +| `azd ai agent init` (default) | Write `azure.yaml` + agent code project. No `./infra/`. `azure.yaml` includes `infra.provider: microsoft.foundry`. | | `azd ai agent init --infra` | Same as default, plus synthesize and write Bicep to `./infra/`. Project starts on-disk. | | `azd ai agent init --infra` (existing project, no `./infra/`) | Synthesize current `azure.yaml` and write `./infra/`. Do not re-prompt or touch agent code. Refuse if `./infra/` already exists. | | `azd provision` (no `./infra/`) | Extension synthesizes Bicep in memory, applies via ARM SDK. | @@ -58,28 +75,27 @@ in by declaring `infra.provider: azure.ai.agents` in `azure.yaml`. 
``` cli/azd/extensions/azure.ai.agents/ internal/cmd/init.go ← gen azure.yaml; gen --infra path - internal/cmd/listen.go ← register provider via WithProvisioningProvider + internal/cmd/listen.go ← register provider via + WithProvisioningProvider("microsoft.foundry", ...) internal/project/provisioning.go ← FoundryProvisioningProvider implementation internal/synthesis/ ← in-memory Bicep generation from azure.yaml synthesizer.go ← top-level: ServiceConfig → template files project.bicep.tmpl ← embedded template: Foundry project + deps - agent.bicep.tmpl ← embedded template: ACR (if container agents) + deployments.bicep.tmpl ← embedded template: model deployments + acr.bicep.tmpl ← embedded template: ACR (if any nested + agent has a docker: block) *.tmpl ← other embedded templates internal/deploy/ bicep_runner.go ← ARM SDK deployment wrapper parameters.go ← parameter resolution (env vars, prompts) + extension.yaml ← +capability: provisioning-provider + +providers: {name: microsoft.foundry, + type: provisioning-provider} cli/azd/pkg/ ← Core changes (small) - project/service_runtime.go ← NEW: ServiceRuntime type (no Stack enum) - project/service_config.go ← +Runtime *ServiceRuntime (Uses already present) - project/mapper_registry.go ← +Uses, +Runtime in ServiceConfig↔proto mappers infra/provisioning/provider.go ← (no change needed) -cli/azd/grpc/proto/ - models.proto ← +uses, +runtime (typed) on ServiceConfig message - -schemas/v1.0/azure.yaml.json ← +runtime under services. (uses already present); - relax infra.provider enum → examples +schemas/v1.0/azure.yaml.json ← relax infra.provider enum → examples ``` ## Provider Resolution @@ -100,42 +116,56 @@ err = m.serviceLocator.ResolveNamed(string(providerKey), &provider) Built-in providers register at `cli/azd/pkg/azd/default.go:79-87`. Extension providers register at runtime via `RegisterProvisioningProviderRequest` (`cli/azd/internal/grpcserver/provisioning_service.go:138-152`) into the same -container. 
`bicep` and `azure.ai.agents` are equivalent keys to the resolver. +container. `bicep` and `microsoft.foundry` are equivalent keys to the resolver. `ParseProvider` (`cli/azd/pkg/infra/provisioning/provisioning.go:53-57`) was relaxed in PR #7482 to accept any string. ## Explicit `infra.provider:` Declaration The RFC ideal is service-host-driven auto-routing — the extension is picked -because `host: azure.ai.agent` is present, not because `infra.provider:` is -declared. Verified gap (`cli/azd/pkg/project/importer.go:288-358`): +because `host: microsoft.foundry` is present, not because `infra.provider:` +is declared. Verified gap (`cli/azd/pkg/project/importer.go:288-358`): `ProjectInfrastructure` never inspects `service.Host` to pick a provisioning provider. The Aspire branch (the only service-driven precedent) and the compose branch both hard-code Bicep. Adding service-host auto-routing requires a net-new branch in `ProjectInfrastructure` plus a host→extension registry. We defer that and -ship an explicit declaration: +ship an explicit declaration. The +[reference YAML](https://github.com/therealjohn/foundry-azd-config-preview/blob/main/REFERENCE.md) +omits `infra.provider:`; this spec requires it as a one-line addition until +auto-routing lands: ```yaml infra: - provider: azure.ai.agents + provider: microsoft.foundry # added by `azd ai agent init` services: - foundry-project: - host: azure.ai.project - config: { ... } - my-agent: - host: azure.ai.agent - uses: [foundry-project] - runtime: { stack: python, version: "3.13" } - config: { ... } + my-project: + host: microsoft.foundry + # endpoint: ${FOUNDRY_PROJECT_ENDPOINT} # uncomment for brownfield + deployments: + - name: gpt-4.1-mini + model: { format: OpenAI, name: gpt-4.1-mini, version: "2025-04-14" } + sku: { capacity: 10, name: GlobalStandard } + connections: [ ... ] # data-plane, ignored by synthesizer + toolboxes: [ ... 
] # data-plane, ignored by synthesizer + agents: + - name: my-agent + kind: hosted + project: src/my-agent + docker: { path: Dockerfile, remoteBuild: true } + # … rest is data-plane, ignored by synthesizer ``` -**Trade:** developers see an extension name in the `infra.provider:` slot -historically used for IaC engines (`bicep`, `terraform`) — a real concept -leak we accept to reuse PR #7482's plumbing as-is, with no Core changes to -`ProjectInfrastructure`. Revisit once service-host auto-routing lands. +The synthesizer only reads the ARM-backed fields (project itself, model +`deployments`, and ACR if any nested agent has a `docker:` block); the +rest is the deploy verb's job and out of scope for this spec. + +**Trade:** developers see a provider name in the `infra.provider:` slot — +mild friction, but `microsoft.foundry` matches the host kind and reads +naturally next to `bicep` / `terraform`. Removed once service-host +auto-routing lands. ## On-Disk Reuse (Post-Eject Behavior) @@ -152,7 +182,7 @@ if exists("./infra/main.bicep") { return deployTemplates(ctx, templates) ``` -The developer sees one `infra.provider: azure.ai.agents` declaration that +The developer sees one `infra.provider: microsoft.foundry` declaration that holds across both modes. Eject is a pure file-write operation; `azure.yaml` stays clean. @@ -170,14 +200,19 @@ Verified: all Core sites that read `./infra/` tolerate a missing directory: The extension owns the Bicep deployment pipeline. 
Composition: ``` -ServiceConfig (from azure.yaml) +ServiceConfig (host: microsoft.foundry, from azure.yaml) │ ▼ synthesis.Synthesizer - │ - validates azure.yaml against extension schemas - │ - merges defaults - │ - selects templates based on services (ACR only if container agent) - │ - resolves ${VAR} from azd env + │ - validates against azure.ai.agent.json schema + │ - reads ARM-backed fields only: + │ * project itself (or skip if endpoint: set) + │ * deployments[] + │ * agents[].docker → triggers ACR + │ * agents[].image → no ACR (pre-built image) + │ - skips data-plane fields (connections, toolboxes, skills, + │ routines, agents[].tools, agents[].skill, $ref) + │ - resolves ${VAR} from azd env; passes ${{...}} through verbatim ▼ []TemplateFile (main.bicep, modules/*.bicep, main.parameters.json) │ @@ -186,7 +221,7 @@ deploy.BicepRunner │ - resolves remaining parameters (prompts, env) │ - calls ARM REST: deployments.CreateOrUpdate │ - streams progress via grpcbroker.ProgressFunc - │ - captures outputs + │ - captures outputs (project endpoint, ACR login server, etc.) ▼ ProvisioningDeployResult (back to azd Core via gRPC) ``` @@ -197,47 +232,86 @@ exposes only `GetDeployment`/`GetDeploymentContext`). The extension reimplements the deploy step using `armresources.DeploymentsClient`; a future Core API could expose a shared Bicep-deploy path to avoid drift. +### What the synthesizer ignores (deploy verb's job) + +The reference YAML carries a lot of data-plane state that has no ARM +representation. 
The synthesizer reads these only to skip them; `azd deploy` +reconciles them via Foundry APIs (out of scope for this spec): + +| YAML field | Why ignored at synthesis | +| ----------------------------------- | --------------------------------------------------------- | +| `connections:` | Foundry data-plane resource, not ARM | +| `toolboxes:` | Foundry data-plane resource | +| `skills:` | Foundry data-plane resource | +| `routines:` | Foundry data-plane resource | +| `agents[].tools:` | Agent definition, posted via Foundry API at deploy | +| `agents[].skill:` | Reference to a skill (data-plane) | +| `agents[].protocols:` | Agent definition | +| `agents[].env:` | Agent runtime env, applied at deploy | +| `agents[].startupCommand:` | Agent runtime config | +| `agents[].container.resources:` | Agent runtime config | +| `agents[].runtime:` | Code-deploy mode marker; deploy verb's job | +| `$ref:` (anywhere) | Loaded but contents treated as data-plane; not validated | + ## Validation Pipeline Synthesis runs only on a valid `azure.yaml`. Order, all before Bicep is generated: -1. **Schema validation** — each `azure.ai.*` service's `config:` block against - its JSON schema. Failures: `services.foundry-project.config.deployments[0].sku: required`. -2. **Service graph invariants** — exactly one `azure.ai.project` service; - every `azure.ai.agent` `uses:` exactly one project; no cycles. -3. **Deploy-mode invariant** — each `azure.ai.agent` has exactly one of - `runtime:` or `docker:`. Both = error. Neither = error. -4. **Env reference resolution** — every `${VAR}` in `config:` blocks must - resolve from the azd environment. -5. **Brownfield consistency** — if `resourceId:` is set on the project, it - must be a syntactically valid Foundry project ARM resource ID (existence - check at deploy time). - -All five run on every `provision`, `preview`, and `init --infra`. +1. **Schema validation** — each `host: microsoft.foundry` service's body + against `azure.ai.agent.json`. 
Failures surface with field path: + `services.my-project.deployments[0].sku: required`. +2. **Service graph invariants** — at least one `host: microsoft.foundry` + service exists. Multiple are allowed (each is its own Foundry project). + `uses:` between Foundry services and other azd services is honored as + normal ordering — but synthesis only acts on the Foundry-hosted ones. +3. **Per-agent deploy-mode invariant** — each entry in `agents[]` has + exactly one of `docker:`, `runtime:`, or `image:`. Two or more = error. + None = error. +4. **Env reference resolution** — every `${VAR}` in ARM-backed fields + (project endpoint, deployment name/SKU/capacity, ACR options) must + resolve from the azd environment. `${{...}}` is opaque to the + synthesizer — it is passed through verbatim for Foundry to resolve + server-side at runtime. +5. **Brownfield consistency** — if `endpoint:` is set on the Foundry + service, the value must look like a Foundry project endpoint URL + (`https://.services.ai.azure.com/api/projects/` or + equivalent). Reachability is a deploy-time check, not synthesis. + +All five run on every `provision`, `preview`, and `init --infra`. Data-plane +fields (connections, toolboxes, skills, routines, agent-level tools/skill, +`$ref:` contents) are not validated here — the deploy verb owns them. ## Brownfield: Existing Foundry Projects Replaces today's `USE_EXISTING_AI_PROJECT` / `AZURE_AI_PROJECT_ID` env vars -(which the starter Bicep branches on) with an explicit field on the project -service: +(which the starter Bicep branches on) with the +[reference doc's](https://github.com/therealjohn/foundry-azd-config-preview/blob/main/REFERENCE.md) +`endpoint:` field on the Foundry service: ```yaml services: - foundry-project: - host: azure.ai.project - resourceId: ${AZURE_AI_PROJECT_ID} # presence → existing-project mode - config: - toolboxes: { ... 
} + my-project: + host: microsoft.foundry + endpoint: ${FOUNDRY_PROJECT_ENDPOINT} # presence → existing-project mode + deployments: [ ... ] + agents: [ ... ] ``` -When `resourceId:` is set, synthesis omits the Foundry project ARM resource; -generates references to wire `AZURE_AI_PROJECT_ENDPOINT`, -`AZURE_AI_PROJECT_ID`, `AZURE_RESOURCE_GROUP`, tenant/subscription/location; -still synthesizes ARM-backed children (e.g., additional model deployments); -and routes data-plane resources to the existing project's deploy verb. The +When `endpoint:` is set, synthesis omits the Foundry project ARM resource +and generates references to wire `FOUNDRY_PROJECT_ENDPOINT`, +`AZURE_RESOURCE_GROUP`, and tenant/subscription/location from the existing +project (resolved at deploy time via the endpoint). It still synthesizes +ARM-backed children declared inline — additional model `deployments[]`, +ACR if any nested agent has a `docker:` block. The `useExistingAiProject` ternary collapses to a single field-presence check. +The endpoint URL (not the ARM resource ID) is the user-facing identifier in +the reference doc, matching what `az` CLI and the Portal display. The deploy +verb resolves the ARM ID from the endpoint when it needs control-plane +access; synthesis treats `endpoint:` purely as a "skip ARM project creation" +signal. + ## Eject Command (`azd ai agent init --infra`) Infra-only operation. Four contexts: @@ -245,9 +319,9 @@ Infra-only operation. Four contexts: | Context | Behavior | | ----------------------------------------- | ---------------------------------------------------------------------------------------------- | | Empty directory | Run init normally + write `./infra/` from synthesis. | -| Existing Bicep-less azd agent project | Synthesize current `azure.yaml`; write `./infra/`. Do not re-prompt; do not touch agent code. Do not modify `azure.yaml` (`infra.provider:` stays `azure.ai.agents`). 
| +| Existing Bicep-less azd agent project | Synthesize current `azure.yaml`; write `./infra/`. Do not re-prompt; do not touch agent code. Do not modify `azure.yaml` (`infra.provider:` stays `microsoft.foundry`). | | Existing on-disk project (`./infra/` exists) | Refuse to overwrite. Print: *"`./infra/` already exists. To regenerate from `azure.yaml`, delete the `infra/` directory and run the command again."* | -| Not an azd agent project | Refuse: "no `azure.ai.*` services found in `azure.yaml`; nothing to eject." | +| Not an azd agent project | Refuse: "no `host: microsoft.foundry` service found in `azure.yaml`; nothing to eject." | Eject is **all-or-nothing for the whole project** — no partial mode where some agents synthesize and others sit on disk. To regenerate, the user @@ -307,54 +381,31 @@ CLI for ejected projects). Auto-merge is future work, out of scope here. Small, mechanical. All ride alongside `azure.ai.agents` extension work. -### 1. Surface `uses` and `runtime` to extensions (RFC Core Ask #2) - -`Uses` already exists on the core `ServiceConfig` -(`cli/azd/pkg/project/service_config.go:58`) and in the v1 schema under -`services.` (`schemas/v1.0/azure.yaml.json:234`). The gap is proto-only: -`models.proto`'s `ServiceConfig` message (`cli/azd/grpc/proto/models.proto:87-100`) -has no `uses` field, so extensions can't read it from typed proto. - -`Runtime` is a bigger gap — it doesn't exist on `ServiceConfig` at all. It -lives only on `AppServiceProps` (`cli/azd/pkg/project/resources.go:283`, -compose side). `AppServiceRuntime` hard-restricts `Stack` to `node`/`python` -(`schemas/v1.0/azure.yaml.json:1490-1493`) — too narrow for Foundry agents, -so a new neutral `ServiceRuntime` type is added rather than reused. 
- -Changes: - -| File | Change | -| ------------------------------------------------- | --------------------------------------------------------------------------------------- | -| `cli/azd/pkg/project/service_runtime.go` (new) | Define `type ServiceRuntime struct { Stack string; Version string }` — no Stack enum | -| `cli/azd/pkg/project/service_config.go` | Add `Runtime *ServiceRuntime \`yaml:"runtime,omitempty"\`` (Uses already present at :58) | -| `cli/azd/grpc/proto/models.proto` | Add `repeated string uses = 13` and `ServiceRuntime runtime = 14` to `ServiceConfig` | -| `cli/azd/pkg/project/mapper_registry.go:102-162` | Populate `Uses` and `Runtime` in forward + reverse `ServiceConfig`↔proto mappers | -| `schemas/v1.0/azure.yaml.json` (services branch) | Add typed `runtime` under `services.` (uses already at :234). Distinct from `appServiceResource.runtime` at lines 1477-1493, which stays as-is. | - -Extension reads `serviceConfig.Uses` and `serviceConfig.Runtime` from typed -proto fields instead of `additional_properties`. - -> **Note for #7962:** that RFC assumes `services..uses` and -> `services..runtime` exist. `uses` already does; this spec adds -> `runtime`. - -### 2. Relax `infra.provider` enum in schemas +### 1. Relax `infra.provider` enum in schemas | File | Change | | ------------------------------------- | -------------------------------------------------------------- | | `schemas/v1.0/azure.yaml.json:44-52` | Change `enum: ["bicep","terraform"]` → `examples: [...]` | | `schemas/alpha/azure.yaml.json:44-52` | Same | -Without this, `infra.provider: azure.ai.agents` fails IDE schema validation -despite being runtime-valid. +Without this, `infra.provider: microsoft.foundry` fails IDE schema +validation despite being runtime-valid. -### 3. 
(Optional, deferred) Auto-install for `provisioning-provider` extensions +> **Note on `uses` / `runtime`:** the original RFC asked Core to surface +> `services..uses` and a typed `services..runtime` on the +> extension-facing proto. With the consolidated `host: microsoft.foundry` +> shape, agents and their runtimes are *nested* inside the service body +> (`agents[].runtime`, `agents[].docker`, `agents[].image`), not separate +> services. The extension reads them through the existing +> `additional_properties` channel; no proto/struct changes needed for v1. + +### 2. (Optional, deferred) Auto-install for `provisioning-provider` extensions Today: `cli/azd/cmd/auto_install.go:511-578` auto-installs extensions for unknown `service-target-provider` host kinds. No equivalent for `provisioning-provider`. Tracked as `#7502`. -Acceptable to defer — developers writing `infra.provider: azure.ai.agents` +Acceptable to defer — developers writing `infra.provider: microsoft.foundry` have opted in explicitly. `azd ai agent init` force-installs the extension at init time anyway. The failure mode is `git clone` + `azd up` on a fresh machine where the README is the install instruction. @@ -363,29 +414,42 @@ machine where the README is the install instruction. ### Schemas -Two schemas owned by the extension (per #7962): - -- `azure.ai.agent.json` — agent runtime config block (already exists; trimmed - per #7962) -- `azure.ai.project.json` — project-scoped data-plane state (new per #7962) - -Both `additionalProperties: true` for forward-compatibility with future -resources (eval datasets, vector indexes). +The reference YAML keys (project + nested `agents:`, `deployments:`, +`connections:`, `toolboxes:`, `skills:`, `routines:`) are governed by the +existing `azure.ai.agent.json` schema the agents extension already publishes +(`cli/azd/extensions/azure.ai.agents/schemas/azure.ai.agent.json`). 
The +schema covers both ARM-backed fields (read by the synthesizer) and +data-plane fields (read by the deploy verb). `additionalProperties: true` +keeps it forward-compatible with future Foundry resource types. ### Embedded templates `cli/azd/extensions/azure.ai.agents/internal/synthesis/*.tmpl` — Go-embedded Bicep templates, versioned with the extension. Templates are tailored: ACR -only included when at least one agent has a `docker:` block; monitoring only -when explicitly added via `azd ai agent add monitoring` (per #8049). Replaces -`Azure-Samples/azd-ai-starter-basic` Bicep entirely. +only included when at least one entry in `agents[]` has a `docker:` block; +monitoring only when explicitly added via `azd ai agent add monitoring` (per +#8049). Replaces `Azure-Samples/azd-ai-starter-basic` Bicep entirely. ### Provider implementation `internal/project/provisioning.go` implements `azdext.ProvisioningProvider` (`cli/azd/pkg/azdext/provisioning_manager.go:23-36`). -Registered via `WithProvisioningProvider("azure.ai.agents", factory)` in -`internal/cmd/listen.go`. +Registered via `WithProvisioningProvider("microsoft.foundry", factory)` in +`internal/cmd/listen.go`. `extension.yaml` adds the +`provisioning-provider` capability and declares the provider: + +```yaml +capabilities: + - custom-commands + - lifecycle-events + - mcp-server + - service-target-provider + - provisioning-provider # new + - metadata +providers: + - { name: azure.ai.agent, type: service-target } # existing + - { name: microsoft.foundry, type: provisioning-provider } # new +``` Method behaviors: @@ -434,7 +498,7 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. 
| Risk | Mitigation | | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | -| `infra.provider: azure.ai.agents` confuses developers | Documented in extension README; removed once service-host auto-routing lands | +| `infra.provider: microsoft.foundry` confuses developers | Documented in extension README; removed once service-host auto-routing lands | | Extension's Bicep deployment drifts from Core's | Pin to specific ARM SDK version; integration tests vs. Core's bicep provider for parity | | Synthesis output changes between minor versions | Changelog notes; `azd provision --preview` recommended after upgrade | | Brownfield projects with custom Bicep edits hit eject + drift | Eject is opt-in; first-time eject just writes synthesized Bicep, no merge logic | @@ -447,23 +511,30 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. **Proposal:** no detection — matches "on-disk Bicep is the source of truth"; CLI `add` commands already warn at the entry point. Revisit when auto-merge lands. -2. Schema branch for typed `host: azure.ai.agent` / `azure.ai.project` - validation (per #7962) — does it land in this RFC's PRs or #7962's? - **Proposal:** #7962 owns the schema branches, since it defines the field - shapes. Extension validates against its own embedded schema at runtime, so - IDE schema lag during the gap is cosmetic. +2. Schema branch for typed `host: microsoft.foundry` validation in the v1 + `azure.yaml.json` (per #7962) — does it land in this RFC's PRs or #7962's? + **Proposal:** #7962 owns the schema branch, since it defines the field + shapes. Extension validates against its own embedded `azure.ai.agent.json` + schema at runtime, so IDE schema lag during the gap is cosmetic. 
## Test Plan - Unit: synthesizer determinism (same input → byte-equal output) - Unit: validation pipeline error paths (all five steps) -- Unit: `ResolveNamed("azure.ai.agents")` returns extension provider +- Unit: `ResolveNamed("microsoft.foundry")` returns extension provider +- Unit: synthesizer ignores data-plane fields (`connections:`, `toolboxes:`, + `skills:`, `routines:`, agent-level `tools:`/`skill:`, `$ref:`) without + error +- Unit: `${{...}}` passes through synthesis unchanged; `${VAR}` resolves - Integration: `azd ai agent init` produces no `./infra/` -- Integration: `azd provision` succeeds with synthesized templates +- Integration: `azd provision` succeeds with synthesized templates against a + Foundry project with one container agent (ACR included) and one code-deploy + agent (no ACR) - Integration: `azd ai agent init --infra` writes `./infra/`; next `azd provision` reads from disk (verified via extension log) -- Integration: brownfield `resourceId:` skips ARM project creation -- E2E: `init` → `provision` → `deploy` → `down` on a single-agent project +- Integration: brownfield `endpoint:` skips ARM project creation +- E2E: `init` → `provision` → `down` on a single-agent project (deploy is + out of scope for this spec) - E2E: `init --infra` → manual edit of `infra/main.bicep` → `provision` applies the edit - Regression: projects created by prior extension versions with on-disk Bicep From 3392f3c888b886357a1d524fa26ed5baa3476dfa Mon Sep 17 00:00:00 2001 From: Zhijie Huang Date: Wed, 10 Jun 2026 16:23:26 +0800 Subject: [PATCH 5/6] docs(spec): drop out-of-scope references (#7502, #7962, #8049) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove Core Changes section on auto-install (#7502 — already delivered by #7482; not a gap, not in our scope). - Drop Open Question 2 (schema branch ownership with #7962). It was a coordination artifact, not a real dependency. - Drop #7962 and #8049 References entries. 
- Drop the forward reference to `azd ai agent add monitoring` (per #8049) from Embedded templates — monitoring is out of scope. - Drop the auto-install Risks row. Keeps #7962 and #8049 only as one-line out-of-scope pointers in the Scope section. --- docs/specs/bicepless-foundry/spec.md | 36 ++++------------------------ 1 file changed, 4 insertions(+), 32 deletions(-) diff --git a/docs/specs/bicepless-foundry/spec.md b/docs/specs/bicepless-foundry/spec.md index 35845e6818c..77989d8eb9e 100644 --- a/docs/specs/bicepless-foundry/spec.md +++ b/docs/specs/bicepless-foundry/spec.md @@ -379,18 +379,14 @@ CLI for ejected projects). Auto-merge is future work, out of scope here. ## Core Changes Required -Small, mechanical. All ride alongside `azure.ai.agents` extension work. - -### 1. Relax `infra.provider` enum in schemas +Relax the `infra.provider` enum in the schemas so `microsoft.foundry` is +runtime-valid in IDE validation: | File | Change | | ------------------------------------- | -------------------------------------------------------------- | | `schemas/v1.0/azure.yaml.json:44-52` | Change `enum: ["bicep","terraform"]` → `examples: [...]` | | `schemas/alpha/azure.yaml.json:44-52` | Same | -Without this, `infra.provider: microsoft.foundry` fails IDE schema -validation despite being runtime-valid. - > **Note on `uses` / `runtime`:** the original RFC asked Core to surface > `services..uses` and a typed `services..runtime` on the > extension-facing proto. With the consolidated `host: microsoft.foundry` @@ -399,17 +395,6 @@ validation despite being runtime-valid. > services. The extension reads them through the existing > `additional_properties` channel; no proto/struct changes needed for v1. -### 2. (Optional, deferred) Auto-install for `provisioning-provider` extensions - -Today: `cli/azd/cmd/auto_install.go:511-578` auto-installs extensions for -unknown `service-target-provider` host kinds. No equivalent for -`provisioning-provider`. Tracked as `#7502`. 
- -Acceptable to defer — developers writing `infra.provider: microsoft.foundry` -have opted in explicitly. `azd ai agent init` force-installs the extension at -init time anyway. The failure mode is `git clone` + `azd up` on a fresh -machine where the README is the install instruction. - ## Extension Changes Required ### Schemas @@ -426,9 +411,8 @@ keeps it forward-compatible with future Foundry resource types. `cli/azd/extensions/azure.ai.agents/internal/synthesis/*.tmpl` — Go-embedded Bicep templates, versioned with the extension. Templates are tailored: ACR -only included when at least one entry in `agents[]` has a `docker:` block; -monitoring only when explicitly added via `azd ai agent add monitoring` (per -#8049). Replaces `Azure-Samples/azd-ai-starter-basic` Bicep entirely. +only included when at least one entry in `agents[]` has a `docker:` block. +Replaces `Azure-Samples/azd-ai-starter-basic` Bicep entirely. ### Provider implementation @@ -502,7 +486,6 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. | Extension's Bicep deployment drifts from Core's | Pin to specific ARM SDK version; integration tests vs. Core's bicep provider for parity | | Synthesis output changes between minor versions | Changelog notes; `azd provision --preview` recommended after upgrade | | Brownfield projects with custom Bicep edits hit eject + drift | Eject is opt-in; first-time eject just writes synthesized Bicep, no merge logic | -| Auto-install gap (#7502) bites a teammate cloning the repo | README install instruction until #7502 lands | ## Open Questions @@ -511,11 +494,6 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. **Proposal:** no detection — matches "on-disk Bicep is the source of truth"; CLI `add` commands already warn at the entry point. Revisit when auto-merge lands. -2. Schema branch for typed `host: microsoft.foundry` validation in the v1 - `azure.yaml.json` (per #7962) — does it land in this RFC's PRs or #7962's? 
- **Proposal:** #7962 owns the schema branch, since it defines the field - shapes. Extension validates against its own embedded `azure.ai.agent.json` - schema at runtime, so IDE schema lag during the gap is cosmetic. ## Test Plan @@ -543,12 +521,6 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. ## References - RFC [#8065](https://github.com/Azure/azure-dev/issues/8065) — original -- Issue [#7962](https://github.com/Azure/azure-dev/issues/7962) — unified - schema (dependency) -- Issue [#8049](https://github.com/Azure/azure-dev/issues/8049) — incremental - composition (parallel) - PR [#7482](https://github.com/Azure/azure-dev/pull/7482) — custom provisioning provider framework (merged) -- Issue [#7502](https://github.com/Azure/azure-dev/issues/7502) — auto-install - for provisioning providers (deferred dependency) - Reference: [therealjohn/foundry-azd-config-preview](https://github.com/therealjohn/foundry-azd-config-preview/blob/main/REFERENCE.md) — target `azure.yaml` shape From 574a2b2579ff08228c77a83138405dc8e3d4045a Mon Sep 17 00:00:00 2001 From: Zhijie Huang Date: Thu, 11 Jun 2026 16:30:25 +0800 Subject: [PATCH 6/6] docs(spec): address glharper review (drift, eject UX, Destroy purge, schema, scope) All 8 substantive points from the maintainer review applied: - Split "What synthesizer ignores" into two tables: read-for-branching (docker/runtime/image, needed by validation step 3 and ARM branching) vs not-read-at-all (data-plane). Resolves the runtime contradiction. - Open Question 1 flipped from "no detection" to warn-on-Deploy(). Pseudocode + method-table row now describe the in-memory diff against on-disk Bicep. - Eject UX now matches azd infra generate (cmd/infra_generate.go:204-210): interactive overwrite prompt, --no-prompt keeps hard-refuse for CI. Post-Eject CLI table and Accepted-trade paragraph updated. 
- Destroy row spells out soft-delete purge of Cognitive Services accounts to mirror Core's bicep provider (bicep_provider.go:1283-1413). Without this, up -> down -> up under the same name fails. - Schema relaxation now pattern: ^[a-z0-9.]+$ + examples, not examples alone. Keeps typo catching for all users. - Brownfield section + Preview row: both Deploy and Preview now resolve endpoint -> ARM ID + target scope before invoking ARM. Preview can't run on a brownfield project without scope resolution. - Telemetry section names docs/reference/telemetry-data.md as an implementation-PR deliverable per cli/azd/AGENTS.md:246-249. - infra.layers[] escape hatch verified inline: InfraLayer.Provider field (provisioning/provider.go:57 -> :40) + ParseProvider accepts any string. - Stability Contract tightened from "semantically identical" to "byte-stable within a patch extension version / byte-identical Bicep," matching the Test Plan's byte-equal standard. Test Plan picked up entries for each new behavior: schema pattern, eject overwrite prompt, post-eject Deploy() drift warn, brownfield Preview scope, and an expanded init -> provision -> down -> provision E2E for soft-delete purge. --- docs/specs/bicepless-foundry/spec.md | 142 ++++++++++++++++++--------- 1 file changed, 95 insertions(+), 47 deletions(-) diff --git a/docs/specs/bicepless-foundry/spec.md b/docs/specs/bicepless-foundry/spec.md index 77989d8eb9e..4c7a3d3cc25 100644 --- a/docs/specs/bicepless-foundry/spec.md +++ b/docs/specs/bicepless-foundry/spec.md @@ -58,7 +58,10 @@ only changes how that file is *provisioned*; it does not redesign the YAML. `infra.provider:` declaration) — RFC #8065 Core Ask #1 - Coexistence with non-Foundry services in the same project — users with mixed projects use `infra.layers[]` to scope `microsoft.foundry` to their - Foundry services; mixed-provider auto-routing is a future spec. + Foundry services; mixed-provider auto-routing is a future spec. 
Verified: + each `InfraLayer` carries an `Options.Provider` field + (`cli/azd/pkg/infra/provisioning/provider.go:57` → `:40`) and + `ParseProvider` accepts any string, so a layer can name `microsoft.foundry`. ## Activation @@ -95,7 +98,7 @@ cli/azd/extensions/azure.ai.agents/ cli/azd/pkg/ ← Core changes (small) infra/provisioning/provider.go ← (no change needed) -schemas/v1.0/azure.yaml.json ← relax infra.provider enum → examples +schemas/v1.0/azure.yaml.json ← infra.provider enum → pattern+examples ``` ## Provider Resolution @@ -176,6 +179,11 @@ internally whether to synthesize or read from disk: // FoundryProvisioningProvider.Deploy(ctx) if exists("./infra/main.bicep") { templates = readFromDisk("./infra/") + // Drift detection (Open Question 1): also synthesize in-memory and + // warn — don't block — if the ARM-relevant fields disagree. + if synth := synthesizeFromYAML(serviceConfig); driftsFrom(synth, templates) { + warn("azure.yaml has drifted from ./infra/; on-disk Bicep wins") + } } else { templates = synthesizeFromYAML(serviceConfig) } @@ -232,13 +240,24 @@ exposes only `GetDeployment`/`GetDeploymentContext`). The extension reimplements the deploy step using `armresources.DeploymentsClient`; a future Core API could expose a shared Bicep-deploy path to avoid drift. -### What the synthesizer ignores (deploy verb's job) +### What the synthesizer reads vs ignores -The reference YAML carries a lot of data-plane state that has no ARM -representation. The synthesizer reads these only to skip them; `azd deploy` -reconciles them via Foundry APIs (out of scope for this spec): +The reference YAML carries both ARM-backed state and Foundry data-plane state. 
+The synthesizer's job is narrow: -| YAML field | Why ignored at synthesis | +**Read, but emit no ARM of their own — used only for validation and +ARM-graph branching:** + +| YAML field | Why read | +| ---------------------- | ----------------------------------------------------------------- | +| `agents[].docker:` | Presence triggers ACR inclusion; deploy-mode invariant (step 3) | +| `agents[].runtime:` | Deploy-mode invariant (exactly one of docker/runtime/image) | +| `agents[].image:` | Deploy-mode invariant; presence means no ACR (pre-built) | + +**Not read at all — `azd deploy` reconciles via Foundry APIs (out of scope +for this spec):** + +| YAML field | Why ignored | | ----------------------------------- | --------------------------------------------------------- | | `connections:` | Foundry data-plane resource, not ARM | | `toolboxes:` | Foundry data-plane resource | @@ -250,7 +269,6 @@ reconciles them via Foundry APIs (out of scope for this spec): | `agents[].env:` | Agent runtime env, applied at deploy | | `agents[].startupCommand:` | Agent runtime config | | `agents[].container.resources:` | Agent runtime config | -| `agents[].runtime:` | Code-deploy mode marker; deploy verb's job | | `$ref:` (anywhere) | Loaded but contents treated as data-plane; not validated | ## Validation Pipeline @@ -301,15 +319,17 @@ services: When `endpoint:` is set, synthesis omits the Foundry project ARM resource and generates references to wire `FOUNDRY_PROJECT_ENDPOINT`, `AZURE_RESOURCE_GROUP`, and tenant/subscription/location from the existing -project (resolved at deploy time via the endpoint). It still synthesizes +project (resolved at provision time via the endpoint). It still synthesizes ARM-backed children declared inline — additional model `deployments[]`, ACR if any nested agent has a `docker:` block. The `useExistingAiProject` ternary collapses to a single field-presence check. 
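The field-presence check and the endpoint shape check from validation step 5 can be sketched as below. This is a minimal illustration, not the extension's code: `looksLikeFoundryEndpoint` and `createsProjectResource` are hypothetical names, and the URL shape `https://<account>.services.ai.azure.com/api/projects/<project>` is an assumption (the spec also allows "or equivalent" forms, which this sketch does not cover).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// looksLikeFoundryEndpoint is a hypothetical helper, not part of the spec.
// Assumed shape: https://<account>.services.ai.azure.com/api/projects/<project>.
// It checks syntax only; reachability stays a later check.
func looksLikeFoundryEndpoint(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil || u.Scheme != "https" {
		return false
	}
	const hostSuffix = ".services.ai.azure.com"
	host := u.Hostname()
	// Require a non-empty account label in front of the suffix.
	if !strings.HasSuffix(host, hostSuffix) || len(host) == len(hostSuffix) {
		return false
	}
	const pathPrefix = "/api/projects/"
	if !strings.HasPrefix(u.Path, pathPrefix) {
		return false
	}
	// Exactly one non-empty path segment may follow the prefix.
	project := strings.Trim(strings.TrimPrefix(u.Path, pathPrefix), "/")
	return project != "" && !strings.Contains(project, "/")
}

// createsProjectResource mirrors the single field-presence check:
// an empty endpoint means greenfield, so the project ARM resource is emitted.
func createsProjectResource(endpoint string) bool {
	return endpoint == ""
}

func main() {
	endpoints := []string{
		"",
		"https://contoso.services.ai.azure.com/api/projects/demo",
		"http://example.com",
	}
	for _, ep := range endpoints {
		shapeOK := ep == "" || looksLikeFoundryEndpoint(ep)
		fmt.Printf("endpoint=%q createProject=%t shapeOK=%t\n",
			ep, createsProjectResource(ep), shapeOK)
	}
}
```

Keeping the shape check purely syntactic matches the split described in the validation pipeline: synthesis never calls out to the network.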
The endpoint URL (not the ARM resource ID) is the user-facing identifier in -the reference doc, matching what `az` CLI and the Portal display. The deploy -verb resolves the ARM ID from the endpoint when it needs control-plane -access; synthesis treats `endpoint:` purely as a "skip ARM project creation" +the reference doc, matching what `az` CLI and the Portal display. **Both +`Deploy` and `Preview` resolve the ARM ID + target scope (subscription / +resource group) from the endpoint** before invoking ARM — `Preview` needs +the scope to pick `WhatIfAtSubscriptionScope` vs `WhatIfAtResourceGroupScope`. +Synthesis itself treats `endpoint:` purely as a "skip ARM project creation" signal. ## Eject Command (`azd ai agent init --infra`) @@ -319,14 +339,13 @@ Infra-only operation. Four contexts: | Context | Behavior | | ----------------------------------------- | ---------------------------------------------------------------------------------------------- | | Empty directory | Run init normally + write `./infra/` from synthesis. | -| Existing Bicep-less azd agent project | Synthesize current `azure.yaml`; write `./infra/`. Do not re-prompt; do not touch agent code. Do not modify `azure.yaml` (`infra.provider:` stays `microsoft.foundry`). | -| Existing on-disk project (`./infra/` exists) | Refuse to overwrite. Print: *"`./infra/` already exists. To regenerate from `azure.yaml`, delete the `infra/` directory and run the command again."* | +| Existing Bicep-less azd agent project | Synthesize current `azure.yaml`; write `./infra/`. Do not re-prompt for agent init; do not touch agent code. Do not modify `azure.yaml` (`infra.provider:` stays `microsoft.foundry`). | +| Existing on-disk project (`./infra/` exists) | Interactive: prompt *"`./infra/` exists. Overwrite with regenerated files? [y/N]"*. Yes → overwrite. No / `--no-prompt` → refuse with the same error message used today. Matches `azd infra generate` (`cli/azd/cmd/infra_generate.go:204-210`). 
| | Not an azd agent project | Refuse: "no `host: microsoft.foundry` service found in `azure.yaml`; nothing to eject." | Eject is **all-or-nothing for the whole project** — no partial mode where -some agents synthesize and others sit on disk. To regenerate, the user -deletes `./infra/` themselves and re-runs the command. No `--force`, no -implicit destruction of user-owned files. +some agents synthesize and others sit on disk. CI / non-interactive +(`--no-prompt`) callers cannot overwrite; they must delete `./infra/` first. Example output: @@ -346,15 +365,23 @@ Next steps: azd provision Apply changes ``` -Refused: +Overwrite prompt: ``` > azd ai agent init --infra +? ./infra/ already exists. Overwrite with regenerated files? [y/N] +``` + +Refused (under `--no-prompt`, or user answered No): + +``` +> azd ai agent init --infra --no-prompt + Error: ./infra/ already exists. -If you want to regenerate from azure.yaml, delete the infra directory -and run the command again. +Re-run without --no-prompt to choose interactively, or delete the infra +directory and re-run. ``` ## Post-Eject CLI Behavior @@ -367,25 +394,31 @@ needing ACR), but on-disk Bicep doesn't have it. 
| ------------------------------------------------------------ | ------------------------- | ------------------------------------------------------------------------------------ | | Modifies data-plane only (`add tool`, `add toolbox`) | Apply normally | Apply normally — nothing in Bicep changes | | Modifies `azure.yaml` requiring new ARM resources | Apply; next `provision` synthesizes the new resources | Apply to `azure.yaml` and warn: "your project uses on-disk Bicep; delete `./infra/` and run `azd ai agent init --infra` to regenerate, or edit `infra/` manually" | -| Eject (`init --infra`) | Allowed | Refused — user must delete `./infra/` and re-run | +| Eject (`init --infra`) | Allowed | Prompts for overwrite (or refuses under `--no-prompt`); see Eject Command section | CLI never silently patches user-owned Bicep. -**Accepted trade.** Post-eject, the user-driven `rm -rf ./infra/ && azd ai -agent init --infra` flow throws away any hand-edits the user made. We pick -this over auto-diff/merge (which would re-introduce silent rewrites of -user-owned files) and over refusing `add` post-eject (which would gut the -CLI for ejected projects). Auto-merge is future work, out of scope here. +**Accepted trade.** Whether the user re-ejects via the interactive prompt +or by deleting `./infra/` themselves, regeneration throws away any +hand-edits. We pick this over auto-diff/merge (which would re-introduce +silent rewrites of user-owned files) and over refusing `add` post-eject +(which would gut the CLI for ejected projects). Auto-merge is future +work, out of scope here. 
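The eject contexts and the overwrite rules above reduce to a small decision function. The sketch below is illustrative only: `decideEject`, `resolvePrompt`, and the `ejectAction` values are hypothetical names, and the empty-directory case is assumed to run init first and then fall into the plain write branch.

```go
package main

import "fmt"

// ejectAction is a hypothetical enum for what `init --infra` does next.
type ejectAction string

const (
	actionWrite  ejectAction = "write"  // synthesize and write ./infra/
	actionPrompt ejectAction = "prompt" // ask before overwriting ./infra/
	actionRefuse ejectAction = "refuse" // exit with an error, touch nothing
)

// decideEject maps the contexts in the eject table to an action.
func decideEject(hasFoundryService, infraExists, noPrompt bool) ejectAction {
	switch {
	case !hasFoundryService:
		return actionRefuse // not an azd agent project: nothing to eject
	case !infraExists:
		return actionWrite // Bicep-less project: write ./infra/
	case noPrompt:
		return actionRefuse // CI cannot overwrite; delete ./infra/ first
	default:
		return actionPrompt // interactive: ask [y/N]
	}
}

// resolvePrompt turns the user's answer into the final action.
func resolvePrompt(answeredYes bool) ejectAction {
	if answeredYes {
		return actionWrite
	}
	return actionRefuse
}

func main() {
	fmt.Println(decideEject(true, false, false)) // Bicep-less project
	fmt.Println(decideEject(true, true, false))  // ./infra/ exists, interactive
	fmt.Println(decideEject(true, true, true))   // ./infra/ exists, --no-prompt
	fmt.Println(decideEject(false, false, false))
}
```

Because every branch either writes the whole `./infra/` tree or writes nothing, the all-or-nothing property holds by construction.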
## Core Changes Required -Relax the `infra.provider` enum in the schemas so `microsoft.foundry` is -runtime-valid in IDE validation: +Relax the `infra.provider` enum in the schemas so extension-provided +providers like `microsoft.foundry` pass IDE schema validation (they are +already runtime-valid), while still rejecting malformed values: -| File | Change | -| ------------------------------------- | -------------------------------------------------------------- | -| `schemas/v1.0/azure.yaml.json:44-52` | Change `enum: ["bicep","terraform"]` → `examples: [...]` | -| `schemas/alpha/azure.yaml.json:44-52` | Same | +| File | Change | +| ------------------------------------- | --------------------------------------------------------------------------------------------------- | +| `schemas/v1.0/azure.yaml.json:44-52` | Replace `enum: ["bicep","terraform"]` with `pattern: "^[a-z0-9.]+$"` + `examples: ["bicep","terraform","microsoft.foundry"]`. `pattern` still rejects malformed values (e.g. `Bicep` or `bi cep`), though a same-alphabet typo such as `biceps` matches it; `examples` keeps autocomplete. | +| `schemas/alpha/azure.yaml.json:44-52` | Same | + +Note that `examples` alone is purely advisory in JSON Schema; without the +`pattern`, dropping the enum would silently accept any string for *all* +users, not just Foundry users. > **Note on `uses` / `runtime`:** the original RFC asked Core to surface > `services..uses` and a typed `services..runtime` on the @@ -441,19 +474,20 @@ Method behaviors: | ----------------- | ------------------------------------------------------------------------------ | | `Initialize` | Validate `azure.yaml` (5-step pipeline above); resolve env vars | | `State` | Query ARM for last deployment; return outputs | -| `Deploy` | If `./infra/` exists, read from disk; else synthesize. Apply via ARM SDK. | -| `Preview` | Synthesize (or read from disk), then call ARM What-If; return diff summary. Mirrors Core's Bicep provider. 
| -| `Destroy` | Delete resource group or use deployment stacks | +| `Deploy` | If `./infra/` exists, read from disk *and* synthesize in-memory; warn (don't block) when the ARM-relevant fields disagree. Apply via ARM SDK. If `./infra/` missing, synthesize and apply. | +| `Preview` | Resolve target scope (from `endpoint:` for brownfield, from azd env otherwise); synthesize (or read from disk); call ARM What-If at the right scope; return diff summary. Mirrors Core's Bicep provider. | +| `Destroy` | Delete resource group (or use deployment stacks), **then purge soft-deleted Cognitive Services accounts (Foundry projects)** so the same name can be re-provisioned. Mirrors Core's Bicep provider: `getCognitiveAccountsToPurge` + `purgeItems` (`cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go:1283-1413`). Without purge, `up → down → up` under the same name fails on the second provision. | | `EnsureEnv` | Prompt for required env vars (subscription, location) if missing | | `Parameters` | Return parameter list from synthesized/on-disk template | | `PlannedOutputs` | Return output list from synthesized/on-disk template | ## Stability Contract -Synthesis output is best-effort stable within a patch extension version. -Same `azure.yaml` → semantically identical Bicep. Across minor versions, -the output may change; documented in the changelog with recommendation to run -`azd provision --preview` after upgrades. +Synthesis output is **byte-stable within a patch extension version** — +the same `azure.yaml` produces byte-identical Bicep across patch releases, +matched by the synthesizer-determinism test in Test Plan. Across minor +versions the output may change; documented in the changelog with a +recommendation to run `azd provision --preview` after upgrades. ## Telemetry @@ -464,6 +498,9 @@ the output may change; documented in the changelog with recommendation to run Lets us measure eject rate and confirm the Bicep-less default sticks. 
+Implementation PRs must also add both fields to +`docs/reference/telemetry-data.md` per `cli/azd/AGENTS.md:246-249`. + ## Downstream Impact - **`Azure-Samples/azd-ai-starter-basic`** — retired as init target. Repo stays @@ -489,11 +526,15 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. ## Open Questions -1. Should the extension's `Deploy()` warn when both `./infra/` exists and - `azure.yaml` config has changed since last eject? (Drift detection.) - **Proposal:** no detection — matches "on-disk Bicep is the source of - truth"; CLI `add` commands already warn at the entry point. Revisit when - auto-merge lands. +1. How aggressive should drift detection be at `Deploy()` when both + `./infra/` exists and `azure.yaml` has changed since eject? + **Proposal:** warn (don't block) on every on-disk `Deploy()`. The + extension owns both branches (`synthesizeFromYAML` and `readFromDisk`), + so it can synthesize in-memory, diff the ARM-relevant fields against + what `./infra/main.bicep` declares, and print a one-line warning when + they disagree. Catches hand-edits to `azure.yaml` that bypass every + `add`-command warning. Resolution (block vs warn vs silent) is the + open question. ## Test Plan @@ -504,15 +545,22 @@ Lets us measure eject rate and confirm the Bicep-less default sticks. 
`skills:`, `routines:`, agent-level `tools:`/`skill:`, `$ref:`) without error - Unit: `${{...}}` passes through synthesis unchanged; `${VAR}` resolves +- Unit: schema `pattern` rejects malformed values (`provider: Bicep` and + `provider: bi cep` fail) while accepting `bicep`, `terraform`, and + `microsoft.foundry` - Integration: `azd ai agent init` produces no `./infra/` - Integration: `azd provision` succeeds with synthesized templates against a Foundry project with one container agent (ACR included) and one code-deploy agent (no ACR) - Integration: `azd ai agent init --infra` writes `./infra/`; next `azd provision` reads from disk (verified via extension log) -- Integration: brownfield `endpoint:` skips ARM project creation -- E2E: `init` → `provision` → `down` on a single-agent project (deploy is - out of scope for this spec) +- Integration: `azd ai agent init --infra` over an existing `./infra/` + prompts; `--no-prompt` refuses; "Yes" answer overwrites +- Integration: brownfield `endpoint:` skips ARM project creation; `Preview` + resolves the endpoint to a target scope and calls What-If at that scope +- Integration: post-eject `Deploy()` warns (does not block) when + on-disk Bicep disagrees with synthesized output +- E2E: `init` → `provision` → `down` → `provision` under the same name + succeeds (verifies soft-delete purge of Cognitive Services accounts) - E2E: `init --infra` → manual edit of `infra/main.bicep` → `provision` applies the edit - Regression: projects created by prior extension versions with on-disk Bicep