[FEATURE] metal-agent: honor InferenceService.spec.runtime per-CR (multi-runtime on one agent) #525

## Feature Description

Make `metal-agent` honor `InferenceService.spec.runtime` per-CR
instead of pinning all served models to its global `--runtime` flag.
A single metal-agent should be able to host a llama-server GGUF and
an mlx-server MLX model concurrently, selected by the ISvc CR.

## Problem Statement

Today `metal-agent` takes a single `--runtime` flag, and every model it serves uses that runtime:

- `cmd/metal-agent/main.go:217` sets `cfg.Runtime` from the flag,
  defaulting to `llama-server`.
- The dispatch switches at `pkg/agent/agent.go:294` and `:544` key off
  `a.config.Runtime` (the agent-global value), never `isvc.Spec.Runtime`.
- `isvc.Spec.Runtime` is read at `pkg/agent/agent.go:1314` but is only
  propagated forward for telemetry / status; the executor selection
  ignores it.

Operational consequence: if you want to serve GGUF (Carnice, phi-4-mini,
Qwen3 GGUFs) AND MLX-format models (Qwen3.6-35B MLX, future MLX
Phi-class) from the same Apple Silicon node, you need two metal-agents
on different ports, or you flip the launchd unit's `--runtime` flag
each time you switch model families and bounce the agent.

Surfaced 2026-05-24 during Foreman V3 demo prep: the M5 Max was
configured with `--runtime mlx-server` for the Swift mlx-server dogfood,
which prevented the same agent from serving the locked Carnice GGUF
coder model that V3 calls for. Workaround tonight: flip the M5 Max back
to `--runtime llama-server` and accept that mlx-server work pauses.

## Proposed Solution

1. **Runtime dispatch becomes per-ISvc.** `agent.go:294/544` switch
   on `isvc.Spec.Runtime` instead of `a.config.Runtime`. The agent
   maintains a registry of runtime executors keyed by runtime name.
2. **`--runtime` flag stays for back-compat as the default**: when
   `isvc.Spec.Runtime == ""`, fall back to `cfg.Runtime`. Existing CRs
   with no `spec.runtime` keep working.
3. **Capability advertisement extends to multi-runtime**: the
   metal-agent advertises `runtimes: [llama-server, mlx-server, ...]` in
   its FleetNode capability instead of a single `runtime: <one>` label.
   The scheduler / operator can pre-filter ISvc -> node matches by
   runtime support.
4. **Per-runtime binary flags stay agent-global** (`--mlx-server-bin`,
   `--llama-server-bin`, etc.). The CR picks the runtime; the agent
   knows where the binaries live.
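
A minimal sketch of points 1 and 2, to make the intended dispatch concrete. Everything here is illustrative: `Executor`, `staticExecutor`, `Registry`, `NewRegistry`, `Lookup`, and `Runtimes` are hypothetical names, not the agent's current types, and `resolveRuntime` takes the raw `spec.runtime` string rather than the real ISvc object so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// Executor is a hypothetical stand-in for whatever the agent uses to
// launch and manage one runtime (llama-server, mlx-server, ...).
type Executor interface {
	Name() string
}

// staticExecutor is a trivial Executor used only for illustration.
type staticExecutor struct{ name string }

func (e staticExecutor) Name() string { return e.name }

// Registry holds the executors enabled on this agent, keyed by runtime
// name, plus the agent-global default taken from --runtime.
type Registry struct {
	defaultRuntime string
	executors      map[string]Executor
}

// NewRegistry builds a registry from the agent default and the enabled executors.
func NewRegistry(defaultRuntime string, execs ...Executor) *Registry {
	r := &Registry{
		defaultRuntime: defaultRuntime,
		executors:      make(map[string]Executor, len(execs)),
	}
	for _, e := range execs {
		r.executors[e.Name()] = e
	}
	return r
}

// resolveRuntime applies the back-compat rule: an empty spec.runtime
// falls back to the agent-global default.
func (r *Registry) resolveRuntime(specRuntime string) string {
	if specRuntime == "" {
		return r.defaultRuntime
	}
	return specRuntime
}

// Lookup returns the executor for an ISvc's spec.runtime, or an error
// if the resolved runtime is not enabled on this agent.
func (r *Registry) Lookup(specRuntime string) (Executor, error) {
	name := r.resolveRuntime(specRuntime)
	exec, ok := r.executors[name]
	if !ok {
		return nil, fmt.Errorf("runtime %q not enabled on this agent (enabled: %v)", name, r.Runtimes())
	}
	return exec, nil
}

// Runtimes lists the enabled runtime names in sorted order, e.g. for
// the multi-runtime capability advertisement in point 3.
func (r *Registry) Runtimes() []string {
	names := make([]string, 0, len(r.executors))
	for name := range r.executors {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}
```

Returning an error for an unknown runtime, rather than silently falling back to the default, is a choice made for this sketch; the issue itself does not specify that behavior.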

### Concrete change shape

- `pkg/agent/agent.go`: new `Runtimes` field on the per-ISvc executor
  selection path; helper `resolveRuntime(isvc) string` does the
  defaulting.
- `cmd/metal-agent/main.go`: keep current flags; new optional flag
  `--runtimes llama-server,mlx-server` enables multi-runtime mode (or
  derive from which binary paths are populated).
- `pkg/agent/agent.go:1314`: stop being telemetry-only; feed
  `isvc.Spec.Runtime` into the executor lookup.
- Tests: extend `agent_test.go` table to cover empty `spec.runtime`
  (fall back to `cfg.Runtime`) and per-ISvc override.
- Doc / CHANGELOG mention.
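
One possible shape for the `--runtimes` handling in `cmd/metal-agent/main.go`, covering both options from the list above (explicit flag, or derivation from populated binary paths). The function name `enabledRuntimes` and the `binPaths` map (runtime name to binary path) are assumptions made for this sketch, not existing code.

```go
package main

import (
	"sort"
	"strings"
)

// enabledRuntimes returns the runtimes this agent should serve.
// If runtimesFlag (the proposed --runtimes value) is non-empty, it is
// split on commas, trimmed, and de-duplicated in flag order.
// Otherwise the list is derived from binPaths: any runtime with a
// non-empty binary path counts as enabled, returned in sorted order.
func enabledRuntimes(runtimesFlag string, binPaths map[string]string) []string {
	var out []string
	seen := make(map[string]bool)
	add := func(name string) {
		name = strings.TrimSpace(name)
		if name == "" || seen[name] {
			return
		}
		seen[name] = true
		out = append(out, name)
	}

	if strings.TrimSpace(runtimesFlag) != "" {
		for _, name := range strings.Split(runtimesFlag, ",") {
			add(name)
		}
		return out
	}

	// Map iteration order is random in Go, so sort for a stable result.
	names := make([]string, 0, len(binPaths))
	for name, path := range binPaths {
		if strings.TrimSpace(path) != "" {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	for _, name := range names {
		add(name)
	}
	return out
}
```

Giving the explicit flag precedence over derivation keeps the behavior predictable when a binary path is populated but the operator does not want that runtime served; whether that precedence is desirable is an open design question for the PR.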

### Out of scope for this issue

- New runtimes (vllm, sglang, etc.). This is purely about respecting
  the existing `spec.runtime` field at dispatch time.
- Per-ISvc binary path overrides (e.g. one ISvc using a custom
  mlx-server build). v0.2 problem.

## Additional Context

- Related to the namespace-partition design (issue #524) -- both are
  about a single metal-agent gracefully hosting heterogeneous workloads.
- The launchd-flip workaround works for single-developer dogfooding
  but does not scale to a multi-engineer fleet where each node runs
  one agent serving multiple model families.

## Priority

- [x] Medium - Nice to have

(Bumps to High once we want a single Apple Silicon node serving
GGUF + MLX simultaneously, which is the natural next step for the
Foreman fleet story.)

## Willingness to Contribute

- [x] Yes, I can submit a PR

