Problem
The RewriteImageRegistry() function (pkg/config/settings_v1.go:191-216) decides whether to rewrite a container image reference based on whether the image basename starts with scion-. This is a naming-convention heuristic, and it breaks when template authors provide fully-qualified scion-* images from external registries.
Example failure:
- Template sets image: ghcr.io/myorg/scion-elixir:latest
- Broker has image_registry: us-central1-docker.pkg.dev/my-project/scion-images
- Rewrite produces: us-central1-docker.pkg.dev/my-project/scion-images/scion-elixir:latest
- The image doesn't exist in the broker's registry → agent fails to start
The template author's intentional, fully-qualified image reference is silently destroyed by the rewrite.
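The failure can be reproduced with a minimal sketch of the prefix heuristic as described in this issue. The function name rewriteByPrefix and its signature are hypothetical; this is a reconstruction from the description above, not the code in pkg/config/settings_v1.go.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteByPrefix is a hypothetical stand-in for the behavior described in
// this issue: if the image basename starts with "scion-", everything before
// the basename is replaced with the broker's registry.
func rewriteByPrefix(image, registry string) string {
	if registry == "" {
		return image
	}
	// The basename is everything after the last slash (the whole string if
	// there is no slash, because LastIndex returns -1).
	base := image[strings.LastIndex(image, "/")+1:]
	if !strings.HasPrefix(base, "scion-") {
		return image
	}
	return strings.TrimSuffix(registry, "/") + "/" + base
}

func main() {
	registry := "us-central1-docker.pkg.dev/my-project/scion-images"
	// The template's ghcr.io/myorg prefix is discarded by the rewrite.
	fmt.Println(rewriteByPrefix("ghcr.io/myorg/scion-elixir:latest", registry))
}
```

Under these assumptions, the external registry and organization are dropped purely because the basename happens to start with scion-.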
What PR GoogleCloudPlatform#425 does
PR GoogleCloudPlatform#425 adds image_pinned: true to ScionConfig as an opt-out flag. When set in a template's scion-agent.yaml, the registry rewrite is skipped.
Assessment: This solves the immediate problem but has drawbacks:
- Misleading name: image_pinned sounds like version pinning, not "skip registry rewrite"
- Manual escape hatch: every template with a custom scion-* image must remember to set this flag
- Doesn't fix the heuristic: the scion- prefix convention remains the decision mechanism; the flag just patches around it
The real issue
The scion- prefix is being used as a proxy for "this image is managed by the broker and should be portable across registries." But scion- is a naming convention, not an ownership signal. A custom image can legitimately use the scion- prefix while being hosted outside the broker's registry.
The resolution chain itself is well-designed (template overrides harness-config, CLI overrides both). The problem is narrowly in the registry rewrite heuristic.
Alternative proposals
Alternative A: Format-based detection (recommended)
Only rewrite images that do not name a registry: bare names and relative paths without a domain. If an image contains a registry domain, treat it as an intentional fully-qualified reference and don't rewrite.
scion-claude:latest → rewrite (bare name, no registry)
library/scion-claude:latest → rewrite (no domain, treated as relative)
ghcr.io/myorg/scion-elixir:latest → keep (has registry domain)
us-central1-docker.pkg.dev/.../scion-base:v2 → keep (has registry domain)
Detection: a registry domain is present if the part before the first / contains a . or : (following Docker/OCI convention).
Pros: Follows Docker's own semantics. No new config fields. Backwards-compatible for the common case (bare scion-* names get rewritten; qualified references don't). Template authors don't need to learn a new flag.
Cons: Changes behavior for any existing template that specifies a fully-qualified scion-* image AND expects it to be rewritten (this would be unusual — why specify a full registry if you want it replaced?).
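A minimal sketch of the detection rule for Alternative A follows. The names hasRegistryDomain and rewriteBareImage are hypothetical, and two details are assumptions not stated in this issue: localhost is treated as a registry host (Docker's reference parser does this), and a relative path such as library/ is dropped on rewrite, mirroring the basename behavior of the current heuristic.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRegistryDomain reports whether the part before the first "/" looks like
// a registry host, i.e. contains "." or ":". The "localhost" case is an
// assumption borrowed from Docker's reference parsing.
func hasRegistryDomain(image string) bool {
	first, _, found := strings.Cut(image, "/")
	if !found {
		// No slash at all: a bare name. Any ":" here separates the tag
		// (scion-claude:latest), so it must not be read as a port.
		return false
	}
	return strings.ContainsAny(first, ".:") || first == "localhost"
}

// rewriteBareImage rewrites only scion-* images that carry no registry domain.
func rewriteBareImage(image, registry string) string {
	if registry == "" || hasRegistryDomain(image) {
		return image
	}
	base := image[strings.LastIndex(image, "/")+1:]
	if !strings.HasPrefix(base, "scion-") {
		return image
	}
	// Assumption: a relative path prefix such as "library/" is dropped.
	return strings.TrimSuffix(registry, "/") + "/" + base
}

func main() {
	registry := "us-central1-docker.pkg.dev/my-project/scion-images"
	for _, image := range []string{
		"scion-claude:latest",
		"library/scion-claude:latest",
		"ghcr.io/myorg/scion-elixir:latest",
	} {
		fmt.Printf("%s -> %s\n", image, rewriteBareImage(image, registry))
	}
}
```

One detail worth covering in tests: a bare name with a tag contains a ":" but no "/", so the check has to require a slash before inspecting the first component.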
Alternative B: Template variable interpolation
Let templates use image: ${image_registry}/scion-elixir:latest. Templates wanting broker-portable images use the variable; templates wanting fixed images use a literal. Eliminates the rewrite function entirely in favor of explicit intent.
Pros: Fully explicit, no heuristic at all.
Cons: Requires a variable system, breaking change for existing templates.
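As a rough illustration of Alternative B, the interpolation could be built on the standard library's os.Expand. The function expandImage and its error handling are hypothetical; the only variable recognized here is image_registry, as in the example above.

```go
package main

import (
	"fmt"
	"os"
)

// expandImage substitutes ${image_registry} in a template image string.
// Literal references contain no "$" and pass through unchanged.
// Hypothetical helper: name, signature, and error policy are assumptions.
func expandImage(image, registry string) (string, error) {
	var err error
	out := os.Expand(image, func(name string) string {
		switch {
		case name != "image_registry":
			err = fmt.Errorf("unknown variable %q in image %q", name, image)
		case registry == "":
			err = fmt.Errorf("image %q needs image_registry, but none is configured", image)
		}
		return registry
	})
	if err != nil {
		return "", err
	}
	return out, nil
}

func main() {
	registry := "us-central1-docker.pkg.dev/my-project/scion-images"
	for _, image := range []string{
		"${image_registry}/scion-elixir:latest", // broker-portable
		"ghcr.io/myorg/scion-elixir:latest",     // fixed literal
	} {
		resolved, err := expandImage(image, registry)
		fmt.Println(resolved, err)
	}
}
```

Note that os.Expand also accepts the $image_registry form without braces; a real implementation might want to restrict the syntax.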
Alternative C: Rename and restructure the PR GoogleCloudPlatform#425 approach
Accept the opt-out mechanism but with a clearer name (skip_image_registry_rewrite or similar) and consider nesting under an image config struct for extensibility.
Pros: Minimal change from current PR.
Cons: Still a manual escape hatch per-template.
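One possible shape for Alternative C, sketched with hypothetical names: the ImageConfig struct, its fields, the YAML keys, and resolveImage are illustrative only and do not reflect PR GoogleCloudPlatform#425 or the existing ScionConfig. The registry used in main is a placeholder.

```go
package main

import "fmt"

// ImageConfig is a hypothetical nested image section. Grouping image options
// in one struct leaves room for later additions without new top-level flags.
type ImageConfig struct {
	Name                string `yaml:"name"`
	SkipRegistryRewrite bool   `yaml:"skip_registry_rewrite,omitempty"`
}

// resolveImage applies the given rewrite function unless the template opted out.
func resolveImage(cfg ImageConfig, rewrite func(string) string) string {
	if cfg.SkipRegistryRewrite {
		return cfg.Name
	}
	return rewrite(cfg.Name)
}

func main() {
	// Placeholder rewrite that prepends a made-up registry.
	rewrite := func(name string) string { return "registry.example.com/images/" + name }

	fmt.Println(resolveImage(ImageConfig{Name: "scion-claude:latest"}, rewrite))
	fmt.Println(resolveImage(ImageConfig{
		Name:                "ghcr.io/myorg/scion-elixir:latest",
		SkipRegistryRewrite: true,
	}, rewrite))
}
```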
Recommendation
Alternative A (format-based detection) addresses the root cause with minimal complexity. It's the approach Docker itself uses to distinguish "pull from default registry" vs "pull from specified registry." The change is localized to RewriteImageRegistry() and its test suite.
If Alternative A is too risky for existing deployments, Alternative C (rename + restructure PR GoogleCloudPlatform#425) is a reasonable interim step.
References
- feat: add image_pinned to skip registry rewrite for custom images
- RewriteImageRegistry(): pkg/config/settings_v1.go:191-216
- pkg/agent/run.go:206-335
- pkg/agent/provision.go:616-700
- MergeScionConfig(): pkg/config/templates.go:668-827