Image registry rewrite uses unreliable scion-* heuristic; consider format-based detection #265

Description

@ptone

Problem

The RewriteImageRegistry() function (pkg/config/settings_v1.go:191-216) decides whether to rewrite a container image reference based on whether the image basename starts with scion-. This is a naming-convention heuristic, and it breaks when template authors provide fully-qualified scion-* images from external registries.

Example failure:

  • Template sets image: ghcr.io/myorg/scion-elixir:latest
  • Broker has image_registry: us-central1-docker.pkg.dev/my-project/scion-images
  • Rewrite produces: us-central1-docker.pkg.dev/my-project/scion-images/scion-elixir:latest
  • The image doesn't exist in the broker's registry → agent fails to start

The template author's intentional, fully-qualified image reference is silently destroyed by the rewrite.
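
To make the failure concrete, here is a minimal Go sketch of a prefix-based rewrite. It is a reconstruction from the description above, not the actual code in pkg/config/settings_v1.go; the name rewriteByPrefix and its signature are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteByPrefix is a hypothetical stand-in for the heuristic described in
// this issue: rewrite whenever the image basename starts with "scion-".
func rewriteByPrefix(image, registry string) string {
	if registry == "" {
		return image
	}
	// Basename is everything after the last "/" (the whole string if none).
	base := image[strings.LastIndex(image, "/")+1:]
	if !strings.HasPrefix(base, "scion-") {
		return image
	}
	// The original registry and path are discarded here.
	return strings.TrimSuffix(registry, "/") + "/" + base
}

func main() {
	registry := "us-central1-docker.pkg.dev/my-project/scion-images"
	fmt.Println(rewriteByPrefix("ghcr.io/myorg/scion-elixir:latest", registry))
	// Prints: us-central1-docker.pkg.dev/my-project/scion-images/scion-elixir:latest
}
```

The ghcr.io/myorg part is dropped because only the basename is inspected and carried over.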

What PR GoogleCloudPlatform#425 does

PR GoogleCloudPlatform#425 adds image_pinned: true to ScionConfig as an opt-out flag. When set in a template's scion-agent.yaml, the registry rewrite is skipped.

Assessment: This solves the immediate problem but has drawbacks:

  • Misleading name: image_pinned sounds like version pinning, not "skip registry rewrite"
  • Manual escape hatch — every template with a custom scion-* image must remember to set this flag
  • Doesn't fix the heuristic — the scion- prefix convention remains the decision mechanism; the flag just patches around it

The real issue

The scion- prefix is being used as a proxy for "this image is managed by the broker and should be portable across registries." But scion- is a naming convention, not an ownership signal. A custom image can legitimately use the scion- prefix while being hosted outside the broker's registry.

The resolution chain itself is well-designed (template overrides harness-config, CLI overrides both). The problem is narrowly in the registry rewrite heuristic.

Alternative proposals

Alternative A: Format-based detection (recommended)

Only rewrite images that are "bare" names (no registry/path prefix). If an image contains a registry domain, treat it as an intentional fully-qualified reference and don't rewrite.

scion-claude:latest                          → rewrite (bare name, no registry)
library/scion-claude:latest                  → rewrite (no domain, treated as relative)
ghcr.io/myorg/scion-elixir:latest            → keep (has registry domain)
us-central1-docker.pkg.dev/.../scion-base:v2 → keep (has registry domain)

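Detection: a registry domain is present if the image contains a / and the part before the first / contains a . or :, or is exactly localhost (following Docker/OCI convention). An image with no / at all is a bare name, even if it carries a :tag.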
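
The detection rule above could be sketched in Go roughly as follows. The function names are hypothetical, and two points are assumptions rather than part of the proposal: the sketch also treats localhost as a registry host, as Docker's reference parsing does, and it drops the scion- prefix check entirely, since the proposal does not say whether that check stays alongside the format check. It also keeps the relative path (library/...) when rewriting.

```go
package main

import (
	"fmt"
	"strings"
)

// hasRegistryDomain reports whether image begins with a registry host.
// Only the part before the first "/" is inspected, so the ":" of a tag on a
// bare name ("scion-claude:latest") does not count.
func hasRegistryDomain(image string) bool {
	first, _, found := strings.Cut(image, "/")
	if !found {
		return false // bare name, no path component at all
	}
	return strings.ContainsAny(first, ".:") || first == "localhost"
}

// rewriteByFormat prefixes registry onto images that carry no registry
// domain and leaves fully-qualified references untouched.
func rewriteByFormat(image, registry string) string {
	if registry == "" || hasRegistryDomain(image) {
		return image
	}
	return strings.TrimSuffix(registry, "/") + "/" + image
}

func main() {
	registry := "us-central1-docker.pkg.dev/my-project/scion-images"
	for _, image := range []string{
		"scion-claude:latest",               // rewritten
		"library/scion-claude:latest",       // rewritten
		"ghcr.io/myorg/scion-elixir:latest", // kept
		"localhost:5000/scion-dev:latest",   // kept
	} {
		fmt.Printf("%s -> %s\n", image, rewriteByFormat(image, registry))
	}
}
```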

Pros: Follows Docker's own semantics. No new config fields. Backwards-compatible for the common case (bare scion-* names get rewritten; qualified references don't). Template authors don't need to learn a new flag.

Cons: Changes behavior for any existing template that specifies a fully-qualified scion-* image AND expects it to be rewritten (this would be unusual — why specify a full registry if you want it replaced?).

Alternative B: Template variable interpolation

Let templates use image: ${image_registry}/scion-elixir:latest. Templates wanting broker-portable images use the variable; templates wanting fixed images use a literal. Eliminates the rewrite function entirely in favor of explicit intent.

Pros: Fully explicit, no heuristic at all.
Cons: Requires a variable system, and is a breaking change for existing templates that rely on the implicit rewrite.
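
As a rough illustration of Alternative B, the standard library's os.Expand can substitute ${name} placeholders from a map. The expandImage helper and the idea of passing variables as a plain map are assumptions for this sketch; the project has no such variable system today.

```go
package main

import (
	"fmt"
	"os"
)

// expandImage is a hypothetical helper that replaces ${name} placeholders
// in an image reference with values from vars. Unknown names expand to the
// empty string here; a real implementation would likely reject them.
func expandImage(image string, vars map[string]string) string {
	return os.Expand(image, func(name string) string {
		return vars[name]
	})
}

func main() {
	vars := map[string]string{
		"image_registry": "us-central1-docker.pkg.dev/my-project/scion-images",
	}
	// Portable: the template opts in to the broker's registry explicitly.
	fmt.Println(expandImage("${image_registry}/scion-elixir:latest", vars))
	// Fixed: a literal reference passes through unchanged.
	fmt.Println(expandImage("ghcr.io/myorg/scion-elixir:latest", vars))
}
```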

Alternative C: Rename and restructure the PR GoogleCloudPlatform#425 approach

Accept the opt-out mechanism but with a clearer name (skip_image_registry_rewrite or similar) and consider nesting under an image config struct for extensibility.

Pros: Minimal change from current PR.
Cons: Still a manual escape hatch per-template.
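
One possible shape for Alternative C, as a sketch only: the ImageConfig struct, its field names, the skip_registry_rewrite key, and resolveImage are all hypothetical and do not reflect what PR GoogleCloudPlatform#425 implements. The prefix heuristic is left in place, which is the limitation noted above.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageConfig is a hypothetical nested image block for scion-agent.yaml.
type ImageConfig struct {
	Name                string `yaml:"name"`
	SkipRegistryRewrite bool   `yaml:"skip_registry_rewrite"`
}

// resolveImage applies the existing scion- prefix heuristic unless the
// template has opted out of the rewrite.
func resolveImage(cfg ImageConfig, registry string) string {
	if cfg.SkipRegistryRewrite || registry == "" {
		return cfg.Name
	}
	base := cfg.Name[strings.LastIndex(cfg.Name, "/")+1:]
	if !strings.HasPrefix(base, "scion-") {
		return cfg.Name
	}
	return strings.TrimSuffix(registry, "/") + "/" + base
}

func main() {
	registry := "us-central1-docker.pkg.dev/my-project/scion-images"
	cfg := ImageConfig{Name: "ghcr.io/myorg/scion-elixir:latest", SkipRegistryRewrite: true}
	fmt.Println(resolveImage(cfg, registry)) // the template's reference is kept
}
```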

Recommendation

Alternative A (format-based detection) addresses the root cause with minimal complexity. It's the approach Docker itself uses to distinguish "pull from default registry" vs "pull from specified registry." The change is localized to RewriteImageRegistry() and its test suite.

If Alternative A is too risky for existing deployments, Alternative C (rename + restructure PR GoogleCloudPlatform#425) is a reasonable interim step.
