
fix(helm): use native gRPC probe on relay :4222 and bump default to v1.17.15#2282

Open
pesarkhobeee wants to merge 2 commits into microsoft:main from pesarkhobeee:fix/hubble-relay-native-grpc-probe

Conversation

@pesarkhobeee

Closes #2165

@nddq external contributors can't self-assign on this repo — could you assign #2165 to me, please?

Summary

  1. templates/hubble-relay/deployment.yaml — replace the conditional grpc_health_probe exec / native gRPC probe branching with a single native grpc: probe pointed at the relay's dedicated health server on :4222. Works regardless of hubble.tls.enabled (the health listener is plaintext; only the main API on listenPort is TLS-wrapped). Mirrors what upstream Cilium's chart adopted after cilium/cilium#37806.
  2. values.yaml — bump default hubble.relay.image from mcr.microsoft.com/oss/cilium/hubble-relay:v1.15.0 to quay.io/cilium/hubble-relay:v1.17.15 (lowest currently-supported Cilium minor at its latest patch, smallest jump that lands on a maintained line). Refs: v1.17.15 release notes, available tags.

Why

With the chart's default hubble.tls.enabled: true, the probe template selects the exec grpc_health_probe branch. Upstream Cilium removed that binary from the relay image (backported to all supported branches), so any v1.16+ tag fails the startup probe with executable file not found in $PATH and the rollout stalls. Reproduced against v1.17.15.

Probing :4222 instead of listenPort (4245) avoids this without forcing operators to disable TLS or ship a custom relay image — the health server is always plaintext regardless of the main API's TLS state.
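For illustration, a native gRPC probe against the health port might look like the fragment below. This is a minimal sketch, not the PR's actual diff: only `grpc.port: 4222` comes from the description above, the timing fields are illustrative assumptions, and applying the same block to the readiness and liveness probes is also an assumption.

```yaml
# Fragment of the hubble-relay container spec (sketch only).
# The kubelet performs the gRPC health check itself, so no
# grpc_health_probe binary is needed inside the relay image.
startupProbe:
  grpc:
    port: 4222            # relay's dedicated plaintext health server
  failureThreshold: 20    # assumed value, not taken from the PR
  periodSeconds: 3        # assumed value, not taken from the PR
readinessProbe:
  grpc:
    port: 4222
  timeoutSeconds: 3       # assumed value, not taken from the PR
```

Because the kubelet's built-in gRPC probe speaks plaintext, it suits the health listener on :4222 but not the TLS-wrapped API on `listenPort`.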

The chart's hubble-relay probe template branches between a native gRPC
probe and exec'ing `grpc_health_probe`, gated on Kubernetes version and
`hubble.tls.enabled`. With the default `hubble.tls.enabled: true` the
exec branch is selected, but upstream Cilium removed the
`grpc_health_probe` binary from the hubble-relay image in
cilium/cilium#37806, so any chart user pinning a v1.16+ relay image
hits `executable file not found in $PATH` on every startup probe and
the rollout stalls.

Switch the probe to the relay's dedicated gRPC health server on :4222,
which is always plaintext regardless of `hubble.tls.enabled` (TLS only
applies to the main API listener on `listenPort`). This matches the
approach upstream Cilium's chart has used since it dropped the
binary, and works for both TLS-enabled and TLS-disabled deployments
without forcing operators to disable TLS just to upgrade the relay
image.

Refs: microsoft#2165
Signed-off-by: pesarkhobeee <ahmadian.farid.1988@gmail.com>

Switch the chart's default hubble-relay image from
mcr.microsoft.com/oss/cilium/hubble-relay:v1.15.0 to the official
upstream registry quay.io/cilium/hubble-relay at v1.17.15. v1.17 is the
lowest currently-supported Cilium minor and v1.17.15 is its latest
patch, which keeps the jump from the previous default (1.15.0) as small
as possible while landing on a maintained release line.

Refs: microsoft#2165
Signed-off-by: pesarkhobeee <ahmadian.farid.1988@gmail.com>
@pesarkhobeee pesarkhobeee requested a review from a team as a code owner May 4, 2026 22:08
@pesarkhobeee pesarkhobeee requested review from nddq and snguyen64 May 4, 2026 22:08
@pesarkhobeee

Author

@microsoft-github-policy-service agree [company="goflink"]

@pesarkhobeee

Author

@microsoft-github-policy-service agree company="goflink"

repository: "mcr.microsoft.com/oss/cilium/hubble-relay"
tag: "v1.15.0"
digest: "sha256:19cd56e7618832257bf88b2f281287cb57f9f7fcb9e04775a6198d4bc4daffae"
repository: "quay.io/cilium/hubble-relay"

Member

I think we should change all Hubble-related repositories to quay.io as well, since the MCR images are deprecated.

- -rpc-timeout=5s
{{- end }}
{{- end }}
port: 4222

Member

Let's parameterize this in values.yaml.
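One way the hard-coded port could be parameterized is sketched below. The key name `hubble.relay.healthPort` is hypothetical and does not appear in the PR as shown here; whatever name is chosen, its value has to match the port the relay's health server actually listens on.

```yaml
# values.yaml (sketch; `healthPort` is a hypothetical key name)
hubble:
  relay:
    healthPort: 4222

# templates/hubble-relay/deployment.yaml (sketch of the probe fragment)
# startupProbe:
#   grpc:
#     port: {{ .Values.hubble.relay.healthPort }}
```

The template line is shown as a comment so both files can sit in one block. In the chart itself it would be plain template text.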

repository: "quay.io/cilium/hubble-relay"
tag: "v1.17.15"
digest: "sha256:60dcac76e5841a14d5c4813377cb463822db78568146e8c93ffc5b5cc0e894fb"
useDigest: false

Member

let's flip this to true to leverage the sha digest
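With the flag flipped, the values shown in the diff above would read as follows. The rendered image reference in the comment assumes the chart composes images in the usual `repository:tag@digest` form, which is not confirmed by anything quoted on this page.

```yaml
# values.yaml (sketch of the hubble.relay.image block after the change)
repository: "quay.io/cilium/hubble-relay"
tag: "v1.17.15"
digest: "sha256:60dcac76e5841a14d5c4813377cb463822db78568146e8c93ffc5b5cc0e894fb"
useDigest: true
# Assumed rendered reference:
#   quay.io/cilium/hubble-relay:v1.17.15@sha256:60dcac76e5841a14d5c4813377cb463822db78568146e8c93ffc5b5cc0e894fb
```

Pinning by digest makes the pulled image immutable even if the `v1.17.15` tag is later re-pushed, at the cost of having to update the digest on every version bump.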

@slariviere

Contributor

@pesarkhobeee I've addressed the review comments in pesarkhobeee#1 if you'd like to take a look

Development

Successfully merging this pull request may close these issues.

hubble-relay image v1.15.0 missing h2 ALPN support that breaks hubble CLI (grpc-go >= 1.67)