feat(node-data-broker): run broker as main container with health probes by giuliocalzo · Pull Request #368 · NVIDIA/topograph

giuliocalzo · 2026-06-27T14:32:02Z

Description

Replace the node-data-broker chart's init container plus curl sleeper with a single main container running node-data-broker-initc, removing the dependency on the curlimages/curl image for this subchart.

node-data-broker runs node-data-broker-initc as the DaemonSet's main container. The binary applies node annotations once at startup, then serves /healthz on a configurable port (default 8080) until SIGTERM so the pod stays Running.
A startup probe gates liveness/readiness until /healthz serves, giving slow providers (e.g. infiniband ibnetdiscover) up to failureThreshold × periodSeconds (default 5m) to finish the initial apply.
Annotations are re-applied periodically via refreshInterval (default 5m; set to 0 to disable) so node metadata stays current without pod restarts. Failures on refresh are logged only.
The separate init container, the initc values block, the node-data-broker.initImage helper, and the tail -f /dev/null placeholder are removed; initc.extraArgs moves to top-level extraArgs.
Docs (docs/providers/infiniband.md) and helm-unittest suites/snapshots updated to match.

Complements #363 (node-observer in-process health wait).

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
All commits are signed off per DCO (git commit -s).

Test plan

go test ./cmd/node-data-broker-initc/...
make chart-test

github-actions · 2026-06-27T14:34:25Z

🌿 Preview your docs: https://nvidia-preview-pull-request-368.docs.buildwithfern.com/topograph

greptile-apps · 2026-06-27T14:35:38Z

Greptile Summary

This PR replaces the node-data-broker chart's two-container pattern (init container + tail -f /dev/null curl placeholder) with a single long-running main container running node-data-broker-initc. The binary applies node annotations at startup, then serves /healthz until SIGTERM, allowing Kubernetes startup/liveness/readiness probes to gate the pod lifecycle correctly.

The curlimages/curl placeholder container and the separate IB image variant (Dockerfile.ib, docker-ib.yml) are removed; InfiniBand tooling (ibnetdiscover) is folded into the main Alpine image via apk add rdma-core, and the nodeBroker struct holds a single kubernetes.Clientset created once at startup (not re-created on each refresh).
A configurable refreshInterval (default 5 minutes) re-applies node annotations periodically; startupProbe gates liveness/readiness until the initial apply completes, giving slow providers like ibnetdiscover up to failureThreshold × periodSeconds before being considered failed.
Helm values, subchart tests, snapshots, example values files, and documentation are all updated to match the new single-container design.

Confidence Score: 5/5

Safe to merge. The refactor is well-contained — the single-container design is simpler than the old init + curl pattern, shutdown is handled correctly via signal context, and the clientset is created once at startup rather than on each refresh.

The core logic (Get-then-Update node annotations, refresh loop, graceful HTTP shutdown) is correct and race-free. Periodic refreshes are strictly sequential within the loop. The IB tooling consolidation into the main Alpine image is valid — Alpine's rdma-core package includes ibnetdiscover. Tests cover the new health server, shutdown path, refresh interval, error continuation, and context cancellation. No data-loss or correctness issues were found.

No files require special attention.

Important Files Changed

Filename	Overview
cmd/node-data-broker-initc/main.go	Core logic refactor: adds nodeBroker struct, periodic refresh loop, HTTP health server, and graceful shutdown via signal context. Clientset is created once at startup. Logic is clean and safe.
cmd/node-data-broker-initc/main_test.go	New tests for health handler, graceful shutdown, refresh loop interval, error continuation, and context cancellation. Uses sync.Once to prevent double-close; coverage is solid.
charts/topograph/charts/node-data-broker/templates/daemonset.yaml	Init container removed; main container now runs node-data-broker-initc with startup/liveness/readiness probes. Port and refreshInterval args wired from values.
charts/topograph/charts/node-data-broker/values.yaml	initc block removed; port, refreshInterval, extraArgs, and startupProbe config added. Image now defaults to ghcr.io/nvidia/topograph matching the main chart.
Dockerfile	Adds `apk add --no-cache rdma-core` to the Alpine final stage; rdma-core on Alpine 3.x includes ibnetdiscover, merging the IB variant into the main image.
.github/workflows/docker-ib.yml	Deleted: IB-specific Docker workflow is no longer needed since rdma-core tooling is folded into the main image build.
charts/topograph/tests/subcharts_test.yaml	Helm unit tests updated: initContainers path replaced with containers[0], new tests for refreshInterval arg and startup probe values.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker-initc
    participant Provider as Cloud/IB Provider
    participant KAPI as Kubernetes API

    K8s->>NDB: start container
    NDB->>KAPI: newInClusterClientset()
    KAPI-->>NDB: clientset + config
    NDB->>Provider: getAnnotations(ctx) [initial apply]
    Provider-->>NDB: annotations map
    NDB->>KAPI: Nodes().Get(nodeName)
    KAPI-->>NDB: node object
    NDB->>KAPI: Nodes().Update(node + annotations)
    KAPI-->>NDB: updated node
    Note over NDB: startupProbe gates here
    NDB->>NDB: serveHealth(:8080)
    Note over K8s,NDB: startup probe passes → liveness/readiness active
    loop every refreshInterval (default 5m)
        NDB->>Provider: getAnnotations(ctx) [periodic refresh]
        Provider-->>NDB: annotations map
        NDB->>KAPI: Nodes().Get + Update
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: ctx cancelled → stop refresh loop
    NDB->>NDB: srv.Shutdown(5s timeout)
    NDB-->>K8s: exit 0

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker-initc
    participant Provider as Cloud/IB Provider
    participant KAPI as Kubernetes API

    K8s->>NDB: start container
    NDB->>KAPI: newInClusterClientset()
    KAPI-->>NDB: clientset + config
    NDB->>Provider: getAnnotations(ctx) [initial apply]
    Provider-->>NDB: annotations map
    NDB->>KAPI: Nodes().Get(nodeName)
    KAPI-->>NDB: node object
    NDB->>KAPI: Nodes().Update(node + annotations)
    KAPI-->>NDB: updated node
    Note over NDB: startupProbe gates here
    NDB->>NDB: serveHealth(:8080)
    Note over K8s,NDB: startup probe passes → liveness/readiness active
    loop every refreshInterval (default 5m)
        NDB->>Provider: getAnnotations(ctx) [periodic refresh]
        Provider-->>NDB: annotations map
        NDB->>KAPI: Nodes().Get + Update
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: ctx cancelled → stop refresh loop
    NDB->>NDB: srv.Shutdown(5s timeout)
    NDB-->>K8s: exit 0

_{Reviews (3): Last reviewed commit: "docs(infiniband): drop stale IB image re..." | Re-trigger Greptile}

Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

dmitsh · 2026-06-27T18:01:27Z

@giuliocalzo , there is one problem with this implementation.
In most cases, Topograph indeed uses curlimages/curl image for node-data-broker.
But for the InfiniBand-based deployments, it requires ghcr.io/nvidia/topograph/ib image instead.
This PR (as implemented now) will break IB clusters.

giuliocalzo · 2026-06-27T19:11:55Z

@dmitsh Good catch — you're right that #368 breaks IB clusters as-is if we drop the /ib image override without replacing what it provides.

I think we can fix this by baking IB tooling into the main Alpine image instead of maintaining a separate ghcr.io/nvidia/topograph/ib:

RUN apk add --no-cache rdma-core

On Alpine, rdma-core already includes ibnetdiscover (there is no separate infiniband-diags package like on Ubuntu). That gives node-data-broker both node-data-broker-initc and the binary Topograph execs into for infiniband-k8s, so IB example values could go back to the default ghcr.io/nvidia/topograph image.

Follow-up would be: delete Dockerfile.ib + the docker-ib workflow, update the IB Helm examples/snapshots, and smoke-test ibnetdiscover in a privileged broker pod on IB hardware (same /sys/class mount as today).

Does that approach work for you, or is there something the Ubuntu /ib image provides beyond ibnetdiscover that we'd lose on Alpine (e.g. ibutils)?

copy-pr-bot · 2026-06-27T19:12:12Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Replace the init container plus curl sleeper with a single container running node-data-broker-initc. The binary applies node annotations, serves /healthz, and re-applies annotations on a configurable refreshInterval (default 5m). Add a startup probe so slow providers can finish before liveness kicks in, move initc.extraArgs to top-level extraArgs, and update infiniband docs and helm-unittest coverage. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Install rdma-core in the Alpine runtime image so node-data-broker and infiniband-k8s no longer need ghcr.io/nvidia/topograph/ib. Remove Dockerfile.ib, the docker-ib workflow, and /ib overrides from Helm examples and docs. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

Document that the default topograph image includes ibnetdiscover via rdma-core, fix the node-data-broker init-container wording, and remove the obsolete IB/ubuntu variant note from chart values comments. Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>

giuliocalzo · 2026-06-28T13:49:48Z

@dmitsh Follow-up on the IB image concern — implemented in the latest commits.

Docker (d0214af)

Main Dockerfile now installs Alpine rdma-core (ships ibnetdiscover)
Removed Dockerfile.ib and the docker-ib workflow
IB/GCP Helm examples no longer override ghcr.io/nvidia/topograph/ib; they use the default ghcr.io/nvidia/topograph image

Docs (0ea7c82)

docs/providers/infiniband.md documents that the default image includes ibnetdiscover and that IB deployments still need privileged broker + /sys/class mount (see values.k8s.ib-example.yaml)
Stale references to the Ubuntu /ib variant removed from chart values comments and k8s docs

So node-data-broker now runs the same Topograph image for all providers, including infiniband-k8s. Remaining validation on real IB hardware: confirm ibnetdiscover works in a privileged broker pod with the existing host mounts.

Let me know if anything from the old Ubuntu /ib image (e.g. ibutils) is still required on Alpine.

dmitsh · 2026-06-28T18:36:13Z

Hi @giuliocalzo ,
I have two small comments.

Should we rename cmd/node-data-broker-initc to cmd/node-data-broker, since we got rid of the init container?
There might be some concurrency issue, when Topograph is reading topology node annotations before node-data-broker applies them. Maybe we should wait until node-data-broker's pods are ready? Just keep in mind that node-data-broker is optional, so it might be not present.

IMO, having an init container was more clean way to implement. It also allows to use different image in the future, if we need to support new switch vendor.

giuliocalzo · 2026-06-28T19:28:19Z

@dmitsh Thanks for the follow-up.

1. Rename cmd/node-data-broker-initc

Agreed — with the init container gone, the -initc suffix is misleading. I'll rename to cmd/node-data-broker (and the binary to node-data-broker) in a follow-up commit on this PR (even in this one).

2. Startup race (Topograph reading annotations before broker applies them)

Good point. I'm happy to add a wait until node-data-broker pods are Ready before the first topology request, with the caveat that node-data-broker is optional — when the subchart is disabled or not deployed, Topograph should proceed as today without blocking.

3. Init container vs single main container

My preference is the single long-running container model: node-data-broker applies annotations, serves /healthz, and refreshes periodically — closer to a small operator than a one-shot init + placeholder pod. That keeps lifecycle, probes, and retries in one place and removes the curlimages/curl sleeper.

That said, I'm flexible on the shape. The init-container pattern does make it easier to swap images per vendor without touching the main runtime image. If we want to preserve that flexibility long term, we could revisit — but for now the unified image (rdma-core in the main Alpine build) plus optional image override in Helm values still covers IB without a separate /ib publish path.

I'll proceed with the rename and the broker-readiness gate unless you'd prefer to keep the init-container design for this PR.

giuliocalzo requested review from dmitsh and ravisoundar as code owners June 27, 2026 14:32

giuliocalzo mentioned this pull request Jun 27, 2026

feat(node-observer): wait for topograph health in-process #363

Closed

6 tasks

greptile-apps Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread cmd/node-data-broker-initc/main_test.go

Comment thread cmd/node-data-broker-initc/main.go Outdated

giuliocalzo marked this pull request as draft June 27, 2026 19:12

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 12f1918 to dbaa0fc Compare June 27, 2026 19:16

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from b45abfe to 869c9f9 Compare June 27, 2026 19:27

giuliocalzo added 2 commits June 27, 2026 21:28

giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 869c9f9 to 57e4ff7 Compare June 27, 2026 19:29

greptile-apps Bot mentioned this pull request Jun 28, 2026

docs: add CHANGELOG and wire it into agent guidance #369

Open

2 tasks

giuliocalzo marked this pull request as ready for review June 28, 2026 13:47

giuliocalzo added 2 commits June 28, 2026 15:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(node-data-broker): run broker as main container with health probes#368

feat(node-data-broker): run broker as main container with health probes#368
giuliocalzo wants to merge 4 commits into
NVIDIA:mainfrom
giuliocalzo:feat/node-data-broker-main-container

giuliocalzo commented Jun 27, 2026

Uh oh!

github-actions Bot commented Jun 27, 2026

Uh oh!

greptile-apps Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

dmitsh commented Jun 27, 2026

Uh oh!

giuliocalzo commented Jun 27, 2026

Uh oh!

copy-pr-bot Bot commented Jun 27, 2026

Uh oh!

giuliocalzo commented Jun 28, 2026

Uh oh!

dmitsh commented Jun 28, 2026 •

edited

Loading

Uh oh!

giuliocalzo commented Jun 28, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

giuliocalzo commented Jun 27, 2026

Description

Checklist

Test plan

Uh oh!

github-actions Bot commented Jun 27, 2026

Uh oh!

greptile-apps Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

dmitsh commented Jun 27, 2026

Uh oh!

giuliocalzo commented Jun 27, 2026

Uh oh!

copy-pr-bot Bot commented Jun 27, 2026

Uh oh!

giuliocalzo commented Jun 28, 2026

Uh oh!

dmitsh commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

giuliocalzo commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Jun 27, 2026 •

edited

Loading

dmitsh commented Jun 28, 2026 •

edited

Loading

giuliocalzo commented Jun 28, 2026 •

edited

Loading