Skip to content

feat(node-data-broker): run broker as main container with health probes#368

Open
giuliocalzo wants to merge 4 commits into
NVIDIA:mainfrom
giuliocalzo:feat/node-data-broker-main-container
Open

feat(node-data-broker): run broker as main container with health probes#368
giuliocalzo wants to merge 4 commits into
NVIDIA:mainfrom
giuliocalzo:feat/node-data-broker-main-container

Conversation

@giuliocalzo

Copy link
Copy Markdown
Collaborator

Description

Replace the node-data-broker chart's init container plus curl sleeper with a single main container running node-data-broker-initc, removing the dependency on the curlimages/curl image for this subchart.

  • node-data-broker runs node-data-broker-initc as the DaemonSet's main container. The binary applies node annotations once at startup, then serves /healthz on a configurable port (default 8080) until SIGTERM so the pod stays Running.
  • A startup probe gates liveness/readiness until /healthz serves, giving slow providers (e.g. infiniband ibnetdiscover) up to failureThreshold × periodSeconds (default 5m) to finish the initial apply.
  • Annotations are re-applied periodically via refreshInterval (default 5m; set to 0 to disable) so node metadata stays current without pod restarts. Failures on refresh are logged only.
  • The separate init container, the initc values block, the node-data-broker.initImage helper, and the tail -f /dev/null placeholder are removed; initc.extraArgs moves to top-level extraArgs.
  • Docs (docs/providers/infiniband.md) and helm-unittest suites/snapshots updated to match.

Complements #363 (node-observer in-process health wait).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • All commits are signed off per DCO (git commit -s).

Test plan

  • go test ./cmd/node-data-broker-initc/...
  • make chart-test

@github-actions

Copy link
Copy Markdown

@greptile-apps

greptile-apps Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR replaces the node-data-broker chart's two-container pattern (init container + tail -f /dev/null curl placeholder) with a single long-running main container running node-data-broker-initc. The binary applies node annotations at startup, then serves /healthz until SIGTERM, allowing Kubernetes startup/liveness/readiness probes to gate the pod lifecycle correctly.

  • The curlimages/curl placeholder container and the separate IB image variant (Dockerfile.ib, docker-ib.yml) are removed; InfiniBand tooling (ibnetdiscover) is folded into the main Alpine image via apk add rdma-core, and the nodeBroker struct holds a single kubernetes.Clientset created once at startup (not re-created on each refresh).
  • A configurable refreshInterval (default 5 minutes) re-applies node annotations periodically; startupProbe gates liveness/readiness until the initial apply completes, giving slow providers like ibnetdiscover up to failureThreshold × periodSeconds before being considered failed.
  • Helm values, subchart tests, snapshots, example values files, and documentation are all updated to match the new single-container design.

Confidence Score: 5/5

Safe to merge. The refactor is well-contained — the single-container design is simpler than the old init + curl pattern, shutdown is handled correctly via signal context, and the clientset is created once at startup rather than on each refresh.

The core logic (Get-then-Update node annotations, refresh loop, graceful HTTP shutdown) is correct and race-free. Periodic refreshes are strictly sequential within the loop. The IB tooling consolidation into the main Alpine image is valid — Alpine's rdma-core package includes ibnetdiscover. Tests cover the new health server, shutdown path, refresh interval, error continuation, and context cancellation. No data-loss or correctness issues were found.

No files require special attention.

Important Files Changed

Filename Overview
cmd/node-data-broker-initc/main.go Core logic refactor: adds nodeBroker struct, periodic refresh loop, HTTP health server, and graceful shutdown via signal context. Clientset is created once at startup. Logic is clean and safe.
cmd/node-data-broker-initc/main_test.go New tests for health handler, graceful shutdown, refresh loop interval, error continuation, and context cancellation. Uses sync.Once to prevent double-close; coverage is solid.
charts/topograph/charts/node-data-broker/templates/daemonset.yaml Init container removed; main container now runs node-data-broker-initc with startup/liveness/readiness probes. Port and refreshInterval args wired from values.
charts/topograph/charts/node-data-broker/values.yaml initc block removed; port, refreshInterval, extraArgs, and startupProbe config added. Image now defaults to ghcr.io/nvidia/topograph matching the main chart.
Dockerfile Adds apk add --no-cache rdma-core to the Alpine final stage; rdma-core on Alpine 3.x includes ibnetdiscover, merging the IB variant into the main image.
.github/workflows/docker-ib.yml Deleted: IB-specific Docker workflow is no longer needed since rdma-core tooling is folded into the main image build.
charts/topograph/tests/subcharts_test.yaml Helm unit tests updated: initContainers path replaced with containers[0], new tests for refreshInterval arg and startup probe values.

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker-initc
    participant Provider as Cloud/IB Provider
    participant KAPI as Kubernetes API

    K8s->>NDB: start container
    NDB->>KAPI: newInClusterClientset()
    KAPI-->>NDB: clientset + config
    NDB->>Provider: getAnnotations(ctx) [initial apply]
    Provider-->>NDB: annotations map
    NDB->>KAPI: Nodes().Get(nodeName)
    KAPI-->>NDB: node object
    NDB->>KAPI: Nodes().Update(node + annotations)
    KAPI-->>NDB: updated node
    Note over NDB: startupProbe gates here
    NDB->>NDB: serveHealth(:8080)
    Note over K8s,NDB: startup probe passes → liveness/readiness active
    loop every refreshInterval (default 5m)
        NDB->>Provider: getAnnotations(ctx) [periodic refresh]
        Provider-->>NDB: annotations map
        NDB->>KAPI: Nodes().Get + Update
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: ctx cancelled → stop refresh loop
    NDB->>NDB: srv.Shutdown(5s timeout)
    NDB-->>K8s: exit 0
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
    participant K8s as Kubernetes
    participant NDB as node-data-broker-initc
    participant Provider as Cloud/IB Provider
    participant KAPI as Kubernetes API

    K8s->>NDB: start container
    NDB->>KAPI: newInClusterClientset()
    KAPI-->>NDB: clientset + config
    NDB->>Provider: getAnnotations(ctx) [initial apply]
    Provider-->>NDB: annotations map
    NDB->>KAPI: Nodes().Get(nodeName)
    KAPI-->>NDB: node object
    NDB->>KAPI: Nodes().Update(node + annotations)
    KAPI-->>NDB: updated node
    Note over NDB: startupProbe gates here
    NDB->>NDB: serveHealth(:8080)
    Note over K8s,NDB: startup probe passes → liveness/readiness active
    loop every refreshInterval (default 5m)
        NDB->>Provider: getAnnotations(ctx) [periodic refresh]
        Provider-->>NDB: annotations map
        NDB->>KAPI: Nodes().Get + Update
    end
    K8s->>NDB: SIGTERM
    NDB->>NDB: ctx cancelled → stop refresh loop
    NDB->>NDB: srv.Shutdown(5s timeout)
    NDB-->>K8s: exit 0
Loading

Reviews (3): Last reviewed commit: "docs(infiniband): drop stale IB image re..." | Re-trigger Greptile

Comment thread cmd/node-data-broker-initc/main_test.go
Comment thread cmd/node-data-broker-initc/main.go Outdated
giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@dmitsh

dmitsh commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator

@giuliocalzo , there is one problem with this implementation.
In most cases, Topograph indeed uses curlimages/curl image for node-data-broker.
But for the InfiniBand-based deployments, it requires ghcr.io/nvidia/topograph/ib image instead.
This PR (as implemented now) will break IB clusters.

@giuliocalzo

Copy link
Copy Markdown
Collaborator Author

@dmitsh Good catch — you're right that #368 breaks IB clusters as-is if we drop the /ib image override without replacing what it provides.

I think we can fix this by baking IB tooling into the main Alpine image instead of maintaining a separate ghcr.io/nvidia/topograph/ib:

RUN apk add --no-cache rdma-core

On Alpine, rdma-core already includes ibnetdiscover (there is no separate infiniband-diags package like on Ubuntu). That gives node-data-broker both node-data-broker-initc and the binary Topograph execs into for infiniband-k8s, so IB example values could go back to the default ghcr.io/nvidia/topograph image.

Follow-up would be: delete Dockerfile.ib + the docker-ib workflow, update the IB Helm examples/snapshots, and smoke-test ibnetdiscover in a privileged broker pod on IB hardware (same /sys/class mount as today).

Does that approach work for you, or is there something the Ubuntu /ib image provides beyond ibnetdiscover that we'd lose on Alpine (e.g. ibutils)?

@giuliocalzo giuliocalzo marked this pull request as draft June 27, 2026 19:12
@copy-pr-bot

copy-pr-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 12f1918 to dbaa0fc Compare June 27, 2026 19:16
giuliocalzo added a commit to giuliocalzo/topograph that referenced this pull request Jun 27, 2026
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from b45abfe to 869c9f9 Compare June 27, 2026 19:27
Replace the init container plus curl sleeper with a single container running
node-data-broker-initc. The binary applies node annotations, serves /healthz,
and re-applies annotations on a configurable refreshInterval (default 5m).
Add a startup probe so slow providers can finish before liveness kicks in,
move initc.extraArgs to top-level extraArgs, and update infiniband docs and
helm-unittest coverage.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Reuse a single in-cluster clientset for startup and periodic annotation
refresh instead of constructing one on every apply.

Fix a race in refresh-loop tests where a third ticker tick could close the
done channel twice; gate shutdown with sync.Once and cancel the loop
immediately when the test condition is met.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo giuliocalzo force-pushed the feat/node-data-broker-main-container branch from 869c9f9 to 57e4ff7 Compare June 27, 2026 19:29
@giuliocalzo giuliocalzo marked this pull request as ready for review June 28, 2026 13:47
Install rdma-core in the Alpine runtime image so node-data-broker and
infiniband-k8s no longer need ghcr.io/nvidia/topograph/ib. Remove
Dockerfile.ib, the docker-ib workflow, and /ib overrides from Helm
examples and docs.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Document that the default topograph image includes ibnetdiscover via
rdma-core, fix the node-data-broker init-container wording, and remove
the obsolete IB/ubuntu variant note from chart values comments.

Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo

Copy link
Copy Markdown
Collaborator Author

@dmitsh Follow-up on the IB image concern — implemented in the latest commits.

Docker (d0214af)

  • Main Dockerfile now installs Alpine rdma-core (ships ibnetdiscover)
  • Removed Dockerfile.ib and the docker-ib workflow
  • IB/GCP Helm examples no longer override ghcr.io/nvidia/topograph/ib; they use the default ghcr.io/nvidia/topograph image

Docs (0ea7c82)

  • docs/providers/infiniband.md documents that the default image includes ibnetdiscover and that IB deployments still need privileged broker + /sys/class mount (see values.k8s.ib-example.yaml)
  • Stale references to the Ubuntu /ib variant removed from chart values comments and k8s docs

So node-data-broker now runs the same Topograph image for all providers, including infiniband-k8s. Remaining validation on real IB hardware: confirm ibnetdiscover works in a privileged broker pod with the existing host mounts.

Let me know if anything from the old Ubuntu /ib image (e.g. ibutils) is still required on Alpine.

@dmitsh

dmitsh commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Hi @giuliocalzo ,
I have two small comments.

  1. Should we rename cmd/node-data-broker-initc to cmd/node-data-broker, since we got rid of the init container?
  2. There might be some concurrency issue, when Topograph is reading topology node annotations before node-data-broker applies them. Maybe we should wait until node-data-broker's pods are ready? Just keep in mind that node-data-broker is optional, so it might be not present.

IMO, having an init container was more clean way to implement. It also allows to use different image in the future, if we need to support new switch vendor.

@giuliocalzo

giuliocalzo commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator Author

@dmitsh Thanks for the follow-up.

1. Rename cmd/node-data-broker-initc

Agreed — with the init container gone, the -initc suffix is misleading. I'll rename to cmd/node-data-broker (and the binary to node-data-broker) in a follow-up commit on this PR (even in this one).

2. Startup race (Topograph reading annotations before broker applies them)

Good point. I'm happy to add a wait until node-data-broker pods are Ready before the first topology request, with the caveat that node-data-broker is optional — when the subchart is disabled or not deployed, Topograph should proceed as today without blocking.

3. Init container vs single main container

My preference is the single long-running container model: node-data-broker applies annotations, serves /healthz, and refreshes periodically — closer to a small operator than a one-shot init + placeholder pod. That keeps lifecycle, probes, and retries in one place and removes the curlimages/curl sleeper.

That said, I'm flexible on the shape. The init-container pattern does make it easier to swap images per vendor without touching the main runtime image. If we want to preserve that flexibility long term, we could revisit — but for now the unified image (rdma-core in the main Alpine build) plus optional image override in Helm values still covers IB without a separate /ib publish path.

I'll proceed with the rename and the broker-readiness gate unless you'd prefer to keep the init-container design for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants