feat(node-data-broker): run broker as main container with health probes #368
giuliocalzo wants to merge 4 commits into
🌿 Preview your docs: https://nvidia-preview-pull-request-368.docs.buildwithfern.com/topograph
Greptile Summary
This PR replaces the
Confidence Score: 5/5

Safe to merge. The refactor is well-contained: the single-container design is simpler than the old init + curl pattern, shutdown is handled correctly via a signal context, and the clientset is created once at startup rather than on each refresh. The core logic (Get-then-Update of node annotations, the refresh loop, graceful HTTP shutdown) is correct and race-free, and periodic refreshes run strictly sequentially within the loop. Consolidating the IB tooling into the main Alpine image is valid, since Alpine's rdma-core package includes ibnetdiscover. Tests cover the new health server, the shutdown path, the refresh interval, error continuation, and context cancellation. No data-loss or correctness issues were found, and no files require special attention.
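The Get-then-Update flow mentioned in the summary boils down to merging the provider's annotations into whatever the Node already carries. The sketch below assumes plain `map[string]string` annotations (as on a Kubernetes `ObjectMeta`); the helper name `mergeAnnotations`, the example keys, and the skip-when-unchanged behaviour are illustrative assumptions, not code taken from the PR.

```go
package main

import (
	"fmt"
	"maps"
)

// mergeAnnotations overlays the provider's annotations (desired) on the
// annotations read from the Node (current). It returns a new map, so the
// object returned by Get is not mutated, plus whether any key was added or
// changed. Hypothetical helper, not taken from the PR.
func mergeAnnotations(current, desired map[string]string) (map[string]string, bool) {
	merged := make(map[string]string, len(current)+len(desired))
	maps.Copy(merged, current)
	changed := false
	for k, v := range desired {
		if old, ok := merged[k]; !ok || old != v {
			merged[k] = v
			changed = true
		}
	}
	return merged, changed
}

func main() {
	// Illustrative keys only; the real annotation names are not shown in the PR.
	current := map[string]string{"example.com/other": "keep", "example.com/zone": "a"}
	desired := map[string]string{"example.com/zone": "b"}

	merged, changed := mergeAnnotations(current, desired)
	if !changed {
		fmt.Println("no Update needed")
		return
	}
	// Here the caller would set node.Annotations = merged and call Update.
	fmt.Println(merged["example.com/other"], merged["example.com/zone"])
}
```

Returning a fresh map keeps the fetched object untouched until the caller decides to send the Update; skipping no-op updates is an optional optimization, not something the PR states.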
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant K8s as Kubernetes
participant NDB as node-data-broker-initc
participant Provider as Cloud/IB Provider
participant KAPI as Kubernetes API
K8s->>NDB: start container
NDB->>KAPI: newInClusterClientset()
KAPI-->>NDB: clientset + config
NDB->>Provider: getAnnotations(ctx) [initial apply]
Provider-->>NDB: annotations map
NDB->>KAPI: Nodes().Get(nodeName)
KAPI-->>NDB: node object
NDB->>KAPI: Nodes().Update(node + annotations)
KAPI-->>NDB: updated node
Note over NDB: startupProbe gates here
NDB->>NDB: serveHealth(:8080)
Note over K8s,NDB: startup probe passes → liveness/readiness active
loop every refreshInterval (default 5m)
NDB->>Provider: getAnnotations(ctx) [periodic refresh]
Provider-->>NDB: annotations map
NDB->>KAPI: Nodes().Get + Update
end
K8s->>NDB: SIGTERM
NDB->>NDB: ctx cancelled → stop refresh loop
NDB->>NDB: srv.Shutdown(5s timeout)
NDB-->>K8s: exit 0
Reviews (3): Last reviewed commit: "docs(infiniband): drop stale IB image re..."
Reuse a single in-cluster clientset for startup and periodic annotation refresh instead of constructing one on every apply. Fix a race in refresh-loop tests where a third ticker tick could close the done channel twice; gate shutdown with sync.Once and cancel the loop immediately when the test condition is met.
Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@giuliocalzo, there is one problem with this implementation.
@dmitsh Good catch: you're right that #368 breaks IB clusters as-is if we drop the …
I think we can fix this by baking IB tooling into the main Alpine image instead of maintaining a separate …
RUN apk add --no-cache rdma-core
On Alpine, …
Follow-up would be: delete …
Does that approach work for you, or is there something the Ubuntu …
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Force-pushed from 12f1918 to dbaa0fc.
Force-pushed from b45abfe to 869c9f9.
Replace the init container plus curl sleeper with a single container running node-data-broker-initc. The binary applies node annotations, serves /healthz, and re-applies annotations on a configurable refreshInterval (default 5m). Add a startup probe so slow providers can finish before liveness kicks in, move initc.extraArgs to top-level extraArgs, and update infiniband docs and helm-unittest coverage.
Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
Force-pushed from 869c9f9 to 57e4ff7.
Install rdma-core in the Alpine runtime image so node-data-broker and infiniband-k8s no longer need ghcr.io/nvidia/topograph/ib. Remove Dockerfile.ib, the docker-ib workflow, and /ib overrides from Helm examples and docs.
Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
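With ibnetdiscover now expected in the default image rather than a separate /ib variant, a missing package would only surface when the IB provider first shells out. One option, sketched here as a hypothetical addition that is not part of the PR, is a startup check with `exec.LookPath`; the helper name `missingTools` is an assumption.

```go
package main

import (
	"fmt"
	"os/exec"
)

// missingTools returns the names from the list that are not on PATH.
// Hypothetical fail-fast helper; not part of the PR.
func missingTools(names ...string) []string {
	var missing []string
	for _, name := range names {
		if _, err := exec.LookPath(name); err != nil {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if m := missingTools("ibnetdiscover"); len(m) > 0 {
		// A real binary could exit non-zero here so the pod fails visibly.
		fmt.Printf("missing IB tooling: %v (is rdma-core installed?)\n", m)
		return
	}
	fmt.Println("ibnetdiscover found on PATH")
}
```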
Document that the default topograph image includes ibnetdiscover via rdma-core, fix the node-data-broker init-container wording, and remove the obsolete IB/ubuntu variant note from chart values comments.
Signed-off-by: Giulio Calzolari <gcalzolari@nvidia.com>
@dmitsh Follow-up on the IB image concern: implemented in the latest commits.
Docker (…
Docs (…
So node-data-broker now runs the same Topograph image for all providers, including …
Let me know if anything from the old Ubuntu …
Hi @giuliocalzo,
IMO, the init container was a cleaner way to implement this. It also allows using a different image in the future if we need to support a new switch vendor.
@dmitsh Thanks for the follow-up.

1. Rename …
Agreed: with the init container gone, the …

2. Startup race (Topograph reading annotations before the broker applies them)
Good point. I'm happy to add a wait until node-data-broker pods are Ready before the first topology request, with the caveat that node-data-broker is optional: when the subchart is disabled or not deployed, Topograph should proceed as today without blocking.

3. Init container vs. single main container
My preference is the single long-running container model: node-data-broker applies annotations, serves …
That said, I'm flexible on the shape. The init-container pattern does make it easier to swap images per vendor without touching the main runtime image. If we want to preserve that flexibility long term, we could revisit, but for now the unified image (…

I'll proceed with the rename and the broker-readiness gate unless you'd prefer to keep the init-container design for this PR.
Description
Replace the node-data-broker chart's init container plus curl sleeper with a single main container running node-data-broker-initc, removing the dependency on the curlimages/curl image for this subchart.

- … node-data-broker-initc as the DaemonSet's main container. The binary applies node annotations once at startup, then serves /healthz on a configurable port (default 8080) until SIGTERM so the pod stays Running.
- … /healthz serves, giving slow providers (e.g. infiniband ibnetdiscover) up to failureThreshold × periodSeconds (default 5m) to finish the initial apply.
- … refreshInterval (default 5m; set to 0 to disable) so node metadata stays current without pod restarts. Refresh failures are only logged.
- … initc values block, the node-data-broker.initImage helper, and the tail -f /dev/null placeholder are removed; initc.extraArgs moves to top-level extraArgs.
- … (docs/providers/infiniband.md) and helm-unittest suites/snapshots updated to match.

Complements #363 (node-observer in-process health wait).
Checklist
- … (git commit -s).

Test plan
- go test ./cmd/node-data-broker-initc/...
- make chart-test