Nvswitch telemetry gaps #2945
Draft
mkoci wants to merge 27 commits into
Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…ted mappings Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…VUE REST Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
… sources Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…etheus sink Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…el cardinality fixes Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…ation changes Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…h_serial label
Compose OTLP metric name as {prefix}_{name}_{metric_type}_{unit} to match the
Prometheus sink, and promote switch_serial/switch_id onto datapoint attributes so
Grafana switch dashboards resolve identically across export paths.
Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
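The naming rule in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names `compose_metric_name` and `promote_switch_labels`, the use of `BTreeMap` for attributes, and the choice to skip empty components are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: joins the non-empty parts with underscores,
/// following the {prefix}_{name}_{metric_type}_{unit} pattern.
/// Skipping empty parts is an assumption, not confirmed by the PR.
fn compose_metric_name(prefix: &str, name: &str, metric_type: &str, unit: &str) -> String {
    [prefix, name, metric_type, unit]
        .iter()
        .filter(|part| !part.is_empty())
        .copied()
        .collect::<Vec<&str>>()
        .join("_")
}

/// Hypothetical helper: copies switch_serial/switch_id from resource-level
/// attributes onto a datapoint's attributes, leaving existing values alone.
fn promote_switch_labels(
    resource: &BTreeMap<String, String>,
    datapoint: &mut BTreeMap<String, String>,
) {
    for key in ["switch_serial", "switch_id"] {
        if let Some(value) = resource.get(key) {
            datapoint
                .entry(key.to_string())
                .or_insert_with(|| value.clone());
        }
    }
}
```

With both export paths producing the same metric name and the same per-datapoint labels, a dashboard query keyed on `switch_serial` would not need to know which sink produced the series.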
The NMX-T collector built its reqwest client without danger_accept_invalid_certs, unlike the sibling NVUE REST collector. On minimal runtime images this fails at client build time (native-root-CA load) and the switch serves a self-signed cert anyway, so NMX-T never collected. Match the NVUE REST self-signed handling. Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
tonic 0.14 auto-injects a strict system-root TLS verifier for https:// URIs (Endpoint::from) and layers its own TlsConnector over any custom connector (channel/service/connector.rs). That silently negated the hand-rolled hyper-rustls skip-verify connector, so tonic strictly verified and rejected NVOS's self-signed gNMI cert -- the channel died right after the server Certificate message (opaque 'transport error', no HTTP/2 frames).

Use Endpoint::tls_config_with_verifier(ClientTlsConfig::new(), <verifier>) so the AcceptAnyCertVerifier is applied in tonic's own TLS layer; drop the hand-rolled connector. tls.rs now exposes accept_any_cert_verifier() instead of self_signed_tls_config().

Validated on gb-nvl-124-switch06: gNMI SAMPLE+ON_CHANGE streams connect and 86 carbide_hardware_health_nvue_gnmi_* metric families flow via the OtlpSink into VictoriaMetrics. Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
…nfig Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
… config for dev Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
The generated matrix/validation docs were already dropped in 3b0a075 (chore(health): remove temp docs from repo), but the one-shot generator script was missed. It has no callers, its required inputs are not in the repo, and its outputs are no longer tracked, so it cannot run from a clean checkout. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: mkoci <26286151+mkoci@users.noreply.github.com>
b4b9e2a to 961fd8c
Description
This PR covers the NV Switch port from #2283. It closes the GB200 NVSwitch telemetry gaps for NVUE gNMI streaming, NVUE REST, and NMX-T.
Type of Change
Related Issues
#2283
Testing
Additional Notes
- Gated the previously insecure TLS verification behind `dangerously_skip_tls_verification` in the config surface.
- NVOS gNMI endpoints must set `dangerously_skip_tls_verification = true` to connect.
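A rough sketch of what this gating could look like on the Rust side. Only the field name `dangerously_skip_tls_verification` comes from this PR; the struct `CollectorTlsConfig`, the enum `CertVerification`, and the function `cert_verification` are hypothetical names for illustration, and the assumption that the flag defaults to `false` is not confirmed by the description.

```rust
/// Hypothetical config struct; only the field name is taken from the PR.
/// Deriving Default makes the flag false unless it is set explicitly
/// (assumed behaviour, not confirmed by the PR).
#[derive(Debug, Clone, Default, PartialEq)]
struct CollectorTlsConfig {
    dangerously_skip_tls_verification: bool,
}

/// How the collector should treat the server certificate.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CertVerification {
    /// Verify against the normal trust roots.
    Strict,
    /// Accept any certificate, including self-signed ones.
    AcceptAnyCert,
}

/// Picks the verification mode from the config flag.
fn cert_verification(cfg: &CollectorTlsConfig) -> CertVerification {
    if cfg.dangerously_skip_tls_verification {
        CertVerification::AcceptAnyCert
    } else {
        CertVerification::Strict
    }
}
```

Under these assumptions, an endpoint with a self-signed certificate only connects when the operator opts in, which keeps the insecure path visible in configuration rather than hard-coded in the collector.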