critest: test metric descriptors #2017

Open
dgrisonnet wants to merge 2 commits into kubernetes-sigs:master from dgrisonnet:test-metrics-desc

@dgrisonnet

Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR builds on #1931 and updates the list of expected metric descriptors for Kubernetes 1.37.

Which issue(s) this PR fixes:

Special notes for your reviewer:

I included the PSI metrics because the feature is in beta and enabled by default, and I think it is better to validate them here before the feature becomes stable.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. labels Mar 18, 2026
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 18, 2026
@dgrisonnet dgrisonnet force-pushed the test-metrics-desc branch 7 times, most recently from 22a6b74 to 10c0ca7 Compare March 18, 2026 18:53
Comment thread pkg/validate/pod.go Outdated
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 21, 2026
@dgrisonnet dgrisonnet force-pushed the test-metrics-desc branch from 10c0ca7 to 1ae5f93 Compare May 27, 2026 13:35
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dgrisonnet
Once this PR has been reviewed and has the lgtm label, please assign tallclair for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@dgrisonnet dgrisonnet force-pushed the test-metrics-desc branch from 1ae5f93 to 2568261 Compare May 27, 2026 13:38
Signed-off-by: Peter Hunt <pehunt@redhat.com>

critest: drop cpuLoad metrics from test

some context: kubernetes/kubernetes#134981

Signed-off-by: Peter Hunt <pehunt@redhat.com>

critest/metrics: generate some disk usage to guarantee io metrics are present

Signed-off-by: Peter Hunt <pehunt@redhat.com>

crio: update config to enable metrics

Signed-off-by: Peter Hunt <pehunt@redhat.com>
@dgrisonnet dgrisonnet force-pushed the test-metrics-desc branch from 2568261 to e740d73 Compare May 27, 2026 13:48
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 27, 2026
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
@dgrisonnet dgrisonnet force-pushed the test-metrics-desc branch from e740d73 to 7dde0e4 Compare May 28, 2026 13:25

@saschagrunert saschagrunert left a comment

Member

Thanks for adding metrics validation. A few issues to address.

Comment thread pkg/validate/pod.go
)

BeforeEach(func() {
_, err := rc.ListMetricDescriptors(context.TODO())

Member

All other BeforeEach/AfterEach/It closures in this file (and across the validate package) use func(ctx SpecContext) and pass ctx to CRI calls. This integrates with Ginkgo's per-node timeout and cancellation.

The context.TODO() usages here (lines 223, 237, 239, 255, 260, 263, 266) and the missing ctx SpecContext parameter on lines 222, 234, 243, 252 should all be updated to match the existing pattern. The helper functions listMetricDescriptors (line 370) and listPodSandboxMetrics (line 436) would also need a ctx parameter added.

Compare with e.g. the AfterEach at line 126 and It blocks at lines 135, 146, 155, 168 in this same file.

Comment thread pkg/validate/pod.go
if s.Code() == codes.Unimplemented {
Skip("CRI Metrics endpoints not supported by this runtime version")
}
}

Member

The error handling flow is more convoluted than it needs to be. The if s.Code() == codes.Unimplemented on line 228 is redundant because the Expect on line 226 already fails for any other code.

Suggested simplification:

BeforeEach(func(ctx SpecContext) {
	_, err := rc.ListMetricDescriptors(ctx)
	if err == nil {
		return
	}
	s, ok := grpcstatus.FromError(err)
	if ok && s.Code() == codes.Unimplemented {
		Skip("CRI Metrics endpoints not supported by this runtime version")
	}
	Expect(err).NotTo(HaveOccurred(), "failed to list MetricDescriptors")
})

Comment thread pkg/validate/pod.go
startContainer(context.TODO(), rc, containerID)

_, _, err := rc.ExecSync(
context.TODO(), containerID, []string{"/bin/sh", "-c", "for i in $(seq 1 10); do echo hi >> /var/lib/mydisktest/inode_test_file_$i; done; sync"},

Member

Nothing in the container config creates /var/lib/mydisktest/. If that directory does not exist in the container image, this echo >> will fail. Consider prepending mkdir -p /var/lib/mydisktest && to the command.

Comment thread pkg/validate/pod.go
continue
}

if len(values) != len(desc.GetLabelKeys()) {

Member

nit: This only checks that the number of label values matches the number of label keys. It does not verify that the actual keys correspond. A metric could return the right count of values for completely different labels and this check would still pass.
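
One possible way to tighten this, sketched with stdlib types only: compare each descriptor's label keys against an expected key list for that metric. The `descriptor` struct is a hypothetical stand-in that mirrors just the name and label keys, and the expected key lists would have to come from the test's own table. Ignoring key order is an assumption here; a stricter check could compare the slices in order, since label values are matched to keys by position.

```go
package main

import (
	"fmt"
	"slices"
)

// descriptor is a hypothetical stand-in carrying only the fields needed here.
type descriptor struct {
	Name      string
	LabelKeys []string
}

// checkLabelKeys reports an error when the descriptor's label keys differ
// from the expected keys. Order is ignored by sorting copies of both slices.
func checkLabelKeys(desc descriptor, expected []string) error {
	got := slices.Clone(desc.LabelKeys)
	want := slices.Clone(expected)
	slices.Sort(got)
	slices.Sort(want)
	if !slices.Equal(got, want) {
		return fmt.Errorf("metric %q: label keys %v, want %v", desc.Name, got, want)
	}
	return nil
}

func main() {
	// Placeholder metric and label names, for illustration only.
	desc := descriptor{Name: "example_metric", LabelKeys: []string{"id", "name", "image"}}

	// Same keys in a different order: accepted.
	fmt.Println(checkLabelKeys(desc, []string{"image", "name", "id"}) == nil)

	// Same count but a different key: rejected, unlike a length-only check.
	fmt.Println(checkLabelKeys(desc, []string{"device", "name", "id"}) != nil)
}
```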

run: |
sudo mkdir -p /etc/crio/crio.conf.d
printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee /etc/crio/crio.conf.d/01-log-level.conf
printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee -a /etc/crio/crio.conf.d/01-base.conf

Member

nit: Since the file does not exist before this step, the first write should use tee (overwrite) rather than tee -a (append). Reserve -a for the second write on line 66. This makes the intent clearer and avoids duplicate config sections if the step ever re-runs.

Suggested change
- printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee -a /etc/crio/crio.conf.d/01-base.conf
+ printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee /etc/crio/crio.conf.d/01-base.conf
