critest: test metric descriptors#2017
Conversation
22a6b74 to
10c0ca7
Compare
10c0ca7 to
1ae5f93
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: dgrisonnet The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
1ae5f93 to
2568261
Compare
Signed-off-by: Peter Hunt <pehunt@redhat.com> critest: drop cpuLoad metrics from test some context: kubernetes/kubernetes#134981 Signed-off-by: Peter Hunt <pehunt@redhat.com> critest/metrics: generate some disk usage to guarantee io metrics are present Signed-off-by: Peter Hunt <pehunt@redhat.com> crio: update config to enable metrics Signed-off-by: Peter Hunt <pehunt@redhat.com>
2568261 to
e740d73
Compare
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
e740d73 to
7dde0e4
Compare
saschagrunert
left a comment
There was a problem hiding this comment.
Thanks for adding metrics validation. A few issues to address.
| ) | ||
|
|
||
| BeforeEach(func() { | ||
| _, err := rc.ListMetricDescriptors(context.TODO()) |
There was a problem hiding this comment.
All other BeforeEach/AfterEach/It closures in this file (and across the validate package) use func(ctx SpecContext) and pass ctx to CRI calls. This integrates with Ginkgo's per-node timeout and cancellation.
The context.TODO() usages here (lines 223, 237, 239, 255, 260, 263, 266) and the missing ctx SpecContext parameter on lines 222, 234, 243, 252 should all be updated to match the existing pattern. The helper functions listMetricDescriptors (line 370) and listPodSandboxMetrics (line 436) would also need a ctx parameter added.
Compare with e.g. the AfterEach at line 126 and It blocks at lines 135, 146, 155, 168 in this same file.
| if s.Code() == codes.Unimplemented { | ||
| Skip("CRI Metrics endpoints not supported by this runtime version") | ||
| } | ||
| } |
There was a problem hiding this comment.
The error handling flow is more convoluted than it needs to be. The if s.Code() == codes.Unimplemented on line 228 is redundant because the Expect on line 226 already fails for any other code.
Suggested simplification:
BeforeEach(func(ctx SpecContext) {
_, err := rc.ListMetricDescriptors(ctx)
if err == nil {
return
}
s, ok := grpcstatus.FromError(err)
if ok && s.Code() == codes.Unimplemented {
Skip("CRI Metrics endpoints not supported by this runtime version")
}
Expect(err).NotTo(HaveOccurred(), "failed to list MetricDescriptors")
})| startContainer(context.TODO(), rc, containerID) | ||
|
|
||
| _, _, err := rc.ExecSync( | ||
| context.TODO(), containerID, []string{"/bin/sh", "-c", "for i in $(seq 1 10); do echo hi >> /var/lib/mydisktest/inode_test_file_$i; done; sync"}, |
There was a problem hiding this comment.
Nothing in the container config creates /var/lib/mydisktest/. If that directory does not exist in the container image, this echo >> will fail. Consider prepending mkdir -p /var/lib/mydisktest && to the command.
| continue | ||
| } | ||
|
|
||
| if len(values) != len(desc.GetLabelKeys()) { |
There was a problem hiding this comment.
nit: This only checks that the number of label values matches the number of label keys. It does not verify that the actual keys correspond. A metric could return the right count of values for completely different labels and this check would still pass.
| run: | | ||
| sudo mkdir -p /etc/crio/crio.conf.d | ||
| printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee /etc/crio/crio.conf.d/01-log-level.conf | ||
| printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee -a /etc/crio/crio.conf.d/01-base.conf |
There was a problem hiding this comment.
nit: Since the file does not exist before this step, the first write should use tee (overwrite) rather than tee -a (append). Reserve -a for the second write on line 66. This makes the intent clearer and avoids duplicate config sections if the step ever re-runs.
| printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee -a /etc/crio/crio.conf.d/01-base.conf | |
| printf '[crio.runtime]\nlog_level = "debug"\n[crio.image]\nshort_name_mode = "disabled"\n' | sudo tee /etc/crio/crio.conf.d/01-base.conf |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR is built on top of #1931 with an updated list of expected metrics descriptors for Kubernetes 1.37
Which issue(s) this PR fixes:
Special notes for your reviewer:
I included PSI metrics since the feature is in Beta, enabled by default and I think it is better to validate here before the feature becomes stable.
Does this PR introduce a user-facing change?