FIX: Prevent operator crash when topology ConfigMap exceeds 1MB limit#2048
FIX: Prevent operator crash when topology ConfigMap exceeds 1MB limit#2048Boomatang wants to merge 1 commit into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthrough
ChangesTopologyReconciler: size cap, OTel tracing, and interface widening
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
internal/controller/topology_reconciler.go (1)
111-111: 💤 Low valueUse direct string comparison instead of
strings.Compare.Go's
strings.Comparedocumentation recommends: "It is usually clearer and always faster to use the built-in string comparison operators". Direct comparisond != cm.Data["topology"]is idiomatic here.♻️ Suggested change
- if d, found := cmTopology.Data["topology"]; !found || strings.Compare(d, cm.Data["topology"]) != 0 { + if d, found := cmTopology.Data["topology"]; !found || d != cm.Data["topology"] {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/topology_reconciler.go` at line 111, In the if condition in topology_reconciler.go, replace the strings.Compare function call with a direct string comparison operator. Instead of using strings.Compare(d, cm.Data["topology"]) != 0, use d != cm.Data["topology"] directly. This is more idiomatic Go code and performs better, as recommended by Go's standard library documentation for string comparisons.internal/controller/topology_reconciler_test.go (2)
159-180: 💤 Low valueConsider simplifying return signature since error is always nil.
The
errorreturn value is never set to non-nil; all paths returnnil. The(string, bool)signature would be clearer and remove redundantassert.NilError(t, err)calls from test code.♻️ Suggested change
-func unstructuredNestedString(obj map[string]any, fields ...string) (string, bool, error) { +func unstructuredNestedString(obj map[string]any, fields ...string) (string, bool) { current := obj for i, field := range fields { if i == len(fields)-1 { val, ok := current[field] if !ok { - return "", false, nil + return "", false } s, ok := val.(string) - return s, ok, nil + return s, ok } next, ok := current[field] if !ok { - return "", false, nil + return "", false } current, ok = next.(map[string]any) if !ok { - return "", false, nil + return "", false } } - return "", false, nil + return "", false }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/topology_reconciler_test.go` around lines 159 - 180, The unstructuredNestedString function has an error return value that is never set to non-nil, making the return signature unnecessarily complex. Simplify the function signature by removing the error return type, changing from (string, bool, error) to (string, bool). Then update all return statements in the function to return only two values instead of three, removing the trailing nil error value from each return statement (including the final return statement at the end of the function).
22-27: ⚡ Quick winConsider adding a test for the oversized topology reconciliation path.
This test validates the placeholder constant, but there's no test that exercises the reconciler's behaviour when
topology.ToDot()exceedsmaxTopologyBytes. Such a test would confirm the placeholder substitution logic works end-to-end.Creating a topology large enough to exceed 900KB may be impractical in unit tests, but you could consider extracting the size-check logic into a testable helper or using a test-only lower threshold.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/controller/topology_reconciler_test.go` around lines 22 - 27, The TestTopologyReconciler_OversizedPlaceholder test validates the placeholder constant itself but does not exercise the actual reconciler behavior when topology.ToDot() output exceeds maxTopologyBytes. To fix this, either extract the size-checking logic from the reconciler into a separate testable helper function that can be called with different thresholds, or introduce a test-only reduced maxTopologyBytes threshold that allows creating a realistically large topology to trigger the placeholder substitution logic end-to-end. This ensures the placeholder substitution path in the reconciler works correctly when actual topology data exceeds the limit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/controller/topology_reconciler_test.go`:
- Around line 159-180: The unstructuredNestedString function has an error return
value that is never set to non-nil, making the return signature unnecessarily
complex. Simplify the function signature by removing the error return type,
changing from (string, bool, error) to (string, bool). Then update all return
statements in the function to return only two values instead of three, removing
the trailing nil error value from each return statement (including the final
return statement at the end of the function).
- Around line 22-27: The TestTopologyReconciler_OversizedPlaceholder test
validates the placeholder constant itself but does not exercise the actual
reconciler behavior when topology.ToDot() output exceeds maxTopologyBytes. To
fix this, either extract the size-checking logic from the reconciler into a
separate testable helper function that can be called with different thresholds,
or introduce a test-only reduced maxTopologyBytes threshold that allows creating
a realistically large topology to trigger the placeholder substitution logic
end-to-end. This ensures the placeholder substitution path in the reconciler
works correctly when actual topology data exceeds the limit.
In `@internal/controller/topology_reconciler.go`:
- Line 111: In the if condition in topology_reconciler.go, replace the
strings.Compare function call with a direct string comparison operator. Instead
of using strings.Compare(d, cm.Data["topology"]) != 0, use d !=
cm.Data["topology"] directly. This is more idiomatic Go code and performs
better, as recommended by Go's standard library documentation for string
comparisons.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d02da6c1-ebbe-4905-8855-9ebdc8ee4386
📒 Files selected for processing (2)
internal/controller/topology_reconciler.gointernal/controller/topology_reconciler_test.go
|
@R-Lawton I want to give you a head up on this. It will have an effect on the console-plugin if the topology becomes to large. I am not sure how the console-plugin would handle the placeholder topology that is added. |
Topology workaround added to allow the creation of large topologies. Signed-off-by: Jim Fitzpatrick <jfitzpat@redhat.com>
4055738 to
f7486c2
Compare
Problem
The kuadrant-operator crashes when the topology ConfigMap exceeds the Kubernetes 1MB ConfigMap size limit. This happens at scale when approximately 1,500 or more resource sets (HTTPRoute + AuthPolicy + RateLimitPolicy) are present in the cluster. The topology DOT graph grows linearly (~785 bytes per route), and once it exceeds 1MB the API server rejects the write, causing the reconciler to enter a permanent error loop. The operator becomes unable to process any further events, effectively a denial of service.
Reproduction
Deploy kuadrant from the operator repo:
Increase operator resources (speeds up reproduction, does not affect outcome):
Create the Kuadrant CR and a test namespace:
Create ~1500 resource sets (HTTPRoute + AuthPolicy + RateLimitPolicy each):
Watch the operator logs:
Without this fix the operator will enter a permanent error loop with:
Solution
This PR adds a size guard to the
TopologyReconciler. Before writing the topology ConfigMap, the reconciler checks the size of the serialized DOT graph against a 900KB threshold (safely under the 1MB Kubernetes limit). When the topology exceeds this threshold, a small placeholder DOT graph is written instead, preventing the API server rejection and allowing the operator to continue reconciling other resources.Changes
maxTopologyBytesconstant (900KB) and anoversizedPlaceholderDOT stringClientfield type from*dynamic.DynamicClienttodynamic.Interfaceto support test fakesLimitations
This is a temporary workaround. The placeholder means the console-plugin will not display the full topology graph at scale. A proper solution will require a different storage or serialization strategy, coordinated with the console-plugin.
Summary by CodeRabbit
Release Notes
Bug Fixes
New Features
Tests