fix: build NodeInfoMap before ExcludeTaintedNodePods filter#289
Closed
leomao10 wants to merge 1 commit into
When ExcludeTaintedNodePods is enabled, NodeInfoMap was previously built from the already-filtered pod list, causing NodeEmpty() to always return true for tainted nodes. This resulted in premature soft-path deletion of nodes that still had running pods.

Fix: move CreateNodeNameToInfoMap() to before the ExcludeTaintedNodePods filter so it always reflects the true pod state. The filtered pod list is used only for capacity calculations (CPU/memory utilisation).

Also adds a regression test verifying that a tainted node with a running non-daemonset pod is not deleted via the soft path.
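A minimal sketch of the ordering problem described above. Everything here (`Pod`, `Node`, `buildNodeInfoMap`, `excludeTaintedNodePods`, `nodeEmpty`) is a simplified, hypothetical stand-in and not the project's real code; in particular, this `nodeEmpty` counts every pod, whereas the description implies the real check is about non-daemonset pods.

```go
package main

import "fmt"

// Pod and Node are hypothetical, stripped-down stand-ins for the Kubernetes types.
type Pod struct {
	Name     string
	NodeName string
}

type Node struct {
	Name    string
	Tainted bool
}

// buildNodeInfoMap groups pods by the node they run on.
// Assumption: a stand-in for CreateNodeNameToInfoMap, not its real signature.
func buildNodeInfoMap(pods []Pod, nodes []Node) map[string][]Pod {
	infoMap := make(map[string][]Pod, len(nodes))
	for _, n := range nodes {
		infoMap[n.Name] = nil
	}
	for _, p := range pods {
		if _, known := infoMap[p.NodeName]; known {
			infoMap[p.NodeName] = append(infoMap[p.NodeName], p)
		}
	}
	return infoMap
}

// excludeTaintedNodePods drops every pod scheduled on a tainted node.
func excludeTaintedNodePods(pods []Pod, nodes []Node) []Pod {
	tainted := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		tainted[n.Name] = n.Tainted
	}
	var kept []Pod
	for _, p := range pods {
		if !tainted[p.NodeName] {
			kept = append(kept, p)
		}
	}
	return kept
}

// nodeEmpty reports whether the map records no pods for the node.
func nodeEmpty(name string, infoMap map[string][]Pod) bool {
	return len(infoMap[name]) == 0
}

func main() {
	nodes := []Node{{Name: "node-a"}, {Name: "node-b", Tainted: true}}
	pods := []Pod{
		{Name: "web-1", NodeName: "node-a"},
		{Name: "web-2", NodeName: "node-b"}, // still running on the tainted node
	}

	// The filtered list is what capacity calculations would consume.
	filtered := excludeTaintedNodePods(pods, nodes)

	// Old ordering: map built from the filtered list, so node-b looks empty.
	filterThenBuild := buildNodeInfoMap(filtered, nodes)
	// New ordering: map built from all pods, so node-b keeps its pod.
	buildThenFilter := buildNodeInfoMap(pods, nodes)

	fmt.Println("filter then build, node-b empty:", nodeEmpty("node-b", filterThenBuild)) // true
	fmt.Println("build then filter, node-b empty:", nodeEmpty("node-b", buildThenFilter)) // false
}
```

The filtered list is still produced in both orderings; only the input used to build the map changes.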
dtnyn reviewed Apr 16, 2026
// Filter to pods on untainted nodes
// Build NodeInfoMap BEFORE filtering tainted node pods so it always reflects
// the true pod state on every node. TryRemoveTaintedNodes uses NodeInfoMap to
// determine whether a tainted node is empty; if we built it from an already-
Collaborator
nit: can we trim this down? The background on the bug is only relevant to this PR, not to the code once it is merged.
})

// Build NodeInfoMap from ALL pods (including the pod on the tainted node).
// This is what the fixed code does: NodeInfoMap is constructed before the
Collaborator
question: this reads like a PR comment rather than a code comment. Once this is merged, "fixed code" has no meaning. Should it be removed or reworded?
// Build NodeInfoMap from ALL pods (including the pod on the tainted node).
// This is what the fixed code does: NodeInfoMap is constructed before the
// ExcludeTaintedNodePods filter so it accurately reflects tainted-node state.
nodeGroupsState[nodeGroupOpts.Name].NodeInfoMap = k8s.CreateNodeNameToInfoMap(allPods, allNodes)
Collaborator
question: this manually builds NodeInfoMap from all pods, which is the correct (post-fix) behavior, but it means the test doesn't exercise the actual bug path through scaleNodeGroup. The test passes on master too.
Maybe consider adding an integration test that goes through scaleNodeGroup to prove the ordering fix?
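To illustrate the point about exercising the ordering through the entry point, here is a rough sketch. `scaleSketch` is a hypothetical stand-in for scaleNodeGroup, with a `buildBeforeFilter` switch added purely so both orderings can be compared side by side; none of these names or signatures come from the real code.

```go
package main

import "fmt"

// pod and node are hypothetical minimal types for this sketch only.
type pod struct{ name, nodeName string }

type node struct {
	name    string
	tainted bool
}

// podCounts is a hypothetical stand-in for building NodeInfoMap.
func podCounts(pods []pod) map[string]int {
	counts := make(map[string]int)
	for _, p := range pods {
		counts[p.nodeName]++
	}
	return counts
}

// dropTaintedNodePods is a hypothetical stand-in for the ExcludeTaintedNodePods filter.
func dropTaintedNodePods(pods []pod, nodes []node) []pod {
	tainted := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		tainted[n.name] = n.tainted
	}
	var kept []pod
	for _, p := range pods {
		if !tainted[p.nodeName] {
			kept = append(kept, p)
		}
	}
	return kept
}

// scaleSketch returns the tainted nodes it considers empty (and so removable)
// plus the number of pods left for capacity calculations.
func scaleSketch(pods []pod, nodes []node, buildBeforeFilter bool) ([]string, int) {
	var counts map[string]int
	var filtered []pod
	if buildBeforeFilter {
		counts = podCounts(pods) // map sees every pod
		filtered = dropTaintedNodePods(pods, nodes)
	} else {
		filtered = dropTaintedNodePods(pods, nodes)
		counts = podCounts(filtered) // map never sees pods on tainted nodes
	}
	var removable []string
	for _, n := range nodes {
		if n.tainted && counts[n.name] == 0 {
			removable = append(removable, n.name)
		}
	}
	return removable, len(filtered)
}

func main() {
	nodes := []node{{name: "node-a"}, {name: "node-b", tainted: true}}
	pods := []pod{{"web-1", "node-a"}, {"web-2", "node-b"}}

	filterFirst, _ := scaleSketch(pods, nodes, false)
	buildFirst, _ := scaleSketch(pods, nodes, true)
	fmt.Println("filter first, removable:", filterFirst) // [node-b]
	fmt.Println("build first, removable:", buildFirst)   // []
}
```

Because the caller supplies only pods and nodes and never the map itself, an assertion on the returned list fails under the filter-first ordering, which a test that prebuilds the map cannot show.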
Contributor
Author
Reopened as a new PR in #290.