fix: clear canaryStatus when rollback cancellation completes#340
fix: clear canaryStatus when rollback cancellation completes#340LittleChimera wants to merge 1 commit into
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #340 +/- ##
==========================================
+ Coverage 51.38% 51.57% +0.19%
==========================================
Files 66 66
Lines 8559 8560 +1
==========================================
+ Hits 4398 4415 +17
+ Misses 3575 3556 -19
- Partials 586 589 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
When a canary release is cancelled via rollback-directly, doFinalising drains the BatchRelease and restores traffic routing, but canaryStatus is left frozen at the step-machine state captured when the rollback was triggered (e.g. currentStepIndex: 1, currentStepState: StepPaused, podTemplateHash: <failed-revision>). The rollout settles into phase: Healthy with Progressing: False/Completed, yet the stale canaryStatus makes it look stuck at "step N paused", and kubectl kruise rollout approve is a no-op because the controller no longer dispatches reconcileRolloutProgressing while phase is Healthy. Mirror what handleContinuousRelease already does: clear the sub-status when cancellation cleanup is complete, before flipping the Progressing condition to Completed. Signed-off-by: Luka Skugor <luka.skugor@chimeramail.com>
91562eb to
39dab28
Compare
Ⅰ. Describe what this PR does
When a canary release is cancelled via rollback-directly,
doFinalisingdrains theBatchReleaseand restores traffic routing correctly, butcanaryStatusis left frozen at the step-machine state captured when the rollback was triggered (e.g.currentStepIndex: 1, currentStepState: StepPaused, podTemplateHash: <failed-revision>). The rollout settles intophase: HealthywithProgressing: False / Completed, yet the stalecanaryStatusmakes it look stuck at "step N paused", andkubectl kruise rollout approveis a no-op because the controller no longer dispatchesreconcileRolloutProgressingwhile phase is Healthy.This PR mirrors what
handleContinuousReleasealready does: clear the sub-status when the cancellation cleanup is complete, before flipping theProgressingcondition toCompleted.Ⅱ. Does this pull request fix one issue?
Fixes #339
Ⅲ. Describe how to verify it
A regression test was added:
TestReconcileRolloutProgressing/ReconcileRolloutProgressing_cancelling_->_completed_clears_canaryStatus.It sets up a rollout mid-cancellation with stale canary status (
StepPausedat step 1,podTemplateHashpointing at the failed revision,canaryReplicas: 1), drivesreconcileRolloutProgressingthrough the last finalising step, and asserts that afterdone=truethe rollout has:Progressing: False / CompletedSucceeded: FalsecanaryStatus: nilWithout the fix the test fails because the stale
canaryStatusis preserved.Ⅳ. Special notes for reviews
The clear is intentionally scoped to the rollback/cancellation path (
ProgressingReasonCancellingbranch). The success path (ProgressingReasonFinalising) is left untouched because there thecanaryStatusreflects the version that was actually deployed and is a useful observability artefact after a successful rollout.