🌱 Set Progressing to Succeeded on ClusterExtensionRevision only after availability probes pass#2524
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
This PR modifies the ClusterExtensionRevision controller to delay setting the Progressing condition to Succeeded until availability probes pass, rather than immediately when the rollout transition completes. This ensures that the progressDeadlineMinutes feature can correctly detect workloads that are deployed but fail to become available (e.g., due to bad image references).
Changes:
- Modified
Progressingcondition logic to remain atRollingOutduring transition and only set toSucceededafter availability probes confirm readiness - Replaced
sync.Mapprogress deadline tracking with deterministic clock-based requeue logic using injectedclock.Clockinterface - Added E2E test scenario for progress deadline with availability failures
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
internal/operator-controller/controllers/clusterextensionrevision_controller.go |
Refactored progress deadline logic to use injected clock, moved Progressing: Succeeded status to only be set when IsComplete() is true |
internal/operator-controller/controllers/clusterextensionrevision_controller_test.go |
Added clock mocking infrastructure, updated test expectations to reflect new behavior, and used fixed timestamps for deterministic progress deadline testing |
test/e2e/features/install.feature |
Added new scenario testing progress deadline expiration when rollout doesn't become available |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e505739 to
36db108
Compare
36db108 to
c64a338
Compare
…fter availability probes pass Previously, `Progressing` reason was set to `Succeeded` as soon as the rollout transition completed, not taking into account availability probes of the last phase. This meant setting `.spec.progressDeadlineMinutes` could not catch workloads that were applied to the cluster but never became available, due to for example bad image reference. Now `Progressing` stays at `RollingOut` during transition and is only set to `Succeeded` once probes confirm availability, allowing the progress deadline to correctly expire when a workload fails to become available in time. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
c64a338 to
76b1cbc
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if time.Since(existingRev.CreationTimestamp.Time) > timeout { | ||
| if c.Clock.Since(existingRev.CreationTimestamp.Time) > timeout { | ||
| // progress deadline reached, reset any errors and stop reconciling this revision | ||
| markAsNotProgressing(reconciledRev, ocv1.ReasonProgressDeadlineExceeded, fmt.Sprintf("Revision has not rolled out for %d minutes.", pd)) |
There was a problem hiding this comment.
The error message uses "minutes" which is always plural, but when the progress deadline is 1 minute, it should say "minute" (singular). Consider using a conditional or pluralization helper to make the message grammatically correct for all values. For example: fmt.Sprintf("Revision has not rolled out for %d minute(s).", pd) or a more sophisticated approach that checks if pd == 1.
| markAsNotProgressing(reconciledRev, ocv1.ReasonProgressDeadlineExceeded, fmt.Sprintf("Revision has not rolled out for %d minutes.", pd)) | |
| unit := "minutes" | |
| if pd == 1 { | |
| unit = "minute" | |
| } | |
| markAsNotProgressing(reconciledRev, ocv1.ReasonProgressDeadlineExceeded, fmt.Sprintf("Revision has not rolled out for %d %s.", pd, unit)) |
Description
Previously,
Progressingreason was set toSucceededas soon as the rollouttransition completed, not taking into account availability probes of the last phase.
This meant setting
.spec.progressDeadlineMinutescould not catch workloads that wereapplied to the cluster but never became available, due to for example bad image reference.
Now
Progressingstays atRollingOutduring transition and is only set toSucceededonce probes confirm availability, allowing the progress deadlineto correctly expire when a workload fails to become available in time.
Reviewer Checklist