-
Epic
-
Resolution: Unresolved
-
Normal
-
None
-
None
-
None
-
Implement Pause/Resume Reconciliation
-
False
-
-
False
-
Not Selected
-
Proposed
-
Proposed
-
To Do
-
RHOSSTRAT-800 - Provide service Operator reconciliation control
-
Proposed
-
rhos-conplat-core-operators
-
Proposed
-
100% To Do, 0% In Progress, 0% Done
-
-
-
Moderate
User Story: As a Cluster Administrator, I want to use a single field to pause and resume reconciliation for the entire OpenStack environment and have the Operator's status reflect this state, so that I can control when updates are applied and have clear visibility.
Background: There is concern that OSP services get restarted when operators are updated. Now that we have an initialization resource, we could provide more control over how operator updates proceed. The initialization resource currently updates all operators at once, but we could implement a mechanism to sequence operator updates, minimizing upgrade impact by ensuring that only one OSP service is reconciled at a time.
We already know, for example, that https://github.com/rabbitmq/cluster-operator/releases/tag/v2.11.0 will cause the RabbitMQ cluster to restart immediately (we aren’t using 2.11 yet but soon will be). Furthermore, lib-common changes to our core labels/annotations or primitive k8s structures would likewise raise a similar restart concern for OSP services immediately upon operator updates. While these updates are normal and expected, it is the fact that we are doing them all simultaneously that is a concern here.
Goal: Provide an administrator with the ability to pause and resume the reconciliation for the entire OpenStack environment using a single field. This allows updates to be applied in a controlled manner during a maintenance window. This capability temporarily suspends all automated management, including self-healing, and should be used with caution.
Acceptance Criteria:
- A boolean field (e.g. reconciliationPaused) is available within the OpenStack Custom Resource.
- When reconciliationPaused is set to true, the Operator must cease applying any new changes to all managed OpenStack services.
- While reconciliation is paused, the Operator must not recreate a managed resource if it is manually deleted.
- When reconciliationPaused is set to false, the Operator must begin reconciling all services to match the state defined in their CRs.
- When reconciliation is paused, the status of the OpenStack Operator CR must contain a condition that clearly indicates the paused state.
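As a rough illustration of the criteria above, the pause gate could be sketched as follows. This is a minimal, self-contained sketch and not the operator's actual code: the OpenStackSpec type, the ReconciliationPaused Go field name, and the servicesToReconcile helper are all assumptions made for the example.

```go
package main

import "fmt"

// OpenStackSpec is a hypothetical, trimmed-down stand-in for the spec of the
// OpenStack Custom Resource. Only the proposed pause field is modelled.
type OpenStackSpec struct {
	// ReconciliationPaused mirrors the proposed reconciliationPaused field.
	ReconciliationPaused bool
}

// servicesToReconcile (hypothetical helper) returns the services the operator
// should act on during one reconcile pass. While paused it returns nothing,
// so no changes are applied and manually deleted resources are not recreated.
func servicesToReconcile(spec OpenStackSpec, managed []string) []string {
	if spec.ReconciliationPaused {
		return nil
	}
	// Not paused: every managed service is reconciled toward its CR.
	out := make([]string, len(managed))
	copy(out, managed)
	return out
}

func main() {
	managed := []string{"keystone", "nova"}

	paused := servicesToReconcile(OpenStackSpec{ReconciliationPaused: true}, managed)
	fmt.Println("paused, services to reconcile:", len(paused))

	resumed := servicesToReconcile(OpenStackSpec{ReconciliationPaused: false}, managed)
	fmt.Println("resumed, services to reconcile:", len(resumed))
}
```

Placing the check at the top of the reconcile pass keeps the paused path free of side effects, which is what the "must not recreate" criterion requires.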
Open Questions:
- What should the specific Type, Reason, and Message be for the status condition that indicates reconciliation is paused?
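To make the question concrete, one possible shape for such a condition is sketched below. The Condition struct only imitates the common Kubernetes condition fields (Type, Status, Reason, Message), and the values "ReconciliationPaused" and "PausedByAdministrator" are placeholders invented for discussion, not a decided naming.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes-style status condition.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// pausedCondition builds a candidate condition for the given pause state.
// All string values are hypothetical placeholders.
func pausedCondition(paused bool) Condition {
	if paused {
		return Condition{
			Type:    "ReconciliationPaused",
			Status:  "True",
			Reason:  "PausedByAdministrator",
			Message: "Reconciliation is paused; managed services are not being updated or self-healed.",
		}
	}
	return Condition{
		Type:    "ReconciliationPaused",
		Status:  "False",
		Reason:  "ReconciliationActive",
		Message: "Reconciliation is active.",
	}
}

// setCondition replaces the condition with the same Type, or appends it.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var status []Condition
	status = setCondition(status, pausedCondition(true))
	status = setCondition(status, pausedCondition(false))
	fmt.Println(len(status), status[0].Status, status[0].Reason)
}
```

Keeping a single condition Type and flipping its Status between "True" and "False" lets a client watch one entry rather than checking for the presence or absence of a condition.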
- duplicates
-
OSPRH-16111 operator updates: consider updating operators in sequence to minimize downtime
-
- Closed
-