  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-17396

As a Cluster Administrator, I want to use a single field to pause and resume reconciliation for the entire OpenStack environment and have the Operator's status reflect this state, so that I can control when updates are applied and have clear visibility.


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Normal
    • openstack-operator
    • Implement Pause/Resume Reconciliation
    • To Do
    • RHOSSTRAT-800 - Provide service Operator reconciliation control
    • rhos-conplat-core-operators
    • 100% To Do, 0% In Progress, 0% Done

      User Story: As a Cluster Administrator, I want to use a single field to pause and resume reconciliation for the entire OpenStack environment and have the Operator's status reflect this state, so that I can control when updates are applied and have clear visibility.

      Background: There is concern that OSP services get restarted when operators are updated. Now that we have an initialization resource, we could provide more control over how operator updates proceed. The initialization resource currently updates all operators at once, but we could implement a mechanism to sequence operator updates so that only one OSP service is reconciled at a time, minimizing upgrade impact.
       
      We already know, for example, that https://github.com/rabbitmq/cluster-operator/releases/tag/v2.11.0 will cause the RabbitMQ cluster to restart immediately (we aren’t using 2.11 yet, but soon will be). Furthermore, many lib-common changes to our core labels/annotations or primitive k8s structures would raise a similar restart concern for OSP services immediately upon operator updates. While these updates are normal and expected, it is the fact that we are doing them all simultaneously that is the concern here.
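
      The sequencing idea in the background could look roughly like the Go sketch below. It is illustrative only and not taken from the operator code: the Service type, the ErrNotReady error, the UpdateSequentially function, and the service names are all hypothetical. Each update is assumed to block until that service is ready again, so at most one service is disrupted at any moment, and the loop stops at the first failure.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotReady is a hypothetical error for a service that did not become ready.
var ErrNotReady = errors.New("service not ready")

// Service is a hypothetical handle for one OSP service.
type Service struct {
	Name string
	// Update applies the update and is assumed to block until the
	// service is ready again (or return an error).
	Update func() error
}

// UpdateSequentially updates services strictly one at a time, in order.
// It stops at the first failure, so later services are left untouched,
// and returns the names of the services that were updated successfully.
func UpdateSequentially(services []Service) ([]string, error) {
	done := make([]string, 0, len(services))
	for _, s := range services {
		if err := s.Update(); err != nil {
			return done, fmt.Errorf("updating %s: %w", s.Name, err)
		}
		done = append(done, s.Name)
	}
	return done, nil
}

func main() {
	// Example service names are arbitrary.
	services := []Service{
		{Name: "rabbitmq", Update: func() error { return nil }},
		{Name: "keystone", Update: func() error { return ErrNotReady }},
		{Name: "nova", Update: func() error { return nil }},
	}
	done, err := UpdateSequentially(services)
	fmt.Println("updated:", done)
	fmt.Println("error:", err)
}
```

      Stopping at the first failure is a design choice made for this sketch; it keeps the blast radius to a single service but leaves the remaining services on the old version until an administrator intervenes.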

      Goal: Provide an administrator with the ability to pause and resume reconciliation of the entire OpenStack environment using a single field. This allows updates to be applied in a controlled manner during a maintenance window. This capability temporarily suspends all automated management, including self-healing, and should be used with caution.

      Acceptance Criteria:

      • A boolean field (e.g. reconciliationPaused) is available within the OpenStack Custom Resource.
      • When reconciliationPaused is set to true, the Operator must cease applying any new changes to all managed OpenStack services.
      • While reconciliation is paused, the Operator must not recreate a managed resource if it is manually deleted.
      • When reconciliationPaused is set to false, the Operator must begin reconciling all services to match the state defined in their CRs.
      • When reconciliation is paused, the status of the OpenStack Operator CR must contain a condition that clearly indicates the paused state.
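
      The acceptance criteria above can be sketched as a simple gate at the top of the reconcile path. This is a minimal, standard-library-only illustration, not the operator's actual implementation: the OpenStack and Condition types are trimmed-down stand-ins, and the condition Type, Reason, and Message strings are placeholders, since their real values are still an open question in this ticket.

```go
package main

import "fmt"

// Condition loosely mirrors the shape of a Kubernetes status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// OpenStack is a hypothetical, trimmed-down stand-in for the OpenStack CR.
type OpenStack struct {
	ReconciliationPaused bool        // example field name from the acceptance criteria
	Conditions           []Condition // status conditions
}

// setCondition replaces the condition with the same Type, or appends it.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

// Reconcile calls apply unless the CR is paused. While paused it only
// records the paused state in status and returns early, so nothing is
// changed or recreated. It reports whether apply was invoked.
func Reconcile(cr *OpenStack, apply func()) bool {
	if cr.ReconciliationPaused {
		cr.Conditions = setCondition(cr.Conditions, Condition{
			Type:    "ReconciliationPaused", // placeholder value
			Status:  "True",
			Reason:  "PausedByAdministrator", // placeholder value
			Message: "reconciliation is paused; no changes are applied",
		})
		return false
	}
	cr.Conditions = setCondition(cr.Conditions, Condition{
		Type:    "ReconciliationPaused", // placeholder value
		Status:  "False",
		Reason:  "ReconciliationActive", // placeholder value
		Message: "reconciliation is active",
	})
	apply()
	return true
}

func main() {
	cr := &OpenStack{ReconciliationPaused: true}
	applied := 0

	Reconcile(cr, func() { applied++ })
	fmt.Println(applied, cr.Conditions[0].Status) // 0 True

	cr.ReconciliationPaused = false
	Reconcile(cr, func() { applied++ })
	fmt.Println(applied, cr.Conditions[0].Status) // 1 False
}
```

      Returning before any apply step is what keeps a manually deleted resource from being recreated while paused; once the field is set back to false, the next reconcile proceeds as usual.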

      Open Questions:

      • What should the specific Type, Reason, and Message be for the status condition that indicates reconciliation is paused?

              Unassigned
              lmadsen@redhat.com Leif Madsen
              rhos-conplat-core-operators