OpenShift Bugs / OCPBUGS-64681

Arbiter and CP nodes can force quorum loss during MCO reboot and upgrade events


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • 4.20.z
    • Two Node with Arbiter
    • Quality / Stability / Reliability
    • OCPEDGE Sprint 280

      As a developer, I need to:

      • Prevent the arbiter and control-plane machine-config pools from restarting nodes at the same time.
      • Avoid quorum loss during MCO reboot and upgrade events.

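      The core issue is that the node controller's sync logic evaluates each machine-config pool independently, so the arbiter pool and the control-plane pool can each start rebooting a node at the same time. Arbiter nodes need to be handled as a special case so that unavailability is counted across both pools. The relevant code is in the MCO node controller: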
      https://github.com/openshift/machine-config-operator/blob/747fddc7df5fe50e1f8c568dd9fc5cef47866b5b/pkg/controller/node/node_controller.go#L1076
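      To illustrate the special case, the following is a minimal sketch, not the MCO's actual implementation. It assumes hypothetical, simplified types (`Node`, `canStartUpdate`) and assumes the pools are named `master` and `arbiter`. Instead of checking each pool's budget separately, it counts unavailable nodes across the control-plane and arbiter pools together and only allows a new update if the remaining healthy etcd members still form a quorum (floor(n/2) + 1).

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a node managed by the MCO.
type Node struct {
	Name        string
	Pool        string // assumed pool names: "master" or "arbiter"
	Unavailable bool   // true while the node is draining, updating, or rebooting
}

// quorum returns how many etcd members must stay healthy for a
// cluster of the given size to keep quorum: floor(n/2) + 1.
func quorum(members int) int {
	return members/2 + 1
}

// canStartUpdate reports whether the named node may begin an update
// without dropping the combined control-plane + arbiter set below quorum.
// Unlike a per-pool check, it counts unavailable nodes across both pools.
func canStartUpdate(nodes []Node, candidate string) bool {
	members, healthy := 0, 0
	for _, n := range nodes {
		if n.Pool != "master" && n.Pool != "arbiter" {
			continue // worker pools do not host etcd members
		}
		members++
		// The candidate counts as unavailable once it starts updating.
		if !n.Unavailable && n.Name != candidate {
			healthy++
		}
	}
	return healthy >= quorum(members)
}

func main() {
	nodes := []Node{
		{Name: "master-0", Pool: "master", Unavailable: true},
		{Name: "master-1", Pool: "master"},
		{Name: "arbiter-0", Pool: "arbiter"},
	}
	// master-0 is already rebooting, so updating arbiter-0 would leave
	// only master-1 healthy: 1 < quorum(3) = 2.
	fmt.Println(canStartUpdate(nodes, "arbiter-0")) // false

	// Once master-0 is back, the arbiter can update: 2 >= 2.
	nodes[0].Unavailable = false
	fmt.Println(canStartUpdate(nodes, "arbiter-0")) // true
}
```

      A purely per-pool check with a budget of one unavailable node would allow the first case, because the arbiter pool by itself has no unavailable nodes. This is the scenario that causes quorum loss.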

      Impact Notes

      • This does not block the upgrade, but it causes temporary quorum loss while the arbiter node and a control-plane node reboot at the same time.

      Acceptance Criteria

      • A patch is merged into the MCO
      • A test is merged into CI that verifies that the MCO prevents the arbiter and control-plane nodes from rebooting at the same time
      • The test is used to verify the fix
      • Documentation is updated with a known issue that explains how to work around this

      Supporting Documents
      MCO MaxUnavailable Docs

      Issue synthesized with help from the Gemini Engineering Jira Buddy gem

              Egli Hila (ehila@redhat.com)
              Jeremy Poulin (jpoulin)
              John George