OpenShift Bugs / OCPBUGS-58289

Cluster and bootstrap PreprovisioningImage controllers can fight

      We run the baremetal-operator (BMO) both on the bootstrap (to provision the control plane nodes) and in the cluster (to provision the worker nodes). Currently we use leader election on both instances to ensure that the BMO in the cluster can't start reconciling hosts until after the one on the bootstrap is shut down. (We also pause all worker BareMetalHosts (BMHs) until after the bootstrap provisioner is shut down, so that the bootstrap BMO will not operate on the workers.)

      However, we also have image-customization-controller (ICC) running in both places, and it does not have leader election. So if the cluster control plane starts to come up while there is still one control plane node that hasn't been provisioned yet, the two ICCs can start fighting: each keeps overwriting the URL for the PreprovisioningImage, so it constantly flips between http://127.0.0.1:8084/... and http://metal3-image-customization-service.openshift-machine-api.svc.cluster.local/....
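
      The flipping described above can be pictured with a toy model. This is only a sketch under stated assumptions, not the actual image-customization-controller code: the `PreprovisioningImage` and `Controller` classes, the `image_url` field, and the `simulate` helper are hypothetical, and only the URL prefixes quoted in this report are used (the elided paths are left out). Each instance simply writes its own URL whenever it sees a different one.

```python
from dataclasses import dataclass

# URL prefixes as quoted in this report; the elided paths are deliberately omitted.
BOOTSTRAP_BASE = "http://127.0.0.1:8084/"
CLUSTER_BASE = "http://metal3-image-customization-service.openshift-machine-api.svc.cluster.local/"


@dataclass
class PreprovisioningImage:
    """Hypothetical stand-in for the real resource; only the URL is modelled."""
    image_url: str = ""


class Controller:
    """Hypothetical reconciler that insists on its own base URL."""

    def __init__(self, name, base_url):
        self.name = name
        self.base_url = base_url
        self.writes = 0

    def reconcile(self, image):
        """Overwrite the URL if it differs from ours; return True on a write."""
        if image.image_url != self.base_url:
            image.image_url = self.base_url
            self.writes += 1
            return True
        return False


def simulate(rounds):
    """Let both instances reconcile the same object with no coordination."""
    image = PreprovisioningImage()
    bootstrap = Controller("bootstrap", BOOTSTRAP_BASE)
    cluster = Controller("cluster", CLUSTER_BASE)
    history = []  # who wrote, in order
    for _ in range(rounds):
        for controller in (bootstrap, cluster):
            if controller.reconcile(image):
                history.append(controller.name)
    return image, bootstrap, cluster, history


image, bootstrap, cluster, history = simulate(rounds=3)
print(history)
print(bootstrap.writes + cluster.writes)
```

      In this model every reconcile results in a write, because each instance always finds the other's URL in place. Neither side ever reaches a steady state.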

      Example CI run demonstrating this behaviour: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/9814/pull-ci-openshift-installer-main-e2e-metal-ipi-ovn/1940018221864194048/artifacts/e2e-metal-ipi-ovn/ (the cause in this case is virtualmedia being broken by the PR).

      We should probably enable leader election for image-customization-controller, as we already do for the baremetal-operator.
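
      As a rough illustration of why leader election would avoid the fight, here is a minimal sketch of lease-based election with a logical clock. It is an assumption-laden toy, not the Kubernetes leader election API and not the proposed patch: the `Lease` class, `try_acquire`, and `run` are hypothetical names, and the lease is held in memory rather than in the API server. An instance acts only while it holds the lease, so the cluster instance stays idle until the bootstrap instance stops renewing and the lease expires.

```python
class Lease:
    """Hypothetical in-memory lease; a real one would be a shared API object."""

    def __init__(self, duration):
        self.duration = duration
        self.holder = None
        self.expires_at = 0

    def try_acquire(self, candidate, now):
        """Grant or renew the lease if it is free, already ours, or expired."""
        if self.holder in (None, candidate) or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + self.duration
            return True
        return False


def run(ticks, bootstrap_stops_at, lease_duration=2):
    """Return which instance was allowed to reconcile at each tick.

    The bootstrap instance competes for the lease until `bootstrap_stops_at`,
    modelling the bootstrap being shut down; the cluster instance always competes.
    """
    lease = Lease(lease_duration)
    writers = []
    for now in range(ticks):
        candidates = ["cluster"]
        if now < bootstrap_stops_at:
            candidates.insert(0, "bootstrap")
        for name in candidates:
            # Only the current lease holder is allowed to reconcile.
            if lease.try_acquire(name, now):
                writers.append(name)
    return writers


writers = run(ticks=6, bootstrap_stops_at=3)
print(writers)
```

      With these parameters the bootstrap instance acts for the first three ticks, there is one idle tick while its lease runs out, and then the cluster instance takes over. Writes from the two instances never interleave, at the cost of a handover delay bounded by the lease duration.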

              hroy@redhat.com Himanshu Roy
              zabitter Zane Bitter
              Jad Haj Yahya Jad Haj Yahya