-
Bug
-
Resolution: Unresolved
-
Undefined
-
4.16.0, 4.17.0, 4.18.0, 4.19
-
Quality / Stability / Reliability
-
False
-
Low
We run the baremetal-operator on both the bootstrap (to provision the control plane nodes) and in the cluster (to provision the worker nodes). Currently we use leader election on both instances to ensure that the BMO in the cluster can't start reconciling hosts until after the one on the bootstrap is shut down. (We also pause all worker BMHs until after the bootstrap provisioner is shut down, so that the bootstrap BMO will not operate on the workers.)
However, we also have image-customization-controller running in both places, and it does not have leader election. So if the cluster control plane starts to come up while there is still one control plane node that hasn't been provisioned yet, the two ICCs can start fighting. They constantly flip the URL for the PreprovisioningImage between http://127.0.0.1:8084/... and http://metal3-image-customization-service.openshift-machine-api.svc.cluster.local/....
Example CI run demonstrating this behaviour: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_installer/9814/pull-ci-openshift-installer-main-e2e-metal-ipi-ovn/1940018221864194048/artifacts/e2e-metal-ipi-ovn/ (the cause in this case is virtualmedia being broken by the PR).
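We should probably enable leader election for image-customization-controller, as we already do for the BMO, so that only one ICC instance writes to the PreprovisioningImage at a time.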
- is related to
-
OCPBUGS-33493 baremetal operator not starting on assisted/agent installs
-
- Closed
-