Details
- Bug
- Resolution: Unresolved
- Priority: Major
- Affects Version: 4.9
- Severity: Important
- 3
- Sprints: Metal Platform 241, Metal Platform 243, Metal Platform 244, Metal Platform 245, Metal Platform 246, Metal Platform 247, Metal Platform 248, Metal Platform 249, Metal Platform 250, Metal Platform 251, Metal Platform 252
- 11
- Rejected
- Unspecified
- If docs needed, set a value
Description
Description of problem:
Installing a cluster with 4.9.0-rc.0, adding a new node after installation, and upgrading the cluster to 4.9.0-rc.1 works. However, nodes added after the upgrade to rc.1 fail when the cluster is subsequently upgraded to 4.9.0-rc.3: when rebooted by the upgrade, they drop back into ironic discovery and do not come back.
Version-Release number of selected component (if applicable):
4.9.0-rc.1
How reproducible:
Steps to Reproduce:
1. set up IPI HW cluster on IBM cloud following https://deploy-preview-36529--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_ibm_cloud/install-ibm-cloud-installing-on-ibm-cloud.html
2. confirm the cluster is up and running with 3 masters and 2 workers (dmoessne-m0, dmoessne-m1, dmoessne-m2, dmoessne-w0, dmoessne-w1); all cluster operators (co) are fine
3. add a 3rd worker (dmoessne-w2) following https://deploy-preview-36529--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_bare_metal_ipi/ipi-install-expanding-the-cluster.html
4. confirm that the 3rd worker (dmoessne-w2) was added successfully (Ready)
5. upgrade to 4.9.0-rc.1
6. verify the upgrade was successful (oc get co, oc adm upgrade, oc get nodes, oc get mcp, ...)
7. add additional nodes and confirm they have been successfully added (dmoessne-w3,dmoessne-w4,dmoessne-w5)
8. set maxUnavailable to 3 in the worker mcp (just to avoid getting stuck in case dmoessne-w3, dmoessne-w4 or dmoessne-w5 is chosen as a starting point for the worker upgrade)
9. upgrade to 4.9.0-rc.3
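For steps 8 and 9, the change can be applied non-interactively; a minimal sketch, assuming the default "worker" MachineConfigPool name of a standard IPI deployment and a live cluster:

```shell
# Allow up to 3 worker nodes to update simultaneously, so the rollout
# does not begin on (and stall at) the newly added workers.
oc patch mcp worker --type merge -p '{"spec":{"maxUnavailable":3}}'

# Then trigger the upgrade (step 9).
oc adm upgrade --to=4.9.0-rc.3
```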
Actual results:
masters and workers added prior to the upgrade to rc.1 (dmoessne-m0, dmoessne-m1, dmoessne-m2, dmoessne-w0, dmoessne-w1, dmoessne-w2) are updated successfully
however, workers added after upgrading to rc.1 (dmoessne-w3, dmoessne-w4, dmoessne-w5) get stuck; checking their console shows that they are back in ironic inspection and never come back (DHCP boot is still enabled for them despite being provisioned)
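The stuck state can also be observed from the cluster side with commands like the following (illustrative; assumes the openshift-machine-api namespace used by IPI deployments):

```shell
# Workers added after rc.1 stay NotReady while the older nodes finish.
oc get nodes -o wide

# The worker pool reports unavailable machines and the update stalls.
oc get mcp worker

# Inspect the BareMetalHost objects for the affected workers to see
# their provisioning state.
oc get bmh -n openshift-machine-api
```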
Expected results:
all nodes are updated successfully; ironic DHCP boot is disabled for successfully provisioned nodes
Additional info:
Disabling provisioning, i.e.
- oc edit provisioning
~~~
[...]
spec:
  preProvisioningOSDownloadURLs: {}
  provisioningNetwork: Disabled
  provisioningOSDownloadURL: http://<IP>:8080/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.gz?sha256=00cb56c8711686255744646394e22a8ca5f27e059016f6758f14388e5a0a14cb
status:
[...]
~~~
and power-cycling the affected nodes lets the cluster upgrade complete.
Re-enabling provisioning and rebooting one of the affected nodes immediately brings it back into ironic discovery.
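The same change as the oc edit above can be scripted; a sketch, assuming the default singleton Provisioning resource name provisioning-configuration:

```shell
# Disable the provisioning network so ironic stops offering DHCP/PXE
# boot to already-provisioned hosts.
oc patch provisioning provisioning-configuration --type merge \
  -p '{"spec":{"provisioningNetwork":"Disabled"}}'
```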