OpenShift Bugs · OCPBUGS-8988

[IPI on HW provided by IBM cloud] Upgrade for nodes added after updating to 4.9.0-rc.1 fails as nodes boot into ironic discovery



    Description

      Description of problem:
      Installing a cluster with 4.9.0-rc.0, adding a new node after installation, and upgrading the cluster to 4.9.0-rc.1 works. However, after adding further new nodes and upgrading the cluster to 4.9.0-rc.3, the upgrade fails for the nodes that were added after the upgrade to rc.1: they boot back into ironic discovery when the upgrade reboots them.

      Version-Release number of selected component (if applicable):
      4.9.0-rc.1

      How reproducible:

      Steps to Reproduce:
      1. set up an IPI HW cluster on IBM Cloud following https://deploy-preview-36529--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_ibm_cloud/install-ibm-cloud-installing-on-ibm-cloud.html

      2. confirm the cluster is up and running with 3 masters and 2 workers (dmoessne-m0,dmoessne-m1,dmoessne-m2,dmoessne-w0,dmoessne-w1), and all cluster operators are fine

      3. add a 3rd worker (dmoessne-w2) following https://deploy-preview-36529--osdocs.netlify.app/openshift-enterprise/latest/installing/installing_bare_metal_ipi/ipi-install-expanding-the-cluster.html (see the manifest sketch after this list)

      4. confirm the 3rd worker (dmoessne-w2) was added successfully (Ready)

      5. upgrade to 4.9.0-rc.1

      6. verify the upgrade was successful (oc get co, oc adm upgrade, oc get nodes, oc get mcp, ...)

      7. add additional nodes and confirm they have been successfully added (dmoessne-w3,dmoessne-w4,dmoessne-w5)

      8. set maxUnavailable to 3 in the worker mcp; this is just to avoid getting stuck in case dmoessne-w3, dmoessne-w4 or dmoessne-w5 are chosen first for the worker upgrade (see the command sketch after this list)

      9. upgrade to 4.9.0-rc.3
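
      For step 3, the expansion doc boils down to applying a BareMetalHost (plus a BMC credentials secret) in the openshift-machine-api namespace and then scaling the worker machineset; a minimal sketch, with all values as placeholders rather than the actual lab data:

      ~~~
      apiVersion: v1
      kind: Secret
      metadata:
        name: dmoessne-w2-bmc-secret
        namespace: openshift-machine-api
      type: Opaque
      data:
        username: <base64-encoded BMC user>
        password: <base64-encoded BMC password>
      ---
      apiVersion: metal3.io/v1alpha1
      kind: BareMetalHost
      metadata:
        name: dmoessne-w2
        namespace: openshift-machine-api
      spec:
        online: true
        bootMACAddress: <MAC of the provisioning NIC>
        bmc:
          address: ipmi://<BMC address>
          credentialsName: dmoessne-w2-bmc-secret
      ~~~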

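      Steps 6, 8 and 9 map to commands along these lines (a sketch; the worker pool name is the default one):

      ~~~
      # step 6: verify the rc.1 upgrade finished cleanly
      oc get co
      oc adm upgrade
      oc get nodes
      oc get mcp

      # step 8: allow up to 3 workers to update in parallel
      oc patch mcp worker --type merge -p '{"spec":{"maxUnavailable":3}}'

      # step 9: start the upgrade to 4.9.0-rc.3
      # (if the release is not in the channel, use --to-image <pullspec> --allow-explicit-upgrade)
      oc adm upgrade --to=4.9.0-rc.3
      ~~~
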
      Actual results:
      masters and workers added prior to the upgrade to rc.1 are updated successfully (dmoessne-m0,dmoessne-m1,dmoessne-m2,dmoessne-w0,dmoessne-w1,dmoessne-w2);
      however, the workers added after upgrading to rc.1 (dmoessne-w3,dmoessne-w4,dmoessne-w5) get stuck: their consoles show they are back in ironic inspection and never come back (DHCP boot is still enabled for them despite being provisioned)

      Expected results:
      all nodes are updated successfully; ironic DHCP boot is disabled for successfully provisioned nodes

      Additional info:

      Disabling provisioning, i.e.

      1. oc edit provisioning
      ~~~
      [...]
      spec:
        preProvisioningOSDownloadURLs: {}
        provisioningNetwork: Disabled
        provisioningOSDownloadURL: http://<IP>:8080/rhcos-49.84.202107010027-0-openstack.x86_64.qcow2.gz?sha256=00cb56c8711686255744646394e22a8ca5f27e059016f6758f14388e5a0a14cb
      status:
      [...]
      ~~~
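
      The same edit can be applied non-interactively; a sketch, assuming the Provisioning CR has the default name provisioning-configuration:

      ~~~
      # disable the provisioning network on the cluster-wide Provisioning CR
      oc patch provisioning provisioning-configuration --type merge \
        -p '{"spec":{"provisioningNetwork":"Disabled"}}'
      ~~~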

      and power-cycling the nodes lets the cluster upgrade.
      Enabling the provisioning network again and rebooting one of the affected nodes immediately brings it back into ironic discovery
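
      For the power cycle itself, one option is metal3's reboot annotation on the BareMetalHost rather than going through the BMC by hand; a sketch, assuming the hosts carry the same names as the nodes:

      ~~~
      # ask the baremetal-operator to power cycle the host backing dmoessne-w3
      oc annotate baremetalhost dmoessne-w3 -n openshift-machine-api reboot.metal3.io=""
      ~~~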

          People

            dhiggins@redhat.com Derek Higgins
            rhn-support-dmoessner Daniel Moessner
            Tomas Sedovic Tomas Sedovic
