Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-38373

4.12 -> 4.13 upgrade using IPI on Azure does not work

XMLWordPrintable

    • Important
    • None
    • False
    • Hide

      None

      Show
      None

      This is a clone of issue OCPBUGS-37534. The following is the description of the original issue:

      Description of problem:

      Prow jobs upgrading from 4.9 to 4.16 are failing when they upgrade from 4.12 to 4.13.
      
      Nodes become NotReady when MCO tries to apply the new 4.13 configuration to the MCPs.
      
      The failing job is: periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.9-azure-ipi-f28
      
      We have reproduced the issue and we found an ordering cycle error in the journal log
      
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 systemd-journald.service[838]: Runtime Journal (/run/log/journal/960b04f10e4f44d98453ce5faae27e84) is 8.0M, max 641.9M, 633.9M free.
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found ordering cycle on network-online.target/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on node-valid-hostname.service/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on ovs-configuration.service/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on firstboot-osupdate.target/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-firstboot.service/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-pull.service/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Job network-online.target/start deleted to break ordering cycle starting with machine-config-daemon-pull.service/start
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: Queued start job for default target Graphical Interface.
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: (This warning is only shown for the first unit using IP firewalling.)
      Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: Deactivated successfully.
      
          

      Version-Release number of selected component (if applicable):

          Using IPI on Azure, these are the version involved in the current issue upgrading from 4.9 to 4.13:
          
            version: 4.13.0-0.nightly-2024-07-23-154444
            version: 4.12.0-0.nightly-2024-07-23-230744
            version: 4.11.59
            version: 4.10.67
            version: 4.9.59
      
          

      How reproducible:

          Always
          

      Steps to Reproduce:

          1. Upgrade an IPI on Azure cluster from 4.9 to 4.13. Theoretically, upgrading from 4.12 to 4.13 should be enough, but we reproduced it following the whole path.
      
          

      Actual results:

      
          Nodes become not ready
      $ oc get nodes
      NAME                                                 STATUS                        ROLES    AGE     VERSION
      ci-op-g94jvswm-cc71e-998q8-master-0                  Ready                         master   6h14m   v1.25.16+306a47e
      ci-op-g94jvswm-cc71e-998q8-master-1                  Ready                         master   6h13m   v1.25.16+306a47e
      ci-op-g94jvswm-cc71e-998q8-master-2                  NotReady,SchedulingDisabled   master   6h13m   v1.25.16+306a47e
      ci-op-g94jvswm-cc71e-998q8-worker-centralus1-c7ngb   NotReady,SchedulingDisabled   worker   6h2m    v1.25.16+306a47e
      ci-op-g94jvswm-cc71e-998q8-worker-centralus2-2ppf6   Ready                         worker   6h4m    v1.25.16+306a47e
      ci-op-g94jvswm-cc71e-998q8-worker-centralus3-nqshj   Ready                         worker   6h6m    v1.25.16+306a47e
      
      And in the NotReady nodes we can see the ordering cycle error mentioned in the description of this ticket.
          
      
          

      Expected results:

      No ordering cycle error should happen and the upgrade should be executed without problems.
          

      Additional info:

      
          

            team-mco Team MCO
            openshift-crt-jira-prow OpenShift Prow Bot
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: