OCPBUGS-3504 (OpenShift Bugs)

[4.12] Incorrect network configuration in worker node with two interfaces


    • Important
    • None
    • SDN Sprint 226, SDN Sprint 227, SDN Sprint 228
    • 3
    • Approved
    • False

      This is a regression: the same scenarios work fine on the 4.11 release.

      Description of problem:

      The OCP 4.11 --> 4.12 upgrade fails, leaving one node in 'NotReady,SchedulingDisabled' status and the machine-config operator degraded with MachineConfigDaemonFailed.

      Version-Release number of selected component (if applicable):

      Upgrade from OCP 4.11.0-0.nightly-2022-09-19-214532 on top of OSP RHOS-16.2-RHEL-8-20220804.n.1 to 4.12.0-0.nightly-2022-09-20-040107.
      Network Type: OVNKubernetes

      How reproducible:

      Twice out of two attempts.

      Steps to Reproduce:

      1. Install OCP 4.11.0-0.nightly-2022-09-19-214532 (IPI) on top of OSP RHOS-16.2-RHEL-8-20220804.n.1.
         The cluster is up and running with three workers:
         $ oc get clusterversion
         NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
         version   4.11.0-0.nightly-2022-09-19-214532   True        False         51m     Cluster version is 4.11.0-0.nightly-2022-09-19-214532
      2. Run the oc command to upgrade to 4.12.0-0.nightly-2022-09-20-040107:
      $ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-20-040107 --allow-explicit-upgrade --force=true
      warning: Using by-tag pull specs is dangerous, and while we still allow it in combination with --force for backward compatibility, it would be much safer to pass a by-digest pull spec instead
      warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
      warning: --force overrides cluster verification of your supplied release image and waives any update precondition failures.
      Requesting update to release image registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-20-040107 
      3. The upgrade does not succeed: [0]
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.11.0-0.nightly-2022-09-19-214532   True        True          17h     Unable to apply 4.12.0-0.nightly-2022-09-20-040107: wait has exceeded 40 minutes for these operators: network
      One node degraded to 'NotReady,SchedulingDisabled' status:
      $ oc get nodes
      NAME                          STATUS                        ROLES    AGE   VERSION
      ostest-9vllk-master-0         Ready                         master   19h   v1.24.0+07c9eb7
      ostest-9vllk-master-1         Ready                         master   19h   v1.24.0+07c9eb7
      ostest-9vllk-master-2         Ready                         master   19h   v1.24.0+07c9eb7
      ostest-9vllk-worker-0-4x4pt   NotReady,SchedulingDisabled   worker   18h   v1.24.0+3882f8f
      ostest-9vllk-worker-0-h6kcs   Ready                         worker   18h   v1.24.0+3882f8f
      ostest-9vllk-worker-0-xhz9b   Ready                         worker   18h   v1.24.0+3882f8f
      $ oc get pods -A | grep -v -e Completed -e Running
      NAMESPACE                                          NAME                                                         READY   STATUS      RESTARTS       AGE
      openshift-openstack-infra                          coredns-ostest-9vllk-worker-0-4x4pt                          0/2     Init:0/1    0              18h
      $ oc get events
      LAST SEEN   TYPE      REASON                                        OBJECT            MESSAGE
      7m15s       Warning   OperatorDegraded: MachineConfigDaemonFailed   /machine-config   Unable to apply 4.12.0-0.nightly-2022-09-20-040107: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)]
      7m15s       Warning   MachineConfigDaemonFailed                     /machine-config   Cluster not available for [{operator 4.11.0-0.nightly-2022-09-19-214532}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)]
      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.0-0.nightly-2022-09-20-040107   True        False         False      18h    
      baremetal                                  4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      cloud-controller-manager                   4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      cloud-credential                           4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      cluster-autoscaler                         4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      config-operator                            4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      console                                    4.12.0-0.nightly-2022-09-20-040107   True        False         False      18h    
      control-plane-machine-set                  4.12.0-0.nightly-2022-09-20-040107   True        False         False      17h    
      csi-snapshot-controller                    4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      dns                                        4.12.0-0.nightly-2022-09-20-040107   True        True          False      19h     DNS "default" reports Progressing=True: "Have 5 available node-resolver pods, want 6."
      etcd                                       4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      image-registry                             4.12.0-0.nightly-2022-09-20-040107   True        True          False      18h     Progressing: The registry is ready...
      ingress                                    4.12.0-0.nightly-2022-09-20-040107   True        False         False      18h    
      insights                                   4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      kube-apiserver                             4.12.0-0.nightly-2022-09-20-040107   True        True          False      18h     NodeInstallerProgressing: 1 nodes are at revision 11; 2 nodes are at revision 13
      kube-controller-manager                    4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      kube-scheduler                             4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      kube-storage-version-migrator              4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      machine-api                                4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      machine-approver                           4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      machine-config                             4.11.0-0.nightly-2022-09-19-214532   False       True          True       16h     Cluster not available for [{operator 4.11.0-0.nightly-2022-09-19-214532}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 6, updated: 6, ready: 5, unavailable: 1)]
      marketplace                                4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      monitoring                                 4.12.0-0.nightly-2022-09-20-040107   True        False         False      18h    
      network                                    4.12.0-0.nightly-2022-09-20-040107   True        True          True       19h     DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making progress - last change 2022-09-20T14:16:13Z...
      node-tuning                                4.12.0-0.nightly-2022-09-20-040107   True        False         False      17h    
      openshift-apiserver                        4.12.0-0.nightly-2022-09-20-040107   True        False         False      18h    
      openshift-controller-manager               4.12.0-0.nightly-2022-09-20-040107   True        False         False      17h    
      openshift-samples                          4.12.0-0.nightly-2022-09-20-040107   True        False         False      17h    
      operator-lifecycle-manager                 4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      operator-lifecycle-manager-catalog         4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      operator-lifecycle-manager-packageserver   4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      service-ca                                 4.12.0-0.nightly-2022-09-20-040107   True        False         False      19h    
      storage                                    4.12.0-0.nightly-2022-09-20-040107   True        True          False      19h     ManilaCSIDriverOperatorCRProgressing: ManilaDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...
      [0] http://pastebin.test.redhat.com/1074531
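
The checks in step 3 can be narrowed to just the failing resources. The following is a minimal sketch, assuming the default table output of `oc get nodes` and `oc get co` as shown above. The function names `not_ready_nodes` and `unhealthy_operators` are hypothetical helpers introduced here for illustration; they are not part of `oc`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of oc): each reads default table output
# on stdin and prints the NAME column of the rows that look unhealthy.

# Nodes whose STATUS column is anything other than exactly "Ready".
# Expected input: `oc get nodes` (columns: NAME STATUS ROLES AGE VERSION).
not_ready_nodes() {
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Cluster operators that are not AVAILABLE or are DEGRADED.
# Expected input: `oc get co`
# (columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE).
unhealthy_operators() {
    awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Usage against a live cluster (assumes a logged-in oc session):
#   oc get nodes | not_ready_nodes
#   oc get co    | unhealthy_operators
```

Applied to the output captured in step 3, the first filter would list only ostest-9vllk-worker-0-4x4pt, and the second would list machine-config and network. Operators that are merely Progressing (dns, image-registry, kube-apiserver, storage) are deliberately left out, since that state may be a side effect of the single unavailable node rather than an independent failure.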

      Actual results:

      OCP 4.11 --> 4.12 upgrade fails.

      Expected results:

      OCP 4.11 --> 4.12 upgrade succeeds.

      Additional info:

      Attached logs of the NotReady node - [^journalctl_ostest-9vllk-worker-0-4x4pt.log.tar.gz]

            mkennell@redhat.com Martin Kennelly
            rhn-support-imatza Itay Matza
            Itay Matza Itay Matza
