OCPBUGS-646

The worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs after upgrade to OCP 4.11.1

    • Important
    • None
    • CNF Compute Sprint 237
    • 1
    • Rejected
    • False

      Description of problem:

      After upgrading the OCP environment from OCP 4.10.26 to OCP 4.11.1, the worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs, causing the nodes in the pool to enter a reboot loop. This happens both during and after the upgrade. Because of the issue, some ClusterOperators could not finish the upgrade, so we deleted the PerformanceProfile to let the upgrade complete; after the upgrade, once everything had settled (ClusterOperators and MCPs were all good), we reapplied the PerformanceProfile and the issue reproduced.
      
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 14:32:31 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
      Fri Aug 26 14:32:36 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 14:42:51 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
      Fri Aug 26 14:42:57 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 14:51:48 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
      Fri Aug 26 14:51:51 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 15:41:21 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
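
      For reference, a minimal sketch of a polling loop that makes the flip easier to watch; it only reuses the oc/jq query shown above:

      while true; do
        date
        # Print each worker-rt node with its current and desired rendered config side by side
        oc get nodes -l node-role.kubernetes.io/worker-rt= -o json \
          | jq -r '.items[] | "\(.metadata.name) current=\(.metadata.annotations."machineconfiguration.openshift.io/currentConfig") desired=\(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
        sleep 60
      done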

      Version-Release number of selected component (if applicable):

      OCP 4.11.1

      How reproducible:

       

      Steps to Reproduce:

      1. Upgrade the environment from OCP 4.10.26 to OCP 4.11.1.
      2. During the process, check the OCP environment status (the operators, nodes, etc.).
      3. Delete the PerformanceProfile and continue checking the OCP environment status.
      4. After the upgrade finishes, reapply the PerformanceProfile (a command sketch follows below).
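
      A hedged sketch of the oc commands behind these steps; the PerformanceProfile name and the manifest file name are placeholders, since neither is given in this report:

      # Step 1: start the upgrade (assumes the update channel already offers 4.11.1)
      oc adm upgrade --to=4.11.1

      # Step 2: watch cluster status while the upgrade progresses
      oc get clusterversion && oc get clusteroperators && oc get mcp && oc get nodes

      # Step 3: remove the PerformanceProfile so the upgrade can complete
      oc get performanceprofiles
      oc delete performanceprofile <profile-name>

      # Step 4: once everything has settled, reapply the saved PerformanceProfile manifest
      oc apply -f performance-profile.yaml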

      Actual results:

      In step 2, the two real-time nodes were found to be rebooting in turn, and 3 ClusterOperators were unable to complete the upgrade because of that; please refer to the attachment 'oc get clusteroperators.txt' for details.
      In step 3, the upgrade could finish.
      In step 4, the two real-time nodes began to reboot again, i.e. they flip between two rendered-worker-rt* currentConfigs and desiredConfigs.
      
      

      Expected results:

       

      Additional info:

       

            [OCPBUGS-646] The worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs after upgrade to OCP 4.11.1

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.13.2 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:3367


            Shereen Haj added a comment -

            Verification:
            OCP: 4.13.0-0.nightly-2023-06-01-050820

            Steps:
            1. Install OCP with 3 worker nodes (all with the same label), each with a different CPU count, as follows:
            worker-0 with 24 CPUs; 
            worker-1 with 18 CPUs;
            worker-2 with 12 CPUs;

            2. Apply the following Tuned profile:

            apiVersion: tuned.openshift.io/v1
            kind: Tuned
            metadata:
              name: openshift-bootcmdline-cpu
              namespace: openshift-cluster-node-tuning-operator
            spec:
              profile:
              - data: |
                  [main]
                  summary=Custom OpenShift profile
                  [bootloader]
                  cmdline=+cpus=${f:exec:/usr/bin/bash:-c:nproc|tr -d '\n'}
                name: openshift-bootcmdline-cpu
              recommend:
              - machineConfigLabels:
                  machineconfiguration.openshift.io/role: "worker"
                priority: 20
                profile: openshift-bootcmdline-cpu 
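
            A short sketch of applying the profile and checking what the operator generates from it (the file name tuned-bootcmdline.yaml is illustrative; 50-nto-worker is the MachineConfig name that appears in the logs below):

            # Apply the Tuned profile shown above
            oc apply -f tuned-bootcmdline.yaml

            # List the per-node Profile objects the operator calculates from it
            oc get profiles.tuned.openshift.io -n openshift-cluster-node-tuning-operator

            # Inspect the MachineConfig the operator generates for the worker pool
            oc get mc 50-nto-worker -o yaml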

            3. Observe NTO logs and status:

            [root@ocp-edge89 ~]# oc get co/node-tuning
            NAME          VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
            node-tuning   4.13.0-0.nightly-2023-06-01-050820   True        False         True       19h     2/6 Profiles with bootcmdline conflict
            
            [root@ocp-edge89 ~]# oc logs cluster-node-tuning-operator-7dc4f7bdfc-jbznj -n openshift-cluster-node-tuning-operator | tail
            I0605 06:13:38.455266       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.925422       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.927749       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.929914       1 status.go:288] 1/6 Profiles with bootcmdline conflict
            W0605 06:13:59.948681       1 controller.go:832] refusing to update MachineConfig 50-nto-worker for ocp4131592097-worker-2.libvirt.lab.eng.tlv2.redhat.com due to kernel arguments change with unchanged input configuration (9/openshift-bootcmdline-cpu). Node(s) with different (CPU) topology in the same MCP?
            I0605 06:13:59.949892       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.960087       1 status.go:288] 1/6 Profiles with bootcmdline conflict
            W0605 06:13:59.961113       1 controller.go:832] refusing to update MachineConfig 50-nto-worker for ocp4131592097-worker-0.libvirt.lab.eng.tlv2.redhat.com due to kernel arguments change with unchanged input configuration (9/openshift-bootcmdline-cpu). Node(s) with different (CPU) topology in the same MCP?
            I0605 06:13:59.963120       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.965279       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            [root@ocp-edge89 ~]# 
             

            The worker MCP went into "Updating" status, and so did the nodes, but the update was not actually applied because of the unequal CPU counts on the nodes.
            The fix was verified successfully.


            Shereen Haj added a comment -

            msivak@redhat.com Hi, can you add a target version please?


            Martin Sivak added a comment -

            I believe this PR fixes the issue in 4.13: https://github.com/openshift/cluster-node-tuning-operator/pull/558


            Jiri Mencak added a comment -

            Putting nodes with conflicting topology in the same machine config pool is something you must not do, and this is documented in the OCP docs. The operands will keep sending conflicting kernel parameters back to the operator, and the operator will just accept them and update the MachineConfigs accordingly. I have never tried to see what happens, but I assume boot loops.
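
            For illustration only, a hedged sketch of that documented guidance applied to this cluster: move the node with the different topology out of the shared pool. The role name worker-rt-large is made up here, and a matching MachineConfigPool and profile selector would also have to be created:

            # Drop the conflicting node from the shared worker-rt role...
            oc label node zeus10.lab.eng.tlv2.redhat.com node-role.kubernetes.io/worker-rt-
            # ...and give it a role of its own so it can be targeted by a separate pool
            oc label node zeus10.lab.eng.tlv2.redhat.com node-role.kubernetes.io/worker-rt-large=""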


            Martin Sivak added a comment -

            jmencak What will NTO do to the tuned configuration / kernel args when there are nodes with conflicting topology in the same pool at the same time?

             


            Nini Gu added a comment - - edited

            It was found that the CPU counts are not equal on the two real-time nodes:

            [core@zeus08 ~]$ lscpu
            ......
            NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
            NUMA node1 CPU(s):   1,3,5,7,9,11,13,15

             

            [core@zeus10 ~]$ lscpu
            ......
            NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20 ...... 78
            NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21 ...... 79
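
            The same comparison can be made from the cluster side; a small sketch using the CPU capacity reported by the node API, with no assumptions beyond the worker-rt label already used above:

            oc get nodes -l node-role.kubernetes.io/worker-rt= \
              -o custom-columns=NAME:.metadata.name,CPUS:.status.capacity.cpu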

             

            After removing node zeus10 from the real-time scenario, it works on the remaining node zeus08:

            [root@dell-r640-kvm-qe-01 ocp]# oc get node/zeus08.lab.eng.tlv2.redhat.com -o yaml
            apiVersion: v1
            kind: Node
            metadata:
              annotations:
                csi.volume.kubernetes.io/nodeid: '{"csi.ovirt.org":"4c4c4544-0039-5010-8056-c2c04f325332","csi.trident.netapp.io":"zeus08.lab.eng.tlv2.redhat.com","openshift-storage.cephfs.csi.ceph.com":"zeus08.lab.eng.tlv2.redhat.com","openshift-storage.rbd.csi.ceph.com":"zeus08.lab.eng.tlv2.redhat.com"}'
                kubevirt.io/heartbeat: "2022-09-01T04:08:50Z"
                machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
                machineconfiguration.openshift.io/desiredConfig: rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
                machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
                machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

            ......

            [core@zeus08 ~]$ uptime
             04:13:39 up 21:11,  1 user,  load average: 5.59, 6.02, 6.38
            [core@zeus08 ~]$ uname -r
            4.18.0-372.19.1.rt7.176.el8_6.x86_64
            [core@zeus08 ~]$

            ......


            Yu Qi Zhang added a comment -

            The diff between the two MachineConfigs is:

             

            <   - tuned.non_isolcpus=00005555
            <   - systemd.cpu_affinity=0,2,4,6,8,10,12,14

            >   - tuned.non_isolcpus=0000ffff,ffffffff,ffff5555
            >   - systemd.cpu_affinity=0,2,4,6,8,10,12,14,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
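
            A sketch of how a diff like this can be reproduced, using the two rendered config names observed earlier in this report:

            oc get mc rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725 -o yaml > a.yaml
            oc get mc rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da -o yaml > b.yaml
            diff a.yaml b.yaml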

             

            The MCO can't directly "decide" what the desired config is. It always sets it based on the MachineConfigs that target the pool.

             

            In your case, it is most likely the MC 50-nto-worker-rt that is being regenerated. For example, the config

            `- systemd.cpu_affinity=0,2,4,6,8,10,12,14`

            doesn't currently exist anywhere in the non-rendered configs.
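
            A sketch of how to check which MachineConfigs feed the pool and which kernel arguments the NTO-generated one currently carries; 50-nto-worker-rt is the name mentioned above, and the label selector is the standard MCO role label:

            # MachineConfigs selected into the worker-rt pool
            oc get mc -l machineconfiguration.openshift.io/role=worker-rt

            # Kernel arguments carried by the NTO-generated MachineConfig
            oc get mc 50-nto-worker-rt -o jsonpath='{.spec.kernelArguments}{"\n"}'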


            Oren Cohen added a comment -

            Please see a must-gather run with the NTO add-on against the cluster while the issue is happening:

            https://drive.google.com/file/d/1SPNgt_b6JXZYzKgFyeta8HRkt5lqMqaM/view?usp=sharing

            /cc msivak@redhat.com 


              msivak@redhat.com Martin Sivak
              ngu@redhat.com Nini Gu
              Shereen Haj
              Votes: 0
              Watchers: 13
