OCPBUGS-646

The worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs after upgrade to OCP 4.11.1

    • Important
    • None
    • CNF Compute Sprint 237
    • 1
    • Rejected
    • False

      Description of problem:

      After upgrading the OCP environment from OCP 4.10.26 to OCP 4.11.1, the worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs, causing the nodes in the pool to enter a reboot loop. This happens both during and after the upgrade. Because of the issue, some ClusterOperators could not finish the upgrade, so we deleted the PerformanceProfile to let the upgrade complete; after the upgrade, once everything had settled (ClusterOperators and MCPs were all good), we reapplied the PerformanceProfile and the issue reproduced.
      
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 14:32:31 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
      Fri Aug 26 14:32:36 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 14:42:51 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
      Fri Aug 26 14:42:57 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 14:51:48 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
      Fri Aug 26 14:51:51 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
      [ocohen@ocohen ~]$ 
      [ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
      Fri Aug 26 15:41:21 IDT 2022
      zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
      zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
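
      For reference, a minimal sketch of a polling loop that makes the flip easier to watch; it only reuses the oc/jq query shown above:

      while true; do
        date
        # Print each worker-rt node with its current and desired rendered config side by side
        oc get nodes -l node-role.kubernetes.io/worker-rt= -o json \
          | jq -r '.items[] | "\(.metadata.name) current=\(.metadata.annotations."machineconfiguration.openshift.io/currentConfig") desired=\(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
        sleep 60
      done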

      Version-Release number of selected component (if applicable):

      OCP 4.11.1

      How reproducible:

       

      Steps to Reproduce:

      1. Upgrade the environment from OCP 4.10.26 to OCP 4.11.1.
      2. During the process, check the OCP environment status (the operators, nodes, etc.).
      3. Delete the PerformanceProfile and continue checking the OCP environment status.
      4. After the upgrade finishes, reapply the PerformanceProfile (a command sketch follows below).
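
      A hedged sketch of the oc commands behind these steps; the PerformanceProfile name and the manifest file name are placeholders, since neither is given in this report:

      # Step 1: start the upgrade (assumes the update channel already offers 4.11.1)
      oc adm upgrade --to=4.11.1

      # Step 2: watch cluster status while the upgrade progresses
      oc get clusterversion && oc get clusteroperators && oc get mcp && oc get nodes

      # Step 3: remove the PerformanceProfile so the upgrade can complete
      oc get performanceprofiles
      oc delete performanceprofile <profile-name>

      # Step 4: once everything has settled, reapply the saved PerformanceProfile manifest
      oc apply -f performance-profile.yaml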

      Actual results:

      In step 2, the two real-time nodes were found to be rebooting in turn, and 3 ClusterOperators were unable to complete the upgrade because of that; please refer to the attachment 'oc get clusteroperators.txt' for details.
      In step 3, the upgrade could finish.
      In step 4, the two real-time nodes began to reboot again, i.e. they flip between two rendered-worker-rt* currentConfigs and desiredConfigs.
      
      

      Expected results:

       

      Additional info:

       

            [OCPBUGS-646] The worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs after upgrade to OCP 4.11.1

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.13.2 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:3367


            Shereen Haj added a comment -

            Verification:
            OCP: 4.13.0-0.nightly-2023-06-01-050820

            Steps:
            1. Install OCP with 3 worker nodes (all with the same label), each with a different CPU count, as follows:
            worker-0 with 24 CPUs; 
            worker-1 with 18 CPUs;
            worker-2 with 12 CPUs;

            2. Apply the following Tuned profile:

            apiVersion: tuned.openshift.io/v1
            kind: Tuned
            metadata:
              name: openshift-bootcmdline-cpu
              namespace: openshift-cluster-node-tuning-operator
            spec:
              profile:
              - data: |
                  [main]
                  summary=Custom OpenShift profile
                  [bootloader]
                  cmdline=+cpus=${f:exec:/usr/bin/bash:-c:nproc|tr -d '\n'}
                name: openshift-bootcmdline-cpu
              recommend:
              - machineConfigLabels:
                  machineconfiguration.openshift.io/role: "worker"
                priority: 20
                profile: openshift-bootcmdline-cpu 
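
            A short sketch of applying the profile and checking what the operator generates from it (the file name tuned-bootcmdline.yaml is illustrative; 50-nto-worker is the MachineConfig name that appears in the logs below):

            # Apply the Tuned profile shown above
            oc apply -f tuned-bootcmdline.yaml

            # List the per-node Profile objects the operator calculates from it
            oc get profiles.tuned.openshift.io -n openshift-cluster-node-tuning-operator

            # Inspect the MachineConfig the operator generates for the worker pool
            oc get mc 50-nto-worker -o yaml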

            3. Observe NTO logs and status:

            [root@ocp-edge89 ~]# oc get co/node-tuning
            NAME          VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
            node-tuning   4.13.0-0.nightly-2023-06-01-050820   True        False         True       19h     2/6 Profiles with bootcmdline conflict
            
            [root@ocp-edge89 ~]# oc logs cluster-node-tuning-operator-7dc4f7bdfc-jbznj -n openshift-cluster-node-tuning-operator | tail
            I0605 06:13:38.455266       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.925422       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.927749       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.929914       1 status.go:288] 1/6 Profiles with bootcmdline conflict
            W0605 06:13:59.948681       1 controller.go:832] refusing to update MachineConfig 50-nto-worker for ocp4131592097-worker-2.libvirt.lab.eng.tlv2.redhat.com due to kernel arguments change with unchanged input configuration (9/openshift-bootcmdline-cpu). Node(s) with different (CPU) topology in the same MCP?
            I0605 06:13:59.949892       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.960087       1 status.go:288] 1/6 Profiles with bootcmdline conflict
            W0605 06:13:59.961113       1 controller.go:832] refusing to update MachineConfig 50-nto-worker for ocp4131592097-worker-0.libvirt.lab.eng.tlv2.redhat.com due to kernel arguments change with unchanged input configuration (9/openshift-bootcmdline-cpu). Node(s) with different (CPU) topology in the same MCP?
            I0605 06:13:59.963120       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            I0605 06:13:59.965279       1 status.go:288] 2/6 Profiles with bootcmdline conflict
            [root@ocp-edge89 ~]# 
             

            The worker MCP went into "Updating" status, and so did the nodes, but the update was not actually applied because of the unequal CPU counts on the nodes.
            The fix was verified successfully.


            Shereen Haj added a comment -

            msivak@redhat.com Hi, can you add a target version please?


            Martin Sivak added a comment -

            I believe this PR fixes the issue in 4.13: https://github.com/openshift/cluster-node-tuning-operator/pull/558


            Jiri Mencak added a comment -

            Putting nodes with conflicting topology in the same machine config pool is something you must not do, and this is documented in the OCP docs. The operands will keep sending conflicting kernel parameters back to the operator, and the operator will just accept them and update the MachineConfigs accordingly. I have never tried to see what happens, but I assume boot loops.
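
            For illustration only, a hedged sketch of that documented guidance applied to this cluster: move the node with the different topology out of the shared pool. The role name worker-rt-large is made up here, and a matching MachineConfigPool and profile selector would also have to be created:

            # Drop the conflicting node from the shared worker-rt role...
            oc label node zeus10.lab.eng.tlv2.redhat.com node-role.kubernetes.io/worker-rt-
            # ...and give it a role of its own so it can be targeted by a separate pool
            oc label node zeus10.lab.eng.tlv2.redhat.com node-role.kubernetes.io/worker-rt-large=""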


            Martin Sivak added a comment -

            jmencak What will NTO do to the tuned configuration / kernel args when there are nodes with conflicting topology in the same pool at the same time?

             


            Nini Gu added a comment - - edited

            It was found that the CPU counts are not equal on the two real-time nodes:

            [core@zeus08 ~]$ lscpu
            ......
            NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
            NUMA node1 CPU(s):   1,3,5,7,9,11,13,15

             

            [core@zeus10 ~]$ lscpu
            ......
            NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20 ...... 78
            NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21 ...... 79
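
            The same comparison can be made from the cluster side; a small sketch using the CPU capacity reported by the node API, with no assumptions beyond the worker-rt label already used above:

            oc get nodes -l node-role.kubernetes.io/worker-rt= \
              -o custom-columns=NAME:.metadata.name,CPUS:.status.capacity.cpu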

             

            After removing node zeus10 from the real-time scenario, it works on the remaining node zeus08:

            [root@dell-r640-kvm-qe-01 ocp]# oc get node/zeus08.lab.eng.tlv2.redhat.com -o yaml
            apiVersion: v1
            kind: Node
            metadata:
              annotations:
                csi.volume.kubernetes.io/nodeid: '{"csi.ovirt.org":"4c4c4544-0039-5010-8056-c2c04f325332","csi.trident.netapp.io":"zeus08.lab.eng.tlv2.redhat.com","openshift-storage.cephfs.csi.ceph.com":"zeus08.lab.eng.tlv2.redhat.com","openshift-storage.rbd.csi.ceph.com":"zeus08.lab.eng.tlv2.redhat.com"}'
                kubevirt.io/heartbeat: "2022-09-01T04:08:50Z"
                machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                machineconfiguration.openshift.io/currentConfig: rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
                machineconfiguration.openshift.io/desiredConfig: rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
                machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
                machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

            ......

            [core@zeus08 ~]$ uptime
             04:13:39 up 21:11,  1 user,  load average: 5.59, 6.02, 6.38
            [core@zeus08 ~]$ uname -r
            4.18.0-372.19.1.rt7.176.el8_6.x86_64
            [core@zeus08 ~]$

            ......


            Yu Qi Zhang added a comment -

            The diff between the two MachineConfigs is:

             

            <   - tuned.non_isolcpus=00005555
            <   - systemd.cpu_affinity=0,2,4,6,8,10,12,14

            >   - tuned.non_isolcpus=0000ffff,ffffffff,ffff5555
            >   - systemd.cpu_affinity=0,2,4,6,8,10,12,14,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79
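
            A sketch of how a diff like this can be reproduced, using the two rendered config names observed earlier in this report:

            oc get mc rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725 -o yaml > a.yaml
            oc get mc rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da -o yaml > b.yaml
            diff a.yaml b.yaml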

             

            The MCO can't directly "decide" what the desired config is. It always sets it based on the MachineConfigs that target the pool.

             

            In your case, it is most likely the MC 50-nto-worker-rt that is being regenerated. For example, the config

            `- systemd.cpu_affinity=0,2,4,6,8,10,12,14`

            doesn't currently exist anywhere in the non-rendered configs.
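
            A sketch of how to check which MachineConfigs feed the pool and which kernel arguments the NTO-generated one currently carries; 50-nto-worker-rt is the name mentioned above, and the label selector is the standard MCO role label:

            # MachineConfigs selected into the worker-rt pool
            oc get mc -l machineconfiguration.openshift.io/role=worker-rt

            # Kernel arguments carried by the NTO-generated MachineConfig
            oc get mc 50-nto-worker-rt -o jsonpath='{.spec.kernelArguments}{"\n"}'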


            Oren Cohen added a comment -

            Please see a must-gather run with the NTO add-on against the cluster while the issue is happening:

            https://drive.google.com/file/d/1SPNgt_b6JXZYzKgFyeta8HRkt5lqMqaM/view?usp=sharing

            /cc msivak@redhat.com 


              msivak@redhat.com Martin Sivak
              ngu@redhat.com Nini Gu
              Shereen Haj
              Votes: 0
              Watchers: 13
