-
Bug
-
Resolution: Done
-
Major
-
4.11
-
None
-
Important
-
None
-
CNF Compute Sprint 237
-
1
-
Rejected
-
False
-
Description of problem:
After upgrading the OCP environment from OCP 4.10.26 to OCP 4.11.1, the worker-rt nodes flip between two rendered-worker-rt* currentConfigs and desiredConfigs, putting the nodes in the pool into a reboot loop. This happens both during and after the upgrade. Because of the issue, some ClusterOperators could not finish the upgrade during the process, so we deleted the PerformanceProfile to allow the upgrade to finish. After the upgrade, once everything had settled (ClusterOperators and MCPs were all good), we reapplied the PerformanceProfile and the issue reproduced:

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
Fri Aug 26 14:32:31 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
Fri Aug 26 14:32:36 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
Fri Aug 26 14:42:51 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
Fri Aug 26 14:42:57 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
Fri Aug 26 14:51:48 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/currentConfig")"'
Fri Aug 26 14:51:51 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da

[ocohen@ocohen ~]$ date && oc get nodes -l node-role.kubernetes.io/worker-rt= -o json | jq -r '.items[] | "\(.metadata.name) \(.metadata.annotations."machineconfiguration.openshift.io/desiredConfig")"'
Fri Aug 26 15:41:21 IDT 2022
zeus08.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
zeus10.lab.eng.tlv2.redhat.com rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
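The flapping in the transcript above comes down to a node's currentConfig annotation disagreeing with its desiredConfig annotation. A minimal sketch of that comparison, using sample values from the transcript (the report_drift helper is hypothetical; on a live cluster the two values would come from the machineconfiguration.openshift.io annotations queried with the oc/jq commands shown above):

```shell
#!/bin/sh
# Hypothetical helper: report whether a node's currentConfig matches
# its desiredConfig. Prints "in sync" when they match, "DRIFT" otherwise.
report_drift() {
  node=$1; current=$2; desired=$3
  if [ "$current" = "$desired" ]; then
    echo "$node: in sync ($current)"
  else
    echo "$node: DRIFT current=$current desired=$desired"
  fi
}

# Sample values taken from the 14:32 snapshot in the transcript:
# zeus08 shows drift (it will reboot to converge), zeus10 is in sync.
report_drift zeus08 \
  rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da \
  rendered-worker-rt-485e0aca2182afaaac3a28c45c29b725
report_drift zeus10 \
  rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da \
  rendered-worker-rt-1a5dc54b55d1f005ec37240578ec90da
```

In a healthy pool a node drifts once per rendered-config rollout and then stays in sync; the bug here is that desiredConfig itself keeps changing, so the nodes never converge.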
Version-Release number of selected component (if applicable):
OCP 4.11.1
How reproducible:
Steps to Reproduce:
1. Upgrade the environment from OCP 4.10.26 to OCP 4.11.1.
2. During the upgrade, check the status of the OCP environment (operators, nodes, etc.).
3. Delete the PerformanceProfile and continue checking the environment status.
4. After the upgrade finishes, reapply the PerformanceProfile.
Actual results:
In step 2, the two real-time nodes were rebooting in turn, and three ClusterOperators were unable to complete the upgrade as a result; see the attachment 'oc get clusteroperators.txt' for details. In step 3, the upgrade was able to finish. In step 4, the two real-time nodes began rebooting again, i.e. flipping between two rendered-worker-rt* currentConfigs and desiredConfigs.
Expected results:
Additional info: