OpenShift Bugs / OCPBUGS-11675

OCP upgrade from 4.12.11 to 4.13.0-rc.2-x86_64 did not complete with unknown error


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Undefined
    • Affects Version: 4.13.0
    • Component: Compliance Operator
    • Quality / Stability / Reliability
    • Sprint: NHE Sprint 235

      Description of problem:

       OCP upgrade from 4.12.11 to 4.13.0 did not complete due to an unknown error on a bare-metal cluster with FIPS enabled and the SR-IOV Operator installed.
      
      

      Version-Release number of selected component (if applicable):

      4.12.11->4.13.0-rc.2-x86_64
      

      How reproducible:

      Seen once, against a cluster with FIPS enabled and the SR-IOV Operator installed.
      

      Steps to Reproduce:

      1. Upgrade an OCP 4.12.11 / CNV 4.12.3 cluster to 4.13.0-rc.2-x86_64.
      

      Actual results:

      All master nodes updated successfully, but two worker nodes remained cordoned (SchedulingDisabled):
      ================
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get nodes
      NAME                                             STATUS                     ROLES                  AGE     VERSION
      cnv-qe-infra-29.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   8h      v1.26.2+dc93b13
      cnv-qe-infra-30.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   8h      v1.26.2+dc93b13
      cnv-qe-infra-31.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   8h      v1.26.2+dc93b13
      cnv-qe-infra-32.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker                 7h21m   v1.26.2+dc93b13
      cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker                 7h20m   v1.26.2+dc93b13
      cnv-qe-infra-34.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker                 7h20m   v1.26.2+dc93b13
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$
      ==================
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-4193256bfd798c06fc09b2787927c3f5   True      False      False      3              3                   3                     0                      8h
      worker   rendered-worker-84bfa5c08e63b044134da899b133c96f   False     False      False      3              1                   3                     0                      8h
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ 
      ===================
      The worker MachineConfigPool reports this Updating condition:
      ===================
       lastTransitionTime: "2023-04-11T20:36:49Z"
            message: Pool is paused; will not update to rendered-worker-84bfa5c08e63b044134da899b133c96f
            reason: ""
            status: "False"
            type: Updating
      ===================
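      The condition above says the worker pool is paused, which would explain why the rendered config is not rolling out. A possible way to confirm and clear that state, assuming cluster-admin access to this cluster (the pool name "worker" is taken from the output above), is:

      ```shell
      # Check whether the worker pool is paused; prints "true" if so
      oc get mcp worker -o jsonpath='{.spec.paused}{"\n"}'

      # If paused, unpause so the MCO can resume rolling out
      # rendered-worker-84bfa5c08e63b044134da899b133c96f
      oc patch mcp worker --type merge --patch '{"spec":{"paused":false}}'
      ```

      Note this only addresses the symptom; the bug is about what paused the pool in the first place during the upgrade.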
      ClusterOperators report:
      ===================
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get co
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-rc.2   True        False         False      70m     
      cloud-controller-manager                   4.13.0-rc.2   True        False         False      8h      
      cloud-credential                           4.13.0-rc.2   True        False         False      8h      
      cluster-autoscaler                         4.13.0-rc.2   True        False         False      8h      
      config-operator                            4.13.0-rc.2   True        False         False      8h      
      console                                    4.13.0-rc.2   True        False         False      7h25m   
      control-plane-machine-set                  4.13.0-rc.2   True        False         False      8h      
      csi-snapshot-controller                    4.13.0-rc.2   True        False         False      8h      
      dns                                        4.13.0-rc.2   True        False         False      8h      
      etcd                                       4.13.0-rc.2   True        False         False      8h      
      image-registry                             4.13.0-rc.2   True        False         False      112m    
      ingress                                    4.13.0-rc.2   True        True          True       112m    The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
      insights                                   4.13.0-rc.2   True        False         False      8h      
      kube-apiserver                             4.13.0-rc.2   True        False         False      7h53m   
      kube-controller-manager                    4.13.0-rc.2   True        False         False      8h      
      kube-scheduler                             4.13.0-rc.2   True        False         False      8h      
      kube-storage-version-migrator              4.13.0-rc.2   True        False         False      174m    
      machine-api                                4.13.0-rc.2   True        False         False      7h30m   
      machine-approver                           4.13.0-rc.2   True        False         False      8h      
      machine-config                             4.13.0-rc.2   True        False         False      138m    
      marketplace                                4.13.0-rc.2   True        False         False      8h      
      monitoring                                 4.13.0-rc.2   False       True          True       99m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas
      network                                    4.13.0-rc.2   True        False         False      8h      
      node-tuning                                4.13.0-rc.2   True        False         False      3h45m   
      openshift-apiserver                        4.13.0-rc.2   True        False         False      124m    
      openshift-controller-manager               4.13.0-rc.2   True        False         False      8h      
      openshift-samples                          4.13.0-rc.2   True        False         False      3h47m   
      operator-lifecycle-manager                 4.13.0-rc.2   True        False         False      8h      
      operator-lifecycle-manager-catalog         4.13.0-rc.2   True        False         False      8h      
      operator-lifecycle-manager-packageserver   4.13.0-rc.2   True        False         False      7h57m   
      service-ca                                 4.13.0-rc.2   True        False         False      8h      
      storage                                    4.13.0-rc.2   True        False         False      8h      
      [cnv-qe-jenkins@cnv-qe-infra-01 ~]$ 
      ======================
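      The ingress and monitoring operators are both degraded because of unavailable replicas, which is consistent with two cordoned workers having no schedulable capacity. A sketch of follow-up triage commands, assuming access to the same cluster:

      ```shell
      # Locate the unavailable router replica behind the ingress Degraded condition
      oc -n openshift-ingress get pods -o wide

      # Inspect the stuck admission-webhook rollout reported by the monitoring CO
      oc -n openshift-monitoring get pods
      oc -n openshift-monitoring describe deployment prometheus-operator-admission-webhook
      ```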
      

      Expected results:

      Upgrade to complete successfully.
      

      Additional info:

      Must gather is saved here: https://drive.google.com/drive/folders/11agooCxc0fUX9_utLTFonCoembhS-9mY?usp=share_link
      

              Vincent Shen (wenshen@redhat.com)
              Debarati Basu-Nag (rhn-support-dbasunag)
              Sergio Regidor de la Rosa
              Votes: 0
              Watchers: 8