Bug
Resolution: Obsolete
4.13.0
Quality / Stability / Reliability
NHE Sprint 235
Description of problem:
The OCP upgrade from 4.12.11 to 4.13.0 did not complete, for an unknown reason, on a bare-metal cluster with fip enabled and the SR-IOV operator installed.
Version-Release number of selected component (if applicable):
4.12.11->4.13.0-rc.2-x86_64
How reproducible:
Seen once, on a cluster with fip enabled and the SR-IOV operator installed.
Steps to Reproduce:
1. Upgrade an OCP 4.12.11 / CNV 4.12.3 cluster to 4.13.0-rc.2-x86_64.
Actual results:
All the master nodes were updated successfully, but two of the worker nodes stayed tainted as unschedulable (SchedulingDisabled):
================
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get nodes
NAME                                             STATUS                     ROLES                  AGE     VERSION
cnv-qe-infra-29.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   8h      v1.26.2+dc93b13
cnv-qe-infra-30.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   8h      v1.26.2+dc93b13
cnv-qe-infra-31.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      control-plane,master   8h      v1.26.2+dc93b13
cnv-qe-infra-32.cnvqe2.lab.eng.rdu2.redhat.com   Ready                      worker                 7h21m   v1.26.2+dc93b13
cnv-qe-infra-33.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker                 7h20m   v1.26.2+dc93b13
cnv-qe-infra-34.cnvqe2.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   worker                 7h20m   v1.26.2+dc93b13
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
==================
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4193256bfd798c06fc09b2787927c3f5   True      False      False      3              3                   3                     0                      8h
worker   rendered-worker-84bfa5c08e63b044134da899b133c96f   False     False      False      3              1                   3                     0                      8h
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
===================
The worker MCP reports this condition:
===================
- lastTransitionTime: "2023-04-11T20:36:49Z"
  message: Pool is paused; will not update to rendered-worker-84bfa5c08e63b044134da899b133c96f
  reason: ""
  status: "False"
  type: Updating
===================
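The pool state shown above can be cross-checked from the command line. The following is a minimal sketch and not part of the original report: `mcp_not_updated` is a hypothetical helper name, and it assumes the default column order of `oc get mcp` as shown in the output above. It lists every pool whose UPDATED column is not True, with its ready/total machine counts.

```shell
# Hypothetical helper (not from the report): reads `oc get mcp` output on
# stdin and lists pools whose UPDATED column (3rd field) is not "True".
# Assumes the default column order: NAME CONFIG UPDATED UPDATING DEGRADED
# MACHINECOUNT READYMACHINECOUNT ...
mcp_not_updated() {
  awk 'NR > 1 && $3 != "True" { printf "%s ready=%s/%s\n", $1, $7, $6 }'
}

# Usage against a live cluster (assumes `oc` is logged in):
#   oc get mcp | mcp_not_updated
# The paused flag itself is stored in the pool spec:
#   oc get mcp worker -o jsonpath='{.spec.paused}'
```

Fed the `oc get mcp` output from this report, the helper would print `worker ready=1/3`.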
The cluster operators (CO) report this:
===================
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$ oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-rc.2   True        False         False      70m
cloud-controller-manager                   4.13.0-rc.2   True        False         False      8h
cloud-credential                           4.13.0-rc.2   True        False         False      8h
cluster-autoscaler                         4.13.0-rc.2   True        False         False      8h
config-operator                            4.13.0-rc.2   True        False         False      8h
console                                    4.13.0-rc.2   True        False         False      7h25m
control-plane-machine-set                  4.13.0-rc.2   True        False         False      8h
csi-snapshot-controller                    4.13.0-rc.2   True        False         False      8h
dns                                        4.13.0-rc.2   True        False         False      8h
etcd                                       4.13.0-rc.2   True        False         False      8h
image-registry                             4.13.0-rc.2   True        False         False      112m
ingress                                    4.13.0-rc.2   True        True          True       112m    The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: DeploymentReplicasAllAvailable=False (DeploymentReplicasNotAvailable: 1/2 of replicas are available)
insights                                   4.13.0-rc.2   True        False         False      8h
kube-apiserver                             4.13.0-rc.2   True        False         False      7h53m
kube-controller-manager                    4.13.0-rc.2   True        False         False      8h
kube-scheduler                             4.13.0-rc.2   True        False         False      8h
kube-storage-version-migrator              4.13.0-rc.2   True        False         False      174m
machine-api                                4.13.0-rc.2   True        False         False      7h30m
machine-approver                           4.13.0-rc.2   True        False         False      8h
machine-config                             4.13.0-rc.2   True        False         False      138m
marketplace                                4.13.0-rc.2   True        False         False      8h
monitoring                                 4.13.0-rc.2   False       True          True       99m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: got 1 unavailable replicas
network                                    4.13.0-rc.2   True        False         False      8h
node-tuning                                4.13.0-rc.2   True        False         False      3h45m
openshift-apiserver                        4.13.0-rc.2   True        False         False      124m
openshift-controller-manager               4.13.0-rc.2   True        False         False      8h
openshift-samples                          4.13.0-rc.2   True        False         False      3h47m
operator-lifecycle-manager                 4.13.0-rc.2   True        False         False      8h
operator-lifecycle-manager-catalog         4.13.0-rc.2   True        False         False      8h
operator-lifecycle-manager-packageserver   4.13.0-rc.2   True        False         False      7h57m
service-ca                                 4.13.0-rc.2   True        False         False      8h
storage                                    4.13.0-rc.2   True        False         False      8h
[cnv-qe-jenkins@cnv-qe-infra-01 ~]$
======================
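With more than thirty operators in the listing, the unhealthy ones are easy to miss. The following is a minimal sketch and not part of the original report: `co_unhealthy` is a hypothetical helper name, and it assumes the default column order of `oc get co` as shown above. It prints the name of every operator that is not Available or is Degraded.

```shell
# Hypothetical helper (not from the report): reads `oc get co` output on
# stdin and prints operators with AVAILABLE != True or DEGRADED == True.
# Assumes the default column order: NAME VERSION AVAILABLE PROGRESSING
# DEGRADED SINCE MESSAGE
co_unhealthy() {
  awk 'NR > 1 && ($3 != "True" || $5 == "True") { print $1 }'
}

# Usage against a live cluster (assumes `oc` is logged in):
#   oc get co | co_unhealthy
```

Fed the `oc get co` output from this report, the helper would print `ingress` and `monitoring`.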
Expected results:
Upgrade to complete successfully.
Additional info:
The must-gather is saved here: https://drive.google.com/drive/folders/11agooCxc0fUX9_utLTFonCoembhS-9mY?usp=share_link