Type: Bug
Resolution: Won't Do
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13.z
Component/s: Node / Kubelet
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Attempted to upgrade 3480 SNOs, all deployed at 4.13.11, to 4.14.0-rc.0. 2 of the SNOs ended up with a degraded etcd cluster operator and a partially applied upgrade.

Example:
# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.11   True        True          15h     Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: etcd

# oc get co
NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.11       True        False         False      15h     
baremetal                                  4.13.11       True        False         False      36h     
cloud-controller-manager                   4.13.11       True        False         False      36h     
cloud-credential                           4.13.11       True        False         False      36h     
cluster-autoscaler                         4.13.11       True        False         False      36h     
config-operator                            4.14.0-rc.0   True        False         False      36h     
console                                    4.13.11       True        False         False      36h     
control-plane-machine-set                  4.13.11       True        False         False      36h     
csi-snapshot-controller                    4.13.11       True        False         False      36h     
dns                                        4.13.11       True        False         False      36h     
etcd                                       4.13.11       True        True          True       36h     MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "etcd" in namespace: "openshift-etcd" for revision: 4 on node: "vm02837" didn't show up, waited: 3m30s
image-registry                             4.13.11       True        False         False      36h     
ingress                                    4.13.11       True        False         False      36h     
insights                                   4.13.11       True        False         False      36h     
kube-apiserver                             4.14.0-rc.0   True        False         False      36h     
kube-controller-manager                    4.13.11       True        False         False      36h     
kube-scheduler                             4.13.11       True        False         False      36h     
kube-storage-version-migrator              4.13.11       True        False         False      36h     
machine-api                                4.13.11       True        False         False      36h     
machine-approver                           4.13.11       True        False         False      36h     
machine-config                             4.13.11       True        False         False      36h     
marketplace                                4.13.11       True        False         False      36h     
monitoring                                 4.13.11       True        False         False      36h     
network                                    4.13.11       True        False         False      36h     
node-tuning                                4.13.11       True        False         False      36h     
openshift-apiserver                        4.13.11       True        False         False      13h     
openshift-controller-manager               4.13.11       True        False         False      13h     
openshift-samples                          4.13.11       True        False         False      36h     
operator-lifecycle-manager                 4.13.11       True        False         False      36h     
operator-lifecycle-manager-catalog         4.13.11       True        False         False      36h     
operator-lifecycle-manager-packageserver   4.13.11       True        False         False      36h     
service-ca                                 4.13.11       True        False         False      36h     
storage                                    4.13.11       True        False         False      36h
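
With 3480 clusters to check, picking the degraded operator out of the table by eye does not scale. A minimal sketch of a filter, assuming the default `oc get co` column order shown above (DEGRADED is the fifth column); the function name `degraded_operators` is made up for illustration:

```shell
# Print the NAME of every cluster operator whose DEGRADED column is True.
# Reads `oc get co` table output on stdin; the header row is skipped.
degraded_operators() {
  awk '$1 != "NAME" && $5 == "True" { print $1 }'
}

# Demo using two rows copied from the output above (message shortened).
degraded_operators <<'EOF'
NAME   VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
dns    4.13.11   True        False         False      36h
etcd   4.13.11   True        True          True       36h     MissingStaticPodControllerDegraded: static pod lifecycle failure
EOF
```

Against a live cluster this would be used as `oc get co | degraded_operators`.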

Version-Release number of selected component (if applicable):

SNO OCP (managed clusters being upgraded) 4.13.11 upgraded to 4.14.0-rc.0
Hub OCP 4.13.12
ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

Rare (2 out of 3480, ~0.06% of upgrades); these represent 2 out of the 41 failed upgrades (~4.9% of failures)
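
For reference, the two rates can be recomputed from the counts in this report. This is only a sketch; the helper name `failure_rates` is made up for illustration:

```shell
# Arguments: clusters hit by this bug, total upgrades attempted, total failed upgrades.
failure_rates() {
  awk -v hit="$1" -v total="$2" -v failed="$3" 'BEGIN {
    printf "%.3f%% of all upgrades\n", hit / total * 100
    printf "%.1f%% of failed upgrades\n", hit / failed * 100
  }'
}

failure_rates 2 3480 41
# prints:
#   0.057% of all upgrades
#   4.9% of failed upgrades
```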

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

Restarting crio on the affected node resolves the issue. This may be related to https://issues.redhat.com/browse/OCPBUGS-2474
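
A sketch of the workaround as a command, not a verified procedure. It assumes SSH access to the node as the `core` user (the usual default on RHCOS, not confirmed for these VMs); the `RUN` dry-run variable and the function name `restart_crio` are made up for illustration:

```shell
# Dry run by default: commands are only printed. Set RUN="" to really execute them.
RUN="${RUN-echo}"

restart_crio() {
  node="$1"   # node hostname, e.g. vm02837 from this report
  # Assumption: the node is reachable over SSH as the core user.
  $RUN ssh "core@${node}" sudo systemctl restart crio
}

restart_crio vm02837
# After a real restart, `oc get co etcd` can be used to watch the operator recover.
```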


vm02837-etcd-upgrade-fail.tar.gz
35.52 MB
2023/09/18 12:49 PM

Assignee: Node Team Bot Account

Reporter: Alex Krzos

Need Info From: None

Contributors: None

QA Contact: Ge Liu

Doc Contact: None

Votes: 0

Watchers: 5

Created: 2023/09/15 6:47 PM

Updated: 2025/07/25 11:58 AM

Resolved: 2023/09/18 2:42 PM
