Type: Bug
Resolution: Done-Errata
Priority: Critical
Version(s): 4.16.0, 4.17.0
Description of problem:
kube-apiserver was stuck updating versions when upgrading a cluster from 4.1 to 4.16 with an AWS IPI installation.
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-05-01-111315
How reproducible:
always
Steps to Reproduce:
1. IPI install an AWS 4.1 cluster and upgrade it through the intermediate minor releases to 4.16 (see the sketch after these steps).
2. The upgrade gets stuck on the 4.15 to 4.16 hop, waiting on etcd and kube-apiserver to update.
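A minimal sketch of how each upgrade hop can be driven, assuming the intermediate release payloads are known; the image digest below is a placeholder:
$ oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release@sha256:<digest> --allow-explicit-upgrade
$ oc get clusterversion -w   # wait for the hop to finish, then repeat for the next minor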
Actual results:
The upgrade was stuck on the 4.15 to 4.16 hop, waiting on etcd and kube-apiserver to finish updating:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.15.0-0.nightly-2024-05-16-091947 True True 39m Working towards 4.16.0-0.nightly-2024-05-16-092402: 111 of 894 done (12% complete)
Expected results:
Upgrade should be successful.
Additional info:
Must-gather: https://gcsweb-qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.1-aws-ipi-f30/1791391925467615232/artifacts/aws-ipi-f30/gather-must-gather/artifacts/must-gather.tar
Checked the must-gather logs.
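The commands below were run against the extracted must-gather; a minimal setup sketch, assuming the o-must-gather (omg) tool and a placeholder local path:
$ tar xf must-gather.tar
$ omg use ./must-gather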
$ omg get clusterversion -oyaml
...
conditions:
- lastTransitionTime: '2024-05-17T09:35:29Z'
message: Done applying 4.15.0-0.nightly-2024-05-16-091947
status: 'True'
type: Available
- lastTransitionTime: '2024-05-18T06:31:41Z'
message: 'Multiple errors are preventing progress:
* Cluster operator kube-apiserver is updating versions
* Could not update flowschema "openshift-etcd-operator" (82 of 894): the server
does not recognize this resource, check extension API servers'
reason: MultipleErrors
status: 'True'
type: Failing
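The flowschema error suggests the v1 flowcontrol API group was not being served at that point. On a live cluster hitting this, the group's availability could be checked with something like the following (a sketch, not taken from the must-gather):
$ oc api-resources --api-group=flowcontrol.apiserver.k8s.io
$ oc get --raw /apis/flowcontrol.apiserver.k8s.io   # shows which versions of the group are served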
$ omg get co | grep -v '.*True.*False.*False'
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
kube-apiserver 4.15.0-0.nightly-2024-05-16-091947 True True False 10m
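To dig into why the kube-apiserver operator was still progressing, its conditions can be dumped from the same must-gather, assuming omg supports the same -o yaml output used above:
$ omg get co kube-apiserver -o yaml   # check the Progressing condition message for the target revision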
$ omg get pod -n openshift-kube-apiserver
NAME READY STATUS RESTARTS AGE
installer-40-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h29m
installer-41-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h25m
installer-43-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h22m
installer-44-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 1h35m
kube-apiserver-guard-ip-10-0-136-146.ec2.internal 1/1 Running 0 2h24m
kube-apiserver-guard-ip-10-0-143-206.ec2.internal 1/1 Running 0 2h24m
kube-apiserver-guard-ip-10-0-154-116.ec2.internal 0/1 Running 0 2h24m
kube-apiserver-ip-10-0-136-146.ec2.internal 5/5 Running 0 2h27m
kube-apiserver-ip-10-0-143-206.ec2.internal 5/5 Running 0 2h24m
kube-apiserver-ip-10-0-154-116.ec2.internal 4/5 Running 17 1h34m
revision-pruner-39-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h44m
revision-pruner-39-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h50m
revision-pruner-39-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h52m
revision-pruner-40-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h29m
revision-pruner-40-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h29m
revision-pruner-40-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h29m
revision-pruner-41-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h26m
revision-pruner-41-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h26m
revision-pruner-41-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h26m
revision-pruner-42-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h24m
revision-pruner-42-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h23m
revision-pruner-42-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h23m
revision-pruner-43-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 2h23m
revision-pruner-43-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 2h23m
revision-pruner-43-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 2h23m
revision-pruner-44-ip-10-0-136-146.ec2.internal 0/1 Succeeded 0 1h35m
revision-pruner-44-ip-10-0-143-206.ec2.internal 0/1 Succeeded 0 1h35m
revision-pruner-44-ip-10-0-154-116.ec2.internal 0/1 Succeeded 0 1h35m
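The kube-apiserver-ip-10-0-154-116.ec2.internal pod is only 4/5 ready with 17 restarts, so one of its containers keeps crashing. A rough way to identify it from the must-gather (same omg pattern as above; look at status.containerStatuses):
$ omg get pod kube-apiserver-ip-10-0-154-116.ec2.internal -n openshift-kube-apiserver -o yaml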
Checked the kube-apiserver-ip-10-0-154-116.ec2.internal logs; it seems something is wrong with the informers:
$ grep 'informers not started yet' current.log | wc -l
360
$ grep 'informers not started yet' current.log
2024-05-18T06:34:51.888804183Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.Secret *v1.FlowSchema *v1.ConfigMap]
2024-05-18T06:34:51.889350484Z [-]informer-sync failed: 4 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema *v1.Secret *v1.ConfigMap]
2024-05-18T06:34:52.004808401Z [-]informer-sync failed: 2 informers not started yet: [*v1.FlowSchema *v1.PriorityLevelConfiguration]
2024-05-18T06:34:52.095516498Z [-]informer-sync failed: 2 informers not started yet: [*v1.PriorityLevelConfiguration *v1.FlowSchema]
...
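To see which informer types never synced across those 360 occurrences, the failing sets can be aggregated roughly like this (a sketch; assumes GNU grep):
$ grep -o 'informers not started yet: \[[^]]*\]' current.log | sort | uniq -c | sort -rn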
Issue links:
- blocks: OCPBUGS-34408 [Upgrade] kube-apiserver stuck in updating versions when upgrade from old releases (Closed)
- is blocked by: API-1813 Impact kube-apiserver stuck in updating versions when upgrade from old releases (Closed)
- is cloned by: OCPBUGS-34408 [Upgrade] kube-apiserver stuck in updating versions when upgrade from old releases (Closed)
- links to: RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update