Bug
Resolution: Done
Undefined
4.15, 4.16
Important
No
CLOUD Sprint 252
1
Rejected
False
Enhancement
Done
Description of problem:
Upgrade from 4.15 to 4.16 is failing because kubelet reports this error:

Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411346 7755 kubelet.go:308] "Adding static pod path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411380 7755 file.go:69] "Watching path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411406 7755 kubelet.go:319] "Adding apiserver pod source"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411426 7755 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.414274 7755 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="cri-o" version="1.28.4-4.rhaos4.15.git92d1839.el8" apiVersion="v1"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: E0315 17:03:31.414963 7755 kuberuntime_manager.go:273] "Failed to register CRI auth plugins" err="plugin binary executable /usr/libexec/kubelet-image-credential-provider-plugins/acr-credential-provider did not exist"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: Failed to start Kubernetes Kubelet.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Consumed 155ms CPU time

We have seen this issue in prow job periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-workers-rhel8-f28 (a cluster with rhel workers) and in manual upgrades in IPI on GCP clusters (a cluster with coreos workers).
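When triaging a saved journal from an affected node, it can help to pull out just the missing plugin path from the kubelet error above. The following is a minimal sketch, not part of the original report: the function name `find_missing_credential_plugins` and the `SAMPLE` list are hypothetical, and the regular expression assumes the error message has exactly the wording shown in this journal excerpt.

```python
import re

# Matches the kubelet error quoted in this report. The wording is taken from
# the journal excerpt; other kubelet versions may phrase it differently.
MISSING_PLUGIN_RE = re.compile(
    r'"Failed to register CRI auth plugins" '
    r'err="plugin binary executable (?P<path>\S+) did not exist"'
)


def find_missing_credential_plugins(journal_lines):
    """Return distinct missing plugin paths, in order of first appearance."""
    missing = []
    for line in journal_lines:
        match = MISSING_PLUGIN_RE.search(line)
        if match and match.group("path") not in missing:
            missing.append(match.group("path"))
    return missing


# Two lines copied from the journal excerpt in this report (hypothetical usage).
SAMPLE = [
    'Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: '
    'I0315 17:03:31.414274 7755 kuberuntime_manager.go:257] '
    '"Container runtime initialized" containerRuntime="cri-o" '
    'version="1.28.4-4.rhaos4.15.git92d1839.el8" apiVersion="v1"',
    'Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: '
    'E0315 17:03:31.414963 7755 kuberuntime_manager.go:273] '
    '"Failed to register CRI auth plugins" err="plugin binary executable '
    '/usr/libexec/kubelet-image-credential-provider-plugins/acr-credential-provider '
    'did not exist"',
]

if __name__ == "__main__":
    for path in find_missing_credential_plugins(SAMPLE):
        print(path)
```

The sketch only reads text that has already been collected (for example the journal attached to the first comment); it does not query the node.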
Version-Release number of selected component (if applicable):
Upgrade from 4.15.3 to 4.16.0-0.nightly-2024-03-13-061822

oc get clusterversion -o yaml
...
    history:
    - acceptedRisks: |-
        Target release version="" image="registry.build04.ci.openshift.org/ci-op-wb5fkm5k/release@sha256:da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b" cannot be verified, but continuing anyway because the update was forced: unable to verify sha256:da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b against keyrings: verifier-public-key-redhat
        [2024-03-15T15:33:11Z: prefix sha256-da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b in config map signatures-managed: no more signatures to check,
        2024-03-15T15:33:11Z: unable to retrieve signature from https://storage.googleapis.com/openshift-release/official/signatures/openshift/release/sha256=da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b/signature-1: no more signatures to check,
        2024-03-15T15:33:11Z: unable to retrieve signature from https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b/signature-1: no more signatures to check,
        2024-03-15T15:33:11Z: parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check,
        2024-03-15T15:33:11Z: serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check,
        2024-03-15T15:33:11Z: serial signature store wrapping config maps in openshift-config-managed with label "release.openshift.io/verification-signatures", serial signature store wrapping ClusterVersion signatureStores unset, falling back to default stores, parallel signature store wrapping containers/image signature store under https://storage.googleapis.com/openshift-release/official/signatures/openshift/release, containers/image signature store under https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release: no more signatures to check]
        Precondition "ClusterVersionRecommendedUpdate" failed because of "UnknownUpdate": RetrievedUpdates=True (), so the update from 4.15.3 to 4.16.0-0.nightly-2024-03-13-061822 is probably neither recommended nor supported.
      completionTime: null
      image: registry.build04.ci.openshift.org/ci-op-wb5fkm5k/release@sha256:da22f0582a13f19aae1792c6de2e3cc348c3ed1af67c1fbb5a9960833931341b
      startedTime: "2024-03-15T15:33:28Z"
      state: Partial
      verified: false
      version: 4.16.0-0.nightly-2024-03-13-061822
    - completionTime: "2024-03-15T13:33:08Z"
      image: registry.build04.ci.openshift.org/ci-op-wb5fkm5k/release@sha256:8e8c6c2645553e6df8eb7985d8cb322f333a4152453e2aa85fff24ac5e0755b0
      startedTime: "2024-03-15T13:02:04Z"
      state: Completed
      verified: false
      version: 4.15.3
How reproducible:
Always
Steps to Reproduce:
1. Upgrade from 4.15 to 4.16, either with prow job periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.15-azure-ipi-workers-rhel8-f28 or manually on an IPI cluster on GCP.
Actual results:
Worker nodes do not join the cluster when they are rebooted:

sh-4.4$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-b566c3af4e215e2a77e6f9d9e5a988de   True      False      False      3              3                   3                     0                      3h59m
worker   rendered-worker-21862c92d0f14a4842f6093f65571bd1   False     True       False      3              0                   0                     0                      3h59m

sh-4.4$ oc get nodes
NAME                                  STATUS                        ROLES                  AGE     VERSION
ci-op-wb5fkm5k-e450c-s6m96-master-0   Ready                         control-plane,master   4h5m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-master-1   Ready                         control-plane,master   4h6m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-master-2   Ready                         control-plane,master   4h6m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-rhel-1     NotReady,SchedulingDisabled   worker                 3h17m   v1.28.7+6e2789b
ci-op-wb5fkm5k-e450c-s6m96-rhel-2     Ready                         worker                 3h17m   v1.28.7+6e2789b
ci-op-wb5fkm5k-e450c-s6m96-rhel-3     Ready                         worker                 3h17m   v1.28.7+6e2789b

In the NotReady node we can see this error in kubelet:

Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411346 7755 kubelet.go:308] "Adding static pod path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411380 7755 file.go:69] "Watching path" path="/etc/kubernetes/manifests"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411406 7755 kubelet.go:319] "Adding apiserver pod source"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.411426 7755 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: I0315 17:03:31.414274 7755 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="cri-o" version="1.28.4-4.rhaos4.15.git92d1839.el8" apiVersion="v1"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 kubenswrapper[7755]: E0315 17:03:31.414963 7755 kuberuntime_manager.go:273] "Failed to register CRI auth plugins" err="plugin binary executable /usr/libexec/kubelet-image-credential-provider-plugins/acr-credential-provider did not exist"
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: Failed to start Kubernetes Kubelet.
Mar 15 17:03:31 ci-op-wb5fkm5k-e450c-s6m96-rhel-1 systemd[1]: kubelet.service: Consumed 155ms CPU time
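To spot the affected workers quickly in a captured `oc get nodes` listing like the one in these results, the text can be filtered for nodes whose STATUS column lacks the `Ready` condition. This is a minimal sketch under stated assumptions: the helper name `not_ready_nodes` and the `SAMPLE_OUTPUT` string are hypothetical, and it assumes the default whitespace-separated column layout with NAME first and STATUS second.

```python
def not_ready_nodes(oc_get_nodes_output):
    """Return (name, status) pairs for nodes whose STATUS does not include Ready.

    Assumes the default `oc get nodes` text layout: a header row, then one row
    per node with NAME in the first column and STATUS in the second.
    """
    rows = [line for line in oc_get_nodes_output.splitlines() if line.strip()]
    flagged = []
    for row in rows[1:]:  # skip the header row
        fields = row.split()
        name, status = fields[0], fields[1]
        # STATUS is a comma-separated list, e.g. "NotReady,SchedulingDisabled".
        if "Ready" not in status.split(","):
            flagged.append((name, status))
    return flagged


# Node listing copied from this report (hypothetical usage).
SAMPLE_OUTPUT = """\
NAME                                  STATUS                        ROLES                  AGE     VERSION
ci-op-wb5fkm5k-e450c-s6m96-master-0   Ready                         control-plane,master   4h5m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-master-1   Ready                         control-plane,master   4h6m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-master-2   Ready                         control-plane,master   4h6m    v1.29.2+a0beecc
ci-op-wb5fkm5k-e450c-s6m96-rhel-1     NotReady,SchedulingDisabled   worker                 3h17m   v1.28.7+6e2789b
ci-op-wb5fkm5k-e450c-s6m96-rhel-2     Ready                         worker                 3h17m   v1.28.7+6e2789b
ci-op-wb5fkm5k-e450c-s6m96-rhel-3     Ready                         worker                 3h17m   v1.28.7+6e2789b
"""

if __name__ == "__main__":
    for name, status in not_ready_nodes(SAMPLE_OUTPUT):
        print(name, status)
```

Splitting STATUS on commas rather than searching for the substring `Ready` avoids treating `NotReady` as ready.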
Expected results:
The upgrade should complete without failures.
Additional info:
In the first comment you can find the must-gather file and the journal.logs
- depends on
OCPBUGS-32057 Update RCHOS to include credential provider package
- Closed
- is blocked by
OCPBUGS-32057 Update RCHOS to include credential provider package
- Closed
OCPCLOUD-2582 Impact Upgrade from 4.15 to 4.16 fails because of kubelet reporting "Failed to register CRI auth plugins" error
- Closed
- links to
RHBA-2024:2068 OpenShift Container Platform 4.15.z bug fix update
(5 links to)