-
Bug
-
Resolution: Done-Errata
-
Major
-
4.14.0
-
No
-
MCO Sprint 240
-
1
-
Approved
-
False
-
Description of problem:
While attempting an EUS-to-EUS upgrade from 4.12.z to 4.14 (CI builds), I consistently see that after OCP is upgraded to 4.14, the worker MachineConfigPool goes into a degraded state with the following message:
{noformat}
message: 'Node c01-dbn-412-tzm44-worker-0-7w6wg is reporting: "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file or directory",
  Node c01-dbn-412-tzm44-worker-0-cmqsl is reporting: "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file or directory",
  Node c01-dbn-412-tzm44-worker-0-qrp6v is reporting: "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file or directory"'
{noformat}
clusterversion then reports an error:
{noformat}
[cloud-user@ocp-psi-executor dbasunag]$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.ci-2023-08-14-110508   True        True          125m    Unable to apply 4.14.0-0.ci-2023-08-14-152624: wait has exceeded 40 minutes for these operators: machine-config
[cloud-user@ocp-psi-executor dbasunag]$
{noformat}
This is consistently reproducible on clusters with knmstate installed.
Version-Release number of selected component (if applicable):
4.12.29 -> 4.13.0-0.ci-2023-08-14-110508 -> 4.14.0-0.ci-2023-08-14-152624
How reproducible:
100%
Steps to Reproduce:
1. Perform an EUS upgrade on a cluster with CNV, ODF, and knmstate installed.
2. After pausing the worker MCP, upgrade OCP, ODF, CNV, and knmstate to 4.13. Everything worked fine at this stage.
3. Upgrade OCP to 4.14. Once the master MCP was updated, the worker MCP went into a degraded state and clusterversion eventually reported an error (all the master nodes were updated).
Actual results:
{noformat}
[cloud-user@ocp-psi-executor dbasunag]$ oc get co
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.ci-2023-08-14-152624   True        False         False      9h
baremetal                                  4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
cloud-controller-manager                   4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
cloud-credential                           4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
cluster-autoscaler                         4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
config-operator                            4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
console                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
control-plane-machine-set                  4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
csi-snapshot-controller                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
dns                                        4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
etcd                                       4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
image-registry                             4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
ingress                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
insights                                   4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
kube-apiserver                             4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
kube-controller-manager                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
kube-scheduler                             4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
kube-storage-version-migrator              4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
machine-api                                4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
machine-approver                           4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
machine-config                             4.13.0-0.ci-2023-08-14-110508   True        True          True       2d23h   Unable to apply 4.14.0-0.ci-2023-08-14-152624: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]]
marketplace                                4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
monitoring                                 4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
network                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
node-tuning                                4.14.0-0.ci-2023-08-14-152624   True        False         False      95m
openshift-apiserver                        4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
openshift-controller-manager               4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
openshift-samples                          4.14.0-0.ci-2023-08-14-152624   True        False         False      98m
operator-lifecycle-manager                 4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
operator-lifecycle-manager-catalog         4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
operator-lifecycle-manager-packageserver   4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h
service-ca                                 4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
storage                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h
[cloud-user@ocp-psi-executor dbasunag]$
[cloud-user@ocp-psi-executor dbasunag]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-693b054330417fe5e098b58716603fc8   True      False      False      3              3                   3                     0                      2d23h
worker   rendered-worker-b2f5a9084e9919b4c1c491658c73bce5   False     False      True       3              0                   0                     3                      2d23h
[cloud-user@ocp-psi-executor dbasunag]$
[cloud-user@ocp-psi-executor dbasunag]$ oc get node
NAME                               STATUS   ROLES                  AGE     VERSION
c01-dbn-412-tzm44-master-0         Ready    control-plane,master   2d23h   v1.27.4+deb2c60
c01-dbn-412-tzm44-master-1         Ready    control-plane,master   2d23h   v1.27.4+deb2c60
c01-dbn-412-tzm44-master-2         Ready    control-plane,master   2d23h   v1.27.4+deb2c60
c01-dbn-412-tzm44-worker-0-7w6wg   Ready    worker                 2d22h   v1.25.11+1485cc9
c01-dbn-412-tzm44-worker-0-cmqsl   Ready    worker                 2d22h   v1.25.11+1485cc9
c01-dbn-412-tzm44-worker-0-qrp6v   Ready    worker                 2d22h   v1.25.11+1485cc9
[cloud-user@ocp-psi-executor dbasunag]$
{noformat}
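For anyone scripting a check for this symptom while reproducing, here is a minimal sketch that scans saved `oc get mcp` text for degraded pools. The helper names `parse_mcp_table` and `degraded_pools` are hypothetical and not part of any OpenShift tooling. The sketch assumes the default column layout shown in the output above, with no whitespace inside any field. `SAMPLE` is copied from the `oc get mcp` output in this report.

```python
def parse_mcp_table(text):
    """Parse whitespace-separated `oc get mcp` output into a list of dicts.

    Assumes the first non-empty line is the header and that no field
    contains spaces (true for the default `oc get mcp` columns).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def degraded_pools(text):
    """Return the names of pools whose DEGRADED column reads True."""
    return [
        row["NAME"]
        for row in parse_mcp_table(text)
        if row.get("DEGRADED") == "True"
    ]


# Output copied from the `oc get mcp` run in this report.
SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-693b054330417fe5e098b58716603fc8   True      False      False      3              3                   3                     0                      2d23h
worker   rendered-worker-b2f5a9084e9919b4c1c491658c73bce5   False     False      True       3              0                   0                     3                      2d23h
"""

if __name__ == "__main__":
    print(degraded_pools(SAMPLE))
```

Run against the sample, this flags only the `worker` pool, matching the state described in this report.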
Expected results:
The EUS upgrade should complete without errors.
Additional info:
Must-gather can be found here: https://drive.google.com/drive/folders/1SCZoYpGiRpOteTM-sTLmbfgr3hqsICVO?usp=drive_link