- Bug
- Resolution: Done-Errata
- Normal
- 4.15
- Critical
- No
- MCO Sprint 255, MCO Sprint 256
- 2
- False
- Bug Fix
- In Progress
This is a clone of issue OCPBUGS-28974. The following is the description of the original issue:
—
Description of problem:
Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15
Version-Release number of selected component (if applicable):
Upgrade from 4.1 to 4.15: 4.1.41-x86_64, 4.2.36-x86_64, 4.3.40-x86_64, 4.4.33-x86_64, 4.5.41-x86_64, 4.6.62-x86_64, 4.7.60-x86_64, 4.8.57-x86_64, 4.9.59-x86_64, 4.10.67-x86_64, 4.11 nightly, 4.12 nightly, 4.13 nightly, 4.14 nightly, 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest
How reproducible:
Seemingly always. The issue was found in our Prow CI, and I reproduced it as well.
Steps to Reproduce:
1. Create an AWS IPI 4.1 cluster, then upgrade it one version at a time to 4.14.

    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2024-01-19-110702   True        True          26m     Working towards 4.12.0-0.nightly-2024-02-04-062856: 654 of 830 done (78% complete), waiting on authentication, openshift-apiserver, openshift-controller-manager
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2024-02-04-062856   True        False         5m12s   Cluster version is 4.12.0-0.nightly-2024-02-04-062856
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2024-02-04-062856   True        True          61m     Working towards 4.13.0-0.nightly-2024-02-04-042638: 713 of 841 done (84% complete), waiting up to 40 minutes on machine-config
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.13.0-0.nightly-2024-02-04-042638   True        False         10m     Cluster version is 4.13.0-0.nightly-2024-02-04-042638
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.13.0-0.nightly-2024-02-04-042638   True        True          17m     Working towards 4.14.0-0.nightly-2024-02-02-173828: 233 of 860 done (27% complete), waiting on control-plane-machine-set, machine-api
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.0-0.nightly-2024-02-02-173828   True        False         18m     Cluster version is 4.14.0-0.nightly-2024-02-02-173828

2. Once the cluster is upgraded to 4.14, check that a machine scales up successfully.

    liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
    machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa created
    liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
    NAME                                            DESIRED   CURRENT   READY   AVAILABLE   AGE
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a    1         1         1       1           14h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa   0         0                             3s
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f    2         2         2       2           14h
    liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa --replicas=1
    machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa scaled
    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
    ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   15h
    ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   15h
    ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   15h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   15h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa-mt9kh   Running        m6a.xlarge   us-east-1   us-east-1a   15m
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   15h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   15h
    liuhuali@Lius-MacBook-Pro huali-test % oc get node
    NAME                           STATUS   ROLES    AGE     VERSION
    ip-10-0-128-51.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
    ip-10-0-143-198.ec2.internal   Ready    worker   14h     v1.27.10+28ed2d7
    ip-10-0-143-64.ec2.internal    Ready    worker   14h     v1.27.10+28ed2d7
    ip-10-0-143-80.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
    ip-10-0-144-123.ec2.internal   Ready    master   15h     v1.27.10+28ed2d7
    ip-10-0-147-94.ec2.internal    Ready    worker   14h     v1.27.10+28ed2d7
    ip-10-0-158-61.ec2.internal    Ready    worker   3m40s   v1.27.10+28ed2d7
    liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa --replicas=0
    machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa scaled
    liuhuali@Lius-MacBook-Pro huali-test % oc get node
    NAME                           STATUS   ROLES    AGE     VERSION
    ip-10-0-128-51.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
    ip-10-0-143-198.ec2.internal   Ready    worker   15h     v1.27.10+28ed2d7
    ip-10-0-143-64.ec2.internal    Ready    worker   15h     v1.27.10+28ed2d7
    ip-10-0-143-80.ec2.internal    Ready    master   15h     v1.27.10+28ed2d7
    ip-10-0-144-123.ec2.internal   Ready    master   15h     v1.27.10+28ed2d7
    ip-10-0-147-94.ec2.internal    Ready    worker   15h     v1.27.10+28ed2d7
    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
    ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   15h
    ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   15h
    ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   15h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   15h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   15h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   15h
    liuhuali@Lius-MacBook-Pro huali-test % oc delete machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa
    machineset.machine.openshift.io "ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1aa" deleted
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.0-0.nightly-2024-02-02-173828   True        False         43m     Cluster version is 4.14.0-0.nightly-2024-02-02-173828

3. Upgrade to 4.15. Because the upgrade to the 4.15 nightly gets stuck on operator-lifecycle-manager-packageserver, which is a bug (https://issues.redhat.com/browse/OCPBUGS-28744), I built an image with the fix PR (job build openshift/operator-framework-olm#679 succeeded) and upgraded to that image. The upgrade succeeded.

    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.0-0.nightly-2024-02-02-173828   True        True          7s      Working towards 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest: 10 of 875 done (1% complete)
    liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
    NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         23m     Cluster version is 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest
    liuhuali@Lius-MacBook-Pro huali-test % oc get co
    NAME                                       VERSION                                                   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    authentication                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h
    baremetal                                  4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      11h
    cloud-controller-manager                   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      8h
    cloud-credential                           4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    cluster-autoscaler                         4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    config-operator                            4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      13h
    console                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      3h19m
    control-plane-machine-set                  4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      5h
    csi-snapshot-controller                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      7h10m
    dns                                        4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h
    etcd                                       4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h
    image-registry                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      33m
    ingress                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h
    insights                                   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    kube-apiserver                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h
    kube-controller-manager                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h
    kube-scheduler                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      14h
    kube-storage-version-migrator              4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      34m
    machine-api                                4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    machine-approver                           4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      13h
    machine-config                             4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      10h
    marketplace                                4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      10h
    monitoring                                 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h
    network                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    node-tuning                                4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      56m
    openshift-apiserver                        4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h
    openshift-controller-manager               4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      4h56m
    openshift-samples                          4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      58m
    operator-lifecycle-manager                 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    operator-lifecycle-manager-catalog         4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    operator-lifecycle-manager-packageserver   4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      57m
    service-ca                                 4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      16h
    storage                                    4.15.0-0.ci.test-2024-02-05-022753-ci-ln-7mxfqgt-latest   True        False         False      9h
    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
    ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
    ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   16h
    ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   16h

4. Check machine scale-up again: the new machine is stuck in Provisioned, and no CSR is pending.

    liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
    machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 created
    liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
    NAME                                            DESIRED   CURRENT   READY   AVAILABLE   AGE
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a    1         1         1       1           16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1   0         0                             6s
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f    2         2         2       2           16h
    liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 --replicas=1
    machineset.machine.openshift.io/ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1 scaled
    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
    ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
    ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   16h
    ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877   Provisioning   m6a.xlarge   us-east-1   us-east-1a   4s
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   16h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   16h
    liuhuali@Lius-MacBook-Pro huali-test % oc get machine
    NAME                                                  PHASE          TYPE         REGION      ZONE         AGE
    ci-op-trzci0vq-8a8c4-dq95h-master-0                   Running        m6a.xlarge   us-east-1   us-east-1f   18h
    ci-op-trzci0vq-8a8c4-dq95h-master-1                   Running        m6a.xlarge   us-east-1   us-east-1a   18h
    ci-op-trzci0vq-8a8c4-dq95h-master-2                   Running        m6a.xlarge   us-east-1   us-east-1f   18h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a-pqnqt    Running        m6a.xlarge   us-east-1   us-east-1a   18h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877   Provisioned    m6a.xlarge   us-east-1   us-east-1a   97m
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-h2f9k    Running        m6a.xlarge   us-east-1   us-east-1f   18h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f-lgmjb    Running        m6a.xlarge   us-east-1   us-east-1f   18h
    ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1f1-4ln47   Provisioned    m6a.xlarge   us-east-1   us-east-1f   50m
    liuhuali@Lius-MacBook-Pro huali-test % oc get node
    NAME                           STATUS   ROLES    AGE     VERSION
    ip-10-0-128-51.ec2.internal    Ready    master   18h     v1.28.6+a373c1b
    ip-10-0-143-198.ec2.internal   Ready    worker   18h     v1.28.6+a373c1b
    ip-10-0-143-64.ec2.internal    Ready    worker   18h     v1.28.6+a373c1b
    ip-10-0-143-80.ec2.internal    Ready    master   18h     v1.28.6+a373c1b
    ip-10-0-144-123.ec2.internal   Ready    master   18h     v1.28.6+a373c1b
    ip-10-0-147-94.ec2.internal    Ready    worker   18h     v1.28.6+a373c1b
    liuhuali@Lius-MacBook-Pro huali-test % oc get csr
    NAME        AGE   SIGNERNAME                                    REQUESTOR                                  REQUESTEDDURATION   CONDITION
    csr-596n7   21m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-147-94.ec2.internal    <none>              Approved,Issued
    csr-7nr9m   42m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-147-94.ec2.internal    <none>              Approved,Issued
    csr-bc9n7   16m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-128-51.ec2.internal    <none>              Approved,Issued
    csr-dmk27   18m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-128-51.ec2.internal    <none>              Approved,Issued
    csr-ggkgd   64m   kubernetes.io/kube-apiserver-client-kubelet   system:node:ip-10-0-143-198.ec2.internal   <none>              Approved,Issued
    csr-rs9cz   70m   kubernetes.io/kubelet-serving                 system:node:ip-10-0-143-80.ec2.internal    <none>              Approved,Issued
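The two checks in step 4 (a machine stuck in Provisioned, no CSR pending) can be scripted. The following is only a minimal sketch, not part of the original report: the function names `stuck_machines` and `pending_csrs` are hypothetical, and the filters assume the default column layout shown in the `oc get machine` and `oc get csr` output above, including the header row.

```shell
# Hypothetical helpers (not from the report). Both read `oc` table output,
# including its header row, on stdin.

# Print the NAME of every machine whose PHASE column is exactly "Provisioned".
# Assumes the default columns: NAME PHASE TYPE REGION ZONE AGE.
stuck_machines() {
  awk 'NR > 1 && $2 == "Provisioned" { print $1 }'
}

# Print the NAME of every CSR whose CONDITION (the last column) is "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

To use it, pipe the step 4 commands into the helpers, for example `oc get machine | stuck_machines` and `oc get csr | pending_csrs`. In the failure described here, the first would list the Provisioned workers and the second would print nothing. The phase alone does not say how long a machine has been waiting, so the AGE column still needs to be read by hand.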
Actual results:
Machine stuck in Provisioned
Expected results:
Machine should reach the Running phase
Additional info:
Must-gather: https://drive.google.com/file/d/1TrZ_mb-cHKmrNMsuFl9qTdYo_eNPuF_l/view?usp=sharing
I can see the provisioned machine on the AWS console: https://drive.google.com/file/d/1-OcsmvfzU4JBeGh5cil8P2Hoe5DQsmqF/view?usp=sharing
System log of ci-op-trzci0vq-8a8c4-dq95h-worker-us-east-1a1-5g877: https://drive.google.com/file/d/1spVT_o0S4eqeQxE5ivttbAazCCuSzj1e/view?usp=sharing
Some logs from the instance: https://drive.google.com/file/d/1zjxPxm61h4L6WVHYv-w7nRsSz5Fku26w/view?usp=sharing
- blocks: OCPBUGS-36769 Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15 (Closed)
- clones: OCPBUGS-28974 Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15 (Closed)
- is blocked by: OCPBUGS-28974 Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15 (Closed)
- is cloned by: OCPBUGS-36769 Machine stuck in Provisioned when the cluster is upgraded from 4.1 to 4.15 (Closed)
- links to: RHBA-2024:4469 OpenShift Container Platform 4.16.z bug fix update