Bug · Resolution: Duplicate · Undefined · 4.14.0 · Quality / Stability / Reliability · False · Critical · No · Proposed
Description of problem:
In 4.14, QE hit cluster installation failures caused by cluster-autoscaler twice on Azure, and once saw an upgrade from 4.13 to 4.14 get stuck on cluster-autoscaler on Nutanix.
Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-06-234925
How reproducible:
Met 3 times so far (2 install failures on Azure, 1 stuck upgrade on Nutanix).
Steps to Reproduce:
1. Install a 4.14 cluster on Azure (see the command sketch below)
2.
3.
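A minimal sketch of how such an install is typically driven with openshift-install; the assets directory name and the install-config contents are placeholders for illustration, not taken from this report:

$ mkdir azure-cluster
$ cp install-config.yaml azure-cluster/    # install-config.yaml with the platform.azure section filled in
$ openshift-install create cluster --dir azure-cluster --log-level=info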
Actual results:
Cluster installation failed.

must-gather: https://drive.google.com/file/d/1vL6bFTLme1sst7p8b6DsxxTnX3P50ISL/view?usp=sharing

$ oc get co
NAME                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.14.0-0.nightly-2023-10-06-234925   True        False         False      159m
baremetal                  4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h5m
cloud-controller-manager   4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h8m
cloud-credential           4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h13m
cluster-autoscaler
config-operator            4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h6m

$ oc logs -f cluster-autoscaler-operator-6dfdc4d855-n8jsv
I1008 02:09:49.952203 1 controller.go:219] "msg"="Starting workers" "controller"="machine_autoscaler_controller" "worker count"=1
I1008 02:09:49.961525 1 controller.go:219] "msg"="Starting workers" "controller"="cluster_autoscaler_controller" "worker count"=1
E1008 02:15:59.088613 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1008 02:16:25.090997 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1008 02:16:51.090660 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1008 02:20:48.353228 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1008 02:21:14.354668 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
E1008 02:21:40.354228 1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
Expected results:
Cluster installation is successful
Additional info:
Similar install issue on Prow CI: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[...]i-fips-f28-destructive/1710318771433902080
Upgrade stuck on cluster-autoscaler operator: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[...]-f28/1710057799020449792?rerun=gh_redirect
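For anyone triaging from the must-gather or a live cluster, a few standard oc commands that may help inspect the cluster-autoscaler-operator state (generic usage shown here, not output captured from this report):

$ oc -n openshift-machine-api get pods
$ oc get clusteroperator cluster-autoscaler -o yaml
$ oc -n openshift-machine-api logs deployment/cluster-autoscaler-operator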