OpenShift Bugs / OCPBUGS-20218

Cluster install failed because cluster-autoscaler is not available

    • Bug
    • Resolution: Duplicate
    • Undefined
    • 4.14.0
    • Cluster Autoscaler
    • Quality / Stability / Reliability
    • False
    • Critical
    • No
    • Proposed

      Description of problem:

      In 4.14, QE hit cluster installation failures caused by cluster-autoscaler twice on Azure, and once saw an upgrade from 4.13 to 4.14 get stuck on cluster-autoscaler on Nutanix.

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-10-06-234925

      How reproducible:

      Encountered 3 times (2 installs on Azure, 1 upgrade on Nutanix)

      Steps to Reproduce:

      1. Install a 4.14 cluster on Azure.
      

      Actual results:

      Cluster installation failed; the cluster-autoscaler operator reports no version or status conditions.
      must-gather: https://drive.google.com/file/d/1vL6bFTLme1sst7p8b6DsxxTnX3P50ISL/view?usp=sharing
      $ oc get co        
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.0-0.nightly-2023-10-06-234925   True        False         False      159m
      baremetal                                  4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h5m
      cloud-controller-manager                   4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h8m
      cloud-credential                           4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h13m
      cluster-autoscaler
      config-operator                            4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h6m
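
The table above shows the failure signature: the cluster-autoscaler row has a name but no VERSION, AVAILABLE, PROGRESSING, or DEGRADED values. A minimal sketch for spotting such rows automatically in saved `oc get co` output follows. The function name and the six-column threshold are assumptions made for illustration, not part of any OpenShift tooling.

```python
# Hypothetical helper: flag ClusterOperator rows that carry no status columns.
# Input is the plain-text output of `oc get co` (without the shell prompt line).

SAMPLE = """\
NAME                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.14.0-0.nightly-2023-10-06-234925   True        False         False      159m
cloud-credential           4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h13m
cluster-autoscaler
config-operator            4.14.0-0.nightly-2023-10-06-234925   True        False         False      3h6m
"""


def operators_without_status(oc_get_co_output: str) -> list[str]:
    """Return operator names whose row lacks the version/status columns."""
    missing = []
    for line in oc_get_co_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        # Assumption: a populated row has at least NAME, VERSION, AVAILABLE,
        # PROGRESSING, DEGRADED and SINCE (MESSAGE is optional).
        if len(fields) < 6:
            missing.append(fields[0])
    return missing


if __name__ == "__main__":
    print(operators_without_status(SAMPLE))
```

Run against the rows quoted in this report, the helper would single out cluster-autoscaler as the only operator that never published its status.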
      
      $ oc logs -f cluster-autoscaler-operator-6dfdc4d855-n8jsv                   
      I1008 02:09:49.952203       1 controller.go:219]  "msg"="Starting workers" "controller"="machine_autoscaler_controller" "worker count"=1
      I1008 02:09:49.961525       1 controller.go:219]  "msg"="Starting workers" "controller"="cluster_autoscaler_controller" "worker count"=1
      E1008 02:15:59.088613       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
      E1008 02:16:25.090997       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
      E1008 02:16:51.090660       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
      E1008 02:20:48.353228       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
      E1008 02:21:14.354668       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
      E1008 02:21:40.354228       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp 172.30.0.1:443: connect: connection refused
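
The operator log above shows repeated leader-election failures in which the API service address 172.30.0.1:443 refuses connections. To see how the retries are spaced, the timestamps can be pulled out of the klog header. The sketch below assumes the standard klog header layout (severity letter, month and day, time with microseconds, thread id, then `file:line]`); the function name and the abbreviated sample messages are illustrative only.

```python
import re
from datetime import datetime

# Assumed klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
KLOG_RE = re.compile(
    r"^\s*([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ ([^:\s]+):\d+\]\s*(.*)$"
)

# Sample lines taken from this report, with the message text shortened.
SAMPLE_LOG = """\
I1008 02:09:49.961525       1 controller.go:219]  "msg"="Starting workers"
E1008 02:15:59.088613       1 leaderelection.go:327] error retrieving resource lock
E1008 02:16:25.090997       1 leaderelection.go:327] error retrieving resource lock
E1008 02:16:51.090660       1 leaderelection.go:327] error retrieving resource lock
"""


def error_gaps(log_text: str, source_file: str = "leaderelection.go") -> list[float]:
    """Return seconds between consecutive error lines emitted by source_file."""
    stamps = []
    for line in log_text.splitlines():
        match = KLOG_RE.match(line)
        if not match:
            continue
        severity, month_day, clock, file_name, _message = match.groups()
        if severity == "E" and file_name == source_file:
            # klog omits the year, so strptime falls back to its default year.
            stamps.append(datetime.strptime(f"{month_day} {clock}", "%m%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in error_gaps(SAMPLE_LOG):
        print(f"{gap:.3f}s between leader-election errors")
```

Feeding the full log through the same function would also expose the longer pause between the 02:16 and 02:20 bursts, which may help correlate the errors with API server restarts recorded in the must-gather.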

      Expected results:

      Cluster installation is successful

      Additional info:

      Similar install issue on Prow CI: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[...]i-fips-f28-destructive/1710318771433902080
      Upgrade stuck on the cluster-autoscaler operator: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[...]-f28/1710057799020449792?rerun=gh_redirect

              joelspeed Joel Speed
              rhn-support-zhsun Zhaohua Sun