OpenShift Bugs / OCPBUGS-45982

Permanent failure in "Autoscaler should use a ClusterAutoscaler that has 100 maximum total nodes count [It] It scales from/to zero"

    • Bug
    • Resolution: Unresolved
    • Normal
    • 4.19.0
    • Cluster Autoscaler
    • Quality / Stability / Reliability
    • False
    • 3

      Description of problem:

      The test "Autoscaler should use a ClusterAutoscaler that has 100 maximum total nodes count [It] It scales from/to zero" fails permanently because the 3-minute timeout for the step "Waiting for machineSet replicas to scale in" is too short.
      
      Only one job has passed in a while; every other run fails with the error "Timed out after 360.001s". Is it possible to increase the timeout from 3 to 5 minutes?

      Version-Release number of selected component (if applicable):

      master   

      How reproducible:

      periodic-ci-openshift-cluster-api-actuator-pkg-master-e2e-openstack-operator-periodic    

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-cluster-api-actuator-pkg-master-e2e-openstack-operator-periodic/1865872906618146816/artifacts/e2e-openstack-operator-periodic/test/build-log.txt
      
        STEP: Waiting for machineSet replicas to scale in. Current replicas are 2, expected 0. @ 12/08/24 23:31:01.903
        STEP: Waiting for machineSet replicas to scale in. Current replicas are 2, expected 0. @ 12/08/24 23:31:04.924
        [FAILED] in [It] - /go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:321 @ 12/08/24 23:31:07.944
        [FAILED] in [AfterEach] - /go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:209 @ 12/08/24 23:31:08.415
        [FAILED] in [DeferCleanup (Each)] - /go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:168 @ 12/08/24 23:31:08.431
        << Timeline  [FAILED] Failed to get MachineSet jns7ig4m-9a4e0-tmdhd-a60ff
        Unexpected error:
            <*fmt.wrapError | 0xc000c20e80>: 
            error querying api for machineSet object: Get "https://api.jns7ig4m-9a4e0.shiftstack.devcluster.openshift.com:6443/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/jns7ig4m-9a4e0-tmdhd-a60ff": dial tcp 38.102.83.13:6443: connect: connection refused
            {
                msg: "error querying api for machineSet object: Get \"https://api.jns7ig4m-9a4e0.shiftstack.devcluster.openshift.com:6443/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/jns7ig4m-9a4e0-tmdhd-a60ff\": dial tcp 38.102.83.13:6443: connect: connection refused",
                err: <*url.Error | 0xc001fce630>{
                    Op: "Get",
                    URL: "https://api.jns7ig4m-9a4e0.shiftstack.devcluster.openshift.com:6443/apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/jns7ig4m-9a4e0-tmdhd-a60ff",
                    Err: <*net.OpError | 0xc0004a46e0>{
                        Op: "dial",
                        Net: "tcp",
                        Source: nil,
                        Addr: <*net.TCPAddr | 0xc0020cbe00>{
                            IP: [38, 102, 83, 13],
                            Port: 6443,
                            Zone: "",
                        },
                        Err: <*os.SyscallError | 0xc000c20e40>{
                            Syscall: "connect",
                            Err: <syscall.Errno>0x6f,
                        },
                    },
                },
            }
        occurred
        In [It] at: /go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:321 @ 12/08/24 23:31:07.944    

      Expected results:

      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-cluster-api-actuator-pkg-master-e2e-openstack-operator-periodic/1852357483125805056/artifacts/e2e-openstack-operator-periodic/test/build-log.txt
      
        STEP: Waiting for machineSet replicas to scale in. Current replicas are 2, expected 0. @ 11/01/24 16:17:51.842
        STEP: Waiting for machineSet replicas to scale in. Current replicas are 0, expected 0. @ 11/01/24 16:17:54.863
        STEP: Stopping Cluster Autoscaler event watcher @ 11/01/24 16:17:54.863
        STEP: Waiting for ClusterAutoscaler to delete. @ 11/01/24 16:17:54.863
        STEP: Waiting for MachineSets to be deleted... @ 11/01/24 16:17:58.216
        << Timeline    

      Additional info:

      failed job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-cluster-api-actuator-pkg-master-e2e-openstack-operator-periodic/1865872906618146816
      
      passed job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-cluster-api-actuator-pkg-master-e2e-openstack-operator-periodic/1852357483125805056    

              jkyros@redhat.com John Kyros
              rh-ee-mcarpio Miguel Carpio
              Paul Rozehnal