OpenShift Bugs / OCPBUGS-961

The e2e-nutanix-operator test runs failed due to slow LTS network/environment

    Description

      Description of problem:

      The ci/prow/e2e-nutanix-operator test runs failed with both 4.11 and 4.12. The runs appeared to fail at different test cases at random. Running the test suites manually against an OCP cluster deployed in the LTS environment suggested that the failures may be caused by the slow LTS network, specifically its DNS server.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      The ci/prow/e2e-nutanix-operator test runs consistently fail with 4.11 and 4.12.

      Steps to Reproduce:

      Trigger the ci/prow/e2e-nutanix-operator test run with 4.11 or 4.12. Or manually run the actuator-pkg test suites with 

      Actual results:

      The ci/prow/e2e-nutanix-operator test runs fail at different test cases at random.

      Expected results:

      The ci/prow/e2e-nutanix-operator test runs pass successfully.

      Additional info:

      Slack thread https://coreos.slack.com/archives/C0211848DBN/p1659363922100509
      
      When running the actuator-pkg tests manually against an OCP cluster deployed to the LTS-dev environment, I got the following test failure:
      ------------------------------
      [Feature:Machines] Managed cluster should
        recover from deleted worker machines
        /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/infra/infra.go:224
      I0729 18:10:15.329330   17649 request.go:601] Waited for 1.050859554s due to client-side throttling, not priority and fairness, request: GET:https://api.nutanix-dev.devcluster.openshift.com:6443/apis/monitoring.coreos.com/v1?timeout=32s
      STEP: Creating a new MachineSet
      E0729 18:10:49.218277   17649 machinesets.go:319] found 1 Machines in failed phase:
      E0729 18:10:49.218296   17649 machinesets.go:329] Failed machine: nutanix-dev-fxq6fkhm65-xkwzb, Reason: InvalidConfiguration, Message: nutanix-dev-fxq6fkhm65-xkwzb: failed in validating machine providerSpec: spec.providerSpec.value.cluster.uuid: Invalid value: "0005d9a4-8e4f-7c33-58d1-e9d0e2d48853": Failed to find cluster with uuid 0005d9a4-8e4f-7c33-58d1-e9d0e2d48853. error: Get "https://prismcentral.lts-cluster.nutanix-dev.devcluster.openshift.com:9440/api/nutanix/v3/clusters/0005d9a4-8e4f-7c33-58d1-e9d0e2d48853": dial tcp: lookup prismcentral.lts-cluster.nutanix-dev.devcluster.openshift.com on 172.30.0.10:53: read udp 10.128.0.49:40679->172.30.0.10:53: i/o timeout
      STEP: Deleting the new MachineSet
      • Failure in Spec Setup (BeforeEach) [51.122 seconds]
      [Feature:Machines] Managed cluster should
      /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/infra/infra.go:141
        recover from deleted worker machines [BeforeEach]
        /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/infra/infra.go:224
        Expected
          <int>: 1
        to equal
          <int>: 0
        /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/framework/machinesets.go:332
      ------------------------------
      
      The failure appears to have been caused by a DNS name lookup timeout (against the cluster DNS service at 172.30.0.10:53) when making the Prism Central API call.

          People

            jimccann@redhat.com Jim McCann
            yanhli@redhat.com Yanhua Li