Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.11
Component/s: Cloud Compute / Nutanix Provider
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None

Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:

4.12.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

The ci/prow/e2e-nutanix-operator test runs fail on both 4.11 and 4.12. The failing test case differs from run to run, seemingly at random. Running the test suites manually against an OCP cluster deployed in the LTS environment suggests that the failures may be caused by the slow LTS network (DNS server).

Version-Release number of selected component (if applicable):

How reproducible:

Always. The ci/prow/e2e-nutanix-operator test runs fail consistently on 4.11 and 4.12.

Steps to Reproduce:

Trigger the ci/prow/e2e-nutanix-operator test run with 4.11 or 4.12. Or manually run the actuator-pkg test suites with

Actual results:

The ci/prow/e2e-nutanix-operator test runs fail, and the failing test case differs from run to run.

Expected results:

The ci/prow/e2e-nutanix-operator test runs pass successfully.

Additional info:

Slack thread https://coreos.slack.com/archives/C0211848DBN/p1659363922100509

When running the actuator-pkg tests manually against an OCP cluster deployed to the LTS-dev environment, I got the following test failure:
------------------------------
[Feature:Machines] Managed cluster should
  recover from deleted worker machines
  /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/infra/infra.go:224
I0729 18:10:15.329330   17649 request.go:601] Waited for 1.050859554s due to client-side throttling, not priority and fairness, request: GET:https://api.nutanix-dev.devcluster.openshift.com:6443/apis/monitoring.coreos.com/v1?timeout=32s
STEP: Creating a new MachineSet
E0729 18:10:49.218277   17649 machinesets.go:319] found 1 Machines in failed phase:
E0729 18:10:49.218296   17649 machinesets.go:329] Failed machine: nutanix-dev-fxq6fkhm65-xkwzb, Reason: InvalidConfiguration, Message: nutanix-dev-fxq6fkhm65-xkwzb: failed in validating machine providerSpec: spec.providerSpec.value.cluster.uuid: Invalid value: "0005d9a4-8e4f-7c33-58d1-e9d0e2d48853": Failed to find cluster with uuid 0005d9a4-8e4f-7c33-58d1-e9d0e2d48853. error: Get "https://prismcentral.lts-cluster.nutanix-dev.devcluster.openshift.com:9440/api/nutanix/v3/clusters/0005d9a4-8e4f-7c33-58d1-e9d0e2d48853": dial tcp: lookup prismcentral.lts-cluster.nutanix-dev.devcluster.openshift.com on 172.30.0.10:53: read udp 10.128.0.49:40679->172.30.0.10:53: i/o timeout
STEP: Deleting the new MachineSet
• Failure in Spec Setup (BeforeEach) [51.122 seconds]
[Feature:Machines] Managed cluster should
/Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/infra/infra.go:141
  recover from deleted worker machines [BeforeEach]
  /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/infra/infra.go:224
  Expected
    <int>: 1
  to equal
    <int>: 0
  /Users/yanhuali/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/framework/machinesets.go:332
------------------------------

The failure appears to be caused by a DNS name lookup timeout when making the Prism Central API call.

Assignee: Jim McCann

Reporter: Yanhua Li (Inactive)

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 2

Created: 2022/09/06 5:22 PM

Updated: 2025/07/29 11:34 AM

Resolved: 2022/09/27 2:16 PM
