Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: 4.17.0
Affects Version/s: 4.16.0
Component/s: Machine Config Operator
Labels: mco-triaged, pre-merge-tested
Severity: Low
Regression: No
Sprint: MCO Sprint 255, MCO Sprint 256, MCO Sprint 257, MCO Sprint 258
sprint_count: 4
Blocked: False
Blocked Reason: None
Target Version: 4.18.0
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When running on a FIPS-enabled cluster, the e2e test TestFirstBootHasSSHKeys times out.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

1. Open a PR to the MCO repository.
2. Run the e2e-aws-ovn-fips-op job by commenting /test e2e-aws-ovn-fips-op (this job does not run automatically).
3. Eventually, the test will fail.

Actual results:

=== RUN   TestFirstBootHasSSHKeys
    mcd_test.go:1019: did not get new node
--- FAIL: TestFirstBootHasSSHKeys (1201.83s)

Expected results:

=== RUN   TestFirstBootHasSSHKeys
    mcd_test.go:929: Got ssh key file data: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        
--- PASS: TestFirstBootHasSSHKeys (334.86s)

Additional info:

The failing run took 1201.83s, which suggests the test is hitting a 20-minute (1200s) timeout. By comparison, the passing run completed in 334.86s (approximately 5.5 minutes).

I have two preliminary hypotheses for this:
1. This operation takes longer in FIPS-enabled clusters for some reason.
2. The difference may come from the cloud platform on which these tests run. Our normal e2e-gcp-op tests run in GCP, whereas this test suite runs in AWS. The underlying operations performed by the Machine API may simply take longer in AWS than in GCP. If that is the case, this bug can be resolved as-is.

Failing job link: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/4172/pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-fips-op/1757476347388628992

Must-Gather link: https://drive.google.com/file/d/12GhTIP9bgcoNje0Jvyhr-c-akV3XnGn2/view?usp=sharing

links to

openshift/machine-config-operator#4415: OCPBUGS-29444: Rewrote TestFirstBootHasSSHKeys e2e test in pure Go

RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update

Assignee: Zack Zlotnik
Reporter: Zack Zlotnik
QA Contact: Sergio Regidor de la Rosa
Votes: 0
Watchers: 5
Created: 2024/02/13 10:05 PM
Updated: 2024/10/14 9:56 AM
