OCPBUGS-29444 (OpenShift Bugs)

TestFirstBootHasSSHKeys fails on FIPS-enabled clusters


    • Low
    • No
    • MCO Sprint 255, MCO Sprint 256, MCO Sprint 257, MCO Sprint 258
    • 4
    • False

      Description of problem:

      When running on a FIPS-enabled cluster, the e2e test TestFirstBootHasSSHKeys times out.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always

      Steps to Reproduce:

      1. Open a PR to the MCO repository.
      2. Run the e2e-aws-ovn-fips-op job by commenting /test e2e-aws-ovn-fips-op (this job does not run automatically).
      3. Eventually, the test will fail.

      Actual results:

      === RUN   TestFirstBootHasSSHKeys
          mcd_test.go:1019: did not get new node
      --- FAIL: TestFirstBootHasSSHKeys (1201.83s)

      Expected results:

      === RUN   TestFirstBootHasSSHKeys
          mcd_test.go:929: Got ssh key file data: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
              
      --- PASS: TestFirstBootHasSSHKeys (334.86s)

      Additional info:

      It looks like we're hitting a 20-minute timeout during the test. By comparison, the passing case seems to execute in approximately 5.5 minutes.
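
      To make the comparison between the two runs concrete, the elapsed times can be read straight from the `go test` result lines quoted above. The following is a minimal sketch, not part of the MCO test suite; the helper name parseResult and the regular expression are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// resultLine matches `go test -v` result lines such as
// "--- FAIL: TestFirstBootHasSSHKeys (1201.83s)".
var resultLine = regexp.MustCompile(`^\s*--- (PASS|FAIL): (\S+) \((\d+(?:\.\d+)?s)\)\s*$`)

// parseResult (hypothetical helper) extracts the status, test name, and
// elapsed time from a single result line.
func parseResult(line string) (status, name string, elapsed time.Duration, err error) {
	m := resultLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", 0, fmt.Errorf("not a go test result line: %q", line)
	}
	elapsed, err = time.ParseDuration(m[3])
	if err != nil {
		return "", "", 0, err
	}
	return m[1], m[2], elapsed, nil
}

func main() {
	// The two result lines quoted in this report.
	lines := []string{
		"--- FAIL: TestFirstBootHasSSHKeys (1201.83s)",
		"--- PASS: TestFirstBootHasSSHKeys (334.86s)",
	}
	for _, line := range lines {
		status, name, elapsed, err := parseResult(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Report the elapsed time in minutes for easier comparison.
		fmt.Printf("%s %s: %.1f minutes\n", status, name, elapsed.Minutes())
	}
}
```

      The failing run lands just past the 20-minute mark, which is what suggests a fixed timeout rather than an early error.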
      
      I have two preliminary hypotheses for this:
      1. This operation takes longer in FIPS-enabled clusters for some reason.
      2. The difference may come from the cloud the tests run on. Our normal e2e-gcp-op tests run in GCP, whereas this test suite runs in AWS. The underlying operations performed by the Machine API may simply take longer in AWS than they do in GCP. If that is the case, this bug can be resolved as-is.
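
      Both hypotheses come down to the same mechanism: a wait with a fixed time budget that the operation sometimes exceeds. The sketch below illustrates that failure mode using only the Go standard library. It is not the code in mcd_test.go; pollUntil, demo, and the millisecond-scale durations are hypothetical stand-ins chosen so the example runs quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntil (hypothetical helper) calls check immediately and then once per
// interval until check reports true, check returns an error, or timeout elapses.
func pollUntil(ctx context.Context, interval, timeout time.Duration, check func() (bool, error)) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		done, err := check()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("condition not met within %s: %w", timeout, ctx.Err())
		case <-ticker.C:
		}
	}
}

// demo runs the same simulated operation under a too-small and a sufficient
// budget and returns both results.
func demo() (shortErr, longErr error) {
	var start time.Time
	// Stand-in for "a new node has appeared": true once 50ms have passed.
	nodeReady := func() (bool, error) {
		return time.Since(start) > 50*time.Millisecond, nil
	}

	start = time.Now()
	shortErr = pollUntil(context.Background(), 5*time.Millisecond, 10*time.Millisecond, nodeReady)

	start = time.Now()
	longErr = pollUntil(context.Background(), 5*time.Millisecond, 2*time.Second, nodeReady)
	return shortErr, longErr
}

func main() {
	shortErr, longErr := demo()
	fmt.Println("small budget:", shortErr)
	fmt.Println("timed out:", errors.Is(shortErr, context.DeadlineExceeded))
	fmt.Println("large budget:", longErr)
}
```

      With the small budget the wait ends with a deadline error even though the operation would have succeeded shortly afterwards. A failure of this kind does not by itself show which hypothesis applies.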

       

      Failing job link: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_machine-config-operator/4172/pull-ci-openshift-machine-config-operator-master-e2e-aws-ovn-fips-op/1757476347388628992

       

      Must-Gather link: https://drive.google.com/file/d/12GhTIP9bgcoNje0Jvyhr-c-akV3XnGn2/view?usp=sharing

            zzlotnik@redhat.com Zack Zlotnik
            zzlotnik@redhat.com Zack Zlotnik
            Sergio Regidor de la Rosa