  1. OpenShift Bugs
  2. OCPBUGS-26605

e2e-gcp-op-layering CI job continuously failing


    • No
    • MCO Sprint 247, MCO Sprint 248
    • 2
    • Proposed
    • False
    • N/A
    • Release Note Not Required

      Description of problem:

      The e2e-gcp-op-layering CI job is consistently failing during teardown. Specifically, the TestOnClusterBuildRollsOutImage test fails whenever it attempts to tear down the node. For an example of a failing job, see: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering/1744805949165539328

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always

      Steps to Reproduce:

      Open a PR against the MCO repository on GitHub (openshift/machine-config-operator) and let the e2e-gcp-op-layering job run.

      Actual results:

      The teardown portion of the TestOnClusterBuildRollsOutImage test fails as follows:
      
        utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
          utils.go:1098: 
                  Error Trace:    /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                              /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                              /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                              /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                              /usr/lib/golang/src/testing/testing.go:1150
                                              /usr/lib/golang/src/testing/testing.go:1328
                                              /usr/lib/golang/src/testing/testing.go:1570
                  Error:          Received unexpected error:
                                  exit status 1
                  Test:           TestOnClusterBuildRollsOutImage
          utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
          utils.go:1098: 
                  Error Trace:    /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                              /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                              /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                              /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                              /usr/lib/golang/src/testing/testing.go:1150
                                              /usr/lib/golang/src/testing/testing.go:1328
                                              /usr/lib/golang/src/testing/testing.go:1312
                                              /usr/lib/golang/src/runtime/panic.go:522
                                              /usr/lib/golang/src/testing/testing.go:980
                                              /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                                              /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                                              /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                                              /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                                              /usr/lib/golang/src/testing/testing.go:1150
                                              /usr/lib/golang/src/testing/testing.go:1328
                                              /usr/lib/golang/src/testing/testing.go:1570
                  Error:          Received unexpected error:
                                  exit status 1
                  Test:           TestOnClusterBuildRollsOutImage

      Expected results:

      The teardown should complete without error and the test should pass.

      Additional info:

      The test teardown currently shells out to the oc command to delete the underlying Machine and Node. We delete them so that the cloud provider provisions a replacement, which works around issues with opting out of on-cluster builds that have yet to be resolved.
      
      When this test was written, it shelled out to oc to avoid having to vendor the Machine client and API into the MCO codebase. Both have since been vendored. I suspect that oc itself is failing in some way, since the invocation returns only exit status 1. Now that the Machine client and API are available in the MCO codebase, it makes more sense to use them directly instead of shelling out to oc, since we would get more detailed error messages.

            team-mco Team MCO
            zzlotnik@redhat.com Zack Zlotnik
            Sergio Regidor de la Rosa