-
Bug
-
Resolution: Done-Errata
-
Major
-
4.15
-
No
-
MCO Sprint 247, MCO Sprint 248
-
2
-
Proposed
-
False
-
-
N/A
-
Release Note Not Required
Description of problem:
The e2e-gcp-op-layering CI job is consistently failing during the teardown process. Specifically, the TestOnClusterBuildRollsOutImage test fails whenever it attempts to tear down the node. See: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/4060/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op-layering/1744805949165539328 for an example of a failing job.
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Open a PR to the GitHub MCO repository.
Actual results:
The teardown portion of the TestOnClusterBuildRollsOutImage test fails as follows:

    utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
    utils.go:1098:
        Error Trace: /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                     /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                     /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                     /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                     /usr/lib/golang/src/testing/testing.go:1150
                     /usr/lib/golang/src/testing/testing.go:1328
                     /usr/lib/golang/src/testing/testing.go:1570
        Error:       Received unexpected error:
                     exit status 1
        Test:        TestOnClusterBuildRollsOutImage
    utils.go:1097: Deleting machine ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f / node ci-op-v5qcditr-46b3f-bh29c-worker-c-fcl9f
    utils.go:1098:
        Error Trace: /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                     /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                     /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                     /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                     /usr/lib/golang/src/testing/testing.go:1150
                     /usr/lib/golang/src/testing/testing.go:1328
                     /usr/lib/golang/src/testing/testing.go:1312
                     /usr/lib/golang/src/runtime/panic.go:522
                     /usr/lib/golang/src/testing/testing.go:980
                     /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:1098
                     /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/onclusterbuild_test.go:103
                     /go/src/github.com/openshift/machine-config-operator/test/e2e-layering/helpers_test.go:149
                     /go/src/github.com/openshift/machine-config-operator/test/helpers/utils.go:79
                     /usr/lib/golang/src/testing/testing.go:1150
                     /usr/lib/golang/src/testing/testing.go:1328
                     /usr/lib/golang/src/testing/testing.go:1570
        Error:       Received unexpected error:
                     exit status 1
        Test:        TestOnClusterBuildRollsOutImage
Expected results:
This part of the test should pass.
Additional info:
The test teardown currently shells out to the oc command to delete the underlying Machine and Node. We delete them so that the cloud provider provisions a replacement node, because issues with opting out of on-cluster builds have yet to be resolved. When this test was written, shelling out avoided having to vendor the Machine client and API into the MCO codebase; both have since been vendored.
I suspect that oc itself is failing in some way, since the call site returns only exit status 1. Now that the Machine client and API are vendored into the MCO codebase, it makes more sense to use them directly instead of shelling out to oc, since we would get more verbose error messages.
- links to
-
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update