- Bug
- Resolution: Unresolved
- Normal
- None
- 4.16, 4.17
- Moderate
- None
- 1
- MCO Sprint 258
- 1
- Approved
- False
- Release Note Not Required
- In Progress
Description of problem:
Occasional machine-config daemon panics in tech-preview. For example, this run has one:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-version-operator/1076/pull-ci-openshift-cluster-version-operator-master-e2e-aws-ovn-techpreview/1819082707058036736
And the referenced logs include a full stack trace, the crux of which appears to be:
E0801 19:23:55.012345 2908 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 127 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2424b80, 0x4166150})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0004d5340?})
	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2424b80?, 0x4166150?})
	/usr/lib/golang/src/runtime/panic.go:770 +0x132
github.com/openshift/machine-config-operator/pkg/helpers.ListPools(0xc0007c5208, {0x0, 0x0})
	/go/src/github.com/openshift/machine-config-operator/pkg/helpers/helpers.go:142 +0x17d
github.com/openshift/machine-config-operator/pkg/helpers.GetPoolsForNode({0x0, 0x0}, 0xc0007c5208)
	/go/src/github.com/openshift/machine-config-operator/pkg/helpers/helpers.go:66 +0x65
github.com/openshift/machine-config-operator/pkg/daemon.(*PinnedImageSetManager).handleNodeEvent(0xc000a98480, {0x27e9e60?, 0xc0007c5208})
	/go/src/github.com/openshift/machine-config-operator/pkg/daemon/pinned_image_set.go:955 +0x92
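In the trace, the `{0x0, 0x0}` argument passed through GetPoolsForNode into ListPools looks like a nil interface value (both the type word and the data word are zero), which would be consistent with a pool lister that was never set on the PinnedImageSetManager. That reading is an inference from the trace, not a confirmed root cause. The following is a minimal sketch of that failure mode and of a nil guard; `Pool`, `PoolLister`, `listPoolsUnsafe` and `listPoolsSafe` are hypothetical stand-ins, not the machine-config-operator API.

```go
package main

import (
	"errors"
	"fmt"
)

// Pool is a hypothetical stand-in for a MachineConfigPool.
type Pool struct{ Name string }

// PoolLister is a hypothetical stand-in for the lister interface
// that the helper receives.
type PoolLister interface {
	List() ([]*Pool, error)
}

var errNilLister = errors.New("pool lister is nil")

// listPoolsUnsafe calls a method on the lister without checking it.
// With a nil interface this panics with a nil pointer dereference.
func listPoolsUnsafe(l PoolLister) ([]*Pool, error) {
	return l.List()
}

// listPoolsSafe returns an error instead of panicking when the lister is nil.
func listPoolsSafe(l PoolLister) ([]*Pool, error) {
	if l == nil {
		return nil, errNilLister
	}
	return l.List()
}

// didPanic reports whether f panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var lister PoolLister // nil interface, shown as {0x0, 0x0} in a stack trace

	fmt.Println("unguarded call panicked:", didPanic(func() {
		_, _ = listPoolsUnsafe(lister)
	}))

	_, err := listPoolsSafe(lister)
	fmt.Println("guarded call error:", err)
}
```

A guard like this only turns the panic into an error; if the lister is genuinely missing, the more complete fix would be to make sure it is wired up before node events are handled.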
Version-Release number of selected component (if applicable):
$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?name=^periodic&type=junit&search=machine-config-daemon.*Observed+a+panic' | grep 'failures match'
periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-techpreview (all) - 37 runs, 62% failed, 13% of failures match = 8% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-azure-ovn-techpreview-serial (all) - 6 runs, 83% failed, 20% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-techpreview (all) - 5 runs, 60% failed, 33% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.17-ocp-e2e-aws-ovn-arm64-techpreview-serial (all) - 10 runs, 40% failed, 25% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-techpreview-serial (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.17-e2e-vsphere-ovn-techpreview-serial (all) - 7 runs, 100% failed, 14% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.18-e2e-vsphere-ovn-techpreview-serial (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
periodic-ci-openshift-multiarch-master-nightly-4.17-ocp-e2e-aws-ovn-arm64-techpreview (all) - 10 runs, 40% failed, 25% of failures match = 10% impact
periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-techpreview (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-aws-ovn-techpreview-serial (all) - 6 runs, 17% failed, 200% of failures match = 33% impact
periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-single-node-techpreview-serial (all) - 7 runs, 100% failed, 14% of failures match = 14% impact
periodic-ci-openshift-release-master-nightly-4.17-e2e-aws-ovn-single-node-techpreview (all) - 7 runs, 57% failed, 50% of failures match = 29% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-aws-ovn-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-gcp-ovn-techpreview (all) - 18 runs, 17% failed, 33% of failures match = 6% impact
periodic-ci-openshift-release-master-ci-4.16-e2e-gcp-ovn-techpreview (all) - 6 runs, 17% failed, 100% of failures match = 17% impact
periodic-ci-openshift-multiarch-master-nightly-4.16-ocp-e2e-aws-ovn-arm64-techpreview-serial (all) - 11 runs, 18% failed, 50% of failures match = 9% impact
periodic-ci-openshift-release-master-ci-4.17-e2e-azure-ovn-techpreview-serial (all) - 7 runs, 57% failed, 25% of failures match = 14% impact
How reproducible:
Looks like roughly 15% impact across the CI jobs that CI Search turns up.
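For reference, the per-job impact figure in the CI Search output appears to be the failure rate multiplied by the share of failures that match (for example 62% x 13% is about 8%), and the ~15% estimate is in line with an unweighted average of the 18 per-job figures. Both readings are assumptions about how the numbers were derived; CI Search prints rounded percentages, so products of the printed values will not always reproduce the printed impact exactly. A minimal sketch in Go, with hypothetical helper names `impact` and `mean`:

```go
package main

import "fmt"

// impact estimates the share of all runs hit by the panic, in percent,
// from the job failure rate and the share of failures that match,
// both given in percent.
func impact(failedPct, matchPct float64) float64 {
	return failedPct * matchPct / 100
}

// mean returns the unweighted average of xs, or 0 for an empty slice.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// First job in the listing: 62% failed, 13% of failures match.
	fmt.Printf("estimated impact: %.2f%%\n", impact(62, 13))

	// Per-job impact percentages copied from the CI Search output, in order.
	impacts := []float64{8, 17, 20, 10, 14, 14, 20, 10, 20, 33, 17, 14, 29, 17, 6, 17, 9, 14}
	fmt.Printf("mean impact over %d jobs: %.1f%%\n", len(impacts), mean(impacts))
}
```

An unweighted mean treats a 5-run job the same as a 37-run job; weighting by run count would give a somewhat different figure.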
Steps to Reproduce:
Run lots of CI. Look for MCD panics.
Actual results:
CI Search results above.
Expected results:
No hits.
- blocks: OCPBUGS-38846 Machine-config daemon ListPools panic during tech-preview CI runs (Closed)
- is cloned by: OCPBUGS-38846 Machine-config daemon ListPools panic during tech-preview CI runs (Closed)
- links to: RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update