Details
- Bug
- Resolution: Done
- Major
- None
- 4.12
- Important
- No
- Rejected
- False
Description
Description of problem
These four tests are frequently failing:
- [sig-node] Kubelet when scheduling a busybox command in a pod should print the output to logs
- [sig-node] Container Runtime blackbox test on terminated container should report termination message from log output if TerminationMessagePolicy FallbackToLogsOnError is set
- [sig-node] Pods should support retrieving logs from the container over websockets
- [sig-node] Kubelet when scheduling a read only busybox container should not write to root filesystem
All four failures appear to share one cause: an unexpected warning about systemd device handling shows up in the pod logs, so the log output no longer matches the string the test expects:
{ fail [k8s.io/kubernetes@v1.25.0/test/e2e/common/node/kubelet.go:79]: Timed out after 60.003s. Expected <string>: time=\"2023-04-24T17:03:56Z\" level=warning msg=\"skipping device /dev/char/10:200 for systemd: stat /sys/dev/char/10:200: no such file or directory\"\nHello World\n to equal <string>: Hello World\n Ginkgo exit error 1: exit with code 1}
{ fail [k8s.io/kubernetes@v1.25.0/test/e2e/common/node/runtime.go:167]: Expected <string>: time=\"2023-04-24T17:07:35Z\" level=warning msg=\"skipping device /dev/char/10:200 for systemd: stat /sys/dev/char/10:200: no such file or directory\"\nDONE to equal <string>: DONE Ginkgo exit error 1: exit with code 1}
{ fail [github.com/onsi/ginkgo/v2@v2.1.5-0.20220909190140-b488ab12695a/internal/suite.go:612]: Apr 24 17:12:57.845: Unexpected websocket logs: time="2023-04-24T17:12:55Z" level=warning msg="skipping device /dev/char/10:200 for systemd: stat /sys/dev/char/10:200: no such file or directory" container is alive Ginkgo exit error 1: exit with code 1}
{ fail [k8s.io/kubernetes@v1.25.0/test/e2e/common/node/kubelet.go:214]: Timed out after 60.002s. Expected <string>: "time="..." to equal | <string>: "/bin/s..." Ginkgo exit error 1: exit with code 1}
The example failures above come from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/910/pull-ci-openshift-cluster-ingress-operator-release-4.12-e2e-aws-ovn-single-node/1650534308475047936.
All of the test failures appear to be on 4.12:
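As a minimal sketch of why these assertions fail, the snippet below reproduces the exact-match comparison from the first example using the log text quoted in that failure. The helper strip_systemd_warnings is hypothetical and exists only for illustration; it is not part of the e2e suite and is not proposed here as the fix.

```python
import re

# Pattern used for the search.ci queries in this report.
WARNING_RE = re.compile(r"skipping device /dev/char/\d+:\d+ for systemd")


def strip_systemd_warnings(logs: str) -> str:
    """Drop every log line that contains the device-skipping warning.

    Hypothetical helper, for illustration only.
    """
    kept = [
        line
        for line in logs.splitlines(keepends=True)
        if not WARNING_RE.search(line)
    ]
    return "".join(kept)


# Log output as quoted in the first failure above (kubelet.go:79).
actual = (
    'time="2023-04-24T17:03:56Z" level=warning '
    'msg="skipping device /dev/char/10:200 for systemd: '
    'stat /sys/dev/char/10:200: no such file or directory"\n'
    "Hello World\n"
)
expected = "Hello World\n"

print(actual == expected)                          # False: the warning line breaks the exact match
print(strip_systemd_warnings(actual) == expected)  # True: only the warning line differs
```

The comparison shows that the container's own output is intact; the only difference from the expected string is the extra warning line.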
- A search.ci search over all jobs for the past 2 days with the pattern "skipping device /dev/char/\d+:\d+ for systemd" returns only 4.12 jobs.
- A search.ci search over 4.12 jobs in the past 7 days sometimes times out and sometimes shows numerous failures going back several days.
- A search.ci search over all jobs for the past 7 days times out.
This issue appears to affect various platforms, including AWS, Azure, GCP, IBM Cloud, metal, oVirt, and vSphere.
Version-Release number of selected component (if applicable)
4.12.
How reproducible
Presently, a search.ci search for the pattern "skipping device /dev/char/\d+:\d+ for systemd" over 4.12 jobs in the past 7 days reports, "Found in 9.60% of runs (27.31% of failures) across 6519 total runs and 463 jobs (35.16% failed)".
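The percentages in that summary can be cross-checked with a little arithmetic. The sketch below assumes that "35.16% failed" is the share of the 6519 runs that failed and that every matching run is a failed run; both are assumptions about how search.ci computes the summary, not something stated in the output.

```python
# Figures quoted from the search.ci summary above.
total_runs = 6519
failed_share = 0.3516             # "35.16% failed" (assumed: share of runs)
match_share_of_failures = 0.2731  # "27.31% of failures"

# Derived counts under the assumptions stated above.
failed_runs = total_runs * failed_share
matching_failed_runs = failed_runs * match_share_of_failures
match_share_of_runs = matching_failed_runs / total_runs

print(round(failed_runs))            # 2292
print(round(matching_failed_runs))   # 626
print(f"{match_share_of_runs:.2%}")  # 9.60%
```

Under those assumptions the figures are consistent with each other: roughly 626 of about 2292 failed runs contain the warning, which is 9.60% of all runs.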
Steps to Reproduce
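1. Post a PR against a 4.12 release branch and let the e2e jobs run. The failure is intermittent, so it may take several runs to hit.
2. Alternatively, check search.ci using one of the searches described above.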
Actual results
CI fails with the four aforementioned test failures.
Expected results
CI passes, or fails on some other test failure.