-
Bug
-
Resolution: Done
-
4.14
Description of problem:
Since we migrated some of our jobs to OCP 4.14, they have become flaky: the "openshift-tests" binary panics when it tries to retrieve the etcd logs:
https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2212/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted/1673615526967906304#1:build-log.txt%3A161-191
Here's the impact on our jobs:
https://search.ci.openshift.org/?search=error+reading+pod+logs&maxAge=48h&context=1&type=build-log&name=.*assisted.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Version-Release number of selected component (if applicable):
N/A
How reproducible:
Intermittent; happens from time to time against OCP 4.14
Steps to Reproduce:
1. Provision an OCP 4.14 cluster
2. Run the conformance tests on it with "openshift-tests"
Actual results:
The binary "openshift-tests" panics from time to time:
[2023-06-27 10:12:07] time="2023-06-27T10:12:07Z" level=error msg="error reading pod logs" error="container \"etcd\" in pod \"etcd-test-infra-cluster-a1729bd4-master-2\" is not available" pod=etcd-test-infra-cluster-a1729bd4-master-2
[2023-06-27 10:12:07] panic: runtime error: invalid memory address or nil pointer dereference
[2023-06-27 10:12:07] [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x26eb9b5]
[2023-06-27 10:12:07]
[2023-06-27 10:12:07] goroutine 1 [running]:
[2023-06-27 10:12:07] bufio.(*Scanner).Scan(0xc005954250)
[2023-06-27 10:12:07] bufio/scan.go:214 +0x855
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.IntervalsFromPodLogs({0x8d91460, 0xc004a43d40}, {0xc8b83c0?, 0xc006138000?, 0xc8b83c0?}, {0x8d91460?, 0xc004a43d40?, 0xc8b83c0?})
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation/podlogs.go:130 +0x8cd
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.InsertIntervalsFromCluster({0x8d441e0, 0xc000ffd900}, 0xc0008b4000?, {0xc005f88000?, 0x539, 0x0?}, 0x25e1e39?, {0xc11ecb5d446c4f2c, 0x4fb99e6af, 0xc8b83c0}, ...)
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation/types.go:65 +0x274
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*MonitorEventsOptions).End(0xc001083050, {0x8d441e0, 0xc000ffd900}, 0x1?, {0x7fff15b2ccde, 0x16})
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo/options_monitor_events.go:170 +0x225
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*Options).Run(0xc0013e2000, 0xc00012e380, {0x8126d1e, 0xf})
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo/cmd_runsuite.go:506 +0x2d9a
[2023-06-27 10:12:07] main.newRunCommand.func1.1()
[2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:330 +0x2d4
[2023-06-27 10:12:07] main.mirrorToFile(0xc0013e2000, 0xc0014cdb30)
[2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:476 +0x5f2
[2023-06-27 10:12:07] main.newRunCommand.func1(0xc0013e0300?, {0xc000862ea0?, 0x6?, 0x6?})
[2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:311 +0x5c
[2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).execute(0xc0013e0300, {0xc000862e40, 0x6, 0x6})
[2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:916 +0x862
[2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).ExecuteC(0xc0013e0000)
[2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd
[2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).Execute(...)
[2023-06-27 10:12:07] github.com/spf13/cobra@v1.6.0/command.go:968
[2023-06-27 10:12:07] main.main.func1(0xc00011b300?)
[2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:96 +0x8a
[2023-06-27 10:12:07] main.main()
[2023-06-27 10:12:07] github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:97 +0x516
Expected results:
No panics: "openshift-tests" should finish the run even when the logs of a container cannot be read.
Additional info:
The source of the panic has been pinpointed here: https://github.com/openshift/origin/pull/27772#discussion_r1243600596
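For illustration only: the trace is consistent with a bufio.Scanner being run over a log reader that was never opened, because the "error reading pod logs" error is logged just before the nil pointer dereference inside bufio.(*Scanner).Scan. The Go sketch below shows that failure pattern and one possible guard. It is not the code from podlogs.go; fetchLogs and scanLogLines are hypothetical names, and the real fix in openshift/origin may look different.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// fetchLogs is a hypothetical stand-in for the call that opens a pod's
// log stream. When the container is unavailable it returns a nil reader
// together with an error, which is the situation assumed from the trace.
func fetchLogs(available bool) (io.Reader, error) {
	if !available {
		return nil, errors.New(`container "etcd" is not available`)
	}
	return strings.NewReader("line one\nline two\n"), nil
}

// scanLogLines reads all log lines. It returns early when the stream could
// not be opened, instead of handing a nil reader to bufio.NewScanner, whose
// Scan method would dereference it and panic.
func scanLogLines(available bool) ([]string, error) {
	reader, err := fetchLogs(available)
	if err != nil {
		return nil, fmt.Errorf("error reading pod logs: %w", err)
	}
	if reader == nil {
		return nil, errors.New("error reading pod logs: nil reader")
	}

	var lines []string
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines, scanner.Err()
}

func main() {
	// Unavailable container: an error is reported, no panic occurs.
	if _, err := scanLogLines(false); err != nil {
		fmt.Println("skipping pod:", err)
	}

	// Available container: the lines are scanned normally.
	lines, err := scanLogLines(true)
	fmt.Println(len(lines), err)
}
```

With a guard of this kind, a pod whose logs are unavailable is skipped and reported, and the rest of the run continues.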