OpenShift Bugs / OCPBUGS-15500

openshift-tests panics when retrieving etcd logs

    • Bug
    • Resolution: Unresolved
    • Undefined
    • 4.14.0
    • 4.14
    • Test Framework
    • None
    • No
    • False

      Description of problem:

      Since we migrated some of our jobs to OCP 4.14, they have become flaky: the "openshift-tests" binary intermittently panics while retrieving the etcd logs: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2212/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted/1673615526967906304#1:build-log.txt%3A161-191
      
      Here's the impact on our jobs:
      https://search.ci.openshift.org/?search=error+reading+pod+logs&maxAge=48h&context=1&type=build-log&name=.*assisted.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
      
      

      Version-Release number of selected component (if applicable):

       N/A
      
      

      How reproducible:

      Intermittent on OCP 4.14
      

      Steps to Reproduce:

      1. Provision an OCP 4.14 cluster
      2. Run the conformance tests on it with "openshift-tests"
      
      

      Actual results:

      
      The "openshift-tests" binary intermittently panics right after logging that the etcd container logs are unavailable:
      
       [2023-06-27 10:12:07] time="2023-06-27T10:12:07Z" level=error msg="error reading pod logs" error="container \"etcd\" in pod \"etcd-test-infra-cluster-a1729bd4-master-2\" is not available" pod=etcd-test-infra-cluster-a1729bd4-master-2
      [2023-06-27 10:12:07] panic: runtime error: invalid memory address or nil pointer dereference
      [2023-06-27 10:12:07] [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x26eb9b5]
      [2023-06-27 10:12:07] 
      [2023-06-27 10:12:07] goroutine 1 [running]:
      [2023-06-27 10:12:07] bufio.(*Scanner).Scan(0xc005954250)
      [2023-06-27 10:12:07] 	bufio/scan.go:214 +0x855
      [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.IntervalsFromPodLogs({0x8d91460, 0xc004a43d40}, {0xc8b83c0?, 0xc006138000?, 0xc8b83c0?}, {0x8d91460?, 0xc004a43d40?, 0xc8b83c0?})
      [2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/monitor/intervalcreation/podlogs.go:130 +0x8cd
      [2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.InsertIntervalsFromCluster({0x8d441e0, 0xc000ffd900}, 0xc0008b4000?, {0xc005f88000?, 0x539, 0x0?}, 0x25e1e39?, {0xc11ecb5d446c4f2c, 0x4fb99e6af, 0xc8b83c0}, ...)
      [2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/monitor/intervalcreation/types.go:65 +0x274
      [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*MonitorEventsOptions).End(0xc001083050, {0x8d441e0, 0xc000ffd900}, 0x1?, {0x7fff15b2ccde, 0x16})
      [2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/test/ginkgo/options_monitor_events.go:170 +0x225
      [2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*Options).Run(0xc0013e2000, 0xc00012e380, {0x8126d1e, 0xf})
      [2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/test/ginkgo/cmd_runsuite.go:506 +0x2d9a
      [2023-06-27 10:12:07] main.newRunCommand.func1.1()
      [2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:330 +0x2d4
      [2023-06-27 10:12:07] main.mirrorToFile(0xc0013e2000, 0xc0014cdb30)
      [2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:476 +0x5f2
      [2023-06-27 10:12:07] main.newRunCommand.func1(0xc0013e0300?, {0xc000862ea0?, 0x6?, 0x6?})
      [2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:311 +0x5c
      [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).execute(0xc0013e0300, {0xc000862e40, 0x6, 0x6})
      [2023-06-27 10:12:07] 	github.com/spf13/cobra@v1.6.0/command.go:916 +0x862
      [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).ExecuteC(0xc0013e0000)
      [2023-06-27 10:12:07] 	github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd
      [2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).Execute(...)
      [2023-06-27 10:12:07] 	github.com/spf13/cobra@v1.6.0/command.go:968
      [2023-06-27 10:12:07] main.main.func1(0xc00011b300?)
      [2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:96 +0x8a
      [2023-06-27 10:12:07] main.main()
      [2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:97 +0x516 
      
      

      Expected results:

      No panics
      

      Additional info:

      The source of the panic has been pinpointed here: https://github.com/openshift/origin/pull/27772#discussion_r1243600596
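
One plausible reading of the trace above is that the "error reading pod logs" error is logged but the code then still scans a reader that is nil. The Go sketch below illustrates that failure mode and a guard against it. It is not the actual code from openshift/origin: `openLogs`, `readLogLines`, and `scanNilReaderPanics` are hypothetical names, and the log stream is simulated with an in-memory string.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// openLogs is a hypothetical stand-in for the call that streams a
// container's logs. When the container is unavailable it returns a
// nil reader together with an error.
func openLogs(available bool) (io.ReadCloser, error) {
	if !available {
		return nil, errors.New(`container "etcd" is not available`)
	}
	return io.NopCloser(strings.NewReader("line one\nline two\n")), nil
}

// scanNilReaderPanics shows the failure mode: calling Scan on a
// scanner built from a nil reader panics with a nil pointer
// dereference. The panic is recovered here only for demonstration.
func scanNilReaderPanics() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	scanner := bufio.NewScanner(nil)
	scanner.Scan()
	return false
}

// readLogLines returns as soon as the stream cannot be opened, so the
// scanner is only ever built from a non-nil reader.
func readLogLines(available bool) ([]string, error) {
	reader, err := openLogs(available)
	if err != nil {
		return nil, fmt.Errorf("error reading pod logs: %w", err)
	}
	defer reader.Close()

	var lines []string
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines, scanner.Err()
}

func main() {
	fmt.Println("scanning a nil reader panics:", scanNilReaderPanics())

	lines, err := readLogLines(true)
	fmt.Println(len(lines), err)

	_, err = readLogLines(false)
	fmt.Println(err)
}
```

Whether the caller should skip the pod and continue, or abort interval collection entirely, is a design choice for the maintainers; the sketch only shows that the error has to stop the scan.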
      

            dperique@redhat.com Dennis Periquet
            agentil@redhat.com Adrien Gentil