OpenShift Bugs / OCPBUGS-25364

route-controller-manager pod panics creating cache

    • No
    • 1
    • NE Sprint 260
    • 1
    • Rejected
    • False

      Description of problem:
      Observed in a 4.12.19 ROSA cluster, where it was raised as a ClusterOperatorDown alert for openshift-controller-manager. The issue appears to have begun shortly after installation completed; the cluster may never have been healthy.

      Investigation found that all route-controller-manager pods were in a CrashLoopBackOff state. Each pod's log contained only the following:

      $ oc logs route-controller-manager-7c6d8d8b66-nqxhk -n openshift-route-controller-manager -p
      I1213 20:30:25.500136       1 controller_manager.go:26] Starting controllers on 0.0.0.0:8443 (4.12.0-202305101515.p0.g9e74d17.assembly.stream-9e74d17)
      unexpected fault address 0xcc001427540
      fatal error: fault
      [signal SIGSEGV: segmentation violation code=0x1 addr=0xcc001427540 pc=0x221fad4]
      
      goroutine 1 [running]:
      runtime.throw({0x2f287c5?, 0xc00091ed78?})
              runtime/panic.go:1047 +0x5d fp=0xc00091ed60 sp=0xc00091ed30 pc=0x109821d
      runtime.sigpanic()
              runtime/signal_unix.go:842 +0x2c5 fp=0xc00091edb0 sp=0xc00091ed60 pc=0x10af465
      k8s.io/apiserver/pkg/authentication/token/cache.newStripedCache(0x20, 0x3089620, 0xc00091ee78)
              k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cache_striped.go:37 +0xf4 fp=0xc00091ee30 sp=0xc00091edb0 pc=0x221fad4
      k8s.io/apiserver/pkg/authentication/token/cache.newWithClock({0x3298a60?, 0xc000226380}, 0x0, 0x45d964b800, 0x45d964b800, {0x32be670, 0x4756188})
              k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cached_token_authenticator.go:112 +0xd4 fp=0xc00091eea0 sp=0xc00091ee30 pc=0x221ff94
      k8s.io/apiserver/pkg/authentication/token/cache.New(...)
              k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cached_token_authenticator.go:91
      github.com/openshift/route-controller-manager/pkg/cmd/controller/route.newRemoteAuthenticator({0x32a3f28, 0xc00043e600}, 0xc0005c4ea0, 0x30872f0?)
              github.com/openshift/route-controller-manager/pkg/cmd/controller/route/apiserver_authenticator.go:33 +0x1b9 fp=0xc00091f0f8 sp=0xc00091eea0 pc=0x28e8f79
      github.com/openshift/route-controller-manager/pkg/cmd/controller/route.RunControllerServer({{{0x2f32887, 0xc}, {0x2f27140, 0x3}, {{0x2f62ce1, 0x25}, {0x2f62d06, 0x25}}, {0x2f7010f, 0x2b}, ...}, ...}, ...)
              github.com/openshift/route-controller-manager/pkg/cmd/controller/route/standalone_apiserver.go:45 +0x18d fp=0xc00091f5f8 sp=0xc00091f0f8 pc=0x28ebfcd
      github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.RunRouteControllerManager(0xc00052ed80, 0x4?, {0x32b8bd0, 0xc00043c1c0})
              github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/controller_manager.go:28 +0x24b fp=0xc00091f8a0 sp=0xc00091f5f8 pc=0x28f6a4b
      github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.(*RouteControllerManager).StartControllerManager(0xc0008dc940, {0x32b8bd0, 0xc00043c1c0})
              github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/cmd.go:119 +0x3f2 fp=0xc00091fa00 sp=0xc00091f8a0 pc=0x28f6752
      github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.NewRouteControllerManagerCommand.func1(0xc0008bd900?, {0x2f279cd?, 0x2?, 0x2?})
              github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/cmd.go:49 +0xc5 fp=0xc00091fb20 sp=0xc00091fa00 pc=0x28f5f65
      github.com/spf13/cobra.(*Command).execute(0xc0008bd900, {0xc0008dcce0, 0x2, 0x2})
              github.com/spf13/cobra@v1.4.0/command.go:860 +0x663 fp=0xc00091fbf8 sp=0xc00091fb20 pc=0x14a7483
      github.com/spf13/cobra.(*Command).ExecuteC(0xc0008bc000)
              github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd fp=0xc00091fcb0 sp=0xc00091fbf8 pc=0x14a7b9d
      github.com/spf13/cobra.(*Command).Execute(...)
              github.com/spf13/cobra@v1.4.0/command.go:902
      k8s.io/component-base/cli.run(0xc0008bc000)
              k8s.io/component-base@v0.25.2/cli/run.go:146 +0x317 fp=0xc00091fd70 sp=0xc00091fcb0 pc=0x24c8917
      k8s.io/component-base/cli.Run(0x32b8bd0?)
              k8s.io/component-base@v0.25.2/cli/run.go:46 +0x1d fp=0xc00091fdf0 sp=0xc00091fd70 pc=0x24c84fd
      main.main()
              github.com/openshift/route-controller-manager/cmd/route-controller-manager/main.go:28 +0x17f fp=0xc00091ff80 sp=0xc00091fdf0 pc=0x28f735f
      runtime.main()
              runtime/proc.go:250 +0x212 fp=0xc00091ffe0 sp=0xc00091ff80 pc=0x109ae52
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc00091ffe8 sp=0xc00091ffe0 pc=0x10cdb41
      
      goroutine 2 [force gc (idle)]:
      runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000084fb0 sp=0xc000084f90 pc=0x109b216
      runtime.goparkunlock(...)
              runtime/proc.go:369
      runtime.forcegchelper()
              runtime/proc.go:302 +0xad fp=0xc000084fe0 sp=0xc000084fb0 pc=0x109b0ad
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x10cdb41
      created by runtime.init.6
              runtime/proc.go:290 +0x25
      
      goroutine 3 [GC sweep wait]:
      runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000085790 sp=0xc000085770 pc=0x109b216
      runtime.goparkunlock(...)
              runtime/proc.go:369
      runtime.bgsweep(0x0?)
              runtime/mgcsweep.go:297 +0xd7 fp=0xc0000857c8 sp=0xc000085790 pc=0x1083df7
      runtime.gcenable.func1()
              runtime/mgc.go:178 +0x26 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x1078986
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x10cdb41
      created by runtime.gcenable
              runtime/mgc.go:178 +0x6b
      
      goroutine 4 [GC scavenge wait]:
      runtime.gopark(0xc0000b2000?, 0x328cad0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000085f70 sp=0xc000085f50 pc=0x109b216
      runtime.goparkunlock(...)
              runtime/proc.go:369
      runtime.(*scavengerState).park(0x47246a0)
              runtime/mgcscavenge.go:389 +0x53 fp=0xc000085fa0 sp=0xc000085f70 pc=0x1081dd3
      runtime.bgscavenge(0x0?)
              runtime/mgcscavenge.go:622 +0x65 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x10823e5
      runtime.gcenable.func2()
              runtime/mgc.go:179 +0x26 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x1078926
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x10cdb41
      created by runtime.gcenable
              runtime/mgc.go:179 +0xaa
      
      goroutine 5 [finalizer wait]:
      runtime.gopark(0x4725820?, 0xc000009860?, 0x0?, 0x0?, 0xc000084770?)
              runtime/proc.go:363 +0xd6 fp=0xc000084628 sp=0xc000084608 pc=0x109b216
      runtime.goparkunlock(...)
              runtime/proc.go:369
      runtime.runfinq()
              runtime/mfinal.go:180 +0x10f fp=0xc0000847e0 sp=0xc000084628 pc=0x1077a0f
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x10cdb41
      created by runtime.createfing
              runtime/mfinal.go:157 +0x45
      
      goroutine 7 [GC worker (idle)]:
      runtime.gopark(0x1061c3d?, 0xc0003af980?, 0xa0?, 0x67?, 0xc0000867a8?)
              runtime/proc.go:363 +0xd6 fp=0xc000086750 sp=0xc000086730 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc0000867e0 sp=0xc000086750 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 8 [GC worker (idle)]:
      runtime.gopark(0xd92381ca853?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000086f50 sp=0xc000086f30 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc000086fe0 sp=0xc000086f50 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 22 [GC worker (idle)]:
      runtime.gopark(0xd92381ca42e?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000080750 sp=0xc000080730 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc0000807e0 sp=0xc000080750 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 23 [GC worker (idle)]:
      runtime.gopark(0xd9238f02c33?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000080f50 sp=0xc000080f30 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc000080fe0 sp=0xc000080f50 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 50 [GC worker (idle)]:
      runtime.gopark(0xd9238f09c5d?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000586750 sp=0xc000586730 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc0005867e0 sp=0xc000586750 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0005867e8 sp=0xc0005867e0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 9 [GC worker (idle)]:
      runtime.gopark(0xd9238f03015?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000087750 sp=0xc000087730 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc0000877e0 sp=0xc000087750 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 10 [GC worker (idle)]:
      runtime.gopark(0xd9238f0d9c7?, 0xc00058a000?, 0x18?, 0x14?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000087f50 sp=0xc000087f30 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc000087fe0 sp=0xc000087f50 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 11 [GC worker (idle)]:
      runtime.gopark(0xd9238f0392d?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000582750 sp=0xc000582730 pc=0x109b216
      runtime.gcBgMarkWorker()
              runtime/mgc.go:1235 +0xf1 fp=0xc0005827e0 sp=0xc000582750 pc=0x107aad1
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0005827e8 sp=0xc0005827e0 pc=0x10cdb41
      created by runtime.gcBgMarkStartWorkers
              runtime/mgc.go:1159 +0x25
      
      goroutine 29 [select, locked to thread]:
      runtime.gopark(0xc0005897a8?, 0x2?, 0x0?, 0x0?, 0xc0005897a4?)
              runtime/proc.go:363 +0xd6 fp=0xc000589618 sp=0xc0005895f8 pc=0x109b216
      runtime.selectgo(0xc0005897a8, 0xc0005897a0, 0x0?, 0x0, 0x1?, 0x1)
              runtime/select.go:328 +0x7bc fp=0xc000589758 sp=0xc000589618 pc=0x10ab53c
      runtime.ensureSigM.func1()
              runtime/signal_unix.go:991 +0x1b0 fp=0xc0005897e0 sp=0xc000589758 pc=0x10af9b0
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0005897e8 sp=0xc0005897e0 pc=0x10cdb41
      created by runtime.ensureSigM
              runtime/signal_unix.go:974 +0xbd
      
      goroutine 30 [syscall]:
      runtime.notetsleepg(0x0?, 0x0?)
              runtime/lock_futex.go:236 +0x34 fp=0xc000588fa0 sp=0xc000588f68 pc=0x10687b4
      os/signal.signal_recv()
              runtime/sigqueue.go:152 +0x2f fp=0xc000588fc0 sp=0xc000588fa0 pc=0x10ca0ef
      os/signal.loop()
              os/signal/signal_unix.go:23 +0x19 fp=0xc000588fe0 sp=0xc000588fc0 pc=0x144cd59
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000588fe8 sp=0xc000588fe0 pc=0x10cdb41
      created by os/signal.Notify.func1.1
              os/signal/signal.go:151 +0x2a
      
      goroutine 31 [chan receive]:
      runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
              runtime/proc.go:363 +0xd6 fp=0xc000582f00 sp=0xc000582ee0 pc=0x109b216
      runtime.chanrecv(0xc0008e7680, 0x0, 0x1)
              runtime/chan.go:583 +0x49b fp=0xc000582f90 sp=0xc000582f00 pc=0x1062e9b
      runtime.chanrecv1(0x0?, 0x0?)
              runtime/chan.go:442 +0x18 fp=0xc000582fb8 sp=0xc000582f90 pc=0x1062998
      k8s.io/apiserver/pkg/server.SetupSignalContext.func1()
              k8s.io/apiserver@v0.25.2/pkg/server/signal.go:48 +0x2b fp=0xc000582fe0 sp=0xc000582fb8 pc=0x24c82eb
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc000582fe8 sp=0xc000582fe0 pc=0x10cdb41
      created by k8s.io/apiserver/pkg/server.SetupSignalContext
              k8s.io/apiserver@v0.25.2/pkg/server/signal.go:47 +0xe5
      
      goroutine 32 [select]:
      runtime.gopark(0xc0005837a0?, 0x2?, 0x0?, 0x0?, 0xc000583764?)
              runtime/proc.go:363 +0xd6 fp=0xc0005835e0 sp=0xc0005835c0 pc=0x109b216
      runtime.selectgo(0xc0005837a0, 0xc000583760, 0x0?, 0x0, 0x0?, 0x1)
              runtime/select.go:328 +0x7bc fp=0xc000583720 sp=0xc0005835e0 pc=0x10ab53c
      k8s.io/klog/v2.(*flushDaemon).run.func1()
              k8s.io/klog/v2@v2.80.1/klog.go:1135 +0x11e fp=0xc0005837e0 sp=0xc000583720 pc=0x118f27e
      runtime.goexit()
              runtime/asm_amd64.s:1594 +0x1 fp=0xc0005837e8 sp=0xc0005837e0 pc=0x10cdb41
      created by k8s.io/klog/v2.(*flushDaemon).run
              k8s.io/klog/v2@v2.80.1/klog.go:1131 +0x17b
          

      Attempts to reproduce this in other workloads, by deploying images that also invoke cache.New, have failed; so far, no other pod on the cluster is known to be crashlooping.

      SRE attempted to restart the nodes running the affected pods. After the reboot, the route-controller-manager pod was able to run for a short time, but eventually re-entered a CrashLoopBackOff state. Logs did not change after rebooting.
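
      Because the logs did not change across reboots, one way to confirm that every pod is hitting the same fault is to reduce each log to a small signature (signal, fault address, program counter, first non-runtime frame) and compare those. The following is a minimal sketch in Go, assuming the log follows the Go runtime fault format quoted above. CrashSignature, ParseCrash and funcName are hypothetical names used for illustration; they are not part of route-controller-manager or of any tooling used in this report.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// CrashSignature holds the parts of a Go runtime fault report that are
// useful when comparing one crash against another. (Hypothetical type.)
type CrashSignature struct {
	Signal   string // signal name, e.g. "SIGSEGV"
	Addr     string // faulting address as printed by the runtime
	PC       string // program counter at the time of the fault
	TopFrame string // first frame of the faulting goroutine outside package runtime
}

// signalLine matches the bracketed "[signal ...]" line of a fault report.
var signalLine = regexp.MustCompile(`\[signal (\w+): .* addr=(0x[0-9a-f]+) pc=(0x[0-9a-f]+)\]`)

// funcName strips the trailing argument list from a frame line such as
// "pkg.(*T).Method(0x1, {0x2, 0x3})" by locating the "(" that matches
// the final ")".
func funcName(frame string) string {
	depth := 0
	for i := len(frame) - 1; i >= 0; i-- {
		switch frame[i] {
		case ')':
			depth++
		case '(':
			depth--
			if depth == 0 {
				return frame[:i]
			}
		}
	}
	return frame
}

// ParseCrash scans a log for the first "[signal ...]" line and then for the
// first non-runtime frame in the goroutine that follows it. The boolean is
// false when no signal line is present.
func ParseCrash(log string) (CrashSignature, bool) {
	var sig CrashSignature
	seenSignal := false
	goroutines := 0
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !seenSignal {
			if m := signalLine.FindStringSubmatch(line); m != nil {
				sig.Signal, sig.Addr, sig.PC = m[1], m[2], m[3]
				seenSignal = true
			}
			continue
		}
		if strings.HasPrefix(line, "goroutine ") {
			goroutines++
			if goroutines > 1 {
				break // only the first (faulting) goroutine is of interest
			}
			continue
		}
		// Function lines end with ")"; file:line lines do not.
		if goroutines == 1 && strings.HasSuffix(line, ")") && !strings.HasPrefix(line, "runtime.") {
			sig.TopFrame = funcName(line)
			break
		}
	}
	return sig, seenSignal
}

func main() {
	// Excerpt of the log quoted in this report.
	const excerpt = `unexpected fault address 0xcc001427540
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0xcc001427540 pc=0x221fad4]

goroutine 1 [running]:
runtime.throw({0x2f287c5?, 0xc00091ed78?})
        runtime/panic.go:1047 +0x5d fp=0xc00091ed60 sp=0xc00091ed30 pc=0x109821d
runtime.sigpanic()
        runtime/signal_unix.go:842 +0x2c5 fp=0xc00091edb0 sp=0xc00091ed60 pc=0x10af465
k8s.io/apiserver/pkg/authentication/token/cache.newStripedCache(0x20, 0x3089620, 0xc00091ee78)
        k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cache_striped.go:37 +0xf4 fp=0xc00091ee30 sp=0xc00091edb0 pc=0x221fad4
`
	sig, ok := ParseCrash(excerpt)
	if !ok {
		fmt.Println("no fault report found")
		return
	}
	fmt.Printf("%+v\n", sig)
}
```

      Feeding the output of oc logs -p for each pod through ParseCrash and comparing the resulting structs would show whether the fault address and program counter are identical across pods and restarts, or whether only the top frame matches.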

      Version-Release number of selected component (if applicable):

      4.12.19
      

      How reproducible:

      Unsure. Observed on only a single cluster, but consistently reproducible on that cluster.
      

      Steps to Reproduce:

          1. Install cluster
          2. Observe route-controller-manager pods crashlooping
      

      Actual results:

      $ oc get po -n openshift-route-controller-manager
      NAME                                            READY   STATUS             RESTARTS         AGE
      route-controller-manager-7c6d8d8b66-nqxhk       0/1     CrashLoopBackOff   26 (4m13s ago)   113m
      route-controller-manager-7c6d8d8b66-qspnx       0/1     CrashLoopBackOff   26 (4m18s ago)   113m
      route-controller-manager-7c6d8d8b66-twtm8       0/1     CrashLoopBackOff   26 (4m7s ago)    113m
      

      Expected results:

      All route-controller-manager pods are running 

              fkrepins@redhat.com Filip Krepinsky
              tnierman.openshift Trevor Nierman
              Unassigned Unassigned