- Bug
- Resolution: Won't Do
- Undefined
- None
- 4.12.z
- No
- 1
- NE Sprint 260
- 1
- Rejected
- False
Description of problem:
Observed in a 4.12.19 ROSA cluster and raised as a ClusterOperatorDown alert for openshift-controller-manager. The issue appears to have begun shortly after installation completed; the cluster may never have been healthy.
Upon investigation, all route-controller-manager pods were found to be in a CrashLoopBackOff state. Each of their logs contained only the following:
$ oc logs route-controller-manager-7c6d8d8b66-nqxhk -n openshift-route-controller-manager -p
I1213 20:30:25.500136 1 controller_manager.go:26] Starting controllers on 0.0.0.0:8443 (4.12.0-202305101515.p0.g9e74d17.assembly.stream-9e74d17)
unexpected fault address 0xcc001427540
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0xcc001427540 pc=0x221fad4]
goroutine 1 [running]:
runtime.throw({0x2f287c5?, 0xc00091ed78?})
runtime/panic.go:1047 +0x5d fp=0xc00091ed60 sp=0xc00091ed30 pc=0x109821d
runtime.sigpanic()
runtime/signal_unix.go:842 +0x2c5 fp=0xc00091edb0 sp=0xc00091ed60 pc=0x10af465
k8s.io/apiserver/pkg/authentication/token/cache.newStripedCache(0x20, 0x3089620, 0xc00091ee78)
k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cache_striped.go:37 +0xf4 fp=0xc00091ee30 sp=0xc00091edb0 pc=0x221fad4
k8s.io/apiserver/pkg/authentication/token/cache.newWithClock({0x3298a60?, 0xc000226380}, 0x0, 0x45d964b800, 0x45d964b800, {0x32be670, 0x4756188})
k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cached_token_authenticator.go:112 +0xd4 fp=0xc00091eea0 sp=0xc00091ee30 pc=0x221ff94
k8s.io/apiserver/pkg/authentication/token/cache.New(...)
k8s.io/apiserver@v0.25.2/pkg/authentication/token/cache/cached_token_authenticator.go:91
github.com/openshift/route-controller-manager/pkg/cmd/controller/route.newRemoteAuthenticator({0x32a3f28, 0xc00043e600}, 0xc0005c4ea0, 0x30872f0?)
github.com/openshift/route-controller-manager/pkg/cmd/controller/route/apiserver_authenticator.go:33 +0x1b9 fp=0xc00091f0f8 sp=0xc00091eea0 pc=0x28e8f79
github.com/openshift/route-controller-manager/pkg/cmd/controller/route.RunControllerServer({{{0x2f32887, 0xc}, {0x2f27140, 0x3}, {{0x2f62ce1, 0x25}, {0x2f62d06, 0x25}}, {0x2f7010f, 0x2b}, ...}, ...}, ...)
github.com/openshift/route-controller-manager/pkg/cmd/controller/route/standalone_apiserver.go:45 +0x18d fp=0xc00091f5f8 sp=0xc00091f0f8 pc=0x28ebfcd
github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.RunRouteControllerManager(0xc00052ed80, 0x4?, {0x32b8bd0, 0xc00043c1c0})
github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/controller_manager.go:28 +0x24b fp=0xc00091f8a0 sp=0xc00091f5f8 pc=0x28f6a4b
github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.(*RouteControllerManager).StartControllerManager(0xc0008dc940, {0x32b8bd0, 0xc00043c1c0})
github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/cmd.go:119 +0x3f2 fp=0xc00091fa00 sp=0xc00091f8a0 pc=0x28f6752
github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager.NewRouteControllerManagerCommand.func1(0xc0008bd900?, {0x2f279cd?, 0x2?, 0x2?})
github.com/openshift/route-controller-manager/pkg/cmd/route-controller-manager/cmd.go:49 +0xc5 fp=0xc00091fb20 sp=0xc00091fa00 pc=0x28f5f65
github.com/spf13/cobra.(*Command).execute(0xc0008bd900, {0xc0008dcce0, 0x2, 0x2})
github.com/spf13/cobra@v1.4.0/command.go:860 +0x663 fp=0xc00091fbf8 sp=0xc00091fb20 pc=0x14a7483
github.com/spf13/cobra.(*Command).ExecuteC(0xc0008bc000)
github.com/spf13/cobra@v1.4.0/command.go:974 +0x3bd fp=0xc00091fcb0 sp=0xc00091fbf8 pc=0x14a7b9d
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.4.0/command.go:902
k8s.io/component-base/cli.run(0xc0008bc000)
k8s.io/component-base@v0.25.2/cli/run.go:146 +0x317 fp=0xc00091fd70 sp=0xc00091fcb0 pc=0x24c8917
k8s.io/component-base/cli.Run(0x32b8bd0?)
k8s.io/component-base@v0.25.2/cli/run.go:46 +0x1d fp=0xc00091fdf0 sp=0xc00091fd70 pc=0x24c84fd
main.main()
github.com/openshift/route-controller-manager/cmd/route-controller-manager/main.go:28 +0x17f fp=0xc00091ff80 sp=0xc00091fdf0 pc=0x28f735f
runtime.main()
runtime/proc.go:250 +0x212 fp=0xc00091ffe0 sp=0xc00091ff80 pc=0x109ae52
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc00091ffe8 sp=0xc00091ffe0 pc=0x10cdb41
goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000084fb0 sp=0xc000084f90 pc=0x109b216
runtime.goparkunlock(...)
runtime/proc.go:369
runtime.forcegchelper()
runtime/proc.go:302 +0xad fp=0xc000084fe0 sp=0xc000084fb0 pc=0x109b0ad
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x10cdb41
created by runtime.init.6
runtime/proc.go:290 +0x25
goroutine 3 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000085790 sp=0xc000085770 pc=0x109b216
runtime.goparkunlock(...)
runtime/proc.go:369
runtime.bgsweep(0x0?)
runtime/mgcsweep.go:297 +0xd7 fp=0xc0000857c8 sp=0xc000085790 pc=0x1083df7
runtime.gcenable.func1()
runtime/mgc.go:178 +0x26 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x1078986
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x10cdb41
created by runtime.gcenable
runtime/mgc.go:178 +0x6b
goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc0000b2000?, 0x328cad0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000085f70 sp=0xc000085f50 pc=0x109b216
runtime.goparkunlock(...)
runtime/proc.go:369
runtime.(*scavengerState).park(0x47246a0)
runtime/mgcscavenge.go:389 +0x53 fp=0xc000085fa0 sp=0xc000085f70 pc=0x1081dd3
runtime.bgscavenge(0x0?)
runtime/mgcscavenge.go:622 +0x65 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x10823e5
runtime.gcenable.func2()
runtime/mgc.go:179 +0x26 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x1078926
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x10cdb41
created by runtime.gcenable
runtime/mgc.go:179 +0xaa
goroutine 5 [finalizer wait]:
runtime.gopark(0x4725820?, 0xc000009860?, 0x0?, 0x0?, 0xc000084770?)
runtime/proc.go:363 +0xd6 fp=0xc000084628 sp=0xc000084608 pc=0x109b216
runtime.goparkunlock(...)
runtime/proc.go:369
runtime.runfinq()
runtime/mfinal.go:180 +0x10f fp=0xc0000847e0 sp=0xc000084628 pc=0x1077a0f
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x10cdb41
created by runtime.createfing
runtime/mfinal.go:157 +0x45
goroutine 7 [GC worker (idle)]:
runtime.gopark(0x1061c3d?, 0xc0003af980?, 0xa0?, 0x67?, 0xc0000867a8?)
runtime/proc.go:363 +0xd6 fp=0xc000086750 sp=0xc000086730 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc0000867e0 sp=0xc000086750 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 8 [GC worker (idle)]:
runtime.gopark(0xd92381ca853?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000086f50 sp=0xc000086f30 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc000086fe0 sp=0xc000086f50 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 22 [GC worker (idle)]:
runtime.gopark(0xd92381ca42e?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000080750 sp=0xc000080730 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc0000807e0 sp=0xc000080750 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 23 [GC worker (idle)]:
runtime.gopark(0xd9238f02c33?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000080f50 sp=0xc000080f30 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc000080fe0 sp=0xc000080f50 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 50 [GC worker (idle)]:
runtime.gopark(0xd9238f09c5d?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000586750 sp=0xc000586730 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc0005867e0 sp=0xc000586750 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0005867e8 sp=0xc0005867e0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 9 [GC worker (idle)]:
runtime.gopark(0xd9238f03015?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000087750 sp=0xc000087730 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc0000877e0 sp=0xc000087750 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 10 [GC worker (idle)]:
runtime.gopark(0xd9238f0d9c7?, 0xc00058a000?, 0x18?, 0x14?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000087f50 sp=0xc000087f30 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc000087fe0 sp=0xc000087f50 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 11 [GC worker (idle)]:
runtime.gopark(0xd9238f0392d?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000582750 sp=0xc000582730 pc=0x109b216
runtime.gcBgMarkWorker()
runtime/mgc.go:1235 +0xf1 fp=0xc0005827e0 sp=0xc000582750 pc=0x107aad1
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0005827e8 sp=0xc0005827e0 pc=0x10cdb41
created by runtime.gcBgMarkStartWorkers
runtime/mgc.go:1159 +0x25
goroutine 29 [select, locked to thread]:
runtime.gopark(0xc0005897a8?, 0x2?, 0x0?, 0x0?, 0xc0005897a4?)
runtime/proc.go:363 +0xd6 fp=0xc000589618 sp=0xc0005895f8 pc=0x109b216
runtime.selectgo(0xc0005897a8, 0xc0005897a0, 0x0?, 0x0, 0x1?, 0x1)
runtime/select.go:328 +0x7bc fp=0xc000589758 sp=0xc000589618 pc=0x10ab53c
runtime.ensureSigM.func1()
runtime/signal_unix.go:991 +0x1b0 fp=0xc0005897e0 sp=0xc000589758 pc=0x10af9b0
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0005897e8 sp=0xc0005897e0 pc=0x10cdb41
created by runtime.ensureSigM
runtime/signal_unix.go:974 +0xbd
goroutine 30 [syscall]:
runtime.notetsleepg(0x0?, 0x0?)
runtime/lock_futex.go:236 +0x34 fp=0xc000588fa0 sp=0xc000588f68 pc=0x10687b4
os/signal.signal_recv()
runtime/sigqueue.go:152 +0x2f fp=0xc000588fc0 sp=0xc000588fa0 pc=0x10ca0ef
os/signal.loop()
os/signal/signal_unix.go:23 +0x19 fp=0xc000588fe0 sp=0xc000588fc0 pc=0x144cd59
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000588fe8 sp=0xc000588fe0 pc=0x10cdb41
created by os/signal.Notify.func1.1
os/signal/signal.go:151 +0x2a
goroutine 31 [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:363 +0xd6 fp=0xc000582f00 sp=0xc000582ee0 pc=0x109b216
runtime.chanrecv(0xc0008e7680, 0x0, 0x1)
runtime/chan.go:583 +0x49b fp=0xc000582f90 sp=0xc000582f00 pc=0x1062e9b
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:442 +0x18 fp=0xc000582fb8 sp=0xc000582f90 pc=0x1062998
k8s.io/apiserver/pkg/server.SetupSignalContext.func1()
k8s.io/apiserver@v0.25.2/pkg/server/signal.go:48 +0x2b fp=0xc000582fe0 sp=0xc000582fb8 pc=0x24c82eb
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc000582fe8 sp=0xc000582fe0 pc=0x10cdb41
created by k8s.io/apiserver/pkg/server.SetupSignalContext
k8s.io/apiserver@v0.25.2/pkg/server/signal.go:47 +0xe5
goroutine 32 [select]:
runtime.gopark(0xc0005837a0?, 0x2?, 0x0?, 0x0?, 0xc000583764?)
runtime/proc.go:363 +0xd6 fp=0xc0005835e0 sp=0xc0005835c0 pc=0x109b216
runtime.selectgo(0xc0005837a0, 0xc000583760, 0x0?, 0x0, 0x0?, 0x1)
runtime/select.go:328 +0x7bc fp=0xc000583720 sp=0xc0005835e0 pc=0x10ab53c
k8s.io/klog/v2.(*flushDaemon).run.func1()
k8s.io/klog/v2@v2.80.1/klog.go:1135 +0x11e fp=0xc0005837e0 sp=0xc000583720 pc=0x118f27e
runtime.goexit()
runtime/asm_amd64.s:1594 +0x1 fp=0xc0005837e8 sp=0xc0005837e0 pc=0x10cdb41
created by k8s.io/klog/v2.(*flushDaemon).run
k8s.io/klog/v2@v2.80.1/klog.go:1131 +0x17b
Attempts to reproduce this in other workloads by deploying images that also invoke cache.New have failed; so far, no other pod on the cluster is known to be crashlooping.
SRE attempted to restart the nodes running the affected pods. After the reboot, the route-controller-manager pods ran for a short time but eventually re-entered a CrashLoopBackOff state. The logs were unchanged after the reboot.
Version-Release number of selected component (if applicable):
4.12.19
How reproducible:
Unclear: observed on only a single cluster, but consistently reproducible on that cluster.
Steps to Reproduce:
1. Install the cluster.
2. Observe the route-controller-manager pods crashlooping.
Actual results:
$ oc get po -n openshift-route-controller-manager
NAME                                        READY   STATUS             RESTARTS         AGE
route-controller-manager-7c6d8d8b66-nqxhk   0/1     CrashLoopBackOff   26 (4m13s ago)   113m
route-controller-manager-7c6d8d8b66-qspnx   0/1     CrashLoopBackOff   26 (4m18s ago)   113m
route-controller-manager-7c6d8d8b66-twtm8   0/1     CrashLoopBackOff   26 (4m7s ago)    113m
Expected results:
All route-controller-manager pods are running