-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.18.0
-
None
-
CLOUD Sprint 260, CLOUD Sprint 261, CLOUD Sprint 262
-
3
-
False
-
Description of problem:
KCM crashes with the following panic even though the cloud provider is not set (--cloud-config="" --cloud-provider=""). It seems there is still an attempt to start the service-lb-controller, which results in:

2024-09-21T06:33:19.455834130Z E0921 06:33:19.455607 1 core.go:105] "Failed to start service controller" err="WARNING: no cloud provider provided, services of type LoadBalancer will fail" logger="service-lb-controller"

In some instances, however, it fails later (in about 1:45 h) with the following panic:

2024-09-21T14:15:01.407730648Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-09-21T14:15:01.407835311Z panic: runtime error: invalid memory address or nil pointer dereference
2024-09-21T14:15:01.407885052Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x21bfd5a]
2024-09-21T14:15:01.407995256Z
2024-09-21T14:15:01.408018213Z goroutine 975 [running]:
2024-09-21T14:15:01.408091273Z k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x3eeb248, 0x5ff62e0}, {0x3299160, 0x5e926b0}, {0x5ff62e0, 0x0, 0x43fb85?})
2024-09-21T14:15:01.408402738Z 	k8s.io/apimachinery/pkg/util/runtime/runtime.go:89 +0xee
2024-09-21T14:15:01.408492677Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0007d36c0?})
2024-09-21T14:15:01.408654734Z 	k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108
2024-09-21T14:15:01.408751922Z panic({0x3299160?, 0x5e926b0?})
2024-09-21T14:15:01.409474042Z 	runtime/panic.go:770 +0x132
2024-09-21T14:15:01.409619038Z k8s.io/cloud-provider/controllers/service.(*Controller).needsUpdate(0xc0007cb2b0, 0xc0060aa288, 0xc004308008)
2024-09-21T14:15:01.409790865Z 	k8s.io/cloud-provider/controllers/service/controller.go:586 +0x39a
2024-09-21T14:15:01.409924910Z k8s.io/cloud-provider/controllers/service.New.func2({0x382f1c0?, 0xc0060aa288?}, {0x382f1c0, 0xc004308008?})
2024-09-21T14:15:01.410338662Z 	k8s.io/cloud-provider/controllers/service/controller.go:144 +0x74
2024-09-21T14:15:01.410484780Z k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
2024-09-21T14:15:01.410558301Z 	k8s.io/client-go/tools/cache/controller.go:253
2024-09-21T14:15:01.410649764Z k8s.io/client-go/tools/cache.(*processorListener).run.func1()
2024-09-21T14:15:01.410707568Z 	k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea
2024-09-21T14:15:01.410841912Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
2024-09-21T14:15:01.410934524Z 	k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
2024-09-21T14:15:01.411065837Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc002a87f70, {0x3eaec60, 0xc002290090}, 0x1, 0xc000955560)
2024-09-21T14:15:01.411533342Z 	k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
2024-09-21T14:15:01.411673905Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc002180770, 0x3b9aca00, 0x0, 0x1, 0xc000955560)
2024-09-21T14:15:01.411899402Z 	k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
2024-09-21T14:15:01.412032598Z k8s.io/apimachinery/pkg/util/wait.Until(...)
2024-09-21T14:15:01.412105779Z 	k8s.io/apimachinery/pkg/util/wait/backoff.go:161
2024-09-21T14:15:01.412225197Z k8s.io/client-go/tools/cache.(*processorListener).run(0xc0013f4d80)
2024-09-21T14:15:01.412300249Z 	k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69
2024-09-21T14:15:01.412441934Z k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
2024-09-21T14:15:01.412502821Z 	k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
2024-09-21T14:15:01.412649231Z created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 966
2024-09-21T14:15:01.412743984Z 	k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73

I suspect the resource event handler is running (not great) and is called in https://github.com/kubernetes/kubernetes/blob/c9d6fd9ff77f43363898362ec71796dafeb89929/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L129-L150, which results in the nil pointer dereference here: https://github.com/kubernetes/kubernetes/blob/c9d6fd9ff77f43363898362ec71796dafeb89929/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L584, most likely because c.eventRecorder is not initialized.
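For illustration only, here is a minimal Go sketch of the suspected failure mode. It is not the actual cloud-provider code: the eventRecorder interface, controller struct, and needsUpdate method below are simplified stand-ins for record.EventRecorder and the service controller, showing how an informer update handler that dereferences a recorder which was never initialized produces exactly this class of panic.

package main

import "fmt"

// eventRecorder is a simplified stand-in for the recorder interface the
// service controller uses (record.EventRecorder in client-go).
type eventRecorder interface {
	Eventf(obj interface{}, eventType, reason, messageFmt string, args ...interface{})
}

// controller is a hypothetical stand-in for the service-lb-controller;
// eventRecorder is deliberately left nil to mimic a controller whose full
// initialization was skipped because no cloud provider was configured.
type controller struct {
	eventRecorder eventRecorder
}

// needsUpdate sketches the suspected code path: while deciding whether a
// Service change matters, it unconditionally calls the recorder, so a nil
// recorder causes a nil pointer dereference.
func (c *controller) needsUpdate(oldSvc, newSvc string) bool {
	if oldSvc != newSvc {
		// Panics here when c.eventRecorder is nil, analogous to controller.go:586.
		c.eventRecorder.Eventf(newSvc, "Normal", "Type", "service %s changed", newSvc)
		return true
	}
	return false
}

func main() {
	c := &controller{} // eventRecorder never set

	defer func() {
		if r := recover(); r != nil {
			// Prints: recovered: runtime error: invalid memory address or nil pointer dereference
			fmt.Println("recovered:", r)
		}
	}()

	// An informer update event arriving hours after startup would call the
	// registered OnUpdate handler, which ends up in needsUpdate.
	c.needsUpdate("svc-v1", "svc-v2")
}

Either not registering the event handlers when no cloud provider is configured, or making sure the recorder is initialized before the handlers can fire, would avoid the dereference in this sketch.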
Version-Release number of selected component (if applicable):
Found during the 4.18 rebase while running the payloads: https://github.com/openshift/kubernetes/pull/2092
How reproducible:
rarely
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
KCM does not crash and the cloud controllers are not running
Additional info:
observed in:
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-kubernetes-2092-nightly-4.18-e2e-metal-ipi-ovn-serial-virtualmedia/1837446433330958336
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-kubernetes-2092-nightly-4.18-e2e-metal-ipi-ovn-serial-ipv4/1837446433297403904
- https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-kubernetes-2092-nightly-4.18-e2e-metal-ipi-ovn-serial-ipv4/1837315052495966208
is related to:
- WRKLDS-1556 remove --cloud-provider=external from KCM in OpenShift 4.19 (k8s 1.32) (To Do)
- OCPBUGS-36219 cluster autoscaler cleanup deletion test is failing continually (Closed)
links to: