OCPBUGS-42315

KCM without a cloud provider crashed by a cloud controller

    • CLOUD Sprint 260, CLOUD Sprint 261, CLOUD Sprint 262
      Description of problem:

      KCM crashes with the panic shown below even though no cloud provider is set:
      
      --cloud-config="" --cloud-provider=""
      
      KCM still seems to attempt to start the service-lb-controller, which results in:
      
      2024-09-21T06:33:19.455834130Z E0921 06:33:19.455607       1 core.go:105] "Failed to start service controller" err="WARNING: no cloud provider provided, services of type LoadBalancer will fail" logger="service-lb-controller
      
      In some instances, however, it fails later (in 1:45h) with the following panic:
      
      2024-09-21T14:15:01.407730648Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
      2024-09-21T14:15:01.407835311Z  panic: runtime error: invalid memory address or nil pointer dereference
      2024-09-21T14:15:01.407885052Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x21bfd5a]
      2024-09-21T14:15:01.407995256Z 
      2024-09-21T14:15:01.408018213Z goroutine 975 [running]:
      2024-09-21T14:15:01.408091273Z k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x3eeb248, 0x5ff62e0}, {0x3299160, 0x5e926b0}, {0x5ff62e0, 0x0, 0x43fb85?})
      2024-09-21T14:15:01.408402738Z  k8s.io/apimachinery/pkg/util/runtime/runtime.go:89 +0xee
      2024-09-21T14:15:01.408492677Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc0007d36c0?})
      2024-09-21T14:15:01.408654734Z  k8s.io/apimachinery/pkg/util/runtime/runtime.go:59 +0x108
      2024-09-21T14:15:01.408751922Z panic({0x3299160?, 0x5e926b0?})
      2024-09-21T14:15:01.409474042Z  runtime/panic.go:770 +0x132
      2024-09-21T14:15:01.409619038Z k8s.io/cloud-provider/controllers/service.(*Controller).needsUpdate(0xc0007cb2b0, 0xc0060aa288, 0xc004308008)
      2024-09-21T14:15:01.409790865Z  k8s.io/cloud-provider/controllers/service/controller.go:586 +0x39a
      2024-09-21T14:15:01.409924910Z k8s.io/cloud-provider/controllers/service.New.func2({0x382f1c0?, 0xc0060aa288?}, {0x382f1c0, 0xc004308008?})
      2024-09-21T14:15:01.410338662Z  k8s.io/cloud-provider/controllers/service/controller.go:144 +0x74
      2024-09-21T14:15:01.410484780Z k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
      2024-09-21T14:15:01.410558301Z  k8s.io/client-go/tools/cache/controller.go:253
      2024-09-21T14:15:01.410649764Z k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      2024-09-21T14:15:01.410707568Z  k8s.io/client-go/tools/cache/shared_informer.go:976 +0xea
      2024-09-21T14:15:01.410841912Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
      2024-09-21T14:15:01.410934524Z  k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
      2024-09-21T14:15:01.411065837Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc002a87f70, {0x3eaec60, 0xc002290090}, 0x1, 0xc000955560)
      2024-09-21T14:15:01.411533342Z  k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
      2024-09-21T14:15:01.411673905Z k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc002180770, 0x3b9aca00, 0x0, 0x1, 0xc000955560)
      2024-09-21T14:15:01.411899402Z  k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
      2024-09-21T14:15:01.412032598Z k8s.io/apimachinery/pkg/util/wait.Until(...)
      2024-09-21T14:15:01.412105779Z  k8s.io/apimachinery/pkg/util/wait/backoff.go:161
      2024-09-21T14:15:01.412225197Z k8s.io/client-go/tools/cache.(*processorListener).run(0xc0013f4d80)
      2024-09-21T14:15:01.412300249Z  k8s.io/client-go/tools/cache/shared_informer.go:972 +0x69
      2024-09-21T14:15:01.412441934Z k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      2024-09-21T14:15:01.412502821Z  k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x52
      2024-09-21T14:15:01.412649231Z created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 966
      2024-09-21T14:15:01.412743984Z  k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
      
      
      
      I suspect the resource event handler registered in https://github.com/kubernetes/kubernetes/blob/c9d6fd9ff77f43363898362ec71796dafeb89929/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L129-L150 is still running (not great) and gets called, which results in the nil pointer dereference at https://github.com/kubernetes/kubernetes/blob/c9d6fd9ff77f43363898362ec71796dafeb89929/staging/src/k8s.io/cloud-provider/controllers/service/controller.go#L584, most likely because c.eventRecorder is not initialized.
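
The suspected failure mode can be illustrated with a minimal, self-contained sketch. This is not the actual service controller code: the `informer`, `controller`, `eventRecorder`, and `newController` names below are hypothetical stand-ins, and the sketch assumes that the constructor registers its update handler before it bails out and that the recorder is only set once initialization succeeds.

```go
package main

import (
	"errors"
	"fmt"
)

// eventRecorder is a hypothetical stand-in for an event recorder interface.
type eventRecorder interface {
	Event(msg string)
}

// printRecorder is a trivial recorder used when initialization succeeds.
type printRecorder struct{}

func (printRecorder) Event(msg string) { fmt.Println("event:", msg) }

// informer is a hypothetical stand-in for a shared informer: it keeps every
// registered handler and keeps dispatching updates to it.
type informer struct {
	onUpdate []func(old, cur string)
}

func (i *informer) addUpdateHandler(h func(old, cur string)) {
	i.onUpdate = append(i.onUpdate, h)
}

// dispatch delivers an update to all handlers and reports whether one panicked.
func (i *informer) dispatch(old, cur string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	for _, h := range i.onUpdate {
		h(old, cur)
	}
	return false
}

// controller models a controller whose recorder is set late.
type controller struct {
	recorder eventRecorder // stays nil if initialization is skipped
}

// needsUpdate only touches the recorder when the two values differ.
func (c *controller) needsUpdate(old, cur string) bool {
	if old == cur {
		return false
	}
	c.recorder.Event(old + " -> " + cur) // nil interface call panics here
	return true
}

// newController registers the handler first and validates afterwards, so the
// handler outlives a failed construction (the assumption of this sketch).
func newController(inf *informer, cloudConfigured bool) (*controller, error) {
	c := &controller{}
	inf.addUpdateHandler(func(old, cur string) { c.needsUpdate(old, cur) })
	if !cloudConfigured {
		return nil, errors.New("no cloud provider provided")
	}
	c.recorder = printRecorder{}
	return c, nil
}

func main() {
	inf := &informer{}
	_, err := newController(inf, false)
	fmt.Println("constructor error:", err)
	fmt.Println("unchanged update panicked:", inf.dispatch("ClusterIP", "ClusterIP"))
	fmt.Println("changed update panicked:", inf.dispatch("ClusterIP", "LoadBalancer"))
}
```

In this sketch the constructor returns an error, yet the handler it registered stays attached to the informer. Only an update that reaches the recorder call panics, so the failure depends on which update events arrive. Whether the real controller behaves the same way is an assumption that would need to be confirmed against the linked source.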

      Version-Release number of selected component (if applicable):

      Found during the 4.18 rebase when running the payloads: https://github.com/openshift/kubernetes/pull/2092

      How reproducible:

      Rarely

      Steps to Reproduce:

          

      Actual results:

          

      Expected results:

      KCM does not crash and the cloud controllers are not running    

      Additional info:

      Observed in:
      - https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-kubernetes-2092-nightly-4.18-e2e-metal-ipi-ovn-serial-virtualmedia/1837446433330958336
      - https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-kubernetes-2092-nightly-4.18-e2e-metal-ipi-ovn-serial-ipv4/1837446433297403904
      - https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-kubernetes-2092-nightly-4.18-e2e-metal-ipi-ovn-serial-ipv4/1837315052495966208

              mimccune@redhat.com Michael McCune
              fkrepins@redhat.com Filip Krepinsky
              Milind Yadav