OpenShift Bugs: OCPBUGS-31808

control-plane-machine-set operator pod stuck in CrashLoopBackOff with a nil pointer dereference runtime error


Details

    • Fixed ControlPlaneMachineSetOperator to handle the scenario where the infrastructure resource has not been configured with a vSphere platform spec.

    Description

      Description of problem:

      The control-plane-machine-set operator pod is stuck in CrashLoopBackOff with "panic: runtime error: invalid memory address or nil pointer dereference", raised while extracting the failure domain from the ControlPlaneMachineSet. The error trace is below for reference.
      ~~~
      2024-04-04T09:32:23.594257072Z I0404 09:32:23.594176       1 controller.go:146]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="c282f3e3-9f9d-40df-a24e-417ba2ea4106"
      2024-04-04T09:32:23.594257072Z I0404 09:32:23.594221       1 controller.go:125]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55"
      2024-04-04T09:32:23.594274974Z I0404 09:32:23.594257       1 controller.go:146]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55"
      2024-04-04T09:32:23.597509741Z I0404 09:32:23.597426       1 watch_filters.go:179] reconcile triggered by infrastructure change
      2024-04-04T09:32:23.606311553Z I0404 09:32:23.606243       1 controller.go:220]  "msg"="Starting workers" "controller"="controlplanemachineset" "worker count"=1
      2024-04-04T09:32:23.606360950Z I0404 09:32:23.606340       1 controller.go:169]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
      2024-04-04T09:32:23.609322467Z I0404 09:32:23.609217       1 panic.go:884]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
      2024-04-04T09:32:23.609322467Z I0404 09:32:23.609271       1 controller.go:115]  "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="controlplanemachineset" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
      2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
      2024-04-04T09:32:23.612540681Z     panic: runtime error: invalid memory address or nil pointer dereference
      2024-04-04T09:32:23.612540681Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a5911c]
      2024-04-04T09:32:23.612540681Z 
      2024-04-04T09:32:23.612540681Z goroutine 255 [running]:
      2024-04-04T09:32:23.612540681Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
      2024-04-04T09:32:23.612571624Z     /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
      2024-04-04T09:32:23.612571624Z panic({0x1c8ac60, 0x31c6ea0})
      2024-04-04T09:32:23.612571624Z     /usr/lib/golang/src/runtime/panic.go:884 +0x213
      2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.VSphereProviderConfig.ExtractFailureDomain(...)
      2024-04-04T09:32:23.612571624Z     /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/vsphere.go:120
      2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.providerConfig.ExtractFailureDomain({{0x1f2a71a, 0x7}, {{{{...}, {...}}, {{...}, {...}, {...}, {...}, {...}, {...}, ...}, ...}}, ...})
      2024-04-04T09:32:23.612588145Z     /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/providerconfig.go:212 +0x23c
      ~~~
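
      The trace above points at VSphereProviderConfig.ExtractFailureDomain, and the fix note says the operator did not handle an infrastructure resource without a vSphere platform spec. The following is a minimal sketch of that kind of nil guard, not the operator's actual code: the types and the extractFailureDomain helper are hypothetical, simplified stand-ins, and the real logic at vsphere.go:120 may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the cluster infrastructure types.
// They are NOT the real openshift/api definitions.
type VSphereFailureDomain struct {
	Name string
	Zone string
}

type VSpherePlatformSpec struct {
	FailureDomains []VSphereFailureDomain
}

type PlatformSpec struct {
	// VSphere is nil when the platform spec was never configured.
	VSphere *VSpherePlatformSpec
}

type Infrastructure struct {
	PlatformSpec PlatformSpec
}

var errNoFailureDomain = errors.New("no matching vSphere failure domain")

// extractFailureDomain (hypothetical helper) looks up the failure domain for
// a zone. It checks every pointer before dereferencing it, so a missing
// vSphere platform spec yields an error instead of a panic.
func extractFailureDomain(infra *Infrastructure, zone string) (VSphereFailureDomain, error) {
	if infra == nil || infra.PlatformSpec.VSphere == nil {
		return VSphereFailureDomain{}, errNoFailureDomain
	}
	for _, fd := range infra.PlatformSpec.VSphere.FailureDomains {
		if fd.Zone == zone {
			return fd, nil
		}
	}
	return VSphereFailureDomain{}, errNoFailureDomain
}

func main() {
	// An infrastructure object with no vSphere platform spec at all.
	infra := &Infrastructure{}
	if _, err := extractFailureDomain(infra, "zone-a"); err != nil {
		fmt.Println("handled without panic:", err)
	}
}
```

      Returning an error (or an empty failure domain) lets the reconciler report the condition and continue, rather than crashing the pod on every reconcile.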
          

      Version-Release number of selected component (if applicable):

      
          

      How reproducible:

      
          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      The control-plane-machine-set operator is stuck in CrashLoopBackOff during the cluster upgrade.
          

      Expected results:

      control-plane-machine-set operator should be upgraded without any errors.
          

      Additional info:

      This happens during the upgrade of a vSphere IPI cluster from OCP 4.14.z to 4.15.6 and may affect other z-stream releases.
      According to the official docs [1], providing a failure domain for the vSphere platform is a Technology Preview feature.
      [1] https://docs.openshift.com/container-platform/4.15/machine_management/control_plane_machine_management/cpmso-configuration.html#cpmso-yaml-failure-domain-vsphere_cpmso-configuration
          


            People

              rhn-support-ngirard Neil Girard
              rhn-support-nkashyap Nirupma Nirupma
              Huali Liu Huali Liu