OpenShift Bugs / OCPBUGS-33129

Panic when we remove an OCL infra MCP and we try to create new ones with different names


    • Moderate
    • None
    • MCO Sprint 254, MCO Sprint 255
    • 2
    • Approved
    • False
    • * Previously, the Machine Config Controller and the Machine OS Build Controller could panic when dereferencing an accidentally deleted MachineOSConfig or MachineOSBuild object to read its build status. The panic is now handled with additional error conditions that warn about disallowed MachineOSConfig deletions. (link:https://issues.redhat.com/browse/OCPBUGS-33129[*OCPBUGS-33129*])
    • Bug Fix
    • Done

      Description of problem:

      Given that we create a new pool and enable OCB on it, and we then remove the pool and its MachineOSConfig resource and create another new pool to enable OCB again, the controller pod panics.
          

      Version-Release number of selected component (if applicable):

      pre-merge https://github.com/openshift/machine-config-operator/pull/4327
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Create a new infra MCP
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: infra
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/infra: ""
      EOF
      
          2. Create a MachineOSConfig for the infra pool
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: infra
      spec:
        machineConfigPool:
          name: infra
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
          baseImagePullSecret:
            name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
          renderedImagePushSecret:
            name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
          renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
      EOF
      
      
          3. When the build is finished (see the verification sketch after these steps), remove the MachineOSConfig and the pool
      
      oc delete machineosconfig infra
      oc delete mcp infra
      
          4. Create a new infra1 pool
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: infra1
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra1]}
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/infra1: ""
      EOF
      
          5. Create a new MachineOSConfig for the infra1 pool
      
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: infra1
      spec:
        machineConfigPool:
          name: infra1
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
          baseImagePullSecret:
            name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
          renderedImagePushSecret:
            name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
          renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
          containerFile:
          - containerfileArch: noarch
            content: |-
              RUN echo 'test image' > /etc/test-image.file
      EOF
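      
      As referenced from step 3, a rough sketch for checking that the build has finished before deleting anything (this assumes the default openshift-machine-config-operator namespace and the v1alpha1 MachineOSBuild CRD shipped with the on-cluster build TechPreview):
      
      # The build for the infra pool is done once its MachineOSBuild reports a successful
      # terminal state and the builder pod in the MCO namespace has completed.
      oc get machineosbuilds
      oc get pods -n openshift-machine-config-operator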
      
      
      
          

      Actual results:

      The MCO controller pod panics in updateMachineOSBuild: the MachineOSBuild event still references the deleted infra pool, so the pool lookup returns nil and enqueueAfter ends up calling GetNamespace on a nil *MachineConfigPool:
      
      E0430 11:21:03.779078       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
      goroutine 265 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00035e000?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
      panic({0x3547bc0?, 0x53ebb20?})
      	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
      github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
      	<autogenerated>:1 +0x9
      k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
      k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
      k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
      k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
      github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0007097a0, 0x0, 0x0?)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
      github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
      github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateMachineOSBuild(0xc0007097a0, {0xc001c37800?, 0xc000029678?}, {0x3904000?, 0xc0028361a0})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:395 +0xd1
      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246
      k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:970 +0xea
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e5738?, {0x3de6020, 0xc0008fe780}, 0x1, 0xc0000ac720)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x6974616761706f72?, 0x3b9aca00, 0x0, 0x69?, 0xc0005e5788?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
      k8s.io/apimachinery/pkg/util/wait.Until(...)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
      k8s.io/client-go/tools/cache.(*processorListener).run(0xc000b97c20)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 248
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
      panic: runtime error: invalid memory address or nil pointer dereference [recovered]
      	panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]
      
      
      
      When the controller pod is restarted, it panics again, but in a different function (addMachineOSBuild):
      
      E0430 11:26:54.753689       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
      goroutine 97 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x15555555aa?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
      panic({0x3547bc0?, 0x53ebb20?})
      	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
      github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
      	<autogenerated>:1 +0x9
      k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
      k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
      k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
      k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
      github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc000899560, 0x0, 0x0?)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
      github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
      github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addMachineOSBuild(0xc000899560, {0x3904000?, 0xc0006a8b60})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:386 +0xc5
      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:239
      k8s.io/client-go/tools/cache.(*processorListener).run.func1()
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x13e
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00066bf38?, {0x3de6020, 0xc0008f8b40}, 0x1, 0xc000c2ea20)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00066bf88?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
      k8s.io/apimachinery/pkg/util/wait.Until(...)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
      k8s.io/client-go/tools/cache.(*processorListener).run(0xc000ba6240)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
      k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
      created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 43
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
      panic: runtime error: invalid memory address or nil pointer dereference [recovered]
      	panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]
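      
      To confirm the crash loop and capture the traces above from a live cluster, something like the following works (a sketch; the machine-config-controller Deployment and namespace names are the usual MCO defaults):
      
      # The controller pod goes into CrashLoopBackOff shortly after the new MachineOSConfig is created.
      oc get pods -n openshift-machine-config-operator
      
      # Pull the panic from the previously crashed container.
      oc logs -n openshift-machine-config-operator deployment/machine-config-controller --previous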
      
      
      
      
      
          

      Expected results:

      No panic should happen. The errors should be handled gracefully.
      
          

      Additional info:

          In order to recover from this panic, we need to manually delete the MachineOSBuild resources that are related to the pool that no longer exists (a sketch follows below).
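
      A sketch of that manual clean-up (how the stale objects are identified here, by inspecting which pool and MachineOSConfig they reference, is an assumption; the actual resource names are whatever the controller generated):

      # List the MachineOSBuild objects and inspect the ones that still reference the
      # deleted infra pool or its MachineOSConfig.
      oc get machineosbuilds
      oc get machineosbuild <stale-machineosbuild-name> -o yaml
      
      # Delete the stale object(s); the controller stops panicking once they are gone.
      oc delete machineosbuild <stale-machineosbuild-name>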

            rh-ee-iqian Ines Qian
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa