OpenShift Bugs / OCPBUGS-12933

Node Tuning Operator crashloops when in Hypershift mode


    • Type: Bug
    • Resolution: Done
    • Undefined
    • None
    • 4.14
    • Node Tuning Operator
    • None
    • No
    • False

      This is a clone of issue OCPBUGS-12883. The following is the description of the original issue:

      Description of problem:

      The Node Tuning Operator (NTO) panics with a nil pointer dereference (SIGSEGV) when running in HyperShift mode.
      
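      This is caused by a code path that tries to retrieve MachineConfigPool objects even though no MachineConfigPool informers are initialized: syncMachineConfigHyperShift calls (*ProfileCalculator).getMachineCountForMachineConfigPool (mc.go:154), which dereferences a nil pointer. As far as I understand, the Node Tuning Operator should not use MachineConfigPools at all when running with HyperShift.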
      
      I'm not an NTO expert, but I opened a PR that attempts to address this: https://github.com/openshift/cluster-node-tuning-operator/pull/637. If nothing else, it may help illustrate the issue.
      
      

      Version-Release number of selected component (if applicable):

      4.14 nightly

      How reproducible:

      100% (with the HyperShift KubeVirt platform)

      Steps to Reproduce:

      Run the HyperShift NodePool e2e tests. The tests may pass, but NTO still gets into a crash loop:
      
      bin/test-e2e \
          --test.v \
          --test.timeout=0 \
          --test.run='TestNodePool' \
          --e2e.node-pool-replicas=2 \
          --e2e.kubevirt-node-memory="6Gi" \
          --e2e.platform="KubeVirt" \
          --e2e.latest-release-image=${OCP_IMAGE_LATEST} \
          --e2e.previous-release-image=${OCP_IMAGE_PREVIOUS} \
          --e2e.pull-secret-file=$PULL_SECRET_PATH
      
      
      

      Actual results:

      NTO crash loops indefinitely

      Expected results:

      NTO does not crash loop

      Additional info:

      
      
      oc logs -n e2e-clusters-c25rn-example-bbqgr cluster-node-tuning-operator-54b6b67d44-nrqjr
      I0427 20:11:07.768012       1 main.go:71] Go Version: go1.19.6
      I0427 20:11:07.768088       1 main.go:72] Go OS/Arch: linux/amd64
      I0427 20:11:07.768092       1 main.go:73] node-tuning Version: 951cc099-dirty
      I0427 20:11:08.832309       1 request.go:690] Waited for 1.047043965s due to client-side throttling, not priority and fairness, request: GET:https://kube-apiserver:6443/apis/authorization.openshift.io/v1?timeout=32s
      I0427 20:11:09.538821       1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-node-tuning-operator/node-tuning-operator-lock...
      I0427 20:13:37.608229       1 leaderelection.go:258] successfully acquired lease openshift-cluster-node-tuning-operator/node-tuning-operator-lock
      I0427 20:13:37.608322       1 controller.go:1302] starting Tuned controller
      I0427 20:13:37.608699       1 server.go:102] starting metrics server
      I0427 20:13:37.709613       1 controller.go:1402] started events processor/controller
      I0427 20:13:37.829629       1 controller.go:764] updated profile example-bbqgr-test-ntomachineconfig-replace-bzfsj [openshift-hugepages]
      I0427 20:13:38.014296       1 controller.go:764] updated profile example-bbqgr-test-ntomachineconfig-replace-m9nxp [openshift-hugepages]
      E0427 20:13:38.427379       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
      goroutine 478 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1ae9940?, 0x310ead0})
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000789780?})
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
      panic({0x1ae9940, 0x310ead0})
          /usr/lib/golang/src/runtime/panic.go:884 +0x212
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).getMachineCountForMachineConfigPool(0x1ac7d20?, {0xc0009deb07?, 0x1dc8e7d?})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/mc.go:154 +0x2a
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncMachineConfigHyperShift(0xc0003a4cc0, {0xc0005ff3e0, 0x2b}, 0xc00108cf00)
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:933 +0x121e
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncProfile(0xc0003a4cc0, 0x0?, {0xc000aac680, 0x31})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:717 +0x1a89
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).sync(0xc0003a4cc0, {{0x1da4684, 0x7}, {0xc000060050, 0x26}, {0xc000aac680, 0x31}, {0x0, 0x0}})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:371 +0x153d
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor.func1(0xc0003a4cc0, {0x1c3ae20?, 0xc000789780?})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:193 +0x23f
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor(0xc0003a4cc0)
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:212 +0x5c
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x1e?)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000131b90?, {0x209bbe0, 0xc00099b350}, 0x1, 0xc0006532c0)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000887fb0?, 0x3b9aca00, 0x0, 0x0?, 0x448005?)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
      k8s.io/apimachinery/pkg/util/wait.Until(0x8c9d2a?, 0x0?, 0x0?)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
      created by github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).run
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:1401 +0x1bc5
      panic: runtime error: invalid memory address or nil pointer dereference [recovered]
          panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x18946aa]
      
      goroutine 478 [running]:
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000789780?})
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
      panic({0x1ae9940, 0x310ead0})
          /usr/lib/golang/src/runtime/panic.go:884 +0x212
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).getMachineCountForMachineConfigPool(0x1ac7d20?, {0xc0009deb07?, 0x1dc8e7d?})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/mc.go:154 +0x2a
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncMachineConfigHyperShift(0xc0003a4cc0, {0xc0005ff3e0, 0x2b}, 0xc00108cf00)
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:933 +0x121e
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncProfile(0xc0003a4cc0, 0x0?, {0xc000aac680, 0x31})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:717 +0x1a89
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).sync(0xc0003a4cc0, {{0x1da4684, 0x7}, {0xc000060050, 0x26}, {0xc000aac680, 0x31}, {0x0, 0x0}})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:371 +0x153d
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor.func1(0xc0003a4cc0, {0x1c3ae20?, 0xc000789780?})
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:193 +0x23f
      github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor(0xc0003a4cc0)
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:212 +0x5c
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x1e?)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000131b90?, {0x209bbe0, 0xc00099b350}, 0x1, 0xc0006532c0)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000887fb0?, 0x3b9aca00, 0x0, 0x0?, 0x448005?)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
      k8s.io/apimachinery/pkg/util/wait.Until(0x8c9d2a?, 0x0?, 0x0?)
          /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92 +0x25
      created by github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).run
          /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:1401 +0x1bc5
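      The trace appears twice because, in k8s.io/apimachinery, runtime.HandleCrash first logs the panic ("Observed a panic") and then, while runtime.ReallyCrash is true (its default), re-panics. The process therefore exits, the pod is restarted, and presumably the next Profile sync hits the same code path again, hence the crash loop. The stdlib-only Go sketch below imitates that recover-log-repanic pattern; handleCrash, reallyCrash, poolCounter, and syncOnce are hypothetical names, and this is not the apimachinery implementation.

```go
package main

import "fmt"

// poolCounter is a hypothetical interface standing in for the
// uninitialized MachineConfigPool lister; it is not a real NTO type.
type poolCounter interface {
	Count(pool string) int
}

// reallyCrash plays the role of apimachinery's runtime.ReallyCrash
// (true by default there): after logging, the panic is re-raised.
var reallyCrash = true

// handleCrash imitates the recover-log-repanic pattern. It must be
// deferred directly so that recover() stops the panic.
func handleCrash() {
	if r := recover(); r != nil {
		fmt.Println("Observed a panic:", r)
		if reallyCrash {
			// Re-raise; in the operator this terminates the process.
			panic(r)
		}
	}
}

// syncOnce calls a method on a nil interface value, which panics with
// "invalid memory address or nil pointer dereference".
func syncOnce(c poolCounter) {
	defer handleCrash()
	_ = c.Count("worker")
}

func main() {
	// Disable the re-panic so the demo can continue; with the default
	// (true) the program would exit here, as the operator does.
	reallyCrash = false
	syncOnce(nil)
	fmt.Println("continued only because reallyCrash was false")
}
```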
      
      

            Jiri Mencak (jmencak)
            OpenShift Prow Bot (openshift-crt-jira-prow)
            Liquan Cui
            Votes: 0
            Watchers: 4
