OpenShift Bugs / OCPBUGS-44436

Hypershift operator pod crashes when a HC is created with no NodePort address


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.14.z, 4.15.z, 4.17.z, 4.16.z
    • HyperShift
    • Hypershift Sprint 263

      Description of problem:

      When a HostedCluster CR is created with the NodePort servicePublishingStrategy for APIServer but without the nodePort.address field, the HostedCluster's pods and namespace are not created. No obvious error is presented, but the HyperShift operator pod panics repeatedly.

      Version-Release number of selected component (if applicable):

      MCE 2.6.3, ACM 2.11.3

      How reproducible:

      Always    

      Steps to Reproduce:

          1. Create a HostedCluster which requests NodePort publishing for APIServer but omits the nodePort.address section:
      apiVersion: hypershift.openshift.io/v1beta1
      kind: HostedCluster
      metadata:
        name: "examplecluster"
        namespace: "clusters"
      spec:
      #...
        services:
          - service: APIServer
            servicePublishingStrategy:
              type: NodePort
      # no nodePort.address field...
          - service: OAuthServer
            servicePublishingStrategy:
              type: Route
          2. Observe that no apparent error is shown in the UI or the status of the HC, and the HostedCluster's namespace and Pods are not created.

      Actual results:

      No HostedCluster pods or namespace are created. The operator pods in the hypershift namespace panic repeatedly and go into CrashLoopBackOff:
      
      $ oc logs -n hypershift operator-8679988789-9dvmw -f
      ...
      panic: runtime error: invalid memory address or nil pointer dereference [recovered]
      	panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x32afdc7]
      
      goroutine 853 [running]:
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
      	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x2f4
      panic({0x3973460?, 0x5d4fb00?})
      	/usr/lib/golang/src/runtime/panic.go:770 +0x136
      github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster.getNodePortIP(0xc00173b008)
      	/remote-source/app/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go:5179 +0x167
      github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster.validateNodePortVsServiceNetwork(0xc00173b008)
      	/remote-source/app/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go:4469 +0x56
      github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster.(*HostedClusterReconciler).validateNetworks(0xc000748d80, 0xc00173b008)
      	/remote-source/app/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go:4318 +0x3d0
      github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster.(*HostedClusterReconciler).validateConfigAndClusterCapabilities(0xc000748d80, {0x462b268, 0xc0044836e0}, 0xc00173b008)
      	/remote-source/app/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go:3975 +0xbca
      github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster.(*HostedClusterReconciler).reconcile(0xc000748d80, {0x462b268, 0xc0044836e0}, {{{0xc002c82dc8, 0x8}, {0xc002c82dc0, 0x8}}}, {{0x462f370, 0xc004483710}, 0x0}, ...)
      	/remote-source/app/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go:840 +0x78cc
      github.com/openshift/hypershift/hypershift-operator/controllers/hostedcluster.(*HostedClusterReconciler).Reconcile(0xc000748d80, {0x462b268, 0xc0044836e0}, {{{0xc002c82dc8, 0x8}, {0xc002c82dc0, 0x8}}})
      	/remote-source/app/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go:313 +0x628
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0006f9540, {0x462b268, 0xc0044836e0}, {{{0xc002c82dc8, 0x8}, {0xc002c82dc0, 0x8}}})
      	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119 +0x1be
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006f9540, {0x462b268, 0xc0044836e0}, {0x3b29cc0, 0xc001e86dc0})
      	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316 +0x4cb
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006f9540, {0x462b2a0, 0xc000aca0a0})
      	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x405
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
      	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0xd1
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 288
      	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x9c7

      Expected results:

      The nodePort.address field should be marked as required in the CRD, or the UI/status of the HostedCluster should flag the missing value. The operator should not crash.
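
      As an illustration of the expected behaviour, the following is a minimal Go sketch of a nil-safe lookup that returns a validation error instead of dereferencing a missing nodePort section. The types and the nodePortAddress helper are hypothetical stand-ins written for this report; they are not the actual HyperShift API types or the real getNodePortIP implementation, whose source is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in types for illustration only. They loosely mirror the shape of the
// HostedCluster spec in this report and are NOT the real HyperShift API types.
type NodePortPublishingStrategy struct {
	Address string
}

type ServicePublishingStrategy struct {
	Type     string
	NodePort *NodePortPublishingStrategy // nil when the nodePort section is omitted
}

type ServicePublishingStrategyMapping struct {
	Service                   string
	ServicePublishingStrategy ServicePublishingStrategy
}

type HostedCluster struct {
	Services []ServicePublishingStrategyMapping
}

var errMissingNodePortAddress = errors.New("APIServer uses NodePort publishing but nodePort.address is not set")

// nodePortAddress (hypothetical helper) returns the APIServer NodePort address.
// It returns an error when the strategy is NodePort but the nodePort section
// or its address is missing, and ("", nil) when APIServer is not published
// via NodePort at all.
func nodePortAddress(hc *HostedCluster) (string, error) {
	if hc == nil {
		return "", errors.New("nil HostedCluster")
	}
	for _, svc := range hc.Services {
		if svc.Service != "APIServer" || svc.ServicePublishingStrategy.Type != "NodePort" {
			continue
		}
		np := svc.ServicePublishingStrategy.NodePort
		if np == nil || np.Address == "" {
			// Report the problem to the caller instead of panicking.
			return "", errMissingNodePortAddress
		}
		return np.Address, nil
	}
	return "", nil
}

func main() {
	// Mirrors the reproducer: NodePort requested, nodePort section omitted.
	hc := &HostedCluster{Services: []ServicePublishingStrategyMapping{
		{Service: "APIServer", ServicePublishingStrategy: ServicePublishingStrategy{Type: "NodePort"}},
		{Service: "OAuthServer", ServicePublishingStrategy: ServicePublishingStrategy{Type: "Route"}},
	}}
	if _, err := nodePortAddress(hc); err != nil {
		fmt.Println("validation error:", err)
	}
}
```

      In a reconciler, an error like this could be surfaced as a condition on the HostedCluster status rather than crashing the operator. How the real controller should report it is left open here.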

      Additional info:

      OCP 4.16.20 hosting cluster on bare metal

              Unassigned
              Gareth Ellner (rh-ee-gellner)
              Jie Zhao