OpenShift Bugs / OCPBUGS-32759

Cluster Sizing PlaceHolder Deployment can be created with invalid NodeAffinity


    • Hypershift Sprint 252, Hypershift Sprint 253

      Description of problem

      When provisioning an HCP on an MC with sizing enabled (and no existing HCPs), the HCP install can get stuck trying to schedule the kube-apiserver for the hosted control plane. It seems that the placeholder deployment cannot be created because of an empty selector value in the NodeAffinity:

      operator-56b7ccb598-4hqz4 operator {"level":"error","ts":"2024-04-23T13:35:42Z","msg":"Reconciler error","controller":"DedicatedServingComponentSchedulerAndSizer","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","HostedCluster":{"name":"dry3","namespace":"ocm-staging-2aqkcjamdtbcmjtp0lk1il3vo9hfd4n1"},"namespace":"ocm-staging-2aqkcjamdtbcmjtp0lk1il3vo9hfd4n1","name":"dry3","reconcileID":"0772c093-ceef-46c1-a450-6bc8184ba633","error":"failed to ensure placeholder deployment: Deployment.apps \"ocm-staging-2aqkcjamdtbcmjtp0lk1il3vo9hfd4n1-dry3\" is invalid: spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values: Required value: must be specified when `operator` is 'In' or 'NotIn'","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
      

      This appears to be due to the In selector of the nodeAffinity being populated with an empty list in the source: https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/scheduler/dedicated_request_serving_nodes.go#L704

      The value of unavailableNodePairs can be an empty list when no HCPs exist on the cluster yet, and therefore no nodes are labelled with both a cluster label and a serving pair label. In this case, the empty list is passed into the NodeAffinity match expression, and the API server rejects the Deployment with the error above.
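
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual hypershift code: the types are simplified local stand-ins for the Kubernetes node selector types, the label key `example.com/serving-pair` and the helper names `validateRequirement` and `appendSetRequirement` are hypothetical, and the validation function only mirrors the rule quoted in the error message. It shows one possible guard, which is to skip a set-based match expression when its value list is empty. Whether omitting the expression is the right behaviour depends on what the scheduler intends to match, so treat this as an illustration of the guard rather than the fix.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified, hypothetical stand-ins for the Kubernetes node selector types.
type NodeSelectorOperator string

const (
	OpIn    NodeSelectorOperator = "In"
	OpNotIn NodeSelectorOperator = "NotIn"
)

type NodeSelectorRequirement struct {
	Key      string
	Operator NodeSelectorOperator
	Values   []string
}

// validateRequirement mirrors the rule quoted in the error message:
// set-based operators ('In', 'NotIn') require at least one value.
func validateRequirement(r NodeSelectorRequirement) error {
	if (r.Operator == OpIn || r.Operator == OpNotIn) && len(r.Values) == 0 {
		return errors.New("values: Required value: must be specified when `operator` is 'In' or 'NotIn'")
	}
	return nil
}

// appendSetRequirement appends a set-based requirement only when there is
// at least one value, so an empty list never reaches the API server.
func appendSetRequirement(reqs []NodeSelectorRequirement, key string, op NodeSelectorOperator, values []string) []NodeSelectorRequirement {
	if len(values) == 0 {
		return reqs
	}
	return append(reqs, NodeSelectorRequirement{Key: key, Operator: op, Values: values})
}

func main() {
	// No HCPs exist yet, so the list of node pairs is empty.
	var unavailableNodePairs []string

	// Unguarded: the requirement is built with an empty value list and is invalid.
	unguarded := NodeSelectorRequirement{
		Key:      "example.com/serving-pair", // hypothetical label key
		Operator: OpIn,
		Values:   unavailableNodePairs,
	}
	fmt.Println("unguarded error:", validateRequirement(unguarded))

	// Guarded: the requirement is omitted entirely when the list is empty.
	guarded := appendSetRequirement(nil, "example.com/serving-pair", OpIn, unavailableNodePairs)
	fmt.Println("guarded requirement count:", len(guarded))
}
```

With an empty `unavailableNodePairs`, the unguarded requirement fails the check while the guarded path produces no match expression at all, so the placeholder Deployment spec would no longer carry the invalid field.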

              cewong@redhat.com Cesar Wong
              rh-ee-btroutma Brae Troutman
              Jie Zhao Jie Zhao