OpenShift Bugs / OCPBUGS-22899

Self-managed HCP pods are scheduled on single mgmt cluster node when no zones are in use

Details

    • No
    • Hypershift Sprint 246, Hypershift Sprint 247
    • 2
    • False

    Description

      Description of problem:

      
      In the self-managed HCP use case, if the nodes of the on-premise bare-metal management cluster are not labeled with the "topology.kubernetes.io/zone" key, all HCP pods for a highly available (HA) cluster are scheduled onto a single mgmt cluster node.
      
      This is a result of the way the affinity rules are constructed.
      
      Take the pod affinity/anti-affinity example below, which is generated for an HA HCP cluster. If the "topology.kubernetes.io/zone" label does not exist on the mgmt cluster nodes, the pod still gets scheduled, but the podAntiAffinity rule is effectively ignored. That seems odd given that the rule uses "requiredDuringSchedulingIgnoredDuringExecution", but I have tested this, and the rule truly is ignored if the topologyKey label is not present on the nodes.
      
      
              podAffinity: 
                preferredDuringSchedulingIgnoredDuringExecution: 
                - podAffinityTerm: 
                    labelSelector: 
                      matchLabels: 
                        hypershift.openshift.io/hosted-control-plane: clusters-vossel1
                    topologyKey: kubernetes.io/hostname
                  weight: 100
              podAntiAffinity: 
                requiredDuringSchedulingIgnoredDuringExecution: 
                - labelSelector: 
                    matchLabels: 
                      app: kube-apiserver
                      hypershift.openshift.io/control-plane-component: kube-apiserver
                  topologyKey: topology.kubernetes.io/zone
      
      If no zones are configured for the bare-metal mgmt cluster, the only remaining rule is the preferred podAffinity rule, which actually colocates the pods of a hosted control plane on the same node. As a result, an HA HCP has all of its etcd, kube-apiserver, and other component pods scheduled to a single node.
      

      Version-Release number of selected component (if applicable):

      4.14
      
      
      

      How reproducible:

      100%
      
      

      Steps to Reproduce:

      1. Create a self-managed HA HCP cluster on a mgmt cluster with nodes that lack the "topology.kubernetes.io/zone" label
      
      

      Actual results:

      All HCP pods are scheduled to a single node.
      
      

      Expected results:

      HCP pods should always be spread across multiple nodes.
      
      

      Additional info:

      
      One way to address this is to add another anti-affinity rule that prevents each component from being scheduled on the same node as its other replicas.
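
      As a rough sketch of that idea (not the actual fix, which may differ), the kube-apiserver anti-affinity from the description could gain a second term keyed on "kubernetes.io/hostname". The second term below is an assumption for illustration; it reuses the label selector from the generated example above. Whether the extra term should be required or preferred is also an open choice: a required term leaves replicas Pending when the mgmt cluster has fewer schedulable nodes than replicas.

      ```yaml
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        # Existing term from the generated pod spec: spread replicas across zones.
        - labelSelector:
            matchLabels:
              app: kube-apiserver
              hypershift.openshift.io/control-plane-component: kube-apiserver
          topologyKey: topology.kubernetes.io/zone
        # Assumed additional term (illustrative only): keep replicas of the same
        # component off the same node, so spreading does not depend on zone labels.
        - labelSelector:
            matchLabels:
              app: kube-apiserver
              hypershift.openshift.io/control-plane-component: kube-apiserver
          topologyKey: kubernetes.io/hostname
      ```

      The same pattern would apply per component (etcd and so on), each with its own label selector.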
      
      

            People

              sjenning Seth Jennings
              rhn-engineering-dvossel David Vossel
              Liangquan Li Liangquan Li