OCPBUGS-20105 (OpenShift Bugs)

HyperShift Operator does not guarantee that two nodes are labeled as request serving nodes for an HCP

    • Moderate
    • No
    • Hypershift Sprint 243
    • 1
    • Proposed
    • False
    • Release Note Not Required
    • In Progress

      Description of problem:

      The HyperShift Operator does not guarantee that two request serving nodes are labeled with the HCP's namespace-name. The likely cause is that it labels the nodes initially and then does not notice when those nodes are deleted by something else.

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create an HCP with dedicated request serving nodes
      2. Delete one of the request serving nodes (via deleting the node directly or its machine)
      3. Observe that the replacement node does not have the required label for scheduling its request-serving pods

      Actual results:

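      HCPs can exist with fewer than two nodes labeled with the HCP's name, which leaves a kube-apiserver pod unschedulable. In the example below only one node carries the label, and one of the two kube-apiserver pods is stuck in Pending: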

      ❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012
      NAME                                        STATUS   ROLES    AGE   VERSION
      ip-10-0-34-188.us-east-2.compute.internal   Ready    worker   9h    v1.27.6+1648878
      ❯ k get po -n ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012 -lapp=kube-apiserver -owide   
      NAME                             READY   STATUS    RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
      kube-apiserver-54854bcb7-v88dq   0/5     Pending   0          151m   <none>         <none>                                      <none>           <none>
      kube-apiserver-54854bcb7-x5jqt   5/5     Running   0          3h2m   10.128.236.6   ip-10-0-34-188.us-east-2.compute.internal   <none>           <none>
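
The Pending pod above can also be picked out programmatically. The following is a minimal sketch, not part of the operator: it assumes the pod list has been fetched as JSON (for example with `-o json` instead of `-owide`) and parsed into a dict. The function name `pending_pods` and the sample data are hypothetical and only mirror the output shown above.

```python
def pending_pods(pod_list: dict) -> list[str]:
    """Return the names of pods whose phase is Pending in a parsed PodList."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") == "Pending"
    ]


# Synthetic PodList mirroring the two kube-apiserver pods in this report.
sample_pods = {
    "items": [
        {"metadata": {"name": "kube-apiserver-54854bcb7-v88dq"},
         "status": {"phase": "Pending"}},
        {"metadata": {"name": "kube-apiserver-54854bcb7-x5jqt"},
         "status": {"phase": "Running"}},
    ]
}

print(pending_pods(sample_pods))
```

With real data, the dict would come from `json.load(sys.stdin)` fed by `kubectl get po -n <namespace> -l app=kube-apiserver -o json`.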
      

      Expected results:

      Every HCP has two nodes labeled with the HCP's name

      ❯ k get po -n ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017 -l app=kube-apiserver -owide
      NAME                            READY   STATUS    RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
      kube-apiserver-5f85cd4b-l57qr   5/5     Running   0          169m   10.128.218.6   ip-10-0-114-35.us-east-2.compute.internal   <none>           <none>
      kube-apiserver-5f85cd4b-lqfsx   5/5     Running   0          169m   10.128.129.6   ip-10-0-59-232.us-east-2.compute.internal   <none>           <none>
      ❯ k get no -lhypershift.openshift.io/cluster=ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017
      NAME                                        STATUS   ROLES    AGE    VERSION
      ip-10-0-114-35.us-east-2.compute.internal   Ready    worker   24h    v1.27.6+1648878
      ip-10-0-59-232.us-east-2.compute.internal   Ready    worker   5d2h   v1.27.6+1648878
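
The expected invariant (two labeled nodes per HCP) can be checked across several HCPs at once. This is a rough sketch under stated assumptions, not the operator's reconciliation logic: it assumes a NodeList parsed from `kubectl get no -o json`, and that the value of the `hypershift.openshift.io/cluster` label equals the HCP name, as in the commands above. The names `under_labeled_hcps`, `CLUSTER_LABEL`, and `REQUIRED_NODES` are hypothetical.

```python
from collections import Counter

CLUSTER_LABEL = "hypershift.openshift.io/cluster"  # label key used in this report
REQUIRED_NODES = 2  # expected number of labeled request serving nodes per HCP


def under_labeled_hcps(node_list: dict, hcp_names: list[str]) -> dict[str, int]:
    """Map each HCP with fewer than REQUIRED_NODES labeled nodes to its node count."""
    counts = Counter(
        (item.get("metadata", {}).get("labels") or {}).get(CLUSTER_LABEL)
        for item in node_list.get("items", [])
    )
    return {
        name: counts[name]
        for name in hcp_names
        if counts[name] < REQUIRED_NODES
    }


# Synthetic NodeList mirroring the nodes shown in this report.
HCP_0012 = "ocm-staging-26ljge23ub1112ve884u0opvkj2c4lpc-perf-rhcp-0012"
HCP_0017 = "ocm-staging-26ljip0ck3d2i1bejp2sipio4okhgttn-perf-rhcp-0017"
sample_nodes = {
    "items": [
        {"metadata": {"name": "ip-10-0-34-188.us-east-2.compute.internal",
                      "labels": {CLUSTER_LABEL: HCP_0012}}},
        {"metadata": {"name": "ip-10-0-114-35.us-east-2.compute.internal",
                      "labels": {CLUSTER_LABEL: HCP_0017}}},
        {"metadata": {"name": "ip-10-0-59-232.us-east-2.compute.internal",
                      "labels": {CLUSTER_LABEL: HCP_0017}}},
    ]
}

print(under_labeled_hcps(sample_nodes, [HCP_0012, HCP_0017]))
```

The HCP names are passed in explicitly so that an HCP with zero labeled nodes is still reported, which a simple grouping of nodes by label would miss.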

      Additional info:

       

              Alberto Garcia Lamela (agarcial@redhat.com)
              Michael Shen (mshen.openshift, inactive)
              Jie Zhao