OpenShift Bugs / OCPBUGS-32379

Placeholder pods for warm-up nodes can be scheduled incorrectly, causing deployments to be stuck

Details

    • Bug
    • Resolution: Obsolete
    • Undefined
    • None
    • 4.16.0
    • HyperShift
    • Critical
    • No
    • Hypershift Sprint 252, Hypershift Sprint 253
    • 2
    • Proposed
    • False

    Description

      Description of problem:

         When cluster size tagging is in use and placeholder pods are configured to keep nodes warm, the pods of 2 placeholder deployments can sometimes be scheduled in different zones but on the same subnet pair (1 pod from each deployment), so that neither deployment ever finishes rolling out.

      Version-Release number of selected component (if applicable):

          4.16.0 - main

      How reproducible:

      Sometimes    

      Steps to Reproduce:

          1. Set up a management cluster with request-serving machinesets.
          2. Configure clustersizingconfiguration to have at least 2 pairs of warm nodes for the smallest size.
          

      Actual results:

          For a given pair of deployments, 1 pod of each deployment is scheduled and running, while the other pod stays in Pending.

      Expected results:

          For a given pair of deployments, all pods are scheduled.

      Additional info:

            Example state:
      
      $ oc get pods -n hypershift-request-serving-node-placeholders
      NAME                                   READY   STATUS    RESTARTS   AGE
      placeholder-small-0-7b696687d8-cnf6r   0/1     Pending   0          4h59m
      placeholder-small-0-7b696687d8-qb2bz   1/1     Running   0          4h59m
      placeholder-small-1-575fc487df-8rgxd   0/1     Pending   0          4h59m
      placeholder-small-1-575fc487df-9bt7n   1/1     Running   0          4h59m
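
      The listing above can be checked mechanically. The following is a minimal sketch, not part of HyperShift: it parses the `oc get pods` output copied from this report and lists deployments that have both a Running and a Pending pod. The helper names are hypothetical, and the deployment name is derived by assuming the usual <deployment>-<replicaset-hash>-<pod-suffix> pod naming.

```python
from collections import defaultdict

# Output copied verbatim from the report.
OC_OUTPUT = """\
NAME                                   READY   STATUS    RESTARTS   AGE
placeholder-small-0-7b696687d8-cnf6r   0/1     Pending   0          4h59m
placeholder-small-0-7b696687d8-qb2bz   1/1     Running   0          4h59m
placeholder-small-1-575fc487df-8rgxd   0/1     Pending   0          4h59m
placeholder-small-1-575fc487df-9bt7n   1/1     Running   0          4h59m
"""


def deployment_of(pod_name):
    """Strip the last two dash-separated segments of the pod name.

    Assumption: pods are named <deployment>-<replicaset-hash>-<pod-suffix>.
    """
    return pod_name.rsplit("-", 2)[0]


def status_by_deployment(text):
    """Count pod statuses per deployment from `oc get pods` text output."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, _ready, status = line.split()[:3]
        counts[deployment_of(name)][status] += 1
    return {dep: dict(statuses) for dep, statuses in counts.items()}


def stuck_deployments(text):
    """Return deployments that have both Running and Pending pods."""
    return sorted(
        dep
        for dep, statuses in status_by_deployment(text).items()
        if statuses.get("Running") and statuses.get("Pending")
    )


if __name__ == "__main__":
    print(stuck_deployments(OC_OUTPUT))
```

      A deployment that is merely mid-rollout would also match for a short time, so a check like this is only meaningful when the state persists, as it does here after 4h59m.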
      
      The running pods from small-0 and small-1 are scheduled on nodes that carry the same value of the osd-fleet-manager.openshift.io/paired-nodes label, so the remaining Pending pods will never be scheduled.
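
      One way to confirm this condition is to group the scheduled placeholder pods by the paired-nodes label of their nodes and flag any pair that hosts more than one deployment. The sketch below is illustrative only: the node names and the pair value are hypothetical (the report does not give them), `shared_pairs` is not an existing tool, and the deployment name is again derived by assuming <deployment>-<replicaset-hash>-<pod-suffix> pod naming.

```python
from collections import defaultdict

PAIR_LABEL = "osd-fleet-manager.openshift.io/paired-nodes"


def shared_pairs(pod_to_node, node_labels):
    """Map each pair-label value to the deployments whose pods run on it,
    keeping only pairs that host pods of more than one deployment."""
    deployments_by_pair = defaultdict(set)
    for pod, node in pod_to_node.items():
        pair = node_labels.get(node, {}).get(PAIR_LABEL)
        if pair is None:
            continue  # node has no pair label; ignore it
        # Assumption: <deployment>-<replicaset-hash>-<pod-suffix> naming.
        deployments_by_pair[pair].add(pod.rsplit("-", 2)[0])
    return {
        pair: sorted(deps)
        for pair, deps in deployments_by_pair.items()
        if len(deps) > 1
    }


# Pod names come from the report; node names and pair values are hypothetical.
pod_to_node = {
    "placeholder-small-0-7b696687d8-qb2bz": "node-a",
    "placeholder-small-1-575fc487df-9bt7n": "node-b",
}
node_labels = {
    "node-a": {PAIR_LABEL: "pair-1"},
    "node-b": {PAIR_LABEL: "pair-1"},
    "node-c": {PAIR_LABEL: "pair-2"},
    "node-d": {PAIR_LABEL: "pair-2"},
}

if __name__ == "__main__":
    print(shared_pairs(pod_to_node, node_labels))
```

      The inputs could be collected by hand, for example from `oc get pods -o wide` and `oc get nodes --show-labels`. An empty result means every pair hosts at most one deployment.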

          People

            cewong@redhat.com Cesar Wong
            cewong.openshift Cesar Wong
            Jie Zhao