OpenShift Bugs / OCPBUGS-63213

Need to merge and backport priorityClassName fix to fix a gracefulShutdown bug


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.18
    • kube-apiserver
    • Quality / Stability / Reliability
    • Important
    • x86_64

      Description of problem:

      Based on the issue described here: https://github.com/kubernetes/kubernetes/issues/133442
      
      Setting the priorityClassName field on static pod definitions has no impact on the shutdown order, because the kubelet ignores the field for static pods. As a result, static pods are terminated in the first round of terminations rather than at the point specified by their priorityClassName.
      
      This renders the gracefulShutdown order almost useless for Single Node OpenShift (SNO), because we lose kube-apiserver and kube-etcd right away.
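
      The effect described above can be illustrated with a small, simplified model of priority-based shutdown ordering. This is only a sketch and not kubelet code: the Pod class, the shutdown_order function, and the threshold list are hypothetical names, loosely modeled on the kubelet's shutdownGracePeriodByPodPriority setting. It assumes that a pod whose numeric priority was never populated is treated as priority 0, which is the behavior the upstream issue reports for static pods.

```python
from dataclasses import dataclass
from typing import Optional

# Well-known value of the built-in system-node-critical PriorityClass.
SYSTEM_NODE_CRITICAL = 2000001000


@dataclass
class Pod:
    name: str
    priority_class_name: str = ""
    # Assumption: for static pods nothing resolves priority_class_name into
    # a numeric priority, so this stays None unless the manifest sets it.
    priority: Optional[int] = None


def shutdown_order(pods, thresholds):
    """Group pod names into shutdown rounds, lowest threshold first.

    thresholds is an ascending list of priority thresholds (hypothetical
    configuration for this sketch).
    """
    rounds = {t: [] for t in thresholds}
    for pod in pods:
        # An unset numeric priority is treated as 0 in this model.
        effective = pod.priority if pod.priority is not None else 0
        # Use the highest threshold that does not exceed the pod's priority;
        # pods below every threshold fall into the lowest round.
        eligible = [t for t in thresholds if t <= effective]
        bucket = max(eligible) if eligible else thresholds[0]
        rounds[bucket].append(pod.name)
    return [rounds[t] for t in thresholds]


pods = [
    Pod("workload", priority=0),
    # Only the class name is set, so the numeric priority is missing.
    Pod("kube-apiserver", priority_class_name="system-node-critical"),
    # Shown for contrast only: a numeric priority is present.
    Pod("etcd", priority_class_name="system-node-critical",
        priority=SYSTEM_NODE_CRITICAL),
]
rounds = shutdown_order(pods, [0, 2000000000])
```

      In this model, the kube-apiserver entry lands in the first round together with the ordinary workload, because only its class name is set. The etcd entry, which carries a numeric priority for contrast, lands in the later round.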
      
      I have opened the PRs below to fix this issue. Could we please merge and backport this fix as soon as possible? Our customers are facing very long shutdown times, shutdown hangs, and forced shutdowns that impact the storage layer in SNO environments because of this issue.
      
      https://github.com/openshift/cluster-kube-apiserver-operator/pull/1915
      https://github.com/openshift/cluster-etcd-operator/pull/1476
      https://github.com/openshift/cluster-kube-controller-manager-operator/pull/865
      https://github.com/openshift/cluster-kube-scheduler-operator/pull/572
      
      Describe the impact to you or the business
      Long shutdown times and storage layer problems caused by forceful termination, because graceful termination is not being respected.
      
      In what environment are you experiencing this behavior?
      All SNO environments
      
      How frequently does this behavior occur? Does it occur repeatedly or at certain times?
      Every shutdown 

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              Unassigned
              rhn-support-nchoudhu Novonil Choudhuri
              Ke Wang