OpenShift Bugs / OCPBUGS-18697

kube-apiserver-operator does not re-create guard pods when preempted

      Description of problem:

      When a kube-apiserver-guard pod is evicted due to preemption, it stays in "Completed" status forever, even after the condition that caused the eviction has gone away.
      
      With the pod stuck in "Completed" state, the corresponding PodDisruptionBudget (PDB) wrongly concludes that the kube-apiserver pod on that node is unhealthy and needlessly blocks node drains that should be allowed.
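      The drain blockage follows from the disruption arithmetic Kubernetes applies to a PDB: disruptionsAllowed = max(0, currentHealthy - desiredHealthy). The Python sketch below is illustrative only; the node names, the three-node control plane, and minAvailable=2 are assumptions, not values taken from this report or from the operator's actual PDB.

```python
# Sketch of the PodDisruptionBudget arithmetic behind the blocked drains.
# Follows the Kubernetes rule disruptionsAllowed = max(0, currentHealthy - desiredHealthy).
# ASSUMPTION: 3 control-plane nodes and a guard PDB with minAvailable=2;
# the values the operator actually configures may differ.
from dataclasses import dataclass


@dataclass
class GuardPod:
    node: str
    ready: bool  # a guard pod stuck in "Completed" is never Ready


def disruptions_allowed(pods: list[GuardPod], min_available: int) -> int:
    """How many more guard pods may be voluntarily evicted (e.g. by a drain)."""
    current_healthy = sum(1 for pod in pods if pod.ready)
    return max(0, current_healthy - min_available)


pods = [GuardPod("master-0", True), GuardPod("master-1", True), GuardPod("master-2", True)]
print(disruptions_allowed(pods, min_available=2))  # 1: one node may be drained

pods[2].ready = False  # guard pod on master-2 preempted and stuck in "Completed"
print(disruptions_allowed(pods, min_available=2))  # 0: evictions are refused, drains stall
```

      A single stuck guard pod is enough to use up the whole budget, so draining any of the other control-plane nodes stalls even though nothing on them is unhealthy.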
      
      Manually deleting the "Completed" pod works around the problem. However, the kube-apiserver-operator should do this automatically; users should not be expected to know OCP internals in such depth in order to do it themselves.
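      The automatic behaviour requested here amounts to one extra step in the operator's reconcile loop: find guard pods that have terminated and replace them once their node can run them again. The Python sketch below only models that decision in memory; it is not the operator's Go code or the actual fix. Treating both "Succeeded" and "Failed" phases as the stuck "Completed" pod, the node_ready map, and the pod names are all hypothetical.

```python
# Hypothetical sketch of the reconcile behaviour requested in this report.
# ASSUMPTIONS: pods are plain records, not Kubernetes API objects; a pod in a
# terminal phase ("Succeeded" or "Failed") stands in for the stuck "Completed"
# guard pod; node_ready says whether the node would admit the pod again.
from dataclasses import dataclass, replace

TERMINAL_PHASES = {"Succeeded", "Failed"}


@dataclass(frozen=True)
class PodState:
    name: str
    node: str
    phase: str


def reconcile(pods: list[PodState], node_ready: dict[str, bool]) -> list[PodState]:
    """Return the desired guard pods after one reconcile pass.

    A terminal guard pod on a ready node is replaced by a fresh Pending pod
    with the same name (a real controller would delete and re-create it).
    Terminal pods on nodes that are not ready are left for a later pass.
    """
    desired = []
    for pod in pods:
        if pod.phase in TERMINAL_PHASES and node_ready.get(pod.node, False):
            desired.append(replace(pod, phase="Pending"))
        else:
            desired.append(pod)
    return desired


pods = [
    PodState("kube-apiserver-guard-master-0", "master-0", "Running"),
    PodState("kube-apiserver-guard-master-1", "master-1", "Failed"),
]
for pod in reconcile(pods, {"master-0": True, "master-1": True}):
    print(pod.name, pod.phase)
# kube-apiserver-guard-master-0 Running
# kube-apiserver-guard-master-1 Pending
```

      Gating the replacement on node readiness reflects the report's expectation that the pod comes back "once the node allows it", rather than being re-created in a tight loop while the preempting condition still holds.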
      

      Version-Release number of selected component (if applicable):

      4.13.9
      

      How reproducible:

      Sometimes (at random)
      

      Steps to Reproduce:

      1. Wait for a situation in which a kube-apiserver-guard pod is preempted.
      2. Observe that the guard pod remains in "Completed" state and is never re-created.
      

      Actual results:

      The kube-apiserver-guard pod stays in "Completed" state forever unless the manual workaround is applied.
      

      Expected results:

      The kube-apiserver-guard pod should eventually be restarted and run normally once the node allows it, without any manual workaround.
      

      Additional info:

      A similar situation has also been found with the etcd-quorum-guard pod, but we have better data for this one. Once the fix for this bug is on track, I may open a second bug for the etcd team so they can apply a similar fix.
      

              Jan Chaloupka (jchaloup@redhat.com)
              Pablo Alonso Rodriguez (rhn-support-palonsor)
              Ke Wang