OCPBUGS-23796 (OpenShift Bugs)

Not possible to drain a master node after multiple master nodes experience network disruption


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • Affects Version/s: 4.15.0
    • Fix Version/s: 4.15.0
    • Component/s: kube-apiserver
    • Release Note Type: Release Note Not Required

      Description of problem:

      - Upgrade the cluster.
      - Two or more kube-apiserver pods do not come back online. Network access can be lost because of a misconfiguration or a faulty RHEL update. This can be simulated on two or more master nodes:
          ssh into the node
          run: iptables -A INPUT -p tcp --destination-port 6443 -j DROP
      - Two or more kube-apiserver-guard pods lose readiness.
      - The kube-apiserver-guard-pdb PDB blocks the node drain because status.currentHealthy is less than status.desiredHealthy.
      - The node cannot be drained without bypassing the eviction API (that is, forcefully deleting the guard pods).
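
      The blocking condition can be modelled in a few lines. This is a simplified Python sketch, not the Kubernetes eviction code: it assumes a three-master cluster whose guard PDB has desiredHealthy = 2 (hypothetical values; the report does not state the budget), considers only pods in phase Running, and compares the default unhealthyPodEvictionPolicy, IfHealthyBudget, with AlwaysAllow. Both policy names are real PodDisruptionBudget values, but the report does not say which mechanism the eventual fix used. PdbStatus and can_evict are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class PdbStatus:
    """Simplified, hypothetical view of a PodDisruptionBudget status."""
    current_healthy: int  # status.currentHealthy: ready pods covered by the PDB
    desired_healthy: int  # status.desiredHealthy: minimum number of ready pods

    @property
    def disruptions_allowed(self) -> int:
        # status.disruptionsAllowed is clamped at zero
        return max(0, self.current_healthy - self.desired_healthy)


def can_evict(status: PdbStatus, pod_ready: bool,
              policy: str = "IfHealthyBudget") -> bool:
    """Return True if evicting one Running pod would be admitted (simplified model)."""
    if pod_ready:
        # Evicting a ready pod consumes one allowed disruption under either policy.
        return status.disruptions_allowed > 0
    if policy == "AlwaysAllow":
        # Running-but-unready pods may always be evicted.
        return True
    if policy == "IfHealthyBudget":
        # Unready pods may be evicted only while the budget itself is satisfied.
        return status.current_healthy >= status.desired_healthy
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    # Hypothetical: 3 guard pods, desiredHealthy = 2, two guard pods unready.
    status = PdbStatus(current_healthy=1, desired_healthy=2)
    print(status.disruptions_allowed)                                # 0
    print(can_evict(status, pod_ready=False))                        # False
    print(can_evict(status, pod_ready=False, policy="AlwaysAllow"))  # True
```

      With these hypothetical numbers, disruptionsAllowed is 0 and an unready guard pod cannot be evicted under IfHealthyBudget, which is consistent with the "would violate the pod's disruption budget" error under Actual results. Under AlwaysAllow the same unready pod would be admitted for eviction, while ready guard pods would remain protected by the budget.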
      

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Always (100%)

      Steps to Reproduce:

      See the description above.

      Actual results:

      evicting pod openshift-kube-apiserver/kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal
          error when evicting pods/"kube-apiserver-guard-ip-10-0-19-181.eu-north-1.compute.internal" -n "openshift-kube-apiserver" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      

      Expected results:

      The unready guard pods can be evicted, so the node can be drained.

      Additional info:


            Assignee: Filip Krepinsky (fkrepins@redhat.com)
            Reporter: Filip Krepinsky (fkrepins@redhat.com)
            QA Contact: Ke Wang