OpenShift Bugs / OCPBUGS-12638

Node drain fails during reboot because a pod is stuck in Terminating state (OCP 4.10)


      Description of problem:

      IHAC (I have a customer) who is referring to https://bugzilla.redhat.com/show_bug.cgi?id=1943564 and https://bugzilla.redhat.com/show_bug.cgi?id=1903228. Both bugs were fixed in OCP 4.9, but the customer is still facing the issue.

      The customer runs Compliance Operator scans, which involve a node reboot. During the reboot, the node drain stalled because one of the pods stayed in the `Terminating` state.

        master: 'pool is degraded because nodes fail with "1 nodes are reporting degraded
            status on sync": "Node master3.ocp4.XXX.mcp is reporting: \"failed to drain
            node : master3.ocp4.XXX.mcp after 1 hour\""'
      
      
      2023-03-04T03:39:22.686744936Z I0304 03:39:22.686701    4844 drain.go:91] Draining failed with: error when waiting for pod "container-registry-79648798c4-9m68m" terminating: global timeout reached: 1m30s, retrying
      2023-03-04T03:41:54.938201702Z I0304 03:41:54.938161    4844 drain.go:91] Draining failed with: error when waiting for pod "container-registry-79648798c4-9m68m" terminating: global timeout reached: 1m30s, retrying
      

      The customer does not accept the workaround and is looking for a permanent fix.

      How reproducible:

      
      

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      Node drain fails when pods on the node are stuck in the Terminating state.
      

      Expected results:

      The drain should remove the pod stuck in the Terminating state and complete.
      

      Additional info:

      I will share the sos report and must-gather soon.
      

            pehunt@redhat.com Peter Hunt
            rhn-support-psingour Poornima Singour
            David Darrah
            Votes: 1
            Watchers: 4