OCPBUGS-16214

Kubelet restart causes the node to hang in NotReady on MNO

    • Critical
    • No
    • OCPNODE Sprint 239 (Blue), OCPNODE Sprint 240 (Blue)
    • 2
    • Rejected
    • False
      8/22: reporter re-test OK, formal QE verification pending; Green
      8/15: pending confirmation this is a dup of OCPBUGS-14918, then close

      Description of problem:

      Restarting the kubelet on a vanilla MNO OCP 4.14 cluster causes the node to hang in the NotReady state.

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      always

      Steps to Reproduce:

      1. Deploy a 4.14 MNO cluster.
      2. Restart the kubelet on a node: sudo systemctl restart kubelet

      Actual results:

      The node on which the kubelet was restarted is stuck in NotReady and does not recover.

      Expected results:

      The node should recover within a short time (about 10 seconds).
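
      The recovery check can be scripted. The following is a minimal sketch, not part of the original report: it assumes the `oc` CLI is installed and logged in to the cluster, and the helper names (node_is_ready, fetch_node, wait_until_ready) and the node name in the usage note are hypothetical. It polls the Node object's Ready condition and reports how long the node took to become Ready, or None if it never did within the timeout.

```python
import json
import subprocess
import time


def node_is_ready(node: dict) -> bool:
    """Return True if the Node object's Ready condition has status "True"."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond.get("status") == "True"
    return False  # no Ready condition reported at all


def fetch_node(name: str) -> dict:
    """Fetch one Node as JSON via the oc CLI (assumes an active login)."""
    out = subprocess.run(
        ["oc", "get", "node", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def wait_until_ready(name, timeout_s=60.0, poll_s=2.0, fetch=fetch_node):
    """Poll until the node is Ready; return elapsed seconds, or None on timeout."""
    start = time.monotonic()
    while True:
        if node_is_ready(fetch(name)):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(poll_s)
```

      For example, after running sudo systemctl restart kubelet on a node, calling wait_until_ready("worker-0", timeout_s=60) from a machine with cluster access would return None in the failing case described here. A Ready result seen immediately after the restart may reflect status that has not yet been updated, so it is worth confirming the node first transitions away from Ready before timing the recovery.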

      Additional info:

      The behavior does not reproduce on every 4.13 build; it was reproduced on 4.13.0-0.nightly-2023-06-15-044404.
      kubelet version: v1.27.3+af29f64.
      Encountered on OCP version: 4.14.0-0.nightly-2023-07-11-092038.
      This behavior was not encountered on SNO. 

       

            team-mco Team MCO
            rhn-support-shajmakh Shereen Haj
            Sunil Choudhary Sunil Choudhary