OpenShift Bugs / OCPBUGS-58265

The OS was in a hung state and no logs were generated while the node was NotReady

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Normal
    • 4.16.z
    • Node / Kubelet
    • Quality / Stability / Reliability

      Description of problem:

      The OS was in a hung state and no logs were generated while the node was NotReady.

      Version-Release number of selected component (if applicable):

      The customer is running OCP 4.16.

      How reproducible:

          The nodes are hosted on the z/VM hypervisor.
      
      We have seen multiple occurrences of the issue where a node went into the NotReady state. We are not able to SSH to or ping the affected nodes. However, the z/VM team is able to see the node's login prompt through the console.
      
      - We have not been able to find exact reproduction steps, or to determine when and how this happens.
      - We faced this issue on Feb 17, 2025, so, following the support team's recommendations, we configured kdump on the cluster nodes.
      - Two nodes died again on April 25 and 26 (the reference case is linked). kdump was configured on the cluster, but we could not find any logs explaining why the nodes stopped logging.
      - The same incident happened again on June 24, this time on a different cluster where kdump is not configured.
      - We checked the memory and CPU utilization of the node at the time of the issue, and both were stable.
      
      We request engineering help and guidance to investigate this issue further and to determine how we can find the root cause if something similar happens again. Because the issue keeps recurring, the customer is looking for a solution so that we can mitigate it.
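
      To help pin down when a node stopped reporting in a future occurrence, the node conditions can be captured while the node is still NotReady. The following is a minimal sketch, not a supported tool: it assumes the output of `oc get nodes -o json` is available as a string, the function name `not_ready_nodes` is hypothetical, and the node names in the sample are made up for illustration.

```python
import json


def not_ready_nodes(node_list_json):
    """List nodes whose Ready condition is not "True".

    Returns tuples of (name, status, lastHeartbeatTime, lastTransitionTime).
    A status of "Unknown" usually means the kubelet stopped posting status.
    """
    rows = []
    for node in json.loads(node_list_json).get("items", []):
        name = node["metadata"]["name"]
        for cond in node.get("status", {}).get("conditions", []):
            if cond.get("type") == "Ready" and cond.get("status") != "True":
                rows.append((
                    name,
                    cond.get("status"),
                    cond.get("lastHeartbeatTime"),
                    cond.get("lastTransitionTime"),
                ))
    return rows


# Hypothetical sample data in the shape of `oc get nodes -o json`.
sample = json.dumps({"items": [
    {"metadata": {"name": "worker-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "worker-b"},
     "status": {"conditions": [{
         "type": "Ready", "status": "Unknown",
         "lastHeartbeatTime": "2025-06-24T01:02:03Z",
         "lastTransitionTime": "2025-06-24T01:03:00Z"}]}},
]})

for row in not_ready_nodes(sample):
    print(*row, sep="\t")
```

      The last heartbeat time gives an approximate lower bound for when the node stopped responding, which can then be compared with the hypervisor console and monitoring data.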
      
      
      
      

      Steps to Reproduce:

      There are no exact steps to reproduce the issue; it happens randomly.
          

      Actual results:

          The node OS does not generate any logs while it is in the hung state.
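
      One way to quantify the missing logs after the node is recovered is to look for the largest gap between consecutive journal entries. The following is a rough sketch under stated assumptions: it assumes the journal is persistent across the reboot and has been exported with `journalctl -o json` (one JSON object per line, with `__REALTIME_TIMESTAMP` in microseconds since the epoch). The function name `largest_gap` and the sample timestamps are hypothetical.

```python
import json


def largest_gap(journal_lines):
    """Find the largest gap between consecutive journal entries.

    journal_lines: iterable of JSON lines as produced by `journalctl -o json`.
    Returns (gap_seconds, before_us, after_us), or None if fewer than two
    entries are present.
    """
    stamps = sorted(
        int(json.loads(line)["__REALTIME_TIMESTAMP"])
        for line in journal_lines
        if line.strip()
    )
    if len(stamps) < 2:
        return None
    gap, before, after = max(
        (b - a, a, b) for a, b in zip(stamps, stamps[1:])
    )
    return gap / 1_000_000, before, after


# Hypothetical sample: three entries, with a 60-second silence before the last.
sample_lines = [
    '{"__REALTIME_TIMESTAMP": "1000000", "MESSAGE": "a"}',
    '{"__REALTIME_TIMESTAMP": "2000000", "MESSAGE": "b"}',
    '{"__REALTIME_TIMESTAMP": "62000000", "MESSAGE": "c"}',
]
print(largest_gap(sample_lines))
```

      The entries immediately before the gap are the last messages written before logging stopped, and are a natural starting point for the investigation.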

      Expected results:

          The node OS should generate logs so that we can find the cause of the issue.

      Additional info:

          The SOS report for April 25 & 26 is already available on case 04126850.

       

      Please let us know if any additional details are required. 

              sgrunert@redhat.com Sascha Grunert
              rh-ee-aharchin Akhil Harchinder (Inactive)
              Min Li Min Li
