OCPBUGS-54801: Pods are in CrashLoopBackOff irrespective of namespaces

      Description of problem:

          Pods across multiple namespaces are repeatedly going into CrashLoopBackOff on the affected nodes.

      Version-Release number of selected component (if applicable):

          RHOCP 4.14

      How reproducible:

          

      Steps to Reproduce:

      ~~~
      sh-4.4# oc get pods -A -o wide | egrep -i "pending|error|crash"
      ibm-satellite-storage   satellite-storage-operator-695d8d5cdd-6rbf9   0/1   CrashLoopBackOff   4192 (22s ago)     13d   172.30.106.xx    mylocalnode65   <none>   <none>
      openshift-servicemesh   istio-cni-node-v2-6-g8l8n                     0/1   CrashLoopBackOff   8620 (4m14s ago)   27d   172.30.106.xx    mylocalnode65   <none>   <none>
      openshift-servicemesh   istio-cni-node-v2-6-grnzf                     0/1   CrashLoopBackOff   8456 (4m25s ago)   27d   172.30.181.xxx   mylocalnode66   <none>   <none>
      openshift-console       downloads-569f5c5d58-2pgqn                    0/1   CrashLoopBackOff   8217               25d   172.30.106.x     mylocalnode65   <none>   <none>
      ~~~
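      As a triage sketch (pod and namespace names taken from the listing above), the reason for each restart can be read from the container's last termination state and from the pod events:
      ~~~
      # Last termination reason for the crash-looping container (e.g. OOMKilled, Error):
      oc get pod satellite-storage-operator-695d8d5cdd-6rbf9 -n ibm-satellite-storage \
        -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'

      # Restart history, probe failures and events for the pod:
      oc describe pod satellite-storage-operator-695d8d5cdd-6rbf9 -n ibm-satellite-storage

      # Recent events in the namespace, oldest first:
      oc get events -n ibm-satellite-storage --sort-by=.lastTimestamp
      ~~~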
       
      
      Pod logs:
      ~~~
      2025-04-03T15:25:27Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache       {"error": "Timeout: failed waiting for *v1alpha1.ClusterServiceVersion Informer to sync"}
      2025-04-03T15:25:27Z    ERROR   controller-runtime.source.EventHandler  failed to get informer from cache       {"error": "Timeout: failed waiting for *v1.Infrastructure Informer to sync"}
      ~~~
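      The "failed waiting for ... Informer to sync" errors above typically mean the operator could not finish its initial list/watch against the API server before its own startup timeout; that can be a symptom of API server or etcd pressure, slow networking, or the operator container itself being starved, rather than a problem with the watched resources. As a hedged check of the control-plane side:
      ~~~
      # Health of the API-server-related cluster operators:
      oc get clusteroperators kube-apiserver openshift-apiserver

      # Aggregated API services that are not reporting Available=True:
      oc get apiservices | grep -v True
      ~~~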
      ____________________________________________________
      
      Node mylocalnode65:
      ------------------
      MEMORY
        Stats graphed as percent of MemTotal:
          MemUsed    ▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊.  97.8%
        RAM:
          41 GiB total ram
          40 GiB (98%) used
      ____________________________________________________
      ~~~
      sh-4.4# free -m
                    total        used        free      shared  buff/cache   available
      Mem:          41937       23945        3103          17       14888       17503
      Swap:             0           0           0
      ~~~
      ____________________________________________________ 
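      Note that the 97.8% "MemUsed" figure above appears to count buff/cache (MemTotal minus MemFree), while free -m reports roughly 17 GiB as "available"; what matters for eviction is the kubelet's own view of the node, which can be checked with something like the following sketch:
      ~~~
      # Node conditions reported by the kubelet (MemoryPressure, DiskPressure, PIDPressure, Ready):
      oc get node mylocalnode65 \
        -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}'

      # Memory capacity vs. allocatable as seen by the kubelet:
      oc get node mylocalnode65 \
        -o jsonpath='{.status.capacity.memory}{" capacity / "}{.status.allocatable.memory}{" allocatable"}{"\n"}'

      # Sum of pod resource requests/limits currently scheduled on the node:
      oc describe node mylocalnode65 | grep -A 12 'Allocated resources'
      ~~~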
      
      ~~~
      $ oc get --raw /apis/metrics.k8s.io/v1beta1/nodes/mylocalnode65 | jq .
      {
        "kind": "NodeMetrics",
        "apiVersion": "metrics.k8s.io/v1beta1",
        "metadata": {
          "name": "mylocalnode65",
          "creationTimestamp": "2025-04-03T15:22:57Z",
          "labels": {
            "arch": "amd64",
            "beta.kubernetes.io/arch": "amd64",
            "beta.kubernetes.io/instance-type": "upi",
            "beta.kubernetes.io/os": "linux",
      <<snip>>
        "timestamp": "2025-04-03T15:22:57Z",
        "window": "5m0s",
        "usage": {
          "cpu": "1274m",
          "memory": "24123244Ki"
        }
      }
      ~~~
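      The node-level usage above (roughly 23 GiB of 41 GiB over a 5m window) can be broken down per pod to see which workloads account for it; a sketch (flag support depends on the oc client version):
      ~~~
      # Per-pod memory usage cluster-wide, heaviest consumers first
      # (--sort-by=memory may not exist on older oc clients; sort the output manually if needed):
      oc adm top pods -A --sort-by=memory | head -20

      # All pods scheduled on the affected node:
      oc get pods -A -o wide --field-selector spec.nodeName=mylocalnode65
      ~~~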

      Actual results:

      Pods are going into CrashLoopBackOff even though free -m and the node metrics show memory still available on the node (roughly 17 GiB reported as available).

      Expected results:

       The kubelet should take the node's available memory into account and keep the pods in the Running state.
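      Whether node-level available memory is the right signal depends on where the kill originates: a container that exceeds its own memory limit is OOM-killed by the kernel even when the node has plenty of free memory, and the kubelet then restarts it into CrashLoopBackOff. As a sketch (pod name reused from the listing above), the declared limits and QoS class of an affected pod can be checked with:
      ~~~
      # Requests/limits declared by each container in the pod:
      oc get pod satellite-storage-operator-695d8d5cdd-6rbf9 -n ibm-satellite-storage \
        -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

      # QoS class assigned to the pod (Guaranteed / Burstable / BestEffort):
      oc get pod satellite-storage-operator-695d8d5cdd-6rbf9 -n ibm-satellite-storage \
        -o jsonpath='{.status.qosClass}{"\n"}'
      ~~~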

      Additional info:

          
