OpenShift Bugs / OCPBUGS-1681

Worker nodes become NotReady when load is put on the ARM cluster.

    • Bug
    • Resolution: Not a Bug
    • Normal
    • None
    • 4.12
    • Multi-Arch / ARM
    • None
    • Moderate
    • False

      Description of problem:

      When we put load on the ARM cluster using the clusterbuster utility, worker nodes become NotReady.

      Version-Release number of selected component (if applicable):

      4.11.0-0.nightly-arm64-2022-09-19-125802 with profile 10_aarch64_Disconnected UPI on AWS & EFS

      How reproducible:

      Always

      Steps to Reproduce:

      1. Put load on the cluster with the clusterbuster utility, creating namespaces in the range 1 to 8.

      ./OpenShift4-tools/clusterbuster -P server -b 5 -p 10 -D .01 -M 2 -N 2 -r 4 -d 2 -c 10 -m 1000 -v -x

      2. Check that all clusterbuster pods are running.
      3. Check that all worker nodes are Ready.
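
Steps 2 and 3 can be checked from the command line. The following is a minimal sketch, assuming a logged-in `oc` session; `not_ready_nodes` and `pods_not_running` are hypothetical helper names, not part of clusterbuster or `oc`. Both helpers only filter text piped into them, so the pod check covers whatever namespaces the `oc get pods` call selects.

```shell
#!/usr/bin/env bash

# Print the names of nodes whose STATUS is not "Ready".
# Reads `oc get node --no-headers` output on stdin
# (columns: NAME STATUS ROLES AGE VERSION).
not_ready_nodes() {
  awk '{
    split($2, status, ",")            # "Ready,SchedulingDisabled" -> "Ready"
    if (status[1] != "Ready") print $1
  }'
}

# Print "namespace/name status" for pods that are neither Running nor Completed.
# Reads `oc get pods -A --no-headers` output on stdin
# (columns: NAMESPACE NAME READY STATUS RESTARTS AGE).
pods_not_running() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage against a live cluster:
#   oc get node --no-headers | not_ready_nodes
#   oc get pods -A --no-headers | pods_not_running
```

An empty result from both commands means every listed node is Ready and every listed pod is Running or Completed.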

      Actual results:

      oc get node
      NAME                                         STATUS     ROLES    AGE   VERSION
      ip-10-0-154-168.us-west-2.compute.internal   Ready      worker   62m   v1.24.0+3882f8f
      ip-10-0-154-217.us-west-2.compute.internal   Ready      master   73m   v1.24.0+3882f8f
      ip-10-0-170-173.us-west-2.compute.internal   Ready      master   73m   v1.24.0+3882f8f
      ip-10-0-187-141.us-west-2.compute.internal   Ready      worker   62m   v1.24.0+3882f8f
      ip-10-0-216-103.us-west-2.compute.internal   Ready      master   73m   v1.24.0+3882f8f
      ip-10-0-216-43.us-west-2.compute.internal    NotReady   worker   62m   v1.24.0+3882f8f

      oc adm top node

      NAME                                         CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
      ip-10-0-154-217.us-west-2.compute.internal   535m         7%          10264Mi         69%
      ip-10-0-170-173.us-west-2.compute.internal   1001m        13%         9747Mi          65%
      ip-10-0-187-141.us-west-2.compute.internal   232m         6%          5866Mi          86%
      ip-10-0-216-103.us-west-2.compute.internal   684m         9%          11566Mi         77%
      ip-10-0-216-43.us-west-2.compute.internal    <unknown>    <unknown>   <unknown>       <unknown>
      ip-10-0-154-168.us-west-2.compute.internal   <unknown>    <unknown>   <unknown>       <unknown>
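
To pick out the interesting rows of `oc adm top node` output, a small filter along these lines may help. This is only a sketch: `flag_nodes` is a hypothetical helper, and the default 80% memory threshold is an arbitrary assumption, not a value taken from this report or from OpenShift.

```shell
#!/usr/bin/env bash

# Reads `oc adm top node` output on stdin and prints:
#   "<node> no-metrics"   for nodes reporting <unknown>
#   "<node> memory=<N>%"  for nodes at or above the memory threshold
# $1 is the MEMORY% threshold (assumed default: 80).
flag_nodes() {
  awk -v limit="${1:-80}" '
    NF == 0        { next }                      # skip blank lines
    $1 == "NAME"   { next }                      # skip the header row
    $2 == "<unknown>" { print $1, "no-metrics"; next }
    {
      mem = $5
      sub(/%/, "", mem)                          # "86%" -> "86"
      if (mem + 0 >= limit + 0) print $1, "memory=" $5
    }'
}

# Usage against a live cluster:
#   oc adm top node | flag_nodes 80
```

Applied to the output above with a threshold of 80, this would list ip-10-0-187-141 (86% memory) along with the two nodes that report no metrics.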

       

      oc describe node/ip-10-0-216-43.us-west-2.compute.internal

      Describe logs - https://drive.google.com/file/d/1xOnjadulS_2bzxJbs_VLnF5xfEj351Mw/view?usp=sharing
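
Because the full describe output is only linked, a compact view of the node conditions can be pulled directly with a JSONPath query. This is a sketch, assuming a logged-in `oc` session; `node_conditions` is a hypothetical wrapper, and the `OC` variable exists only so the command can be swapped out for a dry run.

```shell
#!/usr/bin/env bash

# Print TYPE, STATUS and REASON (tab-separated) for each condition on a node.
# $1 is the node name. Set OC to override the client binary (assumption for testing).
node_conditions() {
  "${OC:-oc}" get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Usage against a live cluster:
#   node_conditions ip-10-0-216-43.us-west-2.compute.internal
```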

      Expected results:

      All worker nodes remain Ready.

      Additional info:

      http://file.nay.redhat.com/~qili/ip-10-0-148-98.us-west-2.compute.internal_node.log.tar.gz
      http://file.nay.redhat.com/~qili/ip-10-0-197-206.us-west-2.compute.internal_node.log.tar.gz
      http://file.nay.redhat.com/~qili/must-gather.local.1307979296927767376.tar.gz

        1. notReadyNode.png (196 kB)
        2. disk_network.png (327 kB)
        3. kubelet_crio.png (463 kB)

            jeffdyoung Jeff Young
            rhn-support-rgangwar Rahul Gangwar
            Rahul Gangwar Rahul Gangwar