  OpenShift Bugs / OCPBUGS-35150

CRI-O RPC error after loading a heavy cluster with multiple projects


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • None
    • 4.15.z, 4.16
    • Node / CRI-O
    • Moderate
    • Yes
    • False

      Description of problem:

      CRI-O returns "rpc error: code = DeadlineExceeded desc = context deadline exceeded" after a 120-worker-node cluster is loaded with multiple projects.

      Version-Release number of selected component (if applicable):

      I tested with versions 4.15.12, 4.15.13, and many 4.16 nightly builds. The last one was 4.16.0-0.nightly-2024-05-30-021120.

      How reproducible:

      100%

      Steps to Reproduce:

      1. Install a GCP-OVN-EtcdEncrypt-FIPS cluster.
      2. Scale up to 120 worker nodes.
      3. Set up a jump node.
      4. SSH to the master nodes from the jump node.
      5. Load the cluster with projects using the kube-burner-ocp cluster-density test (https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/kube-burner-ocp/1472/).

      Actual results:

      - After about 1/3 of the load has been applied, the oc client loses its connection.
      - Checking CRI-O on the master nodes fails:
      
      crictl ps | grep -v Running
      FATA[0193] validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/crio/crio.sock": rpc error: code = DeadlineExceeded desc = context deadline exceeded 
      
      After that, I also lost the SSH connection to the master nodes.

              Harshal Patil (harpatil@redhat.com)
              Simon Kordas (skordas)
              Sunil Choudhary
