-
Bug
-
Resolution: Not a Bug
-
Normal
-
None
-
4.15.z, 4.16
-
Moderate
-
Yes
-
False
-
Description of problem:
cri-o fails with "rpc error: code = DeadlineExceeded desc = context deadline exceeded" after loading a 120-worker-node cluster with multiple projects.
Version-Release number of selected component (if applicable):
Tested with 4.15.12, 4.15.13, and many 4.16 nightlies. The latest was 4.16.0-0.nightly-2024-05-30-021120.
How reproducible:
100%
Steps to Reproduce:
1. Install a GCP-OVN-EtcdEncrypt-FIPS cluster.
2. Scale up to 120 worker nodes.
3. Set up a jump node.
4. SSH to the master nodes from the jump node.
5. Load the cluster with projects using the cluster-density test (https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/kube-burner-ocp/1472/).
Actual results:
- After about one third of the load has been applied, the oc client loses its connection.
- Checking cri-o on the master nodes fails:
  crictl ps | grep -v Running
  FATA[0193] validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/crio/crio.sock": rpc error: code = DeadlineExceeded desc = context deadline exceeded
- After that, the ssh connection to the master nodes is also lost.
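The cri-o check above could be scripted so that it can be repeated on each master node while the load is running. The following is only a minimal sketch, not part of the original report: the helper names `non_running_lines` and `check_crio` and the 30-second client-side timeout are assumptions made for illustration. It runs `crictl ps` and keeps the lines that do not contain "Running", mirroring `grep -v Running` (so the header line is kept as well).

```python
import subprocess


def non_running_lines(crictl_output: str) -> list[str]:
    """Return the lines that do not contain 'Running' (mirrors `grep -v Running`)."""
    return [line for line in crictl_output.splitlines() if "Running" not in line]


def check_crio(timeout_s: float = 30.0) -> list[str]:
    """Run `crictl ps` and return the non-Running lines.

    Hypothetical helper: the timeout value is an assumption, not taken
    from the report. Raises subprocess.TimeoutExpired if crictl hangs and
    subprocess.CalledProcessError if it exits non-zero (for example on the
    DeadlineExceeded failure described above).
    """
    result = subprocess.run(
        ["crictl", "ps"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return non_running_lines(result.stdout)
```

Calling `check_crio()` on a master node needs the same privileges as running `crictl` by hand. An exception from it is itself a signal that the runtime endpoint is not answering.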