- Bug
- Resolution: Done
- Major
- None
- 4.14
- No
- Proposed
- False
This is an ovn-ic 120-node environment with 4.14.0-0.nightly-2023-06-30-131338.
The node-density-cni test is run on this 120-node environment with 80 pods-per-node. It tries to create 4015 deployments (each with 2 pods) and 4015 services. However, service creation started failing after 1464 services (i.e., Service/webserver-1-1464):
[2023-07-02T15:13:26.029+0000] {subprocess.py:93} INFO - time="2023-07-02 15:13:26" level=error msg="Error creating object Service/webserver-1-1464 in namespace 42242802-node-density-cni-20230702: Post \"https://api.venkataanil-ovn-ic-4.14-aws-ovn-medium-cp.perfscale.devcluster.openshift.com:6443/api/v1/namespaces/42242802-node-density-cni-20230702/services?timeout=15s\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Some services between 1464 and 4015 were still created successfully at random. For example, Service/webserver-1-3645 was created successfully, while 3644 and 3648 were not:
[2023-07-02T15:20:59.328+0000] {subprocess.py:93} INFO - time="2023-07-02 15:20:59" level=error msg="Error creating object Service/webserver-1-3644 in namespace 42242802-node-density-cni-20230702: Post \"https://api.venkataanil-ovn-ic-4.14-aws-ovn-medium-cp.perfscale.devcluster.openshift.com:6443/api/v1/namespaces/42242802-node-density-cni-20230702/services?timeout=15s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
[2023-07-02T15:21:00.178+0000] {subprocess.py:93} INFO - time="2023-07-02 15:21:00" level=error msg="Error creating object Service/webserver-1-3648 in namespace 42242802-node-density-cni-20230702: Post \"https://api.venkataanil-ovn-ic-4.14-aws-ovn-medium-cp.perfscale.devcluster.openshift.com:6443/api/v1/namespaces/42242802-node-density-cni-20230702/services?timeout=15s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
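For reference, the failing requests are plain service creates issued with a 15s per-request budget (the ?timeout=15s visible in the URLs above). Below is a minimal client-go sketch of that request pattern; it is illustrative only (not kube-burner's actual code), with the kubeconfig handling simplified and the namespace/service names taken from the logs above. Against an overloaded apiserver it fails with the same "context deadline exceeded" / "request canceled (Client.Timeout exceeded while awaiting headers)" errors.

```go
package main

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (illustrative; kube-burner wires this up itself).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	ns := "42242802-node-density-cni-20230702" // namespace from the failing run
	for i := 1; i <= 4015; i++ {
		svc := &corev1.Service{
			ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("webserver-1-%d", i)},
			Spec: corev1.ServiceSpec{
				Selector: map[string]string{"app": fmt.Sprintf("webserver-%d", i)},
				Ports:    []corev1.ServicePort{{Port: 80, TargetPort: intstr.FromInt(8080)}},
			},
		}
		// Each create gets a 15s budget, matching the ?timeout=15s in the error messages.
		ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
		_, err := client.CoreV1().Services(ns).Create(ctx, svc, metav1.CreateOptions{})
		cancel()
		if err != nil {
			// When the apiserver cannot answer in time, this surfaces as
			// "context deadline exceeded" or "net/http: request canceled
			// (Client.Timeout exceeded while awaiting headers)".
			fmt.Printf("service %d failed: %v\n", i, err)
		}
	}
}
```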
We observed a dip in CPU idle time (reaching 0.06%) and high user CPU (726% out of 800%) on the workers during this period. Grafana dashboards:
https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/8s5LxbA0r3tO3TAqnaRGN5IqTYL3rFQ9
https://grafana.rdu2.scalelab.redhat.com:3000/d/FwPsenaaa/kube-burner-report-icv3?orgId=1&from=1688310000000&to=1688324399000&var-Datasource=AWS+Pro+-+ripsaw-kube-burner&var-platform=&var-platform=AWS&var-sdn=&var-sdn=OVNKubernetes&var-workload=node-density-cni&var-worker_nodes=120&var-uuid=42242802-node-density-cni-20230702&var-master=ip-10-0-150-139.us-west-2.compute.internal&var-worker=ip-10-0-128-149.us-west-2.compute.internal&var-infra=ip-10-0-129-136.us-west-2.compute.internal&var-namespace=All&var-latencyPercentile=P99
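The CPU numbers above are read from the kube-burner Grafana dashboards. As a hypothetical spot-check, the same per-worker user CPU (percent of all cores, i.e. up to 800% on an 8-vCPU worker) can be queried straight from Prometheus using the standard node-exporter metric; the endpoint URL and token below are placeholders, not values from this environment:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

func main() {
	promURL := os.Getenv("PROM_URL")   // e.g. the cluster's Prometheus route (placeholder)
	token := os.Getenv("PROM_TOKEN")   // bearer token with monitoring access (placeholder)

	// Per-node user CPU as a percentage of all cores, so an 8-vCPU worker tops out at 800%.
	query := `sum by (instance) (rate(node_cpu_seconds_total{mode="user"}[2m])) * 100`

	req, err := http.NewRequest("GET",
		promURL+"/api/v1/query?query="+url.QueryEscape(query), nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // raw JSON; instances near 726 match the dashboard reading
}
```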
Similar behaviour was observed in another test run, where service creation started failing after 1868 services (i.e., Service/webserver-1-1868):
[2023-07-02, 12:28:21 UTC] {subprocess.py:93} INFO - time="2023-07-02 12:28:21" level=error msg="Error creating object Service/webserver-1-1868 in namespace e2a73d17-node-density-cni-20230702: Post \"https://api.venkataanil-ovn-ic-cni-4.14-aws-ovn-medium-cp.perfscale.devcluster.openshift.com:6443/api/v1/namespaces/e2a73d17-node-density-cni-20230702/services?timeout=15s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
[2023-07-02, 12:28:21 UTC] {subprocess.py:93} INFO - time="2023-07-02 12:28:21" level=error msg="Retrying object creation"
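The "Retrying object creation" line is the workload retrying the create after the client-side timeout. A rough sketch of such a retry-on-timeout wrapper using client-go's retry helper (again illustrative, not kube-burner's actual implementation):

```go
package main

import (
	"context"
	"errors"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// createServiceWithRetry retries a service create on the kind of client-side
// timeouts shown in the logs above. Backoff values are arbitrary for the sketch.
func createServiceWithRetry(client kubernetes.Interface, ns string, svc *corev1.Service) error {
	backoff := wait.Backoff{Steps: 5, Duration: time.Second, Factor: 2}
	return retry.OnError(backoff,
		func(err error) bool {
			// Only retry timeout-style failures; give up immediately on anything else.
			return apierrors.IsTimeout(err) || apierrors.IsServerTimeout(err) ||
				errors.Is(err, context.DeadlineExceeded)
		},
		func() error {
			ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
			defer cancel()
			_, err := client.CoreV1().Services(ns).Create(ctx, svc, metav1.CreateOptions{})
			return err
		})
}
```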
must-gather for the second run: http://ec2-54-212-114-216.us-west-2.compute.amazonaws.com:7070/index/venkataanil/4.14-aws-ovn-medium-cp/manual__2023-07-02T11:20:21.494581%2B00:00-AWS-4.14.0-ovnkubernetes/must_gather/2023-07-02_01:57_PM/must-gather-2023-07-02_01-39_PM.tar.xz
So this test (creating services) is consistently failing at scale.
This is a regression with OVN-IC, as the issue is not seen on:
- an OVN legacy environment created with the same nightly image
- an OVN IC environment created with the OVN v3 image (quay.io/itssurya/dev-images:ic-scale-v3)
must-gather from the successful OVN v3 image run: http://storage.scalelab.redhat.com/anilvenkata/must-gather-icv3-cni.tar.xz
and the corresponding grafana dashboard: https://grafana.rdu2.scalelab.redhat.com:3000/d/FwPsenaaa/kube-burner-report-icv3?orgId=1&from=1688651100000&to=1688652959000