- Bug
- Resolution: Done
- Normal
- None
- 4.13.0
- Important
- No
- Rejected
- False
Description of problem:
Node density tests fail due to increased pod latency times on IBM Cloud
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-05-104719
How reproducible: 100%
Steps to reproduce:
1. Build/create an IBM Cloud cluster with the following parameters:
vm_type_masters: bx2-8x32
vm_type_workers: bx2-4x16
region: 'us-east'
installer_payload_image: latest 4.13 nightly build
2. Run the node-density kube burner test with the following parameters:
VARIABLE: 200
NODE_COUNT: 40 (scale up to 40 worker nodes before or during the kube-burner job)
QPS=50
BURST=50
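The two reproduction steps above can be sketched as a small driver that assembles the scale-up and kube-burner commands. This is a sketch only: the actual Scale CI wrapper is not shown in this report, and the machineset name, the `kube-burner ocp` flag names, and the mapping of VARIABLE to a pods-per-node knob are assumptions.

```python
# Sketch: build the commands the reproduction steps describe.
# Flag names and the machineset name are illustrative, not authoritative.

def scale_workers_cmd(machineset, replicas):
    # Step 2 note: scale the worker pool to 40 nodes before or during the job.
    return ["oc", "scale", "machineset/" + machineset,
            "-n", "openshift-machine-api", "--replicas=%d" % replicas]

def node_density_cmd(pods_per_node, qps, burst):
    # VARIABLE=200 in the report is assumed to map to the pods-per-node knob.
    return ["kube-burner", "ocp", "node-density",
            "--pods-per-node=%d" % pods_per_node,
            "--qps=%d" % qps, "--burst=%d" % burst]

print(scale_workers_cmd("worker-us-east-1", 40))
print(node_density_cmd(200, 50, 50))
```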
Actual results:
The job fails: the P99 pod Ready latency (11.60s) exceeds the configured 5s threshold.
OCP Version: 4.13.0-0.nightly-2023-03-05-104719
Flexy Id: 183166
Scale Ci Job: 2159
Grafana URL: 59eabcb1-d8b9-4d79-a7ee-f31578738625
Status: FAIL
Cloud: ibmcloud
Arch Type: amd64
Network Type: OVN
Worker Count: 40
PODS_PER_NODE: 200
NODES: 40
Avg Pod Ready (ms): 3900
Avg Pod Scheduled (ms): 0
Avg Initialized (ms): 7
Avg Containers Ready (ms): 3900
Google Sheet Data:
Time/Date: 2023-03-06 10:10:35.961914-05:00
ENV_VARS: QPS=50 BURST=50
03-06 10:02:27.095 ###############################################
03-06 10:02:27.095 Mon Mar 6 15:02:26 UTC 2023 Indexing enabled, using metrics from metrics-profiles/metrics.yaml
03-06 10:02:27.095 ~/ws/workspace/multibranch-pipeline_kube-burner/workloads/kube-burner/workloads/node-pod-density ~/ws/workspace/multibranch-pipeline_kube-burner/workloads/kube-burner
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="🔥 Starting kube-burner (0.17.3@c38fe7eb37c62686b68e2b64bdc8311a4d73d8f1) with UUID 59eabcb1-d8b9-4d79-a7ee-f31578738625"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="👽 Initializing prometheus client"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="📁 Creating indexer: elastic"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="📈 Creating measurement factory"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="Registered measurement: podLatency"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="Preparing create job: node-density"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="Job node-density: 7460 iterations with 1 Pod replicas"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="Pre-load: images from job node-density"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="Pre-load: Creating DaemonSet using image gcr.io/google_containers/pause:3.1 in namespace preload-kube-burner"
03-06 10:02:27.095 time="2023-03-06 15:02:26" level=info msg="Pre-load: Sleeping for 2m0s"
03-06 10:04:33.463 time="2023-03-06 15:04:26" level=info msg="Pre-load: Deleting namespace preload-kube-burner"
03-06 10:04:33.463 time="2023-03-06 15:04:27" level=info msg="Deleting namespaces with label kube-burner-preload=true"
03-06 10:04:33.463 time="2023-03-06 15:04:27" level=info msg="Waiting for namespaces to be definitely deleted"
03-06 10:04:45.616 time="2023-03-06 15:04:44" level=info msg="Triggering job: node-density"
03-06 10:04:45.616 time="2023-03-06 15:04:44" level=info msg="Creating Pod latency watcher for node-density"
03-06 10:04:45.616 time="2023-03-06 15:04:44" level=info msg="QPS: 50"
03-06 10:04:45.616 time="2023-03-06 15:04:44" level=info msg="Burst: 50"
03-06 10:04:45.616 time="2023-03-06 15:04:44" level=info msg="Running job node-density"
03-06 10:07:22.020 time="2023-03-06 15:07:13" level=info msg="Waiting up to 1h0m0s for actions to be completed"
03-06 10:07:28.544 time="2023-03-06 15:07:28" level=info msg="Actions in namespace 59eabcb1-d8b9-4d79-a7ee-f31578738625 completed"
03-06 10:07:28.544 time="2023-03-06 15:07:28" level=info msg="Finished the create job in 2m44s"
03-06 10:07:28.544 time="2023-03-06 15:07:28" level=info msg="Verifying created objects"
03-06 10:07:35.177 time="2023-03-06 15:07:34" level=info msg="pods found: 7460 Expected: 7460"
03-06 10:07:35.177 time="2023-03-06 15:07:34" level=info msg="Stopping measurement: podLatency"
03-06 10:07:35.177 time="2023-03-06 15:07:34" level=info msg="Evaluating latency thresholds"
03-06 10:07:35.177 time="2023-03-06 15:07:34" level=error msg="❗ P99 Ready latency (11.60s) higher than configured threshold: 5s"
03-06 10:08:01.654 time="2023-03-06 15:07:58" level=info msg="node-density: PodScheduled 50th: 0 99th: 6 max: 119 avg: 0"
03-06 10:08:01.654 time="2023-03-06 15:07:58" level=info msg="node-density: ContainersReady 50th: 3339 99th: 11599 max: 24006 avg: 3900"
03-06 10:08:01.654 time="2023-03-06 15:07:58" level=info msg="node-density: Initialized 50th: 0 99th: 141 max: 1286 avg: 7"
03-06 10:08:01.654 time="2023-03-06 15:07:58" level=info msg="node-density: Ready 50th: 3339 99th: 11599 max: 24006 avg: 3900"
03-06 10:08:01.654 time="2023-03-06 15:07:58" level=info msg="Job node-density took 194.93 seconds"
03-06 10:08:01.654 time="2023-03-06 15:07:58" level=info msg="Indexing metadata information for job: node-density"
03-06 10:08:01.654 time="2023-03-06 15:07:59" level=info msg="Waiting 30s extra before scraping prometheus"
03-06 10:08:33.660 time="2023-03-06 15:08:29" level=info msg="🔍 Scraping prometheus metrics for benchmark from 2023-03-06 15:04:44.064206001 +0000 UTC to 2023-03-06 15:08:29.273780395 +0000 UTC"
03-06 10:09:55.151 time="2023-03-06 15:09:43" level=info msg="Finished execution with UUID: 59eabcb1-d8b9-4d79-a7ee-f31578738625"
03-06 10:09:55.151 time="2023-03-06 15:09:43" level=info msg="👋 Exiting kube-burner"
03-06 10:09:55.151 ~/ws/workspace/multibranch-pipeline_kube-burner/workloads/kube-burner
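The failure comes from the threshold-evaluation step visible in the log: the P99 Ready latency (11.60s) exceeds the configured 5s threshold. A minimal sketch of that kind of check, using a nearest-rank percentile (kube-burner's exact quantile method may differ):

```python
import math

def percentile(samples_ms, p):
    # Nearest-rank percentile over latency samples in milliseconds.
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[max(rank, 0)]

def check_threshold(samples_ms, threshold_ms=5000, p=99):
    # Mirrors the logged check: the job fails when P99 Ready > threshold.
    observed = percentile(samples_ms, p)
    return observed, observed <= threshold_ms
```

With the distribution reported above (P50 Ready 3339 ms, P99 Ready 11599 ms), the pass flag comes back False, which is why the run is marked FAIL.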
Expected results:
Node density test passes (pod latency results are within accepted values).
- clones: OCPBUGS-165 "Spike in pod-latency graph observed due to ovnkube-master restarts" (Closed)