- Bug
- Resolution: Done
- Normal
- None
- 4.15.0
- No
- CNF RAN Sprint 249, CNF RAN Sprint 250
- 2
- False
The Soak testing PTP CPU Utilization test failed due to an invalid (empty) result vector returned by the rate(container_cpu_usage_seconds_total) Prometheus query.
This can happen in any of the dualnicbc-parallel, bc-parallel, or oc-parallel test suites.
The issue is intermittent, but it occurs with HIGH frequency.
Actual Result:
ft5.1:
INFO    [Feb 18 00:44:14.868][ptp.go: 135] CPU Utilization TC Config: {CpuTestSpec:{TestSpec:{Enable:true FailureThreshold:3 Duration:5} CustomParams:{PromTimeWindow:70s Node:{CpuUsageThreshold:100} Pod:[{PodType:ptp-operator Container:<nil> CpuUsageThreshold:30} {PodType:linuxptp-daemon Container:<nil> CpuUsageThreshold:80} {PodType:linuxptp-daemon Container:cloud-event-proxy CpuUsageThreshold:30} {PodType:linuxptp-daemon Container:linuxptp-daemon-container CpuUsageThreshold:40}]}} Description:The test measures PTP CPU usage and fails if >15mcores}
INFO    [Feb 18 00:44:14.964][ptp.go: 165] Configured rate timeWindow: 1m10s, cadvisor scrape interval: 30 secs.
INFO    [Feb 18 00:45:14.965][ptp.go: 186] Running test for 5m0s (failure threshold: 3)
INFO    [Feb 18 00:46:14.965][ptp.go: 196] Retrieving cpu usage of the ptp pods.
DEBUG   [Feb 18 00:46:14.965][prometheus.go: 119] Querying prometheus, query rate(container_cpu_usage_seconds_total{namespace="openshift-ptp", pod="linuxptp-daemon-ghrpk", container=""}[1m10s]), attempt 0
WARNING [Feb 18 00:46:15.134][ptptesthelper.go: 481] Invalid result vector length in prometheus response: {Status:success Error: Data:{ResultType:vector Result:0xc00012c060}}
WARNING [Feb 18 00:46:15.134][prometheus.go: 135] Failed to get a prometheus response for query rate(container_cpu_usage_seconds_total{namespace="openshift-ptp", pod="linuxptp-daemon-ghrpk", container=""}[1m10s]): <nil>
DEBUG   [Feb 18 00:46:16.295][prometheus.go: 119] Querying prometheus, query rate(container_cpu_usage_seconds_total{namespace="openshift-ptp", pod="linuxptp-daemon-ghrpk", container=""}[1m10s]), attempt 1
WARNING [Feb 18 00:46:16.295][ptptesthelper.go: 481] Invalid result vector length in prometheus response: {Status:success Error: Data:{ResultType:vector Result:0xc00012c060}}
WARNING [Feb 18 00:46:16.295][prometheus.go: 135] Failed to get a prometheus response for query rate(container_cpu_usage_seconds_total{namespace="openshift-ptp", pod="linuxptp-daemon-ghrpk", container=""}[1m10s]): <nil>
(attempts 2, 3, and 4 — timestamps 00:46:17.460 through 00:46:19.771 — repeat the same DEBUG/WARNING/WARNING sequence for the identical query)
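A possible explanation for the intermittent empty vector, given the logged config (PromTimeWindow: 70s, cadvisor scrape interval: 30 secs): PromQL's rate() needs at least two samples of the series inside the range window, so a 70s window normally holds 2-3 samples, but if scrapes are dropped (e.g. the out-of-order ingestion errors below), a window can end up with fewer than two samples and rate() returns no result at all, which ptptesthelper.go then rejects as an invalid vector length. The following is a minimal, simplified Go sketch of that failure mode (the sample type and rateOverWindow helper are illustrative, not the project's code, and real PromQL rate() additionally extrapolates to the window boundaries):

```go
package main

import "fmt"

// sample is a hypothetical timestamped counter reading (illustrative only).
type sample struct {
	ts    float64 // seconds
	value float64 // cumulative CPU seconds, as in container_cpu_usage_seconds_total
}

// rateOverWindow mimics the core rate() rule: with fewer than two samples
// inside (windowStart, windowEnd], there is nothing to compute a rate from,
// so the series is simply absent from the result vector.
func rateOverWindow(samples []sample, windowStart, windowEnd float64) (float64, bool) {
	var inWindow []sample
	for _, s := range samples {
		if s.ts > windowStart && s.ts <= windowEnd {
			inWindow = append(inWindow, s)
		}
	}
	if len(inWindow) < 2 {
		return 0, false // empty result vector for this series
	}
	first, last := inWindow[0], inWindow[len(inWindow)-1]
	return (last.value - first.value) / (last.ts - first.ts), true
}

func main() {
	// 30s scrape interval, 70s window ending at t=100: three samples present.
	healthy := []sample{{40, 1.0}, {70, 1.5}, {100, 2.0}}
	if r, ok := rateOverWindow(healthy, 30, 100); ok {
		fmt.Printf("healthy: rate=%.4f cores\n", r)
	}

	// Two of the three scrapes dropped (e.g. out-of-order samples rejected
	// at ingestion): only one sample left, so rate() yields nothing.
	degraded := []sample{{70, 1.5}}
	if _, ok := rateOverWindow(degraded, 30, 100); !ok {
		fmt.Println("degraded: empty result vector")
	}
}
```

If this is the mechanism, retrying the same instant query (as prometheus.go already does) cannot help while the window stays under-sampled, which would match the five consecutive failed attempts in the log above.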
Expected Result:
pt8.1:
INFO    [Feb 17 17:05:22.685][ptp.go: 135] CPU Utilization TC Config: {CpuTestSpec:{TestSpec:{Enable:true FailureThreshold:3 Duration:5} CustomParams:{PromTimeWindow:70s Node:{CpuUsageThreshold:100} Pod:[{PodType:ptp-operator Container:<nil> CpuUsageThreshold:30} {PodType:linuxptp-daemon Container:<nil> CpuUsageThreshold:80} {PodType:linuxptp-daemon Container:cloud-event-proxy CpuUsageThreshold:30} {PodType:linuxptp-daemon Container:linuxptp-daemon-container CpuUsageThreshold:40}]}} Description:The test measures PTP CPU usage and fails if >15mcores}
INFO    [Feb 17 17:05:22.783][ptp.go: 165] Configured rate timeWindow: 1m10s, cadvisor scrape interval: 30 secs.
INFO    [Feb 17 17:06:22.784][ptp.go: 186] Running test for 5m0s (failure threshold: 3)
INFO    [Feb 17 17:07:22.785][ptp.go: 196] Retrieving cpu usage of the ptp pods.
DEBUG   [Feb 17 17:07:22.785][prometheus.go: 119] Querying prometheus, query rate(container_cpu_usage_seconds_total{namespace="openshift-ptp", pod="linuxptp-daemon-7v69m", container=""}[1m10s]), attempt 0
DEBUG   [Feb 17 17:07:22.922][ptptesthelper.go: 497] Pod: linuxptp-daemon-7v69m, container: (ns openshift-ptp) cpu usage: 0.0005737646459500775 (ts: 2024-02-17 17:06:46.349 +0000 UTC)
INFO    [Feb 17 17:07:22.922][ptp.go: 232] Node master1.ptpcimno.telco5gran.eng.rdu2.redhat.com: pod: linuxptp-daemon-7v69m (ns:openshift-ptp) cpu usage: 0.00057
DEBUG   [Feb 17 17:07:22.922][ptp.go: 240] Checking cpu usage of pod linuxptp-daemon-7v69m. Cpu Usage: 0.00057 - Threshold: 0.08000
DEBUG   [Feb 17 17:07:22.922][prometheus.go: 119] Querying prometheus, query rate(container_cpu_usage_seconds_total{namespace="openshift-ptp", pod="linuxptp-daemon-7v69m", container="cloud-event-proxy"}[1m10s]), attempt 0
DEBUG   [Feb 17 17:07:23.074][ptptesthelper.go: 497] Pod: linuxptp-daemon-7v69m, container: cloud-event-proxy (ns openshift-ptp) cpu usage: 0.00010342890579286081 (ts: 2024-02-17 17:06:46.5 +0000 UTC)
INFO    [Feb 17 17:07:23.074][ptp.go: 254] Node master1.ptpcimno.telco5gran.eng.rdu2.redhat.com: pod: linuxptp-daemon-7v69m, container: cloud-event-proxy (ns:openshift-ptp) cpu usage: 0.00010
DEBUG   [Feb 17 17:07:23.074][ptp.go: 262] Checking cpu usage of container cloud-event-proxy (pod linuxptp-daemon-7v69m). Cpu Usage: 0.00010 - Threshold: 0.03000
Not sure if it is related, but the Prometheus pod logs many errors like these during the test:
ts=2024-02-18T00:44:22.011Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.105:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:46:11.747Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.102:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:46:45.202Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.110:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:47:41.743Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.102:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:48:52.046Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.105:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:53:22.107Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.105:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:53:52.051Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https://10.8.34.105:10250/metrics/cadvisor msg="Error on ingesting out-of-order samples" num_dropped=13
ts=2024-02-18T00:56:22.054Z caller=scrape.go:1655 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/kubelet/1 target=https: