Bug
Resolution: Done
Major
4.14
No
Rejected
False
N/A
Release Note Not Required
Description of problem:
The TRT ComponentReadiness tool shows what looks like a regression (https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-16%2000%3A00%3A00&capability=Other&component=Monitoring&confidence=95&environment=ovn%20no-upgrade%20amd64%20aws%20hypershift&excludeArches=heterogeneous%2Carm64%2Cppc64le%2Cs390x&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2023-07-20%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-07-13%2000%3A00%3A00&testId=openshift-tests%3A79898d2e28b78374d89e10b38f88107b&testName=%5Bsig-instrumentation%5D%20Prometheus%20%5Bapigroup%3Aimage.openshift.io%5D%20when%20installed%20on%20the%20cluster%20should%20report%20telemetry%20%5BLate%5D%20%5BSkipped%3ADisconnected%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&variant=hypershift) in the "[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster should report telemetry [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" test. In the ComponentReadiness link above, you can see the sample runs (linked with red "F").
Version-Release number of selected component (if applicable):
4.14
How reproducible:
Intermittent: the pass rate in 4.13 is 100% vs. 81% in 4.14.
Steps to Reproduce:
1. The query above focuses on "periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance" jobs and the specific test mentioned. You can see the failures by clicking on the red "F"s.
Actual results:
The failures look like:

{ fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:365]: Unexpected error:
    <errors.aggregate | len:2, cap:2>:
    [promQL query returned unexpected results:
    metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1
    [], promQL query returned unexpected results:
    federate_samples{job="telemeter-client"} >= 10
    []]
    [
        <*errors.errorString | 0xc0017611b0>{
            s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]",
        },
        <*errors.errorString | 0xc00203d380>{
            s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]",
        },
    ]
Expected results:
Both promQL queries should return at least one series (i.e. the telemeter client is sending telemetry) and the test should pass.
Additional info:
I set the severity to Major because this looks like a regression relative to the pass rate in the 5 weeks before 4.13 went GA.
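For manual debugging, the two promQL expressions from the failure output above can be run directly against the cluster's Thanos Querier. Below is a minimal Go sketch (not the origin test itself, just an illustration using the standard prometheus/client_golang API client); THANOS_QUERIER_URL and TOKEN are assumed environment variables supplied by the caller, for example the openshift-monitoring thanos-querier route and a token from `oc whoami -t`.

package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

// bearerTransport adds a bearer token to each request. The token and querier
// URL are assumptions supplied via environment variables, not part of the test.
type bearerTransport struct {
	token string
	next  http.RoundTripper
}

func (b *bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	r := req.Clone(req.Context())
	r.Header.Set("Authorization", "Bearer "+b.token)
	return b.next.RoundTrip(r)
}

func main() {
	client, err := api.NewClient(api.Config{
		Address: os.Getenv("THANOS_QUERIER_URL"), // hypothetical env var, e.g. the thanos-querier route URL
		RoundTripper: &bearerTransport{
			token: os.Getenv("TOKEN"), // hypothetical env var, e.g. output of `oc whoami -t`
			next:  &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
		},
	})
	if err != nil {
		panic(err)
	}
	prom := promv1.NewAPI(client)

	// The two expressions the test expects to return at least one series.
	queries := []string{
		`metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1`,
		`federate_samples{job="telemeter-client"} >= 10`,
	}
	for _, q := range queries {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		result, warnings, err := prom.Query(ctx, q, time.Now())
		cancel()
		if err != nil {
			fmt.Printf("query error: %v\n", err)
			continue
		}
		if len(warnings) > 0 {
			fmt.Printf("warnings: %v\n", warnings)
		}
		// An empty result here corresponds to the "[]" in the test failure above.
		fmt.Printf("%s\n=> %v\n\n", q, result)
	}
}

If both expressions return series when run by hand but the test still fails intermittently in CI, that would point toward timing or server-side rate limiting rather than a broken telemeter client, which is consistent with the clone OCPBUGS-17797 below.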
is cloned by: OCPBUGS-17797 Prometheus reporting telemetry test intermittent failures due to server side rate limiting (ASSIGNED)
links to: