Bug
Resolution: Done-Errata
Normal
4.14.0
None
No
MON Sprint 236
1
False
N/A
Release Note Not Required
Our telemetry test using remote write is increasingly flaky. The recurring error is:
TestTelemeterRemoteWrite telemeter_test.go:103: timed out waiting for the condition: error validating response body "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"job\":\"prometheus-k8s\",\"namespace\":\"openshift-monitoring\",\"remote_name\":\"2bdd72\",\"service\":\"prometheus-k8s\",\"url\":\"https://infogw.api.openshift.com/metrics/v1/receive\"},\"value\":[1684889572.197,\"20.125925925925927\"]}]}}" for query "max without(pod,instance) (rate(prometheus_remote_storage_samples_failed_total{job=\"prometheus-k8s\",url=~\"https://infogw.api.openshift.com.+\"}[5m]))": expecting Prometheus remote write to see no failed samples but got 20.125926
Any failed sample causes this test to fail, which is perhaps too strict a requirement. We could consider it good enough if some samples are sent successfully. In its current form, the test exercises Telemeter's behavior on top of CMO's, since samples can fail at the receiving endpoint for reasons CMO does not control.
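A minimal sketch of the relaxed check proposed above, written in Go to match telemeter_test.go. It assumes the test validates the raw body of a Prometheus `/api/v1/query` instant-vector response (the same shape as in the error above) and that it would be paired with a query on the successfully-sent counter, for example `max without(pod,instance) (rate(prometheus_remote_storage_samples_total{job="prometheus-k8s",url=~"https://infogw.api.openshift.com.+"}[5m]))`. That metric name and the helpers `vectorValues` and `expectSomeSamplesSent` are assumptions for illustration, not the current CMO test code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// promResponse models the subset of a Prometheus /api/v1/query response used
// here: an instant vector whose values are encoded as [timestamp, "value"].
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// vectorValues (hypothetical helper) decodes an instant-vector response body
// and returns the float value of every series in it.
func vectorValues(body []byte) ([]float64, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("decoding response: %w", err)
	}
	if r.Status != "success" || r.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q", r.Status, r.Data.ResultType)
	}
	vals := make([]float64, 0, len(r.Data.Result))
	for _, s := range r.Data.Result {
		str, ok := s.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("sample value is not a string: %v", s.Value[1])
		}
		v, err := strconv.ParseFloat(str, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing sample value %q: %w", str, err)
		}
		vals = append(vals, v)
	}
	return vals, nil
}

// expectSomeSamplesSent (hypothetical helper) passes when at least one series
// reports a positive send rate, regardless of any failed samples. An empty
// result (metric absent) is treated as a failure.
func expectSomeSamplesSent(body []byte) error {
	vals, err := vectorValues(body)
	if err != nil {
		return err
	}
	for _, v := range vals {
		if v > 0 {
			return nil
		}
	}
	return fmt.Errorf("expecting Prometheus remote write to send some samples but got %v", vals)
}

func main() {
	// Synthetic response bodies for illustration only, not real measurements.
	sending := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"remote_name":"example"},"value":[1684889572.197,"12.5"]}]}}`)
	stalled := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"remote_name":"example"},"value":[1684889572.197,"0"]}]}}`)
	fmt.Println(expectSomeSamplesSent(sending))
	fmt.Println(expectSomeSamplesSent(stalled))
}
```

The design choice here is to assert on progress (a positive send rate) rather than on the absence of failures, so transient rejections by the remote endpoint no longer fail the test while a completely stalled remote write still does.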
- is cloned by
OCPBUGS-14072 TestAlertmanagerUWMSecrets test flaky
- Closed
- links to
RHEA-2023:5006 rpm