OpenShift Bugs / OCPBUGS-14007

telemetry remote write test flaky


    • Bug
    • Resolution: Done-Errata
    • Normal
    • 4.14.0
    • 4.14.0
    • Monitoring
    • None
    • -
    • No
    • MON Sprint 236
    • 1
    • False
    • N/A
    • Release Note Not Required

      Our telemetry test using remote write is increasingly flaky. The recurring error is:

      TestTelemeterRemoteWrite
          telemeter_test.go:103: timed out waiting for the condition: error validating response body "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"job\":\"prometheus-k8s\",\"namespace\":\"openshift-monitoring\",\"remote_name\":\"2bdd72\",\"service\":\"prometheus-k8s\",\"url\":\"https://infogw.api.openshift.com/metrics/v1/receive\"},\"value\":[1684889572.197,\"20.125925925925927\"]}]}}" for query "max without(pod,instance) (rate(prometheus_remote_storage_samples_failed_total{job=\"prometheus-k8s\",url=~\"https://infogw.api.openshift.com.+\"}[5m]))": expecting Prometheus remote write to see no failed samples but got 20.125926
      

      Any failed sample causes this test to fail, which is perhaps too strict a requirement. We could consider the test good enough if some samples are sent successfully. As written, the current version tests Telemeter behavior on top of CMO behavior.
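
      One way the relaxed check could look is sketched below. This is not the fix that was merged, only an illustration under stated assumptions: the helper names `firstSampleValue` and `validateSomeSamplesSent` are hypothetical, and the sketch assumes the test would query a rate of sent samples (for example `prometheus_remote_storage_samples_total` with the same `job` and `url` matchers) instead of requiring `prometheus_remote_storage_samples_failed_total` to be zero. It parses the same response shape as in the error above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// queryResponse mirrors only the parts of a Prometheus instant-query
// response that this sketch needs.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			// Value is [timestamp, "value as string"].
			Value [2]interface{} `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// firstSampleValue (hypothetical helper) returns the value of the first
// sample in the result of an instant query.
func firstSampleValue(body []byte) (float64, error) {
	var resp queryResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, fmt.Errorf("decoding response body: %w", err)
	}
	if resp.Status != "success" {
		return 0, fmt.Errorf("unexpected status %q", resp.Status)
	}
	if len(resp.Data.Result) == 0 {
		return 0, errors.New("query returned no samples")
	}
	raw, ok := resp.Data.Result[0].Value[1].(string)
	if !ok {
		return 0, errors.New("sample value is not a string")
	}
	return strconv.ParseFloat(raw, 64)
}

// validateSomeSamplesSent (hypothetical helper) is the relaxed check:
// it assumes body is the response to a query over the rate of sent
// samples and passes as soon as that rate is above zero.
func validateSomeSamplesSent(body []byte) error {
	v, err := firstSampleValue(body)
	if err != nil {
		return err
	}
	if v <= 0 {
		return fmt.Errorf("expecting Prometheus remote write to send some samples but got %f", v)
	}
	return nil
}
```

      A validator like this would tolerate transient failures on the receiving side while still catching a remote write configuration that sends nothing. Whether "some samples sent" is a strong enough signal for this test is a judgment call.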

            jfajersk@redhat.com Jan Fajerski
            jfajersk@redhat.com Jan Fajerski
            Junqi Zhao Junqi Zhao
