Red Hat Advanced Cluster Management / ACM-12088

Observatorium API is susceptible to data loss on metrics remote write path


    • MCO Sprint 23, MCO Sprint 24
    • Critical
    • No

      Description of problem:

       

      Metrics forwarding should be retried by the metrics collector remote write client on non-2xx responses, as per https://github.com/stolostron/multicluster-observability-operator/blob/main/collectors/metrics/pkg/metricsclient/metricsclient.go#L524-L563
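
For illustration, the expected client behaviour could be sketched as below. This is a minimal sketch, not the code at the link above: `shouldRetry`, `sendWithRetry`, the attempt limit, the fixed backoff and the placeholder payload are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// shouldRetry reports whether a response status warrants another attempt.
// Assumption: every non-2xx status is treated as retryable.
func shouldRetry(status int) bool {
	return status < 200 || status > 299
}

// sendWithRetry is a hypothetical helper. It POSTs body to url and retries
// while the request fails or the server answers with a non-2xx status.
// It returns the number of attempts made and the last error, if any.
func sendWithRetry(client *http.Client, url string, body []byte, maxAttempts int, backoff time.Duration) (int, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		resp, err := client.Post(url, "application/x-protobuf", bytes.NewReader(body))
		if err != nil {
			lastErr = err
		} else {
			resp.Body.Close()
			if !shouldRetry(resp.StatusCode) {
				return attempt, nil
			}
			lastErr = fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
		if attempt < maxAttempts {
			time.Sleep(backoff)
		}
	}
	return maxAttempts, lastErr
}

func main() {
	// Test server that rejects the first two requests, then accepts.
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		if atomic.AddInt32(&calls, 1) <= 2 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	attempts, err := sendWithRetry(srv.Client(), srv.URL, []byte("payload"), 5, 10*time.Millisecond)
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

The retry loop only helps if the server actually reports failures. If every response is a 200, `shouldRetry` never returns true and the first attempt is treated as a success.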

       

      However, in the ACM fork of Observatorium, we use a customised forwarder behind the gateway to forward requests to multiple remote write sinks.

       

      As per https://github.com/stolostron/observatorium/pull/87, it appears that the client can never receive a non-200 response, so the retry logic on the client side is never triggered.
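
One possible direction, sketched under assumptions: a fan-out handler that reports a non-2xx status to the client whenever any sink fails. This is not the ACM forwarder. `aggregateStatus`, `forwardHandler` and the "fail if any sink fails" policy are hypothetical and only illustrate how a failure could be surfaced to the client.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// aggregateStatus picks the status returned to the client from the
// per-sink results. Hypothetical policy: succeed only if every sink
// answered 2xx, otherwise report 502 so that the client retries.
func aggregateStatus(statuses []int) int {
	for _, s := range statuses {
		if s < 200 || s > 299 {
			return http.StatusBadGateway
		}
	}
	return http.StatusOK
}

// forwardHandler is an illustrative fan-out handler that forwards each
// request body to every sink and propagates failures to the caller.
func forwardHandler(client *http.Client, sinks []string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read request body", http.StatusBadRequest)
			return
		}
		statuses := make([]int, 0, len(sinks))
		for _, sink := range sinks {
			resp, err := client.Post(sink, r.Header.Get("Content-Type"), bytes.NewReader(body))
			if err != nil {
				// Treat an unreachable sink as a failed write.
				statuses = append(statuses, http.StatusServiceUnavailable)
				continue
			}
			resp.Body.Close()
			statuses = append(statuses, resp.StatusCode)
		}
		w.WriteHeader(aggregateStatus(statuses))
	})
}

func main() {
	healthy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer healthy.Close()

	// Stand-in for a backend that has been taken down.
	down := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer down.Close()

	gateway := httptest.NewServer(forwardHandler(http.DefaultClient, []string{healthy.URL, down.URL}))
	defer gateway.Close()

	resp, err := http.Post(gateway.URL, "application/x-protobuf", bytes.NewReader([]byte("payload")))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status seen by client:", resp.StatusCode)
}
```

With one sink failing, the client in this sketch sees a 502 instead of a 200 and can retry. One trade-off of this policy is that a retry re-sends the payload to sinks that already accepted it.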

      Steps to Reproduce:

      1. Take the backend down
      2. Send remote write requests

      Actual results:

      Data is dropped

      Expected results:

      Requests are retried 

              pgough@redhat.com Philip Gough
              Xiang Yin Xiang Yin