OpenShift Monitoring / MON-1616

Design: Automatic scaling of Prometheus/Agent scrape using Prometheus Operator


      To scale our scraping layer horizontally, we would like to distribute ServiceMonitors, PodMonitors, and any other targets across multiple scrapers dynamically.

      This is similar to what Otel is doing: https://docs.google.com/document/d/13Gcu5SlbgjrsQJQUuZAjdQo1MOQA76Yji3oX8yHh-p8/edit#heading=h.tq4cuijg0dif so we might want to join forces.

      AC:

      • Design (and optionally a PoC of) a solution that lets us run Prometheus Operator in a mode where a group of Prometheus instances takes care of a group of ServiceMonitors/PodMonitors. Such a group will dynamically scale up and down based on the number of targets per instance.
      • Check whether we can reuse the Otel work (or whether Otel can reuse ours).
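
To make the first AC item more concrete, here is a minimal sketch of the two pieces it implies: deriving the number of scraper instances from the target count, and assigning each target to one instance. It assumes a fixed per-instance target budget and hash-based assignment; neither is specified in this issue. The names `desiredShards`, `shardFor`, and `assign` are hypothetical, and FNV is only a stand-in hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// desiredShards returns how many scraper instances are needed so that no
// instance has to handle more than targetsPerShard targets on average
// (ceiling division). It never returns less than 1.
// Assumption: a fixed per-instance budget; the issue does not define one.
func desiredShards(numTargets, targetsPerShard int) int {
	if numTargets <= 0 || targetsPerShard <= 0 {
		return 1
	}
	return (numTargets + targetsPerShard - 1) / targetsPerShard
}

// shardFor maps a target address to a shard index in [0, shards).
// Assumption: FNV-1a is used here purely for illustration.
func shardFor(address string, shards int) int {
	if shards <= 0 {
		return 0
	}
	h := fnv.New64a()
	h.Write([]byte(address)) // Write on a hash.Hash never returns an error
	return int(h.Sum64() % uint64(shards))
}

// assign distributes all targets over the given number of shards.
func assign(targets []string, shards int) map[int][]string {
	byShard := make(map[int][]string, shards)
	for _, t := range targets {
		s := shardFor(t, shards)
		byShard[s] = append(byShard[s], t)
	}
	return byShard
}

func main() {
	// Hypothetical target addresses.
	targets := []string{
		"10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:9090",
		"10.0.0.4:9090", "10.0.0.5:9090",
	}
	shards := desiredShards(len(targets), 2)
	byShard := assign(targets, shards)
	for i := 0; i < shards; i++ {
		fmt.Printf("shard %d: %v\n", i, byShard[i])
	}
}
```

Note that plain modulo hashing reshuffles most targets whenever the shard count changes, which matters for a group that scales up and down; a real design would likely need to weigh that against alternatives such as consistent hashing.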

      It is also worth thinking about smarter sharding in the future, e.g. per node.
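
As a rough illustration of what per-node sharding could mean, the sketch below groups targets by the node they run on, so that each group could be handed to a scraper placed on or near that node. The `Target` type and `groupByNode` function are hypothetical, and how a scraper would actually be bound to a node is left open here.

```go
package main

import (
	"fmt"
	"sort"
)

// Target is a hypothetical scrape target annotated with the node it runs on.
type Target struct {
	Address string
	Node    string
}

// groupByNode buckets targets by node name, one bucket per node.
func groupByNode(targets []Target) map[string][]Target {
	groups := make(map[string][]Target)
	for _, t := range targets {
		groups[t.Node] = append(groups[t.Node], t)
	}
	return groups
}

func main() {
	// Hypothetical targets spread over two nodes.
	targets := []Target{
		{Address: "10.0.0.1:9090", Node: "node-a"},
		{Address: "10.0.0.2:9090", Node: "node-b"},
		{Address: "10.0.0.3:9090", Node: "node-a"},
	}
	groups := groupByNode(targets)

	// Sort node names so the printed order is stable.
	nodes := make([]string, 0, len(groups))
	for n := range groups {
		nodes = append(nodes, n)
	}
	sort.Strings(nodes)
	for _, n := range nodes {
		fmt.Printf("%s: %d target(s)\n", n, len(groups[n]))
	}
}
```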

            Unassigned Unassigned
            bplotka Bartlomiej Plotka (Inactive)