-
Feature Request
-
Resolution: Done
1. Proposed title of this feature request
Provide ability to horizontally scale Prometheus
2. What is the nature and description of the request?
Today a single Prometheus instance scrapes every endpoint, which caps the number of endpoints and series that can be collected. This request is to provide a way around that limit so that the resources required by any one Prometheus instance, especially memory, stay reasonable.
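As an illustration only, one common way to spread scraping across several instances is to hash each target and let each instance keep only the targets whose hash falls into its bucket. The sketch below assumes that approach. The names `shard_for` and `partition`, the example target addresses, and the shard count of 3 are all hypothetical, and the SHA-256 hash is a stand-in rather than what Prometheus itself uses. Prometheus's `hashmod` relabel action is built on the same idea.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a scrape target to a shard index using a stable hash.

    Hypothetical helper: SHA-256 is used here only because it is stable
    across processes; it is not the hash Prometheus uses internally.
    """
    digest = hashlib.sha256(target.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(targets, num_shards):
    """Split targets into num_shards lists, one per scraping instance."""
    shards = [[] for _ in range(num_shards)]
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards


if __name__ == "__main__":
    # Made-up addresses standing in for 300 node endpoints.
    targets = [f"10.0.{i // 256}.{i % 256}:9100" for i in range(300)]
    for index, assigned in enumerate(partition(targets, 3)):
        print(f"shard {index}: {len(assigned)} targets")
```

Because the assignment depends only on the target name and the shard count, each instance can decide independently which targets are its own, and each holds only a fraction of the series in memory. Changing the shard count reassigns most targets, which is one of the trade-offs of this simple scheme.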
3. Why does the customer need this? (List the business requirements here)
The customer runs a large cluster (300 nodes, all large bare-metal servers) that supports many jobs being created at the same time. Prometheus is currently configured with 500GB of memory and still gets OOM-killed from time to time when job pods go into CrashLoopBackOff for whatever reason.
4. List any affected packages or components.
Monitoring/Prometheus