-
Epic
-
Resolution: Unresolved
For telemetry-only use cases, Prometheus agent mode looks desirable, since on the surface we only want to forward data.
This comes with a big trade-off, however: agent mode cannot evaluate recording rules (or alerts). With an agent strategy for sending telemetry, we would have to move all aggregation (normally handled by recording rules) and down sampling* to the telemeter receiving side. This could massively increase resource usage on the telemeter side (which is not the topic of this research).
The goal of this epic is to quantify the difference in Prometheus resource usage between two scenarios:
- A Prometheus agent scraping data and remote-writing that data to a target at scrape frequency.
- A Prometheus server scraping data and evaluating recording rules at scrape frequency.
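One way to compare the two scenarios is to read each Prometheus process's own metrics. The sketch below builds PromQL expressions from `process_cpu_seconds_total` and `process_resident_memory_bytes`, which Prometheus exposes about itself. The job label values and the helper names are hypothetical and would need to match the actual test setup.

```python
# Hypothetical job label values for the two deployments under test.
AGENT_JOB = "prometheus-agent"
SERVER_JOB = "prometheus-server"


def usage_queries(job: str, window: str = "5m") -> dict[str, str]:
    """Return PromQL expressions for CPU and memory usage of one Prometheus process."""
    selector = f'{{job="{job}"}}'
    return {
        # CPU cores consumed, averaged over the given window.
        "cpu_cores": f"rate(process_cpu_seconds_total{selector}[{window}])",
        # Resident memory of the process in bytes.
        "rss_bytes": f"process_resident_memory_bytes{selector}",
    }


def relative_difference(agent: float, server: float) -> float:
    """Fractional change of agent usage relative to server usage.

    A negative result means the agent used less than the server.
    """
    if server == 0:
        raise ValueError("server usage must be non-zero")
    return (agent - server) / server


if __name__ == "__main__":
    for job in (AGENT_JOB, SERVER_JOB):
        for name, query in usage_queries(job).items():
            print(f"{job} {name}: {query}")
```

The queries can be run against whatever Prometheus instance scrapes both deployments; `relative_difference` then turns two measured values into a single comparable number.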
The scraped dataset should be fairly small to mimic the telemetry use case: no more than a few thousand time series. The server deployment should have a very short retention period (2h). It could be worthwhile to configure smaller --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration values for the server instance.
* Down sampling here refers to the difference between the telemetry interval (5m) and the scrape interval (usually 30s). To get accurate recording rule results, data has to be present at a higher resolution than the telemetry resolution.
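The resolution gap can be made concrete with a little arithmetic. The sketch below uses the 5m and 30s intervals from the footnote; the gauge values are synthetic and chosen only to show how an aggregate over the full window can differ from the single sample a 5m-resolution pipeline would see.

```python
SCRAPE_INTERVAL_S = 30        # scrape interval from the footnote
TELEMETRY_INTERVAL_S = 5 * 60  # telemetry interval from the footnote

# Number of scraped samples that fall into one telemetry interval.
samples_per_window = TELEMETRY_INTERVAL_S // SCRAPE_INTERVAL_S


def window_average(values: list[float]) -> float:
    """Average over all samples in one telemetry window (what a recording rule could compute)."""
    return sum(values) / len(values)


# Synthetic gauge samples for one window: a short spike in the middle.
values = [0, 0, 0, 0, 100, 100, 0, 0, 0, 0]

if __name__ == "__main__":
    print("samples per telemetry window:", samples_per_window)
    print("average over the window:", window_average(values))
    print("last sample only:", values[-1])
```

With only the last sample per window, the spike would be invisible, while an aggregation over all ten samples still reflects it.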
COO can be used to accomplish this, since we install the prometheusagents CRD. One only has to create a PrometheusAgent CR manually.
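A minimal sketch of such a CR, built as a Python dictionary and printed as JSON (which `kubectl apply -f -` accepts). The API group `monitoring.rhobs/v1alpha1` is an assumption about how COO registers the CRD; upstream prometheus-operator uses `monitoring.coreos.com/v1alpha1`. The name, namespace, and remote-write URL are placeholders.

```python
import json


def prometheus_agent_manifest(name: str, namespace: str, remote_write_url: str) -> dict:
    """Build a minimal PrometheusAgent manifest (hypothetical values throughout)."""
    return {
        # Assumption: COO serves its CRDs under the monitoring.rhobs group.
        "apiVersion": "monitoring.rhobs/v1alpha1",
        "kind": "PrometheusAgent",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": 1,
            # Matches the usual scrape interval mentioned in this epic.
            "scrapeInterval": "30s",
            # An empty selector matches all ServiceMonitors in the namespace.
            "serviceMonitorSelector": {},
            # Forward every scraped sample to the remote-write target.
            "remoteWrite": [{"url": remote_write_url}],
        },
    }


if __name__ == "__main__":
    manifest = prometheus_agent_manifest(
        name="telemetry-agent",                 # placeholder
        namespace="agent-test",                 # placeholder
        remote_write_url="https://example.invalid/api/v1/receive",  # placeholder
    )
    print(json.dumps(manifest, indent=2))
```

A real deployment would likely also need a service account with permission to discover targets; that is left out here to keep the sketch short.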
- is related to: MON-3872 Send OCP telemetry via Prometheus remote-write (New)