Details
Story
Resolution: Obsolete
Normal
BIZ-629 - ELS add on for concurrent (non-pay-as-you-go) RHEL offerings
Description
We learned that we can't rely on recording rules always evaluating within a 5-minute window at the top of every hour.
Our current strategy when gathering metrics is to query at the top of every hour, and get a single data point if it exists.
We should update our strategy so that we query the entire hour range with a step of 600 seconds, and then grab the first appropriate data point.
There's potential overlap between datapoint[6] on one run and datapoint[0] on the next, and that might be a problem: with a step of 600 seconds, a one-hour range yields seven data points, and the last one shares its timestamp with the first one of the following hour's query. The recording rule operates on data from now(), so one that evaluates at 12:31 and aggregates over the past hour covers 11:31-12:31, and one that runs at 13:01 covers 12:01-13:01.
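The overlap can be checked with a minimal sketch. It assumes a Prometheus range query evaluates at start, start + step, and so on, up to and including the end timestamp. The class name `RangeQuerySamples`, the helper `sampleTimes`, and the example date are hypothetical and used only for illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class RangeQuerySamples {
    // Timestamps evaluated for the range [start, end] at a fixed step.
    // Both ends are inclusive (assumed range-query semantics).
    static List<Instant> sampleTimes(Instant start, Instant end, Duration step) {
        List<Instant> times = new ArrayList<>();
        for (Instant t = start; !t.isAfter(end); t = t.plus(step)) {
            times.add(t);
        }
        return times;
    }

    public static void main(String[] args) {
        Duration step = Duration.ofSeconds(600);
        Instant noon = Instant.parse("2024-01-01T12:00:00Z"); // arbitrary example hour
        Instant onePm = noon.plus(Duration.ofHours(1));
        Instant twoPm = noon.plus(Duration.ofHours(2));

        List<Instant> firstRun = sampleTimes(noon, onePm, step);
        List<Instant> secondRun = sampleTimes(onePm, twoPm, step);

        System.out.println(firstRun.size());                          // prints 7
        System.out.println(firstRun.get(6).equals(secondRun.get(0))); // prints true
    }
}
```

Under these assumptions, the 13:00 sample appears in both runs, so whatever selects the "first appropriate" data point needs to avoid using the same sample for two different hours.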
To find the "first appropriate" data point, we'll need to do something like this:
// hourly metrics gathering
List<Metric> dataPoints = fromPrometheus().results();
// each metric.value is a [timestamp, value] pair
int timestampIndex = 0;
int metricValue = 1;
Optional<Metric> firstInTheHour = dataPoints.stream()
    .filter(metric -> metric.value[timestampIndex] >= topOfTheHour)
    .findFirst();
var metricValueWeCareAbout = firstInTheHour.map(metric -> metric.value[metricValue]);
Requirements
- Make the resolution step configurable. Set the default to 3600 seconds, and use 600 for the swatch-metrics-rhel ClowdApp.
- Introduce a configurable "tolerance" window. The default will be 0, which preserves the current behavior with OpenShift Telemeter. For the swatch-metrics-rhel configuration, set the window to 30 minutes.
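One possible reading of the tolerance requirement is sketched below. It assumes the tolerance is how far past the top of the hour a data point may fall and still be accepted, and that data points arrive sorted by timestamp. `ToleranceWindow`, `DataPoint`, and `firstWithinTolerance` are hypothetical names, not existing code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;

class ToleranceWindow {
    // Hypothetical stand-in for one Prometheus sample.
    static final class DataPoint {
        final Instant timestamp;
        final double value;

        DataPoint(Instant timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Returns the first data point whose timestamp lies in
    // [topOfTheHour, topOfTheHour + tolerance], if there is one.
    // Assumes points are sorted by ascending timestamp.
    static Optional<DataPoint> firstWithinTolerance(
            List<DataPoint> points, Instant topOfTheHour, Duration tolerance) {
        Instant latestAccepted = topOfTheHour.plus(tolerance);
        return points.stream()
                .filter(p -> !p.timestamp.isBefore(topOfTheHour))
                .filter(p -> !p.timestamp.isAfter(latestAccepted))
                .findFirst();
    }
}
```

With a tolerance of 0, only a data point exactly at the top of the hour qualifies, which matches the current single-point behavior. With a 30-minute tolerance and a 600-second step, a first data point at 12:10 would be accepted for the 12:00 hour.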