Story
Resolution: Unresolved
Normal
ACM 2.11.0
What
We rolled out Thanos v0.35.1 to our production environment and hit ingest errors and CPU usage issues on the Thanos Receive component.
Why
https://github.com/thanos-io/thanos/pull/7045 introduced the "receive.forward.async-workers" flag with a default value of 5, which does not seem to be sufficient for high-scale environments. The default was rolled out to our MST instance without issues.
How
Add a parameter for this flag to the template in https://github.com/rhobs/configuration
Use existing metrics and traces to determine a suitable value for the flag in both the telemeter prod and hypershift prod instances.
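As a rough sketch of the first step, the fragment below shows how the flag could be exposed as an OpenShift template parameter and substituted into the Thanos Receive container args. The parameter name THANOS_RECEIVE_FORWARD_ASYNC_WORKERS and the resource names are hypothetical and not taken from rhobs/configuration. The value "5" only mirrors the upstream default, and unrelated required fields are omitted.

```yaml
# Illustrative fragment only; names are assumptions, not the actual rhobs/configuration layout.
apiVersion: template.openshift.io/v1
kind: Template
metadata:
  name: thanos-receive-example  # hypothetical template name
parameters:
  # Hypothetical parameter; "5" mirrors the upstream default of the flag.
  - name: THANOS_RECEIVE_FORWARD_ASYNC_WORKERS
    value: "5"
objects:
  - apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: thanos-receive-example  # hypothetical resource name
    spec:
      # selector, serviceName, image, and other required fields omitted for brevity
      template:
        spec:
          containers:
            - name: thanos-receive
              args:
                - receive
                # The template parameter is substituted here at processing time.
                - --receive.forward.async-workers=${THANOS_RECEIVE_FORWARD_ASYNC_WORKERS}
```

With a parameter like this, each environment (for example telemeter prod and hypershift prod) could override the value independently once a suitable number has been determined.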
A.C
- The flag is added as a template param
- Thanos v0.35.1 is rolled out to all prod envs and ingest is reliable with the chosen value