After launching Thanos Querier, I noticed that Thanos consumes a lot of memory.
When memory limits were set on the Thanos pods, the pods were OOM-killed.
When no memory limits were set, the Thanos pods consumed all the memory on the node and eventually caused an OOM condition on the node where they were deployed.
- With a 2Gi memory limit applied to Thanos:
  - API performance queries from the monitoring dashboard were very slow for time ranges from 15 minutes to 6 hours.
  - When the time range was increased to 12 hours or more, the queries timed out and the Thanos pods were OOM-killed.
  - When the time range was reduced, the pods stabilized on their own and stayed stable until the range was increased back to 12 hours or more.
- After increasing the Thanos memory limit to 4Gi:
  - The Thanos pods were OOM-killed when the time range was increased to 2 days or more.
  - Even for a 1-day time range, the console had trouble loading the data.
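
To reproduce the behaviour above without going through the dashboard, something like the following sketch could be used. It assumes the Querier is reachable over its Prometheus-compatible HTTP API (`/api/v1/query_range`). The base URL, the PromQL expression, and the `max_points` step heuristic are hypothetical placeholders, not values taken from the setup described here.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Time ranges mentioned in the observations above, in seconds.
RANGES = {
    "15m": 15 * 60,
    "6h": 6 * 3600,
    "12h": 12 * 3600,
    "1d": 24 * 3600,
    "2d": 48 * 3600,
}


def build_range_query_url(base_url, promql, range_seconds, end=None, max_points=1000):
    """Build a query_range URL covering the last `range_seconds` seconds.

    `max_points` is an assumed heuristic for choosing the step; the
    dashboard in the original setup may pick its step differently.
    """
    end = int(time.time()) if end is None else int(end)
    start = end - range_seconds
    # Keep the step at 1 second or more so short ranges stay valid.
    step = max(1, range_seconds // max_points)
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def time_query(url, timeout=120):
    """Run one GET and return (elapsed_seconds, http_status).

    Raises on timeouts and HTTP errors, which is what the 12-hour and
    longer ranges would be expected to trigger in the scenario above.
    """
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()
        status = resp.status
    return time.monotonic() - started, status


# Example usage (hypothetical address and query, not run here):
#   for label, seconds in RANGES.items():
#       url = build_range_query_url("http://thanos-querier.example:10902", "up", seconds)
#       print(label, time_query(url))
```

Running the loop while watching the pod's memory usage should show at which range the queries start to time out for a given memory limit.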