  1. OpenShift Bugs
  2. OCPBUGS-22441

Thanos Querier high CPU and memory usage till OOM


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Undefined
    • None
    • 4.12.z
    • Monitoring
    • None
    • -
    • No
    • False



      Description of problem:

      After launching Thanos Querier, I noticed that it consumes a large amount of memory.
      With memory limits set on Thanos, the Thanos pods were OOM-killed.
      Without memory limits, the Thanos pods consumed all the memory of the node they ran on, eventually causing the node itself to run out of memory.
      Example:
      - With a 2Gi memory limit on Thanos:
        - An API performance query from the monitoring dashboard was very slow for time ranges from 15 minutes to 6 hours.
        - When the time range was increased to 12 hours or more, the query timed out and the Thanos pods were OOM-killed.
        - After the time range was reduced, the pods stabilized on their own until the range was increased back to 12 hours or more.
      - With the memory limit raised to 4Gi:
        - The Thanos pods were OOM-killed when the time range was increased to 2 days or more.
        - Even with a 1-day time range, the console had trouble loading the data.
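
      The time ranges above can be turned into comparable range queries against the Prometheus-compatible query_range endpoint that Thanos Querier exposes. The following is only a sketch for illustration, not the reporter's actual steps: the host name, the PromQL expression, and the 30-second step are all assumptions (the report does not say which query or step the dashboard used). It only builds the request URLs and counts evaluation steps per series; it sends nothing and leaves out authentication.

```python
from urllib.parse import urlencode
import time

# Hypothetical placeholders: neither the host nor the query comes from the report.
THANOS_URL = "https://thanos-querier.example.com"
QUERY = "sum(rate(apiserver_request_total[5m])) by (verb)"

# Time ranges mentioned in the report, in seconds.
RANGES = {
    "15m": 15 * 60,
    "6h": 6 * 3600,
    "12h": 12 * 3600,
    "1d": 24 * 3600,
    "2d": 2 * 24 * 3600,
}


def evaluation_steps(range_seconds: int, step_seconds: int = 30) -> int:
    """Number of points per series that query_range evaluates (start and end inclusive)."""
    return range_seconds // step_seconds + 1


def query_range_url(end_ts: int, range_seconds: int, step_seconds: int = 30) -> str:
    """Build a /api/v1/query_range URL covering the window that ends at end_ts."""
    params = {
        "query": QUERY,
        "start": end_ts - range_seconds,
        "end": end_ts,
        "step": step_seconds,  # assumed step; the console's real step is not given in the report
    }
    return f"{THANOS_URL}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    now = int(time.time())
    for label, seconds in RANGES.items():
        print(f"{label:>4}: {evaluation_steps(seconds):>5} steps per series")
        print(f"      {query_range_url(now, seconds)}")
```

      With a fixed step, the number of evaluation steps per series grows linearly with the time range, so the 2-day query evaluates four times as many steps as the 12-hour one. How this maps to Thanos memory usage depends on the number of series and the samples fetched, which the report does not state.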

      Version-Release number of selected component (if applicable):

      OCP: 4.12.11
      Thanos: 0.28.1

      How reproducible:


      Steps to Reproduce:


      Actual results:

      Thanos pods consume all available memory and are OOM-killed.

      Expected results:

      Thanos pods should not consume excessive memory or be OOM-killed when queries cover long time ranges.

      Additional info:


            hasun@redhat.com Haoyu Sun
            rhn-support-amanverm Aman Dev Verma
            Junqi Zhao Junqi Zhao