OpenShift Logging / LOG-4536

Vector pods using up 160G of memory



    • Prior to this change, Vector collector deployments relied on the default retry and buffering behavior. This could cause the delivery pipeline to back up while trying to deliver every message whenever the availability of an output was unstable. This fix modifies the configuration for parity with the alternate collector in earlier releases: it limits the number of message retries and drops messages once that threshold is exceeded (a way to verify these settings on a running collector is sketched below).
    • Bug Fix
    • Proposed
    • Log Collection - Sprint 242, Log Collection - Sprint 243, Log Collection - Sprint 244, Log Collection - Sprint 245, Log Collection - Sprint 246
    • Important
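
      As a quick check of the behavior described in the release note text above, the sketch below dumps the rendered Vector configuration from a running collector pod and greps for the relevant sink options. The namespace, label selector, container name, config path, and option names (request.retry_attempts, buffer.when_full) are assumptions based on upstream Vector sink configuration, not details confirmed in this issue.

          # Pick one collector pod (assumes the openshift-logging namespace and the
          # component=collector label used by the cluster-logging-operator).
          POD=$(oc -n openshift-logging get pods -l component=collector -o name | head -n1)

          # Dump the rendered Vector config and look for the per-sink retry limit and
          # the buffer overflow behavior ("drop_newest" drops messages once full).
          oc -n openshift-logging exec "$POD" -c collector -- cat /etc/vector/vector.toml \
            | grep -nE 'retry_attempts|when_full|max_events'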

      Description of problem:

      Vector collector pods have been observed using up to 160 GiB of memory:

          USER      PID      %CPU  %MEM  VSZ-MiB  RSS-MiB  TTY    STAT   START  TIME      COMMAND  
          root      3626153  126   20.7  163472   160364   ?      -      Aug16  9323:42   /usr/bin/vector  
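
      To cross-check this figure, a minimal sketch for repeating the measurement (namespace, label selector, and container name are assumptions for illustration; if ps is not available in the collector image, the same can be run on the node via oc debug):

          # Pod-level (cgroup) view of collector memory usage.
          oc -n openshift-logging adm top pods -l component=collector

          # Process-level measurement, taken inside one collector pod.
          POD=$(oc -n openshift-logging get pods -l component=collector -o name | head -n1)
          oc -n openshift-logging exec "$POD" -c collector -- ps -o pid,rss,vsz,comm -C vector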

       

      The following needs to be determined:

      • Whether the memory used by Vector is legitimate and the Vector pod is not leaking memory. If so, is it possible to profile Vector's memory usage?
      • Where the memory is being used. If the usage is related to problems delivering logs to an output, is it possible to see how much memory each configured output is using, so the total can be accounted for? (See the metrics sketch after this list.)
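
      For the per-output question, Vector's internal metrics report buffer usage per component, which can be broken down by sink. A minimal sketch, assuming the collector exposes those metrics locally on port 24231 and that curl is available in the image (both assumptions, not confirmed here):

          # Per-component (per-output) buffer usage from Vector's internal metrics.
          # vector_buffer_byte_size and vector_buffer_events are standard Vector
          # internal metrics labelled with component_id.
          POD=$(oc -n openshift-logging get pods -l component=collector -o name | head -n1)
          oc -n openshift-logging exec "$POD" -c collector -- \
            curl -sk https://localhost:24231/metrics | grep -E '^vector_buffer_(byte_size|events)\{'

          # Equivalent PromQL, if cluster monitoring already scrapes the collector:
          #   sum by (component_id) (vector_buffer_byte_size)
          #   sum by (component_id) (vector_buffer_events)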

      The cluster is no longer affected by the vector_internal_metrics_cardinality_total issue, since it is running Logging 5.7.6.

      Version-Release number of selected component (if applicable):

      CLO 5.7.6

      How reproducible:

      Not reproducible in a lab with low load. The objective is to gather metrics and profile the memory to confirm where the memory is being used and that Vector is not leaking.
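
      As a first profiling step, it may help to check what the shipped vector binary offers; a minimal sketch (support for allocation tracing varies across Vector versions, so the grep below is only a probe, not a guarantee that such an option exists in this build):

          # Report the Vector version and look for any allocation/tracing related
          # options that could help attribute memory usage to components.
          POD=$(oc -n openshift-logging get pods -l component=collector -o name | head -n1)
          oc -n openshift-logging exec "$POD" -c collector -- vector --version
          oc -n openshift-logging exec "$POD" -c collector -- vector --help | grep -iE 'alloc|trace' || true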

      Steps to Reproduce:

      N/A

      Actual results:

      Vector consumes up to 160 GiB of memory, and it is not possible to determine which part of Vector is using it.

      Expected results:

      Confirm that the memory used by Vector is legitimate, and show with metrics and/or memory profiling that Vector is working as expected and is not leaking.

      Additional info:

      This is also aligned with RFE https://issues.redhat.com/browse/OBSDA-482, which asks for the ability to debug/troubleshoot Vector.

        1. 20230925_vector_internal_metrics_cardinality_total_since_5.7.6_update.png
          168 kB
          Emmanuel Kasprzyk
        2. back off calc.png
          27 kB
          Jeffrey Cantrill
        3. image-2023-09-27-23-31-09-217.png
          96 kB
          Oscar Casal Sanchez
        4. image-2023-09-27-23-37-11-372.png
          65 kB
          Oscar Casal Sanchez
        5. image-2023-09-27-23-53-51-064.png
          51 kB
          Oscar Casal Sanchez
        6. image-2023-09-28-17-05-02-157.png
          158 kB
          Oscar Casal Sanchez
        7. screenshot-1.png
          191 kB
          Oscar Casal Sanchez

              jcantril@redhat.com Jeffrey Cantrill
              rhn-support-ocasalsa Oscar Casal Sanchez
              Kabir Bharti Kabir Bharti
              Votes: 1
              Watchers: 11
