OpenShift Logging / LOG-3293

log-file-metric-exporter exhausting the resources of the node


    • Before this update, the log file size map generated by the `log-file-metrics-exporter` component did not remove entries for deleted files, resulting in increased file size and process memory. With this update, the log file size map does not contain entries for deleted files.
    • Log Collection - Sprint 227, Log Collection - Sprint 228
    • Important

      Description of problem:

      The collector pod contains 2 containers (a quick check is sketched after this list):

      1. `collector` - starts the fluentd process
      2. `logfilesmetricexporter` - starts the `/usr/local/bin/log-file-metric-exporter` process
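
      A minimal way to confirm the two containers (a sketch; the `component=collector` label selector is an assumption about how the collector DaemonSet labels its pods):

          # List the containers in one of the collector pods
          oc get pods -n openshift-logging -l component=collector \
            -o jsonpath='{.items[0].spec.containers[*].name}'
          # Expected: collector logfilesmetricexporter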

      If we review both containers, only the first one has CPU and memory requests and limits, and those can be managed from the ClusterLogging Operator (a ClusterLogging snippet is sketched after this output):

          - name: COLLECTOR_CONF_HASH
            value: fb4ebfa073fd0ea24153c48f22abdaa9
          image: registry.redhat.io/openshift-logging/fluentd-rhel8@sha256:1140e317d111e13c4900c1b6d128c5fdef05b9f319b0bd693665d67f3139d03a
          imagePullPolicy: IfNotPresent
          name: collector
          ports:
          - containerPort: 24231
            name: metrics
            protocol: TCP
          resources:   <---------------- this is limited as set in the clusterLogging instance
            limits:
              memory: 2Gi
            requests:
              cpu: 100m
              memory: 1Gi
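
      For reference, a minimal sketch of how those values are managed from the ClusterLogging instance (field layout as in the 5.x ClusterLogging CR with the fluentd collector type; values taken from the output above):

          apiVersion: logging.openshift.io/v1
          kind: ClusterLogging
          metadata:
            name: instance
            namespace: openshift-logging
          spec:
            collection:
              logs:
                type: fluentd
                fluentd:
                  resources:       # applies only to the `collector` container
                    limits:
                      memory: 2Gi
                    requests:
                      cpu: 100m
                      memory: 1Gi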

      but the second container, running `/usr/local/bin/log-file-metric-exporter`, has no limits/requests set by default, and it is not even possible to set them from the ClusterLogging Operator (a check against a live pod is sketched after this output):

        - command:
          - /usr/local/bin/log-file-metric-exporter    <----------- the same process seen in the output from node consuming 8GB of RAM
          - '  -verbosity=2'
          - ' -dir=/var/log/containers'
          - ' -http=:2112'
          - ' -keyFile=/etc/fluent/metrics/tls.key'
          - ' -crtFile=/etc/fluent/metrics/tls.crt'
          image: registry.redhat.io/openshift-logging/log-file-metric-exporter-rhel8@sha256:2f43018b00df04dcdb0eebb7ae90e91dd60970494d13fd0851d91b996c8b0daf
          imagePullPolicy: IfNotPresent
          name: logfilesmetricexporter
          ports:
          - containerPort: 2112
            name: logfile-metrics
            protocol: TCP
          resources: {}     <----------------- not limit and not option in the clusterLogging CR instance of doing it
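
      This can be verified against a running collector pod; a minimal sketch (again assuming the `component=collector` label selector):

          # Show the resources stanza of the exporter container - it comes back empty
          oc get pods -n openshift-logging -l component=collector \
            -o jsonpath='{.items[0].spec.containers[?(@.name=="logfilesmetricexporter")].resources}'
          # The returned resources object is empty, and the ClusterLogging CR has no field to change it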

      Then, for an unknown reason, the `/usr/local/bin/log-file-metric-exporter` process started to increase its memory usage until it was consuming 8 GB. At that point the master OCP node began to have serious performance problems that affected the whole cluster, since etcd requests reaching this node were answered with very high latencies.

      The memory usage of the process was captured in a sosreport:

       Top MEM-using processes: 
      USER PID %CPU %MEM VSZ-MiB RSS-MiB TTY STAT START TIME COMMAND 
      root 7047 8.1 41.3 9616 8295 ? - Mar10 28603:18 /usr/local/bin/log-file-metric-exporter -verbosity=2 -dir=/var/log/containers 
      root 1531508 20.9 10.3 2859 2080 ? - Nov08 299:43 kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml
      root 3882 9.1 5.3 10252 1064 ? - Mar10 32027:28 etcd --logger=zap --log-level=info
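
      The same ranking can be taken from a live node; a minimal sketch (node access via `oc debug` is one option, adjust to your environment):

          # From a debug shell on the affected node
          oc debug node/<node-name>
          chroot /host
          ps aux --sort=-%mem | head -n 5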

      Version-Release number of selected component (if applicable):

      cluster-logging.5.3.2-20

      The same behavior is also present in the latest version.

      How reproducible:

      Not able to reproduce on demand, but it is easy to verify that the container running `/usr/local/bin/log-file-metric-exporter` has no limits and that there is no way to set them.

      Actual results:

      The `logfilesmetricexporter` container in the collector pods has no limits, and for an unknown reason it ended up consuming 8 GB of RAM, impacting the node (a master) and the whole cluster.

      Expected results:

      The `logfilesmetricexporter` container should have limits/requests set by default so that it cannot consume resources without bound, and ideally there should be an option to set them from the ClusterLogging Operator (an illustrative example follows).

      Then, if something causes the process to consume excessive memory or CPU, the limits will stop it.
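
      For illustration only, a hypothetical default of the kind being requested for the exporter container (the values below are illustrative assumptions, not an agreed-upon default):

        - name: logfilesmetricexporter
          resources:
            limits:
              memory: 256Mi     # illustrative value, not an agreed default
            requests:
              cpu: 100m         # illustrative value
              memory: 128Mi     # illustrative value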

      Additional info:
