Observability Documentation / OBSDOCS-856

Examples without limits and resources defined for the collector


      URL

      https://docs.openshift.com/container-platform/4.14/logging/cluster-logging-deploying.html#create-cluster-logging-cli_cluster-logging-deploying

      https://docs.openshift.com/container-platform/4.14/logging/log_collection_forwarding/log-forwarding.html#about-log-collectors-types_log-forwarding

      https://docs.openshift.com/container-platform/4.14/logging/log_collection_forwarding/cluster-logging-collector.html#configuring-logging-collector_cluster-logging-collector

      DESCRIPTION

      In the past, the examples set limits. For example, the one at https://docs.openshift.com/container-platform/4.14/logging/config/cluster-logging-memory.html:

      collection:
        logs:
          type: "fluentd"
          fluentd:
            resources:
              limits:
                memory: 736Mi
              requests: # (3)
                cpu: 200m
                memory: 736Mi
      ...
      3. Specify the CPU and memory limits and requests for the log collector as needed.

      Those examples set limits and also said "Specify the CPU and memory limits and requests for the log collector as needed."

      In all the current examples, `limits` and `requests` are not set, and `resources` is left empty:

      spec:
      # ...
        collection:
          type: <log_collector_type> 
          resources: {}
          tolerations: {}

      This is not a good example to give, because a collector without limits can consume a large amount of memory and CPU and bring down master and worker nodes.

      It would therefore be good to have examples with a clear explanation, like the one in https://docs.openshift.com/container-platform/4.14/logging/config/cluster-logging-memory.html, that also set the CPU limit and keep the explanation below the example telling the reader to adjust the values to their needs:

      collection:
        logs:
          type: "fluentd"
          fluentd:
            resources:
              limits:
                cpu: 1
                memory: 736Mi
              requests: # (3)
                cpu: 200m
                memory: 736Mi
      ...
      3. Specify the CPU and memory limits and requests for the log collector as needed. 
      

      Bad examples without limits lead to problems like https://issues.redhat.com/browse/LOG-4536, especially when Vector works entirely in memory without buffering on disk.

      NOTES

      The "good examples" should also be adapted to use the new "style". This was reported earlier in https://issues.redhat.com/browse/OBSDOCS-79, which is now closed.
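As a starting point for discussion, here is a minimal sketch of what such an example could look like in the new style, where `type` and `resources` sit directly under `spec.collection`. It assumes a `ClusterLogging` resource named `instance` in the `openshift-logging` namespace and Vector as the collector. The CPU and memory values are simply copied from the Fluentd example above. They are placeholders, not sizing guidance for Vector, and must be adjusted to the cluster's log volume.

```yaml
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance                 # assumed name
  namespace: openshift-logging   # assumed namespace
spec:
  managementState: Managed
  collection:
    type: vector                 # assumed collector type for this sketch
    resources:
      limits:                    # upper bound enforced on each collector pod
        cpu: 1                   # placeholder value taken from the Fluentd example
        memory: 736Mi            # placeholder value taken from the Fluentd example
      requests:                  # amount reserved by the scheduler for each collector pod
        cpu: 200m
        memory: 736Mi
```

With a memory limit in place, a collector that grows past the limit is terminated and restarted by Kubernetes instead of exhausting the memory of the node. A CPU limit causes the collector to be throttled rather than starving other workloads.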

      In parallel, and as part of modifying the examples, a prominent recommendation to always set limits should appear across the collector configuration documentation. https://issues.redhat.com/browse/LOG-4745 requested a default limit for when the admin does not set one, but it was rejected. Without a limit, the situation detailed in https://issues.redhat.com/browse/LOG-4536 can happen: Vector used up all the memory on the nodes, impacting business workloads.

            Assignee: Unassigned
            Reporter: Oscar Casal Sanchez (rhn-support-ocasalsa)