  1. OpenShift Logging
  2. LOG-1714

Memory/CPU spike issues seen with Logging 5.2

Details

    • Logging (LogExp) - Sprint 206
    • Passed
    • NEW
    • NEW
      Before this update, OpenShift Elasticsearch Operator reconciliation of the `ServiceAccounts` overwrote third-party-owned fields that contained secrets. This issue caused memory and CPU spikes due to frequent secret re-creation. This update resolves the issue: the OpenShift Elasticsearch Operator does not overwrite third-party-owned fields.
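
The release note above describes a reconciliation that overwrote fields owned by other controllers. A minimal sketch of the difference between an overwriting and a preserving reconcile is shown below. It is not the operator's actual code (the operator is not written in Python), and the function names, the sample objects, the token secret name, and the choice of `secrets` and `imagePullSecrets` as the third-party-owned fields are all assumptions made for illustration.

```python
import copy

# Assumption: these top-level ServiceAccount fields are populated by other
# controllers, not by the operator that owns the rest of the object.
THIRD_PARTY_FIELDS = ("secrets", "imagePullSecrets")


def naive_reconcile(current, desired):
    """Replace the live object with the desired one.

    Any field the operator does not set (such as `secrets`) is dropped,
    so the controller that owns it has to re-create it on every pass.
    """
    return copy.deepcopy(desired)


def preserving_reconcile(current, desired):
    """Apply the desired state but keep third-party-owned fields as found."""
    merged = copy.deepcopy(desired)
    for field in THIRD_PARTY_FIELDS:
        if field in current:
            merged[field] = copy.deepcopy(current[field])
    return merged


# Hypothetical live object: another controller has attached a token secret.
live = {
    "metadata": {"name": "elasticsearch", "labels": {}},
    "secrets": [{"name": "elasticsearch-token-abcde"}],
}

# Hypothetical desired object: the operator only cares about metadata.
desired = {
    "metadata": {"name": "elasticsearch", "labels": {"component": "elasticsearch"}},
}

overwritten = naive_reconcile(live, desired)
preserved = preserving_reconcile(live, desired)

print("naive keeps secrets:", "secrets" in overwritten)
print("preserving keeps secrets:", "secrets" in preserved)
```

With the overwriting variant, each reconcile pass removes the secret reference and triggers a new one, which matches the "frequent secret re-creation" named in the release note as the cause of the spikes.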

    Description

      Memory and CPU spikes are seen on OCP 4.8 clusters when Logging 5.2 builds are used. The spikes appear only after installing the Elasticsearch and Logging operators and are not seen on normally deployed clusters. Logging builds tried: 5.2.0-41, 5.2.0-45.

      Config used:
      Masters:
      Memory: 32 GB (Dedicated)
      Processors: 1
      Processing units: 0.5 (Shared)
       
      Workers:
      Memory: 64 GB (Dedicated)
      Processors: 1
      Processing units: 0.8 (Shared)

      # oc adm top nodes
        W0827 00:28:44.186897 67960 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
        NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
        master-0   1550m        20%    14695Mi         46%
        master-1   2552m        34%    16492Mi         52%
        master-2   2930m        39%    28794Mi         91%
        worker-0   3455m        46%    20210Mi         31%
        worker-1   1500m        20%    19017Mi         29%

      The output above is with the usual config that we used to verify Logging 5.0 and Logging 5.1.
      After increasing the memory of the masters from 32 GB to 64 GB, memory usage returned to normal, but we now see a CPU spike:
      ```
      # oc adm top nodes
        W0827 06:44:16.892845 75508 top_node.go:119] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
        NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
        master-0   7023m        93%    25249Mi         39%
        master-1   3958m        52%    24382Mi         37%
        master-2   3012m        40%    21627Mi         68%
        worker-0   3669m        48%    24201Mi         37%
        worker-1   1758m        23%    22642Mi         35%
      ```
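
To compare the two snapshots above without reading the columns by eye, the output can be parsed and checked against a threshold. The sketch below is only an illustration: the function names and the 90% threshold are assumptions, not part of `oc`, and the sample data is copied from the two `oc adm top nodes` outputs in this report.

```python
import re

# One data row: NAME, CPU(cores) in millicores, CPU%, MEMORY(bytes) in Mi, MEMORY%.
ROW = re.compile(r"^(\S+)\s+(\d+)m\s+(\d+)%\s+(\d+)Mi\s+(\d+)%$")


def parse_top_nodes(output):
    """Parse `oc adm top nodes` text; warning, header, and command lines are skipped."""
    nodes = []
    for raw in output.splitlines():
        match = ROW.match(raw.strip())
        if not match:
            continue
        name, cpu, cpu_pct, mem, mem_pct = match.groups()
        nodes.append({
            "name": name,
            "cpu_millicores": int(cpu),
            "cpu_pct": int(cpu_pct),
            "memory_mib": int(mem),
            "memory_pct": int(mem_pct),
        })
    return nodes


def hot_nodes(nodes, threshold=90):
    """Return names of nodes whose CPU% or MEMORY% is at or above the threshold."""
    return [n["name"] for n in nodes
            if n["cpu_pct"] >= threshold or n["memory_pct"] >= threshold]


# First snapshot (masters at 32 GB).
BEFORE = """
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master-0 1550m 20% 14695Mi 46%
master-1 2552m 34% 16492Mi 52%
master-2 2930m 39% 28794Mi 91%
worker-0 3455m 46% 20210Mi 31%
worker-1 1500m 20% 19017Mi 29%
"""

# Second snapshot (after increasing master memory).
AFTER = """
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master-0 7023m 93% 25249Mi 39%
master-1 3958m 52% 24382Mi 37%
master-2 3012m 40% 21627Mi 68%
worker-0 3669m 48% 24201Mi 37%
worker-1 1758m 23% 22642Mi 35%
"""

print("before:", hot_nodes(parse_top_nodes(BEFORE)))
print("after:", hot_nodes(parse_top_nodes(AFTER)))
```

With the assumed 90% threshold, the first snapshot flags master-2 on memory (91%) and the second flags master-0 on CPU (93%), which is the shift from a memory spike to a CPU spike described above.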

      The cluster is not stable while logs are being gathered, so we were not able to collect all must-gather logs properly:
      https://drive.google.com/drive/folders/1XLTohRyhjRsk61CyPa_oEX-j6ythG85Z?usp=sharing

          People

            ptsiraki@redhat.com Periklis Tsirakidis
            tkapoor1 Tania Kapoor
            Votes: 0
            Watchers: 6
