Managed Service - Streams / MGDSTRM-7904

Prometheus resource limits need to be improved


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Sprint: MK - Sprint 232

      WHAT

      Looking at the metrics of the `prometheus-kafka-prometheus-0` pod in the `managed-application-services-observability` namespace on the production OSD instance, the memory and CPU limits appear to be lower than what the system requires.
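
      The historical peak usage mentioned here can be pulled from Prometheus itself. A minimal sketch of the queries involved follows; the Prometheus base URL, the 30-day lookback window, and the 5-minute rate window are illustrative assumptions, not values from this ticket:

      ```python
      from urllib.parse import urlencode

      # Pod and namespace named in the ticket.
      POD = "prometheus-kafka-prometheus-0"
      NAMESPACE = "managed-application-services-observability"
      WINDOW = "30d"  # lookback window (assumption)

      # PromQL: peak CPU rate (via a subquery) and peak working-set memory
      # over the lookback window.
      queries = {
          "cpu": (
              f'max_over_time(rate(container_cpu_usage_seconds_total'
              f'{{pod="{POD}",namespace="{NAMESPACE}"}}[5m])[{WINDOW}:5m])'
          ),
          "memory": (
              f'max_over_time(container_memory_working_set_bytes'
              f'{{pod="{POD}",namespace="{NAMESPACE}"}}[{WINDOW}])'
          ),
      }

      def query_url(base_url: str, promql: str) -> str:
          """Build a Prometheus HTTP API instant-query URL."""
          return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"

      # Hypothetical in-cluster service URL for illustration only.
      url = query_url("http://prometheus-operated:9090", queries["memory"])
      ```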

      WHY

      While reviewing a single-AZ cluster and its requirements for the additional nodes, we observed that most pods in the `managed-application-services-observability` namespace do not define explicit CPU/memory limits, and even where limits are specified they appear to understate the pods' actual needs. In a tightly packed system such as a single-AZ cluster, other pods can be scheduled onto the node and claim all of its CPU and memory, leaving no resources for observability needs.

      HOW

      Correct the CPU/memory requirements for the pods in question. We can look at the current usage over time and set the new limit to the historical maximum plus some buffer. Make sure that the limits remain reasonable for the node size.
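
      One way to turn the historical maximum into a new limit is sketched below; the 30% buffer and the rounding steps are illustrative assumptions, not values from this ticket:

      ```python
      import math

      def suggest_limit(historical_max: float, buffer: float = 0.3, step: float = 1.0) -> float:
          """Return the historical peak plus a safety buffer, rounded up to the nearest step.

          historical_max: peak observed usage (e.g. CPU cores or memory in MiB).
          buffer: fractional headroom on top of the peak (assumption: 30%).
          step: rounding granularity so limits stay tidy (e.g. 0.1 cores, 256 MiB).
          """
          return math.ceil(historical_max * (1 + buffer) / step) * step

      # Example: a 1800 MiB memory peak, rounded up to the nearest 256 MiB.
      memory_limit_mib = suggest_limit(1800, buffer=0.3, step=256)  # 2560 MiB
      cpu_limit_cores = suggest_limit(0.42, buffer=0.3, step=0.1)
      ```

      A sanity check against the node's allocatable capacity would still be needed before applying the result, per the "reasonable for the node size" requirement above.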

              Assignee: Unassigned
              Reporter: rhn-engineering-rareddy (Ramesh Reddy)
              Component: MK - Control Plane
              Votes: 0
              Watchers: 3