  1. OpenShift Monitoring
  2. MON-1838

Enforce body_size_limit


    • Sprint 217, Sprint 218, Sprint 219, Sprint 220

      Once https://issues.redhat.com/browse/MON-1837 is implemented upstream, we should make use of this flag downstream to continue to limit the impact that a malicious target can have on Prometheus and the cluster as a whole.

      The context behind this is briefly mentioned in MON-1837, but the goal is to enforce a global body_size_limit for the platform Prometheuses, sized according to the cluster, so that we can limit how many metrics are ingested by Prometheus. We noticed that `sample_limit` does not completely protect against targets exposing millions of series, which would result in a scrape response of hundreds of megabytes. Prometheus would not have enough RAM available to fully ingest this response, so it would run out of memory and the node would go down, even though there are mechanisms in place in the kernel / kubelet to prevent that.

      A heuristic that spasquie came up with is to take the estimated maximum number of samples exposed by the most expensive target (based on the data we collect from https://issues.redhat.com/browse/MON-1637, plus a certain margin) and multiply it by 200, which is an estimated size in bytes of a sample plus a certain margin of error.

      In addition, since this is very sensitive and getting the math wrong could break clusters, it would be great to add a field to CMO's config to disable the limit in case a cluster-admin runs into an unexpected issue and knows that their setup is correct. That would at least give them a way to recover, although it would leave them in a potentially dangerous situation.

      DoD:

      • Configure enforce_body_size_limit in CMO based on the following heuristic:
        • 200 * (max_number_of_samples_per_target | depending on cluster size)
      • Add a field in CMO's config to disable enforce_body_size_limit
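
To make the arithmetic in the DoD concrete, here is a minimal Python sketch of the heuristic together with the disable switch. It is only an illustration, not CMO code: the `BodySizeLimitConfig` class, its field names, the integer-percent margin, and the sample counts in the usage note are all hypothetical. Only the factor of 200 bytes per sample comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# From the issue: estimated size of one sample in bytes, margin of error included.
BYTES_PER_SAMPLE = 200


@dataclass
class BodySizeLimitConfig:
    """Hypothetical config shape; the field names are illustrative only."""
    max_samples_per_target: int      # estimate for the cluster size (data from MON-1637)
    sample_margin_percent: int = 0   # assumed safety margin on the sample estimate
    disabled: bool = False           # escape hatch for a cluster-admin


def body_size_limit_bytes(cfg: BodySizeLimitConfig) -> Optional[int]:
    """Return the limit in bytes, or None when the limit is disabled."""
    if cfg.disabled:
        return None
    if cfg.max_samples_per_target <= 0:
        raise ValueError("max_samples_per_target must be positive")
    # Apply the margin with integer math, rounding up to stay on the safe side.
    scaled = cfg.max_samples_per_target * (100 + cfg.sample_margin_percent)
    samples_with_margin = (scaled + 99) // 100
    return BYTES_PER_SAMPLE * samples_with_margin
```

With a made-up estimate of 100,000 samples for the most expensive target, this yields 20,000,000 bytes with no margin and 24,000,000 bytes with a 20% margin. When `disabled` is set, the function returns `None`, meaning no limit would be enforced.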

            hasun@redhat.com Haoyu Sun
            dgrisonn@redhat.com Damien Grisonnet
            Junqi Zhao Junqi Zhao
