  1. OpenShift Monitoring
  2. MON-3292

Make Prometheus flooding/DoS problems easier to detect

    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • prometheus-DoS
    • Quality / Stability / Reliability
    • NEW
    • To Do
    • NEW
    • 50% To Do, 0% In Progress, 50% Done
    • 75% (Medium)
    • 3

      For multiple clusters, see:

       

      https://issues.redhat.com/browse/OCPBUGS-15337

      https://issues.redhat.com/browse/OCPBUGS-4186

      Prometheus was flooded: all of its web.max-connections slots (512 by default) were continually filled with query connections, and the network stack queues were also full of query connections, so the probes could not run.

      To make debugging such problems easier, we can:

      • Check with the CCX team whether we can add a rule that detects SYN flooding (in general) from the sosreport: https://issues.redhat.com/browse/INSIGHTOCP-1307
      • Add a Prometheus alert that fires when the number of connections Prometheus is processing approaches the maximum. Seen from another angle, the probes were failing because Prometheus could not accept() and process their connections, as it was already handling its maximum (--web.max-connections).
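
      The second proposal can be sketched as a small check, shown here in Python for illustration only. It assumes Prometheus exposes the go-conntrack counters net_conntrack_listener_conn_accepted_total and net_conntrack_listener_conn_closed_total with listener_name="http" for its web listener; those names, the 0.8 threshold, and every function name below are assumptions to verify against a real /metrics output, not something this issue specifies. The threshold is deliberately below 1.0 because, once the limit is actually reached, a scrape of Prometheus itself may fail for the same reason the probes did.

```python
import re

# Assumption: Prometheus exposes these go-conntrack counters for its web
# listener; verify the names against your own /metrics output.
ACCEPTED = "net_conntrack_listener_conn_accepted_total"
CLOSED = "net_conntrack_listener_conn_closed_total"

# Matches a labelled sample line such as: name{listener_name="http"} 1234
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)


def counter_value(metrics_text, name, listener="http"):
    """Return the value of counter `name` for the given listener, or None."""
    wanted = f'listener_name="{listener}"'
    for line in metrics_text.splitlines():
        m = SAMPLE_RE.match(line)
        if m and m.group("name") == name and wanted in m.group("labels"):
            return float(m.group("value"))
    return None


def connection_saturation(metrics_text, max_connections=512):
    """Fraction of the connection limit in use: (accepted - closed) / max."""
    accepted = counter_value(metrics_text, ACCEPTED)
    closed = counter_value(metrics_text, CLOSED)
    if accepted is None or closed is None:
        raise ValueError("conntrack counters not found in metrics text")
    return (accepted - closed) / max_connections


def near_limit(metrics_text, max_connections=512, threshold=0.8):
    """True when open connections reach `threshold` of the limit."""
    return connection_saturation(metrics_text, max_connections) >= threshold


# Hypothetical sample of the text exposition format, for illustration only.
SAMPLE = """\
# TYPE net_conntrack_listener_conn_accepted_total counter
net_conntrack_listener_conn_accepted_total{listener_name="http"} 10500
# TYPE net_conntrack_listener_conn_closed_total counter
net_conntrack_listener_conn_closed_total{listener_name="http"} 10000
"""

if __name__ == "__main__":
    print(connection_saturation(SAMPLE), near_limit(SAMPLE))
```

      If those counters are indeed available, the same ratio (accepted minus closed, divided by the configured limit) could serve as the expression of an alerting rule; 512 would need to be replaced by the configured --web.max-connections value wherever the default is overridden.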

              rh-ee-amrini Ayoub Mrini