Type: Epic
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- groomed

Epic Name:
prometheus-DoS
Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Docs QE Status:
NEW
Epic Status:
To Do
QE Status:
NEW
Hierarchy Progress Bar:

50% To Do, 0% In Progress, 50% Done

Confidence:
75% (Medium)
Effort:
3

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Intelligence Requested:
Market:

For multiple clusters, see:

https://issues.redhat.com/browse/OCPBUGS-15337

https://issues.redhat.com/browse/OCPBUGS-4186

Prometheus was flooded: all of its --web.max-connections slots (512 by default) were continually filled with query connections, and the network stack queues were also filled with query connections, so the probes could not run.

To make debugging such problems easier we can:

- Check with the CCX team whether we can add a rule to detect SYN flooding (in general) from sosreport: https://issues.redhat.com/browse/INSIGHTOCP-1307
- Add a Prometheus alert that fires when the number of connections Prometheus is processing approaches the maximum. Seen from another angle, the probes were failing because Prometheus could not accept() and process their connections, as it was already handling its maximum (--web.max-connections).
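
The alert condition proposed above can be sketched as a small check. This is only an illustration under assumptions: it supposes that the number of open connections can be derived from a pair of "accepted" and "closed" connection counters (for example conntrack-style listener counters, if Prometheus exposes them in the deployed version), and the 0.8 threshold is a hypothetical value, not one agreed on in this epic. All function names are made up for the example.

```python
# Sketch of the "connections approaching the max" check.
# Assumption: open connections = accepted counter - closed counter.
# The counter source and the 0.8 threshold are hypothetical.

DEFAULT_MAX_CONNECTIONS = 512  # default of --web.max-connections per the description


def open_connections(accepted_total: float, closed_total: float) -> float:
    """Connections currently held open, clamped at zero."""
    return max(accepted_total - closed_total, 0.0)


def saturation(accepted_total: float, closed_total: float,
               max_connections: int = DEFAULT_MAX_CONNECTIONS) -> float:
    """Fraction of the connection limit that is currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return open_connections(accepted_total, closed_total) / max_connections


def should_alert(accepted_total: float, closed_total: float,
                 max_connections: int = DEFAULT_MAX_CONNECTIONS,
                 threshold: float = 0.8) -> bool:
    """True when usage is at or above the (hypothetical) threshold."""
    return saturation(accepted_total, closed_total, max_connections) >= threshold
```

In a real alerting rule the same ratio would be evaluated over a time window, so that a short burst of queries does not fire the alert while a sustained flood does.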

is blocked by

MON-3390 Write post-mortem document on liveness probes being unresponsive due to VPA

Closed

is related to

OCPBUGS-4186 Prometheus ReadinessProbes failing after upgrade to OpenShift 4.10

Closed

Assignee: Ayoub Mrini

Reporter: Ayoub Mrini

Votes: 0

Watchers: 2

Created: 2023/07/31 2:59 PM

Updated: 2025/08/11 6:34 AM

Target start: 2023/11/14

Target end: 2023/11/20
