Type: Bug
Resolution: Done
Priority: Major
Course build: DO380 - OCP4.14-en-1-20240220
Platform: ROLE
Language: en-US (English)
Please fill in the following information:
URL: https://role.rhu.redhat.com/rol-rhu/app/courses/do380-4.14/pages/pr01s02
Reporter RHNID: rhn-support-ablum
Section Title: (all)
Issue description
For DO380 environments running longer than 7 days, Prometheus consumes enough ephemeral storage on the nodes where the prometheus-k8s-X pods run that the kubelet triggers its hard eviction threshold to reclaim space. This causes problems for several cluster operators, because it effectively removes two nodes from the cluster (via the node.kubernetes.io/disk-pressure:NoSchedule taint).
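To confirm that the kubelet has tainted the affected nodes, a check along these lines should surface the disk-pressure taint (a sketch; the custom-columns output is just one way to display taints):
[student@workstation ~]$ oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints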
Steps to reproduce:
Use the cluster normally for longer than 7 days. See the attached screenshot for the metrics trendline for one node: Prometheus is using about 17 GB of the 40 GB available on the node.
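One way to confirm the disk usage on an affected node is through a debug pod (a sketch; the node name master01 is a placeholder, and /var is where kubelet and container data live on RHCOS):
[student@workstation ~]$ oc debug node/master01 -- chroot /host df -h /var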
Workaround:
Deleting the Prometheus pods causes them to be rescheduled, typically onto other nodes. This temporarily fixes the issue until the ephemeral storage fills up again later.
[student@workstation ~]$ oc scale statefulset/prometheus-k8s -n openshift-monitoring --replicas 0
NOTE: The OpenShift Cluster Monitoring Operator will automatically restore the replica count, resulting in two new prometheus-k8s-X pods.
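Once the operator restores the replicas, the new pod placement can be verified with something like the following (a sketch; the app.kubernetes.io/name=prometheus label selector is an assumption about the pod labels in this release):
[student@workstation ~]$ oc get pods -n openshift-monitoring -l app.kubernetes.io/name=prometheus -o wide
As a longer-term mitigation, not part of the original report, Prometheus retention could be capped through the documented cluster-monitoring-config ConfigMap so that the TSDB cannot fill the node. The retention values below are placeholders:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h
      retentionSize: 10GB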