Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: DO380 - OCP4.14-en-2-20240617
Affects Version/s: DO380 - OCP4.14-en-1-20240220
Component/s: DO380
Labels: None

Blocked: False
Blocked Reason: None
Ready: False
Chapter: 6
Intelligence Requested:
Market:
Language: en-US (English)

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Please fill in the following information:

URL:	https://rol.redhat.com/rol/app/courses/do380-4.14/pages/ch06s05
Reporter RHNID:	chetan-rhls
Section Title:	Lab: OpenShift Monitoring

Issue description: The deployments created during this lab's setup have a good chance of overloading one of the nodes to the point of failure. It would be a good idea to reduce the load they generate, or to change which alert the lab expects to appear.

I’ve experienced this twice.
Worker03 could not cope with the load: the cluster operators on this node stopped responding and the node was marked as unavailable. It might have recovered given enough time, but I waited 30 minutes to no avail. In the end I force-deleted the pods and restarted the node. Because the cluster lost communication with the node, it never produced the desired alert.

Steps to reproduce:

Workaround:

Expected result:

Assignee: Bernardo Andres Gargallo Jaquotot

Reporter: Chetan Tiwary

Votes: 0

Watchers: 3

Created: 2024/04/19 6:54 PM

Updated: 2024/06/18 8:10 AM

Resolved: 2024/05/28 9:06 AM
