Type: Feature Request
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: openshift-4.10.z, openshift-4.11.z, openshift-4.12.z
Component/s: API
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Color Status: Not Selected
Department: Product

Description of problem:

I observed a broken operator on a customer's OCP cluster that had created more than 95K Secrets. The sheer number of objects brought etcd and the OCP cluster to their knees, and the control plane became completely unresponsive.

To work around this, master nodes were enlarged to restore some functionality, the rogue operator and its namespace were identified, and a ResourceQuota with `.spec.hard.secrets` was put into place to stop the bleeding while the operator's author was consulted.
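For reference, a minimal sketch of the kind of ResourceQuota used in the workaround. The namespace name, quota name, and the limit of 500 are hypothetical placeholders, not the values used on the customer's cluster; only the `secrets` key under `.spec.hard` comes from the report itself.

```python
import json


def secret_quota(namespace: str, max_secrets: int, name: str = "object-counts") -> dict:
    """Build a ResourceQuota manifest that caps the number of Secrets in one namespace."""
    if max_secrets < 0:
        raise ValueError("max_secrets must be non-negative")
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # The quota name and namespace are illustrative assumptions.
        "metadata": {"name": name, "namespace": namespace},
        # Object-count limits are expressed as quantity strings.
        "spec": {"hard": {"secrets": str(max_secrets)}},
    }


if __name__ == "__main__":
    # Hypothetical namespace and limit; adjust to the affected namespace.
    print(json.dumps(secret_quota("rogue-operator-ns", 500), indent=2))
```

The printed JSON can be applied with something like `python quota.py | oc apply -f -`. Note that a quota only rejects new creations once the limit is reached; it does not delete Secrets that already exist.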

Perhaps we should consider default ResourceQuotas (like default ulimits in RHEL) that would protect our customers from themselves, broken software, and bad actors. Administrators with the right permissions would be able to thoughtfully increase the Quotas when they actually need a large number of objects.

Version-Release number of selected component (if applicable):

Affects all OCP clusters, since the underlying limitation is etcd.

How reproducible:

Easily reproducible

Steps to Reproduce:

1. Create lots of objects (Secrets, ConfigMaps, anything...) with some kind of looping mechanism
2. Keep doing this until you have tens of thousands of objects
3. Watch etcd and the control plane grind to a halt

Actual results:

The cluster struggles as expected

Expected results:

The cluster struggles as expected

Additional info:

This is really just an attempt to get us thinking about how we can mitigate this issue. I figure we can either institute default ResourceQuotas for new Projects, preach the benefits of ResourceQuotas to our customers via documentation or conversations, or something better that someone else will think of :)
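One way the "default ResourceQuotas for new Projects" option could look today is by adding a ResourceQuota to the `objects` list of a custom project request template. The sketch below only builds that one object; the quota name and the limits are hypothetical, and it assumes the template exposes the usual `${PROJECT_NAME}` parameter.

```python
import json


def default_quota_object(limits: dict) -> dict:
    """Build a ResourceQuota entry meant for a project request template's objects list."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {
            # Hypothetical quota name; ${PROJECT_NAME} is left for template substitution.
            "name": "default-object-counts",
            "namespace": "${PROJECT_NAME}",
        },
        # Convert each numeric limit to the quantity-string form.
        "spec": {"hard": {resource: str(count) for resource, count in limits.items()}},
    }


if __name__ == "__main__":
    # Illustrative limits only; real defaults would need careful sizing.
    print(json.dumps(default_quota_object({"secrets": 1000, "configmaps": 1000}), indent=2))
```

Administrators who genuinely need more objects in a Project could then raise or replace the quota in that namespace, which matches the "thoughtfully increase" model described above.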

Assignee: William Caban

Reporter: Aaron Jaeger

QA Contact: ge liu

Votes: 0

Watchers: 5

Created: 2023/08/24 4:02 PM

Updated: 2023/08/24 8:39 PM
