Type: Feature Request
Resolution: Won't Do
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Network - Core
Labels:
- SDN
- rfe-rejected-to-wontdo

Target Version: None
Activity Type: Product / Portfolio Work
Status Summary: None
Blocked: False
Blocked Reason: None
Products: None
Hierarchy Progress Bar: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete: None
PX Impact Score:
PX Impact Range: None
PX Priority Data:
PX Technical Impact: None
PX Technical Impact Notes: None
PX Scheduling Request: None

1. Proposed title of this feature request
Implement LIST call chunking in openshift-sdn.

2. What is the nature and description of the request?
In a large cluster (256 nodes, 18k pods, 15k networkpolicies, 11k services, 10k endpoints, 5k netnamespaces/projects), the sdn daemonset can DoS the kube-apiserver with un-paginated LIST calls on these high-count resources.
Inspecting the kube-apiserver audit logs from the periods leading up to the kube-apiserver crashes, I can see that whenever we lose the API, we also see un-paginated LIST calls on the resources mentioned above (pods, networkpolicies, services, endpoints, netnamespaces, projects).
When the kube-apiserver is updated and the sdn does not issue these LIST calls, the API stays stable.
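
The audit-log check described above can be sketched as a small filter. This is a minimal illustration, not a tool used in this case. It assumes the audit log is in JSON format with one audit.k8s.io/v1 Event per line. It also assumes the audit policy records the ResponseComplete stage, so each request is counted only once. The function name unpaginated_lists is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlsplit


def unpaginated_lists(lines):
    """Yield (username, resource, requestURI) for LIST requests sent without a positive 'limit'.

    Assumes one JSON-encoded audit.k8s.io/v1 Event per line. Only the
    ResponseComplete stage is considered, so a request is not reported once
    per audit stage (assumes the audit policy does not omit that stage).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("verb") != "list" or event.get("stage") != "ResponseComplete":
            continue
        uri = event.get("requestURI", "")
        # A missing or zero 'limit' asks for the whole collection in one response.
        limit = parse_qs(urlsplit(uri).query).get("limit", ["0"])[0]
        if limit.isdigit() and int(limit) > 0:
            continue  # chunked request, not what we are looking for
        user = event.get("user", {}).get("username", "")
        resource = event.get("objectRef", {}).get("resource", "")
        yield user, resource, uri
```

You can then group the results by (username, resource), for example with collections.Counter, to see which clients issue the most un-paginated LISTs and against which resources.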

Any resource whose count can grow above 500 should be listed in "chunks" using the 'limit' query parameter, following the returned 'continue' token to fetch each subsequent page. This limits the memory burden on the kube-apiserver and improves the overall performance of the calls the SDN requires.
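
The chunking pattern above can be sketched as a client-side loop. This is a minimal illustration in Python, not openshift-sdn code. ListPage, fake_list and list_in_chunks are hypothetical names. The in-memory fake_list only mimics the apiserver's 'limit'/'continue' behaviour. A real continue token is opaque, and the apiserver rejects an expired token with HTTP 410; this sketch does not handle that case. In Go, client-go provides this loop in its tools/pager package.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class ListPage:
    items: List[str]
    continue_token: Optional[str]  # None once the last page has been returned


def fake_list(all_items: List[str], limit: int, continue_token: Optional[str]) -> ListPage:
    """Hypothetical in-memory stand-in for a LIST call that honors 'limit' and 'continue'.

    A real apiserver returns an opaque continue token; here it is just an offset.
    """
    start = int(continue_token) if continue_token else 0
    if limit <= 0:
        return ListPage(all_items[start:], None)  # no limit: everything in one response
    end = start + limit
    token = str(end) if end < len(all_items) else None
    return ListPage(all_items[start:end], token)


def list_in_chunks(list_fn: Callable[..., ListPage], limit: int = 500) -> Iterator[str]:
    """Fetch a collection page by page, following the continue token until it is empty."""
    token: Optional[str] = None
    while True:
        page = list_fn(limit=limit, continue_token=token)
        yield from page.items
        token = page.continue_token
        if not token:
            break
```

Take the 18k pods in this cluster and a limit of 500. The loop issues 36 requests, and each returns at most 500 objects. Without chunking, a single response carries all 18k pods.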

3. Why does the customer need this? (List the business requirements here)
This issue has recently occurred when the kube-apiserver-operator renews a platform certificate and performs a rolling update of the kube-apiserver pods. Sometimes (not every time), the control plane and console experience an outage lasting from 15 minutes to over an hour before the API stabilizes.
Since we have little to no visibility into these kube-apiserver-operator actions, these events surprise us every time.
Increasing the memory of the master nodes is not always possible for this customer (or others). There should be no need to add resources when an easy fix can reduce what the product (OCP) itself consumes to run the product (OCP).
This would help this customer and any other customers running at large scale.

4. List any affected packages or components.
openshift-sdn

Assignee: Marc Curry
Reporter: Amit Kesarkar
Need Info From: None
Votes: 0
Watchers: 3
Created: 2022/01/26 6:31 PM
Updated: 2025/09/14 12:46 AM
Resolved: 2022/05/05 11:17 AM
Target start: None
Target end: None
