Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.14, 4.15, 4.16
Component/s: Cloud Compute / Machine CSR Approver
Labels:

Regression:
No
Sprint:
CLOUD Sprint 263, CLOUD Sprint 264, CLOUD Sprint 262
sprint_count:
3
Blocked:
False
Blocked Reason:

None
Release Note Text:

Cause: The CSR approver counted CSRs from other subsystems when calculating whether it was overwhelmed and should stop approving certificates.
Consequence: In larger clusters where other subsystems also use CSRs, the CSR approver determined that there were too many unapproved CSRs and stopped approving new ones.
Fix: The CSR approver now counts only CSRs that it can approve, using the signerName property as a filter.
Result: The CSR approver stops approving new CSRs only when a large number of CSRs with the signerName values it observes remain unapproved.
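
The filtering described in the release note can be sketched roughly as follows. This is a minimal illustration, not the code from the linked pull request: the `CSR` type, the `handledSigners` set, and `countPending` are hypothetical names, and the two signer names are the standard Kubernetes kubelet signers, assumed here to be the ones the approver handles.

```go
package main

import "fmt"

// CSR is a hypothetical, trimmed-down stand-in for a Kubernetes
// CertificateSigningRequest; only the fields needed here are modeled.
type CSR struct {
	Name       string
	SignerName string
	Approved   bool
}

// handledSigners is an assumption: the built-in Kubernetes signers used for
// kubelet client and serving certificates. The real approver may differ.
var handledSigners = map[string]bool{
	"kubernetes.io/kube-apiserver-client-kubelet": true,
	"kubernetes.io/kubelet-serving":               true,
}

// countPending counts unapproved CSRs whose signerName is in signers.
// CSRs from any other signer are ignored entirely.
func countPending(csrs []CSR, signers map[string]bool) int {
	n := 0
	for _, c := range csrs {
		if !c.Approved && signers[c.SignerName] {
			n++
		}
	}
	return n
}

func main() {
	csrs := []CSR{
		{Name: "csr-a", SignerName: "kubernetes.io/kube-apiserver-client-kubelet"},
		{Name: "csr-b", SignerName: "kubernetes.io/kubelet-serving", Approved: true},
		{Name: "csr-c", SignerName: "example.com/other-signer"},
		{Name: "csr-d", SignerName: "example.com/other-signer"},
	}
	// Only csr-a is both pending and from a handled signer.
	fmt.Println(countPending(csrs, handledSigners))
}
```

With a filter like this, pending CSRs created by other subsystems no longer push the approver over its limit.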
Release Note Type:
Bug Fix
Release Note Status:
In Progress
Target Version:

4.19.0
Target Backport Versions:

4.17.z, 4.16.z, 4.18.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:
machine-approver logs

E0221 20:29:52.377443       1 controller.go:182] csr-dm7zr: Pending CSRs: 1871; Max pending allowed: 604. Difference between pending CSRs and machines > 100. Ignoring all CSRs as too many recent pending CSRs seen

oc get csr |wc -l
3818
oc get csr |grep "node-bootstrapper" |wc -l
2152

Approving the pending CSRs manually allows the cluster to scale up.

We could increase maxPending to a higher number: https://github.com/openshift/cluster-machine-approver/blob/2d68698410d7e6239dafa6749cc454272508db19/pkg/controller/controller.go#L330
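
The limit in the log message above can be read as the machine count plus a fixed allowance of 100. The sketch below works through that arithmetic under this assumption; `maxPendingDelta`, `maxPending`, and `tooManyPending` are illustrative names, the exact comparison in the controller may differ, and the machine count of 504 is only inferred from the logged limit (604 - 100).

```go
package main

import "fmt"

// maxPendingDelta is assumed from the log text
// "Difference between pending CSRs and machines > 100".
const maxPendingDelta = 100

// maxPending returns the assumed limit: machine count plus the allowance.
func maxPending(machines int) int {
	return machines + maxPendingDelta
}

// tooManyPending reports whether the pending count exceeds the assumed limit.
func tooManyPending(pending, machines int) bool {
	return pending > maxPending(machines)
}

func main() {
	machines := 504 // inferred from the logged limit, not stated in the report
	pending := 1871 // "Pending CSRs" value from the log line

	fmt.Println(maxPending(machines))              // 604
	fmt.Println(tooManyPending(pending, machines)) // true
}
```

Under this reading, raising the allowance only moves the threshold; it does not stop CSRs that the approver cannot handle from being counted against it.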

blocks

OCPBUGS-46425 Too many pending CSRs lead to scaleup failures when scaling to 500 nodes

Verified

OCPBUGS-41551 Nodes to Node and subsequently pod to pod communication are repeatedly degrading despite multiple OVN DB rebuilds to fix the issue

Closed

is cloned by

OCPBUGS-46425 Too many pending CSRs lead to scaleup failures when scaling to 500 nodes

Verified

links to

openshift/cluster-machine-approver#243: OCPBUGS-36404: Filter CSRs by signerName

RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update

Assignee: Radek Manak

Reporter: Mohit Jitendra Sheth

QA Contact: Zhaohua Sun

Votes: 1

Watchers: 16

Created: 2024/07/01 8:37 PM

Updated: 2025/02/10 9:42 AM