OpenShift Bugs: OCPBUGS-46425

Too many pending CSRs lead to scaleup failures when scaling to 500 nodes


    • No
    • Sprint: CLOUD Sprint 263, CLOUD Sprint 264
    • 2
    • False
      * Previously, the certificate signing request (CSR) approver included certificates from other systems within its calculations for whether it was overwhelmed and should stop approving certificates. In larger clusters, with other subsystems using CSRs, the CSR approver counted unrelated unapproved CSRs towards its total and prevented further approvals. With this release, the CSR approver only includes CSRs that it can approve, by using the `signerName` property as a filter. As a result, the CSR approver only prevents new approvals when there are a large number of unapproved CSRs for the relevant `signerName` values. (link:https://issues.redhat.com/browse/OCPBUGS-46425[*OCPBUGS-46425*])
    • Bug Fix
    • Done
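The release note's fix changes which CSRs count toward the "too many pending" limit. The following is a minimal illustrative sketch in Python (not the approver's actual Go code) contrasting the two counting rules. Assumptions: the `CSR` class is a hypothetical stand-in for the Kubernetes CertificateSigningRequest object, with conditions reduced to plain strings; the signer set uses the standard Kubernetes kubelet signer names, but the exact set the approver filters on is not stated in this issue; `example.com/other-subsystem` is a made-up unrelated signer.

```python
from dataclasses import dataclass, field

# Assumption: signers the approver is responsible for. These are the standard
# Kubernetes kubelet signers; the approver's real list may differ.
APPROVER_SIGNERS = frozenset({
    "kubernetes.io/kube-apiserver-client-kubelet",  # node-bootstrapper client CSRs
    "kubernetes.io/kubelet-serving",                # kubelet serving CSRs
})


@dataclass
class CSR:
    """Hypothetical minimal stand-in for a CertificateSigningRequest."""
    name: str
    signer_name: str
    conditions: list = field(default_factory=list)  # e.g. ["Approved"]

    def is_pending(self) -> bool:
        # A CSR with no Approved, Denied, or Failed condition is still pending.
        return not any(c in ("Approved", "Denied", "Failed") for c in self.conditions)


def count_pending_all(csrs):
    """Behavior before the fix: every pending CSR counts, whatever its signer."""
    return sum(1 for c in csrs if c.is_pending())


def count_pending_relevant(csrs, signers=APPROVER_SIGNERS):
    """Behavior after the fix: only pending CSRs for the approver's signers count."""
    return sum(1 for c in csrs if c.is_pending() and c.signer_name in signers)


csrs = [
    CSR("csr-a", "kubernetes.io/kube-apiserver-client-kubelet"),
    CSR("csr-b", "kubernetes.io/kubelet-serving"),
    CSR("csr-c", "example.com/other-subsystem"),  # hypothetical unrelated signer
    CSR("csr-d", "kubernetes.io/kubelet-serving", ["Approved"]),
]
print(count_pending_all(csrs))       # 3
print(count_pending_relevant(csrs))  # 2
```

The unrelated pending CSR (`csr-c`) inflates the first count but not the second, which is the situation the release note describes for large clusters where other subsystems also create CSRs.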

      This is a clone of issue OCPBUGS-36404. The following is the description of the original issue:

      Description of problem:
      machine-approver logs:

      E0221 20:29:52.377443       1 controller.go:182] csr-dm7zr: Pending CSRs: 1871; Max pending allowed: 604. Difference between pending CSRs and machines > 100. Ignoring all CSRs as too many recent pending CSRs seen


      oc get csr |wc -l
      3818
      oc get csr |grep "node-bootstrapper" |wc -l
      2152

      By approving the pending CSRs manually, I can get the cluster to scale up.

      We could increase maxPending to a higher value: https://github.com/openshift/cluster-machine-approver/blob/2d68698410d7e6239dafa6749cc454272508db19/pkg/controller/controller.go#L330
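The log line above implies a simple gate: the allowed number of pending CSRs is the machine count plus a fixed buffer, and the approver ignores all CSRs once the pending count exceeds it. The following is a minimal Python sketch of that arithmetic, not the controller's Go code. Assumptions: the buffer of 100 is read from the log text ("Difference between pending CSRs and machines > 100"); the strict `>` comparison and the names `PENDING_BUFFER`, `max_pending`, and `too_many_pending` are hypothetical; the machine count of 504 is only inferred from "Max pending allowed: 604" under this model.

```python
# Assumption: buffer taken from the log message; the real constant's name and
# value in cluster-machine-approver may differ.
PENDING_BUFFER = 100


def max_pending(machine_count: int) -> int:
    """Allowed pending CSRs: one per machine plus a fixed buffer (assumed model)."""
    return machine_count + PENDING_BUFFER


def too_many_pending(pending: int, machine_count: int) -> bool:
    """True when the approver would ignore all CSRs (assumed strict '>')."""
    return pending > max_pending(machine_count)


# "Max pending allowed: 604" implies 504 machines under this model.
machines = 604 - PENDING_BUFFER
print(max_pending(machines))             # 604
print(too_many_pending(1871, machines))  # True
```

Under this model, raising the buffer only moves the threshold, whereas counting only CSRs for the approver's own signers, as the release note describes, removes unrelated CSRs from the pending count itself.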

       

              Radek Manak
              OpenShift Prow Bot
              Zhaohua Sun