OCPBUGS-36404 (OpenShift Bugs)

Too many pending CSRs lead to scaleup failures when scaling to 500 nodes


    • Quality / Stability / Reliability
    • CLOUD Sprint 263, CLOUD Sprint 264, CLOUD Sprint 262
    • 3
    • Done
    • Bug Fix
      * Previously, the certificate signing request (CSR) approver included certificates from other systems within its calculations for whether it was overwhelmed and should stop approving certificates. In larger clusters, with other subsystems using CSRs, the CSR approver counted unrelated unapproved CSRs towards its total and prevented further approvals. With this release, the CSR approver only includes CSRs that it can approve, by using the `signerName` property as a filter. As a result, the CSR approver only prevents new approvals when there are a large number of unapproved CSRs for the relevant `signerName` values. (link:https://issues.redhat.com/browse/OCPBUGS-36404[OCPBUGS-36404])
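The `signerName` filtering described in the release note can be sketched roughly as follows. This is a minimal Python illustration, not the approver's actual Go code: the `CSR` dataclass, the helper names, and the `APPROVER_SIGNERS` set are hypothetical, and the two signer names are an assumption about which signers the node CSR approver handles.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Assumption: the signers whose CSRs the node approver is able to approve.
APPROVER_SIGNERS = frozenset({
    "kubernetes.io/kube-apiserver-client-kubelet",
    "kubernetes.io/kubelet-serving",
})


@dataclass
class CSR:
    """Hypothetical, simplified stand-in for a CertificateSigningRequest."""
    name: str
    signer_name: str
    conditions: List[str] = field(default_factory=list)  # e.g. ["Approved"]


def is_pending(csr: CSR) -> bool:
    # A CSR is pending when it has been neither approved nor denied.
    return not any(c in ("Approved", "Denied") for c in csr.conditions)


def count_relevant_pending(csrs: Iterable[CSR], signers=APPROVER_SIGNERS) -> int:
    # Count only pending CSRs whose signerName the approver can handle;
    # CSRs for unrelated signers no longer inflate the total.
    return sum(1 for c in csrs if c.signer_name in signers and is_pending(c))


csrs = [
    CSR("csr-a", "kubernetes.io/kube-apiserver-client-kubelet"),
    CSR("csr-b", "kubernetes.io/kubelet-serving", ["Approved"]),
    CSR("csr-c", "example.com/other-signer"),
    CSR("csr-d", "kubernetes.io/kubelet-serving"),
]
print(count_relevant_pending(csrs))  # prints 2
```

Here `csr-c` is pending but belongs to an unrelated signer, so it is excluded from the count, which is the behavior change the release note describes.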

      Description of problem:
      The machine-approver logs show:

      E0221 20:29:52.377443       1 controller.go:182] csr-dm7zr: Pending CSRs: 1871; Max pending allowed: 604. Difference between pending CSRs and machines > 100. Ignoring all CSRs as too many recent pending CSRs seen
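The threshold arithmetic implied by this log line can be sketched as below. This is a hedged Python illustration, not the controller's Go code: the function and constant names are hypothetical, the margin of 100 is taken from the log message, the strict `>` comparison is an assumption, and the machine count of 504 is inferred from "Max pending allowed: 604" rather than stated in the report.

```python
# Margin taken from the log text: "Difference between pending CSRs and machines > 100".
MAX_DIFF_PENDING_VS_MACHINES = 100


def max_pending_allowed(machine_count: int) -> int:
    # Pending CSRs may exceed the machine count by at most the fixed margin.
    return machine_count + MAX_DIFF_PENDING_VS_MACHINES


def too_many_pending(pending: int, machine_count: int) -> bool:
    # Assumption: approvals stop only when the count strictly exceeds the limit.
    return pending > max_pending_allowed(machine_count)


# Inferred from the log: 604 allowed minus the margin of 100 gives 504 machines.
machines = 604 - MAX_DIFF_PENDING_VS_MACHINES

print(max_pending_allowed(machines))     # prints 604
print(too_many_pending(1871, machines))  # prints True: all CSRs are ignored
print(too_many_pending(600, machines))   # prints False: approvals continue
```

With 1871 pending CSRs counted against a limit of 604, the check trips and every CSR is skipped, which matches the reported behavior.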

      $ oc get csr | wc -l
      3818
      $ oc get csr | grep "node-bootstrapper" | wc -l
      2152

      Approving the pending CSRs manually allows the cluster to scale up.

      One possible fix is to raise maxPending to a higher number: https://github.com/openshift/cluster-machine-approver/blob/2d68698410d7e6239dafa6749cc454272508db19/pkg/controller/controller.go#L330

       

              rmanak@redhat.com Radek Manak
              mohit-sheth Mohit Jitendra Sheth
              Zhaohua Sun Zhaohua Sun