OpenShift Bugs / OCPBUGS-36404

Too many pending CSRs lead to scaleup failures when scaling to 500 nodes


    • No
    • CLOUD Sprint 263, CLOUD Sprint 264, CLOUD Sprint 262
    • 3
    • False
    • Cause: The CSR approver included CSRs from other subsystems in its calculation of whether it was overwhelmed and should stop approving certificates.
      Consequence: In larger clusters where other subsystems also use CSRs, the CSR approver concluded that there were too many unapproved CSRs and stopped approving further CSRs.
      Fix: The CSR approver now counts only CSRs that it can approve, using the signerName field as a filter.
      Result: The CSR approver stops approving new CSRs only when a large number of CSRs with the signerName values it observes remain unapproved.
    • Bug Fix
    • In Progress
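To make the Cause and Fix above concrete, here is a minimal Go sketch contrasting a pending count over every unapproved CSR with a count restricted to the signerName values the approver handles. The CSR type, the function names, and the set of handled signers (the upstream Kubernetes kubelet client and kubelet serving signer names) are assumptions for illustration, not the cluster-machine-approver's actual code.

```go
package main

import "fmt"

// CSR is a simplified, hypothetical stand-in for a Kubernetes
// CertificateSigningRequest; only the fields used below are modeled.
type CSR struct {
	Name       string
	SignerName string
	Approved   bool
}

// handledSigners is an assumed set of signers the approver acts on
// (upstream kubelet client and kubelet serving signer names).
var handledSigners = map[string]bool{
	"kubernetes.io/kube-apiserver-client-kubelet": true,
	"kubernetes.io/kubelet-serving":               true,
}

// countPendingAll counts every unapproved CSR regardless of signer,
// mirroring the behavior described under "Cause".
func countPendingAll(csrs []CSR) int {
	n := 0
	for _, c := range csrs {
		if !c.Approved {
			n++
		}
	}
	return n
}

// countPendingHandled counts only unapproved CSRs whose signerName is in
// handledSigners, mirroring the behavior described under "Fix".
func countPendingHandled(csrs []CSR) int {
	n := 0
	for _, c := range csrs {
		if !c.Approved && handledSigners[c.SignerName] {
			n++
		}
	}
	return n
}

func main() {
	csrs := []CSR{
		{"csr-a", "kubernetes.io/kube-apiserver-client-kubelet", false},
		{"csr-b", "kubernetes.io/kubelet-serving", true},
		{"csr-c", "example.com/other-signer", false}, // hypothetical other subsystem
		{"csr-d", "example.com/other-signer", false},
	}
	fmt.Println(countPendingAll(csrs), countPendingHandled(csrs)) // prints: 3 1
}
```

With the filter, CSRs issued for other subsystems no longer inflate the count that the approver compares against its limit.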

      Description of problem:
      machine-approver logs:

      E0221 20:29:52.377443       1 controller.go:182] csr-dm7zr: Pending CSRs: 1871; Max pending allowed: 604. Difference between pending CSRs and machines > 100. Ignoring all CSRs as too many recent pending CSRs seen


      oc get csr |wc -l
      3818
      oc get csr |grep "node-bootstrapper" |wc -l
      2152
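`oc get csr | wc -l` also counts the header row, so these figures suggest that a large share of the CSRs were not requested by node-bootstrapper. Grouping pending CSRs by signerName is one way to see which subsystems account for them. A minimal Go sketch follows; the CSR type and the example signer names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// CSR is a hypothetical, simplified model with only the fields tallied here.
type CSR struct {
	SignerName string
	Approved   bool
}

// SignerCount pairs a signerName with its number of pending CSRs.
type SignerCount struct {
	Signer string
	Count  int
}

// pendingBySigner groups unapproved CSRs by signerName, largest group first.
func pendingBySigner(csrs []CSR) []SignerCount {
	counts := map[string]int{}
	for _, c := range csrs {
		if !c.Approved {
			counts[c.SignerName]++
		}
	}
	out := make([]SignerCount, 0, len(counts))
	for s, n := range counts {
		out = append(out, SignerCount{s, n})
	}
	// Sort by count descending, then by signer name for a stable order.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Signer < out[j].Signer
	})
	return out
}

func main() {
	csrs := []CSR{
		{"kubernetes.io/kube-apiserver-client-kubelet", false},
		{"kubernetes.io/kube-apiserver-client-kubelet", false},
		{"example.com/other-signer", false}, // hypothetical other subsystem
		{"kubernetes.io/kubelet-serving", true},
	}
	for _, sc := range pendingBySigner(csrs) {
		fmt.Printf("%6d  %s\n", sc.Count, sc.Signer)
	}
}
```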

      By approving the pending CSRs manually, I can get the cluster to scale up.

      We could increase maxPending to a higher value: https://github.com/openshift/cluster-machine-approver/blob/2d68698410d7e6239dafa6749cc454272508db19/pkg/controller/controller.go#L330

              Radek Manak (rmanak@redhat.com)
              Mohit Jitendra Sheth (mohit-sheth)
              Zhaohua Sun
              Votes: 1
              Watchers: 15
