  1. OpenShift API Server
  2. API-1647 apiserver: scalability
  3. API-1500

ClusterResourceQuota: quota evaluation timeout

    • Type: Sub-task
    • Resolution: Unresolved
    • Priority: Undefined
    • Impediment

      We worked on a customer escalation https://issues.redhat.com/browse/OCPBUGS-2514

      Root Cause:

      We saw the following error; API requests were timing out:

      Internal Error occurred: resource quota evaluation timed out.

      The cluster had many ClusterResourceQuota objects, and some of them selected more than 100 projects.

      The official OpenShift doc mentions this:

      Selecting more than 100 projects under a single multi-project quota may have detrimental effects on API server responsiveness in those projects

      https://docs.openshift.com/container-platform/3.11/admin_guide/multiproject_quota.html

      The evaluation timeout is happening here:

      Action Items:

      • Monitor the evaluation latency (we should use the existing admission plugin metrics) and raise an alert (maybe at Warning level) to warn the customer before requests start timing out.
      • The number of projects selected by a ClusterResourceQuota is tracked inside the Status of the object, so we can use telemetry/insights to collect data on how many clusters may be affected by this issue.
      • There might be a few areas in the code that we could optimize; it looks like the evaluation is done asynchronously by a pool of goroutines.

       

              vdinh@redhat.com Vu Dinh
              akashem@redhat.com Abu H Kashem