OCPBUGS-48468

OLMv0: excessive catalog source snapshots cause severe performance regression

    • Bug
    • Resolution: Done-Errata
    • Critical
    • None
    • 4.15.z, 4.17.z, 4.16.z, 4.18.0
    • OLM
    • Quality / Stability / Reliability
    • False
    • None
    • Critical
    • Yes
    • Rejected
    • Eevee OLM Sprint 265
    • 1
    • Customer Escalated, Customer Facing, Customer Reported
    • Done
    • Bug Fix
      * Previously, {olmv0} took a snapshot of the catalog source for every installed Operator each time it reconciled a subscription. This behavior resulted in high CPU usage. With this update, {olmv0} caches catalog sources and limits calls to the gRPC server, which resolves the issue. link:https://issues.redhat.com/browse/OCPBUGS-48468[OCPBUGS-48468]

      Description of problem:

      OLM's catalog operator makes excessive and unnecessary gRPC API requests to catalog source pods every time a subscription is reconciled. As a result, catalog pods show very high CPU usage. Depending on the number of subscriptions and the state of the system, that high CPU usage can be sustained indefinitely.

      Version-Release number of selected component (if applicable):

      4.15

      How reproducible:

      100%    

      Steps to Reproduce:

          1. Get credentials for a 4.15 or higher cluster
          2. Install an operator via a subscription against any catalog source
          3. (On 4.17 or higher only) Check the logs of the catalog pod that serves the catalog, and see that it is receiving many ListBundles and GetPackage requests in a short period of time, especially when a subscription is performing an installation or upgrade.
          

      Actual results:

      ~20 ListBundles calls performed while installing a single operator via a subscription

      Expected results:

       1 ListBundles call performed while installing a single operator via a subscription (other subscription reconciles should use a cached result until the cache is invalidated)
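
The expected behavior above can be sketched as a per-catalog snapshot cache. This is a minimal illustration under stated assumptions, not the actual OLM fix: `bundleLister`, `snapshotCache`, `countingLister`, and the catalog name are all hypothetical, and the real gRPC client and invalidation triggers are not modeled.

```go
package main

import (
	"fmt"
	"sync"
)

// bundleLister is a hypothetical stand-in for the gRPC client of a catalog source.
type bundleLister interface {
	ListBundles(catalog string) ([]string, error)
}

// snapshotCache keeps one ListBundles result per catalog until it is invalidated.
type snapshotCache struct {
	mu        sync.Mutex
	lister    bundleLister
	snapshots map[string][]string
}

func newSnapshotCache(l bundleLister) *snapshotCache {
	return &snapshotCache{lister: l, snapshots: make(map[string][]string)}
}

// Get returns the cached snapshot and calls the lister only on a cache miss.
func (c *snapshotCache) Get(catalog string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if snap, ok := c.snapshots[catalog]; ok {
		return snap, nil
	}
	snap, err := c.lister.ListBundles(catalog)
	if err != nil {
		return nil, err // errors are not cached, so the next reconcile retries
	}
	c.snapshots[catalog] = snap
	return snap, nil
}

// Invalidate drops the snapshot, for example when the catalog content changes.
func (c *snapshotCache) Invalidate(catalog string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.snapshots, catalog)
}

// countingLister is a fake lister that records how often it is called.
type countingLister struct{ calls int }

func (l *countingLister) ListBundles(catalog string) ([]string, error) {
	l.calls++
	return []string{catalog + "/example-operator.v1.0.0"}, nil
}

func main() {
	lister := &countingLister{}
	cache := newSnapshotCache(lister)
	// Simulate 20 subscription reconciles against the same catalog.
	for i := 0; i < 20; i++ {
		if _, err := cache.Get("example-catalog"); err != nil {
			panic(err)
		}
	}
	fmt.Println("ListBundles calls:", lister.calls) // prints "ListBundles calls: 1"
}
```

Holding the mutex during the fetch serializes concurrent reconciles, which keeps the sketch simple and guarantees a single call per miss. A production implementation would likely want finer-grained locking and an explicit invalidation policy.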

      Additional info:

          

              jlanford@redhat.com Joe Lanford
              Kui Wang