
[release-4.18] high snapshot rate on redhat-operators, OLM operator install hangs. RPC DeadlineExceeded while listing bundles.


    • Quality / Stability / Reliability
    • False
    • None
    • None
    • Important
    • None
    • Done
    • Bug Fix
      * Before this update, the catalog-operator refreshed its catalog snapshots every five minutes. On clusters with many namespaces and subscriptions, and with the larger catalog sources available in 4.15 and 4.16, snapshots failed and the failures cascaded across the catalog sources, which caused CPU load to spike. This additional load prevented Operators from being installed or upgraded. With this release, the cache lifetime is 30 minutes, which gives resolution attempts sufficient time to complete without undue load on the catalog source pods. (link:https://issues.redhat.com/browse/OCPBUGS-57427[OCPBUGS-57427])
    • None

      This is a clone of issue OCPBUGS-57352. The following is the description of the original issue:

      This is the release-4.19 clone to backport the interval change for refreshing the catalog cache data for catalog-operator from 5 minutes to 30 minutes. 

      --------------

       

      When trying to install an operator, the following warning is shown:

      "Warning alert: CatalogSource health unknown. This operator cannot be updated. The health of CatalogSource "redhat-operators" is unknown. It may have been disabled or removed from the cluster."
      

       

      1. The underlying error in the logs is:
        msg="error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded" catalog="{redhat-operators openshift-marketplace}"
      2. As discussed, we could not reproduce this locally. We attempted multiple times to simulate the appropriate gRPC connection and the exact API call, and the call succeeded for us each time.
      3. Therefore the suspected cause is a network issue on the customer's cluster. We require full cooperation from a qualified cluster/network professional on the customer end who knows their exact configuration, plus a detailed network dump and analysis of what actually happened at the point in time when OLM got this timeout.
      4. We cannot proceed with the investigation based on the information we currently have.

              rh-ee-jkeister Jordan Keister
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Xia Zhao Xia Zhao
              None