[release-4.17] high snapshot rate on redhat-operators, OLM operator install hangs. RPC DeadlineExceeded while listing bundles.

    • Quality / Stability / Reliability
    • False
    • None
    • Important
    • None
    • Rejected
    • Lillipup Sprint 272, Mewtwo Sprint 273, Nidoran Sprint 274
    • 3
    • In Progress
    • Bug Fix
    • Before this update, the `catalog-operator` captured snapshots every five minutes, which caused CPU spikes on clusters with many namespaces, subscriptions, and large catalog sources. This increased the load on the catalog source pods and prevented users from installing or upgrading operators. With this release, the catalog snapshot cache lifetime is increased to 30 minutes, which gives the catalog source enough time to serve resolution attempts without undue load and stabilizes the operator installation and upgrade process. (link:https://issues.redhat.com/browse/OCPBUGS-57428[OCPBUGS-57428])
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-57427. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-57352. The following is the description of the original issue:

      This is the release-4.19 clone to backport the interval change for refreshing the catalog cache data for catalog-operator from 5 minutes to 30 minutes. 

      --------------

       

      When trying to install an operator, the following alert is shown:

      "Warning alert: CatalogSource health unknown. This operator cannot be updated. The health of CatalogSource "redhat-operators" is unknown. It may have been disabled or removed from the cluster."
      

       

      The underlying error in the logs is:
        msg="error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded" catalog="{redhat-operators openshift-marketplace}"
      1. As discussed, we could not reproduce this locally. We attempted multiple times to simulate the appropriate gRPC connection and the exact API call, and the call succeeded for us.
      2. The suspected cause is therefore a network issue on the customer's cluster. We require full cooperation from a qualified cluster/network professional on the customer end who knows their exact configuration, and a detailed network dump/analysis of what actually happened at the point in time when OLM got this timeout.
      3. We cannot proceed with the investigation based on the information we currently have.

              rh-ee-jkeister Jordan Keister
              openshift-crt-jira-prow OpenShift Prow Bot
              None
              None
              Xia Zhao Xia Zhao
              None