  1. OpenShift Bugs
  2. OCPBUGS-5080

Stale CSV Entries in the Resolver Cache for Operator Lifecycle Manager

Details

    • Moderate
    • Grumpy 241
    • 1
    • Rejected
    • False

    Description

      Description of problem:

      The resolver cache can apparently retain stale CSV entries, which makes it impossible to reinstall an Operator.
      
      The situation:
      --------------
      A customer removed the CSV, InstallPlan, and Subscription for the GitOps Operator from the cluster, but when they attempted to reinstall the Operator, OLM reported a conflict with an existing CSV.
      
      This CSV had been removed previously and was no longer present in etcd. After the `operator-catalog` and `operator-lifecycle-manager` Pods were deleted, the conflict was resolved and the Operator could be installed again.
      ~~~
      'Warning' reason: 'ResolutionFailed' constraints not satisfiable: subscription openshift-gitops-operator exists, subscription openshift-gitops-operator requires redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8, redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8 and @existing/openshift-operators//openshift-gitops-operator.v1.5.6-0.1664915551.p originate from package openshift-gitops-operator, clusterserviceversion openshift-gitops-operator.v1.5.6-0.1664915551.p exists and is not referenced by a subscription
      ~~~
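
      When triaging a similar report, it can help to pull the conflicting CSV out of the message and compare it with what is actually in the namespace. The following is a minimal sketch, not part of OLM: the helper name `existing_csvs` is hypothetical, and the `@existing/<namespace>//<csv>` layout is assumed purely from the single message quoted above.

```python
import re

# The ResolutionFailed text exactly as quoted in this report.
MESSAGE = (
    "constraints not satisfiable: subscription openshift-gitops-operator exists, "
    "subscription openshift-gitops-operator requires "
    "redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8, "
    "redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8 and "
    "@existing/openshift-operators//openshift-gitops-operator.v1.5.6-0.1664915551.p "
    "originate from package openshift-gitops-operator, "
    "clusterserviceversion openshift-gitops-operator.v1.5.6-0.1664915551.p exists "
    "and is not referenced by a subscription"
)

# Assumed layout (inferred from the message above, not from OLM source):
#   @existing/<namespace>//<csv-name>
EXISTING_RE = re.compile(r"@existing/(?P<namespace>[^/\s]+)//(?P<csv>[^\s,]+)")


def existing_csvs(message):
    """Return (namespace, csv_name) pairs the resolver claims already exist."""
    return [(m.group("namespace"), m.group("csv")) for m in EXISTING_RE.finditer(message)]


if __name__ == "__main__":
    for namespace, csv in existing_csvs(MESSAGE):
        print(f"resolver believes CSV {csv} exists in namespace {namespace}")
```

      If a CSV reported this way does not appear in `oc get csv -n <namespace>`, that is consistent with the stale-entry theory described in this report.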
      
      

      Version-Release number of selected component (if applicable):

      4.9.31
      

      How reproducible:

      Very intermittent. However, once the issue had occurred, it persisted until the Pods were deleted.
      

      Steps to Reproduce:

      1. Install the Operator with manual InstallPlan approval
      2. Remove the Operator (Subscription, CSV, InstallPlan)
      3. Attempt to reinstall the Operator
      
      

      Actual results:

      Reinstallation fails very intermittently with the `ResolutionFailed` warning shown above.
      

      Expected results:

      Operators do not have conflicts with CSVs which have already been removed.
      

      Additional info:

      From a brief review of the OLM code, it appears that an internal resolver cache is populated and then used to check constraints when an Operator is installed. Stale entries in that cache would produce the issue described here.
      The cache appears to have been rearchitected (moved to a dedicated object) since OCP 4.9.31. This issue has no clear reproduction steps, so if it cannot be reproduced, I would like instructions on how to dump the contents of the cache should the issue arise again.
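
      To make the suspected failure mode concrete, here is a toy model of a per-namespace snapshot cache that is not invalidated when a CSV is deleted. It is only an illustration of the hypothesis above: `ToyResolverCache` and `resolve` are hypothetical names and do not reflect OLM's actual implementation or data structures.

```python
class ToyResolverCache:
    """Illustrative only: caches a per-namespace snapshot of CSV names."""

    def __init__(self):
        self._snapshots = {}

    def populate(self, namespace, live_csvs):
        # Copy the live state so later cluster changes do not affect the snapshot.
        self._snapshots[namespace] = set(live_csvs)

    def invalidate(self, namespace):
        # In this model, restarting the Pods corresponds to dropping the snapshot.
        self._snapshots.pop(namespace, None)

    def existing(self, namespace, live_csvs):
        # Reuse the snapshot when present; rebuild from live state otherwise.
        if namespace not in self._snapshots:
            self.populate(namespace, live_csvs)
        return self._snapshots[namespace]


def resolve(cache, namespace, live_csvs, wanted_csv, package):
    """Return a conflict description, or None when resolution would succeed."""
    for name in sorted(cache.existing(namespace, live_csvs)):
        if name != wanted_csv and name.startswith(package + "."):
            return (f"{wanted_csv} and @existing/{namespace}//{name} "
                    f"originate from package {package}")
    return None


if __name__ == "__main__":
    ns, pkg = "openshift-operators", "openshift-gitops-operator"
    live = {"openshift-gitops-operator.v1.5.6-0.1664915551.p"}
    cache = ToyResolverCache()
    cache.populate(ns, live)

    live.clear()  # the CSV is deleted from the cluster; the snapshot is untouched
    print(resolve(cache, ns, live, "openshift-gitops-operator.v1.5.8", pkg))

    cache.invalidate(ns)  # models the workaround of deleting the Pods
    print(resolve(cache, ns, live, "openshift-gitops-operator.v1.5.8", pkg))
```

      In this model the first resolution reports a conflict against a CSV that no longer exists, and the second succeeds once the snapshot is dropped, which matches the observed behaviour before and after the Pods were deleted.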
      

            People

              rh-ee-dfranz Daniel Franz
              rhn-support-mwasher Michael Washer
              Kui Wang Kui Wang
              Votes: 6
              Watchers: 32
