Description of problem:
It appears that invalid CSV entries can remain in the resolver cache, making it impossible to reinstall an Operator.

The situation:
--------------
A customer removed the CSV, InstallPlan and Subscription for the GitOps Operator from the cluster. On attempting to reinstall the Operator, OLM reported a conflict with an existing CSV. That CSV had been removed previously and was no longer present in etcd. After the `operator-catalog` and `operator-lifecycle-manager` Pods were deleted, the conflict cleared and the Operator could be installed again.

~~~
'Warning' reason: 'ResolutionFailed' constraints not satisfiable: subscription openshift-gitops-operator exists, subscription openshift-gitops-operator requires redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8, redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8 and @existing/openshift-operators//openshift-gitops-operator.v1.5.6-0.1664915551.p originate from package openshift-gitops-operator, clusterserviceversion openshift-gitops-operator.v1.5.6-0.1664915551.p exists and is not referenced by a subscription
~~~
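To confirm that a CSV named in such a message is really gone from the cluster, the `@existing/<namespace>//<csv>` identifiers can be pulled out of the event text and compared against the CSVs that actually exist. The following is a minimal sketch, assuming the identifier format seen in the message above; the function names and the `live` input (a set of `(namespace, name)` pairs gathered by whatever means is convenient) are hypothetical and not part of OLM.

~~~python
import re

# Matches identifiers of the form @existing/<namespace>//<csv-name>,
# as they appear in the ResolutionFailed message above (assumed format).
EXISTING_RE = re.compile(r"@existing/(?P<namespace>[^/\s]+)//(?P<csv>[^\s,]+)")


def existing_csvs(message):
    """Return the (namespace, csv_name) pairs the resolver treats as installed."""
    return {(m.group("namespace"), m.group("csv"))
            for m in EXISTING_RE.finditer(message)}


def stale_entries(message, live):
    """Return the pairs named in the message that are absent from `live`.

    `live` is a set of (namespace, csv_name) pairs that really exist
    in the cluster; how it is collected is left to the caller.
    """
    return existing_csvs(message) - set(live)


message = (
    "constraints not satisfiable: subscription openshift-gitops-operator exists, "
    "redhat-operators/openshift-marketplace/stable/openshift-gitops-operator.v1.5.8 "
    "and @existing/openshift-operators//openshift-gitops-operator.v1.5.6-0.1664915551.p "
    "originate from package openshift-gitops-operator"
)

# With no CSVs left in the namespace, the referenced CSV is reported as stale.
print(stale_entries(message, live=set()))
~~~

A non-empty result means the resolver is reasoning about a CSV that the cluster no longer has, which is the symptom described in this report.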
Version-Release number of selected component (if applicable):
4.9.31
How reproducible:
Very intermittent. However, once the issue had occurred, it could not be cleared without deleting the Pods.
Steps to Reproduce:
1. Add Operator with manual approval InstallPlan
2. Remove Operator (Subscription, CSV, InstallPlan)
3. Attempt to reinstall Operator
Actual results:
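Reinstallation intermittently fails with the `ResolutionFailed` warning quoted in the description, which cites a CSV that has already been removed.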
Expected results:
Operators do not have conflicts with CSVs which have already been removed.
Additional info:
From a brief review of the OLM code, it appears that an internal resolver cache is populated and used for checking constraints when an Operator is installed. Stale entries in that cache would produce the described issue. The cache appears to have been rearchitected (moved to a dedicated object) since OCP 4.9.31. Because the issue is so intermittent, this request does not include clear reproduction steps. If the issue cannot be reproduced, I would like instructions on how to dump the contents of the cache should it arise again.
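The suspected failure mode can be illustrated with a toy model. This is not OLM's code: `Cluster`, `ToyResolverCache`, and their methods are hypothetical names invented for the sketch, and the only constraint modelled is "two CSVs may not originate from the same package". It simply shows how a snapshot cache that is not refreshed after a delete keeps reporting a conflict until it is rebuilt, which is what deleting the Pods would achieve.

~~~python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CSV:
    name: str
    package: str


@dataclass
class Cluster:
    """Stands in for the API server / etcd: the source of truth."""
    csvs: dict = field(default_factory=dict)  # CSV name -> CSV


class ToyResolverCache:
    """Hypothetical snapshot of existing CSVs, updated only by refresh()."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.existing = {}
        self.refresh()

    def refresh(self):
        # Rebuild the snapshot from the source of truth.
        self.existing = dict(self.cluster.csvs)

    def resolve(self, candidate):
        """Return conflict messages for installing `candidate`."""
        return [
            f"{candidate.name} and @existing/{csv.name} "
            f"originate from package {csv.package}"
            for csv in self.existing.values()
            if csv.package == candidate.package and csv.name != candidate.name
        ]


old = CSV("openshift-gitops-operator.v1.5.6", "openshift-gitops-operator")
new = CSV("openshift-gitops-operator.v1.5.8", "openshift-gitops-operator")

cluster = Cluster({old.name: old})
cache = ToyResolverCache(cluster)

# The CSV is removed from the cluster, but the cache misses the delete.
del cluster.csvs[old.name]
stale_conflicts = cache.resolve(new)   # conflict against a CSV that is gone

# Rebuilding the cache (analogous to restarting the Pods) clears it.
cache.refresh()
fresh_conflicts = cache.resolve(new)

print(stale_conflicts)
print(fresh_conflicts)
~~~

In this model, dumping `cache.existing` and comparing it with `cluster.csvs` is what would expose the stale entry, which is the kind of diagnostic requested above.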
- depends on
-
OCPBUGS-17804 OLM v0 catalog-operator unnecessarily watches copiedCSVs
- Closed
- is cloned by
-
OCPBUGS-18305 Stale CSV Entries in the Resolver Cache for Operator Lifecycle Manager
- Closed
- is depended on by
-
OCPBUGS-18305 Stale CSV Entries in the Resolver Cache for Operator Lifecycle Manager
- Closed
- is duplicated by
-
OCPBUGS-9945 Reinstalling an Operator sometimes fails, referencing nonexistent CSV
- Closed
- relates to
-
RHDEVDOCS-4986 Unable to Upgrade gitOps to latest and Install Plan mismatching of NAME & CSV
- Closed
-
ACM-2850 Upgrading from ACM 2.6 to 2.7 failed
- Closed
- links to
-
RHEA-2023:5006 rpm