- Bug
- Resolution: Done
- Major
- None
- 4.8.z
- Critical
- No
- OPECO 237
- 1
- Rejected
- False
- Customer Escalated
Description of problem:
After the cluster upgrade, the customer (CU) is unable to install any operator from the redhat-operators or community-operators catalogs. The operator always gets stuck in an unknown state: only the Subscription is created, and no InstallPlan (IP), CSV, or job is created. The catalog-operator pod logs the following errors:

~~~
2023-05-28T13:04:22.954397188Z time="2023-05-28T13:04:22Z" level=warning msg="error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded" catalog="{community-operators openshift-marketplace}"
2023-05-28T13:04:22.954397188Z time="2023-05-28T13:04:22Z" level=warning msg="error encountered while listing bundles: rpc error: code = DeadlineExceeded desc = context deadline exceeded" catalog="{redhat-operators openshift-marketplace}"
~~~

We see this mostly with the OCS operator, which cannot be upgraded. The error we see in the catalog-operator logs is:

~~~
2023-05-28T13:04:22.954876501Z time="2023-05-28T13:04:22Z" level=debug msg="resolution failed" error="constraints not satisfiable: no operators found from catalog redhat-operators in namespace openshift-marketplace referenced by subscription ocs-operator, subscription ocs-operator exists" id=wE2th namespace=openshift-storage
~~~

Steps done:
- I deleted the existing CSVs in the openshift-storage namespace.
- I deleted the existing Subscription in the openshift-storage namespace.
- I reinstalled OCS v4.8 from the OCP console and saw no change in behavior. Only the Subscription for OCS v4.8 was created; no CSV and no InstallPlan were created.

~~~
# oc get csv,ip,sub
NAME                                             PACKAGE        SOURCE             CHANNEL
subscription.operators.coreos.com/ocs-operator   ocs-operator   redhat-operators   stable-4.8
~~~

Our observation is that the catalog operator is unable to list the packages from the catalog.

Steps taken to debug the issue:
- I ran the command below from the catalog-operator pod in the openshift-operator-lifecycle-manager project. Listing the packages takes a long time (~2 minutes), but grpcurl completes successfully.

~~~
# grpcurl -plaintext <redhat-operators_SVC_IP>:50051 api.Registry/ListPackages
~~~

- The catalog-operator pod and the redhat-operators pod are hosted on two different nodes. To check whether network latency between the two nodes was causing the delay in listing packages, I ran the same command from the redhat-operators pod itself, with localhost in place of the service IP, and saw the same delay. In addition, the community-operators pod runs on the same node as redhat-operators, and the grpcurl command returns instantly when the community-operators service IP is used instead of the redhat-operators service IP. Network latency can therefore be ruled out.
- Deleting the pods in the OLM project did not solve the issue. It persisted after the new pods started.
- I increased the log verbosity, but it did not show any additional data.
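The debugging above shows that the redhat-operators registry does answer `ListPackages`, but only after about 2 minutes. Meanwhile the catalog operator logs `DeadlineExceeded`. A gRPC client reports that status when its own context deadline expires before the response arrives. The Python simulation below illustrates only that mechanism. It is not OLM code and sends nothing over the network. `list_packages` and `list_with_deadline` are hypothetical names, and the 0.5 s deadline and the time scaling are both assumptions. The deadline the catalog operator actually applies is not stated in this report.

```python
import asyncio


# Hypothetical stand-in for a registry RPC such as api.Registry/ListPackages.
# Nothing is sent over the network; only the response latency is simulated.
async def list_packages(latency_s: float) -> list[str]:
    await asyncio.sleep(latency_s)
    return ["ocs-operator"]


async def list_with_deadline(latency_s: float, deadline_s: float) -> str:
    """Return "ok" if the simulated call completes before the client deadline,
    or "DeadlineExceeded" if the deadline expires first (mirroring the gRPC
    status seen in the catalog-operator logs)."""
    try:
        await asyncio.wait_for(list_packages(latency_s), timeout=deadline_s)
        return "ok"
    except asyncio.TimeoutError:
        return "DeadlineExceeded"


async def main() -> dict[str, str]:
    # Assumption: times are scaled so that 1 simulated second = 1 real minute.
    deadline_s = 0.5  # hypothetical client deadline; the real OLM value is not in this report
    return {
        # A registry that answers almost immediately.
        "fast registry": await list_with_deadline(0.01, deadline_s),
        # A registry that needs ~2 minutes, as observed for redhat-operators with grpcurl.
        "slow registry": await list_with_deadline(2.0, deadline_s),
    }


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The sketch shows that one slow response has two outcomes depending on the client. A client willing to wait, as grpcurl was here, sees success. A client with a shorter deadline sees `DeadlineExceeded`.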
Version-Release number of selected component (if applicable):
- OCP cluster version: 4.8.51
How reproducible:
#N/A
Actual results:
- The OCS operator cannot be installed.
- One of our application runtimes (Event-streams pods) has already been impacted, largely due to the ongoing upgrade issues.
Expected results:
Additional info:
- The customer is trying to upgrade to the next EUS version.