-
Bug
-
Resolution: Duplicate
-
Normal
-
None
-
4.10.z
-
None
-
No
-
Rejected
-
False
Description of problem:
During parallel deployment of 38 SNOs with the same ACM policies, one of the SNOs had an issue with its operator subscriptions. All of the subscriptions were failing with the following (attempting to use a CatalogSource that was not configured for the Subscriptions):

    Spec:
      Channel: stable
      Install Plan Approval: Manual
      Name: cluster-logging
      Source: redhat-operators-disconnected
      Source Namespace: openshift-marketplace
    ...
    Conditions:
      Last Transition Time: 2023-04-19T14:45:39Z
      Message: all available catalogsources are healthy
      Reason: AllCatalogSourcesHealthy
      Status: False
      Type: CatalogSourcesUnhealthy
      Message: error using catalog certified-operators (in namespace openshift-marketplace): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup certified-operators.openshift-marketplace.svc on [fd02::a]:53: no such host"
      Reason: ErrorPreventedResolution
      Status: True
      Type: ResolutionFailed
    Last Updated: 2023-04-19T14:45:41Z

Looking at the catalog sources available for the cluster, we have the following:

    $ oc get catsrc -n openshift-marketplace
    NAME                            DISPLAY                              TYPE   PUBLISHER   AGE
    redhat-operators-disconnected   disconnected-redhat-operator-index   grpc   Red Hat     2h2m

We can see from the catalog-operator pod the default catalog sources being removed:

    2023-04-19T14:44:16.400052144Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-operators openshift-marketplace}"
    2023-04-19T14:44:16.406367711Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{certified-operators openshift-marketplace}"
    2023-04-19T14:44:16.420116787Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{community-operators openshift-marketplace}"
    2023-04-19T14:44:16.436125844Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-marketplace openshift-marketplace}"

However, we can still see the operator trying to connect to the default catalog source after the removal:

    2023-04-19T19:36:02.803084079Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
    2023-04-19T19:36:02.819476969Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
    2023-04-19T19:37:56.318458609Z time="2023-04-19T19:37:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
    2023-04-19T19:37:56.333874431Z time="2023-04-19T19:37:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
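To check whether another cluster is hitting the same symptom, the catalog-operator log can be scanned for catalog sources that were reported as removed but still show connection state changes afterwards. The following is only a rough sketch: the function name `stale_connections` and the `SAMPLE` list are hypothetical, the regular expressions are modeled solely on the log lines quoted in this report, and it assumes the log is read in chronological order. A catalog source that was legitimately re-created after deletion would also be flagged, so treat any hit as a hint rather than proof.

```python
import re
from datetime import datetime

# Patterns modeled on the catalog-operator log lines quoted in this report.
REMOVED_RE = re.compile(
    r'time="(?P<ts>[^"]+)".*msg="removed client for deleted catalogsource"'
    r'\s+source="\{(?P<name>\S+)\s+(?P<ns>[^}\s]+)\}"'
)
STATE_RE = re.compile(
    r'time="(?P<ts>[^"]+)".*state\.Key\.Namespace=(?P<ns>\S+)'
    r'\s+state\.Key\.Name=(?P<name>\S+)\s+state\.State=(?P<state>[A-Z_]+)'
)


def parse_ts(value):
    """Parse the time="..." field, e.g. 2023-04-19T14:44:16Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def stale_connections(lines):
    """Return {(namespace, name): [(time, state), ...]} for catalog sources
    that log connection state changes after their client was removed.

    Assumes lines are in chronological order.
    """
    removed = {}  # (namespace, name) -> time the client was removed
    stale = {}    # (namespace, name) -> connection states seen afterwards
    for line in lines:
        m = REMOVED_RE.search(line)
        if m:
            removed[(m["ns"], m["name"])] = parse_ts(m["ts"])
            continue
        m = STATE_RE.search(line)
        if m:
            key = (m["ns"], m["name"])
            ts = parse_ts(m["ts"])
            if key in removed and ts > removed[key]:
                stale.setdefault(key, []).append((ts, m["state"]))
    return stale


# Hypothetical sample input: four lines copied from the excerpts above.
SAMPLE = [
    '2023-04-19T14:44:16.400052144Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-operators openshift-marketplace}"',
    '2023-04-19T14:44:16.406367711Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{certified-operators openshift-marketplace}"',
    '2023-04-19T19:36:02.803084079Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"',
    '2023-04-19T19:36:02.819476969Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"',
]

if __name__ == "__main__":
    for (ns, name), events in stale_connections(SAMPLE).items():
        states = ", ".join(state for _, state in events)
        print(f"{ns}/{name}: still connecting after removal ({states})")
```

In practice the input would be the lines of the saved catalog-operator pod log instead of `SAMPLE`.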
Version-Release number of selected component (if applicable):
4.10.52
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The current workaround is to manually restart the catalog-operator pod in the openshift-operator-lifecycle-manager namespace. After the restart, all operator subscriptions are able to proceed with the installation.
- duplicates: OCPBUGS-8659 The Catalog Operator attempts to connect to deleted catalogSources (Closed)