-
Bug
-
Resolution: Duplicate
-
Normal
-
None
-
4.10.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
None
-
No
-
None
-
None
-
Rejected
Description of problem:
During a parallel deployment of 38 SNOs with the same ACM policies, one of the SNOs had an issue with its operator subscriptions. All of its subscriptions were failing with the status below: resolution was attempting to use a CatalogSource that the Subscriptions were not configured to use.
Spec:
  Channel:                stable
  Install Plan Approval:  Manual
  Name:                   cluster-logging
  Source:                 redhat-operators-disconnected
  Source Namespace:       openshift-marketplace
...
Conditions:
  Last Transition Time:  2023-04-19T14:45:39Z
  Message:               all available catalogsources are healthy
  Reason:                AllCatalogSourcesHealthy
  Status:                False
  Type:                  CatalogSourcesUnhealthy
  Message:               error using catalog certified-operators (in namespace openshift-marketplace): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup certified-operators.openshift-marketplace.svc on [fd02::a]:53: no such host"
  Reason:                ErrorPreventedResolution
  Status:                True
  Type:                  ResolutionFailed
Last Updated:            2023-04-19T14:45:41Z
Looking at the catalog sources available on the cluster, only the disconnected one is present:
$ oc get catsrc -n openshift-marketplace
NAME                            DISPLAY                              TYPE   PUBLISHER   AGE
redhat-operators-disconnected   disconnected-redhat-operator-index   grpc   Red Hat     2h2m
The catalog-operator pod logs show the default catalog sources being removed:
2023-04-19T14:44:16.400052144Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-operators openshift-marketplace}"
2023-04-19T14:44:16.406367711Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{certified-operators openshift-marketplace}"
2023-04-19T14:44:16.420116787Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{community-operators openshift-marketplace}"
2023-04-19T14:44:16.436125844Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-marketplace openshift-marketplace}"
However, hours after the removal, the operator is still trying to connect to one of the default catalog sources (certified-operators):
2023-04-19T19:36:02.803084079Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
2023-04-19T19:36:02.819476969Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
2023-04-19T19:37:56.318458609Z time="2023-04-19T19:37:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
2023-04-19T19:37:56.333874431Z time="2023-04-19T19:37:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
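The pattern in the two log excerpts above (a "removed client" line followed later by connection state changes for the same source) can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the catalog-operator log has been saved as plain text, and the function name `find_stale_connections` and the regular expressions are illustrative, matched only against the message shapes quoted in this report. A CatalogSource that is deleted and later re-created would also be flagged, so the output is a hint rather than proof.

```python
import re

# Matches: msg="removed client for deleted catalogsource" source="{<name> <namespace>}"
REMOVED_RE = re.compile(
    r'msg="removed client for deleted catalogsource" source="\{(\S+) (\S+)\}"'
)
# Matches: msg="state.Key.Namespace=<ns> state.Key.Name=<name> state.State=<STATE>"
STATE_RE = re.compile(
    r'msg="state\.Key\.Namespace=(\S+) state\.Key\.Name=(\S+) state\.State=(\w+)"'
)


def find_stale_connections(log_lines):
    """Return (namespace, name, state) for every connection state change
    logged after the same catalog source was reported as removed."""
    deleted = set()
    stale = []
    for line in log_lines:
        removed = REMOVED_RE.search(line)
        if removed:
            name, namespace = removed.groups()
            deleted.add((namespace, name))
            continue
        state = STATE_RE.search(line)
        if state:
            namespace, name, new_state = state.groups()
            if (namespace, name) in deleted:
                stale.append((namespace, name, new_state))
    return stale


# Three lines copied from the excerpts in this report.
SAMPLE_LOG = '''\
2023-04-19T14:44:16.406367711Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{certified-operators openshift-marketplace}"
2023-04-19T19:36:02.803084079Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
2023-04-19T19:36:02.819476969Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
'''

if __name__ == "__main__":
    for namespace, name, new_state in find_stale_connections(SAMPLE_LOG.splitlines()):
        print(f"{namespace}/{name} still active after removal: {new_state}")
```

Run against the sample lines, this reports certified-operators twice, once for CONNECTING and once for TRANSIENT_FAILURE, which is the behavior described in this report.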
Version-Release number of selected component (if applicable):
4.10.52
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
The current workaround is to manually restart the catalog-operator pod in the openshift-operator-lifecycle-manager namespace. After the restart, all operator subscriptions are able to proceed with the installation.
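One way to script that restart across several affected clusters is sketched below. This is an illustration under assumptions, not a command taken from the report: it assumes `oc` is on the PATH and logged in to the affected cluster, and that the catalog-operator pod carries the label `app=catalog-operator` (verify with `oc get pods -n openshift-operator-lifecycle-manager --show-labels` before relying on it). The helper names are hypothetical. Deleting the pod lets its Deployment re-create it, which is what "restart" means here.

```python
import subprocess

NAMESPACE = "openshift-operator-lifecycle-manager"
# Assumption: label selector for the catalog-operator pod; confirm on your cluster.
POD_SELECTOR = "app=catalog-operator"


def restart_command(namespace=NAMESPACE, selector=POD_SELECTOR):
    """Build the argv that deletes the catalog-operator pod so it is re-created."""
    return ["oc", "delete", "pod", "-n", namespace, "-l", selector]


def restart_catalog_operator(dry_run=True):
    """Print the command when dry_run is True; otherwise execute it."""
    cmd = restart_command()
    if dry_run:
        print(" ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError if oc fails
    return cmd


if __name__ == "__main__":
    restart_catalog_operator(dry_run=True)
```

The dry-run default only prints the command, so it can be reviewed before anything is deleted.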
duplicates: OCPBUGS-8659 The Catalog Operator attempts to connect to deleted catalogSources (Closed)