  1. OpenShift Bugs
  2. OCPBUGS-13286

Catalog-operator pod unable to verify the removal of the default catalog source

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • None
    • 4.10.z
    • OLM
    • None
    • No
    • Rejected
    • False

      Description of problem:

      During a parallel deployment of 38 SNOs with the same ACM policies, one of the SNOs had an issue with its operator subscriptions. Every subscription on that SNO failed with the status below: resolution attempted to use a CatalogSource (certified-operators) that the Subscriptions were not configured to use.
      
      Spec:
        Channel:                stable
        Install Plan Approval:  Manual
        Name:                   cluster-logging
        Source:                 redhat-operators-disconnected
        Source Namespace:       openshift-marketplace
      ...
        Conditions:
          Last Transition Time:  2023-04-19T14:45:39Z
          Message:               all available catalogsources are healthy
          Reason:                AllCatalogSourcesHealthy
          Status:                False
          Type:                  CatalogSourcesUnhealthy
          Message:               error using catalog certified-operators (in namespace openshift-marketplace): failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup certified-operators.openshift-marketplace.svc on [fd02::a]:53: no such host"
          Reason:                ErrorPreventedResolution
          Status:                True
          Type:                  ResolutionFailed
        Last Updated:            2023-04-19T14:45:41Z
      
      
      The only catalog source available on the cluster is the disconnected one:
      
      $ oc get catsrc -n openshift-marketplace
      NAME                            DISPLAY                              TYPE   PUBLISHER   AGE
      redhat-operators-disconnected   disconnected-redhat-operator-index   grpc   Red Hat     2h2m
      
      
      The catalog-operator pod logs show the default catalog sources being removed:
      
      2023-04-19T14:44:16.400052144Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-operators openshift-marketplace}"
      2023-04-19T14:44:16.406367711Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{certified-operators openshift-marketplace}"
      2023-04-19T14:44:16.420116787Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{community-operators openshift-marketplace}"
      2023-04-19T14:44:16.436125844Z time="2023-04-19T14:44:16Z" level=info msg="removed client for deleted catalogsource" source="{redhat-marketplace openshift-marketplace}"
      
      However, about five hours after the removal, the operator is still trying to connect to a default catalog source (certified-operators):
      
      2023-04-19T19:36:02.803084079Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
      2023-04-19T19:36:02.819476969Z time="2023-04-19T19:36:02Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
      2023-04-19T19:37:56.318458609Z time="2023-04-19T19:37:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=CONNECTING"
      2023-04-19T19:37:56.333874431Z time="2023-04-19T19:37:56Z" level=info msg="state.Key.Namespace=openshift-marketplace state.Key.Name=certified-operators state.State=TRANSIENT_FAILURE"
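
The pattern in the two log excerpts above (a source logged as removed, then later connection state changes for the same source) can be checked mechanically against a saved log. The following is a minimal sketch, not part of the original report: the function name `stale_sources` and the log file argument are hypothetical, and it assumes the log lines keep exactly the format quoted above.

```shell
#!/bin/sh
# stale_sources LOGFILE
# Hypothetical helper: prints "<namespace>/<name>" for every catalog source
# that the log reports as removed and that still shows a connection state
# change on a later line. Assumes the log line format quoted in this report.
stale_sources() {
  awk '
    /removed client for deleted catalogsource/ {
      # The field looks like: source="{<name> <namespace>}"
      if (match($0, /source="[^"]*"/)) {
        s = substr($0, RSTART, RLENGTH)
        gsub(/source="|[{}"]/, "", s)        # leaves "<name> <namespace>"
        split(s, parts, " ")
        removed[parts[2] "/" parts[1]] = 1
      }
      next
    }
    /state\.Key\.Namespace=/ {
      ns = ""; name = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /state\.Key\.Namespace=/) { ns = $i; sub(/.*state\.Key\.Namespace=/, "", ns) }
        if ($i ~ /^state\.Key\.Name=/)     { name = $i; sub(/^state\.Key\.Name=/, "", name) }
      }
      key = ns "/" name
      # Report each stale source once, only if its removal was seen earlier.
      if ((key in removed) && !(key in seen)) { seen[key] = 1; print key }
    }
  ' "$1"
}
```

Run against a log saved from the catalog-operator pod (for example `stale_sources catalog-operator.log`, where the file name is a placeholder), any output line names a removed source that the operator is still dialing.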
      
      

      Version-Release number of selected component (if applicable):

      4.10.52

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

      The current workaround is to manually restart the catalog-operator pod in the openshift-operator-lifecycle-manager namespace. After the restart, all operator subscriptions are able to proceed with the install.
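
One way to script the restart described above is sketched here. It is an illustration, not the reporter's exact procedure: it uses `oc rollout restart` on the deployment rather than deleting the pod by hand, it assumes the pod is managed by a deployment named `catalog-operator`, and the function name and the `OC`, `NS`, and `DEPLOY` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround. OC can be overridden to point
# at a different client binary; NS and DEPLOY are assumptions about where
# the catalog-operator runs and what its deployment is called.
OC="${OC:-oc}"
NS="openshift-operator-lifecycle-manager"
DEPLOY="catalog-operator"

restart_catalog_operator() {
  # Trigger a fresh pod, then wait until the new pod is rolled out.
  "$OC" rollout restart "deployment/$DEPLOY" -n "$NS" &&
  "$OC" rollout status "deployment/$DEPLOY" -n "$NS" --timeout=120s
}
```

Calling `restart_catalog_operator` on the affected SNO and then re-checking the Subscription conditions should show whether resolution has stopped referencing the removed catalog sources.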

              agreene1991 Alexander Greene (Inactive)
              rh-ee-vkuss Vitor Kuss
              Xia Zhao Xia Zhao