OCPBUGS-29116

ResolutionFailed doesn't clear after recovery

    • Important
    • No
    • Orion
    • 1
    • False
      * Previously, conditions on a `Subscription` custom resource (CR) reflecting `ResolutionFailed` errors were not cleaned up by Operator Lifecycle Manager (OLM) when the reasons behind the failure had been resolved. As a result, clients that depended on the CR's conditions as a source of truth would break. With this fix, the status of a `Subscription` CR is now updated correctly to reflect the resolution of a previous error, and the error message is cleaned up as expected. (link:https://issues.redhat.com/browse/OCPBUGS-29116[*OCPBUGS-29116*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-24587. The following is the description of the original issue:

      Description of problem:

      Install some operators. After some time, the ResolutionFailed condition shows up on several subscriptions:

       

      $ kubectl get subscription.operators -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ResolutionFailed:.status.conditions[?(@.type=="ResolutionFailed")].status,MSG:.status.conditions[?(@.type=="ResolutionFailed")].message'
      NAMESPACE                   NAME                                                                         ResolutionFailed   MSG
      infra-sso                   rhbk-operator                                                                True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
      metallb-system              metallb-operator-sub                                                         True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
      multicluster-engine         multicluster-engine                                                          True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
      open-cluster-management     acm-operator-subscription                                                    True               [failed to populate resolver cache from source redhat-marketplace/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.67.215:50051: connect: connection refused", failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused"]
      openshift-cnv               kubevirt-hyperconverged                                                      True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
      openshift-gitops-operator   openshift-gitops-operator                                                    True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
      openshift-local-storage     local-storage-operator                                                       True               [failed to populate resolver cache from source community-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.14.92:50051: connect: connection refused", failed to populate resolver cache from source certified-operators/openshift-marketplace: failed to list bundles: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 172.30.202.255:50051: connect: connection refused"]
      openshift-nmstate           kubernetes-nmstate-operator                                                  <none>             <none>
      openshift-operators         devworkspace-operator-fast-redhat-operators-openshift-marketplace            <none>             <none>
      openshift-operators         external-secrets-operator                                                    <none>             <none>
      openshift-operators         web-terminal                                                                 <none>             <none>
      openshift-storage           lvms                                                                         <none>             <none>
      openshift-storage           mcg-operator-stable-4.14-redhat-operators-openshift-marketplace              <none>             <none>
      openshift-storage           ocs-operator-stable-4.14-redhat-operators-openshift-marketplace              <none>             <none>
      openshift-storage           odf-csi-addons-operator-stable-4.14-redhat-operators-openshift-marketplace   <none>             <none>
      openshift-storage           odf-operator                                                                 <none>             <none> 
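For scripted checks, the same filter can be applied to the JSON form of the listing. The following is a minimal Python sketch, not part of the original report. It assumes the output of `kubectl get subscription.operators -A -o json` has the shape implied by the JSONPath above (`items[].status.conditions[]` with `type`, `status`, and `message`). The function name and the sample data are hypothetical.

```python
def failed_resolutions(subscription_list):
    """Return (namespace, name, message) for every Subscription whose
    ResolutionFailed condition currently has status "True"."""
    failed = []
    for item in subscription_list.get("items", []):
        meta = item.get("metadata", {})
        # Subscriptions shown as <none> above have no such condition.
        conditions = item.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == "ResolutionFailed" and cond.get("status") == "True":
                failed.append(
                    (meta.get("namespace"), meta.get("name"), cond.get("message", ""))
                )
    return failed


# Hand-written sample in the assumed shape; not real cluster output.
sample = {
    "items": [
        {
            "metadata": {"namespace": "metallb-system", "name": "metallb-operator-sub"},
            "status": {
                "conditions": [
                    {
                        "type": "ResolutionFailed",
                        "status": "True",
                        "message": "failed to populate resolver cache",
                    }
                ]
            },
        },
        {
            "metadata": {"namespace": "openshift-storage", "name": "lvms"},
            "status": {"conditions": []},
        },
    ]
}

for namespace, name, message in failed_resolutions(sample):
    print(namespace, name, message)
```

On a live cluster the dictionary would come from parsing the command's JSON output, for example with `json.load(sys.stdin)`.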

       

      The package server logs show that a catalog source was unavailable at one point. After a while the catalog source becomes available again, but the error does not disappear from the subscription.

      Package server logs: 

      time="2023-12-05T14:27:09Z" level=warning msg="error getting bundle stream" action="refresh cache" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.37.69:50051: connect: connection refused\"" source="{redhat-operators openshift-marketplace}"
      time="2023-12-05T14:27:09Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace
      time="2023-12-05T14:28:26Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-marketplace openshift-marketplace}" action="sync catalogsource" address="redhat-marketplace.openshift-marketplace.svc:50051" name=redhat-marketplace namespace=openshift-marketplace
      time="2023-12-05T14:30:23Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace
      time="2023-12-05T14:35:56Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace
      time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace
      time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace
      time="2023-12-05T14:39:40Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-marketplace openshift-marketplace}" action="sync catalogsource" address="redhat-marketplace.openshift-marketplace.svc:50051" name=redhat-marketplace namespace=openshift-marketplace
      time="2023-12-05T14:46:07Z" level=info msg="updating PackageManifest based on CatalogSource changes: {certified-operators openshift-marketplace}" action="sync catalogsource" address="certified-operators.openshift-marketplace.svc:50051" name=certified-operators namespace=openshift-marketplace
      time="2023-12-05T14:47:37Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace
      time="2023-12-05T14:48:21Z" level=info msg="updating PackageManifest based on CatalogSource changes: {community-operators openshift-marketplace}" action="sync catalogsource" address="community-operators.openshift-marketplace.svc:50051" name=community-operators namespace=openshift-marketplace
      time="2023-12-05T14:49:53Z" level=info msg="updating  
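To line up the outage with the recovery, the log lines can be parsed mechanically. The sketch below is a rough Python helper, not part of the original report. It assumes the `key=value` layout seen above, and it assumes that a later `sync catalogsource` line for the same catalog means the catalog is reachable again, which is the reporter's reading of the log rather than a documented guarantee. The function names are hypothetical.

```python
import re

# key=value pairs; the value is either a double-quoted string
# (backslash escapes allowed) or a bare token without spaces.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def recovery_times(lines):
    """Map catalog name -> (last failure time, first sync time after it).

    The second element is None when no later sync line was seen.
    """
    last_failure = {}
    recovered = {}
    for line in lines:
        f = parse_line(line)
        if f.get("level") == "warning" and "source" in f:
            parts = f["source"].strip("{}").split()
            if not parts:
                continue
            last_failure[parts[0]] = f.get("time")
            recovered.pop(parts[0], None)  # a new failure resets recovery
        elif f.get("action") == "sync catalogsource" and "name" in f:
            name = f["name"]
            if name in last_failure and name not in recovered:
                recovered[name] = f.get("time")
    return {name: (t, recovered.get(name)) for name, t in last_failure.items()}


# Two lines copied from the log above.
SAMPLE = [
    r'time="2023-12-05T14:27:09Z" level=warning msg="error getting bundle stream" action="refresh cache" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 172.30.37.69:50051: connect: connection refused\"" source="{redhat-operators openshift-marketplace}"',
    r'time="2023-12-05T14:37:28Z" level=info msg="updating PackageManifest based on CatalogSource changes: {redhat-operators openshift-marketplace}" action="sync catalogsource" address="redhat-operators.openshift-marketplace.svc:50051" name=redhat-operators namespace=openshift-marketplace',
]

print(recovery_times(SAMPLE))
```

Truncated lines, such as the last one in the log above, are skipped because they carry neither a `source` nor a `name` field.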

       

      Version-Release number of selected component (if applicable):

      4.14.3    

      How reproducible:

       

      Steps to Reproduce:

          1. Install an operator, for example metallb.
          2. Wait until the catalog pod becomes temporarily unavailable.
          3. The ResolutionFailed condition no longer disappears, even after the catalog pod is available again.
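Step 3 can be checked by polling the subscription after the catalog pod is back. The following Python sketch is an illustration only. `get_conditions` is a hypothetical callable supplied by the caller that returns the subscription's current `status.conditions` list, and the timeout and interval values are arbitrary assumptions.

```python
import time


def wait_for_condition_cleared(
    get_conditions,
    timeout_s=600.0,
    interval_s=10.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll until no ResolutionFailed condition with status "True" remains.

    Returns True if the condition cleared before the timeout, else False.
    """
    deadline = clock() + timeout_s
    while True:
        conditions = get_conditions()
        stuck = any(
            c.get("type") == "ResolutionFailed" and c.get("status") == "True"
            for c in conditions
        )
        if not stuck:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With the behavior described in this report the helper would return False, because the condition stays at True. The `sleep` and `clock` parameters exist only so that the helper can be exercised without real waiting.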

      Actual results:

      ResolutionFailed never disappears from the subscription.

      Expected results:

      ResolutionFailed disappears from the subscription once the catalog source is available again.
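The expected behavior can be stated as a small pure function. This Python sketch is not OLM's implementation. It only illustrates the expectation that a successful resolution drops any earlier ResolutionFailed condition, and the function name and condition fields are assumptions chosen to match the listing above.

```python
def update_resolution_condition(conditions, error_message=None):
    """Return a new conditions list reflecting the latest resolution attempt.

    error_message is None when resolution succeeded. In that case any
    earlier ResolutionFailed condition is dropped instead of kept.
    """
    others = [c for c in conditions if c.get("type") != "ResolutionFailed"]
    if error_message is None:
        return others
    failed = {"type": "ResolutionFailed", "status": "True", "message": error_message}
    return others + [failed]


# A failed attempt followed by a successful one leaves no stale condition.
after_failure = update_resolution_condition([], "failed to list bundles")
after_recovery = update_resolution_condition(after_failure)
print(after_failure)
print(after_recovery)
```

Other conditions on the subscription are passed through unchanged, so only the resolution state is affected.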

       

              anik120 Anik Bhattacharjee
              openshift-crt-jira-prow OpenShift Prow Bot
              Kui Wang Kui Wang
              Alex Dellapenta Alex Dellapenta