  1. OpenShift GitOps
  2. GITOPS-5807

Subscription health check erroneously reports degraded status


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • None
    • Operator
    • False
    • None
    • False

      Description of Problem

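      • When a Subscription fails and then recovers, the error condition is not always cleared. For example, the status below still carries a stale ResolutionFailed condition although all catalog sources are reported healthy: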

       

        conditions:
          - lastTransitionTime: '2024-10-30T18:21:24Z'
            message: all available catalogsources are healthy
            reason: AllCatalogSourcesHealthy
            status: 'False'
            type: CatalogSourcesUnhealthy
          - message: >-
              error using catalogsource
              openshift-marketplace/isf-data-foundation-catalog: failed to list
              bundles: rpc error: code = Unavailable desc = connection error: desc =
              "transport: Error while dialing: dial tcp 172.30.218.45:50051: connect:
              connection refused"
            reason: ErrorPreventedResolution
            status: 'True'
            type: ResolutionFailed 

       

      In these situations the OpenShift GitOps health checks report the Subscription as Degraded even though the Subscription is fine.
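
To illustrate why the status above ends up Degraded, here is a simplified Python model of a condition-based health check. It is a sketch only: the function name `subscription_health` and the set `DEGRADED_TYPES` are hypothetical, and the rule that any of these condition types with status 'True' means Degraded is an assumption for illustration, not the actual health check shipped with OpenShift GitOps.

```python
# Assumption for illustration: these condition types mark a Subscription
# as degraded whenever their status is "True".
DEGRADED_TYPES = {"CatalogSourcesUnhealthy", "InstallPlanFailed", "ResolutionFailed"}


def subscription_health(conditions):
    """Return "Degraded" if any assumed failure condition is True, else "Healthy"."""
    for cond in conditions:
        if cond.get("type") in DEGRADED_TYPES and cond.get("status") == "True":
            return "Degraded"
    return "Healthy"


# Conditions taken from the report above (messages omitted for brevity).
reported = [
    {
        "type": "CatalogSourcesUnhealthy",
        "status": "False",
        "reason": "AllCatalogSourcesHealthy",
    },
    {
        "type": "ResolutionFailed",
        "status": "True",
        "reason": "ErrorPreventedResolution",
    },
]

print(subscription_health(reported))  # prints "Degraded"
```

Under this model the stale ResolutionFailed condition alone is enough to flag the Subscription, even though CatalogSourcesUnhealthy has already returned to 'False'.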

       

      Additional Info

      • <Any additional info such as logs, must-gather outputs, etc.>

      Problem Reproduction

      • In my case, one of the catalog sources was broken for a while. I think this can be induced manually. I don't know whether there are other ways to trigger the issue.

      Reproducibility

      • Always

      Prerequisites/Environment

      •  

      Steps to Reproduce

      • ...

      Expected Results

      • ...

      Actual Results

      • ...

      Problem Analysis

      • <Completed by engineering team as part of the triage/refinement process>

      Root Cause

      • <What is the root cause of the problem? Or, why is it not a bug?>

      Workaround (If Possible)

      • <Are there any workarounds we can provide to the customers?>

      Fix Approaches

      • <If we decide to fix this bug, how will we do it?>

      Acceptance Criteria

      • ...

      Definition of Done

      • Code Complete:
        • All code has been written, reviewed, and approved.
      • Tested:
        • Unit tests have been written and passed.
        • Ensure code coverage is not reduced with the changes.
        • Integration tests have been automated.
        • System tests have been conducted, and all critical bugs have been fixed.
        • Tested and merged on OpenShift either upstream or downstream on a local build.
      • Documentation:
        • User documentation or release notes have been written (if applicable).
      • Build:
        • Code has been successfully built and integrated into the main repository / project.
        • Midstream changes (if applicable) are done, reviewed, approved and merged.
      • Review:
        • Code has been peer-reviewed and meets coding standards.
        • All acceptance criteria defined in the user story have been met.
        • Tested by reviewer on OpenShift.
      • Deployment:
        • The feature has been deployed on OpenShift cluster for testing.

              Assignee: Unassigned
              Reporter: Raffaele Spazzoli (rhn-gps-rspazzol)