OCPBUGS-30961

Insights Operator reports DEGRADED a couple of hours after cluster creation


Details

    • Bug
    • Resolution: Won't Do
    • Undefined
    • None
    • 4.15.0
    • Insights Operator
    • No
    • CCXDEV Sprint 112
    • 1
    • False

    Description

      Description of problem:

          The Insights Operator reports DEGRADED a couple of hours after cluster creation, or after a pod restart.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Create a private, multi-AZ ROSA cluster on version 4.15.0.
          

      Actual results:

          NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
          True       8h      Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {"errors":[{"status":401,"detail":"UHC services authentication failed","meta":{"response_by":"gateway"}}]}...

      pod logs:

      I0315 10:25:17.987497 1 httplog.go:132] "HTTP" verb="GET" URI="/metrics" latency="3.515878ms" userAgent="Prometheus/2.48.0" audit-ID="64357a43-1f66-4b2d-a602-0fe0cf73aeff" srcIP="10.128.3.22:43000" resp=200
      I0315 10:25:18.700669 1 controller.go:220] Number of last upload failures 9 exceeded the threshold 5. Marking as degraded.
      I0315 10:25:18.700725 1 controller.go:428] The operator has some internal errors: Reporting was not allowed: your Red Hat account is not enabled for remote support or your token has expired: {"errors":[{"status":401,"detail":"UHC services authentication failed","meta":{"response_by":"gateway"}}]}

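      It seems the pod fails to upload from the start, but the operator is not reported as DEGRADED until later, because it is only marked degraded once the number of upload failures exceeds the failure threshold (5 in these logs). After a pod restart the count starts over, so the operator reports healthy again for a while: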

      I0315 10:37:04.518142 1 controller.go:216] Number of last upload failures 1 lower than threshold 5. Not marking as degraded.
      I0315 10:37:04.518182 1 controller.go:444] The operator is healthy
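
      The delayed DEGRADED status can be illustrated with a small sketch. This is not the operator's actual code: `UploadStatus` and `record_failure` are hypothetical names, the threshold of 5 is taken from the log lines, and the strict "greater than" comparison is an assumption based on the word "exceeded" in the log message.

```python
from dataclasses import dataclass

# Assumption: value copied from the log lines ("threshold 5").
FAILURE_THRESHOLD = 5


@dataclass
class UploadStatus:
    """Hypothetical in-memory failure counter, for illustration only."""

    failures: int = 0

    def record_failure(self) -> None:
        # Each failed upload attempt increments the counter.
        self.failures += 1

    @property
    def degraded(self) -> bool:
        # Assumption: degraded only once the count is strictly above the threshold.
        return self.failures > FAILURE_THRESHOLD


if __name__ == "__main__":
    status = UploadStatus()
    for attempt in range(1, 10):
        status.record_failure()
        print(f"attempt {attempt}: failures={status.failures} degraded={status.degraded}")

    # A fresh instance models a pod restart: the count starts again from zero.
    restarted = UploadStatus()
    restarted.record_failure()
    print(f"after restart: failures={restarted.failures} degraded={restarted.degraded}")
```

      Under these assumptions, every upload fails from the first attempt, yet the status only flips to degraded on the sixth failure. A restart clears the in-memory count, which would match the "failures 1 lower than threshold 5" log line.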

       

       

      Expected results:

          The Insights Operator stays AVAILABLE.

      Additional info:

          ocm staging cluster-id: 2a0d3io5jag9pprqrrf1i0j8nn8e37ke   
          org_id: "1HAXGgCYqHpednsRDiwWsZBmDlA"

       

          People

            Tomas Remes (tremes1@redhat.com)
            Mulham Raee (rh-ee-mraee)
            Joao Bastos Fula