  1. Hybrid Cloud Console
  2. RHCLOUD-21478

Enhance UHC-Proxy Error reporting and monitoring

    • Sprint: Platform A&M Sprint 53, Platform A&M Sprint 54, Platform A&M Sprint 55, Platform A&M Sprint 56, Platform A&M Sprint 57, Platform A&M Sprint 58, Platform A&M Sprint 59, Platform A&M Sprint 60, Platform A&M Sprint 61

      OpenShift clusters occasionally report a "degraded" status because of errors encountered during CCX data uploads.
      See https://issues.redhat.com/browse/SDB-3091 and https://issues.redhat.com/browse/CCXDEV-9209.

      The errors seem to be related to authentication issues reported by UHC-proxy:

      https://github.com/RedHatInsights/uhc-auth-proxy/blob/master/server/server.go#L141

      We need to enhance logging and monitoring so we can better determine the root cause of the errors:

      (1) Add CloudWatch logging so we have better log retention. Logs should include detailed error codes and messages.
      (2) Add Prometheus metrics so we can get a better view of failure patterns.
      (3) Add alerts based on (2).

              rh-ee-dagbay Daniel Agbay
              rhn-support-lphiri Lindani Phiri
              Drew Bomhof, Eric Himmelreich, Tomas Remes
              Votes: 0
              Watchers: 5