  1. Hybrid Cloud Console
  2. RHCLOUD-43540

SM: Reduce 4xx Error alert to only 4xx's we care about as service owner


    • Product / Portfolio Work

      As service owners, we don't need to alert on every 400-level error: many 4xx responses are expected (e.g., a user entered the wrong password or a client sent invalid parameters). We should alert only on the codes that usually mean the service is broken, misconfigured, or being abused. All 4xx errors should still be captured in a graph, just without alerting on them (we already have this in our main dashboards, but it's not great and needs to be fixed).

      4xx codes we should alert on

      401 – Unauthorized: may uncover an auth provider outage, token validation failures, or all clients suddenly failing (the original reason for this alert)
      403 – Forbidden (unexpected spike): may uncover bad service/config changes or failing internal client credentials between services
      404 – Not Found: may indicate routing issues
      408 – Request Timeout: may uncover network issues
      409 – Conflict: may point to rare consistency issues
      429 – Too Many Requests: indicates that rate limiting is happening

      All other 4xx codes should be captured in the graph but not alerted on
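
      A minimal sketch of the split described above, assuming the decision can be expressed as a plain allow-list of status codes. The names `ALERTABLE_4XX`, `should_alert`, and `split_counts` are hypothetical and not taken from the existing alert definition. The "unexpected spike" condition for 403 would need a rate or threshold check that this sketch does not model.

```python
from collections import Counter

# Codes this ticket lists as alert-worthy (hypothetical constant name).
# Every other 4xx code is graphed only.
ALERTABLE_4XX = frozenset({401, 403, 404, 408, 409, 429})


def is_4xx(status: int) -> bool:
    """Return True for any client-error status code (400-499)."""
    return 400 <= status <= 499


def should_alert(status: int) -> bool:
    """Return True only for the 4xx codes on the alert allow-list."""
    return status in ALERTABLE_4XX


def split_counts(statuses):
    """Count all 4xx responses for the graph and the alertable subset.

    Returns (graphed, alertable): two Counters keyed by status code.
    Non-4xx statuses are ignored entirely.
    """
    graphed = Counter()
    alertable = Counter()
    for status in statuses:
        if not is_4xx(status):
            continue
        graphed[status] += 1          # every 4xx goes on the dashboard
        if should_alert(status):
            alertable[status] += 1    # only the allow-listed codes feed the alert
    return graphed, alertable
```

      For example, `split_counts([200, 400, 401, 422, 429])` would graph 400, 401, 422, and 429, but only 401 and 429 would count toward the alert.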

              anatale.openshift Antony Natale