  1. OpenShift Over the Air
  2. OTA-1080

Proof-of-concept oc access to firing alerts

Details

    • Story
    • Resolution: Done
    • Major
    • openshift-4.16
    • None
    • None
    • OTA 248, OTA 249

    Description

      Implementing RFE-928 would help a number of update-team use-cases:

      • OTA-368, OSDOCS-2427, and other tickets that are mulling over rendering update-related alerts when folks are trying to decide whether to launch an update.
      • OTA-1021's update status subcommand could use these to help admins discover and respond to update-related issues in their updating clusters.

      The updates team is not well positioned to maintain oc access long-term; that seems like a better fit for the monitoring team (who maintain Alertmanager) or the workloads team (who maintain the bulk of oc). But we can probably hack together a proof-of-concept which we could later hand off to those teams, and in the meantime it would unblock our work on tech-preview commands consuming the firing-alert information.

      The proof-of-concept could work as follows:

      1. Get the Alertmanager URL from the alertmanager-main route in the openshift-monitoring namespace.
      2. Query the $ALERTMANAGER/api/v1/alerts endpoint for data about alerts (see https://github.com/prometheus/alertmanager#api).
      3. Authenticate to the endpoint with a bearer token, the same as for requests to the apiserver (a possible Role for this is mentioned in MON-3396 and OBSDA-530).

      Definition of done:

      $ OC_ENABLE_CMD_INSPECT_ALERTS=true oc adm inspect-alerts
      ...dump of firing alerts...
      

      and a backing Go function that other subcommands like oc adm upgrade status can consume internally.

            People

              trking W. Trevor King