-
Story
-
Resolution: Done
-
Major
-
None
-
None
-
Strategic Product Work
-
5
-
False
-
None
-
False
-
OCPSTRAT-648 - (Tech Preview) 'oc adm upgrade status' command
-
-
-
OTA 248, OTA 249
Implementing RFE-928 would help a number of update-team use-cases:
- OTA-368, OSDOCS-2427, and other tickets that are mulling over rendering update-related alerts when folks are trying to decide whether to launch an update.
- OTA-1021's update status subcommand could use these to help admins discover and respond to update-related issues in their updating clusters.
The updates team is not well positioned to maintain oc access long-term; that seems like a better fit for the monitoring team (who maintain Alertmanager) or the workloads team (who maintain the bulk of oc). But we can probably hack together a proof-of-concept which we could later hand off to those teams, and in the meantime it would unblock our work on tech-preview commands consuming the firing-alert information.
The proof-of-concept could work as follows:
- Get alertmanager URL from route alertmanager-main in openshift-monitoring namespace
- Use the $ALERTMANAGER/api/v1/alerts endpoint to get data about alerts (see https://github.com/prometheus/alertmanager#api)
- The endpoint is authenticated via bearer token, the same as against the apiserver (a possible Role for this is mentioned in MON-3396 and OBSDA-530)
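The steps above could be sketched in Go roughly as follows. This is a minimal sketch, not the eventual oc implementation: the names Alert, parseAlerts, and fetchAlerts are hypothetical, as are the ALERTMANAGER_HOST and TOKEN environment variables. It assumes the route host has already been looked up (for example with `oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{.spec.host}'`) and that a bearer token is available (for example from `oc whoami -t`). It also assumes the v1 response envelope of a "status" string plus a "data" array; newer Alertmanager releases serve alerts from /api/v2/alerts instead, so the path and decoding may need adjusting.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// Alert holds the subset of per-alert fields this sketch reads.
// The field set is an assumption based on the Alertmanager alert format.
type Alert struct {
	Labels      map[string]string `json:"labels"`
	Annotations map[string]string `json:"annotations"`
	StartsAt    time.Time         `json:"startsAt"`
	Status      struct {
		State string `json:"state"`
	} `json:"status"`
}

// alertsResponse assumes the v1 envelope: {"status": "...", "data": [...]}.
type alertsResponse struct {
	Status string  `json:"status"`
	Data   []Alert `json:"data"`
}

// parseAlerts decodes a response body and returns the alerts it carries.
func parseAlerts(body []byte) ([]Alert, error) {
	var resp alertsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding alerts: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("unexpected response status %q", resp.Status)
	}
	return resp.Data, nil
}

// fetchAlerts queries https://<host>/api/v1/alerts with a bearer token.
// The client must trust the route's serving certificate.
func fetchAlerts(ctx context.Context, client *http.Client, host, token string) ([]Alert, error) {
	url := "https://" + host + "/api/v1/alerts"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("alertmanager returned %s", resp.Status)
	}
	return parseAlerts(body)
}

func main() {
	// Hypothetical environment variables, used only by this sketch.
	host, token := os.Getenv("ALERTMANAGER_HOST"), os.Getenv("TOKEN")
	if host == "" || token == "" {
		fmt.Println("set ALERTMANAGER_HOST and TOKEN to query a cluster")
		return
	}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	alerts, err := fetchAlerts(ctx, http.DefaultClient, host, token)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, a := range alerts {
		fmt.Printf("%s\t%s\t%s\n", a.Labels["alertname"], a.Labels["severity"], a.Status.State)
	}
}
```

Keeping the decoding in parseAlerts, separate from the HTTP call, is one way to let other subcommands reuse the alert data and to test the parsing without a live cluster.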
Definition of done:
$ OC_ENABLE_CMD_INSPECT_ALERTS=true oc adm inspect-alerts
...dump of firing alerts...
and a backing Go function that other subcommands like oc adm upgrade status can consume internally.
- blocks
-
OCPBUGS-33896 status: show upgrade-related alerts in update health section
- Verified
- is related to
-
OCPBUGS-36406 PrometheusOperatorRejectedResources should link its runbook
- Verified
- relates to
-
MON-3396 add role.rbac.authorization.k8s.io/monitoring-alertmanager-view
- Closed
-
OTA-1021 Adding 'oc adm upgrade status' command (phase-1, undocumented)
- Closed
-
OTA-368 Documentation on how to avoid user errors during update.
- To Do