Task
Resolution: Obsolete
Major
1.23.0
RC1
Origin: tests/alerts/c05b-verify-alerts-are-exposed-and-no-critical-alerts-have-fired.md
Prerequisites
- access to `cloud-services-qe-reporting@redhat.com` email list
- the workload webapp should be running on the cluster and should have been deployed shortly after the cluster was provisioned
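A quick way to double-check that the workload webapp is running (the workload-web-app namespace name below is an assumption; adjust it to match how the webapp was deployed on your cluster):
oc get pods -n workload-web-app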
Description
Note: double-check that the workload webapp is not already deployed before attempting to deploy it.
Note: some alerts might fire due to automated tests and destructive test cases; these can be safely ignored.
Verify that the RHOAM alerts are exposed via Telemetry to cloud.redhat.com and that no critical RHOAM alerts have fired during the lifespan of the cluster.
This should be one of the last test cases performed on a cluster, to allow for maximum burn-in time on the cluster.
This test case should not be performed on a cluster that has been used for destructive testing.
Steps
- Log in via `oc` as kubeadmin
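For example (the API URL and kubeadmin password below are placeholders for your cluster's values):
oc login -u kubeadmin -p <kubeadmin-password> https://api.<cluster-domain>:6443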
- Confirm the e-mail address where the alert notifications are sent; it should be `cloud-services-qe-reporting@redhat.com`.
oc get rhmi rhoam -n redhat-rhoam-operator -o json | jq -r .spec.alertingEmailAddress
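The command should print cloud-services-qe-reporting@redhat.com.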
- Check the inbox of that e-mail address for any alert notifications that are not related to testing. This can be achieved by subscribing to cloud-services-qe-reporting@redhat.com here: https://post-office.corp.redhat.com/mailman/listinfo/cloud-services-qe-reporting or, alternatively, you can view the archives without subscribing here: http://post-office.corp.redhat.com/archives/cloud-services-qe-reporting/
- Check that there are no currently firing alerts.
- Get the Prometheus route
oc get routes -n redhat-rhoam-observability | grep prometheus
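To print the full Prometheus URL in one step (the route name prometheus-route below is an assumption; use whatever route name the command above returns):
echo "https://$(oc get route prometheus-route -n redhat-rhoam-observability -o=jsonpath='{.spec.host}')"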
- Log in to Prometheus using the kubeadmin credentials
- Allow the requested permissions
- In the Alerts tab of the Prometheus UI, confirm that DeadMansSwitch is the only alert firing
The only RHOAM alert firing here should be DeadMansSwitch.
Note: there may be other OpenShift alerts firing; however, for the purposes of this test, it only fails if RHOAM alerts are firing here.
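As an alternative to the UI, a rough CLI check of the currently firing alert names (assuming the Prometheus host from the earlier step and that the route's OAuth proxy accepts your OpenShift token):
curl -sk -H "Authorization: Bearer $(oc whoami -t)" "https://<prometheus-host>/api/v1/alerts" | jq -r '.data.alerts[] | select(.state=="firing") | .labels.alertname'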
- Check that no unexpected alert email notifications were received. Check this when the cluster is more than a few hours old, again when it is at least 1 day old, and again before the cluster is deprovisioned.
If any critical alerts fired during any of these periods:
- Take screenshots showing the time the alerts fired and when they were resolved
- Create a follow-up bug in JIRA and inform the release coordinators. Example JIRA: https://issues.redhat.com/browse/INTLY-9443
- Request that the cluster lifespan be extended to allow time for the cluster to be investigated (ask a release coordinator).
- Open the RHOAM Grafana Console in the redhat-rhoam-observability namespace
echo "https://$(oc get route grafana-route -n redhat-rhoam-observability -o=jsonpath='{.spec.host}')"
- Select the Workload App dashboard
Verify that 3scale and SSO are working by checking the Status graph.
Make sure the proper time interval is selected (you can ignore downtime during automated tests and destructive test cases).
A short initial 3scale downtime is expected; it is a known issue.
The downtime measurement might not be 100% reliable, see MGDAPI-2333.
General guidelines for testing
blocks: MGDAPI-4210 K04 - [DESTRUCTIVE] - performance - Run performance test against RHOAM (3scale + user SSO)