
WebRCA: Update status.redhat.com on incidents


      As a user of web-rca, I want the Red Hat status pages (status.redhat.com, status.quay.io, or any other status page we maintain) to be updated when an incident affects my product.
      The status page content is curated and visible to customers, so a mapping is required from internal component names to the components shown on the customer-facing status page. We also might not want to expose all internal applications/services.
      It is not yet clear whether the status page should be updated automatically at a certain severity, or whether a manual update of the status page via web-rca is sufficient.
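The mapping and the open severity question could be sketched roughly as follows. This is only an illustration: the component names, the placeholder IDs, the severity scale (1 = most severe), and the threshold are all assumptions, not anything web-rca defines today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusPageTarget:
    page: str          # public status page, e.g. "status.redhat.com"
    component_id: str  # placeholder, not a real statuspage component ID


# Hypothetical mapping. Internal services missing from it are never exposed.
COMPONENT_MAP = {
    "internal-quay-registry": StatusPageTarget("status.quay.io", "component-a"),
    "internal-console-ui": StatusPageTarget("status.redhat.com", "component-b"),
}

# Assumed policy: only severity 1 and 2 incidents update the page automatically.
AUTO_UPDATE_MAX_SEVERITY = 2


def resolve_target(internal_name: str) -> Optional[StatusPageTarget]:
    """Return the public component for an internal name, or None if unmapped."""
    return COMPONENT_MAP.get(internal_name)


def should_auto_update(internal_name: str, severity: int) -> bool:
    """Auto-update only mapped components at or above the severity threshold."""
    mapped = resolve_target(internal_name) is not None
    return mapped and severity <= AUTO_UPDATE_MAX_SEVERITY
```

If the manual-only option is chosen instead, `should_auto_update` would simply go away and `resolve_target` would still decide which public component a manual update touches.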

            [RHCLOUD-36737] WebRCA: Update status.redhat.com on incidents

            Patrick Martin added a comment - - edited

            FYI, app-interface also manages statuspage component status (and maintenance schedules):
            https://service.pages.redhat.com/dev-guidelines/docs/appsre/advanced/statuspage/#status-management

            Please keep AppSRE in the loop here so we can work out:

            • how to share the link between an app and its statuspage component (we already have it in app-interface)
            • how to avoid two different systems fighting over the same statuspage component status. Maybe add a provider:webrca option in app-interface?

            FYI this feature was also requested following an incident where people did not know how to update statuspage manually, or who could or should do it.

            https://redhat-internal.slack.com/archives/C07FQ00BJV8/p1740753079646379
            https://redhat-internal.slack.com/archives/C08FRHJF7BK/p1740695669219129
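One way to read the provider:webrca suggestion is as a single-owner guard: each statuspage component declares which system may change its status, and every other system leaves it alone. A minimal sketch, assuming a hypothetical `provider` field that defaults to app-interface (no such field is confirmed to exist):

```python
DEFAULT_PROVIDER = "app-interface"  # assumed default owner


def may_update(component: dict, caller: str) -> bool:
    """Return True only if `caller` is the declared owner of the component status."""
    return component.get("provider", DEFAULT_PROVIDER) == caller


# Hypothetical component records, shaped only for this example.
managed_by_webrca = {"name": "example-component", "provider": "webrca"}
managed_by_default = {"name": "other-component"}
```

With a guard like this, web-rca would skip `managed_by_default` and app-interface would skip `managed_by_webrca`, so the two systems never write to the same component.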

            Adam Drew added a comment -

            Removed Stephen because he is no longer with the team. Confirmed with Ben that we do want to do this so putting back in New status.


            Stephen Reaves (Inactive) added a comment -

            This is more or less blocked by https://jira.atlassian.com/browse/STATUS-871. There is a workaround, which is simply to comment out certain lines, but that technically gives us an unofficial client...

            Stephen Reaves (Inactive) added a comment -

            Another interesting tidbit is that status-board already supports maintenance windows, so we may still want to use the statuspage client to post maintenance windows (which is the only workflow that uses app-interface MRs). However, there's currently no automation to create maintenance windows AFAIK, so we'd be trading the app-interface MR for manually running 'ocm post' commands in the CLI.

            Stephen Reaves (Inactive) added a comment -

            It does seem that you can target VPN-internal services by specifying internal nodes or an internal node group. I'd need to spend a little more time researching this, but I was able to find other internal services that are tested via Catchpoint.

            Stephen Reaves (Inactive) added a comment -

            So it seems that the majority of statuspage updates actually come from Catchpoint. Catchpoint automatically runs tests on services and sends an "Alert" if those tests fail. If we were to send a message directly to statuspage, Catchpoint would overwrite it, so we may actually need to integrate with Catchpoint. The unfortunate side of this is that the Catchpoint API doesn't seem to support sending alerts, only triggering a retest or viewing existing alerts. I'm thinking this might require rethinking this ticket.

            It might make more sense to have Catchpoint pull from web-rca/status as an additional test. The tests seem to be generic JavaScript snippets, so we theoretically could write custom functions that make GET requests against web-rca (and status-board). There are Catchpoint tests that do auth-related stuff, so I'm not super worried about that. I haven't found any Catchpoint tests dealing with VPN-related stuff, but I only just got access to Catchpoint and don't quite have everything figured out yet.

            CC: gburges@redhat.com rhn-support-bturner
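The pull-based idea could look roughly like the check below. The comment describes Catchpoint tests as JavaScript snippets; this sketch uses Python purely to illustrate the pass/fail logic. The response shape (a JSON object with an `open_incidents` list) is an assumption, since the web-rca status payload is not described in this ticket, and the fetcher is injected so that auth and VPN routing stay out of the example.

```python
import json
from typing import Callable


def check_passes(body: str) -> bool:
    """Pass when the (assumed) payload lists no open incidents."""
    payload = json.loads(body)
    return len(payload.get("open_incidents", [])) == 0


def run_check(fetch: Callable[[], str]) -> bool:
    """Run one check; network errors or malformed responses count as failures."""
    try:
        return check_passes(fetch())
    except (ValueError, OSError, AttributeError):
        return False
```

Treating an unreachable or malformed endpoint as a failure is a design choice. It would raise an alert whenever web-rca itself is down, which may or may not be what we want on a customer-facing page.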

            GitLab CEE Bot added a comment -

            Stephen Reaves mentioned this issue in merge request !577 of Service Delivery / Status Board on branch statuspage:

            Draft: Initial statuspage Integration

            Stephen Reaves (Inactive) added a comment -

            The majority of products should be able to use the statuspage API (https://developer.statuspage.io/) to update status.redhat.com, but products/applications that are managed clusters will need to use managed-notifications (https://github.com/openshift/managed-notifications). Luckily, managed-notifications seems to work through POST requests sent via osdctl, which seems to be a wrapper for the ocm-go-sdk, which web-rca already uses. The solution to this ticket needs to take into account that different products send their statuses to different destinations.
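The per-product routing described above might be sketched as follows. The `managed_cluster` flag and the destination names are hypothetical. The request builder follows my understanding of the public statuspage API (a PATCH on a page's component, with an `OAuth` authorization header) and should be checked against https://developer.statuspage.io/ before use. It only builds the request and never sends it.

```python
import json
import urllib.request

STATUSPAGE_API = "https://api.statuspage.io/v1"


def choose_destination(product: dict) -> str:
    """Route managed clusters to managed-notifications, everything else to statuspage."""
    if product.get("managed_cluster"):  # hypothetical flag
        return "managed-notifications"
    return "statuspage"


def build_component_update(page_id: str, component_id: str,
                           status: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a component status update request."""
    body = json.dumps({"component": {"status": status}}).encode("utf-8")
    return urllib.request.Request(
        f"{STATUSPAGE_API}/pages/{page_id}/components/{component_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
    )
```

A caller would pass the returned request to `urllib.request.urlopen` only when `choose_destination` returns "statuspage". The managed-notifications path would go through the OCM client instead and is not sketched here.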

              Assignee: Unassigned
              Reporter: Jan-Hendrik Boll (jboll@redhat.com)
              Votes: 0
              Watchers: 8