OpenShift Cloud / OCPCLOUD-1659

Update MachineWithNoRunningPhase and MachineWithoutValidNode alert descriptions to include diagnostic commands


    • Type: Story
    • Resolution: Unresolved
    • Priority: Undefined

      User Story

      As a user, I would like to be able to act on the alerts presented to me so that I can better diagnose their root cause.

      Background

      Bug 2104511 describes a situation in which users may be unable to create new machines because the provider spec uses AMI filters, rather than an AMI ID, to find the instance image. As of 4.10, the webhooks for AWS machines properly reject a provider spec that uses the filter method.

      However, there are scenarios in which a machine enters a failed state (potentially related to cluster upgrades), and the current descriptions of the related alerts give the user no guidance on how to check the conditions on the failed machines.

      This could be improved by adding a suggested "oc" command and a link to the machines console to the alert description.

      See the attached file example-alert.yaml for an example of the format.
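
      As a rough illustration of the idea, an updated alert description might look something like the fragment below. This is a hypothetical sketch, not the contents of the attached example-alert.yaml: the wording is illustrative, and it assumes the alert carries "name" and "namespace" labels that can be templated into the text. The rule's expression, duration, and severity are left out on purpose and would stay as they are today.

```yaml
# Hypothetical sketch of one alerting rule's annotations.
# expr, for, and labels are unchanged and omitted here.
- alert: MachineWithNoRunningPhase
  annotations:
    description: |
      Machine {{ $labels.name }} is not in the Running phase.
      (Assumes the alert exposes "name" and "namespace" labels.)
      To check the conditions reported on the machine, run:
        oc get machine {{ $labels.name }} -n {{ $labels.namespace }} -o yaml
      or look up the machine under Compute > Machines in the web console.
```

      The same pattern would apply to MachineWithoutValidNode, with the first line of the description adjusted to say that the machine has no valid node reference.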

      For background on this issue, see this thread: https://coreos.slack.com/archives/CBZHF4DHC/p1660837393467059

      Steps

      • update alert description text

      Stakeholders

      • cloud infra team

      Definition of Done

      • a user can view the MachineWithoutValidNode and MachineWithNoRunningPhase alerts and find the console link or oc command to run for more information
      • Docs
          • this is not expected to require a docs update
      • Testing
          • we just need to confirm that the alert descriptions appear properly

            Assignee: Unassigned
            Reporter: Michael McCune (mimccune@redhat.com)
            Votes: 0
            Watchers: 2
