  1. OpenShift Bugs
  2. OCPBUGS-41367

MachineWithNoRunningPhase should include the phase in its rendered message


    • Bug
    • Resolution: Unresolved
    • 4.18
    • Moderate

      Description of problem

      Since machine-api-operator#986 introduced (name, namespace) aggregation, $labels.phase substitution in the summary has stopped working, and admins just see a message like:

      machine build09-... is in phase:
      

      when we want them to see:

      machine build09-... is in phase: Provisioning
      

      or similar.
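
The effect described above can be reproduced outside Prometheus with a small simulation. This is only a sketch: sum_by and render_summary are hypothetical helpers that mimic a `sum by (name, namespace)` aggregation and the summary template, the machine name and namespace are made-up sample values, and it assumes a missing label renders as an empty string, which matches the message shown in this report.

```python
from collections import defaultdict


def sum_by(samples, keep):
    """Mimic PromQL `sum by (<keep>)`: group on the kept labels, drop all others."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in keep)
        totals[key] += value
    return [(dict(key), total) for key, total in totals.items()]


def render_summary(labels):
    """Mimic the summary template; a label that is absent renders as ''."""
    return "machine {} is in phase: {}".format(
        labels.get("name", ""), labels.get("phase", "")
    )


# Hypothetical sample: one machine stuck in the Provisioning phase.
samples = [
    (
        {
            "name": "build09-example",
            "namespace": "openshift-machine-api",
            "phase": "Provisioning",
        },
        1.0,
    )
]

aggregated = sum_by(samples, keep=("name", "namespace"))
summary = render_summary(aggregated[0][0])
print(repr(summary))  # 'machine build09-example is in phase: '
```

The phase label exists on the input series but not on the aggregated result, so the template has nothing to substitute.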

      Version-Release number of selected component

      I haven't tracked machine-api-operator#986 down to a specific 4.y release, but it's a 2022 pull request, so the bug likely affects several releases. I'm not sure whether a fix is worth backporting.

      How reproducible

      Every time.

      Steps to Reproduce

      1. Break machine provisioning, for example by creating a Machine or MachineSet with an invalid instance type.
      2. See the MachineWithNoRunningPhase alert pending or firing.
      3. Check the summary on that alert.

      Actual results

      machine build09-... is in phase:
      

      without a phase listed.

      Expected results

      machine ... is in phase: Provisioning
      

      or similar, mentioning the actual phase.

      Additional info

      You can probably just include phase in the list of labels that you preserve in the sum by ... aggregation.
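
A minimal sketch of why adding phase to the grouping list would help. kept_labels is a hypothetical helper that models which labels survive a PromQL-style `sum by (...)`; the series values are made up for illustration, and this is not the operator's actual rule.

```python
def kept_labels(labels, by):
    """Return the labels that survive a PromQL-style `sum by (...)` over `by`."""
    return {name: labels[name] for name in by if name in labels}


# Hypothetical input series for a machine that never reached Running.
series = {
    "name": "build09-example",
    "namespace": "openshift-machine-api",
    "phase": "Provisioning",
}

before = kept_labels(series, by=("name", "namespace"))
after = kept_labels(series, by=("name", "namespace", "phase"))

# With phase preserved, the summary template has a value to substitute.
summary = "machine {} is in phase: {}".format(after["name"], after["phase"])
print(summary)  # machine build09-example is in phase: Provisioning
```

Because a machine reports one phase at a time, grouping by phase as well should not multiply the number of alerts per machine, though that is worth confirming against the real metric.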

      You may also want to add $labels.namespace to the summary or description to help admins locate the troubled machine. You can also use template "with" subqueries, as the CVO does, to give cluster admins oc and web-console hints about how to dig in.
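
As an illustration of the kind of hint a description could carry, the sketch below builds a message that names the namespace and suggests an oc command. The describe function and the message wording are hypothetical, not the CVO's or the operator's text, and the label values are sample data.

```python
import shlex


def describe(labels):
    """Build a description that names the namespace and suggests an oc command."""
    name = labels["name"]
    namespace = labels["namespace"]
    # shlex.quote keeps the suggested command safe to paste into a shell.
    command = "oc -n {} get machine {} -o yaml".format(
        shlex.quote(namespace), shlex.quote(name)
    )
    return "machine {} in namespace {} is in phase: {}. To inspect it, run: {}".format(
        name, namespace, labels["phase"], command
    )


# Hypothetical labels as they would appear on the firing alert.
labels = {
    "name": "build09-example",
    "namespace": "openshift-machine-api",
    "phase": "Provisioning",
}

description = describe(labels)
print(description)
```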

      You may also want to add a runbook_url pointing at your alert docs (which you could also move to the runbook repo) if you want to make them more accessible.

      No need to boil the ocean here; I'm just floating some additional polish in case it's easier to handle more things in one pull request. My personal definition of done for this ticket is limited to populating the phase in the summary.

              raryan@redhat.com Rachel Ryan
              trking W. Trevor King
              Huali Liu Huali Liu