  1. OpenShift Cloud
  2. OCPCLOUD-1660

Improve error conditions for MachineSet failing to create new Machines


    • Type: Story
    • Resolution: Won't Do

      User Story

      As a user, I would like to be able to read error conditions on a MachineSet when it is failing to create new Machines, so that I can properly diagnose the failure.

      Background

      While investigating Bug 2104511, we found that it is possible to have a MachineSet with a providerSpec that will not pass webhook validation. When the replicas are increased on the MachineSet (using the scale subresource), the creation of a new Machine is attempted and rejected by the webhook, but no condition is ever surfaced for the user to inspect. The rejection can be determined by inspecting the events for the webhooks in the openshift-machine-api namespace. It would be convenient for users to see this information in the MachineSet conditions as well.
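As a rough illustration of the requested behavior, the sketch below models a MachineSet status whose conditions are updated when a Machine create attempt is rejected. It is written in Python for brevity and is not the Machine API implementation; the condition type `MachinesCreated`, the reasons `WebhookRejected` and `MachineCreationSucceeded`, and the helpers `set_condition` and `record_create_result` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Condition:
    type: str
    status: str  # "True", "False", or "Unknown"
    reason: str = ""
    message: str = ""


@dataclass
class MachineSetStatus:
    conditions: List[Condition] = field(default_factory=list)


def set_condition(status: MachineSetStatus, new: Condition) -> None:
    """Insert the condition, or replace an existing one of the same type."""
    for i, existing in enumerate(status.conditions):
        if existing.type == new.type:
            status.conditions[i] = new
            return
    status.conditions.append(new)


def record_create_result(status: MachineSetStatus, webhook_error: Optional[str]) -> None:
    """Reflect the outcome of a Machine create attempt on the MachineSet.

    `webhook_error` is the rejection message returned by the validating
    webhook, or None if the Machine was accepted. (Hypothetical helper.)
    """
    if webhook_error is None:
        set_condition(
            status,
            Condition("MachinesCreated", "True", "MachineCreationSucceeded"),
        )
    else:
        set_condition(
            status,
            Condition("MachinesCreated", "False", "WebhookRejected", webhook_error),
        )
```

With something along these lines, a user inspecting the MachineSet would see the webhook's rejection message in the condition, rather than having to search the events in the openshift-machine-api namespace.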

      This might require some investigation into whether we can add a condition to the MachineSet during a webhook operation.

      We should also investigate whether exporting the conditions from the MachineSet and creating alerts based on those conditions would be an improvement for users.
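To make the idea of exporting conditions concrete, here is a minimal sketch that renders MachineSet conditions as sample lines in the Prometheus text exposition format, which an alerting rule could then match on. The metric name `machineset_status_condition`, its label names, and the function `condition_metric_lines` are assumptions made for this example; they are not existing Machine API metrics.

```python
from typing import Dict, List


def condition_metric_lines(machineset: str, conditions: List[Dict[str, str]]) -> List[str]:
    """Render one Prometheus sample line per MachineSet condition.

    Each condition is a dict with "type" and "status" keys. The metric
    name and labels are hypothetical, chosen only for illustration.
    """
    lines = []
    for cond in conditions:
        labels = f'name="{machineset}",type="{cond["type"]}",status="{cond["status"]}"'
        # Doubled braces produce the literal { } around the label set.
        lines.append(f"machineset_status_condition{{{labels}}} 1")
    return lines
```

An alert could then fire when a sample with `status="False"` for an error-related condition type persists for some period; the appropriate condition types and thresholds would be part of the investigation.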

      For background on this issue, please read this thread: https://coreos.slack.com/archives/CBZHF4DHC/p1660837393467059

      Steps

      • investigate adding conditions to MachineSet from Machine webhook
      • if possible/reasonable, add conditions to MachineSet when a validating webhook rejection occurs
      • if reasonable, export conditions and create alerts for MachineSets based on error conditions

      Stakeholders

      • cloud infra team

      Definition of Done

      • user can observe Machine validation webhook failures on the MachineSet
      • Docs
          • we might need to update the product docs; need to double-check whether we already have any guidance here
      • Testing
          • should add unit tests, at the least, to ensure this behavior works

              Assignee: Unassigned
              Reporter: Michael McCune (mimccune@redhat.com)