  1. Machine Config Operator
  2. MCO-81

MCD: emit earlier events to warn about failing drains


    • Type: Story
    • Resolution: Done
    • MCO Sprint 263 (DevEx), MCO Sprint 264

      In newer versions of OCP, we changed our draining mechanism so that a drain is only reported as failed after 1 hour. As a result, the event that captures the failing drain is also emitted only at the 1-hour mark.

       

      Today, upgrade tests often fail with timeouts related to drain errors (PDB or other). There is no good way to tell which pods are failing to drain and for what reason, so we cannot easily aggregate this data in CI to tackle PDB-related issues and improve upgrade and CI pass rates.

       

      If the MCD, upon a failed drain run, emitted the failing pod and the reason (PDB, timeout) as an event, it would be easier to write a test to aggregate this data.

       

      Context in this thread: https://coreos.slack.com/archives/C01CQA76KMX/p1633635861184300 

              djoshy David Joshy
              jerzhang@redhat.com Yu Qi Zhang
