Machine Config Operator / MCO-687

TestMetrics e2e not cleaning up correctly after itself


    • Type: Story
    • Resolution: Done
    • Priority: Undefined
    • Sprint: MCO Sprint 239

      The TestMetrics e2e test does not correctly clean up the MachineConfigs and MachineConfigPools it creates. As a result, e2e tests that run after it can fail spuriously or become flaky.

      What's happening is this:

      1. The target node is removed from the ephemeral MachineConfigPool by unlabeling it.
      2. A race condition occurs when we call WaitForPoolComplete: at that point the pool technically reports as updated, because it has not yet picked up the unlabeling event from the target node, so the wait returns before the node has actually moved back to the worker pool.
      3. We delete the ephemeral MachineConfigPool, which deletes the rendered MachineConfigs that belong to it.
      4. The node then starts updating but cannot find the ephemeral pool's rendered MachineConfigs because they were deleted. The MCD (Machine Config Daemon) degrades at this point and blocks the worker MachineConfigPool.
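
      The race in step 2 can be reproduced with a toy, in-memory model. Everything in it is hypothetical: the `node` and `poolView` types, the pool names (`infra` for the ephemeral pool, `worker`) and the rendered-config names are illustrative rather than MCO API types, and `looksUpdated` only approximates a status-based wait such as WaitForPoolComplete; it is not its implementation. The sketch shows that, right after unlabeling and before either pool's status is recomputed, both pools look updated even though the target node is still running the ephemeral pool's rendered config.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node; the field
// names are illustrative and do not mirror the real MCO API types.
type node struct {
	Name          string
	Labels        map[string]bool // pool-selecting role labels on the node
	CurrentConfig string          // rendered config the node is running
	DesiredConfig string          // rendered config the node is asked to run
}

// poolView is a pool's status as last computed by its controller. Its member
// list only changes once the controller has processed a node label event.
type poolView struct {
	Name            string
	RenderedConfig  string
	ObservedMembers []*node
}

// looksUpdated mimics a wait condition based only on pool status: every node
// the pool currently knows about is on the pool's rendered config.
func (p poolView) looksUpdated() bool {
	for _, n := range p.ObservedMembers {
		if n.CurrentConfig != p.RenderedConfig || n.DesiredConfig != p.RenderedConfig {
			return false
		}
	}
	return true
}

// simulateRace reproduces step 2: right after the node is unlabeled, before
// either pool's status is recomputed, both pools still report updated even
// though the target node has not moved to the worker config.
func simulateRace() (workerLooksDone, ephemeralLooksDone, nodeSwitched bool) {
	target := &node{
		Name:          "worker-0",
		Labels:        map[string]bool{"infra": true},
		CurrentConfig: "rendered-infra-abc",
		DesiredConfig: "rendered-infra-abc",
	}
	other := &node{
		Name:          "worker-1",
		CurrentConfig: "rendered-worker-123",
		DesiredConfig: "rendered-worker-123",
	}
	worker := poolView{Name: "worker", RenderedConfig: "rendered-worker-123", ObservedMembers: []*node{other}}
	ephemeral := poolView{Name: "infra", RenderedConfig: "rendered-infra-abc", ObservedMembers: []*node{target}}

	// Step 1: unlabel the node. Neither poolView has been recomputed yet.
	delete(target.Labels, "infra")

	return worker.looksUpdated(), ephemeral.looksUpdated(), target.CurrentConfig == worker.RenderedConfig
}

func main() {
	w, e, s := simulateRace()
	fmt.Println(w, e, s) // true true false
}
```

      Whichever pool is checked, a status-only answer says "done" while the node has not moved, which is exactly the window in which step 3 deletes the rendered MachineConfigs the node still needs.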


      The cleanup flow should look like this:

      1. The target node is removed from the ephemeral MachineConfigPool by unlabeling it.
      2. Wait until the target node completes the switch back to the worker pool.
      3. Delete the ephemeral MachineConfigPool that was created for the test.
      4. Delete any MachineConfigs assigned to that ephemeral MachineConfigPool.


            zzlotnik@redhat.com Zack Zlotnik