  1. OpenShift Bugs
  2. OCPBUGS-72523

Machine creation is not fully covered during periods of inconsistent response from AWS


    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • 4.17.z, 4.16.z, 4.18.z, 4.19.z, 4.20.z, 4.21.z, 4.22
    • None
    • None
    • False
    • None
    • None
    • None
    • None
    • None
    • Rejected
    • CLOUD Sprint 282
    • 1
    • In Progress
    • Bug Fix
    • Cause: AWS APIs do not always return consistent results about whether the Machine truly exists. We have guards in place to prevent issues when results are inconsistent, but these checks were looking in the wrong place for the stored instance ID.
      Consequence: During periods of AWS API instability, VMs could be leaked and would sit in the background trying to join the cluster.
      Fix: Ensure we rely on the correct provider ID for eventual consistency checks.
      Result: We should no longer leak instances. Any Machine for which we believe we have created an instance, but have not seen that instance within 20s, will be moved to failed.
    • None
    • None
    • None
    • None

      This is a clone of issue OCPBUGS-72390. The following is the description of the original issue:

      Description of problem:

          We have checks in place for inconsistent responses from AWS that rely on us having successfully stored the instance ID when we create the Machine. Originally this logic always stored the ID in the spec, but it was later moved to the status. Some areas were not updated, so if we created a Machine and AWS was then inconsistent about whether the Machine existed, we could end up creating additional VMs and leaking instances.
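
          The decision described above can be sketched roughly as follows. This is a minimal illustration in Python, not the provider's actual code: the `Machine` fields, `stored_instance_id`, `next_action`, and the action names are all hypothetical, and the 20-second window is taken from the release note text. It assumes the instance ID is read from status, and that a recorded ID that AWS does not report yet is retried until the window expires.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Assumption: the 20s figure comes from the release note; the real constant may differ.
EVENTUAL_CONSISTENCY_WINDOW = timedelta(seconds=20)


@dataclass
class Machine:
    """Hypothetical, simplified stand-in for a Machine object."""
    name: str
    spec_instance_id: Optional[str] = None      # legacy location of the ID
    status_instance_id: Optional[str] = None    # location the ID was moved to
    instance_recorded_at: Optional[datetime] = None


def stored_instance_id(machine: Machine) -> Optional[str]:
    """Read the instance ID from status, where it is stored after creation."""
    return machine.status_instance_id


def next_action(
    machine: Machine,
    visible_in_aws: bool,
    now: datetime,
    read_id: Callable[[Machine], Optional[str]] = stored_instance_id,
) -> str:
    """Return one of "create", "exists", "requeue", or "fail"."""
    if read_id(machine) is None:
        # No recorded instance: creating one is safe.
        return "create"
    if visible_in_aws:
        return "exists"
    recorded = machine.instance_recorded_at
    if recorded is not None and now - recorded < EVENTUAL_CONSISTENCY_WINDOW:
        # AWS may simply not have caught up yet; check again later.
        return "requeue"
    # An ID was recorded but the instance never appeared: fail, do not recreate.
    return "fail"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    machine = Machine(
        name="worker-a",
        status_instance_id="i-0example",
        instance_recorded_at=now - timedelta(seconds=5),
    )
    print(next_action(machine, visible_in_aws=False, now=now))
    # Reading the legacy spec field instead misses the ID and triggers a re-create.
    print(next_action(machine, False, now, read_id=lambda m: m.spec_instance_id))
```

          Passing a `read_id` that looks at the legacy spec field reproduces the failure mode in this issue: the recorded ID is missed, so the sketch answers "create" even though an instance may already exist.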

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Hard to reproduce; it relies on AWS returning inconsistent responses.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              ddonati@redhat.com Damiano Donati
              joelspeed Joel Speed
              None
              None
              Huali Liu Huali Liu
              None