OpenShift Bugs / OCPBUGS-73789

Machine creation is not fully covered during periods of inconsistent response from AWS

    • Rejected
    • CLOUD Sprint 282
    • 1
    • Done
    • Bug Fix
      Before this update, AWS APIs returned inconsistent results regarding the existence of a Machine. The safeguards designed to handle this inconsistency checked the stored instance ID in the incorrect location. Consequently, during AWS API instability, virtual machines (VMs) leaked and attempted to join the cluster indefinitely. With this release, the system uses the correct provider ID for consistency checks. If an instance does not appear within 20 seconds, the machine status changes to Failed to prevent instance leaks. (link:https://issues.redhat.com/browse/OCPBUGS-73789[OCPBUGS-73789])

      This is a clone of issue OCPBUGS-73729. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-72570. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-72523. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-72390. The following is the description of the original issue:

      Description of problem:

          We have checks in place for inconsistent responses from AWS. They rely on the instance ID having been stored successfully when we create the Machine. Originally this logic always stored the ID in the spec, but it was later moved to the status. Some areas were not updated, so if we created a Machine and AWS was then inconsistent about whether it existed, we could end up creating additional VMs and leaking instances.
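          One way to picture the failure mode is a guard that reads the ID from the old location. The Go sketch below is illustrative only: the `Machine`, `MachineSpec`, and `MachineStatus` types and both guard functions are hypothetical simplifications, not the real Machine API types or the provider's code.

```go
package main

import "fmt"

// Hypothetical, simplified shapes used only for this illustration.
type MachineSpec struct {
	InstanceID string // old storage location, per the description above
}

type MachineStatus struct {
	InstanceID string // current storage location, per the description above
}

type Machine struct {
	Name   string
	Spec   MachineSpec
	Status MachineStatus
}

// staleGuard models a code path that was not updated: it still looks in the
// spec, so it never sees an ID that was written to the status.
func staleGuard(m Machine, visibleInAWS bool) (create bool) {
	return m.Spec.InstanceID == "" && !visibleInAWS
}

// fixedGuard models the intended check: it reads the ID from the status, so
// a Machine that already has an instance is never created a second time.
func fixedGuard(m Machine, visibleInAWS bool) (create bool) {
	return m.Status.InstanceID == "" && !visibleInAWS
}

func main() {
	// An instance was created and its ID recorded in the status, but AWS
	// does not report the instance on this particular call.
	m := Machine{Name: "worker-a", Status: MachineStatus{InstanceID: "i-0abc"}}

	fmt.Println("stale guard creates again:", staleGuard(m, false))
	fmt.Println("fixed guard creates again:", fixedGuard(m, false))
}
```

          In this sketch the stale guard asks for a second create call whenever AWS briefly fails to report the instance, which is the leak described above.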

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          Hard to reproduce; it relies on AWS returning inconsistent responses.

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

              joelspeed Joel Speed
              joelspeed Joel Speed
              Huali Liu Huali Liu