  1. OpenShift Cloud
  2. OCPCLOUD-2911

Ensure NICs in ProvisioningFailed state trigger a retry

    • Type: Story
    • Resolution: Done
    • Priority: Undefined
    • CLOUD Sprint 271

      This issue tracks implementing a fix for the behavior described in OCPBUGS-31515. That Jira card handles the scenarios for MAPI; the solution for CAPI will require a different approach given the changes made upstream.

      The upstream issue is https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/5515.

      https://learn.microsoft.com/en-us/azure/networking/troubleshoot-failed-state#provisioning-states is helpful in understanding how Azure's API reacts.

      In summary:

      • A customer has observed that a NIC can be attached to a VM and fully functional while the Azure API marks the NIC's metadata as "ProvisioningFailed". This causes cascading problems, because dependent resources will not proceed until the "ProvisioningFailed" status is cleared.
      • MAPI was fixed under OCPBUGS-31515.
      • Cluster API Provider for Azure has changed its code around this handling significantly, opting to use the Azure Service Operator (ASO) for Azure resources. The NIC code resides in https://github.com/kubernetes-sigs/cluster-api-provider-azure/tree/main/azure/services/networkinterfaces
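
The retry behavior summarized above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the CAPZ or ASO implementation: `decideNICAction`, `reconcileNIC`, and the `getState`/`put` callbacks are hypothetical names, the state strings are assumed from the Azure documentation linked above plus the "ProvisioningFailed" label used in this card, and it assumes that re-issuing the NIC create/update is what clears the failed state.

```go
package main

import "strings"

// nicAction is a hypothetical enum describing what a reconciler should do next.
type nicAction int

const (
	actionNone  nicAction = iota // NIC is healthy; nothing to do
	actionWait                   // an operation may be in flight; requeue and check later
	actionRetry                  // NIC reports a failed state; re-issue the create/update
)

// decideNICAction maps a provisioning state string to an action.
// The exact strings are assumptions; unknown states are treated as "wait".
func decideNICAction(provisioningState string) nicAction {
	switch strings.ToLower(provisioningState) {
	case "succeeded":
		return actionNone
	case "failed", "provisioningfailed":
		return actionRetry
	default:
		return actionWait
	}
}

// reconcileNIC re-issues the hypothetical put callback while the NIC reports
// a failed state, at most maxRetries times. It returns the number of retries
// issued, whether the NIC ended up healthy, and any error from put.
func reconcileNIC(getState func() string, put func() error, maxRetries int) (int, bool, error) {
	retries := 0
	for {
		switch decideNICAction(getState()) {
		case actionNone:
			return retries, true, nil
		case actionWait:
			// Leave it to the caller to requeue rather than blocking here.
			return retries, false, nil
		case actionRetry:
			if retries >= maxRetries {
				return retries, false, nil
			}
			if err := put(); err != nil {
				return retries, false, err
			}
			retries++
		}
	}
}
```

Bounding the retries keeps a NIC that never leaves the failed state from looping forever; how the real controller should surface that case is left open here.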

              Assignee: Unassigned
              Reporter: Nolan Brubaker (rh-ee-nbrubake)
