OpenShift Bugs / OCPBUGS-62819

Running OpenShift 4.19.10, in Azure, with IPI and ACM 2.14 on a Hub-A cluster. Azure appends '-internal' to the load balance


    • Quality / Stability / Reliability
    • CLOUD Sprint 278

      Description of problem:

      
      Running OpenShift 4.19.10, in Azure, with IPI and ACM on a Hub-A cluster.
      Start with the following MachinePool YAML:
      ``` yaml
      apiVersion: hive.openshift.io/v1
      kind: MachinePool
      metadata:
        annotations:
          argocd.argoproj.io/sync-wave: '2'
        labels:
          app.kubernetes.io/instance: cluster-azncusnprd
        name: azncusnprd-infra
        namespace: azncusnprd
      spec:
        clusterDeploymentRef:
          name: azncusnprd
        labels:
          node-role.kubernetes.io/infra: ''
        name: infra
        platform:
          azure:
            computeSubnet: snet-openshift-dev-ncus-compute
            networkResourceGroupName: az-openshift-dev-001
            osDisk:
              diskSizeGB: 256
            type: Standard_D4s_v3
            virtualNetwork: vnet-openshift-dev-ncus-001
            zones:
              - '1'
              - '2'
              - '3'
        replicas: 3
        taints:
          - effect: NoSchedule
            key: node-role.kubernetes.io/infra
            value: reserved
      ```
      
      The MachinePool is created on the Hub-A cluster and successfully generates the corresponding MachineSets on the spoke cluster:
      ```yaml
          - lastProbeTime: '2025-09-30T17:54:44Z'
            lastTransitionTime: '2025-09-30T17:54:44Z'
            message: MachineSets generated successfully
            reason: MachineSetGenerationSucceeded
            status: 'True'
            type: MachineSetsGenerated
      ```
      
      Issue 1 appears in the Machine status conditions:
      ```yaml
      status:
        phase: Provisioning
        providerStatus:
          conditions:
            - lastTransitionTime: '2025-09-30T17:57:07Z'
              message: 'failed to create nic azncusnprd-zcwrb-infra-northcentralus1-b8fwr-nic for machine azncusnprd-zcwrb-infra-northcentralus1-b8fwr: unable to create VM network interface: load balancer azncusnprd-zcwrb not found: network.LoadBalancersClient#Get: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="ResourceNotFound" Message="The Resource ''Microsoft.Network/loadBalancers/azncusnprd-zcwrb'' under resource group ''azncusnprd-zcwrb-rg'' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"'
              reason: MachineCreationFailed
              status: 'False'
              type: MachineCreated
      ```
      
      The load balancer in Azure has `-internal` appended to its name, so the Machine controller looks for a name (`azncusnprd-zcwrb`) that does not exist. I do not see an option to pass a custom load balancer name into the MachinePool so that it matches. This may even have to be an object created in the default Azure resource group before cluster creation.
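      
      To narrow down where the mismatched name comes from, it may help to inspect the load balancer fields in the generated MachineSet's provider spec. The fragment below is only a sketch: the report does not include the generated MachineSet, so the value shown is illustrative (copied from the error message), and which of the two fields is actually set here is an assumption.
      ```yaml
      # Hypothetical excerpt of a generated MachineSet on the spoke cluster.
      # The real object was not attached to this report.
      spec:
        template:
          spec:
            providerSpec:
              value:
                # Assumption: one of these fields carries the name that the
                # Machine controller looks up when it creates the NIC.
                publicLoadBalancer: azncusnprd-zcwrb   # illustrative value taken from the error message
                # internalLoadBalancer: <name>         # alternative field; not confirmed from the report
      ```
      If the value there is the bare infrastructure ID while the only load balancer in the resource group carries the `-internal` suffix, that would be consistent with the 404 above.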
      
      If I manually create a load balancer without the `-internal` suffix, the Machine gets further and hits Issue 2:
      ```yaml
        errorMessage: |
          failed to reconcile machine "azncusnprd-zcwrb-infra-northcentralus1-b8fwr": failed to create vm azncusnprd-zcwrb-infra-northcentralus1-b8fwr: failed to create VM: cannot create vm: PUT https://management.azure.com/subscriptions/56a334d6-620c-49b9-a0b7-08fd25ebf304/resourceGroups/azncusnprd-zcwrb-rg/providers/Microsoft.Compute/virtualMachines/azncusnprd-zcwrb-infra-northcentralus1-b8fwr
          --------------------------------------------------------------------------------
          RESPONSE 400: 400 Bad Request
          ERROR CODE: LocationNotSupportAvailabilityZones
          --------------------------------------------------------------------------------
          {
            "error": {
              "code": "LocationNotSupportAvailabilityZones",
              "message": "The resource 'Microsoft.Compute/virtualMachines/azncusnprd-zcwrb-infra-northcentralus1-b8fwr' does not support availability zones at location 'northcentralus'."
            }
          }
          --------------------------------------------------------------------------------
        errorReason: InvalidConfiguration
        lastUpdated: '2025-09-30T18:11:49Z'
        phase: Failed
      ```
      Azure documentation states that the VM does support availability zones in `northcentralus`.
      If I edit one of the MachineSets (`azncusnprd-zcwrb-infra-northcentralus1`) and manually remove `spec.template.spec.providerSpec.value.zone: 1`, the Machine provisions successfully and eventually starts up as a node.
      ```yaml
      status:
        providerStatus:
          conditions:
            - lastTransitionTime: '2025-09-30T18:18:31Z'
              message: machine successfully created
              reason: MachineCreationSucceeded
              status: 'True'
              type: MachineCreated
      ```
      
      If I delete the MachinePool, remove the `zones` section from the YAML, and recreate the MachinePool, then OpenShift on Hub-A, which creates the MachinePool, errors out because it requires zones:
      ```yaml
      status:
        conditions:
          - lastProbeTime: '2025-09-30T18:34:09Z'
            lastTransitionTime: '2025-09-30T18:34:09Z'
            message: 'could not generate machinesets: zero zones returned for region northcentralus'
            reason: MachineSetGenerationFailed
            status: 'False'
            type: MachineSetsGenerated
      ```
      1. Unsure about the load balancer naming issue.
      2. The OpenShift MachinePool requires a zone for `northcentralus`, while Azure requires that no zone be set for `northcentralus`.
      
      
      Describe the impact to you or the business
      This is testing for future use cases.
      
          

      Version-Release number of selected component (if applicable):

      
          

      How reproducible:

      
          

      Steps to Reproduce:

      See above 
          

      Actual results:

      
      Requires manual intervention to fix the install issues
      
          

      Expected results:

      
      No manual intervention
      
          

      Additional info:

      
          

              rhn-gps-mbooth Matthew Booth
              rhn-support-brstone Brian Stone
              Brian Stone
              None
              Zhaohua Sun Zhaohua Sun
              None