
[4.14] Load balancers are not created in ARO


Details

      Previously, after installing an Azure Red Hat OpenShift cluster, some cluster Operators were unavailable. This was the result of one of the cluster’s load balancers not being created as part of the installation process. With this update, the load balancer is correctly created. After installing a cluster, all cluster Operators are available. (link:https://issues.redhat.com/browse/OCPBUGS-24191[*OCPBUGS-24191*])
    • Bug Fix
    • Done

    Description

      After creating a 4.14 ARO cluster, some cluster operators are unavailable because the load balancer for the default ingress router cannot be created.

      The cause is a change to the default value of vmType in cloud-provider-azure:

      https://github.com/kubernetes-sigs/cloud-provider-azure/pull/4214

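      In ARO, we use the standard vmType and do not use any VMSS as cluster nodes. The installer does not specify vmType in the cloud provider config, so the value is left empty. This results in a vmType mismatch, and cloud-provider-azure cannot configure the load balancer. The config is generated here: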

      https://github.com/openshift/installer/blob/release-4.14/pkg/asset/manifests/azure/cloudproviderconfig.go

      We would like the installer either to default vmType to `standard` or to provide an option to set it, for example through the install config.

      Discussion thread: https://redhat-internal.slack.com/archives/C68TNFWA2/p1700814868246649

       

      Steps to reproduce:

      Create a 4.14 ARO cluster.
      Creating a regular OpenShift cluster on Azure with standard VMs might also reproduce the issue.
      

      What I got:

      ❯ oc get co
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.14.1    False       True          True       21m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.atokubi.eastus.osadev.cloud/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)...
      cloud-controller-manager                   4.14.1    True        False         False      24m
      cloud-credential                           4.14.1    True        False         False      26m
      cluster-autoscaler                         4.14.1    True        False         False      20m
      config-operator                            4.14.1    True        False         False      21m
      console                                    4.14.1    False       True          False      13m     DeploymentAvailable: 0 replicas available for console deployment...
      control-plane-machine-set                  4.14.1    True        False         False      14m
      csi-snapshot-controller                    4.14.1    True        False         False      20m
      dns                                        4.14.1    True        False         False      20m
      etcd                                       4.14.1    True        False         False      19m
      image-registry                             4.14.1    True        False         False      8m11s
      ingress                                              False       True          True       7m36s   The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0...
      insights                                   4.14.1    True        False         False      14m
      kube-apiserver                             4.14.1    True        True          False      10m     NodeInstallerProgressing: 1 nodes are at revision 5; 2 nodes are at revision 6
      kube-controller-manager                    4.14.1    True        False         False      18m
      kube-scheduler                             4.14.1    True        False         False      17m
      kube-storage-version-migrator              4.14.1    True        False         False      21m
      machine-api                                4.14.1    True        False         False      11m
      machine-approver                           4.14.1    True        False         False      20m
      machine-config                             4.14.1    True        False         False      15m
      marketplace                                4.14.1    True        False         False      20m
      monitoring                                 4.14.1    True        False         False      6m53s
      network                                    4.14.1    True        False         False      22m
      node-tuning                                4.14.1    True        False         False      20m
      openshift-apiserver                        4.14.1    True        False         False      14m
      openshift-controller-manager               4.14.1    True        False         False      20m
      openshift-samples                          4.14.1    True        False         False      14m
      operator-lifecycle-manager                 4.14.1    True        False         False      20m
      operator-lifecycle-manager-catalog         4.14.1    True        False         False      20m
      operator-lifecycle-manager-packageserver   4.14.1    True        False         False      14m
      service-ca                                 4.14.1    True        False         False      21m
      storage                                    4.14.1    True        False         False      20m 
      ❯ oc get svc -A | grep LoadBalancer
      openshift-ingress                                  router-default                             LoadBalancer   172.30.43.24     <pending>                              80:32538/TCP,443:31115/TCP                38m
      
      ❯ oc get cm cloud-provider-config -n openshift-config -oyaml
      apiVersion: v1
      data:
        config: '{"cloud":"AzurePublicCloud","tenantId":"<redacted>","aadClientId":"","aadClientSecret":"","aadClientCertPath":"","aadClientCertPassword":"","useManagedIdentityExtension":false,"userAssignedIdentityID":"","subscriptionId":"<redacted>","resourceGroup":"aro-atokubi","location":"eastus","vnetName":"dev-vnet","vnetResourceGroup":"v4-eastus","subnetName":"atokubi-worker","securityGroupName":"atokubi-vnkt5-nsg","routeTableName":"atokubi-vnkt5-node-routetable","primaryAvailabilitySetName":"","vmType":"","primaryScaleSetName":"","cloudProviderBackoff":true,"cloudProviderBackoffRetries":0,"cloudProviderBackoffExponent":0,"cloudProviderBackoffDuration":6,"cloudProviderBackoffJitter":0,"cloudProviderRateLimit":false,"cloudProviderRateLimitQPS":0,"cloudProviderRateLimitBucket":0,"cloudProviderRateLimitQPSWrite":0,"cloudProviderRateLimitBucketWrite":0,"useInstanceMetadata":true,"loadBalancerSku":"standard","excludeMasterFromStandardLB":false,"disableOutboundSNAT":true,"maximumLoadBalancerRuleCount":0}'
      kind: ConfigMap
      metadata:
        creationTimestamp: "2023-11-29T10:08:19Z"
        name: cloud-provider-config
        namespace: openshift-config
        resourceVersion: "33363"
        uid: 8b35cf3f-65ee-428d-92e6-304165301e96 
      ❯ oc logs azure-cloud-controller-manager-fbdfbdb86-hk646 -n openshift-cloud-controller-manager
      Defaulted container "cloud-controller-manager" out of: cloud-controller-manager, azure-inject-credentials (init)
      <omitted>
      I1129 10:46:47.401672       1 controller.go:388] Ensuring load balancer for service openshift-ingress/router-default
      I1129 10:46:47.401732       1 azure_loadbalancer.go:122] reconcileService: Start reconciling Service "openshift-ingress/router-default" with its resource basename "ac376ce0f66164eebb9fc0fa76a9c697"
      I1129 10:46:47.401742       1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(openshift-ingress/router-default) - wantLb(true): started
      I1129 10:46:47.401849       1 event.go:307] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
      I1129 10:46:47.505374       1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-atokubi) success
      I1129 10:46:47.573290       1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(openshift-ingress/router-default): lb(aro-atokubi/atokubi-vnkt5) wantLb(true) resolved load balancer name
      I1129 10:46:47.643053       1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
      E1129 10:46:47.716774       1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-atokubi/providers/Microsoft.Network/networkInterfaces/atokubi-vnkt5-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
      E1129 10:46:47.716802       1 azure_loadbalancer.go:126] reconcileLoadBalancer(openshift-ingress/router-default) failed: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
      I1129 10:46:47.716835       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.315082823 request="services_ensure_loadbalancer" resource_group="aro-atokubi" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="openshift-ingress/router-default" result_code="failed_ensure_loadbalancer"
      E1129 10:46:47.716866       1 controller.go:291] error processing service openshift-ingress/router-default (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0
      I1129 10:46:47.716964       1 event.go:307] "Event occurred" object="openshift-ingress/router-default" fieldPath="" kind="Service" apiVersion="v1" type="Warning" reason="SyncLoadBalancerFailed" message="Error syncing load balancer: failed to ensure load balancer: failed to map VM Name to NodeName: VM Name atokubi-vnkt5-master-0"
      

       

      After changing vmType from an empty string to "standard" in the cloud-provider-config ConfigMap, cloud-provider-azure can configure the load balancer and the errors are gone.
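
      The workaround amounts to a one-field change in the JSON stored under the `config` key of the ConfigMap. The following is a minimal sketch of that transformation only, assuming the JSON string has already been extracted from the ConfigMap. The function name `with_standard_vm_type` is hypothetical and is not part of the installer or cloud-provider-azure. Writing the result back to the cluster is out of scope here.

```python
import json


def with_standard_vm_type(config_json: str) -> str:
    """Return the cloud provider config with a missing or empty vmType set to "standard".

    Hypothetical helper for illustration; not part of any OpenShift tooling.
    """
    config = json.loads(config_json)
    # Only fill in the value when it is missing or empty, so an explicit
    # setting (for example "vmss") is left untouched.
    if not config.get("vmType"):
        config["vmType"] = "standard"
    # Compact separators match the single-line form seen in the ConfigMap.
    return json.dumps(config, separators=(",", ":"))


if __name__ == "__main__":
    # Shortened sample; the real config carries many more fields.
    sample = '{"cloud":"AzurePublicCloud","vmType":"","loadBalancerSku":"standard"}'
    print(with_standard_vm_type(sample))
```

      All other fields pass through unchanged, which mirrors the manual edit described above: only vmType differs between the failing and the working config.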

       


            People

              padillon Patrick Dillon
              rh-ee-atokubi Ayato Tokubi
              Mike Gahagan Mike Gahagan
              Mike Pytlak Mike Pytlak