OpenShift Bugs / OCPBUGS-54427

azure-cloud-provider rbac error causes ingress SyncLoadBalancer failure


    • Quality / Stability / Reliability
    • False
    • None
    • Critical
    • Yes
    • None
    • Approved
    • None
    • In Progress
    • Release Note Not Required
    • None
    • None
    • None
    • None
    • None

      Description of problem:

      During Azure Stack installs, whether Terraform-based or CAPZ-based (WIP), the ingress operator goes degraded:

      The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: EnsureLoadBalancer failed due to fail to lock azure resources. This may because another component is trying to update azure resources, e.g., load balancers. This will be automatically retried by cloud provider exponentially: leases.coordination.k8s.io "aks-managed-resource-locker" is forbidden: User "system:serviceaccount:kube-system:azure-cloud-provider" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"...
      
      

      Version-Release number of selected component (if applicable):

      4.19ec3    

      How reproducible:

          Always

      Steps to Reproduce:

          1. Use WIP cloud-provider-azure fix https://github.com/openshift/cloud-provider-azure/pull/141
          2. Install to Azure Stack
          3. Check the ingress cluster operator (co)
          

      Actual results:

          The ingress operator is degraded with the RBAC error shown above

      Expected results:

          Operator is available

      Additional info:

      This upstream change looks related:

      https://github.com/kubernetes-sigs/cloud-provider-azure/pull/7344

      As mentioned in the steps to reproduce, I have a WIP fix for https://issues.redhat.com/browse/OCPBUGS-51090, and applying it is what exposes this bug. If the RBAC failure is related to node labels, the two bugs could be related. I think they are separate bugs, but it is worth mentioning.

      Our Azure Stack environment is inaccessible to CI right now due to a new security policy from the provider. The provider is meeting today to put together a plan to move forward. I am happy to facilitate or test in any way; I just don't know how to fix this bug.
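
      Reading the error message literally, the denied request is a "get" on "leases" in the "coordination.k8s.io" API group, in the "kube-system" namespace, by the "azure-cloud-provider" service account. An untested sketch of RBAC objects that would grant that access is below. The name "azure-cloud-provider-lease-access" is hypothetical, the verbs other than "get" are an assumption about what a lease-based lock needs, and the real fix may belong in a different manifest or component.

```yaml
# Untested sketch, not a confirmed fix.
# Grants the service account named in the error access to Lease objects
# in kube-system. Object names below are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: azure-cloud-provider-lease-access  # hypothetical name
  namespace: kube-system
rules:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  # Only "get" is confirmed by the error message. "create" and "update" are
  # assumed, since a lease-based lock usually has to create and renew its lease.
  verbs: ["get", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: azure-cloud-provider-lease-access  # hypothetical name
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: azure-cloud-provider-lease-access
subjects:
# Service account taken from the error message.
- kind: ServiceAccount
  name: azure-cloud-provider
  namespace: kube-system
```

      A namespaced Role is used here rather than a ClusterRole because the error only mentions the "kube-system" namespace.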

      Must gather attached.

              rmanak@redhat.com Radek Manak
              padillon Patrick Dillon
              None
              None
              Zhaohua Sun Zhaohua Sun
              None