  1. OpenShift Bugs
  2. OCPBUGS-58031

AKS: HCP operator should set Shared ClusterServiceLoadBalancerHealthProbeMode


    • Bug
    • Resolution: Unresolved
    • Major
    • 4.20.0
    • 4.20.0
    • HyperShift
    • None
    • Quality / Stability / Reliability
    • False
    • None
    • None
    • Important
    • No
    • Proposed
    • None
    • Proposed
    • Bug Fix
    • *Cause*: The Azure Cloud Provider no longer sets the default ping target "HTTP:10256/healthz" for Azure Load Balancer. For Services of type LoadBalancer running on Azure, the Load Balancer object created in Azure now has the ping target "TCP:30810".
      *Consequence*: The health probes for cluster-wide services are effectively non-functional, and these services experience downtime during upgrades.
      *Fix*: Set the ClusterServiceLoadBalancerHealthProbeMode property of the cloud config to "Shared". This cloud config is passed to the Azure Cloud Provider.
      *Result*: Load Balancers in Azure have the correct health check ping target, HTTP:10256/healthz, which points to the kube-proxy health endpoints running on the nodes.
    • None
    • None
    • None
    • None

      Description of problem:

      As reported in OCPBUGS-56688, the following test started to fail because it now also validates connectivity to the HTTP route.

      [sig-network-edge][OCPFeatureGate:GatewayAPIController][Feature:Router][apigroup:gateway.networking.k8s.io] Ensure HTTPRoute object is created [Suite:openshift/conformance/parallel]    

      The root cause is most likely the same as for OCPBUGS-56011.
      The ping protocol, path, and port should point to kube-proxy running on each node. This behavior was introduced upstream in the Azure cloud provider, and classic OpenShift adopted it long ago in cluster-cloud-controller-manager-operator, where the "shared" mode is enforced by default.

      However, it is NOT configured or enforced for HyperShift, neither in its config overrides nor in its default flags.
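
      The missing step could look roughly like the following Go sketch. It is an illustration only, not the HyperShift implementation: the helper names withSharedProbeMode and probeMode are hypothetical, and the JSON key "clusterServiceLoadBalancerHealthProbeMode" and the value "shared" are assumed from the upstream Azure cloud provider configuration. The sketch treats the cloud config as a generic JSON document and forces the probe mode before the config is handed to the Azure Cloud Provider.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Assumed JSON key and value, based on the upstream Azure cloud provider
// config format. Verify them against the provider version in use.
const (
	probeModeKey    = "clusterServiceLoadBalancerHealthProbeMode"
	probeModeShared = "shared"
)

// withSharedProbeMode is a hypothetical helper. It parses a JSON cloud
// config, sets the health probe mode to "shared", and returns the
// re-serialized config. All other keys are preserved.
func withSharedProbeMode(raw []byte) ([]byte, error) {
	cfg := map[string]any{}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, fmt.Errorf("parsing cloud config: %w", err)
	}
	if cfg == nil {
		// A literal JSON "null" resets the map to nil; start from empty.
		cfg = map[string]any{}
	}
	cfg[probeModeKey] = probeModeShared
	return json.Marshal(cfg)
}

// probeMode returns the configured probe mode, or "" if it is unset.
func probeMode(raw []byte) (string, error) {
	cfg := map[string]any{}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", fmt.Errorf("parsing cloud config: %w", err)
	}
	mode, _ := cfg[probeModeKey].(string)
	return mode, nil
}

func main() {
	// Example input; the field values here are placeholders.
	in := []byte(`{"cloud":"AzurePublicCloud","location":"eastus"}`)
	out, err := withSharedProbeMode(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

      Overwriting the key unconditionally mirrors the "enforced by default" behavior described for classic OpenShift; a real fix might instead set it only when the key is absent.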

      Version-Release number of selected component (if applicable):

          4.20
      The failure appears only on 4.20 because the test enhancement was committed only to the release-4.20 branch. However, the functionality is probably affected back to 4.16, where the "shared" load balancer mode was introduced: https://github.com/openshift/cluster-cloud-controller-manager-operator/blob/release-4.16/pkg/cloud/azure/azure.go#L180 (the release-4.15 branch does not have it).

      How reproducible:

          Always

      Steps to Reproduce:

          1. Run the "e2e-azure-aks-ovn-conformance" test job on AKS, as in this run:
      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.20-periodics-e2e-azure-aks-ovn-conformance/1937334309027319808

      Actual results:

          The test "Ensure HTTPRoute object is created" fails

      Expected results:

          The test passes

      Additional info:

          

              mgencur@redhat.com Martin Gencur
              mgencur@redhat.com Martin Gencur
              None
              None
              Martin Gencur
              None