OCPBUGS-2554

ingress, authentication and console operators go Degraded after switching the default application router scope



      Description of problem:
Switching the spec.endpointPublishingStrategy.loadBalancer.scope of the default ingresscontroller results in a degraded ingress operator, and routes using that endpoint, such as the console URL, become inaccessible.
      Degraded operators after scope change:

      $ oc get co | grep -v ' True        False         False'
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.11.4    False       False         True       72m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.kartrosa.ukld.s1.devshift.org/healthz": EOF
      console                                    4.11.4    False       False         False      72m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.kartrosa.ukld.s1.devshift.org): Get "https://console-openshift-console.apps.kartrosa.ukld.s1.devshift.org": EOF
      ingress                                    4.11.4    True        False         True       65m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)

We have noticed that each time this happens the underlying AWS load balancer gets recreated, which is expected; however, the router pods apparently are not notified about the new load balancer, and the instances behind it become 'OutOfService'.
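The instance state can be confirmed from the AWS side; a minimal sketch using the AWS CLI, assuming the recreated load balancer is a classic ELB (the load balancer name below is a placeholder, look it up from the router service):

```shell
# Hypothetical check: list the health of instances behind the recreated ELB.
# <new-elb-name> is a placeholder; find it via the default router's
# LoadBalancer service in the openshift-ingress namespace.
aws elb describe-instance-health \
  --load-balancer-name <new-elb-name> \
  --query 'InstanceStates[].{Instance:InstanceId,State:State}'
```

After the scope switch, the instances listed here remain 'OutOfService' until a router pod is restarted.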

Restarting one of the router pods fixes the issue and brings a couple of instances under the load balancer back to 'InService', which leads to the operators becoming happy again.
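A sketch of the workaround, assuming the router pods carry the standard deployment label for the default ingresscontroller (the pod name is a placeholder):

```shell
# List the router pods for the default ingresscontroller.
oc -n openshift-ingress get pods \
  -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default

# Delete one; the deployment recreates it, and the replacement
# registers correctly with the new load balancer.
oc -n openshift-ingress delete pod <router-pod-name>
```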

      Version-Release number of selected component (if applicable):

Observed with ingress in 4.11.z; however, we suspect this issue also applies to older versions.

      How reproducible:

      Consistently reproducible

      Steps to Reproduce:

      1. Create a test OCP 4.11 cluster in AWS
      2. Switch the spec.endpointPublishingStrategy.loadBalancer.scope of the default ingresscontroller in openshift-ingress-operator to Internal from External (or vice versa)
3. Observe that a new load balancer is created in AWS for the default router service; however, the instances behind it are not in service
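Step 2 above can be sketched as a single patch; a hedged example, assuming the current scope is External:

```shell
# Switch the default ingresscontroller's load balancer scope
# from External to Internal (swap the values to go the other way).
oc -n openshift-ingress-operator patch ingresscontroller/default \
  --type=merge \
  -p '{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}'
```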

      Actual results:

ingress, authentication and console operators go into a degraded state. The console URL of the cluster is inaccessible.

      Expected results:

The ingresscontroller scope transition from Internal to External (or vice versa) is smooth, without any downtime or operators going into a degraded state. The console remains accessible.


            mmahmoud@redhat.com Mohamed Mahmoud
            kramraja.openshift Karthik Perumal
            Hongan Li Hongan Li