Description of problem:
Switching the spec.endpointPublishingStrategy.loadBalancer.scope of the default ingresscontroller results in a degraded ingress operator. The routes using that endpoint like the console URL become inaccessible.
Degraded operators after scope change:
$ oc get co | grep -v ' True False False'
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.11.4 False False True 72m OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.kartrosa.ukld.s1.devshift.org/healthz": EOF
console 4.11.4 False False False 72m RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.kartrosa.ukld.s1.devshift.org): Get "https://console-openshift-console.apps.kartrosa.ukld.s1.devshift.org": EOF
ingress 4.11.4 True False True 65m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
We have noticed that each time this happens the underlying AWS loadbalancer gets recreated which is as expected however the router pods probably do not get notified about the new loadbalancer. The instances in the new loadbalancer become 'outOfService'.
Restarting one of the router pods fixes the issue and brings back a couple of instances under the loadbalancer back to 'InService' which leads to the operators becoming happy again.
Version-Release number of selected component (if applicable):
ingress in 4.11.z however we suspect this issue to also apply to older versions
Steps to Reproduce:
1. Create a test OCP 4.11 cluster in AWS
2. Switch the spec.endpointPublishingStrategy.loadBalancer.scope of the default ingresscontroller in openshift-ingress-operator to Internal from External (or vice versa)
3. New Loadbalancer is created in AWS for the default router service, however the instances behind are not in service
ingress, authentication and console operators go into a degraded state. Console URL of the cluster is inaccessible
The ingresscontroller scope transition from internal->External (or vice versa) is smooth without any downtime or operators going into degraded state. The console is accessible.