OpenShift Bugs / OCPBUGS-2554

ingress, authentication and console operators go degraded after switching the default application router scope


Details

    • Sprint: SDN Sprint 227, SDN Sprint 228

    Description

      Description of problem:
      Switching the spec.endpointPublishingStrategy.loadBalancer.scope of the default ingresscontroller results in a degraded ingress operator, and routes served by that ingress controller, such as the console URL, become inaccessible.
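      For reference, the current scope can be read back from the default ingresscontroller (a minimal check; the output shown is only an example):

      $ oc -n openshift-ingress-operator get ingresscontroller default \
          -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.scope}'
      External
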
      Degraded operators after scope change:

      $ oc get co | grep -v ' True        False         False'
      NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.11.4    False       False         True       72m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.kartrosa.ukld.s1.devshift.org/healthz": EOF
      console                                    4.11.4    False       False         False      72m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.kartrosa.ukld.s1.devshift.org): Get "https://console-openshift-console.apps.kartrosa.ukld.s1.devshift.org": EOF
      ingress                                    4.11.4    True        False         True       65m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
      
      

      We have noticed that each time this happens the underlying AWS load balancer gets recreated, which is expected; however, the router pods probably do not get notified about the new load balancer. The instances behind the new load balancer become 'OutOfService'.
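      The registration state can also be confirmed from the AWS side (a sketch assuming the default router service is backed by a Classic ELB; <elb-name> stands for the newly created load balancer):

      $ aws elb describe-instance-health --load-balancer-name <elb-name> \
          --query 'InstanceStates[].[InstanceId,State]' --output table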

      Restarting one of the router pods fixes the issue: a couple of instances behind the load balancer return to 'InService', which leads to the operators becoming healthy again.
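      The workaround looked roughly like this (a sketch; the pod name is illustrative, and the label selector is the one set on the default router pods):

      $ oc -n openshift-ingress get pods -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
      $ oc -n openshift-ingress delete pod <one-of-the-router-default-pods>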

      Version-Release number of selected component (if applicable):

      ingress in 4.11.z; however, we suspect this issue also applies to older versions
      

      How reproducible:

      Consistently reproducible
      

      Steps to Reproduce:

      1. Create a test OCP 4.11 cluster in AWS
      2. Switch the spec.endpointPublishingStrategy.loadBalancer.scope of the default ingresscontroller in openshift-ingress-operator from External to Internal (or vice versa); an example patch command is shown after these steps
      3. A new load balancer is created in AWS for the default router service, however the instances behind it are not in service
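      Step 2 can be performed with a merge patch along these lines (a sketch; only the scope field is touched, and "Internal" is the example target value):

      $ oc -n openshift-ingress-operator patch ingresscontroller default --type=merge \
          -p '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"scope":"Internal"}}}}'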
      
      

      Actual results:

      ingress, authentication and console operators go into a degraded state. The console URL of the cluster becomes inaccessible.
      

      Expected results:

      The ingresscontroller scope transition from Internal to External (or vice versa) is smooth, without any downtime or operators going into a degraded state. The console remains accessible.
      

       

            People

              mmahmoud@redhat.com Mohamed Mahmoud
              kramraja.openshift Karthik Perumal
              Hongan Li Hongan Li
              Votes: 0
              Watchers: 13
