Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16, 4.17
Component/s: Networking / router
Labels:
- ne-triaged

Severity:
Critical
Regression:
Yes
Story Points:
2
Sprint:
NE Sprint 258
sprint_count:
1
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Text:

* Previously, the Ingress Controller status incorrectly displayed as `Degraded=False` because of a migration time issue with the `CanaryRepetitiveFailures` condition. With this release, the Ingress Controller status is correctly marked as `Degraded=True` for the appropriate length of time that the `CanaryRepetitiveFailures` condition exists. (link:https://issues.redhat.com/browse/OCPBUGS-37491[*OCPBUGS-37491*])
Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.18.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

co/ingress always reports a healthy status, even while the operator pod logs this error:

2024-07-24T06:42:09.580Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

Version-Release number of selected component (if applicable):

    4.17.0-0.nightly-2024-07-20-191204

How reproducible:

    100%

Steps to Reproduce:

    1. Install an AWS cluster.
    2. Update ingresscontroller/default by adding "endpointPublishingStrategy.loadBalancer.allowedSourceRanges", e.g.:

spec:
  endpointPublishingStrategy:
    loadBalancer:
      allowedSourceRanges:
      - 1.1.1.2/32

    3. The setting above drops most traffic to the load balancer, so some cluster operators become degraded.
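
The reproduction steps above can be sketched as a merge patch applied with `oc patch`. This is a minimal sketch, not the reporter's exact procedure: the helper names `build_patch` and `apply_patch` are hypothetical, and it assumes the default IngressController lives in the `openshift-ingress-operator` namespace and already uses a load balancer publishing strategy.

```shell
# Build a JSON merge patch that limits the load balancer to one source range.
# build_patch is a hypothetical helper; the field path comes from step 2.
build_patch() {
  printf '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"allowedSourceRanges":["%s"]}}}}' "$1"
}

# Apply the patch to the default IngressController.
# Assumption: the resource is in the openshift-ingress-operator namespace.
apply_patch() {
  oc -n openshift-ingress-operator patch ingresscontroller/default \
    --type=merge -p "$(build_patch "$1")"
}

# Usage (on a disposable test cluster only, since this blocks most ingress traffic):
#   apply_patch 1.1.1.2/32
```

Because the patch drops most traffic to the load balancer, it should be reverted once the test is finished.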

Actual results:

    co/authentication and co/console are degraded, but co/ingress still reports a healthy status:

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-aws.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
 
console                                    4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
 
ingress                                    4.17.0-0.nightly-2024-07-20-191204   True        False         False      3h58m   
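
To spot this kind of mismatch quickly, the table above can be filtered for operators that report `DEGRADED=True`. The following is a small sketch; `degraded_operators` is a hypothetical helper, and it assumes the default column order of `oc get co` shown in the output above.

```shell
# Print the names of cluster operators whose DEGRADED column is True.
# Assumes the default column layout of `oc get co`:
# NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
degraded_operators() {
  awk 'NR > 1 && $5 == "True" { print $1 }'
}

# Usage:
#   oc get co | degraded_operators
```

With the output in this report, the filter would list authentication and console but not ingress, which is the symptom being reported.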


Check the ingress operator log, which shows:

2024-07-24T06:59:09.588Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
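
One way to confirm that the canary check keeps failing is to count these error lines in the operator log. This is a sketch under assumptions: `canary_failures` is a hypothetical helper, and the deployment and container names (`ingress-operator`) are assumed rather than taken from this report.

```shell
# Count log lines that report a failed canary route check.
# grep -c exits non-zero when the count is 0, so `|| true` keeps the
# function from failing in scripts that run with `set -e`.
canary_failures() {
  grep -c 'error performing canary route check' || true
}

# Usage (deployment and container names are assumptions):
#   oc -n openshift-ingress-operator logs deployment/ingress-operator \
#     -c ingress-operator | canary_failures
```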

Expected results:

    The co/ingress status should reflect the real condition in a timely manner.

Additional info:

    Although the co/ingress status is updated in some scenarios, it is always less sensitive than authentication and console. If we always have to rely on authentication/console to know whether routes are healthy, the ingress canary route loses its purpose.

blocks

OCPBUGS-39220 [Backport-4.17] co/ingress status cannot reflect the real condition

Closed

is caused by

OCPBUGS-3522 Improve CanaryChecksRepetitiveFailures actionability

Closed

is cloned by

OCPBUGS-39220 [Backport-4.17] co/ingress status cannot reflect the real condition

Closed

is duplicated by

OCPBUGS-35071 Canary route checks for the default ingress controller are failing but co/ingress is still available

Closed

links to

openshift/api 2143

openshift/cluster-ingress-operator#1125: OCPBUGS-37491: Ingress operator status not degraded when canary route fails

RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update


Assignee: Candace Holman

Reporter: Hongan Li

QA Contact: Hongan Li

Votes: 0

Watchers: 6

Created: 2024/07/24 7:15 AM

Updated: 2025/02/25 4:39 AM

Resolved: 2025/02/25 4:39 AM
