OpenShift Bugs / OCPBUGS-39220

[Backport-4.17] co/ingress status cannot reflect the real condition


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • Affects Version/s: 4.16, 4.17
    • Component: Networking / router
    • Severity: Critical
    • Sprint: NE Sprint 258
    • Release Note Text:
      * Previously, the Ingress Controller status incorrectly displayed as `Degraded=False` because of a migration time issue with the `CanaryRepetitiveFailures` condition. With this release, the Ingress Controller status is correctly marked as `Degraded=True` for the appropriate length of time that the `CanaryRepetitiveFailures` condition exists. (link:https://issues.redhat.com/browse/OCPBUGS-39220[*OCPBUGS-39220*])
    • Release Note Type: Bug Fix
    • Release Note Status: Done

      This is a clone of issue OCPBUGS-37491. The following is the description of the original issue:

      Description of problem:

      co/ingress always stays good even though the operator pod logs errors such as:
      
      2024-07-24T06:42:09.580Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
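
      For reference, these errors can be pulled straight from the operator pod with something like the following (a minimal sketch, assuming the default ingress operator deployment and container names in the openshift-ingress-operator namespace):

      $ oc -n openshift-ingress-operator logs deployment/ingress-operator -c ingress-operator | grep canary_controller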
          

      Version-Release number of selected component (if applicable):

          4.17.0-0.nightly-2024-07-20-191204

      How reproducible:

          100%

      Steps to Reproduce:

          1. install AWS cluster
          2. update ingresscontroller/default, adding "endpointPublishingStrategy.loadBalancer.allowedSourceRanges" (a patch example is shown after these steps), e.g.
      
      spec:
        endpointPublishingStrategy:
          loadBalancer:
            allowedSourceRanges:
            - 1.1.1.2/32
      
          3. the above setting drops most of the traffic to the LB, so some operators become degraded
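
      One way to apply the change from step 2 is with a merge patch (a sketch only, reusing the example CIDR from this report):

      $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
          -p '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"allowedSourceRanges":["1.1.1.2/32"]}}}}'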
          

      Actual results:

          co/authentication and co/console are degraded, but co/ingress is still reported as good
      
      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-aws.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
       
      console                                    4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
       
      ingress                                    4.17.0-0.nightly-2024-07-20-191204   True        False         False      3h58m   
      
      
      checking the ingress operator log shows:
      
      2024-07-24T06:59:09.588Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
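
      The failing canary request can also be reproduced by hand while co/ingress still reports Available=True (a sketch; the host is the one from this cluster, and the route is assumed to be the default canary route in the openshift-ingress-canary namespace):

      $ oc -n openshift-ingress-canary get route canary -o jsonpath='{.spec.host}{"\n"}'
      $ curl -kI --max-time 10 https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com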

      Expected results:

          co/ingress status should reflect the real condition in a timely manner
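
      One way to verify this is to watch the Degraded condition on the clusteroperator directly; with the fix it should turn True once the canary checks have been failing long enough (a sketch only):

      $ oc get co ingress -o jsonpath='{.status.conditions[?(@.type=="Degraded")]}{"\n"}'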

      Additional info:

          Even though the co/ingress status can be updated in some scenarios, it is always less sensitive than co/authentication and co/console. If we always have to rely on authentication/console to know whether routes are healthy, the ingress canary route loses its purpose.

       

              Candace Holman (cholman@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Hongan Li
              Darragh Fitzmaurice
