OpenShift Bugs / OCPBUGS-39220

[Backport-4.17] co/ingress status cannot reflect the real condition


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • Affects Version/s: 4.16, 4.17
    • Component: Networking / router
    • Severity: Critical
    • Sprint: NE Sprint 258
    • Release Note Text:
      * Previously, the Ingress Controller status incorrectly displayed as `Degraded=False` because of a migration time issue with the `CanaryRepetitiveFailures` condition. With this release, the Ingress Controller status is correctly marked as `Degraded=True` for the appropriate length of time that the `CanaryRepetitiveFailures` condition exists. (link:https://issues.redhat.com/browse/OCPBUGS-39220[*OCPBUGS-39220*])
    • Release Note Type: Bug Fix
    • Release Note Status: Done

      This is a clone of issue OCPBUGS-37491. The following is the description of the original issue:

      Description of problem:

      co/ingress always stays good even though the operator pod logs errors such as:
      
      2024-07-24T06:42:09.580Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
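
      For reference, these errors can be pulled straight from the operator pod with something like the following (a minimal sketch, assuming the default ingress operator deployment and container names in the openshift-ingress-operator namespace):

      $ oc -n openshift-ingress-operator logs deployment/ingress-operator -c ingress-operator | grep canary_controller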
          

      Version-Release number of selected component (if applicable):

          4.17.0-0.nightly-2024-07-20-191204

      How reproducible:

          100%

      Steps to Reproduce:

          1. install AWS cluster
          2. update ingresscontroller/default, adding "endpointPublishingStrategy.loadBalancer.allowedSourceRanges" (a patch example is shown after these steps), e.g.
      
      spec:
        endpointPublishingStrategy:
          loadBalancer:
            allowedSourceRanges:
            - 1.1.1.2/32
      
          3. the above setting drops most of the traffic to the LB, so some operators become degraded
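
      One way to apply the change from step 2 is with a merge patch (a sketch only, reusing the example CIDR from this report):

      $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
          -p '{"spec":{"endpointPublishingStrategy":{"loadBalancer":{"allowedSourceRanges":["1.1.1.2/32"]}}}}'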
          

      Actual results:

          co/authentication and co/console are degraded, but co/ingress is still reported as good
      
      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-aws.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
       
      console                                    4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
       
      ingress                                    4.17.0-0.nightly-2024-07-20-191204   True        False         False      3h58m   
      
      
      checking the ingress operator log shows:
      
      2024-07-24T06:59:09.588Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
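
      The failing canary request can also be reproduced by hand while co/ingress still reports Available=True (a sketch; the host is the one from this cluster, and the route is assumed to be the default canary route in the openshift-ingress-canary namespace):

      $ oc -n openshift-ingress-canary get route canary -o jsonpath='{.spec.host}{"\n"}'
      $ curl -kI --max-time 10 https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com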

      Expected results:

          co/ingress status should reflect the real condition in a timely manner
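
      One way to verify this is to watch the Degraded condition on the clusteroperator directly; with the fix it should turn True once the canary checks have been failing long enough (a sketch only):

      $ oc get co ingress -o jsonpath='{.status.conditions[?(@.type=="Degraded")]}{"\n"}'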

      Additional info:

          Even though the co/ingress status can be updated in some scenarios, it is always less sensitive than co/authentication and co/console. If we always have to rely on authentication/console to know whether routes are healthy, the ingress canary route loses its purpose.

       

              Candace Holman (cholman@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Hongan Li
              Darragh Fitzmaurice
