  1. OpenShift Bugs
  2. OCPBUGS-37491

co/ingress status cannot reflect the real condition


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • 4.16, 4.17
    • Networking / router
    • Critical
    • Yes
    • 2
    • NE Sprint 258
    • 1
    • Rejected
    • False
    • Cause: The ingress controller Degraded status was not being set because the CanaryRepetitiveFailures condition transition time was continually being updated due to a flaw in condition status detection.

      Result: The ingress controller was incorrectly displaying Degraded=False when it should have been Degraded=True.

      Fix: Update the condition transition time only when the condition status changes, and not when just the message or reason changes.

    • Bug Fix
    • In Progress

      Description of problem:

      co/ingress always reports healthy, even though the ingress operator pod logs errors:
      
      2024-07-24T06:42:09.580Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
          

      Version-Release number of selected component (if applicable):

          4.17.0-0.nightly-2024-07-20-191204

      How reproducible:

          100%

      Steps to Reproduce:

          1. Install a cluster on AWS.
          2. Update ingresscontroller/default, adding "endpointPublishingStrategy.loadBalancer.allowedSourceRanges", e.g.:
      
      spec:
        endpointPublishingStrategy:
          loadBalancer:
            allowedSourceRanges:
            - 1.1.1.2/32
      
          3. The above setting drops most traffic to the LB, so some operators become degraded.
          

      Actual results:

          co/authentication and co/console are degraded, but co/ingress still reports healthy:
      
      $ oc get co
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.hongli-aws.qe.devcluster.openshift.com/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
       
      console                                    4.17.0-0.nightly-2024-07-20-191204   False       False         True       22m     RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com): Get "https://console-openshift-console.apps.hongli-aws.qe.devcluster.openshift.com": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
       
      ingress                                    4.17.0-0.nightly-2024-07-20-191204   True        False         False      3h58m   
      
      
      The ingress operator log shows:
      
      2024-07-24T06:59:09.588Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP Request: Timeout: Get \"https://canary-openshift-ingress-canary.apps.hongli-aws.qe.devcluster.openshift.com\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

      Expected results:

          The co/ingress status should reflect the real condition in a timely manner.

      Additional info:

          Although the co/ingress status is updated in some scenarios, it is always less sensitive than co/authentication and co/console. If we always have to rely on authentication/console to learn whether routes are healthy, the ingress canary route serves no purpose.

       

            cholman@redhat.com Candace Holman
            rhn-support-hongli Hongan Li
            Hongan Li Hongan Li