  1. OpenShift Bugs
  2. OCPBUGS-33432

[4.12 Backport] - Route 'haproxy.router.openshift.io/timeout' value is not validated


    • No
    • 2
    • Sprint 253, Sprint 254, NE Sprint 255
    • 3
    • Rejected
    • False
    • Previously, timeout values larger than what Golang could parse were not properly validated. Consequently, timeout values larger than what HAProxy could parse caused issues with HAProxy. With this update, if the timeout specifies a value larger than what can be parsed, it is capped at the maximum that HAProxy can parse. As a result, issues are no longer caused for HAProxy. (link:https://issues.redhat.com/browse/OCPBUGS-30432[*OCPBUGS-30432*])
    • Bug Fix
    • Proposed

      This is a clone of issue OCPBUGS-33280. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-30773. The following is the description of the original issue:

      Backport ticket for 4.14

      Description of problem:

      When a Route object is created with the `haproxy.router.openshift.io/timeout` annotation and the value is set very high, the Router fails silently and Router reloads stall.
      
      The Route object is created successfully and the option is passed through to the HAProxy instance. However, when the value is higher than 24.8 days, the Router reloads start failing.
      
      Reviewing the logs from the Router instance:
      ~~~
      E1121 05:43:51.365875       1 limiter.go:165] error reloading router: exit status 1
      [NOTICE] 324/054351 (350) : haproxy version is 2.2.19-7ea3822
      [NOTICE] 324/054351 (350) : path to executable is /usr/sbin/haproxy
      [ALERT] 324/054351 (350) : parsing [/var/lib/haproxy/conf/haproxy.config:226] : timer overflow in argument '100000000000s' to 'timeout server' (maximum value is 2147483647 ms or ~24.8 days)
      [ALERT] 324/054351 (350) : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config
      [ALERT] 324/054351 (350) : Fatal errors found in configuration.
      ~~~
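
The limit quoted in the HAProxy alert is the largest signed 32-bit integer, interpreted as milliseconds. A minimal sketch of the arithmetic follows; the constant name `haproxyMaxTimeout` is illustrative and not taken from the router source. The result is about 24.86 days, which HAProxy reports truncated as "~24.8 days".

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// haproxyMaxTimeout mirrors the ceiling from the alert above:
// 2147483647 ms, i.e. math.MaxInt32 milliseconds.
// The name is hypothetical, chosen for this example only.
const haproxyMaxTimeout = math.MaxInt32 * time.Millisecond

func main() {
	// Print the ceiling in milliseconds and converted to days.
	fmt.Printf("max timeout: %d ms\n", haproxyMaxTimeout.Milliseconds())
	fmt.Printf("in days:     %.2f\n", haproxyMaxTimeout.Hours()/24)
}
```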
      
      There are no events produced in the Route namespace:
      ~~~
      |⇒ kge
      No resources found in openshift-console namespace.
      ~~~
      
      After removing the bad annotation, the Router reloads correctly and connections can be established again.
      
      This is a concern because anyone with permission to create Routes can lock up the Router instances.
      
      

      Version-Release number of selected component (if applicable):

      4.10
      
      

      How reproducible:

      Every time
      
      

      Steps to Reproduce:

      For example:
      1. Create a Route with the timeout annotation set to `100000000000s`
      2. Review the Router logs
      3. Attempt to access the Route (or any newly created Routes without the annotation). Response will be HTTP 503
      
      

      Actual results:

      Response will be HTTP 503
      
      

      Expected results:

      A valid HTTP response (2XX)
      
      

      Additional info:

      -----
      There's this PR which appears to have attempted to address this issue:
      https://github.com/openshift/router/pull/196/files
      https://issues.redhat.com/browse/OCPBUGSM-10016
      
      However, I have tested this in OCP 4.10 and the issue is still present.
      
      Tagged this BZ as an OpenShift APIServer issue because it might be desirable to validate the Route when it is added. Validation at that point would also make it possible to give feedback, either by rejecting the Route or by creating Event objects that explain why it is failing (there is no CRD on which to set validations for Routes).
      
      ------
      After reviewing the code, the issue with the current fix implementation looks like it's here:
      https://github.com/openshift/router/blob/master/pkg/router/template/template_helper.go/#L334-L338
      
      The `ParseDuration` function caps out at roughly 290 years (the largest duration Go can represent); beyond that it returns an error, and the clipping function then fails open.
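
To illustrate the fail-open behaviour described above, here is a minimal sketch of a helper that fails closed instead. `capTimeout` and `haproxyMaxTimeout` are hypothetical names and this is not the router's actual code. It assumes the annotation uses only units that Go's `time.ParseDuration` accepts (HAProxy's `d` suffix, for example, is not handled). A value such as `100000000000s` overflows Go's duration type, so `time.ParseDuration` returns an error; the sketch surfaces that error rather than passing the raw value to HAProxy.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// haproxyMaxTimeout is the ceiling from the HAProxy alert: 2147483647 ms.
const haproxyMaxTimeout = math.MaxInt32 * time.Millisecond

// capTimeout is a hypothetical helper for illustration only.
// It returns the value unchanged when it parses and fits within the limit,
// returns the HAProxy maximum when it parses but exceeds the limit,
// and returns an error (instead of the raw value) when parsing fails,
// which includes durations too large for Go to represent.
func capTimeout(value string) (string, error) {
	d, err := time.ParseDuration(value)
	if err != nil {
		return "", fmt.Errorf("invalid timeout %q: %w", value, err)
	}
	if d > haproxyMaxTimeout {
		return fmt.Sprintf("%dms", haproxyMaxTimeout.Milliseconds()), nil
	}
	return value, nil
}

func main() {
	// One in-range value, one just over the HAProxy limit,
	// and the overflowing value from the reproduction steps.
	for _, v := range []string{"30s", "2147483648ms", "100000000000s"} {
		out, err := capTimeout(v)
		fmt.Printf("%-16s -> %q, err=%v\n", v, out, err)
	}
}
```

Returning an error leaves the decision to the caller, which could cap the value, reject the Route, or emit an Event so the Route creator gets feedback without reading the Router logs.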
      
      This function only truncates values; feedback should also be provided to the Route creator without requiring them to check the logs. The Route creator often does not have access to the `openshift-ingress` Namespace.
      
      It looks like this might be a good start here:
      https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/deployment.go/#L184-L209
      
      Might be worth putting the validation in the IngressControllerOperator?
      

              cholman@redhat.com Candace Holman
              openshift-crt-jira-prow OpenShift Prow Bot
              Hongan Li Hongan Li