OCPBUGS-30773

[4.14 Backport] - Route 'haproxy.router.openshift.io/timeout' value is not validated

    • Sprint 250, Sprint 251, Sprint 252, Sprint 253
    • 4
    • Rejected
    • False
      * Previously, timeout values larger than what Golang could parse were not properly validated. Consequently, timeout values larger than what HAProxy could parse caused issues with HAProxy. With this update, if the timeout specifies a value larger than what can be parsed, it is capped at the maximum that HAProxy can parse. As a result, issues are no longer caused for HAProxy. (link:https://issues.redhat.com/browse/OCPBUGS-6959[*OCPBUGS-6959*])

      Original CCFR:

      Cause: Timeout values larger than what golang can parse are not properly validated.

      Consequence: Timeout values larger than what HAProxy can parse cause issues with HAProxy.

      Fix: If the timeout specifies a value larger than what we can parse, cap it at the maximum that HAProxy can parse.

      Results: Any timeout value larger than what the operator can validate is capped at the maximum value that HAProxy can parse.
    • Done

      Backport ticket for 4.14

      Description of problem:

      When a Route object is created with the `haproxy.router.openshift.io/timeout` annotation and the value is set very high, the Router fails silently and Router reloads stall.
      
      The Route object is created successfully and the option is passed through to the HAProxy instance. However, when the value is higher than 24.8 days, the Router reloads start failing.
      
      Reviewing the logs from the Router instance:
      ~~~
      E1121 05:43:51.365875       1 limiter.go:165] error reloading router: exit status 1
      [NOTICE] 324/054351 (350) : haproxy version is 2.2.19-7ea3822
      [NOTICE] 324/054351 (350) : path to executable is /usr/sbin/haproxy
      [ALERT] 324/054351 (350) : parsing [/var/lib/haproxy/conf/haproxy.config:226] : timer overflow in argument '100000000000s' to 'timeout server' (maximum value is 2147483647 ms or ~24.8 days)
      [ALERT] 324/054351 (350) : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config
      [ALERT] 324/054351 (350) : Fatal errors found in configuration.
      ~~~
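
The limit in the alert can be checked with a little arithmetic. The following is a minimal Go sketch, not code from the Router: `haproxyMaxTimeout` and `exceedsHAProxyLimit` are hypothetical names introduced here, and the only inputs are the numbers quoted in the log above.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// haproxyMaxTimeout is the limit quoted in the HAProxy alert:
// 2147483647 ms, the largest signed 32-bit integer.
const haproxyMaxTimeout = time.Duration(math.MaxInt32) * time.Millisecond

// exceedsHAProxyLimit (hypothetical helper) reports whether a timeout given
// in whole seconds is larger than the HAProxy maximum.
func exceedsHAProxyLimit(seconds int64) bool {
	return seconds > int64(haproxyMaxTimeout/time.Second)
}

func main() {
	// Express the limit in days to compare with the "~24.8 days" in the alert.
	fmt.Printf("limit: %d ms, about %.2f days\n",
		haproxyMaxTimeout.Milliseconds(), haproxyMaxTimeout.Hours()/24)

	// The annotation value from the log, in seconds.
	fmt.Println("100000000000s exceeds limit:", exceedsHAProxyLimit(100000000000))
}
```

The annotation value of `100000000000s` is several orders of magnitude above the limit, which is why HAProxy refuses the whole configuration file rather than just the one Route.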
      
      There are no events produced in the Route namespace:
      ~~~
      |⇒ kge
      No resources found in openshift-console namespace.
      ~~~
      
      After removing the bad annotation, the Router reloads correctly and connections can be established again.
      
      This is a concern because anyone who can create Routes can lock up the Router instances.
      
      

      Version-Release number of selected component (if applicable):

      4.10
      
      

      How reproducible:

      Every time
      
      

      Steps to Reproduce:

      For example:
      1. Create a Route with the timeout annotation of `100000000000s`
      2. Review the Router logs
      3. Attempt to access the Route (or any newly created Routes without the annotation). Response will be HTTP 503
      
      

      Actual results:

      Response will be HTTP 503
      
      

      Expected results:

      A valid HTTP response (2XX)
      
      

      Additional info:

      -----
      There's this PR which appears to have attempted to address this issue:
      https://github.com/openshift/router/pull/196/files
      https://issues.redhat.com/browse/OCPBUGSM-10016
      
      However I have tested this in OCP 4.10, and the issue is still present.
      
      I tagged this BZ as an OpenShift APIServer issue because it might be desirable to validate the Route when it is added. Validating at that point would also make it possible to give feedback, either by rejecting the Route or by creating Event objects that explain why it is failing (there is no CRD on which to set validations for Routes).
      
      ------
      After reviewing the code, the issue with the current fix implementation looks like it's here:
      https://github.com/openshift/router/blob/master/pkg/router/template/template_helper.go/#L334-L338
      
      Go's `time.ParseDuration` cannot represent durations beyond roughly 290 years and returns an error for them, at which point the function fails open and the value is passed through to HAProxy unchanged.
      
      This function only truncates values, but the Route creator should also receive feedback without having to check the Router logs. Often the Route creator does not have access to the `openshift-ingress` Namespace.
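
To illustrate the fail-open problem, here is a minimal Go sketch of a clipping helper that caps instead of passing the value through. It is not the Router's implementation: `clipTimeout`, `timeoutShape`, and `haproxyMaxTimeout` are hypothetical names, and the sketch assumes the value is a plain integer followed by one of the units that `time.ParseDuration` understands (`us`, `ms`, `s`, `m`, `h`).

```go
package main

import (
	"fmt"
	"math"
	"regexp"
	"time"
)

// haproxyMaxTimeout is the maximum from the HAProxy alert (2147483647 ms).
const haproxyMaxTimeout = time.Duration(math.MaxInt32) * time.Millisecond

// timeoutShape (assumption) accepts digits followed by a unit that
// time.ParseDuration also accepts. Other spellings are treated as malformed.
var timeoutShape = regexp.MustCompile(`^[0-9]+(us|ms|s|m|h)$`)

// clipTimeout returns val unchanged when it fits within the HAProxy limit,
// the limit itself when val is too large, and an error when val is malformed.
func clipTimeout(val string) (string, error) {
	if !timeoutShape.MatchString(val) {
		return "", fmt.Errorf("malformed timeout %q", val)
	}
	d, err := time.ParseDuration(val)
	// The shape is already known to be valid, so a parse error here means
	// the value overflowed time.Duration. Cap it instead of failing open.
	if err != nil || d > haproxyMaxTimeout {
		return fmt.Sprintf("%dms", haproxyMaxTimeout.Milliseconds()), nil
	}
	return val, nil
}

func main() {
	for _, v := range []string{"30s", "600h", "100000000000s", "soon"} {
		clipped, err := clipTimeout(v)
		fmt.Printf("%-15s -> %q (err: %v)\n", v, clipped, err)
	}
}
```

Checking the shape first is what lets the sketch tell an overflow apart from a typo, since `time.ParseDuration` reports both as an invalid duration.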
      
      It looks like this might be a good start here:
      https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/deployment.go/#L184-L209
      
      Might be worth putting the validation in the IngressControllerOperator?
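
As a rough illustration of the feedback idea, the following Go sketch rejects an out-of-range annotation with a message that could be surfaced to the Route creator, for example in a rejection or an Event. It is only a sketch under assumptions: `validateTimeoutAnnotation` is a hypothetical function, not part of the operator or the API server, and it relies on `time.ParseDuration`, which does not accept a day unit.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// timeoutAnnotation is the Route annotation discussed in this issue.
const timeoutAnnotation = "haproxy.router.openshift.io/timeout"

// haproxyMaxTimeout is the maximum from the HAProxy alert (2147483647 ms).
const haproxyMaxTimeout = time.Duration(math.MaxInt32) * time.Millisecond

// validateTimeoutAnnotation (hypothetical) returns nil when the annotation is
// absent or within range, and a descriptive error otherwise.
func validateTimeoutAnnotation(annotations map[string]string) error {
	val, ok := annotations[timeoutAnnotation]
	if !ok {
		return nil // nothing to validate
	}
	d, err := time.ParseDuration(val)
	if err != nil {
		// Covers both malformed values and values too large for Go to parse.
		return fmt.Errorf("annotation %s: cannot parse %q as a duration: %w",
			timeoutAnnotation, val, err)
	}
	if d < 0 || d > haproxyMaxTimeout {
		return fmt.Errorf("annotation %s: %q is outside the range 0 to %d ms",
			timeoutAnnotation, val, haproxyMaxTimeout.Milliseconds())
	}
	return nil
}

func main() {
	bad := map[string]string{timeoutAnnotation: "100000000000s"}
	fmt.Println(validateTimeoutAnnotation(bad))

	good := map[string]string{timeoutAnnotation: "30s"}
	fmt.Println(validateTimeoutAnnotation(good))
}
```

Rejecting rather than capping is a design choice: it gives the Route creator an explicit error at creation time instead of a silently adjusted timeout.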
      

            cholman@redhat.com Candace Holman
            rhn-support-mwasher Michael Washer
            Shudi Li Shudi Li