-
Bug
-
Resolution: Done-Errata
-
Major
-
4.10.0
Description of problem:
When a Route object is created with the `haproxy.router.openshift.io/timeout` annotation and the value is set very high, the Router fails silently and Router reloads stall. The Route object is created successfully and the option is passed through to the HAProxy instance; however, when the value is higher than 24.8 days, the Router reloads start failing.

Reviewing the logs from the Router instance:
~~~
E1121 05:43:51.365875 1 limiter.go:165] error reloading router: exit status 1
[NOTICE] 324/054351 (350) : haproxy version is 2.2.19-7ea3822
[NOTICE] 324/054351 (350) : path to executable is /usr/sbin/haproxy
[ALERT] 324/054351 (350) : parsing [/var/lib/haproxy/conf/haproxy.config:226] : timer overflow in argument '100000000000s' to 'timeout server' (maximum value is 2147483647 ms or ~24.8 days)
[ALERT] 324/054351 (350) : Error(s) found in configuration file : /var/lib/haproxy/conf/haproxy.config
[ALERT] 324/054351 (350) : Fatal errors found in configuration.
~~~

There are no events produced in the Route namespace:
~~~
|⇒ kge
No resources found in openshift-console namespace.
~~~

After removing the bad annotation, the Router reloads correctly and connections can be established again. This is a concern because anyone with permission to create Routes can lock up the Router instances.
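For reference, the limit quoted in the ALERT line can be checked with a few lines of arithmetic. This is only an illustrative sketch: the constant is copied from the log message, and the helper name `exceeds_haproxy_limit` is hypothetical, not part of the router or HAProxy.

```python
# Maximum timer value reported by HAProxy in the ALERT line (in milliseconds).
HAPROXY_MAX_TIMEOUT_MS = 2_147_483_647  # same value as 2**31 - 1

MS_PER_DAY = 24 * 60 * 60 * 1000


def exceeds_haproxy_limit(seconds: int) -> bool:
    """Hypothetical helper: True if a timeout in whole seconds overflows the limit."""
    return seconds * 1000 > HAPROXY_MAX_TIMEOUT_MS


# The limit expressed in days (the log rounds this to "~24.8 days").
limit_days = HAPROXY_MAX_TIMEOUT_MS / MS_PER_DAY

if __name__ == "__main__":
    print(f"limit in days: {limit_days:.3f}")
    print("100000000000s overflows:", exceeds_haproxy_limit(100_000_000_000))
```

Any annotation value above roughly 2,147,483 seconds therefore triggers the same parse error on reload.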
Version-Release number of selected component (if applicable):
4.10
How reproducible:
Every time
Steps to Reproduce:
For example:
1. Create a Route with the timeout annotation set to `100000000000s`.
2. Review the Router logs.
3. Attempt to access the Route (or any newly created Route without the annotation). The response will be HTTP 503.
Actual results:
Response will be HTTP 503
Expected results:
A valid HTTP response (2XX)
Additional info:
-----
There's this PR which appears to have attempted to address this issue:
https://github.com/openshift/router/pull/196/files
https://issues.redhat.com/browse/OCPBUGSM-10016

However, I have tested this in OCP 4.10 and the issue is still present.

I tagged this BZ as an OpenShift APIServer issue because it might be desirable to validate the Route as it is added. That would also make it possible to give feedback by rejecting the Route or by creating Event objects that explain why it is failing (there is no CRD on which to set validations for Routes).

------
After reviewing the code, the issue with the current fix implementation looks like it's here:
https://github.com/openshift/router/blob/master/pkg/router/template/template_helper.go/#L334-L338

The `ParseDuration` function caps out at roughly 290 years, and beyond that the function fails open. This function is only meant to truncate the values, but feedback should be provided to the Route creator without requiring them to check the logs. Often the Route creator does not have access to the `openshift-ingress` Namespace.

It looks like this might be a good start here:
https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/deployment.go/#L184-L209

Might be worth putting the validation in the IngressControllerOperator?
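To illustrate the kind of up-front validation being suggested, here is a rough sketch in Python (not the router's Go code). It assumes the annotation value is an integer followed by an optional HAProxy time unit (`us`, `ms`, `s`, `m`, `h`, `d`) and that a bare number means milliseconds; the function name `validate_timeout` and the message wording are hypothetical. It returns a reason string that could be surfaced to the Route creator instead of failing open.

```python
import re

# Maximum from the HAProxy ALERT message, converted to microseconds so that
# all comparisons stay in integer arithmetic.
HAPROXY_MAX_TIMEOUT_US = 2_147_483_647 * 1000

# Microseconds per unit for the suffixes assumed to be accepted.
_UNIT_TO_US = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60 * 1_000_000,
    "h": 3_600 * 1_000_000,
    "d": 86_400 * 1_000_000,
}

_TIMEOUT_RE = re.compile(r"(\d+)(us|ms|s|m|h|d)?")


def validate_timeout(value: str):
    """Return None if the value is acceptable, else a human-readable reason."""
    match = _TIMEOUT_RE.fullmatch(value)
    if match is None:
        return f"invalid timeout {value!r}: expected <integer>[us|ms|s|m|h|d]"
    number = int(match.group(1))
    unit = match.group(2) or "ms"  # assumption: no suffix means milliseconds
    if number * _UNIT_TO_US[unit] > HAPROXY_MAX_TIMEOUT_US:
        return f"timeout {value!r} exceeds the HAProxy maximum of 2147483647 ms"
    return None


if __name__ == "__main__":
    for candidate in ("30s", "24d", "25d", "100000000000s", "abc"):
        print(candidate, "->", validate_timeout(candidate))
```

Because Python integers do not overflow, the check rejects arbitrarily large values explicitly; an equivalent check in the router or operator would need to handle the `ParseDuration` error case instead of ignoring it.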
- is cloned by
-
OCPBUGS-30773 [4.14 Backport] - Route 'haproxy.router.openshift.io/timeout' value is not validated
- Closed
- is depended on by
-
OCPBUGS-30773 [4.14 Backport] - Route 'haproxy.router.openshift.io/timeout' value is not validated
- Closed
- relates to
-
OCPBUGS-38078 Abnormal values for 'router.openshift.io/haproxy.health.check.interval' annotation breaks the router-default pods
- POST
-
OCPBUGS-15477 Update cluster-ingress-operator clipHAProxyTimeout
- Closed
- links to
-
RHEA-2023:7198 rpm