Resolution: Done-Errata
4.13, 4.12, 4.11, 4.10, 4.9, 4.14, 4.15, 4.16
Description of problem
As a cluster-admin, I want to configure the router default connect timeout, so that HAProxy doesn't time out when connecting to a slow backend server, or so that I can set a shorter timeout to mitigate DoS attacks. This setting is controlled by the ROUTER_DEFAULT_CONNECT_TIMEOUT environment variable on the router deployment, which the OpenShift router uses to set HAProxy's "timeout connect" setting.
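The mapping from the environment variable to the HAProxy setting can be pictured with a small shell sketch. This is only an illustration, not the router's actual template code: the function name render_timeout_connect is hypothetical, and the 5s fallback is the default cited under "Expected results" in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of the router): prints the HAProxy
# "timeout connect" line that corresponds to the current value of
# ROUTER_DEFAULT_CONNECT_TIMEOUT. When the variable is unset or empty,
# it falls back to 5s, the default cited in this report.
render_timeout_connect() {
    printf 'timeout connect %s\n' "${ROUTER_DEFAULT_CONNECT_TIMEOUT:-5s}"
}
```

With the variable unset, the sketch prints the default line; exporting a value such as 10s changes the rendered line accordingly.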
This issue is being tracked as a defect rather than a new feature request because configurability of ROUTER_DEFAULT_CONNECT_TIMEOUT / timeout connect was an explicit requirement for RFE-403 (which was accepted) and NE-412 (which was completed and closed), but the option was dropped during the design and implementation of the feature. This should have been caught during review of the pull request for the enhancement proposal (EP), and the EP should have either included the option or justified its exclusion. As it stands, the feature is incomplete, and therefore this is a defect, not an RFE.
Version-Release number of selected component (if applicable)
4.9 through 4.16. The feature that was supposed to include this option shipped in OCP 4.9.
How reproducible
Steps to Reproduce
1. Check the supported router tuning options: oc explain IngressController.spec.tuningOptions
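The check in step 1 can also be scripted rather than read by eye. The following is a minimal sketch, assuming the usual oc explain layout in which each field name starts a line and is followed by its type in angle brackets; the helper name has_tuning_option is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc explain` output on stdin and succeeds
# (exit status 0) if a field with the given name is listed. It matches
# a line that starts with optional whitespace, then the field name,
# then whitespace and an opening angle bracket (the type marker).
has_tuning_option() {
    grep -Eq "^[[:space:]]*$1[[:space:]]+<"
}

# Usage against a live cluster (not executed here):
#   oc explain IngressController.spec.tuningOptions \
#     | has_tuning_option connectTimeout \
#     || echo "connectTimeout is not available"
```

On an affected cluster the usage line would print the "not available" message, which matches the actual results reported below.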
Actual results
There is no option to configure the server connect timeout:
% oc explain IngressController.spec.tuningOptions
KIND:     IngressController
VERSION:  operator.openshift.io/v1

RESOURCE: tuningOptions <Object>

DESCRIPTION:
     tuningOptions defines parameters for adjusting the performance of ingress controller pods. All fields are optional and will use their respective defaults if not set. See specific tuningOptions fields for more details. Setting fields within tuningOptions is generally not recommended. The default values are suitable for most configurations.

FIELDS:
   clientFinTimeout <string>
     clientFinTimeout defines how long a connection will be held open while waiting for the client response to the server/backend closing the connection. If unset, the default timeout is 1s

   clientTimeout <string>
     clientTimeout defines how long a connection will be held open while waiting for a client response. If unset, the default timeout is 30s

   headerBufferBytes <integer>
     headerBufferBytes describes how much memory should be reserved (in bytes) for IngressController connection sessions. Note that this value must be at least 16384 if HTTP/2 is enabled for the IngressController (https://tools.ietf.org/html/rfc7540). If this field is empty, the IngressController will use a default value of 32768 bytes. Setting this field is generally not recommended as headerBufferBytes values that are too small may break the IngressController and headerBufferBytes values that are too large could cause the IngressController to use significantly more memory than necessary.

   headerBufferMaxRewriteBytes <integer>
     headerBufferMaxRewriteBytes describes how much memory should be reserved (in bytes) from headerBufferBytes for HTTP header rewriting and appending for IngressController connection sessions. Note that incoming HTTP requests will be limited to (headerBufferBytes - headerBufferMaxRewriteBytes) bytes, meaning headerBufferBytes must be greater than headerBufferMaxRewriteBytes. If this field is empty, the IngressController will use a default value of 8192 bytes.
     Setting this field is generally not recommended as headerBufferMaxRewriteBytes values that are too small may break the IngressController and headerBufferMaxRewriteBytes values that are too large could cause the IngressController to use significantly more memory than necessary.

   healthCheckInterval <string>
     healthCheckInterval defines how long the router waits between two consecutive health checks on its configured backends. This value is applied globally as a default for all routes, but may be overridden per-route by the route annotation "router.openshift.io/haproxy.health.check.interval". Expects an unsigned duration string of decimal numbers, each with optional fraction and a unit suffix, eg "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs" U+00B5 or "μs" U+03BC), "ms", "s", "m", "h". Setting this to less than 5s can cause excess traffic due to too frequent TCP health checks and accompanying SYN packet storms. Alternatively, setting this too high can result in increased latency, due to backend servers that are no longer available, but haven't yet been detected as such. An empty or zero healthCheckInterval means no opinion and IngressController chooses a default, which is subject to change over time. Currently the default healthCheckInterval value is 5s. Currently the minimum allowed value is 1s and the maximum allowed value is 2147483647ms (24.85 days). Both are subject to change over time.

   maxConnections <integer>
     maxConnections defines the maximum number of simultaneous connections that can be established per HAProxy process. Increasing this value allows each ingress controller pod to handle more connections but at the cost of additional system resources being consumed. Permitted values are: empty, 0, -1, and the range 2000-2000000. If this field is empty or 0, the IngressController will use the default value of 50000, but the default is subject to change in future releases.
     If the value is -1 then HAProxy will dynamically compute a maximum value based on the available ulimits in the running container. Selecting -1 (i.e., auto) will result in a large value being computed (~520000 on OpenShift >=4.10 clusters) and therefore each HAProxy process will incur significant memory usage compared to the current default of 50000. Setting a value that is greater than the current operating system limit will prevent the HAProxy process from starting. If you choose a discrete value (e.g., 750000) and the router pod is migrated to a new node, there's no guarantee that that new node has identical ulimits configured. In such a scenario the pod would fail to start. If you have nodes with different ulimits configured (e.g., different tuned profiles) and you choose a discrete value then the guidance is to use -1 and let the value be computed dynamically at runtime. You can monitor memory usage for router containers with the following metric: 'container_memory_working_set_bytes{container="router",namespace="openshift-ingress"}'. You can monitor memory usage of individual HAProxy processes in router containers with the following metric: 'container_memory_working_set_bytes{container="router",namespace="openshift-ingress"}/container_processes{container="router",namespace="openshift-ingress"}'.

   reloadInterval <string>
     reloadInterval defines the minimum interval at which the router is allowed to reload to accept new changes. Increasing this value can prevent the accumulation of HAProxy processes, depending on the scenario. Increasing this interval can also lessen load imbalance on a backend's servers when using the roundrobin balancing algorithm. Alternatively, decreasing this value may decrease latency since updates to HAProxy's configuration can take effect more quickly. The value must be a time duration value; see <https://pkg.go.dev/time#ParseDuration>. Currently, the minimum value allowed is 1s, and the maximum allowed value is 120s.
     Minimum and maximum allowed values may change in future versions of OpenShift. Note that if a duration outside of these bounds is provided, the value of reloadInterval will be capped/floored and not rejected (e.g. a duration of over 120s will be capped to 120s; the IngressController will not reject and replace this disallowed value with the default). A zero value for reloadInterval tells the IngressController to choose the default, which is currently 5s and subject to change without notice. This field expects an unsigned duration string of decimal numbers, each with optional fraction and a unit suffix, e.g. "300ms", "1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs" U+00B5 or "μs" U+03BC), "ms", "s", "m", "h". Note: Setting a value significantly larger than the default of 5s can cause latency in observing updates to routes and their endpoints. HAProxy's configuration will be reloaded less frequently, and newly created routes will not be served until the subsequent reload.

   serverFinTimeout <string>
     serverFinTimeout defines how long a connection will be held open while waiting for the server/backend response to the client closing the connection. If unset, the default timeout is 1s

   serverTimeout <string>
     serverTimeout defines how long a connection will be held open while waiting for a server/backend response. If unset, the default timeout is 30s

   threadCount <integer>
     threadCount defines the number of threads created per HAProxy process. Creating more threads allows each ingress controller pod to handle more connections, at the cost of more system resources being used. HAProxy currently supports up to 64 threads. If this field is empty, the IngressController will use the default value. The current default is 4 threads, but this may change in future releases. Setting this field is generally not recommended.
     Increasing the number of HAProxy threads allows ingress controller pods to utilize more CPU time under load, potentially starving other pods if set too high. Reducing the number of threads may cause the ingress controller to perform poorly.

   tlsInspectDelay <string>
     tlsInspectDelay defines how long the router can hold data to find a matching route. Setting this too short can cause the router to fall back to the default certificate for edge-terminated or reencrypt routes even when a better matching certificate could be used. If unset, the default inspect delay is 5s

   tunnelTimeout <string>
     tunnelTimeout defines how long a tunnel connection (including websockets) will be held open while the tunnel is idle. If unset, the default timeout is 1h

% oc version
Client Version: 4.13.0-0.ci-2022-11-11-144318
Kustomize Version: v4.5.7
Server Version: 4.12.0-0.nightly-2024-02-06-121927
Kubernetes Version: v1.25.16+6df2177
%
Expected results
spec.tuningOptions should have an option to configure the connect timeout. For example:
   connectTimeout <string>
     connectTimeout defines how long the router will wait for a response when establishing a connection to a backend server. If unset, the default timeout is 5s.
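If the field were added as proposed, setting it could look like the sketch below. It assumes the field is named connectTimeout under spec.tuningOptions, exactly as in the example above, and that the default IngressController in the openshift-ingress-operator namespace is the target. The helper connect_timeout_patch is hypothetical and only builds the JSON merge patch; the oc invocation is shown as a comment because it requires a cluster that has the field.

```shell
#!/bin/sh
# Hypothetical helper: builds a JSON merge patch that sets the proposed
# spec.tuningOptions.connectTimeout field to the given duration string.
# The argument is inserted as-is, so it must be a plain duration such
# as "10s" with no quotes or backslashes.
connect_timeout_patch() {
    printf '{"spec":{"tuningOptions":{"connectTimeout":"%s"}}}' "$1"
}

# Usage, assuming the field exists on the cluster (not executed here):
#   oc -n openshift-ingress-operator patch ingresscontroller/default \
#     --type=merge -p "$(connect_timeout_patch 10s)"
```

A merge patch is used here so that the other tuningOptions fields keep their current values.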
- blocks: OCPBUGS-36555 [release-4.14] Allow configuring router default connect timeout on an ingress controller (Closed)
- is cloned by: OCPBUGS-36555 [release-4.14] Allow configuring router default connect timeout on an ingress controller (Closed)
- links to: RHSA-2024:4699 OpenShift Container Platform 4.15.z security update