- Bug
- Resolution: Duplicate
- Major
- 4.20.0
- Moderate
- Rejected
Description of problem:
The router pod restarts or reloads under heavy traffic (many HTTP requests are sent, but the server side drops the HTTP 200 OK responses, so many HTTP connections stay alive).
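The failure mode can be sketched outside the cluster: a server that completes the TCP handshake but never sends its response leaves the client blocked until its timeout fires, which is what hey later reports as "Client.Timeout exceeded while awaiting headers". A minimal local sketch (hypothetical port and timeout, not the reporter's setup):

```python
import socket
import threading

# A server that accepts connections and reads requests but never replies,
# mimicking the apache pods after the iptables rule drops their 200 OK packets.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))           # hypothetical local port, not the route
srv.listen(64)
port = srv.getsockname()[1]
held = []                            # keep accepted sockets open ("alive" connections)

def accept_loop():
    while True:
        try:
            conn, _ = srv.accept()
            held.append(conn)        # never send a response
        except OSError:
            return

threading.Thread(target=accept_loop, daemon=True).start()

cli = socket.create_connection(("127.0.0.1", port), timeout=0.5)
cli.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
try:
    cli.recv(4096)
    outcome = "got a response"
except socket.timeout:
    outcome = "timed out waiting for headers"  # hey's Client.Timeout case
print(outcome)
```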
Version-Release number of selected component (if applicable):
4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest
How reproducible:
100%
Steps to Reproduce:
1.
% oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest True False 46m Cluster version is 4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest
2. Update .spec.replicas of the default ingresscontroller from 2 to 1
% oc -n openshift-ingress get pods
NAME READY STATUS RESTARTS AGE
router-default-cbdc59d54-cdkrq 1/1 Running 0 49m
% oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=10
I1211 11:57:48.287077 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.287949 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.288257 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.288764 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.289089 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.289384 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.289754 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.290149 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.290529 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.333172 1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
%
3. Create two deployments, an unsecured service, and a route in the default namespace
% oc create -f server-deployment.yaml
deployment.apps/appach-server created
% oc create -f perf-tool-deployment.yaml
deployment.apps/perf-tool created
% oc create -f unsvc-apach.json
service/unsec-apach created
% oc expose svc unsec-apach
route.route.openshift.io/unsec-apach exposed
% cat server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: appach-server
  name: appach-server
spec:
  replicas: 10
  selector:
    matchLabels:
      name: appach-server
  strategy: {}
  template:
    metadata:
      labels:
        name: appach-server
    spec:
      containers:
      - image: quay.io/shudili/new-server
        name: appach-server
        securityContext:
          privileged: true
        lifecycle:
          postStart:
            exec:
              command: ["sh", "-c", "service httpd start; sleep 1"]
%
% cat perf-tool-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    name: perf-tool
  name: perf-tool
spec:
  replicas: 10
  selector:
    matchLabels:
      name: perf-tool
  strategy: {}
  template:
    metadata:
      labels:
        name: perf-tool
    spec:
      containers:
      - image: quay.io/shudili/perf-tool
        name: perf-tool
        command: ["/bin/bash", "-ce", "tail -f /dev/null"]
        securityContext:
          privileged: true
%
% cat unsvc-apach.json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "unsec-apach"
    },
    "spec": {
        "ports": [
            {
                "name": "unsec-apach",
                "protocol": "TCP",
                "port": 28080,
                "targetPort": 8080
            }
        ],
        "selector": {
            "name": "appach-server"
        }
    }
}
%
4. oc get pods
% oc get pods
NAME READY STATUS RESTARTS AGE
appach-server-79bf599dfb-2m958 1/1 Running 0 2m32s
appach-server-79bf599dfb-5g9h8 1/1 Running 0 2m32s
appach-server-79bf599dfb-6z7m9 1/1 Running 0 2m32s
appach-server-79bf599dfb-88p2v 1/1 Running 0 2m32s
appach-server-79bf599dfb-fkzd5 1/1 Running 0 2m32s
appach-server-79bf599dfb-gskmw 1/1 Running 0 2m32s
appach-server-79bf599dfb-kfgr8 1/1 Running 0 2m32s
appach-server-79bf599dfb-m5stq 1/1 Running 0 2m32s
appach-server-79bf599dfb-np9l4 1/1 Running 0 2m32s
appach-server-79bf599dfb-tkffn 1/1 Running 0 2m32s
perf-tool-555cd469d7-7s2kv 1/1 Running 0 2m15s
perf-tool-555cd469d7-b2pfl 1/1 Running 0 2m15s
perf-tool-555cd469d7-bln5m 1/1 Running 0 2m15s
perf-tool-555cd469d7-drx9j 1/1 Running 0 2m15s
perf-tool-555cd469d7-h7kk8 1/1 Running 0 2m16s
perf-tool-555cd469d7-l6svr 1/1 Running 0 2m16s
perf-tool-555cd469d7-pjx54 1/1 Running 0 2m15s
perf-tool-555cd469d7-sgt69 1/1 Running 0 2m15s
perf-tool-555cd469d7-st754 1/1 Running 0 2m16s
perf-tool-555cd469d7-vq87w 1/1 Running 0 2m15s
%
5. oc get route
% oc get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
unsec-apach unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org unsec-apach unsec-apach None
6. oc rsh into each of the 10 appach-server pods and restart the httpd service
% oc rsh appach-server-79bf599dfb-2m958
sh-4.4#
sh-4.4# service httpd restart
7. oc rsh into the 10 perf-tool pods; in one pod, use hey to send HTTP requests
% oc rsh perf-tool-555cd469d7-7s2kv
sh-4.4# hey -n 20 -c 20 http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org
Summary:
Total: 0.0208 secs
Slowest: 0.0206 secs
Fastest: 0.0176 secs
Average: 0.0191 secs
Requests/sec: 961.1960
Total data: 280 bytes
Size/request: 14 bytes
Response time histogram:
0.018 [1] |■■■■■■■■■■
0.018 [3] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.018 [0] |
0.019 [0] |
0.019 [4] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.019 [0] |
0.019 [4] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.020 [2] |■■■■■■■■■■■■■■■■■■■■
0.020 [4] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.020 [0] |
0.021 [2] |■■■■■■■■■■■■■■■■■■■■
Latency distribution:
10% in 0.0177 secs
25% in 0.0186 secs
50% in 0.0193 secs
75% in 0.0198 secs
90% in 0.0205 secs
95% in 0.0206 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0155 secs, 0.0176 secs, 0.0206 secs
DNS-lookup: 0.0115 secs, 0.0110 secs, 0.0123 secs
req write: 0.0001 secs, 0.0000 secs, 0.0002 secs
resp wait: 0.0034 secs, 0.0020 secs, 0.0046 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0002 secs
Status code distribution:
[200] 20 responses
8. In the 10 appach-server pods, use iptables to drop the outgoing HTTP 200 OK responses
% oc rsh appach-server-79bf599dfb-2m958
sh-4.4#
sh-4.4# cat /var/www/html/index.html
It is a test!
sh-4.4#
sh-4.4# iptables -A OUTPUT -p tcp -m string --algo bm --string "test" -j DROP
sh-4.4#
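The rule above uses the iptables string match (Boyer-Moore search over packet payload) to drop any outgoing TCP segment containing the bytes "test". Because the page body is "It is a test!", the 200 OK response never leaves the pod, while payload-less TCP control segments (the handshake) still pass, leaving connections established but unanswered. A sketch of the match logic (the response headers are illustrative, not a capture):

```python
# The iptables rule drops any outgoing TCP packet whose payload contains
# b"test". The Apache response below (headers plus the index.html body
# "It is a test!") would match and be dropped; the SYN-ACK/ACK segments of
# the handshake carry no payload and are forwarded normally.
response = (b"HTTP/1.1 200 OK\r\n"
            b"Content-Type: text/html; charset=UTF-8\r\n"   # assumed header
            b"\r\n"
            b"It is a test!\n")
handshake_segment = b""              # TCP control segments have no payload

matches = b"test" in response        # True  -> packet dropped by the rule
passes = b"test" in handshake_segment  # False -> segment is forwarded
print(matches, passes)
```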
9. In the 10 perf-tool pods, use hey to send many HTTP requests
sh-4.4# hey -n 50000 -c 30000 http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org
Summary:
Total: 40.2897 secs
Slowest: 23.7704 secs
Fastest: 12.3062 secs
Average: 20.7963 secs
Requests/sec: 744.6078
Total data: 222970 bytes
Size/request: 110 bytes
Response time histogram:
12.306 [1] |
13.453 [125] |■■■■
14.599 [0] |
15.745 [0] |
16.892 [0] |
18.038 [0] |
19.185 [0] |
20.331 [0] |
21.478 [1330] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
22.624 [538] |■■■■■■■■■■■■■■■■
23.770 [33] |■
Latency distribution:
10% in 20.8619 secs
25% in 21.0100 secs
50% in 21.3103 secs
75% in 21.5177 secs
90% in 21.6391 secs
95% in 21.7478 secs
99% in 23.2159 secs
Details (average, fastest, slowest):
DNS+dialup: 14.8466 secs, 12.3062 secs, 23.7704 secs
DNS-lookup: 0.5490 secs, 0.0287 secs, 3.7908 secs
req write: 0.4469 secs, 0.0000 secs, 16.3936 secs
resp wait: 20.3368 secs, 0.0033 secs, 23.0600 secs
resp read: 0.0143 secs, 0.0000 secs, 2.2631 secs
Status code distribution:
[408] 2027 responses
Error distribution:
[10] Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": EOF
[25434] Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
[2507] Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": dial tcp 136.119.157.1:80: i/o timeout (Client.Timeout exceeded while awaiting headers)
[4] Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": http: server closed idle connection
[17] Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": http: server closed idle connection (Client.Timeout exceeded while awaiting headers)
[1] Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": read tcp 10.131.0.26:46255->136.119.157.1:80: read: connection reset by peer (Client.Timeout exceeded while awaiting headers)
sh-4.4#
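One detail worth noting in the output above: the 2027 responses with status 408 plus all the error counts sum to exactly 30000, matching hey's -c 30000 concurrency, so every in-flight connection ended in a timeout, a reset, or a 408 from the router rather than a 200. A quick check of the arithmetic:

```python
# Counts copied from the hey output in step 9.
responses_408 = 2027
errors = {
    "EOF": 10,
    "context deadline exceeded": 25434,
    "dial tcp i/o timeout": 2507,
    "server closed idle connection": 4,
    "server closed idle connection (timeout)": 17,
    "connection reset by peer": 1,
}
total = responses_408 + sum(errors.values())
print(total)  # 30000, i.e. hey's -c concurrency
```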
10. the router pod restarted
% oc -n openshift-ingress get pods
NAME READY STATUS RESTARTS AGE
router-default-cbdc59d54-cdkrq 1/1 Running 0 56m
% oc -n openshift-ingress get pods
NAME READY STATUS RESTARTS AGE
router-default-cbdc59d54-cdkrq 1/1 Running 1 (3s ago) 56m
%
11. check the router pod logs
% oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=20 -p
I1211 11:57:48.288764 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.289089 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.289384 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.289754 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.290149 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.290529 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 11:57:48.333172 1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1211 12:12:47.274487 1 healthz.go:255] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1211 12:12:48.351573 1 healthz.go:255] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1211 12:12:57.274953 1 healthz.go:255] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1211 12:12:58.353417 1 healthz.go:255] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1211 12:13:07.294512 1 healthz.go:255] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1211 12:13:07.730274 1 template.go:841] "msg"="Shutdown requested, waiting 45s for new connections to cease" "logger"="router"
I1211 12:13:16.329057 1 healthz.go:255] process-running check failed: healthz
[-]process-running failed: process is terminating
%
% oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=30
I1211 12:13:19.742142 1 template.go:560] "msg"="starting router" "logger"="router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: db8d384266051ef06b67883aaa83674bc6c9f1ae\nversionFromGit: 4.0.0-583-gdb8d3842\ngitTreeState: clean\nbuildDate: 2025-12-08T23:47:14Z\n"
I1211 12:13:19.744647 1 metrics.go:156] "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936" "logger"="metrics"
I1211 12:13:19.749055 1 router.go:214] "msg"="creating a new template router" "logger"="template" "writeDir"="/var/lib/haproxy"
I1211 12:13:19.749147 1 router.go:298] "msg"="router will coalesce reloads within an interval of each other" "interval"="5s" "logger"="template"
I1211 12:13:19.749628 1 router.go:368] "msg"="watching for changes" "logger"="template" "path"="/etc/pki/tls/private"
I1211 12:13:19.749698 1 router.go:283] "msg"="router is including routes in all namespaces" "logger"="router"
I1211 12:13:19.765386 1 reflector.go:359] Caches populated for *v1.EndpointSlice from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
I1211 12:13:19.767727 1 reflector.go:359] Caches populated for *v1.Service from github.com/openshift/router/pkg/router/template/service_lookup.go:33
I1211 12:13:19.771506 1 reflector.go:359] Caches populated for *v1.Route from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
I1211 12:13:19.857273 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.858058 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.858508 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.858955 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.859235 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.859489 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.859791 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.860080 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
I1211 12:13:19.860429 1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
E1211 12:13:19.860855 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1211 12:13:19.899666 1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
%
Actual results:
The router pod restarted.
Expected results:
The router pod should not restart or reload.
Additional info:
Depends on: OCPBUGS-67161 Router pods restart upon hitting maxconn (New)