OpenShift Bugs / OCPBUGS-67219

the router pod restarted in the stress traffic test

    • Bug
    • Resolution: Duplicate
    • Major
    • 4.20.0
    • Networking / router
    • Moderate
    • Rejected

      Description of problem:

          The router pod restarts or reloads under heavy traffic: many HTTP requests are sent, but on the server side the HTTP 200 OK responses are dropped, so many live HTTP connections stay open.

      Version-Release number of selected component (if applicable):

          4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest

      How reproducible:

          100%

      Steps to Reproduce:

      1. Check the cluster version:
      % oc get clusterversion
      NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest   True        False         46m     Cluster version is 4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest    
      
      2. Update the .spec.replicas of the default ingresscontroller from 2 to 1:
      
      % oc -n openshift-ingress get pods
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-cbdc59d54-cdkrq   1/1     Running   0          49m
      
      % oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=10
      I1211 11:57:48.287077       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.287949       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.288257       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.288764       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289089       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289384       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289754       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290149       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290529       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.333172       1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
      % 
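Step 2 does not show the command used to change the replica count; one way to do it (a sketch, assuming cluster-admin access and the default IngressController in the openshift-ingress-operator namespace) is a merge patch:

```shell
# Sketch for step 2: scale the default ingresscontroller down to 1 replica.
# Assumes cluster-admin access; this command is not taken from the report itself.
oc -n openshift-ingress-operator patch ingresscontroller/default \
  --type=merge -p '{"spec":{"replicas":1}}'
```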
       
      
      3. Create two deployments, an insecure service, and a route in the default namespace:
      % oc create -f server-deployment.yaml 
      deployment.apps/appach-server created
       % oc create -f perf-tool-deployment.yaml 
      deployment.apps/perf-tool created
       % oc create -f unsvc-apach.json
      service/unsec-apach created
      % oc expose svc unsec-apach
      route.route.openshift.io/unsec-apach exposed
      
      % cat server-deployment.yaml 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          name: appach-server
        name: appach-server
      spec:
        replicas: 10
        selector:
          matchLabels:
            name: appach-server
        strategy: {}
        template:
          metadata:
            labels:
              name: appach-server
          spec:
            containers:
            - image: quay.io/shudili/new-server
              name: appach-server
              securityContext:
                privileged: true
              lifecycle:
                postStart:
                  exec:
                    command: ["sh", "-c", "service httpd start; sleep 1"]
      %
      
      % cat perf-tool-deployment.yaml 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          name: perf-tool
        name: perf-tool
      spec:
        replicas: 10
        selector:
          matchLabels:
            name: perf-tool
        strategy: {}
        template:
          metadata:
            labels:
              name: perf-tool
          spec:
            containers:
            - image: quay.io/shudili/perf-tool
              name: perf-tool
              command: ["/bin/bash", "-ce", "tail -f /dev/null"]
              securityContext:
                privileged: true
      % 
      
       % cat unsvc-apach.json
      {
          "kind": "Service",
          "apiVersion": "v1",
          "metadata": {
              "name": "unsec-apach"
          },
          "spec": {
              "ports": [
                  {
                      "name": "unsec-apach",
                      "protocol": "TCP",
                      "port": 28080,
                      "targetPort": 8080
                  }
              ],
              "selector": {
                  "name": "appach-server"
              }
          }
      }
      
      
      % 
      
      4. Check that all server and perf-tool pods are running:
      % oc get pods
      NAME                             READY   STATUS    RESTARTS   AGE
      appach-server-79bf599dfb-2m958   1/1     Running   0          2m32s
      appach-server-79bf599dfb-5g9h8   1/1     Running   0          2m32s
      appach-server-79bf599dfb-6z7m9   1/1     Running   0          2m32s
      appach-server-79bf599dfb-88p2v   1/1     Running   0          2m32s
      appach-server-79bf599dfb-fkzd5   1/1     Running   0          2m32s
      appach-server-79bf599dfb-gskmw   1/1     Running   0          2m32s
      appach-server-79bf599dfb-kfgr8   1/1     Running   0          2m32s
      appach-server-79bf599dfb-m5stq   1/1     Running   0          2m32s
      appach-server-79bf599dfb-np9l4   1/1     Running   0          2m32s
      appach-server-79bf599dfb-tkffn   1/1     Running   0          2m32s
      perf-tool-555cd469d7-7s2kv       1/1     Running   0          2m15s
      perf-tool-555cd469d7-b2pfl       1/1     Running   0          2m15s
      perf-tool-555cd469d7-bln5m       1/1     Running   0          2m15s
      perf-tool-555cd469d7-drx9j       1/1     Running   0          2m15s
      perf-tool-555cd469d7-h7kk8       1/1     Running   0          2m16s
      perf-tool-555cd469d7-l6svr       1/1     Running   0          2m16s
      perf-tool-555cd469d7-pjx54       1/1     Running   0          2m15s
      perf-tool-555cd469d7-sgt69       1/1     Running   0          2m15s
      perf-tool-555cd469d7-st754       1/1     Running   0          2m16s
      perf-tool-555cd469d7-vq87w       1/1     Running   0          2m15s
      %     
      
      5. Check the route:
      % oc get route
      NAME          HOST/PORT                                                             PATH   SERVICES      PORT          TERMINATION   WILDCARD
      unsec-apach   unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org          unsec-apach   unsec-apach                 None
      
      6. Use oc rsh to enter each of the 10 appach-server pods and restart the httpd service (one pod shown):
      % oc rsh appach-server-79bf599dfb-2m958
      sh-4.4# 
      sh-4.4# service httpd restart
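Restarting httpd in all 10 pods one by one is tedious; a loop over the deployment's label selector is one way to script it (a sketch, not from the report; the name=appach-server label comes from server-deployment.yaml):

```shell
# Sketch for step 6: restart httpd in every appach-server pod.
# Uses the name=appach-server label from server-deployment.yaml.
for pod in $(oc get pods -l name=appach-server -o name); do
  oc rsh "$pod" service httpd restart
done
```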
      
      7. Use oc rsh to enter the perf-tool pods, and first send a small batch of HTTP requests with hey from one pod to verify connectivity:
      % oc rsh perf-tool-555cd469d7-7s2kv 
      sh-4.4# hey -n 20 -c 20 http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org
      
      
      Summary:
        Total:	0.0208 secs
        Slowest:	0.0206 secs
        Fastest:	0.0176 secs
        Average:	0.0191 secs
        Requests/sec:	961.1960
        
        Total data:	280 bytes
        Size/request:	14 bytes
      
      
      Response time histogram:
        0.018 [1]	|■■■■■■■■■■
        0.018 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.018 [0]	|
        0.019 [0]	|
        0.019 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.019 [0]	|
        0.019 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.020 [2]	|■■■■■■■■■■■■■■■■■■■■
        0.020 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.020 [0]	|
        0.021 [2]	|■■■■■■■■■■■■■■■■■■■■
      
      
      
      
      Latency distribution:
        10% in 0.0177 secs
        25% in 0.0186 secs
        50% in 0.0193 secs
        75% in 0.0198 secs
        90% in 0.0205 secs
        95% in 0.0206 secs
        0% in 0.0000 secs
      
      
      Details (average, fastest, slowest):
        DNS+dialup:	0.0155 secs, 0.0176 secs, 0.0206 secs
        DNS-lookup:	0.0115 secs, 0.0110 secs, 0.0123 secs
        req write:	0.0001 secs, 0.0000 secs, 0.0002 secs
        resp wait:	0.0034 secs, 0.0020 secs, 0.0046 secs
        resp read:	0.0001 secs, 0.0000 secs, 0.0002 secs
      
      
      Status code distribution:
        [200]	20 responses
      
      8. In each of the 10 appach-server pods, use iptables to drop the outgoing HTTP 200 OK responses (one pod shown):
       % oc rsh appach-server-79bf599dfb-2m958
      sh-4.4#  
      sh-4.4# cat /var/www/html/index.html 
      It is a test!
      sh-4.4# 
      sh-4.4# iptables -A OUTPUT -p tcp  -m string --algo bm  --string "test" -j DROP
      sh-4.4#
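To confirm the rule took effect, and to undo it after the test, the matching iptables list/delete forms can be used (a sketch, not part of the original report; run inside the privileged appach-server pod):

```shell
# Sketch: verify and later remove the DROP rule added in step 8.
# Run inside the appach-server pod (requires the privileged security context).
iptables -L OUTPUT -n --line-numbers   # list OUTPUT rules; confirm the string-match DROP is present
# Delete the rule when the test is finished (same match spec as the -A command):
iptables -D OUTPUT -p tcp -m string --algo bm --string "test" -j DROP
```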
      
      9. In each of the 10 perf-tool pods, use hey to send a large number of HTTP requests:
      sh-4.4# hey -n 50000 -c 30000 http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org
      
      
      Summary:
        Total:	40.2897 secs
        Slowest:	23.7704 secs
        Fastest:	12.3062 secs
        Average:	20.7963 secs
        Requests/sec:	744.6078
        
        Total data:	222970 bytes
        Size/request:	110 bytes
      
      
      Response time histogram:
        12.306 [1]	|
        13.453 [125]	|■■■■
        14.599 [0]	|
        15.745 [0]	|
        16.892 [0]	|
        18.038 [0]	|
        19.185 [0]	|
        20.331 [0]	|
        21.478 [1330]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        22.624 [538]	|■■■■■■■■■■■■■■■■
        23.770 [33]	|■
      
      
      
      
      Latency distribution:
        10% in 20.8619 secs
        25% in 21.0100 secs
        50% in 21.3103 secs
        75% in 21.5177 secs
        90% in 21.6391 secs
        95% in 21.7478 secs
        99% in 23.2159 secs
      
      
      Details (average, fastest, slowest):
        DNS+dialup:	14.8466 secs, 12.3062 secs, 23.7704 secs
        DNS-lookup:	0.5490 secs, 0.0287 secs, 3.7908 secs
        req write:	0.4469 secs, 0.0000 secs, 16.3936 secs
        resp wait:	20.3368 secs, 0.0033 secs, 23.0600 secs
        resp read:	0.0143 secs, 0.0000 secs, 2.2631 secs
      
      
      Status code distribution:
        [408]	2027 responses
      
      
      Error distribution:
        [10]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": EOF
        [25434]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        [2507]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": dial tcp 136.119.157.1:80: i/o timeout (Client.Timeout exceeded while awaiting headers)
        [4]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": http: server closed idle connection
        [17]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": http: server closed idle connection (Client.Timeout exceeded while awaiting headers)
        [1]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": read tcp 10.131.0.26:46255->136.119.157.1:80: read: connection reset by peer (Client.Timeout exceeded while awaiting headers)
      
      
      sh-4.4#
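The run above is from a single pod; to drive all 10 perf-tool pods at once, a loop like the following could be used (a sketch; the name=perf-tool label comes from perf-tool-deployment.yaml and the URL is the route host from step 5 — adjust both for your cluster):

```shell
# Sketch for step 9: launch hey concurrently in every perf-tool pod.
URL="http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org"
for pod in $(oc get pods -l name=perf-tool -o name); do
  oc rsh "$pod" hey -n 50000 -c 30000 "$URL" &   # run each pod's load in the background
done
wait   # block until every hey run has finished
```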
      
      10. Observe that the router pod restarted:
      % oc -n openshift-ingress get pods
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-cbdc59d54-cdkrq   1/1     Running   0          56m
      % oc -n openshift-ingress get pods
      NAME                             READY   STATUS    RESTARTS     AGE
      router-default-cbdc59d54-cdkrq   1/1     Running   1 (3s ago)   56m
      % 
      
      11. Check the router pod logs (the previous container with -p, then the current one):
      % oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=20 -p 
      I1211 11:57:48.288764       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289089       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289384       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289754       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290149       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290529       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.333172       1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
      I1211 12:12:47.274487       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:12:48.351573       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:12:57.274953       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:12:58.353417       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:13:07.294512       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:13:07.730274       1 template.go:841] "msg"="Shutdown requested, waiting 45s for new connections to cease" "logger"="router"
      I1211 12:13:16.329057       1 healthz.go:255] process-running check failed: healthz
      [-]process-running failed: process is terminating
      %
      
      % oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=30
      I1211 12:13:19.742142       1 template.go:560] "msg"="starting router" "logger"="router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: db8d384266051ef06b67883aaa83674bc6c9f1ae\nversionFromGit: 4.0.0-583-gdb8d3842\ngitTreeState: clean\nbuildDate: 2025-12-08T23:47:14Z\n"
      I1211 12:13:19.744647       1 metrics.go:156] "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936" "logger"="metrics"
      I1211 12:13:19.749055       1 router.go:214] "msg"="creating a new template router" "logger"="template" "writeDir"="/var/lib/haproxy"
      I1211 12:13:19.749147       1 router.go:298] "msg"="router will coalesce reloads within an interval of each other" "interval"="5s" "logger"="template"
      I1211 12:13:19.749628       1 router.go:368] "msg"="watching for changes" "logger"="template" "path"="/etc/pki/tls/private"
      I1211 12:13:19.749698       1 router.go:283] "msg"="router is including routes in all namespaces" "logger"="router"
      I1211 12:13:19.765386       1 reflector.go:359] Caches populated for *v1.EndpointSlice from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
      I1211 12:13:19.767727       1 reflector.go:359] Caches populated for *v1.Service from github.com/openshift/router/pkg/router/template/service_lookup.go:33
      I1211 12:13:19.771506       1 reflector.go:359] Caches populated for *v1.Route from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
      I1211 12:13:19.857273       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.858058       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.858508       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.858955       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.859235       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.859489       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.859791       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.860080       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.860429       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      E1211 12:13:19.860855       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1211 12:13:19.899666       1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
      %
       

      Actual results:

          The router pod restarted.

      Expected results:

          The router pod should not restart or reload under heavy traffic.

      Additional info:

          

              NID Team Bot (nid-team-bot)
              Shudi Li (shudili@redhat.com)
              Votes: 0
              Watchers: 4