OpenShift Bugs / OCPBUGS-67219

the router pod restarted in the stress traffic test

    • Bug
    • Resolution: Duplicate
    • Major
    • 4.20.0
    • Networking / router
    • Moderate
    • Rejected

      Description of problem:

          The router pod restarts or reloads under heavy traffic: many HTTP requests are sent, but on the server side the HTTP 200 OK responses are dropped, so many live HTTP connections stay open.

      Version-Release number of selected component (if applicable):

          4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest

      How reproducible:

          100%

      Steps to Reproduce:

      1. Check the cluster version:
      % oc get clusterversion
      NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest   True        False         46m     Cluster version is 4.20.0-0-2025-12-11-105319-test-ci-ln-92w7c4b-latest    
      
      2. Update the .spec.replicas of the default ingresscontroller from 2 to 1:
      
      % oc -n openshift-ingress get pods
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-cbdc59d54-cdkrq   1/1     Running   0          49m
      
      % oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=10
      I1211 11:57:48.287077       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.287949       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.288257       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.288764       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289089       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289384       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289754       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290149       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290529       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.333172       1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
      % 
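Step 2 does not show the command used to change the replica count; one way to do it (a sketch, assuming cluster-admin access and the default IngressController in the openshift-ingress-operator namespace) is a merge patch:

```shell
# Sketch for step 2: scale the default ingresscontroller down to 1 replica.
# Assumes cluster-admin access; this command is not taken from the report itself.
oc -n openshift-ingress-operator patch ingresscontroller/default \
  --type=merge -p '{"spec":{"replicas":1}}'
```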
       
      
      3. Create two deployments, an insecure service, and a route in the default namespace:
      % oc create -f server-deployment.yaml 
      deployment.apps/appach-server created
       % oc create -f perf-tool-deployment.yaml 
      deployment.apps/perf-tool created
       % oc create -f unsvc-apach.json
      service/unsec-apach created
      % oc expose svc unsec-apach
      route.route.openshift.io/unsec-apach exposed
      
      % cat server-deployment.yaml 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          name: appach-server
        name: appach-server
      spec:
        replicas: 10
        selector:
          matchLabels:
            name: appach-server
        strategy: {}
        template:
          metadata:
            labels:
              name: appach-server
          spec:
            containers:
            - image: quay.io/shudili/new-server
              name: appach-server
              securityContext:
                privileged: true
              lifecycle:
                postStart:
                  exec:
                    command: ["sh", "-c", "service httpd start; sleep 1"]
      %
      
      % cat perf-tool-deployment.yaml 
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          name: perf-tool
        name: perf-tool
      spec:
        replicas: 10
        selector:
          matchLabels:
            name: perf-tool
        strategy: {}
        template:
          metadata:
            labels:
              name: perf-tool
          spec:
            containers:
            - image: quay.io/shudili/perf-tool
              name: perf-tool
              command: ["/bin/bash", "-ce", "tail -f /dev/null"]
              securityContext:
                privileged: true
      % 
      
       % cat unsvc-apach.json
      {
          "kind": "Service",
          "apiVersion": "v1",
          "metadata": {
              "name": "unsec-apach"
          },
          "spec": {
              "ports": [
                  {
                      "name": "unsec-apach",
                      "protocol": "TCP",
                      "port": 28080,
                      "targetPort": 8080
                  }
              ],
              "selector": {
                  "name": "appach-server"
              }
          }
      }
      
      
      % 
      
      4. Check that all server and perf-tool pods are running:
      % oc get pods
      NAME                             READY   STATUS    RESTARTS   AGE
      appach-server-79bf599dfb-2m958   1/1     Running   0          2m32s
      appach-server-79bf599dfb-5g9h8   1/1     Running   0          2m32s
      appach-server-79bf599dfb-6z7m9   1/1     Running   0          2m32s
      appach-server-79bf599dfb-88p2v   1/1     Running   0          2m32s
      appach-server-79bf599dfb-fkzd5   1/1     Running   0          2m32s
      appach-server-79bf599dfb-gskmw   1/1     Running   0          2m32s
      appach-server-79bf599dfb-kfgr8   1/1     Running   0          2m32s
      appach-server-79bf599dfb-m5stq   1/1     Running   0          2m32s
      appach-server-79bf599dfb-np9l4   1/1     Running   0          2m32s
      appach-server-79bf599dfb-tkffn   1/1     Running   0          2m32s
      perf-tool-555cd469d7-7s2kv       1/1     Running   0          2m15s
      perf-tool-555cd469d7-b2pfl       1/1     Running   0          2m15s
      perf-tool-555cd469d7-bln5m       1/1     Running   0          2m15s
      perf-tool-555cd469d7-drx9j       1/1     Running   0          2m15s
      perf-tool-555cd469d7-h7kk8       1/1     Running   0          2m16s
      perf-tool-555cd469d7-l6svr       1/1     Running   0          2m16s
      perf-tool-555cd469d7-pjx54       1/1     Running   0          2m15s
      perf-tool-555cd469d7-sgt69       1/1     Running   0          2m15s
      perf-tool-555cd469d7-st754       1/1     Running   0          2m16s
      perf-tool-555cd469d7-vq87w       1/1     Running   0          2m15s
      %     
      
      5. Check the route:
      % oc get route
      NAME          HOST/PORT                                                             PATH   SERVICES      PORT          TERMINATION   WILDCARD
      unsec-apach   unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org          unsec-apach   unsec-apach                 None
      
      6. Use oc rsh to enter each of the 10 appach-server pods and restart the httpd service (one pod shown):
      % oc rsh appach-server-79bf599dfb-2m958
      sh-4.4# 
      sh-4.4# service httpd restart
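Restarting httpd in all 10 pods one by one is tedious; a loop over the deployment's label selector is one way to script it (a sketch, not from the report; the name=appach-server label comes from server-deployment.yaml):

```shell
# Sketch for step 6: restart httpd in every appach-server pod.
# Uses the name=appach-server label from server-deployment.yaml.
for pod in $(oc get pods -l name=appach-server -o name); do
  oc rsh "$pod" service httpd restart
done
```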
      
      7. Use oc rsh to enter the perf-tool pods, and first send a small batch of HTTP requests with hey from one pod to verify connectivity:
      % oc rsh perf-tool-555cd469d7-7s2kv 
      sh-4.4# hey -n 20 -c 20 http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org
      
      
      Summary:
        Total:	0.0208 secs
        Slowest:	0.0206 secs
        Fastest:	0.0176 secs
        Average:	0.0191 secs
        Requests/sec:	961.1960
        
        Total data:	280 bytes
        Size/request:	14 bytes
      
      
      Response time histogram:
        0.018 [1]	|■■■■■■■■■■
        0.018 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.018 [0]	|
        0.019 [0]	|
        0.019 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.019 [0]	|
        0.019 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.020 [2]	|■■■■■■■■■■■■■■■■■■■■
        0.020 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        0.020 [0]	|
        0.021 [2]	|■■■■■■■■■■■■■■■■■■■■
      
      
      
      
      Latency distribution:
        10% in 0.0177 secs
        25% in 0.0186 secs
        50% in 0.0193 secs
        75% in 0.0198 secs
        90% in 0.0205 secs
        95% in 0.0206 secs
        0% in 0.0000 secs
      
      
      Details (average, fastest, slowest):
        DNS+dialup:	0.0155 secs, 0.0176 secs, 0.0206 secs
        DNS-lookup:	0.0115 secs, 0.0110 secs, 0.0123 secs
        req write:	0.0001 secs, 0.0000 secs, 0.0002 secs
        resp wait:	0.0034 secs, 0.0020 secs, 0.0046 secs
        resp read:	0.0001 secs, 0.0000 secs, 0.0002 secs
      
      
      Status code distribution:
        [200]	20 responses
      
      8. In each of the 10 appach-server pods, use iptables to drop the outgoing HTTP 200 OK responses (one pod shown):
       % oc rsh appach-server-79bf599dfb-2m958
      sh-4.4#  
      sh-4.4# cat /var/www/html/index.html 
      It is a test!
      sh-4.4# 
      sh-4.4# iptables -A OUTPUT -p tcp  -m string --algo bm  --string "test" -j DROP
      sh-4.4#
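To confirm the rule took effect, and to undo it after the test, the matching iptables list/delete forms can be used (a sketch, not part of the original report; run inside the privileged appach-server pod):

```shell
# Sketch: verify and later remove the DROP rule added in step 8.
# Run inside the appach-server pod (requires the privileged security context).
iptables -L OUTPUT -n --line-numbers   # list OUTPUT rules; confirm the string-match DROP is present
# Delete the rule when the test is finished (same match spec as the -A command):
iptables -D OUTPUT -p tcp -m string --algo bm --string "test" -j DROP
```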
      
      9. In each of the 10 perf-tool pods, use hey to send a large number of HTTP requests:
      sh-4.4# hey -n 50000 -c 30000 http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org
      
      
      Summary:
        Total:	40.2897 secs
        Slowest:	23.7704 secs
        Fastest:	12.3062 secs
        Average:	20.7963 secs
        Requests/sec:	744.6078
        
        Total data:	222970 bytes
        Size/request:	110 bytes
      
      
      Response time histogram:
        12.306 [1]	|
        13.453 [125]	|■■■■
        14.599 [0]	|
        15.745 [0]	|
        16.892 [0]	|
        18.038 [0]	|
        19.185 [0]	|
        20.331 [0]	|
        21.478 [1330]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
        22.624 [538]	|■■■■■■■■■■■■■■■■
        23.770 [33]	|■
      
      
      
      
      Latency distribution:
        10% in 20.8619 secs
        25% in 21.0100 secs
        50% in 21.3103 secs
        75% in 21.5177 secs
        90% in 21.6391 secs
        95% in 21.7478 secs
        99% in 23.2159 secs
      
      
      Details (average, fastest, slowest):
        DNS+dialup:	14.8466 secs, 12.3062 secs, 23.7704 secs
        DNS-lookup:	0.5490 secs, 0.0287 secs, 3.7908 secs
        req write:	0.4469 secs, 0.0000 secs, 16.3936 secs
        resp wait:	20.3368 secs, 0.0033 secs, 23.0600 secs
        resp read:	0.0143 secs, 0.0000 secs, 2.2631 secs
      
      
      Status code distribution:
        [408]	2027 responses
      
      
      Error distribution:
        [10]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": EOF
        [25434]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        [2507]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": dial tcp 136.119.157.1:80: i/o timeout (Client.Timeout exceeded while awaiting headers)
        [4]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": http: server closed idle connection
        [17]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": http: server closed idle connection (Client.Timeout exceeded while awaiting headers)
        [1]	Get "http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org": read tcp 10.131.0.26:46255->136.119.157.1:80: read: connection reset by peer (Client.Timeout exceeded while awaiting headers)
      
      
      sh-4.4#
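The run above is from a single pod; to drive all 10 perf-tool pods at once, a loop like the following could be used (a sketch; the name=perf-tool label comes from perf-tool-deployment.yaml and the URL is the route host from step 5 — adjust both for your cluster):

```shell
# Sketch for step 9: launch hey concurrently in every perf-tool pod.
URL="http://unsec-apach-default.apps.ci-ln-92w7c4b-72292.gcp-2.ci.openshift.org"
for pod in $(oc get pods -l name=perf-tool -o name); do
  oc rsh "$pod" hey -n 50000 -c 30000 "$URL" &   # run each pod's load in the background
done
wait   # block until every hey run has finished
```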
      
      10. Observe that the router pod restarted:
      % oc -n openshift-ingress get pods
      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-cbdc59d54-cdkrq   1/1     Running   0          56m
      % oc -n openshift-ingress get pods
      NAME                             READY   STATUS    RESTARTS     AGE
      router-default-cbdc59d54-cdkrq   1/1     Running   1 (3s ago)   56m
      % 
      
      11. Check the router pod logs (the previous container with -p, then the current one):
      % oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=20 -p 
      I1211 11:57:48.288764       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289089       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289384       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.289754       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290149       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.290529       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 11:57:48.333172       1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
      I1211 12:12:47.274487       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:12:48.351573       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:12:57.274953       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:12:58.353417       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:13:07.294512       1 healthz.go:255] backend-http check failed: healthz
      [-]backend-http failed: backend reported failure
      I1211 12:13:07.730274       1 template.go:841] "msg"="Shutdown requested, waiting 45s for new connections to cease" "logger"="router"
      I1211 12:13:16.329057       1 healthz.go:255] process-running check failed: healthz
      [-]process-running failed: process is terminating
      %
      
      % oc -n openshift-ingress logs router-default-cbdc59d54-cdkrq --tail=30
      I1211 12:13:19.742142       1 template.go:560] "msg"="starting router" "logger"="router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: db8d384266051ef06b67883aaa83674bc6c9f1ae\nversionFromGit: 4.0.0-583-gdb8d3842\ngitTreeState: clean\nbuildDate: 2025-12-08T23:47:14Z\n"
      I1211 12:13:19.744647       1 metrics.go:156] "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936" "logger"="metrics"
      I1211 12:13:19.749055       1 router.go:214] "msg"="creating a new template router" "logger"="template" "writeDir"="/var/lib/haproxy"
      I1211 12:13:19.749147       1 router.go:298] "msg"="router will coalesce reloads within an interval of each other" "interval"="5s" "logger"="template"
      I1211 12:13:19.749628       1 router.go:368] "msg"="watching for changes" "logger"="template" "path"="/etc/pki/tls/private"
      I1211 12:13:19.749698       1 router.go:283] "msg"="router is including routes in all namespaces" "logger"="router"
      I1211 12:13:19.765386       1 reflector.go:359] Caches populated for *v1.EndpointSlice from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
      I1211 12:13:19.767727       1 reflector.go:359] Caches populated for *v1.Service from github.com/openshift/router/pkg/router/template/service_lookup.go:33
      I1211 12:13:19.771506       1 reflector.go:359] Caches populated for *v1.Route from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
      I1211 12:13:19.857273       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.858058       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.858508       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.858955       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.859235       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.859489       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.859791       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.860080       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      I1211 12:13:19.860429       1 template_helper.go:370] "msg"="parseIPList empty list found" "logger"="template"
      E1211 12:13:19.860855       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
      I1211 12:13:19.899666       1 router.go:665] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
      %
       

      Actual results:

          The router pod restarted.

      Expected results:

          The router pod should not restart or reload under heavy traffic.

      Additional info:

          

              NID Team Bot (nid-team-bot)
              Shudi Li (shudili@redhat.com)
              Votes: 0
              Watchers: 4