Bug
Resolution: Done
Normal
None
4.8
None
Description of problem:
On OpenShift 4.8, a connection from a pod to a service that has no endpoints is not rejected promptly; it hangs for about 2 minutes before timing out.
How reproducible:
On an OpenShift 3.11 cluster, after scaling a deployment to 0 replicas, a connection request to its service fails with "No route to host" after about 3 seconds:
[quicklab@master-0 ~]$ oc project agabriel
Now using project "agabriel" on server "https://openshift.internal.sharedocp311cns.lab.upshift.rdu2.redhat.com:443".
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$ oc get pod
NAME READY STATUS RESTARTS AGE
client-1-c8frb 1/1 Running 3 21m
nginx-container-1-build 0/1 Completed 0 22m
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$ oc get dc
NAME REVISION DESIRED CURRENT TRIGGERED BY
client 1 1 1 config
nginx-container 1 0 0 config,image(nginx-container:latest)
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$ oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-container ClusterIP 172.30.9.177 <none> 8080/TCP,8443/TCP 22m
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$ oc get ep
NAME ENDPOINTS AGE
nginx-container <none> 22m
[quicklab@master-0 ~]$
[quicklab@master-0 ~]$ oc attach client-1-c8frb -c client -i -t
If you don't see a command prompt, try pressing enter.
bash-4.4$
bash-4.4$ time curl 172.30.9.177:8080/ -vvv
*   Trying 172.30.9.177...
* TCP_NODELAY set
* connect to 172.30.9.177 port 8080 failed: No route to host
* Failed to connect to 172.30.9.177 port 8080: No route to host
* Closing connection 0
curl: (7) Failed to connect to 172.30.9.177 port 8080: No route to host
real 0m3.102s
user 0m0.002s
sys 0m0.013s
bash-4.4$
bash-4.4$
bash-4.4$ time curl nginx-container:8080/ -vvv
*   Trying 172.30.9.177...
* TCP_NODELAY set
* connect to 172.30.9.177 port 8080 failed: No route to host
* Failed to connect to nginx-container port 8080: No route to host
* Closing connection 0
curl: (7) Failed to connect to nginx-container port 8080: No route to host
real 0m3.017s
user 0m0.006s
sys 0m0.005s
bash-4.4$
On an OpenShift 4.8 cluster, a connection from a pod to a service without endpoints times out after about 2 minutes:
[quicklab@upi-0 ~]$ oc attach client2 -c client2 -i -t
If you don't see a command prompt, try pressing enter.
[root@client2 /]#
[root@client2 /]#
[root@client2 /]#
[root@client2 /]# time curl 172.30.12.178:8080/ -vvv
*   Trying 172.30.12.178...
* TCP_NODELAY set
* connect to 172.30.12.178 port 8080 failed: Connection timed out
* Failed to connect to 172.30.12.178 port 8080: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to 172.30.12.178 port 8080: Connection timed out
real 2m11.674s
user 0m0.016s
sys 0m0.027s
[root@client2 /]#
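A possible explanation for the roughly 2-minute figure, not verified on this cluster: if the SYN packets from the pod are silently dropped, the client only gives up once the kernel has exhausted its SYN retransmissions. The sketch below assumes the documented Linux defaults (`tcp_syn_retries` = 6, initial retransmission timeout of 1 second, doubling on each retry); `syn_timeout_seconds` is a hypothetical helper name used only for this illustration.

```python
def syn_timeout_seconds(syn_retries: int = 6, initial_rto: float = 1.0) -> float:
    """Total time before connect() gives up when no reply ever arrives.

    Assumes the initial SYN plus `syn_retries` retransmissions, with the
    wait doubling after each attempt (1s, 2s, 4s, ...). Both defaults are
    assumptions about the client's kernel settings, not measured values.
    """
    return sum(initial_rto * 2 ** attempt for attempt in range(syn_retries + 1))


print(syn_timeout_seconds())  # 127.0
```

The result, 127 seconds, is in the same range as the 2m11s measured above, which would be consistent with the packets being dropped rather than rejected. The remaining few seconds are not accounted for by this simple model.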
When connecting to the same service without endpoints directly from an OpenShift 4.8 node (CoreOS), the connection is refused after about 2 seconds:
sh-4.4# time curl 172.30.12.178:8080/ -vvv
*   Trying 172.30.12.178...
* TCP_NODELAY set
* connect to 172.30.12.178 port 8080 failed: Connection refused
* Failed to connect to 172.30.12.178 port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.30.12.178 port 8080: Connection refused
real 0m2.119s
user 0m0.007s
sys 0m0.014s
sh-4.4#
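The transcripts above show three different outcomes for the same kind of request: "No route to host" (3.11 pod), "Connection timed out" (4.8 pod) and "Connection refused" (4.8 node). To make that comparison repeatable without reading curl output by eye, a small probe along these lines could be run from each location. This is only a sketch; `probe` is a hypothetical helper, not part of any OpenShift tooling, and the 150-second default timeout is an arbitrary value chosen to exceed the 2-minute hang.

```python
import errno
import socket
import time


def probe(host: str, port: int, timeout: float = 150.0) -> tuple[str, float]:
    """Attempt one TCP connect and return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # RST or ICMP port-unreachable received (the REJECT rule took effect).
        outcome = "refused"
    except (socket.timeout, TimeoutError):
        # No answer at all before the socket or kernel timeout expired.
        outcome = "timed out"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            outcome = "no route to host"
        else:
            outcome = f"error: {exc.strerror or exc}"
    return outcome, time.monotonic() - start
```

For example, calling `probe("172.30.12.178", 8080)` from the client pod and again from the node should report "timed out" and "refused" respectively if the behaviour described here is reproduced.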
A REJECT rule for the service is present in the node's iptables (CoreOS), but it only takes effect for node-to-service connections, not for pod-to-service connections:
sh-4.4# iptables -L KUBE-SERVICES -v -n --line-number | grep agabriel
6 1 60 REJECT tcp -- * * 0.0.0.0/0 172.30.12.178 /* agabriel/nginx-container:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
7 0 0 REJECT tcp -- * * 0.0.0.0/0 172.30.12.178 /* agabriel/nginx-container:8443-tcp has no endpoints */ tcp dpt:8443 reject-with icmp-port-unreachable
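When checking several services or nodes, the "has no endpoints" REJECT rules can be pulled out of the `iptables -L KUBE-SERVICES -v -n` output programmatically. The following is a minimal sketch that assumes the rule comment format shown above; the `RULE` pattern and the `no_endpoint_rules` function are hypothetical names for this illustration, and the sample text is copied from the output in this report.

```python
import re

# Matches a REJECT rule whose comment reads "<namespace>/<service>:<port-name> has no endpoints".
RULE = re.compile(
    r"REJECT\s+(?P<proto>\w+)\s.*?"
    r"(?P<dest>\d+\.\d+\.\d+\.\d+)\s+/\*\s*"
    r"(?P<service>\S+) has no endpoints\s*\*/.*?dpt:(?P<port>\d+)"
)

SAMPLE = """\
6 1 60 REJECT tcp -- * * 0.0.0.0/0 172.30.12.178 /* agabriel/nginx-container:8080-tcp has no endpoints */ tcp dpt:8080 reject-with icmp-port-unreachable
7 0 0 REJECT tcp -- * * 0.0.0.0/0 172.30.12.178 /* agabriel/nginx-container:8443-tcp has no endpoints */ tcp dpt:8443 reject-with icmp-port-unreachable
"""


def no_endpoint_rules(iptables_output: str) -> list[dict[str, str]]:
    """Return one dict (proto, dest, service, port) per matching REJECT rule."""
    rules = []
    for line in iptables_output.splitlines():
        match = RULE.search(line)
        if match:
            rules.append(match.groupdict())
    return rules


for rule in no_endpoint_rules(SAMPLE):
    print(rule["service"], rule["dest"], rule["port"])
```

Comparing this list with the services that actually hang from inside a pod would show whether the rule exists but is simply not applied to pod traffic, as observed here.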