Bug
Resolution: Done-Errata
Normal
4.12
Quality / Stability / Reliability
Important
Rejected
ShiftStack Sprint 231, ShiftStack Sprint 232, ShiftStack Sprint 233, ShiftStack Sprint 234
Description of problem:
The LB member health monitor marks all the OCP nodes as ONLINE for a UDP LoadBalancer service, so the load balancer sends traffic to all the OCP nodes, and there is no connectivity when the traffic reaches a node that does not have a service endpoint because of the externalTrafficPolicy: Local (ETP:local) policy. The traffic should only reach the nodes that host a service endpoint pod, so the Octavia LB should mark only those nodes as ONLINE.
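For context, the health monitor that was created for the pool can be inspected from the OSP side. The commands below are an illustrative extra step (not part of the original reproduction); the health monitor ID is a placeholder and the UDP-CONNECT type is an assumption based on how Octavia typically monitors UDP pools:
$ openstack loadbalancer healthmonitor list
# note the health monitor attached to pool_0_kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc
$ openstack loadbalancer healthmonitor show <healthmonitor-id>
# expected (assumption): type UDP-CONNECT with delay 5, timeout 5, max-retries 2,
# matching the loadbalancer.openstack.org/health-monitor-* annotations on the service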
Version-Release number of selected component (if applicable):
OCP 4.12.0-0.nightly-2023-02-04-034821
OSP 16.2.4
How reproducible:
Always with LB+UDP+ETP:local services.
Works OK with LB+TCP+ETP:local services.
Works OK with OVN-Kubernetes.
Steps to Reproduce (see the attached ETPlocal-udp-manifests.yaml):
1. Create the LB+UDP+ETP:local+monitor service
$ oc apply -f ETPlocal-udp-manifests.yaml
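A minimal sketch of the Service portion of ETPlocal-udp-manifests.yaml, reconstructed from the kubectl.kubernetes.io/last-applied-configuration annotation shown in step 2 (the backing Deployment with label app: udp-lb-etplocal-dep and a UDP server container listening on port 8081 is assumed and not shown here):
apiVersion: v1
kind: Service
metadata:
  name: udp-lb-etplocal-svc
  namespace: udp-lb-etplocal-ns
  labels:
    app: udp-lb-etplocal-dep
  annotations:
    loadbalancer.openstack.org/enable-health-monitor: "true"
    loadbalancer.openstack.org/health-monitor-delay: "5"
    loadbalancer.openstack.org/health-monitor-max-retries: "2"
    loadbalancer.openstack.org/health-monitor-timeout: "5"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: udp-lb-etplocal-dep
  ports:
  - port: 8082
    protocol: UDP
    targetPort: 8081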
2. Check the svc creation and the endpoint pods
$ oc -n udp-lb-etplocal-ns get svc udp-lb-etplocal-svc -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"loadbalancer.openstack.org/enable-health-monitor":"true","loadbalancer.openstack.org/health-monitor-delay":"5","loadbalancer.openstack.org/health-monitor-max-retries":"2","loadbalancer.openstack.org/health-monitor-timeout":"5"},"labels":{"app":"udp-lb-etplocal-dep"},"name":"udp-lb-etplocal-svc","namespace":"udp-lb-etplocal-ns"},"spec":{"externalTrafficPolicy":"Local","ports":[{"port":8082,"protocol":"UDP","targetPort":8081}],"selector":{"app":"udp-lb-etplocal-dep"},"type":"LoadBalancer"}}
    loadbalancer.openstack.org/enable-health-monitor: "true"
    loadbalancer.openstack.org/health-monitor-delay: "5"
    loadbalancer.openstack.org/health-monitor-max-retries: "2"
    loadbalancer.openstack.org/health-monitor-timeout: "5"
    loadbalancer.openstack.org/load-balancer-id: ade496fd-54b2-4db5-aa20-a91517d7be93
  creationTimestamp: "2023-02-07T11:34:41Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: udp-lb-etplocal-dep
  name: udp-lb-etplocal-svc
  namespace: udp-lb-etplocal-ns
  resourceVersion: "780125"
  uid: 0a596a04-b9f3-4928-94bd-5d0c16ed1b29
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.32.211
  clusterIPs:
  - 172.30.32.211
  externalTrafficPolicy: Local
  healthCheckNodePort: 31339
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 30753
    port: 8082
    protocol: UDP
    targetPort: 8081
  selector:
    app: udp-lb-etplocal-dep
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: hidden
$ oc -n udp-lb-etplocal-ns get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
udp-lb-etplocal-dep-749f8cdbc9-v22gj 1/1 Running 0 27h 10.128.2.24 ostest-6vx7f-worker-0-l78rf <none> <none>
udp-lb-etplocal-dep-749f8cdbc9-zrv4w 1/1 Running 0 27h 10.129.3.39 ostest-6vx7f-worker-0-pjxdx <none> <none>
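As an additional verification of which nodes actually host local endpoints (a hypothetical extra step, not part of the original reproduction), the EndpointSlices for the service can be listed:
$ oc -n udp-lb-etplocal-ns get endpointslices -o wide
# only the two pods running on ostest-6vx7f-worker-0-l78rf and ostest-6vx7f-worker-0-pjxdx
# should appear as endpoints of udp-lb-etplocal-svc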
3. Check OSP LB and members
$ openstack loadbalancer list
+--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
| id | name | project_id | vip_address | provisioning_status | provider |
+--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
| ade496fd-54b2-4db5-aa20-a91517d7be93 | kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc | 1ead4da027d94d3f925eaad7d3859b8a | 10.196.2.47 | ACTIVE | amphora |
+--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
$ openstack loadbalancer member list pool_0_kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc
+--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id | name | project_id | provisioning_status | address | protocol_port | operating_status | weight |
+--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 11e51c25-cbeb-465d-a6ec-f4bdb4e320a7 | ostest-6vx7f-worker-0-l78rf | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE | 10.196.3.190 | 30753 | ONLINE | 1 |
| 2fbd645d-805d-4ded-acf1-2d58795f0388 | ostest-6vx7f-worker-0-pjxdx | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE | 10.196.2.85 | 30753 | ONLINE | 1 |
| 525d95e2-848b-4fe0-b2ec-60426583dbdc | ostest-6vx7f-master-2 | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE | 10.196.0.71 | 30753 | ONLINE | 1 |
| 9345b86c-0d5c-40e6-a087-f7b373eab95b | ostest-6vx7f-master-1 | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE | 10.196.1.28 | 30753 | ONLINE | 1 |
| a6ecd43a-61b4-458a-aa25-55d3967bc286 | ostest-6vx7f-master-0 | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE | 10.196.3.167 | 30753 | ONLINE | 1 |
| ff0be722-d141-47d1-8166-3bd929292c00 | ostest-6vx7f-worker-0-r5tfw | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE | 10.196.1.106 | 30753 | ONLINE | 1 |
+--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
Note that all the members are ONLINE, while only ostest-6vx7f-worker-0-l78rf and ostest-6vx7f-worker-0-pjxdx (the nodes with local endpoints) should be.
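For illustration, the expected member list would look roughly like the sketch below, with the members on nodes that have no local endpoint reported as unhealthy (ERROR is an assumption; the exact operating_status Octavia reports for a member failing the UDP health check may differ):
| ostest-6vx7f-worker-0-l78rf | ... | ONLINE |
| ostest-6vx7f-worker-0-pjxdx | ... | ONLINE |
| ostest-6vx7f-master-0       | ... | ERROR  |
| ostest-6vx7f-master-1       | ... | ERROR  |
| ostest-6vx7f-master-2       | ... | ERROR  |
| ostest-6vx7f-worker-0-r5tfw | ... | ERROR  |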
4. Check UDP connectivity towards the nodePort on all the cluster nodes (the check is performed from the load balancer itself, which is on the cluster network). The UDP server returns the hostname of the answering pod.
$ oc get nodes -o custom-columns=NAME:.metadata.name,HOST_IP:.status.addresses[0].address
NAME HOST_IP
ostest-6vx7f-master-0 10.196.3.167
ostest-6vx7f-master-1 10.196.1.28
ostest-6vx7f-master-2 10.196.0.71
ostest-6vx7f-worker-0-l78rf 10.196.3.190
ostest-6vx7f-worker-0-pjxdx 10.196.2.85
ostest-6vx7f-worker-0-r5tfw 10.196.1.106
# for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; cat <(echo hostname) <(sleep 1) | nc -w 1 -u $i 30753; done
10.196.3.167
10.196.1.28
10.196.0.71
10.196.1.106
10.196.3.190
udp-lb-etplocal-dep-749f8cdbc9-v22gj
10.196.2.85
udp-lb-etplocal-dep-749f8cdbc9-zrv4w
Connectivity works as expected: we only get replies from udp-lb-etplocal-dep-749f8cdbc9-v22gj and udp-lb-etplocal-dep-749f8cdbc9-zrv4w; the nodes without local endpoints don't answer.
5. Run the UDP health-monitor check towards the nodePort on all the cluster nodes
# for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; nc -uzv -w1 $i 30753; done
10.196.3.167
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.3.167:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
10.196.1.28
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.1.28:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
10.196.0.71
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.0.71:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
10.196.1.106
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.1.106:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
10.196.3.190
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.3.190:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
10.196.2.85
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.2.85:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.04 seconds.
Note that Ncat reports a successful connection towards all the OCP nodes, including those without local endpoints.
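The kube-proxy rule responsible for this behavior (quoted in Additional info below) can be checked directly on a node without local endpoints, for example with a debug pod (a hypothetical extra step, not part of the original reproduction):
$ oc debug node/ostest-6vx7f-master-0 -- chroot /host sh -c 'iptables-save | grep 30753'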
Actual results:
Netcat (nc -uzv -w1 <node_ip> <nodePort>) reports a successful connection towards all the OCP nodes, even those that don't have local endpoints, so the Octavia health monitor marks every member as ONLINE.
Expected results:
Connection refused would be expected from the nodes without local endpoints, e.g.:
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.3.190:30754.
Ncat: Connection refused.
Additional info:
Thanks to michjohn@redhat.com we know this is expected, as no ICMP "port unreachable" packet is returned from the node.
This is due to the iptables rule managing that traffic:
-A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment "udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" -m addrtype --dst-type LOCAL -m udp --dport 30753 -j DROP
The DROP action does not send an ICMP port unreachable packet, while a REJECT action would, and in that case the nc connection (as well as the monitor health check) would fail as expected.
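For illustration only, a rule of the following shape (a sketch, not something kube-proxy currently programs) would return ICMP port unreachable and make both the nc probe and the Octavia UDP health check mark the member as down:
-A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment "udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" -m addrtype --dst-type LOCAL -m udp --dport 30753 -j REJECT --reject-with icmp-port-unreachable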
causes: OCPBUGS-7475 "Skip ETP:local UDP LB svc test for OpenshiftSDN and octavia version < 2.16" (Closed)
links to: RHEA-2023:5006 (rpm)