Bug
Resolution: Done-Errata
Normal
None
4.12
Important
None
ShiftStack Sprint 231, ShiftStack Sprint 232, ShiftStack Sprint 233, ShiftStack Sprint 234
4
Rejected
False
None
Description of problem:
For a UDP LoadBalancer service with externalTrafficPolicy: Local (ETP:local), the LB member health monitor marks all the OCP nodes as ONLINE, so the LB sends traffic to all of them and connectivity is lost when it reaches a node that has no local service endpoint. The traffic should only reach the nodes that host a service endpoint pod, which means the Octavia LB should mark only those nodes as ONLINE.
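For reference, a trimmed-down sketch of the Service shape that triggers this (only the relevant fields; the full object as created is shown in the reproduction steps below):

apiVersion: v1
kind: Service
metadata:
  annotations:
    loadbalancer.openstack.org/enable-health-monitor: "true"
  name: udp-lb-etplocal-svc
  namespace: udp-lb-etplocal-ns
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: udp-lb-etplocal-dep
  ports:
  - port: 8082
    protocol: UDP
    targetPort: 8081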
Version-Release number of selected component (if applicable):
OCP 4.12.0-0.nightly-2023-02-04-034821 OSP 16.2.4
How reproducible:
Always with LB+UDP+ETP:local services.
Works OK with LB+TCP+ETP:local services.
Works OK with OVN-Kubernetes.
Steps to Reproduce (see ETPlocal-udp-manifests.yaml):
1. Create the LB+UDP+ETP:local+monitor service

$ oc apply -f ETPlocal-udp-manifests.yaml

2. Check the svc creation and the endpoint pods

$ oc -n udp-lb-etplocal-ns get svc udp-lb-etplocal-svc -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"loadbalancer.openstack.org/enable-health-monitor":"true","loadbalancer.openstack.org/health-monitor-delay":"5","loadbalancer.openstack.org/health-monitor-max-retries":"2","loadbalancer.openstack.org/health-monitor-timeout":"5"},"labels":{"app":"udp-lb-etplocal-dep"},"name":"udp-lb-etplocal-svc","namespace":"udp-lb-etplocal-ns"},"spec":{"externalTrafficPolicy":"Local","ports":[{"port":8082,"protocol":"UDP","targetPort":8081}],"selector":{"app":"udp-lb-etplocal-dep"},"type":"LoadBalancer"}}
    loadbalancer.openstack.org/enable-health-monitor: "true"
    loadbalancer.openstack.org/health-monitor-delay: "5"
    loadbalancer.openstack.org/health-monitor-max-retries: "2"
    loadbalancer.openstack.org/health-monitor-timeout: "5"
    loadbalancer.openstack.org/load-balancer-id: ade496fd-54b2-4db5-aa20-a91517d7be93
  creationTimestamp: "2023-02-07T11:34:41Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  labels:
    app: udp-lb-etplocal-dep
  name: udp-lb-etplocal-svc
  namespace: udp-lb-etplocal-ns
  resourceVersion: "780125"
  uid: 0a596a04-b9f3-4928-94bd-5d0c16ed1b29
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.32.211
  clusterIPs:
  - 172.30.32.211
  externalTrafficPolicy: Local
  healthCheckNodePort: 31339
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 30753
    port: 8082
    protocol: UDP
    targetPort: 8081
  selector:
    app: udp-lb-etplocal-dep
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: hidden

$ oc -n udp-lb-etplocal-ns get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
udp-lb-etplocal-dep-749f8cdbc9-v22gj   1/1     Running   0          27h   10.128.2.24   ostest-6vx7f-worker-0-l78rf   <none>           <none>
udp-lb-etplocal-dep-749f8cdbc9-zrv4w   1/1     Running   0          27h   10.129.3.39   ostest-6vx7f-worker-0-pjxdx   <none>           <none>
3. Check OSP LB and members

$ openstack loadbalancer list
+--------------------------------------+----------------------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| id                                   | name                                                             | project_id                       | vip_address | provisioning_status | provider |
+--------------------------------------+----------------------------------------------------------------+----------------------------------+-------------+---------------------+----------+
| ade496fd-54b2-4db5-aa20-a91517d7be93 | kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc   | 1ead4da027d94d3f925eaad7d3859b8a | 10.196.2.47 | ACTIVE              | amphora  |
+--------------------------------------+----------------------------------------------------------------+----------------------------------+-------------+---------------------+----------+

$ openstack loadbalancer member list pool_0_kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc
+--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| id                                   | name                        | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
+--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
| 11e51c25-cbeb-465d-a6ec-f4bdb4e320a7 | ostest-6vx7f-worker-0-l78rf | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.3.190 | 30753         | ONLINE           | 1      |
| 2fbd645d-805d-4ded-acf1-2d58795f0388 | ostest-6vx7f-worker-0-pjxdx | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.2.85  | 30753         | ONLINE           | 1      |
| 525d95e2-848b-4fe0-b2ec-60426583dbdc | ostest-6vx7f-master-2       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.0.71  | 30753         | ONLINE           | 1      |
| 9345b86c-0d5c-40e6-a087-f7b373eab95b | ostest-6vx7f-master-1       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.1.28  | 30753         | ONLINE           | 1      |
| a6ecd43a-61b4-458a-aa25-55d3967bc286 | ostest-6vx7f-master-0       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.3.167 | 30753         | ONLINE           | 1      |
| ff0be722-d141-47d1-8166-3bd929292c00 | ostest-6vx7f-worker-0-r5tfw | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.1.106 | 30753         | ONLINE           | 1      |
+--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+

Note that all the members are ONLINE, while only ostest-6vx7f-worker-0-l78rf and ostest-6vx7f-worker-0-pjxdx should be.

4. Check UDP connectivity towards the nodePort on all the cluster nodes (performed from the load balancer itself, which is on the cluster network).
The UDP server returns the hostname of the answering pod.

$ oc get nodes -o custom-columns=NAME:.metadata.name,HOST_IP:.status.addresses[0].address
NAME                          HOST_IP
ostest-6vx7f-master-0         10.196.3.167
ostest-6vx7f-master-1         10.196.1.28
ostest-6vx7f-master-2         10.196.0.71
ostest-6vx7f-worker-0-l78rf   10.196.3.190
ostest-6vx7f-worker-0-pjxdx   10.196.2.85
ostest-6vx7f-worker-0-r5tfw   10.196.1.106

# for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; cat <(echo hostname) <(sleep 1) | nc -w 1 -u $i 30753; done
10.196.3.167
10.196.1.28
10.196.0.71
10.196.1.106
10.196.3.190
udp-lb-etplocal-dep-749f8cdbc9-v22gj
10.196.2.85
udp-lb-etplocal-dep-749f8cdbc9-zrv4w

The connectivity works as expected: replies are only received from udp-lb-etplocal-dep-749f8cdbc9-v22gj and udp-lb-etplocal-dep-749f8cdbc9-zrv4w.

5. Run the UDP health monitor check towards the nodePort on all the cluster nodes

# for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; nc -uzv -w1 $i 30753; done
10.196.3.167
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.3.167:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
10.196.1.28
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.1.28:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
10.196.0.71
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.0.71:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
10.196.1.106
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.1.106:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
10.196.3.190
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.3.190:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
10.196.2.85
Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.2.85:30753.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.04 seconds.

Note that Ncat reports a successful connection towards all the OCP nodes.
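Side note, not part of the original report: for ETP:local services kube-proxy serves an HTTP health check on the Service's healthCheckNodePort (31339 for this svc), which should return 200 only on nodes that host a local endpoint and 503 elsewhere. A loop like the following (hypothetical cross-check, same node IPs as above) can be used to verify which members the Octavia monitor ought to mark ONLINE:

# for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; curl -s -o /dev/null -w "%{http_code}\n" http://$i:31339/healthz; done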
Actual results:
Netcat (nc -uzv -w1 <node_ip> <nodePort>) reports a successful connection towards all the OCP nodes, even those that don't have local endpoints.
Expected results:
Connection refused would be expected from the nodes without local endpoints, for example:

Ncat: Version 7.70 ( https://nmap.org/ncat )
Ncat: Connected to 10.196.3.190:30754.
Ncat: Connection refused.
Additional info:
Thanks to michjohn@redhat.com we know this is expected, since no ICMP "UDP port unreachable" packet is returned from the node.
This is due to the iptables rule managing that traffic:
-A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment "udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" -m addrtype --dst-type LOCAL -m udp --dport 30753 -j DROP
The DROP target silently discards the packet and sends no ICMP port unreachable response, while a REJECT target would send one, so the nc connection (as well as the monitor health check) would fail.
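For illustration only (these rules are managed by kube-proxy, so this is not a suggested manual change): a REJECT variant of the same rule, which would make the probe fail explicitly, could look like:

-A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment "udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" -m addrtype --dst-type LOCAL -m udp --dport 30753 -j REJECT --reject-with icmp-port-unreachable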
causes:
OCPBUGS-7475 Skip ETP:local UDP LB svc test for OpenshiftSDN and octavia version < 2.16 (Closed)

links to:
RHEA-2023:5006 rpm