OpenShift Bugs / OCPBUGS-7229

LB+UDP+ETP:local+monitor service misbehaving with OpenshiftSDN


Details

    • Important
    • ShiftStack Sprint 231, ShiftStack Sprint 232, ShiftStack Sprint 233, ShiftStack Sprint 234
    Description

      Description of problem:

      The LB member health monitor marks all the OCP nodes as ONLINE for a UDP LoadBalancer service, so the LB sends traffic to every OCP node and there is no connectivity when it reaches a node that has no service endpoint, because of the ETP:Local policy.
      The traffic should only reach the nodes that host a service endpoint pod, and for that the Octavia LB should mark only those nodes as ONLINE.
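
      As an illustration of the user-facing symptom (not captured in the original report), repeated UDP requests through the LB VIP (10.196.2.47:8082, see the outputs below) would be expected to time out intermittently whenever Octavia picks a member node without a local endpoint:

      # Hypothetical check, run from a host that can reach the VIP; the nc
      # invocation follows the same pattern used in step 4 below.
      $ for i in $(seq 1 10); do cat <(echo hostname) <(sleep 1) | nc -w 1 -u 10.196.2.47 8082; echo; done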

      Version-Release number of selected component (if applicable):

      OCP 4.12.0-0.nightly-2023-02-04-034821
      OSP 16.2.4

      How reproducible:

      Always with LB+UDP+ETP:local services
      Works ok with LB+TCP+ETP:local services
      Works ok with OVN-Kubernetes

      Steps to Reproduce (manifests in ETPlocal-udp-manifests.yaml):

      1. Create the LB+UDP+ETP:local+monitor service
      $ oc apply -f ETPlocal-udp-manifests.yaml
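
      The manifest itself is not reproduced in this report. Based on the last-applied-configuration annotation shown in step 2, the Service portion looks roughly like the sketch below; the manifest additionally contains the udp-lb-etplocal-ns namespace and the udp-lb-etplocal-dep Deployment, a UDP server listening on port 8081 that replies with its hostname (see step 4):

      apiVersion: v1
      kind: Service
      metadata:
        name: udp-lb-etplocal-svc
        namespace: udp-lb-etplocal-ns
        labels:
          app: udp-lb-etplocal-dep
        annotations:
          loadbalancer.openstack.org/enable-health-monitor: "true"
          loadbalancer.openstack.org/health-monitor-delay: "5"
          loadbalancer.openstack.org/health-monitor-max-retries: "2"
          loadbalancer.openstack.org/health-monitor-timeout: "5"
      spec:
        type: LoadBalancer
        externalTrafficPolicy: Local
        selector:
          app: udp-lb-etplocal-dep
        ports:
        - port: 8082
          protocol: UDP
          targetPort: 8081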
      
      2. Check the svc creation and the endpoint pods
      $ oc -n udp-lb-etplocal-ns get svc udp-lb-etplocal-svc -o yaml
      apiVersion: v1
      kind: Service
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"loadbalancer.openstack.org/enable-health-monitor":"true","loadbalancer.openstack.org/health-monitor-delay":"5","loadbalancer.openstack.org/health-monitor-max-retries":"2","loadbalancer.openstack.org/health-monitor-timeout":"5"},"labels":{"app":"udp-lb-etplocal-dep"},"name":"udp-lb-etplocal-svc","namespace":"udp-lb-etplocal-ns"},"spec":{"externalTrafficPolicy":"Local","ports":[{"port":8082,"protocol":"UDP","targetPort":8081}],"selector":{"app":"udp-lb-etplocal-dep"},"type":"LoadBalancer"}}
          loadbalancer.openstack.org/enable-health-monitor: "true"
          loadbalancer.openstack.org/health-monitor-delay: "5"
          loadbalancer.openstack.org/health-monitor-max-retries: "2"
          loadbalancer.openstack.org/health-monitor-timeout: "5"
          loadbalancer.openstack.org/load-balancer-id: ade496fd-54b2-4db5-aa20-a91517d7be93
        creationTimestamp: "2023-02-07T11:34:41Z"
        finalizers:
        - service.kubernetes.io/load-balancer-cleanup
        labels:
          app: udp-lb-etplocal-dep
        name: udp-lb-etplocal-svc
        namespace: udp-lb-etplocal-ns
        resourceVersion: "780125"
        uid: 0a596a04-b9f3-4928-94bd-5d0c16ed1b29
      spec:
        allocateLoadBalancerNodePorts: true
        clusterIP: 172.30.32.211
        clusterIPs:
        - 172.30.32.211
        externalTrafficPolicy: Local
        healthCheckNodePort: 31339
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        ports:
        - nodePort: 30753
          port: 8082
          protocol: UDP
          targetPort: 8081
        selector:
          app: udp-lb-etplocal-dep
        sessionAffinity: None
        type: LoadBalancer
      status:
        loadBalancer:
          ingress:
          - ip: hidden
      
      $ oc -n udp-lb-etplocal-ns get pods -o wide
      NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
      udp-lb-etplocal-dep-749f8cdbc9-v22gj   1/1     Running   0          27h   10.128.2.24   ostest-6vx7f-worker-0-l78rf   <none>           <none>
      udp-lb-etplocal-dep-749f8cdbc9-zrv4w   1/1     Running   0          27h   10.129.3.39   ostest-6vx7f-worker-0-pjxdx   <none>           <none>
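
      As an optional cross-check (not captured in the original run), the EndpointSlice records the nodeName of each endpoint, i.e. the only nodes that should pass an ETP:Local health check:

      $ oc -n udp-lb-etplocal-ns get endpointslices -o custom-columns=NAME:.metadata.name,NODES:.endpoints[*].nodeName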
      
      3. Check OSP LB and members
      
      $ openstack loadbalancer list
      +--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
      | id                                   | name                                                                               | project_id                       | vip_address  | provisioning_status | provider |
      +--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
      | ade496fd-54b2-4db5-aa20-a91517d7be93 | kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc                     | 1ead4da027d94d3f925eaad7d3859b8a | 10.196.2.47  | ACTIVE              | amphora  |
      +--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
      
      
      $ openstack loadbalancer member list pool_0_kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc
      +--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
      | id                                   | name                        | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
      +--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
      | 11e51c25-cbeb-465d-a6ec-f4bdb4e320a7 | ostest-6vx7f-worker-0-l78rf | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.3.190 |         30753 | ONLINE           |      1 |
      | 2fbd645d-805d-4ded-acf1-2d58795f0388 | ostest-6vx7f-worker-0-pjxdx | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.2.85  |         30753 | ONLINE           |      1 |
      | 525d95e2-848b-4fe0-b2ec-60426583dbdc | ostest-6vx7f-master-2       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.0.71  |         30753 | ONLINE           |      1 |
      | 9345b86c-0d5c-40e6-a087-f7b373eab95b | ostest-6vx7f-master-1       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.1.28  |         30753 | ONLINE           |      1 |
      | a6ecd43a-61b4-458a-aa25-55d3967bc286 | ostest-6vx7f-master-0       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.3.167 |         30753 | ONLINE           |      1 |
      | ff0be722-d141-47d1-8166-3bd929292c00 | ostest-6vx7f-worker-0-r5tfw | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.1.106 |         30753 | ONLINE           |      1 |
      +--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
      
      Note that all the members are ONLINE, while only ostest-6vx7f-worker-0-l78rf and ostest-6vx7f-worker-0-pjxdx should be.
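
      The monitor created because of the enable-health-monitor annotation can be inspected as well (sketch only, output not captured here; presumably a UDP-CONNECT monitor, which typically relies on an ICMP port unreachable reply to mark a member down, see "Additional info"):

      # Full status tree of the LB, including the health monitor:
      $ openstack loadbalancer status show ade496fd-54b2-4db5-aa20-a91517d7be93
      # Or look at the monitor object directly:
      $ openstack loadbalancer healthmonitor list
      $ openstack loadbalancer healthmonitor show <healthmonitor_id>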
      
      4. Check UDP connectivity towards the nodePort on all the cluster nodes (the check is performed from the load balancer itself, which is in the cluster network). The UDP server returns the hostname of the answering pod.
      
      $ oc get nodes -o custom-columns=NAME:.metadata.name,HOST_IP:.status.addresses[0].address
      NAME                          HOST_IP
      ostest-6vx7f-master-0         10.196.3.167
      ostest-6vx7f-master-1         10.196.1.28
      ostest-6vx7f-master-2         10.196.0.71
      ostest-6vx7f-worker-0-l78rf   10.196.3.190
      ostest-6vx7f-worker-0-pjxdx   10.196.2.85
      ostest-6vx7f-worker-0-r5tfw   10.196.1.106
      
      # for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; cat <(echo hostname) <(sleep 1) | nc -w 1 -u $i 30753; done
      10.196.3.167
      10.196.1.28
      10.196.0.71
      10.196.1.106
      10.196.3.190
      udp-lb-etplocal-dep-749f8cdbc9-v22gj
      10.196.2.85
      udp-lb-etplocal-dep-749f8cdbc9-zrv4w
      
      Connectivity works as expected: we only get replies from udp-lb-etplocal-dep-749f8cdbc9-v22gj and udp-lb-etplocal-dep-749f8cdbc9-zrv4w.
      
      5. Run the UDP health monitor check towards the nodePort on all the cluster nodes
      
      # for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; nc -uzv -w1 $i 30753; done                                                  
      10.196.3.167
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.3.167:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
      10.196.1.28
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.1.28:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
      10.196.0.71
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.0.71:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
      10.196.1.106
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.1.106:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
      10.196.3.190
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.3.190:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
      10.196.2.85
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.2.85:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.04 seconds.
      
      Note that Ncat reports a successful connection to all the OCP nodes.
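
      For comparison (not part of the original probe set): with ETP:Local, kube-proxy also serves an HTTP health check on the service's healthCheckNodePort (31339 in the Service above), and that endpoint does distinguish the nodes, answering 200 only where a local endpoint exists and 503 elsewhere:

      # Assumed follow-up check; only 10.196.3.190 and 10.196.2.85 should return 200.
      $ for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; curl -s -o /dev/null -w '%{http_code}\n' http://$i:31339/healthz; done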

      Actual results:

      Netcat (nc -uzv -w1 <node_ip> <nodePort>) reports a successful connection to all the OCP nodes, even to those that don't have local endpoints.

      Expected results:

      A "Connection refused" would be expected from the nodes without local endpoints, e.g.:
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.3.190:30754.
      Ncat: Connection refused.

      Additional info:
      Thanks to michjohn@redhat.com we know this is expected, as no ICMP port unreachable packet is returned from the node.
      This can be traced to the iptables rule managing that traffic:

      -A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment "udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" -m addrtype --dst-type LOCAL -m udp --dport 30753 -j DROP

      The DROP action does not send an ICMP port unreachable packet, while a REJECT action would send it, and then the nc connection (as well as the monitor health check) would fail as expected.
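
      A sketch of how to inspect the rule on a node without local endpoints, and of what a REJECT variant would look like (illustration only; these rules are managed by OpenShift, this is not a proposed change):

      # Inspect the rule on a node with no local endpoint for the service (e.g. a master):
      $ oc debug node/ostest-6vx7f-master-0 -- chroot /host sh -c 'iptables-save | grep udp-lb-etplocal'

      # A REJECT rule of this shape would return the ICMP error that both nc and the
      # LB health monitor need in order to consider the port closed:
      #   ... -m udp --dport 30753 -j REJECT --reject-with icmp-port-unreachable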

      People

        mdulko Michał Dulko
        juriarte@redhat.com Jon Uriarte
        Ramón Lobillo