OCPBUGS-7229

LB+UDP+ETP:local+monitor service misbehaving with OpenshiftSDN

    • Important
    • None
    • ShiftStack Sprint 231, ShiftStack Sprint 232, ShiftStack Sprint 233, ShiftStack Sprint 234
    • 4
    • Rejected
    • False

      Description of problem:

      For a UDP service of type LoadBalancer, the LB member health monitor marks every OCP node as ONLINE, so the load balancer sends traffic to all of them. Because of the ETP:local (externalTrafficPolicy: Local) policy, traffic that reaches a node without a service endpoint gets no connectivity.
      Traffic should reach only the nodes that host a service endpoint pod, so the Octavia LB should mark only those nodes as ONLINE.

      Version-Release number of selected component (if applicable):

      OCP 4.12.0-0.nightly-2023-02-04-034821
      OSP 16.2.4

      How reproducible:

      Always with LB+UDP+ETP:local services
      Works ok with LB+TCP+ETP:local services
      Works ok with OVN-Kubernetes

      Steps to Reproduce (manifests: ETPlocal-udp-manifests.yaml):

      1. Create the LB+UDP+ETP:local+monitor service
      $ oc apply -f ETPlocal-udp-manifests.yaml
      
      2. Check the svc creation and the endpoint pods
      $ oc -n udp-lb-etplocal-ns get svc udp-lb-etplocal-svc -o yaml
      apiVersion: v1
      kind: Service
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"loadbalancer.openstack.org/enable-health-monitor":"true","loadbalancer.openstack.org/health-monitor-delay":"5","loadbalancer.openstack.org/health-monitor-max-retries":"2","loadbalancer.openstack.org/health-monitor-timeout":"5"},"labels":{"app":"udp-lb-etplocal-dep"},"name":"udp-lb-etplocal-svc","namespace":"udp-lb-etplocal-ns"},"spec":{"externalTrafficPolicy":"Local","ports":[{"port":8082,"protocol":"UDP","targetPort":8081}],"selector":{"app":"udp-lb-etplocal-dep"},"type":"LoadBalancer"}}
          loadbalancer.openstack.org/enable-health-monitor: "true"
          loadbalancer.openstack.org/health-monitor-delay: "5"
          loadbalancer.openstack.org/health-monitor-max-retries: "2"
          loadbalancer.openstack.org/health-monitor-timeout: "5"
          loadbalancer.openstack.org/load-balancer-id: ade496fd-54b2-4db5-aa20-a91517d7be93
        creationTimestamp: "2023-02-07T11:34:41Z"
        finalizers:
        - service.kubernetes.io/load-balancer-cleanup
        labels:
          app: udp-lb-etplocal-dep
        name: udp-lb-etplocal-svc
        namespace: udp-lb-etplocal-ns
        resourceVersion: "780125"
        uid: 0a596a04-b9f3-4928-94bd-5d0c16ed1b29
      spec:
        allocateLoadBalancerNodePorts: true
        clusterIP: 172.30.32.211
        clusterIPs:
        - 172.30.32.211
        externalTrafficPolicy: Local
        healthCheckNodePort: 31339
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        ports:
        - nodePort: 30753
          port: 8082
          protocol: UDP
          targetPort: 8081
        selector:
          app: udp-lb-etplocal-dep
        sessionAffinity: None
        type: LoadBalancer
      status:
        loadBalancer:
          ingress:
          - ip: hidden
      
      $ oc -n udp-lb-etplocal-ns get pods -o wide
      NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
      udp-lb-etplocal-dep-749f8cdbc9-v22gj   1/1     Running   0          27h   10.128.2.24   ostest-6vx7f-worker-0-l78rf   <none>           <none>
      udp-lb-etplocal-dep-749f8cdbc9-zrv4w   1/1     Running   0          27h   10.129.3.39   ostest-6vx7f-worker-0-pjxdx   <none>           <none>
      
      3. Check OSP LB and members
      
      $ openstack loadbalancer list
      +--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
      | id                                   | name                                                                               | project_id                       | vip_address  | provisioning_status | provider |
      +--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
      | ade496fd-54b2-4db5-aa20-a91517d7be93 | kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc                     | 1ead4da027d94d3f925eaad7d3859b8a | 10.196.2.47  | ACTIVE              | amphora  |
      +--------------------------------------+------------------------------------------------------------------------------------+----------------------------------+--------------+---------------------+----------+
      
      
      $ openstack loadbalancer member list pool_0_kube_service_kubernetes_udp-lb-etplocal-ns_udp-lb-etplocal-svc
      +--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
      | id                                   | name                        | project_id                       | provisioning_status | address      | protocol_port | operating_status | weight |
      +--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
      | 11e51c25-cbeb-465d-a6ec-f4bdb4e320a7 | ostest-6vx7f-worker-0-l78rf | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.3.190 |         30753 | ONLINE           |      1 |
      | 2fbd645d-805d-4ded-acf1-2d58795f0388 | ostest-6vx7f-worker-0-pjxdx | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.2.85  |         30753 | ONLINE           |      1 |
      | 525d95e2-848b-4fe0-b2ec-60426583dbdc | ostest-6vx7f-master-2       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.0.71  |         30753 | ONLINE           |      1 |
      | 9345b86c-0d5c-40e6-a087-f7b373eab95b | ostest-6vx7f-master-1       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.1.28  |         30753 | ONLINE           |      1 |
      | a6ecd43a-61b4-458a-aa25-55d3967bc286 | ostest-6vx7f-master-0       | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.3.167 |         30753 | ONLINE           |      1 |
      | ff0be722-d141-47d1-8166-3bd929292c00 | ostest-6vx7f-worker-0-r5tfw | 1ead4da027d94d3f925eaad7d3859b8a | ACTIVE              | 10.196.1.106 |         30753 | ONLINE           |      1 |
      +--------------------------------------+-----------------------------+----------------------------------+---------------------+--------------+---------------+------------------+--------+
      
      Note that all the members are ONLINE, whereas only ostest-6vx7f-worker-0-l78rf and ostest-6vx7f-worker-0-pjxdx, the nodes hosting the endpoint pods, should be.
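
The expected member state follows directly from where the endpoint pods run. As a minimal sketch (the helper name `unexpected_online_members` is hypothetical, and the data is copied by hand from the listings in this report rather than queried from the cluster), the mismatch can be computed like this:

```python
# Hypothetical helper: lists members that are ONLINE although their node
# hosts no endpoint pod. All data below is copied from this report.

def unexpected_online_members(member_status, endpoint_nodes):
    """Return sorted member names that are ONLINE without a local endpoint."""
    return sorted(
        name
        for name, status in member_status.items()
        if status == "ONLINE" and name not in endpoint_nodes
    )


# operating_status per member, from `openstack loadbalancer member list`
member_status = {
    "ostest-6vx7f-worker-0-l78rf": "ONLINE",
    "ostest-6vx7f-worker-0-pjxdx": "ONLINE",
    "ostest-6vx7f-master-2": "ONLINE",
    "ostest-6vx7f-master-1": "ONLINE",
    "ostest-6vx7f-master-0": "ONLINE",
    "ostest-6vx7f-worker-0-r5tfw": "ONLINE",
}

# NODE column from `oc -n udp-lb-etplocal-ns get pods -o wide`
endpoint_nodes = {"ostest-6vx7f-worker-0-l78rf", "ostest-6vx7f-worker-0-pjxdx"}

for name in unexpected_online_members(member_status, endpoint_nodes):
    print(name)
```

With the data from this report, the three masters and ostest-6vx7f-worker-0-r5tfw are listed, which are exactly the members that should not be ONLINE under ETP:local.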
      
      4. Check UDP connectivity to the nodePort on every cluster node. The check is run from the load balancer itself, which is on the cluster network. The UDP server replies with the hostname of the answering pod.
      
      $ oc get nodes -o custom-columns=NAME:.metadata.name,HOST_IP:.status.addresses[0].address
      NAME                          HOST_IP
      ostest-6vx7f-master-0         10.196.3.167
      ostest-6vx7f-master-1         10.196.1.28
      ostest-6vx7f-master-2         10.196.0.71
      ostest-6vx7f-worker-0-l78rf   10.196.3.190
      ostest-6vx7f-worker-0-pjxdx   10.196.2.85
      ostest-6vx7f-worker-0-r5tfw   10.196.1.106
      
      # for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; cat <(echo hostname) <(sleep 1) | nc -w 1 -u $i 30753; done
      10.196.3.167
      10.196.1.28
      10.196.0.71
      10.196.1.106
      10.196.3.190
      udp-lb-etplocal-dep-749f8cdbc9-v22gj
      10.196.2.85
      udp-lb-etplocal-dep-749f8cdbc9-zrv4w
      
      Connectivity works as expected: replies come only from udp-lb-etplocal-dep-749f8cdbc9-v22gj and udp-lb-etplocal-dep-749f8cdbc9-zrv4w.
      
      5. Run UDP health monitor check towards nodePort in all the cluster nodes
      
      # for i in 10.196.3.167 10.196.1.28 10.196.0.71 10.196.1.106 10.196.3.190 10.196.2.85; do echo $i; nc -uzv -w1 $i 30753; done                                                  
      10.196.3.167
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.3.167:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
      10.196.1.28
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.1.28:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
      10.196.0.71
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.0.71:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
      10.196.1.106
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.1.106:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
      10.196.3.190
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.3.190:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
      10.196.2.85
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.2.85:30753.
      Ncat: UDP packet sent successfully
      Ncat: 1 bytes sent, 0 bytes received in 2.04 seconds.
      
      Note that Ncat reports a successful connection to every OCP node.
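
Why a UDP check can report success on every node can be illustrated with a small probe. The sketch below is an assumption about how a UDP reachability check behaves in general on Linux; it is not the Ncat or Octavia implementation, and `udp_probe` is a hypothetical name. UDP has no handshake, so a sender can only learn that a port is closed if an ICMP port-unreachable error comes back. Silence looks the same as a server that is listening but not answering.

```python
import socket


def udp_probe(host, port, payload=b"\x00", timeout=1.0):
    """Classify a UDP port as 'replied', 'refused' or 'no-response'.

    'refused' requires an ICMP port-unreachable error from the target,
    which Linux surfaces on a connected UDP socket as
    ConnectionRefusedError.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # connect() on UDP only fixes the peer address; no packet is sent.
        sock.connect((host, port))
        try:
            sock.send(payload)
            sock.recv(1024)
            return "replied"
        except ConnectionRefusedError:
            return "refused"
        except socket.timeout:
            return "no-response"
```

A check that treats anything other than "refused" as healthy would accept both "replied" and "no-response", which matches the Ncat output above where 0 bytes are received from every node and yet each one is reported as connected.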

      Actual results:

      Netcat (nc -uzv -w1 <node_ip> <nodePort>) reports "Connected" for all the OCP nodes, including those that don't have local endpoints.

      Expected results:

      Nodes without local endpoints are expected to return "Connection refused", as in this example output:
      Ncat: Version 7.70 ( https://nmap.org/ncat )
      Ncat: Connected to 10.196.3.190:30754.
      Ncat: Connection refused.

      Additional info:
      Thanks to michjohn@redhat.com, we know this behavior is expected, because the node returns no ICMP "UDP port unreachable" packet.
      This is likely due to the iptables rule that handles this traffic:

      -A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment "udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" -m addrtype --dst-type LOCAL -m udp --dport 30753 -j DROP

      The DROP action discards the packet without sending any ICMP unreachable message, whereas a REJECT action would send one (ICMP port unreachable by default), so the nc connection would fail, and so would the monitor health check.
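
To make the DROP versus REJECT distinction concrete, the following sketch parses a rule in the syntax shown above and predicts what the sender of a UDP probe would observe. The function name and the returned labels are hypothetical, and the mapping encodes only general iptables behaviour; it is not taken from kube-proxy or Octavia.

```python
import shlex


def predicted_probe_result(rule):
    """Predict what a UDP probe sender observes for a matching rule."""
    tokens = shlex.split(rule)  # keeps the quoted --comment as one token
    target = tokens[tokens.index("-j") + 1]
    if target == "DROP":
        # Packet is discarded silently: no ICMP error reaches the sender.
        return "no-response"
    if target == "REJECT":
        # REJECT answers with an ICMP error (port unreachable by default).
        return "refused"
    return "unknown"


# Rule copied from this report.
rule = (
    '-A KUBE-EXTERNAL-SERVICES -p udp -m comment --comment '
    '"udp-lb-etplocal-ns/udp-lb-etplocal-svc has no local endpoints" '
    '-m addrtype --dst-type LOCAL -m udp --dport 30753 -j DROP'
)
print(predicted_probe_result(rule))  # no-response
print(predicted_probe_result(rule.replace("-j DROP", "-j REJECT")))  # refused
```

Swapping the target in the string is for illustration only; rules in the KUBE-EXTERNAL-SERVICES chain are managed by the cluster's service proxy, so a manual change on a node would not be a fix.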

              mdulko Michał Dulko (Inactive)
              juriarte@redhat.com Jon Uriarte
              Ramón Lobillo Ramón Lobillo