Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-2827

OVNK: NAT issue for packets exceeding check_pkt_larger() for NodePort services that route to hostNetworked pods

XMLWordPrintable

      Description of problem:

      Create Loadbalancer type service within the OCP 4.11.x OVNKubernetes cluster to expose the api server endpoint, the service does not response for normal oc request. 
      But some of them are working, like "oc whoami", "oc get --raw /api"

      Version-Release number of selected component (if applicable):

      4.11.8 with OVNKubernetes

      How reproducible:

      always

      Steps to Reproduce:

      1. Setup openshift cluster 4.11 on AWS with OVNKubernetes as the default network
      2. Create the following service under openshift-kube-apiserver namespace to expose the api
      ----
      apiVersion: v1
      kind: Service
      metadata:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1800"
        finalizers:
        - service.kubernetes.io/load-balancer-cleanup
        name: test-api
        namespace: openshift-kube-apiserver
      spec:
        allocateLoadBalancerNodePorts: true
        externalTrafficPolicy: Cluster
        internalTrafficPolicy: Cluster
        ipFamilies:
        - IPv4
        ipFamilyPolicy: SingleStack
        loadBalancerSourceRanges:
        - <my_ip>/32
        ports:
        - nodePort: 31248
          port: 6443
          protocol: TCP
          targetPort: 6443
        selector:
          apiserver: "true"
          app: openshift-kube-apiserver
        sessionAffinity: None
        type: LoadBalancer
      
      3. Setup the DNS resolution for the access
      xxx.mydomain.com ---> <elb-auto-generated-dns>
      
      4. Try to access the cluster api via the service above by updating the kubeconfig to use the custom dns name
      

      Actual results:

      No response from the server side.
      
      $ time oc get node -v8
      I1025 08:29:10.284069  103974 loader.go:375] Config loaded from file:  bmeng.kubeconfig
      I1025 08:29:10.294017  103974 round_trippers.go:420] GET https://rh-api.bmeng-ccs-ovn.3o13.s1.devshift.org:6443/api/v1/nodes?limit=500
      I1025 08:29:10.294035  103974 round_trippers.go:427] Request Headers:
      I1025 08:29:10.294043  103974 round_trippers.go:431]     Accept: application/json;as=Table;v=v1;g=meta.k8s.io,application/json;as=Table;v=v1beta1;g=meta.k8s.io,application/json
      I1025 08:29:10.294052  103974 round_trippers.go:431]     User-Agent: oc/openshift (linux/amd64) kubernetes/e40bd2d
      I1025 08:29:10.365119  103974 round_trippers.go:446] Response Status: 200 OK in 71 milliseconds
      I1025 08:29:10.365142  103974 round_trippers.go:449] Response Headers:
      I1025 08:29:10.365148  103974 round_trippers.go:452]     Audit-Id: 83b9d8ae-05a4-4036-bff6-de371d5bec12
      I1025 08:29:10.365155  103974 round_trippers.go:452]     Cache-Control: no-cache, private
      I1025 08:29:10.365161  103974 round_trippers.go:452]     Content-Type: application/json
      I1025 08:29:10.365167  103974 round_trippers.go:452]     X-Kubernetes-Pf-Flowschema-Uid: 2abc2e2d-ada3-4cb8-a86f-235df3a4e214
      I1025 08:29:10.365173  103974 round_trippers.go:452]     X-Kubernetes-Pf-Prioritylevel-Uid: 02f7a188-43c7-4827-af58-5ebe861a1891
      I1025 08:29:10.365179  103974 round_trippers.go:452]     Date: Tue, 25 Oct 2022 08:29:10 GMT
      ^C
      real    17m4.840s
      user    0m0.567s
      sys    0m0.163s
      
      
      However, it has the correct response if using --raw to request, eg:
      $ oc get --raw /api/v1  --kubeconfig bmeng.kubeconfig 
      {"kind":"APIResourceList","groupVersion":"v1","resources":[{"name":"bindings","singularName":"","namespaced":true,"kind":"Binding","verbs":["create"]},{"name":"componentstatuses","singularName":"","namespaced":false,"kind":"ComponentStatus","verbs":["get","list"],"shortNames":["cs"]},{"name":"configmaps","singularName":"","namespaced":true,"kind":"ConfigMap","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["cm"],"storageVersionHash":"qFsyl6wFWjQ="},{"name":"endpoints","singularName":"","namespaced":true,"kind":"Endpoints","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["ep"],"storageVersionHash":"fWeeMqaN/OA="},{"name":"events","singularName":"","namespaced":true,"kind":"Event","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["ev"],"storageVersionHash":"r2yiGXH7wu8="},{"name":"limitranges","singularName":"","namespaced":true,"kind":"LimitRange","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["limits"],"storageVersionHash":"EBKMFVe6cwo="},{"name":"namespaces","singularName":"","namespaced":false,"kind":"Namespace","verbs":["create","delete","get","list","patch","update","watch"],"shortNames":["ns"],"storageVersionHash":"Q3oi5N2YM8M="},{"name":"namespaces/finalize","singularName":"","namespaced":false,"kind":"Namespace","verbs":["update"]},{"name":"namespaces/status","singularName":"","namespaced":false,"kind":"Namespace","verbs":["get","patch","update"]},{"name":"nodes","singularName":"","namespaced":false,"kind":"Node","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["no"],"storageVersionHash":"XwShjMxG9Fs="},{"name":"nodes/proxy","singularName":"","namespaced":false,"kind":"NodeProxyOptions","verbs":["create","delete","get","patch","update"]},{"name":"nodes/status","singularName":"","namespaced":false,"kind":"Node","verbs":["get","patch","update"]},{"name":"persistentvolumeclaims","singularName":"","namespaced":true,"kind":"PersistentVolumeClaim","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["pvc"],"storageVersionHash":"QWTyNDq0dC4="},{"name":"persistentvolumeclaims/status","singularName":"","namespaced":true,"kind":"PersistentVolumeClaim","verbs":["get","patch","update"]},{"name":"persistentvolumes","singularName":"","namespaced":false,"kind":"PersistentVolume","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["pv"],"storageVersionHash":"HN/zwEC+JgM="},{"name":"persistentvolumes/status","singularName":"","namespaced":false,"kind":"PersistentVolume","verbs":["get","patch","update"]},{"name":"pods","singularName":"","namespaced":true,"kind":"Pod","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["po"],"categories":["all"],"storageVersionHash":"xPOwRZ+Yhw8="},{"name":"pods/attach","singularName":"","namespaced":true,"kind":"PodAttachOptions","verbs":["create","get"]},{"name":"pods/binding","singularName":"","namespaced":true,"kind":"Binding","verbs":["create"]},{"name":"pods/ephemeralcontainers","singularName":"","namespaced":true,"kind":"Pod","verbs":["get","patch","update"]},{"name":"pods/eviction","singularName":"","namespaced":true,"group":"policy","version":"v1","kind":"Eviction","verbs":["create"]},{"name":"pods/exec","singularName":"","namespaced":true,"kind":"PodExecOptions","verbs":["create","get"]},{"name":"pods/log","singularName":"","namespaced":true,"kind":"Pod","verbs":["get"]},{"name":"pods/portforward","singularName":"","namespaced":true,"kind":"PodPortForwardOptions","verbs":["create","get"]},{"name":"pods/proxy","singularName":"","namespaced":true,"kind":"PodProxyOptions","verbs":["create","delete","get","patch","update"]},{"name":"pods/status","singularName":"","namespaced":true,"kind":"Pod","verbs":["get","patch","update"]},{"name":"podtemplates","singularName":"","namespaced":true,"kind":"PodTemplate","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"storageVersionHash":"LIXB2x4IFpk="},{"name":"replicationcontrollers","singularName":"","namespaced":true,"kind":"ReplicationController","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["rc"],"categories":["all"],"storageVersionHash":"Jond2If31h0="},{"name":"replicationcontrollers/scale","singularName":"","namespaced":true,"group":"autoscaling","version":"v1","kind":"Scale","verbs":["get","patch","update"]},{"name":"replicationcontrollers/status","singularName":"","namespaced":true,"kind":"ReplicationController","verbs":["get","patch","update"]},{"name":"resourcequotas","singularName":"","namespaced":true,"kind":"ResourceQuota","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["quota"],"storageVersionHash":"8uhSgffRX6w="},{"name":"resourcequotas/status","singularName":"","namespaced":true,"kind":"ResourceQuota","verbs":["get","patch","update"]},{"name":"secrets","singularName":"","namespaced":true,"kind":"Secret","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"storageVersionHash":"S6u1pOWzb84="},{"name":"serviceaccounts","singularName":"","namespaced":true,"kind":"ServiceAccount","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["sa"],"storageVersionHash":"pbx9ZvyFpBE="},{"name":"serviceaccounts/token","singularName":"","namespaced":true,"group":"authentication.k8s.io","version":"v1","kind":"TokenRequest","verbs":["create"]},{"name":"services","singularName":"","namespaced":true,"kind":"Service","verbs":["create","delete","deletecollection","get","list","patch","update","watch"],"shortNames":["svc"],"categories":["all"],"storageVersionHash":"0/CO1lhkEBI="},{"name":"services/proxy","singularName":"","namespaced":true,"kind":"ServiceProxyOptions","verbs":["create","delete","get","patch","update"]},{"name":"services/status","singularName":"","namespaced":true,"kind":"Service","verbs":["get","patch","update"]}]}
       

      Expected results:

      The normal oc request should be working.

      Additional info:

      There is no such issue for clusters with openshift-sdn with the same OpenShift version and same LoadBalancer service.
      
      We suspected that it might be related to the MTU setting, but this cannot explain why OpenShiftSDN works well.
      
      Another thing might be related is that the OpenShiftSDN is using iptables for service loadbalancing and OVN is dealing that within the OVN services.

       

      Please let me know if any debug log/info is needed.

        1. 2827.jpg
          2827.jpg
          4.50 MB
        2. akaris_mtu.png
          akaris_mtu.png
          53 kB

              akaris@redhat.com Andreas Karis
              bmeng_sre.openshift Bo Meng
              Zhanqi Zhao Zhanqi Zhao
              Andreas Karis
              Votes:
              2 Vote for this issue
              Watchers:
              19 Start watching this issue

                Created:
                Updated:
                Resolved: