
OCP 4.14.8 responds with RST to all IP-fragmented packets arriving at a pod

    • Moderate
    • No
    • SDN Sprint 249, SDN Sprint 250, SDN Sprint 251, SDN Sprint 252, SDN Sprint 253, SDN Sprint 254, SDN Sprint 255
    • 7
    • False
    • Previously, with the OVN-Kubernetes setting for routing-via-host set to shared gateway mode, its default value, OVN-Kubernetes did not correctly handle traffic streams that mixed non-fragmented and fragmented packets from the IP layer on cluster ingress. This caused connection resets or packet drops. With this release, OVN-Kubernetes correctly reassembles and handles external traffic IP packet fragments on ingress. (link:https://issues.redhat.com/browse/OCPBUGS-29511[*OCPBUGS-29511*])
    • Bug Fix
    • Done
    • Scope: fragmented TCP traffic to an external IP. Workaround: routingViaHost=true. Needs update as of 18-March.

      Description of problem:

      When external TCP traffic is IP-fragmented (DF flag not set) and is destined for a pod's external IP, the fragmented packets are answered with a TCP RST and are not delivered to the pod's application socket.
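
      To illustrate what "IP-fragmented" means here, the following is a minimal sketch of how an IPv4 datagram is split when the DF flag is not set. It assumes a 20-byte IP header without options; the function name fragment_sizes is hypothetical, and the 1500/1000 byte values are taken from the MTUs mentioned in the reproducer description. It is an arithmetic aid only, not part of the reproducer.

```python
def fragment_sizes(total_len: int, mtu: int, ihl: int = 20):
    """Return (offset_bytes, payload_len, more_fragments) per IPv4 fragment.

    total_len: length of the original datagram including its IP header.
    mtu:       MTU of the link that forces fragmentation.
    ihl:       IP header length in bytes (assumed 20, i.e. no IP options).
    """
    payload = total_len - ihl
    if total_len <= mtu:
        # Fits as-is: a single, unfragmented datagram.
        return [(0, payload, False)]

    # Fragment offsets are expressed in 8-byte units, so every fragment
    # except the last must carry a payload that is a multiple of 8.
    max_payload = (mtu - ihl) // 8 * 8

    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload  # MF flag: set on all but the last
        fragments.append((offset, size, more))
        offset += size
    return fragments


if __name__ == "__main__":
    for offset, size, more in fragment_sizes(1500, 1000):
        print(f"offset={offset:5d} payload={size:4d} MF={int(more)}")
```

      Only the first fragment (offset 0) contains the TCP header with the port numbers; later fragments carry payload bytes only.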
         
      Version-Release number of selected component (if applicable):

      $ oc version
      Client Version: 4.14.8
      Kustomize Version: v5.0.1
      Server Version: 4.14.7
      Kubernetes Version: v1.27.8+4fab27b
           
      How reproducible:

      I built a reproducer for this issue on a KVM-hosted OCP cluster.
      It simulates the same traffic that can be seen in the customer's network,
      so we have a solid reproducer for the issue.
      Details are in the JIRA updates.
           
      Steps to Reproduce:
      I wrote a simple C-based tcp_server/tcp_client application for testing.
      The client sends a file to the server from a network namespace with path MTU
      discovery (PMTUD) disabled. The server app runs in a pod, waits for connections, reads the data from the socket, and stores the received file in /tmp.
      The path from the client namespace includes a veth pair with MTU 1000, lower than
      the 1500-byte MTU of the rest of the path.
      This is enough to get IP packets fragmented on the way from the client to the server.
      Details of the setup and testing steps are in the JIRA comments.  
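
      The original tcp_server/tcp_client pair is written in C and is not attached here. As a rough stand-in, the sketch below shows the same behavior in Python: a server that accepts one connection, reads until EOF and stores the bytes in a file, and a client that sends a file. All function names are hypothetical, and the self-check runs over loopback, so it does not itself produce fragmentation; that still requires the namespace and veth setup described above.

```python
import os
import socket
import tempfile
import threading


def serve_once(listener: socket.socket, dest_path: str) -> None:
    """Accept one connection, read until the peer closes, store bytes in dest_path."""
    conn, _addr = listener.accept()
    with conn, open(dest_path, "wb") as out:
        while chunk := conn.recv(65536):
            out.write(chunk)


def send_file(host: str, port: int, src_path: str) -> None:
    """Connect to host:port and send the whole file, then close the socket."""
    with socket.create_connection((host, port)) as sock, open(src_path, "rb") as src:
        sock.sendall(src.read())


def run_transfer(payload: bytes) -> bytes:
    """Loopback round trip; returns the bytes the server stored."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(payload)

        # Port 0 lets the OS pick a free port for this local check.
        with socket.create_server(("127.0.0.1", 0)) as listener:
            port = listener.getsockname()[1]
            server = threading.Thread(target=serve_once, args=(listener, dst))
            server.start()
            send_file("127.0.0.1", port, src)
            server.join()

        with open(dst, "rb") as f:
            return f.read()


if __name__ == "__main__":
    data = os.urandom(200_000)
    assert run_transfer(data) == data
    print("received", len(data), "bytes intact")
```

      In a real test, the server part would run in the pod and the received file would be compared with the one the client sent.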

      Actual results:

      $ oc get network.operator -o yaml | grep routingViaHost
                routingViaHost: false
      All fragmented packets are answered with a TCP RST and are not delivered to the
      application socket in the pod.  

      Expected results:

      Fragmented packets are delivered to the application socket running in a pod while routingViaHost is false:
      $ oc get network.operator -o yaml | grep routingViaHost
                routingViaHost: false
           

      Additional info:

      There is a workaround that prevents the issue:
      $ oc get network.operator -o yaml | grep routingViaHost
                routingViaHost: true
      With this setting, the fragmented traffic arrives at the application socket in the pod.
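
      The report only shows how the setting is read, not how it was changed. As a hedged sketch, the helper below builds a merge patch for the routingViaHost field and the matching oc patch argument list. It assumes the field lives under spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig and that the operator config object is named cluster; the function names are hypothetical, and the command is printed, not executed.

```python
import json
import shlex


def routing_via_host_patch(enabled: bool) -> str:
    """Build a JSON merge patch that sets gatewayConfig.routingViaHost."""
    patch = {
        "spec": {
            "defaultNetwork": {
                "ovnKubernetesConfig": {
                    "gatewayConfig": {"routingViaHost": enabled}
                }
            }
        }
    }
    return json.dumps(patch, separators=(",", ":"))


def oc_patch_command(enabled: bool) -> list:
    """Argument list for 'oc patch' (assumes the object is named 'cluster')."""
    return [
        "oc", "patch", "network.operator", "cluster",
        "--type=merge", "-p", routing_via_host_patch(enabled),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for review; nothing is applied here.
    print(shlex.join(oc_patch_command(True)))
```

      Changing the gateway mode affects the whole cluster, so the printed command should be reviewed before it is applied.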

      I can assist with the reproducer and testing on the test env.
      Regards Michal Tesar

              jcaamano@redhat.com Jaime Caamaño Ruiz
              rhn-support-mtesar Michal Tesar
              Anurag Saxena Anurag Saxena