OCPBUGS-4119

Random SYN drops in OVS bridges of OVN-Kubernetes


    • Important
    • None
    • SDN Sprint 232, SDN Sprint 233
    • 2
    • Rejected
    • False
    • None
    • Customer Escalated
      2/6: moved this to telco-4.13 to reflect the target version of this (the 4.12.z is OCPBUGS-6957)
      1/3: Moving this to 4.12 POSTGA for Telco - reported by E/// in 4.10, but not reported by Nokia or QE. Needs a release note for 4.12 GA.
      12/19: triage continues, pending additional information from originator / customer
      12/15: triage underway, still considering possibility that this may be a RHEL issue
      12/12: Telco team (YJ) to look into this further re: RHEL v OCP issue
      12/8: No change.
      12/5: Yellow as investigation is continuing.
      12/5: added to the Telco-Grade OCP 4.12 gating list as rank 2
      Rel Note for Telco: TBD

      Description of problem:

      SYN packets for new TCP connections from inside the cluster to an external destination are dropped at random. After a few seconds (i.e. a few retries), the connections eventually succeed and no further packet drops occur. The result is perceived as an excessively long TCP connection establishment delay.
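
A rough illustration of why a handful of dropped SYNs shows up as a delay of several seconds. This is only a sketch: it assumes the client kernel uses the Linux default initial SYN retransmission timeout of 1 second and doubles it on every retry, which was not measured in this report. The function name `syn_delay_seconds` is hypothetical.

```python
def syn_delay_seconds(dropped_syns: int, initial_rto: float = 1.0) -> float:
    """Estimate the extra connect delay after `dropped_syns` consecutive SYN drops.

    Assumption (not from the bug report): the first retransmission is sent
    `initial_rto` seconds after the original SYN, and the wait doubles on
    each subsequent retry (exponential backoff).
    """
    if dropped_syns < 0:
        raise ValueError("dropped_syns must be non-negative")
    # The k-th wait is initial_rto * 2**k; the delay is the sum of all waits
    # that elapse before the first SYN that gets through is sent.
    return sum(initial_rto * 2**k for k in range(dropped_syns))
```

Under these assumptions, one dropped SYN costs about 1 second, two cost about 3 seconds, and three cost about 7 seconds, which is in line with the "few seconds, few retries" behaviour described above.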
      

      Version-Release number of selected component (if applicable):

      4.10.0
      

      How reproducible:

      Frequently on one specific cluster. Other clusters with apparently similar configurations do not show the issue.
      

      Steps to Reproduce:

      1. Establish a TCP connection from a pod to an external destination.
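
The reproduction step could be scripted along these lines and run from inside a pod. This is a minimal sketch, not the procedure the reporter used: the helper names (`time_connect`, `probe`, `count_slow`) and the 0.9-second threshold are assumptions, the threshold being chosen on the premise that a handshake taking roughly a second or longer involved at least one SYN retransmission.

```python
import socket
import time


def time_connect(host: str, port: int, timeout: float = 10.0) -> float:
    """Return the seconds taken to complete a TCP handshake to (host, port)."""
    start = time.monotonic()
    # create_connection returns once the three-way handshake has completed.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def probe(host: str, port: int, attempts: int = 20, timeout: float = 10.0):
    """Open `attempts` fresh connections; return elapsed times (None on failure)."""
    results = []
    for _ in range(attempts):
        try:
            results.append(time_connect(host, port, timeout))
        except OSError:
            # Timeouts and refused/unreachable errors are recorded as failures.
            results.append(None)
    return results


def count_slow(results, threshold: float = 0.9) -> int:
    """Count attempts that failed or took at least `threshold` seconds."""
    return sum(1 for r in results if r is None or r >= threshold)
```

For example, `count_slow(probe("203.0.113.10", 443))` would report how many of 20 connection attempts were slow or failed; the address is a documentation placeholder and should be replaced with the real external destination. Each attempt opens a new connection, so every iteration sends a fresh SYN.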
      

      Actual results:

      SYN packets are dropped and TCP connection establishment takes a long time, leading to timeouts.
      

      Expected results:

      No drops
      

      Additional info:

      This is especially harmful because it affects communication between openshift-apiserver (not to be confused with kube-apiserver) and etcd: openshift-apiserver runs inside the SDN and etcd does not.
      
      More details will follow in comments.
      

              trozet@redhat.com Tim Rozet
              rhn-support-palonsor Pablo Alonso Rodriguez
              Ross Brattain Ross Brattain
              Votes: 0
              Watchers: 16
