  1. OpenShift Bugs
  2. OCPBUGS-29073

OpenShiftSDN cluster network slowness on Nutanix


Details

    • Important
    • No
    • SDN Sprint 250
    • 1
    • False
    • 18 March: the remaining customer support case is waiting for verification that the slow response times do not recur after the Nutanix upgrade. With a customer acknowledgment or no response, this bug could be closed.
    • involves Nutanix CSI Operator; can close

    Description

      Description of problem:

      The customer is using OpenShiftSDN on 3 clusters installed on Nutanix. They see low pod-to-pod bandwidth on the cluster network when the nodes hosting the pods run on 2 different virtualization hosts. They use the regular MTU (1500 on the NIC, 1450 on the cluster network). We measured the bandwidth with iperf3.
      Although the bandwidth measured on the host network is ~24 Gbit/s, it is lower when measured on the cluster network. This was tested with daemonsets running an image with iperf3, on both the cluster network and the host network.
      
      - We reached ~24 Gbit/s when the pods ran on the host network on 2 different virtualization hosts.
      - When the pods are on the cluster network, running on 2 different nodes on 2 different virtualization hosts, the bandwidth is 12 Mbit/s and takes a couple of seconds to ramp up.
      - When the 2 pods are on the cluster network, on 2 different nodes on the same virtualization host, the bandwidth is ~6.5 Gbit/s.
      - One of the clusters is a test cluster; we migrated it to OVN-Kubernetes and reached ~7 Gbit/s pod to pod across 2 different virtualization hosts.
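
      To put the measurements above on a common scale, the following minimal sketch expresses each one as a fraction of the ~24 Gbit/s host-network baseline. The figures are the approximate values reported here; the dictionary labels and the helper name `fraction_of_baseline` are illustrative only.

```python
# Approximate iperf3 results from this report, in bits per second.
HOST_NETWORK_BPS = 24e9  # host network, 2 different virtualization hosts

measurements_bps = {
    "OpenShiftSDN, different virtualization hosts": 12e6,
    "OpenShiftSDN, same virtualization host": 6.5e9,
    "OVN-Kubernetes, different virtualization hosts": 7e9,
}


def fraction_of_baseline(bps, baseline_bps=HOST_NETWORK_BPS):
    """Return a measured bandwidth as a fraction of the baseline."""
    return bps / baseline_bps


for scenario, bps in measurements_bps.items():
    share = fraction_of_baseline(bps)
    print(f"{scenario}: {share:.2%} of host-network bandwidth")
```

      On these numbers, the cross-host OpenShiftSDN case comes out at roughly 0.05% of the host-network figure, while the other two cluster-network cases land between about 27% and 29%.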
      

      Version-Release number of selected component (if applicable):

      OCP 4.12.35

      How reproducible:

      The customer has 3 affected clusters; the issue is identical on all of them.

      Steps to Reproduce:

      Reproduced only in the customer environment so far.

      Actual results:

      The cluster network performs slowly.

      Expected results:

      We expect the cluster network not to be this slow compared to the bandwidth available at the host network level.
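
      As a rough sanity check on that expectation, the sketch below estimates how much of each packet the overlay encapsulation would consume. It assumes the 50-byte gap between the two MTUs given in the description is standard VXLAN-over-IPv4 header overhead; this report does not confirm that, and the variable names are illustrative.

```python
NIC_MTU = 1500      # bytes, from this report
CLUSTER_MTU = 1450  # bytes, from this report

# Standard VXLAN-over-IPv4 encapsulation headers, in bytes.
# Assumption: these account for the MTU difference in this environment.
VXLAN_OVERHEAD = {
    "inner Ethernet": 14,
    "VXLAN": 8,
    "outer UDP": 8,
    "outer IPv4": 20,
}

overhead = sum(VXLAN_OVERHEAD.values())

# Share of each full-size packet still available to the pod's IP packet.
payload_ratio = CLUSTER_MTU / NIC_MTU

print(f"encapsulation overhead: {overhead} bytes")
print(f"MTU difference:         {NIC_MTU - CLUSTER_MTU} bytes")
print(f"payload share:          {payload_ratio:.1%}")
```

      Under that assumption, header overhead alone would cost only a few percent of the host-network bandwidth, so it would not by itself account for the gap described above.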

      Additional info:

          


          People

            bpickard@redhat.com Ben Pickard
            fcristin1@redhat.com Francesco Cristini
            Zhanqi Zhao Zhanqi Zhao
            Chris Fields
