OpenShift Bugs / OCPBUGS-29066

EFS CSI performance degradation due to CPU limits


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Affects Versions: 4.10, 4.11, 4.12, 4.13, 4.14, 4.15, 4.16
    • Component: Storage
    • Severity: Important
    • Release Note Text:

      * Previously, CPU limits applied on the Amazon Elastic File System (EFS) Container Storage Interface (CSI) driver container caused performance degradation for I/O operations to EFS volumes. Now, the CPU limits for the EFS CSI driver are removed, so the performance degradation issue no longer exists. (link:https://issues.redhat.com/browse/OCPBUGS-29066[*OCPBUGS-29066*])
      -----------------
      The CPU limits from the EFS CSI driver container were removed to prevent potential performance degradation.
    • Release Note Type: Bug Fix
    • Done

      This is a clone of issue OCPBUGS-28979. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-28823. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-28645. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-28551. The following is the description of the original issue:

      Description of problem:

      When an EFS-based volume is mounted by the CSI driver in the aws-efs-csi-driver-node daemonset, a new stunnel process is also launched. This process, which encrypts the I/O traffic of the NFS filesystem and can be CPU-intensive under load, is throttled by the CPU limits (100m) configured on the csi-driver container: https://github.com/openshift/aws-efs-csi-driver-operator/blob/release-4.16/assets/node.yaml#L81-L83

      This CPU throttling leads to severe performance degradation on all volumes managed by the operator.
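
      A quick way to confirm the configured limit on a live cluster (a sketch; the openshift-cluster-csi-drivers namespace and the csi-driver container name are assumptions taken from the linked node.yaml):

      # Print the resource limits of the csi-driver container in the node daemonset.
      oc -n openshift-cluster-csi-drivers get daemonset aws-efs-csi-driver-node \
        -o jsonpath='{.spec.template.spec.containers[?(@.name=="csi-driver")].resources.limits}'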

      How reproducible:

      Create a pod with an EFS PVC attached and run a simple performance test on this volume, e.g.:

      fio --ioengine=libaio --iodepth=4 --runtime=60 --bs=1MiB --time_based=1 --filename=file --rw=read --size=2GiB --name=readjob --direct=1

      Repeat the previous test after removing the CPU limits from the csi-driver container of the aws-efs-csi-driver-node daemonset. This can be done by setting the ClusterCSIDriver/efs.csi.aws.com resource to the Unmanaged state, as sketched below.
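
      A minimal sketch of that second step (it assumes the daemonset lives in the openshift-cluster-csi-drivers namespace and that the csi-driver container is the first container in the pod template; verify both against your cluster):

      # Stop the operator from reconciling the daemonset back to its defaults.
      oc patch clustercsidriver efs.csi.aws.com --type=merge \
        -p '{"spec":{"managementState":"Unmanaged"}}'

      # Remove the CPU limit from the csi-driver container.
      # The container index (0) is an assumption; check the pod template first.
      oc -n openshift-cluster-csi-drivers patch daemonset aws-efs-csi-driver-node \
        --type=json \
        -p '[{"op":"remove","path":"/spec/template/spec/containers/0/resources/limits/cpu"}]'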
          

      Results using the default configuration:

      sh-5.2$ fio --ioengine=libaio --iodepth=4 --runtime=60 --bs=1MiB --time_based=1 --filename=file --rw=read --size=2GiB --name=readjob --direct=1
      readjob: (g=0): rw=read, bs=(R) 977KiB-977KiB, (W) 977KiB-977KiB, (T) 977KiB-977KiB, ioengine=libaio, iodepth=4
      <truncated>
      READ: bw=95.2MiB/s (99.9MB/s), 95.2MiB/s-95.2MiB/s (99.9MB/s-99.9MB/s), io=5717MiB (5995MB), run=60031-60031msec

       

      Results after removing CPU limits

      sh-5.2$ fio --ioengine=libaio --iodepth=4 --runtime=60 --bs=1MiB --time_based=1 --filename=file --rw=read --size=2GiB --name=readjob --direct=1
      readjob: (g=0): rw=read, bs=(R) 977KiB-977KiB, (W) 977KiB-977KiB, (T) 977KiB-977KiB, ioengine=libaio, iodepth=4
      <truncated>
      READ: bw=507MiB/s (532MB/s), 507MiB/s-507MiB/s (532MB/s-532MB/s), io=29.7GiB (31.9GB), run=60006-60006msec
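
      While the fio job runs, the throttling itself can be corroborated by watching the driver pod's CPU usage plateau at the 100m limit (assuming cluster metrics are available; the namespace is the same assumption as above):

      # Per-container CPU usage; under the default limits the csi-driver
      # container should sit pinned at ~100m for the duration of the test.
      oc -n openshift-cluster-csi-drivers adm top pod --containers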
      

              rhn-support-tsmetana Tomas Smetana
              openshift-crt-jira-prow OpenShift Prow Bot
              Rohit Patil
              Votes: 0
              Watchers: 7
