OpenShift Bugs
OCPBUGS-28645

EFS CSI performance degradation due to CPU limits


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Affects Versions: 4.10, 4.11, 4.12, 4.13, 4.14, 4.15, 4.16
    • Component: Storage
    • Severity: Important
    • Release Note Text:
      ----- edited -----
      * Previously, CPU limits for the AWS EFS CSI driver container could cause performance degradation of volumes managed by the AWS EFS CSI Driver Operator. With this release, the CPU limits from the AWS EFS CSI driver container have been removed to help prevent potential performance degradation.
      ----- original -----
      The CPU limits from the EFS CSI driver container were removed to prevent potential performance degradation.
    • Release Note Type: Bug Fix
    • Done

      This is a clone of issue OCPBUGS-28551. The following is the description of the original issue:

      Description of problem:

      When an EFS-based volume is mounted by the driver (the csi-driver container) in the daemonset aws-efs-csi-driver-node, a new stunnel process is also launched. This process, which encrypts the I/O traffic of the NFS filesystem, can be CPU intensive under load and becomes throttled by the CPU limits (100m) configured on the csi-driver container: https://github.com/openshift/aws-efs-csi-driver-operator/blob/release-4.16/assets/node.yaml#L81-L83

      This CPU throttling leads to severe performance degradation of all volumes managed by the operator.
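      One way to confirm the throttling is to read the cgroup v2 cpu.stat file of the csi-driver container (on a live node it is /sys/fs/cgroup/cpu.stat inside the container's cgroup). The snippet below is a minimal sketch with made-up sample numbers, not output from an actual cluster:

```shell
# Sample cpu.stat contents (hypothetical numbers, for illustration only);
# on a real cluster, read /sys/fs/cgroup/cpu.stat inside the csi-driver container.
cpu_stat='nr_periods 1000
nr_throttled 700
throttled_usec 50000000'

# Fraction of CFS enforcement periods in which the container was throttled.
ratio=$(echo "$cpu_stat" | awk '/^nr_periods/ {p=$2} /^nr_throttled/ {t=$2} END {print t/p}')
echo "$ratio"
```

      A ratio well above zero while an fio job is running indicates the 100m limit is being hit.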

      How reproducible:

      Create a pod with an EFS PVC attached and run a simple performance test on this volume.
      
      e.g.:
      fio --ioengine=libaio --iodepth=4 --runtime=60 --bs=1MiB --time_based=1 --filename=file --rw=read --size=2GiB --name=readjob --direct=1
      
      Repeat the previous test after removing the CPU limits from the csi-driver container of the daemonset aws-efs-csi-driver-node. This can be done by setting the ClusterCSIDriver/efs.csi.aws.com resource to the Unmanaged state and then editing the daemonset.
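      A sketch of that step with oc, assuming the driver runs in the usual openshift-cluster-csi-drivers namespace and that csi-driver is the first container in the pod template (check the daemonset before patching):

```shell
# Stop the operator from reconciling (and re-adding) the limits:
oc patch clustercsidriver efs.csi.aws.com --type=merge \
  -p '{"spec":{"managementState":"Unmanaged"}}'

# Remove the resource limits from the csi-driver container
# (container index 0 is an assumption; verify with `oc get ds ... -o yaml`):
oc -n openshift-cluster-csi-drivers patch daemonset/aws-efs-csi-driver-node \
  --type=json \
  -p '[{"op":"remove","path":"/spec/template/spec/containers/0/resources/limits"}]'
```

      Remember to set the ClusterCSIDriver back to Managed after the test so the operator resumes reconciling the daemonset.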
          

      Results using the default configuration:

      sh-5.2$ fio --ioengine=libaio --iodepth=4 --runtime=60 --bs=1MiB --time_based=1 --filename=file --rw=read --size=2GiB --name=readjob --direct=1
      readjob: (g=0): rw=read, bs=(R) 977KiB-977KiB, (W) 977KiB-977KiB, (T) 977KiB-977KiB, ioengine=libaio, iodepth=4
      <truncated>
      READ: bw=95.2MiB/s (99.9MB/s), 95.2MiB/s-95.2MiB/s (99.9MB/s-99.9MB/s), io=5717MiB (5995MB), run=60031-60031msec

       

      Results after removing CPU limits:

      sh-5.2$ fio --ioengine=libaio --iodepth=4 --runtime=60 --bs=1MiB --time_based=1 --filename=file --rw=read --size=2GiB --name=readjob --direct=1
      readjob: (g=0): rw=read, bs=(R) 977KiB-977KiB, (W) 977KiB-977KiB, (T) 977KiB-977KiB, ioengine=libaio, iodepth=4
      <truncated>
      READ: bw=507MiB/s (532MB/s), 507MiB/s-507MiB/s (532MB/s-532MB/s), io=29.7GiB (31.9GB), run=60006-60006msec
      

            Assignee: Tomas Smetana (rhn-support-tsmetana)
            Reporter: OpenShift Prow Bot (openshift-crt-jira-prow)
            QA Contact: Rohit Patil
            Votes: 0
            Watchers: 6

              Created:
              Updated:
              Resolved: