OCPNODE-2864 (OpenShift Node)

Update cri-o to schedule exec processes on the first CPU within the cgroup's cpuset for containers on isolated CPUs


    • Type: Story
    • Resolution: Unresolved

      The OCI runtime spec has been updated to allow a process in a container to be scheduled on a specific CPU (or CPUs) through the new execCPUAffinity option in the process object (see OCPNODE-2320 for details).

      The crun runtime has now been updated to support the new execCPUAffinity option (see RUN-2374 for details). The equivalent runc change is in progress under RUN-2420.

      The final step is to update cri-o to set execCPUAffinity on the process object when it handles an exec on an existing container. This would only be done for execs run on containers that meet the following criteria:

      • Container is part of a pod with Guaranteed QoS (i.e. Memory/CPU requests/limits match for all containers).
      • Container uses a whole number of CPUs.
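
      The two criteria above can be sketched as a small check. This is only an illustration, not cri-o code: the containerResources type and the function names are hypothetical, and the check assumes CPU values are available in millicores with "Guaranteed" meaning non-zero limits that equal requests for every container in the pod.

```go
package main

import "fmt"

// containerResources is a hypothetical, simplified view of one container's
// resource settings. CPU is in millicores, memory in bytes.
type containerResources struct {
	CPURequestMilli int64
	CPULimitMilli   int64
	MemRequestBytes int64
	MemLimitBytes   int64
}

// isGuaranteed reports whether every container in the pod has non-zero CPU
// and memory limits that equal its requests (the Guaranteed QoS condition
// as described in this story).
func isGuaranteed(pod []containerResources) bool {
	if len(pod) == 0 {
		return false
	}
	for _, c := range pod {
		if c.CPULimitMilli <= 0 || c.MemLimitBytes <= 0 {
			return false
		}
		if c.CPURequestMilli != c.CPULimitMilli || c.MemRequestBytes != c.MemLimitBytes {
			return false
		}
	}
	return true
}

// usesWholeCPUs reports whether the container's CPU limit is a whole number
// of CPUs (a positive multiple of 1000 millicores).
func usesWholeCPUs(c containerResources) bool {
	return c.CPULimitMilli > 0 && c.CPULimitMilli%1000 == 0
}

// eligibleForExecAffinity combines both criteria for the container at idx.
func eligibleForExecAffinity(pod []containerResources, idx int) bool {
	if idx < 0 || idx >= len(pod) {
		return false
	}
	return isGuaranteed(pod) && usesWholeCPUs(pod[idx])
}

func main() {
	pod := []containerResources{
		{CPURequestMilli: 2000, CPULimitMilli: 2000, MemRequestBytes: 1 << 30, MemLimitBytes: 1 << 30},
		{CPURequestMilli: 500, CPULimitMilli: 500, MemRequestBytes: 1 << 20, MemLimitBytes: 1 << 20},
	}
	for i := range pod {
		fmt.Printf("container %d eligible: %v\n", i, eligibleForExecAffinity(pod, i))
	}
}
```

      In this sketch the second container is in a Guaranteed pod but uses a fractional CPU, so it would not get execCPUAffinity.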

      A container that meets these criteria is running on "isolated" CPUs and is likely running a latency-sensitive application. By convention, the first CPU in such a container is used for housekeeping, so execCPUAffinity should be set to the first CPU in the container's cpuset.

      There are additional criteria that should be discussed:

      • We could also require that the high-performance hooks are installed and the cpu-load-balancing.crio.io annotation is set to "disable". Without that annotation, the execed process could be moved to one of the other CPUs anyway after it starts.
      • We could require a new annotation on the pod to trigger the use of execCPUAffinity. The drawback is that this is inconvenient for users: they must remember to add another annotation, and forgetting it would lead to difficult-to-diagnose issues.
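
      If the first option were adopted, the annotation part of the gate might look like the sketch below. The function name is hypothetical and the pod annotations are assumed to be available as a plain map. Whether the high-performance hooks are installed is a separate condition that this sketch does not check.

```go
package main

import "fmt"

// loadBalancingAnnotation is the annotation named in this story.
const loadBalancingAnnotation = "cpu-load-balancing.crio.io"

// loadBalancingDisabled reports whether the pod annotations explicitly set
// cpu-load-balancing.crio.io to "disable". A missing annotation or any other
// value is treated as load balancing still being enabled.
func loadBalancingDisabled(annotations map[string]string) bool {
	return annotations[loadBalancingAnnotation] == "disable"
}

func main() {
	annotated := map[string]string{loadBalancingAnnotation: "disable"}
	fmt.Println(loadBalancingDisabled(annotated))
	fmt.Println(loadBalancingDisabled(map[string]string{}))
}
```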

      We should discuss the options here and then I will update the story.

      The background on why this change is necessary is discussed in RFE-5011 and in https://github.com/opencontainers/runc/issues/3922. Note that the RHEL kernel behaviour regarding whether random CPU assignment occurs has changed (and could change again). We shouldn't rely on it and should instead enforce where the execed processes run.

       

              pehunt@redhat.com Peter Hunt
              bwensley@redhat.com Bart Wensley