Type: Epic
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- crun
- runc

Epic Name:
Schedule crun process on the first CPU within the cgroups cpuset for container on isolated CPUs
Work Type:
BU Product Work
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Epic Status:
To Do
Feature Link:
OCPSTRAT-1292 - Don't interrupt pinned CPU pods by exec probes
Parent Link:
OCPSTRAT-1292 - Don't interrupt pinned CPU pods by exec probes
Hierarchy Progress Bar:

38% To Do, 19% In Progress, 44% Done
Size:
L
Product Sponsor:
Telco 5G RAN

Epic Goal

This epic will introduce optional functionality in crun to direct exec operations to a specific CPU in a guaranteed QoS class pod.

Originally, this was going to be a direct port of commit afc23e33, which was recently done in runc. However, it has since been decided that this was not the right approach (see https://github.com/opencontainers/runc/pull/4283). A new proposal has been made for the runtime-spec to allow the CPU affinity to be specified for executed processes (see https://github.com/opencontainers/runtime-spec/pull/1253); runc/crun would then use it to pin the new process to specific CPUs.

This change will likely require changes in cri-o to take advantage of the new runc/crun functionality and would also need a way to trigger this for a pod/container (e.g. a new annotation).

Why is this important?

Quoting from RFE-5011:

When a partner develops a containerized DPDK application, it will want to give full exclusive CPU access to the busy-loop polling threads. It is also possible that some housekeeping process will be running on a separate CPU inside the same pod.

However, we have seen that certain common Kubernetes operations cannot be done when this type of configuration is run on the RT kernel. The list of these operations is:

Running oc exec / oc rsh / oc cp / oc rsync on the pod
Having exec probes for livenessProbe or readinessProbe
Having an exec postStart or preStop hook

Those operations cannot be done because the new processes started in the pod will run at a non-RT priority, and could land on one of the CPUs running the busy-loop polling threads. This can add latency to the DPDK application and, in the worst case, cause a deadlock between the non-RT process and a kernel thread. Several support cases have been opened in which the vmcore crash analysis showed this issue.

Currently, there is no control over the CPU(s) on which a newly exec'ed process runs in a pod. If we had a way to ensure the new process does not run on the CPUs owned by the busy-loop polling threads, we would be able to run those common admin tasks on the pod.
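
As a rough illustration of the mechanism (not the crun or runc implementation), the sketch below expands a kernel-style CPU list such as the contents of a cgroup's cpuset.cpus.effective file, picks the lowest-numbered CPU, and pins a process to it. The function names are hypothetical, and it assumes the housekeeping work is meant to live on the first CPU of the container's cpuset, as the epic name suggests.

```python
import os


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as "2-5,8" into sorted CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list or a trailing comma
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def first_cpu(cpuset_text: str) -> int:
    """Return the lowest-numbered CPU in the given cpuset string."""
    cpus = parse_cpu_list(cpuset_text)
    if not cpus:
        raise ValueError("empty cpuset")
    return cpus[0]


def pin_to_first_cpu(pid: int, cpuset_text: str) -> int:
    """Restrict `pid` to the first CPU of the cpuset (Linux only)."""
    cpu = first_cpu(cpuset_text)
    os.sched_setaffinity(pid, {cpu})
    return cpu
```

For a container whose cpuset is "4-7", `first_cpu("4-7")` selects CPU 4, leaving CPUs 5 to 7 to the busy-loop polling threads. A real runtime would have to apply the affinity before the exec'ed process starts running, which this sketch does not attempt.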

Scenarios

See above for a list of the operations which can cause an issue for a guaranteed QoS pod running an RT application with busy-loop threads.

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
Pods without the new annotation preserve the existing behavior - new processes in a pod are distributed to any CPU in the pod.
Pods with the new annotation have new processes assigned to the first CPU in the pod.
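
The last two criteria could be expressed as a small decision function, sketched here under stated assumptions: the annotation key and its "first" value are placeholders (the real name is still an open question for this epic), and `exec_cpus` is a hypothetical helper, not an existing cri-o, runc, or crun API.

```python
# Hypothetical annotation key; the actual name has not been decided.
EXEC_AFFINITY_ANNOTATION = "example.io/exec-cpu-affinity"


def exec_cpus(annotations: dict[str, str], pod_cpus: list[int]) -> set[int]:
    """Return the CPUs a newly exec'ed process may run on."""
    if not pod_cpus:
        raise ValueError("pod has no CPUs assigned")
    if annotations.get(EXEC_AFFINITY_ANNOTATION) == "first":
        # Annotated pod: confine new processes to the first CPU.
        return {min(pod_cpus)}
    # No annotation: existing behavior, any CPU in the pod.
    return set(pod_cpus)
```

A CI test for the acceptance criteria could then compare the affinity observed for an exec'ed process against the result of such a function for both annotated and unannotated pods.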

Dependencies (internal and external)

None

Previous Work:

runc commit afc23e33

Open questions:

Name for the annotation that will trigger this behavior.

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>
