Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.14.0
Affects Version/s: 4.13.0, 4.14.0
Component/s: Performance Addon Operator
Labels:

Severity: Critical
Regression: No
Sprint: CNF Compute Sprint 238, CNF Compute Sprint 239, CNF Compute Sprint 240, CNF Compute Sprint 241, CNF Compute Sprint 242, CNF Compute Sprint 243
sprint_count: 6
Release Blocker: Approved
Blocked: False
Blocked Reason: None
Internal Whiteboard:
Latest Status Summary:

9/26: crun/runc patches actively worked on by node team - new PR posted
9/5: 9.3 kernel API change merged, waiting on QE to allow the 9.2 backport process to start
8/21: pending on fix for RHELPLAN-161539, which has RHEL 9 merge requests up; Green
7/25: workaround being used successfully, QE is unblocked & TCFP; full solution pending RHELPLAN-161539

RH Private Keywords:
Target Version: 4.14.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:

Description of problem:

All burstable pods run on the CPUs specified as reserved in the PerformanceProfile. The cause is the *systemd.cpu_affinity=<reserved>* kernel argument, which is applied via the PerformanceProfile.

This is problematic because the reserved set can be just 1-4 CPUs, while the node capacity allows many dozens of pods to be crammed onto them, not to mention the infrastructure components.

All pods should really run in the "isolated" space.

Version-Release number of selected component (if applicable):

At least the 4.14 CI builds of OCP from 2023-06-15 onward.

How reproducible:

Always

Steps to Reproduce:

1. `oc debug node/<worker>`
2. Find any burstable container process and run `taskset -pc <pid>`
3. Observe that the CPU affinity contains all CPUs on the node

4. Add the kernel argument `systemd.cpu_affinity=0` (an example MachineConfig is attached)
5. `oc debug node/<worker>`
6. Find any burstable container process again and run `taskset -pc <pid>`
7. Observe the CPU affinity again
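
To compare the affinity seen in step 3 with the one seen in step 7 without reading the lists by eye, something like the following Python sketch could be used. It assumes the usual `taskset -pc` output line (`pid N's current affinity list: ...`); the helper names and the sample PID and CPU lists are hypothetical and only illustrate an 8-CPU node.

```python
import re


def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_taskset_output(line):
    """Extract the CPU set from a line printed by `taskset -pc <pid>`."""
    match = re.search(r"affinity list:\s*([0-9,\-]+)", line)
    if match is None:
        raise ValueError(f"unrecognized taskset output: {line!r}")
    return parse_cpu_list(match.group(1))


# Hypothetical outputs captured in steps 2 and 6 (not taken from a real node).
before = parse_taskset_output("pid 4242's current affinity list: 0-7")
after = parse_taskset_output("pid 4242's current affinity list: 0")

# CPUs the process could use before the kernel argument but not after it.
lost = before - after
print(sorted(lost))
```

A non-empty `lost` set means the process affinity shrank after the kernel argument was added, which is the behavior reported here.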

Actual results:

In step 7, the affinity matches the kernel argument value from step 4.

Expected results:

In step 7, the affinity should match the affinity observed in steps 2-3.

Additional info:

The cgroup of the burstable pod is set up properly: its cpuset (/sys/fs/cgroup/cpuset/kubepods.slice/.../cpuset.cpus) contains all CPUs, as expected.
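
The mismatch between the cgroup cpuset and the actual process affinity can be checked with a short Python sketch like the one below. It assumes the text of `cpuset.cpus` and of `/proc/<pid>/status` (which carries a `Cpus_allowed_list:` line) has already been read on the node; the function names and sample contents are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus_from_status(status_text):
    """Return the CPU set from the Cpus_allowed_list line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("Cpus_allowed_list not found in status text")


def cpus_missing_from_affinity(status_text, cpuset_text):
    """CPUs the cgroup cpuset permits but the process affinity excludes."""
    return parse_cpu_list(cpuset_text) - allowed_cpus_from_status(status_text)


# Hypothetical contents for an 8-CPU node booted with systemd.cpu_affinity=0.
sample_status = "Name:\tsleep\nCpus_allowed:\t01\nCpus_allowed_list:\t0\n"
sample_cpuset = "0-7\n"

missing = cpus_missing_from_affinity(sample_status, sample_cpuset)
print(sorted(missing))
```

An empty result would mean the affinity covers the whole cpuset; in the situation described in this report the result is non-empty even though the cpuset itself is correct.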


Attachments:
mc-crioaff-1.yaml (1 kB, 2023/07/18 10:54 AM)
mc-systemdaffonly.yaml (0.3 kB, 2023/06/16 6:32 PM)

blocks

OCPBUGS-16904 All nodes Memory cgroup out of memory after stress testing in 4.13.x

Closed

OCPBUGS-20365 [4.13] All burstable pods run with the reserved cpu affinity mask when PerformanceProfile is applied

Closed

CNV-28721 [2196459] [DPDK checkup] Pods are scheduled on reserved instead of isolated CPUs

Closed

duplicates

OCPBUGS-14755 BestEffort pod has reserved cpuset affinity in 4.13 PerformanceProfile enabled

Closed

is cloned by

OCPBUGS-20365 [4.13] All burstable pods run with the reserved cpu affinity mask when PerformanceProfile is applied

Closed

relates to

OCPBUGS-27834 Burstable pods have reserved cpu affinity when performance profile is applied

Closed

links to

https://github.com/containers/crun/pull/1315

mentioned on

Merge request - Updated US source to: 39250f0 Merge pull request #1561 from tliu2021/OCPBUGS-13634

Merge request - Updated US source to: b4e02e0 Merge pull request #1569 from josephdrichard/enable_pins_directly

Merge request - Updated US source to: b46e825 Merge pull request #1568 from sabbir-47/add-deprecation-warning-extraManifestPath

Merge request - Updated US source to: b8559bc Merge pull request #1565 from tliu2021/SRIOV-related-kernel-arg


Assignee: Martin Sivak

Reporter: Martin Sivak

QA Contact: Mallapadi Niranjan

Contributors: Peter Hunt

Votes: 0

Watchers: 21

Created: 2023/06/16 6:32 PM

Updated: 2024/01/31 1:00 PM

Resolved: 2023/10/26 4:30 PM
