Story
Resolution: Done
Critical
openshift-4.21
Quality / Stability / Reliability
OCP Node Sprint 278 (blue)
1. Summary
The CNV Descheduler is currently failing because it depends on PSI metrics. These metrics were disabled on all types of OpenShift nodes to ensure that cyclictest passes on Real-Time (RT) kernels, which created a conflict between telco environments and CNV descheduler functionality. This regression has triggered an investigation into conditionally re-enabling PSI metrics (probably based on the node type).
Update: After further discussion with bwensley@redhat.com from Telco, there are low-latency applications on non-RT kernels, and hence the fix should not be based on kernelType.
2. History and Context
PSI metrics were deliberately disabled in the cluster to address cyclictest issues on RT kernels. This fix appears to have introduced a regression in the CNV descheduler, which relies on this data.
- Original Reason for Disablement (OCPBUGS-37271): PSI metrics caused unacceptable latency overhead for latency-sensitive applications (Telco RAN DU deployments) running on RT kernels (5.14.0-427.22.1.el9_4.x86_64+rt kernel).
- Related Issue: https://issues.redhat.com/browse/OCPBUGS-37271
- CNV Descheduler Dependency: The CNV descheduler was developed with the assumption that PSI metrics would be available and is currently having issues without them.
- Timeline: The decision and implementation occurred in July 2024. The fix (psi=0) was verified and included in OCP 4.17.0-0.nightly-2024-07-25-212849.
- Current Discussion: The CNV team is trying to work around the MCO's MachineConfig priority (the 97-worker-generated-kubelet MC contains psi=0) in order to override the setting manually, but this appears to be difficult.
3. Steps to Reproduce
- Ensure the environment has the configuration fix from MCO PR #4470 applied (i.e., PSI metrics are disabled/removed from monitoring/collection via the kernel command line, resulting in no /proc/pressure/cpu output or equivalent).
- TODO: Get the steps from the CNV team about the descheduler dependency
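The first step above can be checked on a node with a small script. The following is a minimal sketch, not part of the original reproduction steps: the function names `psi_cmdline_setting` and `psi_files_present` are hypothetical, and it assumes that when `psi=` appears more than once on the kernel command line, the last occurrence is the effective one.

```python
from pathlib import Path
from typing import Optional


def psi_cmdline_setting(cmdline: str) -> Optional[str]:
    """Return the value of the last psi=<value> token, or None if absent.

    Assumption: if psi= is given several times, the last one is effective.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("psi="):
            value = token[len("psi="):]
    return value


def psi_files_present(proc_root: str = "/proc") -> bool:
    """Return True if <proc_root>/pressure/cpu exists on this node."""
    return (Path(proc_root) / "pressure" / "cpu").exists()
```

On a node, passing the contents of /proc/cmdline to `psi_cmdline_setting` and calling `psi_files_present()` should show `"0"` and `False` respectively when the configuration from MCO PR #4470 is in effect.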
4. Expected Behavior
- The descheduler should be able to obtain PSI metrics.
- The Telco team's environments should still have PSI disabled.
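For reference, when PSI is enabled the kernel exposes pressure data in files such as /proc/pressure/cpu, with one line per stall type (`some`, `full`) followed by `key=value` fields. The sketch below only illustrates how that raw format can be parsed; the function name `parse_psi` is hypothetical, and this ticket does not describe how the descheduler actually consumes the metrics.

```python
from typing import Dict


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse the contents of a /proc/pressure/<resource> file.

    Each non-empty line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        kind, *fields = line.split()
        # Turn each "key=value" field into a float-valued entry.
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result
```

An empty result (or a missing file) is what the descheduler would effectively see on a node booted with psi=0.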
5. Next Steps
- Kubernetes 1.34, which will be merged into OCP 4.21, changes the KubeletPSI feature gate so that it defaults to true (https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/). Ensure that this feature gate is not overridden to false anywhere in OCP.
- Revert the psi=0 setting introduced by the PR https://github.com/openshift/machine-config-operator/pull/4470/files and ensure that "psi=1" is used instead. PSI is disabled by default at the kernel level, so "psi=1" must be passed explicitly.
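The feature gate check in the first bullet can be sketched as a small helper. This assumes the kubelet configuration has already been loaded into a Python dict (for example from the node's kubelet configuration file) and that feature gates appear under its `featureGates` map; the function name `kubelet_psi_override` is hypothetical.

```python
from typing import Any, Dict, Optional


def kubelet_psi_override(kubelet_config: Dict[str, Any]) -> Optional[bool]:
    """Return the explicit KubeletPSI value, or None if no override is set.

    None means the upstream default applies (true as of Kubernetes 1.34).
    """
    gates = kubelet_config.get("featureGates") or {}
    value = gates.get("KubeletPSI")
    return value if isinstance(value, bool) else None
```

A return value of `False` would indicate an override that needs to be removed; `None` or `True` is consistent with the intended state.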
6. Dependent teams
- Performance team: Work with the Telco team to re-run cyclictest after the k8s 1.34 merge. Run the test on both RT and non-RT kernels. Documentation is already available here: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/scalability_and_performance/cnf-performing-platform-verification-latency-tests#cnf-performing-end-to-end-tests-running-the-tests_cnf-latency-tests
- CNV Team: After PSI metrics are enabled, remove any patches that were added to re-enable them.
Questions:
- Get clarity on the failure seen in the descheduler. Can the logic in the descheduler be independently tested by the Node Team?
- is depended on by: OCPBUGS-62301 Evaluation of platform default kernel psi argument impact and Kube Descheduler Guidance (New)
- relates to: OCPNODE-3818 Document Kernel PSI Enablement in OpenShift 4.21 (To Do)
- relates to: OCPNODE-3819 Benchmark testing for PSI in OpenShift 4.21 with k8s 1.34 (To Do)
- links to