- Bug
- Resolution: Done-Errata
- Normal
- None
- 4.18
- None
This is a clone of issue OCPBUGS-52853. The following is the description of the original issue:
—
Description of problem:
Back in 4.16.30, on Arm64 Grace Hopper nodes with a performance profile applied, the NVIDIA GPU validator only worked properly if the following Tuned patch was also applied:
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: performance-patch
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Configuration changes profile inherited from performance created tuned
      include=openshift-node-performance-openshift-node-performance-profile
      [bootloader]
      cmdline_iommu_arm=-iommu.passthrough=1
      [service]
      service.stalld=start,enable
    name: performance-patch
  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: master
    priority: 19
    profile: performance-patch
This is highlighted in KCS: https://access.redhat.com/solutions/7107635
However, in 4.18 the above does not work when SR-IOV is in use, due to a recent commit in the SR-IOV network operator: https://github.com/openshift/sriov-network-operator/blob/release-4.18/pkg/plugins/generic/generic_plugin.go#L441
Instead, the following patch was required:
data: |
  [main]
  summary=Additional Cloud 5G RAN Application tuning
  include=performance-patch
  [bootloader]
  # see https://github.com/openshift/cluster-node-tuning-operator/blob/release-4.18/assets/performanceprofile/tuned/openshift-node-performance#L172
  cmdline_hugepages=default_hugepagesz=1G hugepagesz=1G hugepages=32
  # DOES NOT WORK: based on KCS https://access.redhat.com/solutions/7107635 for GPU operator
  # cmdline_iommu_arm=-iommu.passthrough=1
  cmdline_iommu=-iommu.passthrough=1
  cmdline_iommu=+ iommu.passthrough=0
We need a consistent patch method to ensure the validator issue is not hit.
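To check whether a given patch had the intended effect, the booted kernel arguments can be inspected for `iommu.passthrough`. The following is a minimal sketch, not part of the original report: the helper name `effective_iommu_passthrough` is hypothetical, and it assumes that when the parameter appears more than once, the last occurrence is the one that takes effect.

```python
from typing import Optional


def effective_iommu_passthrough(cmdline: str) -> Optional[str]:
    """Return the last iommu.passthrough value found on a kernel command line.

    Returns None when the parameter is absent. Assumption: a later
    occurrence overrides an earlier one.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "iommu.passthrough" and sep:
            value = val  # keep overwriting so the last occurrence wins
    return value
```

On a node, the string to pass in is the content of `/proc/cmdline`. With the second patch above applied, the expectation is a result of `"0"`; a result of `"1"` would suggest the removal did not take effect.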
Version-Release number of selected component (if applicable): 4.18
How reproducible:
100%
Steps to Reproduce:
1. Install OCP
2. Install SRIOV + Performance Profile
3. Install NVIDIA GPU Operator and Cluster policy
Actual results:
The GPU operator validator fails unless the patch above is applied
Expected results:
GPU validator should just work
Additional info:
- clones: OCPBUGS-52853 iommu.passthrough for Arm64 GH nodes (Closed)
- is blocked by: OCPBUGS-52853 iommu.passthrough for Arm64 GH nodes (Closed)
- is depended on by: OCPBUGS-59290 [4.18] iommu.passthrough for Arm64 GH nodes (POST)
- links to: RHBA-2025:11363 OpenShift Container Platform 4.19.5 bug fix update