-
Bug
-
Resolution: Done
-
Critical
-
None
-
4.13.0
-
Critical
-
Yes
-
False
-
Description of problem:
When deploying the GPU Operator on OCP 4.13.0-rc1 on bare metal (BM), the GPU Operator pod nvidia-driver-daemonset-413.92.202303190222-0-fwmcq logs the warning below, which causes the nvidia-driver deployment to fail:

WARNING: broken driver toolkit detected, using entitlement-based fallback

[root@openshift-qe-018 ~]# oc get pods -n nvidia-gpu-operator
E0328 10:30:48.787723 2435120 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAME                                                  READY   STATUS             RESTARTS          AGE
gpu-feature-discovery-8b5lt                           0/1     Init:0/1           0                 26h
gpu-operator-d75d4dcb5-85sg5                          1/1     Running            0                 28h
nvidia-container-toolkit-daemonset-bn7d2              0/1     Init:0/1           0                 26h
nvidia-dcgm-exporter-fhs26                            0/1     Init:0/2           0                 26h
nvidia-device-plugin-daemonset-zplqh                  0/1     Init:0/1           0                 26h
nvidia-driver-daemonset-413.92.202303190222-0-fwmcq   1/2     CrashLoopBackOff   275 (2m27s ago)   28h
nvidia-operator-validator-xwtn8                       0/1     Init:0/4           0                 26h
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Set up OCP 4.13.0-rc1.
2. Deploy the GPU Operator.
Actual results:
The GPU Operator fails to deploy.
Expected results:
The GPU Operator deploys successfully.
Additional info: