Bug
Resolution: Unresolved
Normal
RHOAI_2.8.0
Testable
ISSUE:
OpenShift AI operator deployments/pods are created without requests.memory, limits.memory, requests.cpu, and limits.cpu.
If a compute quota exists in the project redhat-ods-applications, some deployments fail to create pods, and other dependent pods get stuck in the CrashLoopBackOff state.
More Details:
If the customer creates a compute quota in the project redhat-ods-applications, pods such as etcd and remove-deprecated-monitoring are not created because they do not define requests and limits for memory and CPU.
As a consequence, all other OpenShift AI operator pods are stuck in the CrashLoopBackOff state.
~~~
NAME                                                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment/data-science-pipelines-operator-controller-manager   0/1     1            0           4h
deployment/etcd                                                 0/1     0            0           4h
deployment/modelmesh-controller                                 0/3     3            0           4h
deployment/odh-model-controller                                 0/3     3            0           4h
deployment/odh-notebook-controller-manager                      0/1     1            0           4h

NAME                               COMPLETIONS   DURATION   AGE
job/remove-deprecated-monitoring   0/1           2d16h      4h
~~~
~~~
NAME                                                              READY   STATUS             RESTARTS   AGE
data-science-pipelines-operator-controller-manager-6c7c69cq2m86   0/1     CrashLoopBackOff   50         4h
modelmesh-controller-68f7b58896-6htn8                             0/1     CrashLoopBackOff   51         4h
modelmesh-controller-68f7b58896-9l4k2                             0/1     CrashLoopBackOff   51         4h
modelmesh-controller-68f7b58896-rgpfd                             0/1     CrashLoopBackOff   50         4h
odh-model-controller-546b6b7598-9zqp8                             0/1     CrashLoopBackOff   50         4h
odh-model-controller-546b6b7598-pkbkb                             0/1     CrashLoopBackOff   50         4h
odh-model-controller-546b6b7598-q8cjx                             0/1     CrashLoopBackOff   50         4h
odh-notebook-controller-manager-8594b9fc66-nzlkv                  0/1     CrashLoopBackOff   50         4h
~~~
Deployments/pods fail with the following errors:
~~~
5m Warning FailedCreate replicaset/etcd-6d5f977bbf Error creating: pods "etcd-xxx" is forbidden: failed quota: quota-name: must specify limits.cpu for: etcd-secret-creator; limits.memory for: etcd-secret-creator; requests.cpu for: etcd-secret-creator; requests.memory for: etcd-secret-creator
3m7s Warning FailedCreate job/remove-deprecated-monitoring (combined from similar events): Error creating: pods "remove-deprecated-monitoring-xxx" is forbidden: failed quota: quota-name: must specify limits.cpu for: oc-cli; requests.cpu for: oc-cli
~~~
To resolve the issue, every pod must specify requests.memory, limits.memory, requests.cpu, and limits.cpu.
As a temporary workaround, backing up and then deleting the compute quota in the project redhat-ods-applications helped.
However, as a permanent fix, this bug asks that requests and limits for memory and CPU be set on all pods of the RHOAI operator deployments.
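To help audit which containers would be rejected by such a quota, the check can be sketched in Python. This is only an illustration under stated assumptions: the function name `missing_resource_fields`, the `QUOTA_FIELDS` tuple, and the two sample pod specs are hypothetical and merely modelled on the FailedCreate events quoted earlier; the real manifests may differ. The sketch mirrors the Kubernetes behaviour of copying a limit into the request when only the limit is set, and it ignores any defaults that a LimitRange might inject.

```python
from typing import Dict, List, Sequence

# Fields a compute quota may track (taken from the error messages in this report).
QUOTA_FIELDS = ("requests.cpu", "requests.memory", "limits.cpu", "limits.memory")


def missing_resource_fields(
    pod_spec: dict, required: Sequence[str] = QUOTA_FIELDS
) -> Dict[str, List[str]]:
    """Map each container name to the quota-tracked fields it does not set."""
    missing: Dict[str, List[str]] = {}
    containers = list(pod_spec.get("initContainers") or []) + list(
        pod_spec.get("containers") or []
    )
    for container in containers:
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        absent = []
        for field in required:
            section, resource = field.split(".", 1)
            if section == "limits":
                present = resource in limits
            else:
                # A request defaults to the limit when only the limit is set.
                present = resource in requests or resource in limits
            if not present:
                absent.append(field)
        if absent:
            missing[container["name"]] = absent
    return missing


# Hypothetical pod specs modelled on the two FailedCreate events (not the real manifests).
etcd_pod = {"containers": [{"name": "etcd-secret-creator"}]}
job_pod = {
    "containers": [
        {
            "name": "oc-cli",
            "resources": {
                "requests": {"memory": "64Mi"},
                "limits": {"memory": "128Mi"},
            },
        }
    ]
}

print(missing_resource_fields(etcd_pod))
print(missing_resource_fields(job_pod))
```

Feeding the function the pod template of each deployment or job in redhat-ods-applications (for example, parsed from `oc get ... -o json` output) would list the containers that still need requests and limits before a compute quota can be applied safely.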