Red Hat OpenShift AI Engineering / RHOAIENG-6460

RHOAI operator installation failing when compute quota is created in the project redhat-ods-applications in RHOCP4


      ISSUE:

      The OpenShift AI operator creates its deployments/pods without requests.memory, limits.memory, requests.cpu, or limits.cpu.

      If a compute quota exists in the project redhat-ods-applications, some deployments fail to create pods, and other dependent pods get stuck in the CrashLoopBackOff state.

      More Details:

      If a customer creates a compute quota in the project redhat-ods-applications, pods such as etcd and remove-deprecated-monitoring are not created because they do not define all of the requests and limits for memory and cpu that the quota requires.
      As a consequence, all other OpenShift AI operator pods are stuck in the CrashLoopBackOff state.
      ~~~
      NAME                                                            READY   UP-TO-DATE   AVAILABLE   AGE
      deployment/data-science-pipelines-operator-controller-manager   0/1     1            0           4h
      deployment/etcd                                                 0/1     0            0           4h
      deployment/modelmesh-controller                                 0/3     3            0           4h
      deployment/odh-model-controller                                 0/3     3            0           4h
      deployment/odh-notebook-controller-manager                      0/1     1            0           4h

      NAME                               COMPLETIONS   DURATION   AGE
      job/remove-deprecated-monitoring   0/1           2d16h      4h
      ~~~
      ~~~
      NAME                                                              READY   STATUS             RESTARTS   AGE
      data-science-pipelines-operator-controller-manager-6c7c69cq2m86   0/1     CrashLoopBackOff   50         4h
      modelmesh-controller-68f7b58896-6htn8                             0/1     CrashLoopBackOff   51         4h
      modelmesh-controller-68f7b58896-9l4k2                             0/1     CrashLoopBackOff   51         4h
      modelmesh-controller-68f7b58896-rgpfd                             0/1     CrashLoopBackOff   50         4h
      odh-model-controller-546b6b7598-9zqp8                             0/1     CrashLoopBackOff   50         4h
      odh-model-controller-546b6b7598-pkbkb                             0/1     CrashLoopBackOff   50         4h
      odh-model-controller-546b6b7598-q8cjx                             0/1     CrashLoopBackOff   50         4h
      odh-notebook-controller-manager-8594b9fc66-nzlkv                  0/1     CrashLoopBackOff   50         4h
      ~~~

      Deployments/pods fail with the following errors:
      ~~~
      5m          Warning   FailedCreate     replicaset/etcd-6d5f977bbf                                            Error creating: pods "etcd-xxx" is forbidden: failed quota: quota-name: must specify limits.cpu for: etcd-secret-creator; limits.memory for: etcd-secret-creator; requests.cpu for: etcd-secret-creator; requests.memory for: etcd-secret-creator
      3m7s        Warning   FailedCreate     job/remove-deprecated-monitoring                                      (combined from similar events): Error creating: pods "remove-deprecated-monitoring-xxx" is forbidden: failed quota: quota-name: must specify limits.cpu for: oc-cli; requests.cpu for: oc-cli
      ~~~
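
The two events above can be reproduced with a small model of the quota check. The following is a minimal sketch, not the actual Kubernetes admission code: the function names are hypothetical, the list of constrained resources is inferred from the error text, and defaulting (for example from a LimitRange) is ignored.

```python
from typing import Dict, List

# Resources the quota in this report appears to constrain (inferred from the error text).
QUOTA_RESOURCES = ["limits.cpu", "limits.memory", "requests.cpu", "requests.memory"]


def missing_resources(containers: List[dict],
                      constrained: List[str] = QUOTA_RESOURCES) -> Dict[str, List[str]]:
    """Map each constrained resource to the containers that do not set it."""
    missing: Dict[str, List[str]] = {}
    for container in containers:
        resources = container.get("resources") or {}
        for key in constrained:
            section, name = key.split(".")  # e.g. "limits", "cpu"
            if name not in (resources.get(section) or {}):
                missing.setdefault(key, []).append(container["name"])
    return missing


def quota_message(containers: List[dict],
                  constrained: List[str] = QUOTA_RESOURCES) -> str:
    """Build a message shaped like the FailedCreate events; empty if nothing is missing."""
    missing = missing_resources(containers, constrained)
    if not missing:
        return ""
    parts = [f"{key} for: {','.join(names)}" for key, names in sorted(missing.items())]
    return "must specify " + "; ".join(parts)


# etcd-secret-creator sets nothing, so all four values are reported.
print(quota_message([{"name": "etcd-secret-creator"}]))

# oc-cli is assumed to set memory only (the values here are placeholders),
# so only the cpu values are reported, as in the second event.
print(quota_message([{
    "name": "oc-cli",
    "resources": {"limits": {"memory": "256Mi"}, "requests": {"memory": "128Mi"}},
}]))
```

Under these assumptions, the second event suggests that the oc-cli container already sets memory values and lacks only the cpu ones.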

      To resolve the issue, every pod must specify requests.memory, limits.memory, requests.cpu, and limits.cpu.

      As a temporary workaround, backing up and then deleting the compute quota in the project redhat-ods-applications helped.

      However, as a permanent fix, this bug asks that requests and limits for memory and cpu be set for all pods of the RHOAI operator deployments.
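
The shape of the requested fix can be sketched as a function that fills in any missing values on a pod spec. This is an illustration only: the helper name and the default values are hypothetical placeholders, not values used or proposed by the RHOAI operator, and the pod spec is modelled as a plain dictionary.

```python
import copy

# Placeholder values chosen for illustration only; not taken from the RHOAI operator.
DEFAULT_RESOURCES = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}


def with_default_resources(pod_spec: dict, defaults: dict = DEFAULT_RESOURCES) -> dict:
    """Return a copy of pod_spec in which every container sets all four values."""
    patched = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    for field in ("initContainers", "containers"):
        for container in patched.get(field) or []:
            resources = container.get("resources") or {}
            container["resources"] = resources
            for section, values in defaults.items():
                target = resources.get(section) or {}
                resources[section] = target
                for name, value in values.items():
                    target.setdefault(name, value)  # keep values that are already set
    return patched


# A container that sets memory only ends up with cpu values added.
spec = {"containers": [{
    "name": "oc-cli",
    "resources": {"limits": {"memory": "256Mi"}, "requests": {"memory": "128Mi"}},
}]}
print(with_default_resources(spec)["containers"][0]["resources"])
```

Existing values are kept, so a real fix would also need to make sure that a defaulted request never exceeds a limit that is already set on the container.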

            Unassigned Unassigned
            rhn-support-sdharma Suruchi Dharma