  1. Open Data Hub
  2. ODH-404

If multiple KfDefs exist on a cluster and one of them is faulty, the operator gets stuck on the faulty one


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Component: operator

      Describe the bug
      In Operate First we use multiple `KfDef`s on the same cluster. When the operator applies them, it loops through them one by one. The problem arises when one of the `KfDef`s is faulty and fails to apply (for example, a change or patch that is not allowed, such as changing the storage class on a PVC): the operator gets stuck on this faulty `KfDef` and keeps retrying it. As a result, it never gets to the other `KfDef`s, which wait to be processed indefinitely.

      time="2021-06-03T15:23:43Z" level=warning msg="Encountered error applying application jupyterhub: (kubeflow.error): Code 500 with message: Apply.Run : error when applying patch:\n{\"metadata\":{\"annotations\":{\"kubectl.kubernetes.io/last-applied-configuration\":\"{\\\"apiVersion\\\":\\\"v1\\\",\\\"kind\\\":\\\"PersistentVolumeClaim\\\",\\\"metadata\\\":{\\\"annotations\\\":{\\\"kfctl.kubeflow.io/kfdef-instance\\\":\\\"opendatahub.opf-jupyterhub\\\",\\\"volume.beta.kubernetes.io/storage-class\\\":\\\"moc-nfs-csi\\\"},\\\"labels\\\":{\\\"component.opendatahub.io/name\\\":\\\"jupyterhub\\\",\\\"opendatahub.io/component\\\":\\\"true\\\"},\\\"name\\\":\\\"jupyterhub-db\\\",\\\"namespace\\\":\\\"opf-jupyterhub\\\"},\\\"spec\\\":{\\\"accessModes\\\":[\\\"ReadWriteOnce\\\"],\\\"resources\\\":{\\\"requests\\\":{\\\"storage\\\":\\\"1Gi\\\"}}}}\\n\",\"volume.beta.kubernetes.io/storage-class\":\"moc-nfs-csi\"}}}\nto:\nResource: \"/v1, Resource=persistentvolumeclaims\", GroupVersionKind: \"/v1, Kind=PersistentVolume...
      time="2021-06-03T15:23:43Z" level=warning msg="Will retry in 25 seconds."
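
The blocking described above can be modeled roughly as follows. This is a simplified Python sketch of the observed behavior, not the operator's actual code. `apply_kfdef`, `process_in_order`, the `KfDef` names, and the `max_ticks` budget are all hypothetical and introduced only for illustration. The budget exists solely so the demo terminates; the behavior reported here has no such limit.

```python
def apply_kfdef(name, faulty):
    """Hypothetical stand-in for applying one KfDef; fails for faulty ones."""
    if name in faulty:
        raise RuntimeError(f"error applying {name}")


def process_in_order(kfdefs, faulty, max_ticks):
    """Model of the reported behavior: retry the current KfDef until it succeeds.

    max_ticks only bounds the simulation; it is not part of the modeled operator.
    Returns the KfDefs that were applied within the budget.
    """
    applied = []
    ticks = 0
    for name in kfdefs:
        while ticks < max_ticks:
            ticks += 1
            try:
                apply_kfdef(name, faulty)
            except RuntimeError:
                continue  # "Will retry in 25 seconds." then the same KfDef again
            applied.append(name)
            break
    return applied


# Hypothetical KfDef names; the second one can never be applied.
result = process_in_order(["kfdef-a", "kfdef-faulty", "kfdef-c"],
                          faulty={"kfdef-faulty"}, max_ticks=100)
```

In this model `kfdef-c` is never reached, no matter how large the budget is, because every remaining tick is spent retrying `kfdef-faulty`.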
      

      Expected behavior
      If the operator encounters an error, it should stop retrying a faulty `KfDef` after a few attempts instead of retrying indefinitely, so that the remaining `KfDef`s still get applied.
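
One way to picture the expected behavior is a per-`KfDef` retry limit. The Python sketch below is only an illustration under assumptions: `apply_all`, `fake_apply`, the `max_attempts` default of 3, and the `KfDef` names are hypothetical and do not come from the operator's code. A real implementation would presumably also wait between attempts, as the 25-second delay in the log suggests; that delay is left out here.

```python
def apply_all(kfdefs, apply, max_attempts=3):
    """Apply each KfDef at most max_attempts times, then move on to the next.

    Returns (applied, failed), where failed maps a KfDef name to its last error.
    """
    applied, failed = [], {}
    for name in kfdefs:
        for _attempt in range(max_attempts):
            try:
                apply(name)
            except Exception as err:
                failed[name] = str(err)  # remember the most recent error
            else:
                failed.pop(name, None)   # an earlier failure was transient
                applied.append(name)
                break
    return applied, failed


def fake_apply(name):
    """Hypothetical apply function: one KfDef always fails."""
    if name == "kfdef-faulty":
        raise RuntimeError("patch not allowed")


applied, failed = apply_all(["kfdef-a", "kfdef-faulty", "kfdef-c"], fake_apply)
```

With a limit like this, the faulty `KfDef` is reported as failed after a few attempts and the ones after it are still processed.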

      Screenshots
      https://github.com/opendatahub-io/opendatahub-operator/files/6592755/log.6.txt

      Additional context
      Related to: https://github.com/operate-first/support/issues/260

      Imported from: https://github.com/opendatahub-io/opendatahub-operator/issues/135

              llasmith@redhat.com Landon LaSmith
              tcoufal@redhat.com Tom Coufal