  1. Red Hat OpenShift Data Science
  2. RHODS-6370

[DSG] Starting workbenches should get the latest tolerations

Details

    • False
    • None
    • False
    • Release Notes
    • Testable
    • No
    • 1.23.0
    • No
    • == Workbenches failed to receive the latest toleration
      Previously, to pick up the latest toleration, users had to open the relevant workbench for editing, make no changes, and save it again. Users can now apply the latest toleration change by stopping and then restarting their data science project's workbench.
    • Bug Fix
    • No
    • Yes
    • None
    • RHODS 1.23

    Description

      Based on the discussion under RHODS-6052, we want to apply the latest tolerations on every workbench restart. Currently, users must either edit their workbenches or re-create them to pick up a toleration change.
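
A minimal sketch of the requested behavior, not the actual dashboard code. It assumes the workbench is represented as a plain dict whose pod template lives under `spec.template.spec`, and it uses a hypothetical `stopped` flag and a hypothetical `start_workbench` helper purely for illustration:

```python
import copy


def start_workbench(notebook, cluster_tolerations):
    """Return a started copy of `notebook` carrying the latest tolerations.

    `notebook` is a hypothetical dict form of the workbench resource and
    `cluster_tolerations` is the list currently set in cluster settings.
    """
    updated = copy.deepcopy(notebook)  # leave the caller's object untouched
    pod_spec = (
        updated.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    # Overwrite whatever tolerations were stored when the workbench was
    # created or last edited with the current cluster-wide setting.
    pod_spec["tolerations"] = copy.deepcopy(cluster_tolerations)
    updated["stopped"] = False  # hypothetical "stopped" marker
    return updated
```

The point of the sketch is only that the tolerations are refreshed at start time, so a stop followed by a start is enough to pick up a change.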

      Sample scenario to be covered (by egranger@redhat.com)

      1. Admin has added a taint called "notebooksonly" to all nodes. 
      2. Admin then goes to the RHODS cluster settings and adds a toleration called "NotebooksOnly".
      3. Admin then goes on vacation for a job well done. 
      4. Admin's cell phone starts blowing up with calls from users. 
      5. Nobody is able to spin up new workbenches because all nodes are tainted with "notebooksonly". 
      6. Admin's vacation is now ruined.
      7. Admin goes back to RHODS cluster settings, facepalms, and changes the toleration from "NotebooksOnly" to "notebooksonly" (or, conversely, changes the taint on the nodes to match the toleration).
      8. All user pods that were pending get re-created with the newer toleration and spin up successfully.
        (EG update 2023-01-17): Existing, already-running Notebook pods are not affected and are not restarted.
        All user pods that were already running are not affected by the change (i.e., the CR is not updated and the pod is not restarted).
        All user pods that were pending must be restarted manually by restarting the corresponding workbenches.
        All user pods that were stopped before the toleration change get the new toleration settings as soon as they are started.
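
The scenario above hinges on the fact that Kubernetes compares taint and toleration keys case-sensitively. The following sketch mimics the matching rule in plain Python. It assumes that "notebooksonly" is the taint key and that the taint effect is `NoSchedule` (the issue states neither), and the helper names `tolerates` and `schedulable` are made up for illustration:

```python
def tolerates(toleration, taint):
    """Check whether one toleration matches one taint (simplified)."""
    key = toleration.get("key", "")
    operator = toleration.get("operator", "Equal")
    effect = toleration.get("effect", "")
    # An empty toleration effect matches every taint effect.
    if effect and effect != taint.get("effect", ""):
        return False
    if operator == "Exists":
        # An empty key with Exists tolerates every taint.
        return key == "" or key == taint["key"]
    # Operator "Equal": key and value must both match exactly.
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


def schedulable(tolerations, taints):
    """A pod fits a node only if every hard taint is tolerated."""
    hard = [t for t in taints if t.get("effect") != "PreferNoSchedule"]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in hard)


# Assumed shape of the admin's taint and of the two toleration settings.
node_taints = [{"key": "notebooksonly", "effect": "NoSchedule"}]
wrong = [{"key": "NotebooksOnly", "operator": "Exists", "effect": "NoSchedule"}]
fixed = [{"key": "notebooksonly", "operator": "Exists", "effect": "NoSchedule"}]

print(schedulable(wrong, node_taints))  # False: key case differs
print(schedulable(fixed, node_taints))  # True
```

With the mistyped key, no workbench pod can be scheduled on any tainted node, which is why pods stay pending until the toleration is corrected and the workbench is restarted.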

            People

              aballantyne Andrew Ballantyne
              rhn-support-bdattoma Berto D'Attoma
              Berto D'Attoma