Red Hat OpenShift Data Science / RHODS-4364

Downtime during node restart


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical

      Description of problem:

      Restarting a node took 7 minutes, and RHODS was unavailable for the whole 7 minutes. A likely cause is that all pods of some services are scheduled on the same node, so those services are lost while that node restarts. For example, in my test the black-box pods went down, and a notebook I tried to spawn did not start until the node was back up.
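      One way to check the suspected cause is to look at how each service's pods are spread across nodes. The sketch below assumes the pod placement has already been collected (for example from the NODE column of `oc get pods -o wide`) into (service, pod, node) tuples; the function name, the input format, and the example pod and node names are all hypothetical. It lists the services whose pods all sit on a single node, which are the ones expected to go down together when that node restarts.

```python
from collections import defaultdict


def single_node_services(placements):
    """Return the services whose pods all run on one node.

    placements: iterable of (service, pod, node) tuples. This input
    format is a hypothetical assumption, not a RHODS or OpenShift API.
    """
    nodes_by_service = defaultdict(set)
    for service, _pod, node in placements:
        nodes_by_service[service].add(node)
    # A service whose pods share one node loses every replica when that
    # node restarts. Single-replica services are reported here as well.
    return sorted(
        service for service, nodes in nodes_by_service.items()
        if len(nodes) == 1
    )


if __name__ == "__main__":
    # Hypothetical placement on a cluster with 2 worker nodes.
    example = [
        ("blackbox-exporter", "blackbox-exporter-a", "worker-1"),
        ("blackbox-exporter", "blackbox-exporter-b", "worker-1"),
        ("dashboard", "dashboard-a", "worker-1"),
        ("dashboard", "dashboard-b", "worker-2"),
    ]
    print(single_node_services(example))  # ['blackbox-exporter']
```

      A service reported here would survive a node restart only if its pods were spread over both workers.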

      Prerequisites (if any, like setup, operators/versions):

      RHODS installed in a cluster with 2 worker nodes

      Steps to Reproduce

      1. Install RHODS
      2. Restart a node.
      3. Check availability with the rhods_aggregate_availability metric.
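      Step 3 can be approximated offline once the availability samples are exported. The sketch below assumes rhods_aggregate_availability reads 1 when RHODS is available and 0 when it is not, and that the samples are evenly spaced; both are assumptions about the metric, and the function name and sample data are hypothetical. It counts each unavailable sample as one interval of downtime.

```python
def downtime_seconds(samples, interval_s):
    """Estimate downtime from evenly spaced availability samples.

    Assumes each sample is 1 when available and below 1 when not (an
    assumption about rhods_aggregate_availability) and that samples are
    interval_s seconds apart.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Each sample below 1 counts as one full interval of unavailability.
    unavailable = sum(1 for value in samples if value < 1)
    return unavailable * interval_s


if __name__ == "__main__":
    # Hypothetical 1-minute samples taken around a node restart.
    samples = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    minutes = downtime_seconds(samples, 60) / 60
    print(f"downtime: {minutes:.0f} min")  # downtime: 7 min
```

      The estimate is only as fine as the sampling interval: an outage shorter than one interval may be missed or rounded up.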

      Actual results:

      There were 7 minutes of downtime.

      Expected results:

      There is no downtime while the node restarts.

      Reproducibility (Always/Intermittent/Only Once):

      Intermittent: it depends on which node you delete.

      Build Details:

      Workaround:

      Additional info:

              Assignee: Unassigned
              Reporter: Pablo Felix (pablo-rhods, Inactive)
              Votes: 0
              Watchers: 5
