  1. Red Hat OpenShift Data Science
  2. RHODS-8917

Data Science Pipelines Application unable to start


    • Bug
    • Resolution: Unresolved
    • Undefined
    • None
    • RHODS_1.27.0_GA
    • Pipelines
    • False
    • False
    • None
    • Testable
    • No
    • No
    • No
    • Pending
    • None

      Description of problem:

      As part of the first results of the pipelines scale test, we observed an issue with the DSPApplication:

      NAME                                                   READY   STATUS             RESTARTS       AGE     IP             NODE               NOMINATED NODE   READINESS GATES
      ds-pipeline-persistenceagent-sample-5ffbbb6664-wdlr6   1/1     Running            1 (2m2s ago)   4m14s   10.129.2.85    e26-h22-000-r650   <none>           <none>
      ds-pipeline-sample-689cd9c5c7-tc85t                    1/2     CrashLoopBackOff   6 (51s ago)    4m14s   10.129.2.84    e26-h22-000-r650   <none>           <none>
      ds-pipeline-scheduledworkflow-sample-54c794564-mkm8c   1/1     Running            0              4m14s   10.128.2.104   e26-h21-000-r650   <none>           <none>
      ds-pipeline-ui-sample-5496f698fd-dmb8v                 2/2     Running            0              4m13s   10.128.2.105   e26-h21-000-r650   <none>           <none>
      mariadb-sample-85d75766f8-wjt2g                        1/1     Running            0              4m14s   10.128.2.107   e26-h21-000-r650   <none>           <none>
      minio-deployment-84695968cd-kwt4p                      3/3     Running            0              5m17s   10.131.1.150   e26-h18-000-r650   <none>           <none> 
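A listing like the one above can be screened automatically for pods whose READY count is below the container total. The following is a minimal sketch, assuming the text comes from `oc get pods -o wide` and that the first non-empty line is the header; the function name `not_ready_pods` and the shortened `SAMPLE` table are illustrative, not part of the test automation.

```python
def not_ready_pods(table: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for pods whose READY count is below the total."""
    flagged = []
    lines = [line for line in table.splitlines() if line.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = (int(n) for n in ready.split("/"))
        if ready_count < total:
            flagged.append((name, ready, status))
    return flagged


# Three rows copied from the listing in this report, with the trailing columns dropped.
SAMPLE = """\
NAME                                                   READY   STATUS             RESTARTS       AGE
ds-pipeline-persistenceagent-sample-5ffbbb6664-wdlr6   1/1     Running            1 (2m2s ago)   4m14s
ds-pipeline-sample-689cd9c5c7-tc85t                    1/2     CrashLoopBackOff   6 (51s ago)    4m14s
mariadb-sample-85d75766f8-wjt2g                        1/1     Running            0              4m14s
"""

print(not_ready_pods(SAMPLE))
```

Only the first three columns are read, so the spaces inside the RESTARTS column do not affect the result.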

      The logs of the ds-pipeline-api-server container in the ds-pipeline-sample-689cd9c5c7-tc85t pod say:

      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.411863       1 client_manager.go:160] Initializing client manager
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.411945       1 config.go:74] Config DBConfig.ExtraParams not specified, skipping
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.557862       1 client_manager.go:414] We already own sample
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558234       1 swf.go:64] (Expected when in cluster) Failed to create scheduled workflow client by out of cluster kubeconfig. Error: stat /.kube/config: no such file or directory
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558244       1 swf.go:66] Starting to create scheduled workflow client by in cluster config.
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558629       1 client_manager.go:203] Client manager initialized successfully
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558816       1 main.go:182] Samples already loaded in the past. Skip loading.
      [pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] F0526 09:32:36.559805       1 main.go:70] Failed to create default experiment. Err: Failed to create default experiment. Err: Already exist error: Failed to create a new experiment. The name Default already exists. Please specify a new name. 
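The only fatal entry in the log above is the last one (severity letter `F`). A minimal sketch for pulling such entries out of a longer capture follows; it assumes the `[pod/<pod>/<container>]` prefix shown in this report followed by the usual klog header (severity letter, month and day, time, thread id, `file:line]`). The names `KLOG_LINE` and `fatal_messages` are illustrative.

```python
import re

# Prefix "[pod/<pod>/<container>]" followed by a klog header and the message.
KLOG_LINE = re.compile(
    r"^\s*\[pod/(?P<pod>[^/\]]+)/(?P<container>[^\]]+)\]\s+"
    r"(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def fatal_messages(log: str) -> list[tuple[str, str]]:
    """Return (source, message) for every line logged with severity F."""
    found = []
    for line in log.splitlines():
        match = KLOG_LINE.match(line)
        if match and match.group("severity") == "F":
            found.append((match.group("source"), match.group("message").strip()))
    return found


# The last two lines of the log in this report.
LOG = """\
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558816       1 main.go:182] Samples already loaded in the past. Skip loading.
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] F0526 09:32:36.559805       1 main.go:70] Failed to create default experiment. Err: Failed to create default experiment. Err: Already exist error: Failed to create a new experiment. The name Default already exists. Please specify a new name.
"""

for source, message in fatal_messages(LOG):
    print(source, "->", message)
```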

      The automation never got past the application deployment stage, so the pod never reached a healthy state.
      The application is deployed in a freshly created namespace.
      EDIT: after double-checking, the application was not running in a fresh namespace.
      This was not expected in the scale test, but it does not explain why the application fails to start.
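To make the suspected interaction concrete, here is a toy model of how state left over from an earlier deployment could make a strict "create the default experiment" startup step fail on every restart. This is a sketch under stated assumptions, not the API server's code: `ExperimentStore`, `startup_strict` and `startup_tolerant` are hypothetical names, and the report does not establish that this is the actual root cause.

```python
class AlreadyExistsError(Exception):
    """Raised when an experiment name is already taken."""


class ExperimentStore:
    """Hypothetical stand-in for the persistent database (not the real schema)."""

    def __init__(self):
        self._names = set()

    def create(self, name: str) -> None:
        if name in self._names:
            raise AlreadyExistsError(f"The name {name} already exists.")
        self._names.add(name)


def startup_strict(store: ExperimentStore) -> None:
    # Any error, including "already exists", aborts startup.
    store.create("Default")


def startup_tolerant(store: ExperimentStore) -> None:
    # Treats an existing default experiment as success.
    try:
        store.create("Default")
    except AlreadyExistsError:
        pass


store = ExperimentStore()
startup_strict(store)  # first start against an empty store succeeds
try:
    startup_strict(store)  # a restart against the same store fails
except AlreadyExistsError as err:
    print("strict startup failed:", err)
startup_tolerant(store)  # the tolerant variant survives the restart
print("tolerant startup succeeded")
```

In this model the strict variant fails on every start once the store already holds a "Default" entry, which would match a crash loop in a namespace that is not fresh.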

      Application deployment logs/artifacts are in this directory.

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      1. Create a DSPApplication
      2. Wait for all the deployments to become ready
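The waiting step can be scripted with `oc wait`. The sketch below only assembles and runs that command. The function names and the default timeout are assumptions, not values taken from the test automation, and it waits on the deployments' `Available` condition as an approximation of "ready".

```python
import subprocess


def build_wait_command(namespace: str, timeout_seconds: int = 600) -> list[str]:
    """Build an `oc wait` invocation covering every deployment in the namespace."""
    return [
        "oc", "wait", "deployment", "--all",
        "--for=condition=Available",
        f"--namespace={namespace}",
        f"--timeout={timeout_seconds}s",
    ]


def wait_for_deployments(namespace: str, timeout_seconds: int = 600) -> bool:
    """Return True if all deployments became Available before the timeout."""
    result = subprocess.run(build_wait_command(namespace, timeout_seconds), check=False)
    return result.returncode == 0
```

Calling `wait_for_deployments("<namespace>")` with the namespace that holds the DSPApplication returns False when the timeout expires, which is the expected outcome while a pod stays in CrashLoopBackOff.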

      Actual results:

      The ds-pipeline deployment never becomes ready

      Expected results:

      The deployment becomes ready

      Reproducibility (Always/Intermittent/Only Once):

      Intermittent

      Build Details:

      RHODS 1.27-2023-05-17

      Workaround:

      Additional info:

      Automated run executed on a bare-metal cluster

              Unassigned
              kpouget2 Kevin Pouget
              Votes: 0
              Watchers: 5

                Created:
                Updated: