Description of problem:
As part of the first results of the pipelines scale test, we observed an issue with the DSPApplication:
NAME                                                   READY   STATUS             RESTARTS       AGE     IP             NODE               NOMINATED NODE   READINESS GATES
ds-pipeline-persistenceagent-sample-5ffbbb6664-wdlr6   1/1     Running            1 (2m2s ago)   4m14s   10.129.2.85    e26-h22-000-r650   <none>           <none>
ds-pipeline-sample-689cd9c5c7-tc85t                    1/2     CrashLoopBackOff   6 (51s ago)    4m14s   10.129.2.84    e26-h22-000-r650   <none>           <none>
ds-pipeline-scheduledworkflow-sample-54c794564-mkm8c   1/1     Running            0              4m14s   10.128.2.104   e26-h21-000-r650   <none>           <none>
ds-pipeline-ui-sample-5496f698fd-dmb8v                 2/2     Running            0              4m13s   10.128.2.105   e26-h21-000-r650   <none>           <none>
mariadb-sample-85d75766f8-wjt2g                        1/1     Running            0              4m14s   10.128.2.107   e26-h21-000-r650   <none>           <none>
minio-deployment-84695968cd-kwt4p                      3/3     Running            0              5m17s   10.131.1.150   e26-h18-000-r650   <none>           <none>
The logs of the ds-pipeline-api-server container in the ds-pipeline-sample-689cd9c5c7-tc85t pod say:
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.411863 1 client_manager.go:160] Initializing client manager
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.411945 1 config.go:74] Config DBConfig.ExtraParams not specified, skipping
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.557862 1 client_manager.go:414] We already own sample
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558234 1 swf.go:64] (Expected when in cluster) Failed to create scheduled workflow client by out of cluster kubeconfig. Error: stat /.kube/config: no such file or directory
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558244 1 swf.go:66] Starting to create scheduled workflow client by in cluster config.
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558629 1 client_manager.go:203] Client manager initialized successfully
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558816 1 main.go:182] Samples already loaded in the past. Skip loading.
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] F0526 09:32:36.559805 1 main.go:70] Failed to create default experiment. Err: Failed to create default experiment. Err: Already exist error: Failed to create a new experiment. The name Default already exists. Please specify a new name.
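To make the crash cause easier to spot in longer logs, here is a minimal Python sketch that picks out the fatal entry from log text like the above. It assumes each entry sits on its own line, carries the `[pod/<pod>/<container>]` prefix shown here, and follows the klog header layout (severity letter, month and day, time, thread id, `file:line]`). The names `LINE_RE`, `parse_line`, `fatal_entries` and `SAMPLE` are made up for this illustration and are not part of any tool.

```python
import re

# Matches "[pod/<pod>/<container>] " followed by a klog-style header:
# severity letter (I/W/E/F), MMDD, time, thread id, "file:line]", message.
LINE_RE = re.compile(
    r"^\[pod/(?P<pod>[^/\]]+)/(?P<container>[^\]]+)\] "
    r"(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def fatal_entries(log_text):
    """Return all parsed entries whose severity is F (fatal)."""
    parsed = (parse_line(line) for line in log_text.splitlines())
    return [entry for entry in parsed if entry and entry["severity"] == "F"]


# Two entries copied from the log in this report.
SAMPLE = "\n".join([
    "[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] "
    "I0526 09:32:36.558629 1 client_manager.go:203] "
    "Client manager initialized successfully",
    "[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] "
    "F0526 09:32:36.559805 1 main.go:70] "
    "Failed to create default experiment. Err: Failed to create default "
    "experiment. Err: Already exist error: Failed to create a new experiment. "
    "The name Default already exists. Please specify a new name.",
])

if __name__ == "__main__":
    for entry in fatal_entries(SAMPLE):
        print(entry["container"], entry["file"], entry["line"], entry["message"])
```

Here the only fatal entry is the one from main.go:70, which reports that the experiment named Default already exists.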
The automation never got past the application deployment stage, so the pod never reached a healthy state.
The application is deployed in a freshly created namespace.
EDIT: after double checking, the application was not running in a fresh namespace.
Reusing a namespace wasn't expected in the scale test, but it doesn't explain why the application doesn't load.
Application deployment logs/artifacts are in this directory.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
- Create a DSPApplication
- Wait for all the deployments to turn ready
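The readiness wait in the steps above can be approximated by checking the pod listing for pods whose READY count is incomplete. The following is a minimal Python sketch, assuming the default column order of the pod listing (NAME, READY, STATUS first). The names `not_ready_pods` and `SAMPLE_PODS` are hypothetical and used only for illustration. Pods that ran to completion would also be reported by this check, so it only suits long-running deployments like the ones here.

```python
def not_ready_pods(pods_output):
    """Return (name, ready, status) for each pod whose READY count is incomplete.

    Only the first three whitespace-separated fields are used, because later
    columns such as RESTARTS ("6 (51s ago)") may contain spaces.
    """
    result = []
    lines = pods_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[:3]
        ready_count, _, total = ready.partition("/")
        if ready_count != total:
            result.append((name, ready, status))
    return result


# Three rows copied from the pod listing in this report.
SAMPLE_PODS = """
NAME READY STATUS RESTARTS AGE
ds-pipeline-persistenceagent-sample-5ffbbb6664-wdlr6 1/1 Running 1 (2m2s ago) 4m14s
ds-pipeline-sample-689cd9c5c7-tc85t 1/2 CrashLoopBackOff 6 (51s ago) 4m14s
mariadb-sample-85d75766f8-wjt2g 1/1 Running 0 4m14s
"""

if __name__ == "__main__":
    for name, ready, status in not_ready_pods(SAMPLE_PODS):
        print(f"{name}: {ready} ({status})")
```

An automation could call such a check in a loop with a timeout and fail the deployment stage if the list is still non-empty when the timeout expires.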
Actual results:
The ds-pipeline deployment never turns ready
Expected results:
The deployment turns ready
Reproducibility (Always/Intermittent/Only Once):
Intermittent
Build Details:
RHODS 1.27-2023-05-17
Workaround:
Additional info:
Automated run executed on a bare-metal cluster