Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: RHODS_1.27.0_GA
Component/s: Pipelines
Labels:
- MLOps
- groomed

Blocked:
False
Blocked Reason:
None
Ready:
False
Acceptance Criteria:
None
Affects Testing:

Testable
Automated:
No
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Regression:
No
Target Release:

FUTURE_GA
Test Blocker:
No
Test Coverage:

Pending
Watchlist Impact:
None
Intelligence Requested:
Market:
PX Impact Score:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

In the first results of the pipelines scale test, we observed an issue with the DSPApplication: the ds-pipeline-sample pod is stuck in CrashLoopBackOff with only 1 of its 2 containers ready:

NAME                                                   READY   STATUS             RESTARTS       AGE     IP             NODE               NOMINATED NODE   READINESS GATES
ds-pipeline-persistenceagent-sample-5ffbbb6664-wdlr6   1/1     Running            1 (2m2s ago)   4m14s   10.129.2.85    e26-h22-000-r650   <none>           <none>
ds-pipeline-sample-689cd9c5c7-tc85t                    1/2     CrashLoopBackOff   6 (51s ago)    4m14s   10.129.2.84    e26-h22-000-r650   <none>           <none>
ds-pipeline-scheduledworkflow-sample-54c794564-mkm8c   1/1     Running            0              4m14s   10.128.2.104   e26-h21-000-r650   <none>           <none>
ds-pipeline-ui-sample-5496f698fd-dmb8v                 2/2     Running            0              4m13s   10.128.2.105   e26-h21-000-r650   <none>           <none>
mariadb-sample-85d75766f8-wjt2g                        1/1     Running            0              4m14s   10.128.2.107   e26-h21-000-r650   <none>           <none>
minio-deployment-84695968cd-kwt4p                      3/3     Running            0              5m17s   10.131.1.150   e26-h18-000-r650   <none>           <none>
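For automated runs, a listing like the one above can be checked programmatically. The following is a minimal sketch, not part of the scale-test automation: the function name `not_ready_pods` is hypothetical, and it assumes the plain-text column layout shown above (NAME, READY, STATUS as the first three whitespace-separated fields). The sample rows are copied from the listing, truncated after the AGE column.

```python
def not_ready_pods(listing: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for every pod that is not fully ready.

    Assumes the first three whitespace-separated columns are
    NAME, READY (as "ready/total") and STATUS, as in the listing above.
    """
    problems = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[:3]
        ready_count, _, total = ready.partition("/")
        # Note: this also flags pods in states such as "Completed".
        if status != "Running" or ready_count != total:
            problems.append((name, ready, status))
    return problems


LISTING = """\
NAME                                                   READY   STATUS             RESTARTS       AGE
ds-pipeline-persistenceagent-sample-5ffbbb6664-wdlr6   1/1     Running            1 (2m2s ago)   4m14s
ds-pipeline-sample-689cd9c5c7-tc85t                    1/2     CrashLoopBackOff   6 (51s ago)    4m14s
ds-pipeline-scheduledworkflow-sample-54c794564-mkm8c   1/1     Running            0              4m14s
ds-pipeline-ui-sample-5496f698fd-dmb8v                 2/2     Running            0              4m13s
mariadb-sample-85d75766f8-wjt2g                        1/1     Running            0              4m14s
minio-deployment-84695968cd-kwt4p                      3/3     Running            0              5m17s
"""

for name, ready, status in not_ready_pods(LISTING):
    print(f"{name}: {ready} {status}")
```

On this listing, only the ds-pipeline-sample pod is reported.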

The logs of the ds-pipeline-api-server container in the ds-pipeline-sample-689cd9c5c7-tc85t pod say:

[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.411863       1 client_manager.go:160] Initializing client manager
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.411945       1 config.go:74] Config DBConfig.ExtraParams not specified, skipping
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.557862       1 client_manager.go:414] We already own sample
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558234       1 swf.go:64] (Expected when in cluster) Failed to create scheduled workflow client by out of cluster kubeconfig. Error: stat /.kube/config: no such file or directory
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558244       1 swf.go:66] Starting to create scheduled workflow client by in cluster config.
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558629       1 client_manager.go:203] Client manager initialized successfully
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558816       1 main.go:182] Samples already loaded in the past. Skip loading.
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] F0526 09:32:36.559805       1 main.go:70] Failed to create default experiment. Err: Failed to create default experiment. Err: Already exist error: Failed to create a new experiment. The name Default already exists. Please specify a new name.
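The fatal line is easy to miss among the informational ones. As a rough sketch, the entries above can be filtered by severity: the first letter of each entry (`I`, `W`, `E`, `F`) is the klog severity, followed by the date as `mmdd`, the time, a thread ID, and `file:line]`. The regular expression and the `fatal_entries` helper below are assumptions written for the lines shown here, not a full klog parser; the bracketed `[pod/.../container]` prefix is treated as optional.

```python
import re

# Matches lines such as:
# [pod/<pod>/<container>] F0526 09:32:36.559805       1 main.go:70] message
KLOG_LINE = re.compile(
    r"^(?:\[(?P<source>[^\]]+)\]\s+)?"                 # optional [pod/.../container] prefix
    r"(?P<severity>[IWEF])(?P<date>\d{4}) "            # severity letter + mmdd
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"             # hh:mm:ss.uuuuuu
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def fatal_entries(log_text: str) -> list[tuple[str, int, str]]:
    """Return (file, line, message) for every entry with severity F."""
    entries = []
    for raw in log_text.splitlines():
        match = KLOG_LINE.match(raw.strip())
        if match and match.group("severity") == "F":
            entries.append(
                (match.group("file"), int(match.group("line")), match.group("message"))
            )
    return entries


# Two lines copied from the log above.
SAMPLE = """\
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] I0526 09:32:36.558629       1 client_manager.go:203] Client manager initialized successfully
[pod/ds-pipeline-sample-689cd9c5c7-tc85t/ds-pipeline-api-server] F0526 09:32:36.559805       1 main.go:70] Failed to create default experiment. Err: Failed to create default experiment. Err: Already exist error: Failed to create a new experiment. The name Default already exists. Please specify a new name.
"""

for file, line, message in fatal_entries(SAMPLE):
    print(f"{file}:{line}: {message}")
```

Applied to the log above, this isolates the single entry from main.go:70, which is where the API server exits because the experiment named Default already exists.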

The automation never got past the application deployment stage, so the pod never reached a healthy state.
~~The application is deployed in a freshly created namespace.~~
EDIT: after double-checking, the application was not running in a fresh namespace.
This wasn't expected from the scale test, but it doesn't explain why the application doesn't load.

Application deployment logs/artifacts are in this directory.

Prerequisites (if any, like setup, operators/versions):

Steps to Reproduce

1. Create a DSPApplication
2. Wait for all the deployments to turn ready

Actual results:

Observe that the ds-pipeline deployment never turns ready

Expected results:

The deployment turns ready

Reproducibility (Always/Intermittent/Only Once):

intermittent

Build Details:

RHODS 1.27-2023-05-17

Workaround:

Additional info:

Automated run executed on a bare-metal cluster

Assignee: Unassigned

Reporter: Kevin Pouget

Votes: 0

Watchers: 5

Created: 2023/05/26 1:10 PM

Updated: 2025/06/11 11:40 PM
