Details

Type: Bug
Resolution: Duplicate
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Pipelines
Labels:

Blocked: False
Blocked Reason: None
Ready: False

SFDC Cases Links:
SFDC Cases Counter:

Description

Deploy type

Manually deployed Kfdef

Version

v1.2.0

Environment

K8s Version: v1.25.11+1485cc9
OCP Version: 4.12.26_1555

Current Behavior

Sometimes the DSPA deployment stalls at:

2023-09-07T14:37:45Z	INFO	Performing Database Health Check	{"namespace": "mission-code-data", "dspa_name": "pipelines-definition"}

Due to https://github.com/opendatahub-io/data-science-pipelines-operator/issues/280, after some time we see:

2023-09-07T14:39:57Z	INFO	Unable to connect to Database	{"namespace": "mission-code-data", "dspa_name": "pipelines-definition"}

This message then keeps repeating. The mariadb pod was the one deployed by default. I was able to access the pod, connect to `mlpipeline` (the default db), and run `SELECT 1`, which seems to be the query our health check uses to test the db connection.

~ $ oc port-forward -n mission-code-data service/mariadb-pipelines-definition 3306
Forwarding from 127.0.0.1:3306 -> 3306
Forwarding from [::1]:3306 -> 3306
Handling connection for 3306

~ $ mysql --host=127.0.0.1 --port=3306 --user=root
MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mlpipeline         |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.082 sec)

MariaDB [(none)]> use mlpipeline
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [mlpipeline]> select 1;
+---+
| 1 |
+---+
| 1 |
+---+
1 row in set (0.054 sec)
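The interactive session above can be collapsed into a single non-interactive probe, which is easier to rerun while the operator is stuck. This is only a sketch: `db_select_one` is a hypothetical helper name, and it assumes the port-forward from the session above is still running and that `root` can connect without a password, as it did there.

```shell
# Hypothetical helper: run the same `SELECT 1` probe non-interactively.
# Assumes `oc port-forward ... 3306` is still running in another terminal.
db_select_one() (
  host=${1:-127.0.0.1}
  port=${2:-3306}
  # --batch and --skip-column-names reduce the output to the bare value.
  result=$(mysql --host="${host}" --port="${port}" --user=root \
    --batch --skip-column-names --execute='SELECT 1' mlpipeline) || return 1
  # Succeed only if the query actually returned 1.
  [ "${result}" = "1" ]
)
```

Calling `db_select_one && echo "db reachable"` prints the message only when the query succeeds and returns `1`.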

Expected Behavior

The DSPA should come up cleanly when the default mariadb is reachable, with no connection failures. We may see "Performing Database Health Check" a couple of times if the mariadb pod is still coming up, but once the pod is available we expect the check to succeed quickly, within seconds.
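The expectation above amounts to a bounded retry: a few failed attempts while the pod starts, then a quick success. The following is a minimal shell sketch of that behavior for manual testing, not the operator's actual implementation. `retry_check`, the attempt count, and the delay are all hypothetical.

```shell
# Hypothetical helper: retry a check command a bounded number of times.
# Usage: retry_check <max_attempts> <delay_seconds> <command> [args...]
retry_check() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "${attempt}" -le "${max_attempts}" ]; do
    if "$@"; then
      echo "check succeeded on attempt ${attempt}"
      return 0
    fi
    echo "check failed on attempt ${attempt}/${max_attempts}" >&2
    # Sleep between attempts, but not after the final one.
    if [ "${attempt}" -lt "${max_attempts}" ]; then
      sleep "${delay}"
    fi
    attempt=$((attempt + 1))
  done
  return 1
)
```

For example, `retry_check 5 2 mysql --host=127.0.0.1 --port=3306 --user=root --execute='SELECT 1' mlpipeline` would try the probe up to five times, two seconds apart, and stop at the first success.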

Steps To Reproduce

Seems flaky; we do not yet know how to reproduce it consistently.

Workaround (if any)

# Set this to your dspa namespace (if using standalone) or your data science project (if using odh)
namespace=my-ds-project

dspa=pipelines-definition

patch='{"spec":{"database":{"disableHealthCheck":true}}}'
oc -n "${namespace}" patch dspa "${dspa}" --type=merge -p "${patch}"

# Can wait for db connection timeout, takes ~5 min, or just delete the dsp operator pod
oc delete -n odh-applications pod $(oc get pods -n odh-applications -l app.kubernetes.io/name=data-science-pipelines-operator --no-headers=true | awk '{print $1}')
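To confirm the workaround actually landed on the resource, the patch can be followed by a read-back of the same field. This is a sketch under assumptions: `disable_health_check` is a hypothetical helper name, and it assumes the field is readable at `.spec.database.disableHealthCheck` through `-o jsonpath`, matching the path used in the patch.

```shell
# Hypothetical helper: apply the workaround patch, then read the field back.
# Usage: disable_health_check <namespace> <dspa_name>
disable_health_check() (
  namespace=$1
  dspa=$2
  patch='{"spec":{"database":{"disableHealthCheck":true}}}'
  oc -n "${namespace}" patch dspa "${dspa}" --type=merge -p "${patch}" || return 1
  # Read the field back; jsonpath prints the boolean as plain text.
  value=$(oc -n "${namespace}" get dspa "${dspa}" \
    -o jsonpath='{.spec.database.disableHealthCheck}') || return 1
  [ "${value}" = "true" ]
)
```

Running `disable_health_check my-ds-project pipelines-definition` returns a non-zero status if either the patch or the read-back fails, so it can gate the operator pod restart.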

Anything else

Note that when using the odh-dashboard, users are shown the following prompt:

We encountered an error creating or loading your pipeline server. To continue, delete this pipeline server and create a new one. Deleting this pipeline server will delete all of its resources, including pipelines, runs, and jobs.

Migrated from GitHub: https://github.com/opendatahub-io/data-science-pipelines-operator/issues/320

Issue Links

links to

opendatahub-io/data-science-pipelines-operator#320: [Bug]: DSPO getting stuck at "Performing Database Health Check"

People

Assignee: Unassigned

Reporter: Humair Khan

Team: RHOAI Data Science Pipelines

Votes: 0

Watchers: 1

Dates

Created: 2024/01/11 5:29 PM

Updated: 2024/02/07 10:59 PM

Resolved: 2024/02/07 10:59 PM