Bug
Resolution: Obsolete
Description of problem:
As part of large-scale testing (2000 users), we observe that when many users cannot get a notebook (because the cluster is undersized), the Dashboard shows various errors indicating that the control plane is overloaded.
Actual results:
- Stuck on the notebook creation page with the create workbench button disabled
(no message in the console, which is confusing)
- Unable to load the dashboard
with this message in the console:
{'level': 'SEVERE', 'message': 'https://rhods-dashboard-redhat-ods-applications.apps.odsci-pr665-sutest-1643308726075002880.psap.aws.rhperfscale.org/api/status - Failed to load resource: the server responded with a status of 504 (Gateway Time-out)', 'source': 'network', 'timestamp': 1680637096177}
- Dashboard not loading
with this message in the console:
{'level': 'SEVERE', 'message': 'https://rhods-dashboard-redhat-ods-applications.apps.odsci-pr665-sutest-1643308726075002880.psap.aws.rhperfscale.org/api/status - Failed to load resource: the server responded with a status of 504 (Gateway Time-out)', 'source': 'network', 'timestamp': 1680637592436}
- Stuck on the project creation page
with the api/status 504 error
- Stuck waiting for the resource list
with these messages in the console:
[{'level': 'SEVERE', 'message': 'https://rhods-dashboard-redhat-ods-applications.apps.odsci-pr665-sutest-1643308726075002880.psap.aws.rhperfscale.org/ - Failed to load resource: the server responded with a status of 403 (Forbidden)', 'source': 'network', 'timestamp': 1680636423860}, {'level': 'SEVERE', 'message': 'https://rhods-dashboard-redhat-ods-applications.apps.odsci-pr665-sutest-1643308726075002880.psap.aws.rhperfscale.org/app.bundle.js 1:827699 "Error fetching notebook events" wi: Call to /api/v1/namespaces/psapuser1000/events?fieldSelector=involvedObject.kind%3DPod%2CinvolvedObject.uid%3Dd796366f-e847-47eb-9b3b-5ed776e398e5 timed out after 60000ms\n at https://rhods-dashboard-redhat-ods-applications.apps.odsci-pr665-sutest-1643308726075002880.psap.aws.rhperfscale.org/app.bundle.js:2:121975', 'source': 'console-api', 'timestamp': 1680637037273},
- See also RHODS-7872: no error message is shown when the notebook pod cannot be scheduled
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce:
Working on a reproducer.
Expected results:
- The dashboard does not overload the APIServer when Pods cannot be scheduled. We need to work together to better understand what is happening and how to prevent it.
==> moved to a dedicated ticket RHODS-7874
- The dashboard shows better, more user-friendly errors when the APIServer returns 50x error codes.
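To illustrate the second expectation, here is a minimal sketch of how an HTTP failure status could be turned into a user-facing message. It is not taken from the dashboard code: the `FriendlyError` type, the `describeApiFailure` function, and the message wording are all hypothetical, and the 403 branch only guesses at an expired session or missing permission.

```typescript
// Hypothetical helper; none of these names exist in the dashboard codebase.
type FriendlyError = {
  title: string;
  detail: string;
  retryable: boolean; // whether the UI could offer a "Try again" action
};

function describeApiFailure(status: number, endpoint: string): FriendlyError {
  // Gateway-style errors, such as the 504 seen on /api/status, usually mean
  // the backend did not answer in time rather than that the request was wrong.
  if (status === 502 || status === 503 || status === 504) {
    return {
      title: "The cluster is responding slowly",
      detail:
        `The request to ${endpoint} failed with HTTP ${status}. ` +
        "The control plane may be overloaded. Please try again in a few minutes.",
      retryable: true,
    };
  }
  // Any other 5xx: report a server-side problem without suggesting a retry.
  if (status >= 500 && status <= 599) {
    return {
      title: "The server reported an error",
      detail: `The request to ${endpoint} failed with HTTP ${status}.`,
      retryable: false,
    };
  }
  // Assumption: a 403 may indicate an expired session or missing permission.
  if (status === 403) {
    return {
      title: "Access denied",
      detail:
        `The request to ${endpoint} was refused with HTTP 403. ` +
        "Your session may have expired, or you may lack permission.",
      retryable: false,
    };
  }
  return {
    title: "Request failed",
    detail: `The request to ${endpoint} failed with HTTP ${status}.`,
    retryable: false,
  };
}

// Example: the failure reported above for /api/status.
const example = describeApiFailure(504, "/api/status");
console.log(`${example.title}: ${example.detail}`);
```

Keeping the mapping in one pure function would let every page (notebook creation, project creation, resource list) show the same wording instead of failing silently.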
Reproducibility (Always/Intermittent/Only Once):
Build Details:
Workaround:
Additional info:
- is incorporated by: RHOAIENG-1154 Handle dashboard oauth cookie expiration while user is using dashboard (Backlog)
- relates to: RHOAIENG-915 Dashboard doesn't show any error message when a notebook pod cannot be scheduled (Backlog)