-
Bug
-
Resolution: Not a Bug
-
Major
Description of problem:
On an OSD cluster in stage with RHODS 1.11.0-5 installed as an add-on (together with the GPU add-on), I am seeing random but fairly frequent restarts of the JupyterHub and Traefik pods.
I was able to watch only one restart happen live: a Traefik container in one of the three Traefik pods failed, which seems to have caused a restart of the current JH leader pod.
There is also one pod in the GPU add-on namespace (controller-manager) with roughly the same number of restarts. I am attaching its logs as well, since it might be what is causing the issue: we have not seen these restarts on clusters where the GPU add-on was not installed.
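To check whether the JH, Traefik and controller-manager restarts line up in time, the pod status can be summarized per container. The following is a minimal sketch, not something that was run as part of this report: it assumes the input is the parsed JSON from `oc get pods -n <namespace> -o json`, and the function name `summarize_restarts` is made up for illustration. It only reads the standard `containerStatuses` fields (`restartCount` and `lastState.terminated`).

```python
def summarize_restarts(pod_list):
    """Return one row per container that has restarted at least once.

    `pod_list` is assumed to be a Kubernetes PodList as a dict, e.g. the
    parsed output of `oc get pods -n <namespace> -o json`.
    """
    rows = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            if status.get("restartCount", 0) == 0:
                continue
            # lastState.terminated is only set when the previous instance exited.
            terminated = status.get("lastState", {}).get("terminated", {})
            rows.append({
                "pod": pod_name,
                "container": status["name"],
                "restarts": status["restartCount"],
                "reason": terminated.get("reason"),
                "exit_code": terminated.get("exitCode"),
                "finished_at": terminated.get("finishedAt"),
            })
    # Kubernetes timestamps are RFC 3339 in UTC, so string order is time order.
    rows.sort(key=lambda row: row["finished_at"] or "")
    return rows
```

Running this against both the RHODS and the GPU add-on namespaces and comparing the `finished_at` values would show whether the controller-manager restarts precede the Traefik and JH ones, or whether the two are unrelated.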
Prerequisites (if any, like setup, operators/versions):
RHODS 1.11.0-5 installed as add-on on OSD
Steps to Reproduce
- Install RHODS
- (Maybe install GPU add-on?)
- Keep using the cluster as normal; restarts seem to have started a few hours (4?) after RHODS was first installed
Actual results:
Multiple restarts (5+) on JH and Traefik pods, at seemingly random times
Expected results:
No restarts for JH; at most a couple of restarts for Traefik during install
Reproducibility (Always/Intermittent/Only Once):
Observed on one cluster only
Build Details:
OSD running OCP 4.10 (latest), RHODS 1.11.0-5, GPU add-on v1.10.1