-
Bug
-
Resolution: Done
-
Blocker
-
RHOAI_2.9.0
-
2
-
False
-
-
False
-
No
-
No
-
Automated
-
-
-
2
-
RHOAI DW - 2
-
Testable
A Ray job submitted to a Ray cluster fails to initialize and ends up in the Failed state with the error:
"Unexpected error occurred: The actor died unexpectedly before finishing this task."
Unfortunately, I wasn't able to reproduce this behavior reliably. It happens occasionally when running the SDK example https://github.com/project-codeflare/codeflare-sdk/blob/main/demo-notebooks/guided-demos/2_job_client.ipynb
From the logs, it seems that the worker node is unable to connect to the head node; see the attachment.
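
Since the failure is intermittent, one way to catch it might be to repeat the submission until a job ends in the Failed state. Below is a minimal sketch of such a loop, not a verified reproducer. `run_until_failure` and `submit_and_wait` are hypothetical names introduced here; `submit_and_wait` is assumed to wrap whatever the notebook does to submit a job and wait for a terminal status, returning the job ID and the final status as strings.

```python
from typing import Callable, Optional, Tuple


def run_until_failure(
    submit_and_wait: Callable[[], Tuple[str, str]],
    max_attempts: int = 50,
) -> Optional[Tuple[int, str]]:
    """Repeat a job submission until one run ends as FAILED.

    submit_and_wait is a hypothetical callable (an assumption, not part of
    any SDK) that submits one job, blocks until it reaches a terminal
    state, and returns (job_id, final_status).

    Returns (attempt_number, job_id) for the first failed run, or None if
    every attempt up to max_attempts finished without failing.
    """
    for attempt in range(1, max_attempts + 1):
        job_id, status = submit_and_wait()
        print(f"attempt {attempt}: job {job_id} finished as {status}")
        if status == "FAILED":
            # Stop at the first failure so the cluster state and logs
            # can be inspected before anything is cleaned up.
            return attempt, job_id
    return None
```

Stopping at the first failed run keeps the Ray cluster in the state where the worker-to-head connection problem occurred, which should make it easier to collect head and worker logs for that specific job ID.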