
    • 2
    • RHOAI DW - 2
    • Testable

      A Ray job submitted to a Ray cluster fails to initialize and ends in the Failed state with the error:
      "Unexpected error occurred: The actor died unexpectedly before finishing this task."

      Unfortunately, I wasn't able to reproduce this behavior reliably. It happens occasionally when running the SDK example https://github.com/project-codeflare/codeflare-sdk/blob/main/demo-notebooks/guided-demos/2_job_client.ipynb

      From the logs, it appears that the worker node is unable to connect to the head node; see the attachment.
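
      Since the failure is intermittent, one possible way to catch it is to submit the same job repeatedly and record the final state of each run. The sketch below is only an illustration, not the notebook's code: `wait_for_terminal_status` and `repeat_submission` are hypothetical helpers, and the `submit` and `get_status` callables are placeholders for whichever job client is in use. The state names follow the Ray Jobs API (`SUCCEEDED`, `FAILED`, `STOPPED`).

```python
import time
from collections import Counter
from typing import Callable, List, Tuple

# Terminal job states as reported by the Ray Jobs API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_terminal_status(
    get_status: Callable[[], object],
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll get_status() until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        raw = get_status()
        # Accept either a plain string or an enum-like object with a .value attribute.
        status = str(getattr(raw, "value", raw))
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            return "TIMEOUT"
        time.sleep(poll_interval_s)


def repeat_submission(
    submit: Callable[[], str],
    get_status: Callable[[str], object],
    attempts: int,
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Tuple[Counter, List[str]]:
    """Submit the job `attempts` times; return outcome counts and the IDs of unsuccessful runs."""
    outcomes: Counter = Counter()
    unsuccessful: List[str] = []
    for _ in range(attempts):
        job_id = submit()
        final = wait_for_terminal_status(
            lambda: get_status(job_id), poll_interval_s, timeout_s
        )
        outcomes[final] += 1
        if final != "SUCCEEDED":
            # Keep the ID so that the logs of this run can be collected afterwards.
            unsuccessful.append(job_id)
    return outcomes, unsuccessful
```

      With Ray's own `JobSubmissionClient`, `submit` could wrap `client.submit_job(entrypoint=...)` and `get_status` could be `client.get_job_status`; the notebook uses the SDK's job client, so the exact calls there are an assumption. The returned job IDs can then be used to fetch logs for the failed runs.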

            mcampbel@redhat.com Mark Campbell
            ksuta Karel Suta
            Karel Suta Karel Suta
            RHOAI Distributed Workloads
            Votes: 0
            Watchers: 7
