AI Platform Core Components / AIPCC-8946

[QA][PyTorch UT][CPU, sGPU] test/test_dataloader.py - TestDataLoaderPersistentWorkers failures

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • PyTorch
      Description

      Summary: The DataLoader persistent-workers tests fail with worker process errors on the CPU and sGPU platforms.

      Test Class: test/test_dataloader.py::TestDataLoaderPersistentWorkers
      Number of Failing Tests: 2
      Platform: CPU, sGPU (CUDA)
      Test Type: Unit Test

      Version Information:

      • PyTorch Commit: 6bdd8c9
      • Branch: main
      • Test Date: 2026-01-14
      • Sprint: Sprint 24

      Failure Pattern:
      Both tests fail with related errors, which suggests a common root cause (the same one affecting TestDataLoader).

      Common Error:
      ValueError: Caught ValueError in DataLoader worker process 0.
      Original Traceback (most recent call last):
        File "/miniconda/envs/cuda_torch_build/lib/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 358, in _worker_loop
          data = fetcher.fetch(index)
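
When triaging many failing tests, it can help to group them by the wrapper line shown in the common error. The following is a minimal sketch, not part of the original report: `WORKER_ERROR_RE`, `WorkerError`, and `parse_worker_error` are hypothetical names, and the pattern assumes the wrapper line always has the exact shape quoted above.

```python
import re
from typing import NamedTuple, Optional

# Matches the wrapper line quoted in the report, e.g.
# "ValueError: Caught ValueError in DataLoader worker process 0."
# Assumption: other failures use the same wording.
WORKER_ERROR_RE = re.compile(
    r"Caught (?P<exc>\w+) in DataLoader worker process (?P<worker>\d+)\."
)


class WorkerError(NamedTuple):
    exc_type: str   # exception class name raised inside the worker
    worker_id: int  # index of the worker process that failed


def parse_worker_error(log_text: str) -> Optional[WorkerError]:
    """Return the first worker error found in log_text, or None."""
    match = WORKER_ERROR_RE.search(log_text)
    if match is None:
        return None
    return WorkerError(match.group("exc"), int(match.group("worker")))


sample = "ValueError: Caught ValueError in DataLoader worker process 0."
print(parse_worker_error(sample))
```

Two tests whose logs parse to the same exception type are candidates for sharing a root cause, though the inner traceback still needs to be compared by hand.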

      Failing Tests:
      1. test_multiprocessing_iterdatapipe
      2. test_segfault

      Steps to Reproduce:
      TEST_CONFIG=cpu python3 test/run_test.py -i test_dataloader
      TEST_CONFIG=cuda python3 test/run_test.py -i test_dataloader
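
The two reproduction commands can also be driven from Python, for example to run both configurations in one go. This is a sketch under assumptions: `build_repro` and `run_repro` are hypothetical helpers, the script is assumed to be started from the root of a PyTorch checkout, and `sys.executable` stands in for the `python3` used in the report.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

VALID_CONFIGS = ("cpu", "cuda")  # the two TEST_CONFIG values from the report


def build_repro(test_config: str) -> Tuple[Dict[str, str], List[str]]:
    """Build the environment and argv for one reproduction run."""
    if test_config not in VALID_CONFIGS:
        raise ValueError(f"unsupported TEST_CONFIG: {test_config!r}")
    # Copy the current environment and set TEST_CONFIG on top of it.
    env = dict(os.environ, TEST_CONFIG=test_config)
    argv = [sys.executable, "test/run_test.py", "-i", "test_dataloader"]
    return env, argv


def run_repro(test_config: str) -> int:
    """Run the test module for one configuration and return its exit code."""
    env, argv = build_repro(test_config)
    return subprocess.run(argv, env=env, check=False).returncode
```

Calling `run_repro("cpu")` and then `run_repro("cuda")` mirrors the two commands above; a non-zero return value means the run failed.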

      Expected Result:
      Tests should pass without worker process errors

      Actual Result:
      Tests fail with a ValueError raised in a DataLoader worker process when persistent workers are enabled.

      Root Cause Analysis:
      Same root cause as TestDataLoader: worker processes hit errors caused by a numpy/pandas binary incompatibility when persistent workers are in use.
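
One way to check for a numpy/pandas binary incompatibility in the test environment is to import both packages and record what happens. The sketch below is an illustration, not the procedure used in this issue: `check_imports` is a hypothetical helper, and it assumes the mismatch surfaces at import time as a `ValueError` (for example numpy's "numpy.dtype size changed, may indicate binary incompatibility" message), which the report does not confirm.

```python
import importlib
from typing import Dict, Iterable


def check_imports(module_names: Iterable[str]) -> Dict[str, str]:
    """Try to import each module and summarize the outcome per module."""
    results: Dict[str, str] = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # Package missing or one of its dependencies failed to load.
            results[name] = f"import failed: {exc}"
        except ValueError as exc:
            # Assumption: an ABI mismatch between a compiled extension and
            # the installed numpy shows up as a ValueError on import.
            results[name] = f"possible binary incompatibility: {exc}"
        else:
            version = getattr(module, "__version__", "unknown")
            results[name] = f"ok, version {version}"
    return results


for name, status in check_imports(["numpy", "pandas"]).items():
    print(f"{name}: {status}")
```

If pandas reports a possible binary incompatibility while numpy imports cleanly, the usual suspicion is that pandas was built against a different numpy version than the one installed.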

      Priority: P3

              rh-ee-sugeorge Subin George
              pytorch-engineering PyTorch Engineering