  1. AI Platform Core Components
  2. AIPCC-5359

GitLab CI pipeline fails with permission denied error when different runners try to create shared directories

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • PyTorch
    • PyTorch Sprint 15
    • Important

      Description of problem:

      The PyTorch CI pipeline fails with a permission denied error when GitLab runners running under different user accounts try to create directories in the shared temporary storage location /tmp/pytorch-ci-shared/.

      Root Cause:

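      Multiple GitLab runners are registered under different users (rrathaur, gitlab-runner, root), and all of them write to the same shared directory path. Whichever runner creates /tmp/pytorch-ci-shared/ first becomes its owner, and runners under other non-root accounts then cannot create subdirectories in it.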

      Steps to Reproduce:

      1. Run a GitLab CI pipeline on a runner registered under user A (e.g., rrathaur)
      2. The pipeline creates /tmp/pytorch-ci-shared/ owned by user A
      3. Run another pipeline on a runner registered under user B (e.g., gitlab-runner)
      4. The second pipeline fails when it tries to create its subdirectory: mkdir: cannot create directory '/tmp/pytorch-ci-shared/13106808': Permission denied
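
      To confirm the ownership conflict described in the steps above, it may help to inspect the shared directory on the runner host before the failing mkdir. The following is only a diagnostic sketch: check_shared_dir is a hypothetical helper, not part of the original pipeline, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical diagnostic helper (not part of the original pipeline).
# Prints the owner and mode of a directory and reports whether the
# current user may create entries inside it.
check_shared_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "$dir: does not exist"
    return 0
  fi
  # ls -ld shows the mode, owner and group of the directory itself.
  ls -ld "$dir"
  # Creating a subdirectory needs write and search (execute) permission.
  if [ -w "$dir" ] && [ -x "$dir" ]; then
    echo "$dir: writable by $(id -un)"
  else
    echo "$dir: NOT writable by $(id -un)"
  fi
}

# Example call on a runner host (left commented out on purpose):
# check_shared_dir /tmp/pytorch-ci-shared
```

      If the directory is owned by one runner user and its mode grants no write access to others, a job running as a different non-root user would be expected to hit the error shown in step 4.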

      Actual results:

      $ mkdir -p ${SHARED_DIR}
      mkdir: cannot create directory '/tmp/pytorch-ci-shared/13106808': Permission denied
      ERROR: Job failed: exit status 1    

      Expected results:

      All GitLab runners should be able to create and access shared directories regardless of which user account they're running under.    
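
      The ticket does not record which fix was applied. One possible way to meet this expectation, sketched here under assumptions, is to give the shared root the same mode as /tmp itself (world-writable with the sticky bit, 1777), so that every runner user can create its own subdirectory but cannot remove those of other users. ensure_shared_dir is a hypothetical helper, and treating the numeric subdirectory as a per-run ID is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not the fix recorded in the ticket).
# Creates the shared root with mode 1777 if it does not exist yet, then
# creates a per-run subdirectory inside it.
ensure_shared_dir() {
  root="$1"    # e.g. /tmp/pytorch-ci-shared
  run_id="$2"  # e.g. a pipeline or job ID (assumed)
  if [ ! -d "$root" ]; then
    # Only the creator can chmod the directory, so set the mode
    # right after creating it.
    mkdir -p "$root" && chmod 1777 "$root"
  fi
  # With mode 1777 on the root, this succeeds for any user.
  mkdir -p "$root/$run_id"
}

# Possible use in a job script (CI_PIPELINE_ID is a GitLab predefined
# variable; whether the pipeline uses it here is an assumption):
# ensure_shared_dir /tmp/pytorch-ci-shared "$CI_PIPELINE_ID"
```

      This only helps when the helper itself creates the root. If /tmp/pytorch-ci-shared/ already exists with a restrictive mode and another owner, its owner or an administrator would need to change the mode once. A world-writable directory also lets any local user place files there, so a per-user path may be preferable where that matters.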

      Error Details:

      Running with gitlab-runner 18.3.1 (5a021a1c)
        on intel-eaglestream-spr-16.khw.eng.rdu2.dc.redhat.com yhkyQxg2S
      Executing "step_script" stage of the job script
      $ echo "Create shared directory"
      $ mkdir -p ${SHARED_DIR}
      mkdir: cannot create directory '/tmp/pytorch-ci-shared/13106808': Permission denied
      ERROR: Job failed: exit status 1 

              rh-ee-rrathaur Rohit Singh Rathaur