Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: PyTorch
Labels:
- pytorch_ci

Blocked: False
Blocked Reason: None
Ready: False
Epic Link: AIPCC-8378

Objective

Set up PyTorch test infrastructure on a self-hosted GitHub Actions runner with an NVIDIA H200 GPU.

Work Completed

- Configured self-hosted runner: test-runner-git-109-vpc
- Set up the Podman container runtime with Docker emulation
- Configured GPU passthrough for CUDA tests
- Implemented test sharding (initially 3 shards, later increased to 5)
- Added test artifact collection and reporting
- Enabled the --keep-going flag so a single test failure does not stop the run, giving coverage of the full suite
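The sharding, GPU passthrough, and --keep-going items above can be illustrated with a minimal Python sketch that builds one Podman command per shard. It assumes that .ci/pytorch/test.sh ends up calling PyTorch's test/run_test.py, which accepts --shard <index> <total> and --keep-going; the ticket does not confirm how the script invokes it. The image tag, the /workspace checkout path, and the report directories are hypothetical, and --device nvidia.com/gpu=all only works if the NVIDIA Container Toolkit has generated a CDI specification on the host.

```python
import shlex
from typing import List

NUM_SHARDS = 5  # shard count from the ticket (increased from 3)


def shard_command(
    shard: int,
    num_shards: int = NUM_SHARDS,
    image: str = "localhost/pytorch-ci:latest",  # hypothetical image tag
    reports_dir: str = "/tmp/test-reports",  # hypothetical host path for artifacts
) -> List[str]:
    """Return a podman argv that runs one 1-based shard of the PyTorch test suite."""
    if not 1 <= shard <= num_shards:
        raise ValueError(f"shard must be in 1..{num_shards}, got {shard}")
    return [
        "podman", "run", "--rm",
        # CDI device name; assumes the NVIDIA Container Toolkit generated a CDI spec.
        "--device", "nvidia.com/gpu=all",
        # Often needed on SELinux-enforcing hosts (e.g. RHEL) for GPU device access.
        "--security-opt", "label=disable",
        # Per-shard host directory so reports survive the container (assumed report path).
        "-v", f"{reports_dir}/shard-{shard}:/workspace/test/test-reports",
        "-w", "/workspace",  # hypothetical checkout location inside the image
        image,
        "python", "test/run_test.py",
        "--shard", str(shard), str(num_shards),
        "--keep-going",  # continue past failures instead of stopping early
    ]


if __name__ == "__main__":
    # Print one shell-quoted command per shard.
    for i in range(1, NUM_SHARDS + 1):
        print(shlex.join(shard_command(i)))
```

Keeping the shard index and total as explicit arguments makes it easy to change the shard count, as happened when moving from 3 to 5 shards. Podman refuses a bind mount whose host directory does not exist, so the per-shard report directories would need to be created before running these commands.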

Test Execution

- Full PyTorch test suite: ~20,400 tests
- Test shards: 5 (increased from 3)
- Tests run in parallel within each shard
- Comprehensive logging and artifact collection
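With an even split, moving from 3 to 5 shards lowers the average load from about 6,800 tests per shard (20,400 / 3) to about 4,080 (20,400 / 5). An even split by count does not guarantee even wall-clock time, so shards are often balanced by estimated duration instead. The sketch below uses greedy longest-processing-time assignment as a simplified stand-in; it is not necessarily the algorithm run_test.py uses, and the test names and durations in the example are made up.

```python
import heapq
from typing import Dict, List, Tuple


def assign_shards(durations: Dict[str, float], num_shards: int) -> List[List[str]]:
    """Greedy LPT: place each test, longest first, on the currently least-loaded shard."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    # Min-heap of (total assigned seconds, shard index); ties go to the lower index.
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    # Longest first; break duration ties by name so the result is deterministic.
    for name, secs in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return shards


if __name__ == "__main__":
    # Hypothetical per-file durations in seconds.
    example = {"test_a": 300, "test_b": 200, "test_c": 200,
               "test_d": 100, "test_e": 100, "test_f": 100}
    for i, shard in enumerate(assign_shards(example, 2), start=1):
        print(i, shard, sum(example[n] for n in shard))
```

Balancing by time rather than by count matters most when a few test files dominate the runtime, which is typical of GPU-heavy suites.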

Deliverables

[x] Self-hosted runner operational
[x] GPU passthrough working
[x] Test execution reliable
[x] Artifact collection functional

References

Workflow: .github/workflows/rhel-build-test.yml
Test script: .ci/pytorch/test.sh

Assignee: Subin George
Reporter: Subin George
Team: PyTorch Infrastructure
Votes: 0
Watchers: 1

Created: 2026/01/12 5:39 AM
Updated: 2026/01/22 9:35 AM
Resolved: 2026/01/12 5:39 AM