-
Task
-
Resolution: Done
-
Undefined
-
None
-
None
-
False
-
-
False
-
-
Objective
Set up test infrastructure using self-hosted runner with H200 GPU.
Work Completed
- Configured self-hosted runner: test-runner-git-109-vpc
- Set up Podman container runtime (Docker emulation)
- Configured GPU passthrough for CUDA tests
- Implemented test sharding (initially 3 shards, optimized to 5)
- Added test artifact collection and reporting
- Set up --keep-going flag for comprehensive coverage
Test Execution
- Full PyTorch test suite: ~20,400 tests
- Test shards: 5 (optimized from 3)
- Parallel execution within shards
- Comprehensive logging and artifact collection
Deliverables
- [x] Self-hosted runner operational
- [x] GPU passthrough working
- [x] Test execution reliable
- [x] Artifact collection functional
References
- Workflow: .github/workflows/rhel-build-test.yml
- Test script: .ci/pytorch/test.sh