-
Story
-
Resolution: Unresolved
Can and should we reduce TORCH_CUDA_ARCH_LIST to speed up build time and reduce binary size?
Our CUDA arch list:
TORCH_CUDA_ARCH_LIST=7.5 8.0 8.6 8.7 8.9 9.0 10.0 12.0+PTX
Upstream Torch 2.7.1 CUDA arch list for CUDA 12.8 (https://github.com/pytorch/pytorch/blob/v2.7.1/.ci/manywheel/build_cuda.sh#L57):
TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;9.0;10.0;12.0+PTX"
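Upstream does not build dedicated kernels for 8.7 (Jetson Orin family, including Jetson AGX Orin) and 8.9 (Ada: L40S, L40, L20, L4, L2). Because compiled CUDA binaries run on later minor revisions within the same major compute capability, devices with 8.7 or 8.9 should still be able to run the 8.6 kernels, just without architecture-specific tuning.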
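
The comparison above can be reproduced with a small script. This is only a sketch: the helper names `parse_arch_list` and `base_arch` are made up for illustration and are not part of PyTorch, and the two list values are copied from this issue rather than read from any build environment.

```python
import re

# Values copied from this issue; not read from a live build environment.
OURS = "7.5 8.0 8.6 8.7 8.9 9.0 10.0 12.0+PTX"
UPSTREAM = "7.5;8.0;8.6;9.0;10.0;12.0+PTX"


def parse_arch_list(value: str) -> list[str]:
    """Split an arch list on spaces or semicolons (hypothetical helper)."""
    return [tok for tok in re.split(r"[;\s]+", value.strip()) if tok]


def base_arch(entry: str) -> str:
    """Drop a trailing '+PTX' so '12.0+PTX' compares equal to '12.0'."""
    return entry.removesuffix("+PTX")


ours = parse_arch_list(OURS)
upstream = parse_arch_list(UPSTREAM)

# Architectures we build that upstream does not.
upstream_bases = {base_arch(a) for a in upstream}
extra = [a for a in ours if base_arch(a) not in upstream_bases]

print("Built by us but not by upstream:", extra)
```

With the two lists from this issue, the only entries in `extra` are 8.7 and 8.9, which are the candidates for removal if we decide to match upstream.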