Type: Story
Resolution: Won't Do
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: Accelerator Enablement
Labels:
- cuda
- performance

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
Build for fewer GPU archs

Can and should we reduce TORCH_CUDA_ARCH_LIST to speed up build time and reduce size?

Our CUDA arch list:

TORCH_CUDA_ARCH_LIST=7.5 8.0 8.6 8.7 8.9 9.0 10.0 12.0+PTX

Upstream Torch 2.7.1 arch list for CUDA 12.8 builds (https://github.com/pytorch/pytorch/blob/v2.7.1/.ci/manywheel/build_cuda.sh#L57):

TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;9.0;10.0;12.0+PTX"

Upstream does not build dedicated kernels for 8.7 (Jetson Orin, Jetson AGX) and 8.9 (Ada: L40S, L40, L20, L4, L2).
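The difference between the two lists can be checked mechanically. The following is a minimal sketch, not part of any build tooling: `parse_arch_list` is a hypothetical helper that assumes entries are separated by spaces or semicolons and that a trailing `+PTX` can be ignored for the comparison.

```python
import re


def parse_arch_list(value: str) -> list[str]:
    """Split a TORCH_CUDA_ARCH_LIST value into bare compute capabilities.

    Hypothetical helper: accepts space- or semicolon-separated entries
    and drops a trailing "+PTX" so only the version numbers are compared.
    """
    entries = re.split(r"[;\s]+", value.strip())
    return [entry.removesuffix("+PTX") for entry in entries if entry]


# Values copied from this ticket.
ours = parse_arch_list("7.5 8.0 8.6 8.7 8.9 9.0 10.0 12.0+PTX")
upstream = parse_arch_list("7.5;8.0;8.6;9.0;10.0;12.0+PTX")

# Architectures we build that upstream does not, in our list order.
extra = [arch for arch in ours if arch not in upstream]
print(extra)  # ['8.7', '8.9']
```

Under these assumptions the only entries we build beyond upstream are 8.7 and 8.9, which matches the observation above.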

Assignee: Unassigned

Reporter: Christian Heimes

Team: Frank's Team

Votes: 0

Watchers: 2

Created: 2025/07/04 11:25 AM

Updated: 2025/10/28 4:04 PM

Resolved: 2025/10/28 4:04 PM