Story

Resolution: Done

- 🐛 Describe the bug
Building PyTorch with `USE_NCCL=0` fails because `nccl_dev_cap.hpp` includes NCCL headers even though NCCL is explicitly disabled.
I will raise a PR to fix this shortly.
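Below is a minimal sketch of one way to guard such a header. It assumes the build defines `USE_C10D_NCCL` only when NCCL is enabled. That macro name and the helper `nccl_compiled_in()` are assumptions for illustration and are not taken from the actual fix. Because the include is wrapped in a preprocessor check, a `USE_NCCL=0` build never reaches `<nccl.h>`:

```cpp
// Sketch: USE_C10D_NCCL is assumed to be defined by the build only when
// NCCL support is enabled. Adjust to the real macro if it differs.
#if defined(USE_C10D_NCCL)
#include <nccl.h>  // reached only in NCCL-enabled builds
#endif

// Hypothetical helper: reports at compile time whether NCCL code is built in.
constexpr bool nccl_compiled_in() {
#if defined(USE_C10D_NCCL)
  return true;
#else
  return false;  // NCCL disabled: nothing here depends on NCCL headers
#endif
}
```

When compiled without the macro, `nccl_compiled_in()` evaluates to `false` and no NCCL header is needed. Another option, also unverified here, is to leave the NCCL-only sources out of the CMake target when `USE_NCCL=0`.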
- Versions
Collecting environment information...
PyTorch version: 2.11.0a0+gitc22a1b4
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A
OS: Fedora Linux 42 (Cloud Edition) (x86_64)
GCC version: (GCC) 15.2.1 20250808 (Red Hat 15.2.1-1)
Clang version: 20.1.8 (Fedora 20.1.8-4.fc42)
CMake version: version 4.2.1
Libc version: glibc-2.41
Python version: 3.14.2 | packaged by Anaconda, Inc. | (main, Dec 19 2025, 11:49:32) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-6.16.7-200.fc42.x86_64-x86_64-with-glibc2.41
Is CUDA available: True
CUDA runtime version: 13.0.88
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA L4
Nvidia driver version: 580.82.09
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7R13 Processor
CPU family: 25
Model: 1
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 1
BogoMIPS: 5299.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save vaes vpclmulqdq rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 8 MiB (16 instances)
L3 cache: 64 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Old microcode: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Mitigation; Clear CPU buffers
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected
Versions of relevant libraries:
[pip3] intel-cmplr-lib-ur==2025.3.1
[pip3] intel-openmp==2025.3.1
[pip3] mkl-include==2025.3.0
[pip3] mkl-static==2025.3.0
[pip3] numpy==2.4.0
[pip3] onemkl-license==2025.3.0
[pip3] optree==0.18.0
[pip3] tbb==2022.3.0
[pip3] tbb-devel==2022.3.0
[pip3] tcmlib==1.4.1
[pip3] torch==2.11.0a0+gitc22a1b4
[pip3] umf==1.0.2
[conda] blas 1.0 mkl
[conda] intel-cmplr-lib-ur 2025.3.1 pypi_0 pypi
[conda] intel-openmp 2025.3.1 pypi_0 pypi
[conda] mkl 2025.0.0 hacee8c2_941
[conda] mkl-devel 2025.0.0 h3a03a7a_941
[conda] mkl-include 2025.3.0 pypi_0 pypi
[conda] mkl-static 2025.3.0 pypi_0 pypi
[conda] numpy 2.4.0 pypi_0 pypi
[conda] onemkl-license 2025.3.0 pypi_0 pypi
[conda] optree 0.18.0 pypi_0 pypi
[conda] tbb 2022.3.0 pypi_0 pypi
[conda] tbb-devel 2022.3.0 pypi_0 pypi
[conda] tcmlib 1.4.1 pypi_0 pypi
[conda] torch 2.11.0a0+gitc22a1b4 pypi_0 pypi
[conda] umf 1.0.2 pypi_0 pypi
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @malfet @seemethere
- clones
AIPCC-7409 Inconsistent Error Handling Between CPU and CUDA for `cosine_similarity` (Closed)

- is cloned by
AIPCC-8344 Negative values in stride causing error in `avg_pool2d` (on both CPU and CUDA) (New)