- Type: Spike
- Resolution: Done
- Priority: Normal
- Sprint: AIPCC Accelerators 16
- Test Imports
Use the upstream test folder (https://github.com/pytorch/pytorch/tree/main/test) to extract the packages imported, and record them in the import file torch_imports_only.txt. Run test probes against the imports listed in torch_imports_only.txt (see the import-probe sketch after the notes below).
- Benchmarking - Compare our torch wheels with upstream
- Compare and contrast the build and install dependencies of our wheel with upstream, and flag differences for possible mismatches (see the dependency-comparison sketch after the notes below)
NOTE: These tests must first be run with upstream torch to establish a performance baseline and to validate the tests against upstream.
Please verify that the tests run on ACCELERATORS and not on the CPU.
Please check the torch_imports_only.txt file for the functions to be tested.
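A minimal import-probe sketch (assuming torch_imports_only.txt lists one module name per line; the script name and the accelerator check are illustrative, not agreed tooling):

```python
# probe_imports.py -- hypothetical sketch, not the agreed tooling.
# Assumes torch_imports_only.txt lists one importable module name per line.
import importlib
import sys

import torch

# Per the note above: make sure the run targets an accelerator, not the CPU.
if not torch.cuda.is_available():
    sys.exit("No accelerator visible to torch; refusing to run a CPU-only probe.")

failures = []
with open("torch_imports_only.txt") as fh:
    for line in fh:
        module = line.strip()
        if not module or module.startswith("#"):
            continue
        try:
            importlib.import_module(module)
        except Exception as exc:  # record any import-time failure
            failures.append((module, repr(exc)))

print(f"{len(failures)} import failure(s)")
for module, err in failures:
    print(f"  {module}: {err}")
```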
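One possible way to compare install dependencies, as a sketch only: diff the Requires-Dist metadata of the upstream wheel against ours (the wheel filenames below are placeholders):

```python
# compare_wheel_deps.py -- hypothetical sketch; wheel paths are placeholders.
import zipfile
from email.parser import Parser

def requires_dist(wheel_path):
    """Return the Requires-Dist entries declared in a wheel's METADATA."""
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_name = next(n for n in wheel.namelist()
                             if n.endswith(".dist-info/METADATA"))
        metadata = Parser().parsestr(wheel.read(metadata_name).decode("utf-8"))
    return set(metadata.get_all("Requires-Dist") or [])

upstream = requires_dist("torch-upstream.whl")  # placeholder path
ours = requires_dist("torch-ours.whl")          # placeholder path

print("Only in upstream wheel:", sorted(upstream - ours))
print("Only in our wheel:     ", sorted(ours - upstream))
```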
Category | Script / Command | Purpose | Main Arguments / Options | Notes / Reference |
---|---|---|---|---|
Test Imports | upstream test folder | Extract imported packages | N/A | Upstream tests |
Data Benchmarks | python samplers_benchmark.py | Benchmark data samplers | N/A | Data README |
Override Benchmarks | python bench.py | Run override benchmarks | N/A | Override README |
Distributed / DDP | torchrun --nproc_per_node=1 benchmark.py | Distributed benchmark (ResNet50) | --model resnet50 --world-size 1 --distributed-backend nccl --master-addr localhost --master-port 12355 | Single-node example |
Distributed / DDP | python benchmark.py | Alternative run | --world-size 1 --master-addr localhost --master-port 12355 | Single-node example |
Framework Overhead | python3 framework_overhead_benchmark.py | Benchmark framework overhead | --op add_op --num-warmup-iters 10 --num-iters 100 --use-throughput-benchmark --save | Framework Overhead README |
Fuser | python3 run_benchmarks.py | Benchmark fused operators | --operators add,sub,mul,div,... --shapes scalar,small,small_2d,... | Fuser README |
GPT Fast | python benchmark.py | Benchmark GPT models | N/A | GPT Fast README |
Inductor Backends | N/A | Benchmark inductor backends | N/A | Inductor Backends |
Inference | ./runner.sh <EXP_NAME> (e.g., exp1) | Benchmark inference performance | N/A | Can be time-consuming |
Instruction Counts | python main.py | Benchmark instruction counts | N/A | Can be time-consuming |
Nested | python nested_bmm_bench.py | Nested batch matrix multiplication benchmarks | N/A | Nested README |
Profiler | python resnet_memory_profiler.py | Profile ResNet memory usage | N/A | Modify script to use CUDA |
Profiler | python3 profiler_bench.py | Profile GPU performance | --with-cuda --use-kineto --profiling-tensor-size 1024 --internal-iter 256 --with-stack --use-script | |
Serialization | .py files | Test serialization | N/A | Just run the files |
Sparse (DLMC) | python3 -m dlmc.matmul_bench | Sparse matrix benchmarks | --path <dataset_path> --dataset magnitude_pruning --operation sparse@dense --with-cuda | Requires dataset |
Transformer | attention_bias_benchmarks.py | Benchmark attention bias | N/A | |
Transformer | better_transformer_vs_mha_functional.py | Compare BetterTransformer vs MHA | N/A | |
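To keep the upstream-first ordering required by the note above, the commands in the table could be driven from a small wrapper; this is a sketch only, and the interpreter paths, working directories, and command subset are assumptions:

```python
# run_benchmarks_both.py -- hypothetical sketch; interpreter paths, working
# directories, and the command subset are assumptions, not agreed tooling.
import subprocess
from pathlib import Path

ENVS = {
    "upstream": "/opt/venvs/torch-upstream/bin/python",  # placeholder paths
    "ours": "/opt/venvs/torch-ours/bin/python",
}

# A small subset of the table above: (working directory, command arguments).
COMMANDS = [
    ("benchmarks/data", ["samplers_benchmark.py"]),
    ("benchmarks/framework_overhead_benchmark",
     ["framework_overhead_benchmark.py", "--op", "add_op",
      "--num-warmup-iters", "10", "--num-iters", "100"]),
]

out_dir = Path("bench_results").resolve()
out_dir.mkdir(exist_ok=True)

# Run upstream first, then our wheel, saving output for a side-by-side diff.
for env_name in ("upstream", "ours"):
    python = ENVS[env_name]
    for cwd, cmd in COMMANDS:
        result = subprocess.run([python, *cmd], cwd=cwd,
                                capture_output=True, text=True)
        log = out_dir / f"{env_name}__{Path(cmd[0]).stem}.log"
        log.write_text(result.stdout + result.stderr)
        print(f"{env_name}: {' '.join(cmd)} -> exit {result.returncode}")
```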
- is related to: AIPCC-3785 build torch-2.8 (In Progress)