Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (#375)
Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit. GPUs under test: NVIDIA A100, NVIDIA A2, NVIDIA TitanV, NVIDIA GeForce 2080 Ti.
parent 6b69c79ac3
commit 5fe09c2d67
@@ -51,7 +51,7 @@ CUTLASS 2.8 is an update to CUTLASS adding:

 # Performance

-<p align="center"><img src=/media/images/cutlass-performance-plot.png></p>
+<p align="center"><img src=/media/images/cutlass-2.8-gemm-performance.png></p>

 CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
 they exhibit performance comparable to cuBLAS for scalar GEMM
Binary file not shown. Size after change: 124 KiB.