Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (#375)

Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.

GPUs under test:

    NVIDIA A100
    NVIDIA A2
    NVIDIA TitanV
    NVIDIA GeForce 2080 Ti
Andrew Kerr 2021-12-06 14:21:33 -05:00 committed by GitHub
parent 6b69c79ac3
commit 5fe09c2d67
2 changed files with 1 addition and 1 deletion

@@ -51,7 +51,7 @@ CUTLASS 2.8 is an update to CUTLASS adding:
 # Performance
-<p align="center"><img src=/media/images/cutlass-performance-plot.png></p>
+<p align="center"><img src=/media/images/cutlass-2.8-gemm-performance.png></p>
 CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
 they exhibit performance comparable to cuBLAS for scalar GEMM

Binary file not shown. Size after change: 124 KiB