
500 Commits

Author SHA1 Message Date
Nick John Eliopoulos 637b159063
Fix C++17 version detection in helper_macros.hpp (#1479)
* On MSVC, __cplusplus can be inconsistent with _MSVC_LANG when detecting the C++17 language version. See https://github.com/NVIDIA/cutlass/issues/1474. Added a switch to check _MSVC_LANG in addition to __cplusplus.

* Fixed typo.

* Oops, another typo.

* Changed incorrect logic, ifndef to ifdef

* Define CUTLASS_CPLUSPLUS for language version testing

---------

Co-authored-by: Mark Hoemmen <mhoemmen@users.noreply.github.com>
2024-05-28 11:00:51 -04:00
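The version-detection switch described in this commit can be sketched as below. The macro names `EXAMPLE_CPLUSPLUS` and `EXAMPLE_HAS_CXX17` are hypothetical stand-ins, not the actual contents of `helper_macros.hpp`; the sketch relies only on the documented MSVC behavior that `__cplusplus` stays at `199711L` unless `/Zc:__cplusplus` is passed, while `_MSVC_LANG` tracks the `/std:` mode.

```cpp
// Hypothetical macro names: a sketch of the _MSVC_LANG / __cplusplus switch,
// not a copy of CUTLASS's helper_macros.hpp.
#if defined(_MSVC_LANG)
  // MSVC leaves __cplusplus at 199711L unless /Zc:__cplusplus is given,
  // but _MSVC_LANG always reflects the selected /std: language mode.
  #define EXAMPLE_CPLUSPLUS _MSVC_LANG
#else
  #define EXAMPLE_CPLUSPLUS __cplusplus
#endif

// Feature test built on the unified macro instead of raw __cplusplus.
#if EXAMPLE_CPLUSPLUS >= 201703L
  #define EXAMPLE_HAS_CXX17 1
#else
  #define EXAMPLE_HAS_CXX17 0
#endif

// Exposes the detected language version to ordinary C++ code.
constexpr long example_language_version() { return EXAMPLE_CPLUSPLUS; }
```

Routing every version check through one macro means the MSVC special case lives in a single place instead of being repeated at each `#if`.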
Manish Gupta 033d9efd2d
[Documentation] Fixes the confusion between concatenated vs. composed layout in CuTe documentation (#1498)
* Update 02_layout_algebra.md

* Update 02_layout_algebra.md
2024-05-02 15:35:12 -04:00
Sin acc3ee18a1
Fix typos in cute docs (#1486)
* fix typos in 02_layout_algebra.md

* fix typos in 03_tensor.md
2024-05-02 15:34:36 -04:00
djns99 5c447dd84f
Update packed_stride.hpp to add CUTLASS_HOST_DEVICE decorator to new functions (#1495) 2024-04-19 12:07:57 -04:00
Vijay Thakkar 7d49e6c7e2
Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
Mehdi Yazdani a40e08e9d5
Update 02_layout_algebra.md (#1451)
Change line 348 to reflect the correct layout.
2024-04-10 10:57:57 -04:00
lzw 8e7d9f483d
add missing header for size_t in `numeric_types.h` (#1420)
* add missing header for size_t in `numeric_types.h`

* make nvrtc happy

* add missing header for int types in `cutlass/arch/memory.h`

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-04-09 14:15:48 -04:00
reed 19f3cc33f1
Fix uint128 operator add (#1400)
* fix uint128 operator add for 64-bit hilo implementation

* add uint128 test for operator add

* make clang happy

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-04-02 13:32:18 -04:00
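For a 128-bit integer stored as two 64-bit halves, addition has to carry out of the low half into the high half. A minimal sketch of that rule, using a hypothetical `U128` struct rather than CUTLASS's own `uint128_t`, and not a reproduction of the patched operator:

```cpp
#include <cstdint>

// Hypothetical 128-bit unsigned integer made of two 64-bit halves
// (an illustration, not cutlass::uint128_t).
struct U128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Adds two 128-bit values. The low halves are added modulo 2^64; the sum
// wrapped around exactly when it is smaller than one of the operands, and
// in that case a carry of 1 is added to the high half.
constexpr U128 add(U128 a, U128 b) {
  return U128{a.lo + b.lo, a.hi + b.hi + ((a.lo + b.lo) < a.lo ? 1u : 0u)};
}
```

For example, adding 1 to a value whose low half is all ones must yield a low half of 0 and increment the high half; dropping the carry term would silently lose 2^64.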
jeromeku f9ece1b42c
Python `Gemm` `tile_descriptions` fix (#1439)
* fix python gemm tile descriptions

* fix formatting

* fix math_operation filtering

* fix formatting
2024-03-30 09:00:46 -04:00
reed 28cbacbf64
fix stride compilation warning (#1415) 2024-03-29 23:50:33 -04:00
Tom Tan 8f7d2789b8
[NFC] improve doc: fix typo in mma doc (#1417) 2024-03-27 14:07:20 -04:00
seventh c4e3e122e2
group gemm set stride L = cute::Int<0> (#1416) 2024-03-20 17:31:14 -04:00
Vijay Thakkar 629f4653c3
CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
lorenzo chelini ffa34e7075
(NFC) improve doc: Add missing verb to sentence (#1377)
Co-authored-by: lorenzo chelini <lchelini@nvidia.com>
2024-03-04 15:30:10 -05:00
LiYu Lu a8f2c80db0
fix `tile_size(TiledCopy<Args...> const&)` error (#1357) 2024-02-24 00:33:01 -05:00
ANIKET SHIVAM bbe579a9e3
Updates for CUTLASS 3.4.1 (#1346)
* Updates for CUTLASS 3.4.1

* minor epi change
2024-02-15 15:48:34 -05:00
Driss Guessous 47a3ebbea9
Add a missing platform include (#1328) 2024-02-03 01:30:32 -05:00
Chenggang Zhao 57e01e1a6b
Fix missing include file (#1318) 2024-02-03 01:29:32 -05:00
xws117 6e3df975a2
Modify comments in code examples/08_turing_tensorop_gemm/turing_tensorop_gemm.cu (#1325) 2024-01-31 21:41:30 -05:00
reed 8825fbf1ef
fix unrecognized print format specifier for int8/uint8 (#1303)
* fix unrecognized print format specifier for int8/uint8

* use C++ static_cast instead of a C-style cast
2024-01-29 21:22:40 -05:00
reed 092f14db05
fix tile_size_mnk compilation warning (#1294) 2024-01-29 21:21:15 -05:00
Haicheng Wu 9385141f19
Update PUBLICATIONS.md
Add PTQ paper from Google
2024-01-19 14:17:55 -05:00
Haicheng Wu b4b5b11070
Update PUBLICATIONS.md
Add Odyssey LLM paper from Meituan
2024-01-18 10:30:21 -05:00
jayhshah 139b93db61
update publications (#1308) 2024-01-17 14:06:46 -05:00
Aleksandar Samardžić ca37d632c9
Remove sparse GEMM with row broadcasted bias vector (#1302)
This reverts commit d3e72719b4.

Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>
2024-01-17 14:06:27 -05:00
Chengquan Jiang 362abbf274
Support ElementD to be void for tma (#1153)
* Support void D with AuxStore

* refine get_element_aux
2024-01-16 18:15:42 -05:00
ANIKET SHIVAM 751eb9a885
Update license year (#1306) 2024-01-16 14:37:22 -05:00
ANIKET SHIVAM 2f589ffa76
Updates for 3.4 release. (#1305) 2024-01-16 13:42:51 -05:00
Tianao Ge acba5beee5
Fix FLOPs calculation and tensor B stride calculation in example 36 (#1278)
* Fix FLOPs calculation and tensor B stride calculation in example 36

* Fix datatype

* Update gather_scatter_fusion.cu
2024-01-08 17:27:30 -05:00
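For reference, the usual convention for counting GEMM work is 2·M·N·K floating-point operations: each of the M·N outputs takes K multiply-adds, and each multiply-add counts as two operations. A small sketch of that count follows; the function names are hypothetical and this is not the code from example 36.

```cpp
#include <cstdint>

// Conventional FLOP count for C = A * B with A: M x K, B: K x N.
// Each of the M*N outputs needs K multiply-adds, counted as 2 FLOPs each.
// 64-bit arithmetic avoids overflow for large problem sizes.
constexpr std::int64_t gemm_flops(std::int64_t m, std::int64_t n, std::int64_t k) {
  return 2 * m * n * k;
}

// Throughput in GFLOP/s given a runtime in milliseconds:
// flops / (ms * 1e-3 s) / 1e9 = flops / (ms * 1e6).
constexpr double gflops_per_second(std::int64_t flops, double runtime_ms) {
  return static_cast<double>(flops) / (runtime_ms * 1.0e6);
}
```

Using 32-bit `int` here would overflow already at M = N = K = 1024 (2^31 FLOPs), which is one common way such calculations go wrong.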
Eugene Zhulenev 74d1f3e63a
Fix cute::array<T, 0> iterator (#1273) 2024-01-08 17:10:09 -05:00
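The invariant at stake for a zero-length array is that `begin() == end()`, so a range-based loop runs zero times. A minimal sketch with a hypothetical `Array` template (not `cute::array`); since standard C++ forbids `T data[0]`, the `N == 0` case is written as a separate specialization with no storage.

```cpp
#include <cstddef>

// Hypothetical fixed-size array, for illustration only.
template <class T, std::size_t N>
struct Array {
  T data_[N];
  T* begin() { return data_; }
  T* end() { return data_ + N; }
  std::size_t size() const { return N; }
};

// Zero-length specialization: no storage, and begin() and end() return the
// same pointer, so iterating over an empty Array performs no steps.
template <class T>
struct Array<T, 0> {
  T* begin() { return nullptr; }
  T* end() { return nullptr; }
  std::size_t size() const { return 0; }
};
```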
Kun Wu 8ac2edc810
expose stream API in python kernel call interfaces (#1287)
* expose stream API in python kernel call interfaces

* add stream to ReductionArguments; document stream arg

* add stream argument to GemmGroupedArguments
2024-01-05 08:27:45 -05:00
Ali Hassani d4be5ab5d7
Allow per-column bias in EpilogueTensorBroadcast (#1275)
* Allow per-column bias in EpilogueTensorBroadcast

EpilogueTensorBroadcast only supports per-row vector broadcast, because
the bias stride is hardcoded.

It can easily support both if the bias stride is made conditional, and
the original behavior is maintained by defaulting to per-row.

* Add unit test for EpilogueTensorBroadcast with per-col bias

---------

Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
Co-authored-by: Ali Hassani <ali@hippoml.com>
2024-01-04 12:48:31 -05:00
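The idea of making the bias stride conditional, with per-row as the default, can be sketched on the host as follows. This is a hypothetical illustration of the indexing, not `EpilogueTensorBroadcast` itself; the template parameter `PerColumnBias` is an assumed name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not EpilogueTensorBroadcast itself: add a bias vector
// to a row-major M x N output D. The bias is read through a (row, column)
// stride pair, so one code path serves both layouts:
//   per-row bias:    M entries, strides (1, 0)
//   per-column bias: N entries, strides (0, 1)
template <bool PerColumnBias = false>  // default preserves per-row behavior
void add_bias(std::vector<float>& d, std::size_t m_size, std::size_t n_size,
              const std::vector<float>& bias) {
  constexpr std::size_t bias_stride_row = PerColumnBias ? 0 : 1;
  constexpr std::size_t bias_stride_col = PerColumnBias ? 1 : 0;
  for (std::size_t m = 0; m < m_size; ++m) {
    for (std::size_t n = 0; n < n_size; ++n) {
      d[m * n_size + n] += bias[m * bias_stride_row + n * bias_stride_col];
    }
  }
}
```

Because only the strides change, the loop body is identical for both cases, which is why defaulting the flag keeps existing per-row callers unaffected.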
Jee Li c9591a694d
fix typo (#1279) 2024-01-04 12:41:39 -05:00
Aleksandar Samardžić 5c756eb774
Add support for sparse GEMM with visitor epilogue (#1189)
* Add support for sparse GEMM with visitor epilogue

* Refactor changes at the kernel level
2024-01-04 12:38:11 -05:00
Pradeep Ramani 8236f30675
CUTLASS 3.4.0 (#1286)
* CUTLASS 3.4.0

* Update CHANGELOG.md

---------

Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00
Christian Sigg b7508e3379
Fix inline ptx escaping for predicates. (#1264)
* Fix inline ptx escaping for predicates.

Prevents `error: invalid % escape in inline assembly string` when compiling with clang.

* More double-quoting.
2023-12-14 11:16:15 -05:00
Gregory Meyer (gregjm) f60786b536
Remove undefined behavior from default constructor of PredicatedTileAccessIteratorParams. (#1258)
Currently, the default constructor of
`PredicatedTileAccessIteratorParams` will invoke undefined behavior in
its invocation of the `initialize` function. Specifically, it will
attempt to read from the uninitialized variables
`desc.element_size_bits` and `desc.advance_rank`. This commit changes
the default constructors of both `*Params` and `*Desc` to
zero-initialize all uninitialized members.
2023-12-11 23:01:53 -05:00
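One way to express the zero-initialization this commit describes is with default member initializers, so that a default-constructed descriptor can be read safely by `initialize`. The types below are hypothetical, simplified stand-ins for the `*Desc` / `*Params` pair; the field names follow the commit message, while the arithmetic in `initialize` is illustrative only.

```cpp
// Hypothetical stand-in for a *Desc type.
struct IteratorDesc {
  // Default member initializers: a default-constructed desc has defined
  // values, so reading them is no longer undefined behavior.
  int element_size_bits = 0;
  int advance_rank = 0;
};

// Hypothetical stand-in for a *Params type.
struct IteratorParams {
  long long inc_strided = 0;
  int advance_rank = 0;

  // The default constructor goes through the same initialize() path as the
  // general constructor, using a default-constructed (zeroed) desc.
  IteratorParams() { initialize(0, IteratorDesc{}); }
  IteratorParams(long long stride, IteratorDesc desc) { initialize(stride, desc); }

  // Reads desc's members; well-defined only because IteratorDesc
  // initializes them. Illustrative arithmetic: stride in elements -> bytes.
  void initialize(long long stride, IteratorDesc desc) {
    inc_strided = stride * desc.element_size_bits / 8;
    advance_rank = desc.advance_rank;
  }
};
```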
Andrey Portnoy 30ec1a4649
Use size_t index to iterate up to std::vector::size() (#1251)
Fixes a different signedness compare warning.
2023-12-09 08:44:31 -05:00
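The pattern this commit applies can be sketched as follows; the function `sum_elements` is a hypothetical example, not code from the repository. An `int` index compared against `std::vector::size()` mixes signed and unsigned operands, while a `std::size_t` index keeps both sides unsigned.

```cpp
#include <cstddef>
#include <vector>

// Iterating with an int index, as in `for (int i = 0; i < v.size(); ++i)`,
// compares a signed int with the unsigned size_type returned by size(),
// which -Wsign-compare reports. Using std::size_t keeps both sides unsigned.
long long sum_elements(const std::vector<int>& v) {
  long long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    total += v[i];
  }
  return total;
}
```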
Christian Sigg e1483d5fa0
Collection of changes to fix clang build. (#1200)
* Remove unused variables

* Qualify calls to make_fragment_? from templated base class.

Fixes clang build error.

* Add missing `#include <cstdio>`

* Various changes to fix clang compile errors.

* More changes to fix clang build.

Remaining issues:

- `params` initializer of `CollectiveEpilogue`.
- `ops` initializer of `Sm90VisitorImplBase`.
- `__usAtomicCAS` needs to be added to clang upstream.

* Fix remaining clang build issues.

* Qualify `cute::rank()` calls.

* Qualify some more calls that are otherwise ambiguous between `cute` and `std` namespace.

* Double-escape special registers in inline asm.

* small change

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-12-08 14:42:12 -05:00
Ali Hassani f4a0216601
Fix bug in single source GEMM with residual + streamk (#1249)
Followup to #1224.

A change to the stream-K threadblock swizzle constructor since 3.3 breaks
single-source GEMM with a fused epilogue and stream-K. Multi-source was
already corrected.

Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
2023-12-07 11:12:02 -05:00
Valeriy Fedyunin f188f9b709
Fix typo in quickstart.md (#1257) 2023-12-07 09:49:52 -05:00
Haicheng Wu 9c9b51d35c
Update PUBLICATIONS.md 2023-12-07 00:02:36 -05:00
Ali Hassani a75b4ac483
Fix Stream-K reduce bug in epilogue with broadcast (#1224)
Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
2023-12-05 15:35:41 -05:00
Pradeep Ramani e9e30c2304
Updates and Bug fixes to CUTLASS 3.3 (#1232) 2023-12-05 09:50:49 -05:00
Haicheng Wu 4a1709e17e
Fixed illegal PTX syntax (#1225) 2023-12-01 12:29:48 -05:00
Christian Sigg bef1fbcbe6
Add missing `#include <cstdio>` (#1197)
* Add missing `#include <cstdio>`

* move to non nvrtc part

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-12-01 11:58:53 -05:00
Christian Sigg 2375a07d01
Qualify calls to make_fragment_? from templated base class. (#1196)
Fixes clang build error.
2023-12-01 09:52:57 -05:00
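The lookup rule behind this fix can be sketched with a hypothetical base/derived pair; `FragmentBase`, `make_fragment`, and `fragment_bytes` are illustrative stand-ins, not CUTLASS names. Names declared in a base class that depends on a template parameter are not found by unqualified lookup in the derived template, so conforming compilers such as clang reject the bare call.

```cpp
// Hypothetical templated base class providing a helper.
template <class T>
struct FragmentBase {
  static constexpr int make_fragment(int n) { return n * static_cast<int>(sizeof(T)); }
};

template <class T>
struct Derived : FragmentBase<T> {
  constexpr int fragment_bytes(int n) const {
    // An unqualified make_fragment(n) is not found here: during template
    // definition, names are not looked up in a base class that depends on T.
    // Qualifying the call (or writing this->make_fragment(n)) fixes that.
    return FragmentBase<T>::make_fragment(n);
  }
};
```

Compilers running in permissive modes may accept the unqualified form, which is why such code can build on one toolchain and fail under clang.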
Christian Sigg 60c8251b72
Remove unused variables (#1195) 2023-12-01 09:52:19 -05:00
cyyever 10b850f9c7
Fix some sign conversion warnings (#1172)
* Fix sign conversion warnings

* Fix type conversion warnings

* Fix sign conversion warnings

* Change smem_size_ to constexpr

* clang warnings

* undo cast change

* one miss change

* missing part

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-11-30 00:28:40 -05:00
Christian Sigg 99c4eebe3b
Explicitly cast `blockIdx` to `uint3` (#1192)
This works around a clang issue: under clang, blockIdx is not of type uint3.
2023-11-30 00:26:23 -05:00