Commit Graph

  • d39c4cd606 Add scheduler mapping for raster order in the kernels. #1113 Aniket Shivam 2023-09-26 13:15:41 -0700
  • 18def5f048 Minor fix in gemm op profiler for raster order. Aniket Shivam 2023-09-26 11:55:15 -0700
  • 63404ec330 Updates for 3.2.1 release. Aniket Shivam 2023-09-26 10:11:14 -0700
  • 9d9f64dc91 Allow changing epsilon parameter in RMS norm kernel #1112 Masahiro Masuda 2023-09-27 04:25:48 +0900
  • b5fbe979d8 restore debug files and remove prints #1109 Manish Gupta 2023-09-26 00:22:38 +0000
  • c7ffde143a Debug and fix for parallel split-k in profiler Manish Gupta 2023-09-26 00:11:13 +0000
  • af0b442f17 fix error print dispatch #1104 reed-lau 2023-09-20 05:23:59 +0000
  • e0aaa3c3b3 fix GmmaDescriptor print format string error (#1102) reed 2023-09-20 11:27:58 +0800
  • 1beab64062 fix GmmaDescriptor print format string error #1102 reed-lau 2023-09-20 01:59:51 +0000
  • c005cb2eb9 suppress missing return warning for cute::get #1101 reed-lau 2023-09-19 12:06:42 +0000
  • e74fde651e fix infinite print call #1100 reed-lau 2023-09-19 11:54:25 +0000
  • 8783c41851 Replace 0x1f with 0xffffffff in __shfl_sync (#1097) Vadim Markovtsev 2023-09-19 01:58:19 +0200
  • eb01ba5500 Replace 0x1f with 0xffffffff in __shfl_sync #1097 Vadim Markovtsev 2023-09-15 16:00:05 +0200
  • 478de56119 Change the position of minus sign in line1549 array.h #1091 xuhaoran 2023-09-13 18:02:15 +0800
  • a6adbe2585 [fix] fix comparison operator for integer_subbyte #1090 zhengsize 2023-09-12 22:33:11 +0000
  • 6407bcdf0a fix matrix B indices (#1089) Yujia Zhai 2023-09-12 11:04:18 -0700
  • 58e4d11fc6 fix matrix B indices #1089 Yujia Zhai 2023-09-12 11:01:43 -0700
  • a77b2c9cb8 style(examples): typo (#1080) tpoisonooo 2023-09-11 22:13:22 +0800
  • e515aeba9b great-equal instead of equal to arch version #1082 Fabian Schuetze 2023-09-10 15:53:09 +0000
  • 45def052c3 Update ampere_gemm_operand_reduction_fusion.cu #1080 tpoisonooo 2023-09-09 21:51:59 +0800
  • 4c7624a92c Update ampere_tensorop_conv2dfprop.cu tpoisonooo 2023-09-09 20:40:19 +0800
  • 34bbadd3ff standarize fp8 generator (#1078) ANIKET SHIVAM 2023-09-07 11:36:33 -0700
  • 15a093eb17 standarize fp8 generator #1078 ANIKET SHIVAM 2023-09-07 11:30:06 -0700
  • 88c0d7c726 make only visible on device (#1071) Driss Guessous 2023-09-07 13:00:46 -0400
  • e01b9b5029 Shard gemm reference templates into multiple TUs for parallel compilation (#1043) Vijay Thakkar 2023-08-30 16:46:30 -0400
  • 81c717ce22 remove some redundant kernels #1043 Vijay Thakkar 2023-08-30 13:34:29 -0700
  • f71f653fcf set kIsHeavy=false for HardSwish Fabian Schuetze 2023-08-30 10:43:38 +0000
  • 34fd98056b fix cinttypes issue with STDC_FORMAT_MACROS (#1068) Aman Gupta Karmani 2023-08-29 14:59:33 -0400
  • 779f44a3f2 Update mma_sm90_desc.hpp #1068 Haicheng Wu 2023-08-29 14:59:16 -0400
  • 54957dd861 Update mma_sm90_desc.hpp Haicheng Wu 2023-08-29 14:58:44 -0400
  • 67483b720a Use relative import in files under tools/library/scripts/. #1072 Ying Zhang 2023-08-28 21:27:03 -0700
  • 70e2125b12 make only visible on device #1071 drisspg 2023-08-28 09:59:51 -0700
  • 52aac34118 fix cinttypes issue with STDC_FORMAT_MACROS Aman Karmani 2023-08-28 00:47:06 +0000
  • 3a8f57a3c8 Add simple hash and eq methods for gemm_operations. (#1053) v3.2.0 #1073 Ying Zhang 2023-08-27 17:41:57 -0700
  • 6673df0e48 fix typos (#1059) reed 2023-08-27 12:49:26 +0800
  • 7618e9bfd8 Fix numeric conversion warning (#1021) Lufang Chen 2023-08-27 12:42:44 +0800
  • a88c41cf8d Updates for 3.2 release (#1065) ANIKET SHIVAM 2023-08-25 17:05:46 -1000
  • 5f18eb0426 Updates for 3.2 release #1065 Aniket Shivam 2023-08-25 10:26:24 -0700
  • 24cbc67e4f fix typos #1059 reed-lau 2023-08-24 04:50:37 +0000
  • e49a3d19ea Add simple hash and eq methods for gemm_operations. #1053 Ying Zhang 2023-08-17 17:44:06 -0700
  • 27de343535 Add one Publication which is inspired by cutlass (#1022) reed 2023-08-22 22:00:17 +0800
  • 2a9fa23e06 Avoid cute::print compiler warnings with -Wformat-security (#1041) Allard Hendriksen 2023-08-18 20:38:27 +0200
  • 2e56cfabee fix typo (#1047) zhu jianjiang 2023-08-19 02:08:26 +0800
  • 3930f709ce Fix typo in `0x_gemm_tutorial.md` (#1035) lorenzo chelini 2023-08-17 16:52:20 +0200
  • 902f182f36 remove auto fp8 kernels Vijay Thakkar 2023-08-16 16:09:30 -0700
  • 1c893975b2 remove 3 new added refcheck kernels and some un-necessary fp8 library instances to reduce lib size Vijay Thakkar 2023-08-16 14:37:49 -0700
  • 7e5ee8b7bf [doc] fix: fix typos in the comment (#1049) Haibin Lin 2023-08-16 08:39:25 -0700
  • 2d9a557427 torch.bfloat16 support in cutlass python (#1037) Sophia Wisdom 2023-08-16 08:38:53 -0700
  • 8ec15cf6e1 [doc] fix: fix typos in the comment #1049 Haibin Lin 2023-08-14 15:04:50 -0700
  • 47343b5fc9 better balancing of ref kernels across TUs Vijay Thakkar 2023-08-14 11:37:56 -0700
  • 9d30c10266 fix typo #1047 zjjott 2023-08-14 15:01:56 +0800
  • bb996a1e55 remove old files Vijay Thakkar 2023-08-12 09:45:59 -0700
  • fb572ddda3 Split apart gemm reference templates into multiple TUs for parallel compilation Vijay Thakkar 2023-08-12 09:37:00 -0700
  • 9a1dbd9bd6 Update datatypes.py #1037 Sophia Wisdom 2023-08-11 16:29:51 -0700
  • 0d2cce0aa4 Avoid cute::print compiler warnings with -Wformat-security #1041 Allard Hendriksen 2023-08-11 16:16:00 +0200
  • fa25c18a43 torch.bfloat16 support in cutlass python Sophia Wisdom 2023-08-10 16:06:08 -0700
  • 199393a935 Fix typo in `0x_gemm_tutorial.md` #1035 Lorenzo Chelini 2023-08-09 18:42:24 +0200
  • 4575443d44 CUTLASS 3.2 (#1024) ANIKET SHIVAM 2023-08-07 14:50:32 -1000
  • a0ad21b3b7 minor comment fix #1024 Aniket Shivam 2023-08-03 21:13:07 -0700
  • 3e22deeae4 CUTLASS 3.2 Aniket Shivam 2023-08-03 20:02:17 -0700
  • 4c0c7414f5 update #1021 Lufang CHEN 陈橹方 2023-08-02 15:26:07 +0000
  • d226e24104 fix numeric conversion unused var Lufang CHEN 陈橹方 2023-08-02 09:26:10 +0000
  • bedd7b61ab Add one Publication which is inspired by cutlass #1022 reed-lau 2023-08-02 09:21:52 +0000
  • a0d787b746 Fix one publication (#1019) Xianyao Zhang 2023-07-28 17:40:17 +0200
  • 9e12eb32f8 Fix one publication #1019 Xianyao Zhang 2023-07-28 13:42:16 +0200
  • 190af21f75 correct kIsHeavy value for Tanh Fabian Schuetze 2023-07-23 11:07:28 +0000
  • aa4db41329 set kIsHeavy member variables Fabian Schuetze 2023-07-23 11:01:18 +0000
  • d20f3a9542 spelling (#1007) Sophia Wisdom 2023-07-20 11:41:11 -0700
  • 6bbc2b0e80 spelling #1007 Sophia Wisdom 2023-07-20 11:23:29 -0700
  • 8e85580859 fix layout bug (#1006) Tianqi Zhang (张天启) 2023-07-20 02:26:01 +0800
  • 099c4f064c fix layout bug #1006 Tianqi Zhang (张天启) 2023-07-19 15:44:29 +0800
  • 146d314057 Update fMHA kernels (#992) dan_the_3rd 2023-07-13 04:30:46 +0200
  • 3e535f4065 make var work #992 Haicheng Wu 2023-07-12 18:07:05 -0700
  • f71b7fa695 update warpgroup_wait static_assert message #996 Matthias Jouanneaux 2023-07-11 10:12:24 -0700
  • d5d56b7ac4 minor changes Haicheng Wu 2023-07-10 19:57:26 -0700
  • f679663224 Add RMS norm (#979) masahi 2023-07-11 10:31:27 +0900
  • e066ced33b fix epilogue iterator error (#995) ChangyouSiom 2023-07-11 09:30:31 +0800
  • be749b382d Merge branch 'main' of github.com:ChangyouSiom/cutlass into main #995 maxiao 2023-07-10 22:50:33 +0800
  • a263d18e98 fix epilogue iterator error maxiao 2023-07-10 22:47:48 +0800
  • 26a3a9aa32 fix epilogue iterator error #993 maxiao 2023-07-09 11:33:08 +0800
  • dbbf199071 Update fMHA kernels danthe3rd 2023-07-07 12:38:59 +0000
  • 9b923dd4c4 fix minor typos (#984) Nathan Wang 2023-07-05 06:23:01 -0700
  • dce8c1fe72 fix minor typos #984 Nathan Wang 2023-06-30 12:01:48 -0700
  • 6a5d84b9c6 Add RMS norm #979 Masahiro Masuda 2023-06-22 16:23:24 +0900
  • f6d42f2dd0 add library_dirs (#977) q.yao 2023-06-15 00:09:12 +0800
  • 6ed4fbe4d2 add library_dirs #977 grimoire 2023-06-14 17:39:45 +0800
  • 473a67073e Fix Int8 and TF32 generator (#976) ANIKET SHIVAM 2023-06-12 09:32:52 -0700
  • e648d3fcbc Fix Int8 and TF32 generator #976 Aniket Shivam 2023-06-09 14:57:58 -0700
  • 87349d3496 Add grouped b2b GEMM (#970) Jack Kosaian 2023-06-05 17:16:57 -0400
  • 8125849501 Add grouped b2b GEMM #970 Jack Kosaian 2023-06-02 12:06:14 -0700
  • fde824af21 Update Hopper performance plot for CUTLASS 3.1 + CTK 12.1 (#967) Vijay Thakkar 2023-06-01 11:52:40 -0700
  • 91241aaca1 Update Hopper performance plot for CUTLASS 3.1 + CTK 12.1 #967 Vijay Thakkar 2023-06-01 08:13:02 -0700
  • 7dbf423763 Add conversion from ElementBias to ElementCompute (#961) Jack Kosaian 2023-05-26 23:08:36 -0400
  • 077c6824e5 Add conversion from ElementBias to ElementCompute #961 Jack Kosaian 2023-05-26 19:13:39 -0700
  • 6f47420213 Update README.md v3.1.0 Haicheng Wu 2023-05-24 12:40:31 -0400
  • 4638250469 Update CHANGELOG.md Haicheng Wu 2023-05-24 12:39:42 -0400
  • 7859fe322a Update PUBLICATIONS.md Haicheng Wu 2023-05-24 12:36:12 -0400
  • d3e72719b4 Add support for sparse GEMM with row broadcasted bias vector (#951) Aleksandar Samardžić 2023-05-24 16:25:05 +0200
  • b4ab501767 Adds CUDA path for x86-64 (#957) Ali Hassani 2023-05-24 07:21:25 -0700
  • f079619f5e More updates for 3.1 (#958) ANIKET SHIVAM 2023-05-24 07:17:16 -0700