{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5d24a692",
   "metadata": {},
   "source": [
    "# Example of using elementwise activation functions in the CUTLASS Python interface\n",
    "This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs with different epilogues.\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/cutlass/blob/main/examples/python/01_epilogue.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28c916da",
   "metadata": {},
   "source": [
    "## Prerequisites for running on Colab\n",
    "This notebook requires an NVIDIA GPU. If `nvidia-smi` fails, go to Runtime -> Change runtime type -> Hardware accelerator and confirm that a GPU is selected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fcea8ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ec60b57",
   "metadata": {},
   "source": [
    "If running on Colab, you will need to install the CUTLASS Python interface. To do so, uncomment the following line and run the cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1db9e51c",
   "metadata": {},
   "outputs": [],
   "source": [
    "!#pip install nvidia-cutlass"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "962324fd",
   "metadata": {},
   "source": [
    "## General setup\n",
    "We first import the packages needed for the example and construct the input and output tensors that will be used throughout."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63a70a3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "import cutlass\n",
    "\n",
    "# This controls whether the C++ GEMM declaration will be printed at each step. Set to `False` to\n",
    "# omit this information.\n",
    "print_module = True\n",
    "\n",
    "m = 256\n",
    "n = m\n",
    "k = m\n",
    "\n",
    "type_A = np.float16\n",
    "type_B = np.float16\n",
    "type_C = np.float16\n",
    "type_D = np.float16\n",
    "\n",
    "np.random.seed(1234)\n",
    "scope_min = -4\n",
    "scope_max = 4\n",
    "tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n",
    "tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n",
    "tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n",
    "\n",
    "alpha = np.float16(1.)\n",
    "beta = np.float16(0.)\n",
    "\n",
    "tensor_D = np.zeros(tensor_C.shape).astype(type_D)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1eb0d95b",
   "metadata": {},
   "source": [
    "## Run a GEMM with an identity activation function\n",
    "To begin, we simply run a default GEMM with an identity activation function. This performs the well-known operation `D = alpha * (A @ B) + beta * C`. The identity activation is the default, and does not need to be specified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d257833",
   "metadata": {},
   "outputs": [],
   "source": [
    "plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)\n",
    "plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
   ]
},
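  {
   "cell_type": "markdown",
   "id": "a1f0c3d2",
   "metadata": {},
   "source": [
    "As a quick sanity check, we can compare `tensor_D` against a reference computed with NumPy in float32. This is a minimal sketch: the operands here are small integers, so the results should match closely, but we use a tolerance to allow for differences in accumulation order and precision."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2e1d4c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reference result computed in float32: D = alpha * (A @ B) + beta * C\n",
    "ref_D = (alpha * (tensor_A.astype(np.float32) @ tensor_B.astype(np.float32))\n",
    "         + beta * tensor_C.astype(np.float32)).astype(type_D)\n",
    "np.testing.assert_allclose(tensor_D, ref_D, rtol=1e-2)"
   ]
  },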
  {
   "cell_type": "markdown",
   "id": "54961694",
   "metadata": {},
   "source": [
    "## Run a GEMM with a ReLU element-wise activation function\n",
    "CUTLASS makes it easy to support other element-wise activation functions. This results in performing an element-wise operation after the generic linear combination performed in a GEMM. If we call such an activation function `act`, the resulting formulation is:\n",
    "```\n",
    "D = alpha * (A @ B) + beta * C\n",
    "D = act(D)\n",
    "```\n",
    "\n",
    "Here, we will add a ReLU activation function. Given an input `x`, ReLU returns `max(x, 0)`.\n",
    "\n",
    "This is easy to do in CUTLASS. One only needs to set the plan's `activation` field."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fe49443",
   "metadata": {},
   "outputs": [],
   "source": [
    "tensor_D_relu = np.zeros(tensor_C.shape).astype(type_D)\n",
    "plan.activation = \"relu\"\n",
    "plan.run(tensor_A, tensor_B, tensor_C, tensor_D_relu, print_module=print_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "455d0a37",
   "metadata": {},
   "source": [
    "We can now verify the result of the GEMM that used a ReLU activation function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e32e7798",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build a host-side ReLU reference from the identity-GEMM result in tensor_D\n",
    "relu_ref = (tensor_D >= 0).astype(type_D) * tensor_D\n",
    "np.testing.assert_array_equal(relu_ref, tensor_D_relu)"
   ]
},
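  {
   "cell_type": "markdown",
   "id": "c3d2e5b4",
   "metadata": {},
   "source": [
    "Equivalently, since ReLU returns `max(x, 0)`, the same check can be written with `np.maximum`; this is just an alternative formulation of the reference above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4c3f6a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Equivalent reference: ReLU(x) = max(x, 0)\n",
    "np.testing.assert_array_equal(np.maximum(tensor_D, np.float16(0)), tensor_D_relu)"
   ]
  },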
  {
   "cell_type": "markdown",
   "id": "cf959171",
   "metadata": {},
   "source": [
    "## Other element-wise activation functions\n",
    "CUTLASS supports a variety of widely-used element-wise activation functions. We can obtain a list of these functions via the `activations()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e17d730",
   "metadata": {},
   "outputs": [],
   "source": [
    "activations = plan.activations()\n",
    "for activation in activations:\n",
    "    print(activation)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e4599fa",
   "metadata": {},
   "source": [
    "We can then run each of them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c3598c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "for activation in activations:\n",
    "    print('=============================================================================================')\n",
    "    print(f'Compiling and running activation {activation}')\n",
    "    print('=============================================================================================')\n",
    "    plan.activation = activation\n",
    "    plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18828622",
   "metadata": {},
   "source": [
    "To use an activation function that takes a parameter, such as `leaky_relu`, provide a tuple containing the name of the activation function and the parameter (or a list of parameters)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53108eae",
   "metadata": {},
   "outputs": [],
   "source": [
    "negative_slope = 0.5\n",
    "plan.activation = (\"leaky_relu\", negative_slope)\n",
    "plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
   ]
}
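  ,
  {
   "cell_type": "markdown",
   "id": "e5b4a7f6",
   "metadata": {},
   "source": [
    "As with ReLU above, we can sketch a host-side check of the leaky ReLU result. This assumes the activation is applied to the same linear combination `alpha * (A @ B) + beta * C`; a small tolerance allows for differences in accumulation order and precision."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6a5b8e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Host-side leaky ReLU reference: negative values are scaled by negative_slope\n",
    "linear = (alpha * (tensor_A.astype(np.float32) @ tensor_B.astype(np.float32))\n",
    "          + beta * tensor_C.astype(np.float32))\n",
    "leaky_ref = np.where(linear >= 0, linear, negative_slope * linear).astype(type_D)\n",
    "np.testing.assert_allclose(tensor_D, leaky_ref, rtol=1e-2)"
   ]
  }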
],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}