# ScaleHLS Project (scalehls)
ScaleHLS is a next-generation HLS compilation flow built on top of MLIR, a multi-level compiler infrastructure. ScaleHLS can represent and optimize HLS designs at multiple levels of abstraction, and it provides an HLS-dedicated transform and analysis library to solve optimization problems at the most suitable representation levels. On top of this library, we build an automated design space exploration (DSE) engine to efficiently explore the multi-dimensional design space. In addition, we develop an HLS C front-end and a C/C++ emission back-end to translate HLS designs into and out of MLIR, enabling an end-to-end ScaleHLS flow. Experimental results show that, compared to baseline designs optimized only by Xilinx Vivado HLS, ScaleHLS improves the quality of results by up to 768.1× on computation-kernel-level programs and up to 3825.0× on neural network models.
Please check out our [arXiv paper](https://arxiv.org/abs/2107.11673) for more details.
## Quick Start
### 0. Download ScaleHLS
```sh
$ git clone --recursive git@github.com:hanchenye/scalehls.git
```
### 1. Install ScaleHLS
To enable the Python binding feature, make sure `pybind11` is installed. To build MLIR and ScaleHLS, run (note that the `-DLLVM_PARALLEL_LINK_JOBS` option can be tuned to reduce memory usage):
```sh
$ mkdir scalehls/build
$ cd scalehls/build
$ cmake -G Ninja ../polygeist/llvm-project/llvm \
-DLLVM_ENABLE_PROJECTS="mlir;clang" \
-DLLVM_EXTERNAL_PROJECTS="scalehls" \
-DLLVM_EXTERNAL_SCALEHLS_SOURCE_DIR=$PWD/.. \
-DLLVM_TARGETS_TO_BUILD="host" \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_BUILD_TYPE=DEBUG \
-DMLIR_ENABLE_BINDINGS_PYTHON=ON \
-DSCALEHLS_ENABLE_BINDINGS_PYTHON=ON \
-DLLVM_PARALLEL_LINK_JOBS=4 \
-DLLVM_USE_LINKER=lld \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
$ ninja
$ ninja check-scalehls
$ export PATH=$PATH:$PWD/bin
$ export PYTHONPATH=$PYTHONPATH:$PWD/tools/scalehls/python_packages/scalehls_core
```
ScaleHLS uses the `mlir-clang` tool of Polygeist as its C front-end. To build Polygeist, run:
```sh
$ mkdir scalehls/polygeist/build
$ cd scalehls/polygeist/build
$ cmake -G Ninja .. \
-DMLIR_DIR=$PWD/../../build/lib/cmake/mlir \
-DCLANG_DIR=$PWD/../../build/lib/cmake/clang \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DCMAKE_BUILD_TYPE=DEBUG \
-DLLVM_USE_LINKER=lld \
-DCMAKE_C_COMPILER=clang \
-DCMAKE_CXX_COMPILER=clang++
$ ninja check-mlir-clang
$ export PATH=$PATH:$PWD/mlir-clang
```
### 2. Try ScaleHLS
Once the installation and regression tests have completed successfully, you should be able to play with:
```sh
$ cd scalehls
$ # HLS C programs parsing and automatic kernel-level design space exploration.
$ mlir-clang samples/polybench/gemm/gemm_32.c -function=gemm_32 -memref-fullrank -raise-scf-to-affine -S | \
scalehls-opt -dse="top-func=gemm_32 output-path=./ target-spec=samples/polybench/target-spec.ini" \
-debug-only=scalehls > /dev/null
$ scalehls-translate -emit-hlscpp gemm_32_pareto_0.mlir > gemm_32_pareto_0.cpp
$ # Loop and directive-level optimizations, QoR estimation, and C++ code generation.
$ scalehls-opt samples/polybench/syrk/syrk_32.mlir \
-affine-loop-perfection -affine-loop-order-opt -remove-variable-bound \
-partial-affine-loop-tile="tile-size=2" -legalize-to-hlscpp="top-func=syrk_32" \
-loop-pipelining="pipeline-level=3 target-ii=2" -canonicalize -simplify-affine-if \
-affine-store-forward -simplify-memref-access -array-partition -cse -canonicalize \
-qor-estimation="target-spec=samples/polybench/target-spec.ini" \
| scalehls-translate -emit-hlscpp
```
## Integration with ONNX-MLIR
If you have installed ONNX-MLIR, or set up the ONNX-MLIR Docker image, at `$ONNXMLIR_DIR` following the instructions at https://github.com/onnx/onnx-mlir, you should be able to run the following integration test:
```sh
$ cd scalehls/samples/onnx-mlir/resnet18
$ # Export PyTorch model to ONNX.
$ python export_resnet18.py
$ # Parse ONNX model to MLIR.
$ $ONNXMLIR_DIR/build/bin/onnx-mlir -EmitONNXIR resnet18.onnx
$ # Lower from ONNX dialect to Affine dialect.
$ $ONNXMLIR_DIR/build/bin/onnx-mlir-opt resnet18.onnx.mlir \
-shape-inference -convert-onnx-to-krnl -pack-krnl-constants \
-convert-krnl-to-affine > resnet18.mlir
$ # (Optional) Print model graph.
$ scalehls-opt resnet18.mlir -print-op-graph 2> resnet18.gv
$ dot -Tpng resnet18.gv > resnet18.png
$ # Legalize the output of ONNX-MLIR, optimize and emit C++ code.
$ scalehls-opt resnet18.mlir -allow-unregistered-dialect -legalize-onnx \
-affine-loop-normalize -canonicalize -legalize-dataflow="insert-copy=true min-gran=3" \
-split-function -convert-linalg-to-affine-loops -legalize-to-hlscpp="top-func=main_graph" \
-affine-loop-perfection -affine-loop-order-opt -loop-pipelining -simplify-affine-if \
-affine-store-forward -simplify-memref-access -array-partition -cse -canonicalize \
| scalehls-translate -emit-hlscpp > resnet18.cpp
```
Please refer to the `samples/onnx-mlir` folder for more test cases, and to `samples/onnx-mlir/ablation_int_test.sh` for how to conduct the graph-, loop-, and directive-level optimizations.
## References
1. [MLIR](https://mlir.llvm.org): Multi-Level Intermediate Representation
2. [NPComp](https://github.com/llvm/mlir-npcomp): MLIR based compiler toolkit for numerical python programs
3. [ONNX-MLIR](https://github.com/onnx/onnx-mlir): The Open Neural Network Exchange implementation in MLIR
4. [CIRCT](https://github.com/llvm/circt): Circuit IR Compilers and Tools
5. [COMBA](https://github.com/zjru/COMBA): A Model-Based Analysis Framework for High Level Synthesis on FPGAs