ScaleHLS Project (scalehls)

This project aims to create a framework that ultimately converts an algorithm written in a high-level language into an efficient hardware implementation. With multiple levels of intermediate representations (IRs), MLIR appears to be the ideal tool for exploring ways to optimize the eventual design at various levels of abstraction (e.g., various levels of parallelism). Our framework is based on MLIR and incorporates a backend for high-level synthesis (HLS) C/C++ code. The key contribution, however, is the parameterization and optimization of a tremendously large design space.

Quick Start

1. Install LLVM and MLIR

IMPORTANT This step assumes that you have cloned LLVM from https://github.com/circt/llvm to $LLVM_DIR. To build LLVM and MLIR, run:
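If you have not cloned it yet, one way to do so is the following (a sketch; adjust $LLVM_DIR to your preferred location):

```shell
$ # Clone the CIRCT fork of LLVM into $LLVM_DIR.
$ git clone https://github.com/circt/llvm.git $LLVM_DIR
```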

$ mkdir $LLVM_DIR/build
$ cd $LLVM_DIR/build
$ cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS="mlir" \
    -DLLVM_TARGETS_TO_BUILD="X86;RISCV" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_BUILD_TYPE=Debug
$ ninja
$ ninja check-mlir

2. Install ScaleHLS

This step assumes this repository is cloned to $SCALEHLS_DIR. To build ScaleHLS and launch the tests, run:
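If the repository is not cloned yet, a sketch (SCALEHLS_REPO_URL is a placeholder, not a real variable defined by this project; set it to this repository's URL):

```shell
$ # SCALEHLS_REPO_URL is a hypothetical placeholder for this repository's URL.
$ git clone $SCALEHLS_REPO_URL $SCALEHLS_DIR
```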

$ mkdir $SCALEHLS_DIR/build
$ cd $SCALEHLS_DIR/build
$ cmake -G Ninja .. \
    -DMLIR_DIR=$LLVM_DIR/build/lib/cmake/mlir \
    -DLLVM_DIR=$LLVM_DIR/build/lib/cmake/llvm \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_BUILD_TYPE=Debug
$ ninja check-scalehls

3. Try ScaleHLS

After the build and tests have completed successfully, you should be able to play with:

$ export PATH=$SCALEHLS_DIR/build/bin:$PATH
$ cd $SCALEHLS_DIR

$ # Benchmark generation, dataflow-level optimization, HLSKernel lowering and bufferization.
$ benchmark-gen -type "cnn" -config "config/cnn-config.ini" -number 1 \
    | scalehls-opt -legalize-dataflow -split-function \
    -hlskernel-bufferize -hlskernel-to-affine -func-bufferize -canonicalize

$ # Loop and pragma-level optimizations, performance estimation, and HLS C++ code generation.
$ scalehls-opt test/Conversion/HLSKernelToAffine/test_gemm.mlir -hlskernel-to-affine \
    -affine-loop-perfection -remove-var-loop-bound -affine-loop-normalize \
    -partial-affine-loop-tile="tile-level=1 tile-size=4" \
    -convert-to-hlscpp="top-function=test_gemm" -loop-pipelining="pipeline-level=1" \
    -store-op-forward -simplify-memref-access -array-partition -cse -canonicalize \
    -qor-estimation="target-spec=config/target-spec.ini" \
    | scalehls-translate -emit-hlscpp

$ # Put them together.
$ benchmark-gen -type "cnn" -config "config/cnn-config.ini" -number 1 \
    | scalehls-opt -legalize-dataflow -split-function \
    -hlskernel-bufferize -hlskernel-to-affine -func-bufferize \
    -affine-loop-perfection -affine-loop-normalize \
    -convert-to-hlscpp="top-function=auto_gen_cnn" \
    -store-op-forward -simplify-memref-access -cse -canonicalize \
    -qor-estimation="target-spec=config/target-spec.ini" \
    | scalehls-translate -emit-hlscpp

Ablation study

If Vivado HLS (tested with 2019.1) is installed on your machine, running the following script will report the HLS results for a set of benchmarks (around 8 hours on an AMD Ryzen 7 3800X for all 33 tests).

For the ablation_test_run.sh script, -n sets the number of tests to process (maximum 33), and -c sets the test from which to begin rerunning the C++ synthesis. The generated C++ source code is written to sample/cpp_src, the Vivado HLS projects are created in sample/hls_proj, the collected reports are written to sample/test_results, and the test summary is generated in sample.

$ cd $SCALEHLS_DIR/sample
$ ./ablation_test_run.sh -n 33 -c 0
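For example, to rerun the C++ synthesis starting from test 10 (a sketch; assumes the earlier results in sample/test_results are kept from a previous run):

```shell
$ # Process all 33 tests, but only redo synthesis from test 10 onward.
$ ./ablation_test_run.sh -n 33 -c 10
```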

References

  1. MLIR documentation
  2. mlir-npcomp GitHub repository
  3. onnx-mlir GitHub repository
  4. circt GitHub repository
  5. comba GitHub repository
  6. dahlia GitHub repository