# ScaleHLS Project (scalehls)

This project aims to create a framework that ultimately converts an algorithm written in a high-level language into an efficient hardware implementation. With multiple levels of intermediate representations (IRs), MLIR appears to be the ideal tool for exploring ways to optimize the eventual design at various levels of abstraction (e.g., various levels of parallelism). Our framework is based on MLIR and incorporates a backend for high-level synthesis (HLS) C/C++ code. However, the key contribution is our parameterization and optimization of a tremendously large design space.
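
For readers new to HLS, the C++ that such a backend emits steers the downstream synthesis tool with pragmas. The kernel below is a hand-written illustration in the style of Vivado HLS, not actual ScaleHLS output; a standard C++ compiler simply ignores the pragmas, so the code also runs on a CPU:

```cpp
#include <cassert>

// Illustrative HLS-style kernel (hypothetical, not ScaleHLS output):
// computes C = A * B for small NxN matrices.
const int N = 4;

void gemm(const int A[N][N], const int B[N][N], int C[N][N]) {
  // Split B across memory banks so the pipelined loop below can read
  // several elements per cycle (directive placement is illustrative).
#pragma HLS array_partition variable=B complete dim=2
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      // Ask the HLS tool to start a new iteration every 2 cycles.
#pragma HLS pipeline II=2
      int acc = 0;
      for (int k = 0; k < N; ++k)
        acc += A[i][k] * B[k][j];
      C[i][j] = acc;
    }
  }
}
```

Passes such as `-loop-pipelining` and `-array-partition` in the pipelines below automate the insertion and tuning of exactly this kind of directive.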

## Quick Start

### 1. Install LLVM and MLIR

**IMPORTANT** This step assumes that you have cloned LLVM from https://github.com/circt/llvm/tree/main into `$LLVM_DIR` and checked out the `main` branch. To build LLVM and MLIR, run:

```shell
$ mkdir $LLVM_DIR/build
$ cd $LLVM_DIR/build
$ cmake -G Ninja ../llvm \
    -DLLVM_ENABLE_PROJECTS="mlir" \
    -DLLVM_TARGETS_TO_BUILD="X86;RISCV" \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_BUILD_TYPE=DEBUG
$ ninja
$ ninja check-mlir
```

### 2. Install ScaleHLS

This step assumes this repository is cloned to `$SCALEHLS_DIR`. To build and launch the tests, run:

```shell
$ mkdir $SCALEHLS_DIR/build
$ cd $SCALEHLS_DIR/build
$ cmake -G Ninja .. \
    -DMLIR_DIR=$LLVM_DIR/build/lib/cmake/mlir \
    -DLLVM_DIR=$LLVM_DIR/build/lib/cmake/llvm \
    -DLLVM_ENABLE_ASSERTIONS=ON \
    -DCMAKE_BUILD_TYPE=DEBUG
$ ninja check-scalehls
```

### 3. Try ScaleHLS

After the build and tests complete successfully, you should be able to play with the following pipelines:

```shell
$ export PATH=$SCALEHLS_DIR/build/bin:$PATH
$ cd $SCALEHLS_DIR

$ # Automatic kernel-level design space exploration.
$ scalehls-opt samples/polybench/gemm.mlir \
    -multiple-level-dse="target-spec=config/target-spec.ini dump-file=gemm_dse.csv top-func=gemm" \
    -debug-only=scalehls | scalehls-translate -emit-hlscpp

$ # Loop and pragma-level optimizations, performance estimation, and C++ code generation.
$ scalehls-opt samples/polybench/syrk.mlir \
    -affine-loop-perfection -affine-loop-order-opt -remove-variable-bound \
    -partial-affine-loop-tile="tile-size=2" -legalize-to-hlscpp="top-func=syrk" \
    -loop-pipelining="pipeline-level=3 target-ii=2" -canonicalize -simplify-affine-if \
    -affine-store-forward -simplify-memref-access -cse -array-partition \
    -qor-estimation="target-spec=config/target-spec.ini" \
    | scalehls-translate -emit-hlscpp

$ # Benchmark generation, dataflow-level optimization, HLSKernel lowering and bufferization.
$ benchmark-gen -type "cnn" -config "config/cnn-config.ini" -number 1 \
    | scalehls-opt -legalize-dataflow="min-gran=2 insert-copy=true" -split-function \
    -hlskernel-bufferize -hlskernel-to-affine -func-bufferize -canonicalize
```
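
As background on the dataflow-level optimization mentioned above, HLS tools can overlap the execution of producer and consumer functions placed in a region marked with `#pragma HLS dataflow`. The sketch below is a hand-written illustration, not generated code; in a real design the stages would typically communicate through FIFOs such as `hls::stream`, but a plain array keeps the sketch compilable with any C++ compiler:

```cpp
#include <cassert>

// Illustrative two-stage dataflow design (hypothetical, not ScaleHLS output).
const int LEN = 8;

static void produce(const int in[LEN], int mid[LEN]) {
  for (int i = 0; i < LEN; ++i)
    mid[i] = in[i] * 2;  // stage 1: scale each element
}

static void consume(const int mid[LEN], int out[LEN]) {
  for (int i = 0; i < LEN; ++i)
    out[i] = mid[i] + 1; // stage 2: offset each element
}

void top(const int in[LEN], int out[LEN]) {
  // An HLS tool may run produce() and consume() concurrently,
  // streaming data between them; a plain compiler runs them in order.
#pragma HLS dataflow
  int mid[LEN];
  produce(in, mid);
  consume(mid, out);
}
```

The `-legalize-dataflow` and `-split-function` passes carry out the analogous restructuring at the MLIR level, splitting a design into stages of the chosen granularity.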

## Integration with ONNX-MLIR

If you have installed ONNX-MLIR or set up the ONNX-MLIR docker image under `$ONNXMLIR_DIR` following the instructions at https://github.com/onnx/onnx-mlir, you should be able to run the following integration test:

```shell
$ cd $SCALEHLS_DIR/samples/onnx-mlir

$ # Export PyTorch model to ONNX.
$ python export_resnet18.py

$ # Parse ONNX model to MLIR.
$ $ONNXMLIR_DIR/build/bin/onnx-mlir -EmitONNXIR resnet18.onnx

$ # Lower from ONNX dialect to Affine dialect.
$ $ONNXMLIR_DIR/build/bin/onnx-mlir-opt resnet18.onnx.mlir \
    -shape-inference -convert-onnx-to-krnl -pack-krnl-constants \
    -convert-krnl-to-affine > resnet18.mlir

$ # (Optional) Print model graph.
$ scalehls-opt resnet18.mlir -print-op-graph 2> resnet18.gv
$ dot -Tpng resnet18.gv > resnet18.png

$ # Legalize the output of ONNX-MLIR, optimize and emit C++ code.
$ scalehls-opt resnet18.mlir -allow-unregistered-dialect \
    -legalize-onnx -affine-loop-normalize -canonicalize \
    -legalize-dataflow="min-gran=3 insert-copy=true" -split-function \
    -convert-linalg-to-affine-loops -affine-loop-order-opt \
    -legalize-to-hlscpp="top-func=main_graph" -loop-pipelining -canonicalize \
    | scalehls-translate -emit-hlscpp > resnet18.cpp
```

## References

  1. MLIR documents
  2. mlir-npcomp github
  3. onnx-mlir github
  4. circt github
  5. dahlia github
  6. comba github