Learning TVM starts with setting up the environment. The TVM environment setup consists of building the shared library and configuring the Python environment. This post mainly follows the TVM documentation.
1. Preparation
- OS used: Ubuntu 18.04
- Clone the Git project (with its submodules):

    ```shell
    # Clone the project together with its submodules
    git clone --recursive https://github.com/dmlc/tvm
    # Or, in an existing checkout, initialize and update the submodules
    git submodule init
    git submodule update
    ```
- The TVM environment setup has two main steps:
    - Build the shared library from the C++ code (libtvm.so for Linux, libtvm.dylib for macOS, libtvm.dll for Windows); this post uses Linux as the development environment.
    - Set up the language package (e.g. Python).
 
2. Shared Library
The goal of this step is to build the shared libraries:
- On Linux, the target libraries are libtvm.so and libtvm_topi.so
- On macOS, the target libraries are libtvm.dylib and libtvm_topi.dylib
- On Windows, the target libraries are libtvm.dll and libtvm_topi.dll
- Install the dependency packages:

    ```shell
    sudo apt-get update
    sudo apt-get install -y python python-dev python-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake
    ```

The related requirements are as follows:
- A C++ compiler supporting the C++11 standard (g++ >= 4.8)
- cmake >= 3.5
- llvm >= 4.0 (needed to compile certain functions)
- If you only need CUDA/OpenCL, you can build without the LLVM dependency.
- LLVM is required if you want to use TVM's NNVM compiler.
 
- CMake

    Use cmake to build the libraries. TVM's configuration can be changed through config.cmake. First create a build directory under the tvm root and copy config.cmake into it:

    ```shell
    cd $TVM
    mkdir build
    cp cmake/config.cmake build
    ```

    The build is configured by editing build/config.cmake. The PC used here does not support CUDA, so we need set(USE_CUDA OFF); if you do want to use CUDA, you can set(USE_CUDA ON). The other options work the same way (e.g. OpenCL, ROCM, METAL, VULKAN, ...). build/config.cmake contains a number of feature switches, including:
    - USE_CUDA # GPU computing on NVIDIA GPUs
    - USE_ROCM # general-purpose GPU computing, proposed by AMD with an obvious goal...
    - USE_SDACCEL # FPGA computing
    - USE_AOCL # Intel FPGA SDK for OpenCL (AOCL) runtime
    - USE_OPENCL # a framework for programming heterogeneous platforms, which may combine CPUs, GPUs, DSPs, FPGAs, or other processors and hardware accelerators
    - USE_METAL # GPU computing on iOS
    - USE_VULKAN # the next generation of OpenGL, supported since Android 7.x (not on iOS, which has its own Metal 2)
    - USE_OPENGL # the 2D/3D rendering standard, implemented and supported by GPU vendors
    - USE_SGX # Intel SGX
    - USE_RPC # remote procedure calls, so a PC and a phone can debug together over the network
    - USE_STACKVM_RUNTIME # embed stackvm into the runtime
    - USE_GRAPH_RUNTIME # enable the tiny embedded graph runtime
    - USE_GRAPH_RUNTIME_DEBUG # enable additional graph debug functions
    - USE_LLVM # LLVM support
    - USE_BLAS # the API standard for basic linear algebra libraries (e.g. vector or matrix multiplication); implementations include openblas, mkl, atlas, apple
    - USE_RANDOM # contrib.random runtime
    - USE_NNPACK
    - USE_CUDNN # cuDNN
    - USE_CUBLAS
    - USE_MIOPEN
    - USE_MPS
    - USE_ROCBLAS
    - USE_SORT # use contrib sort
    - USE_ANTLR
    - USE_VTA_TSIM # VTA
    - USE_RELAY_DEBUG # Relay debug mode
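These switches are plain `set(...)` lines in config.cmake, so they can also be flipped programmatically. Below is a minimal sketch of such a helper; the function name and the toy config string are illustrative, not part of TVM:

```python
import re

def set_option(config_text, name, value):
    """Rewrite a single set(<name> ...) switch in config.cmake-style text."""
    pattern = re.compile(r"set\(%s\s+[^)]+\)" % re.escape(name))
    return pattern.sub("set(%s %s)" % (name, value), config_text)

# A toy two-line config standing in for the real build/config.cmake.
config = "set(USE_CUDA OFF)\nset(USE_LLVM OFF)\n"
config = set_option(config, "USE_CUDA", "ON")
print(config)
```

The same helper works for any of the switches listed above, e.g. `set_option(config, "USE_OPENCL", "ON")`.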
- LLVM configuration

    CPU codegen is implemented through LLVM, so LLVM needs to be configured. The requirement is LLVM >= 4.0.
    - On Ubuntu it can be installed directly through apt:

        ```shell
        sudo apt install llvm-8 llvm-8-dev llvm-8-runtime
        ```
    - In build/config.cmake, set set(USE_LLVM /path/to/your/llvm/bin/llvm-config), e.g. set(USE_LLVM /usr/lib/llvm-8/bin/llvm-config).
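To fill in the USE_LLVM path, you can probe for llvm-config on the PATH. A small sketch; the candidate binary names and the fallback path are assumptions based on the apt install above:

```python
import shutil

# Candidate llvm-config names to probe on PATH (the version suffix is an assumption).
candidates = ["llvm-config-8", "llvm-config"]
found = next((path for path in (shutil.which(c) for c in candidates) if path), None)
# Fall back to the example path from the text if nothing was found.
use_llvm = found or "/usr/lib/llvm-8/bin/llvm-config"
line = "set(USE_LLVM %s)" % use_llvm
print(line)
```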
 
- Compile

    With the configuration above done, we can compile:

    ```shell
    cd build
    # cmake -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON  ## for verbose output
    cmake ..
    make -j4
    ```

The build finally links the following .so libraries:
- libvta_tsim.so
- libvta_fsim.so
- libtvm_runtime.so
- libtvm.so
- libtvm_topi.so
- libnnvm_compiler.so
- libvta_hw.so
The details are as follows: [1]
- libvta_tsim.so

    VTA stands for Versatile Tensor Accelerator (see the VTA documentation). This library is its cycle-level simulation library, generated from the following compilation units:

    ```
    vta/src/device_api.cc
    vta/src/runtime.cc
    vta/src/tsim/tsim_driver.cc
    vta/src/dpi/module.cc
    ```
- libvta_fsim.so

    VTA's fast simulation library, generated from the following compilation units:

    ```
    vta/src/device_api.cc
    vta/src/runtime.cc
    vta/src/sim/sim_driver.cc
    ```
- libtvm_runtime.so

    As the name suggests, the TVM runtime. This library is actually a minimal TVM runtime, compiled from the "Minimum runtime related codes", i.e. the following source files:

    ```
    src/runtime/builtin_fp16.cc
    src/runtime/c_dsl_api.cc
    src/runtime/c_runtime_api.cc
    src/runtime/cpu_device_api.cc
    src/runtime/dso_module.cc
    src/runtime/file_util.cc
    src/runtime/module.cc
    src/runtime/module_util.cc
    src/runtime/ndarray.cc
    src/runtime/registry.cc
    src/runtime/system_lib_module.cc
    src/runtime/thread_pool.cc
    src/runtime/threading_backend.cc
    src/runtime/vm/memory_manager.cc
    src/runtime/vm/object.cc
    src/runtime/vm/vm.cc
    src/runtime/workspace_pool.cc
    3rdparty/bfloat16/bfloat16.cc
    src/runtime/rpc/*.cc
    src/runtime/graph/graph_runtime.cc
    src/contrib/sort/sort.cc
    ```
- libtvm.so

    The full TVM library, made up of the compile-time parts, the runtime, the RPC parts, and so on:
- common: Internal common utilities.
- api: API function registration.
- lang: The definition of DSL related data structure.
- arithmetic: Arithmetic expression and set simplification.
- op: The detailed implementation of each operation (compute, scan, placeholder).
- schedule: The operations on the schedule graph before converting to IR.
- pass: The optimization pass on the IR structure.
- codegen: The code generator.
- runtime: Minimum runtime related codes.
- autotvm: The auto-tuning module.
- relay: Implementation of Relay. The second generation of NNVM, a new IR for deep learning frameworks.
- contrib: Contrib extension libraries.
This library is quite large, with more than 200 compilation units:

```
src/api/*.cc
src/arithmetic/*.cc
src/autotvm/*.cc
src/codegen/*.cc
src/lang/*.cc
src/op/*.cc
src/pass/*.cc
src/schedule/*.cc
src/relay/backend/*.cc
src/relay/ir/*.cc
src/relay/op/*.cc
src/relay/pass/*.cc
3rdparty/HalideIR/src/*.cpp
src/runtime/stackvm/*.cc
src/codegen/opt/*.cc
src/codegen/llvm/*.cc
src/runtime/*.cc
src/contrib/hybrid/codegen_hybrid.cc
3rdparty/bfloat16/bfloat16.cc
src/contrib/sort/sort.cc
```
- libtvm_topi.so

    TOPI (TVM OP Inventory) is the operator collection library for TVM, intended at sharing the effort of crafting and optimizing tvm generated kernels. Generated from the following compilation unit:

    ```
    topi/src/topi.cc
    ```
- libnnvm_compiler.so

    The NNVM compiler, generated from the following compilation units:

    ```
    nnvm/src/c_api/*.cc
    nnvm/src/compiler/*.cc
    nnvm/src/core/*.cc
    nnvm/src/pass/*.cc
    nnvm/src/top/nn/*.cc
    nnvm/src/top/tensor/*.cc
    nnvm/src/top/vision/nms.cc
    nnvm/src/top/vision/ssd/mutibox_op.cc
    nnvm/src/top/vision/yolo/reorg.cc
    nnvm/src/top/image/resize.cc
    ```
- libvta_hw.so

    When using VTA's TSIM simulator, you need the hardware dynamic library libvta_hw.so, built with hardware-chisel. The build procedure is as follows: first go into $TVM_HOME/vta/hardware/chisel and add the -fPIC compile flag in the Makefile:

    ```
    cxx_flags += -fPIC
    ```

    About -fPIC: the "f" is the gcc prefix for options that "control the interface conventions used in code generation", and "PIC" stands for "Position Independent Code"; it is a specialization of -fpic for m68K and SPARC. [2] Then just build with make. The purpose of adding the flag is to avoid the following error: [3]

    relocation R_X86_64_32 against '.rodata' can not be used when making a shared object
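After a successful build, you can sanity-check that the libraries listed above actually landed in the build directory. A minimal sketch; the helper and the placeholder directory are illustrative, not part of TVM:

```python
import os

# The shared libraries the build is expected to produce (from the list above).
expected = ["libvta_tsim.so", "libvta_fsim.so", "libtvm_runtime.so",
            "libtvm.so", "libtvm_topi.so", "libnnvm_compiler.so", "libvta_hw.so"]

def missing_libs(build_dir):
    """Return every expected library that is not present in build_dir."""
    return [lib for lib in expected
            if not os.path.exists(os.path.join(build_dir, lib))]

# Against a non-existent directory, everything is reported missing.
print(missing_libs("/no/such/build/dir"))
```

Point it at your real `$TVM/build` directory; an empty result means all seven libraries were linked.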
 
3. Python Environment
Setting up the Python environment is fairly easy; you only need to add the Python library paths, as follows:
```shell
export TVM_HOME=/home/gemfield/github/Gemfield/tvm/
export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:$TVM_HOME/nnvm/python:${PYTHONPATH}
```
Note that TVM has dropped Python 2 support and requires Python >= 3.5.
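The PYTHONPATH exports above can also be mirrored from inside Python. A minimal sketch; the default checkout path is just the example used in the exports:

```python
import os
import sys

# Mirror the exports above; the default path is the example checkout location.
tvm_home = os.environ.get("TVM_HOME", "/home/gemfield/github/Gemfield/tvm")
paths = [os.path.join(tvm_home, "python"),
         os.path.join(tvm_home, "topi", "python"),
         os.path.join(tvm_home, "nnvm", "python")]
for p in paths:
    if p not in sys.path:
        sys.path.insert(0, p)
print(paths)
```

This is handy in notebooks or scripts where editing the shell profile is inconvenient.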
Install the Python dependencies:
- Necessary dependencies:

    ```shell
    pip3 install --user numpy decorator attrs
    ```

- If you want to use the RPC Tracker:

    ```shell
    pip3 install --user tornado
    ```

- If you want to use the auto-tuning module:

    ```shell
    pip3 install --user tornado psutil xgboost
    ```

- If you want to parse Relay text format programs, you must use Python 3 and run:

    ```shell
    pip3 install --user mypy orderedset antlr4-python3-runtime
    ```

- Install onnx:

    ```shell
    pip3 install onnx
    ```
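As a quick sanity check that the dependencies installed above import cleanly, a small sketch; only numpy is probed here so the snippet stays runnable without the optional packages:

```python
import importlib

# Spot-check a core dependency; extend the list with "decorator", "onnx", etc.
required = ["numpy"]
loaded = [importlib.import_module(name).__name__ for name in required]
print(loaded)
```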
References
[1] Gemfield, PyTorch to TVM, [OL], 2019-08-04, https://zhuanlan.zhihu.com/p/58995914
[2] What does -fPIC mean when building a shared library? [OL], https://stackoverflow.com/questions/966960/what-does-fpic-mean-when-building-a-shared-library
[3] [OL], https://blog.csdn.net/u010312436/article/details/52486811
Changelog
- 2019.08.27: updated the code for libvta_tsim.so and libvta_fsim.so.
- 2019.08.28: added the libvta_hw.so notes.
- 2019.08.01: added the -fPIC notes.
