CuPBoP: CUDA for Parallelized and Broad-range Processors
Introduction
CuPBoP is a framework that supports executing unmodified CUDA source code on non-NVIDIA devices. Currently, CuPBoP supports several CPU backends, including x86, AArch64, and RISC-V. Support for the RISC-V GPU Vortex is a work in progress.
Install
Prerequisites
- Linux system
- LLVM 14.0.1
Installation
- Clone from GitHub
  git clone --recursive https://github.com/drcut/CuPBoP
  cd CuPBoP
  export CuPBoP_PATH=`pwd`
  export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
- As CuPBoP relies on CUDA structures, we need to download the CUDA header files
  wget https://www.dropbox.com/s/r18io0zu3idke5p/cuda-header.tar.gz?dl=1
  tar -xzf 'cuda-header.tar.gz?dl=1'
  cp -r include/* runtime/threadPool/include/
- Other CUDA files are also required for compiling CUDA source code to LLVM IR
  wget https://www.dropbox.com/s/4pckqsjnl920gpn/cuda-10.1.tar.gz?dl=1
  tar -xzf 'cuda-10.1.tar.gz?dl=1'
- Build CuPBoP
  mkdir build && cd build
  cmake .. -DLLVM_CONFIG_PATH=`which llvm-config` # need path to llvm-config
  make
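Because the build depends on a specific LLVM release and on the exported library path, a quick check before running cmake can catch a mismatched environment early. The following is a minimal sketch, not part of CuPBoP: the helper names `llvm_major_ok` and `in_ld_path` are hypothetical, and treating any 14.x release as acceptable is an assumption (the prerequisites only name 14.0.1).

```shell
#!/usr/bin/env bash
# Hypothetical pre-build checks; neither function is part of CuPBoP.

# Succeeds only if the major component of the version string is 14.
# Assumption: any 14.x release is acceptable (the README names 14.0.1).
llvm_major_ok() {
    local major="${1%%.*}"
    [ "$major" = "14" ]
}

# Succeeds only if directory $1 is an exact entry of LD_LIBRARY_PATH.
in_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn if llvm-config is missing or reports a different major version.
if command -v llvm-config >/dev/null 2>&1; then
    llvm_major_ok "$(llvm-config --version)" \
        || echo "warning: expected LLVM 14.x, found $(llvm-config --version)" >&2
else
    echo "warning: llvm-config not found on PATH" >&2
fi

# Warn if the runtime directories exported earlier are not on the library path.
for dir in "${CuPBoP_PATH:-}/build/runtime" "${CuPBoP_PATH:-}/build/runtime/threadPool"; do
    in_ld_path "$dir" || echo "warning: $dir is not in LD_LIBRARY_PATH" >&2
done
```

Note that the `export` lines only affect the current shell session, so the warnings about `LD_LIBRARY_PATH` will reappear in a new terminal unless the exports are repeated there.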
Run Vector Addition example
cd $CuPBoP_PATH/examples/vecadd
# Compile CUDA source code (both host and kernel) to bitcode files
clang++ -std=c++11 vecadd.cu \
-I../.. --cuda-path=$CuPBoP_PATH/cuda-10.1 \
--cuda-gpu-arch=sm_50 -L$CuPBoP_PATH/cuda-10.1/lib64 \
-lcudart_static -ldl -lrt -pthread -save-temps -v || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj kernel.bc
llc --relocation-model=pic --filetype=obj host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
-I$CuPBoP_PATH/runtime/threadPool/include \
-L$CuPBoP_PATH/build/runtime \
-L$CuPBoP_PATH/build/runtime/threadPool \
host.o kernel.o \
-I../.. -lpthread -lc -lx86Runtime -lthreadPool
# Execute
./vecadd
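The bitcode file names passed to kernelTranslator and hostTranslator above are produced by clang's `-save-temps` option and follow a visible pattern: source base name, then the device or host target, then `.bc`. To reuse the same steps for another `.cu` file, the names can be derived rather than typed by hand. This is a minimal sketch with hypothetical helper names (`kernel_bc`, `host_bc`); the pattern is inferred only from the two file names shown in this README, so other clang versions or host platforms may name the files differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the naming pattern is inferred from the vecadd commands.

# kernel_bc <source-base-name> <gpu-arch>
# Prints the device-side bitcode file name expected by kernelTranslator.
kernel_bc() {
    printf '%s-cuda-nvptx64-nvidia-cuda-%s.bc\n' "$1" "$2"
}

# host_bc <source-base-name> <host-triple>
# Prints the host-side bitcode file name expected by hostTranslator.
host_bc() {
    printf '%s-host-%s.bc\n' "$1" "$2"
}

# Reproduce the two names used in the vecadd example.
kernel_bc vecadd sm_50
host_bc vecadd x86_64-unknown-linux-gnu
```

With these helpers, the translator steps could be written as `kernelTranslator "$(kernel_bc vecadd sm_50)" kernel.bc`, keeping the GPU architecture in one place alongside the `--cuda-gpu-arch` flag.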
How to contribute?
Contributions of any kind are welcome. Please refer to CONTRIBUTING.md for more details.
Related publications
- COX: Exposing CUDA Warp-Level Functions to CPUs. ACM Transactions on Architecture and Code Optimization.
- CuPBoP: CUDA for Parallelized and Broad-range Processors. arXiv preprint.
Contributors
- Ruobing Han
- Jun Chen
- Bhanu Garg
- Xule Zhou
- John Lu
- Chihyo Ahn
- Haotian Sheng
- Blaise Tine
- Hyesoon Kim