# CuPBoP: CUDA for Parallelized and Broad-range Processors

## Introduction

CuPBoP is a framework that supports executing unmodified CUDA source code
on non-NVIDIA devices.
Currently, CuPBoP supports several CPU backends, including x86, AArch64, and RISC-V.
Support for [Vortex](https://vortex.cc.gatech.edu/) (a RISC-V GPU) is a work in progress.

## Install

### Prerequisites

- Linux system
- [LLVM 14.0.1](https://github.com/llvm/llvm-project/releases/tag/llvmorg-14.0.1)
- CUDA Toolkit

Although CuPBoP does not require NVIDIA GPUs,
it needs CUDA to compile the source programs to NVVM/LLVM IR.
The CUDA Toolkit can be installed on machines without NVIDIA GPUs.
To install the CUDA Toolkit, please refer to <https://developer.nvidia.com/cuda-downloads>.
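
Before building, it can help to confirm that the prerequisites are visible from the current shell. The following is a minimal sketch, not part of CuPBoP: the helper name `check_prereqs` is hypothetical, and the fallback location `/usr/local/cuda` is an assumption used only when `CUDA_PATH` is unset.

```shell
# Hypothetical helper (not shipped with CuPBoP): report which prerequisites
# can be found from the current shell.
check_prereqs() {
    local tool
    local missing=0
    # Assumption: fall back to /usr/local/cuda when CUDA_PATH is unset.
    local cuda_path="${CUDA_PATH:-/usr/local/cuda}"

    # LLVM tools used by the build and by the example pipeline.
    for tool in llvm-config clang++ llc; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done

    # Print the LLVM version so it can be compared with the listed 14.0.1.
    if command -v llvm-config >/dev/null 2>&1; then
        echo "llvm-config reports version $(llvm-config --version)"
    fi

    # Only the toolkit directory is checked; no GPU or driver is required.
    if [ -d "$cuda_path/include" ]; then
        echo "found:   CUDA Toolkit at $cuda_path"
    else
        echo "missing: CUDA Toolkit at $cuda_path"
        missing=1
    fi

    return "$missing"
}

check_prereqs || echo "Some prerequisites are missing."
```

The function returns a non-zero status when anything is missing, so it can also be used as a guard in a build script.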

### Installation

1. Clone the repository from GitHub and set up the environment

   ```bash
   git clone --recursive https://github.com/cupbop/CuPBoP
   cd CuPBoP
   export CuPBoP_PATH=`pwd`
   export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
   export CUDA_PATH=/usr/local/cuda-11.7 # set to your own location
   ```

2. Build CuPBoP

   ```bash
   mkdir build && cd build
   # Add -DDEBUG=ON to the cmake command for a debug build
   cmake .. \
      -DLLVM_CONFIG_PATH=`which llvm-config` \
      -DCUDA_PATH=$CUDA_PATH
   make
   ```

3. (Optional) Run the Hetero-Mark benchmark with CuPBoP to verify the build

   ```bash
   make test
   ```
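
After the build finishes, a quick check that the expected outputs exist can save time before running the example. The sketch below is not part of CuPBoP: the helper name `check_build` is hypothetical, and the paths it checks are simply the ones this README uses later (the two translators and the two runtime library directories).

```shell
# Hypothetical helper (not shipped with CuPBoP): check that the build
# produced the tools and directories referenced elsewhere in this README.
check_build() {
    local root="${1:-${CuPBoP_PATH:-.}}"
    local path
    local status=0

    for path in \
        "$root/build/compilation/kernelTranslator" \
        "$root/build/compilation/hostTranslator" \
        "$root/build/runtime" \
        "$root/build/runtime/threadPool"; do
        if [ -e "$path" ]; then
            echo "ok:      $path"
        else
            echo "missing: $path"
            status=1
        fi
    done

    return "$status"
}

check_build "${CuPBoP_PATH:-.}" || echo "Build outputs are incomplete; rerun cmake and make."
```

Pass the repository root as the first argument if `CuPBoP_PATH` is not set in the current shell.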

## Run Vector Addition example

This section shows how to use CuPBoP to execute a CUDA program.

```bash
cd examples/vecadd
# Compile CUDA source code (both host and kernel) to bitcode files.
# Only the .bc files kept by -save-temps are used below, so `|| true`
# ignores any failure of clang's own link step.
clang++ -std=c++11 vecadd.cu \
   -I../.. --cuda-path=$CUDA_PATH \
   --cuda-gpu-arch=sm_50 -L$CUDA_PATH/lib64 \
   -lcudart_static -ldl -lrt -pthread -save-temps -v || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
   vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
   vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj kernel.bc
llc --relocation-model=pic --filetype=obj host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
   -I$CuPBoP_PATH/runtime/threadPool/include \
   -L$CuPBoP_PATH/build/runtime \
   -L$CuPBoP_PATH/build/runtime/threadPool \
   host.o kernel.o \
   -I../.. -lc -lx86Runtime -lthreadPool -lpthread
# Execute
./vecadd
```
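
The commands above are specific to `vecadd`. As a rough sketch, the hypothetical helper below (`cupbop_commands`, not part of CuPBoP) prints the same six build commands for another single-file CUDA program so they can be reviewed before running. It assumes that the intermediate bitcode names follow the pattern seen for `vecadd`, that the host is x86_64 Linux, and that `-I$CuPBoP_PATH` can stand in for `-I../..`.

```shell
# Hypothetical helper (not shipped with CuPBoP): print the build commands
# for NAME.cu, optionally with a different GPU architecture (default sm_50).
# Assumptions: bitcode file names follow the vecadd pattern, the host is
# x86_64 Linux, and -I$CuPBoP_PATH replaces the relative -I../.. include.
cupbop_commands() {
    local name="$1"
    local arch="${2:-sm_50}"
    cat <<EOF
clang++ -std=c++11 $name.cu -I\$CuPBoP_PATH --cuda-path=\$CUDA_PATH --cuda-gpu-arch=$arch -L\$CUDA_PATH/lib64 -lcudart_static -ldl -lrt -pthread -save-temps -v || true
\$CuPBoP_PATH/build/compilation/kernelTranslator $name-cuda-nvptx64-nvidia-cuda-$arch.bc kernel.bc
\$CuPBoP_PATH/build/compilation/hostTranslator $name-host-x86_64-unknown-linux-gnu.bc host.bc
llc --relocation-model=pic --filetype=obj kernel.bc
llc --relocation-model=pic --filetype=obj host.bc
g++ -o $name -fPIC -no-pie -I\$CuPBoP_PATH/runtime/threadPool/include -L\$CuPBoP_PATH/build/runtime -L\$CuPBoP_PATH/build/runtime/threadPool host.o kernel.o -I\$CuPBoP_PATH -lc -lx86Runtime -lthreadPool -lpthread
EOF
}

# Print the commands for vecadd; nothing is executed.
cupbop_commands vecadd
```

The helper only prints text, so `CuPBoP_PATH` and `CUDA_PATH` are expanded later, by whichever shell eventually runs the printed commands.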

## How to contribute?

Contributions of any kind are welcome.
Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for more details.

## Related publications

If you want to refer to CuPBoP in your projects, please cite the related
papers:

- [COX: Exposing CUDA Warp-Level Functions to CPUs](https://dl.acm.org/doi/abs/10.1145/3554736)
- [CuPBoP: CUDA for Parallelized and Broad-range Processors](https://arxiv.org/abs/2206.07896)

## Contributors

- [Ruobing Han](https://drcut.github.io/)
- Jun Chen
- Bhanu Garg
- Xule Zhou
- John Lu
- [Chihyo Ahn](https://upcp.ece.gatech.edu/2021/09/01/chihyo-ahn/)
- Haotian Sheng
- Blaise Tine
- [Hyesoon Kim](https://faculty.cc.gatech.edu/~hyesoon/)

## Acknowledgements

- [POCL](http://portablecl.org/) is an open-source
  OpenCL implementation based on LLVM.
  We reuse some code from it
  (e.g., applying optimizations, loading/storing LLVM IR).
- [Hetero-Mark](https://github.com/NUCAR-DEV/Hetero-Mark)
  and [Rodinia Benchmark](https://github.com/yuhc/gpu-rodinia)
  are two benchmark suites
  for heterogeneous system computation.
  CuPBoP uses them as integration tests to verify correctness.
- [moodycamel::ConcurrentQueue](https://github.com/cameron314/concurrentqueue/tree/master)
  is a fast multi-producer,
  multi-consumer lock-free concurrent queue for C++11.
  CuPBoP uses it as the task queue for launching and executing kernels.