# CuPBoP: CUDA for Parallelized and Broad-range Processors

## Introduction

CuPBoP is a framework that supports executing unmodified CUDA source code
on non-NVIDIA devices.
Currently, CuPBoP supports several CPU backends, including x86, AArch64, and RISC-V.
Support for [Vortex](https://vortex.cc.gatech.edu/) (a RISC-V GPU) is a work in progress.

## Install

### Prerequisites

- Linux system
- [LLVM 14.0.1](https://github.com/llvm/llvm-project/releases/tag/llvmorg-14.0.1)
- CUDA Toolkit

Although CuPBoP does not require an NVIDIA GPU,
it needs the CUDA Toolkit to compile source programs to NVVM/LLVM IR.
The CUDA Toolkit can be installed on machines without NVIDIA GPUs.
To install the CUDA Toolkit, see <https://developer.nvidia.com/cuda-downloads>.
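
Before building, it can help to confirm that both prerequisites are visible from the shell. The following is a minimal sketch, not part of CuPBoP: the helper names `llvm_version_ok` and `cuda_path_ok` are hypothetical, the check accepts any 14.x release (an assumption; this README names 14.0.1), and it assumes the toolkit directory contains `include/` and `lib64/` and that `CUDA_PATH` points at it, as set in the installation step.

```bash
#!/usr/bin/env bash
# Sketch of a prerequisite check. Helper names are hypothetical, not CuPBoP tools.

# Succeeds if the version string belongs to the LLVM 14 series (assumption:
# any 14.x release is acceptable).
llvm_version_ok() {
  case "$1" in
    14.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Succeeds if the directory contains include/ and lib64/ (assumed toolkit layout).
cuda_path_ok() {
  [ -n "$1" ] && [ -d "$1/include" ] && [ -d "$1/lib64" ]
}

if command -v llvm-config >/dev/null 2>&1; then
  version="$(llvm-config --version)"
  if llvm_version_ok "$version"; then
    echo "LLVM $version: OK"
  else
    echo "LLVM $version: expected a 14.x release"
  fi
else
  echo "llvm-config not found on PATH"
fi

if cuda_path_ok "${CUDA_PATH:-}"; then
  echo "CUDA toolkit found at $CUDA_PATH"
else
  echo "CUDA_PATH is unset or does not look like a CUDA toolkit directory"
fi
```

The script only reports what it finds; it does not install or modify anything.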

### Installation

1. Clone from GitHub

   ```bash
   git clone --recursive https://github.com/drcut/CuPBoP
   cd CuPBoP
   export CuPBoP_PATH=`pwd`
   export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
   export CUDA_PATH=/usr/local/cuda-11.7 # set to your own location
   ```

2. Build CuPBoP

   ```bash
   mkdir build && cd build
   # Add -DDEBUG=ON to enable debugging
   cmake .. \
      -DLLVM_CONFIG_PATH=`which llvm-config` \
      -DCUDA_PATH=$CUDA_PATH
   make
   ```

3. (Optional) Use CuPBoP to execute the Hetero-Mark benchmark for verification

   ```bash
   make test
   ```

## Run Vector Addition example

In this section, we provide an example of how to use CuPBoP to execute a CUDA program.

```bash
cd examples/vecadd
# Compile CUDA source code (both host and kernel) to bitcode files
clang++ -std=c++11 vecadd.cu \
      -I../.. --cuda-path=$CUDA_PATH \
      --cuda-gpu-arch=sm_50 -L$CUDA_PATH/lib64 \
      -lcudart_static -ldl -lrt -pthread -save-temps -v  || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
      vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
      vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj  kernel.bc
llc --relocation-model=pic --filetype=obj  host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
      -I$CuPBoP_PATH/runtime/threadPool/include \
      -L$CuPBoP_PATH/build/runtime  \
      -L$CuPBoP_PATH/build/runtime/threadPool \
      host.o kernel.o \
      -I../.. -lc -lx86Runtime -lthreadPool -lpthread
# Execute
./vecadd
```
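
The two translator steps above depend on the intermediate file names that `-save-temps` produces. When adapting the commands to another source file, GPU architecture, or host, a small helper can build those names. This is only a sketch: the naming pattern is inferred from the vecadd example rather than taken from CuPBoP or clang documentation, and `kernel_bc_name` and `host_bc_name` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: derive the -save-temps bitcode file names used in the vecadd steps.
# The pattern is inferred from the example only; treat it as an assumption.

# kernel_bc_name <source.cu> <gpu-arch>   (hypothetical helper)
kernel_bc_name() {
  local stem
  stem="$(basename "$1" .cu)"
  printf '%s-cuda-nvptx64-nvidia-cuda-%s.bc\n' "$stem" "$2"
}

# host_bc_name <source.cu> <host-triple>  (hypothetical helper)
host_bc_name() {
  local stem
  stem="$(basename "$1" .cu)"
  printf '%s-host-%s.bc\n' "$stem" "$2"
}

kernel_bc_name vecadd.cu sm_50                    # vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc
host_bc_name vecadd.cu x86_64-unknown-linux-gnu   # vecadd-host-x86_64-unknown-linux-gnu.bc
```

If the names differ on your machine, list the `.bc` files left in the directory after the `clang++` step and use those instead.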

## How to contribute?

Contributions of any kind are welcome.
Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md) for more details.

## Related publications

If you refer to CuPBoP in your projects, please cite the related
papers:

- [COX: Exposing CUDA Warp-Level Functions to CPUs](https://dl.acm.org/doi/abs/10.1145/3554736)
- [CuPBoP: CUDA for Parallelized and Broad-range Processors](https://arxiv.org/abs/2206.07896)

## Contributors

- [Ruobing Han](https://drcut.github.io/)
- Jun Chen
- Bhanu Garg
- Xule Zhou
- John Lu
- [Chihyo Ahn](https://upcp.ece.gatech.edu/2021/09/01/chihyo-ahn/)
- Haotian Sheng
- Blaise Tine
- [Hyesoon Kim](https://faculty.cc.gatech.edu/~hyesoon/)