CuPBoP: Cuda for Parallelized and Broad-range Processors

Introduction

CuPBoP is a framework that supports executing unmodified CUDA source code on non-NVIDIA devices. Currently, CuPBoP supports several CPU backends, including x86, AArch64, and RISC-V. Support for Vortex (a RISC-V GPU) is a work in progress.

Install

Prerequisites

Linux system
LLVM 14.0.1
CUDA Toolkit

Although CuPBoP does not require NVIDIA GPUs, it needs the CUDA Toolkit to compile the source programs to NVVM/LLVM IR. The CUDA Toolkit can be installed on machines without NVIDIA GPUs. To obtain it, please refer to https://developer.nvidia.com/cuda-downloads.
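
The prerequisites above can be checked before building. The following is a minimal sketch, not part of CuPBoP: the function names `llvm_major_is_14` and `check_prereqs` are hypothetical, the fallback directory `/usr/local/cuda` is an assumed default, and treating any 14.x release as acceptable is an assumption (this README only names 14.0.1).

```shell
#!/bin/sh
# Hypothetical helper (not part of CuPBoP): report which prerequisites are visible.

# Succeeds when a version string such as "14.0.1" has major version 14.
# Assumption: any 14.x release is acceptable; the README only names 14.0.1.
llvm_major_is_14() {
    case "$1" in
        14.*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Prints one line per prerequisite and returns non-zero if any needs attention.
check_prereqs() {
    cuda_dir="${CUDA_PATH:-/usr/local/cuda}"   # assumed default location
    problems=0

    if [ "$(uname -s)" = "Linux" ]; then
        echo "ok: Linux system"
    else
        echo "missing: Linux system"
        problems=1
    fi

    if command -v llvm-config >/dev/null 2>&1; then
        llvm_version="$(llvm-config --version)"
        if llvm_major_is_14 "$llvm_version"; then
            echo "ok: LLVM $llvm_version"
        else
            echo "mismatch: LLVM $llvm_version (expected 14.x)"
            problems=1
        fi
    else
        echo "missing: llvm-config not on PATH"
        problems=1
    fi

    if [ -d "$cuda_dir" ]; then
        echo "ok: CUDA Toolkit directory $cuda_dir"
    else
        echo "missing: CUDA Toolkit directory $cuda_dir"
        problems=1
    fi

    return "$problems"
}

check_prereqs || echo "Some prerequisites need attention."
```

The check only looks for a directory rather than a working GPU driver, which matches the point that no NVIDIA GPU is required.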

Installation

Clone from GitHub

git clone --recursive https://github.com/drcut/CuPBoP
cd CuPBoP
export CuPBoP_PATH=`pwd`
export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
export CUDA_PATH=/usr/local/cuda-11.7 # set to your own location

Build CuPBoP

mkdir build && cd build
# Set -DDEBUG=ON for debugging
cmake .. \
   -DLLVM_CONFIG_PATH=`which llvm-config` \
   -DCUDA_PATH=$CUDA_PATH
make
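
After `make` finishes, it can help to confirm that the build produced the tools and libraries used later in this README. The sketch below is an illustration only: `missing_artifacts` is a hypothetical helper, and the shared-library file names `libx86Runtime.so` and `libthreadPool.so` are assumptions inferred from the `-lx86Runtime -lthreadPool` link flags and the `LD_LIBRARY_PATH` setting, not names confirmed by the build system.

```shell
#!/bin/sh
# Hypothetical helper (not part of CuPBoP): list expected build outputs that are absent.
# Usage: missing_artifacts <build-dir>
missing_artifacts() {
    build_dir="$1"
    # The two translators appear in the vecadd steps of this README.
    # The .so names are assumptions derived from the linker flags.
    for rel in \
        compilation/kernelTranslator \
        compilation/hostTranslator \
        runtime/libx86Runtime.so \
        runtime/threadPool/libthreadPool.so
    do
        [ -e "$build_dir/$rel" ] || echo "$rel"
    done
}

# Prints nothing when every expected file is present.
missing_artifacts "${CuPBoP_PATH:-.}/build"
```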

Run the vector addition example

cd examples/vecadd
# Compile CUDA source code (both host and kernel) to bitcode files
clang++ -std=c++11 vecadd.cu \
      -I../.. --cuda-path=$CUDA_PATH \
      --cuda-gpu-arch=sm_50 -L$CUDA_PATH/lib64 \
      -lcudart_static -ldl -lrt -pthread -save-temps -v  || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
      vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
      vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj  kernel.bc
llc --relocation-model=pic --filetype=obj  host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
      -I$CuPBoP_PATH/runtime/threadPool/include \
      -L$CuPBoP_PATH/build/runtime  \
      -L$CuPBoP_PATH/build/runtime/threadPool \
      host.o kernel.o \
      -I../.. -lc -lx86Runtime -lthreadPool -lpthread
# Execute
./vecadd
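
To adapt the steps above to another CUDA source file, the intermediate bitcode names have to change with the source name and GPU architecture. The sketch below derives them and prints the translation commands without running them. The helper names are hypothetical, and the naming pattern is an assumption generalized from the `vecadd` file names shown above; check the files that `-save-temps` actually produces for your program.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CuPBoP): derive the intermediate file
# names used in the vecadd steps, assuming the same pattern for other inputs.

# kernel_bc_name <basename> <gpu-arch>, e.g. "vecadd" "sm_50"
kernel_bc_name() {
    echo "$1-cuda-nvptx64-nvidia-cuda-$2.bc"
}

# host_bc_name <basename> <host-triple>, e.g. "vecadd" "x86_64-unknown-linux-gnu"
host_bc_name() {
    echo "$1-host-$2.bc"
}

# Print (do not run) the translation and code generation commands.
# Defaults mirror the vecadd example: sm_50 and an x86_64 Linux host.
print_translate_steps() {
    name="$1"
    arch="${2:-sm_50}"
    triple="${3:-x86_64-unknown-linux-gnu}"
    echo '$CuPBoP_PATH/build/compilation/kernelTranslator' \
        "$(kernel_bc_name "$name" "$arch")" kernel.bc
    echo '$CuPBoP_PATH/build/compilation/hostTranslator' \
        "$(host_bc_name "$name" "$triple")" host.bc
    echo "llc --relocation-model=pic --filetype=obj kernel.bc"
    echo "llc --relocation-model=pic --filetype=obj host.bc"
}

print_translate_steps vecadd
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine where CuPBoP has not been built yet.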

How to contribute?

Contributions of any kind are welcome. Please refer to CONTRIBUTING.md for more details.

Related publications

COX: Exposing CUDA Warp-Level Functions to CPUs. ACM Transactions on Architecture and Code Optimization.
CuPBoP: CUDA for Parallelized and Broad-range Processors. arXiv preprint.

Contributors

Ruobing Han
Jun Chen
Bhanu Garg
Xule Zhou
John Lu
Chihyo Ahn
Haotian Sheng
Blaise Tine
Hyesoon Kim