CuPBoP: Cuda for Parallelized and Broad-range Processors

Introduction

CuPBoP is a framework that supports executing unmodified CUDA source code on non-NVIDIA devices. Currently, CuPBoP supports several CPU backends, including x86, AArch64, and RISC-V. Support for Vortex (a RISC-V GPU) is a work in progress.

Install

Prerequisites

Although CuPBoP does not require NVIDIA GPUs, it needs CUDA to compile the source programs to NVVM/LLVM IRs. The CUDA toolkit can be installed on machines without NVIDIA GPUs; for installation instructions, please refer to https://developer.nvidia.com/cuda-downloads.

Installation

  1. Clone from GitHub

    git clone --recursive https://github.com/drcut/CuPBoP
    cd CuPBoP
    export CuPBoP_PATH=`pwd`
    export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
    export CUDA_PATH=/usr/local/cuda-11.7 # set to your own location
    
  2. Build CuPBoP

    mkdir build && cd build
    #set -DDEBUG=ON for debugging
    cmake .. \
       -DLLVM_CONFIG_PATH=`which llvm-config` \
       -DCUDA_PATH=$CUDA_PATH
    make
    
  3. (Optional) Use CuPBoP to execute the Hetero-Mark benchmark for verification

    make test
    

Run Vector Addition example

In this section, we provide an example of how to use CuPBoP to execute a CUDA program.
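
For reference, the vector-addition source compiled below is an ordinary, unmodified CUDA program. The following is only a minimal sketch of what such a program looks like; the actual examples/vecadd/vecadd.cu shipped with CuPBoP may differ in its details.

    // Minimal vector-addition sketch in plain CUDA (illustrative only; not the
    // exact source of examples/vecadd/vecadd.cu).
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n)
        c[i] = a[i] + b[i];
    }

    int main() {
      const int n = 1024;
      const size_t bytes = n * sizeof(float);
      float *h_a = (float *)malloc(bytes);
      float *h_b = (float *)malloc(bytes);
      float *h_c = (float *)malloc(bytes);
      for (int i = 0; i < n; ++i) {
        h_a[i] = 1.0f;
        h_b[i] = 2.0f;
      }

      float *d_a, *d_b, *d_c;
      cudaMalloc(&d_a, bytes);
      cudaMalloc(&d_b, bytes);
      cudaMalloc(&d_c, bytes);
      cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
      cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

      // Launch one thread per element.
      vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
      cudaDeviceSynchronize();

      cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
      printf("c[0] = %f (expected 3.0)\n", h_c[0]);

      cudaFree(d_a);
      cudaFree(d_b);
      cudaFree(d_c);
      free(h_a);
      free(h_b);
      free(h_c);
      return 0;
    }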

cd examples/vecadd
# Compile CUDA source code (both host and kernel) to bitcode files
# ('|| true' lets the flow continue even if this clang invocation fails to link;
#  only the bitcode files written by -save-temps are needed below)
clang++ -std=c++11 vecadd.cu \
      -I../.. --cuda-path=$CUDA_PATH \
      --cuda-gpu-arch=sm_50 -L$CUDA_PATH/lib64 \
      -lcudart_static -ldl -lrt -pthread -save-temps -v  || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
      vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
      vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj  kernel.bc
llc --relocation-model=pic --filetype=obj  host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
      -I$CuPBoP_PATH/runtime/threadPool/include \
      -L$CuPBoP_PATH/build/runtime  \
      -L$CuPBoP_PATH/build/runtime/threadPool \
      host.o kernel.o \
      -I../.. -lc -lx86Runtime -lthreadPool -lpthread
# Execute
./vecadd

How to contribute?

All kinds of contributions are welcome. Please refer to CONTRIBUTING.md for more details.

If you want to refer to CuPBoP in your projects, please cite the related papers:

Contributors

Acknowledgements

  • POCL is an open-source OpenCL implementation based on LLVM. We reuse some of its code (e.g., for applying optimizations and loading/storing LLVM IRs).
  • Hetero-Mark and the Rodinia Benchmark are benchmark suites for heterogeneous computing. CuPBoP uses them as integration tests to verify correctness.
  • moodycamel::ConcurrentQueue (https://github.com/cameron314/concurrentqueue/tree/master) is a fast multi-producer, multi-consumer lock-free concurrent queue for C++11. CuPBoP uses it as the task queue for launching and executing kernels; a sketch of this pattern follows the list.
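
The sketch below only illustrates the general producer/consumer pattern that such a queue enables; the task type, worker count, and shutdown logic are hypothetical and are not taken from CuPBoP's runtime.

    // Sketch: moodycamel::ConcurrentQueue used as a simple task queue.
    // The std::function task type and the worker loop are hypothetical;
    // they illustrate the pattern, not CuPBoP's actual runtime code.
    #include <atomic>
    #include <cstdio>
    #include <functional>
    #include <thread>
    #include <vector>

    #include "concurrentqueue.h" // header-only, from the repository linked above

    int main() {
      moodycamel::ConcurrentQueue<std::function<void()>> tasks;
      std::atomic<bool> done{false};

      // Consumers: worker threads repeatedly try to pop a task and run it.
      std::vector<std::thread> workers;
      for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&] {
          std::function<void()> task;
          while (!done.load()) {
            if (tasks.try_dequeue(task))
              task();
          }
        });
      }

      // Producer: enqueue a few tasks (e.g., per-block pieces of a kernel launch).
      for (int i = 0; i < 8; ++i)
        tasks.enqueue([i] { std::printf("running task %d\n", i); });

      // Simplified shutdown: wait until the queue looks empty, then stop workers.
      while (tasks.size_approx() > 0)
        std::this_thread::yield();
      done = true;
      for (auto &w : workers)
        w.join();
      return 0;
    }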