
CuPBoP: CUDA for Parallelized and Broad-range Processors

Introduction

CuPBoP is a framework that supports executing unmodified CUDA source code on non-NVIDIA devices. CuPBoP currently supports several CPU backends, including x86, AArch64, and RISC-V. Support for the RISC-V GPU Vortex is a work in progress.

Install

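Prerequisites

The commands below use git, wget, tar, CMake, make, g++, and LLVM tools (clang++, llc, llvm-config).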

Installation

  1. Clone from GitHub and set up the environment variables

    git clone --recursive https://github.com/drcut/CuPBoP
    cd CuPBoP
    export CuPBoP_PATH=$(pwd)
    export LD_LIBRARY_PATH=$CuPBoP_PATH/build/runtime:$CuPBoP_PATH/build/runtime/threadPool:$LD_LIBRARY_PATH
    
  2. CuPBoP relies on CUDA data structures, so download the CUDA header files and copy them into the runtime include directory

    wget -O cuda-header.tar.gz 'https://www.dropbox.com/s/r18io0zu3idke5p/cuda-header.tar.gz?dl=1'
    tar -xzf cuda-header.tar.gz
    cp -r include/* runtime/threadPool/include/
    
  3. Download the remaining CUDA 10.1 files, which are also required to compile CUDA source code to LLVM IR

    wget -O cuda-10.1.tar.gz 'https://www.dropbox.com/s/4pckqsjnl920gpn/cuda-10.1.tar.gz?dl=1'
    tar -xzf cuda-10.1.tar.gz
    
  4. Build CuPBoP

    mkdir build && cd build
    cmake .. -DLLVM_CONFIG_PATH=$(which llvm-config) # path to the llvm-config of your LLVM installation
    make
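
Before running the example, a quick sanity check can catch a missing download or an unset variable. The function below is a hypothetical helper, not part of CuPBoP: `cupbop_check_install` is an illustrative name, and it only checks the paths that the installation steps and the vecadd example refer to (`runtime/threadPool/include`, `cuda-10.1/lib64`, the two translators under `build/compilation`, and the two `LD_LIBRARY_PATH` entries from step 1). It deliberately does not check runtime library file names, since those are not spelled out here.

```shell
#!/bin/sh
# Hypothetical post-install check (not part of CuPBoP). Prints every
# missing piece to stderr and returns 1 if anything is missing.
cupbop_check_install() {
  if [ -z "${CuPBoP_PATH:-}" ]; then
    echo "CuPBoP_PATH is not set (see step 1)" >&2
    return 1
  fi
  status=0

  # Header and CUDA directories populated by steps 2 and 3.
  for d in "$CuPBoP_PATH/runtime/threadPool/include" \
           "$CuPBoP_PATH/cuda-10.1/lib64"; do
    if [ ! -d "$d" ]; then
      echo "missing directory: $d" >&2
      status=1
    fi
  done

  # Translators built in step 4 and used by the vecadd example.
  for t in kernelTranslator hostTranslator; do
    if [ ! -x "$CuPBoP_PATH/build/compilation/$t" ]; then
      echo "missing executable: $CuPBoP_PATH/build/compilation/$t" >&2
      status=1
    fi
  done

  # Runtime directories that step 1 prepends to LD_LIBRARY_PATH.
  for d in "$CuPBoP_PATH/build/runtime" \
           "$CuPBoP_PATH/build/runtime/threadPool"; do
    case ":${LD_LIBRARY_PATH:-}:" in
      *":$d:"*) ;;
      *) echo "not on LD_LIBRARY_PATH: $d" >&2; status=1 ;;
    esac
  done

  return "$status"
}
```

Run it in the same shell where you exported the variables in step 1, for example `cupbop_check_install && echo "CuPBoP looks ready"`.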
    

Run Vector Addition example

cd examples/vecadd
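# Compile CUDA source code (both host and kernel) to bitcode files.
# -save-temps keeps the intermediate .bc files; `|| true` ignores a
# non-zero exit status, since only those .bc files are used below.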
clang++ -std=c++11 vecadd.cu \
      -I../.. --cuda-path=$CuPBoP_PATH/cuda-10.1 \
      --cuda-gpu-arch=sm_50 -L$CuPBoP_PATH/cuda-10.1/lib64 \
      -lcudart_static -ldl -lrt -pthread -save-temps -v  || true
# Apply compilation transformations on the kernel bitcode file
$CuPBoP_PATH/build/compilation/kernelTranslator \
      vecadd-cuda-nvptx64-nvidia-cuda-sm_50.bc kernel.bc
# Apply compilation transformations on the host bitcode file
$CuPBoP_PATH/build/compilation/hostTranslator \
      vecadd-host-x86_64-unknown-linux-gnu.bc host.bc
# Generate object files
llc --relocation-model=pic --filetype=obj  kernel.bc
llc --relocation-model=pic --filetype=obj  host.bc
# Link with runtime libraries and generate the executable file
g++ -o vecadd -fPIC -no-pie \
      -I$CuPBoP_PATH/runtime/threadPool/include \
      -L$CuPBoP_PATH/build/runtime  \
      -L$CuPBoP_PATH/build/runtime/threadPool \
      host.o kernel.o \
      -I../.. -lpthread -lc -lx86Runtime -lthreadPool
# Execute
./vecadd
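
The same pipeline can be wrapped in shell functions for other single-file CUDA programs. This is a sketch under stated assumptions, not a script shipped with CuPBoP: `cupbop_bitcode_names` and `cupbop_build` are hypothetical names, the bitcode file names follow the `<name>-cuda-nvptx64-nvidia-cuda-<arch>.bc` and `<name>-host-<triple>.bc` pattern visible in the commands above, and the default host triple `x86_64-unknown-linux-gnu` is copied from the example. Your clang may be configured with a different triple (`clang++ -dumpmachine` prints it). Like the example, it assumes `-I../..` resolves from the current directory and links against `x86Runtime`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CuPBoP) that wrap the vecadd steps above.

# Print the kernel and host bitcode names that `clang++ -save-temps`
# leaves behind, following the naming seen in the vecadd example.
#   $1  CUDA source file, e.g. vecadd.cu
#   $2  value passed to --cuda-gpu-arch (default: sm_50)
#   $3  host target triple (default: x86_64-unknown-linux-gnu)
cupbop_bitcode_names() {
  base=$(basename "$1" .cu)
  arch=${2:-sm_50}
  triple=${3:-x86_64-unknown-linux-gnu}
  echo "${base}-cuda-nvptx64-nvidia-cuda-${arch}.bc"  # kernel (device) bitcode
  echo "${base}-host-${triple}.bc"                     # host bitcode
}

# Run the full pipeline for one CUDA file and produce an executable
# named after it. Requires CuPBoP_PATH from the installation steps.
cupbop_build() {
  src=$1
  arch=${2:-sm_50}
  triple=${3:-x86_64-unknown-linux-gnu}
  exe=$(basename "$src" .cu)
  kernel_bc=$(cupbop_bitcode_names "$src" "$arch" "$triple" | sed -n 1p)
  host_bc=$(cupbop_bitcode_names "$src" "$arch" "$triple" | sed -n 2p)

  # As in the example, a non-zero exit here is tolerated; only the
  # intermediate .bc files kept by -save-temps are used afterwards.
  clang++ -std=c++11 "$src" -I../.. \
    --cuda-path="$CuPBoP_PATH/cuda-10.1" --cuda-gpu-arch="$arch" \
    -L"$CuPBoP_PATH/cuda-10.1/lib64" \
    -lcudart_static -ldl -lrt -pthread -save-temps || true

  # Stop early if the expected bitcode files were not produced
  # (for example, because clang uses a different host triple).
  for f in "$kernel_bc" "$host_bc"; do
    if [ ! -f "$f" ]; then
      echo "missing $f" >&2
      return 1
    fi
  done

  # Translate, generate object files, and link, stopping at the first failure.
  "$CuPBoP_PATH/build/compilation/kernelTranslator" "$kernel_bc" kernel.bc &&
  "$CuPBoP_PATH/build/compilation/hostTranslator" "$host_bc" host.bc &&
  llc --relocation-model=pic --filetype=obj kernel.bc &&
  llc --relocation-model=pic --filetype=obj host.bc &&
  g++ -o "$exe" -fPIC -no-pie \
    -L"$CuPBoP_PATH/build/runtime" \
    -L"$CuPBoP_PATH/build/runtime/threadPool" \
    host.o kernel.o -lpthread -lc -lx86Runtime -lthreadPool
}
```

For the example above this would be `cd examples/vecadd && cupbop_build vecadd.cu && ./vecadd`. Deriving the file names in one place makes a triple mismatch show up as a clear "missing" message instead of a translator error.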

How to contribute?

Contributions of any kind are welcome. Please refer to CONTRIBUTING.md for more details.

Publications

  • COX: Exposing CUDA Warp-Level Functions to CPUs. ACM Transactions on Architecture and Code Optimization.
  • CuPBoP: CUDA for Parallelized and Broad-range Processors. arXiv preprint.

Contributors