complete custom op readme

zhouheng.zheng 2023-02-06 17:23:49 +08:00 committed by Sven
parent 5e7f5cecea
commit a21214f76a
1 changed files with 98 additions and 24 deletions


@@ -5,15 +5,15 @@
- [Layout Inference {todo}](#layout-inference-todo)
- [**Customized opencl operator**](#customized-opencl-operator)
- [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
- [How to initialize custom operation](#how-to-initialize-custom-operation)
- [How to complete custom operation functions](#how-to-complete-custom-operation-functions)
- [Layout Inference {todo}](#layout-inference-todo-1)
# Extend tim-vx with customized operator
tim-vx will provide two different approaches for extending AI operators beyond the built-in ops.
* Compose a new operation from built-in ops. Example: RNNCell
* Register an OpenCL kernel as a customized operator
# User stories
As an **application developer**, I want to **be able to create a new operator with built-in ops**, so that I can **simplify the lowering from high-level frameworks (TensorFlow, PyTorch) to tim-vx**, since I don't want to rewrite the same pattern in different frameworks.
@@ -22,7 +22,6 @@ As **application developer**, I want to **be able to create my own operator with
# Design overview
![extend.tim-vx.operators](image/extend.tim-vx.operators.png)
* Green components are implemented as public API of tim-vx.
* Red components can be implemented outside of tim-vx.
* Gray components are implemented as private code inside tim-vx.
@@ -58,7 +57,7 @@ Customized kernel should be implemented with standard OpenCL 2.0; With tim-vx built
1. OpenCL kernel stream as source code;
2. Kernel enqueue configuration for global_work_size and local_work_size;
3. Scalar parameter list defined as a std::tuple;
4. Readable operator name;
TIM-VX provides two different approaches to integrate the user's operator:
1. Build from source: build the tim-vx source and the user operators' implementation as a single library;
@@ -96,27 +95,102 @@ __kernel void hswish_F32toF32(
float outputZP)
```
C++ parameter list defined by user
```c++
namespace user {
class customized_hswish : public tim::vx::CustomizeOpBase {
  using param_types = std::tuple<float/*inputScale*/, float/*inputTail*/, float/*outputScale*/, float/*outputZP*/>;
  customized_hswish(std::shared_ptr<tim::vx::Graph> g, const param_types& params/* any other parameter required by c++ code, not relevant to cl kernel*/) {
  }

  auto clone(std::shared_ptr<tim::vx::Graph> g) {
    return g->CreateOperation<user::customized_hswish>(g, this->params/*others*/);
  }
};
}
```
### How to initialize custom operation
The custom operation class can be defined as:
```c++
class CustomOpClass : public CustomOpBase {
 public:
  CustomOpClass(Graph* graph, ParamTuple tuple_list, uint32_t input_num, uint32_t output_num)
      : CustomOpBase(graph, input_num, output_num, CustomOpClass::kernel_id_, CustomOpClass::kernel_name_, .../* any other parameter required by c++ code, not relevant to cl kernel */) {
    tuple_list_.swap(tuple_list);
    param_transform(tuple_list_, param_list_);
    kernel_resource_ = "...";
  }

 protected:
  ParamTuple tuple_list_;
  static const char* kernel_name_;
  static int32_t kernel_id_;
};
```
1. ParamTuple tuple_list_: the tuple of scalar parameters in the CL kernel signature. The param_transform() function is provided to transform tuple_list_ into param_list_.
2. uint32_t input_num/output_num: the number of inputs/outputs of the kernel operation.
3. static const char* kernel_name_: the OpenCL kernel name defined by the user, which must be unique.
4. static int32_t kernel_id_: the OpenCL kernel id, defined as
```c++
int32_t CustomOpClass::kernel_id_ = -1 * (++gobal_kernel_id_);
```
5. const char* kernel_resource_: the OpenCL kernel source, which should be defined in the initialization function of the custom op class. It can contain multiple functions adapted to several situations. For example:
```c++
kernel_resource_ = "__kernel void hswish_BF16toBF16(\n\
__read_only image2d_array_t input,\n\
__write_only image2d_array_t output,\n\
float beta\n\
)\n\
{\n\
/* kernel function body */\n\
}\n\
\n\
__kernel void hswish_F32toF32(\n\
__read_only image2d_array_t input,\n\
__write_only image2d_array_t output,\n\
float inputScale,\n\
float inputTail,\n\
float outputScale,\n\
float outputZP\n\
)\n\
{\n\
/* kernel function body */\n\
}\n";
```
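The `kernel_id_` formula and the `kernel_resource_` string above can be sketched in isolation. This is a minimal illustration, not TIM-VX code: `sketch::global_kernel_id` and `sketch::next_kernel_id` are hypothetical stand-ins for the counter used in the formula, and the kernel body stays elided as in the text. It also shows a C++11 raw string literal as an alternative to ending every source line with `\n\`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace sketch {

// Hypothetical stand-in for the global counter used in the kernel_id_ formula.
inline int32_t& global_kernel_id() {
  static int32_t counter = 0;
  return counter;
}

// Each call yields a new, distinct negative id (-1, -2, -3, ...),
// following the "-1 * (++counter)" pattern shown above.
inline int32_t next_kernel_id() {
  const int32_t id = -1 * (++global_kernel_id());
  assert(id < 0);  // ids produced by this sketch are always negative
  return id;
}

// Kernel source held in a raw string literal, so newlines and quotes
// need no escaping. The kernel body is intentionally left elided.
inline const char* hswish_kernel_resource() {
  return R"CLC(
__kernel void hswish_F32toF32(
    __read_only  image2d_array_t input,
    __write_only image2d_array_t output,
    float inputScale,
    float inputTail,
    float outputScale,
    float outputZP)
{
    /* kernel function body */
}
)CLC";
}

}  // namespace sketch
```

In a custom op class, the result of `sketch::hswish_kernel_resource()` could be assigned to `kernel_resource_` in the constructor, assuming `kernel_resource_` accepts a `const char*` as item 5 states.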
### How to complete custom operation functions
1. SetupShapeInfor: the function that sets the output tensor shape.
```c++
void SetupShapeInfor() override {
outputs_size_[0].push_back(...);
...
}
```
2. SetupParams: the function that selects the kernel and sets the build options. func_name_ is the name of the function selected from those provided in kernel_resource_; it determines which kernel function is applied. build_option holds the compiler options used when compiling the custom op source.
```c++
void SetupParams(
    std::vector<tim::vx::DataType> input_types,
    std::string& build_option) override {
  if (...) {
    func_name_ = "..."/* MUST be provided in kernel_resource_ */;
    build_option = "..."/* compile parameters */;
  } else {
    ...
  }
}
```
3. SetupEnqueue: the function that sets the kernel local size and global size.
```c++
void SetupEnqueue(uint32_t& dim, std::vector<size_t>& global_size,
std::vector<size_t>& local_size) {
dim = .../*kernel dim*/;
local_size[0] = .../*kernel local size*/;
global_size[0] = .../*kernel global size*/;
}
```
local_size and global_size correspond to local_work_size and global_work_size in **clEnqueueNDRangeKernel** in standard OpenCL.
Some tips for work_size:
HWThreadCount = 4
4. Clone: the function that clones the operation.
```c++
std::shared_ptr<tim::vx::Operation> Clone(
std::shared_ptr<tim::vx::Graph>& graph) const override{
return graph->CreateOperation<user::custom_operation>(graph,this->params/*others*/);
}
```
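As an illustration of what a SetupEnqueue body might compute, the sketch below fills a 2D configuration and rounds each global size up to a multiple of the local size. OpenCL versions before 2.0 require global_work_size to be evenly divisible by local_work_size, so rounding up is a common pattern; whether TIM-VX needs it is an assumption here. The names `sketch::align_up` and `sketch::setup_enqueue_2d` and the local sizes passed in are hypothetical and not part of TIM-VX.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Rounds `value` up to the next multiple of `multiple`.
inline std::size_t align_up(std::size_t value, std::size_t multiple) {
  assert(multiple > 0);
  return (value + multiple - 1) / multiple * multiple;
}

// Fills a 2D enqueue configuration for a width x height output.
// `local_w` and `local_h` are hypothetical tuning values chosen by the caller.
inline void setup_enqueue_2d(std::size_t width, std::size_t height,
                             std::size_t local_w, std::size_t local_h,
                             std::uint32_t& dim,
                             std::vector<std::size_t>& global_size,
                             std::vector<std::size_t>& local_size) {
  dim = 2;
  local_size.assign({local_w, local_h});
  // Each global size becomes the smallest multiple of the local size
  // that still covers the whole output.
  global_size.assign({align_up(width, local_w), align_up(height, local_h)});
}

}  // namespace sketch
```

Because the rounded-up global size can exceed the real output size, a kernel launched this way would need to skip work-items whose coordinates fall outside the output.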
### Layout Inference
So far this feature is not supported. Users should handle the layout transform carefully themselves.
TODO: vsi will rework the framework so that any customized op can work properly in layout transform.