- [Layout Inference {todo}](#layout-inference-todo)
- [**Customized opencl operator**](#customized-opencl-operator)
- [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
- [How to initialize custom operation](#how-to-initialize-custom-operation)
- [How to complete custom operation functions](#how-to-complete-custom-operation-functions)
- [Layout Inference](#layout-inference)

# Extend tim-vx with customized operator

tim-vx will provide two different approaches to extend AI operators besides the built-in ops:

* Compose a new operation with built-in ops. Example: RNNCell
* Register an OpenCL kernel as a customized operator

# User stories

As an **application developer**, I want to **be able to create a new operator from built-in ops**, so that I can **simplify the lowering from a high-level framework (TensorFlow, PyTorch) to tim-vx**, since I don't want to rewrite the same pattern for different frameworks.

# Design overview

*(design overview diagram)*

* Green components are implemented as public APIs of tim-vx.
* Red components could be implemented outside of tim-vx.
* Gray components are implemented as private code inside tim-vx.

Customized kernels should be implemented in standard OpenCL 2.0. A customized operator is described by:

1. OpenCL kernel stream as source code;
2. Kernel enqueue configuration for global_work_size and local_work_size;
3. Scalar parameter list defined as a std::tuple;
4. Readable operator name;

TIM-VX provides two different approaches to integrate a user's operator:

1. Build from source: build the tim-vx source and the user operators' implementation as a single library;

```c++
void CreateGraphWithCustomizedOperator() {
  // ... graph and tensor setup elided ...
  auto post_detect = graph->CreateOperation<3rd_party::DetectionPostProcess>(...);
  post_detect->BindInput(...);
  post_detect->BindOutput(...);

  graph->Compile();
}
```

### How to determine parameter list in a tuple

Usually, a kernel takes two different kinds of parameters: "tensor-like" and scalar. The tensor-like parameters are usually output tensors of other operators or inputs to other operators.
In the operator's parameter list, only the scalar parameters should be defined; "tensor-like" operands should be provided via BindInput/BindOutput.

The scalar parameters **MUST** be provided at kernel registration.

Take the hswish kernel below as an example; its scalar parameters are the four floats that follow the input/output images:

```
__kernel void hswish_F32toF32(
    __read_only  image2d_array_t input,
    __write_only image2d_array_t output,
    float inputScale,
    float inputTail,
    float outputScale,
    float outputZP)
```
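
The scalar tail of such a signature maps, element by element and in the same order, to the std::tuple used as the operator's parameter list. A minimal sketch, assuming ParamTuple is simply an alias for the matching std::tuple (the alias name and the literal values are illustrative, not part of the TIM-VX API):

```c++
#include <tuple>

// One tuple element per scalar kernel argument, in order.
using ParamTuple = std::tuple<float /*inputScale*/, float /*inputTail*/,
                              float /*outputScale*/, float /*outputZP*/>;

// Hypothetical quantization values; real ones come from the tensor specs.
ParamTuple hswish_params{0.00784313f, 0.0f, 255.0f, 0.0f};
```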

### How to initialize custom operation

The custom operation class can be defined as:

```c++
class CustomOpClass : public CustomOpBase {
 public:
  CustomOpClass(Graph* graph, ParamTuple tuple_list, uint32_t input_num, uint32_t output_num)
      : CustomOpBase(graph, input_num, output_num, CustomOpClass::kernel_id_,
                     CustomOpClass::kernel_name_
                     /* any other parameters required by the C++ code,
                        not relevant to the CL kernel */) {
    tuple_list_.swap(tuple_list);
    param_transform(tuple_list_, param_list_);
    kernel_resource_ = "...";
  }

 protected:
  ParamTuple tuple_list_;
  static const char* kernel_name_;
  static int32_t kernel_id_;
};
```

1. ParamTuple tuple_list_: the scalar parameters of the CL kernel signature packed into a tuple; the provided param_transform() function transforms tuple_list_ into param_list_.
2. uint32_t input_num / output_num: the number of inputs and outputs of the kernel operation.
3. static const char* kernel_name_: the OpenCL kernel name defined by the user; it must be unique (a definition sketch for both static members follows this list).
4. static int32_t kernel_id_: the OpenCL kernel id, defined as

```c++
int32_t CustomOpClass::kernel_id_ = -1 * (++gobal_kernel_id_);
```

5. const char* kernel_resource_: the OpenCL kernel source; it should be registered in the custom op class's initialization function and can contain multiple kernel functions adapted to several situations. For example:

```c++
kernel_resource_ = "__kernel void hswish_BF16toBF16(\n\
    __read_only  image2d_array_t input,\n\
    __write_only image2d_array_t output,\n\
    float beta\n\
)\n\
{\n\
    /* kernel function source */\n\
}\n\
\n\
__kernel void hswish_F32toF32(\n\
    __read_only  image2d_array_t input,\n\
    __write_only image2d_array_t output,\n\
    float inputScale,\n\
    float inputTail,\n\
    float outputScale,\n\
    float outputZP\n\
)\n\
{\n\
    /* kernel function source */\n\
}\n";
```
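
Putting items 3 and 4 together, the two static members are defined once outside the class. A small sketch (the kernel name string is illustrative, and gobal_kernel_id_ is the framework-side counter referenced in item 4):

```c++
// Out-of-class definitions for the static members declared above.
const char* CustomOpClass::kernel_name_ = "customized_hswish";  // must be unique
int32_t CustomOpClass::kernel_id_ = -1 * (++gobal_kernel_id_);
```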

### How to complete custom operation functions

1. SetupShapeInfor: the function that sets the output tensor sizes.

```c++
void SetupShapeInfor() override {
  outputs_size_[0].push_back(...);
  ...
}
```
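
For an element-wise operation such as hswish, the output shape simply mirrors the input shape. A sketch with illustrative dimensions (in practice they would come from the bound input tensor):

```c++
void SetupShapeInfor() override {
  // Output 0 uses the same {width, height, channels, batch} as the input.
  outputs_size_[0].push_back(224);  // width   (illustrative)
  outputs_size_[0].push_back(224);  // height  (illustrative)
  outputs_size_[0].push_back(3);    // channels
  outputs_size_[0].push_back(1);    // batch
}
```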

2. SetupParams: the function that selects the kernel and sets the build option. func_name_ is the name of the selected kernel function provided in kernel_resource_ and determines which kernel function is applied; build_option holds the compiler options used when compiling the custom op source.

```c++
void SetupParams(
    std::vector<tim::vx::DataType> input_types,
    std::string& build_option) override {
  if (...) {
    func_name_ = "..." /* it MUST be provided in kernel_resource_ */;
    build_option = "..." /* compile parameters */;
  } else {
    ...
  }
}
```
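
For instance, with the two hswish kernels registered in kernel_resource_ above, the function could be chosen by input data type. A sketch (the empty build option is just a placeholder):

```c++
void SetupParams(std::vector<tim::vx::DataType> input_types,
                 std::string& build_option) override {
  if (input_types[0] == tim::vx::DataType::FLOAT32) {
    func_name_ = "hswish_F32toF32";    // must match a kernel in kernel_resource_
  } else {
    func_name_ = "hswish_BF16toBF16";  // must match a kernel in kernel_resource_
  }
  build_option = "";  // extra OpenCL compiler options, if any
}
```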

3. SetupEnqueue: the function that sets the kernel local size and global size.

```c++
void SetupEnqueue(uint32_t& dim, std::vector<size_t>& global_size,
                  std::vector<size_t>& local_size) {
  dim = ... /* kernel dim */;
  local_size[0] = ... /* kernel local size */;
  global_size[0] = ... /* kernel global size */;
}
```

local_size and global_size work like the corresponding arguments of **clEnqueueNDRangeKernel** in standard OpenCL.

Some tips for work_size: HWThreadCount = 4 (see the sketch below).
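
As an illustration of how that tip might be applied, global_size can be rounded up to a multiple of local_size so that every output element is covered. A sketch with assumed output dimensions and an assumed rounding policy (neither is mandated by TIM-VX):

```c++
void SetupEnqueue(uint32_t& dim, std::vector<size_t>& global_size,
                  std::vector<size_t>& local_size) {
  const size_t width = 224, height = 224, depth = 3;  // illustrative output shape

  dim = 3;                  // 3-D NDRange over the output image
  global_size.resize(dim);  // in case the vectors are not pre-sized
  local_size.resize(dim);

  local_size[0] = 8;        // e.g. 2 * HWThreadCount work-items along the row
  local_size[1] = 1;
  local_size[2] = 1;
  // Round each global size up to a multiple of the corresponding local size.
  global_size[0] = ((width + local_size[0] - 1) / local_size[0]) * local_size[0];
  global_size[1] = ((height + local_size[1] - 1) / local_size[1]) * local_size[1];
  global_size[2] = depth;
}
```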

4. Clone: the function for cloning the operation.

```c++
std::shared_ptr<tim::vx::Operation> Clone(
    std::shared_ptr<tim::vx::Graph>& graph) const override {
  return graph->CreateOperation<user::custom_operation>(graph, this->params /* others */);
}
```

### Layout Inference

So far we don't support this feature. Users should take care of the layout transform carefully.
TODO: VSI will rework the framework so that any customized op can work properly in layout transform.