Add customized operator document (#323)
Signed-off-by: ZhangXiang <Xiang.Zhang@verisilicon.com>
This commit is contained in:
parent
aaaeda1846
commit
6412bd4ea5
@ -28,6 +28,9 @@ Feel free to raise a github issue if you wish to add TIM-VX for other frameworks
## Technical documents
* [Add customized operator](docs/customized_op.md)
# Get started
## Build and Run
@ -0,0 +1,122 @@
- [Extend tim-vx with customized operator](#extend-tim-vx-with-customized-operator)
- [User stories](#user-stories)
- [Design overview](#design-overview)
  - [**Composed operator**](#composed-operator)
    - [Layout Inference {todo}](#layout-inference-todo)
  - [**Customized opencl operator**](#customized-opencl-operator)
    - [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
    - [How to config global_work_size and local_work_size](#how-to-config-global_work_size-and-local_work_size)
    - [Layout Inference {todo}](#layout-inference-todo-1)
# Extend tim-vx with customized operator

TIM-VX provides two different approaches for extending AI operators beyond the built-in ops:

* Compose a new operation from built-in ops (example: RNNCell)
* Register an OpenCL kernel as a customized operator

# User stories

As an **application developer**, I want to **be able to create new operators from built-in ops**, so that I can **simplify the lowering from high-level frameworks (TensorFlow, PyTorch) to TIM-VX**, since I don't want to rewrite the same pattern for each framework.

As an **application developer**, I want to **be able to create my own operator with a standard OpenCL kernel**, so that I can **support novel operators not present in TIM-VX**.
# Design overview

![](customized_op.png)

* Green components are implemented as public APIs of TIM-VX.
* Red components can be implemented outside of TIM-VX.
* Gray components are implemented as private code inside TIM-VX.
## **Composed operator**

If an operator can be composed from built-in operators - for example RNNCell, which is actually built from FullyConnected, Tanh, and DataConvert layers - developers can add their own operator implementation before VSI introduces a high-performance built-in op.

[Implementation reference of RNNCell](https://github.com/VeriSilicon/TIM-VX/blob/main/src/tim/vx/ops/rnn_cell.cc)

**Keynotes for RNNCell**:

In the constructor of RNNCellImpl, the internal operators - fc/tanh/dataconvert - are created without inner connections.
The inner connections are built up inside BindInput() and BindOutput().
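The deferred-wiring idea behind RNNCell can be sketched in self-contained C++. This is a toy model, not the real TIM-VX API: `Tensor`, `Op`, and `ComposedRnnCell` here are illustrative stand-ins that only show the pattern of creating inner ops up front and connecting them when the external tensors become known.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for tim-vx types; names are illustrative only.
struct Tensor { std::string name; };

struct Op {
  std::vector<std::shared_ptr<Tensor>> inputs, outputs;
  void BindInput(std::shared_ptr<Tensor> t) { inputs.push_back(t); }
  void BindOutput(std::shared_ptr<Tensor> t) { outputs.push_back(t); }
};

// A composed op creates its internal ops in the constructor, but only
// wires them together once the external tensors are known, mirroring
// the RNNCellImpl keynote above (fc feeds tanh through a hidden tensor).
class ComposedRnnCell {
 public:
  ComposedRnnCell()
      : fc_(std::make_shared<Op>()), tanh_(std::make_shared<Op>()) {}

  void BindInput(std::shared_ptr<Tensor> in) {
    fc_->BindInput(in);
    // Build the inner edge: fc output feeds tanh input.
    auto hidden = std::make_shared<Tensor>(Tensor{"fc_out"});
    fc_->BindOutput(hidden);
    tanh_->BindInput(hidden);
  }
  void BindOutput(std::shared_ptr<Tensor> out) { tanh_->BindOutput(out); }

  std::shared_ptr<Op> fc_, tanh_;
};
```

The point of the pattern is that the constructor never guesses tensor shapes or connections; all graph edges are created lazily in the bind calls.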
### Layout Inference {todo}

Since a composed operator is internally a subgraph of TIM-VX built-in operators, it should be easy to extend the original layout inference for built-in operators to composed operators - just do layout inference inside the subgraph.

```c++
void ComposedOp::OnInputs(std::vector<std::shared_ptr<vx::Tensor>> next_tensors) {
  for (auto op : op_->OpsInSubgraph()) {
    auto cloned = HandleLayoutInference(new_graph, op);
  }
}
```
## **Customized opencl operator**

Customized kernels should be implemented in standard OpenCL 2.0. With the TIM-VX built-in infrastructure, users can inject their operator with:

1. The OpenCL kernel stream as source code;
2. The kernel enqueue configuration for global_work_size and local_work_size;
3. The scalar parameter list, defined as a std::tuple;
4. A readable operator name.

TIM-VX provides two different approaches to integrate a user's operator:

1. Build from source: build the TIM-VX source and the user operator's implementation as a single library;
2. Build from SDK: TIM-VX is prebuilt as a standalone library with a set of standard headers; the user builds the operator implementation and links it against TIM-VX.

From the TIM-VX API view, the customized operator is registered at graph level; the registration takes effect automatically the first time an instance of the customized operator is created. With this approach, users can easily override a built-in operator or support a new operator in a new model.
```c++
void CreateGraphWithCustomizedOperator() {
  // Create context/graph/tensor as before.
  auto conv = graph->CreateOperation<tim::vx::ops::Conv2d>(...);
  auto post_detect = graph->CreateOperation<third_party::DetectionPostProcess>(...);
  post_detect->BindInput(...);
  post_detect->BindOutput(...);

  graph->Compile();
}
```
|
||||
### How to determine parameter list in a tuple
|
||||
Usually, kernel take two different kinds of paramter: "tensor-like" and scalar; The tensor-like parameters usually is the output-tensor from other operators or input for other operator.
|
||||
In the operator's paramter list, only scalar parameters should be defined. "tensor-like" operand should provied by bindInput/bindOutput.
|
||||
|
||||
The scalar paramters **MUST** provided at kernel registration.
|
||||
|
||||
Take following hswish as example:
|
||||
CL kernel signature:

```cl
__kernel void hswish_F32toF32(
    __read_only  image2d_array_t input,
    __write_only image2d_array_t output,
    float        inputScale,
    float        inputTail,
    float        outputScale,
    float        outputZP)
```
C++ parameter list defined by the user:

```c++
namespace user {

class customized_hswish : public tim::vx::CustomOpBase {
 public:
  using param_types = std::tuple<float /*inputScale*/, float /*inputTail*/,
                                 float /*outputScale*/, float /*outputZP*/>;

  customized_hswish(std::shared_ptr<tim::vx::Graph> g, const param_types& params
                    /* any other parameters required by the C++ code,
                       not relevant to the CL kernel */) {
  }

  auto clone(std::shared_ptr<tim::vx::Graph> g) {
    return g->CreateOperation<user::customized_hswish>(g, this->params /*others*/);
  }
};

}  // namespace user
```
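To see how a std::tuple parameter list can be flattened into the flat scalar array an OpenCL kernel expects, here is a self-contained sketch. The `PackedParam` struct and `PackParams` helper are hypothetical stand-ins (not TIM-VX API); only the std::apply pattern for walking the tuple in declaration order is the point.

```cpp
#include <tuple>
#include <vector>

// Hypothetical packed-scalar representation; TIM-VX's internal
// equivalent may differ.
struct PackedParam { float f; };

// Flatten every element of a tuple of floats into a flat vector, in
// declaration order, using std::apply and a C++17 fold expression.
template <typename Tuple>
std::vector<PackedParam> PackParams(const Tuple& params) {
  std::vector<PackedParam> out;
  std::apply([&out](auto... v) { (out.push_back(PackedParam{v}), ...); },
             params);
  return out;
}

// The hswish scalar list from the class above.
using hswish_params = std::tuple<float /*inputScale*/, float /*inputTail*/,
                                 float /*outputScale*/, float /*outputZP*/>;
```

Because the tuple fixes both the count and the order of the scalars at compile time, the packed array always lines up with the scalar arguments of the CL kernel signature.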
### How to config global_work_size and local_work_size

This is a similar feature to **clEnqueueNDRangeKernel** in standard OpenCL.

Some tips for work_size:
HWThreadCount = 4
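As with clEnqueueNDRangeKernel, a common convention is to choose a local_work_size related to the hardware thread count and round each global dimension up to a multiple of it. A minimal sketch (the helper name is hypothetical; the rounding rule is standard OpenCL practice, and the local size of 4 in the usage below follows the HWThreadCount tip above):

```cpp
#include <cstddef>

// Round a global dimension up to the nearest multiple of the local
// size, as required when enqueueing an NDRange kernel with an
// explicit local_work_size.
std::size_t AlignGlobalWorkSize(std::size_t dim, std::size_t local) {
  return ((dim + local - 1) / local) * local;
}
```

For example, a 10-wide tensor dimension with a local size of 4 yields a global size of 12; the kernel must then guard against the two out-of-range work items.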
### Layout Inference {todo}

So far we don't support this feature; users should take care of the layout transform carefully themselves.
TODO: VSI will rework the framework so that any customized op can work properly in layout transform.
@ -0,0 +1,92 @@
<diagram program="umletino" version="14.4.0-SNAPSHOT"><zoom_level>10</zoom_level><help_text>Space for diagram notes</help_text><element><id>UMLClass</id><coordinates><x>190</x><y>69</y><w>310</w><h>60</h></coordinates><panel_attributes><<interface>>
tim::vx::Operation
bg=green
--
BindInput(tensor: const shared_ptr<Tensor>): Operation&
</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>590</x><y>69</y><w>130</w><h>60</h></coordinates><panel_attributes><<interface>>
OpImpl
bg=gray
--
graph_ : GraphImpl*
</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>70</x><y>209</y><w>320</w><h>60</h></coordinates><panel_attributes>tim::vx::DirectMapOp
bg=green
--
DirectMapOp(graph: Graph*, kind: uint32_t, in_cnt: int =0,
out_cnt: int=0, layout : DataLayout = DataLayout::ANY)
</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>220</x><y>118</y><w>30</w><h>110</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>10;10;10;90</additional_attributes></element><element><id>Relation</id><coordinates><x>490</x><y>78</y><w>120</w><h>50</h></coordinates><panel_attributes>lt=<<<<<-
m1=1
m2=1
</panel_attributes><additional_attributes>10;20;100;20</additional_attributes></element><element><id>UMLClass</id><coordinates><x>0</x><y>324</y><w>180</w><h>100</h></coordinates><panel_attributes>vx.builtin.ops
bg=green
--
{innerclass
Conv1d
innerclass}
{innerclass
Conv2d
innerclass}
{innerclass
Conv3d
innerclass}</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>80</x><y>258</y><w>140</w><h>80</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>120;10;120;40;10;40;10;60</additional_attributes></element><element><id>UMLClass</id><coordinates><x>670</x><y>209</y><w>180</w><h>60</h></coordinates><panel_attributes>DirectMapOpImpl
"VxBuiltInOpImpl"
bg=gray
--
node_: vsi_nn_node_t* = nullptr
</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>640</x><y>123</y><w>140</w><h>100</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>10;10;120;80</additional_attributes></element><element><id>UMLClass</id><coordinates><x>476</x><y>308</y><w>170</w><h>60</h></coordinates><panel_attributes>RNNCellImpl
bg=gray
--
fc0_ : shared_ptr<Operation>
fc1_ : shared_ptr<Operation></panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>246</x><y>308</y><w>160</w><h>60</h></coordinates><panel_attributes>RNNCell
bg=green
--
RNNCell(graph: Graph,
act : ActivationType)</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>336</x><y>118</y><w>30</w><h>210</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>10;10;10;190</additional_attributes></element><element><id>Relation</id><coordinates><x>376</x><y>238</y><w>310</w><h>40</h></coordinates><panel_attributes>lt=<<<<<-</panel_attributes><additional_attributes>10;10;290;11</additional_attributes></element><element><id>Relation</id><coordinates><x>396</x><y>318</y><w>100</w><h>30</h></coordinates><panel_attributes>lt=<<<<<-</panel_attributes><additional_attributes>10;10;80;10</additional_attributes></element><element><id>Relation</id><coordinates><x>210</x><y>514</y><w>150</w><h>120</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>127;10;10;50;10;100</additional_attributes></element><element><id>Relation</id><coordinates><x>421</x><y>123</y><w>40</w><h>310</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>20;10;16;290</additional_attributes></element><element><id>UMLClass</id><coordinates><x>66</x><y>618</y><w>310</w><h>110</h></coordinates><panel_attributes>CustomizedOp
bg=red
{innerclass
ParamType = tuple<int ...>
--
innerclass}
kernel_str_ : const char*
kernel_init_fp_: vx_kernel_initialize_f
kernel_name_ : const char*
params_ : ParamType

</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>440</x><y>609</y><w>280</w><h>120</h></coordinates><panel_attributes>Access Parameter
{innerclass
PackedParam
bg=green
--
type: enum {FLOAT, INT32, ...}
data: union{float, int32, pointer}
innerclass}

{innerclass
at(p: const T&, offset: int): Param
innerclass}



</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>741</x><y>263</y><w>40</w><h>160</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>20;10;16;140</additional_attributes></element><element><id>Relation</id><coordinates><x>541</x><y>123</y><w>120</w><h>210</h></coordinates><panel_attributes>lt=<<-</panel_attributes><additional_attributes>100;10;20;60;16;190</additional_attributes></element><element><id>Relation</id><coordinates><x>430</x><y>510</y><w>160</w><h>120</h></coordinates><panel_attributes>lt=<.
m2=use</panel_attributes><additional_attributes>140;100;140;50;10;50;10;10</additional_attributes></element><element><id>UMLClass</id><coordinates><x>257</x><y>409</y><w>230</w><h>110</h></coordinates><panel_attributes>CustomOpBase
bg=green
--
Setup(in: vec<tim::vx::tensor>,
out: vec<tim::vx::tensor>): bool
--
get(offset: int):PackedParam
next(): bool
size(): size_t</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>476</x><y>458</y><w>180</w><h>40</h></coordinates><panel_attributes>lt=<<<<<-</panel_attributes><additional_attributes>10;10;160;11</additional_attributes></element><element><id>UMLClass</id><coordinates><x>640</x><y>409</y><w>270</w><h>130</h></coordinates><panel_attributes>CustomOpBaseImpl
bg=gray
--
name_: char*
--
_op_setup(self: vsi_nn_node_t*,_
_ inputs: vsi_nn_tensor_t**,_
_ outputs: vsi_nn_tensor_t**): vsi_bool_
_op_compute(...)_
--
proc_: vsi_nn_op_proc_t*


</panel_attributes><additional_attributes></additional_attributes></element></diagram>
Binary file not shown.
After Width: | Height: | Size: 749 KiB |