Add customized operator document (#323)

Signed-off-by: ZhangXiang <Xiang.Zhang@verisilicon.com>
Sven 2022-03-18 10:08:27 +08:00 committed by GitHub
parent aaaeda1846
commit 6412bd4ea5
4 changed files with 217 additions and 0 deletions


@ -28,6 +28,9 @@ Feel free to raise a github issue if you wish to add TIM-VX for other frameworks
![TIM-VX Architecture](docs/image/timvx_overview.svg)
## Technical documents
* [Add customized operator](docs/customized_op.md)
# Get started
## Build and Run

122
docs/customized_op.md Normal file

@ -0,0 +1,122 @@
- [Extend tim-vx with customized operator](#extend-tim-vx-with-customized-operator)
- [User stories](#user-stories)
- [Design overview](#design-overview)
- [**Composed operator**](#composed-operator)
- [Layout Inference {todo}](#layout-inference-todo)
- [**Customized opencl operator**](#customized-opencl-operator)
- [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
- [How to config global_work_size and local_work_size](#how-to-config-global_work_size-and-local_work_size)
- [Layout Inference {todo}](#layout-inference-todo-1)
# Extend tim-vx with customized operator
tim-vx provides two different approaches to extend AI operators beyond the built-in ops:
* Compose a new operator from built-in ops. Example: RNNCell
* Register an OpenCL kernel as a customized operator
# User stories
As an **application developer**, I want to **be able to create a new operator from built-in ops**, so that I can **simplify the lowering from high-level frameworks (TensorFlow, PyTorch) to tim-vx**, since I don't want to rewrite the same pattern for each framework.
As an **application developer**, I want to **be able to create my own operator with a standard OpenCL kernel**, so that I can **support novel operators not present in tim-vx**.
# Design overview
![extend.tim-vx.operators](image/extend.tim-vx.operators.png)
* Green components are implemented as public APIs of tim-vx.
* Red components can be implemented outside of tim-vx.
* Gray components are implemented as private code inside tim-vx.
## **Composed operator**
If an operator can be composed from built-in operators, as RNNCell is actually built from FullyConnected, Tanh, and DataConvert layers,
developers can add their own operator implementation before VSI introduces a high-performance built-in op.
[Implementation reference of RNNCell](https://github.com/VeriSilicon/TIM-VX/blob/main/src/tim/vx/ops/rnn_cell.cc)
**Key notes for RNNCell**:
In the constructor of RNNCellImpl, the internal operators - fc/tanh/dataconvert - are created without any inner connections.
The inner connections are built up inside BindInput() and BindOutput().
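The snippet below is only a minimal sketch of this pattern, not the real RNNCellImpl (see the linked source for that): the header paths, the `tim::vx::ops::FullyConnected`/`Tanh` constructor arguments, and the way the intermediate tensor is passed in are assumptions made for illustration.
```c++
#include <cstdint>
#include <memory>

#include "tim/vx/graph.h"   // header names assumed; adjust to your TIM-VX checkout
#include "tim/vx/ops.h"
#include "tim/vx/tensor.h"

// Sketch of a composed operator: internal ops are created in the constructor,
// the inner connections are only built while binding inputs/outputs.
class SimpleRnnCell {
 public:
  SimpleRnnCell(std::shared_ptr<tim::vx::Graph> graph, uint32_t units)
      : graph_(graph) {
    fc_   = graph_->CreateOperation<tim::vx::ops::FullyConnected>(0, units);
    tanh_ = graph_->CreateOperation<tim::vx::ops::Tanh>();
  }

  // `fc_out` is an intermediate tensor created by the caller; it connects fc -> tanh.
  SimpleRnnCell& BindInput(const std::shared_ptr<tim::vx::Tensor>& input,
                           const std::shared_ptr<tim::vx::Tensor>& fc_out) {
    fc_->BindInput(input);
    fc_->BindOutput(fc_out);
    tanh_->BindInput(fc_out);
    return *this;
  }

  SimpleRnnCell& BindOutput(const std::shared_ptr<tim::vx::Tensor>& output) {
    tanh_->BindOutput(output);
    return *this;
  }

 private:
  std::shared_ptr<tim::vx::Graph> graph_;
  std::shared_ptr<tim::vx::Operation> fc_;
  std::shared_ptr<tim::vx::Operation> tanh_;
};
```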
### Layout Inference {todo}
A composed operator is internally just a subgraph of tim-vx built-in operators, so it should be easy to extend the existing layout inference for built-in operators to composed operators: simply run layout inference inside the subgraph.
```c++
void ComposedOp::OnInputs(std::vector<std::shared_ptr<vx::Tensor>>& next_tensors) {
  for (auto op : op_->OpsInSubgraph()) {
    auto cloned = handleLayoutInference(new_graph, op);
  }
}
```
## **Customized opencl operator**
Customized kernels should be implemented in standard OpenCL 2.0. With the tim-vx built-in infrastructure, users can inject their operator with:
1. The OpenCL kernel stream as source code;
2. The kernel enqueue configuration for global_work_size and local_work_size;
3. A scalar parameter list defined as a std::tuple;
4. A readable operator name.
TIM-VX provides two different approaches to integrate a user's operator:
1. Build from source: build the tim-vx sources and the user's operator implementation as a single library;
2. Build from SDK: tim-vx is prebuilt as a standalone library plus a set of standard headers; the user builds the operator implementation and links it with tim-vx.
From the tim-vx API point of view, the customized operator is registered at graph level; the registration takes effect automatically the first time an instance of the customized operator is created. With this approach, users can easily override a built-in operator or support a new operator in a new model, as shown below.
```c++
void CreateGraphWithCustomizedOperator() {
  // create context/graph/tensor as before.
  auto conv = graph->CreateOperation<tim::vx::Conv2d>(...);
  auto post_detect = graph->CreateOperation<third_party::DetectionPostProcess>(...);
  post_detect->BindInput(...);
  post_detect->BindOutput(...);
  graph->Compile();
}
```
### How to determine parameter list in a tuple
Usually a kernel takes two different kinds of parameters: "tensor-like" and scalar. A tensor-like parameter is typically the output tensor of one operator or the input of another.
In the operator's parameter list, only the scalar parameters should be defined; "tensor-like" operands are provided via BindInput/BindOutput.
The scalar parameters **MUST** be provided at kernel registration.
Take the following hswish as an example:
CL kernel signature:
```cl
__kernel void hswish_F32toF32(
__read_only image2d_array_t input,
__write_only image2d_array_t output,
float inputScale,
float inputTail,
float outputScale,
float outputZP)
```
C++ parameter list defined by the user:
```c++
namespace user {
class customized_hswish : public tim::vx::CustomizeOpBase {
 public:
  using param_types = std::tuple<float /*inputScale*/, float /*inputTail*/,
                                 float /*outputScale*/, float /*outputZP*/>;

  customized_hswish(std::shared_ptr<tim::vx::Graph> g, const param_types& params
                    /* any other parameter required by the C++ code, not relevant to the CL kernel */)
      : params_(params) {}

  auto clone(std::shared_ptr<tim::vx::Graph> g) {
    return g->CreateOperation<user::customized_hswish>(g, params_ /*others*/);
  }

 private:
  param_types params_;
};
}  // namespace user
```
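A hypothetical creation call for this operator could then look like the following; the tuple values and tensor names are placeholders, and whether `CreateOperation` forwards the graph pointer like this depends on the actual `CustomizeOpBase` contract in the TIM-VX headers.
```c++
// Hypothetical usage sketch (placeholder values and tensor names).
auto params = std::make_tuple(1.0f /*inputScale*/, 0.0f /*inputTail*/,
                              1.0f /*outputScale*/, 0.0f /*outputZP*/);
auto hswish = graph->CreateOperation<user::customized_hswish>(graph, params);
hswish->BindInput(input_tensor);    // tensor-like operands are bound, not listed in the tuple
hswish->BindOutput(output_tensor);
graph->Compile();
```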
### How to config global_work_size and local_work_size
This serves the same purpose as **clEnqueueNDRangeKernel** in standard OpenCL.
Some tips for choosing the work sizes (see the sketch below):
HWThreadCount = 4
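As an illustration only (this helper is not part of the TIM-VX API), a common OpenCL-style heuristic is to derive global_work_size from the output tensor shape and round it up to a multiple of local_work_size, with local_work_size chosen around the hardware thread count mentioned above:
```c++
#include <array>
#include <cstddef>

// Illustrative helper (not TIM-VX API): round the global work size up to a
// multiple of the local work size, as many OpenCL devices require.
std::array<size_t, 3> AlignGlobalWorkSize(const std::array<size_t, 3>& out_shape,
                                          const std::array<size_t, 3>& local_ws) {
  std::array<size_t, 3> global_ws{};
  for (size_t i = 0; i < 3; ++i) {
    global_ws[i] = (out_shape[i] + local_ws[i] - 1) / local_ws[i] * local_ws[i];
  }
  return global_ws;
}

// Example: an output of 640x480x3 with local_work_size {4, 4, 1}
// (chosen around HWThreadCount = 4) yields global_work_size {640, 480, 3}.
```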
### Layout Inference {todo}
So far this feature is not supported; users should take care of the layout transform carefully themselves (one possible workaround is sketched below).
TODO: VSI will rework the framework so that any customized op can work properly with layout transform.
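For example, one hypothetical workaround, assuming the surrounding graph has been transformed to a layout the customized op does not expect, is to bracket the op with explicit `tim::vx::ops::Transpose` operations; the permutation values below are placeholders and the Transpose constructor argument is assumed from the public op headers.
```c++
// Hypothetical workaround: bracket the customized op with explicit transposes
// so it always sees the layout its OpenCL kernel was written for.
std::vector<uint32_t> to_custom_layout   = {1, 2, 0};  // placeholder permutation
std::vector<uint32_t> from_custom_layout = {2, 0, 1};  // inverse permutation

auto pre  = graph->CreateOperation<tim::vx::ops::Transpose>(to_custom_layout);
auto post = graph->CreateOperation<tim::vx::ops::Transpose>(from_custom_layout);

pre->BindInput(graph_tensor);        // tensor in the graph's transformed layout
pre->BindOutput(custom_in);          // intermediate tensor in the op's expected layout
custom_op->BindInput(custom_in);
custom_op->BindOutput(custom_out);
post->BindInput(custom_out);
post->BindOutput(graph_out);         // back to the graph's layout
```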


@ -0,0 +1,92 @@
<diagram program="umletino" version="14.4.0-SNAPSHOT"><zoom_level>10</zoom_level><help_text>Space for diagram notes</help_text><element><id>UMLClass</id><coordinates><x>190</x><y>69</y><w>310</w><h>60</h></coordinates><panel_attributes>&lt;&lt;interface&gt;&gt;
tim::vx::Operation
bg=green
--
BindInput(tensor: const shared_ptr&lt;Tensor&gt;): Operation&amp;
</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>590</x><y>69</y><w>130</w><h>60</h></coordinates><panel_attributes>&lt;&lt;interface&gt;&gt;
OpImpl
bg=gray
--
graph_ : GraphImpl*
</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>70</x><y>209</y><w>320</w><h>60</h></coordinates><panel_attributes>tim::vx::DirectMapOp
bg=green
--
DirectMapOp(graph: Graph*, kind: uint32_t, in_cnt: int =0,
out_cnt: int=0, layout : DataLayout = DataLayout::ANY)
</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>220</x><y>118</y><w>30</w><h>110</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>10;10;10;90</additional_attributes></element><element><id>Relation</id><coordinates><x>490</x><y>78</y><w>120</w><h>50</h></coordinates><panel_attributes>lt=&lt;&lt;&lt;&lt;&lt;-
m1=1
m2=1
</panel_attributes><additional_attributes>10;20;100;20</additional_attributes></element><element><id>UMLClass</id><coordinates><x>0</x><y>324</y><w>180</w><h>100</h></coordinates><panel_attributes>vx.builtin.ops
bg=green
--
{innerclass
Conv1d
innerclass}
{innerclass
Conv2d
innerclass}
{innerclass
Conv3d
innerclass}</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>80</x><y>258</y><w>140</w><h>80</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>120;10;120;40;10;40;10;60</additional_attributes></element><element><id>UMLClass</id><coordinates><x>670</x><y>209</y><w>180</w><h>60</h></coordinates><panel_attributes>DirectMapOpImpl
"VxBuiltInOpImpl"
bg=gray
--
node_: vsi_nn_node_t* = nullptr
</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>640</x><y>123</y><w>140</w><h>100</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>10;10;120;80</additional_attributes></element><element><id>UMLClass</id><coordinates><x>476</x><y>308</y><w>170</w><h>60</h></coordinates><panel_attributes>RNNCellImpl
bg=gray
--
fc0_ : shared_ptr&lt;Operation&gt;
fc1_ : shared_ptr&lt;Operation&gt;</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>246</x><y>308</y><w>160</w><h>60</h></coordinates><panel_attributes>RNNCell
bg=green
--
RNNCell(graph: Graph,
act : ActivationType)</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>336</x><y>118</y><w>30</w><h>210</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>10;10;10;190</additional_attributes></element><element><id>Relation</id><coordinates><x>376</x><y>238</y><w>310</w><h>40</h></coordinates><panel_attributes>lt=&lt;&lt;&lt;&lt;&lt;-</panel_attributes><additional_attributes>10;10;290;11</additional_attributes></element><element><id>Relation</id><coordinates><x>396</x><y>318</y><w>100</w><h>30</h></coordinates><panel_attributes>lt=&lt;&lt;&lt;&lt;&lt;-</panel_attributes><additional_attributes>10;10;80;10</additional_attributes></element><element><id>Relation</id><coordinates><x>210</x><y>514</y><w>150</w><h>120</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>127;10;10;50;10;100</additional_attributes></element><element><id>Relation</id><coordinates><x>421</x><y>123</y><w>40</w><h>310</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>20;10;16;290</additional_attributes></element><element><id>UMLClass</id><coordinates><x>66</x><y>618</y><w>310</w><h>110</h></coordinates><panel_attributes>CustomizedOp
bg=red
{innerclass
ParamType = tuple&lt;int ...&gt;
--
innerclass}
kernel_str_ : const char*
kernel_init_fp_: vx_kernel_initialize_f
kernel_name_ : const char*
params_ : ParamType
</panel_attributes><additional_attributes></additional_attributes></element><element><id>UMLClass</id><coordinates><x>440</x><y>609</y><w>280</w><h>120</h></coordinates><panel_attributes>Access Parameter
{innerclass
PackedParam
bg=green
--
type: enum {FLOAT, INT32, ...}
data: union{float, int32, pointer}
innerclass}
{innerclass
at(p: const T&amp;, offset: int): Param
innerclass}
</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>741</x><y>263</y><w>40</w><h>160</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>20;10;16;140</additional_attributes></element><element><id>Relation</id><coordinates><x>541</x><y>123</y><w>120</w><h>210</h></coordinates><panel_attributes>lt=&lt;&lt;-</panel_attributes><additional_attributes>100;10;20;60;16;190</additional_attributes></element><element><id>Relation</id><coordinates><x>430</x><y>510</y><w>160</w><h>120</h></coordinates><panel_attributes>lt=&lt;.
m2=use</panel_attributes><additional_attributes>140;100;140;50;10;50;10;10</additional_attributes></element><element><id>UMLClass</id><coordinates><x>257</x><y>409</y><w>230</w><h>110</h></coordinates><panel_attributes>CustomOpBase
bg=green
--
Setup(in: vec&lt;tim::vx::tensor&gt;,
out: vec&lt;tim::vx::tensor&gt;): bool
--
get(offset: int):PackedParam
next(): bool
size(): size_t</panel_attributes><additional_attributes></additional_attributes></element><element><id>Relation</id><coordinates><x>476</x><y>458</y><w>180</w><h>40</h></coordinates><panel_attributes>lt=&lt;&lt;&lt;&lt;&lt;-</panel_attributes><additional_attributes>10;10;160;11</additional_attributes></element><element><id>UMLClass</id><coordinates><x>640</x><y>409</y><w>270</w><h>130</h></coordinates><panel_attributes>CustomOpBaseImpl
bg=gray
--
name_: char*
--
_op_setup(self: vsi_nn_node_t*,_
_ inputs: vsi_nn_tensor_t**,_
_ outputs: vsi_nn_tensor_t**): vsi_bool_
_op_compute(...)_
--
proc_: vsi_nn_op_proc_t*
</panel_attributes><additional_attributes></additional_attributes></element></diagram>

Binary file not shown.
