diff --git a/README.md b/README.md
index 1f29721..3b2030c 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,9 @@ Feel free to raise a github issue if you wish to add TIM-VX for other frameworks
 ![TIM-VX Architecture](docs/image/timvx_overview.svg)
+
+## Technical documents
+* [Add customized operator](docs/customized_op.md)
 # Get started
 ## Build and Run
diff --git a/docs/customized_op.md b/docs/customized_op.md
new file mode 100644
index 0000000..36dd7a1
--- /dev/null
+++ b/docs/customized_op.md
@@ -0,0 +1,122 @@
+- [Extend tim-vx with customized operator](#extend-tim-vx-with-customized-operator)
+- [User stories](#user-stories)
+- [Design overview](#design-overview)
+  - [**Composed operator**](#composed-operator)
+    - [Layout Inference {todo}](#layout-inference-todo)
+  - [**Customized opencl operator**](#customized-opencl-operator)
+    - [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
+    - [How to config global_work_size and local_work_size](#how-to-config-global_work_size-and-local_work_size)
+    - [Layout Inference {todo}](#layout-inference-todo-1)
+
+# Extend tim-vx with customized operator
+
+tim-vx provides two different approaches for extending it with AI operators beyond the built-in ops:
+
+* Compose a new operator from built-in ops. Example: RNNCell.
+* Register an OpenCL kernel as a customized operator.
+
+# User stories
+As an **application developer**, I want to **be able to create new operators from built-in ops**, so that I can **simplify the lowering from a high-level framework (TensorFlow, PyTorch) to tim-vx**, since I don't want to reimplement the same pattern for each framework.
+
+As an **application developer**, I want to **be able to create my own operators from standard OpenCL kernels**, so that I can **support novel operators not present in tim-vx**.
+
+# Design overview
+![extend.tim-vx.operators](image/extend.tim-vx.operators.png)
+
+* Green components are implemented as the public API of tim-vx.
+* Red components can be implemented outside of tim-vx.
+* Gray components are implemented as private code inside tim-vx.
+
+## **Composed operator**
+
+If an operator can be composed from built-in operators - such as RNNCell, which is actually built from FullyConnected, Tanh, and DataConvert layers -
+developers can add their own operator implementation before VSI introduces a high-performance built-in op.
+
+[Implementation reference of RNNCell](https://github.com/VeriSilicon/TIM-VX/blob/main/src/tim/vx/ops/rnn_cell.cc)
+
+**Keynotes for RNNCell**:
+
+In the constructor of RNNCellImpl, the internal operators - fc/tanh/dataconvert - are created without inner connections.
+The inner connections are built up inside BindInput() and BindOutput().
+
+### Layout Inference {todo}
+
+A composed operator is actually a subgraph of tim-vx built-in operators, so it should be easy to extend the original layout inference for built-in operators to composed operators - just do layout inference inside the subgraph.
+
+```c++
+void ComposedOp::OnInputs(std::vector<std::shared_ptr<Tensor>>& next_tensors) {
+  for (auto op : op_->OpsInSubgraph()) {
+    auto cloned = handleLayoutInference(new_graph, op);
+  }
+}
+```
+
+## **Customized opencl operator**
+
+Customized kernels should be implemented in standard OpenCL 2.0. With tim-vx's built-in infrastructure, users can inject their operator with:
+
+1. The OpenCL kernel stream as source code;
+2. The kernel enqueue configuration for global_work_size and local_work_size;
+3. The scalar parameter list, defined as a std::tuple;
+4. A readable operator name.
+
+TIM-VX provides two different approaches to integrate a user's operator:
+1. Build from source: build the tim-vx source and the user's operator implementation as a single library;
+2. Build from SDK: tim-vx is prebuilt as a standalone library with a set of standard headers; the user builds the operator implementation and links it with tim-vx.
+
+From the tim-vx API point of view, the customized operator is registered at graph level; the registration happens automatically the first time an instance of the customized operator is created. With this approach, users can easily override a built-in operator or support a new operator in a new model.
+
+```c++
+void CreateGraphWithCustomizedOperator() {
+  // Create context/graph/tensor as before.
+  auto conv = graph->CreateOperation<tim::vx::ops::Conv2d>(...);
+  auto post_detect = graph->CreateOperation<third_party::DetectionPostProcess>(...);
+  post_detect->BindInput(...);
+  post_detect->BindOutput(...);
+
+  graph->Compile();
+}
+```
+
+### How to determine parameter list in a tuple
+Usually, a kernel takes two different kinds of parameters: "tensor-like" and scalar. The tensor-like parameters are usually output tensors from other operators, or inputs to other operators.
+Only the scalar parameters should be defined in the operator's parameter list; "tensor-like" operands are provided via BindInput/BindOutput.
+
+The scalar parameters **MUST** be provided at kernel registration.
+
+Take the following hswish as an example.
+
+CL kernel signature:
+```cl
+__kernel void hswish_F32toF32(
+    __read_only  image2d_array_t input,
+    __write_only image2d_array_t output,
+    float inputScale,
+    float inputTail,
+    float outputScale,
+    float outputZP)
+```
+
+C++ parameter list defined by the user:
+```c++
+namespace user {
+class customized_hswish : public tim::vx::CustomizeOpBase {
+  // One tuple element per scalar kernel parameter, in kernel-signature order.
+  using param_types = std::tuple<float, float, float, float>;
+
+  customized_hswish(std::shared_ptr<tim::vx::Graph> g, const param_types& params
+                    /* any other parameter required by the C++ code, not relevant to the cl kernel */) {
+  }
+
+  auto clone(std::shared_ptr<tim::vx::Graph> g) {
+    return g->CreateOperation<customized_hswish>(g, this->params /* others */);
+  }
+};
+}  // namespace user
+```
+
+### How to config global_work_size and local_work_size
+
+This is the same feature as **clEnqueueNDRangeKernel** in standard OpenCL.
+
+Some tips for work_size:
+  HWThreadCount = 4
+
+### Layout Inference {todo}
+So far we don't support this feature; users must take care of layout transforms themselves.
+TODO: VSI will rework the framework so that any customized op can work properly in layout transform.
\ No newline at end of file
diff --git a/docs/extend_tim-vx-operators.uxf b/docs/extend_tim-vx-operators.uxf
new file mode 100644
index 0000000..21826d9
--- /dev/null
+++ b/docs/extend_tim-vx-operators.uxf
@@ -0,0 +1,92 @@
+[UMLet .uxf diagram source; the XML markup was lost in extraction. Recoverable content: UML classes
+tim::vx::Operation (green), OpImpl (gray), tim::vx::DirectMapOp (green), vx.builtin.ops with
+Conv1d/Conv2d/Conv3d (green), DirectMapOpImpl "VxBuiltInOpImpl" (gray), RNNCell (green),
+RNNCellImpl (gray), CustomizedOp (red), CustomOpBase (green), and CustomOpBaseImpl (gray),
+with the inheritance/composition relations shown in docs/image/extend.tim-vx.operators.png.]
\ No newline at end of file
diff --git a/docs/image/extend.tim-vx.operators.png b/docs/image/extend.tim-vx.operators.png
new file mode 100644
index 0000000..810fd34
Binary files /dev/null and b/docs/image/extend.tim-vx.operators.png differ