complete custom op readme

2023-02-06 17:23:49 +08:00 · 2023-02-06 17:23:49 +08:00 · a21214f76a
parent 5e7f5cecea
commit a21214f76a
1 changed files with 98 additions and 24 deletions
--- a/docs/customized_op.md
+++ b/docs/customized_op.md
@ -5,15 +5,15 @@
    - [Layout Inference {todo}](#layout-inference-todo)
  - [**Customized opencl operator**](#customized-opencl-operator)
    - [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
-    - [How to config global_work_size and local_work_size](#how-to-config-global_work_size-and-local_work_size)
+    - [How to initialize](#how-to-determine-parameter-list-in-a-tuple) [custom operation](#extend-tim-vx-with-customized-operator)
    - [How to complete custom operation functions](#how-to-determine-parameter-list-in-a-tuple)
    - [Layout Inference {todo}](#layout-inference-todo-1)
 # Extend tim-vx with customized operator
 tim-vx will provide two different approches supporting extend AI operators besides built-in ops.
-  
+* Compose new operation with builtin ops. example: RNNCell
-  * Compose new operation with builtin ops. example: RNNCell
+* Register  opencl kernel as customized operator
  * Register  opencl kernel as customized operator
 # User stories
 As **application developer**, I want to **be able to create new opeartor with built-in ops**, so that I can **simplify the lowing from high-level framework(tensorflow,pytorch) to tim-vx**, since I don't want to rewrite same pattern in different frameworks.
@ -22,7 +22,6 @@ As **application developer**, I want to **be able to create my own opeartor with
 # Design overview
 ![extend.tim-vx.operators](image/extend.tim-vx.operators.png)
 * Green components implemented as a public API of tim-vx.
 * Red components could be implemented outside of tim-vx.
 * Gray components implemented as a private code inside tim-vx.
@ -58,7 +57,7 @@ Customzied kernel should implemented with standard OpenCL 2.0; With tim-vx built
 1. OpenCL kernel stream as source code;
 2. Kernel enqueue configuration for global_work_size and local_work_size;
 3. Scalar parameter list defined as a std::tuple;
-3. Readable operator name;
+4. Readable operator name;
 TIM-VX provide two different approach to integrate user's operator:
 1. Build from source : build tim-vx source and user operators' implementation as single library;
@ -96,27 +95,102 @@ __kernel void hswish_F32toF32(
                 float            outputZP)
 ```
-C++ paramter list defined by user
+### How to initialize custom operation
 The custom operation class can be defined as:
 ```c++
-namespace user {
+  CustomOpClass(Graph*graph, ParamTuple tuple_list, uint32_t input_num,uint32_t output_num)
-   class customized_hswish : public tim::vx::CustomizeOpBase {
+    : CustomOpBase(graph, input_num, output_num, CustomOpClass::kernel_id_, CustomOpClass::kernel_name_,.../*any other parameter required by c++ code, not relevant to cl kernel**/){
-    using param_types = std::tuple<float/*inputScale*/, float/*inputTail*/, float/*outputScale*/, float/*outputZP*/>;
+    	tuple_list_.swap(tuple_list);
-    customized_hswish(std::shared_ptr<tim::vx::Graph> g, const param_types& params/* any other parameter required by c++ code, not relevant to cl kernel*/) {
+    	param_transform(tuple_list_, param_list_);
-    }
+	kernel_resource_="...";
-
+ protected:
-    auto clone(std::shared_ptr<tim::vx::Graph> g) {
+	ParamTuple tuple_list_;
-        return g->CreateOperation<user::customized_hswish>(g, this->params/*others*/);
+  	static const char* kernel_name_;
-    }
+  	static int32_t kernel_id_;
   };
 }
 ```
 ### How to config global_work_size and local_work_size
-Similar feature as **clEnqueueNDRangeKernel** in standard OpenCL;
+1.ParamTuple tuple_list_: scalar parameters tuple list in CL kernel signature, we provide param_transform() function to transform tuple_list_ to param_list_.
 2.uint32_t input_num/output_num: the number of kernel operation inputs/outputs.
 3.static const char* kernel_name_: OpenCL kernel name defined by users, which is unique.
 4.static int32_t kernel_id_:OpenCL kernel id is defined as
 ```c++
 int32_t CustomOpClass::kernel_id_ = -1 * (++gobal_kernel_id_).
 ```
 5.const char* kernel_resource_: OpenCL kernel registration should be defined in custom op class initialization function. It can contain multi functions adaptd to servel situations. For example:
 ```c++
 kernel_resource_ = "__kernel void hswish_BF16toBF16(\n\
    __read_only  image2d_array_t  input,\n\
    __write_only image2d_array_t  output,\n\
                           float  beta\n\
    )\n\
 {\n\
   /*kernel funtion resource*/\n\
 }\n\
 \n\
 __kernel void hswish_BF32toF32(\n\
    __read_only  image2d_array_t  input,\n\
    __write_only image2d_array_t  output,\n\
                           float  inputScale,\n\
                           float  inputTail,\n\
                           float  outputScale,\n\
                           float  outputZP\n\
    )\n\
 {\n\
  /*kernel funtion resource*/ \n\
 }\n\";
 ```
 ### How to complete custom operation functions
 1.SetupShapeInfor: the function for output tensor size.
 ```c++
 void SetupShapeInfor() override {
   outputs_size_[0].push_back(...);
   ...
 }
 ```
 2.SetupParams:  the function for kernel select and build option. The func_name_ is the selected function name provided by kernel_resource_, is used to determine which kernel function to be applied. build_option is the compiler options when compile custom op resource.
 ```c++
  void SetupParams(
      std::vector<tim::vx::DataType> input_types,
      std::string& build_option) override {
      if(...){
        func_name_ = "..."/*it MUST provided in kernel_source_ */;
        build_option = "..."/*compile paramters*/;
      }else{
 	...
      }
  }
 ```
 3.SetupEnqueue: the function for kernel local size and gobal size.
 ```c++
 void SetupEnqueue(uint32_t& dim, std::vector<size_t>& global_size,
                  std::vector<size_t>& local_size) {
  dim = .../*kernel dim*/;
  local_size[0] = .../*kernel local size*/;
  global_size[0] = .../*kernel global size*/;
 }
 ```
 local_size and global_size are similar features as **clEnqueueNDRangeKernel** in standard OpenCL.
 Some tips for work_size:
    HWThreadCount = 4
-### Layout Inference {todo}
+4.Clone: the function for operation clone.
 ```c++
 std::shared_ptr<tim::vx::Operation> Clone(
      std::shared_ptr<tim::vx::Graph>& graph) const override{
   return graph->CreateOperation<user::custom_operation>(graph,this->params/*others*/);
 }
 ```
 ### Layout Inference
 so far we don't support this feature. User should take care of the layout transform carefully.
 TODO: vsi will rework the framework so that any customized op can work properly in layout transform.