diff --git a/README.md b/README.md
index 1f29721..3b2030c 100644
--- a/README.md
+++ b/README.md
@@ -28,6 +28,9 @@ Feel free to raise a github issue if you wish to add TIM-VX for other frameworks
 ![TIM-VX Architecture](docs/image/timvx_overview.svg)
+
+## Technical documents
+* [Add customized operator](docs/customized_op.md)
 # Get started
 ## Build and Run
diff --git a/docs/customized_op.md b/docs/customized_op.md
new file mode 100644
index 0000000..36dd7a1
--- /dev/null
+++ b/docs/customized_op.md
@@ -0,0 +1,122 @@
+- [Extend tim-vx with customized operator](#extend-tim-vx-with-customized-operator)
+- [User stories](#user-stories)
+- [Design overview](#design-overview)
+  - [**Composed operator**](#composed-operator)
+    - [Layout Inference {todo}](#layout-inference-todo)
+  - [**Customized opencl operator**](#customized-opencl-operator)
+    - [How to determine parameter list in a tuple](#how-to-determine-parameter-list-in-a-tuple)
+    - [How to config global_work_size and local_work_size](#how-to-config-global_work_size-and-local_work_size)
+    - [Layout Inference {todo}](#layout-inference-todo-1)
+
+# Extend tim-vx with customized operator
+
+tim-vx provides two different approaches for extending it with AI operators beyond the built-in ops:
+
+* Compose a new operator from built-in ops. Example: RNNCell.
+* Register an OpenCL kernel as a customized operator.
+
+# User stories
+As an **application developer**, I want to **be able to create new operators from built-in ops**, so that I can **simplify the lowering from a high-level framework (TensorFlow, PyTorch) to tim-vx**, since I don't want to reimplement the same pattern for each framework.
+
+As an **application developer**, I want to **be able to create my own operators from standard OpenCL kernels**, so that I can **support novel operators not present in tim-vx**.
+
+# Design overview
+![extend.tim-vx.operators](image/extend.tim-vx.operators.png)
+
+* Green components are implemented as the public API of tim-vx.
+* Red components can be implemented outside of tim-vx.
+* Gray components are implemented as private code inside tim-vx.
+
+## **Composed operator**
+
+If an operator can be composed from built-in operators - such as RNNCell, which is actually built from FullyConnected, Tanh, and DataConvert layers -
+developers can add their own operator implementation before VSI introduces a high-performance built-in op.
+
+[Implementation reference of RNNCell](https://github.com/VeriSilicon/TIM-VX/blob/main/src/tim/vx/ops/rnn_cell.cc)
+
+**Keynotes for RNNCell**:
+
+In the constructor of RNNCellImpl, the internal operators - fc/tanh/dataconvert - are created without inner connections.
+The inner connections are built up inside BindInput() and BindOutput().
+
+### Layout Inference {todo}
+
+A composed operator is actually a subgraph of tim-vx built-in operators, so it should be easy to extend the original layout inference for built-in operators to composed operators - just do layout inference inside the subgraph.
+
+```c++
+void ComposedOp::OnInputs(std::vector<std::shared_ptr<Tensor>>& next_tensors) {
+  for (auto op : op_->OpsInSubgraph()) {
+    auto cloned = handleLayoutInference(new_graph, op);
+  }
+}
+```
+
+## **Customized opencl operator**
+
+Customized kernels should be implemented in standard OpenCL 2.0. With tim-vx's built-in infrastructure, users can inject their operator with:
+
+1. The OpenCL kernel stream as source code;
+2. The kernel enqueue configuration for global_work_size and local_work_size;
+3. The scalar parameter list, defined as a std::tuple;
+4. A readable operator name.
+
+TIM-VX provides two different approaches to integrate a user's operator:
+1. Build from source: build the tim-vx source and the user's operator implementation as a single library;
+2. Build from SDK: tim-vx is prebuilt as a standalone library with a set of standard headers; the user builds the operator implementation and links it with tim-vx.
+
+From the tim-vx API point of view, the customized operator is registered at graph level; the registration happens automatically the first time an instance of the customized operator is created. With this approach, users can easily override a built-in operator or support a new operator in a new model.
+
+```c++
+void CreateGraphWithCustomizedOperator() {
+  // Create context/graph/tensor as before.
+  auto conv = graph->CreateOperation<tim::vx::ops::Conv2d>(...);
+  auto post_detect = graph->CreateOperation<third_party::DetectionPostProcess>(...);
+  post_detect->BindInput(...);
+  post_detect->BindOutput(...);
+
+  graph->Compile();
+}
+```
+
+### How to determine parameter list in a tuple
+Usually, a kernel takes two different kinds of parameters: "tensor-like" and scalar. The tensor-like parameters are usually output tensors from other operators, or inputs to other operators.
+Only the scalar parameters should be defined in the operator's parameter list; "tensor-like" operands are provided via BindInput/BindOutput.
+
+The scalar parameters **MUST** be provided at kernel registration.
+
+Take the following hswish as an example.
+
+CL kernel signature:
+```cl
+__kernel void hswish_F32toF32(
+    __read_only  image2d_array_t input,
+    __write_only image2d_array_t output,
+    float inputScale,
+    float inputTail,
+    float outputScale,
+    float outputZP)
+```
+
+C++ parameter list defined by the user:
+```c++
+namespace user {
+class customized_hswish : public tim::vx::CustomizeOpBase {
+  // One tuple element per scalar kernel parameter, in kernel-signature order.
+  using param_types = std::tuple<float, float, float, float>;
+
+  customized_hswish(std::shared_ptr<tim::vx::Graph> g, const param_types& params
+                    /* any other parameter required by the C++ code, not relevant to the cl kernel */) {
+  }
+
+  auto clone(std::shared_ptr<tim::vx::Graph> g) {
+    return g->CreateOperation<customized_hswish>(g, this->params /* others */);
+  }
+};
+}  // namespace user
+```
+
+### How to config global_work_size and local_work_size
+
+This is the same feature as **clEnqueueNDRangeKernel** in standard OpenCL.
+
+Some tips for work_size:
+  HWThreadCount = 4
+
+### Layout Inference {todo}
+So far we don't support this feature; users must take care of layout transforms themselves.
+TODO: VSI will rework the framework so that any customized op can work properly in layout transform.
\ No newline at end of file
diff --git a/docs/extend_tim-vx-operators.uxf b/docs/extend_tim-vx-operators.uxf
new file mode 100644
index 0000000..21826d9
--- /dev/null
+++ b/docs/extend_tim-vx-operators.uxf
@@ -0,0 +1,92 @@
+[UMLet .uxf diagram source; the XML markup was lost in extraction. Recoverable content: UML classes
+tim::vx::Operation (green), OpImpl (gray), tim::vx::DirectMapOp (green), vx.builtin.ops with
+Conv1d/Conv2d/Conv3d (green), DirectMapOpImpl "VxBuiltInOpImpl" (gray), RNNCell (green),
+RNNCellImpl (gray), CustomizedOp (red), CustomOpBase (green), and CustomOpBaseImpl (gray),
+with the inheritance/composition relations shown in docs/image/extend.tim-vx.operators.png.]
\ No newline at end of file
diff --git a/docs/image/extend.tim-vx.operators.png b/docs/image/extend.tim-vx.operators.png
new file mode 100644
index 0000000..810fd34
Binary files /dev/null and b/docs/image/extend.tim-vx.operators.png differ