* Properly support tensor handle for both input and output
* Fix UT to use size_in_bytes instead of size in elements

Signed-off-by: Kainan Cha <kainan.zha@verisilicon.com>