mlir-hlo

Author	SHA1	Message	Date
A. Unique TensorFlower	9f47ff607b	[MLIR][KernelGen] Better rank specialization for clusters containing `mhlo.select` Merge `mhlo.select` into rank specialization clusters. Infer shape equalities correctly from `mhlo.select` (and also from `mhlo.clamp`). This allows to lower the relu kernel completely flattened. PiperOrigin-RevId: 379925793	2021-06-17 04:06:08 -07:00
Adrian Kuegel	376da8592f	Add MLIR generated SignOp GPU kernel for complex types. PiperOrigin-RevId: 379924456	2021-06-17 03:56:58 -07:00
Adrian Kuegel	73ed8cbf82	Add MLIR generated NegOp GPU kernel for complex types. PiperOrigin-RevId: 379905236	2021-06-17 01:30:51 -07:00
Mehdi Amini	8c8e81cb69	Fix pass definition to inherit from the TableGen generated base class (NFC) PiperOrigin-RevId: 379860210	2021-06-16 19:05:11 -07:00
A. Unique TensorFlower	2e08c246e9	Integrate LLVM at llvm/llvm-project@7fff39d9b0 Updates LLVM usage to match [7fff39d9b046](https://github.com/llvm/llvm-project/commit/7fff39d9b046) PiperOrigin-RevId: 379822265	2021-06-16 15:18:51 -07:00
Wenyi Zhao	88cc0c6c46	PR #50271 : [MLIR][DISC] Bufferize GatherOp and DynamicGatherOp Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50271 support hlo-to-lhlo conversion for GatherOp and DynamicGatherOp Copybara import of the project: -- 117a1b1bcaac7ecc5224b02863eede5c1b9618fe by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] Bufferize GatherOp and DynamicGatherOp PiperOrigin-RevId: 379801972	2021-06-16 13:47:56 -07:00
Wenyi Zhao	34dc5f2a79	PR #50020 : [MLIR][DISC] support fusion on buffer Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50020 This pass implements the logic to group kLoop/kInput fusion patterns on buffer level. The reason for this is that we can avoid a lot of headaches to handle `shape-only` consumers specially (e.g. memref.dim, shape.shapeOf) since shapes are already resolved in buffer world. It may be better to move this pass to tensor level after more shape inference/constraint infras are ready on mhlo level. Copybara import of the project: -- e31f8344b59aa9860097197585215ea1689b8ff4 by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] support fusion on buffer This pass implements the logic to group kLoop/kInput fusion patterns on buffer level. The reason for this is that we can avoid a lot of headaches to handle `shape-only` consumers specially (e.g. memref.dim, shape.shapeOf) since shapes are already resolved in buffer world. It may be better to move this pass to tensor level after more shape inference/constraint infras are ready on mhlo level. -- 35f2eb2791241b0ab5db1ddcaf1b4006278ddccf by Wenyi Zhao <reyizero@gmail.com>: fix -- 923c8d61f7fe00a2a0df22d5be396508f0667964 by Wenyi Zhao <reyizero@gmail.com>: fix sanity check failure PiperOrigin-RevId: 379743424	2021-06-16 09:51:29 -07:00
A. Unique TensorFlower	82696f8598	[MLIR][HLO] Annotate `mhlo.clamp` and `mhlo.select` as element-wise broadcasting The operations allow for a limited form of broadcasting which allows some operands to be scalars. As such they are neither strictly `Elementwise`, nor `Broadcasting`. They do fulfill the requirements for `BroadcastingElementwise` though. PiperOrigin-RevId: 379719961	2021-06-16 07:59:26 -07:00
A. Unique TensorFlower	a65cf627c4	Integrate LLVM at llvm/llvm-project@662f9bff33 Updates LLVM usage to match [662f9bff337b](https://github.com/llvm/llvm-project/commit/662f9bff337b) PiperOrigin-RevId: 379699574	2021-06-16 05:35:13 -07:00
A. Unique TensorFlower	06ce0852d1	Integrate LLVM at llvm/llvm-project@cc8d32ae7d Updates LLVM usage to match [cc8d32ae7d94](https://github.com/llvm/llvm-project/commit/cc8d32ae7d94) PiperOrigin-RevId: 379676797	2021-06-16 02:40:01 -07:00
Hanhan Wang	b44ab8ad49	Add support for lowering DataMovementOp ops to Linalg on unsigned types. PiperOrigin-RevId: 379527360	2021-06-15 10:58:22 -07:00
Feiwen	3afbe312f8	PR #49919 : [MLIR][DISC] pattern conversion from tf2mhlo: ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49919 We are porting our MLIR-based dynamic shape compiler to tf community (From OP def, Pattern, to Optimization pass, etc). This is the 5th PR about tf2mhlo pattern conversion, which includes ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic. The rest pattern conversions we will add: - ConvertSqueezeOpxxx - ConvertStridedSliceOpxxx - ConvertPrintOp Copybara import of the project: -- 21b3c3eb05b12956bcdb8b98cc54d9371dbf034d by azazhu <azazhu@gmail.com>: [MLIR][DISC] pattern conversion from tf2mhlo: ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic -- 634630a4e2e426357290650bd579b35efecab5b3 by azazhu <azazhu@gmail.com>: [MLIR][DISC] refine ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic -- 39a2bedd6dafb369ae960c5197b7a352bfdfbc80 by azazhu <azazhu@gmail.com>: add RealDynamicSliceOp's canonicalize and fix CI -- a1c38dd0963d602ed4812da0d77a096a95920ddb by azazhu <azazhu@gmail.com>: fix CI for ConvertUnpackOpDynamic -- 5a8b4eb389ed6dc554104356c37f2f1550802b8c by azazhu <azazhu@gmail.com>: fix typo in ConvertSigmoidGradOpDynamic PiperOrigin-RevId: 379521079	2021-06-15 10:33:32 -07:00
Chris Jones	5fbdac34a9	[XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect. PiperOrigin-RevId: 379455720	2021-06-15 03:55:19 -07:00
Adrian Kuegel	399dae666d	Add MLIR generated ExpOp GPU kernel for complex types. We lower lmhlo::ExpOp to mlir::complex::ExpOp for complex types. PiperOrigin-RevId: 379432147	2021-06-15 00:45:45 -07:00
Geoffrey Martin-Noble	f9f7a63870	Add missing dep on RAL pass generation Without this I see errors about being unable to find the generated header in our project's build. PiperOrigin-RevId: 379377718	2021-06-14 17:02:26 -07:00
A. Unique TensorFlower	6f5a440031	Integrate LLVM at llvm/llvm-project@56ae4f23b2 Updates LLVM usage to match [56ae4f23b227](https://github.com/llvm/llvm-project/commit/56ae4f23b227) PiperOrigin-RevId: 379348813	2021-06-14 14:17:52 -07:00
Wenyi Zhao	7f94bd923b	PR #50236 : [MLIR][DISC] Bufferize TransposeOp and ConcatenateOp Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50236 support hlo-to-lhlo conversion for TransposeOp and ConcatenateOp Copybara import of the project: -- 62860e717f2a14fbd3ddfb634aa6ff132d245a72 by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] Bufferize TransposeOp and ConcatenateOp -- ce2ff57c1edee1172cd2f36346cc0b34ec1c7467 by Wenyi Zhao <reyizero@gmail.com>: fix PiperOrigin-RevId: 379330954	2021-06-14 12:37:45 -07:00
Wenyi Zhao	23ebbb28d1	PR #50191 : [MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50191 DISC is a e2e flow, including both compiler side and runtime side. For runtime side, we have different targeting environments (e.g. tensorflow, pytorch, or sometimes even a standalone binary). In order to simplify the design of the compiler side, we design a Runtime Abstraction Layer (RAL) to separate the compiler side and runtime side. Thus the compiler side only need to target RAL itself and it is the responsibility of RAL to handle the differences between different targeting environments. One of the most important functions of RAL is to manage stateful resources. To this end, it provides a context object, and hides all stateful operations behind this context, thus the compiler side itself doesn't need to care about the resource initialization. For example, a kernel must be loaded before it can be launched on GPU. However, the loading operation should only be taken once during the whole lifetime of the context in order to achieve the best performance. Based on the initialization-free interfaces provided by RAL, compiler side can focus on its core optimization logic and lets the RAL to manage the resource status. The context mentioned above is passed as a parameter to the entry function and all RAL APIs should always use the context as their first argument. This CR also provides a pass to help to ensure this property. The pass rewrites the entry function to make sure their first argument is the context. For entry function, the pass also rewrites its inputs and outputs. To be concrete, all the original inputs and outputs of the entry function are received from and sent to RAL through a sequence of RAL API calls correspondingly. The motivation behind this is to hide the implementation details of I/Os. This design may also potentially enable partial execution of the compiled module when some of the inputs are ready. Copybara import of the project: -- c4f20a89aed71181e75bcc5265723b88bde23240 by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect DISC is a e2e flow, including both compiler side and runtime side. For runtime side, we have different targeting environments (e.g. tensorflow, pytorch, or sometimes even a standalone binary). In order to simplify the design of the compiler side, we design a Runtime Abstraction Layer (RAL) to separate the compiler side and runtime side. Thus the compiler side only need to target RAL itself and it is the responsibility of RAL to handle the differences between different targeting environments. One of the most important functions of RAL is to manage stateful resources. To this end, it provides a context object, and hides all stateful operations behind this context, thus the compiler side itself doesn't need to care about the resource initialization. For example, a kernel must be loaded before it can be launched on GPU. However, the loading operation should only be taken once during the whole lifetime of the context in order to achieve the best performance. Based on the initialization-free interfaces provided by RAL, compiler side can focus on its core optimization logic and lets the RAL to manage the resource status. The context mentioned above is passed as a parameter to the entry function and all RAL APIs should always use the context as their first argument. This CR also provides a pass to help to ensure this property. The pass rewrites the entry function to make sure their first argument is the context. For entry function, the pass also rewrites its inputs and outputs. To be concrete, all the original inputs and outputs of the entry function are received from and sent to RAL through a sequence of RAL API calls correspondingly. The motivation behind this is to hide the implementation details of I/Os. This design may also potentially enable partial execution of the compiled module when some of the inputs are ready. -- 1991d4f80ab6087943956e1c0fec4940a22ab08d by Wenyi Zhao <reyizero@gmail.com>: fix PiperOrigin-RevId: 379317586	2021-06-14 11:27:43 -07:00
Rahul Joshi	a6011d0279	[HLO] Add AllReduceScatter to MHLO and LMHLO dialects. PiperOrigin-RevId: 379296198	2021-06-14 09:37:07 -07:00
A. Unique TensorFlower	dbfa4b1537	Integrate LLVM at llvm/llvm-project@b90f9bea96 Updates LLVM usage to match [b90f9bea9673](https://github.com/llvm/llvm-project/commit/b90f9bea9673) PiperOrigin-RevId: 379251091	2021-06-14 04:15:09 -07:00
A. Unique TensorFlower	07c92f0ad8	Integrate LLVM at llvm/llvm-project@e0b469ffa1 Updates LLVM usage to match [e0b469ffa142](https://github.com/llvm/llvm-project/commit/e0b469ffa142) PiperOrigin-RevId: 378992873	2021-06-11 19:36:05 -07:00
A. Unique TensorFlower	89faaa6575	Integrate LLVM at llvm/llvm-project@82a3b606b0 Updates LLVM usage to match [82a3b606b01d](https://github.com/llvm/llvm-project/commit/82a3b606b01d) PiperOrigin-RevId: 378974967	2021-06-11 16:50:40 -07:00
Wenyi Zhao	8388303fd2	PR #50211 : [MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50211 support hlo-to-lhlo conversion for RealDynamicSliceOp and ReduceOp Copybara import of the project: -- c417b336670a1fc256f7026dfe8080e46d13d79a by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp PiperOrigin-RevId: 378972113	2021-06-11 16:33:15 -07:00
Jacques Pienaar	95ba03534f	Allow variadic operands/result in MHLO while This just adds support for it in the op, but keeps the production/uses as is (e.g., single tensor or tuple) matching what XLA export requires. In follow up here, would be to add pass for export to retuple and then the canonical form could be changed. Tuple'ing given control flow via regions & multi-result operations does not add representational power and all the get_tuple_element ops obscure the computation. The old form allowed single tensor or tuple. The new variadic number of tensor or tuples as tuples may be nested, so the input could have (Tensor<..>, Tuple<Tensor<...>, Tuple<...>, ...>, Tensor<...>) and HLO_Tensor doesn't allow Tuples. PiperOrigin-RevId: 378934388	2021-06-11 13:08:28 -07:00
A. Unique TensorFlower	33f95eecc7	Integrate LLVM at llvm/llvm-project@f3f904563e Updates LLVM usage to match [f3f904563ec9](https://github.com/llvm/llvm-project/commit/f3f904563ec9) PiperOrigin-RevId: 378880044	2021-06-11 08:47:40 -07:00
A. Unique TensorFlower	bd5752f0bf	[MLIR][HLO] Find shape equivalences and use them for better rank specialization Find shape equivalence classes among the operands and use them for better rank specialization. If all operands are known to be of the same shape, we can flatten them to rank one. If there are two shape equivalence classes, we can generalize the scalar rank specialization cases. PiperOrigin-RevId: 378844575	2021-06-11 04:00:26 -07:00
A. Unique TensorFlower	5cca8a14e3	Integrate LLVM at llvm/llvm-project@4f6ec382c8 Updates LLVM usage to match [4f6ec382c8b7](https://github.com/llvm/llvm-project/commit/4f6ec382c8b7) PiperOrigin-RevId: 378808874	2021-06-10 22:53:49 -07:00
A. Unique TensorFlower	ad7bc780b9	Integrate LLVM at llvm/llvm-project@ff81a2c95d Updates LLVM usage to match [ff81a2c95ddb](https://github.com/llvm/llvm-project/commit/ff81a2c95ddb) PiperOrigin-RevId: 378745601	2021-06-10 15:19:43 -07:00
Wenyi Zhao	6660234d80	PR #50100 : [MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50100 support hlo-to-lhlo conversion for DynamicIotaOp and DynamicPadOp Copybara import of the project: -- c3aae94954e35d3f8ad265f619ef9765665a5115 by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp -- adc6996d70b804d61310d56a33fac975d70c8636 by Wenyi Zhao <reyizero@gmail.com>: minor PiperOrigin-RevId: 378733284	2021-06-10 14:20:45 -07:00
A. Unique TensorFlower	642ca86a3f	Integrate LLVM at llvm/llvm-project@ad6a84f82c Updates LLVM usage to match [ad6a84f82c45](https://github.com/llvm/llvm-project/commit/ad6a84f82c45) PiperOrigin-RevId: 378690997	2021-06-10 11:07:48 -07:00
A. Unique TensorFlower	14093b7906	[XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect. PiperOrigin-RevId: 378681070	2021-06-10 10:27:22 -07:00
Chris Jones	968226b9d7	[XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect. PiperOrigin-RevId: 378640706	2021-06-10 06:54:42 -07:00
Adrian Kuegel	6088eb697c	Fix Cosh approximation for F16. We should upcast F16 to F32 to prevent precision loss. E.g. cosh(-9) would evaluate to 4042 previously instead of 4052. This allows to enable the MLIR generated kernel for F16 type. Also move template instantiation for Sinh to inside the #ifdef block. This was missed in a previous commit. PiperOrigin-RevId: 378635042	2021-06-10 06:16:44 -07:00
A. Unique TensorFlower	837a1de7c5	Integrate LLVM at llvm/llvm-project@e11b5b87be Updates LLVM usage to match [e11b5b87bebf](https://github.com/llvm/llvm-project/commit/e11b5b87bebf) PiperOrigin-RevId: 378589304	2021-06-10 00:18:25 -07:00
A. Unique TensorFlower	9f67417b41	[MLIR][HLO] Avoid duplicate cluster operands when merging When merging rank specialization clusters, avoid duplicating operands. A fewer number of operands usually allows better rank specialization. PiperOrigin-RevId: 378445946	2021-06-09 10:54:55 -07:00
A. Unique TensorFlower	b580722041	[MLIR][KernelGen] Merge rank specialization clusters Merge adjacent rank specialization clusters. Combine their operands, bodies, and results. PiperOrigin-RevId: 378433222	2021-06-09 10:07:47 -07:00
Adrian Kuegel	b6d8160611	Add Broadcasting and BroadcastingElementwise traits to ConstantLikeOp. This allows to include such ops in rank specialization clusters. PiperOrigin-RevId: 378380915	2021-06-09 05:09:26 -07:00
A. Unique TensorFlower	b9e45007d5	[MLIR][HLO] Extend broadcast propagation pass to enable more fusion Move element-wise operations into assuming regions. This enables fusion opportunities within the region. PiperOrigin-RevId: 378362725	2021-06-09 03:03:37 -07:00
A. Unique TensorFlower	d828b457b3	Handle empty tensors in SimplifyConcatSlice. If the result of the slice is an empty tensor, do nothing. This fixes a crash: we can't create a `concat` with an empty operand range. PiperOrigin-RevId: 378354956	2021-06-09 02:15:47 -07:00
Mehdi Amini	1770ed455f	Remove unnecessary duplicated source from "lhlo" (NFC) PiperOrigin-RevId: 378291564	2021-06-08 18:07:26 -07:00
A. Unique TensorFlower	4134923d4f	Integrate LLVM at llvm/llvm-project@f96b5e801d Updates LLVM usage to match [f96b5e801d67](https://github.com/llvm/llvm-project/commit/f96b5e801d67) PiperOrigin-RevId: 378139137	2021-06-08 06:24:26 -07:00
Adrian Kuegel	9a8c254526	Support complex types for Sinh. Because mhlo::ConstantLike doesn't support complex types, we need to use GetScalarOfType and broadcast it to the needed shape. Disable the tf2xla fallback, now that MLIR fully supports Sinh. PiperOrigin-RevId: 378123151	2021-06-08 04:23:19 -07:00
A. Unique TensorFlower	c47869f931	[MLIR][HLO] Rename `move-up-dynamic-broadcasts-for-fusion` to `broadcast-propagation` PiperOrigin-RevId: 378102608	2021-06-08 01:51:10 -07:00
A. Unique TensorFlower	b2839c735b	Integrate LLVM at llvm/llvm-project@8344e215ec Updates LLVM usage to match [8344e215ec6c](https://github.com/llvm/llvm-project/commit/8344e215ec6c) PiperOrigin-RevId: 378043689	2021-06-07 17:32:52 -07:00
A. Unique TensorFlower	c11de49300	Integrate LLVM at llvm/llvm-project@7ed7d4ccb8 Updates LLVM usage to match [7ed7d4ccb899](https://github.com/llvm/llvm-project/commit/7ed7d4ccb899) PiperOrigin-RevId: 377972571	2021-06-07 12:05:09 -07:00
Benjamin Kramer	d1c60df2fe	[MHLO:linalg] Be more aggressive about turning mhlo.const into std.constant On tensors the only difference between these ops is that mhlo.const supports unsigned types. PiperOrigin-RevId: 377970948	2021-06-07 11:58:23 -07:00
Hanhan Wang	25b93c8d66	Add support for lowering mhlo.iota/dynamic_iota to Linalg on unsigned types. PiperOrigin-RevId: 377956338	2021-06-07 10:59:33 -07:00
Adrian Kuegel	5315997402	Fix Sinh approximation for F16. We should upcast F16 to F32 to prevent precision loss. E.g. sinh(-9) would evaluate to -4042 previously instead of -4052. This allows to enable the MLIR generated kernel for F16 type. PiperOrigin-RevId: 377901896	2021-06-07 06:38:42 -07:00
Tobias Gysi	fc723380e6	Update lhlo to use the new structured op interface. Replace deprecated methods in lhlo_fuse_linalg.cc. The new structured op interface has been introduced in https://reviews.llvm.org/D103394. PiperOrigin-RevId: 377875452	2021-06-07 03:11:03 -07:00
Wenyi Zhao	ade873a5e0	PR #49970 : [MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49970 1, add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim 2, add a flag `convert-to-lmhlo-only` to separate following two case: - hlo-to-lhlo only. Simply lowers all mhlo ops to their lmhlo counterparts, do not apply any optimization (e.g. elide any buffer copy). Buffer optimization is not easy in dynamic shape world especially when involving control flow, thus we leave this to another dedicated pass. - hlo-to-lhlo-or-memref-directly. Lowers some metadata-only mhlo ops (e.g. reshape) to memref dialect directly and Lowers others to their lmhlo counterparts. Copybara import of the project: -- 562bd65a368f6194405c4ae6900e3b4388a5ec03 by Wenyi Zhao <reyizero@gmail.com>: [MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim 1, add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim 2, add a flag `convert-to-lmhlo-only` to separate following two case: - hlo-to-lhlo only. Simply lowers all mhlo ops to their lmhlo counterparts, do not apply any optimization (e.g. elide any buffer copy). Buffer optimization is not easy in dynamic shape world especially when involving control flow, thus we leave this to another dedicated pass. - hlo-to-lhlo-or-memref-directly. Lowers some metadata-only mhlo ops (e.g. reshape) to memref dialect directly and Lowers others to their lmhlo counterparts. PiperOrigin-RevId: 377603395	2021-06-04 15:36:03 -07:00

1055 Commits
