Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50211
Support hlo-to-lhlo conversion for RealDynamicSliceOp and ReduceOp
Copybara import of the project:
--
c417b336670a1fc256f7026dfe8080e46d13d79a by Wenyi Zhao <reyizero@gmail.com>:
[MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp
PiperOrigin-RevId: 378972113
This just adds support for it in the op but keeps the producers/uses as is (e.g., single tensor or tuple), matching what XLA export requires. A follow-up here would be a pass that re-tuples for export; then the canonical form could be changed. Given control flow via regions and multi-result operations, tupling does not add representational power, and all the get_tuple_element ops obscure the computation.
The old form allowed a single tensor or tuple. The new form allows a variadic number of tensors or tuples; since tuples may be nested, the input could be (Tensor<..>, Tuple<Tensor<...>, Tuple<...>, ...>, Tensor<...>), and HLO_Tensor doesn't allow tuples.
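A schematic example of the new variadic form (op spelling and region signature are illustrative, not lifted from the actual tests): a two-result reduce carries both values directly, with no tuple traffic around it.

```mlir
// One reduce, two results; no tuple construction or get_tuple_element
// unpacking is needed.
%sum, %max = "mhlo.reduce"(%data0, %data1, %init0, %init1) ({
^bb0(%acc0: tensor<f32>, %acc1: tensor<f32>, %e0: tensor<f32>, %e1: tensor<f32>):
  %s = mhlo.add %acc0, %e0 : tensor<f32>
  %m = mhlo.maximum %acc1, %e1 : tensor<f32>
  "mhlo.return"(%s, %m) : (tensor<f32>, tensor<f32>) -> ()
}) {dimensions = dense<0> : tensor<1xi64>}
    : (tensor<16xf32>, tensor<16xf32>, tensor<f32>, tensor<f32>)
    -> (tensor<f32>, tensor<f32>)
```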
PiperOrigin-RevId: 378934388
Find shape equivalence classes among the operands and use them for better rank
specialization. If all operands are known to be of the same shape, we can
flatten them to rank one. If there are two shape equivalence classes, we can
generalize the scalar rank specialization cases.
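For instance (a schematic sketch; `%flat_shape` is an assumed rank-1 shape holding the shared element count), two same-shaped operands need only a single rank-1 specialization:

```mlir
// Flatten both operands to rank 1 using the shared shape, compute there,
// and restore the original shape afterwards.
%flat_a = "mhlo.dynamic_reshape"(%a, %flat_shape)
    : (tensor<*xf32>, tensor<1xindex>) -> tensor<?xf32>
%flat_b = "mhlo.dynamic_reshape"(%b, %flat_shape)
    : (tensor<*xf32>, tensor<1xindex>) -> tensor<?xf32>
%flat_sum = mhlo.add %flat_a, %flat_b : tensor<?xf32>
%sum = "mhlo.dynamic_reshape"(%flat_sum, %shape)
    : (tensor<?xf32>, tensor<?xindex>) -> tensor<*xf32>
```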
PiperOrigin-RevId: 378844575
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50100
Support hlo-to-lhlo conversion for DynamicIotaOp and DynamicPadOp
Copybara import of the project:
--
c3aae94954e35d3f8ad265f619ef9765665a5115 by Wenyi Zhao <reyizero@gmail.com>:
[MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp
--
adc6996d70b804d61310d56a33fac975d70c8636 by Wenyi Zhao <reyizero@gmail.com>:
minor
PiperOrigin-RevId: 378733284
We should upcast F16 to F32 to prevent precision loss.
E.g., cosh(-9) would previously evaluate to 4042 instead of 4052.
This allows enabling the MLIR-generated kernel for the F16 type.
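A minimal sketch of the upcast, assuming the kernel is expressed as MLIR (the f32 cosh computation itself is elided):

```mlir
// Evaluate in f32 to avoid f16 rounding/overflow in the exponentials,
// then truncate the result back to f16.
%x32 = "mhlo.convert"(%x) : (tensor<?xf16>) -> tensor<?xf32>
// ... cosh evaluated in f32, e.g. as e^(x - ln 2) + e^(-x - ln 2) ...
%y = "mhlo.convert"(%y32) : (tensor<?xf32>) -> tensor<?xf16>
```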
Also move the template instantiation for Sinh inside the #ifdef block.
This was missed in a previous commit.
PiperOrigin-RevId: 378635042
When merging rank specialization clusters, avoid duplicating operands. Fewer
operands usually allow better rank specialization.
PiperOrigin-RevId: 378445946
If the result of the slice is an empty tensor, do nothing.
This fixes a crash: we can't create a `concat` with an
empty operand range.
PiperOrigin-RevId: 378354956
Because mhlo::ConstantLike doesn't support complex types, we need to use
GetScalarOfType and broadcast it to the needed shape.
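Schematically, the lowering materializes the complex scalar and broadcasts it to the operand's (possibly dynamic) shape; illustrative IR, attribute spellings approximate:

```mlir
// broadcast_dimensions is empty because the constant is rank 0.
%c = mhlo.constant dense<(5.000000e-01,0.000000e+00)> : tensor<complex<f32>>
%shape = shape.shape_of %x : tensor<?xcomplex<f32>> -> tensor<1xindex>
%b = "mhlo.dynamic_broadcast_in_dim"(%c, %shape)
    {broadcast_dimensions = dense<> : tensor<0xi64>}
    : (tensor<complex<f32>>, tensor<1xindex>) -> tensor<?xcomplex<f32>>
```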
Disable the tf2xla fallback, now that MLIR fully supports Sinh.
PiperOrigin-RevId: 378123151
We should upcast F16 to F32 to prevent precision loss.
E.g., sinh(-9) would previously evaluate to -4042 instead of -4052.
This allows enabling the MLIR-generated kernel for the F16 type.
PiperOrigin-RevId: 377901896
Replace deprecated methods in lhlo_fuse_linalg.cc. The new structured op interface was introduced in https://reviews.llvm.org/D103394.
PiperOrigin-RevId: 377875452
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49970
1. Add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim.
2. Add a flag `convert-to-lmhlo-only` to separate the following two cases
   (see the sketch below):
   - hlo-to-lhlo only. Simply lowers all mhlo ops to their lmhlo
     counterparts and does not apply any optimization (e.g. eliding
     buffer copies). Buffer optimization is not easy in the dynamic
     shape world, especially when control flow is involved, so we
     leave this to another dedicated pass.
   - hlo-to-lhlo-or-memref-directly. Lowers some metadata-only mhlo
     ops (e.g. reshape) to the memref dialect directly and lowers
     others to their lmhlo counterparts.
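An illustrative contrast for a metadata-only reshape (schematic; the exact resulting ops depend on the pipeline):

```mlir
// Input:
%r = "mhlo.dynamic_reshape"(%x, %shape)
    : (tensor<?xf32>, tensor<2xindex>) -> tensor<?x?xf32>

// convert-to-lmhlo-only: the lmhlo counterpart operating on buffers.
"lmhlo.dynamic_reshape"(%x_buf, %shape_buf, %out_buf)
    : (memref<?xf32>, memref<2xindex>, memref<?x?xf32>) -> ()

// hlo-to-lhlo-or-memref-directly: a metadata-only memref op, no copy.
%r_buf = memref.reshape %x_buf(%shape_buf)
    : (memref<?xf32>, memref<2xindex>) -> memref<?x?xf32>
```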
Copybara import of the project:
--
562bd65a368f6194405c4ae6900e3b4388a5ec03 by Wenyi Zhao <reyizero@gmail.com>:
[MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim
1. Add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim.
2. Add a flag `convert-to-lmhlo-only` to separate the following two cases:
   - hlo-to-lhlo only. Simply lowers all mhlo ops to their lmhlo
     counterparts and does not apply any optimization (e.g. eliding
     buffer copies). Buffer optimization is not easy in the dynamic
     shape world, especially when control flow is involved, so we
     leave this to another dedicated pass.
   - hlo-to-lhlo-or-memref-directly. Lowers some metadata-only mhlo
     ops (e.g. reshape) to the memref dialect directly and lowers
     others to their lmhlo counterparts.
PiperOrigin-RevId: 377603395
Adds import/export/verifier support as well.
Also makes `channel_handle` uniform across mhlo.all_reduce and mhlo.all_gather.
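Schematic usage of the op, with `channel_handle` spelled the same way as on mhlo.all_reduce (attribute syntax is illustrative):

```mlir
// Concatenate a 4x2 shard from each of two replicas along dimension 1.
%r = "mhlo.all_gather"(%x) {
  all_gather_dim = 1 : i64,
  replica_groups = dense<[[0, 1]]> : tensor<1x2xi64>,
  channel_handle = {handle = 1 : i64, type = 0 : i64}
} : (tensor<4x2xf32>) -> tensor<4x4xf32>
```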
PiperOrigin-RevId: 377323468
Fix usage of the default constructor. Instead, always use the parameterized
constructor and make the maximum supported rank explicit.
PiperOrigin-RevId: 377037155
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49598
This PR implements logic for lowering memref.tensor_load ops that are
inserted during `mhlo-legalize-to-lmhlo`.
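A minimal sketch of one such rewrite, assuming a tensor_load whose only use is an element extraction (illustrative, not the actual pattern set):

```mlir
// Before: a bridge op left behind by the partial bufferization.
%t = memref.tensor_load %buf : memref<2xindex>
%e = tensor.extract %t[%c0] : tensor<2xindex>

// After: the use is served directly from the underlying buffer.
%e = memref.load %buf[%c0] : memref<2xindex>
```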
Copybara import of the project:
--
80eb377af4e02182e1aecc943a41ca5d7d1c2100 by Wenyi Zhao <reyizero@gmail.com>:
[MLIR][DISC] legalize tensor_load inserted during hlo-to-lhlo conversion
This PR implements logic for lowering memref.tensor_load ops that are
inserted during `mhlo-legalize-to-lmhlo`.
--
ac452fe3dcd591211cd5c59be9189fe2f7153b41 by Wenyi Zhao <reyizero@gmail.com>:
minor fix
--
6b36017f8632a06adbc3e05a62975fa641d0260f by Wenyi Zhao <reyizero@gmail.com>:
minor refine
--
846005cc76d0033112e47825c2e9a97790b6925f by Wenyi Zhao <reyizero@gmail.com>:
minor fix
--
f6a4becaa287d5ca323b2d152a4d0ae053730fd9 by Wenyi Zhao <reyizero@gmail.com>:
fix
--
5555749f60f7fce8f57962860ef65efccf0362ba by Wenyi Zhao <reyizero@gmail.com>:
fix
--
8873b9b6d9315c1199ca9f7c133ecf377ecd2fa6 by Wenyi Zhao <reyizero@gmail.com>:
fix
PiperOrigin-RevId: 376942547
The maximum supported target rank of 5 is sufficient for all operations but
`select`. Make the maximum target rank configurable in the rank specialization.
This reduces the number of generated kernels for operations that don't require
it.
PiperOrigin-RevId: 376822496
Add the first MLIR-generated kernel that relies on an in-TF lowering. Fusion for
this kernel relies on the generalized rank specialization for operation groups.
PiperOrigin-RevId: 376805435
Take advantage of the fact that scalars are already ranked and that they are
neutral elements to broadcasting. Do not reshape scalars, do not consider them
for broadcasting, and materialize ranked operations on scalars accordingly.
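For example (schematic), only the non-scalar operand is flattened; the scalar participates directly:

```mlir
// The scalar operand needs no reshape and broadcasts trivially against
// the flattened operand.
%flat = "mhlo.dynamic_reshape"(%x, %flat_shape)
    : (tensor<*xf32>, tensor<1xindex>) -> tensor<?xf32>
%res = chlo.broadcast_add %flat, %scalar
    : (tensor<?xf32>, tensor<f32>) -> tensor<?xf32>
```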
PiperOrigin-RevId: 375968371
Rank specialization cases can be applied to all argument tensors of smaller
ranks than the expected maximum rank. This is crucial if all operands are
effectively scalars and the maximum reduced rank is 0.
PiperOrigin-RevId: 375712020
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49454
The new interface is safer to use during dialect conversion
(e.g. converting from the tensor world to the buffer world).
Copybara import of the project:
--
a6968072d59bec3c3bbaef0121d297e807c37c91 by Wenyi Zhao <reyizero@gmail.com>:
[MLIR][DISC] Upgrade to use the new `reifyReturnTypeShapes` interface.
The new interface is safer to use during dialect conversion
(e.g. converting from the tensor world to the buffer world).
--
55e7c6b7f2f99b99e226645a57e2433fae3e90ed by Wenyi Zhao <reyizero@gmail.com>:
minor fix
PiperOrigin-RevId: 375500273
This only works for updating tensors, not for add/min/max computations. It requires
the index depth to be 1 because of a limitation in Linalg: we cannot compare
multiple indices without packing them.
PiperOrigin-RevId: 375137721
For rank specialization clusters that have only two operands, we can materialize
two extra cases in which either of them is a scalar. This avoids redundant index
computations in these cases.
PiperOrigin-RevId: 375037390
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49228
We are porting our MLIR-based dynamic shape compiler to the TF community (from op definitions and patterns to optimization passes, etc.).
This is the first PR, which includes some dynamic shape op definitions in the mhlo and lmhlo dialects.
For the mhlo dialect, we add:
- HLO_RealDynamicSliceOp (see the sketch after these lists)
- HLO_DynamicPadOp
- HLO_DynamicGatherOp
- HLO_DynamicConvOp
For the lmhlo dialect, we add:
- LHLO_RealDynamicSliceOp
- LHLO_DynamicBroadcastInDimOp
- LHLO_DynamicGatherOp
- LHLO_DynamicPadOp
- LHLO_DynamicBitcastOp
- LHLO_DynamicConvOp
- LHLO_DynamicIotaOp
- LHLO_DynamicReshapeOp
- LHLO_DotGeneralOp
- LHLO_BitcastOp
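For instance, HLO_RealDynamicSliceOp takes its start/limit/stride bounds as runtime tensors rather than attributes (schematic usage; operand types are illustrative):

```mlir
%out = "mhlo.real_dynamic_slice"(%input, %start, %limit, %stride)
    : (tensor<?x?xf32>, tensor<2xindex>, tensor<2xindex>, tensor<2xindex>)
    -> tensor<?x?xf32>
```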
Remaining ops to add:
* We will send a separate PR containing LHLO_DynamicWhileOp and LHLO_DynamicCaseOp for control flow.
* We will add a separate dedicated dialect like mhlo_ral, which includes D2HOp/H2DOp/DebugPrintOp/TopKOp, etc.
Previous discussions: [RFC](https://groups.google.com/a/tensorflow.org/g/mlir/c/_X48poNcbDI/m/jCC8BWIICQAJ), [discussion_1](https://llvm.discourse.group/t/updates-on-mlir-based-dynamic-shape-compiler/2384), [Recording of meeting](https://drive.google.com/file/d/1_uEISlV5MUWdG9faKAdKlCWnPtGjRC-D/view?usp=sharing).
Copybara import of the project:
--
e22d9e61106e00a1a1c6f368cc4a03e3bd1f414c by azazhu <azazhu@gmail.com>:
[DISC]fea: porting mhlo and lmhlo OPs
--
9ec3e76290da07cbd53d7da5fa86ff67179441a1 by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1. add summary and description for dynamic OPs in mhlo and lmhlo; 2. rm InferOutputTypes; 3. add verify for RealDynamicSliceOp and DynamicPadOp
--
0d68cd135555fd935991c12456b21329e628f23f by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1.remove D2H,H2D and DebugPrint Ops from mhlo/lmhlo dialect; 2. add type constraint to DynamicPadOp and RealDynamicSliceOp; 3.refine lmhlo type constraint; 4.rename RealDynamicSliceOp as name conflict.
--
698762a77d60f6a844cb1ab3f32740d4ef3c5843 by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1. replace dyn_cast to cast 2. refine code
PiperOrigin-RevId: 375022260
* The op defines this to be index, any integer, or pred (i1).
* Many TensorFlow legalizations produce integers for the shape.
PiperOrigin-RevId: 374566113
* The former is typically invariant regardless of backend.
* The latter may need to be done differently depending on capabilities of the lowering target.
PiperOrigin-RevId: 374492924