This only works for updating tensors, not add/min/max computations. It requires
the index depth to be 1 because of the limitation in Linalg. We can not compare
multiple indices without packing indices.
PiperOrigin-RevId: 375137721
For rank specialization clusters that have only two operands, we can materialize
two extra cases in which either of them is a scalar. This avoids redundant index
computations in these cases.
PiperOrigin-RevId: 375037390
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49228
We are porting our MLIR-based dynamic shape compiler to tf community (From OP def, Patttern, to Optimization pass, etc).
This is the first PR, which including some dynamic shape OPs def in mhlo and lmhlo dialect.
For mhlo dialect, we add:
- HLO_RealDynamicSliceOp
- HLO_DynamicPadOp
- HLO_DynamicGatherOp
- HLO_DynamicConvOp
For lmhlo dialect, we add:
- LHLO_RealDynamicSliceOp
- LHLO_DynamicBroadcastInDimOp
- LHLO_DynamicGatherOp
- LHLO_DynamicPadOp
- LHLO_DynamicBitcastOp
- LHLO_DynamicConvOp
- LHLO_DynamicIotaOp
- LHLO_DynamicReshapeOp
- LHLO_DotGeneralOp
- LHLO_BitcastOp
Rest Ops to add:
* We will send a separate PR containing LHLO_DynamicWhileOp and LHLO_DynamicCaseOp for control flow.
* We will add a separate dedicated dialect like mhlo_ral, which including D2HOp/H2DOp/DebugPrintOp/TopKOp, etc.
Previous discussions:[RFC](https://groups.google.com/a/tensorflow.org/g/mlir/c/_X48poNcbDI/m/jCC8BWIICQAJ), [discussion_1](https://llvm.discourse.group/t/updates-on-mlir-based-dynamic-shape-compiler/2384), [Recording of meeting](https://drive.google.com/file/d/1_uEISlV5MUWdG9faKAdKlCWnPtGjRC-D/view?usp=sharing).
Copybara import of the project:
--
e22d9e61106e00a1a1c6f368cc4a03e3bd1f414c by azazhu <azazhu@gmail.com>:
[DISC]fea: porting mhlo and lmhlo OPs
--
9ec3e76290da07cbd53d7da5fa86ff67179441a1 by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1. add summary and description for dynamic OPs in mhlo and lmhlo; 2. rm InferOutputTypes; 3. add verify for RealDynamicSliceOp and DynamicPadOp
--
0d68cd135555fd935991c12456b21329e628f23f by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1.remove D2H,H2D and DebugPrint Ops from mhlo/lmhlo dialect; 2. add type constraint to DynamicPadOp and RealDynamicSliceOp; 3.refine lmhlo type constraint; 4.rename RealDynamicSliceOp as name conflict.
--
698762a77d60f6a844cb1ab3f32740d4ef3c5843 by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1. replace dyn_cast to cast 2. refine code
PiperOrigin-RevId: 375022260
* The op defines this to be index, any integer, or pred (i1).
* Many TensorFlow legalizations produce integers for the shape.
PiperOrigin-RevId: 374566113
* The former is typically invariant regardless of backend.
* The latter may need to be done differently depending on capabilities of the lowering target.
PiperOrigin-RevId: 374492924
Add lowering pattern for rank specialization clusters with more than one
non-scalar operand. The lowering resembles that of the `TransformUnrankedHlo`
pass and switches cases for maximal ranks from 1 through 8.
PiperOrigin-RevId: 374377002
The pattern can be generalized to also rank specialize operations with a single
non-scalar operand. Also extract helper functions that can be reused in
following specializations.
PiperOrigin-RevId: 374198381
Also cluster operations that operate on same shape operands. These implicitly
satisfy the broadcasting semantics requirement. Also, add test cases for some
cases that appear in the current MLIR-generated kernels.
PiperOrigin-RevId: 374191950
The ReduceRegion* patterns are matching on the same ops as the PointwiseToLinalg*
patterns and on certain toolchains (MSVC) the order can be wrong. If the pointwise
runs first then it converts the op *within* the reduction before the reduction one
runs, leading to nested linalg op weirdness.
PiperOrigin-RevId: 373848269
Rename `gentbl` to `gentbl_cc_library` to make it clearer which kind of rule is
ultimately used.
Update `gentbl_*` macros to take `tbl_outs` options as a list rather a
whitespace-separated string and remove the related string handling.
PiperOrigin-RevId: 373406352
Add a pass to cluster unranked C/HLO operations in one
`chlo.rank_specialization_cluster` op. The C/HLO operations are moved to the
body of the operation. Later passes can use this to rank-specialize all these
operations together.
PiperOrigin-RevId: 373336725