Take advantage of the fact that scalars are already ranked and that they are
neutral elements to broadcasting. Do not reshape scalars, do not consider them
for broadcasting, and materialize ranked operations on scalars accordingly.
PiperOrigin-RevId: 375968371
Rank specialization cases can be applied to all argument tensors of smaller
ranks than the expected maximum rank. This is crucial if all operands are
effectively scalars and the maximum reduced rank is 0.
PiperOrigin-RevId: 375712020
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49454
The new interface is more safe to be used during dialect conversion
(e.g. converting from tensor world to buffer world).
Copybara import of the project:
--
a6968072d59bec3c3bbaef0121d297e807c37c91 by Wenyi Zhao <reyizero@gmail.com>:
[MLIR][DISC] Upgrade to use the new `reifyReturnTypeShapes` interface.
The new interface is more safe to be used during dialect conversion
(e.g. converting from tensor world to buffer world).
--
55e7c6b7f2f99b99e226645a57e2433fae3e90ed by Wenyi Zhao <reyizero@gmail.com>:
minor fix
PiperOrigin-RevId: 375500273
This only works for updating tensors, not add/min/max computations. It requires
the index depth to be 1 because of the limitation in Linalg. We can not compare
multiple indices without packing indices.
PiperOrigin-RevId: 375137721
For rank specialization clusters that have only two operands, we can materialize
two extra cases in which either of them is a scalar. This avoids redundant index
computations in these cases.
PiperOrigin-RevId: 375037390
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49228
We are porting our MLIR-based dynamic shape compiler to tf community (From OP def, Patttern, to Optimization pass, etc).
This is the first PR, which including some dynamic shape OPs def in mhlo and lmhlo dialect.
For mhlo dialect, we add:
- HLO_RealDynamicSliceOp
- HLO_DynamicPadOp
- HLO_DynamicGatherOp
- HLO_DynamicConvOp
For lmhlo dialect, we add:
- LHLO_RealDynamicSliceOp
- LHLO_DynamicBroadcastInDimOp
- LHLO_DynamicGatherOp
- LHLO_DynamicPadOp
- LHLO_DynamicBitcastOp
- LHLO_DynamicConvOp
- LHLO_DynamicIotaOp
- LHLO_DynamicReshapeOp
- LHLO_DotGeneralOp
- LHLO_BitcastOp
Rest Ops to add:
* We will send a separate PR containing LHLO_DynamicWhileOp and LHLO_DynamicCaseOp for control flow.
* We will add a separate dedicated dialect like mhlo_ral, which including D2HOp/H2DOp/DebugPrintOp/TopKOp, etc.
Previous discussions:[RFC](https://groups.google.com/a/tensorflow.org/g/mlir/c/_X48poNcbDI/m/jCC8BWIICQAJ), [discussion_1](https://llvm.discourse.group/t/updates-on-mlir-based-dynamic-shape-compiler/2384), [Recording of meeting](https://drive.google.com/file/d/1_uEISlV5MUWdG9faKAdKlCWnPtGjRC-D/view?usp=sharing).
Copybara import of the project:
--
e22d9e61106e00a1a1c6f368cc4a03e3bd1f414c by azazhu <azazhu@gmail.com>:
[DISC]fea: porting mhlo and lmhlo OPs
--
9ec3e76290da07cbd53d7da5fa86ff67179441a1 by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1. add summary and description for dynamic OPs in mhlo and lmhlo; 2. rm InferOutputTypes; 3. add verify for RealDynamicSliceOp and DynamicPadOp
--
0d68cd135555fd935991c12456b21329e628f23f by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1.remove D2H,H2D and DebugPrint Ops from mhlo/lmhlo dialect; 2. add type constraint to DynamicPadOp and RealDynamicSliceOp; 3.refine lmhlo type constraint; 4.rename RealDynamicSliceOp as name conflict.
--
698762a77d60f6a844cb1ab3f32740d4ef3c5843 by azazhu <azazhu@gmail.com>:
[DISC][MLIR] 1. replace dyn_cast to cast 2. refine code
PiperOrigin-RevId: 375022260
* The op defines this to be index, any integer, or pred (i1).
* Many TensorFlow legalizations produce integers for the shape.
PiperOrigin-RevId: 374566113
* The former is typically invariant regardless of backend.
* The latter may need to be done differently depending on capabilities of the lowering target.
PiperOrigin-RevId: 374492924
Add lowering pattern for rank specialization clusters with more than one
non-scalar operand. The lowering resembles that of the `TransformUnrankedHlo`
pass and switches cases for maximal ranks from 1 through 8.
PiperOrigin-RevId: 374377002
The pattern can be generalized to also rank specialize operations with a single
non-scalar operand. Also extract helper functions that can be reused in
following specializations.
PiperOrigin-RevId: 374198381
Also cluster operations that operate on same shape operands. These implicitly
satisfy the broadcasting semantics requirement. Also, add test cases for some
cases that appear in the current MLIR-generated kernels.
PiperOrigin-RevId: 374191950
The ReduceRegion* patterns are matching on the same ops as the PointwiseToLinalg*
patterns and on certain toolchains (MSVC) the order can be wrong. If the pointwise
runs first then it converts the op *within* the reduction before the reduction one
runs, leading to nested linalg op weirdness.
PiperOrigin-RevId: 373848269
Add a pass to cluster unranked C/HLO operations in one
`chlo.rank_specialization_cluster` op. The C/HLO operations are moved to the
body of the operation. Later passes can use this to rank-specialize all these
operations together.
PiperOrigin-RevId: 373336725
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/48667
Added RegionBranchOpInterfaces to lmhlo operations that use regions.
This is needed, since the bufferization features in MLIR have to reason about the control flow within these operations.
Copybara import of the project:
--
572fd7d850a46630b812da84e9094280f89f259e by Julian Gross <julian.gross@dfki.de>:
Added RegionBranchOpInterfaces to lmhlo operations.
PiperOrigin-RevId: 372070825
This strips away the signedness with a type converter, using unrealized
conversion casts. The rest is mostly mechanically pushing the original op down
the pipeline so lowerings can see the original types.
Signed types stay signless for now. This can be changed in the HLO bridge later.
I did a pass over all ops and added unsigned lowerings where they were missing.
There may be more.
Currently the lowering will die at a later stage because it doesn't understand
the unrealized casts.
PiperOrigin-RevId: 371077494
This uses a indexed linalg.generic, which is rather awkward standalone but
allows fusing into the output of the concatenate and avoid to ever materialize
it in memory. I think this is the only way to get that with the current linalg
stack, fusion across a concatenate would require more infrastructure.
PiperOrigin-RevId: 369677652
Add a folder for maps whose body returns only one of the arguments. When this arises the fold replaces the map output with one of the operand tensors.
PiperOrigin-RevId: 369304322
Assuming ops can only be merged if their witnesses will dominate the merged
assuming op. This is not the case if the second op's witness is a result of the
first.
PiperOrigin-RevId: 369192868
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/47315
Lowering of `concatenateOp` is added from lmhlo to Affine. The lowering
has been added as a part of `lhlo-legalize-to-affine` pass.
Signed-off-by: Prashant Kumar <prashantk@polymagelabs.com>
Copybara import of the project:
--
15314e4579f7a6901cf3475eff25962a34772eaf by Prashant Kumar <prashantk@polymagelabs.com>:
[MLIR] Add concatenateOp lowering from lmhlo to Affine.
Lowering of `concatenateOp` is added from lmhlo to Affine. The lowering
has been added as a part of `lhlo-legalize-to-affine` pass.
Signed-off-by: Prashant Kumar <prashantk@polymagelabs.com>
PiperOrigin-RevId: 368465992
The pattern does not support ops with non-zero padding config. Add a check to
prevent unexpected lowering.
It is not easy to add tests because other patterns will convert body ops, and
it causes issues like invalid IRs.
PiperOrigin-RevId: 367202450
We now use the same special cases for all ops with arity >= 2.
For binary ops, we now have only one special case if at least one of the
operands has exactly one element. In that case, we reshape both operands to
rank 1. Before, we had separate special cases whether the left-hand side
or the right-hand side have a scalar shape.
PiperOrigin-RevId: 366005835
When an op is moved out of an assuming region we already know statically that it
is independent of the assuming region. Hence, there is no need to yield its
results.
PiperOrigin-RevId: 366001405
Add pattern to move operations out of assuming op. This only valid for
constraint-independent ops, like `cstr_broadcastable` and `shape_of`. It will
eventually allow to make assuming regions' constraints independent from each
other so that they can be merged.
PiperOrigin-RevId: 365993145
This matches the behavior of mhlo.case. Additionally, fix the verification of CaseOp in the case of nested ops with mhlo.return-containing regions.
PiperOrigin-RevId: 365936672
We can use it also for ternary ops like Select if we change the signature so
that a ValueRange is passed in.
Also remove special casing for HloComplexAdaptor. It can be handled with the
generic adaptor as well.
PiperOrigin-RevId: 365777493
We only need the memref_reinterpret_cast if we don't know whether a dimension
gets expanded or not. With static shapes we know that a dimension can only be
expanded if it's a static 1, so lower it in the same way we lower fully
static broadcasts.
PiperOrigin-RevId: 363859181
Make the error message a bit more verbose & it is cheaper to verify the elements rather than creating a (potentially) new type.
PiperOrigin-RevId: 363073909
This is the same as iota, but instead of taking the dimensions from the result
tensor we use the supplied shape extents tensor.
PiperOrigin-RevId: 362298548
This is an annoying edge case because the collapse->expand lowering expects at
least R1 or it will produce invalid linalg reshapes. Using the direct lowering
works fine.
PiperOrigin-RevId: 362269199
- Extract verification of source target pairs attached to collective permute into a common
helper function and use that to verify both MHLO and LMHLO variants.
- Change MlirGpuTestBase::ParseMlirModule to allow returning back a failure, and use
that to update the mlir_gpu_compile_test to check the new behavior.
PiperOrigin-RevId: 362156962
THe conversion from dot_general to dot fails when trying to retrieve
and use the precision config, since precision_config is optional.
PiperOrigin-RevId: 362095296
For now, the pass only reifies the required shape computations. Moving
broadcasts will follow to allow for fusion across them.
PiperOrigin-RevId: 362033715
Return nan at zeta poles or inf where the limit is defined. Also test the kernel
based on the series representation of zeta.
PiperOrigin-RevId: 361993482