For rank specialization clusters that have only two operands, we can materialize
two extra cases in which either of them is a scalar. This avoids redundant index
computations in these cases.
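
A minimal sketch of the case selection described above, in plain Python rather than MLIR. The function name `select_case`, the case labels, and the assumption that "scalar" means a shape with exactly one element are all illustrative and not taken from the pass itself.

```python
from math import prod


def select_case(lhs_shape, rhs_shape):
    """Pick a specialization case for a two-operand cluster (hypothetical helper).

    Assumption: an operand counts as a scalar when it holds exactly one
    element, regardless of its rank.
    """
    if prod(lhs_shape) == 1:
        return "lhs_scalar"   # index computations for lhs can be skipped
    if prod(rhs_shape) == 1:
        return "rhs_scalar"   # index computations for rhs can be skipped
    return "generic"          # fall back to the general rank-specialized path
```

In the scalar cases the non-scalar operand can be treated as flat, which is where the saved index computations come from.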
PiperOrigin-RevId: 375037390
* The op defines this to be index, any integer, or pred (i1).
* Many TensorFlow legalizations produce integers for the shape.
PiperOrigin-RevId: 374566113
* The former is typically invariant regardless of backend.
* The latter may need to be done differently depending on capabilities of the lowering target.
PiperOrigin-RevId: 374492924
Add lowering pattern for rank specialization clusters with more than one
non-scalar operand. The lowering resembles that of the `TransformUnrankedHlo`
pass and switches cases for maximal ranks from 1 through 8.
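
The switch over maximal ranks can be sketched roughly as follows. This is plain Python, not the actual lowering; `specialize_to_max_rank` and `MAX_SPECIALIZED_RANK` are hypothetical names, and padding with leading 1s is an assumption about how operands are brought to a common rank.

```python
MAX_SPECIALIZED_RANK = 8  # cases are emitted for ranks 1 through 8


def specialize_to_max_rank(shapes):
    """Return the smallest rank case that fits all operands, plus padded shapes.

    Assumption: operands of lower rank are padded with leading 1s so that
    every operand has the selected rank.
    """
    max_rank = max(len(shape) for shape in shapes)
    for target in range(1, MAX_SPECIALIZED_RANK + 1):
        if max_rank <= target:
            padded = [(1,) * (target - len(s)) + tuple(s) for s in shapes]
            return target, padded
    raise ValueError("operand rank exceeds the largest specialized case")
```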
PiperOrigin-RevId: 374377002
The pattern can be generalized to also rank specialize operations with a single
non-scalar operand. Also extract helper functions that can be reused in
following specializations.
PiperOrigin-RevId: 374198381
Also cluster operations that operate on same shape operands. These implicitly
satisfy the broadcasting semantics requirement. Also, add test cases for some
cases that appear in the current MLIR-generated kernels.
PiperOrigin-RevId: 374191950
Add a pass to cluster unranked C/HLO operations in one
`chlo.rank_specialization_cluster` op. The C/HLO operations are moved to the
body of the operation. Later passes can use this to rank-specialize all these
operations together.
PiperOrigin-RevId: 373336725
This strips away the signedness with a type converter, using unrealized
conversion casts. The rest is mostly mechanically pushing the original op down
the pipeline so lowerings can see the original types.
Signed types stay signless for now. This can be changed in the HLO bridge later.
I did a pass over all ops and added unsigned lowerings where they were missing.
There may be more.
Currently the lowering will die at a later stage because it doesn't understand
the unrealized casts.
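
As a rough illustration of the type conversion, the sketch below maps integer type names with explicit signedness to their signless form. It works on type names as strings, which is only a stand-in for a real type converter; `to_signless` is a hypothetical helper.

```python
import re


def to_signless(type_name):
    """Strip explicit signedness from an integer type name (hypothetical helper).

    Assumption: types are spelled in MLIR style, e.g. `ui32`, `si8`, `i32`.
    Non-integer and already signless types are returned unchanged.
    """
    match = re.fullmatch(r"[su]i(\d+)", type_name)
    return f"i{match.group(1)}" if match else type_name
```

In the real pipeline the original and converted values would be bridged by unrealized conversion casts; the sketch only shows the type mapping.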
PiperOrigin-RevId: 371077494
This uses an indexed linalg.generic, which is rather awkward standalone but
allows fusing into the output of the concatenate and avoids ever materializing
it in memory. I think this is the only way to get that with the current linalg
stack; fusion across a concatenate would require more infrastructure.
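
The idea of computing a concatenate per output index can be sketched in plain Python for the one-dimensional case. The helper names `concat_element` and `fused_consumer` are hypothetical; this only illustrates why a consumer can be fused without a concatenated buffer and does not reflect the actual linalg lowering.

```python
def concat_element(inputs, index):
    """Return element `index` of the logical concatenation of `inputs`.

    Walks the inputs and picks the one whose offset range contains the
    index, so the concatenated result is never built.
    """
    offset = 0
    for values in inputs:
        if index < offset + len(values):
            return values[index - offset]
        offset += len(values)
    raise IndexError(index)


def fused_consumer(inputs, f):
    """Apply `f` to every element of the concatenation without materializing it."""
    total = sum(len(values) for values in inputs)
    return [f(concat_element(inputs, i)) for i in range(total)]
```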
PiperOrigin-RevId: 369677652
Add a folder for maps whose body returns only one of the arguments. When this arises, the fold replaces the map output with the corresponding operand tensor.
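
A minimal sketch of the fold condition, assuming the map body is summarized by the position of the block argument it returns (or `None` when it computes something else). `fold_map` and its parameters are hypothetical names.

```python
def fold_map(operands, returned_arg_index):
    """Fold a map whose body just returns one of its arguments.

    `returned_arg_index` is the position of the returned block argument,
    or None if the body does real work. Returns the replacement operand,
    or None when the fold does not apply.
    """
    if returned_arg_index is None:
        return None
    return operands[returned_arg_index]
```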
PiperOrigin-RevId: 369304322
Assuming ops can only be merged if their witnesses will dominate the merged
assuming op. This is not the case if the second op's witness is a result of the
first.
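
The merge condition can be sketched as a simple check. Each assuming op is modeled here as a dict with a `witness` value name and a list of `results`; this representation and the name `can_merge` are assumptions made for illustration only.

```python
def can_merge(first, second):
    """Check whether two consecutive assuming ops may be merged.

    The merged op would sit at the position of `first`, so the witness of
    `second` must not be produced by `first`; otherwise it would not
    dominate the merged op.
    """
    return second["witness"] not in first["results"]
```

For example, if the first op yields `%w1` and the second op uses `%w1` as its witness, the check returns `False` and the ops stay separate.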
PiperOrigin-RevId: 369192868
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/47315
Lowering of `concatenateOp` is added from lmhlo to Affine. The lowering
has been added as a part of `lhlo-legalize-to-affine` pass.
Signed-off-by: Prashant Kumar <prashantk@polymagelabs.com>
Copybara import of the project:
--
15314e4579f7a6901cf3475eff25962a34772eaf by Prashant Kumar <prashantk@polymagelabs.com>:
[MLIR] Add concatenateOp lowering from lmhlo to Affine.
PiperOrigin-RevId: 368465992