Merge `mhlo.select` into rank specialization clusters. Infer shape equalities
correctly from `mhlo.select` (and also from `mhlo.clamp`). This allows to lower
the relu kernel completely flattened.
PiperOrigin-RevId: 379925793
The operations allow for a limited form of broadcasting which allows some
operands to be scalars. As such they are neither strictly `Elementwise`, nor
`Broadcasting`. They do fulfill the requirements for `BroadcastingElementwise`
though.
PiperOrigin-RevId: 379719961
Find shape equivalence classes among the operands and use them for better rank
specialization. If all operands are known to be of the same shape, we can
flatten them to rank one. If there are two shape equivalence classes, we can
generalize the scalar rank specialization cases.
PiperOrigin-RevId: 378844575
When merging rank specialization clusters, avoid duplicating operands. A fewer
number of operands usually allows better rank specialization.
PiperOrigin-RevId: 378445946
Fix usage of default constructor. Instead, always use the parameterized
constructor and make the maximum supported rank explicit.
PiperOrigin-RevId: 377037155
The maximum supported target rank of 5 is sufficient for all operations but
`select`. Make the maximum target rank configurable in the rank specialization.
This reduces the number of generated kernels for operations that don't require
it.
PiperOrigin-RevId: 376822496
Add the first MLIR-generated kernel that relies on an in-TF lowering. Fusion for
this kernel relies on the generalized rank specialization for operation groups.
PiperOrigin-RevId: 376805435
Take advantage of the fact that scalars are already ranked and that they are
neutral elements to broadcasting. Do not reshape scalars, do not consider them
for broadcasting, and materialize ranked operations on scalars accordingly.
PiperOrigin-RevId: 375968371
Rank specialization cases can be applied to all argument tensors of smaller
ranks than the expected maximum rank. This is crucial if all operands are
effectively scalars and the maximum reduced rank is 0.
PiperOrigin-RevId: 375712020
For rank specialization clusters that have only two operands, we can materialize
two extra cases in which either of them is a scalar. This avoids redundant index
computations in these cases.
PiperOrigin-RevId: 375037390
Add lowering pattern for rank specialization clusters with more than one
non-scalar operand. The lowering resembles that of the `TransformUnrankedHlo`
pass and switches cases for maximal ranks from 1 through 8.
PiperOrigin-RevId: 374377002
The pattern can be generalized to also rank specialize operations with a single
non-scalar operand. Also extract helper functions that can be reused in
following specializations.
PiperOrigin-RevId: 374198381
Also cluster operations that operate on same shape operands. These implicitly
satisfy the broadcasting semantics requirement. Also, add test cases for some
cases that appear in the current MLIR-generated kernels.
PiperOrigin-RevId: 374191950
Add a pass to cluster unranked C/HLO operations in one
`chlo.rank_specialization_cluster` op. The C/HLO operations are moved to the
body of the operation. Later passes can use this to rank-specialize all these
operations together.
PiperOrigin-RevId: 373336725