Commit Graph

1041 Commits

Author SHA1 Message Date
Geoffrey Martin-Noble f9f7a63870 Add missing dep on RAL pass generation
Without this I see errors about being unable to find the generated header in our project's build.

PiperOrigin-RevId: 379377718
2021-06-14 17:02:26 -07:00
A. Unique TensorFlower 6f5a440031 Integrate LLVM at llvm/llvm-project@56ae4f23b2
Updates LLVM usage to match
[56ae4f23b227](https://github.com/llvm/llvm-project/commit/56ae4f23b227)

PiperOrigin-RevId: 379348813
2021-06-14 14:17:52 -07:00
Wenyi Zhao 7f94bd923b PR #50236: [MLIR][DISC] Bufferize TransposeOp and ConcatenateOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50236

support hlo-to-lhlo conversion for TransposeOp and ConcatenateOp
Copybara import of the project:

--
62860e717f2a14fbd3ddfb634aa6ff132d245a72 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize TransposeOp and ConcatenateOp

--
ce2ff57c1edee1172cd2f36346cc0b34ec1c7467 by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 379330954
2021-06-14 12:37:45 -07:00
Wenyi Zhao 23ebbb28d1 PR #50191: [MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50191

DISC is an end-to-end (e2e) flow that includes both a compiler side and a
runtime side. On the runtime side, we target different environments (e.g.
TensorFlow, PyTorch, or sometimes even a standalone binary). To simplify
the design of the compiler side, we designed a Runtime Abstraction Layer
(RAL) to separate the compiler side from the runtime side. The compiler
side therefore only needs to target RAL itself, and RAL is responsible for
handling the differences between the target environments.

One of the most important functions of RAL is to manage stateful
resources. To this end, it provides a context object and hides all
stateful operations behind this context, so the compiler side itself
doesn't need to care about resource initialization. For example, a
kernel must be loaded before it can be launched on a GPU. However, the
loading operation should be performed only once during the whole lifetime
of the context to achieve the best performance. Based on the
initialization-free interfaces provided by RAL, the compiler side can focus
on its core optimization logic and let RAL manage the resource status.

The context mentioned above is passed as a parameter to the entry
function, and all RAL APIs should always take the context as their first
argument. This CR also provides a pass to help ensure this property.
The pass rewrites the entry function to make sure its first argument
is the context. For the entry function, the pass also rewrites its inputs
and outputs. Concretely, all the original inputs and outputs of the
entry function are received from and sent to RAL through a corresponding
sequence of RAL API calls. The motivation behind this is to hide the
implementation details of I/O. This design may also enable partial
execution of the compiled module when only some of the inputs are
ready.
Copybara import of the project:

--
c4f20a89aed71181e75bcc5265723b88bde23240 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect

--
1991d4f80ab6087943956e1c0fec4940a22ab08d by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 379317586
2021-06-14 11:27:43 -07:00
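The context-based resource management described in the RAL commit above can be pictured with a toy Python model. This is a sketch under stated assumptions, not the DISC RAL API: `ToyRalContext`, `launch`, `recv_input`, and `send_output` are hypothetical names, and a Python callable stands in for a GPU kernel. It shows a kernel being loaded lazily and at most once per context, and an entry function whose only parameter is the context, with its inputs and outputs routed through RAL-style calls.

```python
class ToyRalContext:
    """Hypothetical stand-in for an RAL context (not the real DISC API)."""

    def __init__(self, kernel_library, inputs):
        self._library = kernel_library  # name -> callable, plays the role of device code
        self._loaded = {}               # kernels already loaded in this context
        self._inputs = list(inputs)
        self.outputs = {}
        self.load_count = 0

    def _load(self, name):
        # Stands in for an expensive one-time step such as loading a GPU module.
        self.load_count += 1
        return self._library[name]

    def launch(self, name, *args):
        # Initialization-free for the caller: load lazily, at most once per context.
        if name not in self._loaded:
            self._loaded[name] = self._load(name)
        return self._loaded[name](*args)

    def recv_input(self, index):
        return self._inputs[index]

    def send_output(self, index, value):
        self.outputs[index] = value


def entry(ctx):
    # Rewritten entry function: context first, all I/O goes through RAL-style calls.
    x = ctx.recv_input(0)
    y = ctx.launch("scale", x, 2)
    z = ctx.launch("scale", y, 3)
    ctx.send_output(0, z)


library = {"scale": lambda values, k: [v * k for v in values]}
ctx = ToyRalContext(library, inputs=[[1, 2, 3]])
entry(ctx)
entry(ctx)  # a second run reuses the already-loaded kernel
print(ctx.outputs[0], ctx.load_count)  # [6, 12, 18] 1
```

Running `entry` twice on the same context loads the `scale` kernel only once, which is the property the context is meant to guarantee without the compiled code tracking any state itself.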
Rahul Joshi a6011d0279 [HLO] Add AllReduceScatter to MHLO and LMHLO dialects.
PiperOrigin-RevId: 379296198
2021-06-14 09:37:07 -07:00
A. Unique TensorFlower dbfa4b1537 Integrate LLVM at llvm/llvm-project@b90f9bea96
Updates LLVM usage to match
[b90f9bea9673](https://github.com/llvm/llvm-project/commit/b90f9bea9673)

PiperOrigin-RevId: 379251091
2021-06-14 04:15:09 -07:00
A. Unique TensorFlower 07c92f0ad8 Integrate LLVM at llvm/llvm-project@e0b469ffa1
Updates LLVM usage to match
[e0b469ffa142](https://github.com/llvm/llvm-project/commit/e0b469ffa142)

PiperOrigin-RevId: 378992873
2021-06-11 19:36:05 -07:00
A. Unique TensorFlower 89faaa6575 Integrate LLVM at llvm/llvm-project@82a3b606b0
Updates LLVM usage to match
[82a3b606b01d](https://github.com/llvm/llvm-project/commit/82a3b606b01d)

PiperOrigin-RevId: 378974967
2021-06-11 16:50:40 -07:00
Wenyi Zhao 8388303fd2 PR #50211: [MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50211

support hlo-to-lhlo conversion for RealDynamicSliceOp and ReduceOp
Copybara import of the project:

--
c417b336670a1fc256f7026dfe8080e46d13d79a by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp

PiperOrigin-RevId: 378972113
2021-06-11 16:33:15 -07:00
Jacques Pienaar 95ba03534f Allow variadic operands/result in MHLO while
This just adds support for it in the op, but keeps the production/uses as is (e.g., single tensor or tuple), matching what XLA export requires. A follow-up here would be to add a pass for export to re-tuple, after which the canonical form could be changed. Given control flow via regions and multi-result operations, tupling does not add representational power, and all the get_tuple_element ops obscure the computation.

The old form allowed a single tensor or tuple. The new form allows a variadic number of tensors or tuples, and tuples may be nested, so the input could be (Tensor<...>, Tuple<Tensor<...>, Tuple<...>, ...>, Tensor<...>); HLO_Tensor doesn't allow tuples.

PiperOrigin-RevId: 378934388
2021-06-11 13:08:28 -07:00
A. Unique TensorFlower 33f95eecc7 Integrate LLVM at llvm/llvm-project@f3f904563e
Updates LLVM usage to match
[f3f904563ec9](https://github.com/llvm/llvm-project/commit/f3f904563ec9)

PiperOrigin-RevId: 378880044
2021-06-11 08:47:40 -07:00
A. Unique TensorFlower bd5752f0bf [MLIR][HLO] Find shape equivalences and use them for better rank specialization
Find shape equivalence classes among the operands and use them for better rank
specialization. If all operands are known to be of the same shape, we can
flatten them to rank one. If there are two shape equivalence classes, we can
generalize the scalar rank specialization cases.

PiperOrigin-RevId: 378844575
2021-06-11 04:00:26 -07:00
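The shape-equivalence idea in the commit above can be sketched as a union-find over operand indices followed by a choice of specialization strategy. This assumes the equivalences arrive as hypothetical pairs of operand indices known to share a shape; `shape_equivalence_classes`, `choose_strategy`, and the strategy labels are illustrative names, not the pass's API, and how the real pass discovers the equivalences is not modeled.

```python
def shape_equivalence_classes(num_operands, equal_pairs):
    """Group operand indices into classes, given pairs known to have equal shapes."""
    parent = list(range(num_operands))

    def find(i):
        # Follow parent links to the class representative, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in equal_pairs:
        parent[find(i)] = find(j)  # union the two classes

    classes = {}
    for i in range(num_operands):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())


def choose_strategy(num_operands, equal_pairs):
    classes = shape_equivalence_classes(num_operands, equal_pairs)
    if len(classes) == 1:
        # All operands share one shape: every operand can be flattened to rank 1.
        return "flatten-to-rank-1"
    if len(classes) == 2:
        # Two classes: generalize the cases previously tied to scalar operands.
        return "two-class"
    return "generic"


print(choose_strategy(3, [(0, 1), (1, 2)]))  # flatten-to-rank-1
print(choose_strategy(3, [(0, 2)]))          # two-class
```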
A. Unique TensorFlower 5cca8a14e3 Integrate LLVM at llvm/llvm-project@4f6ec382c8
Updates LLVM usage to match
[4f6ec382c8b7](https://github.com/llvm/llvm-project/commit/4f6ec382c8b7)

PiperOrigin-RevId: 378808874
2021-06-10 22:53:49 -07:00
A. Unique TensorFlower ad7bc780b9 Integrate LLVM at llvm/llvm-project@ff81a2c95d
Updates LLVM usage to match
[ff81a2c95ddb](https://github.com/llvm/llvm-project/commit/ff81a2c95ddb)

PiperOrigin-RevId: 378745601
2021-06-10 15:19:43 -07:00
Wenyi Zhao 6660234d80 PR #50100: [MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50100

support hlo-to-lhlo conversion for DynamicIotaOp and DynamicPadOp
Copybara import of the project:

--
c3aae94954e35d3f8ad265f619ef9765665a5115 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp

--
adc6996d70b804d61310d56a33fac975d70c8636 by Wenyi Zhao <reyizero@gmail.com>:

minor

PiperOrigin-RevId: 378733284
2021-06-10 14:20:45 -07:00
A. Unique TensorFlower 642ca86a3f Integrate LLVM at llvm/llvm-project@ad6a84f82c
Updates LLVM usage to match
[ad6a84f82c45](https://github.com/llvm/llvm-project/commit/ad6a84f82c45)

PiperOrigin-RevId: 378690997
2021-06-10 11:07:48 -07:00
A. Unique TensorFlower 14093b7906 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 378681070
2021-06-10 10:27:22 -07:00
Chris Jones 968226b9d7 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 378640706
2021-06-10 06:54:42 -07:00
Adrian Kuegel 6088eb697c Fix Cosh approximation for F16.
We should upcast F16 to F32 to prevent precision loss.
For example, cosh(-9) previously evaluated to 4042 instead of 4052.
This allows enabling the MLIR-generated kernel for the F16 type.
Also move the template instantiation for Sinh inside the #ifdef block.
This was missed in a previous commit.

PiperOrigin-RevId: 378635042
2021-06-10 06:16:44 -07:00
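The precision loss described in the Cosh and Sinh commits can be reproduced numerically. This sketch assumes the lowering expands cosh(x) as e^(x + log(1/2)) + e^(-x + log(1/2)) (and sinh with a minus sign), an overflow-avoiding form we believe is used but treat as an assumption here. F16 is emulated by rounding every intermediate through Python's `struct` half-precision (`'e'`) format, and a Python float stands in for F32 in the upcast path. Under these assumptions the sketch reproduces the 4042 vs. 4052 values quoted in the commit messages.

```python
import math
import struct


def f16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOG_HALF = math.log(0.5)


def cosh_sinh_in_f16(x):
    # Every intermediate is rounded to F16, as when the expansion runs on F16 values.
    log_half = f16(LOG_HALF)
    a = f16(math.exp(f16(f16(x) + log_half)))   # e^(x + log(1/2))
    b = f16(math.exp(f16(-f16(x) + log_half)))  # e^(-x + log(1/2))
    return f16(a + b), f16(a - b)


def cosh_sinh_upcast(x):
    # Upcast: compute at higher precision, then round the final result to F16 once.
    a = math.exp(x + LOG_HALF)
    b = math.exp(-x + LOG_HALF)
    return f16(a + b), f16(a - b)


print(cosh_sinh_in_f16(-9.0))  # (4042.0, -4042.0)
print(cosh_sinh_upcast(-9.0))  # (4052.0, -4052.0)
```

The loss comes mostly from rounding the exponent argument: in F16, 9 + log(1/2) ends up as 8.3047 instead of about 8.3069, and e^8.3047 is about 4042.8, which rounds to 4042.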
A. Unique TensorFlower 837a1de7c5 Integrate LLVM at llvm/llvm-project@e11b5b87be
Updates LLVM usage to match
[e11b5b87bebf](https://github.com/llvm/llvm-project/commit/e11b5b87bebf)

PiperOrigin-RevId: 378589304
2021-06-10 00:18:25 -07:00
A. Unique TensorFlower 9f67417b41 [MLIR][HLO] Avoid duplicate cluster operands when merging
When merging rank specialization clusters, avoid duplicating operands. Fewer
operands usually allow better rank specialization.

PiperOrigin-RevId: 378445946
2021-06-09 10:54:55 -07:00
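The operand handling when merging two adjacent rank specialization clusters can be sketched as follows. This assumes values are identified by hypothetical SSA-like names and that the second cluster may consume results of the first; `merge_cluster_interfaces` is an illustrative name, and cluster bodies are not modeled.

```python
def merge_cluster_interfaces(first_operands, first_results,
                             second_operands, second_results):
    """Compute operands and results of the merged cluster (hypothetical model)."""
    merged_operands = []
    for value in first_operands + second_operands:
        if value in first_results:
            # Produced inside the merged cluster, so it is no longer an operand.
            continue
        if value not in merged_operands:
            # Skip duplicates: fewer operands usually allow better rank specialization.
            merged_operands.append(value)
    return merged_operands, first_results + second_results


print(merge_cluster_interfaces(["x", "y"], ["t"], ["t", "y", "z"], ["u"]))
# (['x', 'y', 'z'], ['t', 'u'])
```

A full merge would also drop results of the first cluster that no longer have users outside the merged cluster; that step is omitted here to keep the sketch focused on operand deduplication.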
A. Unique TensorFlower b580722041 [MLIR][KernelGen] Merge rank specialization clusters
Merge adjacent rank specialization clusters. Combine their operands, bodies, and
results.

PiperOrigin-RevId: 378433222
2021-06-09 10:07:47 -07:00
Adrian Kuegel b6d8160611 Add Broadcasting and BroadcastingElementwise traits to ConstantLikeOp.
This allows such ops to be included in rank specialization clusters.

PiperOrigin-RevId: 378380915
2021-06-09 05:09:26 -07:00
A. Unique TensorFlower b9e45007d5 [MLIR][HLO] Extend broadcast propagation pass to enable more fusion
Move element-wise operations into assuming regions. This enables fusion
opportunities within the region.

PiperOrigin-RevId: 378362725
2021-06-09 03:03:37 -07:00
A. Unique TensorFlower d828b457b3 Handle empty tensors in SimplifyConcatSlice.
If the result of the slice is an empty tensor, do nothing.
This fixes a crash: we can't create a `concat` with an
empty operand range.

PiperOrigin-RevId: 378354956
2021-06-09 02:15:47 -07:00
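The guard added in the SimplifyConcatSlice commit above can be pictured with a simplified, one-dimensional model of a slice-of-concatenate rewrite. This sketch assumes the rewrite keeps only the concatenated operands that overlap the slice; `narrow_concat_slice` and its tuple output are hypothetical, not the MHLO pattern's code.

```python
def narrow_concat_slice(operand_sizes, start, limit):
    """Return (operand_index, local_start, local_limit) for each concatenated
    operand overlapped by slice [start, limit), or None to leave the IR unchanged."""
    if limit <= start:
        # Empty slice result: no operand overlaps, and a concat with an empty
        # operand range cannot be built, so the pattern must not fire.
        return None
    pieces = []
    offset = 0
    for index, size in enumerate(operand_sizes):
        lo = max(start, offset)
        hi = min(limit, offset + size)
        if lo < hi:
            pieces.append((index, lo - offset, hi - offset))
        offset += size
    return pieces


print(narrow_concat_slice([2, 3, 4], 1, 6))  # [(0, 1, 2), (1, 0, 3), (2, 0, 1)]
print(narrow_concat_slice([2, 3, 4], 3, 3))  # None
```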
Mehdi Amini 1770ed455f Remove unnecessary duplicated source from "lhlo" (NFC)
PiperOrigin-RevId: 378291564
2021-06-08 18:07:26 -07:00
A. Unique TensorFlower 4134923d4f Integrate LLVM at llvm/llvm-project@f96b5e801d
Updates LLVM usage to match
[f96b5e801d67](https://github.com/llvm/llvm-project/commit/f96b5e801d67)

PiperOrigin-RevId: 378139137
2021-06-08 06:24:26 -07:00
Adrian Kuegel 9a8c254526 Support complex types for Sinh.
Because mhlo::ConstantLike doesn't support complex types, we need to use
GetScalarOfType and broadcast it to the needed shape.
Disable the tf2xla fallback, now that MLIR fully supports Sinh.

PiperOrigin-RevId: 378123151
2021-06-08 04:23:19 -07:00
A. Unique TensorFlower c47869f931 [MLIR][HLO] Rename `move-up-dynamic-broadcasts-for-fusion` to `broadcast-propagation`
PiperOrigin-RevId: 378102608
2021-06-08 01:51:10 -07:00
A. Unique TensorFlower b2839c735b Integrate LLVM at llvm/llvm-project@8344e215ec
Updates LLVM usage to match
[8344e215ec6c](https://github.com/llvm/llvm-project/commit/8344e215ec6c)

PiperOrigin-RevId: 378043689
2021-06-07 17:32:52 -07:00
A. Unique TensorFlower c11de49300 Integrate LLVM at llvm/llvm-project@7ed7d4ccb8
Updates LLVM usage to match
[7ed7d4ccb899](https://github.com/llvm/llvm-project/commit/7ed7d4ccb899)

PiperOrigin-RevId: 377972571
2021-06-07 12:05:09 -07:00
Benjamin Kramer d1c60df2fe [MHLO:linalg] Be more aggressive about turning mhlo.const into std.constant
On tensors the only difference between these ops is that mhlo.const supports unsigned types.

PiperOrigin-RevId: 377970948
2021-06-07 11:58:23 -07:00
Hanhan Wang 25b93c8d66 Add support for lowering mhlo.iota/dynamic_iota to Linalg on unsigned types.
PiperOrigin-RevId: 377956338
2021-06-07 10:59:33 -07:00
Adrian Kuegel 5315997402 Fix Sinh approximation for F16.
We should upcast F16 to F32 to prevent precision loss.
For example, sinh(-9) previously evaluated to -4042 instead of -4052.
This allows enabling the MLIR-generated kernel for the F16 type.

PiperOrigin-RevId: 377901896
2021-06-07 06:38:42 -07:00
Tobias Gysi fc723380e6 Update lhlo to use the new structured op interface.
Replace deprecated methods in lhlo_fuse_linalg.cc. The new structured op interface has been introduced in https://reviews.llvm.org/D103394.

PiperOrigin-RevId: 377875452
2021-06-07 03:11:03 -07:00
Wenyi Zhao ade873a5e0 PR #49970: [MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49970

1. Add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim.

2. Add a flag `convert-to-lmhlo-only` to separate the following two cases:
   - hlo-to-lhlo only: simply lower all mhlo ops to their lmhlo
     counterparts without applying any optimization (e.g. eliding
     buffer copies). Buffer optimization is not easy in the dynamic
     shape world, especially when control flow is involved, so we
     leave it to another dedicated pass.

   - hlo-to-lhlo-or-memref-directly: lower some metadata-only mhlo
     ops (e.g. reshape) directly to the memref dialect and lower the
     others to their lmhlo counterparts.
Copybara import of the project:

--
562bd65a368f6194405c4ae6900e3b4388a5ec03 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim

PiperOrigin-RevId: 377603395
2021-06-04 15:36:03 -07:00
A. Unique TensorFlower 8b3a75ea25 Integrate LLVM at llvm/llvm-project@b109172d99
Updates LLVM usage to match
[b109172d993e](https://github.com/llvm/llvm-project/commit/b109172d993e)

PiperOrigin-RevId: 377549160
2021-06-04 11:11:34 -07:00
A. Unique TensorFlower 9c895af2f1 Integrate LLVM at llvm/llvm-project@23a116c8c4
Updates LLVM usage to match
[23a116c8c446](https://github.com/llvm/llvm-project/commit/23a116c8c446)

PiperOrigin-RevId: 377501435
2021-06-04 06:46:18 -07:00
A. Unique TensorFlower f1f4c903df Integrate LLVM at llvm/llvm-project@fcf8827a98
Updates LLVM usage to match
[fcf8827a98be](https://github.com/llvm/llvm-project/commit/fcf8827a98be)

PiperOrigin-RevId: 377485560
2021-06-04 04:31:28 -07:00
A. Unique TensorFlower db05388a3c Integrate LLVM at llvm/llvm-project@da3ed58b97
Updates LLVM usage to match
[da3ed58b97c1](https://github.com/llvm/llvm-project/commit/da3ed58b97c1)

PiperOrigin-RevId: 377432380
2021-06-03 20:45:18 -07:00
A. Unique TensorFlower aba16adfa5 Add `mhlo.all_gather` op to MHLO dialect.
Adds import/export/verifier support as well.
Also makes `channel_handle` uniform across `mhlo.all_reduce` and `mhlo.all_gather`.

PiperOrigin-RevId: 377323468
2021-06-03 10:45:29 -07:00
Jacques Pienaar 4fc2e87a42 Add mhlo python binding generator target
This just invokes the generator backend & creates a filegroup.

PiperOrigin-RevId: 377318653
2021-06-03 10:26:30 -07:00
A. Unique TensorFlower fe42a08fc9 Use channel_handle for ChannelHandles in MHLO ops. This makes the naming of these properties consistent across these ops.
PiperOrigin-RevId: 377309518
2021-06-03 09:49:47 -07:00
A. Unique TensorFlower 063086dd78 Integrate LLVM at llvm/llvm-project@c89dff5855
Updates LLVM usage to match
[c89dff5855bb](https://github.com/llvm/llvm-project/commit/c89dff5855bb)

PiperOrigin-RevId: 377199586
2021-06-02 19:36:24 -07:00
A. Unique TensorFlower 4620410f18 Integrate LLVM at llvm/llvm-project@b25546a4b4
Updates LLVM usage to match
[b25546a4b406](https://github.com/llvm/llvm-project/commit/b25546a4b406)

PiperOrigin-RevId: 377077163
2021-06-02 09:32:59 -07:00
A. Unique TensorFlower 75a1c450ea [MLIR][KernelGen] Fix Windows build failure
Fix usage of default constructor. Instead, always use the parameterized
constructor and make the maximum supported rank explicit.

PiperOrigin-RevId: 377037155
2021-06-02 05:34:44 -07:00
A. Unique TensorFlower 557e56362e [MLIR][KernelGen] Simplify rank specialization tests with smaller target rank
For the tests, rank-specialize only up to rank 3. The remaining cases for higher
ranks are analogous.

PiperOrigin-RevId: 377024370
2021-06-02 03:48:07 -07:00
wyzhao 968d4b8709 PR #49598: [MLIR][DISC] legalize tensor_load inserted during hlo-to-lhlo conversion
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49598

This PR implements logic for lowering memref.tensor_load ops that are
inserted during `mhlo-legalize-to-lmhlo`
Copybara import of the project:

--
80eb377af4e02182e1aecc943a41ca5d7d1c2100 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] legalize tensor_load inserted during hlo-to-lhlo conversion

--
ac452fe3dcd591211cd5c59be9189fe2f7153b41 by Wenyi Zhao <reyizero@gmail.com>:

minor fix

--
6b36017f8632a06adbc3e05a62975fa641d0260f by Wenyi Zhao <reyizero@gmail.com>:

minor refine

--
846005cc76d0033112e47825c2e9a97790b6925f by Wenyi Zhao <reyizero@gmail.com>:

minor fix

--
f6a4becaa287d5ca323b2d152a4d0ae053730fd9 by Wenyi Zhao <reyizero@gmail.com>:

fix

--
5555749f60f7fce8f57962860ef65efccf0362ba by Wenyi Zhao <reyizero@gmail.com>:

fix

--
8873b9b6d9315c1199ca9f7c133ecf377ecd2fa6 by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 376942547
2021-06-01 16:27:56 -07:00
A. Unique TensorFlower 5baf6e7709 Integrate LLVM at llvm/llvm-project@97d234935f
Updates LLVM usage to match
[97d234935f15](https://github.com/llvm/llvm-project/commit/97d234935f15)

PiperOrigin-RevId: 376840022
2021-06-01 08:38:32 -07:00
A. Unique TensorFlower d1828625ab [MLIR][KernelGen] Make maximum supported rank in rank specialization configurable
The maximum supported target rank of 5 is sufficient for all operations but
`select`. Make the maximum target rank configurable in the rank specialization.
This reduces the number of generated kernels for operations that don't require
it.

PiperOrigin-RevId: 376822496
2021-06-01 06:54:31 -07:00