Commit Graph

1070 Commits

Author SHA1 Message Date
colin.liang e62fa9311f Add tensor from host to JIT runner, and get output 2021-07-13 20:11:53 +08:00
colin.liang 62e7b883c7 Add CPU runner in LLVM 2021-07-12 21:03:29 +08:00
A. Unique TensorFlower 20ff8b4c93 Integrate LLVM at llvm/llvm-project@b3634d3e88
Updates LLVM usage to match
[b3634d3e88b7](https://github.com/llvm/llvm-project/commit/b3634d3e88b7)

PiperOrigin-RevId: 380699306
2021-06-21 17:22:48 -07:00
A. Unique TensorFlower 8ff5f9d94a Integrate LLVM at llvm/llvm-project@186f2ac612
Updates LLVM usage to match
[186f2ac612ad](https://github.com/llvm/llvm-project/commit/186f2ac612ad)

PiperOrigin-RevId: 380647265
2021-06-21 13:08:34 -07:00
Benjamin Kramer 03d2cb606d [mhlo] Make sure reifyResultTypes returns a tensor of index
Dynamic broadcast/reshape/iota take i32/i64 shape inputs, but users of
reification expect index shapes. Insert an appropriate cast if necessary.
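A minimal sketch of the inserted cast, using current upstream syntax (`arith.index_cast`; at the time the cast op lived in the standard dialect) and an assumed rank-1 shape operand:

```mlir
func.func @reified_shape(%shape_i32: tensor<3xi32>) -> tensor<3xindex> {
  // The op's shape operand is i32, but reification users expect index,
  // so an element-type cast is inserted.
  %shape = arith.index_cast %shape_i32 : tensor<3xi32> to tensor<3xindex>
  return %shape : tensor<3xindex>
}
```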

PiperOrigin-RevId: 380613128
2021-06-21 10:42:38 -07:00
A. Unique TensorFlower a6b8882739 Integrate LLVM at llvm/llvm-project@b650778dc4
Updates LLVM usage to match
[b650778dc4ac](https://github.com/llvm/llvm-project/commit/b650778dc4ac)

PiperOrigin-RevId: 380565709
2021-06-21 06:40:22 -07:00
A. Unique TensorFlower 0162057815 Integrate LLVM at llvm/llvm-project@134723edd5
Adjusted libcxx BUILD files for
2ec672dcdf

Updates LLVM usage to match
[134723edd5bf](https://github.com/llvm/llvm-project/commit/134723edd5bf)

PiperOrigin-RevId: 380542043
2021-06-21 03:38:21 -07:00
A. Unique TensorFlower 785d7042f0 Integrate LLVM at llvm/llvm-project@c9889c44ec
Updates LLVM usage to match
[c9889c44ec5a](https://github.com/llvm/llvm-project/commit/c9889c44ec5a)

PiperOrigin-RevId: 380519152
2021-06-21 00:49:07 -07:00
A. Unique TensorFlower 05ad9ddf84 Integrate LLVM at llvm/llvm-project@93183a41b9
Updates LLVM usage to match
[93183a41b962](https://github.com/llvm/llvm-project/commit/93183a41b962)

PiperOrigin-RevId: 380170094
2021-06-18 06:04:15 -07:00
Adrian Kuegel 4c282fb542 Avoid Broadcast op if all shapes are (known to be) equal.
The rank specialization case for operands that either share the same shape or
are scalars does not need to compute the final result shape.
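A minimal sketch of the simplification, assuming unranked `tensor<*xf32>` operands and current `shape` dialect syntax (the pass's actual output differs in detail):

```mlir
// Before: the result shape is computed by broadcasting all operand shapes.
%s0 = shape.shape_of %arg0 : tensor<*xf32> -> tensor<?xindex>
%s1 = shape.shape_of %arg1 : tensor<*xf32> -> tensor<?xindex>
%s_bcast = shape.broadcast %s0, %s1
    : tensor<?xindex>, tensor<?xindex> -> tensor<?xindex>

// After: when all shapes are known equal, any operand's shape
// already is the result shape.
%s_equal = shape.shape_of %arg0 : tensor<*xf32> -> tensor<?xindex>
```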

PiperOrigin-RevId: 380129316
2021-06-18 00:22:35 -07:00
A. Unique TensorFlower d4a7901284 Integrate LLVM at llvm/llvm-project@366df11a35
Updates LLVM usage to match
[366df11a3539](https://github.com/llvm/llvm-project/commit/366df11a3539)

PiperOrigin-RevId: 380081103
2021-06-17 17:29:07 -07:00
Abhishek Varma da6593e960 PR #50073: [MLIR] Add GatherOp lowering from lmhlo to Affine.
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50073

-- Lowering of `GatherOp` from lmhlo to Affine is added, as part of the
   `lhlo-legalize-to-affine` pass.
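A simplified, hypothetical 1-D version of what such a lowering produces (static shapes and `index`-typed indices assumed for brevity; the real pass handles the full gather configuration):

```mlir
func.func @gather_1d(%operand: memref<8xf32>, %indices: memref<4xindex>,
                     %out: memref<4xf32>) {
  affine.for %i = 0 to 4 {
    %idx = affine.load %indices[%i] : memref<4xindex>
    // The gathered access is data-dependent, so it uses a plain
    // memref.load rather than an affine.load.
    %v = memref.load %operand[%idx] : memref<8xf32>
    affine.store %v, %out[%i] : memref<4xf32>
  }
  return
}
```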

Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com>
Copybara import of the project:

--
5b3dcd4ab31a69f305cd079b869ee35ba6dc8bf5 by Abhishek Varma <abhishek.varma@polymagelabs.com>:

[MLIR] Add GatherOp lowering from lmhlo to Affine.

PiperOrigin-RevId: 380052157
2021-06-17 14:55:49 -07:00
Adrian Kuegel 2ab16024cf Add tests for lowering HLO_ExpOp for complex types to Linalg.
PiperOrigin-RevId: 379944871
2021-06-17 06:34:19 -07:00
A. Unique TensorFlower 470ac45f45 [MLIR][HLO] Remove unused pass `TransformUnrankedHloPass`
The pass was replaced by the new generalized rank specialization and the two
passes `mhlo-rank-specialization-cluster` and `mhlo-rank-specialization-to-scf`.

PiperOrigin-RevId: 379935562
2021-06-17 05:20:49 -07:00
A. Unique TensorFlower 10634ca3a6 Integrate LLVM at llvm/llvm-project@854ef875b9
Updates LLVM usage to match
[854ef875b929](https://github.com/llvm/llvm-project/commit/854ef875b929)

PiperOrigin-RevId: 379926064
2021-06-17 04:08:35 -07:00
A. Unique TensorFlower 9f47ff607b [MLIR][KernelGen] Better rank specialization for clusters containing `mhlo.select`
Merge `mhlo.select` into rank specialization clusters and infer shape equalities
correctly from `mhlo.select` (and also from `mhlo.clamp`). This allows the relu
kernel to be lowered in a completely flattened form.
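A sketch of such a cluster, assuming the `chlo.rank_specialization_cluster` and `chlo.rank_specialization_cluster_yield` op names used by the rank specialization passes; because `mhlo.select`'s predicate and branches must agree in shape, that shape equality lets the whole cluster be flattened to rank 1:

```mlir
%0 = "chlo.rank_specialization_cluster"(%pred, %lhs, %rhs) ({
^bb0(%p: tensor<*xi1>, %a: tensor<*xf32>, %b: tensor<*xf32>):
  %s = "mhlo.select"(%p, %a, %b)
      : (tensor<*xi1>, tensor<*xf32>, tensor<*xf32>) -> tensor<*xf32>
  "chlo.rank_specialization_cluster_yield"(%s) : (tensor<*xf32>) -> ()
}) : (tensor<*xi1>, tensor<*xf32>, tensor<*xf32>) -> tensor<*xf32>
```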

PiperOrigin-RevId: 379925793
2021-06-17 04:06:08 -07:00
Adrian Kuegel 376da8592f Add MLIR generated SignOp GPU kernel for complex types.
PiperOrigin-RevId: 379924456
2021-06-17 03:56:58 -07:00
Adrian Kuegel 73ed8cbf82 Add MLIR generated NegOp GPU kernel for complex types.
PiperOrigin-RevId: 379905236
2021-06-17 01:30:51 -07:00
Mehdi Amini 8c8e81cb69 Fix pass definition to inherit from the TableGen generated base class (NFC)
PiperOrigin-RevId: 379860210
2021-06-16 19:05:11 -07:00
A. Unique TensorFlower 2e08c246e9 Integrate LLVM at llvm/llvm-project@7fff39d9b0
Updates LLVM usage to match
[7fff39d9b046](https://github.com/llvm/llvm-project/commit/7fff39d9b046)

PiperOrigin-RevId: 379822265
2021-06-16 15:18:51 -07:00
Wenyi Zhao 88cc0c6c46 PR #50271: [MLIR][DISC] Bufferize GatherOp and DynamicGatherOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50271

Support hlo-to-lhlo conversion for GatherOp and DynamicGatherOp.
Copybara import of the project:

--
117a1b1bcaac7ecc5224b02863eede5c1b9618fe by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize GatherOp and DynamicGatherOp

PiperOrigin-RevId: 379801972
2021-06-16 13:47:56 -07:00
Wenyi Zhao 34dc5f2a79 PR #50020: [MLIR][DISC] support fusion on buffer
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50020

This pass implements the logic to group kLoop/kInput fusion patterns at the
buffer level. Working on buffers avoids a lot of headaches around
special-casing `shape-only` consumers (e.g. memref.dim, shape.shapeOf), since
shapes are already resolved in the buffer world. It may be better to move this
pass to the tensor level once more shape inference/constraint infrastructure is
ready at the mhlo level.
Copybara import of the project:

--
e31f8344b59aa9860097197585215ea1689b8ff4 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] support fusion on buffer

--
35f2eb2791241b0ab5db1ddcaf1b4006278ddccf by Wenyi Zhao <reyizero@gmail.com>:

fix

--
923c8d61f7fe00a2a0df22d5be396508f0667964 by Wenyi Zhao <reyizero@gmail.com>:

fix sanity check failure

PiperOrigin-RevId: 379743424
2021-06-16 09:51:29 -07:00
A. Unique TensorFlower 82696f8598 [MLIR][HLO] Annotate `mhlo.clamp` and `mhlo.select` as element-wise broadcasting
The operations allow a limited form of broadcasting in which some operands may
be scalars. As such they are neither strictly `Elementwise` nor `Broadcasting`,
but they do fulfill the requirements for `BroadcastingElementwise`.
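For instance, `mhlo.clamp` accepts scalar bounds alongside a ranked operand; a hedged sketch of a hypothetical relu6-style clamp:

```mlir
%min = mhlo.constant dense<0.0> : tensor<f32>
%max = mhlo.constant dense<6.0> : tensor<f32>
// Scalar min/max broadcast against the 4-element operand.
%y = "mhlo.clamp"(%min, %x, %max)
    : (tensor<f32>, tensor<4xf32>, tensor<f32>) -> tensor<4xf32>
```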

PiperOrigin-RevId: 379719961
2021-06-16 07:59:26 -07:00
A. Unique TensorFlower a65cf627c4 Integrate LLVM at llvm/llvm-project@662f9bff33
Updates LLVM usage to match
[662f9bff337b](https://github.com/llvm/llvm-project/commit/662f9bff337b)

PiperOrigin-RevId: 379699574
2021-06-16 05:35:13 -07:00
A. Unique TensorFlower 06ce0852d1 Integrate LLVM at llvm/llvm-project@cc8d32ae7d
Updates LLVM usage to match
[cc8d32ae7d94](https://github.com/llvm/llvm-project/commit/cc8d32ae7d94)

PiperOrigin-RevId: 379676797
2021-06-16 02:40:01 -07:00
Hanhan Wang b44ab8ad49 Add support for lowering DataMovementOp ops to Linalg on unsigned types.
PiperOrigin-RevId: 379527360
2021-06-15 10:58:22 -07:00
Feiwen 3afbe312f8 PR #49919: [MLIR][DISC] pattern conversion from tf2mhlo: ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49919

We are porting our MLIR-based dynamic shape compiler to the TF community (from op definitions and patterns to optimization passes, etc.).
This is the 5th PR on tf2mhlo pattern conversion; it includes ConvertUnpackOpDynamic, ConvertSignOpDynamic, and ConvertSigmoidGradOpDynamic.
The remaining pattern conversions we will add:
- ConvertSqueezeOpxxx
- ConvertStridedSliceOpxxx
- ConvertPrintOp
Copybara import of the project:

--
21b3c3eb05b12956bcdb8b98cc54d9371dbf034d by azazhu <azazhu@gmail.com>:

[MLIR][DISC] pattern conversion from tf2mhlo: ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic

--
634630a4e2e426357290650bd579b35efecab5b3 by azazhu <azazhu@gmail.com>:

[MLIR][DISC] refine ConvertUnpackOpDynamic, ConvertSignOpDynamic, ConvertSigmoidGradOpDynamic

--
39a2bedd6dafb369ae960c5197b7a352bfdfbc80 by azazhu <azazhu@gmail.com>:

add RealDynamicSliceOp's canonicalize and fix CI

--
a1c38dd0963d602ed4812da0d77a096a95920ddb by azazhu <azazhu@gmail.com>:

fix CI for ConvertUnpackOpDynamic

--
5a8b4eb389ed6dc554104356c37f2f1550802b8c by azazhu <azazhu@gmail.com>:

fix typo in ConvertSigmoidGradOpDynamic

PiperOrigin-RevId: 379521079
2021-06-15 10:33:32 -07:00
Chris Jones 5fbdac34a9 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 379455720
2021-06-15 03:55:19 -07:00
Adrian Kuegel 399dae666d Add MLIR generated ExpOp GPU kernel for complex types.
We lower lmhlo::ExpOp to mlir::complex::ExpOp for complex types.
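A hedged sketch of the resulting Linalg form (tensor-level and modern syntax with `tensor.empty` for readability; the actual kernel works on buffers and details differ):

```mlir
#map = affine_map<(d0) -> (d0)>
func.func @exp_complex(%arg: tensor<4xcomplex<f32>>) -> tensor<4xcomplex<f32>> {
  %init = tensor.empty() : tensor<4xcomplex<f32>>
  %out = linalg.generic
      {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
      ins(%arg : tensor<4xcomplex<f32>>)
      outs(%init : tensor<4xcomplex<f32>>) {
  ^bb0(%in: complex<f32>, %o: complex<f32>):
    // Scalar body: the complex dialect provides the exp operation.
    %e = complex.exp %in : complex<f32>
    linalg.yield %e : complex<f32>
  } -> tensor<4xcomplex<f32>>
  return %out : tensor<4xcomplex<f32>>
}
```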

PiperOrigin-RevId: 379432147
2021-06-15 00:45:45 -07:00
Geoffrey Martin-Noble f9f7a63870 Add missing dep on RAL pass generation
Without this I see errors about being unable to find the generated header in our project's build.

PiperOrigin-RevId: 379377718
2021-06-14 17:02:26 -07:00
A. Unique TensorFlower 6f5a440031 Integrate LLVM at llvm/llvm-project@56ae4f23b2
Updates LLVM usage to match
[56ae4f23b227](https://github.com/llvm/llvm-project/commit/56ae4f23b227)

PiperOrigin-RevId: 379348813
2021-06-14 14:17:52 -07:00
Wenyi Zhao 7f94bd923b PR #50236: [MLIR][DISC] Bufferize TransposeOp and ConcatenateOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50236

Support hlo-to-lhlo conversion for TransposeOp and ConcatenateOp.
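A minimal sketch of the tensor-to-buffer rewrite for the transpose case, with illustrative shapes; the result tensor becomes an out-parameter memref:

```mlir
// Tensor world (mhlo):
%t = "mhlo.transpose"(%arg) {permutation = dense<[1, 0]> : tensor<2xi64>}
    : (tensor<2x3xf32>) -> tensor<3x2xf32>

// Buffer world (lmhlo): the result is written into %out.
"lmhlo.transpose"(%arg_buf, %out) {permutation = dense<[1, 0]> : tensor<2xi64>}
    : (memref<2x3xf32>, memref<3x2xf32>) -> ()
```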
Copybara import of the project:

--
62860e717f2a14fbd3ddfb634aa6ff132d245a72 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize TransposeOp and ConcatenateOp

--
ce2ff57c1edee1172cd2f36346cc0b34ec1c7467 by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 379330954
2021-06-14 12:37:45 -07:00
Wenyi Zhao 23ebbb28d1 PR #50191: [MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50191

DISC is an e2e flow that includes both a compiler side and a runtime side. On
the runtime side, we have different target environments (e.g. TensorFlow,
PyTorch, or sometimes even a standalone binary). In order to simplify the
design of the compiler side, we designed a Runtime Abstraction Layer (RAL) to
separate the compiler side from the runtime side. The compiler side then only
needs to target RAL itself, and it is the responsibility of RAL to handle the
differences between target environments.

One of the most important functions of RAL is to manage stateful resources. To
this end, it provides a context object and hides all stateful operations behind
this context, so the compiler side does not need to care about resource
initialization. For example, a kernel must be loaded before it can be launched
on a GPU, but the load should happen only once during the whole lifetime of the
context to achieve the best performance. Based on the initialization-free
interfaces provided by RAL, the compiler side can focus on its core
optimization logic and let RAL manage resource state.

The context mentioned above is passed as a parameter to the entry function, and
all RAL APIs take the context as their first argument. This CR also provides a
pass to help ensure this property. The pass rewrites the entry function so that
its first argument is the context. The pass also rewrites the entry function's
inputs and outputs: all of the original inputs and outputs are received from
and sent to RAL through a corresponding sequence of RAL API calls. The
motivation is to hide the implementation details of I/O. This design may also
potentially enable partial execution of the compiled module when only some of
the inputs are ready.
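A hedged sketch of an entry function after the rewrite; the `!ral.context` type and the `ral.recv_input`/`ral.send_output` names are illustrative placeholders, not the dialect's actual API:

```mlir
func.func @main(%ctx: !ral.context) {
  // All I/O flows through the context; the original tensor arguments
  // are replaced by receive calls...
  %in = "ral.recv_input"(%ctx) {index = 0 : i64}
      : (!ral.context) -> memref<?xf32>
  // ... compiled computation elided; results are sent back to RAL
  // instead of being returned.
  "ral.send_output"(%ctx, %in) {index = 0 : i64}
      : (!ral.context, memref<?xf32>) -> ()
  return
}
```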
Copybara import of the project:

--
c4f20a89aed71181e75bcc5265723b88bde23240 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect

--
1991d4f80ab6087943956e1c0fec4940a22ab08d by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 379317586
2021-06-14 11:27:43 -07:00
Rahul Joshi a6011d0279 [HLO] Add AllReduceScatter to MHLO and LMHLO dialects.
PiperOrigin-RevId: 379296198
2021-06-14 09:37:07 -07:00
A. Unique TensorFlower dbfa4b1537 Integrate LLVM at llvm/llvm-project@b90f9bea96
Updates LLVM usage to match
[b90f9bea9673](https://github.com/llvm/llvm-project/commit/b90f9bea9673)

PiperOrigin-RevId: 379251091
2021-06-14 04:15:09 -07:00
A. Unique TensorFlower 07c92f0ad8 Integrate LLVM at llvm/llvm-project@e0b469ffa1
Updates LLVM usage to match
[e0b469ffa142](https://github.com/llvm/llvm-project/commit/e0b469ffa142)

PiperOrigin-RevId: 378992873
2021-06-11 19:36:05 -07:00
A. Unique TensorFlower 89faaa6575 Integrate LLVM at llvm/llvm-project@82a3b606b0
Updates LLVM usage to match
[82a3b606b01d](https://github.com/llvm/llvm-project/commit/82a3b606b01d)

PiperOrigin-RevId: 378974967
2021-06-11 16:50:40 -07:00
Wenyi Zhao 8388303fd2 PR #50211: [MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50211

Support hlo-to-lhlo conversion for RealDynamicSliceOp and ReduceOp.
Copybara import of the project:

--
c417b336670a1fc256f7026dfe8080e46d13d79a by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize RealDynamicSliceOp and ReduceOp

PiperOrigin-RevId: 378972113
2021-06-11 16:33:15 -07:00
Jacques Pienaar 95ba03534f Allow variadic operands/results in MHLO while
This just adds support for variadic operands/results in the op itself, while keeping production and uses as-is (e.g., a single tensor or tuple), matching what XLA export requires. A follow-up would add a pass that re-tuples for export, after which the canonical form could be changed. Given control flow via regions and multi-result operations, tupling adds no representational power, and all the get_tuple_element ops obscure the computation.

The old form allowed a single tensor or tuple. The new form allows a variadic number of tensors or tuples; since tuples may be nested, the input could be (Tensor<..>, Tuple<Tensor<...>, Tuple<...>, ...>, Tensor<...>), which HLO_Tensor does not allow.
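A hedged sketch of the variadic form with two loop-carried values and no tuple traffic (attribute syntax approximate for the era):

```mlir
%w:2 = "mhlo.while"(%i0, %x0) ({
^bb0(%i: tensor<i32>, %x: tensor<f32>):
  // Condition region: iterate while %i < 10.
  %limit = mhlo.constant dense<10> : tensor<i32>
  %cond = "mhlo.compare"(%i, %limit) {comparison_direction = "LT"}
      : (tensor<i32>, tensor<i32>) -> tensor<i1>
  "mhlo.return"(%cond) : (tensor<i1>) -> ()
}, {
^bb0(%i: tensor<i32>, %x: tensor<f32>):
  // Body region: both carried values are yielded directly,
  // with no tuple packing/unpacking.
  %one = mhlo.constant dense<1> : tensor<i32>
  %i_next = "mhlo.add"(%i, %one) : (tensor<i32>, tensor<i32>) -> tensor<i32>
  %x_next = "mhlo.add"(%x, %x) : (tensor<f32>, tensor<f32>) -> tensor<f32>
  "mhlo.return"(%i_next, %x_next) : (tensor<i32>, tensor<f32>) -> ()
}) : (tensor<i32>, tensor<f32>) -> (tensor<i32>, tensor<f32>)
```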

PiperOrigin-RevId: 378934388
2021-06-11 13:08:28 -07:00
A. Unique TensorFlower 33f95eecc7 Integrate LLVM at llvm/llvm-project@f3f904563e
Updates LLVM usage to match
[f3f904563ec9](https://github.com/llvm/llvm-project/commit/f3f904563ec9)

PiperOrigin-RevId: 378880044
2021-06-11 08:47:40 -07:00
A. Unique TensorFlower bd5752f0bf [MLIR][HLO] Find shape equivalences and use them for better rank specialization
Find shape equivalence classes among the operands and use them for better rank
specialization. If all operands are known to be of the same shape, we can
flatten them to rank one. If there are two shape equivalence classes, we can
generalize the scalar rank specialization cases.
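A hedged sketch of the flattening for one operand (syntax approximate; `mhlo.tanh` stands in for an arbitrary element-wise op):

```mlir
// All operands share %shape, so the op can run on a rank-1 view.
%shape = shape.shape_of %arg : tensor<*xf32> -> tensor<?xindex>
%n = shape.num_elements %shape : tensor<?xindex> -> index
%flat_shape = tensor.from_elements %n : tensor<1xindex>
%flat = "mhlo.dynamic_reshape"(%arg, %flat_shape)
    : (tensor<*xf32>, tensor<1xindex>) -> tensor<?xf32>
%res_flat = "mhlo.tanh"(%flat) : (tensor<?xf32>) -> tensor<?xf32>
// Restore the original shape afterwards.
%res = "mhlo.dynamic_reshape"(%res_flat, %shape)
    : (tensor<?xf32>, tensor<?xindex>) -> tensor<*xf32>
```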

PiperOrigin-RevId: 378844575
2021-06-11 04:00:26 -07:00
A. Unique TensorFlower 5cca8a14e3 Integrate LLVM at llvm/llvm-project@4f6ec382c8
Updates LLVM usage to match
[4f6ec382c8b7](https://github.com/llvm/llvm-project/commit/4f6ec382c8b7)

PiperOrigin-RevId: 378808874
2021-06-10 22:53:49 -07:00
A. Unique TensorFlower ad7bc780b9 Integrate LLVM at llvm/llvm-project@ff81a2c95d
Updates LLVM usage to match
[ff81a2c95ddb](https://github.com/llvm/llvm-project/commit/ff81a2c95ddb)

PiperOrigin-RevId: 378745601
2021-06-10 15:19:43 -07:00
Wenyi Zhao 6660234d80 PR #50100: [MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50100

support hlo-to-lhlo conversion for DynamicIotaOp and DynamicPadOp
Copybara import of the project:

--
c3aae94954e35d3f8ad265f619ef9765665a5115 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Bufferize DynamicIotaOp and DynamicPadOp

--
adc6996d70b804d61310d56a33fac975d70c8636 by Wenyi Zhao <reyizero@gmail.com>:

minor

PiperOrigin-RevId: 378733284
2021-06-10 14:20:45 -07:00
A. Unique TensorFlower 642ca86a3f Integrate LLVM at llvm/llvm-project@ad6a84f82c
Updates LLVM usage to match
[ad6a84f82c45](https://github.com/llvm/llvm-project/commit/ad6a84f82c45)

PiperOrigin-RevId: 378690997
2021-06-10 11:07:48 -07:00
A. Unique TensorFlower 14093b7906 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 378681070
2021-06-10 10:27:22 -07:00
Chris Jones 968226b9d7 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 378640706
2021-06-10 06:54:42 -07:00
Adrian Kuegel 6088eb697c Fix Cosh approximation for F16.
We should upcast F16 to F32 to prevent precision loss. E.g., cosh(-9)
previously evaluated to 4042 instead of 4052. This allows enabling the
MLIR-generated kernel for the F16 type.
Also move the template instantiation for Sinh inside the #ifdef block;
this was missed in a previous commit.
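A hedged sketch of the upcast-compute-truncate pattern (here `math.cosh` stands in for the kernel generator's actual approximation):

```mlir
func.func @cosh_f16(%x: f16) -> f16 {
  // Widen to f32 so the intermediate exponentials keep enough precision.
  %xf = arith.extf %x : f16 to f32
  %c = math.cosh %xf : f32
  // Narrow the final result back to f16.
  %r = arith.truncf %c : f32 to f16
  return %r : f16
}
```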

PiperOrigin-RevId: 378635042
2021-06-10 06:16:44 -07:00
A. Unique TensorFlower 837a1de7c5 Integrate LLVM at llvm/llvm-project@e11b5b87be
Updates LLVM usage to match
[e11b5b87bebf](https://github.com/llvm/llvm-project/commit/e11b5b87bebf)

PiperOrigin-RevId: 378589304
2021-06-10 00:18:25 -07:00
A. Unique TensorFlower 9f67417b41 [MLIR][HLO] Avoid duplicate cluster operands when merging
When merging rank specialization clusters, avoid duplicating operands. Fewer
operands usually allow better rank specialization.

PiperOrigin-RevId: 378445946
2021-06-09 10:54:55 -07:00