Commit Graph

42 Commits

Author SHA1 Message Date
A. Unique TensorFlower 470ac45f45 [MLIR][HLO] Remove unused pass `TransformUnrankedHloPass`
The pass was replaced by the new generalized rank specialization and the two
passes `mhlo-rank-specialization-cluster` and `mhlo-rank-specialization-to-scf`.

PiperOrigin-RevId: 379935562
2021-06-17 05:20:49 -07:00
Mehdi Amini 8c8e81cb69 Fix pass definition to inherit from the TableGen generated base class (NFC)
PiperOrigin-RevId: 379860210
2021-06-16 19:05:11 -07:00
Wenyi Zhao 34dc5f2a79 PR #50020: [MLIR][DISC] support fusion on buffer
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50020

This pass implements the logic to group kLoop/kInput fusion patterns on
buffer level. The reason for this is that we can avoid a lot of
headaches to handle `shape-only` consumers specially (e.g. memref.dim,
shape.shapeOf) since shapes are already resolved in buffer world. It may
be better to move this pass to tensor level after more shape
inference/constraint infras are ready on mhlo level.
Copybara import of the project:

--
e31f8344b59aa9860097197585215ea1689b8ff4 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] support fusion on buffer

This pass implements the logic to group kLoop/kInput fusion patterns on
buffer level. The reason for this is that we can avoid a lot of
headaches to handle `shape-only` consumers specially (e.g. memref.dim,
shape.shapeOf) since shapes are already resolved in buffer world. It may
be better to move this pass to tensor level after more shape
inference/constraint infras are ready on mhlo level.

--
35f2eb2791241b0ab5db1ddcaf1b4006278ddccf by Wenyi Zhao <reyizero@gmail.com>:

fix

--
923c8d61f7fe00a2a0df22d5be396508f0667964 by Wenyi Zhao <reyizero@gmail.com>:

fix sanity check failure

PiperOrigin-RevId: 379743424
2021-06-16 09:51:29 -07:00
A. Unique TensorFlower 82696f8598 [MLIR][HLO] Annotate `mhlo.clamp` and `mhlo.select` as element-wise broadcasting
The operations allow for a limited form of broadcasting which allows some
operands to be scalars. As such they are neither strictly `Elementwise`, nor
`Broadcasting`. They do fulfill the requirements for `BroadcastingElementwise`
though.

PiperOrigin-RevId: 379719961
2021-06-16 07:59:26 -07:00
Chris Jones 5fbdac34a9 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 379455720
2021-06-15 03:55:19 -07:00
Wenyi Zhao 23ebbb28d1 PR #50191: [MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/50191

DISC is a e2e flow, including both compiler side and runtime side. For
runtime side, we have different targeting environments (e.g. tensorflow,
pytorch, or sometimes even a standalone binary). In order to simplify
the design of the compiler side, we design a Runtime Abstraction Layer
(RAL) to sperate the compiler side and runtime side. Thus the compiler
side only need to target RAL itself and it is the responsibility of RAL
to handle the differences between different targeting environments.

One of the most important functions of RAL is to manage stateful
resources. To this end, it provides a context object, and hides all
stateful operations behind this context, thus the compiler side itself
doesn't need to care about the resource initialization. For example, a
kernel must be loaded before it can be launched on GPU. However, the
loading operation should only be taken once during the whole lifetime of
the context in order to achieve the best performance. Based on the
initialization-free interfaces provided by RAL, compiler side can focus
on its core optimization logic and lets the RAL to manage the resource
status.

The context mentioned above is passed as a parameter to the entry
function and all RAL APIs should always use the context as their first
argument. This CR also provides a pass to help to ensure this property.
The pass rewrites the entry function to make sure their first argument
is the context. For entry function, the pass also rewrites its inputs
and outputs. To be concrete, all the original inputs and outputs of the
entry function are received from and sent to RAL through a sequence of
RAL API calls correspondingly. The motivation behind this is to hide the
implementation details of I/Os. This design may also potentially enable
partial execution of the compiled module when some of the inputs are
ready.
Copybara import of the project:

--
c4f20a89aed71181e75bcc5265723b88bde23240 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] Add RAL (Runtime abstraction layer) Dialect

DISC is a e2e flow, including both compiler side and runtime side. For
runtime side, we have different targeting environments (e.g. tensorflow,
pytorch, or sometimes even a standalone binary). In order to simplify
the design of the compiler side, we design a Runtime Abstraction Layer
(RAL) to sperate the compiler side and runtime side. Thus the compiler
side only need to target RAL itself and it is the responsibility of RAL
to handle the differences between different targeting environments.

One of the most important functions of RAL is to manage stateful
resources. To this end, it provides a context object, and hides all
stateful operations behind this context, thus the compiler side itself
doesn't need to care about the resource initialization. For example, a
kernel must be loaded before it can be launched on GPU. However, the
loading operation should only be taken once during the whole lifetime of
the context in order to achieve the best performance. Based on the
initialization-free interfaces provided by RAL, compiler side can focus
on its core optimization logic and lets the RAL to manage the resource
status.

The context mentioned above is passed as a parameter to the entry
function and all RAL APIs should always use the context as their first
argument. This CR also provides a pass to help to ensure this property.
The pass rewrites the entry function to make sure their first argument
is the context. For entry function, the pass also rewrites its inputs
and outputs. To be concrete, all the original inputs and outputs of the
entry function are received from and sent to RAL through a sequence of
RAL API calls correspondingly. The motivation behind this is to hide the
implementation details of I/Os. This design may also potentially enable
partial execution of the compiled module when some of the inputs are
ready.

--
1991d4f80ab6087943956e1c0fec4940a22ab08d by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 379317586
2021-06-14 11:27:43 -07:00
A. Unique TensorFlower 14093b7906 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 378681070
2021-06-10 10:27:22 -07:00
Chris Jones 968226b9d7 [XLA:GPU] Add AllReduce{Start,Done} to MLIR LHLO dialect.
PiperOrigin-RevId: 378640706
2021-06-10 06:54:42 -07:00
Mehdi Amini 1770ed455f Remove unnecessary duplicated source from "lhlo" (NFC)
PiperOrigin-RevId: 378291564
2021-06-08 18:07:26 -07:00
A. Unique TensorFlower c47869f931 [MLIR][HLO] Rename `move-up-dynamic-broadcasts-for-fusion` to `broadcast-propagation`
PiperOrigin-RevId: 378102608
2021-06-08 01:51:10 -07:00
Wenyi Zhao ade873a5e0 PR #49970: [MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49970

1, add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim

2, add a flag `convert-to-lmhlo-only` to seperate following two case:
   - hlo-to-lhlo only. Simply lowers all mhlo ops to their lmhlo
     counterparts, do not apply any optimization (e.g. elide any
     buffer copy). Buffer optimization is not easy in dynamic
     shape world especially when involving control flow, thus we
     leave this to another dedicated pass.

   - hlo-to-lhlo-or-memref-directly. Lowers some metadata-only mhlo
     ops (e.g. reshape) to memref dialect directly and Lowers others
     to their lmhlo counterparts.
Copybara import of the project:

--
562bd65a368f6194405c4ae6900e3b4388a5ec03 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] bufferize DynamicReshape and DynamicBroadcastInDim

1, add hlo-to-lhlo support for DynamicReshape and DynamicBroadcastInDim

2, add a flag `convert-to-lmhlo-only` to seperate following two case:
   - hlo-to-lhlo only. Simply lowers all mhlo ops to their lmhlo
     counterparts, do not apply any optimization (e.g. elide any
     buffer copy). Buffer optimization is not easy in dynamic
     shape world especially when involving control flow, thus we
     leave this to another dedicated pass.

   - hlo-to-lhlo-or-memref-directly. Lowers some metadata-only mhlo
     ops (e.g. reshape) to memref dialect directly and Lowers others
     to their lmhlo counterparts.

PiperOrigin-RevId: 377603395
2021-06-04 15:36:03 -07:00
Jacques Pienaar 4fc2e87a42 Add mhlo python binding generator target
This just invokes the generator backend & creates a filegroup.

PiperOrigin-RevId: 377318653
2021-06-03 10:26:30 -07:00
wyzhao 968d4b8709 PR #49598: [MLIR][DISC] legalize tensor_load inserted during hlo-to-lhlo conversion
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/49598

This PR implements logic for lowering memref.tensor_load ops that are
inserted during `mhlo-legalize-to-lmhlo`
Copybara import of the project:

--
80eb377af4e02182e1aecc943a41ca5d7d1c2100 by Wenyi Zhao <reyizero@gmail.com>:

[MLIR][DISC] legalize tensor_load inserted during hlo-to-lhlo conversion

This PR implements logic for lowering memref.tensor_load ops that are
inserted during `mhlo-legalize-to-lmhlo`.

--
ac452fe3dcd591211cd5c59be9189fe2f7153b41 by Wenyi Zhao <reyizero@gmail.com>:

minor fix

--
6b36017f8632a06adbc3e05a62975fa641d0260f by Wenyi Zhao <reyizero@gmail.com>:

minor refine

--
846005cc76d0033112e47825c2e9a97790b6925f by Wenyi Zhao <reyizero@gmail.com>:

minor fix

--
f6a4becaa287d5ca323b2d152a4d0ae053730fd9 by Wenyi Zhao <reyizero@gmail.com>:

fix

--
5555749f60f7fce8f57962860ef65efccf0362ba by Wenyi Zhao <reyizero@gmail.com>:

fix

--
8873b9b6d9315c1199ca9f7c133ecf377ecd2fa6 by Wenyi Zhao <reyizero@gmail.com>:

fix

PiperOrigin-RevId: 376942547
2021-06-01 16:27:56 -07:00
A. Unique TensorFlower d1828625ab [MLIR][KernelGen] Make maximum supported rank in rank specialization configurable
The maximum supported target rank of 5 is sufficient for all operations but
`select`. Make the maximum target rank configurable in the rank specialization.
This reduces the number of generated kernels for operations that don't require
it.

PiperOrigin-RevId: 376822496
2021-06-01 06:54:31 -07:00
Rahul Joshi fc88cf1ff4 [HLO] Adopt custom syntax for convolution dims and window attributes for LMHLO_GPU
PiperOrigin-RevId: 374889917
2021-05-20 09:41:48 -07:00
Rahul Joshi a361253e4f [HLO] Add custom print/parse for window attributes of convolutions (in LMHLO)
PiperOrigin-RevId: 373807616
2021-05-14 09:47:25 -07:00
Alex Zinenko a2c9b3c9d7 [mlir] update gentbl() Bazel macro
Rename `gentbl` to `gentbl_cc_library` to make it clearer which kind of rule is
ultimately used.

Update `gentbl_*` macros to take `tbl_outs` options as a list rather a
whitespace-separated string and remove the related string handling.

PiperOrigin-RevId: 373406352
2021-05-12 10:58:58 -07:00
Itai Zukerman 30779f0c2f Added build rules for HLO and LHLO ops MD.
PiperOrigin-RevId: 373373920
2021-05-12 08:23:40 -07:00
A. Unique TensorFlower 313d24bc8f [MLIR][HLO] Add `rank-specialization-cluster` pass
Add a pass to cluster unranked C/HLO operations in one
`chlo.rank_specialization_cluster` op. The C/HLO operations are moved to the
body of the operation. Later passes can use this to rank-specialize all these
operations together.

PiperOrigin-RevId: 373336725
2021-05-12 03:46:01 -07:00
A. Unique TensorFlower 96a47345cc [MLIR][HLO] Add `rank_specialization_cluster` op to CHLO
The operation will be used to cluster compatible operations that can be rank-
specialized collectively.

PiperOrigin-RevId: 373128557
2021-05-11 05:17:42 -07:00
dfki-jugr 6bc854f5d9 PR #48667: [mlir-hlo] Added RegionBranchOpInterfaces to lmhlo operations.
Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/48667

Added RegionBranchOpInterfaces to lmhlo operations that use regions.
This is needed, since the bufferization features in MLIR have to reason about the control flow within these operations.
Copybara import of the project:

--
572fd7d850a46630b812da84e9094280f89f259e by Julian Gross <julian.gross@dfki.de>:

Added RegionBranchOpInterfaces to lmhlo operations.

PiperOrigin-RevId: 372070825
2021-05-05 00:27:56 -07:00
Geoffrey Martin-Noble 38d0f96709 Switch deps AllPassesAndDialectsNoRegistration -> AllPassesAndDialects
These targets are now identical as all registration is explicit.
Temporarily leaving the old target as a (deprecated) alias while
changes propagate.

PiperOrigin-RevId: 366513211
2021-04-02 14:26:13 -07:00
Russell Power 3be9874d82 Remove comments for license declarations. These can get out of date/stale.
PiperOrigin-RevId: 365936646
2021-03-30 17:55:42 -07:00
Adrian Kuegel 6388e8d9ee mlir-hlo-opt: set preloadDialectsInContext to false.
This requires specifying dependent dialects in several passes.

PiperOrigin-RevId: 365758084
2021-03-30 01:07:14 -07:00
A. Unique TensorFlower c54527fe88 Integrate LLVM at llvm/llvm-project@678241795c
Updates LLVM usage to match
[678241795c95](https://github.com/llvm/llvm-project/commit/678241795c95)

PiperOrigin-RevId: 363257913
2021-03-16 13:33:00 -07:00
Rahul Joshi 9902e6ee32 [HLO] Add LMHLO CollectivePermute verification.
- Extract verification of source target pairs attached to collective permute into a common
  helper function and use that to verify both MHLO and LMHLO variants.
- Change MlirGpuTestBase::ParseMlirModule to allow returning back a failure, and use
  that to update the mlir_gpu_compile_test to check the new behavior.

PiperOrigin-RevId: 362156962
2021-03-10 15:37:12 -08:00
A. Unique TensorFlower c217a6ef61 [MHLO] Add pass to move up dynamic broadcasts for fusion
For now, the pass only reifies the required shape computations. Moving
broadcasts will follow to allow for fusion across them.

PiperOrigin-RevId: 362033715
2021-03-10 06:21:57 -08:00
Stephan Herhut cabd4d9a06 Canonicalize dynamic_broadcast_in_dim to own shape with rank narrowing on the shape to a corresponding tensor.cast.
PiperOrigin-RevId: 362028291
2021-03-10 05:43:54 -08:00
Geoffrey Martin-Noble 50a516fb9c Adopt td_library
This avoids needing to list all transitive include dependencies and tracks include directories.

PiperOrigin-RevId: 360779798
2021-03-03 16:11:21 -08:00
Rahul Joshi c5f5d13930 [MLIR] Add dependency from "lhlo_ops_structs_inc_gen" to "lhlo" target.
PiperOrigin-RevId: 359488020
2021-02-25 03:51:36 -08:00
Geoffrey Martin-Noble 89f7f2bd65 Lower integer matmuls to linalg
PiperOrigin-RevId: 359306495
2021-02-24 09:45:07 -08:00
Rahul Joshi 5adb7c6e12 [MLIR:LHLO] Add optional call target arg mapping to LMHLO CustomCall operations.
- XLA:HLO -> LMHLO conversion drops all token arguments and return values, however
  custom calls that users write still expect to get buffer pointers for these token types.
- To be able to support this, add an optional call target argument mapping attribute to
  LMHLO custom calls. When this attribute is present, it indicates the number of
  arguments and returns that the custom call expects and also indicates which LMHLO
  arg() or output() maps to which arg or result number of the custom call.

PiperOrigin-RevId: 358826664
2021-02-22 08:43:00 -08:00
A. Unique TensorFlower c06de24f6c [MLIR][CHLO] Generalize lowering with upcast to n-ary operation
Allows reuse for zeta lowering now and for the polygamma lowering soon.

PiperOrigin-RevId: 357739910
2021-02-16 09:47:24 -08:00
A. Unique TensorFlower 4060a86fe2 Integrate LLVM at llvm/llvm-project@2bfe27da17
Updates LLVM usage to match
[2bfe27da171e](https://github.com/llvm/llvm-project/commit/2bfe27da171e)

PiperOrigin-RevId: 357196336
2021-02-12 08:32:03 -08:00
Alexander Belyaev 7aa64ee0b7 [MLIR] Migrate TF from STD complex ops to ComplexDialect.
PiperOrigin-RevId: 352966408
2021-01-21 01:22:25 -08:00
Tres Popp ba0346b071 Integrate LLVM at llvm/llvm-project@96ef4f307d
Updates LLVM usage to match
[96ef4f307df2](https://github.com/llvm/llvm-project/commit/96ef4f307df2)

PiperOrigin-RevId: 352786460
2021-01-20 07:09:47 -08:00
Alexander Belyaev 180f917446 [KERNEL_GEN] Add a pattern for hlo.dyn_broadcast->linalg to enable is_inf kernel.
PiperOrigin-RevId: 351179620
2021-01-11 10:13:31 -08:00
Geoffrey Martin-Noble 47848764a5 Change hardcoded external/ include to match repository name
This is the preferred name for the Bazel repository when included from
another project.

Note: These hardcoded "external/" includes are a horrible hack that we
can hopefully get rid of soon with more robust Bazel tablegen rules.
PiperOrigin-RevId: 350810768
2021-01-08 12:07:53 -08:00
A. Unique TensorFlower b0bf2ef45b Integrate LLVM at llvm/llvm-project@c3acda0798
Updates LLVM usage to match
[c3acda0798f9](https://github.com/llvm/llvm-project/commit/c3acda0798f9)

PiperOrigin-RevId: 348896724
2020-12-23 23:53:54 -08:00
Tres Popp a42213b870 Define lowering of [l]mhlo.pow.
For floating point operations, this uses std.pow.
For integer operations, this lowers to a loop.
This adds a dependency on scf.

PiperOrigin-RevId: 348537232
2020-12-21 15:27:40 -08:00
A. Unique TensorFlower 5da9190dd9 Integrate LLVM at llvm/llvm-project@0cf7e4b252
Updates LLVM usage to match
[0cf7e4b252fe](https://github.com/llvm/llvm-project/commit/0cf7e4b252fe)

PiperOrigin-RevId: 347948887
2020-12-16 20:30:17 -08:00
Phoenix Meadowlark 5a080ad795 Add Bazel build configuration to MLIR HLO standalone repo.
Tested that this works in OSS in tree with `bazel build //...` and as a submodule in IREE.

PiperOrigin-RevId: 347919106
2020-12-16 16:23:31 -08:00