Introduce early documentation on mlir-hlo
PiperOrigin-RevId: 324732753
@@ -0,0 +1,5 @@
# Contributing

This repository does not accept direct contributions at this time, but
contributions can be sent to
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/hlo
@@ -0,0 +1,200 @@
# MLIR-HLO

The code here exists in two places:

*   https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/hlo;
    this is the canonical location, where contributions should be made using
    GitHub pull-requests.
*   https://github.com/tensorflow/mlir-hlo; this is a standalone repository
    with a view of the same code, allowing other projects to use it without
    depending on the entire TF monorepo.

This implements a self-contained compiler for a linear algebra set of
operations inspired by the XLA
[HLO IR](https://www.tensorflow.org/xla/architecture#how_does_xla_work), using
MLIR components. It is designed to provide an end-to-end flow independent of
TensorFlow and XLA, but usable inside of these projects.

Coding practice and conventions in this repository follow the
[MLIR Developer Guide](https://mlir.llvm.org/getting_started/DeveloperGuide/),
as part of the intent for this repository to act as an incubator for
technology to upstream.

## QuickStart: building and testing

TODO

## Overview

[XLA](https://www.tensorflow.org/xla/) (Accelerated Linear Algebra) is a
domain-specific compiler framework and execution environment for linear
algebra, which powers code generation for ML frameworks like TensorFlow, JAX,
and others.

A cornerstone of XLA is the HLO (High Level Optimizer) IR, which offers a
carefully chosen, fixed set of operations that are mostly orthogonal to each
other. It provides an efficient optimizer for computations expressed with this
set of operations and generates code for hardware platforms like CPUs, GPUs,
and TPUs. Its goal is to provide a uniform interface to compile and execute
these optimized HLO programs independently of the targeted device. It is not a
front-end ML system like TensorFlow or JAX; rather, it is a backend framework
that optimizes HLO and lowers it to machine code.

The HLO set of operations is closed and has well-defined semantics. HLO
operations operate on immutable tensors with static shapes (bounded shapes, to
be exact) and explicit broadcasts.
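As an illustration of these explicit broadcasts, here is a minimal sketch
using the `mhlo` dialect defined in this repository (described below); op
names follow the dialect, but the exact spellings are illustrative and may
evolve:

```mlir
// Add a 1-D bias to each row of a 2-D tensor. All shapes are static, and the
// broadcast is a separate, explicit op rather than being implied by the add.
func @add_bias(%arg0: tensor<2x3xf32>, %arg1: tensor<3xf32>) -> tensor<2x3xf32> {
  // Expand the bias to the full result shape: operand dimension 0 maps to
  // result dimension 1.
  %0 = "mhlo.broadcast_in_dim"(%arg1) {broadcast_dimensions = dense<[1]> : tensor<1xi64>}
      : (tensor<3xf32>) -> tensor<2x3xf32>
  // The element-wise add then requires operands of identical shape.
  %1 = "mhlo.add"(%arg0, %0) : (tensor<2x3xf32>, tensor<2x3xf32>) -> tensor<2x3xf32>
  return %1 : tensor<2x3xf32>
}
```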
[MLIR](https://mlir.llvm.org/) is a compiler infrastructure which intends to
come with "batteries included": it aims to provide all the building blocks
required to assemble graph optimization and codegen pipelines. The longer-term
roadmap for MLIR is to provide a
[Tensor Compute Primitive](https://llvm.discourse.group/c/mlir/MLIR-TCP-WG/36)
(TCP) dialect, which should hopefully be general enough to model what HLO
represents today (see
[slides](https://drive.google.com/open?id=1iljcpTQ5NPaMfGpoPDFml1XkYxjK_6A4) and
[recording](https://drive.google.com/open?id=1jSPa8TwPKUt0WuLquGc8OgSUVYJHMvWZ)
for a technical discussion on this topic).

The work on MLIR-HLO can be seen as a stepping stone towards building TCP,
while integrating intermediate components into XLA itself by relying on the
well-proven HLO IR and introducing more pieces from upstream MLIR
([Linalg](https://mlir.llvm.org/docs/Dialects/Linalg/),
[Vector](https://mlir.llvm.org/docs/Dialects/Vector/),
[GPU](https://mlir.llvm.org/docs/Dialects/GPU/) dialect, ...).
[This document](https://www.tensorflow.org/mlir/xla_gpu_codegen) provides more
information on the current migration of the XLA GPU codegen.
## MLIR Dialects for XLA-style compilation

This repository defines three dialects to support an HLO-like compilation
pipeline using MLIR:

*   `chlo`: the "client" HLO dialect, intended to be closer to the frontend
    (including implicit broadcast semantics).
*   `mhlo`: the "meta"-HLO dialect; similar to `xla_hlo`, but with extensions
    for dynamic shape support.
*   `lmhlo`: the "late"-"meta"-HLO dialect; this is the IR after buffer
    allocation is performed. In XLA, buffer allocation is a side data
    structure which keeps track of this information, while this separate
    dialect materializes it in the IR.

We describe these in more detail below.
### HLO Client Dialect: `chlo`

*   It was originally designed to map the
    [XLA client APIs](https://www.tensorflow.org/xla/operation_semantics)
    (e.g., ops support implicit broadcast and are roughly modeled on the
    XlaBuilder API, as illustrated in the sketch after this list), modulo
    support for dynamic shapes and additional ops required to support dynamic
    client-side HLOs.
*   Ops can come from either the XlaBuilder, or XLA helper functions can be
    converted into ops (given the ambiguity in what constitutes these ops,
    there is some freedom to decide). The goal of this dialect is to
    correspond closely to the client level and to enable a thin layer between
    client use and op construction (making ops cheap to construct, and keeping
    optimizations on the dialect close to optimizations on the client ops).
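A minimal sketch of the implicit-broadcast semantics mentioned above (the op
name is from the `chlo` dialect, but the exact spelling is illustrative):

```mlir
// The client-level op accepts rank-mismatched operands directly, mirroring
// the XlaBuilder API; the broadcast is implicit in the op itself and only
// becomes explicit when legalizing to `mhlo`.
func @add_bias(%arg0: tensor<2x3xf32>, %arg1: tensor<3xf32>) -> tensor<2x3xf32> {
  %0 = "chlo.broadcast_add"(%arg0, %arg1)
      : (tensor<2x3xf32>, tensor<3xf32>) -> tensor<2x3xf32>
  return %0 : tensor<2x3xf32>
}
```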
Entry:

*   The vast majority of old "client" interactions are via the XlaBuilder
    APIs. These APIs are used by TF2XLA kernels, JAX, the PyTorch bridge, and
    directly. The legalization path (described below) can also reuse the
    XlaBuilder's APIs to construct XLA client HLO ops directly (this uses
    MlirXlaBuilder, which is a subclass of XlaBuilder).
*   The other entry point is during legalization from TensorFlow ops in the TF
    Graph Compiler and other tools (e.g., SavedModel lowering and TFCompile).

Exit:

*   MHLO.
*   May be exported to xla::HloInstructionProto by invoking the XlaBuilder
    APIs (with the regular XlaBuilder).
The `chlo` dialect started originally as a mapping to the XLA client builder
APIs. It can both be constructed from and converted back to existing XLA
interfaces using the XlaBuilder API. Due to the way that translation into and
out of the dialect works, there is no expectation that this dialect roundtrips
to XLA (i.e., it is only intended to be translated to MLIR and then legalized
to another dialect, or translated to HloInstructionProto).

The export approach of reusing the XlaBuilders enables reusing a lot of logic
that was already implemented in terms of computing shapes, inserting
broadcasts, etc.

An important topic here is that XLA client HLO ops are not a well-defined set;
in particular, what some would consider helper functions, others would
consider ops. It should be easy to move between these, and so either define a
new op along with the helper function, or autogenerate the helper functions
from the descriptions of the ops. For the former, a simple approach would be
to consider the context in which the op is being constructed and, if it is an
MLIR one, construct an op in the client dialect instead of making further
calls into XlaBuilder. The latter could be implemented by adding the op and a
legalization of the op to other known ops, from which a helper function can be
generated and used as usual.

Status: Exists but needs to be cleaned up.
### Meta HLO Dialect: `mhlo`

*   The dialect is closer to the current HLO server ops (e.g., no implicit
    broadcast).
*   MHLO is the dialect where we can deviate from the requirements of the
    client or server dialect, in particular:
    *   Control flow ops with implicit capture to enable simpler optimizations
        (e.g., generic LICM, unroll & jam, etc.).
    *   Ops with multiple results (i.e., no tuples).
    *   More ops (for example, a unique op or an assert op), including ops
        that don't need to be added to either the client or server dialect.
    *   An op set not constrained by the implementation (e.g., hlo.add
        operating on, say, i79 or !mydialect.weird_type is allowed even though
        no XLA backend supports it). Verification of types happens at the
        boundaries.
    *   It does not need to preserve some deprecated XLA constructs (e.g., the
        stateful RNG HLO).
    *   More ops supporting dynamic shapes, without the need to update all
        users/backends (see the sketch after this list).
*   This dialect enables evolving HLO independently from XLA in order to
    experiment with features we'd like to upstream to MLIR TCP. In particular,
    it intends to be user-extensible through
    [interfaces](https://mlir.llvm.org/docs/Interfaces/).
*   It should have no TensorFlow, proto, or other Google-internal
    dependencies.
*   It need not be a complete superset of ops compared to the XLA HLO dialect.
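A sketch of the dynamic shape support mentioned in the list above
(illustrative only): `mhlo` ops may carry dimensions that are unknown at
compile time, which server HLO's static (bounded) shapes would not express
directly:

```mlir
// The leading dimension is unknown (?) at compile time; verification of the
// element-wise add only requires the operand types to match.
func @add_dynamic(%arg0: tensor<?x4xf32>, %arg1: tensor<?x4xf32>) -> tensor<?x4xf32> {
  %0 = "mhlo.add"(%arg0, %arg1) : (tensor<?x4xf32>, tensor<?x4xf32>) -> tensor<?x4xf32>
  return %0 : tensor<?x4xf32>
}
```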
Entry:

*   Legalization from the `chlo` dialect or conversion from XLA HLO.
*   Directly emitted from the TF Graph Compiler.
*   Builder call (e.g., EDSL).

Exit:

*   LMHLO, Linalg, IREE, or directly used in codegen.
*   XLA HLO.
The MHLO dialect has no direct export format; it is only meant as an
intermediate optimization dialect/format. It is also where we can experiment
cheaply with new ops. This format will be where the representation starts to
differ from the existing endpoints.

Status: Exists but needs to be cleaned up and evolved, in particular with
respect to supporting dynamic shapes.
### LMHLO

LMHLO corresponds to late `mhlo` and operates on the buffer domain (e.g.,
memref) with side-effecting operations, as sketched below. The lowering from
the `mhlo` dialect proceeds by way of scheduling, memory, and buffer
allocation. The current mapping is directly onto the XLA client HLOs, but
without implicit broadcast and with operations on memrefs. This dialect will
instead be rebased on the `mhlo` dialect, still operating on buffers.
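A minimal sketch of what this level looks like (illustrative; the exact op
spelling may differ): operands live in buffers, and instead of returning a
value the op writes its result into a destination buffer:

```mlir
// Element-wise add after buffer assignment: no SSA results, only a
// side-effecting write into the %out buffer materialized in the IR.
func @add(%lhs: memref<2x3xf32>, %rhs: memref<2x3xf32>, %out: memref<2x3xf32>) {
  "lmhlo.add"(%lhs, %rhs, %out)
      : (memref<2x3xf32>, memref<2x3xf32>, memref<2x3xf32>) -> ()
  return
}
```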
Entry:

*   Post buffer assignment on the `mhlo` dialect, or from XLA after buffer
    assignment.

Exit:

*   Codegen (LLVM IR in the common cases at the moment).
## End-to-End pipeline

TODO