From f044cf780c5af8a7b1983c357869c1a5e8e632b8 Mon Sep 17 00:00:00 2001
From: Mehdi Amini
Date: Mon, 3 Aug 2020 19:23:23 -0700
Subject: [PATCH] Introduce early documentation on mlir-hlo

PiperOrigin-RevId: 324732753
---
 CONTRIBUTING.md |   5 ++
 README.md       | 200 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+)
 create mode 100644 CONTRIBUTING.md
 create mode 100644 README.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..3ee8420
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,5 @@

# Contributing

This repository does not accept direct contributions at this time, but
contributions can be sent to
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/hlo

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1be6fb2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,200 @@

# MLIR-HLO

The code here exists in two places:

* https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/hlo;
  this is the canonical location, where contributions should be made via GitHub
  pull requests.
* https://github.com/tensorflow/mlir-hlo; this is a standalone repository that
  mirrors the same code, so that other projects can use it without depending on
  the entire TF monorepo.

This implements a self-contained compiler for a set of linear algebra
operations inspired by the XLA
[HLO IR](https://www.tensorflow.org/xla/architecture#how_does_xla_work), using
MLIR components. It is designed to provide an end-to-end flow independent of
TensorFlow and XLA, while remaining usable inside these projects.

Coding practices and conventions in this repository follow the
[MLIR Developer Guide](https://mlir.llvm.org/getting_started/DeveloperGuide/),
as this repository is intended to act as an incubator for technology to be
upstreamed.

## QuickStart: building and testing

TODO

## Overview

[XLA](https://www.tensorflow.org/xla/) (Accelerated Linear Algebra) is a
domain-specific compiler framework and execution environment for linear
algebra, which powers code generation for ML frameworks like TensorFlow, JAX,
and others.

A cornerstone of XLA is the HLO (High Level Optimizer) IR, which offers a
carefully chosen, fixed set of operations that are mostly orthogonal to each
other. It provides an efficient optimizer for computations expressed with this
set of operations and generates code for hardware platforms such as CPUs, GPUs,
and TPUs. Its goal is to provide a uniform interface to compile and execute
these optimized HLO programs independently of the targeted device. It is not a
front-end ML system like TensorFlow or JAX; rather, it is a backend framework
that optimizes HLO and lowers it to machine code.

The HLO set of operations is closed and has well-defined semantics. HLO
operations operate on immutable tensors with static shapes (bounded shapes, to
be exact) and explicit broadcasts.
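As a purely illustrative sketch of what this means in practice (rendered with
the `mhlo` dialect that this repository defines and describes below; the op
names and syntax here are approximations, not authoritative definitions), an
element-wise add with an explicit broadcast on statically shaped, immutable
tensors might look like:

```mlir
// Hypothetical sketch: values are immutable SSA tensors with static shapes,
// and the broadcast of the 1-D operand is made explicit before the add.
func @scaled_add(%bias: tensor<4xf32>, %x: tensor<3x4xf32>) -> tensor<3x4xf32> {
  // Explicitly broadcast the 4-element vector along dimension 1 of the result.
  %0 = "mhlo.broadcast_in_dim"(%bias)
      {broadcast_dimensions = dense<[1]> : tensor<1xi64>}
      : (tensor<4xf32>) -> tensor<3x4xf32>
  // Element-wise add on operands of identical static shape; no implicit
  // broadcast and no in-place mutation are involved.
  %1 = "mhlo.add"(%0, %x)
      : (tensor<3x4xf32>, tensor<3x4xf32>) -> tensor<3x4xf32>
  return %1 : tensor<3x4xf32>
}
```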
[MLIR](https://mlir.llvm.org/) is a compiler infrastructure which intends to
come with "batteries included"; as such, it aims to provide all the building
blocks required to assemble graph optimization and codegen pipelines. The
longer-term roadmap for MLIR is to provide a
[Tensor Compute Primitive](https://llvm.discourse.group/c/mlir/MLIR-TCP-WG/36)
(TCP) dialect, which should hopefully be general enough to model what HLO
represents today (see
[slides](https://drive.google.com/open?id=1iljcpTQ5NPaMfGpoPDFml1XkYxjK_6A4) and
[recording](https://drive.google.com/open?id=1jSPa8TwPKUt0WuLquGc8OgSUVYJHMvWZ)
for a technical discussion on this topic).

The work on MLIR-HLO can be seen as a stepping stone towards building TCP,
while integrating intermediate components into XLA itself by relying on the
well-proven HLO IR and introducing more pieces from upstream MLIR
([Linalg](https://mlir.llvm.org/docs/Dialects/Linalg/),
[Vector](https://mlir.llvm.org/docs/Dialects/Vector/),
[GPU](https://mlir.llvm.org/docs/Dialects/GPU/) dialects, ...).
[This document](https://www.tensorflow.org/mlir/xla_gpu_codegen) provides more
information on the current migration of the XLA GPU codegen.

## MLIR Dialects for XLA-style compilation

This repository defines three dialects to support an HLO-like compilation
pipeline using MLIR:

* `chlo`: the "client" HLO dialect, intended to be closer to the frontend
  (including implicit broadcast semantics).
* `mhlo`: the "meta"-HLO dialect; similar to `xla_hlo`, but with extensions for
  dynamic shape support.
* `lmhlo`: the "late" "meta"-HLO dialect; this is the IR after buffer
  allocation has been performed. In XLA, buffer allocation is a side data
  structure that keeps track of this information, while this separate dialect
  materializes it in the IR.

We describe these in more detail below.

### HLO Client Dialect: `chlo`

* It was originally designed to map the
  [XLA client APIs](https://www.tensorflow.org/xla/operation_semantics) (e.g.,
  ops support implicit broadcast and are roughly modeled on the XlaBuilder
  API), modulo support for dynamic shapes and additional ops required to
  support dynamic client-side HLOs.
* Ops can come either from the XlaBuilder directly or from XLA helper functions
  that are converted into ops (given the ambiguity in what constitutes these
  ops, there is some freedom to decide). The goal of this dialect is to stay
  close to the client level and to provide a thin layer between client use and
  op construction (making ops cheap to construct, and keeping optimizations on
  the dialect close to optimizations on the client ops).

Entry:

* The vast majority of old "client" interactions are via the XlaBuilder APIs.
  These APIs are used by the TF2XLA kernels, JAX, the PyTorch bridge, and
  directly by users. The legalization path (described below) can also reuse the
  XlaBuilder APIs to construct XLA Client HLO ops directly (this uses
  MlirXlaBuilder, a subclass of XlaBuilder).
* The other entry point is during legalization from TensorFlow ops in the TF
  Graph Compiler and other tools (e.g., SavedModel lowering and TFCompile).

Exit:

* MHLO.
* May be exported to xla::HloInstructionProto by invoking the XlaBuilder APIs
  (with the regular XlaBuilder).

The `chlo` dialect started originally as a mapping to the XLA client builder
APIs. It can both be constructed from and converted back to existing XLA
interfaces using the XlaBuilder API. Due to the way that translation into and
out of the dialect works, there is no expectation that it roundtrips to XLA
(i.e., it is only intended to be translated into MLIR and then either legalized
to another dialect or translated to HloInstructionProto).
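To make the "thin layer close to the client ops" idea concrete, here is a
hedged sketch of what a client-level op with implicit broadcast might look
like; the op name and exact syntax are assumptions based on the description
above rather than definitions taken from this repository:

```mlir
// Hypothetical client-level form: the broadcast between the rank-1 and rank-2
// operands is implicit, mirroring the XlaBuilder / client API behavior.
func @client_add(%vec: tensor<4xf32>, %mat: tensor<3x4xf32>) -> tensor<3x4xf32> {
  %0 = "chlo.broadcast_add"(%vec, %mat)
      : (tensor<4xf32>, tensor<3x4xf32>) -> tensor<3x4xf32>
  // A later legalization pass would materialize the broadcast explicitly
  // (e.g., as a broadcast op followed by an element-wise add in mhlo).
  return %0 : tensor<3x4xf32>
}
```

Keeping the implicit-broadcast form at this level keeps op construction cheap
for clients, while a dedicated legalization pass can later make the broadcast
explicit for the optimizer.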
The export approach of reusing the XlaBuilder enables reusing a lot of logic
that was already implemented for computing shapes, inserting broadcasts, etc.

An important topic here is that the XLA Client HLO ops are not a well-defined
set: in particular, what some would consider helper functions, others would
consider ops. It should be easy to move between the two, either by defining a
new op along with its helper function, or by auto-generating the helper
functions from the descriptions of the ops. For the former, a simple approach
is to consider the context in which the op is being constructed and, if it is
an MLIR one, construct an op in the client dialect instead of making further
calls into XlaBuilder. The latter could be implemented by adding the op and a
legalization of that op to other known ops, from which a helper function can be
generated and used as usual.

Status: Exists, but needs to be cleaned up.

### Meta HLO Dialect `mhlo`

* The dialect is closer to the current HLO server ops (e.g., no implicit
  broadcast).
* MHLO is the dialect where we can deviate from the requirements of the client
  or server dialects, in particular:
    * Control flow ops with implicit capture to enable simpler optimizations
      (e.g., generic LICM, unroll & jam, etc.).
    * Ops with multiple results (i.e., no tuples).
    * More ops (for example, a unique op or an assert op), including ops that
      don't need to be added to either the client or the server dialect.
    * An op set not constrained by the implementation (e.g., hlo.add operating
      on, say, i79 or !mydialect.weird_type is allowed even though no XLA
      backend supports it); verification on types happens at the boundaries.
    * It does not need to preserve some deprecated XLA constructs (e.g.,
      stateful RNG HLO).
    * More dynamic shape support ops, without the need to update all
      users/backends.
* This dialect enables evolving HLO independently from XLA in order to
  experiment with features we'd like to upstream to MLIR TCP. In particular, it
  intends to be user-extensible through
  [interfaces](https://mlir.llvm.org/docs/Interfaces/).
* It should have no TensorFlow, proto, or other Google-internal dependencies.
* It need not be a complete superset of the ops in the XLA HLO dialect.

Entry:

* Legalization from the `chlo` dialect or conversion from XLA HLO.
* Directly emitted from the TF Graph Compiler.
* Builder calls (e.g., EDSL).

Exit:

* LMHLO, Linalg, IREE, or directly used in codegen.
* XLA HLO.

The MHLO dialect has no direct export format; it is only meant as an
intermediate optimization dialect/format. It is also where we can experiment
cheaply with new ops. This is the format where the representation can differ
from the existing endpoints.

Status: Exists, but needs to be cleaned up and evolved, in particular with
respect to supporting dynamic shapes.

### LMHLO

LMHLO corresponds to late `mhlo` and operates on the buffer domain (i.e.,
memrefs) with side-effecting operations. The lowering from the `mhlo` dialect
proceeds by way of scheduling, memory, and buffer allocation. The current
mapping is directly onto the XLA Client HLOs, but without implicit broadcast
and operating on memrefs. This dialect will instead be rebased on the `mhlo`
dialect, still operating on buffers.

Entry:

* Post buffer assignment on the `mhlo` dialect, or from XLA after buffer
  assignment.

Exit:

* Codegen (LLVM IR in the common cases at the moment).

## End-to-End pipeline

TODO
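While the end-to-end pipeline description is still TODO, the tensor-to-buffer
transition described in the LMHLO section above can be sketched as follows;
this is a purely illustrative example, and the op names, operand order, and
exact syntax are assumptions rather than definitions taken from this
repository:

```mlir
// Hypothetical mhlo form: pure ops producing new tensor values.
func @add_tensors(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = "mhlo.add"(%a, %b) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
  return %0 : tensor<4xf32>
}

// Hypothetical lmhlo form after buffer allocation: the same computation as a
// side-effecting op that reads its input buffers and writes an output buffer.
func @add_buffers(%a: memref<4xf32>, %b: memref<4xf32>, %out: memref<4xf32>) {
  "lmhlo.add"(%a, %b, %out) : (memref<4xf32>, memref<4xf32>, memref<4xf32>) -> ()
  return
}
```

In this sketch, the buffer-domain op writes into an explicit output operand
instead of returning a value, which is one way the result of buffer allocation
can be materialized directly in the IR.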