From f044cf780c5af8a7b1983c357869c1a5e8e632b8 Mon Sep 17 00:00:00 2001
From: Mehdi Amini
Date: Mon, 3 Aug 2020 19:23:23 -0700
Subject: [PATCH] Introduce early documentation on mlir-hlo

PiperOrigin-RevId: 324732753
---
 CONTRIBUTING.md |   5 ++
 README.md       | 200 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 205 insertions(+)
 create mode 100644 CONTRIBUTING.md
 create mode 100644 README.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..3ee8420
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,5 @@

# Contributing

This repository does not accept direct contributions at this time, but
contributions can be sent to
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/hlo

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..1be6fb2
--- /dev/null
+++ b/README.md
@@ -0,0 +1,200 @@

# MLIR-HLO

The code here exists in two places:

* https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/mlir/hlo;
  this is the canonical location, where contributions should be made via GitHub
  pull requests.
* https://github.com/tensorflow/mlir-hlo; this is a standalone repository that
  mirrors the same code, so that other projects can use it without depending on
  the entire TF monorepo.

This implements a self-contained compiler for a set of linear algebra
operations inspired by the XLA
[HLO IR](https://www.tensorflow.org/xla/architecture#how_does_xla_work), using
MLIR components. It is designed to provide an end-to-end flow independent of
TensorFlow and XLA, while remaining usable inside these projects.

Coding practices and conventions in this repository follow the
[MLIR Developer Guide](https://mlir.llvm.org/getting_started/DeveloperGuide/),
as this repository is intended to act as an incubator for technology to be
upstreamed.

## QuickStart: building and testing

TODO

## Overview

[XLA](https://www.tensorflow.org/xla/) (Accelerated Linear Algebra) is a
domain-specific compiler framework and execution environment for linear
algebra, which powers code generation for ML frameworks like TensorFlow, JAX,
and others.

A cornerstone of XLA is the HLO (High Level Optimizer) IR, which offers a
carefully chosen, fixed set of operations that are mostly orthogonal to each
other. It provides an efficient optimizer for computations expressed with this
set of operations and generates code for hardware platforms such as CPUs, GPUs,
and TPUs. Its goal is to provide a uniform interface to compile and execute
these optimized HLO programs independently of the targeted device. It is not a
front-end ML system like TensorFlow or JAX; rather, it is a backend framework
that optimizes HLO and lowers it to machine code.

The HLO set of operations is closed and has well-defined semantics. HLO
operations operate on immutable tensors with static shapes (bounded shapes, to
be exact) and explicit broadcasts.
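As a purely illustrative sketch of what this means in practice (rendered with
the `mhlo` dialect that this repository defines and describes below; the op
names and syntax here are approximations, not authoritative definitions), an
element-wise add with an explicit broadcast on statically shaped, immutable
tensors might look like:

```mlir
// Hypothetical sketch: values are immutable SSA tensors with static shapes,
// and the broadcast of the 1-D operand is made explicit before the add.
func @scaled_add(%bias: tensor<4xf32>, %x: tensor<3x4xf32>) -> tensor<3x4xf32> {
  // Explicitly broadcast the 4-element vector along dimension 1 of the result.
  %0 = "mhlo.broadcast_in_dim"(%bias)
      {broadcast_dimensions = dense<[1]> : tensor<1xi64>}
      : (tensor<4xf32>) -> tensor<3x4xf32>
  // Element-wise add on operands of identical static shape; no implicit
  // broadcast and no in-place mutation are involved.
  %1 = "mhlo.add"(%0, %x)
      : (tensor<3x4xf32>, tensor<3x4xf32>) -> tensor<3x4xf32>
  return %1 : tensor<3x4xf32>
}
```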
[MLIR](https://mlir.llvm.org/) is a compiler infrastructure which intends to
come with "batteries included"; as such, it aims to provide all the building
blocks required to assemble graph optimization and codegen pipelines. The
longer-term roadmap for MLIR is to provide a
[Tensor Compute Primitive](https://llvm.discourse.group/c/mlir/MLIR-TCP-WG/36)
(TCP) dialect, which should hopefully be general enough to model what HLO
represents today (see
[slides](https://drive.google.com/open?id=1iljcpTQ5NPaMfGpoPDFml1XkYxjK_6A4) and
[recording](https://drive.google.com/open?id=1jSPa8TwPKUt0WuLquGc8OgSUVYJHMvWZ)
for a technical discussion on this topic).

The work on MLIR-HLO can be seen as a stepping stone towards building TCP,
while integrating intermediate components into XLA itself by relying on the
well-proven HLO IR and introducing more pieces from upstream MLIR
([Linalg](https://mlir.llvm.org/docs/Dialects/Linalg/),
[Vector](https://mlir.llvm.org/docs/Dialects/Vector/),
[GPU](https://mlir.llvm.org/docs/Dialects/GPU/) dialects, ...).
[This document](https://www.tensorflow.org/mlir/xla_gpu_codegen) provides more
information on the current migration of the XLA GPU codegen.

## MLIR Dialects for XLA-style compilation

This repository defines three dialects to support an HLO-like compilation
pipeline using MLIR:

* `chlo`: the "client" HLO dialect, intended to be closer to the frontend
  (including implicit broadcast semantics).
* `mhlo`: the "meta"-HLO dialect; similar to `xla_hlo`, but with extensions for
  dynamic shape support.
* `lmhlo`: the "late" "meta"-HLO dialect; this is the IR after buffer
  allocation has been performed. In XLA, buffer allocation is a side data
  structure that keeps track of this information, while this separate dialect
  materializes it in the IR.

We describe these in more detail below.

### HLO Client Dialect: `chlo`

* It was originally designed to map the
  [XLA client APIs](https://www.tensorflow.org/xla/operation_semantics) (e.g.,
  ops support implicit broadcast and are roughly modeled on the XlaBuilder
  API), modulo support for dynamic shapes and additional ops required to
  support dynamic client-side HLOs.
* Ops can come either from the XlaBuilder directly or from XLA helper functions
  that are converted into ops (given the ambiguity in what constitutes these
  ops, there is some freedom to decide). The goal of this dialect is to stay
  close to the client level and to provide a thin layer between client use and
  op construction (making ops cheap to construct, and keeping optimizations on
  the dialect close to optimizations on the client ops).

Entry:

* The vast majority of old "client" interactions are via the XlaBuilder APIs.
  These APIs are used by the TF2XLA kernels, JAX, the PyTorch bridge, and
  directly by users. The legalization path (described below) can also reuse the
  XlaBuilder APIs to construct XLA Client HLO ops directly (this uses
  MlirXlaBuilder, a subclass of XlaBuilder).
* The other entry point is during legalization from TensorFlow ops in the TF
  Graph Compiler and other tools (e.g., SavedModel lowering and TFCompile).

Exit:

* MHLO.
* May be exported to xla::HloInstructionProto by invoking the XlaBuilder APIs
  (with the regular XlaBuilder).

The `chlo` dialect started originally as a mapping to the XLA client builder
APIs. It can both be constructed from and converted back to existing XLA
interfaces using the XlaBuilder API. Due to the way that translation into and
out of the dialect works, there is no expectation that it roundtrips to XLA
(i.e., it is only intended to be translated into MLIR and then either legalized
to another dialect or translated to HloInstructionProto).
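To make the "thin layer close to the client ops" idea concrete, here is a
hedged sketch of what a client-level op with implicit broadcast might look
like; the op name and exact syntax are assumptions based on the description
above rather than definitions taken from this repository:

```mlir
// Hypothetical client-level form: the broadcast between the rank-1 and rank-2
// operands is implicit, mirroring the XlaBuilder / client API behavior.
func @client_add(%vec: tensor<4xf32>, %mat: tensor<3x4xf32>) -> tensor<3x4xf32> {
  %0 = "chlo.broadcast_add"(%vec, %mat)
      : (tensor<4xf32>, tensor<3x4xf32>) -> tensor<3x4xf32>
  // A later legalization pass would materialize the broadcast explicitly
  // (e.g., as a broadcast op followed by an element-wise add in mhlo).
  return %0 : tensor<3x4xf32>
}
```

Keeping the implicit-broadcast form at this level keeps op construction cheap
for clients, while a dedicated legalization pass can later make the broadcast
explicit for the optimizer.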
The export approach of reusing the XlaBuilder enables reusing a lot of logic
that was already implemented for computing shapes, inserting broadcasts, etc.

An important topic here is that the XLA Client HLO ops are not a well-defined
set: in particular, what some would consider helper functions, others would
consider ops. It should be easy to move between the two, either by defining a
new op along with its helper function, or by auto-generating the helper
functions from the descriptions of the ops. For the former, a simple approach
is to consider the context in which the op is being constructed and, if it is
an MLIR one, construct an op in the client dialect instead of making further
calls into XlaBuilder. The latter could be implemented by adding the op and a
legalization of that op to other known ops, from which a helper function can be
generated and used as usual.

Status: Exists, but needs to be cleaned up.

### Meta HLO Dialect `mhlo`

* The dialect is closer to the current HLO server ops (e.g., no implicit
  broadcast).
* MHLO is the dialect where we can deviate from the requirements of the client
  or server dialects, in particular:
    * Control flow ops with implicit capture to enable simpler optimizations
      (e.g., generic LICM, unroll & jam, etc.).
    * Ops with multiple results (i.e., no tuples).
    * More ops (for example, a unique op or an assert op), including ops that
      don't need to be added to either the client or the server dialect.
    * An op set not constrained by the implementation (e.g., hlo.add operating
      on, say, i79 or !mydialect.weird_type is allowed even though no XLA
      backend supports it); verification on types happens at the boundaries.
    * It does not need to preserve some deprecated XLA constructs (e.g.,
      stateful RNG HLO).
    * More dynamic shape support ops, without the need to update all
      users/backends.
* This dialect enables evolving HLO independently from XLA in order to
  experiment with features we'd like to upstream to MLIR TCP. In particular, it
  intends to be user-extensible through
  [interfaces](https://mlir.llvm.org/docs/Interfaces/).
* It should have no TensorFlow, proto, or other Google-internal dependencies.
* It need not be a complete superset of the ops in the XLA HLO dialect.

Entry:

* Legalization from the `chlo` dialect or conversion from XLA HLO.
* Directly emitted from the TF Graph Compiler.
* Builder calls (e.g., EDSL).

Exit:

* LMHLO, Linalg, IREE, or directly used in codegen.
* XLA HLO.

The MHLO dialect has no direct export format; it is only meant as an
intermediate optimization dialect/format. It is also where we can experiment
cheaply with new ops. This is the format where the representation can differ
from the existing endpoints.

Status: Exists, but needs to be cleaned up and evolved, in particular with
respect to supporting dynamic shapes.

### LMHLO

LMHLO corresponds to late `mhlo` and operates on the buffer domain (i.e.,
memrefs) with side-effecting operations. The lowering from the `mhlo` dialect
proceeds by way of scheduling, memory, and buffer allocation. The current
mapping is directly onto the XLA Client HLOs, but without implicit broadcast
and operating on memrefs. This dialect will instead be rebased on the `mhlo`
dialect, still operating on buffers.

Entry:

* Post buffer assignment on the `mhlo` dialect, or from XLA after buffer
  assignment.

Exit:

* Codegen (LLVM IR in the common cases at the moment).

## End-to-End pipeline

TODO
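While the end-to-end pipeline description is still TODO, the tensor-to-buffer
transition described in the LMHLO section above can be sketched as follows;
this is a purely illustrative example, and the op names, operand order, and
exact syntax are assumptions rather than definitions taken from this
repository:

```mlir
// Hypothetical mhlo form: pure ops producing new tensor values.
func @add_tensors(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = "mhlo.add"(%a, %b) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
  return %0 : tensor<4xf32>
}

// Hypothetical lmhlo form after buffer allocation: the same computation as a
// side-effecting op that reads its input buffers and writes an output buffer.
func @add_buffers(%a: memref<4xf32>, %b: memref<4xf32>, %out: memref<4xf32>) {
  "lmhlo.add"(%a, %b, %out) : (memref<4xf32>, memref<4xf32>, memref<4xf32>) -> ()
  return
}
```

In this sketch, the buffer-domain op writes into an explicit output operand
instead of returning a value, which is one way the result of buffer allocation
can be materialized directly in the IR.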