This is consistent with the design of LMHLO FusionOp and simplifies its usage.
Before this change, those redundant operands ended up unused, since all sub-regions can already capture the buffers they need.
PiperOrigin-RevId: 362381155
- Extract verification of the source-target pairs attached to collective permute into a common
helper function and use it to verify both the MHLO and LMHLO variants (example below).
- Change MlirGpuTestBase::ParseMlirModule to allow returning a failure, and use
that to update mlir_gpu_compile_test to check the new behavior.
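For reference, a sketch of the attribute in question (shapes and pair values here are illustrative):
// Each pair is (source replica, target replica); the shared helper can
// enforce the same well-formedness rules for both dialects, e.g. that the
// attribute is a rank-2 i64 tensor of pairs.
"mhlo.collective_permute"(%arg0) {
  source_target_pairs = dense<[[0, 1], [1, 2], [2, 3]]> : tensor<3x2xi64>
} : (tensor<128x32xf32>) -> tensor<128x32xf32>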
PiperOrigin-RevId: 362156962
For now, the pass only reifies the required shape computations. Moving
broadcasts will follow to allow for fusion across them.
PiperOrigin-RevId: 362033715
Previously this check was too strict and failed when dynamic and static dims were
compared. Without further information, a dynamic/unknown dim is treated as "maybe equal" to a static value, so at this layer don't flag the comparison as invalid unless it truly is.
PiperOrigin-RevId: 360189086
This op is useful for rank specialization of broadcasts. Kernel Generator
needs to generate one kernel for each rank, so if we can minimize the rank
of the broadcast shape, we can support more cases with the same number of
special-cased kernels.
PiperOrigin-RevId: 360137827
- Clean up the CMakeLists file to remove an unused argument and use a new function for
setting up the lmhlo and lmhlo_gpu dialect targets.
- Fix an inconsistently formatted copyright comment and the header include guards.
PiperOrigin-RevId: 358865838
- XLA:HLO -> LMHLO conversion drops all token arguments and return values; however,
custom calls that users write still expect to get buffer pointers for these token types.
- To support this, add an optional call target argument mapping attribute to
LMHLO custom calls. When present, this attribute indicates the number of
arguments and results that the custom call expects, and it indicates which LMHLO
arg() or output() maps to which argument or result number of the custom call (sketched below).
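A hypothetical sketch of such a mapping (the attribute and field names here are illustrative, not necessarily the exact ODS spelling):
// The target expects 3 args and 2 results; target arg 1 and target result 0
// are tokens, so they have no corresponding LMHLO buffer.
"lmhlo.custom_call"(%arg0, %arg1, %out0) {
  call_target_name = "foo",
  target_arg_mapping = {
    num_args = 3 : i64,
    num_results = 2 : i64,
    args_to_target_args = [0, 2],
    results_to_target_results = [1]
  }
} : (memref<8xf32>, memref<8xf32>, memref<8xf32>) -> ()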
PiperOrigin-RevId: 358826664
`mhlo.dot_general` and `mhlo.convolution` result element types may differ from the operand element types; see the `preferred_element_type` attribute, which allows an i8 x i8 -> i32 dot computation. The `mhlo` to HLO exporter should pass the result element type to the XLA builder to override XLA's shape inference.
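For example (dimension numbers abbreviated to a plain matmul; the attribute syntax may differ across MHLO versions):
// i8 x i8 -> i32 matmul: the i32 result element type must be passed to the
// XLA builder rather than re-derived from the i8 operands.
%0 = "mhlo.dot_general"(%lhs, %rhs) {
  dot_dimension_numbers = {
    lhs_batching_dimensions = dense<[]> : tensor<0xi64>,
    rhs_batching_dimensions = dense<[]> : tensor<0xi64>,
    lhs_contracting_dimensions = dense<1> : tensor<1xi64>,
    rhs_contracting_dimensions = dense<0> : tensor<1xi64>
  }
} : (tensor<2x3xi8>, tensor<3x4xi8>) -> tensor<2x4xi32>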
PiperOrigin-RevId: 358580718
Also remove BitcastOp. XLA bitcast requires the input buffer to alias the output buffer, which makes bitcast always a no-op.
PiperOrigin-RevId: 356884383
- Use a common base class for AllReduce, AllGather, and AllToAll in the ODS spec.
- Add basic verification for the replica groups attribute (see the example below).
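For example, a well-formed replica_groups attribute partitions the replicas into equal-sized groups (op and shapes here are illustrative):
// Four replicas in two groups of two; each row is one group.
%0 = "mhlo.all_gather"(%arg0) {
  all_gather_dim = 1 : i64,
  replica_groups = dense<[[0, 1], [2, 3]]> : tensor<2x2xi64>
} : (tensor<8x2xf32>) -> tensor<8x4xf32>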
PiperOrigin-RevId: 354969654
This is the right behavior for TF and JAX and matches what TF does on GPU. It
doesn't match TF on CPU, but that's really a TF bug.
PiperOrigin-RevId: 353657779
This is the right behavior for TF and JAX and matches what TF does on GPU. It
doesn't match TF on CPU, but that's really a TF bug.
PiperOrigin-RevId: 353628258
This is the right behavior for TF and JAX and matches what TF does on GPU. It
doesn't match TF on CPU, but that's really a TF bug.
PiperOrigin-RevId: 353624935
Allow for relative tolerance in unary kernel tests. For the cosh kernels,
this allows accepting an observed difference of 5.6e-8 between the kernel and
the `std::cosh` reference (32829984.568665262 vs. 32829984.568665318) in one of
the test cases, which amounts to a relative error of roughly 1.7e-15.
PiperOrigin-RevId: 351983698
For floating point operations, this uses std.pow.
For integer operations, this lowers to a loop.
This adds a dependency on scf.
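A rough sketch of the integer lowering under the simplest scheme, repeated multiplication via scf.for (the actual pass may use a different recurrence, e.g. square-and-multiply; names here are illustrative):
// Computes base^exp for a non-negative exponent by multiplying exp times.
func @ipow(%base: i32, %exp: index) -> i32 {
  %one = constant 1 : i32
  %lb = constant 0 : index
  %step = constant 1 : index
  %res = scf.for %i = %lb to %exp step %step iter_args(%acc = %one) -> (i32) {
    %next = muli %acc, %base : i32
    scf.yield %next : i32
  }
  return %res : i32
}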
PiperOrigin-RevId: 348537232
Shape inference for ops with complex element types needs to use the element type of the complex type as the result element type, not the full operand type.
Before:
"mhlo.abs"(%arg0) : (tensor<4xcomplex<f32>>) -> tensor<4xtensor<4xcomplex<f32>>>
After:
"mhlo.abs"(%arg0) : (tensor<4xcomplex<f32>>) -> tensor<4xf32>
PiperOrigin-RevId: 348123967
- MLIR MemRefs do not preserve layout information correctly when unit dimensions
are involved. However, operations like convolution that use cuDNN need the correct
layout to be preserved so that we do not end up creating a combination
of input/filter/output layouts that cuDNN does not support.
- Add these layouts to the convolution attributes as I32ArrayAttrs representing
the layout in "minor_to_major" form, similar to XLA (see the sketch below).
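For intuition, minor_to_major lists dimensions from fastest- to slowest-varying, as in XLA: a row-major rank-4 buffer is [3, 2, 1, 0], while an NCHW-shaped buffer laid out physically as NHWC would be (the attribute name here is illustrative):
// C (dim 1) varies fastest, then W (3), then H (2), then N (0).
#nhwc_minor_to_major = [1 : i32, 3 : i32, 2 : i32, 0 : i32]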
PiperOrigin-RevId: 348034757
It didn't return 0 for 0.0 and -0.0.
Currently we emit -0.0 for -0.0, which is correct according to the HLO dialect.
For the TF_SignOp we should emit 0.0 in that case; we will leave that as a TODO.
Enable the tests that work now, and add another one for Int64.
Also improve the registration code; we should not register the Int32 kernel.
PiperOrigin-RevId: 347981124
It didn't return 0 for 0.0 and -0.0.
Currently we emit -0.0 for -0.0, which is correct according to the HLO dialect.
For the TF_SignOp we should emit 0.0 in that case; we will leave that as a TODO.
Enable the tests that work now, and add another one for Int64.
Also improve the registration code; we should not register the Int32 kernel.
PiperOrigin-RevId: 347602378
It didn't return 0 for 0.0 and -0.0.
Currently we emit -0.0 for -0.0, which is correct according to the HLO dialect.
For the TF_SignOp we should emit 0.0 in that case; we will leave that as a TODO.
Enable the tests that work now, and add another one for Int64.
Also improve the registration code; we should not register the Int32 kernel.
PiperOrigin-RevId: 347590340
- Add this attribute to match the corresponding XLA HLO attribute on convolution
operations.
- A true value indicates a reversal of the corresponding kernel spatial dimension
(see the sketch below).
- Since the XLA builder does not support this attribute, use a custom HLO converter to map
from mlir::mhlo::ConvOp to XLA.
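A hypothetical sketch of the attribute's shape (the exact name and element type in the ODS may differ): one boolean per kernel spatial dimension, with true meaning that dimension is reversed:
// For a 2-D convolution: reverse the first kernel spatial dimension only.
#window_reversal = dense<[true, false]> : tensor<2xi1>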
PiperOrigin-RevId: 346891737
- Add a variant of the fused convolution that does not need a side input and side input scale.
- Rename the existing one to `ConvForwardFusedSideInputOp`.
- Update tests to exercise all variants of the convolution ops in the GPU dialect.
- Eliminate the unused `LHLO_ExtentBuffer` and change `LHLO_Buffer` to allow any integer element
type, to match what XLA can sometimes generate for scratch buffers.
PiperOrigin-RevId: 345701569
- Split out enum definitions in hlo dialect into a separate .td file (similar to structs)
and generate enum decl/defs for these enums.
- Also split out the LHLO GPU enums into a separate .td file and generate enum
decl/defs for these enums as well.
- Remove unused dialect from ConvolutionAttributes and generate lhlo_gpu enums.
- Add appropriate namespace for all the enums.
PiperOrigin-RevId: 345277240
- Map the custom call for GEMM in XLA HLO to the Gemm / Gemm-with-bias operations in the LHLO GPU
dialect.
- Make 'algorithm' an optional attribute to better match the XLA HLO backend config.
- Replace 'alpha' with 'alpha_real' and 'alpha_complex' to support complex GEMM correctly.
- Generate GemmThunk from the LHLO GPU Gemm operations.
PiperOrigin-RevId: 345250840
- Restructure LHLO GPU Cholesky to better match XLA HLO by eliminating the
untyped buffer and changing the is_upper attribute to is_lower.
- Change LhloDialectEmitter to emit the LHLO GPU Cholesky operation.
PiperOrigin-RevId: 343873516