Previously this would be too strict and fail if dynamic and static dims were
compared. Dynamic/unknown are treated as "maybe equal" to a static value without further info, so at this layer don't flag as invalid unless truly are.
PiperOrigin-RevId: 360189086
This op is useful for rank specialization of broadcasts. Kernel Generator
needs to generate one kernel for each rank, so if we can minimize the rank
of the broadcast shape, we can support more cases with the same number of
special-cased kernels.
PiperOrigin-RevId: 360137827
- Cleanup CMakeLists file to remove unused argument and use a new function for
setting up lmhlo and lmhlo_gpu dialect targets.
- Fix inconsistently formatted copyright comment and fix header include guards.
PiperOrigin-RevId: 358865838
- XLA:HLO -> LMHLO conversion drops all token arguments and return values, however
custom calls that users write still expect to get buffer pointers for these token types.
- To be able to support this, add an optional call target argument mapping attribute to
LMHLO custom calls. When this attribute is present, it indicates the number of
arguments and returns that the custom call expects and also indicates which LMHLO
arg() or output() maps to which arg or result number of the custom call.
PiperOrigin-RevId: 358826664
`mhlo.dot_general` and `mhlo.convolution` result element type might be different from operand element type. See `preferred_element_type` attribute that allows i8xi8 to i32 dot computation. `mhlo` to HLO exporter should pass the result element type to Xla builder to override the shape inference of XLA.
PiperOrigin-RevId: 358580718
Also remove BitcastOp. XLA bitcast requires the input buffer to alias the output buffer, which makes bitcast always a no-op.
PiperOrigin-RevId: 356884383
- Use a common base class to for AllReduce, AllGather, and AllToAll in the ODS spec.
- Add basic verification for replica groups attribute.
PiperOrigin-RevId: 354969654
This is the right behavior for TF and JAX and matches what TF does on GPU. It
doesn't match TF on CPU, but that's really a TF bug.
PiperOrigin-RevId: 353657779
This is the right behavior for TF and JAX and matches what TF does on GPU. It
doesn't match TF on CPU, but that's really a TF bug.
PiperOrigin-RevId: 353628258
This is the right behavior for TF and JAX and matches what TF does on GPU. It
doesn't match TF on CPU, but that's really a TF bug.
PiperOrigin-RevId: 353624935
Allow for relative tolerance in unary kernel tests. In case of the cosh kernels,
this allows to accept an observed difference of 5.6e-8 between the kernel and
the `std::cosh` reference (32829984.568665262 vs. 32829984.568665318) in one of
the test cases.
PiperOrigin-RevId: 351983698