Lower ReduceMean op to Krnl dialect (#318)

* Improve support for krnl.dim (#317)

* Reorganize main function.

* Follow review comments.

* Emit constants are globals in Krnl and LLVM dialects.

* Make krnl dim more robust.

* Format.

* Update comments.

* Change pass name.

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

* Initial Location info support (#302)

* NFC: Attribute cleanup (remove references of attributes)  (#286)

* Define krnl.permute op.

* Support krnl.permute operation.

* Properly remove loop references.

* Re-push, Github was down.

* Need to debug interpretOp error.

* Fix lowering bug by erasing ops after full krnl IR interpretation is done, and clean up & comment code.

* Introduce permute, unroll operations.

* More debug.

* Remove std::set.

* krnl.terminate fails to be converted.

* Pass all tests, need to add legal ops as well as part of the conversion target.

* Change test format to new permute spec.

* Bug fix for nested iterate op lowering.

* Simplify error reporting.

* Fix compilation error.

* Increase comments coverage.

* Remove unnecessary imports.

* Re-trigger Jenkins

* Add permute/unroll tests.

* Retrigger Jenkins

* remove & (ref) for Attributes

Co-authored-by: Tian Jin <tjingrant@gmail.com>
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Syntax highlighting for mlir code in README (#276)

* Syntax highlighting for mlir code in README

* Restart Jenkins

Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Co-authored-by: Tian Jin <tjingrant@gmail.com>
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* use print not dump

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* add semicolon

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* syntax

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* add code to preserve locations

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* format

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Emit the dynamic memory pool (#290)

* Reorganize main function.

* Follow review comments.

* Emit constants are globals in Krnl and LLVM dialects.

* Add support for bundling dynamic memory pools.

* Add dynamic bundling.

* Clean-up code.

* Clean-up file.

* Add test for bundling dynamic memory pool.

* Fixes. Simplify data structure. Add mixed test.

* Remove unused import.

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Fix wrong type for llvm::loadop (#293)

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Update llvm commit ID to 1d01fc1 (#292)

* Fix for LLVM revision D85495

* Fix for LLVM revision DD86121

* Fix for LLVM revision D85622 (f9dc2b7)
TODO: Change preloadDialectsInContext to false

Memo for previous fixes: D86121 (250f43d), D85495 (575b22b)

* clang-format

* Update llvm commit ID of README and clone-mlir.sh

* Updated llvm commit ID of README.md

* Fix for passing backend tests

* Removed the commented code

* Empty commit for triggering rebuild

* Test multi-stage travis build

* Specify stage order.

* Empty commit for triggering rebuild

* Update prereq.s390x.Dockerfile

Make it possible to execute s390x prereq docker multiple times.

* Build prereq for each arch

* Fix multi-arch prereq build.

* timeout at 40m

* Update .travis.yml

* add ppc64le prereq builder

* Run ppc docker prereq build multiple times

* Do not test branch update unless it's mater.

* Fix dockerfile.

* Fix typo in travis.yml.

* Fix ppc64 docker file

* Update .travis.yml

* turn off metacopy on ppc64le

* Update .travis.yml

* Turn off metacopy.

* Turn off metacopy inside Dockerfile in ppc64.

* No sudo in Docker.

* Remove metacopy config from Dockerfile.

* Change base image to be bionic.

* Using newer linux distro for ppc64.

* Turn off metacopy in before_install.

* Fix sudo permission issue.

* Run docker info.

* Allow amd64 docker file to be built multiple times

* Support building amd64 prereq.

* Fix amd64 docker file typo.

* fix ppc64le dockerfile typo.

* timeout from 40m -> 30m

* 40m->30m

* 40m->30m

* fix bug preventing incremental build.

* fix bug preventing incremental build.

* Bump CircleCI cache version.

* Push to production prereq container repository and condition prereq docker rebuild on commit message.

* Rebuild prereq docker.

* Move default script to top-level.

* Python not properly installed.

* amd64 -> x86

* Rebuild prereq docker.

* Rebuild prereq docker.

* Rebuild prereq docker.

* Restart all CI.

* Disallow cache on Jenkins docker build.

* Restart zJenkins.

* Restart zJenkins.

Co-authored-by: Haruki Imai <imaihal@jp.ibm.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Using onnx-mlir through incremental stages (#257)

* Add lowering of Vector dialect for lower-all-llvm pass

* Fix generating CallOp instructions when return type is void

* Fix lowering of memref

* Reformat using clang-format

* Record more context.

* Reflow comments.

Co-authored-by: Tian Jin <tjingrant@gmail.com>
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Dropout elimination & Conv Bugfix (#297)

* Dropout elimination.

* Test VGG19.

* Add shufflenet.

* Fix grouped convolution bug.

* Fix lit test failure.

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Rewrite shape and size OP  (#285)

* add shape inference

* Revert "add shape inference"

This reverts commit f9d42f39e68e14b5648abccfc8617fff00244d16.

* add rewrite rules

* test cases

* format

* add constraint

* response to review

* response to review

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* initial code for handling custom ops (#288)

* initial code for handling custom ops

* format

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* ShapeInference for SizeOp (#299)

* add shape inference

* Revert "add shape inference"

This reverts commit f9d42f39e68e14b5648abccfc8617fff00244d16.

* shape inference

* test case

* format

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Gather ONNX to Kernel Lowering (#294)

* Define krnl.permute op.

* Support krnl.permute operation.

* Properly remove loop references.

* Re-push, Github was down.

* Need to debug interpretOp error.

* Fix lowering bug by erasing ops after full krnl IR interpretation is done, and clean up & comment code.

* Introduce permute, unroll operations.

* More debug.

* Remove std::set.

* krnl.terminate fails to be converted.

* Pass all tests, need to add legal ops as well as part of the conversion target.

* Change test format to new permute spec.

* Bug fix for nested iterate op lowering.

* Simplify error reporting.

* Fix compilation error.

* Increase comments coverage.

* Remove unnecessary imports.

* Re-trigger Jenkins

* Add permute/unroll tests.

* Retrigger Jenkins

* initial implementation of gather

* added tests

* format

* remove affine load for second load, as it uses an indirection

* changes suggested by reviewers

* remove backend tests until I can verify them locally

Co-authored-by: Tian Jin <tjingrant@gmail.com>
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* add lit test
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* fix option spelling
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* braces in wrong place
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* add lit test
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* remove duplicate code from lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* Simplify lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* remove attributes from lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* add onnx-mlir-opt to tool names
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* add printIR to second RUN
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* redo adding printIR
Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* fix bug

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* format

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

* fix typo in test

Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com>

Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Co-authored-by: Tian Jin <tjingrant@gmail.com>
Co-authored-by: Tung D. Le <tung@jp.ibm.com>
Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com>
Co-authored-by: Haruki Imai <imaihal@jp.ibm.com>
Co-authored-by: Kevin Wu <6334443+kwu91@users.noreply.github.com>
Co-authored-by: chentong319 <chentong@us.ibm.com>
Signed-off-by: Tung D. Le <tung@jp.ibm.com>

* Support ReduceMean

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

* Add lit tests

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

* Fix unknown dimensions for type f32

Signed-off-by: Tung D. Le <tung@jp.ibm.com>

Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com>
Co-authored-by: Kevin O'Brien <caomhin@us.ibm.com>
Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com>
Co-authored-by: Tian Jin <tjingrant@gmail.com>
Co-authored-by: Haruki Imai <imaihal@jp.ibm.com>
Co-authored-by: Kevin Wu <6334443+kwu91@users.noreply.github.com>
Co-authored-by: chentong319 <chentong@us.ibm.com>
This commit is contained in:
Tung D. Le 2020-10-11 11:09:42 +09:00 committed by GitHub
parent 81c774ba5b
commit 6bd9471262
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
6 changed files with 228 additions and 17 deletions

View File

@ -37,6 +37,12 @@ Value getIdentityValue<ONNXReduceSumOp>(
return emitConstantOp(rewriter, loc, type, 0);
}
template <>
Value getIdentityValue<ONNXReduceMeanOp>(
ConversionPatternRewriter &rewriter, Location loc, Type type) {
return emitConstantOp(rewriter, loc, type, 0);
}
// Scalar ops
template <>
struct ScalarOp<ONNXReduceProdOp> {
@ -50,6 +56,50 @@ struct ScalarOp<ONNXReduceSumOp> {
using IOp = AddIOp;
};
template <>
struct ScalarOp<ONNXReduceMeanOp> {
using FOp = AddFOp;
using IOp = AddIOp;
};
/// Helper function to get the size of a MemRef in a given type.
Value getSizeInType(ConversionPatternRewriter &rewriter, Location loc,
Value memRef, Type elementType) {
auto shape = memRef.getType().cast<MemRefType>().getShape();
// We accumulate static dimensions first and then unknown dimensions.
int64_t staticNumElement = 1;
bool allStaticDimensions = true;
// 1. Static dimensions.
for (unsigned i = 0; i < shape.size(); i++) {
if (shape[i] != -1)
staticNumElement *= shape[i];
else
allStaticDimensions = false;
}
// 2. Unknown dimensions.
Value sizeVal = emitConstantOp(rewriter, loc, elementType, staticNumElement);
if (!allStaticDimensions) {
for (unsigned i = 0; i < shape.size(); i++) {
if (shape[i] == -1) {
Value index = rewriter.create<DimOp>(loc, memRef, i);
if (elementType.isa<FloatType>()) {
Value dim =
rewriter.create<IndexCastOp>(loc, index, rewriter.getI64Type());
dim = rewriter.create<UIToFPOp>(loc, dim, elementType);
sizeVal = rewriter.create<MulFOp>(loc, sizeVal, dim);
} else if (elementType.isa<IntegerType>()) {
Value dim = rewriter.create<IndexCastOp>(loc, index, elementType);
sizeVal = rewriter.create<MulIOp>(loc, sizeVal, dim);
} else
llvm_unreachable("unsupported element type");
}
}
}
return sizeVal;
}
//===----------------------------------------------------------------------===//
// Scalar unary ops for lowering ONNXReduceMaxOp
//===----------------------------------------------------------------------===//
@ -97,8 +147,12 @@ Value emitScalarOpFor<ONNXReduceMinOp>(ConversionPatternRewriter &rewriter,
template <typename ONNXReductionOp>
struct ONNXReductionOpLowering : public ConversionPattern {
ONNXReductionOpLowering(MLIRContext *ctx)
: ConversionPattern(ONNXReductionOp::getOperationName(), 1, ctx) {}
bool computeMean = false;
ONNXReductionOpLowering(MLIRContext *ctx, bool computeMean = false)
: ConversionPattern(ONNXReductionOp::getOperationName(), 1, ctx) {
this->computeMean = computeMean;
}
LogicalResult matchAndRewrite(Operation *op, ArrayRef<Value> operands,
ConversionPatternRewriter &rewriter) const final {
@ -123,7 +177,8 @@ struct ONNXReductionOpLowering : public ConversionPattern {
*
*/
auto loc = op->getLoc();
auto memRefInType = operands[0].getType().cast<MemRefType>();
auto input = operands[0];
auto memRefInType = input.getType().cast<MemRefType>();
auto memRefInShape = memRefInType.getShape();
auto memRefOutType = convertToMemRefType(*op->result_type_begin());
int64_t inRank = memRefInType.getRank();
@ -165,7 +220,7 @@ struct ONNXReductionOpLowering : public ConversionPattern {
SmallVector<Value, 2> allocOperands;
for (decltype(outRank) i = 0; i < outRank; ++i) {
if (memRefOutShape[i] < 0) {
auto dim = rewriter.create<DimOp>(loc, operands[0], outInDimMap[i]);
auto dim = rewriter.create<DimOp>(loc, input, outInDimMap[i]);
allocOperands.push_back(dim);
}
}
@ -177,11 +232,12 @@ struct ONNXReductionOpLowering : public ConversionPattern {
}
}
// There are two Krnl loops:
// - One to initialize the result memref, and
// - One to do reduction
// There are two required and one optional Krnl loops:
// - One to initialize the result memref,
// - One to do reduction, and
// - One to compute mean (optional).
// Define loops to initialize the result.
// 1. Define loops to initialize the result.
std::vector<Value> originalLoopsInit;
defineLoops(rewriter, loc, originalLoopsInit, outRank);
@ -208,14 +264,15 @@ struct ONNXReductionOpLowering : public ConversionPattern {
getIdentityValue<ONNXReductionOp>(rewriter, loc, elementOutType);
rewriter.create<AffineStoreOp>(loc, identity, alloc, loopIVs);
// Define an Krnl loop to do reduction.
// 2. Define an Krnl loop to do reduction.
rewriter.setInsertionPointAfter(iterateOpInit);
auto ipMainRegion = rewriter.saveInsertionPoint();
std::vector<Value> originalLoops;
defineLoops(rewriter, loc, originalLoops, inRank);
// Iteration information
KrnlIterateOperandPack pack(rewriter, originalLoops);
for (decltype(inRank) i = 0; i < inRank; ++i) {
addDimensionToPack(rewriter, loc, pack, operands[0], i);
addDimensionToPack(rewriter, loc, pack, input, i);
}
auto iterateOp = rewriter.create<KrnlIterateOp>(loc, pack);
Block &iterationBlock = iterateOp.bodyRegion().front();
@ -245,12 +302,44 @@ struct ONNXReductionOpLowering : public ConversionPattern {
}
Value next, accumulated;
next = rewriter.create<AffineLoadOp>(loc, operands[0], inLoopIVs);
next = rewriter.create<AffineLoadOp>(loc, input, inLoopIVs);
accumulated = rewriter.create<AffineLoadOp>(loc, alloc, outLoopIVs);
accumulated = emitScalarOpFor<ONNXReductionOp>(
rewriter, loc, op, memRefOutType.getElementType(), {accumulated, next});
rewriter.create<AffineStoreOp>(loc, accumulated, alloc, outLoopIVs);
// 3. Define an Krnl loop to compute mean (optional).
rewriter.restoreInsertionPoint(ipMainRegion);
if (computeMean) {
Type elementType = memRefOutType.getElementType();
// Compute the divisor that is the number of elements participated in
// reduction, i.e., 'divisor = size of input / size of output'
Value inputSize = getSizeInType(rewriter, loc, input, elementType);
Value outputSize = getSizeInType(rewriter, loc, alloc, elementType);
Value divisor;
if (elementType.isa<FloatType>())
divisor = rewriter.create<DivFOp>(loc, inputSize, outputSize);
else if (elementType.isa<IntegerType>())
divisor = rewriter.create<SignedDivIOp>(loc, inputSize, outputSize);
else
llvm_unreachable("unsupported element type");
// Compute mean
BuildKrnlLoop meanLoops(rewriter, loc, outRank);
meanLoops.createDefineAndIterateOp(alloc);
rewriter.setInsertionPointToStart(meanLoops.getIterateBlock());
auto meanIVs = meanLoops.getAllInductionVar();
auto loadData = rewriter.create<AffineLoadOp>(loc, alloc, meanIVs);
Value meanVal;
if (elementType.isa<FloatType>())
meanVal = rewriter.create<DivFOp>(loc, loadData, divisor);
else if (elementType.isa<IntegerType>())
meanVal = rewriter.create<SignedDivIOp>(loc, loadData, divisor);
else
llvm_unreachable("unsupported element type");
rewriter.create<AffineStoreOp>(loc, meanVal, alloc, meanIVs);
}
rewriter.replaceOp(op, alloc);
return success();
}
@ -262,4 +351,6 @@ void populateLoweringONNXReductionOpPattern(
ONNXReductionOpLowering<mlir::ONNXReduceMinOp>,
ONNXReductionOpLowering<mlir::ONNXReduceProdOp>,
ONNXReductionOpLowering<mlir::ONNXReduceSumOp>>(ctx);
patterns.insert<ONNXReductionOpLowering<mlir::ONNXReduceMeanOp>>(
ctx, /*computeMean=*/true);
}

View File

@ -71,7 +71,6 @@ struct ONNXSplitOpLowering : public ConversionPattern {
// Create loop.
BuildKrnlLoop outputLoops(rewriter, loc, rank);
outputLoops.createDefineAndIterateOp(allocs[i]);
outputLoops.createIterateOp();
rewriter.setInsertionPointToStart(outputLoops.getIterateBlock());
// Indices for the read and write.
SmallVector<Value, 4> readIndices;

View File

@ -271,4 +271,9 @@ BlockArgument &BuildKrnlLoop::getInductionVar(int originalLoopIndex) {
return iterBlock->getArguments()[originalLoopIndex];
}
ArrayRef<BlockArgument> BuildKrnlLoop::getAllInductionVar() {
return ArrayRef<BlockArgument>(
iterBlock->getArguments().begin(), iterBlock->getArguments().end());
}
} // namespace mlir

View File

@ -186,6 +186,9 @@ public:
// index. Use the index returned when pushing the bounds.
BlockArgument &getInductionVar(int originalLoopIndex);
// Get all of the (original loop) induction variables.
ArrayRef<BlockArgument> getAllInductionVar();
// Get a reference to the code region of the optimization operation.
// This allows us to set the insertion point to the inner block of the
// loop nest optimization operation.

View File

@ -290,6 +290,16 @@ test_to_enable = [
"test_reduce_sum_square_negative_axes_keepdims_example_cpu",
"test_reduce_sum_square_negative_axes_keepdims_random_cpu",
# ReduceMean
"test_reduce_mean_default_axes_keepdims_example_cpu",
"test_reduce_mean_default_axes_keepdims_random_cpu",
"test_reduce_mean_do_not_keepdims_example_cpu",
"test_reduce_mean_do_not_keepdims_random_cpu",
"test_reduce_mean_keepdims_example_cpu",
"test_reduce_mean_keepdims_random_cpu",
"test_reduce_mean_negative_axes_keepdims_example_cpu",
"test_reduce_mean_negative_axes_keepdims_random_cpu",
# Selu Op:
"test_selu_cpu",
"test_selu_default_cpu",

View File

@ -709,7 +709,7 @@ func @test_reducemax(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[CMP:%.+]] = cmpf "ogt", [[LOAD2]], [[LOAD1]] : f32
// CHECK: [[SELECT:%.+]] = select [[CMP]], [[LOAD2]], [[LOAD1]] : f32
// CHECK: store [[SELECT]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
@ -733,7 +733,7 @@ func @test_reducemin(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[CMP:%.+]] = cmpf "olt", [[LOAD2]], [[LOAD1]] : f32
// CHECK: [[SELECT:%.+]] = select [[CMP]], [[LOAD2]], [[LOAD1]] : f32
// CHECK: affine.store [[SELECT]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
@ -757,7 +757,7 @@ func @test_reduceprod(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[REDUCE:%.+]] = mulf [[LOAD2]], [[LOAD1]] : f32
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: }
@ -780,13 +780,116 @@ func @test_reducesum(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[REDUCE:%.+]] = addf [[LOAD2]], [[LOAD1]] : f32
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: }
// CHECK: return [[RES]] : memref<3x2xf32>
}
// -----
/// Check ReduceMean with f32.
func @test_reducemean_f32(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x2x2xf32>)-> tensor<*xf32>
"std.return"(%0) : (tensor<*xf32>) -> ()
// CHECK-LABEL: test_reducemean_f32
// CHECK: [[RES:%.+]] = alloc() : memref<3x2xf32>
// CHECK: [[DEF_LOOPS1:%.+]]:2 = krnl.define_loops 2
// CHECK: krnl.iterate([[DEF_LOOPS1]]#0, [[DEF_LOOPS1]]#1) with ([[DEF_LOOPS1]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS1]]#1 -> %arg2 = 0 to 2) {
// CHECK: [[IDENTITY:%.+]] = constant 0.000000e+00 : f32
// CHECK: affine.store [[IDENTITY]], [[RES]][%arg1, %arg2] : memref<3x2xf32>
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: [[REDUCE:%.+]] = addf [[LOAD2]], [[LOAD1]] : f32
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
// CHECK: }
// CHECK: [[INPUT_SIZE:%.+]] = constant 1.200000e+01 : f32
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6.000000e+00 : f32
// CHECK: [[DIVISOR:%.+]] = divf [[INPUT_SIZE]], [[OUTPUT_SIZE]] : f32
// CHECK: [[DEF_MEAN_LOOPS:%.+]]:2 = krnl.define_loops 2
// CHECK: krnl.iterate([[DEF_MEAN_LOOPS]]#0, [[DEF_MEAN_LOOPS]]#1) with ([[DEF_MEAN_LOOPS]]#0 -> %arg1 = 0 to 3, [[DEF_MEAN_LOOPS]]#1 -> %arg2 = 0 to 2) {
// CHECK: [[LOAD3:%.+]] = affine.load [[RES]][%arg1, %arg2] : memref<3x2xf32>
// CHECK: [[MEAN:%.+]] = divf [[LOAD3]], [[DIVISOR]] : f32
// CHECK: affine.store [[MEAN]], [[RES]][%arg1, %arg2] : memref<3x2xf32>
// CHECK: }
// CHECK: return [[RES]] : memref<3x2xf32>
}
// -----
/// Check ReduceMean with i32.
func @test_reducemean_i32(%arg0 : tensor<3x2x2xi32>) -> tensor<*xi32> {
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x2x2xi32>)-> tensor<*xi32>
"std.return"(%0) : (tensor<*xi32>) -> ()
// CHECK-LABEL: test_reducemean_i32
// CHECK: [[RES:%.+]] = alloc() : memref<3x2xi32>
// CHECK: [[DEF_LOOPS1:%.+]]:2 = krnl.define_loops 2
// CHECK: krnl.iterate([[DEF_LOOPS1]]#0, [[DEF_LOOPS1]]#1) with ([[DEF_LOOPS1]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS1]]#1 -> %arg2 = 0 to 2) {
// CHECK: [[IDENTITY:%.+]] = constant 0 : i32
// CHECK: affine.store [[IDENTITY]], [[RES]][%arg1, %arg2] : memref<3x2xi32>
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xi32>
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xi32>
// CHECK: [[REDUCE:%.+]] = addi [[LOAD2]], [[LOAD1]] : i32
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xi32>
// CHECK: }
// CHECK: [[INPUT_SIZE:%.+]] = constant 12 : i32
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6 : i32
// CHECK: [[DIVISOR:%.+]] = divi_signed [[INPUT_SIZE]], [[OUTPUT_SIZE]] : i32
// CHECK: [[DEF_MEAN_LOOPS:%.+]]:2 = krnl.define_loops 2
// CHECK: krnl.iterate([[DEF_MEAN_LOOPS]]#0, [[DEF_MEAN_LOOPS]]#1) with ([[DEF_MEAN_LOOPS]]#0 -> %arg1 = 0 to 3, [[DEF_MEAN_LOOPS]]#1 -> %arg2 = 0 to 2) {
// CHECK: [[LOAD3:%.+]] = affine.load [[RES]][%arg1, %arg2] : memref<3x2xi32>
// CHECK: [[MEAN:%.+]] = divi_signed [[LOAD3]], [[DIVISOR]] : i32
// CHECK: affine.store [[MEAN]], [[RES]][%arg1, %arg2] : memref<3x2xi32>
// CHECK: }
// CHECK: return [[RES]] : memref<3x2xi32>
}
// -----
/// Check computing the divisor in ReduceMean
/// when the input has unknown dimensions and is of i32.
func @test_reducemean_i32_unknown_dims(%arg0 : tensor<3x?x2xi32>) -> tensor<*xi32> {
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x?x2xi32>)-> tensor<*xi32>
"std.return"(%0) : (tensor<*xi32>) -> ()
// CHECK-LABEL: test_reducemean_i32_unknown_dims
// CHECK: [[INPUT_SIZE_CONSTANT:%.+]] = constant 6 : i32
// CHECK: [[ONE:%.+]] = constant 1 : index
// CHECK: [[DIM:%.+]] = dim %arg0, [[ONE]] : memref<3x?x2xi32>
// CHECK: [[UNKNOWN_DIM:%.+]] = index_cast [[DIM]] : index to i32
// CHECK: [[INPUT_SIZE:%.+]] = muli [[INPUT_SIZE_CONSTANT]], [[UNKNOWN_DIM]] : i32
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6 : i32
// CHECK: [[DIVISOR:%.+]] = divi_signed [[INPUT_SIZE]], [[OUTPUT_SIZE]] : i32
}
// -----
/// Check computing the divisor in ReduceMean
/// when the input has unknown dimensions and is of f32.
func @test_reducemean_f32_unknown_dims(%arg0 : tensor<3x?x2xf32>) -> tensor<*xf32> {
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x?x2xf32>)-> tensor<*xf32>
"std.return"(%0) : (tensor<*xf32>) -> ()
// CHECK-LABEL: test_reducemean_f32_unknown_dims
// CHECK: [[INPUT_SIZE_CONSTANT:%.+]] = constant 6.000000e+00 : f32
// CHECK: [[ONE:%.+]] = constant 1 : index
// CHECK: [[DIM:%.+]] = dim %arg0, [[ONE]] : memref<3x?x2xf32>
// CHECK: [[UNKNOWN_DIM_i64:%.+]] = index_cast [[DIM]] : index to i64
// CHECK: [[UNKNOWN_DIM:%.+]] = uitofp [[UNKNOWN_DIM_i64]] : i64 to f32
// CHECK: [[INPUT_SIZE:%.+]] = mulf [[INPUT_SIZE_CONSTANT]], [[UNKNOWN_DIM]] : f32
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6.000000e+00 : f32
// CHECK: [[DIVISOR:%.+]] = divf [[INPUT_SIZE]], [[OUTPUT_SIZE]] : f32
}
// -----
func @test_softmax(%arg0 : tensor<10x10xf32>) -> tensor<*xf32> {