Lower ReduceMean op to Krnl dialect (#318)
* Improve support for krnl.dim (#317) * Reorganize main function. * Follow review comments. * Emit constants are globals in Krnl and LLVM dialects. * Make krnl dim more robust. * Format. * Update comments. * Change pass name. Signed-off-by: Tung D. Le <tung@jp.ibm.com> * Initial Location info support (#302) * NFC: Attribute cleanup (remove references of attributes) (#286) * Define krnl.permute op. * Support krnl.permute operation. * Properly remove loop references. * Re-push, Github was down. * Need to debug interpretOp error. * Fix lowering bug by erasing ops after full krnl IR interpretation is done, and clean up & comment code. * Introduce permute, unroll operations. * More debug. * Remove std::set. * krnl.terminate fails to be converted. * Pass all tests, need to add legal ops as well as part of the conversion target. * Change test format to new permute spec. * Bug fix for nested iterate op lowering. * Simplify error reporting. * Fix compilation error. * Increase comments coverage. * Remove unnecessary imports. * Re-trigger Jenkins * Add permute/unroll tests. * Retrigger Jenkins * remove & (ref) for Attributes Co-authored-by: Tian Jin <tjingrant@gmail.com> Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Syntax highlighting for mlir code in README (#276) * Syntax highlighting for mlir code in README * Restart Jenkins Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com> Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com> Co-authored-by: Tian Jin <tjingrant@gmail.com> Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * use print not dump Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * add semicolon Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * syntax Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * add code to preserve locations Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * format Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Emit the dynamic memory pool (#290) * Reorganize main function. * Follow review comments. 
* Emit constants are globals in Krnl and LLVM dialects. * Add support for bundling dynamic memory pools. * Add dynamic bundling. * Clean-up code. * Clean-up file. * Add test for bundling dynamic memory pool. * Fixes. Simplify data structure. Add mixed test. * Remove unused import. Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Fix wrong type for llvm::loadop (#293) Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Update llvm commit ID to 1d01fc1 (#292) * Fix for LLVM revision D85495 * Fix for LLVM revision DD86121 * Fix for LLVM revision D85622 (f9dc2b7) TODO: Change preloadDialectsInContext to false Memo for previous fixes: D86121 (250f43d), D85495 (575b22b) * clang-format * Update llvm commit ID of README and clone-mlir.sh * Updated llvm commit ID of README.md * Fix for passing backend tests * Removed the commented code * Empty commit for triggering rebuild * Test multi-stage travis build * Specify stage order. * Empty commit for triggering rebuild * Update prereq.s390x.Dockerfile Make it possible to execute s390x prereq docker multiple times. * Build prereq for each arch * Fix multi-arch prereq build. * timeout at 40m * Update .travis.yml * add ppc64le prereq builder * Run ppc docker prereq build multiple times * Do not test branch update unless it's mater. * Fix dockerfile. * Fix typo in travis.yml. * Fix ppc64 docker file * Update .travis.yml * turn off metacopy on ppc64le * Update .travis.yml * Turn off metacopy. * Turn off metacopy inside Dockerfile in ppc64. * No sudo in Docker. * Remove metacopy config from Dockerfile. * Change base image to be bionic. * Using newer linux distro for ppc64. * Turn off metacopy in before_install. * Fix sudo permission issue. * Run docker info. * Allow amd64 docker file to be built multiple times * Support building amd64 prereq. * Fix amd64 docker file typo. * fix ppc64le dockerfile typo. * timeout from 40m -> 30m * 40m->30m * 40m->30m * fix bug preventing incremental build. * fix bug preventing incremental build. 
* Bump CircleCI cache version. * Push to production prereq container repository and condition prereq docker rebuild on commit message. * Rebuild prereq docker. * Move default script to top-level. * Python not properly installed. * amd64 -> x86 * Rebuild prereq docker. * Rebuild prereq docker. * Rebuild prereq docker. * Restart all CI. * Disallow cache on Jenkins docker build. * Restart zJenkins. * Restart zJenkins. Co-authored-by: Haruki Imai <imaihal@jp.ibm.com> Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com> Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Using onnx-mlir through incremental stages (#257) * Add lowering of Vector dialect for lower-all-llvm pass * Fix generating CallOp instructions when return type is void * Fix lowering of memref * Reformat using clang-format * Record more context. * Reflow comments. Co-authored-by: Tian Jin <tjingrant@gmail.com> Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Dropout elimination & Conv Bugfix (#297) * Dropout elimination. * Test VGG19. * Add shufflenet. * Fix grouped convolution bug. * Fix lit test failure. Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Rewrite shape and size OP (#285) * add shape inference * Revert "add shape inference" This reverts commit f9d42f39e68e14b5648abccfc8617fff00244d16. * add rewrite rules * test cases * format * add constraint * response to review * response to review Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * initial code for handling custom ops (#288) * initial code for handling custom ops * format Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * ShapeInference for SizeOp (#299) * add shape inference * Revert "add shape inference" This reverts commit f9d42f39e68e14b5648abccfc8617fff00244d16. * shape inference * test case * format Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Gather ONNX to Kernel Lowering (#294) * Define krnl.permute op. * Support krnl.permute operation. * Properly remove loop references. * Re-push, Github was down. 
* Need to debug interpretOp error. * Fix lowering bug by erasing ops after full krnl IR interpretation is done, and clean up & comment code. * Introduce permute, unroll operations. * More debug. * Remove std::set. * krnl.terminate fails to be converted. * Pass all tests, need to add legal ops as well as part of the conversion target. * Change test format to new permute spec. * Bug fix for nested iterate op lowering. * Simplify error reporting. * Fix compilation error. * Increase comments coverage. * Remove unnecessary imports. * Re-trigger Jenkins * Add permute/unroll tests. * Retrigger Jenkins * initial implementation of gather * added tests * format * remove affine load for second load, as it uses an indirection * changes suggested by reviewers * remove backend tests until I can verify them locally Co-authored-by: Tian Jin <tjingrant@gmail.com> Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * add lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * fix option spelling Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * braces in wrong place Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * add lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * remove duplicate code from lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * Simplify lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * remove attributes from lit test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * add onnx-mlir-opt to tool names Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * add printIR to second RUN Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * redo adding printIR Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * fix bug Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * format Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> * fix typo in test Signed-off-by: Kevin O'Brien <caomhin@us.ibm.com> Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com> Co-authored-by: Tian Jin <tjingrant@gmail.com> Co-authored-by: Tung D. 
Le <tung@jp.ibm.com> Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com> Co-authored-by: Haruki Imai <imaihal@jp.ibm.com> Co-authored-by: Kevin Wu <6334443+kwu91@users.noreply.github.com> Co-authored-by: chentong319 <chentong@us.ibm.com> Signed-off-by: Tung D. Le <tung@jp.ibm.com> * Support ReduceMean Signed-off-by: Tung D. Le <tung@jp.ibm.com> * Add lit tests Signed-off-by: Tung D. Le <tung@jp.ibm.com> * Fix unknown dimensions for type f32 Signed-off-by: Tung D. Le <tung@jp.ibm.com> Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com> Co-authored-by: Kevin O'Brien <caomhin@us.ibm.com> Co-authored-by: Alexandre Eichenberger <alexe@us.ibm.com> Co-authored-by: Tian Jin <tjingrant@gmail.com> Co-authored-by: Haruki Imai <imaihal@jp.ibm.com> Co-authored-by: Kevin Wu <6334443+kwu91@users.noreply.github.com> Co-authored-by: chentong319 <chentong@us.ibm.com>
Parent: 81c774ba5b
Commit: 6bd9471262
@ -37,6 +37,12 @@ Value getIdentityValue<ONNXReduceSumOp>(
|
||||||
return emitConstantOp(rewriter, loc, type, 0);
|
return emitConstantOp(rewriter, loc, type, 0);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
template <>
|
||||||
|
Value getIdentityValue<ONNXReduceMeanOp>(
|
||||||
|
ConversionPatternRewriter &rewriter, Location loc, Type type) {
|
||||||
|
return emitConstantOp(rewriter, loc, type, 0);
|
||||||
|
}
|
||||||
|
|
||||||
// Scalar ops
|
// Scalar ops
|
||||||
template <>
|
template <>
|
||||||
struct ScalarOp<ONNXReduceProdOp> {
|
struct ScalarOp<ONNXReduceProdOp> {
|
||||||
|
@ -50,6 +56,50 @@ struct ScalarOp<ONNXReduceSumOp> {
|
||||||
using IOp = AddIOp;
|
using IOp = AddIOp;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
template <>
|
||||||
|
struct ScalarOp<ONNXReduceMeanOp> {
|
||||||
|
using FOp = AddFOp;
|
||||||
|
using IOp = AddIOp;
|
||||||
|
};
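The two specializations added here give ReduceMean the same identity value (0) and the same accumulation operator (addition) as ReduceSum; only the final division differs. A minimal standalone sketch of that trait pattern in plain C++, without MLIR, might look as follows. The names `SumTag`, `MeanTag`, `ReductionTraits`, `foldAll`, and `meanOf` are hypothetical and are not part of onnx-mlir.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tags standing in for ONNXReduceSumOp / ONNXReduceMeanOp.
struct SumTag {};
struct MeanTag {};

// The primary template is left undefined so unsupported tags fail to compile.
template <typename Tag>
struct ReductionTraits;

template <>
struct ReductionTraits<SumTag> {
  static float identity() { return 0.0f; }
  static float combine(float acc, float next) { return acc + next; }
};

// Mean reuses the additive identity and addition; the division by the
// element count happens in a separate step after accumulation.
template <>
struct ReductionTraits<MeanTag> {
  static float identity() { return 0.0f; }
  static float combine(float acc, float next) { return acc + next; }
};

// Fold all values with the identity and combiner selected by Tag.
template <typename Tag>
float foldAll(const std::vector<float> &values) {
  float acc = ReductionTraits<Tag>::identity();
  for (float v : values)
    acc = ReductionTraits<Tag>::combine(acc, v);
  return acc;
}

// Mean = accumulated sum divided by the number of reduced elements.
float meanOf(const std::vector<float> &values) {
  assert(!values.empty() && "mean of an empty range is undefined");
  return foldAll<MeanTag>(values) / static_cast<float>(values.size());
}
```

Keeping the division out of the trait is what lets the same accumulation loop serve both operators.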
|
||||||
|
|
||||||
|
/// Helper function to get the size of a MemRef in a given type.
|
||||||
|
Value getSizeInType(ConversionPatternRewriter &rewriter, Location loc,
|
||||||
|
Value memRef, Type elementType) {
|
||||||
|
auto shape = memRef.getType().cast<MemRefType>().getShape();
|
||||||
|
|
||||||
|
// We accumulate static dimensions first and then unknown dimensions.
|
||||||
|
int64_t staticNumElement = 1;
|
||||||
|
bool allStaticDimensions = true;
|
||||||
|
|
||||||
|
// 1. Static dimensions.
|
||||||
|
for (unsigned i = 0; i < shape.size(); i++) {
|
||||||
|
if (shape[i] != -1)
|
||||||
|
staticNumElement *= shape[i];
|
||||||
|
else
|
||||||
|
allStaticDimensions = false;
|
||||||
|
}
|
||||||
|
// 2. Unknown dimensions.
|
||||||
|
Value sizeVal = emitConstantOp(rewriter, loc, elementType, staticNumElement);
|
||||||
|
if (!allStaticDimensions) {
|
||||||
|
for (unsigned i = 0; i < shape.size(); i++) {
|
||||||
|
if (shape[i] == -1) {
|
||||||
|
Value index = rewriter.create<DimOp>(loc, memRef, i);
|
||||||
|
if (elementType.isa<FloatType>()) {
|
||||||
|
Value dim =
|
||||||
|
rewriter.create<IndexCastOp>(loc, index, rewriter.getI64Type());
|
||||||
|
dim = rewriter.create<UIToFPOp>(loc, dim, elementType);
|
||||||
|
sizeVal = rewriter.create<MulFOp>(loc, sizeVal, dim);
|
||||||
|
} else if (elementType.isa<IntegerType>()) {
|
||||||
|
Value dim = rewriter.create<IndexCastOp>(loc, index, elementType);
|
||||||
|
sizeVal = rewriter.create<MulIOp>(loc, sizeVal, dim);
|
||||||
|
} else
|
||||||
|
llvm_unreachable("unsupported element type");
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return sizeVal;
|
||||||
|
}
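The static-first strategy of `getSizeInType` can be illustrated outside MLIR. The sketch below is an assumption-laden model, not the onnx-mlir code: it uses `int64_t` throughout (the real helper emits float or integer ops depending on the element type), and `foldStaticDims`, `totalElements`, and the `runtimeShape` argument are hypothetical names. As in the diff, `-1` marks a dimension whose extent is known only at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of the compile-time pass over the shape.
struct SizeParts {
  int64_t staticProduct; // product of all compile-time-known extents
  bool allStatic;        // true when no dimension is -1
};

// Step 1: fold every known extent into a single constant.
SizeParts foldStaticDims(const std::vector<int64_t> &shape) {
  SizeParts parts{1, true};
  for (int64_t extent : shape) {
    if (extent != -1)
      parts.staticProduct *= extent;
    else
      parts.allStatic = false;
  }
  return parts;
}

// Step 2: multiply in the extents that are only available at run time.
// runtimeShape plays the role of the values a dim op would return.
int64_t totalElements(const std::vector<int64_t> &shape,
                      const std::vector<int64_t> &runtimeShape) {
  assert(shape.size() == runtimeShape.size());
  SizeParts parts = foldStaticDims(shape);
  int64_t size = parts.staticProduct;
  if (!parts.allStatic)
    for (std::size_t i = 0; i < shape.size(); ++i)
      if (shape[i] == -1)
        size *= runtimeShape[i];
  return size;
}
```

For a fully static shape the second step is skipped, which is consistent with the static lit tests below checking a single `constant` for each size.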
|
||||||
|
|
||||||
//===----------------------------------------------------------------------===//
|
//===----------------------------------------------------------------------===//
|
||||||
// Scalar unary ops for lowering ONNXReduceMaxOp
|
// Scalar unary ops for lowering ONNXReduceMaxOp
|
||||||
//===----------------------------------------------------------------------===//
|
//===----------------------------------------------------------------------===//
|
||||||
|
@ -97,8 +147,12 @@ Value emitScalarOpFor<ONNXReduceMinOp>(ConversionPatternRewriter &rewriter,
|
||||||
|
|
||||||
template <typename ONNXReductionOp>
|
template <typename ONNXReductionOp>
|
||||||
struct ONNXReductionOpLowering : public ConversionPattern {
|
struct ONNXReductionOpLowering : public ConversionPattern {
|
||||||
ONNXReductionOpLowering(MLIRContext *ctx)
|
bool computeMean = false;
|
||||||
: ConversionPattern(ONNXReductionOp::getOperationName(), 1, ctx) {}
|
|
||||||
|
ONNXReductionOpLowering(MLIRContext *ctx, bool computeMean = false)
|
||||||
|
: ConversionPattern(ONNXReductionOp::getOperationName(), 1, ctx) {
|
||||||
|
this->computeMean = computeMean;
|
||||||
|
}
|
||||||
|
|
||||||
LogicalResult matchAndRewrite(Operation *op, ArrayRef<Value> operands,
|
LogicalResult matchAndRewrite(Operation *op, ArrayRef<Value> operands,
|
||||||
ConversionPatternRewriter &rewriter) const final {
|
ConversionPatternRewriter &rewriter) const final {
|
||||||
|
@ -123,7 +177,8 @@ struct ONNXReductionOpLowering : public ConversionPattern {
|
||||||
*
|
*
|
||||||
*/
|
*/
|
||||||
auto loc = op->getLoc();
|
auto loc = op->getLoc();
|
||||||
auto memRefInType = operands[0].getType().cast<MemRefType>();
|
auto input = operands[0];
|
||||||
|
auto memRefInType = input.getType().cast<MemRefType>();
|
||||||
auto memRefInShape = memRefInType.getShape();
|
auto memRefInShape = memRefInType.getShape();
|
||||||
auto memRefOutType = convertToMemRefType(*op->result_type_begin());
|
auto memRefOutType = convertToMemRefType(*op->result_type_begin());
|
||||||
int64_t inRank = memRefInType.getRank();
|
int64_t inRank = memRefInType.getRank();
|
||||||
|
@ -165,7 +220,7 @@ struct ONNXReductionOpLowering : public ConversionPattern {
|
||||||
SmallVector<Value, 2> allocOperands;
|
SmallVector<Value, 2> allocOperands;
|
||||||
for (decltype(outRank) i = 0; i < outRank; ++i) {
|
for (decltype(outRank) i = 0; i < outRank; ++i) {
|
||||||
if (memRefOutShape[i] < 0) {
|
if (memRefOutShape[i] < 0) {
|
||||||
auto dim = rewriter.create<DimOp>(loc, operands[0], outInDimMap[i]);
|
auto dim = rewriter.create<DimOp>(loc, input, outInDimMap[i]);
|
||||||
allocOperands.push_back(dim);
|
allocOperands.push_back(dim);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -177,11 +232,12 @@ struct ONNXReductionOpLowering : public ConversionPattern {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// There are two Krnl loops:
|
// There are two required and one optional Krnl loops:
|
||||||
// - One to initialize the result memref, and
|
// - One to initialize the result memref,
|
||||||
// - One to do reduction
|
// - One to do reduction, and
|
||||||
|
// - One to compute mean (optional).
|
||||||
|
|
||||||
// Define loops to initialize the result.
|
// 1. Define loops to initialize the result.
|
||||||
std::vector<Value> originalLoopsInit;
|
std::vector<Value> originalLoopsInit;
|
||||||
defineLoops(rewriter, loc, originalLoopsInit, outRank);
|
defineLoops(rewriter, loc, originalLoopsInit, outRank);
|
||||||
|
|
||||||
|
@ -208,14 +264,15 @@ struct ONNXReductionOpLowering : public ConversionPattern {
|
||||||
getIdentityValue<ONNXReductionOp>(rewriter, loc, elementOutType);
|
getIdentityValue<ONNXReductionOp>(rewriter, loc, elementOutType);
|
||||||
rewriter.create<AffineStoreOp>(loc, identity, alloc, loopIVs);
|
rewriter.create<AffineStoreOp>(loc, identity, alloc, loopIVs);
|
||||||
|
|
||||||
// Define an Krnl loop to do reduction.
|
// 2. Define a Krnl loop to do reduction.
|
||||||
rewriter.setInsertionPointAfter(iterateOpInit);
|
rewriter.setInsertionPointAfter(iterateOpInit);
|
||||||
|
auto ipMainRegion = rewriter.saveInsertionPoint();
|
||||||
std::vector<Value> originalLoops;
|
std::vector<Value> originalLoops;
|
||||||
defineLoops(rewriter, loc, originalLoops, inRank);
|
defineLoops(rewriter, loc, originalLoops, inRank);
|
||||||
// Iteration information
|
// Iteration information
|
||||||
KrnlIterateOperandPack pack(rewriter, originalLoops);
|
KrnlIterateOperandPack pack(rewriter, originalLoops);
|
||||||
for (decltype(inRank) i = 0; i < inRank; ++i) {
|
for (decltype(inRank) i = 0; i < inRank; ++i) {
|
||||||
addDimensionToPack(rewriter, loc, pack, operands[0], i);
|
addDimensionToPack(rewriter, loc, pack, input, i);
|
||||||
}
|
}
|
||||||
auto iterateOp = rewriter.create<KrnlIterateOp>(loc, pack);
|
auto iterateOp = rewriter.create<KrnlIterateOp>(loc, pack);
|
||||||
Block &iterationBlock = iterateOp.bodyRegion().front();
|
Block &iterationBlock = iterateOp.bodyRegion().front();
|
||||||
|
@ -245,12 +302,44 @@ struct ONNXReductionOpLowering : public ConversionPattern {
|
||||||
}
|
}
|
||||||
|
|
||||||
Value next, accumulated;
|
Value next, accumulated;
|
||||||
next = rewriter.create<AffineLoadOp>(loc, operands[0], inLoopIVs);
|
next = rewriter.create<AffineLoadOp>(loc, input, inLoopIVs);
|
||||||
accumulated = rewriter.create<AffineLoadOp>(loc, alloc, outLoopIVs);
|
accumulated = rewriter.create<AffineLoadOp>(loc, alloc, outLoopIVs);
|
||||||
accumulated = emitScalarOpFor<ONNXReductionOp>(
|
accumulated = emitScalarOpFor<ONNXReductionOp>(
|
||||||
rewriter, loc, op, memRefOutType.getElementType(), {accumulated, next});
|
rewriter, loc, op, memRefOutType.getElementType(), {accumulated, next});
|
||||||
rewriter.create<AffineStoreOp>(loc, accumulated, alloc, outLoopIVs);
|
rewriter.create<AffineStoreOp>(loc, accumulated, alloc, outLoopIVs);
|
||||||
|
|
||||||
|
// 3. Define a Krnl loop to compute mean (optional).
|
||||||
|
rewriter.restoreInsertionPoint(ipMainRegion);
|
||||||
|
if (computeMean) {
|
||||||
|
Type elementType = memRefOutType.getElementType();
|
||||||
|
// Compute the divisor, which is the number of elements participating in
|
||||||
|
// reduction, i.e., 'divisor = size of input / size of output'
|
||||||
|
Value inputSize = getSizeInType(rewriter, loc, input, elementType);
|
||||||
|
Value outputSize = getSizeInType(rewriter, loc, alloc, elementType);
|
||||||
|
Value divisor;
|
||||||
|
if (elementType.isa<FloatType>())
|
||||||
|
divisor = rewriter.create<DivFOp>(loc, inputSize, outputSize);
|
||||||
|
else if (elementType.isa<IntegerType>())
|
||||||
|
divisor = rewriter.create<SignedDivIOp>(loc, inputSize, outputSize);
|
||||||
|
else
|
||||||
|
llvm_unreachable("unsupported element type");
|
||||||
|
|
||||||
|
// Compute mean
|
||||||
|
BuildKrnlLoop meanLoops(rewriter, loc, outRank);
|
||||||
|
meanLoops.createDefineAndIterateOp(alloc);
|
||||||
|
rewriter.setInsertionPointToStart(meanLoops.getIterateBlock());
|
||||||
|
auto meanIVs = meanLoops.getAllInductionVar();
|
||||||
|
auto loadData = rewriter.create<AffineLoadOp>(loc, alloc, meanIVs);
|
||||||
|
Value meanVal;
|
||||||
|
if (elementType.isa<FloatType>())
|
||||||
|
meanVal = rewriter.create<DivFOp>(loc, loadData, divisor);
|
||||||
|
else if (elementType.isa<IntegerType>())
|
||||||
|
meanVal = rewriter.create<SignedDivIOp>(loc, loadData, divisor);
|
||||||
|
else
|
||||||
|
llvm_unreachable("unsupported element type");
|
||||||
|
rewriter.create<AffineStoreOp>(loc, meanVal, alloc, meanIVs);
|
||||||
|
}
|
||||||
|
|
||||||
rewriter.replaceOp(op, alloc);
|
rewriter.replaceOp(op, alloc);
|
||||||
return success();
|
return success();
|
||||||
}
|
}
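The three loops that `matchAndRewrite` emits for ReduceMean (initialize, reduce, divide) can be mimicked in plain C++ for the 3x2x2 input with `axes=[1]` and `keepdims=0` used by the lit tests. This is only a sketch under those fixed-shape assumptions; `Input`, `Output`, `sampleInput`, and `reduceMeanAxis1` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Input = std::array<std::array<std::array<float, 2>, 2>, 3>; // 3x2x2
using Output = std::array<std::array<float, 2>, 3>;               // 3x2

// Hypothetical test data: in[i][j][k] = 4*i + 2*j + k.
Input sampleInput() {
  Input in{};
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      for (std::size_t k = 0; k < 2; ++k)
        in[i][j][k] = static_cast<float>(4 * i + 2 * j + k);
  return in;
}

Output reduceMeanAxis1(const Input &in) {
  Output out{};

  // 1. Initialize the result with the additive identity.
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t k = 0; k < 2; ++k)
      out[i][k] = 0.0f;

  // 2. Iterate over the whole input; the reduced index j is dropped
  //    when addressing the output.
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      for (std::size_t k = 0; k < 2; ++k)
        out[i][k] += in[i][j][k];

  // 3. divisor = size of input / size of output = 12 / 6 = 2.
  const float divisor =
      static_cast<float>(3 * 2 * 2) / static_cast<float>(3 * 2);
  assert(divisor == 2.0f);
  for (std::size_t i = 0; i < 3; ++i)
    for (std::size_t k = 0; k < 2; ++k)
      out[i][k] /= divisor;

  return out;
}
```

With the sample data, `out[0][0]` is (0 + 2) / 2 and `out[2][1]` is (9 + 11) / 2. Doing the division in a third pass keeps the reduction loop identical to the ReduceSum one, at the cost of one extra sweep over the output.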
|
||||||
|
@ -262,4 +351,6 @@ void populateLoweringONNXReductionOpPattern(
|
||||||
ONNXReductionOpLowering<mlir::ONNXReduceMinOp>,
|
ONNXReductionOpLowering<mlir::ONNXReduceMinOp>,
|
||||||
ONNXReductionOpLowering<mlir::ONNXReduceProdOp>,
|
ONNXReductionOpLowering<mlir::ONNXReduceProdOp>,
|
||||||
ONNXReductionOpLowering<mlir::ONNXReduceSumOp>>(ctx);
|
ONNXReductionOpLowering<mlir::ONNXReduceSumOp>>(ctx);
|
||||||
|
patterns.insert<ONNXReductionOpLowering<mlir::ONNXReduceMeanOp>>(
|
||||||
|
ctx, /*computeMean=*/true);
|
||||||
}
|
}
|
||||||
|
|
|
@ -71,7 +71,6 @@ struct ONNXSplitOpLowering : public ConversionPattern {
|
||||||
// Create loop.
|
// Create loop.
|
||||||
BuildKrnlLoop outputLoops(rewriter, loc, rank);
|
BuildKrnlLoop outputLoops(rewriter, loc, rank);
|
||||||
outputLoops.createDefineAndIterateOp(allocs[i]);
|
outputLoops.createDefineAndIterateOp(allocs[i]);
|
||||||
outputLoops.createIterateOp();
|
|
||||||
rewriter.setInsertionPointToStart(outputLoops.getIterateBlock());
|
rewriter.setInsertionPointToStart(outputLoops.getIterateBlock());
|
||||||
// Indices for the read and write.
|
// Indices for the read and write.
|
||||||
SmallVector<Value, 4> readIndices;
|
SmallVector<Value, 4> readIndices;
|
||||||
|
|
|
@ -271,4 +271,9 @@ BlockArgument &BuildKrnlLoop::getInductionVar(int originalLoopIndex) {
|
||||||
return iterBlock->getArguments()[originalLoopIndex];
|
return iterBlock->getArguments()[originalLoopIndex];
|
||||||
}
|
}
|
||||||
|
|
||||||
|
ArrayRef<BlockArgument> BuildKrnlLoop::getAllInductionVar() {
|
||||||
|
return ArrayRef<BlockArgument>(
|
||||||
|
iterBlock->getArguments().begin(), iterBlock->getArguments().end());
|
||||||
|
}
|
||||||
|
|
||||||
} // namespace mlir
|
} // namespace mlir
|
||||||
|
|
|
@ -186,6 +186,9 @@ public:
|
||||||
// index. Use the index returned when pushing the bounds.
|
// index. Use the index returned when pushing the bounds.
|
||||||
BlockArgument &getInductionVar(int originalLoopIndex);
|
BlockArgument &getInductionVar(int originalLoopIndex);
|
||||||
|
|
||||||
|
// Get all of the (original loop) induction variables.
|
||||||
|
ArrayRef<BlockArgument> getAllInductionVar();
|
||||||
|
|
||||||
// Get a reference to the code region of the optimization operation.
|
// Get a reference to the code region of the optimization operation.
|
||||||
// This allows us to set the insertion point to the inner block of the
|
// This allows us to set the insertion point to the inner block of the
|
||||||
// loop nest optimization operation.
|
// loop nest optimization operation.
|
||||||
|
|
|
@ -290,6 +290,16 @@ test_to_enable = [
|
||||||
"test_reduce_sum_square_negative_axes_keepdims_example_cpu",
|
"test_reduce_sum_square_negative_axes_keepdims_example_cpu",
|
||||||
"test_reduce_sum_square_negative_axes_keepdims_random_cpu",
|
"test_reduce_sum_square_negative_axes_keepdims_random_cpu",
|
||||||
|
|
||||||
|
# ReduceMean
|
||||||
|
"test_reduce_mean_default_axes_keepdims_example_cpu",
|
||||||
|
"test_reduce_mean_default_axes_keepdims_random_cpu",
|
||||||
|
"test_reduce_mean_do_not_keepdims_example_cpu",
|
||||||
|
"test_reduce_mean_do_not_keepdims_random_cpu",
|
||||||
|
"test_reduce_mean_keepdims_example_cpu",
|
||||||
|
"test_reduce_mean_keepdims_random_cpu",
|
||||||
|
"test_reduce_mean_negative_axes_keepdims_example_cpu",
|
||||||
|
"test_reduce_mean_negative_axes_keepdims_random_cpu",
|
||||||
|
|
||||||
# Selu Op:
|
# Selu Op:
|
||||||
"test_selu_cpu",
|
"test_selu_cpu",
|
||||||
"test_selu_default_cpu",
|
"test_selu_default_cpu",
|
||||||
|
|
|
@ -709,7 +709,7 @@ func @test_reducemax(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
|
||||||
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
||||||
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
||||||
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
||||||
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
// CHECK: [[CMP:%.+]] = cmpf "ogt", [[LOAD2]], [[LOAD1]] : f32
|
// CHECK: [[CMP:%.+]] = cmpf "ogt", [[LOAD2]], [[LOAD1]] : f32
|
||||||
// CHECK: [[SELECT:%.+]] = select [[CMP]], [[LOAD2]], [[LOAD1]] : f32
|
// CHECK: [[SELECT:%.+]] = select [[CMP]], [[LOAD2]], [[LOAD1]] : f32
|
||||||
// CHECK: store [[SELECT]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: store [[SELECT]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
|
@ -733,7 +733,7 @@ func @test_reducemin(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
|
||||||
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
||||||
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
||||||
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
||||||
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
// CHECK: [[CMP:%.+]] = cmpf "olt", [[LOAD2]], [[LOAD1]] : f32
|
// CHECK: [[CMP:%.+]] = cmpf "olt", [[LOAD2]], [[LOAD1]] : f32
|
||||||
// CHECK: [[SELECT:%.+]] = select [[CMP]], [[LOAD2]], [[LOAD1]] : f32
|
// CHECK: [[SELECT:%.+]] = select [[CMP]], [[LOAD2]], [[LOAD1]] : f32
|
||||||
// CHECK: affine.store [[SELECT]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: affine.store [[SELECT]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
|
@ -757,7 +757,7 @@ func @test_reduceprod(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
|
||||||
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
||||||
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
||||||
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
||||||
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
// CHECK: [[REDUCE:%.+]] = mulf [[LOAD2]], [[LOAD1]] : f32
|
// CHECK: [[REDUCE:%.+]] = mulf [[LOAD2]], [[LOAD1]] : f32
|
||||||
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
// CHECK: }
|
// CHECK: }
|
||||||
|
@ -780,13 +780,116 @@ func @test_reducesum(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
|
||||||
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
||||||
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
||||||
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
||||||
// CHECK: [[LOAD2:%.+]] = affine.load %0[%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
// CHECK: [[REDUCE:%.+]] = addf [[LOAD2]], [[LOAD1]] : f32
|
// CHECK: [[REDUCE:%.+]] = addf [[LOAD2]], [[LOAD1]] : f32
|
||||||
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
// CHECK: }
|
// CHECK: }
|
||||||
// CHECK: return [[RES]] : memref<3x2xf32>
|
// CHECK: return [[RES]] : memref<3x2xf32>
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// -----
|
||||||
|
|
||||||
|
/// Check ReduceMean with f32.
|
||||||
|
func @test_reducemean_f32(%arg0 : tensor<3x2x2xf32>) -> tensor<*xf32> {
|
||||||
|
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x2x2xf32>)-> tensor<*xf32>
|
||||||
|
"std.return"(%0) : (tensor<*xf32>) -> ()
|
||||||
|
|
||||||
|
// CHECK-LABEL: test_reducemean_f32
|
||||||
|
// CHECK: [[RES:%.+]] = alloc() : memref<3x2xf32>
|
||||||
|
// CHECK: [[DEF_LOOPS1:%.+]]:2 = krnl.define_loops 2
|
||||||
|
// CHECK: krnl.iterate([[DEF_LOOPS1]]#0, [[DEF_LOOPS1]]#1) with ([[DEF_LOOPS1]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS1]]#1 -> %arg2 = 0 to 2) {
|
||||||
|
// CHECK: [[IDENTITY:%.+]] = constant 0.000000e+00 : f32
|
||||||
|
// CHECK: affine.store [[IDENTITY]], [[RES]][%arg1, %arg2] : memref<3x2xf32>
|
||||||
|
|
||||||
|
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
||||||
|
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
||||||
|
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xf32>
|
||||||
|
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
|
// CHECK: [[REDUCE:%.+]] = addf [[LOAD2]], [[LOAD1]] : f32
|
||||||
|
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xf32>
|
||||||
|
// CHECK: }
|
||||||
|
|
||||||
|
// CHECK: [[INPUT_SIZE:%.+]] = constant 1.200000e+01 : f32
|
||||||
|
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6.000000e+00 : f32
|
||||||
|
// CHECK: [[DIVISOR:%.+]] = divf [[INPUT_SIZE]], [[OUTPUT_SIZE]] : f32
|
||||||
|
// CHECK: [[DEF_MEAN_LOOPS:%.+]]:2 = krnl.define_loops 2
|
||||||
|
// CHECK: krnl.iterate([[DEF_MEAN_LOOPS]]#0, [[DEF_MEAN_LOOPS]]#1) with ([[DEF_MEAN_LOOPS]]#0 -> %arg1 = 0 to 3, [[DEF_MEAN_LOOPS]]#1 -> %arg2 = 0 to 2) {
|
||||||
|
// CHECK: [[LOAD3:%.+]] = affine.load [[RES]][%arg1, %arg2] : memref<3x2xf32>
|
||||||
|
// CHECK: [[MEAN:%.+]] = divf [[LOAD3]], [[DIVISOR]] : f32
|
||||||
|
// CHECK: affine.store [[MEAN]], [[RES]][%arg1, %arg2] : memref<3x2xf32>
|
||||||
|
// CHECK: }
|
||||||
|
// CHECK: return [[RES]] : memref<3x2xf32>
|
||||||
|
}
|
||||||
|
|
||||||
|
// -----
|
||||||
|
|
||||||
|
/// Check ReduceMean with i32.
|
||||||
|
func @test_reducemean_i32(%arg0 : tensor<3x2x2xi32>) -> tensor<*xi32> {
|
||||||
|
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x2x2xi32>)-> tensor<*xi32>
|
||||||
|
"std.return"(%0) : (tensor<*xi32>) -> ()
|
||||||
|
|
||||||
|
// CHECK-LABEL: test_reducemean_i32
|
||||||
|
// CHECK: [[RES:%.+]] = alloc() : memref<3x2xi32>
|
||||||
|
// CHECK: [[DEF_LOOPS1:%.+]]:2 = krnl.define_loops 2
|
||||||
|
// CHECK: krnl.iterate([[DEF_LOOPS1]]#0, [[DEF_LOOPS1]]#1) with ([[DEF_LOOPS1]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS1]]#1 -> %arg2 = 0 to 2) {
|
||||||
|
// CHECK: [[IDENTITY:%.+]] = constant 0 : i32
|
||||||
|
// CHECK: affine.store [[IDENTITY]], [[RES]][%arg1, %arg2] : memref<3x2xi32>
|
||||||
|
|
||||||
|
// CHECK: [[DEF_LOOPS2:%.+]]:3 = krnl.define_loops 3
|
||||||
|
// CHECK: krnl.iterate([[DEF_LOOPS2]]#0, [[DEF_LOOPS2]]#1, [[DEF_LOOPS2]]#2) with ([[DEF_LOOPS2]]#0 -> %arg1 = 0 to 3, [[DEF_LOOPS2]]#1 -> %arg2 = 0 to 2, [[DEF_LOOPS2]]#2 -> %arg3 = 0 to 2) {
|
||||||
|
// CHECK: [[LOAD1:%.+]] = affine.load %arg0[%arg1, %arg2, %arg3] : memref<3x2x2xi32>
|
||||||
|
// CHECK: [[LOAD2:%.+]] = affine.load [[RES]][%arg1, %arg3] : memref<3x2xi32>
|
||||||
|
// CHECK: [[REDUCE:%.+]] = addi [[LOAD2]], [[LOAD1]] : i32
|
||||||
|
// CHECK: affine.store [[REDUCE]], [[RES]][%arg1, %arg3] : memref<3x2xi32>
|
||||||
|
// CHECK: }
|
||||||
|
|
||||||
|
// CHECK: [[INPUT_SIZE:%.+]] = constant 12 : i32
|
||||||
|
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6 : i32
|
||||||
|
// CHECK: [[DIVISOR:%.+]] = divi_signed [[INPUT_SIZE]], [[OUTPUT_SIZE]] : i32
|
||||||
|
// CHECK: [[DEF_MEAN_LOOPS:%.+]]:2 = krnl.define_loops 2
|
||||||
|
// CHECK: krnl.iterate([[DEF_MEAN_LOOPS]]#0, [[DEF_MEAN_LOOPS]]#1) with ([[DEF_MEAN_LOOPS]]#0 -> %arg1 = 0 to 3, [[DEF_MEAN_LOOPS]]#1 -> %arg2 = 0 to 2) {
|
||||||
|
// CHECK: [[LOAD3:%.+]] = affine.load [[RES]][%arg1, %arg2] : memref<3x2xi32>
|
||||||
|
// CHECK: [[MEAN:%.+]] = divi_signed [[LOAD3]], [[DIVISOR]] : i32
|
||||||
|
// CHECK: affine.store [[MEAN]], [[RES]][%arg1, %arg2] : memref<3x2xi32>
|
||||||
|
// CHECK: }
|
||||||
|
// CHECK: return [[RES]] : memref<3x2xi32>
|
||||||
|
}
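One consequence of the i32 path checked above is that both the divisor and the mean come from signed integer division, so the result is truncated rather than rounded. The sketch below assumes that `divi_signed` rounds toward zero, which is also how signed `/` behaves in C++11 and later; `meanI32` is a hypothetical helper, not part of onnx-mlir.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two divi_signed steps: first the divisor, then the mean.
int32_t meanI32(int32_t sum, int32_t inputSize, int32_t outputSize) {
  assert(outputSize > 0 && inputSize >= outputSize);
  const int32_t divisor = inputSize / outputSize; // 12 / 6 = 2 in the test
  return sum / divisor;                           // truncates toward zero
}
```

For an accumulated sum of 7 over two elements this yields 3, and for -7 it yields -3, so callers that need a rounded or floating-point mean would have to convert the element type beforehand.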
|
||||||
|
|
||||||
|
// -----
|
||||||
|
|
||||||
|
/// Check computing the divisor in ReduceMean
|
||||||
|
/// when the input has unknown dimensions and is of i32.
|
||||||
|
func @test_reducemean_i32_unknown_dims(%arg0 : tensor<3x?x2xi32>) -> tensor<*xi32> {
|
||||||
|
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x?x2xi32>)-> tensor<*xi32>
|
||||||
|
"std.return"(%0) : (tensor<*xi32>) -> ()
|
||||||
|
// CHECK-LABEL: test_reducemean_i32_unknown_dims
|
||||||
|
// CHECK: [[INPUT_SIZE_CONSTANT:%.+]] = constant 6 : i32
|
||||||
|
// CHECK: [[ONE:%.+]] = constant 1 : index
|
||||||
|
// CHECK: [[DIM:%.+]] = dim %arg0, [[ONE]] : memref<3x?x2xi32>
|
||||||
|
// CHECK: [[UNKNOWN_DIM:%.+]] = index_cast [[DIM]] : index to i32
|
||||||
|
// CHECK: [[INPUT_SIZE:%.+]] = muli [[INPUT_SIZE_CONSTANT]], [[UNKNOWN_DIM]] : i32
|
||||||
|
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6 : i32
|
||||||
|
// CHECK: [[DIVISOR:%.+]] = divi_signed [[INPUT_SIZE]], [[OUTPUT_SIZE]] : i32
|
||||||
|
}
|
||||||
|
|
||||||
|
// -----
|
||||||
|
|
||||||
|
/// Check computing the divisor in ReduceMean
|
||||||
|
/// when the input has unknown dimensions and is of f32.
|
||||||
|
func @test_reducemean_f32_unknown_dims(%arg0 : tensor<3x?x2xf32>) -> tensor<*xf32> {
|
||||||
|
%0 ="onnx.ReduceMean"(%arg0) {axes=[1], keepdims = 0 : si64} : (tensor<3x?x2xf32>)-> tensor<*xf32>
|
||||||
|
"std.return"(%0) : (tensor<*xf32>) -> ()
|
||||||
|
// CHECK-LABEL: test_reducemean_f32_unknown_dims
|
||||||
|
// CHECK: [[INPUT_SIZE_CONSTANT:%.+]] = constant 6.000000e+00 : f32
|
||||||
|
// CHECK: [[ONE:%.+]] = constant 1 : index
|
||||||
|
// CHECK: [[DIM:%.+]] = dim %arg0, [[ONE]] : memref<3x?x2xf32>
|
||||||
|
// CHECK: [[UNKNOWN_DIM_i64:%.+]] = index_cast [[DIM]] : index to i64
|
||||||
|
// CHECK: [[UNKNOWN_DIM:%.+]] = uitofp [[UNKNOWN_DIM_i64]] : i64 to f32
|
||||||
|
// CHECK: [[INPUT_SIZE:%.+]] = mulf [[INPUT_SIZE_CONSTANT]], [[UNKNOWN_DIM]] : f32
|
||||||
|
// CHECK: [[OUTPUT_SIZE:%.+]] = constant 6.000000e+00 : f32
|
||||||
|
// CHECK: [[DIVISOR:%.+]] = divf [[INPUT_SIZE]], [[OUTPUT_SIZE]] : f32
|
||||||
|
}
|
||||||
|
|
||||||
// -----
|
// -----
|
||||||
|
|
||||||
func @test_softmax(%arg0 : tensor<10x10xf32>) -> tensor<*xf32> {
|
func @test_softmax(%arg0 : tensor<10x10xf32>) -> tensor<*xf32> {
|
||||||
|
|