[RFC] Doc-check utility. (#12)

* 1. Implement doc-check utility.

* 1. Move ONNF installation script to a standalone script file.

* 1. Modify build script to install llvm-project next to ONNF. The build script used to install llvm-project inside ONNF, which didn't make sense.

* 1. Check out code to ONNF directory.

* 1. Pass path parameter correctly.

* 1. Debugging buildbot.

* 1. Remove debug code.

* 1. Update installation instructions in README.md.
2. Enforce consistency with scripts used in testing using doc-check.

* 1. Fix error with respect to syntax to build multiple CMake targets.

* 1. Move doc-check to doc_check.
2. Remove directive_config in top-level driver.

* 1. Build onnf and check-mlir-lit separately because only CMake 3.15+ supports building multiple targets in one cmake --build run.

* 1. Use new env variables to locate LLVM-Project.

* 1. Documentation nits.

* 1. Prettify buildbot scripts.

* 1. Fix build script error.

* 1. Support exclude_dirs in DocCheck.
2. Add README for DocCheck.

* 1. Mark python3 interpreter as required.
2. Use imported interpreter target.

* 1. Automatically deduce doc file extension in DocCheckCtx.
2. Rename ctx.open -> ctx.open_doc since it should only be used to open doc file.
3. Always read line in parser, instead of reading lines in driver and then passing it to parser.py.

* 1. Rename parser -> doc_parser due to name conflict with python built-in module.
2. Expose doc_check module directory first before importing; otherwise, if the doc_check utility is invoked by another script, importing will not work correctly.

* 1. Keep renaming parser -> doc_parser.
2. Explicitly define a default configuration parser that parses the configuration into a python dictionary.

* 1. Add test for doc-check.
2. Exclude doc-check tests from project doc-check because base directory is different.

* 1. Raise ValueError if directive configuration fails to parse.
2. Format code.

* Shorten test case documentation.
Show example of using same-as-file directive, check with DocCheck.

* 1. Shorten test case documentation.
2. More documentation, check documentation with DocCheck.

* 1. Add copyright notice.

* 1. Make documentation clearer.
2. Prettify build-scripts.

* 1. Provide more documentation.
2. Fix some non-compliance with pep8 recommendations.

Co-authored-by: Gheorghe-Teodor Bercea <gt.bercea@gmail.com>
Tian Jin 2020-01-09 18:35:52 -05:00 committed by GitHub
parent 7607edefe9
commit 1ebcc2eb64
GPG Key ID: 4AEE18F83AFDEB23
22 changed files with 576 additions and 19 deletions

View File

@ -5,11 +5,6 @@ jobs:
       - image: circleci/python
     steps:
       - checkout
-      - run:
-          name: Pull Submodules
-          command: |
-            git submodule update --init --recursive
       - run:
           name: Installing GCC, CMake, Ninja, Protobuf
           command: sudo apt-get update && sudo apt-get install -y gcc g++ cmake ninja-build protobuf-compiler
@ -23,19 +18,25 @@ jobs:
             # mlir-opt executable exists.
             if [ ! -f llvm-project/build/bin/mlir-opt ]; then
               export MAKEFLAGS=-j4
-              source .circleci/install-mlir.sh
+              source utils/install-mlir.sh
             fi
       - save_cache:
           key: V2-LLVM-PROJECT-{{ arch }}
           paths:
             - llvm-project
+      - checkout:
+          path: ONNF
+      - run:
+          name: Pull Submodules
+          command: |
+            cd ONNF
+            git submodule update --init --recursive
       - run:
           name: Install ONNF
-          command: |
-            mkdir build && cd build
-            LLVM_PROJ_SRC=$(pwd)/../llvm-project/ LLVM_PROJ_BUILD=$(pwd)/../llvm-project/build cmake ..
-            make all
-            LIT_OPTS=-v make check-mlir-lit
+          command: source ONNF/utils/install-onnf.sh
+      - run:
+          name: Run DocCheck
+          command: cd ONNF/build && cmake --build . --target check-doc
       - run:
           name: Print the Current Time
           command: date

View File

@ -25,6 +25,6 @@ add_subdirectory(third_party/pybind11)
set(CMAKE_CXX_STANDARD 14)
add_subdirectory(src)
add_subdirectory(doc)
add_subdirectory(test)

View File

@ -5,19 +5,44 @@ Open Neural Network Frontend : an ONNX frontend for MLIR.
## Installation
We assume an existing installation of MLIR. The LLVM-Project repo commit hash we used to test against is 9b6ad8466bb8b97082b705270603ad7f4559e931 and the MLIR repo commit hash we used is 0710266d0f56cf6ab0f437badbd7416b6cecdf5f.
Firstly, install MLIR (as a part of LLVM-Project):
[same-as-file]: <> (utils/install-mlir.sh)
``` bash
git clone https://github.com/llvm/llvm-project.git
mkdir llvm-project/build
cd llvm-project/build
cmake -G Ninja ../llvm \
-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_BUILD_EXAMPLES=ON \
-DLLVM_TARGETS_TO_BUILD="host" \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_ASSERTIONS=ON \
-DLLVM_ENABLE_RTTI=ON
cmake --build . --target check-mlir -- ${MAKEFLAGS}
```
Two environment variables need to be set:
- LLVM_SRC should point to the llvm src directory (e.g., llvm-project/llvm).
- LLVM_BUILD should point to the llvm build directory (e.g., llvm-project/build).
- LLVM_PROJ_SRC should point to the llvm src directory (e.g., llvm-project/llvm).
- LLVM_PROJ_BUILD should point to the llvm build directory (e.g., llvm-project/build).
To build ONNF, use the following command:
[same-as-file]: <> ({"ref": "utils/install-onnf.sh", "skip-doc": 2})
```
git clone --recursive git@github.com:clang-ykt/ONNF.git
mkdir build
cd build
# Export environment variables pointing to LLVM-Projects.
export LLVM_PROJ_SRC=$(pwd)/llvm-project/
export LLVM_PROJ_BUILD=$(pwd)/llvm-project/build
mkdir ONNF/build && cd ONNF/build
cmake ..
cmake --build . --target all
cmake --build . --target onnf
# Run FileCheck tests:
cmake --build . --target check-mlir-lit
```
After the above commands succeed, an `onnf` executable should appear in the `bin` directory.
@ -63,4 +88,3 @@ module {
}
}
```

1
doc/CMakeLists.txt Normal file
View File

@ -0,0 +1 @@
add_subdirectory(doc_check)

View File

@ -0,0 +1,9 @@
find_package(Python3 REQUIRED
        COMPONENTS Interpreter)

add_custom_target(check-doc
        COMMAND Python3::Interpreter
        ${CMAKE_CURRENT_SOURCE_DIR}/check.py
        ${CMAKE_SOURCE_DIR}
        --exclude_dirs third_party doc/doc_check/test)

77
doc/doc_check/README.md Normal file
View File

@ -0,0 +1,77 @@
# DocCheck
### Goal
DocCheck provides a set of utilities to enforce invariant properties of artifacts (e.g., code snippets or
output of execution) presented in the software documentation. They can be used to ensure that these
artifacts are always compatible and up-to-date with the state of software development.
### Directives
DocCheck provides a set of directives that can be used in documentation to enforce desired invariants.
A directive is a comment with a specific format/syntax to communicate the intent to check certain invariants to the
DocCheck checker. Generally, a directive has the following syntax in markdown:
```markdown
[{directive}]: <> ({configuration})
```
Here, {directive} specifies the type of invariance check intended and {configuration} expresses the specific
parameters of this directive. In general, a directive configuration is expressed using a python dictionary literal,
but special shorthands exist for each directive individually.
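As a rough illustration of the two configuration forms, the sketch below extracts the configuration from a directive line and normalizes it to a dictionary. The regular expression is the one used for the `same-as-file` directive in this commit; the function name `parse_directive_config` and the `isinstance` check are assumptions made for this example and are not part of DocCheck.

```python
import ast
import re

# Pattern copied from the same-as-file directive implementation in this commit.
DIRECTIVE_PATTERN = re.compile(r"\[same-as-file\]: <> \(([^)]*)\)")


def parse_directive_config(line):
    """Hypothetical helper: return the configuration dict, or None if no directive is found."""
    match = DIRECTIVE_PATTERN.search(line)
    if match is None:
        return None
    raw = match.group(1)
    try:
        # Canonical form: a python dictionary literal.
        config = ast.literal_eval(raw)
    except (SyntaxError, ValueError):
        config = None
    if isinstance(config, dict):
        return config
    # Shorthand form: the whole configuration is the reference file name.
    return {"ref": raw}


print(parse_directive_config("[same-as-file]: <> (reference.cpp)"))
print(parse_directive_config('[same-as-file]: <> ({"ref": "reference.cpp", "skip-doc": 2})'))
```

Trying the dictionary literal first and falling back to the shorthand mirrors the order in which the commit registers its configuration parsers (`generic_config_parser` before `same_as_file.parse`).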
##### `same-as-file`:
Use the `same-as-file` directive to ensure that the code section following this directive is the same as a source file.
This is useful primarily because testing a code snippet in documentation directly is often impossible. However,
unit tests can be written using an exact copy of the code snippet content. We can use the `same-as-file` directive
to ensure the code snippet is always the same as its copy used in some unit tests.
The `same-as-file` directive supports a convenient shorthand configuration format in which the directive configuration is fully specified by the name of the reference file to check against.
For example, to ensure a code snippet is the same as a unit-tested file `reference.cpp`, use the following directive as shown in the documentation snippet:
[same-as-file]: <> (doc/doc_check/test/same-as-file/simple/README.md)
````markdown
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
[same-as-file]: <> (reference.cpp)
```cpp
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}
```
````
In the canonical form of directive configuration (as a python dictionary literal), this directive supports the following parameters:
`ref` (string): reference file to check against.
`skip-doc` (int): number of lines to skip when checking the documentation.
`skip-ref` (int): number of lines to skip when scanning the reference file.
For example, to ensure the following code snippet is the same as a unit-tested file `reference.cpp`, except for the first 2 lines of the code used in documentation, and the first 3 lines of code used in the reference file, the following directive configuration can be used:
[same-as-file]: <> (doc/doc_check/test/same-as-file/skip-doc-ref/README.md)
````markdown
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
[same-as-file]: <> ({"ref": "reference.cpp", "skip-doc": 2, "skip-ref": 3})
```cpp
// First line unique to documentation
// Second line unique to documentation
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}
```
````
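The effect of `skip-doc` and `skip-ref` can be modeled with a simplified sketch that works on in-memory lists of lines rather than on open files. The helper name `snippet_matches_reference` is hypothetical, and requiring equal lengths after skipping is a simplification for this example; the checker in this commit reads both files line by line and then expects the closing code fence.

```python
def snippet_matches_reference(doc_lines, ref_lines, skip_doc=0, skip_ref=0):
    """Hypothetical helper: compare snippet lines with reference lines after skipping."""
    doc_rest = doc_lines[skip_doc:]
    ref_rest = ref_lines[skip_ref:]
    if len(doc_rest) < len(ref_rest):
        raise ValueError("doc snippet is shorter than reference file")
    if len(doc_rest) > len(ref_rest):
        raise ValueError("doc snippet is longer than reference file")
    for lineno, (doc_line, ref_line) in enumerate(zip(doc_rest, ref_rest), start=1):
        # Ignore line endings, as the checker does with rstrip('\r\n').
        if doc_line.rstrip("\r\n") != ref_line.rstrip("\r\n"):
            raise ValueError(
                "line {} differs: {!r} != {!r}".format(lineno, doc_line, ref_line))
    return True


body = ["#include<iostream>", "using namespace std;", "int main() {",
        'cout<<"Hello World";', "return 0;", "}"]
doc = ["// First line unique to documentation",
       "// Second line unique to documentation"] + body
ref = ["// First line unique to reference",
       "// Second line unique to reference",
       "// Third line unique to reference"] + body

print(snippet_matches_reference(doc, ref, skip_doc=2, skip_ref=3))
```

With `skip_doc=2` and `skip_ref=3`, only the shared program body is compared, which corresponds to the configuration shown in the example above.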

51
doc/doc_check/check.py Normal file
View File

@ -0,0 +1,51 @@
# ===-------------------- check.py - Documentation Checker ----------------===//
#
# Copyright 2019-2020 The IBM Research Authors.
#
# =============================================================================
#
# ===----------------------------------------------------------------------===//
import argparse
import os, sys
from pathlib import Path

from utils import setup_logger, DocCheckerCtx

logger = setup_logger("doc-check")

# Make common utilities visible by adding them to system paths.
doc_check_base_dir = os.path.dirname(os.path.realpath(__file__))
sys.path.append(doc_check_base_dir)

from doc_parser import try_parse_and_handle_directive

parser = argparse.ArgumentParser()
parser.add_argument(
    "root_dir",
    help="directory in which to look for documentation to operate on")
parser.add_argument('--exclude_dirs', nargs='+',
                    help='a set of directories to exclude, with path specified relative to root_dir', default=[])


def main(root_dir, exclude_dirs):
    for i, exclude_dir in enumerate(exclude_dirs):
        exclude_dirs[i] = os.path.join(root_dir, exclude_dir)

    ctx = DocCheckerCtx(root_dir)
    for doc_file in Path(root_dir).rglob('*.md'):
        # Skip, if doc file is in directories to be excluded.
        if any([str(doc_file).startswith(exclude_dir) for exclude_dir in exclude_dirs]):
            continue

        logger.info("Checking {}...".format(doc_file))
        with ctx.open_doc(doc_file) as markdown_file:
            while not markdown_file.eof():
                try_parse_and_handle_directive(ctx)


if __name__ == "__main__":
    args = parser.parse_args()
    main(**vars(args))
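
A note on the exclusion test in `main`: because it compares string prefixes, excluding `doc/doc_check/test` would also skip a sibling directory whose name merely starts with the same characters (for instance a hypothetical `doc/doc_check/test2`). A possible alternative, sketched below under the assumption that paths are not symlinked, compares path components instead. The helper name `is_excluded` is invented for this example and is not part of the commit.

```python
from pathlib import Path


def is_excluded(doc_file, root_dir, exclude_dirs):
    """Hypothetical helper: True if doc_file lies inside any excluded directory."""
    doc_path = Path(doc_file)
    for exclude_dir in exclude_dirs:
        excluded = Path(root_dir) / exclude_dir
        # Path.parents yields every ancestor directory of doc_path,
        # so this matches whole path components rather than string prefixes.
        if excluded in doc_path.parents:
            return True
    return False


print(is_excluded("repo/doc/doc_check/test/simple/README.md", "repo", ["doc/doc_check/test"]))
print(is_excluded("repo/doc/doc_check/test2/README.md", "repo", ["doc/doc_check/test"]))
```

The comparison is purely lexical, so both arguments should be expressed relative to the same base (as `Path(root_dir).rglob` already does for the files it yields).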

View File

@ -0,0 +1,81 @@
# ===----------------- directive.py - Directive Base Class ----------------===//
#
# Copyright 2019-2020 The IBM Research Authors.
#
# =============================================================================
#
# ===----------------------------------------------------------------------===//
import re
import ast
from typing import List, Dict, Callable, Any, Pattern, Tuple

from doc_parser import failure, success, succeeded
from utils import DocCheckerCtx

DirectiveConfigList = List[Dict[str, Any]]
ConfigParseResult = Tuple[str, Dict[str, Any]]


class Directive(object):
    """"""

    def __init__(self, ext_to_regexes: Dict[str, str],
                 config_parsers: List[Callable[[str, DirectiveConfigList],
                                               ConfigParseResult]],
                 handler: Callable[[Dict[str, Any], DocCheckerCtx], None]):
        """
        :param ext_to_regexes: specify a regex expression to match the directive (for each file extension type).
        :param config_parsers: specify a list of parsers to parse configuration. They will be invoked in order until one indicates parsing is successful.
        :param handler: a function to perform the invariance check specified by the directive.
        """
        self.ext_to_patterns: Dict[str, Pattern] = {}
        for ext, pattern in ext_to_regexes.items():
            self.ext_to_patterns[ext] = re.compile(pattern)

        self.config_parsers: List[Callable[[str, DirectiveConfigList],
                                           ConfigParseResult]] = config_parsers
        self.handler = handler

    def try_parse_directive(
            self, ctx: DocCheckerCtx,
            directive_config: DirectiveConfigList) -> Tuple[str, Any]:
        """
        :param ctx: parser context.
        :param directive_config: a list used to output parsed directive configuration.
        :return: parse result.
        """
        line = ctx.doc_file.next_non_empty_line()
        matches = self.ext_to_patterns[ctx.doc_file_ext()].findall(line)
        if len(matches) > 1:
            raise ValueError("more than one directives in a line")

        match = matches[0] if len(matches) else None
        if match:
            for parser in self.config_parsers:
                if succeeded(parser(match, directive_config)):
                    return success()

            raise ValueError("Failed to parse configuration.")
        else:
            return failure()

    def handle(self, config, ctx):
        self.handler(config, ctx)


def generic_config_parser(
        match: str, directive_config: DirectiveConfigList) -> Tuple[str, Any]:
    """
    Generic configuration parser.
    Will return success if and only if configuration is specified as a python dictionary literal.
    @param match: the content from which to parse the directive configuration.
    @param directive_config: a list to output the parsed directive_config.
    @return: parsing result.
    """
    try:
        directive_config.append(ast.literal_eval(match))
        return success()
    except (SyntaxError, ValueError):
        # If literal_eval failed, return parsing failure.
        return failure()

View File

@ -0,0 +1,52 @@
# ===-------------------------- utils.py - Utility ------------------------===//
#
# Copyright 2019-2020 The IBM Research Authors.
#
# =============================================================================
#
# ===----------------------------------------------------------------------===//
import logging

logger = logging.getLogger('doc-check')

from doc_parser import *
from utils import *


def parse(line, directive_configs):
    directive_configs.append({'ref': line})
    return success()


def handle(config, ctx):
    logger.debug("Handling a same-as-file directive with config {}".format(config))
    ref_file_path = config["ref"]

    doc_file = ctx.doc_file
    parse_code_section_delimiter(ctx)

    with WrappedFile(open(
            os.path.join(ctx.root_dir, ref_file_path))) as ref_file:
        doc_file.skip_lines(config.get("skip-doc", 0))
        ref_file.skip_lines(config.get("skip-ref", 0))

        while not ref_file.eof():
            ref_line = ref_file.readline().rstrip('\r\n')
            doc_line = doc_file.readline().rstrip('\r\n')

            loc = (doc_file.f.name, doc_file.line,
                   ref_file_path, ref_file.line)
            loc_info = "\ndoc file {}, line no. {}. ref file {}, line no. {}.".format(*loc)

            if doc_file.eof():
                raise ValueError("Check failed because doc file is "
                                 "shorter than reference file." + loc_info)
            if ref_line != doc_line:
                doc_line_info = "\nDoc line : {}".format(doc_line)
                ref_line_info = "\nReference line: {}".format(ref_line)
                raise ValueError(
                    "Check failed because doc file content is not "
                    "the same as that of reference file." +
                    doc_line_info + ref_line_info)

        parse_code_section_delimiter(ctx)


ext_to_patterns = {'.md': '\\[same-as-file\\]: <> \\(([^)]*)\\)'}

View File

@ -0,0 +1,35 @@
# ===------------ doc_parser.py - Documentation Parsing Utility ------------===//
#
# Copyright 2019-2020 The IBM Research Authors.
#
# =============================================================================
#
# ===----------------------------------------------------------------------===//
from typing import List
from utils import *
from directive import Directive, generic_config_parser


def parse_code_section_delimiter(ctx):
    assert ctx.doc_file_ext() == ".md"
    if not ctx.doc_file.next_non_empty_line().strip().startswith("```"):
        raise ValueError("Did not parse a code section delimiter")


def try_parse_and_handle_directive(ctx):
    from directive_impl import same_as_file

    # Register all directives.
    all_directives: List[Directive] = [
        Directive(same_as_file.ext_to_patterns, [generic_config_parser, same_as_file.parse], same_as_file.handle)
    ]

    for directive in all_directives:
        directive_config = []
        if succeeded(directive.try_parse_directive(ctx, directive_config)):
            directive.handle(directive_config.pop(), ctx)
            return success(directive_config)

    return failure()

View File

@ -0,0 +1,11 @@
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
[same-as-file]: <> (reference.cpp)
```cpp
#include<iostream>
int main() {
cout<<"Hello World";
return 0;
}
```

View File

@ -0,0 +1,8 @@
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}

View File

@ -0,0 +1,7 @@
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
[same-as-file]: <> (reference.cpp)
```cpp
#include<iostream>
```

View File

@ -0,0 +1,8 @@
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}

View File

@ -0,0 +1,13 @@
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
[same-as-file]: <> (reference.cpp)
```cpp
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}
```

View File

@ -0,0 +1,8 @@
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}

View File

@ -0,0 +1,15 @@
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
[same-as-file]: <> ({"ref": "reference.cpp", "skip-doc": 2, "skip-ref": 3})
```cpp
// First line unique to documentation
// Second line unique to documentation
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}
```

View File

@ -0,0 +1,11 @@
// First line unique to reference
// Second line unique to reference
// Third line unique to reference
#include<iostream>
using namespace std;
int main() {
cout<<"Hello World";
return 0;
}

View File

@ -0,0 +1,44 @@
# ===------- test-same-as-file.py - Test for same-as-file directive -------===//
#
# Copyright 2019-2020 The IBM Research Authors.
#
# =============================================================================
#
# ===----------------------------------------------------------------------===//
import unittest
import os
import sys

# Make common utilities visible by adding them to system paths.
test_dir = os.path.dirname(os.path.realpath(__file__))
doc_check_base_dir = os.path.abspath(os.path.join(test_dir, os.pardir))
print(doc_check_base_dir)
sys.path.append(doc_check_base_dir)

import check


class TestStringMethods(unittest.TestCase):
    def test_basic(self):
        check.main('./same-as-file/simple/', [])

    def test_different(self):
        with self.assertRaises(ValueError) as context:
            check.main("./same-as-file/error-doc-different-from-ref/", [])
        self.assertTrue('Check failed because doc file content is not the same as that of reference file.' in str(
            context.exception))

    def test_doc_shorter_than_ref(self):
        # check.main('./same-as-file/error-doc-shorter-than-ref/', [])
        with self.assertRaises(ValueError) as context:
            check.main('./same-as-file/error-doc-shorter-than-ref/', [])
        self.assertTrue('Check failed because doc file is shorter than reference file.' in str(
            context.exception))

    def test_skip_doc_ref(self):
        check.main('./same-as-file/skip-doc-ref/', [])


if __name__ == '__main__':
    unittest.main()

91
doc/doc_check/utils.py Normal file
View File

@ -0,0 +1,91 @@
# ===-------------------------- utils.py - Utility ------------------------===//
#
# Copyright 2019-2020 The IBM Research Authors.
#
# =============================================================================
#
# ===----------------------------------------------------------------------===//
import logging
import os


# Based on https://stackoverflow.com/a/6367075.
class WrappedFile(object):
    """
    Complements the standard python file object in two ways:
    - Implements a line number based EOF checker.
    - Allows retrieving the current line position of the cursor, which is
      good for error localization.
    """

    def __init__(self, f):
        self.f = f
        self.line = 0

        # Compute the total number of lines.
        self.num_lines = len(f.readlines())
        f.seek(0)

    def close(self):
        return self.f.close()

    def readline(self):
        self.line += 1
        return self.f.readline()

    def next_non_empty_line(self):
        while not self.eof():
            line = self.readline()
            if len(line.strip()):
                return line
        raise RuntimeError("End of file.")

    def skip_lines(self, num_lines):
        for i in range(num_lines):
            self.readline()

    def eof(self):
        return self.line >= self.num_lines

    # to allow using in 'with' statements
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()


class DocCheckerCtx(object):
    def __init__(self, root_dir: str):
        self.root_dir = root_dir
        self.doc_file = None

    def open_doc(self, file_name):
        self.doc_file = WrappedFile(open(file_name, 'r'))
        return self.doc_file

    def doc_file_ext(self):
        assert self.doc_file is not None, "hasn't opened any doc file"
        _, file_extension = os.path.splitext(self.doc_file.f.name)
        return file_extension


def success(states=None):
    return "ok", states


def failure(states=None):
    return "failed", states


def succeeded(states):
    return states[0] == "ok"


def setup_logger(name):
    handler = logging.StreamHandler()
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger

10
utils/install-onnf.sh Normal file
View File

@ -0,0 +1,10 @@
# Export environment variables pointing to LLVM-Projects.
export LLVM_PROJ_SRC=$(pwd)/llvm-project/
export LLVM_PROJ_BUILD=$(pwd)/llvm-project/build
mkdir ONNF/build && cd ONNF/build
cmake ..
cmake --build . --target onnf
# Run FileCheck tests:
cmake --build . --target check-mlir-lit