Update embench config and readme

2023-03-31 03:02:06 +01:00 · 2023-03-31 03:02:06 +01:00 · 86fc4e3f2d
parent ca40c077be
commit 86fc4e3f2d
3 changed files with 56 additions and 5 deletions
--- a/Readme.md
+++ b/Readme.md
@@ -87,6 +87,11 @@ This build will also install an appropriate gdb as `riscv32-unknown-elf-gdb`.
 The `--with-multilib-generator=` flag builds multiple versions of the standard library, to match possible `-march` flags provided at link time. If there is no _exact_ match, the linker falls back to the architecture specified by the `--with-arch` flag, which in this case is the fairly conservative RV32IA. This will become worse with GCC 12, where for example the CSR instructions have moved from `I` to `Zicsr`, and the entire arch string must still be matched to get the non-fallback library.
+Make sure this toolchain can be found on your `PATH` (as `riscv32-unknown-elf-*`):
+```bash
+export PATH="$PATH:/opt/riscv/bin"
+```
 ## Actually Running Hello World
--- a/test/sim/embench/Readme.md
+++ b/test/sim/embench/Readme.md
@@ -7,10 +7,56 @@ To run these benchmarks, first make sure the embench-iot submodule is checked ou
 cd embench-iot
 # Make sure testbench is up to date
 make -C ../../tb_cxxrtl tb
-./build_all.py --arch riscv32 --chip hazard3 --board hazard3tb
+./build_all.py --arch riscv32 --chip hazard3 --board hazard3tb --clean
-./benchmark_speed.py --target-module run_hazard3tb --timeout 120
+# You might need a longer timeout -- some sims are tens of millions of cycles
+./benchmark_speed.py --target-module run_hazard3tb --timeout 180 --sim-parallel
 ```
-The compiler specified in `config/riscv32/chips/hazard3/chip.cfg` is `/opt/riscv/unstable/bin/riscv32-unknown-elf-gcc`, which is where I have an unstable GCC 12 build installed on my machine. You need to have a recent upstream master build to support the Zba/Zbb/Zbc/Zbs instructions. If you don't care about these, you can use whatever `riscv32-unknown-elf` compiler you have, and also edit `cflags` in that `.cfg` file to not include the bitmanip extensions in `march`.
+The compiler specified in `config/riscv32/chips/hazard3/chip.cfg` is `riscv32-unknown-elf-gcc` (no directory prefix -- just whichever is on PATH). On my machine this is currently GCC12, which is the first stable release to have support for the bitmanip instructions. These instructions seem to be worth about 6% performance overall -- you can disable them in the `chip.cfg` file if your local gcc12 doesn't support them.
-You can also add `--sim-parallel` to the `benchmark_speed` for faster simulations.
+If you want to play with the Zcb/Zcmp instructions, these currently (March 2023) require the CORE-V development toolchains, and you will have to edit the `chip.cfg` accordingly.
+## Caution on stdlib/multilib
+Some of the Embench benchmarks are essentially benchmarks of the soft float routines in `libm`, which respond well to the bitmanip instructions. I have a rather sprawling multilib setup which ensures I get Zb instructions in my stdlib when I link with Zb in my `-march`. However, depending on the default architecture your toolchain was configured with, you might find that it falls back to something incredibly conservative like RV32I-only if there is no exact multilib match.
+To give you an idea, here the left hand side is the speed benchmark result for `-O2 -ffunction-sections` on GCC 12 with `-march=rv32im_zba_zbb_zbs` and an RV32I stdlib, and the right hand side is with the correct standard library:
+```
+RV32IMZbaZbbZbs with RV32I library:         With correct standard library:
+Benchmark           Speed                    Benchmark           Speed
+---------           -----                    ---------           -----
+aha-mont64         0.8323                    aha-mont64         0.8318
+crc32              0.9209                    crc32              0.9593
+cubic              0.1346                    cubic              0.5378
+edn                1.1087                    edn                1.1460
+huffbench          1.5361                    huffbench          1.6016
+matmult-int        1.2319                    matmult-int        1.2308
+minver             0.2693                    minver             0.7535
+nbody              0.1901                    nbody              0.8496
+nettle-aes         0.8800                    nettle-aes         1.0828
+nettle-sha256      0.9981                    nettle-sha256      1.3922
+nsichneu           1.1674                    nsichneu           1.1674
+picojpeg           0.9904                    picojpeg           1.0812
+primecount         1.5352                    primecount         1.3765
+qrduino            1.3538                    qrduino            1.3717
+sglib-combined     1.2075                    sglib-combined     1.2749
+slre               1.3740                    slre               1.4178
+st                 0.2250                    st                 1.0011
+statemate          2.2047                    statemate          2.1999
+tarfind            2.2567                    tarfind            2.2568
+ud                 0.9269                    ud                 1.0073
+wikisort           0.7946                    wikisort           1.9289
+---------           -----                    ---------           -----
+Geometric mean     0.8488                    Geometric mean     1.1905
+Geometric SD       2.1454                    Geometric SD       1.4039
+Geometric range    1.4254                    Geometric range    0.8233
+All benchmarks run successfully              All benchmarks run successfully
+```
+(Hazard3 configuration: all ISA options enabled, FAST_MUL=1, FASTER_MUL=1, FAST_MULH=1, BRANCH_PREDICTOR=1, MULDIV_UNROLL=2, REDUCED_BYPASS=0)
+Also note I don't run the MD5 benchmark because it tries to do a hosted printf inside of the benchmarked section (which is a known issue with Embench).
+I haven't yet compiled a full set of benchmark results for Hazard3, partly because it means coming up with a list of specimen configurations and then doing hardware characterisation on them. This is just a note in case you try to run Embench and wonder why the performance is through the floor.
--- a/test/sim/embench/embench-iot
+++ b/test/sim/embench/embench-iot
@@ -1 +1 @@
-Subproject commit d85ddd2e94f2bb6684a96359e6eade59b4f6339b
+Subproject commit d9b3a6f9491db0f7a8a9641c78b491e4fa97b860
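
Note on the multilib caution in `test/sim/embench/Readme.md`: one way to check whether a given `-march` gets its own multilib is to ask the GCC driver which multilib directory it selects. The sketch below is not part of this commit. It assumes the standard GCC `-print-multi-directory` option, which prints `.` when the default library is in use. The function name `check_multilib` and the default `-march`/`-mabi` values are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch: report which multilib directory a GCC driver selects.
# check_multilib and the default CC/MARCH/MABI values are illustrative assumptions.
check_multilib() {
    cc="$1"; march="$2"; mabi="$3"
    # Bail out early if the compiler cannot be found.
    if ! command -v "$cc" >/dev/null 2>&1; then
        echo "missing: $cc not found on PATH"
        return 1
    fi
    # GCC prints "." when it selects the default (non-multilib) library directory.
    dir="$("$cc" -march="$march" -mabi="$mabi" -print-multi-directory)"
    if [ "$dir" = "." ]; then
        echo "default library in use for $march/$mabi (no dedicated multilib selected)"
    else
        echo "match: $dir"
    fi
}

check_multilib "${CC:-riscv32-unknown-elf-gcc}" \
    "${MARCH:-rv32im_zba_zbb_zbs}" "${MABI:-ilp32}" || true
```

A `default library` result for a Zb `-march` suggests the link will pick up the conservative fallback library described in the Readme. Running `riscv32-unknown-elf-gcc -print-multi-lib` lists the multilib variants the toolchain was built with.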