The boundary_conditions process in hazard3_frontend needs to be
scheduled at least twice to resolve to the correct values. There are
multiple possible interleavings, which should all result in the same
result. However Verilator schedules the process only once.
Work around this by moving the tie-off of the problematic variable into
the synchronous update process.
While NA4 and NAPOT are the only "naturally aligned" addressing modes
in the RISC-V PMP (Privileged) Spec, calling their support out by
name, and clearly stating that the TOR addressing mode is not
supported, can clarify this fact for software / OS developers.
This is a common point of confusion and frustration when porting to
new RISC-V chips and so increased visbility of this limitation in the
documentation and README might help.
* jalr-01 uses 'la x0' which is rejected as invalid by recent binutils
* cebreak-01 miscompares due to hardwired mtval (common upstream test issue)
Correct fix is blocked on bringing up the latest riscv-arch-test build
system for hazard3 (seems to have been completely rewritten again)
Fill out port definitions for the AHB5 interfaces.
Snip the appendix of instruction pseudocode as it's strictly redundant
vs the specs. No need to pad this document.
Rearrange the implementation section to put ports before parameters, and
add some brief notes on synthesis.
OR-of-ANDs style mux is used because it maps well on FPGA (particularly
this 3-input mux maps straight onto LUT6) and because this allows the
zeroing of x0 to be implemented directly in the mux
This was done because the implementation depends on the value of PC
(it's a jump-to-next), and PC-dependent instructions are permitted to
be flagged as invalid in debug mode, to permit sharing of PC and the
dpc CSR.
However this is not valid in this case because the dependency on PC is
an implementation detail, not an architected dependency. Instead just
suppress the jump in debug mode. Suppressing the jump is still required
to avoid flushing following program buffer entries from the prefetch
queue during debug mode execution.
From a functional point of view not much has changed, it just removes
an inconsistency where fence.i appeared to be implemented in M/U mode
but not in debug mode. This removes a complaint from openocd when it
executes a fence + fence.i after writing to memory.
Decode is now split into a block which depends only on the instruction
bits, and a block which gates critical decode signals based on fetch
faults, invalidity etc.
Apply a similar transform to the gating of the uop counter update.
cxxrtl performance seems unchanged after removing the event loops, but
verilator and live-scheduled simulators should improve.