This was done because the implementation depends on the value of PC
(it's a jump-to-next), and PC-dependent instructions are permitted to
be flagged as invalid in debug mode, to permit sharing of PC and the
dpc CSR.
However, this is not valid here because the dependency on PC is an
implementation detail, not an architected dependency. Instead, just
suppress the jump in debug mode. Suppressing the jump is still required
to avoid flushing the following program buffer entries from the
prefetch queue during debug mode execution.
Functionally, not much has changed: this just removes an inconsistency
where fence.i appeared to be implemented in M/U mode but not in debug
mode. It also removes a complaint from OpenOCD when it executes a
fence + fence.i after writing to memory.
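
A minimal behavioural sketch of the change, in Python rather than RTL.
The names FenceIDecode, invalid and jump_to_next are hypothetical and
do not correspond to actual signals in the core; the sketch only
restates the before/after behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceIDecode:
    invalid: bool       # decoded as an illegal instruction
    jump_to_next: bool  # flush prefetch and refetch from the next instruction


def decode_fence_i_before(debug_mode: bool) -> FenceIDecode:
    # Old behaviour: fence.i is treated as PC-dependent, so it is
    # flagged invalid in debug mode.
    return FenceIDecode(invalid=debug_mode, jump_to_next=not debug_mode)


def decode_fence_i_after(debug_mode: bool) -> FenceIDecode:
    # New behaviour: fence.i is always valid; only the jump-to-next is
    # suppressed in debug mode so program buffer entries are not flushed.
    return FenceIDecode(invalid=False, jump_to_next=not debug_mode)
```

Outside debug mode both functions return the same result, which is the
"not much has changed" part; in debug mode only the invalid flag
differs.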
Decode is now split into a block which depends only on the instruction
bits, and a block which gates critical decode signals based on fetch
faults, invalidity, etc.
Apply a similar transform to the gating of the uop counter update.
cxxrtl performance seems unchanged after removing the event loops, but
verilator and live-scheduled simulators should improve.
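
The shape of the split can be sketched as two functions: one that is a
pure function of the instruction bits, and one that only gates its
outputs. This is an illustrative Python model, not the RTL; the
Controls fields and function names are hypothetical, and only the
standard RV32I major opcodes are assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    reg_write: bool
    mem_access: bool
    branch: bool


def decode_bits(instr: int) -> Controls:
    # Stage 1: depends only on the instruction bits.
    opcode = instr & 0x7F
    return Controls(
        reg_write=opcode in (0x33, 0x13, 0x03, 0x37, 0x17, 0x6F, 0x67),
        mem_access=opcode in (0x03, 0x23),       # LOAD, STORE
        branch=opcode in (0x63, 0x6F, 0x67),     # BRANCH, JAL, JALR
    )


def gate(ctrl: Controls, fetch_fault: bool, invalid: bool) -> Controls:
    # Stage 2: force the critical signals inactive on a fetch fault or
    # an invalid instruction; otherwise pass stage 1 through unchanged.
    if fetch_fault or invalid:
        return Controls(reg_write=False, mem_access=False, branch=False)
    return ctrl
```

For example, gate(decode_bits(0x00002003), fetch_fault=True,
invalid=False) clears mem_access for a faulting lw, while stage 1 on
its own never looks at the fault.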
There was one genuine issue introduced by PPA changes in 78a5cb98e which
affected instruction injection on multiple harts from the DM (indicating
that SMP debug testing needs to be part of the regular automated
regressions, rather than semi-manual...). The rest are cosmetic.
Add a default case to the DM acmd state machine. Also remove unnecessary clear
of JTAG DR shifter on TAP reset state, which saves a bit of logic. Two
width mismatches are left unfixed in the DTM (the ones with shifts)
because they bizarrely increase area by 100 LUT4s when fixed.
Extend umode_wfi testcase to cover this, and in particular to check
that when entering U-mode with IRQs pending, the IRQs execute before any
exceptions occurring as a result of the U-mode instructions.
None of the upstream tests used for Hazard3 seem to cover X != R. The
Hazard3 tests covered this case, but the header file for the tests has
the same mistake. Fix the header.
Interrupting the PC-setting step of a cm.popret (only) can sample the return target
as the exception return PC, which causes the stack pointer adjust to be skipped
when returning from the IRQ. Fix this by making the PC-setting step uninterruptible.
Note the PC-setting step is the instruction we execute first out of the group
of instructions specified in the Zc spec as being atomic with respect to
interrupts. This does not itself imply that the PC-setting step is uninterruptible;
it just requires that when the PC-setting step retires, all following steps also
retire. However, this is not sufficient given the special-case logic that allows the
jr ra PC-setting step to execute before the final stack adjust as an optimisation.
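
The failure can be illustrated with a toy Python model of the tail of
the sequence. Everything here is an assumption made for illustration:
the three-step UOPS list, the constants, and the function name are
hypothetical, and the model only covers an IRQ arriving on the jr ra
step.

```python
RETURN_TARGET = 0x200  # hypothetical value loaded into ra
STACK_ADJ = 16         # hypothetical stack adjustment in bytes

# jr ra is issued before the final stack adjust, as described above.
UOPS = ("lw ra", "jr ra", "addi sp")


def run_popret(irq_on_jr: bool, jr_interruptible: bool):
    """Return (pc, sp offset) seen by the code at the return target."""
    sp = 0
    for uop in UOPS:
        if uop == "jr ra" and irq_on_jr and jr_interruptible:
            # The IRQ samples the jump target as its return PC, so
            # execution resumes at the target and the remaining steps
            # of the sequence never run.
            return RETURN_TARGET, sp
        if uop == "addi sp":
            sp += STACK_ADJ
    return RETURN_TARGET, sp
```

With jr_interruptible=True and an IRQ on the jr ra step, the model
returns an sp offset of 0 (the adjust was skipped); with
jr_interruptible=False it returns 16, the same as the no-IRQ case.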