Hazard3/doc/sections/debug.adoc

== Debug

Hazard3, along with its external debug components, implements version 0.13.2 of the RISC-V debug specification. It supports the following:

* Run/halt/reset control as required
* Abstract GPR access as required
* Program Buffer, 2 words plus `impebreak`
* Automatic trigger of abstract command (`abstractauto`) on `data0` or Program Buffer access for efficient memory block transfers from the host
* Support for multiple harts (multiple Hazard3 cores) connected to a single Debug Module (DM)
* The hart array mask registers, for applying run/halt/reset controls to multiple cores simultaneously
* (Optional) System Bus Access, either through a dedicated AHB5 manager interface, or multiplexed with a processor load/store port
* (Optional) An instruction address trigger unit (hardware breakpoints)

=== Debug Topologies

Hazard3's Debug Module (DM) has the following interfaces:

* An upstream AMBA 3 APB port -- the "Debug Module Interface" -- for host access to the Debug Module
* A downstream Hazard3-specific interface to one or more cores
* Some reset request/acknowledge signals which require careful handshaking with system-level reset logic

This is shown in the example topology below.

image::diagrams/debug_topology.png[pdfwidth=50%]

The DM _must_ be connected directly to the processors without intervening registers. This implies the DM is in the same clock domain as the processors, so multiple processors on the same DM must share a common clock.

Upstream of the DM is at least one Debug Transport Module, which bridges some host-facing interface such as JTAG to the APB DM Interface. Hazard3 provides an implementation of a standard RISC-V JTAG-DTM, but any APB master could be used. The DM requires at least 7 bits of word addressing, i.e. 9 bits of byte address space.

An APB arbiter could be inserted here, to allow multiple transports to be used, provided the host(s) avoid using multiple transports concurrently. This also admits simple implementation of self-hosted debug, by mapping the DM to a system-level peripheral address space.

The clock domain crossing (if any) occurs on the downstream port of the Debug Transport Module. Hazard3's JTAG-DTM implementation runs entirely in the TCK domain, and instantiates a bus clock-crossing module internally to bridge a TCK-domain internal APB bus to an external bus in the processor clock domain.

It is possible to instantiate multiple DMs, one per core, and attach them to a single Debug Transport Module. This is not the preferred topology, but it does allow multiple cores to be independently clocked. In this case, the first DM must be located at address `0x0` in the DMI address space, and you must set the `NEXT_DM_ADDR` parameter on each DM so that the debugger can walk the (null-terminated) linked list and discover all the DMs.

=== Implementation-defined behaviour

Features implemented by the Hazard3 Debug Module (beyond the mandatory):

* Halt-on-reset, selectable per-hart
* Program Buffer, size 2 words, `impebreak` = 1.
* A single data register (`data0`) is implemented as a per-hart CSR accessible by the DM
* `abstractauto` is supported on the `data0` register
* Up to 32 harts selectable via `hartsel`

Not implemented:

* Hart array mask selection
* Abstract access memory
* Abstract access CSR
* Post-incrementing abstract access GPR
* System bus access

The core behaves as follows:

* Branch, `jal`, `jalr` and `auipc` are illegal in debug mode, because they observe PC: attempting to execute will halt Program Buffer execution and report an exception in `abstractcs.cmderr`
* The `dret` instruction is not implemented (a special purpose DM-to-core signal is used to signal resume)
* The `dscratch` CSRs are not implemented
* The DM's `data0` register is mapped into the core as a CSR, <<reg-dmdata0>>, address `0xbff`.
** Raises an illegal instruction exception when accessed outside of Debug Mode
** The DM ignores attempted core writes to the CSR, unless the DM is currently executing an abstract command on that core
** Used by the DM to implement abstract GPR access, by injecting CSR read/write instructions
* `dcsr.stepie` is hardwired to 0 (no interrupts during single stepping)
* `dcsr.stopcount` and `dcsr.stoptime` are hardwired to 1 (no counter or internal timer increment in debug mode)
* `dcsr.mprven` is hardwired to 0
* `dcsr.prv` is hardwired to 3 (M-mode)

See also <<debug-csr-section>> for more details on the core-side Debug Mode registers.

The debug host must use the Program Buffer to access CSRs and memory. This carries some overhead for individual accesses, but is efficient for bulk transfers: the `abstractauto` feature allows the DM to trigger the Program Buffer and/or a GPR tranfer automatically following every `data0` access, which can be used for e.g. autoincrementing read/write memory bursts. Program Buffer read/writes can also be used as `abstractauto` triggers: this is less useful than the `data0` trigger, but takes little extra effort to implement, and can be used to read/write a large number of CSRs efficiently.

Abstract memory access is not implemented because, for bulk transfers, it offers no better throughput than Program Buffer execution with `abstractauto`. Non-bulk transfers, while slower, are still instantaneous from the perspective of the human at the other end of the wire.

The Hazard3 DM has experimental support for multi-core debug. Each core possesses exactly one hardware thread (hart) which is exposed to the debugger. The RISC-V specification does not mandate what mapping is used between the DM hart index `hartsel` and each core's `mhartid` CSR, but a 1:1 match of these values is the least likely to cause issues. Each core's `mhartid` can be configured using the `MHARTID_VAL` parameter during instantiation.

=== Debug Module to Core Interface

The DM can inject instructions directly into the core's instruction prefetch buffer. This mechanism is used to execute the Program Buffer, or used directly by the DM, issuing hardcoded instructions to manipulate core state.

The DM's `data0` register is exposed to the core as a debug mode CSR. By issuing instructions to make the core read or write this dummy CSR, the DM can exchange data with the core. To read from a GPR `x` into `data0`, the DM issues a `csrw data0, x` instruction. Similarly `csrr x, data0` will write `data0` to that GPR. The DM always follows the CSR instruction with an `ebreak`, just like the implicit `ebreak` at the end of the Program Buffer, so that it is notified by the core when the GPR read instruction sequence completes.
Add draft UART DTM 2021-07-09 00:57:46 +08:00			`== Debug`

Reorganise CSR section of docs 2021-12-02 09:35:18 +08:00			`Hazard3, along with its external debug components, implements version 0.13.2 of the RISC-V debug specification. It supports the following:`
Add draft UART DTM 2021-07-09 00:57:46 +08:00
			`* Run/halt/reset control as required`
			`* Abstract GPR access as required`
Associated doc updates 2021-07-11 01:53:59 +08:00			* Program Buffer, 2 words plus `impebreak`
Some doc updates 2021-07-17 19:58:08 +08:00			* Automatic trigger of abstract command (`abstractauto`) on `data0` or Program Buffer access for efficient memory block transfers from the host
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`* Support for multiple harts (multiple Hazard3 cores) connected to a single Debug Module (DM)`
			`* The hart array mask registers, for applying run/halt/reset controls to multiple cores simultaneously`
			`* (Optional) System Bus Access, either through a dedicated AHB5 manager interface, or multiplexed with a processor load/store port`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			`* (Optional) An instruction address trigger unit (hardware breakpoints)`
Add draft UART DTM 2021-07-09 00:57:46 +08:00
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00			`=== Debug Topologies`

Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`Hazard3's Debug Module (DM) has the following interfaces:`
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00
			`* An upstream AMBA 3 APB port -- the "Debug Module Interface" -- for host access to the Debug Module`
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`* A downstream Hazard3-specific interface to one or more cores`
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00			`* Some reset request/acknowledge signals which require careful handshaking with system-level reset logic`

Reorganise CSR section of docs 2021-12-02 09:35:18 +08:00			`This is shown in the example topology below.`
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00
			`image::diagrams/debug_topology.png[pdfwidth=50%]`

Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`The DM _must_ be connected directly to the processors without intervening registers. This implies the DM is in the same clock domain as the processors, so multiple processors on the same DM must share a common clock.`
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`Upstream of the DM is at least one Debug Transport Module, which bridges some host-facing interface such as JTAG to the APB DM Interface. Hazard3 provides an implementation of a standard RISC-V JTAG-DTM, but any APB master could be used. The DM requires at least 7 bits of word addressing, i.e. 9 bits of byte address space.`
Reorganise CSR section of docs 2021-12-02 09:35:18 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`An APB arbiter could be inserted here, to allow multiple transports to be used, provided the host(s) avoid using multiple transports concurrently. This also admits simple implementation of self-hosted debug, by mapping the DM to a system-level peripheral address space.`
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00
			`The clock domain crossing (if any) occurs on the downstream port of the Debug Transport Module. Hazard3's JTAG-DTM implementation runs entirely in the TCK domain, and instantiates a bus clock-crossing module internally to bridge a TCK-domain internal APB bus to an external bus in the processor clock domain.`

Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			It is possible to instantiate multiple DMs, one per core, and attach them to a single Debug Transport Module. This is not the preferred topology, but it does allow multiple cores to be independently clocked. In this case, the first DM must be located at address `0x0` in the DMI address space, and you must set the `NEXT_DM_ADDR` parameter on each DM so that the debugger can walk the (null-terminated) linked list and discover all the DMs.
Describe possible debug topologies. Update pdf. 2021-11-28 17:01:23 +08:00
Sync doc updates 2021-07-13 05:13:31 +08:00			`=== Implementation-defined behaviour`
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00
Reorganise CSR section of docs 2021-12-02 09:35:18 +08:00			`Features implemented by the Hazard3 Debug Module (beyond the mandatory):`
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00
Sync doc updates 2021-07-13 05:13:31 +08:00			`* Halt-on-reset, selectable per-hart`
			* Program Buffer, size 2 words, `impebreak` = 1.
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00			* A single data register (`data0`) is implemented as a per-hart CSR accessible by the DM
Sync doc updates 2021-07-13 05:13:31 +08:00			* `abstractauto` is supported on the `data0` register
			* Up to 32 harts selectable via `hartsel`

			`Not implemented:`

			`* Hart array mask selection`
			`* Abstract access memory`
			`* Abstract access CSR`
			`* Post-incrementing abstract access GPR`
			`* System bus access`
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00
Reorganise CSR section of docs 2021-12-02 09:35:18 +08:00			`The core behaves as follows:`
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00
Sync doc updates 2021-07-13 05:13:31 +08:00			* Branch, `jal`, `jalr` and `auipc` are illegal in debug mode, because they observe PC: attempting to execute will halt Program Buffer execution and report an exception in `abstractcs.cmderr`
			* The `dret` instruction is not implemented (a special purpose DM-to-core signal is used to signal resume)
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00			* The `dscratch` CSRs are not implemented
More docs cleanup 2021-12-02 10:29:34 +08:00			* The DM's `data0` register is mapped into the core as a CSR, <<reg-dmdata0>>, address `0xbff`.
			`** Raises an illegal instruction exception when accessed outside of Debug Mode`
			`** The DM ignores attempted core writes to the CSR, unless the DM is currently executing an abstract command on that core`
			`** Used by the DM to implement abstract GPR access, by injecting CSR read/write instructions`
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00			* `dcsr.stepie` is hardwired to 0 (no interrupts during single stepping)
Associated doc updates 2021-07-11 01:53:59 +08:00			* `dcsr.stopcount` and `dcsr.stoptime` are hardwired to 1 (no counter or internal timer increment in debug mode)
Add Read ID command to UART DTM 2021-07-10 23:14:35 +08:00			* `dcsr.mprven` is hardwired to 0
			* `dcsr.prv` is hardwired to 3 (M-mode)

More docs cleanup 2021-12-02 10:29:34 +08:00			`See also <<debug-csr-section>> for more details on the core-side Debug Mode registers.`
Add draft UART DTM 2021-07-09 00:57:46 +08:00
Reorganise CSR section of docs 2021-12-02 09:35:18 +08:00			The debug host must use the Program Buffer to access CSRs and memory. This carries some overhead for individual accesses, but is efficient for bulk transfers: the `abstractauto` feature allows the DM to trigger the Program Buffer and/or a GPR tranfer automatically following every `data0` access, which can be used for e.g. autoincrementing read/write memory bursts. Program Buffer read/writes can also be used as `abstractauto` triggers: this is less useful than the `data0` trigger, but takes little extra effort to implement, and can be used to read/write a large number of CSRs efficiently.
Add draft UART DTM 2021-07-09 00:57:46 +08:00
More docs cleanup 2021-12-02 10:29:34 +08:00			Abstract memory access is not implemented because, for bulk transfers, it offers no better throughput than Program Buffer execution with `abstractauto`. Non-bulk transfers, while slower, are still instantaneous from the perspective of the human at the other end of the wire.
Add draft UART DTM 2021-07-09 00:57:46 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			The Hazard3 DM has experimental support for multi-core debug. Each core possesses exactly one hardware thread (hart) which is exposed to the debugger. The RISC-V specification does not mandate what mapping is used between the DM hart index `hartsel` and each core's `mhartid` CSR, but a 1:1 match of these values is the least likely to cause issues. Each core's `mhartid` can be configured using the `MHARTID_VAL` parameter during instantiation.
Add draft UART DTM 2021-07-09 00:57:46 +08:00
More docs cleanup 2021-12-02 10:29:34 +08:00			`=== Debug Module to Core Interface`

			`The DM can inject instructions directly into the core's instruction prefetch buffer. This mechanism is used to execute the Program Buffer, or used directly by the DM, issuing hardcoded instructions to manipulate core state.`

			The DM's `data0` register is exposed to the core as a debug mode CSR. By issuing instructions to make the core read or write this dummy CSR, the DM can exchange data with the core. To read from a GPR `x` into `data0`, the DM issues a `csrw data0, x` instruction. Similarly `csrr x, data0` will write `data0` to that GPR. The DM always follows the CSR instruction with an `ebreak`, just like the implicit `ebreak` at the end of the Program Buffer, so that it is notified by the core when the GPR read instruction sequence completes.