Hazard3/doc/sections/debug.adoc

== Debug

Hazard3, along with its external debug components, implements version 0.13.2 of the RISC-V debug specification. The goals of this implementation are:

* Minimal impact on core timing when present
* No external components which need integrating at the other end of your bus fabric -- just slap the Debug Module onto the core and away you go
* Efficient block data transfers to target RAM for a faster edit-compile-run cycle

Hazard3's debug support implements the following:

* Run/halt/reset control as required
* Abstract GPR access as required
* Program Buffer, 2 words plus `impebreak`
* Automatic trigger of abstract command (`abstractauto`) on `data0` or Program Buffer access for efficient memory block transfers from the host
* (TODO) Some minimum useful trigger unit -- likely just breakpoints, no watchpoints

The DM can inject instructions directly into the core's instruction prefetch buffer. This mechanism is used to execute the Program Buffer, or used directly by the DM, issuing hardcoded instructions to manipulate core state.

The DM's `data0` register is exposed to the core as a debug mode CSR. By issuing instructions to make the core read or write this dummy CSR, the DM can exchange data with the core. To read from a GPR `x` into `data0`, the DM issues a `csrw data0, x` instruction. Similarly, `csrr x, data0` writes `data0` to that GPR. The DM always follows the CSR instruction with an `ebreak`, just like the implicit `ebreak` at the end of the Program Buffer, so that the core notifies it when the GPR access sequence completes.
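
As an illustration of the instruction pairs described above, the following sketch encodes them with the standard RV32I/Zicsr formats, using the `0x7b2` CSR address given under Core behaviour. The function names are hypothetical, and the exact instruction words hardcoded in the DM are not stated in this document, so treat the output as what a standard encoding would produce rather than as a dump of the hardware.

```python
# Sketch only: standard RISC-V encodings, not taken from the DM source.
DATA0_CSR = 0x7B2       # dummy CSR address of data0 (the dscratch0 slot)
OPCODE_SYSTEM = 0x73
EBREAK = 0x00100073

def encode_csr_op(funct3, csr, rd, rs1):
    """Encode a register-form Zicsr instruction (csrrw/csrrs/csrrc)."""
    return (csr << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OPCODE_SYSTEM

def gpr_to_data0(gpr):
    """`csrw data0, x<gpr>` (i.e. csrrw x0, data0, x<gpr>), then ebreak."""
    return [encode_csr_op(0b001, DATA0_CSR, 0, gpr), EBREAK]

def data0_to_gpr(gpr):
    """`csrr x<gpr>, data0` (i.e. csrrs x<gpr>, data0, x0), then ebreak."""
    return [encode_csr_op(0b010, DATA0_CSR, gpr, 0), EBREAK]

print([hex(w) for w in gpr_to_data0(5)])   # read x5 into data0
print([hex(w) for w in data0_to_gpr(5)])   # write data0 into x5
```

Both sequences are two instructions long, so either fits the same injection path as a 2-word Program Buffer followed by its implicit `ebreak`.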

The debug host must use the Program Buffer to access CSRs and memory. This carries some overhead for individual accesses, but is efficient for bulk transfers: the `abstractauto` feature allows the DM to trigger the Program Buffer and/or a GPR transfer automatically following every `data0` access, which can be used for e.g. autoincrementing read/write memory bursts. Program Buffer reads and writes can also be used as `abstractauto` triggers: this is less useful than the `data0` trigger, but takes little extra effort to implement, and can be used to read/write a large number of CSRs efficiently.
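
A host-side autoincrementing write burst might look like the sketch below. It only builds the list of DMI writes, using the register addresses and `command` field layout from version 0.13.2 of the debug specification. The choice of `s0`/`s1` as scratch registers and the function name `burst_write` are assumptions for illustration. A real host would also save and restore the clobbered GPRs and poll `abstractcs` for `busy` and `cmderr`, which is omitted here.

```python
# DMI addresses and command fields per RISC-V debug spec 0.13.2.
DATA0, COMMAND, ABSTRACTAUTO = 0x04, 0x17, 0x18
PROGBUF0, PROGBUF1 = 0x20, 0x21

AARSIZE_32 = 2 << 20   # 32-bit register access
POSTEXEC = 1 << 18     # run the Program Buffer after the transfer
TRANSFER = 1 << 17
WRITE = 1 << 16        # copy data0 into the register
REGNO_S0 = 0x1008      # GPR x8, assumed scratch: holds the address
REGNO_S1 = 0x1009      # GPR x9, assumed scratch: holds the data word

SW_S1_0_S0 = 0x00942023    # sw   s1, 0(s0)
ADDI_S0_S0_4 = 0x00440413  # addi s0, s0, 4

def burst_write(addr, words):
    """Return the (dmi_address, value) writes for a memory write burst."""
    if not words:
        return []
    ops = [
        (PROGBUF0, SW_S1_0_S0),
        (PROGBUF1, ADDI_S0_S0_4),
        (DATA0, addr),
        (COMMAND, AARSIZE_32 | TRANSFER | WRITE | REGNO_S0),
        (DATA0, words[0]),
        (COMMAND, AARSIZE_32 | POSTEXEC | TRANSFER | WRITE | REGNO_S1),
        (ABSTRACTAUTO, 1),  # autoexecdata[0]: rerun command on data0 access
    ]
    # From here on, each data0 write stores one word and bumps the pointer.
    ops += [(DATA0, w) for w in words[1:]]
    ops.append((ABSTRACTAUTO, 0))
    return ops

for dmi_addr, value in burst_write(0x20000000, [0x11, 0x22, 0x33]):
    print(f"dmi[{dmi_addr:#04x}] <= {value:#010x}")
```

After the setup writes, the steady-state cost is a single DMI write per 32-bit word, which is where the bulk-transfer efficiency comes from.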

Abstract memory access is not implemented because it offers no better throughput than Program Buffer execution with `abstractauto` for bulk transfers, and non-bulk transfers are still instantaneous from the perspective of the human at the other end of the wire.

=== Implementation-defined behaviour

Features implemented by DM (beyond the mandatory):

* Halt-on-reset, selectable per-hart
* Program Buffer, size 2 words, `impebreak` = 1
* A single data register (`data0`) is implemented as a per-hart CSR accessible by the DM
* `abstractauto` is supported on the `data0` register
* Up to 32 harts selectable via `hartsel`

Not implemented:

* Hart array mask selection
* Abstract access memory
* Abstract access CSR
* Post-incrementing abstract access GPR
* System bus access

Core behaviour:

* Branch, `jal`, `jalr` and `auipc` are illegal in debug mode, because they observe PC: attempting to execute them halts Program Buffer execution and reports an exception in `abstractcs.cmderr`
* The `dret` instruction is not implemented (a special purpose DM-to-core signal is used to signal resume)
* The `dscratch` CSRs are not implemented
* External `data0` register is exposed as a dummy CSR mapped at `0x7b2` (the location of `dscratch0`), readable and writable by the DM
** This is a debug mode CSR, so raises an illegal instruction exception when accessed in machine mode
** The DM ignores writes unless it is currently executing an abstract command on this core (`hartsel` = this core, `abstractcs.busy` = 1)
* `dcsr.stepie` is hardwired to 0 (no interrupts during single stepping)
* `dcsr.stopcount` and `dcsr.stoptime` are hardwired to 1 (no counter or internal timer increment in debug mode)
* `dcsr.mprven` is hardwired to 0
* `dcsr.prv` is hardwired to 3 (M-mode)

=== UART DTM

Hazard3 defines a minimal UART Debug Transport Module, which allows the Debug Module to be accessed via a standard 8n1 asynchronous serial port. The UART DTM is always accessed by the host using a two-wire serial interface (TXD and RXD) running at 1 Mbaud. The interface between the DTM and DM is an AMBA 3 APB port with a 32-bit data bus and 8-bit address bus.

This is a quick hack, and not suitable for production systems:

* Debug hardware should not expect a frequency reference for a UART to be present
* The UART DTM does not implement any flow control or error detection/correction

The host may send the following commands:

[cols="20h,~,~", options="header"]
|===
| Command | To DTM | From DTM
| `0x00` NOP | - | -
| `0x01` Read ID | - | 4-byte ID, same format as JTAG-DTM ID (JEP106-compatible)
| `0x02` Read DMI | 1 address byte | 4 data bytes
| `0x03` Write DMI | 1 address byte, 4 data bytes | data bytes echoed back
| `0xa5` Disconnect | - | -
|===

Initially after power-on the DTM is in the Dormant state, and will ignore any commands. The host sends the magic sequence `"SUP?"` (`0x53, 0x55, 0x50, 0x3f`) to wake the DTM, and then issues a Read ID command to check the link is up. The DTM can be returned to the Dormant state at any time using the `0xa5` Disconnect command.
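
The command table and wake-up sequence can be turned into host-side byte frames along the following lines. This is a sketch: the function names are hypothetical, and the byte order of the 4 data bytes is not specified in this section, so little-endian (least-significant byte first) is an assumption that should be checked against the DTM implementation.

```python
import struct

WAKE = b"SUP?"          # 0x53, 0x55, 0x50, 0x3f
CMD_NOP = 0x00
CMD_READ_ID = 0x01
CMD_READ_DMI = 0x02
CMD_WRITE_DMI = 0x03
CMD_DISCONNECT = 0xA5

def read_id():
    """Read ID: command byte only; the DTM answers with 4 bytes."""
    return bytes([CMD_READ_ID])

def read_dmi(addr):
    """Read DMI: command byte plus 1 address byte."""
    return bytes([CMD_READ_DMI, addr & 0xFF])

def write_dmi(addr, data):
    """Write DMI: command, 1 address byte, 4 data bytes.

    ASSUMPTION: data is sent little-endian; the source does not say.
    """
    return bytes([CMD_WRITE_DMI, addr & 0xFF]) + struct.pack("<I", data)

def connect():
    """Wake the DTM from Dormant, then probe the link with Read ID."""
    return WAKE + read_id()

print(connect().hex())               # wake sequence followed by Read ID
print(write_dmi(0x10, 1).hex())      # 6-byte Write DMI frame
```

The resulting byte strings can be written to any serial port opened at 1 Mbaud, 8n1.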

So that the host can queue up batches of commands in its transmit buffer, without overrunning the DTM's transmit bandwidth, it's recommended to pad each command with NOPs so that it is strictly larger than the response. For example, a Read ID (1 byte sent, 4 bytes returned) should be followed by four NOPs, and a Read DMI (2 bytes sent, 4 bytes returned) should be followed by three NOPs.
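
The padding rule reduces to a small calculation over the command and response sizes from the table above. The helper names below are hypothetical, and the sketch only restates the rule that the padded command must be strictly longer than its response.

```python
# (bytes sent by host, bytes returned by DTM), per the command table.
SIZES = {
    "nop": (1, 0),
    "read_id": (1, 4),
    "read_dmi": (2, 4),
    "write_dmi": (6, 4),
    "disconnect": (1, 0),
}

def nops_needed(name):
    """NOPs to append so the padded command is strictly longer than the response."""
    sent, returned = SIZES[name]
    return max(0, returned + 1 - sent)

def pad(frame, name):
    """Append the required NOP bytes (0x00) to an already-built frame."""
    return frame + bytes(nops_needed(name))

for name in ("read_id", "read_dmi", "write_dmi"):
    print(name, nops_needed(name))
```

Write DMI needs no padding because its 6-byte command already exceeds its 4-byte echo.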

To recover command framing, write 6 NOP commands (the length of the longest command, Write DMI). This will be interpreted as between 1 and 6 NOPs depending on the DTM's state.
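
To see why six zero bytes are always enough, consider a toy model of the DTM's command framing, in which each command byte is followed by a fixed number of operand bytes. The model and its names are hypothetical, and it ignores responses and the Dormant state. Its treatment of unknown command bytes is an assumption.

```python
# Operand bytes that follow each command byte, per the command table.
OPERAND_BYTES = {0x00: 0, 0x01: 0, 0x02: 1, 0x03: 5, 0xA5: 0}

def nops_seen(pending_operands, stream):
    """Count bytes decoded as NOP commands.

    pending_operands is how many operand bytes the DTM is still waiting
    for when the stream starts (0 means framing is already aligned).
    """
    nops = 0
    for byte in stream:
        if pending_operands:
            pending_operands -= 1          # swallowed as address/data
        elif byte == 0x00:
            nops += 1
        else:
            # ASSUMPTION: unknown command bytes take no operands.
            pending_operands = OPERAND_BYTES.get(byte, 0)
    return nops

counts = [nops_seen(p, bytes(6)) for p in range(6)]
print(counts)  # [6, 5, 4, 3, 2, 1]
```

In the worst case the DTM has just received a Write DMI command byte and swallows five of the zeros as operands, so the sixth is still decoded as a NOP and framing is aligned again.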

This interface assumes the DMI data transfer takes very little time compared with the UART access (typically less than one baud period). As long as NOP padding keeps the host-to-DTM bandwidth greater than the DTM-to-host bandwidth, the host can queue up batches of commands in its transmit buffer without overrunning the DTM's response channel. The 1 Mbaud 8n1 UART link then provides 67 kB/s of half-duplex data bandwidth between host and DM, which is enough to get your system off the ground.
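
One way to arrive at the 67 kB/s figure is to assume it refers to back-to-back Write DMI commands, where 4 of every 6 bytes on the wire are payload. That interpretation is an assumption rather than something this section states. The arithmetic is sketched below.

```python
BAUD = 1_000_000
BITS_PER_BYTE = 10                 # 8n1: start bit + 8 data bits + stop bit
WIRE_BYTES_PER_S = BAUD / BITS_PER_BYTE   # 100000 bytes/s in each direction

def payload_rate(frame_bytes, data_bytes=4):
    """Payload bytes/s when every frame_bytes on the wire carry data_bytes."""
    return WIRE_BYTES_PER_S * data_bytes / frame_bytes

write_rate = payload_rate(6)       # Write DMI: 1 cmd + 1 addr + 4 data
read_rate = payload_rate(5)        # Read DMI: 1 cmd + 1 addr + 3 NOPs

print(f"write: {write_rate / 1000:.1f} kB/s")
print(f"read:  {read_rate / 1000:.1f} kB/s")
```

Under the same assumptions, padded Read DMI commands are limited by the 5 bytes the host sends per 4 bytes returned.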