Hazard3/doc/sections/introduction.adoc

== Introduction

Hazard3 is a configurable 3-stage RISC-V processor, implementing:

* `RV32I`: 32-bit base instruction set
* `M`: integer multiply/divide/modulo
* `A`: atomic memory operations, with AHB5 global exclusives
* `C`: compressed instructions
* `Zicsr`: CSR access
* `Zba`: address generation
* `Zbb`: basic bit manipulation
* `Zbc`: carry-less multiplication
* `Zbs`: single-bit manipulation
* `Zbkb`: basic bit manipulation for scalar cryptography
* `Zcb`: basic additional compressed instructions
* `Zcmp`: push/pop and double-move compressed instructions
* Debug, Machine and User privilege/execution modes
* Privileged instructions `ECALL`, `EBREAK`, `MRET` and `WFI`
* External debug support
* Instruction address trigger unit (hardware breakpoints)

=== Architectural Overview

==== Pipeline Stages

The three stages are:

* `F`: Fetch
** Contains the data phase for instruction fetch
** Contains the instruction prefetch buffer
** Predecodes register numbers `rs1`/`rs2`, for faster register file read and register bypass
** Contains the address match logic for the optional branch predictor
* `X`: Execute
** Decodes and execute instructions
** Drives the address phase for load/store/AMO
** Generates jump/branch addresses
** Contains the read and write ports for the CSR file
** Unbypassed register values are available at the beginning of stage `X`
** The ALU result is valid by the end of stage `X`
* `M`: Memory
** Contains the data phase for load/store/AMO
** Generates exception addresses
** Register writeback is at the end of stage `M`

The instruction fetch address phase is best thought of as residing in stage `X`. The 2-cycle feedback loop between jump/branch decode into address issue in stage `X`, and the fetch data phase in stage `F`, is what defines Hazard3's jump/branch performance.

This document often refers to `F`, `X` and `M` as stages 1, 2 and 3 respectively. This numbering is useful when describing dependencies between values held in different pipeline stages, as it makes the direction and distance of the dependency more apparent.

==== Bus Interfaces

Hazard3 implements either one or two AHB5 bus manager ports. Use the single-port configuration when ease of integration is a priority, since it supports simpler bus topologies. The dual-port configuration adds a dedicated port for instruction fetch. Use the dual-port configuration for maximum frequency and the best clock-for-clock performance.

Hazard3 uses AHB5 specifically, rather than older versions of the AHB standard, because of its support for global exclusives. This is a bus feature that allows a processor to perform an ordered read-modify-write sequence with a guarantee that no other processor has written to the same address range in between. Hazard3 uses this to implement multiprocessor support for the A (atomics) extension. Single-processor support for the A extension does not require these additional signals.

AHB5 is one of the two protocols described in the https://documentation-service.arm.com/static/5f91607cf86e16515cdc3b4b[AMBA 5 AHB protocol specification]. Its full name is (perhaps surprisingly) AMBA 5 AHB5. Refer to the protocol specification for more information about this standard bus protocol.

==== Multiply/Divide

For minimal M-extension support, as enabled by <<param-EXTENSION_M>>, Hazard3 instantiates a sequential multiply/divide circuit (restoring divide, naive repeated-addition multiply). Instructions stall in stage `X` until the multiply/divide completes. Optionally, the circuit can be unrolled by a small factor to produce multiple bits ber clock. A throughput of one, two or four bits per cycle is achievable in practice, with the internal logic delay becoming quite significant at four.

Set <<param-MUL_FAST>> to instantiate the single-cycle multiplier circuit. The fast multiplier returns results either to stage 3 or stage 2, depending on the <<param-MUL_FASTER>> parameter.

By default the single-cycle multiplier only supports 32-bit `mul`, which is by far the most common of the four multiply instructions. The remaining instructions still execute on the sequential multiply/divide circuit. Set the <<param-MULH_FAST>> parameter to add single-cycle support for the high-half instructions (`mulh`, `mulhu` and `mulhsu`), at the cost of additional logic delay and area.

The single-cycle multiplier is implemented as a simple `*` behavioural multiply, so that your tools can infer the best multiply circuit for your platform. For example, Yosys infers DSP tiles on iCE40 UP5k FPGAs. The multiplier is a self-contained module (in `hdl/arith/hazard3_mul_fast.v`), so you can replace its implementation if you know of a faster or lower-area method for your platform.

// ** magic comment to reset sublime text asciidoc lexer

=== List of RISC-V Specifications

These are links to the ratified versions of the base instruction set and extensions implemented by Hazard3.

[%autowidth.stretch, options="header"]
|===
| Extension         | Specification
| `RV32I` v2.1      | https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
| `M` v2.0          | https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
| `A` v2.1          | https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
| `C` v2.0          | https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
| `Zicsr` v2.0      | https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
| `Zifencei` v2.0   | https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
| `Zba` v1.0.0      | https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
| `Zbb` v1.0.0      | https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
| `Zbc` v1.0.0      | https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
| `Zbs` v1.0.0      | https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
| `Zbkb` v1.0.1     | https://github.com/riscv/riscv-crypto/releases/download/v1.0.1-scalar/riscv-crypto-spec-scalar-v1.0.1.pdf[Scalar Cryptography ISA extensions 20220218]
| `Zcb` v1.0.3-1    | https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.3-1/Zc-v1.0.3-1.pdf[Code Size Reduction extensions frozen v1.0.3-1]
| `Zcmp` v1.0.3-1   | https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.3-1/Zc-v1.0.3-1.pdf[Code Size Reduction extensions frozen v1.0.3-1]
| Machine ISA v1.12 | https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf[Privileged Architecture 20211203]
| Debug v0.13.2     | https://riscv.org/wp-content/uploads/2019/03/riscv-debug-release.pdf[RISC-V External Debug Support 20190322]
|===
Document some IRQ CSRs, and instruction timings 2021-05-31 22:57:05 +08:00			`== Introduction`

Editing 2022-08-28 03:49:55 +08:00			`Hazard3 is a configurable 3-stage RISC-V processor, implementing:`
Document some IRQ CSRs, and instruction timings 2021-05-31 22:57:05 +08:00
			* `RV32I`: 32-bit base instruction set
Update docs with bitmanip instructions 2021-11-28 11:16:45 +08:00			* `M`: integer multiply/divide/modulo
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			* `A`: atomic memory operations, with AHB5 global exclusives
Update docs with bitmanip instructions 2021-11-28 11:16:45 +08:00			* `C`: compressed instructions
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			* `Zicsr`: CSR access
Update docs with bitmanip instructions 2021-11-28 11:16:45 +08:00			* `Zba`: address generation
			* `Zbb`: basic bit manipulation
			* `Zbc`: carry-less multiplication
			* `Zbs`: single-bit manipulation
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			* `Zbkb`: basic bit manipulation for scalar cryptography
List Zcb/Zcmp in docs, and rebuild PDF 2023-03-22 10:55:07 +08:00			* `Zcb`: basic additional compressed instructions
			* `Zcmp`: push/pop and double-move compressed instructions
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			`* Debug, Machine and User privilege/execution modes`
			* Privileged instructions `ECALL`, `EBREAK`, `MRET` and `WFI`
			`* External debug support`
			`* Instruction address trigger unit (hardware breakpoints)`

			`=== Architectural Overview`

Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`==== Pipeline Stages`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
Editing 2022-08-28 03:49:55 +08:00			`The three stages are:`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
			* `F`: Fetch
			`** Contains the data phase for instruction fetch`
			`** Contains the instruction prefetch buffer`
			** Predecodes register numbers `rs1`/`rs2`, for faster register file read and register bypass
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`** Contains the address match logic for the optional branch predictor`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			* `X`: Execute
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`** Decodes and execute instructions`
			`** Drives the address phase for load/store/AMO`
			`** Generates jump/branch addresses`
			`** Contains the read and write ports for the CSR file`
			** Unbypassed register values are available at the beginning of stage `X`
			** The ALU result is valid by the end of stage `X`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			* `M`: Memory
			`** Contains the data phase for load/store/AMO`
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`** Generates exception addresses`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			** Register writeback is at the end of stage `M`

Editing 2022-08-28 03:49:55 +08:00			The instruction fetch address phase is best thought of as residing in stage `X`. The 2-cycle feedback loop between jump/branch decode into address issue in stage `X`, and the fetch data phase in stage `F`, is what defines Hazard3's jump/branch performance.
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			This document often refers to `F`, `X` and `M` as stages 1, 2 and 3 respectively. This numbering is useful when describing dependencies between values held in different pipeline stages, as it makes the direction and distance of the dependency more apparent.

Editing 2022-08-28 03:49:55 +08:00			`==== Bus Interfaces`

Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`Hazard3 implements either one or two AHB5 bus manager ports. Use the single-port configuration when ease of integration is a priority, since it supports simpler bus topologies. The dual-port configuration adds a dedicated port for instruction fetch. Use the dual-port configuration for maximum frequency and the best clock-for-clock performance.`

			`Hazard3 uses AHB5 specifically, rather than older versions of the AHB standard, because of its support for global exclusives. This is a bus feature that allows a processor to perform an ordered read-modify-write sequence with a guarantee that no other processor has written to the same address range in between. Hazard3 uses this to implement multiprocessor support for the A (atomics) extension. Single-processor support for the A extension does not require these additional signals.`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`AHB5 is one of the two protocols described in the https://documentation-service.arm.com/static/5f91607cf86e16515cdc3b4b[AMBA 5 AHB protocol specification]. Its full name is (perhaps surprisingly) AMBA 5 AHB5. Refer to the protocol specification for more information about this standard bus protocol.`
Editing 2022-08-28 03:49:55 +08:00
			`==== Multiply/Divide`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			For minimal M-extension support, as enabled by <<param-EXTENSION_M>>, Hazard3 instantiates a sequential multiply/divide circuit (restoring divide, naive repeated-addition multiply). Instructions stall in stage `X` until the multiply/divide completes. Optionally, the circuit can be unrolled by a small factor to produce multiple bits ber clock. A throughput of one, two or four bits per cycle is achievable in practice, with the internal logic delay becoming quite significant at four.

			`Set <<param-MUL_FAST>> to instantiate the single-cycle multiplier circuit. The fast multiplier returns results either to stage 3 or stage 2, depending on the <<param-MUL_FASTER>> parameter.`

			By default the single-cycle multiplier only supports 32-bit `mul`, which is by far the most common of the four multiply instructions. The remaining instructions still execute on the sequential multiply/divide circuit. Set the <<param-MULH_FAST>> parameter to add single-cycle support for the high-half instructions (`mulh`, `mulhu` and `mulhsu`), at the cost of additional logic delay and area.

			The single-cycle multiplier is implemented as a simple `*` behavioural multiply, so that your tools can infer the best multiply circuit for your platform. For example, Yosys infers DSP tiles on iCE40 UP5k FPGAs. The multiplier is a self-contained module (in `hdl/arith/hazard3_mul_fast.v`), so you can replace its implementation if you know of a faster or lower-area method for your platform.
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
Do a quick pass over all the documentation. Fill out port definitions for the AHB5 interfaces. Snip the appendix of instruction pseudocode as it's strictly redundant vs the specs. No need to pad this document. Rearrange the implementation section to put ports before parameters, and add some brief notes on synthesis. 2024-08-08 03:41:37 +08:00			`// ** magic comment to reset sublime text asciidoc lexer`
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00
			`=== List of RISC-V Specifications`
Document some IRQ CSRs, and instruction timings 2021-05-31 22:57:05 +08:00
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			`These are links to the ratified versions of the base instruction set and extensions implemented by Hazard3.`
Document some IRQ CSRs, and instruction timings 2021-05-31 22:57:05 +08:00
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			`[%autowidth.stretch, options="header"]`
			`\|===`
			`\| Extension \| Specification`
			\| `RV32I` v2.1 \| https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
			\| `M` v2.0 \| https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
			\| `A` v2.1 \| https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
			\| `C` v2.0 \| https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
			\| `Zicsr` v2.0 \| https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
			\| `Zifencei` v2.0 \| https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf[Unprivileged ISA 20191213]
			\| `Zba` v1.0.0 \| https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
			\| `Zbb` v1.0.0 \| https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
			\| `Zbc` v1.0.0 \| https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
			\| `Zbs` v1.0.0 \| https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf[Bit Manipulation ISA extensions 20210628]
			\| `Zbkb` v1.0.1 \| https://github.com/riscv/riscv-crypto/releases/download/v1.0.1-scalar/riscv-crypto-spec-scalar-v1.0.1.pdf[Scalar Cryptography ISA extensions 20220218]
List Zcb/Zcmp in docs, and rebuild PDF 2023-03-22 10:55:07 +08:00			\| `Zcb` v1.0.3-1 \| https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.3-1/Zc-v1.0.3-1.pdf[Code Size Reduction extensions frozen v1.0.3-1]
			\| `Zcmp` v1.0.3-1 \| https://github.com/riscv/riscv-code-size-reduction/releases/download/v1.0.3-1/Zc-v1.0.3-1.pdf[Code Size Reduction extensions frozen v1.0.3-1]
Work on docs. Document config options, expand the intro, move instruction timings and pseudocode to appendices. 2022-08-28 03:13:21 +08:00			`\| Machine ISA v1.12 \| https://github.com/riscv/riscv-isa-manual/releases/download/Priv-v1.12/riscv-privileged-20211203.pdf[Privileged Architecture 20211203]`
			`\| Debug v0.13.2 \| https://riscv.org/wp-content/uploads/2019/03/riscv-debug-release.pdf[RISC-V External Debug Support 20190322]`
			`\|===`