== Instruction Cycle Counts All timings are given assuming perfect bus behaviour (no stalls). Stalling of the `I` bus can delay execution indefinitely, as can stalling of the `D` bus during a load or store. === RV32I [%autowidth.stretch, options="header"] |=== | Instruction | Cycles | Note 3+| Integer Register-register | `add rd, rs1, rs2` | 1 | | `sub rd, rs1, rs2` | 1 | | `slt rd, rs1, rs2` | 1 | | `sltu rd, rs1, rs2` | 1 | | `and rd, rs1, rs2` | 1 | | `or rd, rs1, rs2` | 1 | | `xor rd, rs1, rs2` | 1 | | `sll rd, rs1, rs2` | 1 | | `srl rd, rs1, rs2` | 1 | | `sra rd, rs1, rs2` | 1 | 3+| Integer Register-immediate | `addi rd, rs1, imm` | 1 | `nop` is a pseudo-op for `addi x0, x0, 0` | `slti rd, rs1, imm` | 1 | | `sltiu rd, rs1, imm` | 1 | | `andi rd, rs1, imm` | 1 | | `ori rd, rs1, imm` | 1 | | `xori rd, rs1, imm` | 1 | | `slli rd, rs1, imm` | 1 | | `srli rd, rs1, imm` | 1 | | `srai rd, rs1, imm` | 1 | 3+| Large Immediate | `lui rd, imm` | 1 | | `auipc rd, imm` | 1 | 3+| Control Transfer | `jal rd, label` | 2footnote:unaligned_branch[A branch to a 32-bit instruction which is not 32-bit-aligned requires one additional cycle, because two naturally-aligned bus cycles are required to fetch the target instruction.]| | `jalr rd, rs1, imm` | 2footnote:unaligned_branch[] | | `beq rs1, rs2, label`| 1 or 2footnote:unaligned_branch[] | 1 if nontaken, 2 if taken. | `bne rs1, rs2, label`| 1 or 2footnote:unaligned_branch[] | 1 if nontaken, 2 if taken. | `blt rs1, rs2, label`| 1 or 2footnote:unaligned_branch[] | 1 if nontaken, 2 if taken. | `bge rs1, rs2, label`| 1 or 2footnote:unaligned_branch[] | 1 if nontaken, 2 if taken. | `bltu rs1, rs2, label`| 1 or 2footnote:unaligned_branch[] | 1 if nontaken, 2 if taken. | `bgeu rs1, rs2, label`| 1 or 2footnote:unaligned_branch[] | 1 if nontaken, 2 if taken. 3+| Load and Store | `lw rd, imm(rs1)` | 1 or 2 | 1 if next instruction is independent, 2 if dependent.footnote:data_dependency[If an instruction uses load data (from stage 3) in stage 2, a 1-cycle bubble is inserted after the load. Load-data to store-data dependency does not experience this, because the store data is used in stage 3. However, load-data to store-address (or e.g. load-to-add) does qualify.] | `lh rd, imm(rs1)` | 1 or 2 | 1 if next instruction is independent, 2 if dependent.footnote:data_dependency[] | `lhu rd, imm(rs1)` | 1 or 2 | 1 if next instruction is independent, 2 if dependent.footnote:data_dependency[] | `lb rd, imm(rs1)` | 1 or 2 | 1 if next instruction is independent, 2 if dependent.footnote:data_dependency[] | `lbu rd, imm(rs1)` | 1 or 2 | 1 if next instruction is independent, 2 if dependent.footnote:data_dependency[] | `sw rs2, imm(rs1)` | 1 | | `sh rs2, imm(rs1)` | 1 | | `sb rs2, imm(rs1)` | 1 | |=== === M Extension Timings assume the core is configured with `MULDIV_UNROLL = 2` and `MUL_FAST = 1`. I.e. the sequential multiply/divide circuit processes two bits per cycle, and a separate dedicated multiplier is present for the `mul` instruction. [%autowidth.stretch, options="header"] |=== | Instruction | Cycles | Note 3+| 32 {times} 32 -> 32 Multiply | `mul rd, rs1, rs2` | 1 or 2 | 1 if next instruction is independent, 2 if dependent. 3+| 32 {times} 32 -> 64 Multiply, Upper Half | `mulh rd, rs1, rs2` | 18 to 20 | Depending on sign correction | `mulhsu rd, rs1, rs2` | 18 to 20 | Depending on sign correction | `mulhu rd, rs1, rs2` | 18 | 3+| Divide and Remainder | `div rd, rs1, rs2` | 18 or 19 | Depending on sign correction | `divu rd, rs1, rs2` | 18 | | `rem rd, rs1, rs2` | 18 or 19 | Depending on sign correction | `remu rd, rs1, rs2` | 18 | |=== === C Extension All C extension 16-bit instructions on Hazard3 are aliases of base RV32I instructions. They perform identically to their 32-bit counterparts. A consequence of the C extension is that 32-bit instructions can be non-naturally-aligned. This has no penalty during sequential execution, but branching to a 32-bit instruction that is not 32-bit-aligned carries a 1 cycle penalty, because the instruction fetch is cracked into two naturally-aligned bus accesses. === Privileged Instructions (including Zicsr) [%autowidth.stretch, options="header"] |=== | Instruction | Cycles | Note 3+| CSR Access | `csrrw rd, csr, rs1` | 1 | | `csrrc rd, csr, rs1` | 1 | | `csrrs rd, csr, rs1` | 1 | | `csrrwi rd, csr, imm` | 1 | | `csrrci rd, csr, imm` | 1 | | `csrrsi rd, csr, imm` | 1 | 3+| Trap Request | `ecall` | 3 | Time given is for jumping to `mtvec` | `ebreak` | 3 | Time given is for jumping to `mtvec` |=== === Bit Manipulation [%autowidth.stretch, options="header"] |=== | Instruction | Cycles | Note 3+| Zba (address generation) |`sh1add rd, rs1, rs2` | 1 | |`sh2add rd, rs1, rs2` | 1 | |`sh3add rd, rs1, rs2` | 1 | 3+| Zbb (basic bit manipulation) |`andn rd, rs1, rs2` | 1 | |`clz rd, rs1` | 1 | |`cpop rd, rs1` | 1 | |`ctz rd, rs1` | 1 | |`max rd, rs1, rs2` | 1 | |`maxu rd, rs1, rs2` | 1 | |`min rd, rs1, rs2` | 1 | |`minu rd, rs1, rs2` | 1 | |`orc.b rd, rs1` | 1 | |`orn rd, rs1, rs2` | 1 | |`rev8 rd, rs1` | 1 | |`rol rd, rs1, rs2` | 1 | |`ror rd, rs1, rs2` | 1 | |`rori rd, rs1, imm` | 1 | |`sext.b rd, rs1` | 1 | |`sext.h rd, rs1` | 1 | |`xnor rd, rs1, rs2` | 1 | |`zext.h rd, rs1` | 1 | |`zext.b rd, rs1` | 1 | `zext.b` is a pseudo-op for `andi rd, rs1, 0xff` 3+| Zbc (carry-less multiply) |`clmul rd, rs1, rs2` | 1 | |`clmulh rd, rs1, rs2` | 1 | |`clmulr rd, rs1, rs2` | 1 | 3+| Zbs (single-bit manipulation) |`bclr rd, rs1, rs2` | 1 | |`bclri rd, rs1, imm` | 1 | |`bext rd, rs1, rs2` | 1 | |`bexti rd, rs1, imm` | 1 | |`binv rd, rs1, rs2` | 1 | |`binvi rd, rs1, imm` | 1 | |`bset rd, rs1, rs2` | 1 | |`bseti rd, rs1, imm` | 1 | |===