VRoom! blog - Floating Point 4 - exceptions etc
31 Jan 2023
We finally finished the FP unit: it passes all the compliance tests and our extended FP data tests.
After finishing the FP functional sub-units (add/multiply/divide), the one main missing piece was generating exceptions. On a RISC-V CPU, FP exceptions are not system traps; they are accumulated as a bit mask in the CSR unit. Generating them is not hard, but getting them correct is harder and requires lots and lots of testing.
Because we can retire/commit multiple instructions per clock (up to 8) we need to be able to store exception results in each instruction’s commit unit until they are ready to commit (alongside each instruction’s result data in the commit register file).
When the time comes to commit an instruction (write its commit register data back into the architectural register file), we logically OR together all 5 FP exception bits from each instruction being retired in the same clock (non-FP instructions simply report 0), along with a bit that reports that FP state has changed (for example, FP loads are non-FP instructions that change FP state). The results are accumulated into the CSR unit’s exceptions register and the MSTATUS.FS (clean/dirty) state.
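The commit-time accumulation described above can be sketched in a few lines of Python. This is only a behavioural model, not the actual RTL: the `CommitEntry` and `CsrState` classes and the `retire` function are hypothetical names invented for illustration. The flag bit values and the FS encoding are the ones defined by the RISC-V specifications.

```python
from dataclasses import dataclass

# fflags bit values as defined by the RISC-V F extension
NX, UF, OF, DZ, NV = 0x01, 0x02, 0x04, 0x08, 0x10

FS_INITIAL = 1  # mstatus.FS encoding for "Initial"
FS_DIRTY = 3    # mstatus.FS encoding for "Dirty"


@dataclass
class CommitEntry:
    """Hypothetical per-instruction commit slot (illustrative only)."""
    fflags: int = 0                 # 5 exception bits; 0 for non-FP instructions
    fp_state_changed: bool = False  # set by FP ops and by FP loads


@dataclass
class CsrState:
    """Hypothetical view of the FP-related CSR state."""
    fflags: int = 0
    fs: int = FS_INITIAL


def retire(csr, entries):
    """OR together the flags of every instruction retiring in this clock."""
    merged = 0
    dirty = False
    for entry in entries:
        merged |= entry.fflags & 0x1F
        dirty = dirty or entry.fp_state_changed
    csr.fflags |= merged       # exceptions accumulate, they are never cleared here
    if dirty:
        csr.fs = FS_DIRTY
    return csr


# Example: an FP divide by zero, an integer add, and an inexact FP add
# retiring in the same clock.
csr = retire(CsrState(), [CommitEntry(DZ, True), CommitEntry(), CommitEntry(NX, True)])
```

After this call `csr.fflags` holds both the DZ and NX bits and `csr.fs` is Dirty; a group made up only of integer instructions would leave both untouched.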
We’re an out-of-order/speculative system, so floating point instructions can be executed out of order. This means that, as with FP register data, we can’t commit exception state until an instruction is no longer speculative. We also need to retire exception information in order with respect to CSR instructions that access and update that same state.
Committing accumulated exception data as described in the previous section, combined with the fact that the CPU always retires CSR access instructions on their own (never together with the instructions before or after them), means that at that point everything is synchronized.
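As a rough illustration of the "CSR instructions retire alone" rule, here is a small Python sketch of how a retire group might be chosen each clock. The function name and the list-of-booleans input are assumptions made for this example and do not reflect how the real commit logic is structured; only the limit of 8 comes from the text above.

```python
MAX_RETIRE = 8  # up to 8 instructions can retire per clock


def select_retire_group(ready):
    """Return how many instructions retire this clock.

    `ready` is the in-order list of instructions that are ready to commit,
    where each element is True if that instruction is a CSR access.
    (Hypothetical interface, for illustration only.)
    """
    if not ready:
        return 0
    if ready[0]:
        return 1  # a CSR access at the head retires on its own
    count = 0
    for is_csr in ready[:MAX_RETIRE]:
        if is_csr:
            break  # stop before the CSR access; it retires alone next time
        count += 1
    return count
```

Because everything older than the CSR access has already merged its exception bits into the CSR unit by the time the CSR access retires, a read of the flags sees a consistent, in-order view.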
One problem with speculative and out-of-order execution is that an instruction that changes the global FP rounding mode may execute AFTER an instruction that depends on it. We solve this problem using an existing mechanism: there is a set of registers in the CSR unit that, if you change their contents, trigger a pipe flush (essentially a null trap that flushes all the subsequent uncommitted instructions and then refetches the next instruction). We use this same mechanism by adding the rounding mode register to this list, but the flush is only triggered if its contents change.
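The "flush only if the contents change" behaviour is easy to model. The sketch below is a hypothetical Python model (the `CsrUnit` class and its `flushes` counter are made up for illustration); the 3-bit width of the rounding mode field is from the RISC-V specification.

```python
class CsrUnit:
    """Hypothetical model of the rounding mode register and its flush rule."""

    def __init__(self):
        self.frm = 0      # 0b000 is round-to-nearest-even in RISC-V
        self.flushes = 0  # counts pipe flushes, for illustration only

    def write_frm(self, value):
        """Write the rounding mode; return True if a pipe flush is needed."""
        value &= 0x7  # frm is a 3-bit field
        if value == self.frm:
            return False  # same mode: younger instructions are still valid
        self.frm = value
        self.flushes += 1  # younger instructions may have used the old mode
        return True
```

Code that repeatedly writes the same rounding mode therefore pays no flush penalty; only a real change of mode costs a refetch.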
One of the hardest bugs to find when we were bringing up Linux on the design on the AWS FPGA instance was in the integer divider. A speculative divide would be shot down by a mispredicted branch but would keep on going, and 20 clocks later it would write random data into a commit register. For most functional units this is not an issue: a pipe break takes 3 clocks, so units that run in under 3 clocks simply write back into commit registers and the results are ignored. For the FP divide and sqrt units we’ve made sure that this works correctly.
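To make the failure mode concrete, here is a small Python model of a multi-cycle unit that can be cancelled while in flight. This is an assumption about one way to handle it, not a description of the actual fix in the design; the class, its methods, and the dictionary standing in for the commit registers are all hypothetical.

```python
class MultiCycleUnit:
    """Hypothetical long-latency unit (e.g. divide) that honours a kill."""

    def __init__(self, latency=20):
        self.latency = latency
        self.remaining = 0  # clocks left before write-back; 0 means idle
        self.dest = None    # commit register to write
        self.result = None

    def start(self, dest, result):
        # The result is passed in precomputed to keep the sketch short.
        self.dest = dest
        self.result = result
        self.remaining = self.latency

    def kill(self):
        """Called on a pipe flush: abandon the operation in flight."""
        self.remaining = 0
        self.dest = None

    def tick(self, commit_regs):
        """Advance one clock; write back only if the op was not killed."""
        if self.remaining == 0:
            return
        self.remaining -= 1
        if self.remaining == 0 and self.dest is not None:
            commit_regs[self.dest] = self.result
            self.dest = None
```

Without the `kill` path, the unit would still write `commit_regs[dest]` long after the flush, by which time that commit register may have been reallocated to a different instruction.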
Previously we’d been using an older, rather ad-hoc chunk of the standard compliance tests. In this latest check-in we’ve updated to use the current tip of tree with the new mechanisms (thanks to the authors and the riscv-sail people), and our design currently passes all the IMCFD tests. We don’t yet have RISC-V 16-bit FP tests, so our 16-bit FP support is not nearly as well verified.
It also passes a much larger group of standalone FP tests (240M vectors). We intend to use the same FP submodules in the vector unit, so they’ve been built in a modular fashion.
Time to work on something different for a while. Next up will be bit field instructions (the B extension); this part of the design is already coded, but not at all tested.
We’ll revisit FP in a while, adding a second FP unit and doing some benchmarking.
Next time: Bit field Instructions