VRoom! blog - Lots of progress Bitfields, Crypto etc
10 Feb 2023Introduction
With FP out of the way the decks have been cleared and we’re quickly moving along, rapidly adding support for more ratified extensions. Lots of new stuff here today!
Results - new Dhrystone numbers ~6.74 DMips/MHz - increased by ~5%
Note: we use the term “crypto” here - in this context it has all to do with cryptography and absolutely nothing to do with cryptocurrencies.
Bitfield instructions
We had implemented the bitfield extensions from the original proposal quite a while ago, but had not tested them, and they were disabled. Now that we’ve spent time with the ratified extension, a lot of the work has been removing instructions that had been removed from the spec. Most of the instructions ended up in the shift ALU (which is essentially a machine generated bunch of muxes) where all instructions run in 1 clock. The instructions that involve adders or arithmetic comparisons (shlXadd and min/max) went into the ALU and clmul which takes 2 clocks went into the multiplier (which itself mostly runs with 2 clock latencies).
All in all this came up really fast, we have now implemented all of the bitfield extensions: Zba, Zbb, Zbc, and Zbs and they pass all the compliance tests.
Crypto instructions
The crypto K extensions include some of the bitfield extensions along with a bunch of more crypto oriented instructions, in particular support for the NIST mandated SHA256/512, AES32/64, and the Chinese SM3 and SM4 instructions.
We have implemented all of the crypto extensions Zbkb, Zbkc, Zbkx, Zknd, Zkne, Zknh, Zksed, Zksh and they pass compliance testing.
We can attest that we meet the “Data Independent Execution Latency Subset: Zkt” as all VRoom!’s instructions (with the exceptions of integer and FP division, which is allowable) run in fixed time irrespective of the value of their operands.
We have also implemented the entropy source register (Zkr) - we have not implemented a real entropy source in the verilog model, nor a whitener - those are really something that needs to be decided in a final implementation. Instead we’ve created a dummy plugin for testing and hardware to distribute entropy to multiple HARTs.
We’ve completed support for all of the Zhn (NIST Algorithm Suite), Zks (ShangMi Algorithm Suite) and Zk (Standard scalar cryptography extension) extensions, and they have passed compliance testing.
RV32
In the past one thing we’ve not been treating 32-bit mode as a full part of our instruction set (running 32-bit instructions (RV32) in our RV64 bit world) - the design was all in there but none of it was tested. With the advent of a whole bunch of new 32-bit only crypto instructions they needed to be tested, so we’ve brought up RV32 as a compliance testing target and can now report that VRoom! passes all of the RV32 compliance testing as well. Compliance testing now take ~8 hours to run ….
Misc
We’re now working our way through the newly ratified extensions - adding in the security bits for the crypto seed we also implemented the rest of the security register (new PMAP modes for the OPTEE people).
Summary
Retesting Dhrystone (recompiled with the new instructions) we see a minor (5%) increase in Dhrystone numbers from 6.42 to 6.74 DMips/MHz - this is almost totally due to the compiler now using the new sh1add/sh2add/zextb instructions from the bitfield extension.
Next time: More misc extensions,