[hw, otbn] Mask Accelerator top level RTL implementation and pre_DV#30404
[hw, otbn] Mask Accelerator top level RTL implementation and pre_DV#30404h-filali wants to merge 3 commits into
Conversation
d746046 to
181d5c0
Compare
|
I stil need to update a failing OTBN smoke test. I'll write an update as soon as that's done. |
47f577a to
5f0d38e
Compare
5f0d38e to
c8b3dcb
Compare
etterli
left a comment
There was a problem hiding this comment.
I reviewed the RTL. This looks clean. I have some questions about blankers, the wipe and assertions.
c8b3dcb to
1e4b257
Compare
|
Let me know when you would like me to do the ML-DSA adjustments. I can push directly onto this branch. |
72da237 to
30cd0f1
Compare
|
Thanks @etterli for your feedback. I addressed your feedback now. Concerning the wipe I removed the wipe functionality inside the mask accelerator. I had a look at the existing wipe procedure inside the mai and it looks good to me as is. The wipe will finish any pending computation before it will clear the registers inside the mai. The only thing that needs to be done inside the mask accelerator is to turn off the rejection sampling such that the wipe of the mai is deterministic in runtime. |
|
Thanks @andrea-caforio I think this would make sense when I have the simulator changes ready. Probably in my next and last PR. So far I think there might be no changes from a SW perspective with how the current MAI interface is written. I see room for optimizations though to increase the throughput. |
|
Thanks @thommythomaso for your feedback. Could you please have a look if you agree with my changes and also if you agree with my secure wipe assessment. |
d48b12e to
ec2896b
Compare
etterli
left a comment
There was a problem hiding this comment.
Thanks for the changes! I have just partially reviewed it. Will continue tomorrow.
| logic [NumShares-1:0][31:0] ma_in0; | ||
| logic [NumShares-1:0][31:0] ma_in1; | ||
| logic [NumShares-1:0][31:0] ma_remask_rand; | ||
| logic [NumShares-1:0][31:0] ma_result; |
There was a problem hiding this comment.
Here we could also make use of the new types ma_share_t because the MA interface uses these already.
162ea0d to
cc091a0
Compare
| // A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg). | ||
| // inp1 = (a0 ^ r0, r0) | ||
| // inp2 = ((a1+mod_neg) ^ r1, r1) | ||
| // The adder computes a0 + (a1 + mod_neg) = a + mod_neg. | ||
| // SecAddMod's two-pass correction yields a Boolean masked a. | ||
| assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0]; | ||
| assign a2b_inp1_pre[1] = remask_rand_i[0]; | ||
| assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1]; | ||
| assign a2b_inp2_pre[1] = remask_rand_i[1]; | ||
|
|
||
| // B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a). | ||
| // inp2 is a fresh Boolean sharing of -mask_mod: | ||
| // inp2 = ((-m) ^ r1, r1) | ||
| // The adder computes a + (-m) = a - m. | ||
| // The output creates an Arithmetic sharing with m popped from the FIFO. | ||
| assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1]; | ||
| assign b2a_inp2_pre[1] = remask_rand_i[1]; |
There was a problem hiding this comment.
The blanker in front of the flop just makes sure that always 0 is flopped when the path should not be active. But this logic here toggles always, even when no A2B or B2A operation is ongoing. Should we move the blankers in front of this logic? Then only the logic which is actually required toggles (assuming the blankers are predecoded).
That is what we do in OTBN everywhere else. There the goal is to reduce the toggle noise to a minimum.
There was a problem hiding this comment.
This line is not dependent on any share so the toggles are only based on randomness. This should not be a problem IMO. For the a2b signals, what needs to be flopped is the final remasked values.
Can you tell me which part you would like me to move, maybe I'm misunderstanding?
There was a problem hiding this comment.
Ah I see now that for B2A it does not toggle based on secrets, it only mixes randomes with the constant modulus. But for A2B the signal(s) a2b_inp1_pre etc toggle for example unnecessarily if we perform a B2A or SecAdd operation. These depend on in0_i.
Or is in0_i blanked (with a predecoded control signal) if no A2B operation is performed?
Don't know if we need to change this right now or if we should just make an issue for this similar to the other question below.
There was a problem hiding this comment.
Feel free to make an issue to discuss this. I'm still not 100% sure where you want to move the blanker.
The blanker is just there such that we don't have the direct and a2b paths feed into the same mux simultaneously. Having only a blanker, leaks based on PROLEADs analysis.
There was a problem hiding this comment.
Here is an illustration. The current design has toggles where the "flashes" are even when the mode is B2A or SecAdd. As your analysis showed, these are toggles are ok to happen but could be avoided. In OTBN we want to minimize toggles of unused datapath elements as much as possible (even ones which we know/strongly assume to be save).
My first comment should describe that the blanker is moved in front of the A2B "Pre-encoding logic". And if the blanker is predecoded, the A2B "Pre-encoding logic" does only toggle if its result is actually required. Does this explain my point?
| result_o[0] = fifo_rdata; | ||
| result_o[1] = result_b2a_q[0] ^ result_b2a_q[1]; |
There was a problem hiding this comment.
Thanks for adding the comments!
Thanks @h-filali. Looks good from my side. From my understanding, we should be fine with the current secure wiping strategy. |
|
|
||
| // Width of randomness required by the mask accelerator | ||
| localparam int unsigned MaRndLen = 32'd322; | ||
| localparam int unsigned MaRndLen = SecAddRandWidth(SecAddWidth); |
There was a problem hiding this comment.
Can you please do two things:
- Correct the UrndLen parameter. This is now 389 bits. But we "only" really consume 322 bits + 2*32 bits where the latter bits are used for remasking.
- Document this somewhere. It's currently pretty hard to understand.
- Create an issue for adding a 322-bit permutation to the Bivium output in
otbn_rnd.sv. This permutation should be a secret netlist constant, so needs to be exposed as a parameter inotbn.svand then driven in the top level. Currently we have a separate permutation for the MAC. But the MAI is using the unpermuted URND output.
There was a problem hiding this comment.
Question: are the three remaining bits used for the shuffling?
There was a problem hiding this comment.
Yes, these 3 bits are used for shuffling. See the type mai_ma_urnd_t in otbn_mai.sv. This needs some comments, I agree.
| // A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg). | ||
| // inp1 = (a0 ^ r0, r0) | ||
| // inp2 = ((a1+mod_neg) ^ r1, r1) | ||
| // The adder computes a0 + (a1 + mod_neg) = a + mod_neg. | ||
| // SecAddMod's two-pass correction yields a Boolean masked a. | ||
| assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0]; | ||
| assign a2b_inp1_pre[1] = remask_rand_i[0]; | ||
| assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1]; | ||
| assign a2b_inp2_pre[1] = remask_rand_i[1]; | ||
|
|
||
| // B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a). | ||
| // inp2 is a fresh Boolean sharing of -mask_mod: | ||
| // inp2 = ((-m) ^ r1, r1) | ||
| // The adder computes a + (-m) = a - m. | ||
| // The output creates an Arithmetic sharing with m popped from the FIFO. | ||
| assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1]; | ||
| assign b2a_inp2_pre[1] = remask_rand_i[1]; |
| result_o[0] = fifo_rdata; | ||
| result_o[1] = result_b2a_q[0] ^ result_b2a_q[1]; |
There was a problem hiding this comment.
Thanks for the comments, they are really valuable (in general you're making very valuable comments throughout the code - well done!).
Question: I absolutely understand that we don't want to feed all adder outputs into this XOR here. However, what I don't understand is why the FF stage is needed. In my view, it should be sufficient to use pre-decoded one-hot muxes. Something like this should work no?
fifo_rdata --------- AND --\
OR ------------ result_o[0]
adder_result[0] ---- AND --/
\-- AND --\
XOR ---\
adder_result[1] ---- AND --/ OR ---- result_o[1]
\-- AND ----------/
I think this should work if you properly pre-encode the mux control signals (e.g. one-hot directly coming out of a flop) and you may need to introduce a bubble cycle between switching from one to the other path, such that you don't have a direct transition from one to the other path but that you pre-charge the wires to 0 during switching.
| result_o[0] = fifo_rdata; | ||
| result_o[1] = result_b2a_q[0] ^ result_b2a_q[1]; |
There was a problem hiding this comment.
I think we could explore this as a follow-up but I would like us to understand why this doesn't work.
| result_o[0] = fifo_rdata; | ||
| result_o[1] = result_b2a_q[0] ^ result_b2a_q[1]; |
There was a problem hiding this comment.
A similar scheme may be applicable to the input as well. There we would save even more flops. But again, this could be done as a follow-up.
|
A top-level test similar to what @etterli did for KMAC would be good to have. You can take my OTBN gadget tests as a template. |
971db5e to
4dcea80
Compare
There is already something in this fashion here: https://github.com/lowRISC/opentitan/blob/master/sw/otbn/mai/mai_test.s |
|
@h-filali currently the |
etterli
left a comment
There was a problem hiding this comment.
Looks good to me. There are two points regarding how to blank some datapath parts but I think we can postpone them.
|
|
||
| // Width of randomness required by the mask accelerator | ||
| localparam int unsigned MaRndLen = 32'd322; | ||
| localparam int unsigned MaRndLen = SecAddRandWidth(SecAddWidth); |
There was a problem hiding this comment.
Yes, these 3 bits are used for shuffling. See the type mai_ma_urnd_t in otbn_mai.sv. This needs some comments, I agree.
| // A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg). | ||
| // inp1 = (a0 ^ r0, r0) | ||
| // inp2 = ((a1+mod_neg) ^ r1, r1) | ||
| // The adder computes a0 + (a1 + mod_neg) = a + mod_neg. | ||
| // SecAddMod's two-pass correction yields a Boolean masked a. | ||
| assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0]; | ||
| assign a2b_inp1_pre[1] = remask_rand_i[0]; | ||
| assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1]; | ||
| assign a2b_inp2_pre[1] = remask_rand_i[1]; | ||
|
|
||
| // B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a). | ||
| // inp2 is a fresh Boolean sharing of -mask_mod: | ||
| // inp2 = ((-m) ^ r1, r1) | ||
| // The adder computes a + (-m) = a - m. | ||
| // The output creates an Arithmetic sharing with m popped from the FIFO. | ||
| assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1]; | ||
| assign b2a_inp2_pre[1] = remask_rand_i[1]; |
There was a problem hiding this comment.
Ah I see now that for B2A it does not toggle based on secrets, it only mixes randomes with the constant modulus. But for A2B the signal(s) a2b_inp1_pre etc toggle for example unnecessarily if we perform a B2A or SecAdd operation. These depend on in0_i.
Or is in0_i blanked (with a predecoded control signal) if no A2B operation is performed?
Don't know if we need to change this right now or if we should just make an issue for this similar to the other question below.
aa4c944 to
86f74f9
Compare
The new MAI will cause the OTBN smoke test to fail until the OTBNsim is aligned with the new mask accelerator. This commit temporarily comments out any lines that will cause the smoketest to fail. Furthermore this commit makes changes to the secure adder which get rid of a Verilator UNOPTFLAT false-positive. The warning arose because pre_p was a single array used both as the combinational XOR output (pre_p[0]) and the final pipeline stage driving result_o (pre_p[Stages+1]). The fix splits pre_p into two separate signals: pre_p and pre_p_q[Stages:0]. Signed-off-by: Hakim Filali <hfilali@lowrisc.org>
This commit changes the top level of the mask accelerator to remove the dummy implementation and replace it by the actual implementation. An explanation of how the accelerator works can be found in the doc header of the RTL. Signed-off-by: Hakim Filali <hfilali@lowrisc.org>
This commit adds a testbench for all 4 modes of the mask accelerator. It also tests the secure wipe. This commit also consolidates some of the testbench constants and helper functions into a single header file serving the sec_add and mask_accelerator testbenches. Furthermore, this commit changes the README to explain the added functionality. Signed-off-by: Hakim Filali <hfilali@lowrisc.org>
86f74f9 to
17b3e1e
Compare
This PR adds the RTL for the Mask Accelerator top level and the corresponding pre_dv.
For in detail information, please view the individual commit messages and the doc headers.
Here is a block diagram that might prove to be helpful: