[hw, otbn] Mask Accelerator top level RTL implementation and pre_DV by h-filali · Pull Request #30404 · lowRISC/opentitan

h-filali · 2026-06-15T09:08:30Z

This PR adds the RTL for the Mask Accelerator top level and the corresponding pre_dv.

For detailed information, please see the individual commit messages and the doc headers.

Here is a block diagram that might prove to be helpful:

h-filali · 2026-06-15T09:14:32Z

I still need to update a failing OTBN smoke test. I'll post an update as soon as that's done.

etterli

I reviewed the RTL. This looks clean. I have some questions about blankers, the wipe and assertions.

andrea-caforio · 2026-06-15T13:59:41Z

Let me know when you would like me to do the ML-DSA adjustments. I can push directly onto this branch.

thommythomaso

Thanks @h-filali.

h-filali · 2026-06-16T14:23:45Z

Thanks @etterli for your feedback. I have addressed it now.

Concerning the wipe: I removed the wipe functionality inside the mask accelerator. I had a look at the existing wipe procedure inside the MAI, and it looks good to me as is. The wipe finishes any pending computation before it clears the registers inside the MAI. The only thing that needs to be done inside the mask accelerator is to turn off rejection sampling so that the MAI wipe has a deterministic runtime.

h-filali · 2026-06-16T15:11:40Z

Thanks @andrea-caforio! I think this would make sense once I have the simulator changes ready, probably in my next and final PR. So far, I don't expect any changes from a SW perspective, given how the current MAI interface is written, though I do see room for optimizations to increase throughput.

h-filali · 2026-06-16T15:26:34Z

Thanks @thommythomaso for your feedback. Could you please check whether you agree with my changes and with my secure wipe assessment?

etterli

Thanks for the changes! I have just partially reviewed it. Will continue tomorrow.

etterli · 2026-06-16T15:41:22Z

+  logic [NumShares-1:0][31:0] ma_in0;
+  logic [NumShares-1:0][31:0] ma_in1;
+  logic [NumShares-1:0][31:0] ma_remask_rand;
+  logic [NumShares-1:0][31:0] ma_result;


Here we could also use the new ma_share_t type, since the MA interface already uses it.

etterli · 2026-06-17T07:56:47Z

+  // A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg).
+  //   inp1 = (a0 ^ r0, r0)
+  //   inp2 = ((a1+mod_neg) ^ r1, r1)
+  // The adder computes a0 + (a1 + mod_neg) = a + mod_neg.
+  // SecAddMod's two-pass correction yields a Boolean masked a.
+  assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0];
+  assign a2b_inp1_pre[1] = remask_rand_i[0];
+  assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1];
+  assign a2b_inp2_pre[1] = remask_rand_i[1];
+
+  // B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a).
+  // inp2 is a fresh Boolean sharing of -mask_mod:
+  //   inp2 = ((-m) ^ r1, r1)
+  // The adder computes a + (-m) = a - m.
+  // The output creates an Arithmetic sharing with m popped from the FIFO.
+  assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1];
+  assign b2a_inp2_pre[1] = remask_rand_i[1];
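To make the pre-encoding comments above easier to check, here is a minimal, unmasked Python reference model of the two paths. It is a sketch under stated assumptions, not the RTL: it assumes `mod_neg` equals -q mod 2^32 and that the arithmetic shares a0, a1 lie in [0, q), and it replaces SecAddMod's two-pass correction with a single conditional add of q on the unmasked sum. The B2A part assumes a power-of-two modulus of 2^32. All function names are hypothetical, and unlike the hardware, the model recombines shares in the clear purely to check the algebra.

```python
MASK32 = (1 << 32) - 1

def neg32(x):
    """Two's-complement negation modulo 2^32, like (~x + 1) on a 32-bit signal."""
    return (~x + 1) & MASK32

def unmask_bool(sh):
    """Recombine a two-share Boolean sharing (reference only, never done in HW)."""
    return sh[0] ^ sh[1]

def a2b_pre(in0, r, mod_neg):
    """Mirror of the A2B pre-encoding assigns: two fresh Boolean sharings."""
    a0, a1 = in0
    inp1 = ((a0 ^ r[0]) & MASK32, r[0])                    # sharing of a0
    inp2 = (((a1 + mod_neg) & MASK32) ^ r[1], r[1])        # sharing of a1 + mod_neg
    return inp1, inp2

def b2a_pre(mask_mod, r):
    """Mirror of the B2A pre-encoding assigns: a fresh Boolean sharing of -m."""
    return (neg32(mask_mod) ^ r[1], r[1])

def a2b_reference(in0, r, q):
    """Unmasked stand-in for the A2B result: a = (a0 + a1) mod q."""
    mod_neg = neg32(q)                                      # assumption: mod_neg == -q mod 2^32
    inp1, inp2 = a2b_pre(in0, r, mod_neg)
    s = (unmask_bool(inp1) + unmask_bool(inp2)) & MASK32    # a0 + a1 - q (mod 2^32)
    if s & (1 << 31):                                       # negative as signed 32-bit value
        s = (s + q) & MASK32                                # simplified correction (hypothetical)
    return s

def b2a_reference(in0, mask_mod, r):
    """Returns the arithmetic sharing (m, x - m) mod 2^32 of x = in0[0] ^ in0[1]."""
    x = unmask_bool(in0)
    diff = (x + unmask_bool(b2a_pre(mask_mod, r))) & MASK32  # x + (-m) = x - m
    return (mask_mod, diff)  # result_o[0] = mask from the FIFO, result_o[1] = x - m

# Example with the ML-DSA modulus, used here only as a sample value.
assert a2b_reference((5000000, 4000000), (0x12345678, 0x9ABCDEF0), 8380417) == 619583
```

The B2A output recombines as m + (x - m) = x mod 2^32, which is why the FIFO mask can be passed straight through as the first arithmetic share.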


The blanker in front of the flop only ensures that 0 is flopped whenever the path is inactive. But this logic here always toggles, even when no A2B or B2A operation is ongoing. Should we move the blankers in front of this logic? Then only the logic that is actually required toggles (assuming the blankers are predecoded).

That is what we do everywhere else in OTBN. There, the goal is to reduce toggle noise to a minimum.

Good point, @etterli!

This line does not depend on any share, so its toggles are based only on randomness. This should not be a problem IMO. For the A2B signals, what needs to be flopped are the final remasked values.

Can you tell me which part you would like me to move? Maybe I'm misunderstanding.

Ah, I see now that for B2A it does not toggle based on secrets; it only mixes randomness with the constant modulus. But for A2B, the signal(s) a2b_inp1_pre etc. toggle unnecessarily, for example, when we perform a B2A or SecAdd operation. These depend on in0_i.

Or is in0_i blanked (with a predecoded control signal) if no A2B operation is performed?

I don't know whether we need to change this right now or whether we should just open an issue for it, similar to the other question below.

Feel free to open an issue to discuss this. I'm still not 100% sure where you want to move the blanker.

The blanker is only there so that the direct and A2B paths don't feed into the same mux simultaneously. According to PROLEAD's analysis, having only a blanker leaks.

Here is an illustration. In the current design, the parts marked with "flashes" toggle even when the mode is B2A or SecAdd. As your analysis showed, these toggles are acceptable, but they could be avoided. In OTBN, we want to minimize toggles of unused datapath elements as much as possible (even ones we know, or strongly assume, to be safe).

My first comment was meant to suggest moving the blanker in front of the A2B "Pre-encoding logic". If the blanker is predecoded, the A2B "Pre-encoding logic" only toggles when its result is actually required. Does this explain my point?
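The difference between the two blanker placements can be sketched with a small Python model. It is illustrative only: the function names are hypothetical, the blanker is modeled as an AND with a predecoded all-ones/all-zeros enable, and the "internal nets" returned stand in for the pre-encoding logic whose toggling is being discussed.

```python
MASK32 = (1 << 32) - 1

def blank(x, en):
    """AND-style blanker: passes x when en is set, otherwise forces 0."""
    return x & (MASK32 if en else 0)

def a2b_pre_late_blank(in0, r, mod_neg, a2b_en):
    """Current placement: pre-encoding always sees in0_i; only its result is blanked."""
    pre0 = (in0[0] ^ r[0]) & MASK32
    pre1 = ((in0[1] + mod_neg) & MASK32) ^ r[1]
    internal = (pre0, pre1)
    return internal, (blank(pre0, a2b_en), blank(pre1, a2b_en))

def a2b_pre_early_blank(in0, r, mod_neg, a2b_en):
    """Proposed placement: blank in0_i first, so the pre-encoding nets never see it
    unless A2B is active. The nets then depend only on randomness and the constant."""
    b0, b1 = blank(in0[0], a2b_en), blank(in0[1], a2b_en)
    pre0 = b0 ^ r[0]
    pre1 = ((b1 + mod_neg) & MASK32) ^ r[1]
    internal = (pre0, pre1)
    return internal, internal
```

With A2B disabled, the late-blank internal nets still change when in0_i changes (even though the blanked output stays 0), whereas the early-blank nets are identical for any in0_i.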

etterli · 2026-06-17T08:01:32Z

+      result_o[0] = fifo_rdata;
+      result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];


Thanks for adding the comments!

thommythomaso · 2026-06-18T07:13:55Z

Thanks @thommythomaso for your feedback. Could you please check whether you agree with my changes and with my secure wipe assessment?

Thanks @h-filali. Looks good from my side. From my understanding, we should be fine with the current secure wiping strategy.

vogelpi

Thanks for the nice PR, @h-filali! This looks mostly good to me. I have some questions, but they should not block the PR.

vogelpi · 2026-06-18T15:32:03Z

+
  // Width of randomness required by the mask accelerator
-  localparam int unsigned MaRndLen = 32'd322;
+  localparam int unsigned MaRndLen = SecAddRandWidth(SecAddWidth);


Can you please do two things:

1. Correct the UrndLen parameter. This is now 389 bits, but we "only" really consume 322 bits + 2*32 bits, where the latter are used for remasking.
   - Document this somewhere. It's currently pretty hard to understand.
2. Create an issue for adding a 322-bit permutation to the Bivium output in otbn_rnd.sv. This permutation should be a secret netlist constant, so it needs to be exposed as a parameter in otbn.sv and then driven from the top level. Currently, we have a separate permutation for the MAC, but the MAI uses the unpermuted URND output.
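As a rough illustration of what a fixed bit permutation on the URND output means, here is a Python sketch. It is hypothetical throughout: in hardware the permutation is pure wiring selected by a secret netlist constant, while here a seeded shuffle stands in for that constant, and none of these names exist in otbn_rnd.sv or otbn.sv.

```python
import random

URND_WIDTH = 322  # permutation width mentioned in the review

def make_permutation(width, seed):
    """Stand-in for a secret netlist constant: a fixed random permutation of bit indices."""
    perm = list(range(width))
    random.Random(seed).shuffle(perm)
    return perm

def permute_bits(value, perm):
    """Output bit i takes input bit perm[i] (in RTL this is just a rewiring)."""
    out = 0
    for dst, src in enumerate(perm):
        out |= ((value >> src) & 1) << dst
    return out

def invert_permutation(perm):
    """Inverse mapping, useful e.g. for checking a model against the RTL."""
    inv = [0] * len(perm)
    for dst, src in enumerate(perm):
        inv[src] = dst
    return inv
```

A permutation only reorders bits, so it preserves the amount of randomness; its purpose is to decorrelate which Bivium output bits feed which consumer.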

Question: are the three remaining bits used for the shuffling?

Yes, these 3 bits are used for shuffling. See the type mai_ma_urnd_t in otbn_mai.sv. This needs some comments, I agree.
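Putting the numbers from this exchange together, the 389-bit URND length decomposes as sketched below. The constant names are hypothetical; the values come from the discussion (322 bits as the former MaRndLen, presumably consumed by the secure adder, two 32-bit remasking words, and 3 shuffling bits).

```python
# Hypothetical names; values taken from the review discussion.
SEC_ADD_RAND_BITS = 322  # former MaRndLen, presumably consumed by the secure adder
REMASK_BITS = 2 * 32     # two 32-bit words used for remasking
SHUFFLE_BITS = 3         # shuffling bits (see mai_ma_urnd_t in otbn_mai.sv)

URND_LEN = SEC_ADD_RAND_BITS + REMASK_BITS + SHUFFLE_BITS  # 322 + 64 + 3

def budget_report():
    """Return the per-consumer split and the total width."""
    parts = {"sec_add": SEC_ADD_RAND_BITS, "remask": REMASK_BITS, "shuffle": SHUFFLE_BITS}
    return parts, sum(parts.values())
```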

vogelpi · 2026-06-18T15:44:56Z

+  // A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg).
+  //   inp1 = (a0 ^ r0, r0)
+  //   inp2 = ((a1+mod_neg) ^ r1, r1)
+  // The adder computes a0 + (a1 + mod_neg) = a + mod_neg.
+  // SecAddMod's two-pass correction yields a Boolean masked a.
+  assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0];
+  assign a2b_inp1_pre[1] = remask_rand_i[0];
+  assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1];
+  assign a2b_inp2_pre[1] = remask_rand_i[1];
+
+  // B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a).
+  // inp2 is a fresh Boolean sharing of -mask_mod:
+  //   inp2 = ((-m) ^ r1, r1)
+  // The adder computes a + (-m) = a - m.
+  // The output creates an Arithmetic sharing with m popped from the FIFO.
+  assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1];
+  assign b2a_inp2_pre[1] = remask_rand_i[1];


Good point, @etterli!

vogelpi · 2026-06-18T16:14:04Z

+      result_o[0] = fifo_rdata;
+      result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];


Thanks for the comments, they are really valuable (in general you're making very valuable comments throughout the code - well done!).

Question: I absolutely understand that we don't want to feed all adder outputs into this XOR here. However, what I don't understand is why the FF stage is needed. In my view, it should be sufficient to use pre-decoded one-hot muxes. Something like this should work, no?

    fifo_rdata --------- AND --\
                                OR ------------------ result_o[0]
    adder_result[0] ---- AND --/
                    \--- AND --\
                                XOR ---\
    adder_result[1] ---- AND --/        OR ---------- result_o[1]
                    \--- AND ----------/

I think this should work if you properly pre-encode the mux control signals (e.g., one-hot, coming directly out of a flop). You may also need to introduce a bubble cycle when switching between the two paths, so that there is no direct transition from one path to the other and the wires are pre-charged to 0 during switching.
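The proposed AND-OR output mux can be modeled behaviorally as follows. This is a sketch under assumptions: the select encoding (IDLE, DIRECT, B2A) and all function names are hypothetical, the select is assumed to be one-hot or all-zero straight from a flop, and the bubble cycle is modeled as a single all-zero select between two different active paths.

```python
MASK32 = (1 << 32) - 1

# Hypothetical one-hot select encoding, e.g. driven directly by a flop.
IDLE, DIRECT, B2A = 0b00, 0b01, 0b10

def and_gate(x, en):
    """AND with a replicated enable bit: x if en else 0."""
    return x & (MASK32 if en else 0)

def output_mux(sel, fifo_rdata, adder_result):
    """AND-OR mux from the sketch: the adder shares only meet in the XOR when B2A is selected."""
    assert sel in (IDLE, DIRECT, B2A), "select must be one-hot or all-zero"
    direct, b2a = bool(sel & DIRECT), bool(sel & B2A)
    res0 = and_gate(fifo_rdata, b2a) | and_gate(adder_result[0], direct)
    xor = and_gate(adder_result[0], b2a) ^ and_gate(adder_result[1], b2a)
    res1 = xor | and_gate(adder_result[1], direct)
    return res0, res1

def switch_path(old, new):
    """Select sequence for a path change: insert an all-zero bubble between two active paths."""
    if old != IDLE and new != IDLE and old != new:
        return [IDLE, new]
    return [new]
```

In the IDLE cycle every AND output is 0, so the wires are pre-charged to 0 before the other path is enabled.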

vogelpi · 2026-06-18T16:14:35Z

+      result_o[0] = fifo_rdata;
+      result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];


I think we could explore this as a follow-up but I would like us to understand why this doesn't work.

vogelpi · 2026-06-18T16:15:43Z

+      result_o[0] = fifo_rdata;
+      result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];


A similar scheme may be applicable to the input as well. There we would save even more flops. But again, this could be done as a follow-up.

andrea-caforio · 2026-06-19T13:35:28Z

A top-level test similar to what @etterli did for KMAC would be good to have. You can take my OTBN gadget tests as a template.

etterli · 2026-06-23T08:26:55Z

A top-level test similar to what @etterli did for KMAC would be good to have. You can take my OTBN gadget tests as a template.

There is already something in this fashion here: https://github.com/lowRISC/opentitan/blob/master/sw/otbn/mai/mai_test.s

nasahlpa · 2026-06-23T08:31:06Z

@h-filali currently the otbn_isa_test_fpga_cw340_sival_rom_ext test is failing. This needs to be fixed before merging.

etterli

Looks good to me. There are two points regarding how to blank some datapath parts but I think we can postpone them.

etterli · 2026-06-23T08:37:37Z

+
  // Width of randomness required by the mask accelerator
-  localparam int unsigned MaRndLen = 32'd322;
+  localparam int unsigned MaRndLen = SecAddRandWidth(SecAddWidth);


Yes, these 3 bits are used for shuffling. See the type mai_ma_urnd_t in otbn_mai.sv. This needs some comments, I agree.

etterli · 2026-06-23T08:40:55Z

+  // A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg).
+  //   inp1 = (a0 ^ r0, r0)
+  //   inp2 = ((a1+mod_neg) ^ r1, r1)
+  // The adder computes a0 + (a1 + mod_neg) = a + mod_neg.
+  // SecAddMod's two-pass correction yields a Boolean masked a.
+  assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0];
+  assign a2b_inp1_pre[1] = remask_rand_i[0];
+  assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1];
+  assign a2b_inp2_pre[1] = remask_rand_i[1];
+
+  // B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a).
+  // inp2 is a fresh Boolean sharing of -mask_mod:
+  //   inp2 = ((-m) ^ r1, r1)
+  // The adder computes a + (-m) = a - m.
+  // The output creates an Arithmetic sharing with m popped from the FIFO.
+  assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1];
+  assign b2a_inp2_pre[1] = remask_rand_i[1];


Ah, I see now that for B2A it does not toggle based on secrets; it only mixes randomness with the constant modulus. But for A2B, the signal(s) a2b_inp1_pre etc. toggle unnecessarily, for example, when we perform a B2A or SecAdd operation. These depend on in0_i.

Or is in0_i blanked (with a predecoded control signal) if no A2B operation is performed?

I don't know whether we need to change this right now or whether we should just open an issue for it, similar to the other question below.

The new MAI will cause the OTBN smoke test to fail until OTBNsim is aligned with the new mask accelerator. This commit temporarily comments out any lines that would cause the smoke test to fail. Furthermore, this commit makes changes to the secure adder that get rid of a Verilator UNOPTFLAT false positive. The warning arose because pre_p was a single array used both as the combinational XOR output (pre_p[0]) and as the final pipeline stage driving result_o (pre_p[Stages+1]). The fix splits pre_p into two separate signals: pre_p and pre_p_q[Stages:0].

Signed-off-by: Hakim Filali <hfilali@lowrisc.org>

This commit changes the top level of the mask accelerator to remove the dummy implementation and replace it with the actual implementation. An explanation of how the accelerator works can be found in the doc header of the RTL.

Signed-off-by: Hakim Filali <hfilali@lowrisc.org>

This commit adds a testbench for all 4 modes of the mask accelerator. It also tests the secure wipe. This commit also consolidates some of the testbench constants and helper functions into a single header file serving the sec_add and mask_accelerator testbenches. Furthermore, this commit changes the README to explain the added functionality.

Signed-off-by: Hakim Filali <hfilali@lowrisc.org>

h-filali requested review from andrea-caforio, etterli, nasahlpa and vogelpi June 15, 2026 09:08

h-filali requested a review from a team as a code owner June 15, 2026 09:08

h-filali requested review from moidx and removed request for a team June 15, 2026 09:08

h-filali force-pushed the otbn-mask-accelerator-rtl branch from d746046 to 181d5c0 June 15, 2026 09:11

h-filali force-pushed the otbn-mask-accelerator-rtl branch 2 times, most recently from 47f577a to 5f0d38e June 15, 2026 09:33

etterli requested a review from thommythomaso June 15, 2026 09:40

h-filali force-pushed the otbn-mask-accelerator-rtl branch from 5f0d38e to c8b3dcb June 15, 2026 09:45

etterli reviewed Jun 15, 2026

h-filali force-pushed the otbn-mask-accelerator-rtl branch from c8b3dcb to 1e4b257 June 15, 2026 12:23

h-filali requested a review from a team as a code owner June 15, 2026 12:23

h-filali requested review from martin-velay and removed request for a team June 15, 2026 12:23

thommythomaso reviewed Jun 16, 2026

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated

thommythomaso reviewed Jun 16, 2026

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated

thommythomaso reviewed Jun 16, 2026

Comment thread hw/ip/otbn/rtl/otbn_pkg.sv Outdated

thommythomaso reviewed Jun 16, 2026

h-filali force-pushed the otbn-mask-accelerator-rtl branch 2 times, most recently from 72da237 to 30cd0f1 June 16, 2026 14:20

h-filali force-pushed the otbn-mask-accelerator-rtl branch from d48b12e to ec2896b June 16, 2026 15:27

etterli reviewed Jun 16, 2026

h-filali force-pushed the otbn-mask-accelerator-rtl branch 2 times, most recently from 162ea0d to cc091a0 June 16, 2026 16:12

etterli reviewed Jun 17, 2026

thommythomaso reviewed Jun 18, 2026

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv

vogelpi approved these changes Jun 18, 2026

h-filali force-pushed the otbn-mask-accelerator-rtl branch 3 times, most recently from 971db5e to 4dcea80 June 22, 2026 14:58

etterli approved these changes Jun 23, 2026

h-filali force-pushed the otbn-mask-accelerator-rtl branch 4 times, most recently from aa4c944 to 86f74f9 June 23, 2026 15:49

h-filali added 3 commits June 23, 2026 17:57

h-filali force-pushed the otbn-mask-accelerator-rtl branch from 86f74f9 to 17b3e1e June 23, 2026 15:58
