Skip to content

[hw, otbn] Mask Accelerator top level RTL implementation and pre_DV#30404

Open
h-filali wants to merge 3 commits into
lowRISC:masterfrom
h-filali:otbn-mask-accelerator-rtl
Open

[hw, otbn] Mask Accelerator top level RTL implementation and pre_DV#30404
h-filali wants to merge 3 commits into
lowRISC:masterfrom
h-filali:otbn-mask-accelerator-rtl

Conversation

@h-filali

Copy link
Copy Markdown
Contributor

This PR adds the RTL for the Mask Accelerator top level and the corresponding pre_dv.

For in detail information, please view the individual commit messages and the doc headers.

Here is a block diagram that might prove to be helpful:

mask_acceleratorpng

@h-filali h-filali requested a review from a team as a code owner June 15, 2026 09:08
@h-filali h-filali requested review from moidx and removed request for a team June 15, 2026 09:08
@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch from d746046 to 181d5c0 Compare June 15, 2026 09:11
@h-filali

Copy link
Copy Markdown
Contributor Author

I stil need to update a failing OTBN smoke test. I'll write an update as soon as that's done.

@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch 2 times, most recently from 47f577a to 5f0d38e Compare June 15, 2026 09:33
@etterli etterli requested a review from thommythomaso June 15, 2026 09:40
@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch from 5f0d38e to c8b3dcb Compare June 15, 2026 09:45

@etterli etterli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the RTL. This looks clean. I have some questions about blankers, the wipe and assertions.

Comment thread hw/ip/otbn/rtl/otbn_sec_add.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch from c8b3dcb to 1e4b257 Compare June 15, 2026 12:23
@h-filali h-filali requested a review from a team as a code owner June 15, 2026 12:23
@h-filali h-filali requested review from martin-velay and removed request for a team June 15, 2026 12:23
@andrea-caforio

Copy link
Copy Markdown
Contributor

Let me know when you would like me to do the ML-DSA adjustments. I can push directly onto this branch.

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_pkg.sv Outdated

@thommythomaso thommythomaso left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @h-filali.

@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch 2 times, most recently from 72da237 to 30cd0f1 Compare June 16, 2026 14:20
@h-filali

Copy link
Copy Markdown
Contributor Author

Thanks @etterli for your feedback. I addressed your feedback now.

Concerning the wipe I removed the wipe functionality inside the mask accelerator. I had a look at the existing wipe procedure inside the mai and it looks good to me as is. The wipe will finish any pending computation before it will clear the registers inside the mai. The only thing that needs to be done inside the mask accelerator is to turn off the rejection sampling such that the wipe of the mai is deterministic in runtime.

@h-filali

Copy link
Copy Markdown
Contributor Author

Thanks @andrea-caforio I think this would make sense when I have the simulator changes ready. Probably in my next and last PR. So far I think there might be no changes from a SW perspective with how the current MAI interface is written. I see room for optimizations though to increase the throughput.

@h-filali

Copy link
Copy Markdown
Contributor Author

Thanks @thommythomaso for your feedback. Could you please have a look if you agree with my changes and also if you agree with my secure wipe assessment.

@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch from d48b12e to ec2896b Compare June 16, 2026 15:27

@etterli etterli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! I have just partially reviewed it. Will continue tomorrow.

Comment thread hw/ip/otbn/rtl/otbn_sec_add.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_pkg.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mai.sv Outdated
Comment on lines +139 to +142
logic [NumShares-1:0][31:0] ma_in0;
logic [NumShares-1:0][31:0] ma_in1;
logic [NumShares-1:0][31:0] ma_remask_rand;
logic [NumShares-1:0][31:0] ma_result;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could also make use of the new types ma_share_t because the MA interface uses these already.

@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch 2 times, most recently from 162ea0d to cc091a0 Compare June 16, 2026 16:12
Comment thread hw/ip/otbn/rtl/otbn_pkg.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment on lines +215 to +231
// A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg).
// inp1 = (a0 ^ r0, r0)
// inp2 = ((a1+mod_neg) ^ r1, r1)
// The adder computes a0 + (a1 + mod_neg) = a + mod_neg.
// SecAddMod's two-pass correction yields a Boolean masked a.
assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0];
assign a2b_inp1_pre[1] = remask_rand_i[0];
assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1];
assign a2b_inp2_pre[1] = remask_rand_i[1];

// B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a).
// inp2 is a fresh Boolean sharing of -mask_mod:
// inp2 = ((-m) ^ r1, r1)
// The adder computes a + (-m) = a - m.
// The output creates an Arithmetic sharing with m popped from the FIFO.
assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1];
assign b2a_inp2_pre[1] = remask_rand_i[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The blanker in front of the flop just makes sure that always 0 is flopped when the path should not be active. But this logic here toggles always, even when no A2B or B2A operation is ongoing. Should we move the blankers in front of this logic? Then only the logic which is actually required toggles (assuming the blankers are predecoded).

That is what we do in OTBN everywhere else. There the goal is to reduce the toggle noise to a minimum.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @etterli !

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not dependent on any share so the toggles are only based on randomness. This should not be a problem IMO. For the a2b signals, what needs to be flopped is the final remasked values.

Can you tell me which part you would like me to move, maybe I'm misunderstanding?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see now that for B2A it does not toggle based on secrets, it only mixes randomes with the constant modulus. But for A2B the signal(s) a2b_inp1_pre etc toggle for example unnecessarily if we perform a B2A or SecAdd operation. These depend on in0_i.

Or is in0_i blanked (with a predecoded control signal) if no A2B operation is performed?

Don't know if we need to change this right now or if we should just make an issue for this similar to the other question below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to make an issue to discuss this. I'm still not 100% sure where you want to move the blanker.

The blanker is just there such that we don't have the direct and a2b paths feed into the same mux simultaneously. Having only a blanker, leaks based on PROLEADs analysis.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an illustration. The current design has toggles where the "flashes" are even when the mode is B2A or SecAdd. As your analysis showed, these are toggles are ok to happen but could be avoided. In OTBN we want to minimize toggles of unused datapath elements as much as possible (even ones which we know/strongly assume to be save).

My first comment should describe that the blanker is moved in front of the A2B "Pre-encoding logic". And if the blanker is predecoded, the A2B "Pre-encoding logic" does only toggle if its result is actually required. Does this explain my point?

Image

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment on lines +463 to +464
result_o[0] = fifo_rdata;
result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the comments!

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv Outdated
Comment thread hw/ip/otbn/rtl/otbn_mai.sv
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv
Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv
@thommythomaso

Copy link
Copy Markdown
Contributor

Thanks @thommythomaso for your feedback. Could you please have a look if you agree with my changes and also if you agree with my secure wipe assessment.

Thanks @h-filali. Looks good from my side. From my understanding, we should be fine with the current secure wiping strategy.

@vogelpi vogelpi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the nice PR @h-filali ! This looks mostly good to me, I have some questions but they should not block the PR.

Comment thread hw/ip/otbn/rtl/otbn_mai.sv

// Width of randomness required by the mask accelerator
localparam int unsigned MaRndLen = 32'd322;
localparam int unsigned MaRndLen = SecAddRandWidth(SecAddWidth);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please do two things:

  1. Correct the UrndLen parameter. This is now 389 bits. But we "only" really consume 322 bits + 2*32 bits where the latter bits are used for remasking.
  2. Document this somewhere. It's currently pretty hard to understand.
  3. Create an issue for adding a 322-bit permutation to the Bivium output in otbn_rnd.sv. This permutation should be a secret netlist constant, so needs to be exposed as a parameter in otbn.sv and then driven in the top level. Currently we have a separate permutation for the MAC. But the MAI is using the unpermuted URND output.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: are the three remaining bits used for the shuffling?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these 3 bits are used for shuffling. See the type mai_ma_urnd_t in otbn_mai.sv. This needs some comments, I agree.

Comment on lines +215 to +231
// A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg).
// inp1 = (a0 ^ r0, r0)
// inp2 = ((a1+mod_neg) ^ r1, r1)
// The adder computes a0 + (a1 + mod_neg) = a + mod_neg.
// SecAddMod's two-pass correction yields a Boolean masked a.
assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0];
assign a2b_inp1_pre[1] = remask_rand_i[0];
assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1];
assign a2b_inp2_pre[1] = remask_rand_i[1];

// B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a).
// inp2 is a fresh Boolean sharing of -mask_mod:
// inp2 = ((-m) ^ r1, r1)
// The adder computes a + (-m) = a - m.
// The output creates an Arithmetic sharing with m popped from the FIFO.
assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1];
assign b2a_inp2_pre[1] = remask_rand_i[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @etterli !

Comment on lines +463 to +464
result_o[0] = fifo_rdata;
result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments, they are really valuable (in general you're making very valuable comments throughout the code - well done!).

Question: I absolutely understand that we don't want to feed all adder outputs into this XOR here. However, what I don't understand is why the FF stage is needed. In my view, it should be sufficient to use pre-decoded one-hot muxes. Something like this should work no?

fifo_rdata --------- AND --\
                            OR ------------ result_o[0]
adder_result[0] ---- AND --/
                 \-- AND --\
                            XOR ---\
adder_result[1] ---- AND --/        OR ---- result_o[1]
                 \-- AND ----------/

I think this should work if you properly pre-encode the mux control signals (e.g. one-hot directly coming out of a flop) and you may need to introduce a bubble cycle between switching from one to the other path, such that you don't have a direct transition from one to the other path but that you pre-charge the wires to 0 during switching.

Comment on lines +463 to +464
result_o[0] = fifo_rdata;
result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could explore this as a follow-up but I would like us to understand why this doesn't work.

Comment on lines +463 to +464
result_o[0] = fifo_rdata;
result_o[1] = result_b2a_q[0] ^ result_b2a_q[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar scheme may be applicable to the input as well. There we would save even more flops. But again, this could be done as a follow-up.

@andrea-caforio

Copy link
Copy Markdown
Contributor

A top-level test similar to what @etterli did for KMAC would be good to have. You can take my OTBN gadget tests as a template.

@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch 3 times, most recently from 971db5e to 4dcea80 Compare June 22, 2026 14:58
@etterli

etterli commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

A top-level test similar to what @etterli did for KMAC would be good to have. You can take my OTBN gadget tests as a template.

There is already something in this fashion here: https://github.com/lowRISC/opentitan/blob/master/sw/otbn/mai/mai_test.s

@nasahlpa

Copy link
Copy Markdown
Contributor

@h-filali currently the otbn_isa_test_fpga_cw340_sival_rom_ext test is failing. This needs to be fixed before merging.

@etterli etterli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. There are two points regarding how to blank some datapath parts but I think we can postpone them.

Comment thread hw/ip/otbn/rtl/otbn_mask_accelerator.sv

// Width of randomness required by the mask accelerator
localparam int unsigned MaRndLen = 32'd322;
localparam int unsigned MaRndLen = SecAddRandWidth(SecAddWidth);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these 3 bits are used for shuffling. See the type mai_ma_urnd_t in otbn_mai.sv. This needs some comments, I agree.

Comment on lines +215 to +231
// A2B input pre-encoding: produce fresh Boolean sharings of a0 and (a1 + mod_neg).
// inp1 = (a0 ^ r0, r0)
// inp2 = ((a1+mod_neg) ^ r1, r1)
// The adder computes a0 + (a1 + mod_neg) = a + mod_neg.
// SecAddMod's two-pass correction yields a Boolean masked a.
assign a2b_inp1_pre[0] = in0_i[0] ^ remask_rand_i[0];
assign a2b_inp1_pre[1] = remask_rand_i[0];
assign a2b_inp2_pre[0] = (in0_i[1] + mod_neg) ^ remask_rand_i[1];
assign a2b_inp2_pre[1] = remask_rand_i[1];

// B2A input pre-encoding: inp1 is in0_i directly (already a Boolean sharing of a).
// inp2 is a fresh Boolean sharing of -mask_mod:
// inp2 = ((-m) ^ r1, r1)
// The adder computes a + (-m) = a - m.
// The output creates an Arithmetic sharing with m popped from the FIFO.
assign b2a_inp2_pre[0] = (~mask_mod + 1'b1) ^ remask_rand_i[1];
assign b2a_inp2_pre[1] = remask_rand_i[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see now that for B2A it does not toggle based on secrets, it only mixes randomes with the constant modulus. But for A2B the signal(s) a2b_inp1_pre etc toggle for example unnecessarily if we perform a B2A or SecAdd operation. These depend on in0_i.

Or is in0_i blanked (with a predecoded control signal) if no A2B operation is performed?

Don't know if we need to change this right now or if we should just make an issue for this similar to the other question below.

@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch 4 times, most recently from aa4c944 to 86f74f9 Compare June 23, 2026 15:49
h-filali added 3 commits June 23, 2026 17:57
The new MAI will cause the OTBN smoke test to fail until the OTBNsim
is aligned with the new mask accelerator. This commit temporarily
comments out any lines that will cause the smoketest to fail.

Furthermore this commit makes changes to the secure adder which get
rid of a Verilator UNOPTFLAT false-positive. The warning arose because
pre_p was a single array used both as the combinational XOR output (pre_p[0])
and the final pipeline stage driving result_o (pre_p[Stages+1]).

The fix splits pre_p into two separate signals: pre_p and pre_p_q[Stages:0].

Signed-off-by: Hakim Filali <hfilali@lowrisc.org>
This commit changes the top level of the mask accelerator to remove the
dummy implementation and replace it by the actual implementation. An
explanation of how the accelerator works can be found in the doc header
of the RTL.

Signed-off-by: Hakim Filali <hfilali@lowrisc.org>
This commit adds a testbench for all 4 modes of the mask accelerator.
It also tests the secure wipe. This commit also consolidates some
of the testbench constants and helper functions into a single header
file serving the sec_add and mask_accelerator testbenches. Furthermore,
this commit changes the README to explain the added functionality.

Signed-off-by: Hakim Filali <hfilali@lowrisc.org>
@h-filali h-filali force-pushed the otbn-mask-accelerator-rtl branch from 86f74f9 to 17b3e1e Compare June 23, 2026 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants