Pr/compressor integration by sgerber-amd · Pull Request #1560 · Xilinx/finn

sgerber-amd · 2026-04-15T11:35:53Z

More details in comment below.

Port of the compressor-python library for efficient low-bitwidth dot-product computation using LUT primitives instead of DSP blocks.

Architecture:
- Counter-based compressor trees
- Fused accumulation with constant propagation
- Target-specific primitive selection (CARRY4/CARRY8/LOOKAHEAD8)

FPGA support:
- Versal: fully functional
- 7-Series: functional without fused accumulation and gate absorption (not ready for MVAU integration)
- UltraScale/UltraScale+: not yet implemented

Integration scripts for both the dotp_comp and add_multi optimization modes are included.

Implementation:
- Python-based compressor graph construction and optimization
- SystemVerilog template expansion for RTL generation
- mul_comp_map module for partial-product broadcasting

This commit adds the generator infrastructure only. Integration with FINN's RTL backend follows in subsequent commits.
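To make the idea of a counter-based compressor tree concrete, here is a minimal Python sketch, not the generator's actual code. It repeatedly applies full adders (3:2 counters) to every bit column until each column holds at most two bits, which a single carry-propagate adder then sums. The names `compress_columns` and `columns_value` are hypothetical, and the real generator also uses larger generalized counters and target-specific carry primitives, which this sketch omits.

```python
from collections import defaultdict


def compress_columns(columns):
    """Greedy 3:2 reduction of bit columns (hypothetical sketch).

    columns maps a bit position p (weight 2**p) to a list of 0/1 bits.
    Returns (reduced_columns, stages); every reduced column holds <= 2 bits.
    """
    cols = {pos: list(bits) for pos, bits in columns.items()}
    stages = 0
    while any(len(bits) > 2 for bits in cols.values()):
        nxt = defaultdict(list)
        for pos, bits in cols.items():
            i = 0
            # Full adder (3:2 counter): three bits of weight 2**pos become a sum
            # bit of weight 2**pos and a carry bit of weight 2**(pos + 1).
            while len(bits) - i >= 3:
                a, b, c = bits[i:i + 3]
                nxt[pos].append(a ^ b ^ c)
                nxt[pos + 1].append((a & b) | (a & c) | (b & c))
                i += 3
            # Up to two leftover bits pass through to the next stage unchanged.
            nxt[pos].extend(bits[i:])
        cols = dict(nxt)
        # Each stage is roughly one LUT level: a candidate spot for pipeline registers.
        stages += 1
    return cols, stages


def columns_value(columns):
    """Integer value of the columns; models the final carry-propagate adder."""
    return sum(sum(bits) << pos for pos, bits in columns.items())


# Usage: multi-operand addition of five 3-bit unsigned operands.
operands = [5, 7, 3, 6, 1]
cols = {p: [(x >> p) & 1 for x in operands] for p in range(3)}
rows, stages = compress_columns(cols)
assert all(len(bits) <= 2 for bits in rows.values())
assert columns_value(rows) == sum(operands)
```

Every full adder turns three bits into two while preserving the weighted sum (a + b + c = sum + 2 * carry), so the loop always terminates and the final two rows still encode the exact result.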

Wire the compressor generator into FINN's RTL MVAU datapath, enabling LUT-based dot-product computation as an alternative to DSP blocks.

RTL datapath changes (finn-rtllib/mvu/):
- mvu_vvu_axi.sv: add USE_COMPRESSOR parameter and conditional instantiation
- add_multi.sv: add CATCH_COMP macro for generated compressor module instantiation
- mvu_vvu_axi_wrapper.v: propagate COMP_PIPELINE_DEPTH parameter

FINN backend integration (matrixvectoractivation_rtl.py):
- Add compressor eligibility checks (_is_dotp_comp_eligible)
- Conditionally generate dotp_comp and add_multi compressor modules
- Include generated RTL files in build
- Propagate USE_COMPRESSOR and COMP_PIPELINE_DEPTH template variables

With this change, the Versal MVAU can use compressor-based compute instead of DSP blocks. 7-Series and UltraScale+ are not yet supported.
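The backend side of this commit comes down to substituting values such as USE_COMPRESSOR and COMP_PIPELINE_DEPTH into an RTL wrapper template. The Python sketch below illustrates that kind of template expansion. The `$NAME$` placeholder syntax, the `fill_template` helper and the template fragment are assumptions for illustration; they are not the actual contents of mvu_vvu_axi_wrapper.v or matrixvectoractivation_rtl.py.

```python
import re

# Hypothetical wrapper fragment; placeholder syntax and layout are assumptions.
TEMPLATE = """\
mvu_vvu_axi #(
    .USE_COMPRESSOR($USE_COMPRESSOR$),
    .COMP_PIPELINE_DEPTH($COMP_PIPELINE_DEPTH$)
) core (/* ports omitted */);
"""

# Matches an unfilled $NAME$ placeholder (but not SystemVerilog tasks like $clog2).
PLACEHOLDER = re.compile(r"\$[A-Z0-9_]+\$")


def fill_template(template: str, values: dict) -> str:
    """Substitute each $KEY$ placeholder and refuse to emit partially filled RTL."""
    out = template
    for key, val in values.items():
        out = out.replace(f"${key}$", str(val))
    leftover = PLACEHOLDER.search(out)
    if leftover:
        raise ValueError(f"unfilled placeholder {leftover.group(0)}")
    return out


rtl = fill_template(TEMPLATE, {"USE_COMPRESSOR": 1, "COMP_PIPELINE_DEPTH": 2})
print(rtl)
```

Failing on leftover placeholders keeps a missing template variable from silently producing RTL that only breaks later in synthesis.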

Test infrastructure:
- XSim testbench templates (dotp_comp_tb, add_multi_comp_tb, mul_comp_map_tb)
- Vivado TCL simulation scripts (dotp_comp, add_multi_comp, dotp)
- Test runner scripts: run_tests.sh (21 core configs), run_dotp_comp_tests.sh (8 configs), run_add_multi_comp_tests.sh (8 configs)
- Common test utilities (test_common.sh)

Complete 7-Series support with gate absorption optimization and fused accumulation. Add UltraScale/UltraScale+ target (reuses 7-Series primitives; Vivado maps CARRY4→CARRY8 transparently).

Key changes:
- Implement 7-Series gate absorption (MuxCYPredAdder, MuxCYRippleSum)
- Fix 7-Series fused accumulation and carry-chain wiring
- Fix compressor generation bugs (mul_comp_map indexing, N=1 passthrough, MuxCYAtom06)
- Add UltraScale() target class and remove UltraScale+ restrictions
- Remove RTL bitwidth restrictions: 2-3 bit networks now eligible for compressor path
- Add BIPOLAR datatype guard (RTL doesn't support BIPOLAR)
- Unified add_multi.sv generation for OOC synthesis
- VVU template variable consistency (USE_COMPRESSOR, COMP_PIPELINE_DEPTH)

All three FPGA families (Versal, 7-Series, UltraScale+) are now fully supported.
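The target handling described in this commit (7-Series primitives reused for UltraScale/UltraScale+, LOOKAHEAD8 on Versal) can be pictured as a small lookup of per-family carry primitives. This is a hypothetical Python sketch: the `Target` class, its fields, `select_target` and the per-primitive bit widths are assumptions and do not reproduce the generator's `UltraScale()` target class.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Target:
    """Hypothetical per-family description; field names are assumptions."""
    family: str
    carry_primitive: str
    bits_per_primitive: int  # carry-chain bits covered by one primitive (assumed)

    def carry_cells(self, width: int) -> int:
        """Primitives needed to build a carry chain `width` bits long."""
        return ceil(width / self.bits_per_primitive)


SEVEN_SERIES = Target("7-Series", "CARRY4", 4)
# UltraScale/UltraScale+ reuse the 7-Series primitives; per the PR, Vivado
# maps CARRY4 onto the native CARRY8 during implementation.
ULTRASCALE = Target("UltraScale", SEVEN_SERIES.carry_primitive, SEVEN_SERIES.bits_per_primitive)
VERSAL = Target("Versal", "LOOKAHEAD8", 8)

_TARGETS = {
    "7-series": SEVEN_SERIES,
    "ultrascale": ULTRASCALE,
    "ultrascale+": ULTRASCALE,
    "versal": VERSAL,
}


def select_target(family: str) -> Target:
    """Look up a family name case-insensitively; unknown families are rejected."""
    try:
        return _TARGETS[family.lower()]
    except KeyError:
        raise ValueError(f"unsupported FPGA family: {family}") from None


print(select_target("UltraScale+").carry_primitive)
print(select_target("Versal").carry_cells(10))
```

Mapping UltraScale+ onto the same entry as UltraScale mirrors the PR's statement that one target class covers both families.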


sgerber-amd · 2026-04-23T17:24:50Z

LUT-based compressor tree integration for MVAU RTL backend

This PR integrates a LUT-based compressor tree generator into FINN's MVAU RTL backend, enabling efficient dot-product and multi-operand addition operations on 7-Series, UltraScale+, and Versal FPGAs.

Key features:

Multi-platform support: 7-Series, UltraScale+, and Versal FPGAs
Gate absorption: Integrates two-input gates (AND, XOR, etc.) into the first compression stage
Custom input shapes: Generates compressors for arbitrary bit-column configurations, enabling a compressor-based implementation for all eligible node configurations
Automatic pipelining: Configurable pipeline depth for timing closure
FINN integration: Automatically invoked during the SpecializeLayers transformation when a node passes the compressor eligibility checks (weight/activation bitwidth and datatype)
Dual use cases: Full dot-product units and optimized multi-operand adders for DSP lanes
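Gate absorption and custom input shapes both start from the bit-column view of a dot product. For unsigned operands, every partial-product bit is the AND of one weight bit and one activation bit, and it lands in the column given by the sum of the two bit positions, so the compressor's input shape depends only on SIMD and the bitwidths. The Python sketch below computes that shape; `dotp_columns` and `column_heights` are hypothetical helpers, and signed operands (which need extra correction terms) are not covered.

```python
from collections import defaultdict


def dotp_columns(weights, acts, wbits, abits):
    """Bit columns of the unsigned dot product sum(w * a) (hypothetical helper).

    Each entry is w[i] AND a[j] for one SIMD lane, placed in column i + j.
    A gate-absorbing first stage would evaluate these ANDs inside the
    counter LUTs rather than in a separate layer of gates.
    """
    cols = defaultdict(list)
    for w, a in zip(weights, acts):
        for i in range(wbits):
            for j in range(abits):
                cols[i + j].append(((w >> i) & 1) & ((a >> j) & 1))
    return dict(cols)


def column_heights(simd, wbits, abits):
    """Data-independent input shape: number of bits in each column."""
    heights = [0] * (wbits + abits - 1)
    for i in range(wbits):
        for j in range(abits):
            heights[i + j] += simd
    return heights


weights, acts = [3, 1, 2, 0], [2, 3, 3, 1]
cols = dotp_columns(weights, acts, wbits=2, abits=2)
assert sum(sum(b) << p for p, b in cols.items()) == sum(w * a for w, a in zip(weights, acts))
print(column_heights(simd=4, wbits=2, abits=2))  # [4, 8, 4]
```

Because the heights do not depend on the data, a generator can build one compressor per (SIMD, weight bits, activation bits) shape.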

Usage:

The compressor dot-product variant becomes the default MVAU node implementation for small bitwidths (weight and activation bitwidths from 2 to 4 bits)
The compressor is also now used as the default for adding together the DSP lane outputs when the RTL DSP path is chosen
Further details can be found in the documentation listed below
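The bitwidth rule above, combined with the 2-bit minimum and the BIPOLAR guard described in the commit messages, can be summarized as a predicate. This Python sketch is assembled only from the PR text; `dotp_comp_eligible` and its datatype-name arguments are hypothetical and are not FINN's `_is_dotp_comp_eligible`, which may check additional node parameters.

```python
def dotp_comp_eligible(wbits: int, abits: int, wdt: str, adt: str) -> bool:
    """Hypothetical eligibility predicate built from the PR description only.

    wdt/adt are FINN-style datatype names such as "INT4", "UINT2", "BIPOLAR".
    """
    # BIPOLAR is unsupported by the RTL path. It is 1-bit, so the width check
    # below would reject it anyway; the explicit guard mirrors the PR.
    if "BIPOLAR" in (wdt, adt):
        return False
    # Both operands must be between 2 and 4 bits wide.
    return 2 <= wbits <= 4 and 2 <= abits <= 4


print(dotp_comp_eligible(4, 4, "INT4", "UINT4"))
print(dotp_comp_eligible(8, 4, "INT8", "UINT4"))
```

Nodes that fail such a check would use another implementation, e.g. the RTL DSP path, whose lane outputs are then summed by the add_multi compressor.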

Documentation:

See src/finn/compressor/README.md for implementation details and standalone usage
See src/finn/compressor/mvau_compressor_integration_flow.svg for the complete MVAU integration decision tree

Testing:

Standalone compressor tests: 21 configurations across all platforms
Integration wrapper tests: 8 dot-product configs + 8 add-multi configs per platform
FINN integration tests: The standard pytest suite exercises MVAU nodes whose default implementation is now the compressor path, so it also serves as verification of the integration

Results:
We compare MVAU nodes instantiated via the FINN flow in various configurations. Representative benchmark results compare the compressor implementation against the HLS baseline; metrics include LUT usage and critical path delay for specific MVAU node configurations. Critical path delay is reported as the smallest clock period that still meets timing (WNS >= 0), found via a WNS-guided binary search over the target clock period.
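The timing methodology above can be sketched as a bisection over the clock period. In this hypothetical Python version, `run_timing` stands in for a full Vivado synthesis and implementation run that returns WNS. The sketch only uses the sign of WNS; how the PR's WNS-guided search uses the slack value itself to steer its probes is not described, so it is not modeled here.

```python
from typing import Callable


def min_period_search(
    run_timing: Callable[[float], float],
    lo_ns: float,
    hi_ns: float,
    tol_ns: float = 0.05,
) -> float:
    """Smallest period in (lo_ns, hi_ns] with WNS >= 0, to within tol_ns.

    Assumes hi_ns meets timing and WNS is non-decreasing in the period.
    """
    if run_timing(hi_ns) < 0:
        raise ValueError("upper bound does not meet timing")
    while hi_ns - lo_ns > tol_ns:
        mid = (lo_ns + hi_ns) / 2
        if run_timing(mid) >= 0:
            hi_ns = mid  # timing met: try a shorter period
        else:
            lo_ns = mid  # timing failed: the period must be longer
    return hi_ns


# Mock timing model: a fixed 3.2 ns critical path, so WNS = period - 3.2.
achievable = min_period_search(lambda p: p - 3.2, lo_ns=1.0, hi_ns=10.0, tol_ns=0.01)
print(f"{achievable:.3f} ns")
```

Returning the upper bound guarantees that the reported period is one that actually met timing; in a real flow, place-and-route noise can make WNS non-monotone, so the result is an estimate rather than a hard optimum.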
We also evaluate a compressor implementation that replaces the previous binary adder tree for summing the SIMD partial outputs of the DSP lanes when DSPs are used.


sgerber-amd force-pushed the pr/compressor-integration branch from 91b219d to 17e2e8d April 21, 2026 12:34

sgerber-amd added 4 commits April 21, 2026 14:27

sgerber-amd force-pushed the pr/compressor-integration branch from 17e2e8d to 03bfca4 April 21, 2026 13:31

sgerber-amd added 2 commits April 22, 2026 18:00

[Compressor Documentation] Documentation updates and extended UltraScale+ test coverage

df2664a

[Style] Fix flake8 linting issues.

fb2b3e8

sgerber-amd and others added 3 commits April 27, 2026 10:19

[RTL MVU] Enforce minimum 2-bit bitwidth constraint

0281051

- Fixed bitwidth >= 2 check for both activations and weights - Remove 7-Series narrow weight clipping from tests - Update comments to reflect bitwidth ranges

Merge branch 'dev' into pr/compressor-integration

03a37e3

Improved pipelining capabilities for non-accumulation compressor usage. Pre-commit changes. Code cleanup of several testing scripts.

c747ae9

sgerber-amd marked this pull request as ready for review May 15, 2026 14:43

pre-commit

3627511

auphelia self-requested a review May 19, 2026 09:06

Use config-specific dotp module names to avoid potential multi-MVAU collisions

732684f

sgerber-amd wants to merge 11 commits into Xilinx:dev from sgerber-amd:pr/compressor-integration

sgerber-amd commented Apr 15, 2026 • edited by auphelia

sgerber-amd commented Apr 23, 2026 • edited
