Pr/compressor integration #1560

Open
sgerber-amd wants to merge 11 commits into Xilinx:dev from sgerber-amd:pr/compressor-integration

sgerber-amd commented Apr 15, 2026

More details in comment below.

sgerber-amd force-pushed the pr/compressor-integration branch from 91b219d to 17e2e8d on April 21, 2026 12:34

Port of compressor-python library for efficient low-bitwidth dot product
computation using LUT primitives instead of DSP blocks.

Architecture:
- Counter-based compressor trees
- Fused accumulation with constant propagation
- Target-specific primitive selection (CARRY4/CARRY8/LOOKAHEAD8)

FPGA Support:
- Versal: Fully functional
- 7-Series: Functional, but without fused accumulation and gate absorption (not ready for MVAU integration)
- UltraScale/UltraScale+: Not yet implemented

Integration scripts for both dotp_comp and add_multi optimization modes included.

Implementation:
- Python-based compressor graph construction and optimization
- SystemVerilog template expansion for RTL generation
- mul_comp_map module for partial product broadcasting

This commit adds the generator infrastructure only. Integration with
FINN's RTL backend follows in subsequent commits.

Wire the compressor generator into FINN's RTL MVAU datapath, enabling
LUT-based dot product computation as an alternative to DSP blocks.

RTL Datapath Changes (finn-rtllib/mvu/):
- mvu_vvu_axi.sv: Add USE_COMPRESSOR parameter and conditional instantiation
- add_multi.sv: Add CATCH_COMP macro for generated compressor module instantiation
- mvu_vvu_axi_wrapper.v: Propagate COMP_PIPELINE_DEPTH parameter

FINN Backend Integration (matrixvectoractivation_rtl.py):
- Add compressor eligibility checks (_is_dotp_comp_eligible)
- Conditionally generate dotp_comp and add_multi compressor modules
- Include generated RTL files in build
- Propagate USE_COMPRESSOR and COMP_PIPELINE_DEPTH template variables

Versal MVAU can use compressor-based compute instead of DSP blocks.
7-Series and UltraScale+ not yet supported.

Test infrastructure:
- XSim testbench templates (dotp_comp_tb, add_multi_comp_tb, mul_comp_map_tb)
- Vivado TCL simulation scripts (dotp_comp, add_multi_comp, dotp)
- Test runner scripts: run_tests.sh (21 core configs), run_dotp_comp_tests.sh (8 configs), run_add_multi_comp_tests.sh (8 configs)
- Common test utilities (test_common.sh)

Complete 7-Series support with gate absorption optimization and fused
accumulation. Add UltraScale/UltraScale+ target (reuses 7-Series primitives,
Vivado maps CARRY4→CARRY8 transparently).

Key Changes:
- Implement 7-Series gate absorption (MuxCYPredAdder, MuxCYRippleSum)
- Fix 7-Series fused accumulation and carry chain wiring
- Fix compressor generation bugs (mul_comp_map indexing, N=1 passthrough, MuxCYAtom06)
- Add UltraScale() target class and remove UltraScale+ restrictions
- Remove RTL bitwidth restrictions: 2-3 bit networks now eligible for compressor path
- Add BIPOLAR datatype guard (RTL doesn't support BIPOLAR)
- Unified add_multi.sv generation for OOC synthesis
- VVU template variable consistency (USE_COMPRESSOR, COMP_PIPELINE_DEPTH)

All three FPGA families (Versal, 7-Series, UltraScale+) now fully supported.

sgerber-amd force-pushed the pr/compressor-integration branch from 17e2e8d to 03bfca4 on April 21, 2026 13:31

sgerber-amd commented Apr 23, 2026

LUT-based compressor tree integration for MVAU RTL backend

This PR integrates a LUT-based compressor tree generator into FINN's MVAU RTL backend, enabling efficient dot-product and multi-operand addition operations on 7-Series, UltraScale+, and Versal FPGAs.

Key features:

  • Multi-platform support: 7-Series, UltraScale+, and Versal FPGAs
  • Gate absorption: Integrates two-input gates (AND, XOR, etc.) into the first compression stage
  • Custom input shapes: Generates compressors for arbitrary bit-column configurations, so that every possible node configuration can be implemented with a compressor
  • Automatic pipelining: Configurable pipeline depth for timing closure
  • FINN integration: Automatically invoked during the SpecializeLayers transformation, depending on node parameters such as weight and activation bitwidth
  • Dual use cases: Full dot-product units and optimized multi-operand adders for DSP lanes
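
To illustrate what a counter-based compressor tree over arbitrary bit columns does, here is a minimal Python sketch. It is not the generator from this PR: the function names are hypothetical, it handles unsigned operands only, and it uses plain 3:2 counters (full adders) rather than target-specific primitives. The AND that forms each partial product is the kind of two-input gate that gate absorption would fold into the first stage; the sketch does not model the absorption itself.

```python
def full_adder(a, b, c):
    """3:2 counter: three bits of equal weight in, (sum, carry) out."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compress(columns):
    """Reduce bit columns until each holds at most two bits.

    columns[i] is the list of bits with weight 2**i. Column heights may
    differ, which is what an arbitrary input shape amounts to here.
    """
    cols = [list(c) for c in columns]
    while any(len(c) > 2 for c in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for i, col in enumerate(cols):
            while len(col) >= 3:
                s, cy = full_adder(col.pop(), col.pop(), col.pop())
                nxt[i].append(s)        # sum keeps the same weight
                nxt[i + 1].append(cy)   # carry moves one column up
            nxt[i].extend(col)          # leftover 0..2 bits pass through
        cols = nxt
    return cols


def column_value(cols):
    """Integer value represented by a set of bit columns."""
    return sum(bit << i for i, col in enumerate(cols) for bit in col)


def dot_product_columns(weights, acts, bits):
    """Unsigned partial products w[j] & a[k], placed in column j + k."""
    cols = [[] for _ in range(2 * bits - 1)]
    for w, a in zip(weights, acts):
        for j in range(bits):
            for k in range(bits):
                cols[j + k].append(((w >> j) & 1) & ((a >> k) & 1))
    return cols


# Hypothetical 2-bit example: 3*2 + 1*3 + 2*1
reduced = compress(dot_product_columns([3, 1, 2], [2, 3, 1], bits=2))
print(column_value(reduced))  # 11
```

After compression every column holds at most two bits, so a single carry-propagate adder can produce the final result. In hardware that final adder is where a carry chain would typically be used.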

Usage:

  • The compressor dot-product variant becomes the default MVAU node implementation for small bitwidths (weight and activation bitwidth <= 4)
  • The compressor is now also the default for summing the DSP lane outputs when the RTL DSP path is chosen
  • Further details can be found in the documentation listed below
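
The selection rule can be pictured with a short sketch. The function below is a hypothetical stand-in, not the `_is_dotp_comp_eligible` check added by this PR; it only combines the conditions mentioned in this conversation (bitwidth of at least 2 and at most 4 for both weights and activations, and no BIPOLAR datatype). The authoritative decision tree is the one in `mvau_compressor_integration_flow.svg`.

```python
def is_compressor_eligible_sketch(weight_bits, act_bits, is_bipolar=False,
                                  min_bits=2, max_bits=4):
    """Hypothetical eligibility rule for the compressor dot-product path.

    The bounds and the BIPOLAR guard are assumptions taken from the PR
    text; the real check may consider further node parameters.
    """
    if is_bipolar:
        return False  # the RTL backend does not support BIPOLAR
    return all(min_bits <= b <= max_bits for b in (weight_bits, act_bits))


for cfg in [(4, 4, False), (2, 3, False), (8, 4, False), (2, 2, True)]:
    print(cfg, is_compressor_eligible_sketch(*cfg))
```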

Documentation:

  • See src/finn/compressor/README.md for implementation details and standalone usage
  • See src/finn/compressor/mvau_compressor_integration_flow.svg for the complete MVAU integration decision tree

Testing:

  • Standalone compressor tests: 21 configurations across all platforms
  • Integration wrapper tests: 8 dot-product configs + 8 add-multi configs per platform
  • FINN integration tests: The standard pytest suite includes MVAU nodes whose default implementation is now the compressor path, so the existing tests also serve as verification

Results:
We compare MVAU nodes instantiated via the FINN flow in various configurations. The representative benchmark results below compare the compressor implementation against the HLS baseline; the metrics are LUT usage and critical path delay for specific MVAU node configurations. Critical path delay is reported as the achievable delay (WNS >= 0) found via a WNS-guided binary search over the timing target.
We also compare a compressor implementation that replaces the previous binary-tree implementation for summing the SIMD partial outputs of the DSP lanes when DSPs are used.
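
For readers unfamiliar with the timing search, the following sketch shows one way such a search could look. It is an assumption, not the script used for these results: it bisects on the sign of WNS only, `run_timing` is a hypothetical callback standing in for an implementation run, and `mock_timing` is a toy model with an invented critical path.

```python
def find_min_period(run_timing, lo_ns, hi_ns, tol_ns=0.01):
    """Binary-search a small clock period that still has WNS >= 0.

    run_timing(period_ns) must return the worst negative slack in ns.
    hi_ns is required to meet timing; lo_ns is assumed to fail.
    """
    if run_timing(hi_ns) < 0:
        raise ValueError("upper bound does not meet timing")
    while hi_ns - lo_ns > tol_ns:
        mid = (lo_ns + hi_ns) / 2
        if run_timing(mid) >= 0:
            hi_ns = mid   # met timing: try a tighter clock
        else:
            lo_ns = mid   # failed timing: relax the clock
    return hi_ns          # always a period that met timing


def mock_timing(period_ns, critical_path_ns=2.37):
    """Toy stand-in for a tool run: WNS = period - critical path."""
    return period_ns - critical_path_ns


best = find_min_period(mock_timing, lo_ns=1.0, hi_ns=5.0)
print(f"achievable period: {best:.3f} ns")
```

Real place-and-route results are not strictly monotonic in the target period, so a search like this yields an achievable delay rather than a guaranteed minimum.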

[Figures: 07_node_only_savings, 07b_node_only_adp, 11_versal_full_per_config, 01_versal_full_dotp, 24_addmulti_lookahead8_combined_savings, 24b_addmulti_lookahead8_adp]

sgerber-amd and others added 3 commits April 27, 2026 10:19
- Fixed bitwidth >= 2 check for both activations and weights
- Remove 7-Series narrow weight clipping from tests
- Update comments to reflect bitwidth ranges
…e. pre-commit changes. code cleanup of several testing scripts.
sgerber-amd marked this pull request as ready for review on May 15, 2026 14:43
auphelia self-requested a review on May 19, 2026 09:06