x86: optimize normalize with SIMD #6761

Open
crafcat7 wants to merge 2 commits into Tencent:master from crafcat7:feat/x86-normalize

@crafcat7

Contributor

Summary

  • Added an x86 Normalize override in src/layer/x86/normalize_x86.h and src/layer/x86/normalize_x86.cpp.
  • Enabled support_packing for fp32 packed execution.
  • Covered the three generic Normalize branches used by the current implementation:
      • across_spatial=1, across_channel=1
      • across_spatial=1, across_channel=0
      • across_spatial=0, across_channel=1
  • Supported elempack=1/4/8/16 on x86.
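
For reviewers who want a quick mental model of the three branches listed above, here is a minimal scalar sketch. It is not the code in this PR: `normalize_ref` is a hypothetical name, the input is assumed to be a dense CHW buffer, the coefficient is assumed to be `1 / sqrt(sum + eps)`, and the learned scale is left out.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference, not the PR's implementation.
// `in` is a dense CHW buffer: `channels` planes of `size` values (size = w * h).
// Assumes the 1 / sqrt(sum + eps) coefficient and omits the learned scale.
std::vector<float> normalize_ref(const std::vector<float>& in, int channels, int size,
                                 bool across_spatial, bool across_channel, float eps)
{
    std::vector<float> out(in.size());
    if (across_spatial && across_channel)
    {
        // One coefficient for the whole tensor.
        float ssum = 0.f;
        for (float v : in) ssum += v * v;
        const float a = 1.f / std::sqrt(ssum + eps);
        for (std::size_t i = 0; i < in.size(); i++) out[i] = in[i] * a;
    }
    else if (across_spatial)
    {
        // One coefficient per channel, reduced over that channel's plane.
        for (int q = 0; q < channels; q++)
        {
            float ssum = 0.f;
            for (int i = 0; i < size; i++) ssum += in[q * size + i] * in[q * size + i];
            const float a = 1.f / std::sqrt(ssum + eps);
            for (int i = 0; i < size; i++) out[q * size + i] = in[q * size + i] * a;
        }
    }
    else if (across_channel)
    {
        // One coefficient per spatial position, reduced over all channels.
        for (int i = 0; i < size; i++)
        {
            float ssum = 0.f;
            for (int q = 0; q < channels; q++) ssum += in[q * size + i] * in[q * size + i];
            const float a = 1.f / std::sqrt(ssum + eps);
            for (int q = 0; q < channels; q++) out[q * size + i] = in[q * size + i] * a;
        }
    }
    else
    {
        // Combination not listed above; passed through in this sketch only.
        out = in;
    }
    return out;
}
```

The three branches differ only in which elements share a square-sum, which is why each one maps to a different reduction shape once the data is packed.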

Tests

  • Added packed-path regression coverage in tests/test_normalize.cpp.
  • Added small-value eps_mode regression cases.
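
To illustrate why small-value inputs deserve their own eps_mode cases, the sketch below shows one way the three modes can diverge when the square-sum is much smaller than eps. The mapping of mode numbers to formulas is an assumption on my side and should be checked against the generic Normalize layer; `inv_norm` is a hypothetical helper, not a function from this PR.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper. The mode numbering below is an assumption:
//   0: 1 / sqrt(ssum + eps)
//   1: 1 / max(sqrt(ssum), eps)
//   2: 1 / sqrt(max(ssum, eps))
float inv_norm(float ssum, float eps, int eps_mode)
{
    if (eps_mode == 0) return 1.f / std::sqrt(ssum + eps);
    if (eps_mode == 1) return 1.f / std::max(std::sqrt(ssum), eps);
    return 1.f / std::sqrt(std::max(ssum, eps));
}
```

Under these assumed formulas, `ssum = 4` with `eps = 1e-4` gives roughly 0.5 in every mode, while `ssum = 1e-12` gives roughly 100 for modes 0 and 2 but roughly 10000 for mode 1. A SIMD path that applies eps in the wrong place would pass ordinary-magnitude tests and only fail on inputs like the latter.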

Benchmarks

| Case | Baseline (ms/run) | Optimized (ms/run) | Speedup |
| --- | --- | --- | --- |
| [56,56,64] spatial=1 channel=0 shared=0 mode=0 | 0.1242 | 0.0236 | 5.25x |
| [56,56,64] spatial=0 channel=1 shared=0 mode=0 | 0.1412 | 0.0309 | 4.57x |
| [56,56,64] spatial=1 channel=1 shared=1 mode=0 | 0.0730 | 0.0245 | 2.99x |
| [28,28,128] spatial=1 channel=0 shared=0 mode=1 | 0.0607 | 0.0091 | 6.66x |
| [28,28,128] spatial=0 channel=1 shared=1 mode=1 | 0.0696 | 0.0121 | 5.76x |
| [28,28,128] spatial=1 channel=1 shared=0 mode=1 | 0.0293 | 0.0095 | 3.10x |
| [14,14,8,64] spatial=1 channel=0 shared=0 mode=2 | 0.0618 | 0.0096 | 6.46x |
| [14,14,8,64] spatial=0 channel=1 shared=1 mode=2 | 0.0705 | 0.0134 | 5.27x |
| [14,14,8,64] spatial=1 channel=1 shared=1 mode=2 | 0.0356 | 0.0097 | 3.65x |

crafcat7 added 2 commits May 31, 2026 15:49
Summary:
  Add packed Normalize regression coverage and a focused perf target for the new x86 path. This verifies packed layout behavior and small-value eps_mode cases without including the x86 implementation files in this commit.

Changes:
  1. Add test_packing_normalize and packed Normalize regression cases in tests/test_normalize.cpp
  2. Add small-value eps_mode regression coverage for packed Normalize inputs
  3. Register and add the perf_normalize benchmark target under tests/perf
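
As background for the packed regression cases, the sketch below shows the kind of channel interleaving that a packed layout implies. It is a simplified illustration under stated assumptions: `pack_channels` is a hypothetical helper, the channel count is assumed to be a multiple of `elempack`, and any per-channel stride padding a real tensor type may add is ignored.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: repack a dense CHW buffer into groups of `elempack`
// channels, interleaved per spatial position.
// Assumes channels % elempack == 0 and no stride padding between channels.
std::vector<float> pack_channels(const std::vector<float>& in, int channels, int size, int elempack)
{
    std::vector<float> out(in.size());
    for (int q = 0; q < channels; q++)
    {
        const std::size_t group = static_cast<std::size_t>(q / elempack); // which packed channel
        const std::size_t lane = static_cast<std::size_t>(q % elempack);  // position inside the vector
        for (int i = 0; i < size; i++)
        {
            out[(group * size + i) * elempack + lane] = in[static_cast<std::size_t>(q) * size + i];
        }
    }
    return out;
}
```

With this layout, a per-channel reduction stays inside one vector lane, while a cross-channel reduction has to combine lanes. A packed test therefore exercises code paths that an elempack=1 test never reaches.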
Summary:
  Add an x86 Normalize override for fp32 packed execution on SSE/AVX/AVX512-capable builds. This accelerates the main packed Normalize branches while preserving the existing eps_mode behavior and generic semantics.

Changes:
  1. Add src/layer/x86/normalize_x86.h and src/layer/x86/normalize_x86.cpp for the x86 Normalize override
  2. Enable support_packing and implement SIMD helpers for scalar, lanewise, and packed square-sum and scale operations
  3. Optimize the across_spatial/across_channel branch combinations used by the generic Normalize implementation for elempack 1, 4, 8, and 16
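
One plausible reading of the lane-wise versus scalar square-sum helpers mentioned above is sketched here for pack4 data with SSE. This is not the PR's code: the function names are hypothetical, and a plain C++ fallback is included so the sketch also builds on non-SSE targets.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical helpers for pack4 data: `ptr` holds `size` groups of 4 floats,
// one float per channel lane.

// Lane-wise: keep the 4 lanes separate, giving one square-sum per channel.
void square_sum_lanewise_pack4(const float* ptr, int size, float sums[4])
{
#ifdef SKETCH_HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < size; i++)
    {
        const __m128 v = _mm_loadu_ps(ptr + static_cast<std::size_t>(i) * 4);
        acc = _mm_add_ps(acc, _mm_mul_ps(v, v));
    }
    _mm_storeu_ps(sums, acc);
#else
    for (int k = 0; k < 4; k++) sums[k] = 0.f;
    for (int i = 0; i < size; i++)
        for (int k = 0; k < 4; k++)
            sums[k] += ptr[i * 4 + k] * ptr[i * 4 + k];
#endif
}

// Scalar: fold the 4 lanes into a single square-sum, as a reduction over
// all channels in the group would need.
float square_sum_scalar_pack4(const float* ptr, int size)
{
    float sums[4];
    square_sum_lanewise_pack4(ptr, size, sums);
    return (sums[0] + sums[1]) + (sums[2] + sums[3]);
}
```

The lane-wise form needs no horizontal add, which may partly explain why the spatial=1 channel=0 rows in the benchmark table show the largest speedups. That is a guess, not something measured here.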
@codecov-commenter

codecov-commenter commented Jun 1, 2026


Codecov Report

❌ Patch coverage is 67.16867% with 109 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.77%. Comparing base (104edd7) to head (a8870b5).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/layer/x86/normalize_x86.cpp | 67.16% | 109 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6761      +/-   ##
==========================================
+ Coverage   95.69%   95.77%   +0.08%     
==========================================
  Files         944      944              
  Lines      410509   410404     -105     
==========================================
+ Hits       392834   393064     +230     
+ Misses      17675    17340     -335     
