x86: optimize normalize with SIMD #6761
Open
crafcat7 wants to merge 2 commits into
Summary: Add packed Normalize regression coverage and a focused perf target for the new x86 path. This verifies packed layout behavior and small-value eps_mode cases without including the x86 implementation files in this commit.
Changes:
1. Add test_packing_normalize and packed Normalize regression cases in tests/test_normalize.cpp
2. Add small-value eps_mode regression coverage for packed Normalize inputs
3. Register and add the perf_normalize benchmark target under tests/perf
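The small-value eps_mode cases matter because that is where the epsilon handling variants diverge. A minimal scalar sketch of the scale factor follows, assuming the three modes behave the way the generic ncnn Normalize layer is commonly understood to (0: add eps before the square root, 1: clamp the norm, 2: clamp the squared sum). The helper name `normalize_scale` is hypothetical and is not part of this PR.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical reference helper (not from the PR): reciprocal norm for one
// reduction group, given its sum of squares `ssum` and the layer's `eps`.
// The per-mode formulas are an assumption about the generic implementation.
static float normalize_scale(float ssum, float eps, int eps_mode)
{
    if (eps_mode == 0) // assumed caffe/mxnet style: x / sqrt(sum + eps)
        return 1.f / std::sqrt(ssum + eps);
    if (eps_mode == 1) // assumed pytorch style: x / max(sqrt(sum), eps)
        return 1.f / std::max(std::sqrt(ssum), eps);
    // assumed tensorflow style: x / sqrt(max(sum, eps))
    return 1.f / std::sqrt(std::max(ssum, eps));
}
```

With a squared sum far below eps (for example `ssum = 1e-12f`, `eps = 1e-4f`), modes 0 and 2 give a scale near 100 while mode 1 gives a scale near 10000, so a packed path that quietly swaps one formula for another only shows up on small inputs.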
Summary: Add an x86 Normalize override for fp32 packed execution on SSE/AVX/AVX512-capable builds. This accelerates the main packed Normalize branches while preserving the existing eps_mode behavior and generic semantics.
Changes:
1. Add src/layer/x86/normalize_x86.h and src/layer/x86/normalize_x86.cpp for the x86 Normalize override
2. Enable support_packing and implement SIMD helpers for scalar, lanewise, and packed square-sum and scale operations
3. Optimize the across_spatial/across_channel branch combinations used by the generic Normalize implementation for elempack 1, 4, 8, and 16
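To illustrate what a lanewise square-sum and scale helper can look like, here is a minimal SSE sketch for elempack=4 with across_spatial=1, across_channel=0. It is not the code from this PR: the function name `normalize_pack4_lanewise` is hypothetical, it handles only the add-eps-before-sqrt variant (assumed to correspond to eps_mode 0), and it omits the per-channel scale multiply.

```cpp
#include <cmath>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical sketch (not the PR's implementation).
// ptr holds `size` spatial positions with 4 interleaved channels each:
// ptr[4 * i + k] is channel k at position i. Each lane (channel) is
// normalized independently by 1 / sqrt(sum_of_squares + eps).
static void normalize_pack4_lanewise(float* ptr, std::size_t size, float eps)
{
#if defined(__SSE__)
    // Accumulate the four per-channel square sums side by side.
    __m128 ssum = _mm_setzero_ps();
    for (std::size_t i = 0; i < size; i++)
    {
        __m128 p = _mm_loadu_ps(ptr + 4 * i);
        ssum = _mm_add_ps(ssum, _mm_mul_ps(p, p));
    }

    // One scale per lane, no horizontal reduction needed.
    __m128 a = _mm_div_ps(_mm_set1_ps(1.f), _mm_sqrt_ps(_mm_add_ps(ssum, _mm_set1_ps(eps))));

    for (std::size_t i = 0; i < size; i++)
    {
        __m128 p = _mm_loadu_ps(ptr + 4 * i);
        _mm_storeu_ps(ptr + 4 * i, _mm_mul_ps(p, a));
    }
#else
    // Scalar fallback with the same semantics for non-SSE builds.
    for (int k = 0; k < 4; k++)
    {
        float ssum = 0.f;
        for (std::size_t i = 0; i < size; i++)
            ssum += ptr[4 * i + k] * ptr[4 * i + k];
        float a = 1.f / std::sqrt(ssum + eps);
        for (std::size_t i = 0; i < size; i++)
            ptr[4 * i + k] *= a;
    }
#endif
}
```

The across_channel=1 branches differ in that the four lanes of the accumulator must also be summed horizontally before the scale is computed, which is presumably why the PR distinguishes scalar, lanewise, and packed helpers.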
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #6761 +/- ##
==========================================
+ Coverage 95.69% 95.77% +0.08%
==========================================
Files 944 944
Lines 410509 410404 -105
==========================================
+ Hits 392834 393064 +230
+ Misses 17675 17340 -335
Summary
- Add an x86 Normalize override in src/layer/x86/normalize_x86.h and src/layer/x86/normalize_x86.cpp.
- Enable support_packing for fp32 packed execution.
- Optimize the main Normalize branches used by the current implementation:
  - across_spatial=1, across_channel=1
  - across_spatial=1, across_channel=0
  - across_spatial=0, across_channel=1
- Support elempack=1/4/8/16 on x86.

Tests
- Add packed Normalize regression cases in tests/test_normalize.cpp.
- Add small-value eps_mode regression cases.

Benchmarks
- [56,56,64] spatial=1 channel=0 shared=0 mode=0
- [56,56,64] spatial=0 channel=1 shared=0 mode=0
- [56,56,64] spatial=1 channel=1 shared=1 mode=0
- [28,28,128] spatial=1 channel=0 shared=0 mode=1
- [28,28,128] spatial=0 channel=1 shared=1 mode=1
- [28,28,128] spatial=1 channel=1 shared=0 mode=1
- [14,14,8,64] spatial=1 channel=0 shared=0 mode=2
- [14,14,8,64] spatial=0 channel=1 shared=1 mode=2
- [14,14,8,64] spatial=1 channel=1 shared=1 mode=2