Fix int8 Convolution for flattened 1D input + 1x1 kernel#6777
Conversation
The fp32 path of Convolution detects when bottom_blob.dims == 1 and kernel_w == kernel_h == 1 and redirects the computation to InnerProduct. However, the int8 inference paths (Convolution::forward_int8 and the architecture-specific forward_int8_*) did not have this fallback. Expected broken behavior before this patch: - For a flattened input with num_input elements, bottom_blob.dims == 1 and bottom_blob.c == 1, but the actual number of input channels is num_input. - The int8 convolution loop iterated only over channels == 1, so most of the input values were ignored. - The output shape was wrong: it became dims=3 with w=num_input, h=1, c=num_output instead of the correct dims=1, w=num_output. This patch adds the missing flattened-blob handling: - In the generic naive path (Convolution::forward_int8), dispatch to InnerProduct, mirroring the fp32 behavior. - In the optimized x86 / arm / mips / loongarch paths (forward_int8_x86 / forward_int8_arm / forward_int8_mips / forward_int8_loongarch), reshape the 1D blob to 3D (1x1xW), recurse into the same int8 forward function, and reshape the result back to 1D. A regression test tests/test_convolution_5.cpp is added. It compares the fp32 reference output (via InnerProduct) against the int8 output for dims==1 inputs with 1x1 kernels. Before the fix the test fails because the int8 path produced an incorrect shape; after the fix it passes with max abs error below 0.01.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f95ee53ec0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| pd.set(0, num_output); | ||
| pd.set(1, bias_term); | ||
| pd.set(2, weight_data_size); | ||
| pd.set(8, int8_scale_term); |
There was a problem hiding this comment.
Preserve requantized int8 outputs
When int8_scale_term > 100, the normal convolution int8 path creates a 1-byte top blob and applies top_blob_int8_scales before returning. This new flattened fallback delegates to InnerProduct, whose int8 implementation only dequantizes to 4-byte floats and never loads the convolution top scale, so portable builds now return fp32 for flattened requantized convolutions that downstream int8 layers expect to stay int8. Gate this fallback off for >100 or preserve the convolution requantization path.
Useful? React with 👍 / 👎.
| int elembits = bottom_blob.elembits(); | ||
|
|
||
| // flattened blob, implement as InnerProduct | ||
| if (bottom_blob.dims == 1 && kernel_w == 1 && kernel_h == 1) |
There was a problem hiding this comment.
Guard the flattened reshape by weight size
This unconditional branch also catches true 1-D convolutions where the model has one input channel (weight_data_size / num_output == 1) but the input width is >1. In that case reshaping to 1x1xw makes the optimized int8 path think there are w input channels and it indexes the packed weights for channels that were never transformed; the generic path avoids this with the bottom_blob.w * bottom_blob.elempack == num_input check, so add the same guard here and in the parallel arm/mips/loongarch copies.
Useful? React with 👍 / 👎.
The fp32 path of Convolution detects when bottom_blob.dims == 1 and kernel_w == kernel_h == 1 and redirects the computation to InnerProduct. However, the int8 inference paths (Convolution::forward_int8 and the architecture-specific forward_int8_*) did not have this fallback.
Expected broken behavior before this patch:
This patch adds the missing flattened-blob handling:
In the generic naive path (Convolution::forward_int8), dispatch to InnerProduct, mirroring the fp32 behavior.
In the optimized x86 / arm / mips / loongarch paths (forward_int8_x86 / forward_int8_arm / forward_int8_mips / forward_int8_loongarch), reshape the 1D blob to 3D (1x1xW), recurse into the same int8 forward function, and reshape the result back to 1D.
A regression test tests/test_convolution_5.cpp is added. It compares the fp32 reference output (via InnerProduct) against the int8 output for dims==1 inputs with 1x1 kernels. Before the fix the test fails because the int8 path produced an incorrect shape; after the fix it passes with max abs error below 0.01.