Skip to content

Fix int8 Convolution for flattened 1D input + 1x1 kernel#6777

Open
futz12 wants to merge 1 commit into
Tencent:masterfrom
futz12:ncnn2table-opt
Open

Fix int8 Convolution for flattened 1D input + 1x1 kernel#6777
futz12 wants to merge 1 commit into
Tencent:masterfrom
futz12:ncnn2table-opt

Conversation

@futz12

@futz12 futz12 commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

The fp32 path of Convolution detects when bottom_blob.dims == 1 and kernel_w == kernel_h == 1 and redirects the computation to InnerProduct. However, the int8 inference paths (Convolution::forward_int8 and the architecture-specific forward_int8_*) did not have this fallback.

Expected broken behavior before this patch:

  • For a flattened input with num_input elements, bottom_blob.dims == 1 and bottom_blob.c == 1, but the actual number of input channels is num_input.
  • The int8 convolution loop iterated only over channels == 1, so most of the input values were ignored.
  • The output shape was wrong: it became dims=3 with w=num_input, h=1, c=num_output instead of the correct dims=1, w=num_output.

This patch adds the missing flattened-blob handling:

  • In the generic naive path (Convolution::forward_int8), dispatch to InnerProduct, mirroring the fp32 behavior.

  • In the optimized x86 / arm / mips / loongarch paths (forward_int8_x86 / forward_int8_arm / forward_int8_mips / forward_int8_loongarch), reshape the 1D blob to 3D (1x1xW), recurse into the same int8 forward function, and reshape the result back to 1D.

A regression test tests/test_convolution_5.cpp is added. It compares the fp32 reference output (via InnerProduct) against the int8 output for dims==1 inputs with 1x1 kernels. Before the fix the test fails because the int8 path produced an incorrect shape; after the fix it passes with max abs error below 0.01.

The fp32 path of Convolution detects when bottom_blob.dims == 1 and
kernel_w == kernel_h == 1 and redirects the computation to InnerProduct.
However, the int8 inference paths (Convolution::forward_int8 and the
architecture-specific forward_int8_*) did not have this fallback.

Expected broken behavior before this patch:
- For a flattened input with num_input elements, bottom_blob.dims == 1
  and bottom_blob.c == 1, but the actual number of input channels is
  num_input.
- The int8 convolution loop iterated only over channels == 1, so most
  of the input values were ignored.
- The output shape was wrong: it became dims=3 with
  w=num_input, h=1, c=num_output instead of the correct dims=1,
  w=num_output.

This patch adds the missing flattened-blob handling:

- In the generic naive path (Convolution::forward_int8), dispatch to
  InnerProduct, mirroring the fp32 behavior.

- In the optimized x86 / arm / mips / loongarch paths
  (forward_int8_x86 / forward_int8_arm / forward_int8_mips /
  forward_int8_loongarch), reshape the 1D blob to 3D (1x1xW), recurse
  into the same int8 forward function, and reshape the result back to
  1D.

A regression test tests/test_convolution_5.cpp is added. It compares
the fp32 reference output (via InnerProduct) against the int8 output
for dims==1 inputs with 1x1 kernels. Before the fix the test fails
because the int8 path produced an incorrect shape; after the fix it
passes with max abs error below 0.01.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f95ee53ec0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/layer/convolution.cpp
pd.set(0, num_output);
pd.set(1, bias_term);
pd.set(2, weight_data_size);
pd.set(8, int8_scale_term);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve requantized int8 outputs

When int8_scale_term > 100, the normal convolution int8 path creates a 1-byte top blob and applies top_blob_int8_scales before returning. This new flattened fallback delegates to InnerProduct, whose int8 implementation only dequantizes to 4-byte floats and never loads the convolution top scale, so portable builds now return fp32 for flattened requantized convolutions that downstream int8 layers expect to stay int8. Gate this fallback off for >100 or preserve the convolution requantization path.

Useful? React with 👍 / 👎.

int elembits = bottom_blob.elembits();

// flattened blob, implement as InnerProduct
if (bottom_blob.dims == 1 && kernel_w == 1 && kernel_h == 1)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard the flattened reshape by weight size

This unconditional branch also catches true 1-D convolutions where the model has one input channel (weight_data_size / num_output == 1) but the input width is >1. In that case reshaping to 1x1xw makes the optimized int8 path think there are w input channels and it indexes the packed weights for channels that were never transformed; the generic path avoids this with the bottom_blob.w * bottom_blob.elempack == num_input check, so add the same guard here and in the parallel arm/mips/loongarch copies.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant