Skip to content

refactor riscv gemm microkernel#6752

Open
nihui wants to merge 8 commits into
Tencent:masterfrom
nihui:opt-riscv-gemm
Open

refactor riscv gemm microkernel#6752
nihui wants to merge 8 commits into
Tencent:masterfrom
nihui:opt-riscv-gemm

Conversation

@nihui

@nihui nihui commented May 28, 2026

Copy link
Copy Markdown
Member

No description provided.

@github-actions github-actions Bot added the riscv label May 28, 2026
@tencent-adm

Copy link
Copy Markdown
Member

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@codecov-commenter

codecov-commenter commented May 28, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 95.68345% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.78%. Comparing base (104edd7) to head (f2826ea).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
src/layer/riscv/gemm_fp16sa.h 95.25% 57 Missing ⚠️
src/layer/riscv/gemm_fp16s.h 93.58% 25 Missing ⚠️
src/layer/riscv/gemm_bf16s_fp16s.h 87.03% 7 Missing ⚠️
src/layer/riscv/gemm_riscv_zfh.cpp 98.78% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6752      +/-   ##
==========================================
+ Coverage   95.69%   95.78%   +0.09%     
==========================================
  Files         944      947       +3     
  Lines      410509   412957    +2448     
==========================================
+ Hits       392834   395569    +2735     
+ Misses      17675    17388     -287     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@nihui nihui marked this pull request as ready for review May 30, 2026 08:21
@nihui nihui changed the title [WIP] refactor riscv gemm microkernel refactor riscv gemm microkernel May 30, 2026
@nihui nihui requested a review from Copilot May 30, 2026 08:23
@nihui

nihui commented May 30, 2026

Copy link
Copy Markdown
Member Author

@codex review

@nihui nihui closed this May 30, 2026
@nihui nihui reopened this May 30, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@nihui

nihui commented May 30, 2026

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e0a6655f26

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/layer/riscv/gemm_riscv.cpp Outdated
Comment on lines +588 to +596
for (; jj + 7 < max_jj; jj += 8)
{
if (elempack == packn)
{
const int q = (j + jj) / packn * packn;
const int r = (j + jj) % packn;
const float* p0 = (const float*)B + k * B_hstep + q * packn + r * packn;

if (packn == 8)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle the 8-column tail when packn is 16

On RVV targets where csrr_vlenb() / 4 == 16, this new 8-column tail loop can run for transB == 0, B.elempack == packn, and an unpacked column count with an 8..15 remainder after the 16-wide blocks. The loop advances jj by 8, but the elempack == packn body only writes data for packn == 8 or packn == 4, so for packn == 16 pp is not advanced and those B columns are never packed; the following 4/2/1 tails cannot recover them, causing GEMM to read stale/misaligned BT data. Please add a packn == 16 path for this width or let the smaller tail loops handle it.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment on lines +595 to +597
Mat ATX(TILE_K * TILE_M, (K + TILE_K - 1) / TILE_K, nT, 2u, opt.workspace_allocator);
if (ATX.empty())
return -100;
@nihui nihui closed this Jun 6, 2026
@nihui nihui reopened this Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants