Releases · Tencent/ncnn

26 May 13:57

20260526

e54f7b1

android harmonyos ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20260526 e54f7b1 Latest


Build versions, default configuration: android-ndk-r29, ohos-sdk-5.0.3, xcode 16.4, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	complete source code including all submodules
ncnn-android.zip	android static/shared libraries	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android static/shared libraries, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-harmonyos.zip	harmonyos static/shared libraries	armeabi-v7a + arm64-v8a + x86_64
ncnn-harmonyos-vulkan.zip	harmonyos static/shared libraries, GPU support	armeabi-v7a + arm64-v8a + x86_64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared libraries, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared libraries, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

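Highlights

Added a HarmonyOS prebuilt package release pipeline; the release artifacts cover CPU/Vulkan, static/shared libraries, and the armeabi-v7a, arm64-v8a, and x86_64 architectures. (#6746)
The Vulkan backend adds SDPA/FlashAttention, RotaryEmbed, GroupNorm, Reduction, Unfold, Softplus, Shrink, and other operators, and introduces a persistent pipeline cache, mmap model loading, host-memory weight residency, and per-layer weight upload, clearly targeting large models and long startup times. (@futz12, @CLV-Iclucia, #6514, #6702, #6537, #6531, #6534)
The x86 backend fills in the bf16 storage path at scale, covering GEMM, Convolution, InnerProduct, Deconvolution, Pooling, Interp, normalization, activation, elementwise, quantize/dequantize, and other layers, and adds AVX512BF16 dispatch and several micro-kernel optimizations. (#6598, #6624, #6626, #6680)
The ARM backend adds an ARM SDPA implementation and optimizes GEMM, Convolution im2col-GEMM, InnerProduct, and MultiHeadAttention for ARMv8.4 BF16; it also fills in NEON/fp16 SIMD implementations of ERF, ELU, GELU, and SELU. (@Abandon-ht, @futz12, #6698, #6714, #6715, #6716, #6717, #6605)
The RISC-V RVV, MIPS MSA, and LoongArch LSX/LASX backends continue to expand, focusing on packed convolution/deconvolution, GEMM, quantize/dequantize, common unary/binary ops, bf16/int8, and 4D Mat support. (#6662, #6740, #6636, #6658, #6695)
pnnx supports real .npy inputs, reports FLOPS/memory OPS statistics, is compatible with PyTorch 2.10/2.11, and fixes conversion issues including asymmetric padding + conv fusion, Conv2d padding tuple normalization, and lowering Erf expressions to a layer. (@MollySophia, @Yeuvoir, @crafcat7, #6700, #5836, #6592, #6701, #6694)
Added benchncnn_llm and operator-level perf infrastructure; benchmarks cover LLM prefill/decode and performance regression tracking for more CPU/GPU operators. (#6711, #6570, #6632)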

Vulkan / GPU

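Added a Vulkan SDPA layer and a FlashAttention path. The base implementation includes 2x2 unroll and local memory optimizations; follow-ups add a unified cross-attention shader and two FlashAttention implementations (cooperative matrix and non-cooperative matrix), with support for masks, KV cache concat, and chunked output scheduling. (#6514, #6521, #6528, #6538)
Continued optimizing the GEMM/SDPA cooperative matrix and subgroup paths, adding bf16/fp16 cooperative matrix, 4x4 unroll, vectorized loads, bank conflict avoidance, and packed GEMM; bf16 cooperative matrix usage was later restricted to avoid mismatched data layouts. (@futz12, #6515, #6524, #6573, #6632)
Added Vulkan RotaryEmbed, GroupNorm, Reduction, Unfold, Softplus, and Shrink operators, reducing CPU fallback in GPU graphs for Transformer, norm, shape handling, and common activations. (@futz12, #6519, #6556, #6476, #6543, #6478, #6479)
Merged the multiple pack1/pack4/pack1to4/pack4to1 shaders of Convolution, Convolution 1x1s1d1, Convolution GEMM, Convolution1D, Deconvolution, and Deconvolution GEMM into unified packed elempack shaders, with input/output packing controlled through specialization, reducing the number of shader and pipeline combinations. (#6561, #6562, #6565, #6566, #6564, #6572)
Conv1D Vulkan adds cooperative matrix for the 1x1s1d1 and GEMM paths under fp16 and repacks weights into a tile layout to improve throughput for Conv1D with large channel counts. (@futz12, #6587)
Added a persistent pipeline cache. PipelineCache can save/load a single-file cache that records the device, driver, pipelineCacheUUID, shader hash, SPIR-V, and driver pipeline cache validation data; the C API gains matching interfaces, along with new tests and developer documentation. (@futz12, @CLV-Iclucia, #6702)
Model loading adds a read-only mmap path. Option::use_mapped_model_loading can eliminate one file-read copy when loading large models; it verifies that the number of bytes consumed matches the file size and falls back to regular file reading on failure. (#6537)
Added a host-memory loading strategy for Vulkan weights. Option::use_weights_in_host_memory keeps weights resident in host/shared VRAM when VK_EXT_external_memory_host or host-visible device memory is supported; on Windows it uses shared VRAM instead to match WDDM behavior. (#6531, #6545, #6547)
Model weight upload is now performed per layer: load_model uploads immediately after each layer's load_model/create_pipeline, and submits and resets the transfer command when the pending upload data grows too large, lowering peak CPU memory and staging buffer usage when loading large models. (#6534)
Optimized weight upload for Resizable BAR: on a discrete GPU whose device-local heap is also host-visible, mappable device-local weight memory is allocated preferentially, reducing staging copies. (#6536)
VkMat / allocator now record memory_type_index, so the device can tell whether a buffer is device-local; when constant A/B reside in non-device-local memory, GEMM first clones them to device-local memory, balancing the memory savings of host-memory weights against read bandwidth for hot GEMMs. (#6581)
Packed shape hints are pushed down into the Net loading stage: the packed bottom/top shapes are computed ahead of time from the shape hint, packing layout, and fp16/bf16 options, improving shape consistency when Vulkan layers create pipelines. (#6553)
Long Vulkan forward command sequences are now submitted in segments automatically: the command buffer is submitted based on thresholds on the pending dispatch count and a rough GPU score, reducing the risk of driver timeouts on large graphs or slow GPUs. (#6541)
Model loading now clears Vulkan bf16 packed/storage options that the device does not support, preventing illegal shaders from being generated later. (#6522)
Vulkan extension enabling logic now accounts for dependencies: extensions such as external memory, 8/16bit storage, descriptor indexing, buffer device address, and Android hardware buffer are filtered by their prerequisite capabilities, reducing initialization problems caused by drivers misreporting capabilities. (#6705)
KHR/NV cooperative matrix is temporarily disabled on Qualcomm/Adreno GPUs to work around insufficient support in current hardware/drivers for ncnn's tile unroll. (#6719)
Fixed and worked around Vulkan driver differences, including SwiftShader memory type bits, MoltenVK half shader types, the Reduction fp16 subgroup extension declaration, and the llvmpipe atan2(0,0) result. (@NKID00, #6539, #6602, #6615, #6729)
The Vulkan paths of DeepCopy, Normalize, InnerProduct, InstanceNorm, LayerNorm, RMSNorm, Scale, PReLU, ShuffleChannel, Padding, and others gain more 4D Mat handling, reducing fallbacks or shape errors on 4D input. (#6737)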
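
The read-only mmap loading item above (#6537) describes a pattern: map the file, check that the parser consumed exactly the file size, and fall back to a regular read otherwise. The following is a minimal Python sketch of that pattern only, not ncnn's implementation. The toy file layout (a uint32 count followed by float32 values) and the names `parse_weights` and `load_model` are hypothetical.

```python
import mmap
import os
import struct
import tempfile


def parse_weights(buf):
    """Parse a toy blob: a little-endian uint32 count, then that many float32 values.

    Returns (values, bytes_consumed). This layout is hypothetical, not ncnn's format.
    """
    (count,) = struct.unpack_from("<I", buf, 0)
    values = struct.unpack_from("<%df" % count, buf, 4)
    return list(values), 4 + 4 * count


def load_model(path, use_mapped_loading=True):
    """Prefer a read-only mmap; fall back to a plain read() if anything looks off."""
    size = os.path.getsize(path)
    if use_mapped_loading and size > 0:
        try:
            with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                values, consumed = parse_weights(mapped)
            if consumed == size:
                return values, "mmap"
            # Consumed bytes do not match the file size: fall through to read().
        except (OSError, ValueError, struct.error):
            pass
    with open(path, "rb") as f:
        values, _ = parse_weights(f.read())
    return values, "read"


with tempfile.TemporaryDirectory() as tmp:
    exact_path = os.path.join(tmp, "exact.bin")
    with open(exact_path, "wb") as f:
        f.write(struct.pack("<I3f", 3, 1.0, 2.0, 3.0))
    weights, method = load_model(exact_path)

    padded_path = os.path.join(tmp, "padded.bin")
    with open(padded_path, "wb") as f:
        f.write(struct.pack("<I3f", 3, 1.0, 2.0, 3.0) + b"\x00")  # one trailing byte
    padded_weights, padded_method = load_model(padded_path)
```

In this sketch the file with a trailing byte fails the size check and is served by the fallback path, which mirrors the "verify, then fall back" behavior the note describes.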

x86 CPU backend

Added AbsVal_x86 with fp16/bf16 storage support, reducing fp32 round-trip conversions in 16-bit storage graphs. (#6584)
LayerNorm, RMSNorm, UnaryOp, and BinaryOp gain x86 bf16 storage and AVX512BF16 dispatch, so normalization and elementwise operators fall back less often in bf16 models. (#6585, #6586, #6588, #6591)
Concat, Slice, Flatten, Reshape, Crop, Padding, and Packing support x86 fp16/bf16 storage, so shape/data movement layers no longer force a return to fp32. (#6593)
BatchNorm, GroupNorm, InstanceNorm, Clip, ReLU, Sigmoid, PReLU, Scale, Swish, Softmax, RotaryEmbed, Tanh, SELU, Mish, HardSwish, HardSigmoid, GELU, ERF, ELU, Eltwise, Dropout, Quantize, Dequantize, BNLL, and others gain bf16 storage. (#6594, #6595, #6589, #6624)
GEMM, Convolution, InnerProduct, Deconvolution, Convolution1D, Pooling, and Interp broadly extend x86 bf16 storage; GEMM adds out_elemtype, and MultiHeadAttention and SDPA can reuse the bf16 path. (#6598, #6623, #6625, #6626, #6627, #6630, #6648, #6649)
Continued optimizing the AVX512BF16 GEMM and Convolution bf16s micro-kernels, including replacing some vpalignr with vpshufd on AMD Zen 5 to avoid port conflicts with vdpbf16ps, adding instruction scheduling for the 16x16 kernel, N tile x16, and convolution unroll 16. (#6609, #6673, #6680)
Optimized the SSE4.1 paths of x86 int8 GEMM, InnerProduct, and Depthwise Convolution, improving int8 packed/depthwise inference performance. (@Edwardssss, #6600, #6687)
Optimized x86 fp16s InnerProduct GEMM to reduce loop-carried stalls. (@Edwardssss, #6682)
Interp, ERF/GELU, RotaryEmbed, and PixelShuffle add or optimize SIMD implementations, covering resize, activation, LLM rotary embedding, and block transpose scenarios. (@futz12, @crafcat7, #6597, #6604, #6427, #6690)
DeformableConv2D and Deconvolution switch to a unified elempack packed implementation, reducing the separate branch files for pack1/4/8/16. (#6567, #6568)
Fixed the x86 bf16 GEMM packing order on i386, ASAN errors caused by x86 temporary buffer alignment, and an out-of-bounds read in SSE ShuffleChannel's last-channel handling. (@junwha, #6708, #6703, #5735)
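
Several items above refer to bf16 storage. As background, bfloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an IEEE float32, so a value can be stored in 16 bits by keeping only the upper half of the float32 bit pattern. The sketch below shows that conversion with plain truncation. The rounding mode is an assumption made for illustration (the release notes do not state one), and the function names are hypothetical.

```python
import struct


def float32_to_bfloat16_bits(x):
    """Return the 16-bit bf16 pattern for x by truncating the float32 pattern."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16  # keep sign, exponent, and the top 7 mantissa bits


def bfloat16_bits_to_float32(b):
    """Expand a 16-bit bf16 pattern back to a float by zero-filling the low bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


one_bits = float32_to_bfloat16_bits(1.0)
pi_roundtrip = bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159265))
```

Because the exponent range matches float32, the round trip loses precision but not range, which is why layers can keep activations in bf16 storage and widen them only for arithmetic.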

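ARM CPU backend

Added an ARM SDPA layer implementation that internally reuses GEMM + Softmax and covers attention mask and KV cache scenarios, making the Transformer attention path on ARM CPUs more complete. (@Abandon-ht, #6698)
ARMv8.4 BF16 optimizations for GEMM, Convolution im2col-GEMM, InnerProduct, and MultiHeadAttention: on CPUs with BF16 instructions, the core matrix multiply, convolution, and attention layers can use bf16 storage directly. (#6714, #6715, #6716, #6717)
ERF, ELU, GELU, and SELU gain ARM SIMD implementations plus fp16 asimdhp versions, making common activation functions faster on the NEON/fp16 storage paths. (@futz12, #6605)
Optimized the AArch64 exp_ps and the fp16 exp_ps floor step, reducing the cost of exp-dependent activations and softmax-like computations. (@crafcat7, #6657, #6659)
x86/ARM GEMM adds an m == 1 optimization covering low-latency scenarios such as batch=1, decode, and single-token inference. (#6723)
Fixed Windows ARM build issues and refactored the ARM bf16 logic to work around an OHOS clang aarch64 crash. (#6699, #6725)
Fixed an out-of-bounds read in ARM ShuffleChannel's last-channel handling, with tests added alongside x86. (@junwha, #5735)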
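
The SDPA item above says the ARM layer reuses GEMM + Softmax. The reference computation behind that decomposition is softmax(Q K^T * scale + mask) V, which the pure-Python sketch below spells out as two matrix multiplies around a row-wise softmax. It is an illustration of the math, not the ARM kernel; the helper names are hypothetical, and a KV cache is modeled simply as earlier key/value rows concatenated in front of the new ones.

```python
import math


def matmul(a, b):
    """Naive matrix multiply of row-major nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Numerically stable softmax over each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sdpa(q, k, v, mask=None, scale=None):
    """Scaled dot-product attention: softmax(q @ k^T * scale + mask) @ v."""
    if scale is None:
        scale = 1.0 / math.sqrt(len(q[0]))
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    if mask is not None:
        scores = [[s + m for s, m in zip(srow, mrow)] for srow, mrow in zip(scores, mask)]
    return matmul(softmax_rows(scores), v)


# One query token attending over a cached key/value row plus one new row.
past_k, past_v = [[1.0, 0.0]], [[10.0, 0.0]]
new_k, new_v = [[0.0, 1.0]], [[0.0, 10.0]]
k_all, v_all = past_k + new_k, past_v + new_v

masked = sdpa([[1.0, 0.0]], k_all, v_all, mask=[[0.0, float("-inf")]])
unmasked = sdpa([[1.0, 0.0]], k_all, v_all)
```

With the second key masked out, all attention weight falls on the cached row, so `masked` reproduces that row of `v_all`.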

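RISC-V / MIPS / LoongArch backends

RISC-V adds fp16 storage GEMM: Gemm_riscv can enable fp16 storage depending on ZFH/ZVFH capability, and constant A/B support 16-bit pre-packing, reducing fp32 intermediate storage and conversion. (@Xinyu302, #5311)
RISC-V adds an RVV implementation of DeformableConv2D covering the pack1, packn, pack1ton, and packnto1 paths, about 12.94x to 20.16x faster than the scalar implementation. (@chenglimin, #6540)
RISC-V RVV adds a batch of operator implementations, including Softplus, Exp, Log, Power, Shrink, Threshold, and Dropout fp16, with new fp32 and ZFH fp16 paths and added Exp/Log/Threshold tests. (@ihb2032, #6635, #6637, #6638, #6666, #6671, #6676, #6667)
RISC-V RVV 1.0 adds Quantize, Dequantize, and Requantize implementations supporting packn/int8 packn, per-tensor/per-channel scales, and fp16 storage input or output; Requantize supports a fused ReLU/LeakyReLU quantization path. (@Deepdive543443, #6636, #6658, #6695)
Unified the RISC-V packed convolution/deconvolution implementation: several packn/pack1ton/packnto1-specific headers are removed in favor of unified dispatch through convolution_packed*.h and deconvolution_packed*.h. (#6731)
Unified elempack optimization for RISC-V im2col GEMM and Winograd convolution: new unified convolution_im2col_gemm*.h and convolution_3x3_winograd*.h replace the old split 1x1/sgemm/winograd implementations. (#6740)
MIPS adds MSA implementations of ELU, Erf, GELU, and SELU, so these activation layers no longer fall back to the generic scalar path. (@futz12, #6607)
Large-scale optimization of the MIPS backend, adding/refactoring MSA paths covering absval, batchnorm, binaryop, bnll, concat/slice/reshape/packing/padding, convolutio...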
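
The Quantize/Dequantize/Requantize item above mentions per-tensor and per-channel scales and a fused ReLU requantize path. The scalar sketch below shows what those terms mean. It is a reference illustration under stated assumptions, not the RVV kernels: the convention that quantize and dequantize each multiply by their own scale, the clamp to [-127, 127], and the use of Python's `round` are all assumptions, and every function name is hypothetical.

```python
def float_to_int8(v):
    """Round to the nearest integer and clamp to the symmetric int8 range."""
    return max(-127, min(127, int(round(v))))


def pick_scale(scales, channel):
    """A single scale means per-tensor; otherwise use one scale per channel."""
    return scales[0] if len(scales) == 1 else scales[channel]


def quantize(x, scales):
    """x is a list of channels; each value is multiplied by its scale, then rounded."""
    return [[float_to_int8(v * pick_scale(scales, c)) for v in row] for c, row in enumerate(x)]


def dequantize(q, scales):
    """Map int8 values back to floats with a (dequantization) scale."""
    return [[v * pick_scale(scales, c) for v in row] for c, row in enumerate(q)]


def requantize_relu(q, scales_in, scales_out):
    """Dequantize, apply ReLU, and quantize again in one pass per element."""
    out = []
    for c, row in enumerate(q):
        s_in, s_out = pick_scale(scales_in, c), pick_scale(scales_out, c)
        out.append([float_to_int8(max(v * s_in, 0.0) * s_out) for v in row])
    return out


per_channel = quantize([[0.5, -1.0], [2.0, 100.0]], [10.0, 2.0])
per_tensor = quantize([[0.5, -1.0], [2.0, 100.0]], [10.0])
fused = requantize_relu([[10, -10]], [0.1], [20.0])
```

Fusing the ReLU into requantize avoids writing the intermediate float tensor, which is the motivation for a fused path on any backend.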

Contributors

proydakov, bkmgit, and 32 other contributors

Assets 44

file	sha256	size	uploaded
ncnn-20260526-android-shared.zip	123d5bd837c3af570d64529402bf484817ded4837384d88ed711a0eaa83635fe	17.4 MB	2026-05-26T13:57:02Z
ncnn-20260526-android-vulkan-shared.zip	eb205b332274974511890903828451ae7a4c19c309f21431536e0a8c9f3dd0c1	23.2 MB	2026-05-26T13:57:02Z
ncnn-20260526-android-vulkan.zip	26909c92eed35afed4a966b5e9e503fcb0a529691ea3f910ec2c94a4fff52804	31.4 MB	2026-05-26T13:57:02Z
ncnn-20260526-android.zip	85b18b875488585c2d21360430e0e54abb6c04aa88094b471c20208ab55ff796	20.1 MB	2026-05-26T13:57:02Z
ncnn-20260526-apple-vulkan.zip	086adee198a6e0a4b24f64f35bd263e41753299d08dabf3d6502553a19fa5bad	92.4 MB	2026-05-26T13:57:02Z
ncnn-20260526-apple.zip	bfd7188f0eda2c273c945496aaa9cd6eff5bea2a98f04c0200e37bb586a0a0bd	67.7 MB	2026-05-26T13:57:02Z
ncnn-20260526-full-source.zip	754659d6fe65545cf2ef4483ffb84526fea631f8764c44b150f1601d0fb4004b	22.2 MB	2026-05-26T13:57:02Z
ncnn-20260526-harmonyos-shared.zip	b64653bae9aeb7970f44bc18279157305936918aa29044a2677316eadcd6cb1c	8.19 MB	2026-05-26T13:57:02Z
ncnn-20260526-harmonyos-vulkan-shared.zip	e435c3261f1fc904b71b97b8c9ca29170c1dba6be42a17bd16c966e965e877eb	11.5 MB	2026-05-26T13:57:02Z
ncnn-20260526-harmonyos-vulkan.zip	6fd3bb091c66b27f3a50cdeecd8c96135059bbce997324570abe3eb74c6838ac	16.9 MB	2026-05-26T13:57:02Z
Source code (zip)			2026-05-26T11:30:27Z
Source code (tar.gz)			2026-05-26T11:30:27Z
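
Each asset above is listed with a sha256 value, so a download can be checked before use. A minimal sketch using only Python's standard `hashlib` follows; the helper names are hypothetical, and the demo hashes a small temporary file rather than a real release archive.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published):
    """Compare against a published value, with or without the 'sha256:' prefix."""
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of_file(path) == expected


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"abc")
    demo_digest = sha256_of_file(demo_path)
    demo_ok = matches_published(demo_path, "sha256:" + demo_digest)
    demo_bad = matches_published(demo_path, "sha256:" + "0" * 64)
```

For a real download, pass the path of the zip and the value shown next to it in the asset list to `matches_published`.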

13 Jan 03:28

github-actions

20260113

e956fbf

android ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20260113 e956fbf

Build versions, default configuration: android-ndk-r29, xcode 16.4, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	complete source code including all submodules
ncnn-android.zip	android static/shared libraries	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android static/shared libraries, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared libraries, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared libraries, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

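added the sdpa layer and pnnx conversion of torch.scaled_dot_product_attention, with gqa fusion support
added the rotaryembed layer
sdpa supports kvcache
multiheadattention supports kvcache
layers can optionally implement support_vulkan_packing
layers can optionally implement support_vulkan_any_packing
vulkan supports a bf16 switch, with emulated bf16 conversion on older GPUs
rmsnorm vulkan optimization (@futz12)
selu vulkan optimization (@futz12)
vulkan eltwise uses a unified elempack shader
simplified vulkan cast
improved multithreaded scheduling that tiles along N when M is small
gemm x86 avx512 optimized with 16-wide tiling along the N dimension
sdpa x86 optimized using gemm and softmax (@futz12)
arm neon math functions prefer fma instructions (@Abandon-ht)
unaryop tan rvv optimization (@ihb2032 @lyd1992)
added the cmake NCNN_WINXP switch; the _WIN32_WINNT macro is no longer defined proactively
c-api adds the ncnn_version_number() interface, which returns a numeric value
c-api adds more option setter/getter interfaces
net model-loading interfaces add a wchar_t parameter type
added float8 and bfloat8 conversion functions (@chloeee99)
formatted glsl files
removed shader comments and extra whitespace
onnx2ncnn is no longer built
benchncnn embeds the model params; param files are no longer needed at runtime
fixed a modelwriter crash when accessing empty bias data (@csukuangfj)
fixed a logic error in param parsing that tried to re-read data that had already been read (@futz12)
fixed a softmax error in the multithreaded tail remainder and optimized the reciprocal computation (@futz12)
fixed incorrect x86 lstm int8 results with the msvc compiler when the vnni instruction set is enabled
fixed x86 lstm int8 out-of-bounds reads and writes
fixed the exit logic when loading a model param fails (@Cat-myq)
fixed a hang during loading when the vulkan driver returns an invalid subgroup size (@Cat-myq)
fixed a logic error parsing CRLF line endings when loading models (@chennevwin)
fixed sdpa handling of a single-channel attnmask
fixed an ncnn2int8 crash when saving int8 scales for outputs that only have dequantization
modelwriter supports the tile layer
ncnn2mem supports the new array and string types
simd register types are passed by reference on x86, because functions cannot take aligned types by value
gpu memory allocation failures are checked and an error code is returned (@Upliner)
simplevk can locate the qualcomm windows vulkan driver file (@strongtz)
simplevk supports dynamically loading the vulkan driver on apple platforms
simplevk supports loading the vulkan driver via the VK_DRIVER_FILES environment variable
disabled software-emulated cooperative matrix in the windows amd rdna2 driver to improve performance
updated glslang to 20260109
adapted to the additional arm processor feature checks in the new windows-sdk
updated to pybind 3.0.1, fixing a pyncnn crash on python-3.14
updated pnnx to torch-2.9, with support for onnx external data and dynamo-exported onnx
pnnx supports converting torch.shrink, Tensor.unflatten, and torch.flatten
pnnx conversion of torch.flatten to ncnn supports multiple dynamic dimensions
pnnx supports converting F.interpolate nearest-exact
pnnx fixed missing repeats when converting Tensor.expand to ncnn
pnnx supports converting onnx gelu, groupnorm, rmsnorm, and gridsample
pnnx fuses more transformer attention variants
pnnx fuses more sdpa attention variants
pnnx fuses more rmsnorm variants
pnnx merges consecutive permutes and removes useless permutes
pnnx adds deepseek_v3 and qwen2 attention conversion tests
pnnx fuses non-interleaved and more interleaved rope modules
pnnx fuses t5-style layernorm without gamma
pnnx always removes contiguous and uniformly converts view to reshape
pnnx drops the allowzero parameter when converting onnx reshape
pnnx fixed some shape folding for models with older onnx opsets
pnnx fixed the logic for dropping folded constant inputs
pnnx fixed conversion of onnx padding with non-constant values
pnnx fixed an out-of-bounds crash when converting torch.stack with a negative axis
pnnx supports converting onnx dynamic resize
pnnx merges identical constants into one
pnnx improved the paddle-style tensor.size pattern
pnnx improved fusion of whisper-style attention
pnnx automatically obtains input shapes from onnx models
inference code generated by pnnx uses valid shapes when shapes are determined automatically
pnnx improved the representation of floating-point numbers in pnnx.py
pnnx no longer prints useless open failed warnings when converting onnx models
pnnx generates export_pnnx and export_ncnn utility functions in pnnx.py
pnnx checks the import xxx_pnnx path and skips the directory check (@glenn-jocher)
fixed the pnnx windows build
ppocrv5 preserves spaces when splitting English text (@sxj731533730)
fixed an incorrect ffmpeg command in the whisper example (@quink-black)
whisper truncates audio to 30 seconds
added an arcface example (@heabeounMKTO)
gpu unit tests discard the pipeline cache from shape hint tests to reduce gpu memory usage
removed the unused testutil layer hook feature
added a gemm oom unit test
the ci binary comparison job is now triggered by pull_request
ci fixed the windows-xp build and unified the workflow files
ci updated the mingw toolchain download URL
ci asan job reduces storage usage
ci added an aarch64 asan job
ci updated macos-13 to macos-15-intel (@Willaaaaaaa)
ci updated windows-sdk and swiftshader
removed the discontinued tencent ci (@mpj1234)
updated the onnx model conversion documentation
readme adds a link to the 8bit quantization documentation (@mlbo)
build steps add make install (@roachsinai)
added documentation on printing VkMat contents
added Arduino UNO Q performance data (@SimoSbara)
published a python wheel for linux riscv64
published a python pypy wheel for macos arm64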
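
The changelog above adds a rotaryembed layer and mentions fusing non-interleaved and interleaved rope modules in pnnx. As a reminder of what rotary embedding computes, the sketch below applies the non-interleaved ("rotate half") variant to a single head vector in pure Python. It illustrates the math only: the layer's actual inputs and parameters are not described in these notes, so the function name, the `base` default, and the on-the-fly angle computation are assumptions.

```python
import math


def rotary_embed(x, position, base=10000.0):
    """Rotate pairs (x[i], x[i + half]) by an angle that depends on position and i.

    Non-interleaved layout; len(x) must be even.
    """
    dim = len(x)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        angle = position * base ** (-2.0 * i / dim)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


at_zero = rotary_embed([1.0, 2.0, 3.0, 4.0], position=0)
at_one = rotary_embed([1.0, 0.0], position=1)
rotated = rotary_embed([1.0, 2.0, 3.0, 4.0], position=7)
```

The interleaved variant pairs adjacent elements (x[2i], x[2i+1]) instead of elements half a vector apart; the rotation itself is the same.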

New Contributors

@sxj731533730 made their first contribution in #6350
@mpj1234 made their first contribution in #6355
@glenn-jocher made their first contribution in #6379
@Abandon-ht made their first contribution in #6393
@Cat-myq made their first contribution in #6383
@heabeounMKTO made their first contribution in #6386
@SimoSbara made their first contribution in #6454
@ihb2032 made their first contribution in #6460
@chennevwin made their first contribution in #6472
@0130w made their first contribution in #6286
@chloeee99 made their first contribution in #6495

Full Changelog: 2025091...2026011

Contributors

Upliner, csukuangfj, and 18 other contributors

Assets 40

16 Sep 02:38

github-actions

20250916

c4193aa

android ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20250916 c4193aa

Build versions, default configuration: android-ndk-r28c, xcode 15.2, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	complete source code including all submodules
ncnn-android.zip	android static/shared libraries	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android static/shared libraries, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared libraries, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared libraries, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

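added the flip operator and pnnx conversion of torch.flip
clip x86 avx512 loop remainder optimization
tanh and unaryop x86 avx512 loop remainder optimization (@lfalive)
sigmoid x86 avx512 loop remainder optimization (@futz12)
instancenorm x86 optimization (@futz12)
groupnorm x86 sse2/avx/avx512 optimization
groupnorm arm neon optimization (@mmyyy22)
sigmoid and some math functions: loongarch lsx/lasx optimization (@AtomAlpaca)
shufflechannel riscv rvv/zfh/zvfh/xtheadvector optimization (@AtomAlpaca)
layernorm riscv rvv/zfh/zvfh/xtheadvector optimization (@Deepdive543443)
layernorm vulkan optimization (@futz12)
use the size_t type to improve support for very large tensors
fixed an x86 convolution int8 crash when avx512vnni is enabled
fixed the android asset datareader crashing on newer android versions and some phones
initialize the layer featmask to empty
simplified the naive C implementation of layernorm
fixed convdw int8 dequantize pack8
fixed llvm-mingw build issues by using putenv and platform-specific APIs (@zhuzeitou)
use combine_x for sse/avx vector concatenation
fixed differences in rnn/lstm/gru int8 tests caused by rounding
updated ruapu to detect risc-v zfh zvfh xtheadvector, with dynamic dispatch
removed the deprecated Extractor::set_num_threads/set_vulkan_compute api
fixed cmake detection of compiler support for avxvnniint16
fixed a crash on windows nt kernels where GetLogicalProcessorInformationEx does not exist (@futz12)
cmake find_package(ncnn) supports specifying a minimum version and prints the ncnn version (@Willaaaaaaa)
benchncnn skips int8 models when running on GPU (@c8s-wk)
support building for the Windows XP target platform and added windows-xp ci for msvc/mingw/clang (@AtomAlpaca @Sugar-Baby)
corrected the ctc decode post-processing rules in ppocr (@futz12)
improved compatibility of printf format arguments for the size_t type in benchncnn (@whyb)
updated glslang
raised the maximum number of supported gpus to 32 (@tpoisonooo)
support nvidia headless vulkan
detect vulkan extensions VK_KHR_shader_integer_dot_product VK_KHR_shader_bfloat16 VK_KHR_shader_float_controls2 VK_NV_cooperative_vector VK_NV_cooperative_matrix2 VK_EXT_shader_float8 VK_KHR_vulkan_memory_model
support querying arbitrary cooperative matrix MNK sizes
fixed a build error when the vulkan-sdk supports VK_KHR_acceleration_structure
1d/2d Mat and VkMat always allocate an aligned size; adjusted the cstep strategy
removed the forward implementations with vkimagemat inputs/outputs from all layers
removed the layer support_image_storage and option use_image_storage fields
removed the pack8 shader implementations from all layers
support vulkan drivers without a graphics queue
vulkan fp16 packed also uses half-precision storage for pack1
skip cpu-pack before uploading vulkan 1d weights
fixed incorrect vulkan upload of pack16 data when the cpu supports avx512
fixed vulkan validation errors about localsize not being a multiple of subgroupsize
always set localsize to an integer multiple of subgroupsize
merged the khr/nv dual-version cooperative matrix shaders
vulkan convolution 1x1s1d1 supports arbitrary mnk sizes and unified elempack
vulkan convolution gemm supports arbitrary mnk sizes and unified elempack
vulkan convolution winograd supports arbitrary mnk sizes and unified elempack
vulkan deconvolution gemm supports arbitrary mnk sizes and unified elempack
vulkan gemm supports arbitrary mnk sizes
vulkan absval uses a unified elempack shader
vulkan sigmoid and activation functions use a unified elempack shader (@futz12)
vulkan unaryop uses a unified elempack shader (@weikangqi)
support vulkan int8 packing/quantize/dequantize/requantize
detect vulkan extensions VK_EXT_robustness2 VK_KHR_robustness2, adjust the ssbo alignment size, and fix the waitfence -4 issue in new nvidia drivers (@Upliner)
pnnx improves conversion of huggingface/transformers attention/sdpa variants, including albert bart bert blenderbot camembert chinese clip ctrl deberta distilbert electra flaubert fsmt funnel gpt2 layoutlm longformer lxmert marian mbart mobilebert mt5 openai pegasus prophetnet reformer roberta squeezebert t5 xlm xlnet
pnnx improves ppocrv5 onnx conversion
pnnx supports converting onnx MaxPool auto_pad same
pnnx supports converting torch.reshape_as
pnnx adds logical_and/not/or/xor tests
pnnx always generates valid static shapes for test_inference()
pnnx automatically handles weight norm conversion in conv/convtranspose/linear
pnnx matches more pad-conv patterns
pnnx supports converting onnx flatten without the axis parameter
pnnx fixed onnx groupnorm conversion
pnnx fixed an out-of-bounds inputshape crash when generating python scripts
pnnx handles batch-index-related squeeze/unsqueeze when converting to ncnn
pnnx no longer removes reshape/permute at the end of the model when converting to ncnn
pnnx sets the utf8 codepage on windows to fix garbled text
support the OMP_THREAD_LIMIT environment variable to limit the thread count when pnnx converts onnx models
pnnx updated to torch-2.8
FAQ adds an ncnn deepwiki link (@tpoisonooo)
updated the readme table on cpu/gpu compatibility
updated the Chinese glsl extension documentation (@chri321)
updated the glsl documentation to remove deprecated image functions (@GIBEREZ)
corrected command errors in the esp32 build documentation (@Willaaaaaaa)
use spdx-style license file headers
added a yolo11 example
added a yoloworld example
added a ppocrv5 example
added a piper-tts example
disabled pypi free threading wheel builds
migrated the gpu swiftshader/lavapipe ci to ubuntu25
use a preinstalled codecov binary on self-hosted runners
ci updated the riscv spacemit toolchain and qemu
ci updated the riscv xuantie toolchain and qemu
migrated the msvc ci to windows-2022, installing vs2015/vs2017 at run time (@bil0077)
vs2015/vs2017 ci uses an older windows sdk to fix the build
ci fixed missing linux riscv64 dependencies (@Jzow)
added cmake cross-compilation configurations for AK3918(AK) and SS928(hisi) (@chentyjpm)
ci added linux riscv32 and c907 cross-compilation configurations (@YuzukiTsuru)
added MUSE Pi Pro Spacemit M1 performance data (@ChinaYingXi)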
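
Two entries above concern the Vulkan local size: a validation error when localsize is not a multiple of subgroupsize, and the decision to always make it an integer multiple. The arithmetic behind that rule can be sketched as below. This is an assumed rounding policy for illustration, not ncnn's actual selection logic; the function name and the `max_local_size` limit are hypothetical.

```python
def align_local_size(requested, subgroup_size, max_local_size=1024):
    """Round requested up to a multiple of subgroup_size, capped at the device limit."""
    if subgroup_size <= 0 or subgroup_size > max_local_size:
        raise ValueError("subgroup_size must be in 1..max_local_size")
    # Ceiling division, then scale back up to a whole number of subgroups.
    aligned = -(-max(requested, 1) // subgroup_size) * subgroup_size
    # Largest multiple of subgroup_size that still fits under the limit.
    limit = (max_local_size // subgroup_size) * subgroup_size
    return min(aligned, limit)


rounded_up = align_local_size(100, 32)
already_aligned = align_local_size(64, 64)
capped = align_local_size(1030, 64, max_local_size=1000)
```

The cap is itself rounded down to a whole number of subgroups so that the result stays a multiple even when the limit is not one.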

New Contributors

@ChinaYingXi made their first contribution in #6074
@lfalive made their first contribution in #6096
@zhuzeitou made their first contribution in #6101
@mmyyy22 made their first contribution in #4080
@Jzow made their first contribution in #6133
@chri321 made their first contribution in #6162
@Willaaaaaaa made their first contribution in #6165
@c8s-wk made their first contribution in #6174
@weikangqi made their first contribution in #6179
@Copilot made their first contribution in #6204
@bil0077 made their first contribution in #6210

Full Changelog: 2025050...2025091

Contributors

Upliner, whyb, and 18 other contributors

Assets 40

03 May 09:58

github-actions

20250503

305837f

android ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20250503 305837f

no new features from 20250428
fix blacklist for amd radv coopmat
workaround for qcom adreno turnip

Assets 40

28 Apr 12:38

github-actions

20250428

205ca50

android ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20250428 205ca50

Build versions, default configuration: android-ndk-r28b, xcode 15.2, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	complete source code including all submodules
ncnn-android.zip	android static/shared libraries	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android static/shared libraries, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared libraries, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared libraries, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

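x86 convolution int8 gemm xop/avx2/avx512/avx512vnni/avxvnni/avxvnniint8 optimization
risc-v eltwise rvv optimization (@xfan1024)
risc-v bias rvv optimization (@AtomAlpaca)
risc-v bnll rvv optimization (@AtomAlpaca)
risc-v celu rvv optimization (@AtomAlpaca)
refactored reduction/quantize/dequantize/requantize to reduce binary size
softmax supports 4-dimensional input and is optimized to support arbitrary elempack
param model files support string-typed parameter values and naturally written arrays
reshape supports expression-based dynamic shapes; the permute parameter inside reshape is no longer supported
crop supports expression-based dynamic slices
interp supports expression-based dynamic output size
set openmp environment variables to resolve conflicts between multiple openmp runtimes and core-binding failures
prevent misdetection caused by optimization during cmake compiler checks
improved cpu big/little core detection on windows 7+ systems (@futz12)
fixed get_elf_hwcap returning empty on HarmonyOS NEXT (@peerless2012)
added cpu model identification for the apple a18 and m4 series
fixed possibly incorrect convolution int8 gemm results when the cpu l2 is < 1M
fixed a possible redefinition of VK_USE_PLATFORM_ANDROID_KHR in android builds
fixed build issues with simplemath enabled on mips/loongarch/risc-v
extractor clear() resets the local allocator to 0, fixing warnings on repeated clear
worked around a possible hang in padding under new nvidia drivers
the ncnn2table quantization calibration tool supports reading npy data (@wxqwinner)
the ncnn2int8 quantization tool saves weights as fp16 by default
reduced the build time of ncnn unit tests
optimized vulkan shader collection performance in cmake ncnn_add_layer() (@maxint)
updated to the latest glslang
glsl compilation automatically defines macros for device-related properties and features
glsl compilation automatically defines the ncnn_glsl_version macro
the option struct adds a vulkan device index
cleaned up the generic extension adaptation code in the current comp sources, since those macros are now defined automatically
support printing with the NCNN_LOGE function in vulkan shader code
detect vulkan extensions VK_EXT_subgroup_size_control/VK_KHR_shader_subgroup_extended_types/VK_KHR_zero_initialize_workgroup_memory/VK_KHR_shader_subgroup_rotate/VK_EXT_shader_atomic_float/VK_EXT_shader_atomic_float2/VK_KHR_shader_non_semantic_info
fixed static linking of libgomp in the risc-v ci for c906/908/910/k1
support building with cmake-4.0
building pyncnn with pip supports using the system cmake and ninja (@mgorny)
pnnx updated to torch-2.7
pnnx refactored the pass-level1 code for faster compilation
pnnx supports converting tnn models to pytorch/ncnn
pnnx supports converting onnx asymmetric Conv/Depth2Space and adds pixelshuffle/pixelunshuffle unit tests
pnnx supports converting GlobalAvgPool/ReduceL1/ReduceL2 and adds onnx adaptive avg/max pool and norm unit tests
pnnx optimized type conversion in expressions
pnnx converts gelu fast mode
pnnx handles the default sdpa scale parameter when converting to ncnn
pnnx fuses wav2vec-style mha
pnnx fixed an incorrect bias shape in conv+bias fusion
pnnx fixed an incorrect num_features in instancenorm2d/instancenorm3d
pnnx correctly handles batch axis changes caused by squeeze/unsqueeze when converting to ncnn
pnnx correctly handles batch axis changes caused by reshape when converting to ncnn
pnnx fixed i64 conversion to ncnn not taking effect (@Baiyuetribe)
refactored the qnx cmake toolchain (@zchrissirhcz)
ci updated the llvmpipe version
ci added automatic comparison of library binary sizes
ci migrated to ubuntu-latest, keeping code-format on ubuntu-20.04
ci uses qemu from apt
ci updated the riscv64 thead toolchain
ci updated the riscv elf toolchain
fixed glslang packaging on apple platforms and removed the glslang-default-resource-limits dependency
python prebuilt packages add an armv7l neon build, drop the ppc64le and s390x prebuilt packages, and fix the sdist build
added missing license file headers (@erquren)
updated the readme yolov8 link (@whyb)
updated the documentation to no longer require installing the vulkan-sdk
updated the documentation on working around vulkan being unavailable in docker
updated the yolov8 detection/segmentation/pose estimation/classification/oriented object detection example code and models
added macbook-air-m3 performance data (@chainsx)
added orion o6 performance data (@TheSnowfield)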
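
One entry above says ncnn2int8 now saves weights as fp16 by default. The effect of storing float weights in half precision can be seen with Python's `struct` module, whose `e` format code is IEEE 754 binary16. This sketch only illustrates the size and precision trade-off; it does not reproduce the layout of any ncnn model file, and the function names are hypothetical.

```python
import struct


def to_fp16_bytes(weights):
    """Pack floats as little-endian IEEE half precision, 2 bytes per value."""
    return struct.pack("<%de" % len(weights), *weights)


def from_fp16_bytes(blob):
    """Unpack a buffer of little-endian half-precision values into Python floats."""
    return list(struct.unpack("<%de" % (len(blob) // 2), blob))


weights = [1.0, -0.5, 0.1]
blob = to_fp16_bytes(weights)
restored = from_fp16_bytes(blob)
fp32_size = len(struct.pack("<%df" % len(weights), *weights))
```

Values such as 1.0 and -0.5 survive exactly, while 0.1 comes back slightly changed, and the blob is half the size of the float32 encoding.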

New Contributors

@mgorny made their first contribution in #5899
@erquren made their first contribution in #5925
@futz12 made their first contribution in #5927
@wxqwinner made their first contribution in #5930
@TheSnowfield made their first contribution in #5943
@peerless2012 made their first contribution in #5951
@AtomAlpaca made their first contribution in #6005

Full Changelog: 2024122...2025042

Contributors

maxint, mgorny, and 11 other contributors

Assets 40

26 Dec 03:27

github-actions

20241226

5285895

android ios macos linux windows webassembly watchos tvos visionos prebuilt libraries 20241226 5285895

Build versions, default configuration: android-ndk-r27c, xcode 15.2, ubuntu-20.04, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	complete source code including all submodules
ncnn-android.zip	android static/shared libraries	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-android-vulkan.zip	android static/shared libraries, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64 + riscv64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared libraries, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared libraries, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

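embed supports int8 quantization
gemm supports int8 quantization
multiheadattention supports int8 quantization
added spectrogram and inverse spectrogram implementations
arm rmsnorm neon optimization
arm layernorm neon fp32/bf16s/fp16s optimization
x86 rmsnorm sse2/avx/avx512 optimization
x86 layernorm sse2/avx/avx512 optimization
x86 gemm int8 sse2/xop/avx/avx512/vnni/vnniint8 optimization
updated to the riscv vector 1.0 standard and rewrote all ncnn riscv optimized code, with automatic detection of and dispatch for rvv/zfh/zvfh/xtheadvector
riscv gemm rvv optimization supports 128bit/256bit vlen
disabled the x86 reciprocal optimization to avoid possible precision loss
improved abi compatibility of the harmonyos cpu topology code
temporarily disabled vulkan matrix extension support on mesa drivers
worked around problems caused by errors when building the asimdfhm target with ndk-21
worked around a compiler crash in clang-18 when compiling avx512bf16
disabled msvc svml optimization of exp/tanh on windows arm to fix incorrect results
detect the avxvnniint8/avxvnniint16/avxneconvert instruction sets
when runtime cpu is enabled, only the compile flags built into ncnn cmake are used
removed windows arm32 support (@Shironana817)
android builds enable 16kb pagesize by default; android-api raised to 21
no longer crash outright when vkCreateDevice fails (@Upliner)
skip unaryop round test cases with values near 0.5 on the powerpc architecture
pnnx updated to torch-2.5
pnnx supports setting inputshape automatically from traced inputs
pnnx builds no longer emit warnings from torch headers
pnnx reordered all passes in pass level2 and reuses patterns
pnnx no longer saves debug intermediate models (@LJoson)
pnnx updated the onnx export code in generated python scripts to use export (@whyb)
pnnx fuses t5-layernorm into rmsnorm
pnnx no longer folds tensors with dynamic shapes
pnnx uses implicit int conversion in generated python scripts to keep values from being baked in as constants during tracing
pnnx converts Tensor.select to ncnn crop+squeeze
pnnx converts onnx constantofshape to torch.zeros/ones
pnnx fixed conversion of onnx clip when the optional min/max are missing
ci updated the riscv64 toolchain
ci added c908/spacemit-x60
ci webassembly is compatible with node>20
ci android adds the riscv64 target and packages it
added vim3 vulkan benchmark data (@GIBEREZ)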
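
The changelog above mentions rmsnorm optimizations and pnnx fusing t5-layernorm into rmsnorm. The reason that fusion is valid is that the T5-style layer norm skips mean subtraction and bias, which leaves exactly RMSNorm: divide by the root mean square, then optionally scale by gamma. A scalar reference sketch follows; the function name and the `eps` default are assumptions for illustration.

```python
import math


def rmsnorm(x, gamma=None, eps=1e-6):
    """Normalize x by its root mean square, then apply an optional per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    out = [v / rms for v in x]
    if gamma is not None:
        out = [o * g for o, g in zip(out, gamma)]
    return out


plain = rmsnorm([3.0, 4.0], eps=0.0)
scaled = rmsnorm([3.0, 4.0], gamma=[2.0, 2.0], eps=0.0)
mean_square = sum(v * v for v in plain) / len(plain)
```

After normalization the mean of the squared outputs is 1 (up to the effect of eps), which is the invariant the optimized kernels must preserve.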

New Contributors

@ankushgoel27 made their first contribution in #5709
@Shironana817 made their first contribution in #5811
@GIBEREZ made their first contribution in #5821

Full Changelog: 2024082...2024122

Contributors

Upliner, whyb, and 4 other contributors


20 Aug 08:45

github-actions

20240820

a6d3ef5

android ios macos linux windows webassembly prebuilt libraries 20240820 a6d3ef5

Build toolchain versions, default configuration: android-ndk-r27, xcode 15.2, ubuntu-20.04, ubuntu-22.04, ubuntu-24.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	Complete source code including all submodule code
ncnn-android.zip	android static/shared library	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android static/shared library, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-vulkan.zip	visionos static library, GPU support	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-visionos-simulator-vulkan.zip	visionos simulator static library, GPU support	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared library, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared library, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

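Added the RMSNorm layer with its pnnx conversion and unit tests
x86 convolution tiled gemm optimization
Quantization tool supports dynamic quantization of rnn/lstm/gru
x86 lstm int8 sse2/xop/avx2/avx512/avx512vnni/avxvnni optimization
arm rnn/lstm/gru int8 neon/asimdhp/asimddp optimization
multiheadattention supports a qdim parameter that differs from embed_dim
multiheadattention supports the scale parameter
Updated pybind11 to 2.12 to support numpy2
Added wasi support (@quink-black)
Added x86/arm convolution/slice/concat oom unit tests
onnx2ncnn tool now prints a warning and a recommendation to use pnnx (@lll143653)
Fixed x86 avx512 vnni instruction dispatch not taking effect
Improved error returns from x86/arm compute kernels when out of memory
Use ruapu instruction set detection only on the windows arm platform
Support big/little core and SMT detection when building with windows mingw
Fixed a possible error in powerpc vsx abs computation
Fixed a possible conflict where fp16s and bf16s were both enabled under arm vfpv4
Fixed a possible out-of-bounds read caused by gemm K blocking on aarch64 when the l2-cache is very small
Fixed riscv v tanh computation error (@zhangyang2057)
arm/convolution_3x3_pack1to8_fp16s optimized by using ldr/str instead of ld1/st1 (@quink-black)
Fixed c_api declarations of functions with no parameters (@quink-black)
c_api added the set_vulkan_device interface (@Baiyuetribe)
pyncnn added an interface for loading models from python bytes in memory (@joeyballentine)
Added the NCNN_PLATFORM_API macro to VkAndroidHardwareBufferImageAllocator (@Xyzhao1999)
Fixed an avx crash when building with mingw64 and termux build errors (@TianZerL)
Fixed arm riscv build errors when NCNN_BF16 is disabled
Fixed an unused variable warning when building on x86-wsl (@Tabbleman)
create_gpu_instance() no longer calls destroy_gpu_instance() (@Asd-g)
Updated ruapu.h (@lazyparser)
Fixed an ndk-r27 build error at the cmake stage (@Galasnow)
Added yolov8 example code (@whyb)
pnnx supports converting onnx exported by dynamo
pnnx builds onnx2pnnx support by default, supporting conversion of conv/convtranspose/pad/linear/softmax/relu/resize/upsample/avgpool/maxpool/batchnorm/lrn/layernorm/instancenorm/groupnorm/rnn/lstm/gru/prelu/gelu/elu/leakyrelu/relu6/celu/hardshrink/hardsigmoid/hardswish/clip/multiheadattention/reducemin/reducemax/reducemean/reducesum/reduceprod/logsoftmax/logsigmoid/mish/selu/sigmoid/silu/softmin/softplus/softshrink/softsign/tanh/tanhshrink/expand/permute/repeat/reshape/select/slice/cat/ceil/chunk/flatten/floor/maximum/minimum/split/squeeze/stack/transpose/unbind/unsqueeze
pnnx supports specifying inputshape when converting onnx
pnnx tries to fold constants unrelated to dynamic axes when it meets dynamic shape during onnx conversion
pnnx fuses simple shape arithmetic patterns when converting onnx
pnnx removes useless casts in onnx
pnnx accepts bf16 for model conversion and input/output types
pnnx converts torch.tile/torch.where/torch.logaddexp
pnnx converts F.maxpool without the dilation parameter to ncnn
pnnx converts torch.roll with 1 to 2 axis parameters to ncnn
pnnx returns a tuple when converting torch.max/torch.min with the dim parameter and automatically removes the unused indice output
pnnx fuses onnx sdpa and qdim mha
pnnx recognizes the batch axis of sdpa
pnnx supports torch-2.3 and torch-2.4
pnnx no longer folds aliased tensors with in-place operations into constants
pnnx automatically replaces long with int in the generated ncnn model py
ci added windows clang
ci added harmonyos
ci added mingw (@TianZerL)
ci added esp32 and esp32 build documentation (@luxincn)
Refactored the release ci scripts
Released ubuntu 24.04 prebuilt packages
Released visionos/visionos-simulator vulkan prebuilt packages
pypi released python 3.13 prebuilt packages
Updated pytorch/onnx model conversion documentation (@whyb)
Added riscv-gnu-toolchain build documentation (@Tabbleman)
Added harmonyos vulkan build documentation (@cugxchen)
Fixed errors in the vulkan-notes documentation (@roachsinai)
Updated qcom855plus benchmark data
Added RaspberryPi 5 GPU overclocked benchmark data (@CharlieYu4994)
Added EPYC7742 and V100 benchmark data (@sakria9)
Added Snapdragon 888 benchmark data (@chainsx)
Added RaspberryPi 5 CPU overclocked benchmark data (@chainsx)
Added OrangePi 5Plus benchmark data (@inspireMeNow)
Added Snapdragon 765G benchmark data (@inspireMeNow)
Added CVITEK SG2000 benchmark data (@inspireMeNow)
Added OrangePi CM4 benchmark data (@py1066)
Added Axera AX630C benchmark data (@UOPiceman)
Added Kunpeng 920 7260 benchmark data (@violet73)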
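
For readers unfamiliar with the RMSNorm operation named at the top of this list, here is a minimal pure-Python reference of the usual definition: each element is divided by the root mean square of the vector, then optionally scaled by a learned weight. The function name `rmsnorm`, the `gamma` argument and the default `eps` are illustrative assumptions, not the ncnn layer's parameter names.

```python
import math


def rmsnorm(x, gamma=None, eps=1e-6):
    """Reference RMSNorm over a 1-D list: x / sqrt(mean(x^2) + eps) * gamma."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    if gamma is None:
        # No learned scale: plain normalization.
        return [v * inv_rms for v in x]
    return [v * inv_rms * g for v, g in zip(x, gamma)]
```

Unlike layernorm, no mean is subtracted and no bias is added, so with `eps=0.0` the mean of the squared outputs is 1 when `gamma` is omitted.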

New Contributors

@quink-black made their first contribution in #5436
@Tabbleman made their first contribution in #5444
@roachsinai made their first contribution in #5472
@Asd-g made their first contribution in #5437
@lazyparser made their first contribution in #5499
@CharlieYu4994 made their first contribution in #5518
@Xyzhao1999 made their first contribution in #5521
@sakria9 made their first contribution in #5528
@inspireMeNow made their first contribution in #5550
@py1066 made their first contribution in #5551
@UOPiceman made their first contribution in #5559
@luxincn made their first contribution in #5567
@zhangyang2057 made their first contribution in #5584
@violet73 made their first contribution in #5606

Full Changelog: 2024041...2024082

Contributors

lazyparser, whyb, and 20 other contributors


10 Apr 11:16

github-actions

20240410

56775de

android ios macos linux windows webassembly prebuilt libraries 20240410 56775de

Build toolchain versions, default configuration: android-ndk-r26c, xcode 15.2, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	Complete source code including all submodule code
ncnn-android.zip	android static/shared library	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android static/shared library, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator	arm64 + arm64e + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst + watchos + watchos-simulator + tvos + tvos-simulator + visionos + visionos-simulator, GPU support	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library	arm64
ncnn-ios-vulkan.zip	ios static library, GPU support	arm64
ncnn-ios-simulator.zip	ios simulator static library	x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support	x86_64 + arm64
ncnn-watchos.zip	watchos static library	armv7k + arm64_32
ncnn-watchos-simulator.zip	watchos simulator static library	x86_64 + arm64
ncnn-tvos.zip	tvos static library	x86_64 + arm64
ncnn-tvos-vulkan.zip	tvos static library, GPU support	x86_64 + arm64
ncnn-tvos-simulator.zip	tvos simulator static library	x86_64 + arm64
ncnn-tvos-simulator-vulkan.zip	tvos simulator static library, GPU support	x86_64 + arm64
ncnn-visionos.zip	visionos static library	arm64
ncnn-visionos-simulator.zip	visionos simulator static library	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared library, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared library, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

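Decoupled layer cpu and vulkan, no longer using virtual public inheritance
Support building unit tests when building shared libraries
Per-layer feature mask supports disabling multithreading
extractor set_num_threads and set_vulkan_compute are now no-ops
gpu shader adds uniform types to improve fp16 compatibility on adreno
Detect the vulkan matrix extension 8x8x16 configuration, use fp16 accumulation by default under fp16a
Updated stb_image rvv/neon optimization
x86 mish avx512 optimization (@wnqn1597)
riscv gemm fp32 rvv optimization (@Xinyu302)
No longer keep useless temporary data when uploading weights during model loading
c-api added draw rectangle/text/circle/line interfaces (@Deepdive543443)
Fixed a sigbus error when loading fp16 models on the armv7 platform
Fixed reduction L2norm producing inf from denormals
Fixed numerical error caused by pixel_resize rounding on the arm platform
Fixed softmax arm fp16 computation error
Fixed risc-v rvv fp16 output not being converted automatically
Fixed destroy_gpu_instance crash when the driver is not fully loaded (@shatyuka)
destroy_gpu_instance waits for all devices to be idle (@whyb)
Fixed a possible crash in the low-level api when calling create_pipeline directly without load_param
Fixed ncnnoptimize crash during shape inference
ncnnoptimize supports more new operators, fixed lost gemm weights
Disabled signal-based instruction set detection while being debugged
windows-arm platform uses ruapu cpu instruction set detection
Enabled automatic fp16 conversion when arm vfpv4 is supported
Always report neon and vfpv4 support on the arm64 architecture
simplevk searches more known vulkan driver paths
Fixed risc-v rvv build errors under old cpp standards
Fixed build errors in debug mode with some old compilers
Fixed the uwp platform build
Fixed warnings when running test_reduction
Fixed build errors when NCNN_PIXEL_DRAWING is disabled (@shatyuka)
Support building with MSVC together with the LLVM openmp runtime (@shatyuka)
Fixed an error when the yolov8 python example returns empty results (@dsplvd)
pnnx decoupled torchscript loading, cleaned up the cxxabi hack, fixed whole-archive linking
pnnx loads dynamo onnx, not built by default
pnnx improved functionalization, supports more slice+inplace compound operations
pnnx converts torch.masked_select/torch.slice_scatter
pnnx supports models larger than 4G
pnnx macos builds universal wheel
pnnx added an entrypoint script
pnnx supports dynamic slice indices
pnnx converts the dtype parameter of softmin logsoftmax
pnnx handles index_put with empty indices and scalar values
pnnx converts some cudnn conv2d variants
pnnx fuses complete slices into tensor_split
pnnx fuses static embedding
pnnx no longer eliminates math operations that would change the shape
pnnx improved torch-2.1 mha attn_mask detection
pnnx fixed nn.Conv2d conversion without a bias tensor
pnnx converts torch.stack with negative dim
pnnx added torch.arange unit tests
pnnx fixed a possible out-of-bounds access when graph matching fails
pnnx recognizes the batch axis of embedding input as 0
pnnx python added a parameter to control fp16 (@MollySophia)
pnnx added torch-2.2 ci
github ci uses 4 parallel build jobs
Updated the cmake ios toolchain, added visionos ci, watchos supports the arm64_32 architecture
Added apple a17 and m3 cpu names
No longer build 32bit support for apple platforms, no longer build the ios arm64e architecture, raised the minimum deployment version to ios-13
Unified android python macos ci
No longer package and release apple bitcode and 32bit prebuilt packages, added visionos prebuilt packages, added tvos-gpu prebuilt packages, updated openmp to 18.1.2
Improved a53/a55 dual-issue documentation (@luqiang-guo)
Added documentation for building with protobuf>=22.0 on windows (@Galasnow)
Updated macos build documentation (@lll143653)
Cleaned up useless code warnings (@hokamilkv)
Fixed typos in the FAQ (@eltociear)
Fixed typos (@hugo-syn)
Fixed typos (@afredooo)
Fixed a comment error in convolution_x86 (@strongtz)
Added code hint markers to the markdown documentation (@hugo-syn)
Added OneCloud benchmark data (@mizu-bai)
Added AWS c5.4xlarge benchmark data (@mizu-bai)
Added Xeon Phi 3120A benchmark data (@mizu-bai)
Added orangepi zero2 benchmark data (@wonderfullook)
Added Dimensity 9300 MT6989 benchmark data (@MollySophia)
Added PhytiumPi benchmark data (@HalfSweet)
Added remipi benchmark data (@dreamcmi)
Added radxa zero 3w benchmark data (@Qengineering)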
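
The L2norm item in this list concerns very small inputs. As a general illustration of how tiny values can break a norm computation, the sketch below shows a naive sum of squares underflowing to zero, so that a later division by the norm would blow up, next to a variant that rescales by the largest magnitude first. It uses Python floats (float64) with hypothetical function names and does not describe the actual change made in ncnn.

```python
import math


def l2norm_naive(values):
    """sqrt(sum(v^2)); the squares can underflow to 0.0 for tiny inputs."""
    return math.sqrt(sum(v * v for v in values))


def l2norm_scaled(values):
    """Same norm, but rescaled by the largest magnitude before squaring."""
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return 0.0
    # Every ratio v / peak lies in [-1, 1], so squaring cannot underflow to zero
    # for the largest element.
    return peak * math.sqrt(sum((v / peak) ** 2 for v in values))


tiny = [1e-200, 1e-200]
naive_result = l2norm_naive(tiny)    # squares underflow, result is 0.0
scaled_result = l2norm_scaled(tiny)  # stays nonzero
```

Dividing the inputs by `naive_result` would fail or produce infinities, while `scaled_result` remains usable as a divisor.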

New Contributors

@wonderfullook made their first contribution in #5277
@hugo-syn made their first contribution in #5301
@FartSimps0n made their first contribution in #5304
@HalfSweet made their first contribution in #5312
@strongtz made their first contribution in #5310
@afredooo made their first contribution in #5339
@shatyuka made their first contribution in #5346
@dsplvd made their first contribution in #5345
@Galasnow made their first contribution in #5359
@hokamilkv made their first contribution in #5365

Full Changelog: 2024010...2024041

Contributors

whyb, dsplvd, and 19 other contributors


02 Jan 04:06

github-actions

20240102

1e88fb8

android ios macos linux windows webassembly prebuilt libraries 20240102 1e88fb8

Build toolchain versions, default configuration: android-ndk-r26b, xcode 13.4.1, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	Complete source code including all submodule code
ncnn-android.zip	android static/shared library	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android static/shared library, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst, with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst, GPU support, with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library, with and w/o bitcode	armv7 + arm64 + arm64e
ncnn-ios-vulkan.zip	ios static library, GPU support, with and w/o bitcode	arm64 + arm64e
ncnn-ios-simulator.zip	ios simulator static library, with and w/o bitcode	i386 + x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support, with and w/o bitcode	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library, with and w/o bitcode	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support, with and w/o bitcode	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared library, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared library, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

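Built-in vulkan driver loading: gpu features build without depending on vulkan-sdk, and graphics driver files can be loaded directly
msvc builds enable arm neon instruction acceleration and arm64 asimdhp compilation
Implemented the python pnnx pypi package and the python calling interface/documentation (@Hideousmon)
arm convolution int8 direct convolution refactored to support arbitrary elempack
Optimized vulkan global pooling performance
Optimized resize bilinear performance
Compressed font data to reduce binary size
deconvolution supports dynamic weights with the corresponding pnnx conversion
Added benchmark data rank card (@Qengineering)
Support big-endian architecture platforms, powerpc 32-bit
Added woa linux ci
Added build switches to disable exceptions/rtti for msvc
Use signals to detect avx512 instruction set support on macos
Support finding 32-bit graphics driver files (@whyb)
Print 4-dimensional shapes when the benchmark build is enabled (@Deepdive543443)
Fixed the riscv-int8 sigmoid activation test failure (@MollySophia)
Fixed unaligned access of the deconvolution x86 bias
Fixed unaligned access of prelu x86 sse instructions (@aioa)
Fixed the warning about openmp setting the thread count to 0 on windows
Fixed warnings about fp16sa shaders using fp16 shared variables on gpus that support 16bit/8bit
Fixed the nvidia vulkan driver crash at program exit
Fixed the missing elemsize parameter in vkimagemat from_android_hardware_buffer
Fixed the pointer offset error in simpleocv Mat template ptr
Added more gpu-related python binding interfaces (@joeyballentine)
android vulkan package api level lowered to 14/21
pnnx supports converting nn.Upsample with recompute_scale_factor=True
Added nn.Identity test
Fixed the pnnx path splitting issue
Fixed whitespace alignment in pnnx-generated ncnn py (@cmdbug)
py generated by pnnx can run inference directly
python pnnx returns the optimized torch model
Removed useless code (@ningjiang233)
Improved cmake toolchain files (@zchrissirhcz)
Added watchos and tvos ci
Fixed linux sde ci runtime errors
Updated documentation on POWER clang version information (@JeremyRand)
Updated documentation on vulkan/libomp-dev dependencies (@JeremyRand)
Updated documentation on the CMAKE_TOOLCHAIN_FILE environment variable for building the python module (@JeremyRand)
Fixed the Rasberry typo (@JeremyRand)
FAQ added documentation on pyncnn data contiguity (@lll143653)
Updated the readme download page table
Added Nintendo 3DS build information (@Deepdive543443)
Added oncloud amlogic s805 benchmark data (@mizu-bai)
Added Raspberry Pi 5 gpu benchmark data (@FantasyGmm)
Added Jetson TX2 benchmark data (@FantasyGmm)
Added 8gen2 benchmark data (@mahirumahiru)
Added 2K2000 benchmark data (@RevySR)
Updated Jetson Orin Nano/Raspberry Pi 5 benchmark data (@Qengineering)
Added visionfive2 benchmark data (@wzyforgit)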
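
Running on a big-endian host, as in the big-endian item in this list, usually means that binary data written on little-endian machines has to be decoded with an explicit byte order rather than the host's native one. Below is a minimal sketch with Python's `struct` module, assuming the data is little-endian float32. That layout is an assumption made for illustration, not a statement about how ncnn handles its files, and `read_f32_le` is a hypothetical helper.

```python
import struct


def read_f32_le(buf):
    """Decode a bytes object as consecutive little-endian float32 values.

    The "<" prefix fixes the byte order, so the result is the same on
    little-endian and big-endian hosts. Trailing bytes that do not fill
    a whole float are ignored.
    """
    count = len(buf) // 4
    return list(struct.unpack("<%df" % count, buf[:count * 4]))
```

For example, `read_f32_le(bytes.fromhex("0000803f"))` decodes the four bytes of a little-endian float32 `1.0` regardless of the host byte order.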

New Contributors

@Deepdive543443 made their first contribution in #5116
@ningjiang233 made their first contribution in #5139
@FantasyGmm made their first contribution in #5152
@mahirumahiru made their first contribution in #5180
@Qengineering made their first contribution in #5216
@lll143653 made their first contribution in #5220
@joeyballentine made their first contribution in #5165

Full Changelog: 2023102...2024010

Contributors

JeremyRand, whyb, and 15 other contributors


27 Oct 06:17

github-actions

20231027

3116e02

android ios macos linux windows webassembly prebuilt libraries 20231027 3116e02

Build toolchain versions, default configuration: android-ndk-r25c, xcode 13.4.1, ubuntu-20.04, ubuntu-22.04, vs2015, vs2017, vs2019, vs2022, emscripten-3.1.28

file	content	arch
ncnn-full-source.zip	Complete source code including all submodule code
ncnn-android.zip	android static/shared library	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-android-vulkan.zip	android static/shared library, GPU support	armeabi-v7a + arm64-v8a + x86 + x86_64
ncnn-apple.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst, with and w/o bitcode	armv7 + arm64 + arm64e + i386 + x86_64
ncnn-apple-vulkan.zip	apple xcframework, ios + ios-simulator + macos + mac-catalyst, GPU support, with and w/o bitcode	arm64 + arm64e + x86_64
ncnn-ios.zip	ios static library, with and w/o bitcode	armv7 + arm64 + arm64e
ncnn-ios-vulkan.zip	ios static library, GPU support, with and w/o bitcode	arm64 + arm64e
ncnn-ios-simulator.zip	ios simulator static library, with and w/o bitcode	i386 + x86_64 + arm64
ncnn-ios-simulator-vulkan.zip	ios simulator static library, GPU support, with and w/o bitcode	x86_64 + arm64
ncnn-macos.zip	macos static library	x86_64 + arm64
ncnn-macos-vulkan.zip	macos static library, GPU support	x86_64 + arm64
ncnn-mac-catalyst.zip	mac catalyst static library, with and w/o bitcode	x86_64 + arm64
ncnn-mac-catalyst-vulkan.zip	mac catalyst static library, GPU support, with and w/o bitcode	x86_64 + arm64
ncnn-ubuntu.zip	ubuntu linux static/shared library, GPU support, model conversion tools	x86_64
ncnn-windows.zip	windows static/shared library, GPU support, model conversion tools	x86 + x64 + arm + arm64
ncnn-webassembly.zip	webassembly static library	wasm32 + simd + threads + simd-threads

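x86 convolution int8 gemm refactored to support arbitrary elempack
x86 convolution int8 winograd refactored to support arbitrary elempack
arm convolution int8 gemm refactored to support arbitrary elempack
arm convolution int8 winograd refactored to support arbitrary elempack
gelu vulkan optimization (@FhqTreap)
convolution1d vulkan optimization (@FhqTreap)
gridsample x86 optimization (@Yoh-Z)
riscv gemm fp32 optimization (@Xinyu302)
Added erf/shrink with onnx conversion (@brightening-eyes)
Added diag with pnnx conversion (@wnqn1597)
Added celu with pnnx conversion (@wnqn1597)
Added simplemath, allowing math functions to be built and used without depending on libm (@HonestDeng)
pooling adaptive supports dynamic output size with pnnx conversion
elu selu support 4-dimensional input and output
slice supports the indices parameter
memorydata supports the tag parameter and fp16 storage
x86 selu shufflechannel optimization (@wnqn1597)
Fixed incorrect results of convolution vulkan with fixed shape
Fixed a potential overflow of the weight tag (@lrw04)
Load models layer by layer to reduce memory usage (@daquexian)
Fixed avx2 gather build errors with old gcc versions (@chainsx)
Fixed _mm256_set_m128 build errors with old gcc versions (@whyb)
Fixed build issues with new protobuf versions
Fixed round build issues with old glibc versions
Fixed c906 toolchain build errors
pyncnn enabled vulkan support (@Hideousmon)
pyncnn added the load_param_mem interface (@JeremyRand @theflyingzamboni)
pnnx supports torch-2.1
pnnx eliminates the output unpack of moduleop
pnnx moduleop writes weight shapes into param as parameters, internal weight order follows usage order
pnnx improved reflect replicated pad matching
pnnx fuses conv3d-bn and deconv3d-bn
pnnx converts torch.narrow (@zyt1024)
pnnx converts torch.lgamma (@shudorcl)
pnnx converts torch.positive (@nicochen1118)
pnnx converts torch.cumprod (@Jiang-Weibo)
pnnx converts torch.mv/nn.ReplicationPad3d (@ShuRaymond)
pnnx converts F.pairwise_distance (@marsyule)
pnnx converts torch.view_as_real/torch.view_as_complex (@Baiyuetribe)
Fixed pnnx build issues with new protobuf versions (@HuPengsheet)
Fixed the pnnx bug that altered underscores in directory names
onnx2ncnn supports celu conversion (@brightening-eyes)
Automatically add labels to pull requests
Fixed ohos toolchain build errors
Improved the codeformat script to use functions (@xiezheng-XD)
Added rk3566 rk3588s benchmark data (@chainsx)
Added Allwinner T527 benchmark data (@YuzukiTsuru)
Added Raspberry Pi 5b benchmark data (@Pillar1989)
Added RTX A3000 benchmark data (@chainsx)
Added benchmark data for several pcs (@whyb)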