Summary
The faiss-cpu 1.14.2 manylinux x86_64 wheel (the first one published from the new in-repo wheel pipeline, built with FAISS_OPT_LEVEL=dd) crashes with SIGILL during import faiss on x86-64 CPUs that support AVX but not AVX2 (Sandy Bridge / Ivy Bridge era). The crash happens inside dlopen of libfaiss.so — ctypes.CDLL(".../faiss/libfaiss.so") alone reproduces it — so neither the runtime CPUID dispatch nor FAISS_SIMD_LEVEL=NONE can prevent it. faiss-cpu 1.13.2 (the last faiss-wheels build, which shipped a separate generic binary) works on the same CPUs.
This appears to contradict the DD build's own intent: common code is compiled with -mpopcnt -msse4 -mno-avx -mno-avx2 ("prevents auto-vectorization" per the comment in faiss/CMakeLists.txt). The SSE4-era baseline therefore looks deliberate, and the AVX2 instruction executed at load time looks like an accidental leak rather than a raised baseline.
Platform
OS: Linux (any distro; reproduced on real hardware and under qemu-user)
Faiss version: faiss-cpu 1.14.2 from PyPI (faiss_cpu-1.14.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl)
Installed from: pip (official PyPI wheel)
Faiss compilation options: as published — FAISS_OPT_LEVEL=dd per the repo pyproject.toml
Running on: CPU
Interface: Python
Reproduction instructions
$ pip install faiss-cpu==1.14.2
$ qemu-x86_64 -cpu SandyBridge python -c "import faiss" # SIGILL, exit code 132
$ qemu-x86_64 -cpu SandyBridge python -c "import ctypes; ctypes.CDLL('<site-packages>/faiss/libfaiss.so')" # also SIGILL
$ FAISS_SIMD_LEVEL=NONE qemu-x86_64 -cpu SandyBridge python -c "import faiss" # still SIGILL (crash precedes env handling)
$ qemu-x86_64 -cpu Haswell python -c "import faiss" # OK, selects AVX2
$ pip install faiss-cpu==1.13.2
$ qemu-x86_64 -cpu SandyBridge python -c "import faiss" # OK, loads OPTIMIZE GENERIC
Analysis
One of the three .init_array entries in the shipped libfaiss.so (initializer at offset 0x14ffc0) executes an AVX2 instruction unconditionally at library load time:
14ffc0: 48 b8 01 02 04 08 10 ... movabs $0x8040201008040201,%rax
...
14fff4: c4 e2 7d 59 c0 vpbroadcastq %xmm0,%ymm0 <-- faulting instruction (qemu trace)
The pattern (movabs of a bit-table constant + broadcast + table fill) looks like GCC auto-vectorizing the dynamic initializer of a global lookup table in one of the AVX2-flagged translation units. The per-file -mavx2 -mfma flags legitimately apply to the whole TU — including dynamic initializers of globals — but those initializers run at dlopen, before faiss/utils/simd_levels.cpp dispatch can check CPUID. Static initializers in SIMD TUs would need to be constant-initialized, lazily initialized, or compiled at baseline.
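The first two options can be sketched as follows. This is a minimal illustration, not the actual faiss code: the table contents, ExpandTable, expand_bits, make_expand_table, kExpandTable and lazy_expand_table are all hypothetical names chosen for the example, since the real table behind the 0x14ffc0 initializer has not been identified here. The point is only that a constexpr table is constant-initialized and so needs no load-time initializer, while a function-local static defers the fill to the first call, which would normally happen after dispatch has selected the SIMD level.

```cpp
#include <cstdint>

// Hypothetical lookup table (NOT taken from the faiss sources): entry b is a
// 64-bit word whose byte k is 0xFF when bit k of b is set, and 0x00 otherwise.
struct ExpandTable {
    std::uint64_t v[256];
};

constexpr std::uint64_t expand_bits(std::uint8_t b) {
    std::uint64_t out = 0;
    for (int k = 0; k < 8; ++k) {
        if (b & (1u << k)) {
            out |= std::uint64_t{0xFF} << (8 * k);
        }
    }
    return out;
}

constexpr ExpandTable make_expand_table() {
    ExpandTable t{};
    for (int i = 0; i < 256; ++i) {
        t.v[i] = expand_bits(static_cast<std::uint8_t>(i));
    }
    return t;
}

// Option 1: constant initialization. A constexpr variable must be initialized
// by a constant expression, so the table is computed by the compiler and no
// dynamic initializer is needed for it, whatever -m flags the translation
// unit is built with.
constexpr ExpandTable kExpandTable = make_expand_table();

// Evaluated at compile time: 0x81 has bits 0 and 7 set.
static_assert(kExpandTable.v[0x81] == 0xFF000000000000FFULL,
              "table must be usable in constant expressions");

// Option 2: lazy initialization. If the table were filled by a dynamic
// initializer, a function-local static would run that initializer on the
// first call instead of at dlopen. In an AVX2 translation unit, that first
// call is only reached after runtime dispatch has chosen the AVX2 code path.
inline const ExpandTable& lazy_expand_table() {
    static const ExpandTable table = make_expand_table();
    return table;
}
```

The struct wrapper around a plain array is used instead of std::array only so the sketch also compiles as C++14. Option 1 has no runtime cost; option 2 adds a guard check on each call but also works for tables that cannot be computed in a constant expression.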
Found while debugging a downstream Docker image crash: LearningCircuit/local-deep-research#4480 (we've pinned to 1.13.2 as a workaround).