faiss-cpu 1.14.2 wheel SIGILLs at import on x86-64 CPUs without AVX2 — AVX2 instruction in .init_array static initializer runs before dynamic dispatch #5296

@LearningCircuit

Summary

The faiss-cpu 1.14.2 manylinux x86_64 wheel (the first one published from the new in-repo wheel pipeline, built with FAISS_OPT_LEVEL=dd) crashes with SIGILL during import faiss on x86-64 CPUs that support AVX but not AVX2 (Sandy Bridge / Ivy Bridge era). The crash happens inside dlopen of libfaiss.so: ctypes.CDLL(".../faiss/libfaiss.so") alone reproduces it, so neither the runtime CPUID dispatch nor FAISS_SIMD_LEVEL=NONE can prevent it. faiss-cpu 1.13.2 (the last faiss-wheels build, which shipped a separate generic binary) works on the same CPUs.

This appears to contradict the DD build's own intent: common code is compiled with -mpopcnt -msse4 -mno-avx -mno-avx2 ("prevents auto-vectorization" per the comment in faiss/CMakeLists.txt), so the SSE4-era baseline looks deliberate, and the crash is more likely an accidental leak than an intentional baseline raise.

Platform

OS: Linux (any distro; reproduced on real hardware and under qemu-user)

Faiss version: faiss-cpu 1.14.2 from PyPI (faiss_cpu-1.14.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl)

Installed from: pip (official PyPI wheel)

Faiss compilation options: as published — FAISS_OPT_LEVEL=dd per the repo pyproject.toml

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

$ pip install faiss-cpu==1.14.2
$ qemu-x86_64 -cpu SandyBridge python -c "import faiss"   # SIGILL, exit code 132
$ qemu-x86_64 -cpu SandyBridge python -c "import ctypes; ctypes.CDLL('<site-packages>/faiss/libfaiss.so')"  # also SIGILL
$ FAISS_SIMD_LEVEL=NONE qemu-x86_64 -cpu SandyBridge python -c "import faiss"  # still SIGILL (crash precedes env handling)
$ qemu-x86_64 -cpu Haswell python -c "import faiss"        # OK, selects AVX2
$ pip install faiss-cpu==1.13.2
$ qemu-x86_64 -cpu SandyBridge python -c "import faiss"    # OK, loads OPTIMIZE GENERIC
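
For downstream projects that want to detect the crash without taking down their own process, the same check can be scripted. This is a minimal sketch, assuming a POSIX system (where Python's subprocess module reports death by signal as a negative return code); the helper names run_snippet, crashed_with_sigill and faiss_import_is_safe are illustrative and not part of faiss.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> int:
    """Run `code` in a fresh interpreter and return the child's return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def crashed_with_sigill(returncode: int) -> bool:
    """True if the child was killed by SIGILL.

    On POSIX, subprocess reports a signal death as -signum; a shell shows
    the same event as exit code 128 + signum (132 for SIGILL).
    """
    return returncode == -signal.SIGILL


def faiss_import_is_safe() -> bool:
    """Probe `import faiss` in a child process so a SIGILL cannot kill the caller."""
    return run_snippet("import faiss") == 0


if __name__ == "__main__":
    rc = run_snippet("import faiss")
    if crashed_with_sigill(rc):
        print("import faiss died with SIGILL (illegal instruction)")
    elif rc == 0:
        print("import faiss succeeded")
    else:
        print(f"import faiss failed for another reason (return code {rc})")
```

Running the probe in a child process matters here because the fault occurs inside dlopen, so a try/except around import faiss in the parent cannot catch it.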

Analysis

One of the three .init_array entries in the shipped libfaiss.so (initializer at offset 0x14ffc0) executes an AVX2 instruction unconditionally at library load time:

14ffc0:  48 b8 01 02 04 08 10 ...  movabs $0x8040201008040201,%rax
...
14fff4:  c4 e2 7d 59 c0            vpbroadcastq %xmm0,%ymm0     <-- faulting instruction (qemu trace)

The pattern (movabs of a bit-table constant + broadcast + table fill) looks like GCC auto-vectorizing the dynamic initializer of a global lookup table in one of the AVX2-flagged translation units. The per-file -mavx2 -mfma flags legitimately apply to the whole TU — including dynamic initializers of globals — but those initializers run at dlopen, before faiss/utils/simd_levels.cpp dispatch can check CPUID. Static initializers in SIMD TUs would need to be constant-initialized, lazily initialized, or compiled at baseline.
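
The encoding of the faulting bytes can be double-checked by hand. The sketch below decodes the 3-byte VEX prefix of c4 e2 7d 59 c0 and unpacks the movabs constant; decode_vex3 is a hypothetical helper written for this illustration and covers only the fields needed here, not a general x86 decoder.

```python
def decode_vex3(insn: bytes) -> dict:
    """Decode the fields of a 3-byte VEX prefix (leading byte 0xC4)."""
    if len(insn) < 5 or insn[0] != 0xC4:
        raise ValueError("not a 3-byte VEX-encoded instruction")
    b1, b2, opcode, modrm = insn[1], insn[2], insn[3], insn[4]
    return {
        # Low 5 bits of the first payload byte select the opcode map.
        "map": {1: "0F", 2: "0F38", 3: "0F3A"}.get(b1 & 0x1F, "reserved"),
        "W": b2 >> 7,
        # The L bit selects the vector length: 0 = 128-bit, 1 = 256-bit.
        "vector_bits": 256 if (b2 >> 2) & 1 else 128,
        # The pp bits encode the implied legacy prefix.
        "prefix": ["none", "66", "F3", "F2"][b2 & 0x3],
        "opcode": opcode,
        "modrm_mod": modrm >> 6,
        "modrm_reg": (modrm >> 3) & 7,
        "modrm_rm": modrm & 7,
    }


# Bytes of the faulting instruction at 0x14fff4.
fields = decode_vex3(bytes.fromhex("c4e27d59c0"))
print(fields)

# The movabs immediate, viewed as little-endian bytes, is a one-bit-per-byte table.
bit_table = (0x8040201008040201).to_bytes(8, "little")
print([hex(b) for b in bit_table])
```

The decoded fields (256-bit, 66 prefix, 0F38 map, W0, opcode 0x59, register-to-register ModRM) correspond to the VEX.256.66.0F38.W0 59 /r form of vpbroadcastq, which requires AVX2, and the constant unpacks to the bytes 1 << i for i in 0..7, which fits the reading of a bit-mask lookup table being filled at load time.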

Found while debugging a downstream Docker image crash: LearningCircuit/local-deep-research#4480 (we've pinned to 1.13.2 as a workaround).
