
Selectable local embedding models (multilingual support) #37


@introfini

Summary

Allow users to choose the local embedding model from a small curated set instead of being locked to nomic-embed-text-v1.5 (768 dims). Everything stays fully on-device; this is not about cloud APIs (see #31, which stays out of scope by design).

Motivation

ZotSeek currently ships a single bundled model. It's solid for English but isn't the best option for every library, especially heavily multilingual collections, where models like bge-m3 or multilingual-e5 tend to outperform it on retrieval.

This came up in #31 and is likely relevant to the multilingual users tracked in #24 and #16.

Proposal

  • Curated set of local ONNX models, downloaded on demand, run the same way as today (fully local).
  • Include at least one stronger multilingual model (candidates: bge-m3, multilingual-e5).
  • Model selection in preferences, with clear info on dimensions and trade-offs.
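As a rough illustration of the curated list and the "dims + warning on switch" information the preferences pane would need, here is a minimal TypeScript sketch. The `LocalModel` shape, `CURATED_MODELS`, `findModel`, and `describeSwitch` are hypothetical names, not existing ZotSeek code. The 768 figure for nomic-embed-text-v1.5 comes from this issue; 1024 for bge-m3 and 768 for the base variant of multilingual-e5 are the dimensions those models publish, and picking the base variant is an assumption (the issue does not name a size).

```typescript
// Hypothetical registry entry; none of these names exist in ZotSeek today.
interface LocalModel {
  id: string;            // value that would be stored as a chunk's model_id
  dims: number;          // embedding dimension shown in preferences
  multilingual: boolean; // coarse hint for the UI
  note: string;          // short trade-off text
}

const CURATED_MODELS: readonly LocalModel[] = [
  { id: "nomic-embed-text-v1.5", dims: 768, multilingual: false, note: "current bundled default, solid for English" },
  { id: "bge-m3", dims: 1024, multilingual: true, note: "candidate; larger vectors" },
  // Assumption: the base variant. The issue only says "multilingual-e5".
  { id: "multilingual-e5-base", dims: 768, multilingual: true, note: "candidate" },
];

// Look up a model by id, failing loudly on ids outside the curated set.
function findModel(id: string): LocalModel {
  const model = CURATED_MODELS.find((m) => m.id === id);
  if (!model) {
    throw new Error(`unknown model: ${id}`);
  }
  return model;
}

// Text for the "warning on switch" dialog, or null when nothing changes.
function describeSwitch(fromId: string, toId: string): string | null {
  const from = findModel(fromId);
  const to = findModel(toId);
  if (from.id === to.id) {
    return null;
  }
  return (
    `Switching from ${from.id} (${from.dims} dims) to ${to.id} (${to.dims} dims): ` +
    `chunks embedded with ${from.id} cannot be searched with ${to.id} until they are re-indexed.`
  );
}
```

Note that the warning applies even when two models share a dimension (768 for both the default and the assumed e5 variant here), because their vector spaces still differ.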

Key technical constraint (index migration)

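Each chunk stores its model_id. Embeddings from different models may differ in dimension and always live in different vector spaces, so their similarity scores are not comparable even when the dimensions happen to match. Queries must therefore be embedded with the same model that built the index. Switching the active model requires either: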

  • a full re-index of the library, or
  • per-model partitioned search (only query chunks whose model_id matches the active model).
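The second option could look roughly like the sketch below: filter chunks by `model_id` before scoring, and report how many chunks were skipped so the caller can surface it. Only the `model_id` field comes from this issue; the `Chunk` shape, the in-memory array, the brute-force cosine scan, and the `partitionedSearch` name are assumptions for illustration, not ZotSeek's actual storage or search code.

```typescript
// Hypothetical chunk record; only `model_id` is taken from the issue text.
interface Chunk {
  id: string;
  model_id: string;
  embedding: number[];
}

interface SearchResult {
  hits: { id: string; score: number }[];
  skipped: number; // chunks ignored because another model embedded them
}

// Cosine similarity; throws rather than comparing vectors of unequal length.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score only the partition that was embedded with the active model.
function partitionedSearch(
  chunks: Chunk[],
  activeModelId: string,
  queryEmbedding: number[],
  topK: number
): SearchResult {
  const partition = chunks.filter((c) => c.model_id === activeModelId);
  const hits = partition
    .map((c) => ({ id: c.id, score: cosine(queryEmbedding, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
  return { hits, skipped: chunks.length - partition.length };
}
```

Returning `skipped` alongside the hits keeps the trade-off visible: partitioned search avoids an immediate re-index, but any chunk outside the active partition is invisible to the query until it is re-embedded.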

This is the main design decision to resolve before implementation: what happens to an already-indexed library when the user changes model.

Checklist

  • Decide index-migration strategy (re-index vs partitioned search)
  • Curated model list + on-demand download
  • Add a stronger multilingual model
  • Preferences UI for model selection (show dims + warning on switch)
  • Guard against mixed-model libraries silently returning incomplete results
  • Docs (README + SEARCH_ARCHITECTURE)
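For the mixed-model guard in the checklist, one possible approach is to compute index coverage before each search and warn when it is below 100%. The sketch below assumes chunk records can be enumerated with their `model_id`; `IndexedChunk`, `Coverage`, and `indexCoverage` are hypothetical names, and how ZotSeek would actually count chunks (for example with a grouped database query) is left open.

```typescript
// Hypothetical minimal view of a stored chunk.
interface IndexedChunk {
  model_id: string;
}

interface Coverage {
  total: number;                   // all chunks in the library index
  searchable: number;              // chunks embedded with the active model
  stale: number;                   // chunks embedded with any other model
  byModel: Record<string, number>; // chunk count per model_id
  complete: boolean;               // true when nothing would be skipped
}

// Summarize how much of the index the active model can actually search.
function indexCoverage(chunks: IndexedChunk[], activeModelId: string): Coverage {
  const byModel: Record<string, number> = {};
  for (const chunk of chunks) {
    byModel[chunk.model_id] = (byModel[chunk.model_id] ?? 0) + 1;
  }
  const total = chunks.length;
  const searchable = byModel[activeModelId] ?? 0;
  const stale = total - searchable;
  return { total, searchable, stale, byModel, complete: stale === 0 };
}
```

A caller could show a banner such as "N of M chunks need re-indexing" whenever `complete` is false, so partial results are never presented as the full library.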

Out of scope

Cloud / online embedding APIs: refused by design (#31). This issue is local-only.

Labels: enhancement
