Summary
Allow users to choose the local embedding model from a small curated set, instead of being locked to nomic-embed-text-v1.5 (768 dims). Everything stays fully on-device — this is not about cloud APIs (see #31, which stays out by design).
Motivation
ZotSeek currently ships a single bundled model. It's solid for English but isn't the best option for every library, especially heavily multilingual collections, where models like bge-m3 or multilingual-e5 clearly outperform it on retrieval.
This came up in #31, and is likely relevant to multilingual users tracked in #24 and #16.
Proposal
- Curated set of local ONNX models, downloaded on demand, run the same way as today (fully local).
- Include at least one stronger multilingual model (candidates: bge-m3, multilingual-e5).
- Model selection in preferences, with clear info on dimensions and trade-offs.
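To make the proposal concrete, here is a minimal sketch of what a curated model registry could look like. Everything in it is an assumption for illustration: the `EmbeddingModelInfo` shape, the `CURATED_MODELS` list, and `resolveModel` are hypothetical names, not ZotSeek's actual code, and the bge-m3 entry (including its 1024 dimensions) should be checked against the model card before shipping.

```typescript
// Hypothetical registry shape; field names are illustrative only.
interface EmbeddingModelInfo {
  id: string;            // value stored as model_id on each chunk
  dimensions: number;    // embedding vector length
  multilingual: boolean; // shown in preferences as a trade-off hint
  bundled: boolean;      // false => downloaded on demand
  note: string;          // short trade-off text for the preferences UI
}

const CURATED_MODELS: EmbeddingModelInfo[] = [
  {
    id: "nomic-embed-text-v1.5",
    dimensions: 768,
    multilingual: false,
    bundled: true,
    note: "Current default; solid for English.",
  },
  {
    // Candidate entry; dimensions to be verified against the model card.
    id: "bge-m3",
    dimensions: 1024,
    multilingual: true,
    bundled: false,
    note: "Candidate for multilingual libraries; downloaded on demand.",
  },
];

// Fall back to the first (default) entry when the stored preference
// is missing or names a model that is no longer in the curated set.
function resolveModel(id: string | undefined): EmbeddingModelInfo {
  return CURATED_MODELS.find((m) => m.id === id) ?? CURATED_MODELS[0];
}
```

Keeping dimensions and the trade-off note in one place would let the preferences pane render them directly from the registry.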
Key technical constraint (index migration)
Each chunk stores its model_id, and embeddings from different models have different dimensions / vector spaces, so queries must use the same model that built the index. Switching the active model therefore requires either:
- a full re-index of the library, or
- per-model partitioned search (only query chunks whose model_id matches the active model).
This is the main design decision to resolve before implementation: what happens to an already-indexed library when the user changes model.
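As a rough illustration of the second option, the sketch below filters chunks by model_id before scoring, and counts how many chunks would be invisible to search until they are re-indexed. It is a sketch under assumptions: the `Chunk` shape, `searchPartitioned`, and `staleChunkCount` are hypothetical names, and brute-force cosine similarity stands in for whatever search ZotSeek actually uses.

```typescript
// Hypothetical chunk record; only model_id is known from this issue.
interface Chunk {
  itemId: string;
  modelId: string;
  vector: number[];
}

// Plain cosine similarity; returns 0 for zero-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Score only chunks built by the active model. The length check is a
// second guard against comparing vectors of different dimensions.
function searchPartitioned(
  chunks: Chunk[],
  activeModelId: string,
  query: number[],
  topK: number
): { itemId: string; score: number }[] {
  return chunks
    .filter((c) => c.modelId === activeModelId && c.vector.length === query.length)
    .map((c) => ({ itemId: c.itemId, score: cosine(c.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Chunks indexed with another model: hidden from search until re-indexed.
function staleChunkCount(chunks: Chunk[], activeModelId: string): number {
  return chunks.filter((c) => c.modelId !== activeModelId).length;
}
```

With partitioning, switching models never mixes vector spaces, but results are incomplete until the stale chunks are rebuilt. A count like `staleChunkCount` could drive a "N chunks need re-indexing" prompt, which is one way to combine the two options rather than choose between them.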
Checklist
Out of scope
Cloud / online embedding APIs — refused by design (#31). This issue is local-only.