fix(chroma_store): fix three bugs causing DB profile refresh and RAG-NL2SQL to fail #3021
Open
yunshaochu wants to merge 2 commits into
…NL2SQL to fail
1. delete_vector_name: list_collections() returns Collection objects, not strings.
Comparing collection name against Collection objects always returned False,
causing the delete to be silently skipped.
2. delete_vector_name: after deleting the collection, self._collection still
referenced the deleted object, causing "Collection does not exist" errors
on subsequent operations.
3. delete_vector_name + create_collection: when recreating a collection after
deletion, metadata={"hnsw:space": "cosine"} was not passed, resulting in
the default L2 distance space. Since score = 1 - distance, L2 distances
(300+) produced negative scores that were all filtered out by
score_threshold=0, making RAG-NL2SQL always return 0 results.
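The three fixes above can be sketched together as follows. This is a minimal illustration, not the PR's actual diff: `FakeCollection`, `FakeClient`, `SketchStore`, and `_collection_exists` are hypothetical in-memory stand-ins that only mimic the behavior described in the commit message (a `list_collections()` that returns objects rather than names), so the example runs without a Chroma installation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The metadata named in the commit message; copied per call to avoid sharing.
COSINE_METADATA = {"hnsw:space": "cosine"}


@dataclass
class FakeCollection:
    """Hypothetical stand-in for a Chroma Collection object."""
    name: str
    metadata: Optional[dict] = None


class FakeClient:
    """Hypothetical in-memory stand-in for a Chroma client."""

    def __init__(self) -> None:
        self._collections: Dict[str, FakeCollection] = {}

    def list_collections(self) -> List[FakeCollection]:
        # Returns objects, not names, as the commit message describes.
        return list(self._collections.values())

    def get_or_create_collection(self, name: str, metadata: Optional[dict] = None):
        # An existing collection keeps whatever metadata it was created with.
        if name not in self._collections:
            self._collections[name] = FakeCollection(name, metadata)
        return self._collections[name]

    def delete_collection(self, name: str) -> None:
        del self._collections[name]


class SketchStore:
    """Hypothetical store showing the three fixes in one place."""

    def __init__(self, client: FakeClient, name: str) -> None:
        self._client = client
        self._name = name
        self._collection = client.get_or_create_collection(
            name, metadata=dict(COSINE_METADATA)
        )

    def _collection_exists(self) -> bool:
        # Fix 1: compare against each collection's .name, not the object itself.
        return self._name in {c.name for c in self._client.list_collections()}

    def delete_vector_name(self) -> bool:
        if not self._collection_exists():
            return False
        self._client.delete_collection(self._name)
        # Fix 2: replace the stale reference to the deleted collection.
        # Fix 3: pass the cosine metadata again when recreating.
        self._collection = self._client.get_or_create_collection(
            self._name, metadata=dict(COSINE_METADATA)
        )
        return True


client = FakeClient()
store = SketchStore(client, "profile")
# The buggy check: a string never equals a collection object.
print("profile" in client.list_collections())
# The fixed path: the delete runs and the store holds a fresh collection.
print(store.delete_vector_name(), store._collection.metadata)
```

The first `print` shows why the original existence check could never succeed, and the second shows the recreated collection carrying the cosine metadata.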
Collaborator
@yunshaochu Please format code with
Description
Fix three bugs in ChromaStore that caused the database profile refresh API and RAG-NL2SQL retrieval to fail silently.

Bug 1: delete_vector_name - collection existence check broken
list_collections() returns Collection objects, not strings. Comparing self._collection.name against Collection objects always returned False, causing the delete to be silently skipped. The refresh API would then skip re-embedding because vector_name_exists() still returned True.

Bug 2: delete_vector_name - stale collection reference after deletion
After delete_collection(), self._collection still pointed to the deleted collection object, causing "Collection does not exist" errors on subsequent operations.

Bug 3: create_collection / delete_vector_name - missing hnsw:space metadata
When recreating a collection after deletion, metadata={"hnsw:space": "cosine"} was not passed, resulting in the default L2 distance space. Since score = 1 - distance, L2 distances (300+) produced negative scores that were all filtered out by score_threshold=0, making RAG-NL2SQL always return 0 results.

Additionally, get_or_create_collection does not update metadata for existing collections, so create_collection now detects mismatched metadata and recreates the collection with correct settings.

How Has This Been Tested?
- /api/v1/chat/db/refresh and /api/v2/serve/datasources/{id}/refresh: both now work correctly
- {"hnsw:space": "cosine"} metadata
- [RAG-NL2SQL] Retrieved table info count: 10 (previously count: 0)
- make fmt, make fmt-check, make mypy: all passed

Snapshots:
Before fix (L2 distance, score always negative):
After fix (cosine distance, score positive):
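The before/after difference in the captions can be reproduced with a small arithmetic sketch. It assumes, as the description states, that score = 1 - distance and that results with a negative score are dropped by score_threshold=0. The vectors are made-up toy values, the helper names are hypothetical, and the "l2" space is modeled here as squared Euclidean distance, which is how Chroma documents it to the best of my understanding.

```python
import math
from typing import Sequence


def squared_l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; unbounded above for unnormalized vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity; always within [0, 2].
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def score(distance: float) -> float:
    # The conversion described in the PR: score = 1 - distance.
    return 1.0 - distance


# Toy, unnormalized embeddings pointing in similar directions.
query = [10.0, 0.0, 5.0]
doc = [12.0, 3.0, 1.0]

l2_score = score(squared_l2_distance(query, doc))   # 1 - 29 = -28.0
cos_score = score(cosine_distance(query, doc))      # positive, below 1

score_threshold = 0
print("L2 kept:", l2_score >= score_threshold)
print("cosine kept:", cos_score >= score_threshold)
```

Because the L2 distance grows with vector magnitude, any distance above 1 yields a negative score, so every candidate is filtered out. Cosine distance stays bounded, so closely aligned vectors keep a positive score.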
Checklist: