fix(chroma_store): fix three bugs causing DB profile refresh and RAG-NL2SQL to fail #3021

Open
yunshaochu wants to merge 2 commits into eosphoros-ai:main from yunshaochu:main

@yunshaochu

Description

Fix three bugs in ChromaStore that caused the database profile refresh API and RAG-NL2SQL retrieval to fail silently.

Bug 1: delete_vector_name - collection existence check broken
list_collections() returns Collection objects, not strings. Comparing self._collection.name against Collection objects always returned False, causing the delete to be silently skipped. The refresh API would then skip re-embedding because vector_name_exists() still returned True.
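The failure mode in Bug 1 can be reproduced without ChromaDB. The sketch below is illustrative only: `FakeCollection`, `exists_buggy`, and `exists_fixed` are hypothetical names, not code from this PR. It shows why a membership test of a string against a list of objects never matches, and one way to compare by name instead.

```python
from dataclasses import dataclass


@dataclass
class FakeCollection:
    """Hypothetical stand-in for a Chroma Collection (only the `name` field)."""

    name: str


def exists_buggy(name, collections):
    # A str is never equal to a FakeCollection, so this is always False
    # when `collections` holds objects rather than names.
    return name in collections


def exists_fixed(name, collections):
    # Compare against each entry's `name`; fall back to the entry itself
    # in case the client hands back plain strings.
    return any(getattr(c, "name", c) == name for c in collections)


listed = [FakeCollection("shop_profile"), FakeCollection("other")]
print(exists_buggy("shop_profile", listed))  # False
print(exists_fixed("shop_profile", listed))  # True
```

The `getattr` fallback is a defensive assumption so the check keeps working if a client returns names rather than objects.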

Bug 2: delete_vector_name - stale collection reference after deletion
After delete_collection(), self._collection still pointed to the deleted collection object, causing "Collection does not exist" errors on subsequent operations.
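A minimal sketch of the stale-handle problem in Bug 2, using in-memory stand-ins rather than the real client. `FakeClient`, `FakeCollection`, and `SketchStore` are hypothetical, and refreshing the handle right after deletion is only one possible remedy; the PR's actual change may differ.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for a Chroma client."""

    def __init__(self):
        self.ids = {}  # collection name -> current collection id
        self._next_id = 0

    def get_or_create_collection(self, name, metadata=None):
        if name not in self.ids:
            self._next_id += 1
            self.ids[name] = self._next_id
        return FakeCollection(self, name, self.ids[name], metadata)

    def delete_collection(self, name):
        self.ids.pop(name, None)


class FakeCollection:
    """Hypothetical handle that fails once its collection has been deleted."""

    def __init__(self, client, name, col_id, metadata):
        self._client = client
        self.name = name
        self.id = col_id
        self.metadata = metadata

    def count(self):
        if self._client.ids.get(self.name) != self.id:
            raise ValueError("Collection does not exist")
        return 0


class SketchStore:
    """Hypothetical, simplified store; not the real ChromaStore."""

    METADATA = {"hnsw:space": "cosine"}

    def __init__(self, client, name):
        self._client = client
        self._name = name
        self._collection = client.get_or_create_collection(
            name, metadata=self.METADATA
        )

    def delete_vector_name(self, refresh_handle=True):
        self._client.delete_collection(self._name)
        if refresh_handle:
            # Replace the stale handle so later calls hit a live collection.
            self._collection = self._client.get_or_create_collection(
                self._name, metadata=self.METADATA
            )

    def count(self):
        return self._collection.count()
```

With `refresh_handle=False` the store keeps the old handle and `count()` raises, which mirrors the "Collection does not exist" error described above.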

Bug 3: create_collection / delete_vector_name - missing hnsw:space metadata
When recreating a collection after deletion, metadata={"hnsw:space": "cosine"} was not passed, resulting in the default L2 distance space. Since score = 1 - distance, any L2 distance greater than 1 (values of 300+ were reported) produced a negative score that was filtered out by score_threshold=0, making RAG-NL2SQL always return 0 results.
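The arithmetic behind Bug 3 can be checked with plain Python. This is a sketch under stated assumptions: the "l2" space is taken to be squared Euclidean distance, the threshold check is assumed to be `score >= score_threshold`, and the vectors and helper names are made up for illustration.

```python
import math


def squared_l2(a, b):
    # Assumed definition of the "l2" space: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; ranges from 0 to 2 for non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def passes(distance, score_threshold=0.0):
    # score = 1 - distance, as stated in the description above.
    return (1.0 - distance) >= score_threshold


query = [3.0, 4.0, 0.0]  # illustrative, un-normalized embeddings
doc = [4.0, 3.0, 1.0]

print(squared_l2(query, doc), passes(squared_l2(query, doc)))
print(round(cosine_distance(query, doc), 4), passes(cosine_distance(query, doc)))
```

The two vectors point in nearly the same direction, so the cosine score passes, while the squared L2 distance of 3.0 already gives a score of -2.0 and is dropped.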

Additionally, get_or_create_collection does not update metadata for existing collections, so create_collection now detects mismatched metadata and recreates the collection with correct settings.
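The detect-and-recreate step might look roughly like the sketch below. `FakeClient`, `FakeCollection`, and `ensure_collection` are hypothetical stand-ins that only mimic the behavior described above (existing collections keep their old metadata); the actual create_collection change in this PR may be structured differently.

```python
EXPECTED_METADATA = {"hnsw:space": "cosine"}


class FakeCollection:
    """Hypothetical stand-in holding just a name and metadata."""

    def __init__(self, name, metadata):
        self.name = name
        self.metadata = metadata


class FakeClient:
    """Hypothetical client: get_or_create never updates existing metadata."""

    def __init__(self):
        self._store = {}

    def get_or_create_collection(self, name, metadata=None):
        if name not in self._store:
            copied = dict(metadata) if metadata else None
            self._store[name] = FakeCollection(name, copied)
        return self._store[name]

    def delete_collection(self, name):
        del self._store[name]


def ensure_collection(client, name, expected=EXPECTED_METADATA):
    col = client.get_or_create_collection(name, metadata=expected)
    current = col.metadata or {}
    # Recreate only when a required key is missing or has a different value.
    if any(current.get(key) != value for key, value in expected.items()):
        client.delete_collection(name)
        col = client.get_or_create_collection(name, metadata=expected)
    return col


client = FakeClient()
client.get_or_create_collection("shop_profile")  # created without metadata
fixed = ensure_collection(client, "shop_profile")
print(fixed.metadata)  # {'hnsw:space': 'cosine'}
```

Deleting a collection discards its stored embeddings, so a recreate like this has to be followed by re-embedding the data.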

How Has This Been Tested?

  1. Set up an MSSQL database (ShopDB) with 17 tables, including tables with Chinese names
  2. Called /api/v1/chat/db/refresh and /api/v2/serve/datasources/{id}/refresh — both now work correctly
  3. Verified ChromaDB collections have correct {"hnsw:space": "cosine"} metadata
  4. Verified RAG-NL2SQL retrieval returns results (previously returned 0)
  5. Verified log output: [RAG-NL2SQL] Retrieved table info count: 10 (previously count: 0)
  6. Ran make fmt, make fmt-check, make mypy — all passed

Snapshots:

Before fix (L2 distance, score always negative):

Query with L2 distance: dist=3.62, score=-2.62, pass_threshold_0=False
[RAG-NL2SQL] Retrieved table info count: 0, content: []

After fix (cosine distance, score positive):

Query with cosine distance: dist=0.55, score=0.45, pass_threshold_0=True
[RAG-NL2SQL] Retrieved table info count: 10, content: [...]

Checklist:

  • My code follows the style guidelines of this project
  • I have already rebased the commits and made the commit messages conform to the project standard.
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (N/A - bug fix, no doc changes needed)
  • Any dependent changes have been merged and published in downstream modules (N/A)

…NL2SQL to fail

1. delete_vector_name: list_collections() returns Collection objects, not strings.
   Comparing collection name against Collection objects always returned False,
   causing the delete to be silently skipped.

2. delete_vector_name: after deleting the collection, self._collection still
   referenced the deleted object, causing "Collection does not exist" errors
   on subsequent operations.

3. delete_vector_name + create_collection: when recreating a collection after
   deletion, metadata={"hnsw:space": "cosine"} was not passed, resulting in
   the default L2 distance space. Since score = 1 - distance, L2 distances
   (300+) produced negative scores that were all filtered out by
   score_threshold=0, making RAG-NL2SQL always return 0 results.
@github-actions bot added the fix (Bug fixes) label on Apr 17, 2026

@chenliang15405 left a comment

Collaborator

LGTM

@chenliang15405

Collaborator

@yunshaochu Please format code with make fmt
