Semantic Tool Discovery for MCP

A replication of the semantic tool discovery architecture described in:

Semantic Tool Discovery for Large Language Models: A Vector-Based Approach to MCP Tool Selection
Mudunuri et al. (2026) — https://arxiv.org/abs/2603.20313

This project parses the static MCP tool catalog in src/tools.py, converts each tool into a semantic document, stores those documents in Weaviate, and retrieves the most relevant tools for a user query.

The notebook entry_point.ipynb is the current entry point. It demonstrates the full flow:

ingest the tool catalog into Weaviate,
run a sample retrieval query, and
evaluate retrieval quality against the built-in dataset.

Motivation

Current MCP deployments expose all available tool definitions to the LLM on every request. Modern tool schemas require 200–800 tokens per tool, so a 100-tool catalog consumes 20K–80K tokens before any user query or response is added. The paper identifies two broader failure modes of this static provisioning approach:

Token overhead and cost — at scale, loading full catalogs is economically prohibitive and competes with conversation history and retrieved documents for context-window space.
Accuracy degradation — LLM performance degrades with increased context length. Presenting irrelevant tools introduces noise that reduces tool-selection accuracy, especially when tools have similar names.
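
The overhead figures above follow from simple multiplication. The sketch below reproduces them; the per-tool token range and the 100-tool catalog come from the text, while the helper name static_overhead is hypothetical.

```python
def static_overhead(num_tools: int, tokens_per_tool: int) -> int:
    """Tokens spent on tool definitions when every tool is sent with each request."""
    return num_tools * tokens_per_tool


# Lower and upper bounds for a 100-tool catalog at 200 and 800 tokens per tool.
low = static_overhead(100, 200)
high = static_overhead(100, 800)
print(f"{low:,} to {high:,} tokens per request")
```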

The paper proposes replacing static provisioning with vector-based dynamic selection: index all tools as dense embeddings and retrieve only the top-K most relevant tools for each query. Their benchmark across 121 tools from 5 MCP servers (Filesystem, MySQL, Slack, GitHub, Time/Weather) shows:

K	Hit Rate	MRR	Token Reduction	Latency
1	85.0%	0.85	99.6%	<91 ms
3	97.1%	0.91	99.6%	<91 ms
5	97.1%	0.91	99.6%	<91 ms

The paper identifies K=3 as the optimal operating point: it achieves the best F1 (58.4%) while maintaining a 97.1% hit rate and 0.91 MRR.
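
For readers unfamiliar with the two ranking metrics in the table, here is a minimal sketch of how hit rate and MRR at K are commonly computed. It assumes the standard definitions with one expected tool per query; whether the paper uses exactly these definitions is an assumption. The function names and the toy rankings are invented for illustration and are not results from this project.

```python
def hit_rate_at_k(ranked, expected, k):
    """Fraction of queries whose expected tool appears in the top-k results."""
    hits = sum(1 for names, want in zip(ranked, expected) if want in names[:k])
    return hits / len(expected)


def mrr_at_k(ranked, expected, k):
    """Mean of 1/rank of the expected tool within the top-k (0 when absent)."""
    total = 0.0
    for names, want in zip(ranked, expected):
        top = names[:k]
        if want in top:
            total += 1.0 / (top.index(want) + 1)
    return total / len(expected)


# Toy rankings, invented for illustration only.
ranked = [
    ["copy_directory", "move_file", "list_directory"],
    ["select_database", "ping_database", "disconnect_database"],
    ["list_directory", "search_files", "read_file"],
]
expected = ["copy_directory", "get_file_info", "search_files"]

print(hit_rate_at_k(ranked, expected, k=3))
print(mrr_at_k(ranked, expected, k=3))
```

Hit rate only asks whether the expected tool made the cut, while MRR also rewards placing it near the top, which is why the two can differ at the same K.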

What It Does

Parses tool definitions from src/tools.py
Builds one searchable text document per tool
Stores tool chunks in the McpToolChunks Weaviate collection
Uses hybrid retrieval to return the top-K most relevant tools for a query
Provides a small evaluation loop to measure top-K retrieval success

How The Entry Notebook Works

The notebook uses the ingestion, retrieval, and evaluation modules directly. The package also exposes convenience exports from src/__init__.py.

from src.store_and_retrieve.indexing import ingest_tools_file
from src.store_and_retrieve.retrieval import retrieve_tool_chunks
from src.evaluation.eval import run_evaluation

Its three cells do the following:

Ingest src/tools.py into Weaviate with ingest_tools_file("src/tools.py")
Query the store with retrieve_tool_chunks(sample_query, top_k=3)
Run the evaluation suite with run_evaluation(top_k=3)

Data Model

Each tool chunk is stored as a structured text document with this shape:

Tool: <tool_name>
Purpose: <purpose>
Capabilities: <capabilities>
Parameters: <parameters>

Only three properties are stored in Weaviate:

text
tool_name
server

The purpose, capabilities, and parameters fields are not stored as separate properties. They are written into the text property so that they contribute to retrieval quality.
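
A minimal sketch of this data model follows. The four-line text layout and the three stored properties come from the description above; the names ToolChunk and build_tool_chunk and the sample field values are hypothetical, and the project's own builder may differ.

```python
from dataclasses import dataclass


@dataclass
class ToolChunk:
    """The three properties stored per tool."""
    text: str
    tool_name: str
    server: str


def build_tool_chunk(server, tool_name, purpose, capabilities, parameters):
    """Fold the descriptive fields into one searchable text document."""
    text = "\n".join([
        f"Tool: {tool_name}",
        f"Purpose: {purpose}",
        f"Capabilities: {capabilities}",
        f"Parameters: {parameters}",
    ])
    return ToolChunk(text=text, tool_name=tool_name, server=server)


# Sample values are invented for illustration.
chunk = build_tool_chunk(
    server="filesystem",
    tool_name="get_file_info",
    purpose="Return metadata about a file",
    capabilities="size, timestamps, permissions",
    parameters="path (string)",
)
print(chunk.text)
```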

Requirements

Python 3.10+
Weaviate Cloud or a compatible Weaviate instance
OPENAI_API_KEY for Weaviate text vectorization

Install dependencies with:

uv sync

Environment Variables

Set these in .env or your shell:

WEAVIATE_API_KEY: Weaviate API key
WEAVIATE_URL or WEAVIATE_REST_ENDPOINT: Weaviate cluster URL
OPENAI_API_KEY: OpenAI API key used by the Weaviate vectorizer
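
One way to read these settings is sketched below. The variable names come from the list above; the helper load_settings, the preference for WEAVIATE_URL over WEAVIATE_REST_ENDPOINT when both are set, and the error message are assumptions. The sketch reads only the process environment, so loading a .env file is left to whatever mechanism the project uses.

```python
import os
from typing import Dict, Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the three settings and fail early if any is missing."""
    env = os.environ if env is None else env
    settings = {
        # Assumption: WEAVIATE_URL wins when both URL variables are set.
        "weaviate_url": env.get("WEAVIATE_URL") or env.get("WEAVIATE_REST_ENDPOINT") or "",
        "weaviate_api_key": env.get("WEAVIATE_API_KEY", ""),
        "openai_api_key": env.get("OPENAI_API_KEY", ""),
    }
    missing = [name for name, value in settings.items() if not value]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return settings


# Example with an explicit mapping so the sketch runs without real credentials.
example = load_settings({
    "WEAVIATE_REST_ENDPOINT": "https://example.invalid",
    "WEAVIATE_API_KEY": "placeholder-key",
    "OPENAI_API_KEY": "placeholder-key",
})
print(example["weaviate_url"])
```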

Usage

Open entry_point.ipynb and run the cells in order.

To do the same from Python:

from src import ingest_tools_file, retrieve_tool_chunks
from src.evaluation.eval import run_evaluation

count = ingest_tools_file("src/tools.py")
print(f"Inserted {count} tool chunks")

hits = retrieve_tool_chunks("I need to search files in a directory", top_k=3)
for item in hits:
    print(item.server, item.tool_name, item.score)

run_evaluation(top_k=3)

Typical ingestion output looks like:

Inserted 121 tool chunks from src/tools.py

Typical evaluation output looks like:

✅ PASS: Copy the entire 'images' folder over to ... -> copy_directory
❌ FAIL: When was the database.sqlite file last m... -> Got: ['select_database', 'ping_database', 'disconnect_database'], Expected: get_file_info
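
The report format above suggests a simple loop: a case passes when the expected tool appears anywhere in the top-K names. The sketch below illustrates that idea and is not the project's run_evaluation. The function evaluate, the stub retriever, and the sample queries are all hypothetical; the 40-character truncation is inferred from the sample output; and the emoji markers are left out to keep the output ASCII-only.

```python
from typing import Callable, List, Tuple


def evaluate(
    cases: List[Tuple[str, str]],
    retrieve: Callable[[str, int], List[str]],
    top_k: int = 3,
) -> Tuple[List[str], float]:
    """Return one report line per case plus the fraction of cases that passed."""
    lines, passed = [], 0
    for query, expected in cases:
        got = retrieve(query, top_k)
        label = query[:40] + "..."  # truncation width inferred from the sample output
        if expected in got:
            passed += 1
            lines.append(f"PASS: {label} -> {expected}")
        else:
            lines.append(f"FAIL: {label} -> Got: {got}, Expected: {expected}")
    return lines, passed / len(cases)


# Stub retriever with canned answers; it stands in for the real Weaviate query.
CANNED = {
    "Duplicate the whole assets directory into the backup location":
        ["copy_directory", "move_file", "list_directory"],
    "Show the last modified time of the notes.txt file":
        ["select_database", "ping_database", "disconnect_database"],
}


def stub_retrieve(query: str, top_k: int) -> List[str]:
    return CANNED[query][:top_k]


cases = [
    ("Duplicate the whole assets directory into the backup location", "copy_directory"),
    ("Show the last modified time of the notes.txt file", "get_file_info"),
]
lines, rate = evaluate(cases, stub_retrieve)
print("\n".join(lines))
print(f"pass rate: {rate:.2f}")
```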

Retrieval Defaults

top_k=3: default retrieval depth used in the notebook and evaluation loop
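alpha=0.65: hybrid search blend between lexical (keyword) and semantic (vector) matching. In Weaviate, alpha=0 is pure keyword search and alpha=1 is pure vector search, so 0.65 leans toward semantic matching
score_threshold: optional minimum-score filter. If no result passes the threshold, the unfiltered top-K results are returned instead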
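
The two less obvious defaults can be pictured with a small sketch. The blend function is only an intuition for what alpha controls: Weaviate performs the actual fusion server-side with its own fusion algorithms, so this weighted sum is a simplification, not its implementation. The apply_threshold helper shows one plausible reading of the fallback rule; both function names, the order of truncation and filtering, and the sample scores are assumptions.

```python
def blend(keyword_score: float, vector_score: float, alpha: float = 0.65) -> float:
    """Weighted sum of two scores already normalized to [0, 1].

    alpha weights the vector score, matching Weaviate's convention that
    alpha=1 means pure vector search.
    """
    return alpha * vector_score + (1.0 - alpha) * keyword_score


def apply_threshold(hits, score_threshold=None, top_k=3):
    """Filter (tool_name, score) pairs, sorted by descending score.

    Falls back to the plain top-k when the threshold removes every hit.
    """
    top = hits[:top_k]
    if score_threshold is None:
        return top
    kept = [hit for hit in top if hit[1] >= score_threshold]
    return kept or top


# Sample hits with invented scores.
hits = [("search_files", 0.82), ("list_directory", 0.64), ("read_file", 0.31)]
print(apply_threshold(hits, score_threshold=0.6))  # keeps the two hits at or above 0.6
print(apply_threshold(hits, score_threshold=0.9))  # nothing passes, so all three are returned
```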
