A lightweight RAG system that retrieves relevant document chunks and generates grounded answers using LLMs.
- Document Chunking: Splits documents into meaningful segments
- Semantic Embeddings: Uses
sentence-transformers/all-MiniLM-L6-v2 - FAISS Vector Search: Fast similarity search
- Grounded Generation: LLM answers strictly based on retrieved content
- Multiple LLM Options: OpenRouter API or Local HuggingFace models
mini-rag/
├── documents/ # Your documents (.txt or .md files)
├── src/
│ ├── chunker.py # Document chunking
│ ├── embedder.py # Embedding generation
│ ├── indexer.py # FAISS vector indexing
│ ├── retriever.py # Semantic retrieval
│ ├── generator.py # Local LLM generator
│ ├── generator_openrouter.py # OpenRouter API generator
│ └── rag_pipeline.py # Main orchestrator
├── main.py # CLI interface
├── requirements.txt
└── README.md
cd mini-rag
# Create virtual environment (recommended)
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
# Install dependencies
pip install -r requirements.txtOpenRouter provides access to various LLMs via API. Free models available!
- Go to https://openrouter.ai/keys
- Sign up / Login
- Create a new API key
- Copy the key
PowerShell:
$env:OPENROUTER_API_KEY = "sk-or-v1-your-key-here"Command Prompt:
set OPENROUTER_API_KEY=sk-or-v1-your-key-hereLinux/Mac:
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"# Single query
python main.py --query "What packages does Indecimal offer?" --use-openrouter
# Interactive mode
python main.py --interactive --use-openrouter
# Use different model
python main.py --query "Your question" --use-openrouter --openrouter-model "meta-llama/llama-3.2-3b-instruct:free"| Model | ID |
|---|---|
| Google Gemma 3n | google/gemma-3n-e2b-it (default) |
| Meta Llama 3.2 3B | meta-llama/llama-3.2-3b-instruct:free |
| Mistral 7B | mistralai/mistral-7b-instruct:free |
| Microsoft Phi-3 Mini | microsoft/phi-3-mini-128k-instruct:free |
Run LLMs locally on your machine. Requires more RAM (~4GB+).
# Single query (uses TinyLlama by default)
python main.py --query "What packages does Indecimal offer?" --use-local-llm
# Interactive mode
python main.py --interactive --use-local-llm
# Use different local model
python main.py --query "Your question" --use-local-llm --llm-model "microsoft/phi-2"| Model | Size | RAM Required |
|---|---|---|
| TinyLlama-1.1B-Chat (default) | 1.1B | ~3GB |
| Microsoft Phi-2 | 2.7B | ~6GB |
| Mistral-7B-Instruct | 7B | ~16GB |
The first run will download the model (~2-4GB). This only happens once.
For testing without LLM - returns raw retrieved context.
python main.py --query "What packages does Indecimal offer?"
python main.py --interactive| Flag | Description |
|---|---|
-q, --query |
Question to ask |
-i, --interactive |
Interactive mode |
-d, --docs |
Documents directory (default: documents) |
-k, --top-k |
Number of chunks to retrieve (default: 3) |
--use-openrouter |
Use OpenRouter API |
--openrouter-model |
OpenRouter model ID |
--openrouter-key |
API key (or use env var) |
--use-local-llm |
Use local HuggingFace model |
--llm-model |
Local model name |
--chunk-size |
Chunk size in characters (default: 500) |
======================================================================
QUERY: What packages does Indecimal offer?
======================================================================
RETRIEVED CONTEXT (Top 3 chunks):
----------------------------------------------------------------------
[1] Source: doc2.md (Score: 0.85)
Package Pricing: Essential ₹1,851/sqft, Premier ₹1,995/sqft...
[2] Source: doc1.md (Score: 0.78)
Indecimal provides end-to-end home construction support...
======================================================================
GENERATED ANSWER:
======================================================================
Based on the provided documents, Indecimal offers four packages:
1. Essential: ₹1,851/sqft
2. Premier (Most Popular): ₹1,995/sqft
3. Infinia: ₹2,250/sqft
4. Pinnacle: ₹2,450/sqft
----------------------------------------------------------------------
⏱ Retrieval: 25.3ms | Generation: 1234.5ms
----------------------------------------------------------------------
- Place
.txtor.mdfiles in thedocuments/folder - Run the pipeline - documents are automatically indexed
python main.py --query "Your question about your documents" --use-openrouterMIT License