This is a beginner-friendly project that demonstrates the core concepts of a Retrieval-Augmented Generation (RAG) system in a creative setting: recommending quotes. Users can search for quotes relevant to their query and receive a generated explanation from one of OpenAI's GPT models.
The goal of this project is to put into practice my understanding of how retrieval and generation work together in a RAG system, without needing complex infrastructure such as a vector database.
✔ Load a dataset of quotes, authors, and tags
✔ Embed quotes using a local sentence-transformer model
✔ Store and reuse embeddings efficiently in a JSON file
✔ Retrieve the most similar quotes based on cosine similarity
✔ Use an OpenAI API model (gpt-4o-mini) to generate explanations or insights based on retrieved quotes
✔ Modular design ready for scaling later with tools like FAISS or Chroma DB
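
The "store and reuse embeddings in a JSON file" step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `load_or_build_embeddings`, the `embed_fn` parameter, and the `embeddings.json` file name are all assumptions made for the example. The embedding function is passed in so the caching logic stays independent of any particular model.

```python
import json
import os
from typing import Callable, Dict, List

Vector = List[float]


def load_or_build_embeddings(
    quotes: List[str],
    embed_fn: Callable[[List[str]], List[Vector]],
    cache_path: str = "embeddings.json",  # hypothetical file name
) -> Dict[str, Vector]:
    """Return a quote -> vector mapping, embedding only quotes missing from the cache."""
    cache: Dict[str, Vector] = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    # Deduplicate while preserving order, then keep only uncached quotes.
    missing = [q for q in dict.fromkeys(quotes) if q not in cache]
    if missing:
        for quote, vector in zip(missing, embed_fn(missing)):
            # JSON cannot store array objects, so convert to plain floats.
            cache[quote] = [float(x) for x in vector]
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f)

    return {q: cache[q] for q in quotes}
```

With sentence-transformers, `embed_fn` could be the `encode` method of a loaded `SentenceTransformer` model; which model the project loads is not stated here, so that choice is left open.
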
The dataset used here contains:
- quote: The text of the quote
- author: The person who said it
- tags: Categories or themes related to the quote
Link: https://huggingface.co/datasets/Abirate/english_quotes
- Embedding: Each quote is converted into a vector using sentence-transformers to capture its meaning.
- Retrieval: Given a user query, the system finds the most similar quotes by comparing embeddings using cosine similarity.
- Generation: Retrieved quotes are aggregated and passed to OpenAI’s GPT model to generate a context-aware explanation or response.
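
The retrieval and aggregation steps above can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the project uses scikit-learn for similarity, while this version computes cosine similarity by hand so it has no dependencies. The names `top_k_quotes` and `build_prompt`, the default `k=3`, and the prompt wording are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_quotes(
    query_vec: Sequence[float],
    quote_vecs: Sequence[Sequence[float]],
    quotes: Sequence[str],
    k: int = 3,  # hypothetical default
) -> List[Tuple[str, float]]:
    """Return the k quotes most similar to the query, best first."""
    scored = [
        (quote, cosine_similarity(query_vec, vec))
        for quote, vec in zip(quotes, quote_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, retrieved: List[Tuple[str, float]]) -> str:
    """Aggregate retrieved quotes into one prompt string (wording is illustrative)."""
    context = "\n".join(f"- {quote}" for quote, _ in retrieved)
    return (
        f"User query: {query}\n"
        f"Relevant quotes:\n{context}\n"
        "Explain briefly why these quotes fit the query."
    )
```

The string returned by `build_prompt` is what would be sent to the GPT model as the user message; the API call itself is omitted here.
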
- Python
- Streamlit – Web interface
- Sentence Transformers – Semantic embeddings
- scikit-learn – Similarity calculations
- OpenAI GPT (optional) – Enhanced recommendations and explanations
- JSON – Data storage and export
- Clone the repository in Colab.
- Install the required packages listed in requirements.txt.
- Add your OpenAI API key to Colab secrets.
- Run the project to get quote recommendations for a user query.
- Alternatively, clone the repo on your own system and install the required packages.
- In a terminal (macOS/Linux shell syntax shown), set your API key as an environment variable: export OPENAI_API_KEY="your_openai_api_key"
- Run the app.py file, which contains the app interface: streamlit run app.py
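
For the local setup, the app needs to read the key that was exported as `OPENAI_API_KEY`. A minimal sketch of that lookup is shown below; the helper name `get_openai_api_key` and its error message are assumptions for illustration, and how app.py actually reads the key is not shown in this README.

```python
import os
from typing import Mapping, Optional


def get_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read OPENAI_API_KEY from the environment and fail early if it is missing."""
    # Allow a custom mapping to be passed in, which makes the helper easy to test.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the app."
        )
    return key
```

Failing early with a clear message avoids a less obvious authentication error later, when the first generation request is made.
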