A modular collection of AI-powered tools for personal information management and research. This suite includes a Personal RAG System for querying personal documents, a Web Search Agent for real-time information retrieval, and a Hybrid RAG System that combines the two.
web_and_rag/
├── agent_web_search/
│ ├── agent_web_search.py # Web search agent implementation
│ └── README.md # Agent Web Search documentation
├── personal-rag-system/
│ ├── me/
│ │ └── summary.txt # Personal introduction and background
│ ├── rag_multi-docs.py # Main RAG system implementation
│ └── README.md # Personal RAG System documentation
├── hybrid-rag-system/
│ ├── hybrid_rag.py # Hybrid system combining both approaches
│ └── README.md # Hybrid RAG System documentation
├── requirements.txt # Python dependencies
├── .env # Environment variables (create this)
└── README.md # This file
This modular AI suite provides three complementary approaches to information retrieval and question answering:
Personal RAG System: Query your personal documents using Retrieval-Augmented Generation (RAG). Well suited to answering questions about your background, experience, and personal information stored locally.
Use Cases:
- Interview preparation
- Personal information lookup
- Background summaries
- Skills and experience queries
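To make the retrieval step concrete, here is a minimal sketch of how a question can be matched against document chunks before an LLM is asked to answer. It is not the code in rag_multi-docs.py: it assumes plain word-count vectors and cosine similarity as a dependency-free stand-in for OpenAI embeddings and ChromaDB, and the function names and sample chunks are placeholders.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count word occurrences."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: List[str]) -> str:
    """Assemble the prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Placeholder chunks standing in for the contents of me/summary.txt.
chunks = [
    "Education: BSc in Computer Science.",
    "Skills: Python, SQL, and cloud deployment.",
    "Hobbies: hiking and photography.",
]
top = retrieve("What are my main technical skills?", chunks, top_k=1)
print(build_prompt("What are my main technical skills?", top))
```

Swapping the word-count vectors for real embeddings changes the similarity quality but not the overall retrieve-then-prompt shape.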
Web Search Agent: Real-time web search agent that finds current information online using DuckDuckGo. Ideal for research and staying up to date with the latest developments.
Use Cases:
- Current news and trends
- Research assistance
- Real-time information lookup
- Market research
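As a rough illustration of the search step, the sketch below calls the duckduckgo-search package listed in requirements.txt and formats the hits for inclusion in an LLM prompt. The helper names search_web and format_results are assumptions, not taken from agent_web_search.py, and the sample hit is placeholder data.

```python
from typing import Dict, List


def search_web(query: str, max_results: int = 5, region: str = "wt-wt") -> List[Dict[str, str]]:
    """Run a DuckDuckGo text search (needs network access and duckduckgo-search)."""
    from duckduckgo_search import DDGS  # imported lazily so the formatter works offline

    return list(DDGS().text(query, region=region, max_results=max_results))


def format_results(results: List[Dict[str, str]]) -> str:
    """Turn search hits into a numbered block that an LLM prompt can include."""
    lines = []
    for i, hit in enumerate(results, start=1):
        lines.append(f"{i}. {hit.get('title', '')} ({hit.get('href', '')})")
        lines.append(f"   {hit.get('body', '')}")
    return "\n".join(lines)


# Placeholder hit in the shape DDGS().text() returns (title, href, body).
sample = [{"title": "Example", "href": "https://example.com", "body": "Snippet text."}]
print(format_results(sample))
```

The max_results and region arguments are the natural place to adjust result count and search region.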
Hybrid RAG System: Combines local personal documents and web search to provide comprehensive answers. Automatically determines when to use local data, web search, or both.
Use Cases:
- Complex queries requiring both personal and public information
- Professional research with personal context
- Comprehensive question answering
- Adaptive information retrieval
- Independent Operation: Each system works standalone for specific use cases
- Hybrid Intelligence: The hybrid system automatically routes queries to the most appropriate source(s)
- Modular Design: Use any combination based on your needs
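How the hybrid system decides between sources is not spelled out here, so the following is only one possible sketch of query routing. It assumes a simple keyword heuristic; the keyword lists and the route function are hypothetical, and the real hybrid_rag.py may well use a different approach, such as asking the LLM to classify the query.

```python
import re

# Hypothetical keyword lists; tune these for your own documents and queries.
PERSONAL_WORDS = {"my", "me", "i", "mine"}
WEB_WORDS = {"latest", "current", "recent", "news", "today", "trends", "ceo"}


def route(query: str) -> str:
    """Return 'local', 'web', or 'both' based on which keyword sets match."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    personal = bool(words & PERSONAL_WORDS)
    web = bool(words & WEB_WORDS)
    if personal and web:
        return "both"
    if personal:
        return "local"
    # Nothing personal detected: treat the question as public information.
    return "web"


for q in (
    "Tell me about my background",
    "Who is the CEO of OpenAI and what is their background?",
    "What are my skills and current AI trends?",
):
    print(route(q), "<-", q)
```

A heuristic like this is cheap and predictable, but it will misroute queries that do not contain any of the listed words, which is why a fallback to web search is used here.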
- Python 3.8+
- OpenAI API key
- Clone the repository:
    git clone <repository-url>
    cd agent_web_rag
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    # Create .env file in project root
    echo "OPENAI_API_KEY=your_openai_api_key_here" > .env
- Set up Personal RAG System:
    cd personal-rag-system
    mkdir -p me
    # Add your summary.txt (and optionally other documents) to the me/ folder
    python rag_multi-docs.py
- Try Web Search Agent:
    cd ../agent_web_search
    python agent_web_search.py
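The environment-variable step above writes the key to .env. In practice python-dotenv's load_dotenv() is the usual way to read that file; the sketch below is a dependency-free stand-in showing roughly what happens, with hypothetical helper names load_env_file and require_api_key.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY not found; create a .env file in the project root")
    return key
```

Calling require_api_key() at the top of a script fails fast with a readable message instead of an authentication error later on.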
All required packages are listed in the shared requirements.txt:
# Core AI/ML Dependencies
openai>=1.12.0
chromadb>=0.4.22
python-dotenv>=1.0.0
# RAG System Dependencies
PyPDF2>=3.0.1
pypdf>=4.0.1
pdfplumber>=0.10.0
# Web Search Dependencies
duckduckgo-search>=6.0.0
agents>=0.1.0
# Optional: Development Tools
pytest>=7.4.0
black>=23.0.0
flake8>=6.0.0
Navigate to the personal-rag-system/ folder and run:
cd personal-rag-system
python rag_multi-docs.py
Sample Questions:
- "Tell me about my background"
- "What are my main technical skills?"
- "What is my education background?"
Navigate to the agent_web_search/ folder and run:
cd agent_web_search
python agent_web_search.py
Sample Queries:
- Latest developments in AI
- Current market trends
- Recent news about specific topics
Navigate to the hybrid-rag-system/ folder and run:
cd hybrid-rag-system
python hybrid_rag.py
Sample Queries:
- "Who is the CEO of OpenAI and what is their background?" (triggers web search)
- "Tell me about my background" (uses local documents)
- "What are my skills and current AI trends?" (combines both sources)
Create a .env file in the project root:
OPENAI_API_KEY=sk-your-openai-api-key-here
Each system can be customized independently:
- RAG System: Modify document processing, chunking, or response generation
- Web Agent: Adjust search parameters, result count, or search regions
Personal RAG System:
- Setup: ~$0.001 (one-time document embedding)
- Per Query: ~$0.005 (embeddings + LLM response)
Web Search Agent:
- Per Query: ~$0.01-0.02 (LLM processing of search results)
Hybrid RAG System:
- Local Only Queries: Same as RAG system (~$0.005)
- Hybrid Queries: RAG + Web costs (~$0.015-0.025)
- Smart Routing: Minimizes unnecessary web searches
Monthly Estimate: $1-15 for regular personal use across all systems
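To see how the per-query figures add up to a monthly number, here is a small calculator sketch. The cost ranges are the estimates quoted above; the daily query mix is a hypothetical example, not measured usage.

```python
# Per-query cost ranges in USD, taken from the estimates above.
COST_PER_QUERY = {
    "rag": (0.005, 0.005),
    "web": (0.01, 0.02),
    "hybrid": (0.015, 0.025),
}


def monthly_cost(queries_per_day: dict, days: int = 30) -> tuple:
    """Return (low, high) monthly cost in USD for a daily query mix."""
    low = sum(COST_PER_QUERY[k][0] * n * days for k, n in queries_per_day.items())
    high = sum(COST_PER_QUERY[k][1] * n * days for k, n in queries_per_day.items())
    return round(low, 2), round(high, 2)


# Hypothetical usage: 10 RAG, 5 web, and 5 hybrid queries per day.
low, high = monthly_cost({"rag": 10, "web": 5, "hybrid": 5})
print(f"${low:.2f} - ${high:.2f} per month")
```

Adjust the query mix to your own habits to check whether you land inside the monthly range quoted above.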
- Use gpt-4o-mini instead of gpt-4o for a cost reduction of about 90%
- Implement response caching for frequent queries
- Adjust max_tokens to control response length
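Response caching can be as simple as remembering answers by normalized query text. The sketch below assumes an in-memory dictionary and a hypothetical CachedAnswerer wrapper; fake_llm stands in for the real OpenAI call, which is not shown.

```python
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Collapse case and whitespace so trivially different queries share a key."""
    return " ".join(query.lower().split())


class CachedAnswerer:
    """Wraps an answer function and reuses results for repeated queries."""

    def __init__(self, answer_fn: Callable[[str], str]):
        self._answer_fn = answer_fn
        self._cache: Dict[str, str] = {}
        self.misses = 0

    def ask(self, query: str) -> str:
        key = normalize(query)
        if key not in self._cache:
            self.misses += 1  # only cache misses would cost an API call
            self._cache[key] = self._answer_fn(query)
        return self._cache[key]


# Stand-in for the real LLM call, which is not shown here.
def fake_llm(query: str) -> str:
    return f"answer to: {query}"


bot = CachedAnswerer(fake_llm)
bot.ask("Tell me about my background")
bot.ask("  tell me about MY background ")
print("API calls made:", bot.misses)
```

An in-memory cache is lost when the script exits; persisting it to disk would be a natural next step if the same questions recur across sessions.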
✅ Modular Design: Each system is independent and can be developed separately
✅ Shared Dependencies: Common packages in single requirements.txt
✅ Easy Maintenance: Update or modify systems without affecting others
✅ Scalable: Easy to add new AI agents or tools
✅ Flexible Usage: Use individual systems or combine via hybrid approach
✅ Smart Routing: Hybrid system automatically chooses optimal sources
- Create new folder: new-agent/ in the agent_web_rag/ directory
- Add implementation file and README.md
- Update main README.md with new component info
- Add any new dependencies to shared requirements.txt
API Key Not Found
# Ensure .env file is in project root
echo "OPENAI_API_KEY=your_key_here" > .env
Module Not Found
# Install all dependencies
pip install -r requirements.txt
RAG System: Documents Not Found
# Create and populate documents folder
cd personal-rag-system
mkdir -p me
# Add summary.txt (and optional documents)
Web Search: No Results
# Check internet connection and try again
# DuckDuckGo search may have rate limits
- Personal Documents: Stored locally; document text leaves your machine only when it is sent to the OpenAI API for embedding and answer generation
- API Keys: Stored in local .env file, not committed to version control
- Web Searches: Performed through DuckDuckGo (privacy-focused)
- Data Processing: All processing happens locally or through OpenAI API
RAG System:
- Use persistent ChromaDB storage for faster startup
- Implement document caching
- Optimize chunk sizes for your content
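Chunk size is easiest to tune when you can see its effect. The sketch below assumes fixed-size character windows with overlap, which may differ from how rag_multi-docs.py actually splits documents; chunk_text and its defaults are hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters of placeholder text
for size in (10, 20):
    pieces = chunk_text(sample, chunk_size=size, overlap=5)
    print(f"chunk_size={size}: {len(pieces)} chunks")
```

Smaller chunks give more precise matches but less context per match, so it is worth trying a few sizes against your own documents.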
Web Agent:
- Adjust search result count based on needs
- Implement result caching for frequent queries
- Use specific search queries for better results
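Web results go stale, so a cache for them usually needs an expiry. Here is a minimal sketch of a time-to-live cache; the TTLCache class is hypothetical, and the injectable clock exists only to make the behavior easy to test.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Caches values per key and recomputes them after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the search
        value = compute()
        self._store[key] = (now, value)
        return value


# Usage sketch: the lambda stands in for a real search call.
cache = TTLCache(ttl_seconds=600)
first = cache.get_or_compute("ai news", lambda: ["placeholder result"])
second = cache.get_or_compute("ai news", lambda: ["a different result"])
print(first == second)
```

Besides saving LLM cost, reusing recent results also reduces the number of requests sent to DuckDuckGo, which helps with rate limits.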
- Fork the repository
- Create feature branch: git checkout -b feature-name
- Make changes in the appropriate module folder
- Test each system independently
- Update relevant documentation
- Submit pull request
- Follow PEP 8 guidelines
- Add docstrings to new functions
- Include error handling
- Update README files for any new features
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing powerful AI models
- ChromaDB for an excellent vector database
- DuckDuckGo for privacy-focused web search
- Python community for a robust ecosystem
For questions or issues:
- Check the individual README files in each module
- Review troubleshooting section above
- Check OpenAI API documentation
- Create an issue in the repository
Built with ❤️ using cutting-edge AI technology for personal productivity