TEJAS-SAI-PRASHAD-K/PDFInsight

📄 PDFInsight

An AI-powered, local-first system for extracting insights from PDF documents using Retrieval-Augmented Generation (RAG). It loads documents, splits them into chunks, generates embeddings with sentence-transformers, and runs semantic search over a vector store (e.g., FAISS or Chroma). A FastAPI backend is available for interactive queries and integration with other tools.


🧠 Features

  • 📂 Load and parse PDF or text-based documents
  • 🧼 Preprocess and chunk documents for optimal embedding
  • 🔎 Semantic search using vector similarity (e.g., FAISS)
  • 🧠 Sentence-transformer-based embedding generation
  • 🔄 Retrieval-Augmented Generation (RAG) engine
  • 🚀 FastAPI backend for RESTful document insight queries
  • 🛠️ Modular, clean, and extensible codebase
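
The chunking step listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `document_processor.py`: the function name `chunk_text`, the character-based windows, and the default `chunk_size` and `overlap` values are all assumptions made for the example.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    # Collapse runs of whitespace so chunk lengths are comparable across pages.
    cleaned = " ".join(text.split())

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(cleaned), step):
        chunks.append(cleaned[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(cleaned):
            break
    return chunks
```

Overlapping windows mean a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.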

🗂 Project Structure

pdfinsight/
├── app/
│   ├── main.py                   # Application entry point (e.g., FastAPI setup)
│   │
│   ├── loaders/
│   │   └── document_loader.py    # Load PDF/text files into memory
│   │
│   ├── processors/
│   │   └── document_processor.py # Clean and split text into chunks
│   │
│   ├── embeddings/
│   │   └── embedding_service.py  # Generate vector embeddings for text chunks
│   │
│   ├── vectorstores/
│   │   └── vector_store.py       # Store and query embedding vectors using FAISS or similar
│   │
│   └── engines/
│       └── rag_engine.py         # Perform RAG (retrieve + generate)
│
├── requirements.txt              # Python dependencies
├── README.md                     # Project documentation (this file)
└── .gitignore                    # Git ignored files
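
To make the "retrieve" half of the retrieve + generate flow concrete, here is a small pure-Python sketch of what the vector store and RAG engine do conceptually. It is an assumption-laden stand-in, not the project's implementation: the names `cosine_similarity`, `top_k`, and `build_prompt` are hypothetical, the brute-force search replaces a real index such as FAISS, and the prompt wording is invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    # Highest similarity first; ties broken by original chunk order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def build_prompt(question, chunks):
    """Assemble retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(f"[{n}] {c}" for n, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

In a full pipeline the vectors would come from the embedding service, and the string returned by `build_prompt` would be passed to whichever language model handles the "generate" step.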

🔧 Tech Stack

  • Python 3.9+
  • pdfplumber (for PDF parsing)
  • langchain
  • sentence-transformers
  • Chroma (for vector search)
  • FastAPI + Uvicorn (for API layer)

🪪 License

This project is licensed under the MIT License.


🤝 Contributing

See CONTRIBUTING.md for guidelines.


📬 Contact

For questions, suggestions, or feedback, open an issue or contact @TEJAS-SAI-PRASHAD-K on GitHub.

