AI Web Scraper

An AI-powered web scraper that extracts website content using Selenium and processes it with AI models like Gemini to generate insights based on user prompts. The application features an interactive Streamlit frontend for real-time content analysis.

Features

- Extracts website content using Selenium
- Cleans and structures scraped data with BeautifulSoup
- Uses the Gemini API for AI-driven insights and analysis
- Accepts a user-provided URL and prompt for customized scraping
- Displays results in a Streamlit-based interactive UI
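
The cleaning step listed above can be illustrated without the project's own code. The sketch below is not taken from this repository: it uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup, and the names `TextExtractor` and `clean_html` are hypothetical. It skips `<script>` and `<style>` contents and collapses runs of whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse internal whitespace and ignore whitespace-only fragments.
        collapsed = " ".join(data.split())
        if self._skip_depth == 0 and collapsed:
            self._chunks.append(collapsed)

    def text(self):
        return "\n".join(self._chunks)


def clean_html(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.text()
```

With BeautifulSoup installed, the same idea is usually expressed by removing the unwanted tags and then extracting the remaining text; the stand-in above only shows the shape of the step.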

Tech Stack

- Backend: Python, Selenium, BeautifulSoup
- AI Processing: Gemini API, LangChain, OpenAI
- Frontend: Streamlit, HTML, CSS, JavaScript

Installation

Clone the Repository

```
git clone https://github.com/yourusername/ai-web-scraper.git
cd ai-web-scraper
```

Create and Activate Virtual Environment

```
python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install Dependencies
```
pip install -r requirements.txt
```
Set Up API Key
- Create a `.env` file in the project directory and add your Gemini API key:
```
GEMINI_API_KEY=your_api_key_here
```
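
How the application reads this key is not shown in this README. A common choice is the `python-dotenv` package; the sketch below instead parses the file with only the standard library so that it runs without extra dependencies. The helper names `load_env_file` and `get_gemini_api_key` are hypothetical, and only simple `KEY=value` lines are handled.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error from the API later on.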
Run the Application
```
streamlit run app.py
```

Usage

1. Open the Streamlit web UI at the local URL that `streamlit run` prints when it starts.
2. Enter a website URL and a prompt describing the data you want to extract.
3. Click the "Scrape" button to retrieve and process the data.
4. View the extracted content and AI-generated insights.

Example Prompt

Extract all product names and prices from this e-commerce website.
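
How the application combines the scraped text with a prompt like this is not documented here. A minimal sketch, assuming the cleaned page text is truncated to a character budget and placed ahead of the user's instruction; `build_prompt`, `max_chars`, and the marker strings are hypothetical, and the resulting string would be passed to whichever Gemini client the project uses.

```python
def build_prompt(page_text: str, user_prompt: str, max_chars: int = 6000) -> str:
    """Combine scraped text and the user's instruction into one model prompt."""
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")
    # Truncate long pages so the request stays within a fixed size budget.
    excerpt = page_text.strip()[:max_chars]
    return (
        "You are extracting information from scraped website content.\n"
        "Use only the content between the markers below.\n\n"
        "--- CONTENT START ---\n"
        f"{excerpt}\n"
        "--- CONTENT END ---\n\n"
        f"Task: {user_prompt.strip()}"
    )
```

Keeping the page content between explicit markers makes it easier for the model to tell the scraped text apart from the instruction, which matters when a page itself contains instruction-like sentences.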

Future Enhancements

- Support for multiple AI models (e.g., OpenAI, Groq, Ollama)
- Improved data visualization in Streamlit
- Integration with databases for storing scraped content

License

This project is licensed under the MIT License.

Contributing

Pull requests are welcome! Feel free to submit issues or feature requests.

Contact

For any inquiries, reach out via [your email] or open an issue on GitHub.

