Screenshot Voice Assistant

Project Title Placeholder

A Python-based desktop assistant that captures your screen, listens for your voice commands, uses OpenAI's Whisper for transcription and GPT-4o for understanding the command in the context of the screen content, and responds verbally using OpenAI's TTS.

Features

Captures desktop screenshots in real-time using a dedicated thread.
Listens for voice input via microphone.
Uses OpenAI Whisper for accurate speech-to-text transcription.
Leverages LangChain to manage conversation history and integrate screen content with prompts.
Utilizes OpenAI GPT-4o for multimodal understanding of the user's prompt and the current screenshot.
Provides spoken responses using OpenAI Text-to-Speech (TTS).
Displays the current screenshot in a window.
Clean shutdown of resources (screenshot thread, audio listener, OpenCV windows).

Requirements

Python 3.7+
An OpenAI API Key.
Necessary system dependencies for PyAudio (often PortAudio) and opencv-python.

All required Python packages are listed in requirements.txt.

Setup

Clone the Repository:

git clone <your-repo-url>
cd <your-repo-directory>

Create a Virtual Environment (Recommended):

# For Windows PowerShell
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# For Windows Command Prompt
python -m venv .venv
.venv\Scripts\activate.bat

# For macOS/Linux
python3 -m venv .venv
source .venv/bin/activate

Install Dependencies: With the virtual environment activated:
```
pip install -r requirements.txt
```
Set up OpenAI API Key: Create a file named .env in the root of your project directory (the same place as screen_asis.py and requirements.txt). Add your OpenAI API key to this file:
```
OPENAI_API_KEY='your-api-key-here'
```
Replace 'your-api-key-here' with your actual OpenAI API Key. Do not commit this file to GitHub. Add .env to your .gitignore file.
Install PortAudio (for PyAudio): PyAudio requires the PortAudio library to be installed on your system.
- Windows: Pre-built wheels often include PortAudio, so pip install PyAudio might work directly. If not, you might need to install it separately or use a different pyaudio wheel.
- macOS (using Homebrew): brew install portaudio
- Linux (Debian/Ubuntu): sudo apt-get install portaudio19-dev
- Linux (Fedora): sudo dnf install portaudio-devel

How to Run

With your virtual environment activated, run the main script:

python screen_asis.py

👨‍💻 Developed By

Ahmed Zeyad Tareq

📌 Data Scientist & AI Developer | 🎓 Master of AI Engineering

📞 WhatsApp: +905533333587
GitHub | LinkedIn | Kaggle

📄 License

🌟 Support If you like this project, give it a ⭐ on GitHub and share Got ideas for improvements? Feel free to open a Pull Request or create an Issue. 🚀

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
screenshot_assistant.py		screenshot_assistant.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Screenshot Voice Assistant

Project Title Placeholder

Features

Requirements

Setup

How to Run

👨‍💻 Developed By

Ahmed Zeyad Tareq

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Screenshot Voice Assistant

Project Title Placeholder

Features

Requirements

Setup

How to Run

👨‍💻 Developed By

Ahmed Zeyad Tareq

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages