A Python-based desktop assistant that captures your screen, listens for your voice commands, uses OpenAI's Whisper for transcription and GPT-4o for understanding the command in the context of the screen content, and responds verbally using OpenAI's TTS.
- Captures desktop screenshots in real-time using a dedicated thread.
- Listens for voice input via microphone.
- Uses OpenAI Whisper for accurate speech-to-text transcription.
- Leverages LangChain to manage conversation history and integrate screen content with prompts.
- Utilizes OpenAI GPT-4o for multimodal understanding of the user's prompt and the current screenshot.
- Provides spoken responses using OpenAI Text-to-Speech (TTS).
- Displays the current screenshot in a window.
- Clean shutdown of resources (screenshot thread, audio listener, OpenCV windows).
- Python 3.7+
- An OpenAI API Key.
- Necessary system dependencies for
PyAudio(often PortAudio) andopencv-python.
All required Python packages are listed in requirements.txt.
-
Clone the Repository:
git clone <your-repo-url> cd <your-repo-directory>
-
Create a Virtual Environment (Recommended):
# For Windows PowerShell python -m venv .venv .\.venv\Scripts\Activate.ps1 # For Windows Command Prompt python -m venv .venv .venv\Scripts\activate.bat # For macOS/Linux python3 -m venv .venv source .venv/bin/activate
-
Install Dependencies: With the virtual environment activated:
pip install -r requirements.txt
-
Set up OpenAI API Key: Create a file named
.envin the root of your project directory (the same place asscreen_asis.pyandrequirements.txt). Add your OpenAI API key to this file:OPENAI_API_KEY='your-api-key-here'
Replace
'your-api-key-here'with your actual OpenAI API Key. Do not commit this file to GitHub. Add.envto your.gitignorefile. -
Install PortAudio (for PyAudio):
PyAudiorequires the PortAudio library to be installed on your system.- Windows: Pre-built wheels often include PortAudio, so
pip install PyAudiomight work directly. If not, you might need to install it separately or use a differentpyaudiowheel. - macOS (using Homebrew):
brew install portaudio - Linux (Debian/Ubuntu):
sudo apt-get install portaudio19-dev - Linux (Fedora):
sudo dnf install portaudio-devel
- Windows: Pre-built wheels often include PortAudio, so
With your virtual environment activated, run the main script:
python screen_asis.py📌 Data Scientist & AI Developer | 🎓 Master of AI Engineering
MIT License © Ahmed Zeyad Tareq
🌟 Support If you like this project, give it a ⭐ on GitHub and share Got ideas for improvements? Feel free to open a Pull Request or create an Issue. 🚀