pip install fastapi uvicorn websockets sounddevice numpy scipy
pip install pyautogui pytesseract elevenlabs openai httpx pillow
ollama --version # Check Ollama is installed
Tesseract OCR (for read_screen tool):
- Download: https://github.com/UB-Mannheim/tesseract/wiki
- Install to default path: C:\Program Files\Tesseract-OCR\tesseract.exe
- Vision auto-detects common Windows install paths; if you use a custom location, add it to PATH or set TESSERACT_CMD
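The lookup order described above (explicit TESSERACT_CMD, then PATH, then common install locations) can be sketched roughly as follows. This is an illustration only: `find_tesseract` and the candidate list are hypothetical names, and Vision's actual detection logic may differ.

```python
import os
import shutil
from pathlib import Path

# Hypothetical candidate list; Vision's real auto-detection list may differ.
COMMON_WINDOWS_PATHS = [
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(env=None, candidates=COMMON_WINDOWS_PATHS):
    """Return a path to the tesseract executable, or None if not found."""
    env = os.environ if env is None else env

    # 1. An explicit TESSERACT_CMD override wins if it points at a real file.
    override = env.get("TESSERACT_CMD")
    if override and Path(override).is_file():
        return override

    # 2. Otherwise use whatever "tesseract" resolves to on PATH.
    on_path = shutil.which("tesseract", path=env.get("PATH"))
    if on_path:
        return on_path

    # 3. Fall back to well-known install locations.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_tesseract() or "Tesseract not found")
```

If this prints "Tesseract not found", setting TESSERACT_CMD to the full path of tesseract.exe is the most direct fix.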
# Install: https://ollama.ai
ollama pull qwen2.5:0.5b # lightweight local model
ollama pull llama3.2:3b # stronger local general model
ollama serve # starts on port 11434
If your local Ollama store lives outside the default location, set OLLAMA_MODELS before starting the server.
Example used in this project:
set OLLAMA_MODELS=F:\models
Create .env or set in your shell / launch bat:
ELEVENLABS_API_KEY=sk_5f2c93b54654c98... # Required for STT + TTS
OPENAI_API_KEY=sk-... # Optional: OpenAI provider
GITHUB_TOKEN=ghp_... # Optional: GitHub Copilot provider
ANTHROPIC_API_KEY=sk-ant-... # Optional: Anthropic provider
DEEPSEEK_API_KEY=sk-... # Optional: DeepSeek provider
GROQ_API_KEY=gsk_... # Optional: Groq provider + fast Whisper fallback
MISTRAL_API_KEY=... # Optional: Mistral provider
GEMINI_API_KEY=... # Optional: Google Gemini provider
XAI_API_KEY=xai-... # Optional: xAI provider
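To see at a glance which providers a given .env would enable, a small checker along these lines may help. It is a sketch, not how Vision loads its settings: `parse_env_text` and `check_keys` are hypothetical helpers, and the key names are simply copied from the list above.

```python
import os

# Key names copied from the list above; only ELEVENLABS_API_KEY is marked required.
REQUIRED_KEYS = ["ELEVENLABS_API_KEY"]
OPTIONAL_KEYS = [
    "OPENAI_API_KEY", "GITHUB_TOKEN", "ANTHROPIC_API_KEY", "DEEPSEEK_API_KEY",
    "GROQ_API_KEY", "MISTRAL_API_KEY", "GEMINI_API_KEY", "XAI_API_KEY",
]


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blanks, full-line comments, and trailing ' # ...' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop an inline comment, then surrounding whitespace and quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def check_keys(values):
    """Return (missing_required, enabled_optional) for a mapping of settings."""
    missing = [k for k in REQUIRED_KEYS if not values.get(k)]
    enabled = [k for k in OPTIONAL_KEYS if values.get(k)]
    return missing, enabled


if __name__ == "__main__":
    # Values already set in the shell take precedence over the .env file.
    file_values = {}
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            file_values = parse_env_text(fh.read())
    merged = {**file_values, **os.environ}
    missing, enabled = check_keys(merged)
    print("Missing required:", missing or "none")
    print("Optional providers enabled:", enabled or "none")
```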
cd C:\project\vision
set ELEVENLABS_API_KEY=sk_...
set ELEVENLABS_WIDGET_AGENT_ID=agent_0701knwqnqy9e1aa3a3drdh30cva
python live_chat_app.py
Or double-click Live Chat on desktop.
Browser opens at http://localhost:8765.
The UI includes an ElevenLabs widget in the AGENT panel for the fastest browser voice path.
Requirements:
- ELEVENLABS_API_KEY must be valid for syncing tools/prompt to ElevenLabs
- ELEVENLABS_WIDGET_AGENT_ID should point at your public widget agent
- That agent must have authentication disabled in ElevenLabs widget settings
Sync the live Vision tool set and operator prompt into the widget agent:
python setup_el_agent_tools.py
After launch:
- Open http://localhost:8765
- Open the AGENT panel
- Click SHOW / HIDE WIDGET
- Start talking to the ElevenLabs widget
The page injects a runtime operator prompt and registers Vision client tools so widget-triggered actions can execute through the local Vision backend.
OpenClaw provides a unified control plane for agents, channels (Slack, Teams, WhatsApp, etc.), and multi-agent orchestration.
Windows (PowerShell)
iwr -useb https://openclaw.ai/install.ps1 | iex
macOS / Linux / WSL2
curl -fsSL https://openclaw.ai/install.sh | bash
Verify:
node --version # Should be 24+ or 22.14+
openclaw --version # Should be 2026.4.9 or later
openclaw onboard --install-daemon
The wizard walks you through:
- Model provider selection (Anthropic, OpenAI, Google, etc.)
- API key configuration
- Gateway settings (port 18789 is default)
- Daemon/service install (Windows: Scheduled Task + Startup folder)
openclaw onboard --non-interactive --accept-risk --install-daemon
Note: Requires an API key env var to be pre-set (e.g., OPENAI_API_KEY=sk-...).
openclaw gateway status
You should see:
Service: Scheduled Task (registered)
Listening: 127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
RPC probe: ok
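For scripting, a plain TCP check against the default gateway address can confirm that something is listening before Vision starts. This is a minimal sketch with a hypothetical helper name; it only tests that the port accepts connections and is not a substitute for the RPC probe that openclaw gateway status performs.

```python
import socket


def gateway_is_listening(host="127.0.0.1", port=18789, timeout=1.0):
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "listening" if gateway_is_listening() else "not reachable"
    print(f"Gateway on 127.0.0.1:18789 is {state}")
```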
openclaw dashboard
This opens the Control UI where you can:
- Chat with the configured agent
- Set up additional channels (Slack, Telegram, etc.)
- Configure tools and plugins
With the OpenClaw gateway running in the background, start Vision normally:
cd C:\project\vision
python live_chat_app.py
Vision will now:
- Integrate with OpenClaw's tool ecosystem
- Participate in multi-agent workflows
- Use OpenClaw's authentication and channel routing (if configured)
- Microphone is set as default recording device
- Speakers/headphones connected (preferably headphones to avoid echo)
- Ollama is running (ollama serve)
- ELEVENLABS_API_KEY is set
- Browser opens at http://localhost:8765
- Green "connected" dot appears in top-right
- Orb turns blue ("Listening")
- Speak — orb should turn red ("Recording")
- Wait for purple ("Speaking") — AI responds
No audio from AI:
- Check ELEVENLABS_API_KEY is correct
- Check internet connection (ElevenLabs is cloud-based)
- Try slow internet mode: set TTS_MODEL = "eleven_flash_v2_5"
- If you switch TTS to LOCAL, Vision will prefer Microsoft Ava when it is installed on Windows
ElevenLabs widget loads but tools do not fire:
- Re-run python setup_el_agent_tools.py
- Verify ELEVENLABS_API_KEY is valid (python test_keys.py)
- Verify the widget agent is public and auth is disabled
- Verify ELEVENLABS_WIDGET_AGENT_ID points at the intended ElevenLabs agent
AI doesn't hear me:
- Lower RMS_THRESH in live_chat_app.py (try 150 or 100)
- Check mic is selected as default Windows recording device
- Run: python -c "import sounddevice; print(sounddevice.query_devices())"
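To get a feel for why lowering RMS_THRESH helps with a quiet microphone, the sketch below computes the RMS level of a block of 16-bit samples and compares it with a threshold. It assumes Vision compares the RMS of int16 audio frames against RMS_THRESH, which the source does not spell out; the function names and the default value here are hypothetical.

```python
import array
import math

RMS_THRESH = 150  # hypothetical value; the real setting lives in live_chat_app.py


def rms_int16(pcm_bytes):
    """RMS level of native-endian 16-bit PCM audio, on the raw int16 scale."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(pcm_bytes, threshold=RMS_THRESH):
    """True when the block is at least as loud as the threshold."""
    return rms_int16(pcm_bytes) >= threshold


if __name__ == "__main__":
    quiet = array.array("h", [120, -120] * 800).tobytes()
    print("RMS:", rms_int16(quiet))
    print("Detected at 150:", is_speech(quiet, 150))
    print("Detected at 100:", is_speech(quiet, 100))
```

Under this assumption, a quiet voice that measures below the current threshold is never treated as speech, so a lower threshold lets it through, at the cost of more false triggers from background noise.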
OCR not working (operator mode):
- Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Add to PATH
Ollama models not showing:
- Ensure Ollama is running: ollama serve
- Check: curl http://localhost:11434/api/tags
- Click ⟳ Refresh in model picker
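The same check as the curl command can be done from Python, which is handy when curl is not available on Windows. This sketch queries Ollama's /api/tags endpoint and prints the installed model names; the helper names are hypothetical and this is not necessarily how Vision's model picker is implemented.

```python
import json
import urllib.request


def parse_model_names(payload):
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in payload.get("models", []) if "name" in m]


def list_ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return installed model names, or an empty list if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers connection failures; ValueError covers malformed JSON.
        return []


if __name__ == "__main__":
    models = list_ollama_models()
    print("\n".join(models) if models else "No models found (is ollama serve running?)")
```

An empty result with Ollama running usually means no models have been pulled yet, or that OLLAMA_MODELS points at a different store than the one the models were pulled into.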
Gateway won't start:
openclaw doctor # Full diagnostics
openclaw gateway status --json # JSON output for scripting
Service installation issues (native Windows):
- On native Windows, if Scheduled Task creation is blocked, OpenClaw falls back to a Startup-folder login item
- Check: C:\Users\{username}\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\OpenClaw Gateway.cmd
WSL2 + Windows combo issues:
- For best stability, install and run OpenClaw inside WSL2
- See: https://docs.openclaw.ai/platforms/windows for WSL2 setup
Use the /openclaw-getting-started Copilot skill for guided setup.