Simple NodeJs server with only Gemini-Live backend for now. #1869
Replies: 6 comments 1 reply
-
Added voice support for qwen3-omni-flash-realtime (and possibly other models) via the Alibaba DashScope API. I'm still trying to figure out how tool calls work with Alibaba.
-
That's the simplest MCP-capable xiaozhi-esp32 server I have ever seen. Thumbs UP!
-
Wonderful!
-
Thank you so much for the thumbs up, that's very encouraging. Next I will try to get tool calls working for one of the Alibaba models. After that I'll probably have a look at "fun audio chat", which seemed really interesting. I might also try to run this "locally" on a cloud VPS and see how it can be integrated. I don't have a gaming rig at home, Google gave me way too much starting credit, and I haven't even looked at the free credits from Alibaba yet; those need to go somewhere too, I guess.
-
Tool calls are now working with qwen3-omni-flash. I made a few changes to the core logic while still keeping it simple, so that new models are easier to add: one "base.js" provider handles the communication with Xiaozhi, and each model gets its own file, such as qwen_realtime.js, gemini.js, etc. I will test this more thoroughly tomorrow before uploading, but so far it seems to work nicely.
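For anyone wondering what such a split could look like, here is a rough sketch of the idea, not the code from the repository. Only the "base.js plus one file per model" layout comes from the comment above; the class names, method names, and the `EchoProvider` stand-in are made up for illustration.

```javascript
// Hypothetical sketch of a "base provider + one file per model" layout.
// All names here are illustrative assumptions, not the repository's API.

// Would live in something like base.js: owns the device-facing side.
class BaseProvider {
  constructor(sendToDevice) {
    // Callback that forwards a model reply to the Xiaozhi device.
    this.sendToDevice = sendToDevice;
  }

  // Shared flow: pass device audio to the model, relay the reply back.
  async handleDeviceAudio(chunk) {
    const reply = await this.sendAudioToModel(chunk);
    this.sendToDevice(reply);
    return reply;
  }

  // Each model file overrides this with its own backend call.
  async sendAudioToModel(_chunk) {
    throw new Error('sendAudioToModel() must be implemented by a model file');
  }
}

// Stand-in for a model file such as gemini.js or qwen_realtime.js.
// It just echoes the input so the sketch runs without any API key.
class EchoProvider extends BaseProvider {
  async sendAudioToModel(chunk) {
    return { model: 'echo', audio: chunk };
  }
}

// Registry: adding a model means adding one class and one entry here.
const providers = { echo: EchoProvider };

function createProvider(name, sendToDevice) {
  const Provider = providers[name];
  if (!Provider) throw new Error(`Unknown provider: ${name}`);
  return new Provider(sendToDevice);
}
```

With a layout like this, `createProvider('echo', send)` returns an object that already knows how to talk to the device, and a new backend only has to implement `sendAudioToModel`.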
-
First of all, thank you so much for this amazing project.
I already shared this in the Discord. It is still very experimental and supports only gemini-live as the LLM backend. It has a simplified approval mechanism: an admin needs to approve all devices manually in the dashboard, and approved MCP devices then need to be exposed to the Xiaozhi devices manually in an extra step, by marking the respective checkboxes in the device settings. There are also no language settings, because gemini-live speaks around 40 languages out of the box. 🤯
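The two-step approval described above could be modelled roughly like this. This is only a sketch of the logic, not the actual dashboard code; the `DeviceRegistry` class and its method names are assumptions made for illustration.

```javascript
// Hypothetical sketch of the two-step approval flow.
// Names are illustrative assumptions, not taken from the repository.
class DeviceRegistry {
  constructor() {
    this.approved = new Set();  // ids approved by the admin in the dashboard
    this.exposed = new Map();   // Xiaozhi device id -> Set of MCP device ids
  }

  // Step 1: the admin approves a device (Xiaozhi or MCP) manually.
  approve(id) {
    this.approved.add(id);
  }

  // Step 2: the admin ticks a checkbox to expose an MCP device to a Xiaozhi device.
  expose(xiaozhiId, mcpId) {
    if (!this.approved.has(xiaozhiId) || !this.approved.has(mcpId)) {
      throw new Error('Both devices must be approved before exposing');
    }
    if (!this.exposed.has(xiaozhiId)) this.exposed.set(xiaozhiId, new Set());
    this.exposed.get(xiaozhiId).add(mcpId);
  }

  // A Xiaozhi device may use an MCP device only after both steps.
  canUse(xiaozhiId, mcpId) {
    const set = this.exposed.get(xiaozhiId);
    return this.approved.has(xiaozhiId) &&
      this.approved.has(mcpId) &&
      Boolean(set && set.has(mcpId));
  }
}
```

The point of the extra step is that approval alone grants nothing: `canUse` stays false until the MCP device has been explicitly exposed to that particular Xiaozhi device.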
It might also be a token-saving way to "teach" a coding agent how to implement your own solutions. So far I have only tried this on a small real-world Linux VPS behind Nginx, because that's how I want to use it, but I think it should run just as well on a local machine. I will give that a try in the next few days.
https://github.com/5ch4um1/xiaozhi-server-nodejs (You'd need your own Gemini API key to run it).
And here is a short video showcasing the MCP and language capabilities:
https://www.youtube.com/shorts/OddE0rnGxaY