To enable the voice dialog function, the following key steps need to be completed:
- Environment preparation: install Python 3.10 and Conda, and provide a host with a 4-core CPU and 8 GB RAM (API-only mode can run on 2 cores / 2 GB); see the preflight-check sketch after this list.
- Project deployment: after downloading the source code from GitHub, create a dedicated virtual environment with Conda and install libopus, ffmpeg, and the other dependencies.
- Model configuration: download the FunASR speech recognition model and place it in the models directory, making sure it includes the SenseVoiceSmall/model.pt file; a path-check sketch follows this list.
- Dialog settings: adjust the min_silence_duration_ms parameter in config.yaml (1000 ms recommended) to control dialog response sensitivity; a config-reading sketch follows this list.
- Interaction methods:
  - Voice wake-up: activate the device with a preset wake word.
  - Manual trigger: press a physical button to start a dialog.
  - Real-time interruption: the current response can be interrupted mid-speech.
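
Before installing anything, you can sanity-check the host against the recommended specs above. This is a minimal preflight sketch, not part of the project itself; the 4-core/8 GB thresholds come from the list, and the RAM check via os.sysconf assumes a Unix-like host.

```python
import os
import sys

def preflight_check(min_cores: int = 4, min_ram_gb: float = 8.0) -> None:
    """Warn if the host falls short of the recommended specs for full mode."""
    # Python 3.10 is the version called for above.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: Python 3.10 recommended, found {sys.version.split()[0]}")

    cores = os.cpu_count() or 0
    if cores < min_cores:
        print(f"Warning: {cores} CPU cores found, {min_cores} recommended")

    # RAM check: os.sysconf is only available on Unix-like systems.
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
        if ram_gb < min_ram_gb:
            print(f"Warning: {ram_gb:.1f} GB RAM found, {min_ram_gb} GB recommended")
    except (ValueError, OSError, AttributeError):
        print("RAM check skipped (non-Unix platform)")

if __name__ == "__main__":
    # Pass min_cores=2, min_ram_gb=2.0 when planning to run in API-only mode.
    preflight_check()
```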
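To confirm the FunASR model landed in the right place, a quick path check like the one below can help. The models/SenseVoiceSmall/model.pt layout is taken from the model configuration step above; run the script from the project root.

```python
from pathlib import Path

def check_asr_model(models_dir: str = "models") -> bool:
    """Verify that the SenseVoiceSmall weights are where the server expects them."""
    model_file = Path(models_dir) / "SenseVoiceSmall" / "model.pt"
    if model_file.is_file():
        size_mb = model_file.stat().st_size / 1024**2
        print(f"Found {model_file} ({size_mb:.0f} MB)")
        return True
    print(f"Missing {model_file}: re-download the FunASR model into {models_dir}/")
    return False

if __name__ == "__main__":
    check_asr_model()
```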
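The config-reading sketch below loads config.yaml and reports the current min_silence_duration_ms value. Since the exact nesting of the key inside config.yaml is not shown above, it searches the whole parsed document rather than assuming a fixed path; PyYAML is assumed to be among the installed dependencies.

```python
import yaml  # PyYAML, assumed to be installed with the project's dependencies

def find_key(node, key):
    """Recursively search a parsed YAML tree for the first occurrence of `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        for value in node.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_key(item, key)
            if found is not None:
                return found
    return None

if __name__ == "__main__":
    with open("config.yaml", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    value = find_key(config, "min_silence_duration_ms")
    # 1000 ms is the recommended value from the list above; lower values make
    # the assistant respond sooner but risk cutting the speaker off mid-sentence.
    print(f"min_silence_duration_ms = {value} (1000 recommended)")
```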
During actual testing, you can verify the interaction link by saying "Hello" or other test phrases; the system supports Chinese, English, Japanese, and Korean speech recognition by default. If responses are noticeably delayed, the AliLLM + DoubaoTTS combination can improve performance.
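If you want to confirm that the downloaded SenseVoiceSmall model itself recognizes speech before testing end to end from the device, a quick local run through FunASR's AutoModel interface can help. This is a hedged sketch: the AutoModel and generate calls follow FunASR's published usage, but the local model path and the test.wav file name are assumptions you should adjust to your setup.

```python
from funasr import AutoModel  # FunASR, installed with the project's dependencies

# Load the locally downloaded model instead of fetching it from the model hub;
# "models/SenseVoiceSmall" matches the directory layout described above.
model = AutoModel(model="models/SenseVoiceSmall")

# "test.wav" is a placeholder: record yourself saying "Hello" (or a Chinese,
# Japanese, or Korean phrase) and point the call at that file.
result = model.generate(input="test.wav", language="auto")
print(result)
```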
This answer is based on the article "xiaozhi-esp32-server: Xiaozhi AI chatbot open-source back-end service".