To enable the voice dialog function, the following key steps need to be completed:
- Environment preparation: install Python 3.10 and Conda, and provide a host with a 4-core CPU and 8 GB RAM (API-only mode can run on 2 cores / 2 GB); see the preflight-check sketch after this list.
- Project deployment: after downloading the source code from GitHub, create a dedicated virtual environment with Conda and install libopus, ffmpeg, and the other dependencies.
- Model configuration: download the FunASR speech recognition model and place it in the models directory, making sure it includes the SenseVoiceSmall/model.pt file; a path-check sketch follows this list.
- Dialog settings: adjust the min_silence_duration_ms parameter in config.yaml (1000 ms recommended) to control dialog response sensitivity; a config-reading sketch follows this list.
- Interaction methods:
  - Voice wake-up: activate the device with a preset wake word.
  - Manual trigger: press a physical button to start a dialog.
  - Real-time interruption: the current response can be interrupted mid-speech.
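
Before installing anything, you can sanity-check the host against the recommended specs above. This is a minimal preflight sketch, not part of the project itself; the 4-core/8 GB thresholds come from the list, and the RAM check via os.sysconf assumes a Unix-like host.

```python
import os
import sys

def preflight_check(min_cores: int = 4, min_ram_gb: float = 8.0) -> None:
    """Warn if the host falls short of the recommended specs for full mode."""
    # Python 3.10 is the version called for above.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: Python 3.10 recommended, found {sys.version.split()[0]}")

    cores = os.cpu_count() or 0
    if cores < min_cores:
        print(f"Warning: {cores} CPU cores found, {min_cores} recommended")

    # RAM check: os.sysconf is only available on Unix-like systems.
    try:
        ram_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
        if ram_gb < min_ram_gb:
            print(f"Warning: {ram_gb:.1f} GB RAM found, {min_ram_gb} GB recommended")
    except (ValueError, OSError, AttributeError):
        print("RAM check skipped (non-Unix platform)")

if __name__ == "__main__":
    # Pass min_cores=2, min_ram_gb=2.0 when planning to run in API-only mode.
    preflight_check()
```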
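To confirm the FunASR model landed in the right place, a quick path check like the one below can help. The models/SenseVoiceSmall/model.pt layout is taken from the model configuration step above; run the script from the project root.

```python
from pathlib import Path

def check_asr_model(models_dir: str = "models") -> bool:
    """Verify that the SenseVoiceSmall weights are where the server expects them."""
    model_file = Path(models_dir) / "SenseVoiceSmall" / "model.pt"
    if model_file.is_file():
        size_mb = model_file.stat().st_size / 1024**2
        print(f"Found {model_file} ({size_mb:.0f} MB)")
        return True
    print(f"Missing {model_file}: re-download the FunASR model into {models_dir}/")
    return False

if __name__ == "__main__":
    check_asr_model()
```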
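The config-reading sketch below loads config.yaml and reports the current min_silence_duration_ms value. Since the exact nesting of the key inside config.yaml is not shown above, it searches the whole parsed document rather than assuming a fixed path; PyYAML is assumed to be among the installed dependencies.

```python
import yaml  # PyYAML, assumed to be installed with the project's dependencies

def find_key(node, key):
    """Recursively search a parsed YAML tree for the first occurrence of `key`."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        for value in node.values():
            found = find_key(value, key)
            if found is not None:
                return found
    elif isinstance(node, list):
        for item in node:
            found = find_key(item, key)
            if found is not None:
                return found
    return None

if __name__ == "__main__":
    with open("config.yaml", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    value = find_key(config, "min_silence_duration_ms")
    # 1000 ms is the recommended value from the list above; lower values make
    # the assistant respond sooner but risk cutting the speaker off mid-sentence.
    print(f"min_silence_duration_ms = {value} (1000 recommended)")
```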
During actual testing, you can verify the interaction link by saying "Hello" or other test phrases; the system supports Chinese, English, Japanese, and Korean speech recognition by default. If responses are noticeably delayed, the AliLLM + DoubaoTTS combination can improve performance.
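If you want to confirm that the downloaded SenseVoiceSmall model itself recognizes speech before testing end to end from the device, a quick local run through FunASR's AutoModel interface can help. This is a hedged sketch: the AutoModel and generate calls follow FunASR's published usage, but the local model path and the test.wav file name are assumptions you should adjust to your setup.

```python
from funasr import AutoModel  # FunASR, installed with the project's dependencies

# Load the locally downloaded model instead of fetching it from the model hub;
# "models/SenseVoiceSmall" matches the directory layout described above.
model = AutoModel(model="models/SenseVoiceSmall")

# "test.wav" is a placeholder: record yourself saying "Hello" (or a Chinese,
# Japanese, or Korean phrase) and point the call at that file.
result = model.generate(input="test.wav", language="auto")
print(result)
```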
This answer is based on the article "xiaozhi-esp32-server: Xiaozhi AI chatbot open-source back-end service".