The core features of AI-Chatbox mainly include:
- Wake to Voice and Command Recognition: Supports recording triggered by the wake-up word "hi, Loxin" and the command word "I have a question".
- speech-to-text: Convert recorded WAV audio to text using the Vosk tool, which supports Chinese recognition.
- Large Model Interaction: Send text questions and get smart answers via the DeepSeek API.
- Logging: Real-time recording of device status, recognition results and LLM answers for easy debugging.
- cross-device access: Build a REST service via Flask to allow other devices on the LAN to call the speech-to-text function.
- Embedded Optimization: Developed in Rust, optimized for ESP32S3 hardware, configured with 512 max generated tokens to balance performance and resources.
This answer comes from the articleAI-Chatbox: Speech-to-Text Intelligent Dialogue Project based on ESP32S3The