Latency optimization scheme for ESP32S3:
Hardware layer
- Process audio with the ESP-DSP acceleration library on the XIAO ESP32S3 Sense development board
- Increase the PSRAM configuration to 8MB by flashing the firmware with cargo espflash flash --flash-size 8mb
Software layer
- In vosk_server.py, set --threads=2 to enable multi-threaded parsing
- Use Rust's tokio asynchronous runtime to process network requests
- Turn off non-essential logging output (set log_level = warn)
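As a rough illustration of the thread-count and log-level settings above, here is a minimal Python sketch. It is not the actual contents of vosk_server.py: the function names, the --log-level flag, and the placeholder decode_chunk are assumptions made for this example, and only the --threads option and the warn level come from the text.

```python
import argparse
import logging
from concurrent.futures import ThreadPoolExecutor


def parse_args(argv=None):
    """Parse worker-count and log-level options (flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Speech recognition server sketch")
    parser.add_argument("--threads", type=int, default=1,
                        help="number of worker threads used for decoding")
    parser.add_argument("--log-level", default="warn",
                        choices=["debug", "info", "warn", "error"],
                        help="minimum severity that gets logged")
    return parser.parse_args(argv)


def configure_logging(level_name):
    """Map the textual level to a logging constant and apply it to a logger."""
    levels = {"debug": logging.DEBUG, "info": logging.INFO,
              "warn": logging.WARNING, "error": logging.ERROR}
    logger = logging.getLogger("vosk_server_sketch")
    logger.setLevel(levels[level_name])
    return logger


def decode_chunk(chunk):
    """Placeholder for the real recognizer call; returns the chunk size in bytes."""
    return len(chunk)


def decode_all(chunks, threads):
    """Decode chunks on a pool of `threads` workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(decode_chunk, chunks))


def main(argv=None):
    """Wire the pieces together: parse options, set logging, decode sample audio."""
    args = parse_args(argv)
    log = configure_logging(args.log_level)
    log.info("this message is suppressed at the warn level")
    return decode_all([b"\x00" * 320, b"\x00" * 640], args.threads)
```

Calling main(["--threads=2"]) runs the two sample chunks through a two-worker pool with info-level messages suppressed. In a real server the placeholder would be replaced by the recognizer call, and the useful thread count depends on the host's CPU cores.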
Process optimization
With streaming speech recognition, a long-lived API connection is established immediately after the wake word wn9_hilexin is detected, reducing cold-start time by about 300ms
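The idea of opening the connection as soon as the wake word fires can be sketched with Python's asyncio. This is only an illustration under stated assumptions: FakeConnection, open_connection, mic_events, and the sleep durations are hypothetical stand-ins, and the project's own implementation may look quite different. The point is that the handshake runs in the background while audio capture continues, so the first audio chunk does not wait for a cold connection.

```python
import asyncio

WAKE_WORD = "wn9_hilexin"  # wake-word model name taken from the text


class FakeConnection:
    """Stand-in for a long-lived API connection (hypothetical)."""

    def __init__(self):
        self.sent = []

    async def send(self, chunk):
        self.sent.append(chunk)


async def open_connection(delay=0.05):
    """Simulate the connection handshake; the delay is an arbitrary placeholder."""
    await asyncio.sleep(delay)
    return FakeConnection()


async def mic_events():
    """Simulated device events: a wake-word hit followed by audio chunks."""
    yield ("wake", WAKE_WORD)
    for i in range(3):
        await asyncio.sleep(0.02)  # pretend to capture a short audio frame
        yield ("audio", f"chunk-{i}".encode())


async def handle_events(events):
    """Start connecting when the wake word fires, then stream audio chunks."""
    connect_task = None
    conn = None
    async for kind, payload in events:
        if kind == "wake" and payload == WAKE_WORD and connect_task is None:
            # Kick off the handshake in the background without waiting for it.
            connect_task = asyncio.create_task(open_connection())
        elif kind == "audio" and connect_task is not None:
            if conn is None:
                # By now the handshake has been overlapping with audio capture.
                conn = await connect_task
            await conn.send(payload)
    if conn is None and connect_task is not None:
        conn = await connect_task  # wake word seen but no audio followed
    return conn
```

Running asyncio.run(handle_events(mic_events())) returns the connection object with the three simulated chunks recorded in its sent list. The actual saving depends on network and server handshake time; the figure of about 300ms is the one reported in the text, not something this sketch measures.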
This answer comes from the article "AI-Chatbox: Speech-to-Text Intelligent Dialogue Project based on ESP32S3".