How can speech synthesis achieve low-latency responses in smart home scenarios?

2025-08-27

Scenario Characteristics

Voice feedback for home control must arrive in under 300 ms, a target that conventional cloud-based solutions struggle to meet.

Local Deployment
- Run a lightweight TTS engine in a Docker container
- Preload voice clips for commonly used commands (about 50 basic commands)
- Use a Raspberry Pi as the edge computing node
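
The preloading step can be sketched as follows. This is a minimal illustration, not the original setup: it assumes the clips are stored as `.wav` files named after their command (for example `lights_on.wav`), and the helper names `preload_clips` and `lookup_clip` are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def preload_clips(clip_dir: str) -> Dict[str, bytes]:
    """Read every .wav file in clip_dir into memory, keyed by file stem.

    Assumption: one file per command, e.g. lights_on.wav.
    """
    clips: Dict[str, bytes] = {}
    for path in sorted(Path(clip_dir).glob("*.wav")):
        clips[path.stem] = path.read_bytes()
    return clips


def lookup_clip(clips: Dict[str, bytes], command: str) -> Optional[bytes]:
    """Return preloaded audio for a command, or None if it must be synthesized."""
    key = command.strip().lower().replace(" ", "_")
    return clips.get(key)
```

Loading the clips once at startup means a common command is answered from memory, with no synthesis and no disk read on the request path.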
Caching Strategy
- Keep an LRU cache of synthesized voice clips (keeping the 100 most recent is recommended)
- Use template stitching for dynamic content such as temperature or time
- Deduplicate stored audio by voice fingerprint
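
The three caching ideas above could be combined roughly as shown below. This is a sketch under stated assumptions: the "fingerprint" is taken to be an exact SHA-256 content hash (a perceptual audio fingerprint would need a different approach), audio fragments are assumed to be headerless PCM in one shared format so they can be joined by concatenation, and `VoiceCache`, `stitch`, and the `synthesize` callback are hypothetical names.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Dict, List, Optional


class VoiceCache:
    """LRU cache of phrase -> audio that stores each distinct audio blob once."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self._keys: "OrderedDict[str, str]" = OrderedDict()  # phrase -> digest
        self._blobs: Dict[str, bytes] = {}                   # digest -> audio
        self._refs: Dict[str, int] = {}                      # digest -> ref count

    def get(self, phrase: str) -> Optional[bytes]:
        digest = self._keys.get(phrase)
        if digest is None:
            return None
        self._keys.move_to_end(phrase)  # mark as most recently used
        return self._blobs[digest]

    def put(self, phrase: str, audio: bytes) -> None:
        if phrase in self._keys:
            self._release(self._keys.pop(phrase))
        digest = hashlib.sha256(audio).hexdigest()  # assumed "fingerprint"
        self._blobs.setdefault(digest, audio)       # identical audio stored once
        self._refs[digest] = self._refs.get(digest, 0) + 1
        self._keys[phrase] = digest
        while len(self._keys) > self.capacity:
            _, old = self._keys.popitem(last=False)  # evict least recently used
            self._release(old)

    def _release(self, digest: str) -> None:
        self._refs[digest] -= 1
        if self._refs[digest] == 0:
            del self._refs[digest]
            del self._blobs[digest]


def stitch(template: List[str], values: Dict[str, str], cache: VoiceCache,
           synthesize: Callable[[str], bytes]) -> bytes:
    """Join audio for fixed fragments and "{slot}" values, synthesizing on a miss."""
    parts = []
    for token in template:
        is_slot = token.startswith("{") and token.endswith("}")
        text = values[token[1:-1]] if is_slot else token
        audio = cache.get(text)
        if audio is None:
            audio = synthesize(text)
            cache.put(text, audio)
        parts.append(audio)
    # Plain concatenation is only valid for headerless PCM in a shared format.
    return b"".join(parts)
```

With a template such as `["The temperature is", "{temp}", "degrees"]`, only the slot value changes between requests, so the fixed fragments are served from the cache after the first call.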
Network Optimization
- Configure QoS so that voice packets are transmitted with priority
- Send control commands over UDP
- Set up a local fallback server
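
A rough sketch of the network side, assuming a simple request/reply exchange over UDP. The DSCP value 46 (Expedited Forwarding) is a common choice for voice traffic but is an assumption here, as are the function names, the 100 ms timeout, and the idea that the fallback server is simply the next address in a list. Marking packets only helps if the router or switch is configured to honor the marking.

```python
import socket
from typing import List, Optional, Tuple

DSCP_EF = 46  # Expedited Forwarding; assumed marking for voice traffic


def open_voice_socket(timeout_s: float = 0.1) -> socket.socket:
    """Create a UDP socket with a receive timeout and, where supported, a DSCP mark."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout_s)
    try:
        # DSCP occupies the upper six bits of the IP TOS byte.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
    except (AttributeError, OSError):
        pass  # marking not available on this platform; send unmarked
    return sock


def send_command(sock: socket.socket, payload: bytes,
                 servers: List[Tuple[str, int]]) -> Optional[bytes]:
    """Try each (host, port) in order and return the first reply, or None.

    A production version should also match replies to requests, for example
    with a sequence number, so a late reply is not mistaken for a new one.
    """
    for addr in servers:
        try:
            sock.sendto(payload, addr)
            reply, _sender = sock.recvfrom(4096)
            return reply
        except OSError:  # includes timeouts
            continue
    return None
```

Listing the primary server first and the local fallback second, as in `send_command(sock, b"light_on", [primary, fallback])`, bounds the worst-case wait to roughly one timeout per server that fails to answer.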

Test results: room temperature commands were answered in 97 ms, and the first request for a new phrase returned in 420 ms.