Published development plans
According to project documentation and developer interviews, development over the next six months will focus on:
- Language expansion: French and Japanese support to be completed by Q3 2024, with Mandarin Chinese and Korean going live in Q4
- Emotion engine: adds control over 8 emotional parameters, such as anger and sadness (a beta version has been tested internally)
- Hardware acceleration: dedicated optimizations for NVIDIA Tensor Cores and Intel OpenVINO
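The planned hardware optimizations are not described in detail, but ONNX Runtime already lets an application choose an execution provider per session. The following is a minimal sketch of that selection step, not the project's implementation. The helper names `pick_providers` and `detect_available` and the preference order are assumptions made for illustration; the provider strings are the standard ONNX Runtime identifiers.

```python
from typing import List, Sequence

# Assumed preference order for illustration: NVIDIA GPU, then Intel OpenVINO, then CPU.
PREFERRED_PROVIDERS = (
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(
    available: Sequence[str],
    preferred: Sequence[str] = PREFERRED_PROVIDERS,
) -> List[str]:
    """Return the preferred providers that are available, in preference order.

    The CPU provider is always appended as a final fallback.
    """
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def detect_available() -> List[str]:
    """Ask ONNX Runtime which providers this build supports.

    Falls back to CPU only if onnxruntime is not installed.
    """
    try:
        import onnxruntime as ort
    except ImportError:
        return ["CPUExecutionProvider"]
    return list(ort.get_available_providers())


if __name__ == "__main__":
    print(pick_providers(detect_available()))
```

The resulting list can be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`, so the same script runs on machines with or without an accelerator.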
Community-driven features
Feature proposals under discussion in the open-source community include:
- Dialect support: Cantonese, Kansai Japanese, and other regional variants
- Voiceprint cloning: allow users to upload speech samples for feature extraction
- Cloud collaboration: hybrid inference schemes that combine local models with large models in the cloud
Ecosystem building
There are plans to create a voice style marketplace (Voice Marketplace) where developers can:
- Share custom-trained voice models
- Sell professional voice-over packages commercially
- Extend sound processing capabilities through a plug-in system
This answer comes from the article "Kokoro-ONNX: Efficient Text-to-Speech Tool with Multi-Language and Multi-Voice Support".





























