The system handles long text automatically: it splits the text at sentence boundaries, groups the sentences into chunks, and splices the generated audio back together seamlessly, which makes it especially well suited to audiobook production. Once the user sets a chunk size of 300-500 characters in the Web UI, the system runs the whole pipeline on its own (text segmentation, speech generation, and final audio assembly) and outputs a coherent, natural-sounding long-form audio file.
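The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the server's actual implementation. The function names (`split_sentences`, `chunk_text`, `synthesize_long_text`), the `synthesize` callback, the 24000 Hz sample rate, and the short silent gap between chunks are all assumptions made for the example.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text: str, chunk_size: int = 400) -> List[str]:
    """Pack whole sentences into chunks of at most chunk_size characters."""
    chunks: List[str] = []
    current = ""
    for sentence in split_sentences(text):
        # A single sentence longer than the limit is hard-split (assumption).
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_long_text(
    text: str,
    synthesize: Callable[[str], List[float]],  # hypothetical TTS callback
    chunk_size: int = 400,
    sample_rate: int = 24000,   # assumed sample rate
    gap_seconds: float = 0.1,   # assumed pause inserted between chunks
) -> List[float]:
    """Generate audio chunk by chunk and join the samples into one stream."""
    gap = [0.0] * int(sample_rate * gap_seconds)
    audio: List[float] = []
    for index, chunk in enumerate(chunk_text(text, chunk_size)):
        if index > 0:
            audio.extend(gap)
        audio.extend(synthesize(chunk))
    return audio
```

Passing the TTS engine in as a callback keeps the chunking logic independent of any particular model. In practice, `synthesize` would wrap whatever call the deployed engine exposes for turning one chunk of text into audio samples.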
This answer comes from the article "Kitten-TTS-Server: a self-deployable lightweight text-to-speech service".