Introduction to Muyan-TTS
Muyan-TTS is an open-source text-to-speech model designed for podcast scenarios. It is built on the Llama-3.2-3B architecture and combined with a SoVITS decoder. The model is pre-trained on more than 100,000 hours of podcast audio to generate high-quality, natural-sounding speech.
Core functionality
- Zero-shot speech synthesis: Generates podcast-style speech without additional training, and can imitate multiple voice timbres
- Personalized voice customization: Generates speaker-specific voices after fine-tuning on a small amount (a few minutes) of single-speaker data
- Efficient inference: Takes about 0.33 seconds to generate one second of audio on an NVIDIA A100 GPU, faster than most open-source TTS models
- Complete development ecosystem: Provides training code, a data processing pipeline, and API deployment tools
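To make the speed figure in the list above concrete, here is a minimal sketch of the arithmetic. It assumes the figure is a real-time factor of about 0.33, meaning roughly 0.33 seconds of compute per second of generated audio. The function names and the 30-minute episode are hypothetical illustrations and are not part of the Muyan-TTS codebase.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute spent per second of generated audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def estimated_synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock seconds needed to synthesize audio_seconds of speech."""
    return audio_seconds * rtf


# Assumed figure: about 0.33 s of compute for 1 s of audio on an A100.
rtf = real_time_factor(0.33, 1.0)

# Hypothetical 30-minute podcast episode (1800 s of audio).
episode_seconds = 30 * 60
minutes = estimated_synthesis_time(episode_seconds, rtf) / 60

print(f"RTF: {rtf:.2f}")                               # RTF: 0.33
print(f"Estimated synthesis time: {minutes:.1f} min")  # Estimated synthesis time: 9.9 min
```

A real-time factor below 1.0 means speech is produced faster than it plays back. Actual throughput will depend on hardware, batch size, and text length.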
The project is released under the Apache 2.0 license, and the model weights and code are available on GitHub, Hugging Face, and ModelScope.
This answer comes from the article "Muyan-TTS: Personalized Podcast Speech Training and Synthesis".