MOSS-TTSD supports up to 960 seconds of single-pass speech generation and zero-shot two-speaker voice cloning.

2025-08-19

MOSS-TTSD has two notable technical strengths in speech generation. First, it can generate up to 960 seconds of speech in a single pass, which makes it well suited to podcasts and other long-form content. Second, its zero-shot two-speaker voice cloning accurately reproduces each target speaker's timbre and applies it to dialogue without any additional training. Users only need to provide a 10-second audio clip of the target speaker, and the model generates dialogue speech that matches that timbre while keeping the speakers clearly distinct.
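The two figures above (a 960-second single-pass limit and a 10-second reference clip per speaker) can be checked before a request is sent to the model. The following is only a minimal sketch using the Python standard library; it does not call MOSS-TTSD. The function names are hypothetical, and treating 10 seconds as a minimum clip length is an assumption made for illustration.

```python
import io
import wave

MAX_OUTPUT_SECONDS = 960  # single-pass limit stated in the text
REFERENCE_SECONDS = 10    # reference clip length stated in the text (assumed to be a minimum)


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV clip held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a silent mono 16-bit WAV clip, used here only as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


def check_request(reference_clips: dict, target_seconds: float) -> list:
    """Collect human-readable problems with a planned two-speaker request."""
    problems = []
    if len(reference_clips) != 2:
        problems.append("expected exactly two speakers")
    for speaker, clip in reference_clips.items():
        duration = wav_duration_seconds(clip)
        if duration < REFERENCE_SECONDS:
            problems.append(
                f"{speaker}: reference clip is {duration:.1f}s, "
                f"shorter than {REFERENCE_SECONDS}s"
            )
    if target_seconds > MAX_OUTPUT_SECONDS:
        problems.append(
            f"target of {target_seconds:.0f}s exceeds {MAX_OUTPUT_SECONDS}s"
        )
    return problems
```

For example, calling `check_request` with one 10-second clip, one 4-second clip, and a 1200-second target reports two problems: the short clip and the over-long target.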

This answer comes from the article "MOSS-TTSD: An Open Source Bilingual Dialog Speech Generation Tool".
