CosyVoice's Fine-Grained Sentiment Control Supports 8 Types of Paralinguistic Markers

2025-08-23

Engineering Innovations in Emotional Speech Synthesis

CosyVoice is the first speech synthesis system to implement real-time emotion control based on symbolic tags. Its Tokenizer module presets 8 types of paralinguistic tags, such as [laughter], [cry], and [pause=200ms], and supports prosody adjustment at 50 ms granularity. The technical scheme uses multi-level conditional adversarial training:

Lower-level features: a Pitch-Contour prediction network models emotional prosody
Mid-level control: Prosody-Tokens enable cross-language emotion transfer
Upper-level application: open interfaces for semantic-level control, such as [style=happy]
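
To make the tag syntax concrete, here is a minimal Python sketch of how text containing markers such as [laughter], [pause=200ms], or [style=happy] could be split into plain text and tags before synthesis. It is not CosyVoice's actual Tokenizer or API: the names `Tag`, `parse_tagged_text`, and `pause_ms` are hypothetical, and rounding pauses to a 50 ms step is only an assumption based on the granularity quoted above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Matches [name] or [name=value], e.g. [laughter], [pause=200ms], [style=happy].
TAG_PATTERN = re.compile(r"\[([a-z_]+)(?:=([^\]\s]+))?\]")


@dataclass
class Tag:
    """Hypothetical container for one paralinguistic marker."""
    name: str
    value: Optional[str] = None


def parse_tagged_text(text: str) -> List[Union[str, Tag]]:
    """Split text into plain-text segments and Tag objects, in order."""
    parts: List[Union[str, Tag]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        plain = text[pos:match.start()].strip()
        if plain:
            parts.append(plain)
        parts.append(Tag(match.group(1), match.group(2)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        parts.append(tail)
    return parts


def pause_ms(tag: Tag, step_ms: int = 50) -> int:
    """Return a pause tag's duration in ms, rounded to the nearest step.

    The 50 ms default step is an assumption, not a documented parameter.
    """
    match = re.fullmatch(r"(\d+)ms", tag.value or "")
    if tag.name != "pause" or match is None:
        raise ValueError(f"not a valid pause tag: {tag!r}")
    duration = int(match.group(1))
    return ((duration + step_ms // 2) // step_ms) * step_ms


if __name__ == "__main__":
    for part in parse_tagged_text("Well [laughter] hold on [pause=200ms] okay"):
        print(part)
```

A downstream synthesizer could then iterate over the returned list, sending plain strings to the text front end and mapping each `Tag` to whatever control signal the model expects.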

Empirical data shows that adding the [laughter] tag improves the pleasantness score of synthesized speech by 42%, and that pause-tag timing error stays within ±10 ms. The feature has been applied to game NPC dialogue systems, where it reduces annotation cost by 90% compared with traditional emotional speech synthesis schemes.

This answer comes from the article "CosyVoice: Ali open source multilingual cloning and generation tools".
