
CosyVoice's Fine-Grained Emotion Control Supports 8 Types of Paralinguistic Tags

2025-08-23

Engineering Innovations in Emotional Speech Synthesis

CosyVoice introduces real-time emotion control based on symbolic tags, a first in the field of speech synthesis. Its Tokenizer module presets 8 types of paralinguistic tags, such as [laughter], [cry], and [pause=200ms], and supports prosody adjustment at 50 ms granularity. The technical approach uses multi-level conditional adversarial training:

  • Low-level features: modeling emotional prosody with a pitch-contour prediction network
  • Mid-level control: cross-lingual emotion transfer via prosody tokens
  • Upper-layer application: open interfaces for semantic-level control such as [style=happy]
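
To make the tag markup concrete, here is a minimal Python sketch of how text containing tags such as [laughter], [pause=200ms], or [style=happy] might be split into plain-text segments and tag objects before synthesis. It is not CosyVoice's actual Tokenizer or API: the `Tag` class, the `split_tagged_text` and `quantize_pause` functions, the `[name=value]` grammar, and the decision to round pauses to a 50 ms grid are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical tag grammar (an assumption): [name] or [name=value].
TAG_PATTERN = re.compile(r"\[([a-z_]+)(?:=([^\]]+))?\]")

# Assumed grid size, taken from the 50 ms granularity mentioned in the text.
PAUSE_STEP_MS = 50


@dataclass
class Tag:
    """A paralinguistic or style tag extracted from the input text."""
    name: str
    value: Optional[str] = None


def quantize_pause(raw: str, step_ms: int = PAUSE_STEP_MS) -> int:
    """Round a pause such as '220ms' to the nearest multiple of step_ms."""
    match = re.fullmatch(r"(\d+)ms", raw.strip())
    if match is None:
        raise ValueError(f"unsupported pause value: {raw!r}")
    ms = int(match.group(1))
    return ((ms + step_ms // 2) // step_ms) * step_ms


def split_tagged_text(text: str) -> List[Union[str, Tag]]:
    """Split text into an ordered list of plain strings and Tag objects."""
    segments: List[Union[str, Tag]] = []
    cursor = 0
    for m in TAG_PATTERN.finditer(text):
        plain = text[cursor:m.start()].strip()
        if plain:
            segments.append(plain)
        name, value = m.group(1), m.group(2)
        if name == "pause" and value is not None:
            # Snap pause durations to the assumed 50 ms grid.
            value = f"{quantize_pause(value)}ms"
        segments.append(Tag(name, value))
        cursor = m.end()
    tail = text[cursor:].strip()
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    line = "[style=happy]That is great [laughter] wait[pause=220ms]okay"
    for segment in split_tagged_text(line):
        print(segment)
```

A downstream synthesizer could then walk the returned list in order, sending the strings to the text front end and mapping each `Tag` to whatever control signal the model exposes. Keeping tags as separate objects rather than leaving them inline makes it easier to validate unknown tag names before synthesis.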

Empirical data show that adding the [laughter] tag improves the pleasantness score of synthesized speech by 42%, and the timing error of pause tags stays within ±10 ms. The feature has been applied to game NPC dialogue systems, where it reduces annotation cost by 90% compared with traditional emotional speech synthesis approaches.
