Hibiki's voice transfer technology captures the prosodic features of the source speech with deep learning models and adapts them to the target-language output. The system employs a Classifier-Free Guidance (CFG) mechanism that lets users adjust voice similarity through the `cfg-coef` parameter (recommended value 3; a minimal sketch of how such a coefficient is applied follows the list below). The technical implementation rests on three key innovations:
- An attention-based acoustic feature transfer network
- Adversarial training to keep the timbre natural
- A prosody decoupling technique that separates linguistic content from prosodic features
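The exact way Hibiki applies guidance is not detailed here, but the standard classifier-free guidance rule blends a conditional prediction with an unconditional one using a single coefficient. The sketch below illustrates that rule in PyTorch; the function name `cfg_blend` and the tensor shapes are assumptions for illustration, not Hibiki's actual API.

```python
import torch

def cfg_blend(cond_logits: torch.Tensor,
              uncond_logits: torch.Tensor,
              cfg_coef: float = 3.0) -> torch.Tensor:
    """Standard classifier-free guidance blend (illustrative sketch only).

    cfg_coef = 1.0 returns the conditional prediction unchanged; larger values
    push the output further toward the conditioning signal (here, the source
    speaker's voice), which is the similarity/diversity trade-off that a
    cfg-coef setting exposes.
    """
    return uncond_logits + cfg_coef * (cond_logits - uncond_logits)

# Toy usage with the recommended coefficient of 3; shapes are made up.
cond = torch.randn(1, 8, 2048)      # (batch, codebooks, vocab)
uncond = torch.randn(1, 8, 2048)
guided = cfg_blend(cond, uncond, cfg_coef=3.0)
```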
Compared with the mechanical-sounding synthesized speech of traditional translation systems, Hibiki's output preserves suprasegmental features of the source speech such as rhythm and stress, and its MOS naturalness score improves by 37%. This makes it especially well suited to quality-sensitive scenarios such as film and TV dubbing and voice-based social apps.
This answer is drawn from the article "Hibiki: a real-time speech translation model, streaming translation that preserves the characteristics of the original voice".































