A three-step approach to optimizing lip synchronization
Digital human lip synchronization relies on speech feature analysis. You can improve how well the mouth shapes match the audio in the following ways:
- Text preprocessing
  - Avoid long unbroken sentences (no more than 15 words per sentence is recommended).
  - Insert a 0.3-0.5 second pause after each punctuation mark (using the "insert pause" function).
  - Split complex terminology into shorter phrases (e.g. "ribonucleic acid" to "ribo-nucleic acid").
- Parameter tuning
  - Select a "Standard News Anchor" voice (this voice library has the most complete mouth-shape data).
  - Keep the speech rate at 180-220 words per minute (adjustable in real time in the workstation).
  - Enable "Accurate Mouth Mode" (this increases rendering time by 30%).
- Post-processing corrections
  - Fine-tune the mouth shape at keyframes using the "frame-by-frame calibration" function.
  - Replace important words that are hard to synchronize with simpler synonyms (e.g., "frail" with "thin").
  - Always preview a 5-second clip before finalizing the output.
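The text preprocessing checks can also be run on a script before it is pasted into the tool. The following is a minimal Python sketch, not part of CyberSmart: the function names and the `[pause:0.4s]` marker are hypothetical, because the source does not describe what the "insert pause" function actually writes into the text. The sentence splitter is deliberately naive and will also break on periods inside numbers such as "0.3".

```python
import re

MAX_WORDS = 15                 # recommended sentence length limit from the list above
PAUSE_SECONDS = 0.4            # any value in the recommended 0.3-0.5 s range
PAUSE_TAG = "[pause:{:.1f}s]"  # hypothetical marker, NOT a documented CyberSmart syntax


def split_sentences(text):
    """Naively split text into sentences, keeping terminal punctuation."""
    parts = re.findall(r"[^.!?]+[.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def long_sentences(text, max_words=MAX_WORDS):
    """Return the sentences that exceed the recommended word count."""
    return [s for s in split_sentences(text) if len(s.split()) > max_words]


def mark_pauses(text, seconds=PAUSE_SECONDS):
    """Add a pause marker after punctuation followed by whitespace or end of text."""
    tag = PAUSE_TAG.format(seconds)
    return re.sub(r"([,.;:!?])(?=\s|$)", lambda m: m.group(1) + " " + tag, text)


def estimated_seconds(text, words_per_minute=200):
    """Estimate narration time at a given speech rate, ignoring pauses."""
    return len(text.split()) * 60 / words_per_minute
```

A typical use would be to call `long_sentences(script)` first and rewrite anything it returns, then use `mark_pauses(script)` as a guide for where to apply the tool's own "insert pause" function. `estimated_seconds` gives a rough duration at a rate inside the recommended 180-220 words per minute range.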
Note: For dialect and foreign-language dubbing, select the digital human model for the corresponding language; the Mandarin model cannot be adapted to other languages.
This answer comes from the article "CyberSmart: Converting Text to Speech and Digital Human Video".