Improving speech recognition accuracy in noisy environments requires a phased approach:
- Preprocessing stage:
1. Use the built-in SpeechEnhancement module: enhanced = speech_enh(noisy_audio)["wav"]
2. Use the WebRTC VAD algorithm to remove silent segments
- Recognition parameter adjustment:
In decode_default.yaml, modify:
1. beam_size: 20 (increases the search width)
2. penalty: 0.6 (reduces the repetition penalty)
- Post-processing correction:
Integrate a language model (e.g., KenLM) for second-pass correction. Install command: pip install kenlm
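The silence-removal and language-model steps above can be sketched in plain Python around the recognizer. This is a minimal illustration, not the article's implementation: the helper names drop_silence and rescore_nbest, the lm_weight value, and the choice of n-best rescoring as the form of second-pass correction are all assumptions. Both helpers take the actual detector and language model as callables, so the sketch itself has no third-party dependencies.

```python
from typing import Callable, Iterable, Tuple


def drop_silence(
    pcm: bytes,
    is_speech: Callable[[bytes, int], bool],
    sample_rate: int = 16000,
    frame_ms: int = 30,
) -> bytes:
    """Keep only the frames of 16-bit mono PCM that is_speech accepts.

    Hypothetical helper: the VAD decision itself is delegated to is_speech.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    kept = []
    # Step over whole frames only; a trailing partial frame is discarded.
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[start:start + frame_bytes]
        if is_speech(frame, sample_rate):
            kept.append(frame)
    return b"".join(kept)


def rescore_nbest(
    hypotheses: Iterable[Tuple[str, float]],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,  # assumed value; tune on held-out data
) -> str:
    """Return the hypothesis with the best combined recognizer + LM score.

    Each hypothesis is (text, recognizer_log_score); higher is better.
    """
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm_score(h[0]))[0]
```

With the webrtcvad package installed, webrtcvad.Vad(2).is_speech can be passed as is_speech, since it takes a frame of 10, 20, or 30 ms and a sample rate. With kenlm installed, the score method of a kenlm.Model loaded from your own model file can be passed as lm_score.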
In testing, this method reduced WER from 35% to 12% in an 80 dB white noise environment.
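For reference, word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a word-level edit distance. The source does not say how its figures were measured, so the function below is only a sketch of the standard definition; the name word_error_rate is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Running it on the same test set before and after each stage shows how much each step contributes to the overall reduction.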
This answer comes from the article "OpusLM_7B_Anneal: an efficient unified model for speech recognition and synthesis".