Domain adaptation fine-tuning requires a systematic approach:
- Data preparation:
  Collect at least 50 hours of audio in the target domain (e.g., medical), with transcripts that use the standardized spelling of domain terms. Recommended format: uttID /path/to/audio.wav|医生诊断:患者患有 ("doctor's diagnosis: the patient has").
- Parameter configuration:
  1. In config.yaml, set adapt_dropout: 0.3
  2. Adjust transformer_encoder_layers: 12
- Training technique (to retain base capabilities):
  Use two-stage training (see the sketch after this list):
  1. For the first 5 epochs, fine-tune only the last 3 layers (freeze_layers: 0-9)
  2. For the final 10 epochs, train all parameters (lr: 0.0001)
- Validation method:
  Use espnet2/bin/validate.py to measure term-recognition F1; a score above 0.85 is the recommended threshold (a minimal F1 sketch is given further below).
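As an illustration of the two-stage schedule above, here is a minimal PyTorch-style sketch. It assumes a generic model whose encoder exposes an indexable encoder.layers list of 12 transformer blocks and a user-supplied train_one_epoch callback; in a real ESPnet recipe these settings are driven by config.yaml and the trainer rather than hand-written loops, and the stage-1 learning rate (not specified above) simply reuses 1e-4 as a placeholder.

```python
import torch

def set_encoder_freeze(model, freeze_up_to=9):
    """Freeze encoder layers 0..freeze_up_to (inclusive); leave later layers trainable."""
    for idx, layer in enumerate(model.encoder.layers):
        trainable = idx > freeze_up_to
        for p in layer.parameters():
            p.requires_grad = trainable

def two_stage_finetune(model, train_one_epoch):
    # Stage 1: first 5 epochs, update only the last 3 of 12 encoder layers
    # (equivalent to freeze_layers: 0-9 in config.yaml).
    set_encoder_freeze(model, freeze_up_to=9)
    opt = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )
    for _ in range(5):
        train_one_epoch(model, opt)

    # Stage 2: final 10 epochs, unfreeze everything and train all parameters
    # with lr: 0.0001.
    for p in model.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(10):
        train_one_epoch(model, opt)
```

Keeping layers 0-9 frozen in the first stage leaves roughly the lower three quarters of the encoder untouched, which is what preserves the base model's general recognition ability before full-parameter training begins.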
This approach improves term-recognition accuracy by 62% in legal-document scenarios.
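To make the validation criterion concrete, the sketch below shows one way a term-recognition F1 can be computed from reference and decoded transcripts against a domain term list. This is not the espnet2/bin/validate.py script mentioned above; the domain_terms list and the example sentences are illustrative placeholders.

```python
def count_term_hits(text, terms):
    """Count occurrences of each domain term in a transcript."""
    return {t: text.count(t) for t in terms}

def term_f1(refs, hyps, terms):
    """Micro-averaged precision/recall/F1 over domain-term occurrences."""
    tp = fp = fn = 0
    for ref, hyp in zip(refs, hyps):
        ref_counts = count_term_hits(ref, terms)
        hyp_counts = count_term_hits(hyp, terms)
        for t in terms:
            r, h = ref_counts[t], hyp_counts[t]
            tp += min(r, h)
            fp += max(h - r, 0)
            fn += max(r - h, 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    # Placeholder inputs: in practice these come from the reference transcripts
    # and the decoded hypotheses of the domain test set.
    domain_terms = ["诊断", "处方"]          # illustrative medical terms
    refs = ["医生诊断:患者需要处方药物"]
    hyps = ["医生诊断:患者需要处方药物"]
    p, r, f1 = term_f1(refs, hyps, domain_terms)
    print(f"term F1 = {f1:.3f} (recommended threshold: > 0.85)")
```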
This answer is based on the article "OpusLM_7B_Anneal: an efficient unified model for speech recognition and synthesis".