The following steps are required to fine-tune the MOSS-TTSD:
- Preparing the dataset: Organized in JSON format, containing the text of the conversation and the corresponding audio, ensuring data quality (e.g., sample rate, clarity).
- Selecting the fine-tuning method: Supports full model fine-tuning or low resource requirement LoRA fine-tuning (required)
lora_config(Configuration file). - Running Scripts: Implementation
python finetune/finetune.py, specify the model path, data directory, output path and training configuration. - Verification results: Test the generation of fine-tuned models by iteratively optimizing the dataset or adjusting hyperparameters.
Note: Full-model fine-tuning requires high computational resources and GPUs are recommended; LoRA fine-tuning is more suitable for resource-limited scenarios.
This answer comes from the articleMOSS-TTSD: An Open Source Bilingual Dialog Speech Generation ToolThe































