Dolphin is an open source speech recognition model jointly developed by DataoceanAI and Tsinghua University, focusing on versatile processing of Asian languages. Its core features include:
- Wide language coverageSupport for 40 Asian languages and 22 Chinese dialects, covering East Asia, South Asia, Southeast Asia and the Middle East.
- multitasking: Integrated speech-to-text (ASR), voice activity detection (VAD), audio segmentation, and language identification (LID) capabilities
- Strong data base: trained on over 210,000 hours of proprietary and publicly available audio data
- Architecture Innovation: A hybrid CTC-Attention architecture is used, with the encoder using E-Branchformer and the decoder using Transformer
- Dual-layer marking system: Precise differentiation of dialect variants by (e.g. )
The project has been open-sourced in GitHub, providing two model specifications, base (140M parameters) and small (372M parameters), taking into account the processing speed and recognition accuracy requirements.
This answer comes from the articleDolphin: Asian Language Recognition and Speech-to-Text Modeling for Asian LanguagesThe




























