Audibit utilizes a dual-engine parallel strategy to guarantee audio quality:
- OpenAI engine: Provides a smooth voice that is close to a real person's voice, with support for tone control and emotional expression.
- Lemonfox engine: Focuses on accurate pronunciation of technical terms, which suits tech content in particular.
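The article does not describe how the two engines are coordinated. As a minimal sketch, assuming both engines are called concurrently and a failure in one does not block the other, the pattern might look like the following. The functions `synthesizeWithOpenAI` and `synthesizeWithLemonfox` are hypothetical stubs for illustration, not real provider API calls.

```javascript
// Hypothetical stubs standing in for the two TTS engines.
// A real implementation would call each provider's API here.
async function synthesizeWithOpenAI(text) {
  return { engine: "openai", audio: `audio(${text.length} chars)` };
}

async function synthesizeWithLemonfox(text) {
  return { engine: "lemonfox", audio: `audio(${text.length} chars)` };
}

// Run all engines concurrently and keep the results that succeeded.
// Promise.allSettled never rejects, so one failing engine cannot
// discard the output of the other.
async function synthesizeInParallel(
  text,
  engines = [synthesizeWithOpenAI, synthesizeWithLemonfox]
) {
  const settled = await Promise.allSettled(engines.map((run) => run(text)));
  const results = settled
    .filter((outcome) => outcome.status === "fulfilled")
    .map((outcome) => outcome.value);
  if (results.length === 0) {
    throw new Error("All TTS engines failed");
  }
  return results;
}
```

Calling `synthesizeInParallel(articleText)` resolves to an array with one entry per engine that succeeded, in the order the engines were listed. How Audibit chooses between the two outputs is not stated in the source.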
In testing, converting a 3,000-word technical article produced the following results:
- Average generation time: about 90 seconds (varies with article length)
- Audio sampling rate: 44.1 kHz (CD quality)
- Background noise: below -60 dB
For language support, the current version automatically recognizes the following languages:
- English (American/British pronunciation optional)
- Simplified Chinese
- dictionary
- Spanish
Note that the system currently determines the language from article metadata, and users can also manually set their preferred speech settings in config/tts.js. Future iterations plan to use a language detection model for more accurate automatic matching.
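The article does not show the contents of config/tts.js. As a rough sketch of how a manual preference could take priority over metadata, the example below assumes a `preferredLanguage` key, a `fallbackLanguage` key, and a `lang` field on the article metadata. All three names are hypothetical, and the language tags cover only the languages clearly named in the list above.

```javascript
// Language tags for the languages named in the article (assumed tag format).
const SUPPORTED_LANGUAGES = ["en-US", "en-GB", "zh-CN", "es"];

// Hypothetical shape for config/tts.js; these keys are assumptions.
const ttsConfig = {
  preferredLanguage: null, // set to e.g. "es" to override article metadata
  fallbackLanguage: "en-US", // used when nothing else matches
};

// Pick the first supported language, checking in priority order:
// manual preference, then article metadata, then the fallback.
function resolveLanguage(metadata, config = ttsConfig) {
  const candidates = [
    config.preferredLanguage,
    metadata && metadata.lang, // hypothetical metadata field
    config.fallbackLanguage,
  ];
  return candidates.find((tag) => SUPPORTED_LANGUAGES.includes(tag));
}
```

With the default config, `resolveLanguage({ lang: "zh-CN" })` returns `"zh-CN"`, while an unsupported or missing metadata language falls back to `"en-US"`.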
This answer comes from the article "Audibit: turning popular tech articles into ready-to-listen audio podcasts".