CosyVoice is an open source multilingual speech generation model that focuses on high-quality text-to-speech (TTS). It supports speech synthesis in multiple languages and provides features such as zero-shot speech generation, cross-lingual voice cloning, and fine-grained emotion control. Compared with the previous version, CosyVoice 2.0 significantly...
Qwen-TTS is a text-to-speech (TTS) tool developed by the Alibaba Cloud Qwen team and provided through the Qwen API. Trained on a large-scale speech dataset, it produces natural, expressive speech and automatically adjusts intonation, speech rate, and emotion. Qwen-TTS supports Mandarin, English...
Kyutai Labs' delayed-streams-modeling project is an open source speech-to-text conversion framework built on Delayed Streams Modeling (DSM). It supports real-time speech-to-text (STT) and text-to-speech (TTS), making it suitable for building efficient voice interaction applications. The project provides p...
AIVocal is a free AI audio processing platform that provides text-to-speech (TTS), speech-to-text (STT), vocal separation, and podcast generation. It can be used without registration and supports 24 languages and more than 900 natural-sounding voices, making it suitable for producing podcasts, audiobooks, video dubbing, and so on....
SuperMaker AI is a free online authoring platform that helps users quickly generate high-quality video, music, image and voice content. Users can try out the core features without logging in, and it's easy to use for individual creators and small teams. The platform uses artificial intelligence technology to create text, images or creative...
Muyan-TTS is an open source text-to-speech (TTS) model designed for podcasting scenarios. It is pre-trained on over 100,000 hours of podcast audio data and supports zero-shot speech synthesis to generate high-quality natural speech. The model is built on Llama-3.2-3B, and combined with the SoVITS decoder, it provides high...
Kimi-Audio is an open source audio foundation model developed by Moonshot AI that focuses on audio understanding, generation, and dialog. It supports a variety of audio processing tasks such as speech recognition, audio Q&A, and speech emotion recognition. The model has been pre-trained on over 13 million hours of audio data, combined with innovative...
Audibit is an open source project whose core function is to automatically turn popular technology articles from Hacker News, TechCrunch, and other sources into audio podcasts, so that users can listen to the news on the web or on mobile while commuting, working out, or otherwise busy. The project uses Next.js and React for the front end, combined with ...
Dia is an open source text-to-speech (TTS) model developed by Nari Labs that focuses on generating hyper-realistic conversational audio. It transforms text scripts into realistic multi-character dialog in a single pass, supports emotion and intonation control, and even generates non-verbal expressions such as laughter. At the heart of Dia ...
Orpheus-TTS is an open source text-to-speech (TTS) system built on the Llama-3b architecture with the goal of generating audio close to natural human speech. It was released by the Canopy AI team and supports multiple languages such as English, Spanish, French, German, Italian, Portuguese and Chinese...
ElevenLabs MCP is an official ElevenLabs open source project hosted on GitHub. It is a server tool based on the Model Context Protocol (MCP), designed to connect AI models and ElevenLab...
Vapi is a voice AI platform for developers. It enables users to build, test, and deploy voice AI assistants in minutes, addressing the long-standing problem that voice applications are time-consuming to develop and difficult to scale. Vapi provides complete tools and infrastructure to support real-time conversations, telephony integrations, and multi-platform deployments...
MiniMax Audio is an AI speech generation tool from MiniMax whose core feature is quickly converting text into natural-sounding speech with high voice similarity. It is based on the Speech-02 model, with a speech synthesis similarity of up to 99%, studio-grade sound quality, and support for more than 30 languages and a wide range of mouth...
Text2Voice is an open source tool that provides text-to-speech functionality based on the SiliconFlow API; its most notable feature is a clean graphical user interface (GUI). It was created by developer Sheldon Lee on GitHub to let users easily turn text into speech through that interface. The project uses Py...
Open-VoiceCanvas is an open source speech synthesis platform developed by the ItusiAI team. It supports more than 50 languages, and can convert text to natural speech, as well as clone personalized voices by uploading audio. The project integrates OpenAI TTS, AWS Polly and MiniM...
Paper to Podcast is an open source tool that specializes in turning academic research papers into lively and entertaining podcasts. It makes complex academic content easy to understand by using artificial intelligence technology to turn a PDF-formatted paper into a conversation between three characters - the host, the learner, and the expert. This project was developed by...
MegaTTS3 is an open source speech synthesis tool developed by ByteDance in cooperation with Zhejiang University, focusing on generating high-quality Chinese and English speech. Its core model has only 0.45B parameters, making it lightweight and efficient, and it supports mixed Chinese-English speech generation and voice cloning. The project is hosted on GitHub, providing code and...
Podcastle is an AI-based online platform that helps users quickly create and edit high-quality podcasts. It integrates recording, editing, and publishing features, so users can do everything in a browser without specialized equipment or complex software. The platform utilizes AI technology to provide noise reduction...
IndexTTS is an open source text-to-speech (TTS) tool hosted on GitHub and developed by the index-tts team. It is based on XTTS and Tortoise technologies, and provides efficient, high-quality speech synthesis through improved module design. IndexTTS uses tens of thousands of hours...