Muyan-TTS is an open source text-to-speech (TTS) model designed for podcasting scenarios. It is pre-trained on over 100,000 hours of podcast audio data and supports zero-shot speech synthesis to generate high-quality natural speech. The model is built on Llama-3.2-3B, and combined with the SoVITS decoder, it provides high...
Kimi-Audio is an open source audio base model developed by Moonshot AI that focuses on audio understanding, generation and dialog. It supports a variety of audio processing tasks such as speech recognition, audio Q&A, and speech emotion recognition. The model has been pre-trained with over 13 million hours of audio data, combined with innovative...
Audibit is an open source project whose core function is to automatically turn popular technology articles from Hacker News, TechCrunch and other sources into audio podcasts, so that users can listen to the news on the web or on mobile while commuting, exercising, or otherwise busy. The project uses Next.js and React for the front end, combined with ...
Dia is an open source text-to-speech (TTS) model developed by Nari Labs that focuses on generating hyper-realistic conversational audio. It transforms text scripts into realistic multi-character dialog in a single pass, supports emotion and intonation control, and even generates non-verbal expressions such as laughter. At the heart of Dia ...
Orpheus-TTS is an open source text-to-speech (TTS) system developed on the Llama-3b architecture with the goal of generating audio close to natural human speech. It was released by the Canopy AI team and supports multiple languages such as English, Spanish, French, German, Italian, Portuguese and Chinese...
ElevenLabs MCP is an official ElevenLabs open source project hosted on GitHub. It is a server tool based on the Model Context Protocol (MCP), designed to connect AI models and ElevenLab...
Vapi is a voice AI platform for developers. It enables users to build, test and deploy voice AI assistants in minutes, addressing the traditional problem that voice application development is time-consuming and difficult to scale. Vapi provides complete tools and infrastructure to support real-time conversations, telephony integrations and multi-platform deployments....
MiniMax Audio is an AI speech generation tool from MiniMax, with the core feature of quickly converting text into highly similar natural speech. It is based on the Speech-02 model, with a speech synthesis similarity of up to 99%, studio-grade sound quality, and support for more than 30 languages and a wide range of mouth...
Text2Voice is an open source tool that provides text-to-speech functionality based on the SiliconFlow API, and its most distinctive feature is a clean graphical user interface (GUI). It was created by developer Sheldon Lee on GitHub to allow users to easily turn text into speech through that interface. The project uses Py...
Open-VoiceCanvas is an open source speech synthesis platform developed by the ItusiAI team. It supports more than 50 languages, and can convert text to natural speech, as well as clone personalized voices by uploading audio. The project integrates OpenAI TTS, AWS Polly and MiniM...
Paper to Podcast is an open source tool that specializes in turning academic research papers into lively and entertaining podcasts. It makes complex academic content easy to understand by using artificial intelligence technology to turn a PDF-formatted paper into a conversation between three characters - the host, the learner, and the expert. This project was developed by...
MegaTTS3 is an open source speech synthesis tool developed by ByteDance in cooperation with Zhejiang University, focusing on generating high-quality Chinese and English speech. Its core model has only 0.45B parameters, making it lightweight and efficient, and it supports mixed Chinese-English speech generation and voice cloning. The project is hosted on GitHub, providing code and...
Podcastle is an AI-based online platform that specializes in helping users quickly create and edit high-quality podcasts. It integrates recording, editing, and publishing features, and users can do it all through a browser without specialized equipment or complex software. The platform uses AI technology to provide noise reduction...
IndexTTS is an open source text-to-speech (TTS) tool hosted on GitHub and developed by the index-tts team. It is based on XTTS and Tortoise technologies, and provides efficient and high-quality speech synthesis through improved module design. IndexTTS uses tens of thousands of hours...
csm-mlx is built on Apple's MLX framework and optimizes the CSM (Conversational Speech Model) specifically for Apple Silicon. This project allows users to run efficient speech generation on Apple devices in a simple way and...
Autiobooks is an open source tool designed to help users quickly convert eBooks in .epub format to audiobooks in .m4b format. It uses high quality speech synthesis technology provided by Kokoro to produce natural and smooth audio. The tool was developed by David Nesbitt and follows the MIT ...
PlayHT is an efficient online platform focusing on AI speech generation, helping users quickly convert text into natural, realistic speech. It provides more than 600 AI voices, supports more than 60 languages and diverse accents, and is suitable for a variety of scenarios such as podcast production, educational content, marketing and promotion. Users only need to input...
MLX-Audio is an open source tool developed on Apple's MLX framework, focusing on text-to-speech (TTS) and speech-to-speech (STS) capabilities. It leverages the powerful computing capabilities of Apple Silicon (e.g. M-series chips) to provide efficient and fast speech synthesis solutions. Whether ...
Spark-TTS is an open source Text-to-Speech (TTS) tool developed by the SparkAudio team, hosted on GitHub, designed to help users efficiently convert text into natural and smooth speech. It is based on advanced deep learning technology and supports multiple languages and voice styles...