VibeVoice-1.5B is a cutting-edge open-source Text-to-Speech (TTS) model released by Microsoft Research. It is specifically designed for generating expressive, long-form, multi-character dialog audio, such as podcasts or audiobooks. The core innovation of VibeVoice is its use of a 7...
Kitten-TTS-Server is an open source project that provides a feature-enhanced server for the lightweight KittenTTS model. Users can use this project to build their own text-to-speech (TTS) service. The core advantage of this project is that it is based on the original model, adding a ...
KittenTTS is an open source text-to-speech (TTS) model focused on being lightweight and efficient. It takes up less than 25MB of storage, has about 15 million parameters, and runs on low-end devices without GPU support. Developed by the KittenML team, KittenTTS offers multiple...
OpusLM_7B_Anneal is an open source speech processing model developed by the ESPnet team and hosted on the Hugging Face platform. It covers a variety of tasks such as speech recognition, text-to-speech, speech translation and speech enhancement, and is suitable for researchers and developers who want to experiment with and build applications in the field of speech processing. The model .....
MOSS-TTSD is an open source dialog speech generation model that supports both Chinese and English. It can convert two-person dialog text into natural and expressive speech, suitable for AI podcast production, language research and other scenarios. The model is based on low-bitrate coding technology and supports zero-shot two-speaker voice cloning and...
FineShare is a platform focused on AI audio and video technology, offering a variety of tools to help users create high-quality voice, music and video content. The site's core products include FineVoice, Singify, and FineCam for speech generation and conversion, AI music creation, and virtual camera...
Xunfei Zhizuo is a platform developed by Xunfei to provide artificial intelligence content creation services. Its core function is to convert user-entered text into speech, a process often referred to as "AI dubbing" or "speech synthesis". Users can choose from a variety of preset virtual voices (i.e. "anchors")...
ListenHub is a platform that utilizes artificial intelligence technology to quickly turn web pages, documents or user input into podcasts. It supports Chinese and English speech synthesis, and users can generate natural and smooth podcast audio by simply uploading a file, typing a topic or pasting a link. The platform is easy to operate and suitable for mobile use...
Higgs Audio is an open source text-to-speech (TTS) project developed by Boson AI, focused on generating high-quality, emotionally rich speech and multi-character dialog. The project was trained on over 10 million hours of audio data and supports zero-shot voice cloning, natural dialog generation and multilingual speech output....
Parrot TTS is a Chrome extension designed to convert web text into natural speech. It uses advanced AI technology to provide a near-human voice experience, solving the problem of traditional text-to-speech tools sounding mechanical. Users can convert articles, news or research materials in one click...
AIdeaFlow Podcast is an AI-based podcast generation platform that allows users to quickly transform text content into high-quality podcast audio. It supports multiple languages and over 120 unique voices for students, professionals and content creators. Users simply enter text or upload a script,...
CosyVoice is an open source multilingual speech generation model that focuses on high-quality text-to-speech (TTS) technology. It supports speech synthesis in multiple languages, providing features such as zero-shot speech generation, cross-language voice cloning, and fine-grained emotion control. Compared to the previous version, CosyVoice 2.0 significantly...
Qwen-TTS is a text-to-speech (TTS) tool developed by the Alibaba Cloud Qwen team and provided through the Qwen API. It is trained on a large-scale speech dataset, with a natural and expressive voice output that automatically adjusts intonation, speech rate, and emotion. Qwen-TTS supports Mandarin, English...
Kyutai Labs' delayed-streams-modeling project is an open source speech-to-text conversion framework built on Delayed Stream Modeling (DSM) technology. It supports real-time speech-to-text (STT) and text-to-speech (TTS) functions, suitable for building efficient voice interaction applications. The project provides p...
AIVocal is a free AI audio processing platform that provides text-to-speech (TTS), speech-to-text (STT), vocal separation and podcast generation. Users can use it without registration, and it supports 24 languages and more than 900 natural voices, which makes it suitable for producing podcasts, audiobooks, video dubbing and so on....
SuperMaker AI is a free online authoring platform that helps users quickly generate high-quality video, music, image and voice content. Users can try out the core features without logging in, and it's easy to use for individual creators and small teams. The platform uses artificial intelligence technology to create text, images or creative...
Muyan-TTS is an open source text-to-speech (TTS) model designed for podcasting scenarios. It is pre-trained with over 100,000 hours of podcast audio data and supports zero-shot speech synthesis to generate high-quality natural speech. The model is built on Llama-3.2-3B, and combined with the SoVITS decoder, it provides high...
Kimi-Audio is an open source audio foundation model developed by Moonshot AI that focuses on audio understanding, generation and dialog. It supports a variety of audio processing tasks such as speech recognition, audio Q&A, and speech emotion recognition. The model has been pre-trained with over 13 million hours of audio data, combined with innovative...
Audibit is an open source project whose core function is to automatically turn popular technology articles from Hacker News, TechCrunch and other sources into audio podcasts, so that users can listen to the news on the web or on mobile while commuting, exercising, or otherwise busy. The project uses Next.js and React to develop the front end, combined with ...