RealtimeVoiceChat
RealtimeVoiceChat is an open source project for real-time, natural voice conversations with AI. The user speaks into a microphone; the system captures the audio through the browser, quickly converts it to text, generates a reply with a large language model (LLM), and then converts that reply to speech output, the whole...
Transkriptor
Transkriptor is an AI-driven transcription tool that focuses on converting audio and video to text quickly. It supports over 100 languages with an accuracy rate of up to 99% and is suitable for a wide range of scenarios such as meetings, interviews, classroom notes and more. Users can upload files, record directly or transcribe via links to Zoom, Go...
Conch Speech (MiniMax Audio): AI tool for generating natural speech
MiniMax Audio is an AI speech generation tool from MiniMax whose core feature is quickly converting text into natural, lifelike speech. It is based on the Speech-02 model, with a speech synthesis similarity of up to 99%, studio-grade sound quality, and support for more than 30 languages and a wide range of mouth...
TwinMind
TwinMind is a smart tool developed by ThirdEar AI, Inc. that "remembers everything for you". It can record conversations, meetings or lectures in real time and convert them to text in more than 100 languages, and it works offline, even with your phone in your pocket. Users don't have to take notes themselves; TwinMind will...
OpenAI Realtime Agents
OpenAI Realtime Agents is an open source project that shows how OpenAI's Realtime API can be used to build multi-agent voice applications. It provides a high-level agent model (borrowed from OpenAI Swarm) that lets developers build complex multi-agent voice systems in a short period of time. The project ...
Bailing
Bailing is an open-source voice conversation assistant designed to hold natural spoken conversations with users. The project combines automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) to implement a voice conversation robot similar to GPT-4o...
"Always-On" Deepseek AI Assistant: Building an Intelligent Voice Interaction System Based on Deepseek-V3
Always-On AI Assistant is an AI assistant project that builds a powerful, always-online assistant system by integrating Deepseek-V3, RealtimeSTT and Typer. The project is optimized in particular for engineering development scenarios, providing a complete...
Xiaozhi AI Chatbot
Xiaozhi AI Chatbot is an open source project based on the ESP32 development board, designed to help users build their own AI chat companion. The project is developed by Shrimp and is intended mainly for teaching, helping more people get started with AI hardware development and learn how to apply large language models to real hardware devices. Project ...
Fish Agent
Fish Agent, a derivative project of Fish Speech, is an end-to-end AI voice cloning system built on the V0.1 3B model architecture. Its most important feature is a semantic-token-free architecture, which does not need to rely on traditional models such as Whisper .....
Voice-Pro
Voice-Pro is a multifunctional tool built on the Gradio WebUI that supports speech-to-text, text-to-speech, real-time translation, YouTube video downloads and vocal separation. It integrates Whisper, Faster-Whisper and Whisper-Timestamp...
Ichigo (llama3-s)
Ichigo is an open source real-time speech AI project that aims to extend text-based language models with native "listening" capabilities. The project uses early fusion techniques inspired by Meta's Chameleon paper. Ichigo's goal is to become an open-data, open-weight, on-device speech...
AI Hear
If you're using a MacBook, try AI Hear: it can record audio, transcribe speech to text locally in real time, translate it, and export the result as subtitles. You can use it to follow international meetings and English audiobooks. AI Hear is locally run software that provides one-click real-time translation and transcription in multiple languages....
Fukumaru Chione
Fukumaru Chione is a multilingual AI voice synthesis platform that generates realistic, natural-sounding speech. Users can easily convert text into professional-grade audio and create their own custom AI voices (zero-shot voice clones) to meet personalized needs. The platform also provides a video translation function to help users realize...
Tongyi Listening and Understanding: Alibaba's Tongyi AI Assistant for Audio and Video Transcription
Tongyi Listening and Understanding is an AI assistant for work and study launched by Alibaba Cloud, focused on transcribing and analyzing audio and video content. It relies on Alibaba Cloud's AI models to transcribe audio and video into text in real time, and provides translation, summarization, content locating and other functions. Tongyi Listening and Understanding supports multiple languages and scenarios to help users...
Tencent Smartfilm
Tencent Smartfilm is an online intelligent video creation platform launched by Tencent. Using cloud-based AI tools, it supports text-to-speech dubbing, digital human presenters, automatic subtitle recognition and other functions. It integrates material search, video editing, rendering, exporting and publishing, bringing users convenient video editing and...