Voxtral is the first open audio model from French AI startup Mistral AI, released on July 15, 2025. It aims to give commercial applications production-ready speech understanding out of the box, at a highly competitive price. The Voxtral model is available in two versions for ....
SimpleListenJournal is an audio/video-to-text tool from Baidu that focuses on quickly converting voice or video content to text and provides AI-powered analysis. Users can upload audio or video files, or enter text, to get high-precision transcriptions and automatic summaries. The platform supports multiple languages for...
Tencent Meeting AI Assistant Pro is an intelligent meeting assistant from Tencent, designed to make online meetings more efficient and convenient. It uses AI to analyze meeting content in real time, provides personalized reminders, summarizes key information and generates to-do lists, helping users stay focused on the discussion without missing key...
Flash Notes is a smart note-taking tool from DingTalk, designed to help users quickly record, organize and share information. It supports several input methods, including voice, text and images, making it suitable for individuals and teams who need to manage notes efficiently at work, in study or in daily life. Flash Notes converts voice to text and automatically...
Kyutai Labs' delayed-streams-modeling project is an open-source speech framework built around Delayed Stream Modeling (DSM). It supports both real-time speech-to-text (STT) and text-to-speech (TTS), making it suitable for building efficient voice interaction applications. The project provides p...
Very Fast Dictation is an open-source speech-to-text tool for Mac users. It uses fast speech recognition to convert what the user says into text in real time, wherever text input is needed. The project is hosted on GitHub, developed by Avi Aryan, and uses...
Simple Subtitling is an open-source subtitle generation tool that automatically generates subtitles and labels speakers for video or audio files. The project, developed by Jaesung Huh and hosted on GitHub, aims to provide a simple and efficient subtitle generation solution. Using audio processing technology, the tool .....
Abogen is an open-source tool for quickly converting ePub, PDF or plain text files into high-quality audio. It uses the Kokoro-82M model to generate natural, fluent speech and can also generate synchronized subtitles, making it suitable for audiobooks, video dubbing or learning aids. Users can choose...
Kimi-Audio is an open-source audio foundation model from Moonshot AI that focuses on audio understanding, generation and dialog. It supports a variety of audio tasks such as speech recognition, audio question answering and speech emotion recognition. The model was pre-trained on more than 13 million hours of audio data, combined with innovative...
On-Device AI is a fully offline AI app for Apple devices, supporting iOS, macOS and visionOS. It runs large language models (LLMs) locally and offers real-time speech transcription, document analysis and other features, all without an internet connection, which keeps user data private. Users can voice...
Vexa is an open-source real-time meeting transcription and knowledge management platform that provides meeting recording and knowledge extraction for businesses and individuals. It uses API-driven meeting bots to join Google Meet, Zoom and other platforms automatically, transcribes speech to text in real time, and supports 99...
realtime-transcription-fastrtc is an open-source project for converting speech to text in real time. It uses FastRTC to handle low-latency audio streaming and a locally run Whisper model for efficient speech recognition. The project is maintained by developer sofi444, tor...
Transkriptor is an AI-driven transcription tool that focuses on converting audio and video to text quickly. It supports more than 100 languages with an accuracy rate of up to 99% and suits a wide range of scenarios such as meetings, interviews and lecture notes. Users can upload files, record directly or transcribe via links to Zoom, Go...
Otter.ai is an AI-powered meeting management and voice transcription tool whose core function is converting speech to text in real time and automatically generating meeting notes, summaries and action items. Its AI Meeting Agent automatically joins meetings on platforms such as Zoom and Google Meet, capturing...
TurboScribe is an AI-based transcription tool that focuses on quickly converting audio and video to text. It supports more than 98 languages with an accuracy rate of 99.8% and targets users who need to process voice content efficiently. Users can upload files to generate transcripts or subtitles; the tool is easy to use and fast...
Aqua Voice is an intelligent speech-to-text tool focused on quickly converting user speech into formatted text. Aqua Voice was founded in 2023 by Finnian Brown and Jack McIntire, is based in San Francisco, USA, and is part of Y Combinator W24 ...
Dolphin is an open-source model developed by DataoceanAI in collaboration with Tsinghua University, focusing on speech recognition and language identification for Asian languages. It supports 40 languages from East Asia, South Asia, Southeast Asia and the Middle East, as well as 22 Chinese dialects. The model was trained on more than 210,000 hours of audio data...
TwinMind is a smart tool developed by ThirdEar AI, Inc. that "helps you remember everything". It can record conversations, meetings or lectures in real time and transcribe them in more than 100 languages, and it works offline, even with your phone in your pocket. Users don't need to take notes themselves; TwinMind will...
Wispr Flow is a voice-driven text input tool that helps users write quickly on their computers. Billed as "3x faster than typing", it lets users enter text into any application, such as Word, Slack or Gmail, simply by speaking naturally. Wispr Flow supports more than 100 languages....