Smart Dictation is a powerful macOS app that utilizes advanced AI technology to help users easily convert audio recordings into text. The app integrates OpenAI's latest GPT-4o and Whisper models to provide accurate transcription, translation and summarization services. Whether you are memorizing .....
Voquill is an AI tool that runs in Chrome and lets users dictate by voice instead of typing on any website. When you're writing an email, replying to a chat message, or editing a document, you can just speak and Voquill will convert your voice into text in real time. In addition to basic voice listening...
Grabcube is a free audio and video processing tool that specializes in video and audio downloads, AI speech-to-text, and subtitle translation and editing. It supports more than 1,000 mainstream platforms, including YouTube, Bilibili and Vimeo, and allows users to download video and audio files in multiple formats without limitations. Grabcu....
Recap is an open source tool designed for macOS to help users quickly record, transcribe and summarize meeting audio. It handles all the data locally without uploading it to the cloud, protecting user privacy. Developer Rawand Ahmad built Recap to address the difficulty of focusing on discussion and recording at the same time in a meeting...
Whisper_Cloudflare is an open source project created by developer thun888 and hosted on GitHub. It is based on OpenAI's Whisper model and combines the serverless architecture of Cloudflare Workers to provide highly efficient speech-to-text...
Spokenly is a speech-to-text tool for macOS, designed to help users enter text quickly by voice and improve work efficiency. It uses AI models such as Whisper and GPT-4o to convert speech to text in real time, supports over 100 languages, and is suitable for a wide range of scenarios. ....
OpusLM_7B_Anneal is an open source speech processing model developed by the ESPnet team and hosted on the Hugging Face platform. It focuses on a variety of tasks such as speech recognition, text-to-speech, speech translation and speech enhancement, and is suitable for researchers and developers to experiment and apply in the field of speech processing. The model .....
OpenWispr is an open source desktop speech-to-text application based on OpenAI Whisper technology that quickly converts user speech to text. It offers local and cloud processing options and emphasizes privacy: data can stay entirely on the local machine. Users can quickly start dictation via global hotkeys, and the text automatically sticks...
vosk-browser is a speech recognition tool that runs in the browser, built on WebAssembly technology, using the Vosk speech recognition library. It supports processing microphone input or audio files directly in the browser, providing offline speech-to-text functionality without relying on cloud servers. The tool supports ...
Any2Text is a free online tool focused on converting audio and video files to text quickly. It utilizes advanced AI speech recognition technology, supports over 100 languages, and is suitable for a variety of scenarios such as meeting recording, podcast transcription and subtitle generation. Users don't need to register to use it, and it is easy to operate on...
Whisper App is a free and open source tool that allows users to record notes by voice and use AI technology to convert the voice to text, generating content such as lists, blogs or tasks. Developed by Nutlope and hosted on GitHub, the project is based on the Whisper model served through Together.ai...
Voxtral is the first open audio model from French AI startup Mistral AI, released on July 15, 2025. Voxtral aims to give commercial applications production-ready speech understanding out of the box, at a highly competitive price. The Voxtral model is available in two versions for ....
SimpleListenJournal is an audio/video-to-text tool from Baidu that quickly converts voice or video content to text and provides AI-powered analysis. Users can upload audio or video, or enter text, to get high-precision transcription results and automatic summarization. The platform supports multiple languages for...
Tencent Meeting AI Assistant Pro is an intelligent meeting assistance tool launched by Tencent, aiming to improve the efficiency and convenience of online meetings. It analyzes meeting content in real time through artificial intelligence technology, provides personalized reminders, summarizes key information and generates to-do lists, helping users focus on discussions and not miss key...
Flash Notes is a smart note-taking tool launched by Nail, designed to help users quickly record, organize and share information. It supports a variety of recording methods such as voice, text and pictures, making it suitable for individuals and teams who need to manage notes efficiently in work, study or life. Flash Notes converts voice to text through intelligent technology and automatically...
Kyutai Labs' delayed-streams-modeling project is an open source speech-to-text framework with Delayed Stream Modeling (DSM) technology at its core. It supports real-time speech-to-text (STT) and text-to-speech (TTS) functions, making it suitable for building efficient voice interaction applications. The project provides p...
Very Fast Dictation is an open source speech-to-text tool designed for Mac users. It uses fast speech recognition technology to convert what the user says into text in real time, and is suitable for any scenario that requires text input. The project is hosted on GitHub, developed by Avi Aryan, and uses...
Simple Subtitling is an open source audio subtitle generation tool that focuses on automatically generating subtitles and labeling speakers for video or audio files. The project, developed by Jaesung Huh and hosted on GitHub, aims to provide a simple and efficient subtitle generation solution. Through audio processing technology, the tool .....
Abogen is an open source tool designed to quickly convert ePub, PDF or plain text files to high quality audio. It uses the Kokoro-82M model to generate natural and smooth speech, and also supports synchronized subtitle generation, making it suitable for audiobooks, video dubbing or learning aids. Users can choose...