vosk-browser is a speech recognition tool that runs entirely in the browser. Built on WebAssembly and the Vosk speech recognition library, it can process microphone input or audio files directly in the page, providing offline speech-to-text without relying on cloud servers. It supports 13 languages, including English, German, French, and Spanish, and is aimed at developers who need speech recognition in the browser. vosk-browser runs inside a Web Worker, which keeps recognition off the main browser thread and avoids blocking the page. A model can be loaded and recognition started with a few lines of JavaScript, making it suitable for scenarios such as chatbots, smart home control, or subtitle generation. Developed by Ciaran O'Reilly, the project is hosted on GitHub, has more than 450 stars, and has an active community.
Feature List
- Offline speech recognition: processes voice input in the browser without an internet connection, protecting user privacy.
- Multi-language support: 13 languages are supported, including English, German, French, and Spanish, and the set of models is extensible.
- Microphone and audio file input: can process live microphone input or uploaded audio files.
- WebAssembly optimization: runs via WebAssembly inside a Web Worker, keeping recognition efficient without blocking the browser.
- Simple integration: installable via CDN or npm for quick embedding in web pages.
- Real-time results output: provides both final and partial recognition results, suitable for interactive scenarios.
- Flexible model loading: supports loading different language models, which are distributed in a compressed format and are relatively small.
Usage Guide
Installation Process
vosk-browser is easy to integrate, and developers can add it to a web page in the following ways:
- Include via CDN:
  Add the following tag to the HTML file to load the vosk-browser library into the page:
  ```html
  <script type="application/javascript" src="https://cdn.jsdelivr.net/npm/vosk-browser@0.0.5/dist/vosk.js"></script>
  ```
  Once loaded, the `Vosk` global variable is available in JavaScript. The snippet above pins version 0.0.5; the latest release is 0.0.8, so check npm or jsDelivr for the current version.
- Install via npm:
  If you are using modular development, install it via npm:
  ```bash
  npm install vosk-browser
  ```
  Then import it in a JavaScript file:
  ```javascript
  import * as Vosk from 'vosk-browser';
  ```
- Download a language model:
  vosk-browser needs to load a language model file (in `.tar.gz` format), for example `vosk-model-small-en-us-0.15.tar.gz`. These models can be downloaded from the officially provided links (e.g. https://ccoreilly.github.io/vosk-browser/models/). A model archive contains the configuration files and data required for speech recognition, such as `mfcc.conf` and `model.conf`; high-accuracy models use `mfcc_hires.conf`. Place the model file in the same path as the script, or specify the model's URL.
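As an illustration of the "specify the model's URL" option, the sketch below loads a model straight from a URL using the same `Vosk.createModel` call used in the usage steps below. The exact file URL is assumed for illustration (it combines the models page above with the example filename); in practice, hosting the model next to your own page is more reliable.

```javascript
// Sketch: load a model directly by URL instead of a local copy.
// The URL is assumed for illustration; self-hosting the .tar.gz archive
// alongside your page avoids cross-origin and availability issues.
const MODEL_URL =
  'https://ccoreilly.github.io/vosk-browser/models/vosk-model-small-en-us-0.15.tar.gz';

async function loadRemoteModel() {
  // Vosk.createModel downloads and unpacks the model archive in a Web Worker.
  const model = await Vosk.createModel(MODEL_URL);
  return model;
}
```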
Usage Steps
The following are detailed steps to implement speech recognition using vosk-browser:
- Load the model:
  Use `Vosk.createModel` to load the model. Assuming the model file is `model.tar.gz`, the code is as follows:
  ```javascript
  async function loadModel() {
    const model = await Vosk.createModel('model.tar.gz');
    return model;
  }
  ```
  Models may take a few seconds to load, depending on file size and network speed. Small models (around 50 MB) are recommended for faster loading.
- Initialize the recognizer:
  Create the recognizer and specify the sample rate (usually matching the audio context, e.g. 16000 Hz):
  ```javascript
  async function startRecognition(model) {
    const ctx = new AudioContext();
    const recognizer = await Vosk.createRecognizer(model, ctx.sampleRate);
    return { recognizer, ctx };
  }
  ```
- Capture microphone input:
  Use the browser's `navigator.mediaDevices.getUserMedia` to obtain microphone input:
  ```javascript
  async function setupMic(ctx) {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const micNode = ctx.createMediaStreamSource(stream);
    return micNode;
  }
  ```
- Handle recognition results:
  Listen for the `result` and `partialResult` events to receive final or partial recognition results:
  ```javascript
  recognizer.addEventListener('result', (ev) => {
    console.log('Final result:', ev.detail.text);
  });
  recognizer.addEventListener('partialResult', (ev) => {
    console.log('Partial result:', ev.detail.text);
  });
  ```
- Send audio data:
  Forward the microphone audio data to the recognizer:
  ```javascript
  async function connectAudio(ctx, micNode, recognizer) {
    const transferer = await Vosk.createTransferer(ctx, 128 * 150);
    transferer.port.onmessage = (ev) => recognizer.acceptWaveform(ev.data);
    micNode.connect(transferer);
  }
  ```
- Start recognition:
  Combine the steps above to start speech recognition:
  ```javascript
  async function start() {
    const model = await loadModel();
    const { recognizer, ctx } = await startRecognition(model);
    const micNode = await setupMic(ctx);
    await connectAudio(ctx, micNode, recognizer);
  }
  document.getElementById('startButton').onclick = start;
  ```
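As a complement to the steps above, here is a minimal sketch that adds error handling and a way to stop recognition. It reuses the helper functions defined above; keeping references to the microphone stream and the AudioContext so they can be shut down is an illustrative choice, not something the library requires.

```javascript
// Sketch: start recognition with error handling and allow stopping it again.
// Reuses loadModel/startRecognition/connectAudio from the steps above; the
// stream/context bookkeeping below is illustrative.
let activeCtx = null;
let activeStream = null;

async function safeStart() {
  try {
    const model = await loadModel();
    const { recognizer, ctx } = await startRecognition(model);
    // getUserMedia rejects if the user denies microphone access.
    activeStream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const micNode = ctx.createMediaStreamSource(activeStream);
    await connectAudio(ctx, micNode, recognizer);
    activeCtx = ctx;
  } catch (err) {
    console.error('Failed to start recognition:', err);
  }
}

function stopRecognition() {
  // Stop the microphone tracks and close the audio context to free resources.
  if (activeStream) activeStream.getTracks().forEach((track) => track.stop());
  if (activeCtx) activeCtx.close();
  activeStream = null;
  activeCtx = null;
}
```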
Featured Functions
- Real-time recognition: through `partialResult` events, developers can receive partial recognition results in real time while the user is speaking, suitable for chatbots or live captioning.
- Multi-language switching: simply replace the model file (e.g. `vosk-model-fr-0.22.tar.gz` to switch to French) to support other languages without modifying any code.
- Offline operation: all processing is done locally in the browser without server support, suitable for privacy-sensitive scenarios.
- Event management: event listeners can be added or removed dynamically. For example, a result listener can be removed with `recognizer.removeEventListener('result', callbackFunction);`, which is useful in dynamic interfaces such as Vue.js applications (see the sketch below).
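A minimal sketch of that pattern, using the event-listener calls shown in the usage steps; the handler name `onResult` is illustrative:

```javascript
// Sketch: keep a named reference to the handler so it can be removed later.
// Uses the addEventListener/removeEventListener interface shown above.
function onResult(ev) {
  console.log('Final result:', ev.detail.text);
}

recognizer.addEventListener('result', onResult);

// Later, e.g. when a Vue component is unmounted, detach the listener again:
recognizer.removeEventListener('result', onResult);
```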
Caveats
- Model selection: small models load quickly but are less accurate; large models (e.g. models with `rescore` data) are more accurate but require more memory.
- Browser compatibility: make sure the browser supports WebAssembly and the Web Audio API (e.g. Chrome, Firefox).
- Performance optimization: the Web Worker keeps recognition off the main thread, but model loading can still take a long time, so preloading the model is recommended (see the sketch below).
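A minimal preloading sketch, reusing the `Vosk.createModel` call and the helpers from the usage steps; caching the promise in a module-level variable is an illustrative choice:

```javascript
// Sketch: start downloading the model as soon as the page loads and cache
// the promise, so that clicking "start" does not wait for the full download.
let modelPromise = null;

function preloadModel() {
  if (!modelPromise) {
    modelPromise = Vosk.createModel('model.tar.gz'); // same call as loadModel()
  }
  return modelPromise;
}

// Kick off the download early...
window.addEventListener('load', preloadModel);

// ...and reuse the cached promise when recognition actually starts.
async function startWithPreload() {
  const model = await preloadModel();
  const { recognizer, ctx } = await startRecognition(model);
  const micNode = await setupMic(ctx);
  await connectAudio(ctx, micNode, recognizer);
}
```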
Application Scenarios
- Chatbots
  Developers can integrate vosk-browser into a web chatbot for voice interaction via microphone input, suitable for online customer service or virtual assistants.
- Subtitle generation
  After an audio file is uploaded, vosk-browser can transcribe it into subtitles for video content creators or educational platforms (see the sketch after this list).
- Smart home control
  Enables voice command recognition in the browser to control smart devices, such as switching lights on and off or adjusting the temperature.
- Language learning tools
  Students can practice pronunciation through the microphone, and vosk-browser provides real-time text feedback to help improve their speaking.
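A minimal sketch of the file-upload flow mentioned under subtitle generation. It assumes a recognizer created as in the usage steps, and it assumes that `acceptWaveform` can consume a decoded AudioBuffer; the element id `fileInput` is illustrative:

```javascript
// Sketch: transcribe an uploaded audio file instead of the microphone.
// Assumes a `recognizer` created as in the usage steps; passing a decoded
// AudioBuffer to acceptWaveform is an assumption, not guaranteed above.
async function transcribeFile(file, recognizer) {
  const ctx = new AudioContext();
  const arrayBuffer = await file.arrayBuffer();
  // Decode the uploaded file (e.g. WAV) into raw PCM samples.
  const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
  recognizer.acceptWaveform(audioBuffer);
}

// Hypothetical wiring for an <input type="file" id="fileInput"> element.
document.getElementById('fileInput').addEventListener('change', async (ev) => {
  const file = ev.target.files[0];
  if (file) await transcribeFile(file, recognizer);
});
```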
FAQ
- Does vosk-browser require an internet connection?
  No. vosk-browser runs completely offline: model loading and processing happen locally, so it works in environments without a network.
- What languages are supported?
  Currently 13 languages are supported, including English, German, French, and Spanish, and more languages can be added in the future through new models.
- How can recognition accuracy be improved?
  Use a high-accuracy model (e.g. one with `rescore` data) and make sure the microphone quality is good. Adjusting the decoding parameters in the model's `model.conf` file can also improve results (a sketch of such a file follows this list).
- Why is recognition delayed?
  Latency may be caused by model loading time or hardware performance. Use a smaller model or optimize browser performance.
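For illustration only, a Vosk `model.conf` typically contains Kaldi decoder flags along the lines shown below; the exact parameters and values differ from model to model, so treat this as an assumed example rather than the contents of any particular model:

```
# Assumed example of decoding parameters in a Vosk model.conf;
# real files and values vary per model.
--min-active=200
--max-active=3000
--beam=10.0
--lattice-beam=2.0
```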