vosk-browser is a speech recognition tool that runs entirely in the browser. Built on WebAssembly and the Vosk speech recognition library, it can process microphone input or audio files directly in the page, providing offline speech-to-text without relying on cloud servers. It supports 13 languages, including English, German, French, and Spanish, and is aimed at developers who need speech recognition in the browser. vosk-browser runs inside a Web Worker, which keeps recognition off the main browser thread and avoids blocking the page. A model can be loaded and recognition started with a few lines of JavaScript, making it suitable for scenarios such as chatbots, smart home control, or subtitle generation. Developed by Ciaran O'Reilly, the project is hosted on GitHub, has more than 450 stars, and has an active community.
Feature List
- Offline speech recognition: processes voice input in the browser without an internet connection, protecting user privacy.
- Multi-language support: 13 languages are supported, including English, German, French, and Spanish, and the set of models is extensible.
- Microphone and audio file input: can process live microphone input or uploaded audio files.
- WebAssembly optimization: runs via WebAssembly inside a Web Worker, keeping recognition efficient without blocking the browser.
- Simple integration: installable via CDN or npm for quick embedding in web pages.
- Real-time results output: provides both final and partial recognition results, suitable for interactive scenarios.
- Flexible model loading: supports loading different language models, which are distributed in a compressed format and are relatively small.
Usage Guide
Installation Process
vosk-browser is easy to integrate, and developers can add it to a web page in the following ways:
- Include via CDN:
  Add the following tag to the HTML file to load the vosk-browser library into the page:
  ```html
  <script type="application/javascript" src="https://cdn.jsdelivr.net/npm/vosk-browser@0.0.5/dist/vosk.js"></script>
  ```
  Once loaded, the `Vosk` global variable is available in JavaScript. The snippet above pins version 0.0.5; the latest release is 0.0.8, so check npm or jsDelivr for the current version.
- Install via npm:
  If you are using modular development, install it via npm:
  ```bash
  npm install vosk-browser
  ```
  Then import it in a JavaScript file:
  ```javascript
  import * as Vosk from 'vosk-browser';
  ```
- Download a language model:
  vosk-browser needs to load a language model file (in `.tar.gz` format), for example `vosk-model-small-en-us-0.15.tar.gz`. These models can be downloaded from the officially provided links (e.g. https://ccoreilly.github.io/vosk-browser/models/). A model archive contains the configuration files and data required for speech recognition, such as `mfcc.conf` and `model.conf`; high-accuracy models use `mfcc_hires.conf`. Place the model file in the same path as the script, or specify the model's URL.
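As an illustration of the "specify the model's URL" option, the sketch below loads a model straight from a URL using the same `Vosk.createModel` call used in the usage steps below. The exact file URL is assumed for illustration (it combines the models page above with the example filename); in practice, hosting the model next to your own page is more reliable.

```javascript
// Sketch: load a model directly by URL instead of a local copy.
// The URL is assumed for illustration; self-hosting the .tar.gz archive
// alongside your page avoids cross-origin and availability issues.
const MODEL_URL =
  'https://ccoreilly.github.io/vosk-browser/models/vosk-model-small-en-us-0.15.tar.gz';

async function loadRemoteModel() {
  // Vosk.createModel downloads and unpacks the model archive in a Web Worker.
  const model = await Vosk.createModel(MODEL_URL);
  return model;
}
```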
Usage Steps
The following are detailed steps to implement speech recognition using vosk-browser:
- Load the model:
  Use `Vosk.createModel` to load the model. Assuming the model file is `model.tar.gz`, the code is as follows:
  ```javascript
  async function loadModel() {
    const model = await Vosk.createModel('model.tar.gz');
    return model;
  }
  ```
  Models may take a few seconds to load, depending on file size and network speed. Small models (around 50 MB) are recommended for faster loading.
- Initialize the recognizer:
  Create the recognizer and specify the sample rate (usually matching the audio context, e.g. 16000 Hz):
  ```javascript
  async function startRecognition(model) {
    const ctx = new AudioContext();
    const recognizer = await Vosk.createRecognizer(model, ctx.sampleRate);
    return { recognizer, ctx };
  }
  ```
- Capture microphone input:
  Use the browser's `navigator.mediaDevices.getUserMedia` to obtain microphone input:
  ```javascript
  async function setupMic(ctx) {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const micNode = ctx.createMediaStreamSource(stream);
    return micNode;
  }
  ```
- Handle recognition results:
  Listen for the `result` and `partialResult` events to receive final or partial recognition results:
  ```javascript
  recognizer.addEventListener('result', (ev) => {
    console.log('Final result:', ev.detail.text);
  });
  recognizer.addEventListener('partialResult', (ev) => {
    console.log('Partial result:', ev.detail.text);
  });
  ```
- Send audio data:
  Forward the microphone audio data to the recognizer:
  ```javascript
  async function connectAudio(ctx, micNode, recognizer) {
    const transferer = await Vosk.createTransferer(ctx, 128 * 150);
    transferer.port.onmessage = (ev) => recognizer.acceptWaveform(ev.data);
    micNode.connect(transferer);
  }
  ```
- Start recognition:
  Combine the steps above to start speech recognition:
  ```javascript
  async function start() {
    const model = await loadModel();
    const { recognizer, ctx } = await startRecognition(model);
    const micNode = await setupMic(ctx);
    await connectAudio(ctx, micNode, recognizer);
  }
  document.getElementById('startButton').onclick = start;
  ```
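As a complement to the steps above, here is a minimal sketch that adds error handling and a way to stop recognition. It reuses the helper functions defined above; keeping references to the microphone stream and the AudioContext so they can be shut down is an illustrative choice, not something the library requires.

```javascript
// Sketch: start recognition with error handling and allow stopping it again.
// Reuses loadModel/startRecognition/connectAudio from the steps above; the
// stream/context bookkeeping below is illustrative.
let activeCtx = null;
let activeStream = null;

async function safeStart() {
  try {
    const model = await loadModel();
    const { recognizer, ctx } = await startRecognition(model);
    // getUserMedia rejects if the user denies microphone access.
    activeStream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const micNode = ctx.createMediaStreamSource(activeStream);
    await connectAudio(ctx, micNode, recognizer);
    activeCtx = ctx;
  } catch (err) {
    console.error('Failed to start recognition:', err);
  }
}

function stopRecognition() {
  // Stop the microphone tracks and close the audio context to free resources.
  if (activeStream) activeStream.getTracks().forEach((track) => track.stop());
  if (activeCtx) activeCtx.close();
  activeStream = null;
  activeCtx = null;
}
```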
Featured Functions
- Real-time recognition: through `partialResult` events, developers can receive partial recognition results in real time while the user is speaking, suitable for chatbots or live captioning.
- Multi-language switching: simply replace the model file (e.g. `vosk-model-fr-0.22.tar.gz` to switch to French) to support other languages without modifying any code.
- Offline operation: all processing is done locally in the browser without server support, suitable for privacy-sensitive scenarios.
- Event management: event listeners can be added or removed dynamically. For example, a result listener can be removed with `recognizer.removeEventListener('result', callbackFunction);`, which is useful in dynamic interfaces such as Vue.js applications (see the sketch below).
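A minimal sketch of that pattern, using the event-listener calls shown in the usage steps; the handler name `onResult` is illustrative:

```javascript
// Sketch: keep a named reference to the handler so it can be removed later.
// Uses the addEventListener/removeEventListener interface shown above.
function onResult(ev) {
  console.log('Final result:', ev.detail.text);
}

recognizer.addEventListener('result', onResult);

// Later, e.g. when a Vue component is unmounted, detach the listener again:
recognizer.removeEventListener('result', onResult);
```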
Caveats
- Model selection: small models load quickly but are less accurate; large models (e.g. models with `rescore` data) are more accurate but require more memory.
- Browser compatibility: make sure the browser supports WebAssembly and the Web Audio API (e.g. Chrome, Firefox).
- Performance optimization: the Web Worker keeps recognition off the main thread, but model loading can still take a long time, so preloading the model is recommended (see the sketch below).
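A minimal preloading sketch, reusing the `Vosk.createModel` call and the helpers from the usage steps; caching the promise in a module-level variable is an illustrative choice:

```javascript
// Sketch: start downloading the model as soon as the page loads and cache
// the promise, so that clicking "start" does not wait for the full download.
let modelPromise = null;

function preloadModel() {
  if (!modelPromise) {
    modelPromise = Vosk.createModel('model.tar.gz'); // same call as loadModel()
  }
  return modelPromise;
}

// Kick off the download early...
window.addEventListener('load', preloadModel);

// ...and reuse the cached promise when recognition actually starts.
async function startWithPreload() {
  const model = await preloadModel();
  const { recognizer, ctx } = await startRecognition(model);
  const micNode = await setupMic(ctx);
  await connectAudio(ctx, micNode, recognizer);
}
```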
Application Scenarios
- Chatbots
  Developers can integrate vosk-browser into a web chatbot for voice interaction via microphone input, suitable for online customer service or virtual assistants.
- Subtitle generation
  After an audio file is uploaded, vosk-browser can transcribe it into subtitles for video content creators or educational platforms (see the sketch after this list).
- Smart home control
  Enables voice command recognition in the browser to control smart devices, such as switching lights on and off or adjusting the temperature.
- Language learning tools
  Students can practice pronunciation through the microphone, and vosk-browser provides real-time text feedback to help improve their speaking.
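A minimal sketch of the file-upload flow mentioned under subtitle generation. It assumes a recognizer created as in the usage steps, and it assumes that `acceptWaveform` can consume a decoded AudioBuffer; the element id `fileInput` is illustrative:

```javascript
// Sketch: transcribe an uploaded audio file instead of the microphone.
// Assumes a `recognizer` created as in the usage steps; passing a decoded
// AudioBuffer to acceptWaveform is an assumption, not guaranteed above.
async function transcribeFile(file, recognizer) {
  const ctx = new AudioContext();
  const arrayBuffer = await file.arrayBuffer();
  // Decode the uploaded file (e.g. WAV) into raw PCM samples.
  const audioBuffer = await ctx.decodeAudioData(arrayBuffer);
  recognizer.acceptWaveform(audioBuffer);
}

// Hypothetical wiring for an <input type="file" id="fileInput"> element.
document.getElementById('fileInput').addEventListener('change', async (ev) => {
  const file = ev.target.files[0];
  if (file) await transcribeFile(file, recognizer);
});
```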
FAQ
- Does vosk-browser require an internet connection?
  No. vosk-browser runs completely offline: model loading and processing happen locally, so it works in environments without a network.
- What languages are supported?
  Currently 13 languages are supported, including English, German, French, and Spanish, and more languages can be added in the future through new models.
- How can recognition accuracy be improved?
  Use a high-accuracy model (e.g. one with `rescore` data) and make sure the microphone quality is good. Adjusting the decoding parameters in the model's `model.conf` file can also improve results (a sketch of such a file follows this list).
- Why is recognition delayed?
  Latency may be caused by model loading time or hardware performance. Use a smaller model or optimize browser performance.
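For illustration only, a Vosk `model.conf` typically contains Kaldi decoder flags along the lines shown below; the exact parameters and values differ from model to model, so treat this as an assumed example rather than the contents of any particular model:

```
# Assumed example of decoding parameters in a Vosk model.conf;
# real files and values vary per model.
--min-active=200
--max-active=3000
--beam=10.0
--lattice-beam=2.0
```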