Comparison of technical features of model versions
Voxtral's innovativeness is reflected in its careful versioning:
- The 24B parameter version is optimized for cloud computing environments with processing power up to:
- Supports 30 minutes of audio transcription
- 40-minute deep understanding task
- 32k Token Context Window
- The 3B Mini version is aimed at edge devices:
- Maintaining the core functionality of the 85%
- Requires only a medium configuration GPU to run
- Particularly suited to IoT and mobile scenarios
This architectural design makes Voxtral the only open source speech model that currently supports both cloud-based SaaS services and private deployments.
This answer comes from the articleVoxtral: an AI model developed by Mistral AI for speech transcription and understandingThe