Mistral has released an open automatic speech recognition (ASR) software bundle called Voxtral in a bid to undercut rivals on price and quality.
The biz claims that using ASR in production has required a trade-off: either open-source models with high error rates and limited semantic understanding, or closed proprietary models that offer better accuracy at a higher cost.
“Voxtral bridges this gap,” the Paris-based AI biz claimed in a blog post. “It offers state-of-the-art accuracy and native semantic understanding in the open, at less than half the price of comparable APIs.”
Comparable APIs include OpenAI’s Whisper model, which provides transcription at a price of $0.006 per minute, and its gpt-4o-mini-transcribe model, priced at $0.003 per minute. The Voxtral API starts at $0.001 per minute and goes up to about $0.004 with an allegedly better word error rate than gpt-4o-mini-transcribe.
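To put those per-minute rates in perspective, here is a back-of-the-envelope sketch in Python that prices a batch of audio at each of the figures quoted above. The dictionary labels and the helper name are our own, and the rates are simply the ones cited in this article; real bills will depend on each vendor's current pricing and tiers.

```python
# Per-minute list prices as quoted in this article (USD).
# The labels are illustrative names, not official product identifiers.
PRICE_PER_MINUTE = {
    "openai-whisper": 0.006,
    "gpt-4o-mini-transcribe": 0.003,
    "voxtral-low": 0.001,
    "voxtral-high": 0.004,
}


def transcription_cost(minutes: float, rate_per_minute: float) -> float:
    """Return the cost in USD of transcribing `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * rate_per_minute


if __name__ == "__main__":
    hours = 1_000  # hypothetical workload, chosen only for illustration
    for name, rate in PRICE_PER_MINUTE.items():
        print(f"{name:24s} ${transcription_cost(hours * 60, rate):,.2f}")
```

On those numbers, 1,000 hours of audio works out to roughly $360 on Whisper, $180 on gpt-4o-mini-transcribe, and somewhere between $60 and $240 on Voxtral, depending on which end of Mistral's price range applies.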
“Voxtral comprehensively outperforms Whisper large-v3, the current leading open-source Speech Transcription model,” Mistral claims, alongside various supporting benchmark result graphs.
“It beats GPT-4o mini Transcribe and Gemini 2.5 Flash across all tasks, and achieves state-of-the-art results on English short-form and Mozilla Common Voice, surpassing ElevenLabs Scribe and demonstrating its strong multilingual capabilities.”
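Word error rate, the metric behind most of these comparisons, is conventionally the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not Mistral's evaluation code. The function name is ours, and real benchmarks typically normalize text (punctuation, casing, numerals) more carefully than the lowercase-and-split approach assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For instance, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" counts one deletion over six reference words. Because insertions are counted too, the rate can exceed 1.0 when a transcript contains many extra words.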
Researchers last year found [PDF] that about 1 percent of OpenAI Whisper transcriptions contained hallucinated passages. Mistral has provided no data on hallucination rates that we’re aware of.
Voxtral supports input (context) of up to 32,000 tokens, which corresponds to about 30 minutes of audio for transcription or 40 minutes for understanding. It can respond to questions about the audio or generate summaries. It can automatically detect widely used languages, including English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian. And it supports function-calling from voice input to trigger code workflows.
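Recordings longer than that window would presumably need to be split before transcription. As a rough sketch, assuming the roughly 30-minute transcription limit cited above and a simple fixed-length split, the arithmetic looks like this. The helper name, the overlap parameter, and the chunking strategy are our own illustration and are not drawn from Mistral's documentation.

```python
def chunk_spans(total_seconds, max_chunk_seconds=30 * 60, overlap_seconds=0.0):
    """Return (start, end) spans in seconds that cover a recording.

    max_chunk_seconds defaults to the ~30-minute figure quoted in the
    article; overlap_seconds is a hypothetical knob for repeating a little
    audio at each boundary so words are not cut in half.
    """
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")
    if not 0 <= overlap_seconds < max_chunk_seconds:
        raise ValueError("overlap_seconds must be in [0, max_chunk_seconds)")

    total = float(total_seconds)
    spans = []
    start = 0.0
    while start < total:
        end = min(start + max_chunk_seconds, total)
        spans.append((start, end))
        if end >= total:
            break
        # Step back by the overlap so consecutive chunks share some audio.
        start = end - overlap_seconds
    return spans
```

A 75-minute recording (4,500 seconds) with no overlap would come out as three spans: two full 30-minute chunks and a final 15-minute remainder.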
While Voxtral models can be downloaded and used in applications at no cost, Mistral is hoping businesses will pay to use its ASR technology for their applications. The AI shop is offering to help companies set up Voxtral for production-scale inference in private infrastructure and to help tune models for industry-specific applications. Mistral also says it’s looking for potential partners who can provide additional functionality like speaker identification or emotion detection in model deployments.
Earlier this month, Mistral joined with dozens of other European companies to urge European lawmakers to pause the EU AI Act because they see the rules as limiting the competitive potential of businesses on the continent. ®