
AI model from OpenAI automatically recognizes speech and translates it to English

Posted on September 22, 2022 by l33tdawg
Credit: Ars Technica

On Wednesday, OpenAI released a new open source AI model called Whisper that recognizes and translates audio at a level that approaches human recognition ability. It can transcribe interviews, podcasts, conversations, and more.

OpenAI trained Whisper on 680,000 hours of audio data and matching transcripts in 98 languages collected from the web. According to OpenAI, this open-collection approach has led to "improved robustness to accents, background noise, and technical language." It can also detect the spoken language and translate it to English.

OpenAI describes Whisper as an encoder-decoder transformer, a type of neural network that can use context gleaned from input data to learn associations that can then be translated into the model's output. By open-sourcing Whisper, OpenAI hopes to introduce a new foundation model that others can build on in the future to improve speech processing and accessibility tools. OpenAI has a significant track record on this front. In January 2021, OpenAI released CLIP, an open source computer vision model that arguably ignited the recent era of rapidly progressing image synthesis technology such as DALL-E 2 and Stable Diffusion.
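The capabilities described above are exposed through the open source `whisper` Python package OpenAI released alongside the model. A minimal sketch is below; the checkpoint name and audio file name are illustrative, and running it requires the package (`pip install openai-whisper`) plus an audio file on disk:

```python
# Sketch of basic Whisper usage; "base" and "interview.mp3" are
# illustrative placeholders, not values from the article.
import whisper

# Load one of the pretrained checkpoints ("tiny" through "large");
# larger models are more accurate but slower.
model = whisper.load_model("base")

# Transcribe in the source language; Whisper detects the spoken
# language automatically and reports it in the result dict.
result = model.transcribe("interview.mp3")
print(result["language"], result["text"])

# The same call with task="translate" produces an English
# translation of speech in any supported language.
translation = model.transcribe("interview.mp3", task="translate")
print(translation["text"])
```

Language detection and translation are options of the same `transcribe` call rather than separate models, which is part of what makes Whisper attractive as a foundation for accessibility tooling.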

Tags: Industry News, Artificial Intelligence
