August 20, 2017 – For over two decades Microsoft has worked to advance speech recognition and speech-to-text transcription to human levels of accuracy. A team of researchers in the company’s Speech & Dialogue group claims to have achieved just that: a speech recognition system that is on par with human transcribers, with an average word error rate of 5.1%. The team improved the system’s neural-network-based acoustic and language models and paired them with more powerful software tools and cloud-computing infrastructure, reducing the error rate by 12% from last year’s level. The technology has already been integrated into various Microsoft products and services, such as the Speech Translator app, PowerPoint translations, and the company’s own AI assistant, Cortana.
According to Microsoft, this result represents a major milestone for software speech recognition. The team will continue to refine the technology, working on how computers can not only transcribe words but also understand and interpret them, which could open new ways to help users with multilingual speech requirements.