Friday, February 1, 2013

Computer Speech to Text

Computer speech recognition, also called voice recognition (VR), converts speech to text. It has been in the works for many years and is now attracting wide interest.

I got interested in computer voice recognition in the early days when I adapted it for use on a production line to record defects. I wrote of my experiences in Speech Tech magazine.

I had worked with Janet Baker, who had started a company called Dragon Systems. The company has changed hands several times since then.

Voice recognition is extremely difficult for computers. It requires a process called pattern matching. Pattern matching requires a comparison to a huge library of sounds. This includes many human voices and many pronunciations from different people. 
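To make the idea of pattern matching concrete, here is a toy sketch in Python. It is not how Dragon or any real recognizer works. It assumes each sound has already been reduced to a short list of numbers, and the tiny library, the word labels, and the function names are all made up for illustration. Real systems use far larger libraries and much more sophisticated statistical models.

```python
import math

# Hypothetical library: each word maps to several example feature vectors,
# standing in for recordings from different speakers and pronunciations.
LIBRARY = {
    "yes": [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4]],
    "no": [[0.1, 0.9, 0.5], [0.2, 0.8, 0.6]],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, library=LIBRARY):
    """Return the word whose stored example is closest to the input sound."""
    best_word, best_dist = None, math.inf
    for word, templates in library.items():
        for template in templates:
            d = distance(features, template)
            if d < best_dist:
                best_word, best_dist = word, d
    return best_word, best_dist


word, dist = recognize([0.85, 0.15, 0.35])
print(word)
```

Even in this toy version, every input is compared against every stored example, which hints at why a huge library demands so much processing power.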

Pattern matching also requires adaptation to many different microphones and surroundings. Recognition must occur almost instantly, which demands very fast processing and ample memory from the computer.

A single misrecognition can wreak havoc. Even 99 percent recognition is not good enough. Intervention is needed for corrections. In comparison, the computer keyboard always recognizes the letters you type.
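A little arithmetic shows why 99 percent falls short. The sketch below assumes, purely for simplicity, that each word is recognized independently with the same accuracy, which real speech does not strictly obey. The function names are my own. Under that assumption, a 500-word document averages about five errors, and a 20-word sentence comes through error-free only about 82 percent of the time.

```python
def expected_errors(word_count, accuracy):
    """Average number of misrecognized words in a dictation."""
    return word_count * (1.0 - accuracy)


def chance_error_free(word_count, accuracy):
    """Probability that every word is right, assuming independent errors."""
    return accuracy ** word_count


print(round(expected_errors(500, 0.99), 2))
print(round(chance_error_free(20, 0.99), 3))
```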

There are two primary types of speech recognition popular today, one of them quite new. The older systems, made popular by Dragon, require the user to train with the system so that the computer can adapt its speech patterns to a specific voice. These also require a special microphone and headset. They don't work on tablets, so they are often not there when and where you need them.

More recently, cloud computing has brought about improvements. Cloud computing transmits sounds from individual computers to much more powerful computers on the Internet. A pattern match is made, and the result is transmitted back to the originating computer and user.

This process yields greater accuracy while requiring less processing power at the source.

This technology works well with small mobile devices like smartphones and tablets. It also avoids the complex training needed for the older systems. 

Whereas the older systems work with relatively few users, the newer systems work much more widely without training.

Google, especially, has been effective with voice recognition for browsing and for e-mail.

Even so, a keyboard is needed to make corrections. It helps to be able to tap on a misrecognition for alternatives and select the proper recognition with another tap.
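The tap-to-correct idea can be sketched as a small data structure: each recognized word keeps a ranked list of alternatives, and a tap simply switches which one is shown. This is only an illustration under my own assumptions. The class, the function, and the sample words are hypothetical and do not describe how Google or Dragon actually store their results.

```python
from dataclasses import dataclass


@dataclass
class RecognizedWord:
    alternatives: list  # candidate words, best guess first
    chosen: int = 0     # index of the alternative currently shown

    @property
    def text(self):
        return self.alternatives[self.chosen]


def transcript(words):
    """Join the currently chosen alternatives into one line of text."""
    return " ".join(w.text for w in words)


def correct(words, position, choice):
    """Simulate tapping the word at `position` and picking alternative `choice`."""
    if not 0 <= choice < len(words[position].alternatives):
        raise ValueError("no such alternative")
    words[position].chosen = choice
    return transcript(words)


words = [
    RecognizedWord(["put"]),
    RecognizedWord(["it"]),
    RecognizedWord(["over"]),
    RecognizedWord(["their", "there", "they're"]),
]
print(transcript(words))
print(correct(words, 3, 1))
```

Keeping the alternatives alongside the first guess is what makes a two-tap correction possible without retyping the word.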

It still takes highly motivated and patient users to use the older systems, whereas the newer systems will be adopted successfully by more users.

It helps that Google speech recognition is offered free. The Google Chrome browser makes it easily accessible for your evaluation: simply install the latest version of Chrome and go from there. Google speech recognition works on the latest tablets running the latest version of the Android operating system, and on Apple devices, including the original iPad, though not with Apple Pages.

Dragon also has apps which work similarly. Apple's Siri does not work on all Apple devices, nor on any non-Apple devices.

Dictated and Published from my Nexus 7 (which uses Google Speech Recognition to create documents)