Speech Recognition: What’s Happening?
In the film 2001: A Space Odyssey, computers were good listeners. In 2001 reality, computers only hear what they want to hear.
As much as people would like to speak with their machines and to browse the Internet by voice rather than by keystroke, recent strides in speech recognition technology still fail to provide the ease and spontaneity of the free-flowing dialogue that humans enjoy with one another. While the latest speech engines can recognize spoken words with 90 percent accuracy, the machines dictate the specific words and phrases that users may use. They ignore any commands that stray.
Although speech technology players are developing more conversational systems, no major breakthrough is expected any time soon. The main obstacle for natural speech technology is simply processing power. While huge gains in processing speeds have helped computers understand specific words, far more power is required for a machine to comprehend the countless combinations of words used to express thoughts.
Speech recognition applications for medical dictation have been commercially available for more than a decade. Yet speech recognition fundamentals haven’t changed dramatically. The user dictates free text and the speech is digitally recorded. The voice server collects and routes the dictation. The speech recognition engine then converts the voice to text. The resulting speech and text are presented to medical transcription for simultaneous audio and visual review and editing. A number of speech recognition products have been marketed without success because of factors that include:
- Software capabilities
- User adaptability
- Cost of implementation
- Diminished technological advances
What drives the continued interest in speech recognition for medical dictation?
- Inadequate turnaround times
- Poor transcription quality
- Remote health care facilities with no transcription providers
- No night or weekend transcription coverage
- Attempted cost savings
- Lack of medical transcriptionists
- Elimination of handwritten reports
- Text is immediately available for editing by the user
Because of its promise to reduce keyboarding, decrease the number of FTEs, and improve turnaround, most nonusers become enamored with speech recognition applications for medical dictation. Demonstrations are very polished and seem to provide the ultimate solution. In practice, when the user gets busy, these systems are often abandoned. Most physicians don’t want to take the time to manage the document. So, who will be responsible for the document? Do the math:
- 50 reports @ 1 minute each x 100 wpm = 5,000 words
- 5,000 words @ 90% accuracy = 500 corrections
- 500 corrections @ 35 wpm typing = about 15 minutes extra typing
- For physicians, corrections will take one hour
- Radiologist hourly salary vs. transcriptionist salary = considerable difference
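The arithmetic above can be checked with a short script. This is only a sketch: the function name and its parameters are hypothetical, and it assumes, as the list implies, that every misrecognized word costs one retyped word.

```python
def correction_estimate(reports, minutes_per_report, dictation_wpm,
                        accuracy_percent, typing_wpm):
    """Return (words dictated, words to correct, minutes of retyping)."""
    words = reports * minutes_per_report * dictation_wpm
    # Integer arithmetic avoids floating-point rounding on the error count.
    corrections = words * (100 - accuracy_percent) // 100
    # Assumption: one correction means retyping one word.
    retyping_minutes = corrections / typing_wpm
    return words, corrections, retyping_minutes


words, corrections, minutes = correction_estimate(
    reports=50, minutes_per_report=1, dictation_wpm=100,
    accuracy_percent=90, typing_wpm=35)
print(f"{words} words, {corrections} corrections, "
      f"{minutes:.1f} minutes of retyping")
```

With the figures from the list, this gives 5,000 words, 500 corrections, and a little over 14 minutes of retyping, which the list rounds to 15. The estimate covers retyping only; finding each error in the text takes additional time.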
Before purchasing a speech recognition product for medical dictation, ask:
- If the originator is responsible for content, who is responsible for the document?
- Are users willing to adopt new methods?
- Are appropriate workstations available?
- What contexts will be needed?
- Will speech recognition encompass all dictation? Specialties?
- Will the user save time and/or money?
- Will speech recognition improve transcriptionist productivity and/or turnaround?
Most new activity in speech recognition involves applications now being marketed to the general public for mass use. The allure of voice browsing is strong, especially for those trying to stay connected on mobile phones and hand-held computers with tiny keypads and screens. Now, new telephone-based voice portals can read online information to callers in response to set spoken commands. Also, a new speech system for navigating hand-held computers is available. This feature may contribute to highway safety. In a survey of Palm users, almost 40 percent acknowledged using the Palm and the cell phone while driving.
The most effective use of speech-recognition technology has been with automatic telephone systems that provide customer service. Because they are usually designed for a specific purpose, those speech engines are customized to understand a select vocabulary. Context is also important. At airlines, credit card companies or brokerage houses, the need for a speech system that can recognize, “I want to order the black high-heeled pumps in a size 7B which are shown on page 208 of your spring catalog, and send me the matching handbag as well,” is not crucial.
Accuracy is increased when the domain to be understood is restricted. Limiting the domain leads to a much more manageable task than trying to master the entire dictionary. The approach is similar at most of the new telephone-based voice portals. While the exact focus and presentation varies, the services keep it simple, limiting their scope to specific matters such as news, weather, e-mail, driving directions and movie listings. Users may struggle until they gain familiarity with the approved commands for navigating the different menus. But users will enjoy having their e-mail read to them, and find it useful to pick up a cell phone in their car and get directions, all through speech recognition technology.
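The idea of a restricted domain can be illustrated with a small sketch. It is not how any particular voice portal works: the command list is taken from the topics named above, the function name is hypothetical, and simple text similarity from Python's standard `difflib` module stands in for the acoustic matching a real speech engine performs.

```python
import difflib

# Assumption: the portal accepts only these approved commands.
APPROVED_COMMANDS = ["news", "weather", "e-mail",
                     "driving directions", "movie listings"]


def match_command(heard, commands=APPROVED_COMMANDS, cutoff=0.8):
    """Return the approved command closest to the recognized text,
    or None when nothing in the restricted vocabulary is close enough."""
    candidates = difflib.get_close_matches(
        heard.lower().strip(), commands, n=1, cutoff=cutoff)
    return candidates[0] if candidates else None


for phrase in ["Weather", "movie listing",
               "I want to order the black high-heeled pumps"]:
    print(repr(phrase), "->", match_command(phrase))
```

A slightly misheard command still lands on the nearest approved entry, while a free-form request falls outside the vocabulary and is ignored, which mirrors the behavior described above: the system is accurate because it has so few choices to pick from.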
By contrast, most attempts to voice-navigate the full-blown Internet on a personal computer are fraught with frustration. Because each Web site is designed differently, it is difficult to pack extensive flexibility into a single program on a PC. When makers of voice browsers attempted natural language recognition and incorporated artificial intelligence into their browsers so they could recognize concepts in addition to specific commands, the browsers became bloated and sluggish and made frequent navigation mistakes.
As a tool, speech recognition technology is becoming an increasing presence at all levels of our lives, whether at home, in the car or at work. It is beginning to have an impact on how we request information, how we receive information, and how we process information. Because of its technological limitations, speech recognition has not established itself as a panacea for processing medical dictation, nor can it be relied upon to fully navigate the Internet. Research and development continue in this arena.
As they become familiar with unobtrusive bits of speech recognition technology, medical transcriptionists will cease to fear the concept of voice recognition and its position as a threat will diminish. In the meantime, let’s all enjoy the convenient novelty of getting driving directions, ordering a pizza, obtaining movie listings, and determining our bank balance through the use of speech recognition technology.
“Okay, there’s enough money on the cash card and that new pizza place is on the way to the movie theatre. Make mine a pepperoni and, no, I don’t want to watch any more science fiction.”
Barbara Williams, CMT, is president-elect of the American Association for Medical Transcription (AAMT) and manager of medical transcription services for Saint Francis Health System in Tulsa, OK.