CONTINUOUS SPEECH RECOGNITION IS HERE
By Gretchen Berry
CERTAINLY, YOU’VE HEARD SOMETHING ABOUT IT. Perhaps it was spoken in a cautionary whisper (Speech technology is coming). Maybe it was in the bombastic, expectant tones of a movie trailer (SPEECH TECHNOLOGY IS COMING!). Or maybe it was with the most common inflection, bearing just a hint of dread (Speech technology is coming?!?!). Well, forget what you have heard and listen up. Speech technology is not coming… it’s already here.
Unlike the preceding “discrete utterance” systems that required users to slowly and methodically dictate information, “continuous speech recognition” (CSR) promises to convert the spoken word into written form while users speak naturally. People have been trumpeting the ascendancy of speech technology for years, and now, it seems the predicted revolution may finally be underway.
“Speech recognition has been around for years,” said David Owen, director of product management for Lernout & Hauspie’s Healthcare Solutions Group. “And now CSR is available to the public at a reasonable price due to powerful, low-cost Pentium processors.”
Scott Finley, MD, MPH, vice president, clinical solutions for Datamedic, Waltham, MA, is an expert in the field of speech technology and chaired an all-day seminar on the subject at this year’s “Toward an Electronic Patient Record” conference in San Antonio. According to Dr. Finley, the actual voice recognizers (the component that takes the sound waves, runs them through algorithms, determines the written equivalent and prints it on the screen) are about as sophisticated as they can get when configured and used properly.
That’s not to say the current systems are perfect. There are still improvements to be made, but the bottom line is that CSR has finally become competent enough and affordable enough to warrant serious consideration. And that consideration should include asking the following questions:
* What would this technology replace?
“If you are dealing with a fabulous, inexpensive transcription group that makes very few errors and communicates with the physicians, you don’t have much to gain by replacing them,” asserted Dr. Finley.
* Who would use this technology?
A commonly cited detriment to the widespread use of speech technology is physician reluctance. Doctors, the argument goes, will be unwilling to take the time to learn the new system or temper their dictating style (however slightly) to meet the demands of the program.
As with any blanket statement, exceptions are easily found. Richard O’Brien, MD, FACEP, of Moses Taylor Hospital in Scranton, PA, has been using a CSR system for two years and is enthusiastic about its benefits. “I think it leads to better documentation,” he insisted. “You are giving up very little and the benefit is that you have your report ready in two or three minutes.”
* How will this technology impact workflow?
Despite Dr. O’Brien’s endorsements, he is still the only physician at Moses Taylor currently using speech technology. This set-up is fine at his facility, but may not work at others. “If not everyone is willing to use this,” cautioned Dr. Finley, “you have to make sure that your workflow can tolerate it. That is, is there a rational workflow whereby some continue to dictate for conventional human transcription, while others utilize CSR?”
* Why am I considering speech technology?
“Some purchases are motivated by a desire to solve real problems, such as those with transcription or illegibility of notes,” clarified Dr. Finley. “Others are driven by enthusiasm for the technology. It’s important to think practically about these systems and not be blinded by the technology.”
And thinking practically means comparison shopping. Presently, only four companies offer CSR products: Lernout & Hauspie (Burlington, MA), IBM (White Plains, NY), Dragon Systems (Newton, MA) and Philips (Atlanta). On the surface, that should make choices simple, but with the variety of products offered and the number of factors to consider, it can be a bit overwhelming.
A Manner of Speaking
One of the most significant factors in determining the appropriateness of a voice recognition system is the issue of domain specificity. “A computer, unlike humans, cannot do context shifts,” explained Dr. Finley. So, not only should a system be programmed to understand what you are saying, it must also, in the most rudimentary sense, be programmed to know what you are talking about. In general, the more specific the language programmed into the system, the better the accuracy rate. For example, a cardiologist utilizing a system programmed with a vocabulary tailored to her specialty will be more successful in her dictation than if she were using a general medicine application.
But over-specificity can also pose problems. “Suppose you had a recognizer for mammography,” Dr. Finley pointed out. “Because you had just that vocabulary, every time you tried to add medical history that did not relate to mammography, your error rate would go up.”
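The trade-off Dr. Finley describes can be checked on the back of an envelope: a word that is not in the recognizer’s vocabulary can never be transcribed correctly, so the out-of-vocabulary rate puts a floor under the error rate. A minimal sketch, in which the vocabularies and the sample dictation are invented for illustration (they are not from any actual product):

```python
# Illustrative only: tiny invented vocabularies and a sample dictation.
# A word outside the recognizer's vocabulary can never come out right,
# so the out-of-vocabulary (OOV) rate is a floor on the error rate.

def oov_rate(dictation, vocabulary):
    """Fraction of dictated words missing from the vocabulary."""
    words = dictation.lower().split()
    missing = [w for w in words if w not in vocabulary]
    return len(missing) / len(words)

# A narrow, mammography-only vocabulary versus a broader medical one.
mammography_vocab = {"the", "mass", "shows", "benign", "calcification"}
general_vocab = mammography_vocab | {"patient", "history", "of", "diabetes"}

dictation = "the mass shows benign calcification patient history of diabetes"

# The narrow vocabulary misses every off-domain history word,
# so its floor on the error rate is far higher.
print(oov_rate(dictation, mammography_vocab))
print(oov_rate(dictation, general_vocab))
```

The moment the dictation wanders outside the narrow domain, the specialized vocabulary starts missing words it was never taught, which is exactly the rising error rate in the mammography example above.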
Most of the speech technology providers offer both general medical and domain-specific products. Not all specialties are readily available, but there are new products being introduced regularly and certain providers offer completely customized systems.
Another major consideration is a system’s “free text” capability. Free text is what most people think of when they hear “speech technology.” It’s the basic “you talk, it types” system. Many products, however, feature a macro-driven system whereby choices are offered menu style and selected via voice. On one hand, this promotes thorough and accurate documentation. On the other, the process can prove tedious to some users. Dr. O’Brien, for example, prefers to dictate in the free text mode despite the macros offered in his system.
Recognition speed is another important factor. The fastest systems offer “real time” recognition, meaning the system writes as you speak. The advantage is that users can make corrections on the spot. The disadvantage is that the appearance of the words can be distracting.
With real time, users have instantaneous feedback that can ultimately help them to customize their inflections and pronunciations to ensure that the computer does not misunderstand them. This can be beneficial in that the use of the program becomes more efficient. However, any time physicians are forced to temper their dictation, you run the risk of compromising documentation.
“We should not force the user to adapt to the technology,” pointed out Debra Cattani, vice president of sales for Philips Speech Processing. “The technology should adapt to the user and the work environment.”
“As soon as the recognizer is forcing people to change what they are saying, then the tool is assuming a role that it shouldn’t,” added Dr. Finley. “For example, many recognizers have problems recognizing proper names, so the physician may be tempted not to dictate them anymore. There is something insidious about that: excluding something that had always been included because the computer can’t handle it.”
An alternative to this is the utilization of “correctionists.” Similar to a medical transcriptionist, a correctionist reviews dictated transcripts, edits and corrects them, and returns them to the physician. A few products currently available are flexible enough to offer both approaches.
I Meant to Say…
Regardless of who is fixing the mistakes, ease of correction and adaptability are vital. Clicking your mouse on a word and having the audio replay is very handy. But once you make that correction, are you going to have to make it again and again? Or does your system learn not to repeat the mistake? Can you enter your own particular pronunciation for a specific word? Can you add words easily? Can you delete words from the vocabulary?
With all these factors to consider, it might be tempting to go with the most adaptable program available. But overly adaptable programs can also cause problems. “An interesting phenomenon is that adaptable systems can get into trouble at higher and higher accuracy levels,” offered Dr. Finley. “That’s because the slightest erroneous correction can make things worse instead of better.”
Which brings us to the big issue of accuracy. An advertised accuracy of 90 percent still means that there will be one mistake in every 10 words. Most users will not tolerate less than 97 percent to 98 percent accuracy. Almost all the products currently available fall within this range, but keep in mind that the accuracy of a specific program can change from user to user.
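The arithmetic behind those percentages is worth doing explicitly, because a seemingly small difference in accuracy is a large difference in correction work. A quick sketch, where the 300-word report length is an assumed figure for illustration, not one from any vendor:

```python
# Expected misrecognized words in a dictated report at a given
# per-word accuracy, assuming errors are independent per word.
# The 300-word report length is an invented, illustrative figure.

def expected_errors(words, accuracy):
    """Expected number of misrecognized words in a report."""
    return words * (1 - accuracy)

report_length = 300  # assumed typical dictation, in words

for accuracy in (0.90, 0.97, 0.98):
    errors = expected_errors(report_length, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors:.0f} words to fix")
```

At 90 percent, that is one correction for every 10 words dictated, exactly as described above; at 97 to 98 percent, the correction burden drops by roughly a factor of four, which is why users draw the line where they do.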
“Any system that is adaptable and improves with use cannot be fairly judged by someone that hasn’t taken the time to optimize the system’s performance, because you are only getting the initial accuracy instead of the accuracy after weeks of use,” cautioned Dr. Finley. “One way to determine accuracy is to take your own dictations and make the demonstrator dictate them. That way you can evaluate what is the potential of this recognizer with a trained user with data that isn’t chosen to play to the strengths of the recognizer. But it is far better, if it is feasible, to have at least one person at the customer site train up the system and run it through its paces and really give it a shot. That’s often impractical because of the interfaces that are required.”
Which brings us to the final point: when evaluating your options, take time to consider how this technology will fit into your established workflow and what interfaces it will require. If immediate availability is important, and a paper record is the desired end result, then an independent, free-text, immediate correction model may be exactly what you need. However, if codified data is crucial or if the information must be stored in a computer-based patient record, you may need a system that offers not simply speech recognition but a variety of bells and whistles.
Gretchen Berry is an editorial assistant at ADVANCE.