By John A. Holbrook, MA, MD, FACEP
Over the past decade, we have heard repeated claims that speech recognition by computers would displace human medical transcriptionists (MTs). In reality, speech recognition seemed always to fall short of its promise, and many systems are gathering dust in hospital emergency and radiology departments. Over the past year, however, we have started to see the introduction of systems that really work. By using very large collections of medical texts to “teach” the computer about medical vocabularies and medical context, speech recognition can now attain an accuracy of greater than 98 percent in specific specialties.
Is the Holy Grail Good Enough?
For many years, scientists sought to achieve the “holy grail” of speech recognition: the combination of large vocabulary, speaker independence and continuous speech recognition.
There are now systems that allow physicians to walk up and dictate a record in a conversational style, with minimal to no training, and achieve 98 percent or greater accuracy.
However, we have found that even virtually perfect speech recognition has a high potential to fail in a busy department such as an emergency department. Even with a 99 percent accuracy rate, a typical 400-word medical report will contain four errors, any one of which may invalidate the whole report. Speaking at 120 words per minute, a physician can dictate that record in about 3.5 minutes. But it will typically take another three or four minutes to proofread and correct the report, doubling the overall time of report generation. In addition, we found that physicians make poor editors of their own work: in our surveys, they missed at least one error at least 50 percent of the time.
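The arithmetic behind these figures can be sketched in a few lines. This is a back-of-the-envelope illustration using the values quoted above; the variable names are chosen for this sketch, not drawn from any product.

```python
# Illustrative arithmetic for the report-generation workflow described above.
words = 400          # typical report length, in words
accuracy = 0.99      # recognition accuracy (99 percent)
wpm = 120            # dictation speed, words per minute
proofread_min = 3.5  # assumed midpoint of the "three or four minutes" to proofread

expected_errors = words * (1 - accuracy)       # errors left in the recognized text
dictation_min = words / wpm                    # time to dictate the report
total_min = dictation_min + proofread_min      # dictation plus physician self-editing

print(f"expected errors:  {expected_errors:.0f}")
print(f"dictation time:   {dictation_min:.1f} min")
print(f"total time:       {total_min:.1f} min")
```

At 99 percent accuracy the proofreading step alone roughly doubles the time per report, which is why the correction work is better shifted to a dedicated editor.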
The latest speech recognition systems link the voice to the text as it is recognized and highlight the text word-by-word as the speech plays back. This allows the MT to edit the physician's dictation more easily and then send it back to the physician for signature (probably an electronic signature). The workflow focuses the human function more on the task of quality assurance (QA) or "correcting," rather than transcribing. Depending on the accuracy of the speech recognition, the correctionist can process two or more times the number of reports as compared to transcription. We are told, however, that the correctionists feel their task is more difficult than transcription, and especially harder on the eyes.
The “Thin Client” and the Internet
In the past, physicians dictated over telephones, hand-held devices or fixed recording devices. Transcription equipment managed both the voice and the text, usually entirely within the hospital itself. Over time, improvements in telephone systems and reduction in long distance charges allowed transcription to take place remotely from the hospital. However, both transcription and printing required dedicated equipment connected via a permanent network. Speech recognition required the physician to dictate directly into the local computer doing the recognizing (the “thick client”). These computers were very expensive, making hardware one of the significant barriers to the use of this technology.
The new systems, however, allow a physician’s voice to be transmitted to a remote speech recognition computer via the Internet. The correctionist can also access the unedited text through a secure Internet page, edit the text using the Internet application (“thin client”), and send the text back to the physician’s Web site for authentication. This allows both the physician and the correctionist to work on a cheap computer located anywhere in the world that has Internet access.
Natural Language Processing
Natural language processing (NLP) is computer technology that abstracts facts from free text reports. It does this by “understanding” sentence structure, context, syntax, synonyms, punctuation, etc. It is very sophisticated software that can discriminate between “a burn to the back of the hand” and “a burning feeling in the hand.” It knows that the finger is a part of the hand and understands the difference between “fractured 5th rib” and “5 fractured ribs.” The software is accurate enough to reliably assign ICD-9 and CPT-4 codes and thereby create a bill from free-form text. It does this automatically and virtually instantly upon completion of the transcription.
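The kind of distinction described above can be illustrated with a deliberately simplified sketch. Commercial NLP engines use full linguistic parsing rather than pattern matching, so the function and patterns below are a toy assumption, not a description of any real product:

```python
import re

def rib_finding(text: str) -> str:
    """Toy illustration: distinguish one specific fractured rib from a
    count of fractured ribs, as in the article's example phrases."""
    # "fractured 5th rib" -> a single, specific rib
    m = re.search(r"fractured (\d+)(?:st|nd|rd|th) rib\b", text)
    if m:
        return f"single rib fractured: rib {m.group(1)}"
    # "5 fractured ribs" -> a count of fractured ribs
    m = re.search(r"(\d+) fractured ribs", text)
    if m:
        return f"multiple ribs fractured: count {m.group(1)}"
    return "no rib fracture found"

print(rib_finding("X-ray shows fractured 5th rib"))
print(rib_finding("CT shows 5 fractured ribs"))
```

Even this trivial example shows why word order and syntax, not just vocabulary, matter for turning free text into billable codes.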
This is new technology: it has been commercially available for emergency medicine for little more than a year, and is now coming online for radiology. Despite how new these programs are, they have successfully coded hundreds of thousands of charts and performed well in more than 100 audits. In fact, the audits have shown that the NLP by computer is more consistent and reliable than the coding done by humans because the computer eliminates the differences in assumptions and interpretations from one coder to another.
The combination of speech recognition and NLP is especially exciting and promises to revolutionize the workflow in medical records. By providing both the transcription and the bill to the physician immediately after dictation, the physician has an opportunity to view the text and evaluate its coding implications on a real-time basis. If there are documentation deficiencies, they can be corrected by the physician at the time of signature. In addition, the physician, who bears the ultimate responsibility for the coding, can view and verify the code as well as authenticate the text, all before the text is printed or the bill is sent out.
The breakthroughs in speech recognition and NLP, and their imminent availability on the Internet, open new horizons for the electronic medical record. For example, physicians could dictate from home, and edit and correct their work from home. Coders and transcribers could also work from home over the Internet. The dictated free text can now be automatically abstracted and uploaded into the databases of the hospital information system. Coding, quality assurance and utilization review can all be automated according to predetermined rules.
We are not quite there yet. But both speech recognition and natural language processing have passed the critical threshold of usability, and combining them will open up new possibilities for the electronic medical record.
Dr. John A. Holbrook is the advisory consultant for business development at A-Life Medical Inc. (www.alifemedical.com), San Diego.
Visit Our Web Site for More On Speech Recognition
New developments in dictation for health care organizations combine continuous speech recognition technology with traditional digital dictation and transcription capabilities and services.
To find out more, log on to our Web site at www.health-information.advanceweb.com and read the feature story called “Integrating Speech Technology with Traditional Transcription.”