Update on SPEECH Recognition
Emerging technology shifts its focus toward enabling physicians to ‘say it their way.’
By John A. Holbrook, MA, MD, FACEP
Throughout the past several years, CIOs have begun to see the promise of automated speech recognition (ASR) pass from successful demos to successful implementations. Recent improvements in ASR, combined with overdue attention to workflow and payoff, have spawned more and more success stories in the medical industry.
The Holy Grail
One of the reasons many of the old systems failed in their implementation was the high human and economic cost of having health care providers “train” a new ASR system to their voice. Because the computer could only “understand” one word at a time, most medical applications appeared as “database dictation,” which made extensive use of macros to leverage the discrete speech capabilities of ASR. This required a second costly element of training — namely, teaching users the structure of the large and complex knowledge base needed to actually create a report. In most cases these shortfalls presented insurmountable barriers to implementation and workflow.
Much was written about the Holy Grail of ASR: the combination of (1) “speaker independence,” (2) a large vocabulary and (3) continuous speech recognition in a single application. It was thought that delivering the Holy Grail would open up ASR to widespread medical use. In the last several years, these three advances have been achieved in a single application, with more or less success, by most of the ASR software engines, including L&H, Dragon, IBM and Philips.
Virtually all the ASR engines over the last decade have utilized the Hidden Markov Model (HMM) approach to speech recognition. There have been no dramatic breakthroughs in this basic algorithm since its inception. However, the addition of a language or linguistic model to existing speech recognition engines has substantially enhanced recognition and limited the need to train the system to each user’s voice. Systems currently available allow 97+ percent accuracy with only minutes of training. For some systems, the training can be automatic, and can occur in the background during normal use.
Nonetheless, the use of ASR in emergency medicine revealed that the Holy Grail (99+ percent accuracy, minimal training and the recognition of fluent speech) was not good enough. Paradoxically, success fell victim to the metrics. Because doctors could dictate freely and continuously rather than being limited to discrete words, physicians using ASR were able to create 400-word reports using free text with very little error. However, each report would have at least two to four misrecognitions, sometimes more. (Even one mistake in a 400-word report is not acceptable and has the potential to invalidate the whole report.) For physicians, adding the time of correction to the time of dictation doubled the time needed to produce reports in the emergency department. And in a busy department, adding three minutes of correction per report to three minutes of dictation added 90 minutes to the daily workflow.
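The arithmetic behind these figures can be checked with a short Python sketch. It is only an illustration: the function names are invented here, and the volume of 30 reports per day is not stated in the text but is inferred from the 90-minute and three-minute figures.

```python
def expected_misrecognitions(words: int, accuracy: float) -> float:
    """Expected number of misrecognized words at a given word accuracy."""
    return words * (1.0 - accuracy)


def implied_reports_per_day(added_minutes: float, correction_minutes: float) -> float:
    """Daily report volume implied by total added time and per-report correction time."""
    return added_minutes / correction_minutes


if __name__ == "__main__":
    # 99 to 99.5 percent accuracy on a 400-word report
    for accuracy in (0.99, 0.995):
        errors = expected_misrecognitions(400, accuracy)
        print(f"{accuracy:.1%} accuracy: about {errors:.0f} errors per 400-word report")

    # 90 added minutes at 3 minutes of correction per report (volume is inferred)
    print(f"Implied volume: {implied_reports_per_day(90, 3):.0f} reports per day")
```

In other words, accuracy between 99 and 99.5 percent still leaves roughly two to four errors in every 400-word report, which is why a "99+ percent" system can still double the physician's time per report.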
In addition, real-world usage showed that, on average, doctors do a poor job of correcting their own work. For this reason, the most successful recent entries into medical ASR have added the function of a transcriptionist and have combined pre-developed but customizable macros (templates) with highly accurate continuous recognition based on specialty-specific advanced linguistic models. The availability of a correctionist integrated into an ASR application becomes necessary for doctors who speak with accents, doctors who choose not to correct their own work, and doctors who choose to use the system like a simple dictation system.
Experience has shown that a successful implementation using 30 percent transcription/correction and 70 percent ASR integrated seamlessly in a single application is far preferable to a failed attempt at a 100 percent ASR solution. From the perspective of a CIO or department administrator, any application incorporating ASR must be a 100 percent solution that captures all the clinical text in a given site regardless of computer phobia or physician preference.
As CIOs have come to discover, ASR carries the hidden costs of network maintenance, user training and the inevitable upgrades. Removing the cost of transcription alone does not usually provide sufficient economic justification to convert to an ASR system. Other incentives — such as the scarcity of transcriptionists and the need for real-time availability of the clinical text — have not historically supported the value proposition for implementing ASR. Hence, the current lack of penetration of ASR in the medical text market.
It is becoming increasingly clear that one of the principal benefits of ASR may be the ability to combine the process of dictation with feedback about the dictation. By integrating the two processes into a single clinical event, one is able for the first time to feed back to the physician on a real-time basis the economic, clinical and quality implications of what he/she is saying. Giving the physician real-time feedback based on different clinical or economic rules involves the information system in the clinical process at a new level. While it’s true that medical informatics has aspired to this goal for over a quarter of a century by attempting to have physicians input clinical data in systems with predefined categories, most of these database systems have failed at the user interface. Why? Physicians generally want to use free text to “say things their way.” The problem has always been how to convert clinical free text to the categories needed for analysis and application of rule-sets.
An answer is found in Natural Language Processing (NLP), the ability of computers to “read and understand” the concepts in an ASCII free-text document. This functionality enables, for example, a fully coded, compliant HCFA 1500 bill to be automatically extracted from a free-text dictated note from the emergency or radiology department. It enables an APC or a DRG to be automatically extracted from a collection of clinical notes. It enables an automatic and real-time audit for compliance with a set of clinical guidelines.
NLP technology has been pioneered in the medical domain by A-Life Medical, Inc. (San Diego). Using techniques developed for the intelligence community, it employs a PC-based vector-processing approach to extract concepts and facts from free medical text. It does not rely on simple or complex word search, an approach that yields only 20 percent of the answer 20 percent of the time, as Web-searching engines amply demonstrate. For example, word search cannot distinguish “abdominal pain” from “no abdominal pain.” Others have tried a technique that pre-codes templated or boilerplate medical text with hidden (or XML) tags. If the physician uses the templated text, the computer understands the meaning.
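The limitation of word search can be shown in a few lines of Python. This is a toy sketch, not A-Life Medical's vector-processing method: the function names and the short list of negation cues are assumptions made for illustration, and the check looks only at the single word before the phrase.

```python
import re

# Hypothetical, deliberately tiny list of negation cues.
NEGATION_CUES = ("no", "denies", "without")


def keyword_match(text: str, phrase: str) -> bool:
    """Plain word search: true whenever the phrase appears anywhere in the text."""
    return phrase.lower() in text.lower()


def asserted(text: str, phrase: str) -> bool:
    """True if the phrase appears at least once without a negation cue
    as the word immediately before it."""
    pattern = re.compile(r"(?:\b(\w+)\s+)?" + re.escape(phrase), re.IGNORECASE)
    for match in pattern.finditer(text):
        preceding = (match.group(1) or "").lower()
        if preceding not in NEGATION_CUES:
            return True
    return False


if __name__ == "__main__":
    for note in ("Patient reports abdominal pain.", "No abdominal pain."):
        print(note, keyword_match(note, "abdominal pain"), asserted(note, "abdominal pain"))
```

Plain word search reports a hit for both notes, while even this crude one-word check separates them. It would still miss phrasing such as "pain in the abdomen was denied," which is why fuller linguistic analysis is needed.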
System with ‘self-awareness’
The whole point of ASR accepting unrestricted speech is that now physicians can “say it their way.” True NLP requires the interplay of linguistic, semantic and syntactic analysis to achieve a complete understanding of the meaning and context of a particular sentence.
To code adequately, the computer must be able to attribute the correct authority to statements: Does the patient think he might have cardiomegaly, or does the doctor think the patient has cardiomegaly? Only the second question points to a valid ICD code. The computer must know that the finger is part of the hand, and that a “burning feeling in the hand” is very different from a “burn to the hand.”
NLP-based systems are able to create and submit a medical bill without human intervention. The NLP engine has “self-awareness” of what it doesn’t know or isn’t sure of; it can alert humans when the results need review.
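One simple way to picture this kind of “self-awareness” is a confidence threshold that decides which coded findings go straight to the bill and which go to a human reviewer. The Python sketch below is an assumption about how such routing might look, not a description of any vendor's engine; the class name, the threshold, the placeholder codes and the confidence values are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CodedFinding:
    concept: str       # concept extracted from the dictated text
    code: str          # placeholder label, not a real ICD or CPT code
    confidence: float  # engine's own estimate, 0.0 to 1.0 (illustrative values)


def route(findings: List[CodedFinding],
          threshold: float = 0.9) -> Tuple[List[CodedFinding], List[CodedFinding]]:
    """Split findings into those submitted automatically and those flagged for review."""
    automatic, needs_review = [], []
    for finding in findings:
        if finding.confidence >= threshold:
            automatic.append(finding)
        else:
            needs_review.append(finding)
    return automatic, needs_review


if __name__ == "__main__":
    findings = [
        CodedFinding("burn to the hand", "CODE-A", 0.97),
        CodedFinding("possible cardiomegaly, source of statement unclear", "CODE-B", 0.55),
    ]
    automatic, needs_review = route(findings)
    print("Submit automatically:", [f.concept for f in automatic])
    print("Send for human review:", [f.concept for f in needs_review])
```

The design choice is that the system never has to be right about everything; it only has to know which of its conclusions are safe to act on without a person checking them.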
Here’s an example.
In its Intelliscribe project, A-Life Medical is coupling NLP with ASR to use the physician’s voice dictation to create a HCFA 1500 bill for emergency medicine and radiology. If ASR and NLP can be joined in one application to create a 1500 bill directly from a physician’s clinical voice dictation, the door is open to automating all ICD-9 and CPT-4 coding. DRGs and APCs can flow automatically from medical text. In fact, plans are underway to expand this combination of ASR and NLP into the creation of DRGs and APCs, as well as engines to automate quality assurance and utilization review into real-time processes.
It is expected that this type of application will create three distinct payoffs: (1) ASR will allow the physician to “say what he wants” and then provide instantaneous feedback to the physician as the clinical data is being created; (2) ASR will automate and standardize the process of coding and medical record abstraction, reducing labor costs, while at the same time increasing reliability; and (3) by using artificial intelligence to bridge the gap between clinical and financial data, the bill can become instantaneous, shortening days in accounts receivable.
Future trends in ASR
The ASR field may be on the verge of breakthroughs in the fundamental algorithms enabled by the use of neural networking systems. But from the perspective of a CIO in a health care organization, these algorithms will remain over the horizon for some time to come. Many other anticipated advances in ASR applications will reflect and exploit the revolution in telecommunications and connectivity.
We can expect to see a multitude of new ASR applications utilizing the traditional HMM algorithm, using many of the networking advances that have become the standard in other industry segments.
Thus, we expect to see “voice” passed over the Internet, recognized remotely and passed back to the client as both “flat text” and as some automated analysis, such as a bill or a quality report. We expect to see the use of hand-held devices both to capture the .WAV file and to present the text and the analysis back to the physician. We expect to see ASR implemented in a thin-client solution, and as an enterprise-wide solution for a whole hospital or health care network. Finally, we expect to see ASR coupled with NLP, using voice to populate the database of an HIS.
The combination of ASR and NLP may enable a new HIS to achieve many of the promises and expectations of the last 30 years in medical informatics.
Dr. Holbrook is the advisory consultant for business development at A-Life Medical, Inc., San Diego. You can contact him at (858) 268-9999 or visit A-Life on the Web at www.alifemedical.com.
Integrating Speech Technology with TRADITIONAL TRANSCRIPTION
New developments in dictation for health care organizations combine continuous speech recognition technology with traditional digital dictation and transcription capabilities and services. These comprehensive, integrated solutions aim to provide health care organizations with dramatic cost savings, faster report turn-around time and other key benefits without jeopardizing physician acceptance.
These new solutions replace legacy dictation and transcription systems and enable health care organizations to deploy a more integrated department-wide system. Fundamentally different from traditional health care speech solutions, these systems rely on a flexible client/server architecture. Among many capabilities, speech recognition can occur on the PC workstation in real time (as with most classic speech recognition applications), or it can take place on a server using a technique known as “batch processing.”
These two modes allow physicians to choose the dictation style with which they are most comfortable. Client/server architecture also allows health care organizations to completely integrate their existing transcription providers for post-editing and correction. If physicians do not want to correct their own dictations, they can forward the documents to internal or outsourced transcriptionists for correction.
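The two choices described here, where recognition runs and who corrects the draft, are independent of each other. A minimal Python sketch makes that explicit; every name in it is hypothetical and it does not represent any vendor's product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class RecognitionMode(Enum):
    REAL_TIME = "workstation"  # recognized on the PC as the physician speaks
    BATCH = "server"           # recorded first, recognized later on a server


class Editor(Enum):
    PHYSICIAN = "physician"
    TRANSCRIPTIONIST = "transcriptionist"  # internal or outsourced


@dataclass
class Dictation:
    physician: str
    mode: RecognitionMode
    self_corrects: bool  # does this physician edit his or her own drafts?


def plan(dictation: Dictation) -> Tuple[str, Editor]:
    """Return where recognition runs and who corrects the resulting draft."""
    editor = Editor.PHYSICIAN if dictation.self_corrects else Editor.TRANSCRIPTIONIST
    return dictation.mode.value, editor


if __name__ == "__main__":
    print(plan(Dictation("Dr. A", RecognitionMode.REAL_TIME, self_corrects=True)))
    print(plan(Dictation("Dr. B", RecognitionMode.BATCH, self_corrects=False)))
```

Because the two settings are separate, a physician can dictate by telephone into batch processing and still review the draft personally, or dictate at the workstation and hand the correction to a transcriptionist.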
Integrated speech recognition/transcription systems offer a number of new benefits:
* Physicians can dictate clinical reports with completely natural, free-form speech, as if using a traditional transcription service.
* Organizations can significantly reduce their report turn-around times and related transcription costs.
* Speech processing occurs either at the workstation, for instantly available reports, or in batches, on a specialized server.
* Physicians need not change their practices to accommodate the technology, and can choose whether traditional transcriptionists or physicians edit the documents.
* Should the organization continue to use transcriptionists for post-editing and correction, these systems improve productivity by providing a rough draft that is quicker to edit than transcribing from scratch.
* Physicians can migrate seamlessly from existing technology (dictation and transcription) to speech recognition at their own pace with minimal interruption in workflow.
* These systems are a critical component of a health care organization’s information gathering system, providing a front-end tool that enables the organization to capture clinical information without changing the way a physician practices medicine.
* As with existing legacy dictation systems, a full telephony interface enables physicians to call in at any time to listen to the desired reports.
The success of these new systems is a direct result of the integration of advanced new speech technology into flexible and modern dictation and transcription systems. By combining the power of speech recognition technology with the familiarity of traditional dictation and transcription, these new systems provide health care organizations with significant business benefits while at the same time meeting the demanding needs of individual physicians.
— Peter Durlach, vice president of marketing and business development, L&H Healthcare Solutions Group