Medical Editing: The Next Level for MTs


Vol. 11 • Issue 16 • Page 14

AAMT Track

Jefferson Howe, CMT

I can tell the same story that many long-time medical transcriptionists (MTs) tell. When I began medical transcription in 1984 and operated the hospital’s first XT computers in 1985, many believed speech recognition would replace our profession within 10 years. Ten years came and went, and now, 17 years later, with my latest experience in speech recognition technology, I can safely say that speech recognition won’t replace the MT profession in my lifetime. Instead, speech recognition will enhance our ability to provide accurate and timely medical documentation while at the same time reducing our costs. With back-end recognition engines allowing seamless productivity enhancement for the provider, many companies are banking on our expert knowledge of medical language and our very keen ears and eyes.

Before I share my experience, let me first explain the terminology. Speech recognition was first developed from the late 1960s through the early 1980s. It required discrete speech patterns, had a limited vocabulary and was fully dependent on an individual speaker. That speaker would sometimes train for hours, speaking in discrete or single-word patterns, and the system would develop and expand a vocabulary model based on that training. It was cumbersome for the user, offered limited access (usually just one terminal) and was rarely successful in the fast-paced medical environment. Since that time, speech recognition technology has evolved and is now able to recognize continuous speech patterns and natural flowing language. It allows for a greater vocabulary and multiple users.

Front-End vs. Back-End

Some speech recognition systems are considered “front-end,” which requires interactivity from the user, who speaks and edits the document as it appears on the computer screen. Depending on the sophistication of the system, the next recognized word is selected based on the words that come before it. The benefit is an immediately accurate and timely document that the provider can authenticate and send at one time. The drawback remains a limited and structured vocabulary and, depending on the type of dictation, it may require a high level of computer interaction from the user in order to produce the desired result.

“Back-end” speech recognition is seamless to the dictator. The provider dictates normally and the voice file is then processed through a series of recognition engines. Those recognition engines base the probability of a word selection on the context of words coming before and after that word. The result can then be edited by the provider or an MT much faster and more efficiently than standard transcription practices.
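The idea of choosing a word from its surrounding context can be illustrated with a toy sketch in Python. The vocabulary, the candidate words and the counts below are all made up for illustration; real recognition engines use vastly larger statistical language models:

```python
# Toy sketch: choose the most likely candidate word given the words
# before and after it, using made-up word-pair counts. The drug names
# and counts here are purely illustrative.
pair_counts = {
    ("prescribed", "metoprolol"): 40,
    ("prescribed", "metropolol"): 1,   # a plausible misrecognition
    ("metoprolol", "daily"): 35,
    ("metropolol", "daily"): 1,
}

def score(prev_word, candidate, next_word):
    # Estimate likelihood from how often the candidate follows the
    # previous word and precedes the next one.
    left = pair_counts.get((prev_word, candidate), 0)
    right = pair_counts.get((candidate, next_word), 0)
    return left * right

def pick(prev_word, candidates, next_word):
    # Select the candidate with the highest context score.
    return max(candidates, key=lambda w: score(prev_word, w, next_word))

best = pick("prescribed", ["metoprolol", "metropolol"], "daily")
```

Because “metoprolol” fits both its left and right context far better than the sound-alike, it wins, which is the essence of how context before and after a word drives the selection.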

For the last 18 months, our organization has been using a back-end speech recognition system that is still in development. It is very exciting, productive and absolutely state of the art. We’re consistently achieving at least double productivity (400-600 lines per hour) through an application service provider (ASP) model that allows for editing anywhere a computer can plug in. The provider dictates into a digital dictation system, and four minutes later the voice file is encrypted and transmitted for computer recognition in a data mega-center (such recognition relies on huge amounts of computer memory). The recognition is a highly sophisticated process that runs the voice file and text stream hundreds of times through intense algorithms based on historical text models from the provider. It also formats the stream of text into paragraphs and pre-defined headings. The one drawback is time. The technical process takes approximately four minutes for each minute of dictation to be recognized. Thus, a 10-minute dictation spends 40 minutes in the recognizer before it is available for editing. No stat transfer summaries here.

When an MT logs on (again over the Internet with appropriate security and encryption), the recognized text is presented along with the voice file for editing. Because the voice file and text are integrated, the editing process is comparable to karaoke without the music. As the dictation is played, a red box moves word-by-word through the document, surrounding each word. Hot keys allow for quick fixes and adjustment of speed and volume. The MT still has all the conveniences of the latest in word processing software.
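The word-by-word highlighting described above implies that each recognized word carries timing information from the recognizer. A minimal sketch of that idea, with hypothetical words and timestamps, shows how an editor could highlight the word under the current audio playback position:

```python
# Hypothetical sketch of "karaoke" editing: each recognized word
# carries start/end times (in seconds) from the recognizer, so the
# editor can box the word under the current playback position.
words = [
    ("The",       0.00, 0.18),
    ("patient",   0.18, 0.55),
    ("tolerated", 0.55, 1.10),
    ("the",       1.10, 1.22),
    ("procedure", 1.22, 1.85),
    ("well",      1.85, 2.20),
]

def word_at(position_seconds):
    # Return the word whose time span contains the playback position.
    for text, start, end in words:
        if start <= position_seconds < end:
            return text
    return None
```

As playback advances, repeatedly calling a lookup like this drives the moving red box; hot keys for speed and volume simply change how fast the position advances.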

Once processed, the report is returned to the mega-center and the system analyzes the corrections made by the editor. In this way, it can learn from its mistakes, update formatting issues and reduce future recognition errors. The document, however, becomes immediately available for signature, printing, faxing or upload.
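One way a system could learn from an editor’s corrections is to diff the recognized text against the edited text and tally the substitutions. This is an assumed workflow, not the vendor’s actual method, sketched here with Python’s standard difflib:

```python
import difflib

# Assumed workflow: compare the recognizer's output with the editor's
# corrected text and collect the substituted spans, so frequent
# misrecognitions can be fed back into the speech model.
def correction_pairs(recognized, corrected):
    matcher = difflib.SequenceMatcher(None, recognized, corrected)
    pairs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            pairs.append((" ".join(recognized[i1:i2]),
                          " ".join(corrected[j1:j2])))
    return pairs

recognized = "patient denies chest pane or shortness of breath".split()
corrected  = "patient denies chest pain or shortness of breath".split()
```

Here the diff surfaces the single substitution the editor made (“pane” corrected to “pain”), exactly the kind of signal a recognizer could use to reduce future errors.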

A Satisfying Experience?

As an MT, I was apprehensive about whether this would be an accepted and enjoyable experience. Medical editing is not as intimately involved with the creation of the report, because the text is already formatted and appears on the page. However, the result has been a very satisfying experience. I believe an edited document can be more accurate than a simply transcribed document because it eliminates the potential for typographical mistakes: an accurately recognized medical term is always spelled correctly. In addition, I am less tired at the end of the day, and there is no inkling of exhaustion in my hands, arms or shoulders. The most exciting benefit is a huge sense of accomplishment and completion, as one MT can produce approximately 70 reports a day of typical-length operative notes and procedures.

The difficulties experienced with speech recognition technology, as I use it, are the result of the seamless process. When an author changes his mind, inserts paragraphs or renumbers a sequence, the concept remains befuddling to a computerized recognizer. The subtleties of “2 grams” vs. “20 milligrams” and the formats of many numbers can also be problematic. It is also necessary to recognize when wiping out a document and starting from scratch would be more productive than editing it. Due to the extensive analysis required to establish an individual speech model, the process is not successful for all candidates. Yet speech recognition remains available to augment our professional field.

Jefferson Howe is transcription manager at Eastern Maine Medical Center, South Portland, ME.