Creating a Definitive Guide on Speech Recognition

Vol. 18 • Issue 14 • Page 16

A cross-organizational group of volunteers is working together to help calm confusion and misinformation concerning speech recognition.

Inconsistencies abound. Confusion prevails. The market is cluttered with sales messages and lacks definitive, peer-reviewed literature. In the speech recognition marketplace, there are many buzzwords and few solid, indisputable facts. The automated speech recognition technology (ASRT) work group, a broad group representing speech recognition vendors, medical transcription service organizations (MTSOs), MTs and consumers, hopes to clear up the confusion by publishing what it intends to be a definitive, evolving reference guide to the adoption of speech recognition.

ADVANCE spoke to several members of the work group, and found that while their backgrounds may be different, their goal is the same—to finally cut through marketing, misinformation and misperceptions and agree upon a common view on the best practices that should be used when implementing speech recognition. The medical transcription industry isn’t exactly noted for its consistency (can anyone give the definition of a line?), but the work group hopes that its end product will serve as a resource to those lost in a confusing maze of a marketplace. The Medical Transcription Industry Association (MTIA) in the past led an effort to standardize billing methods and produced the visible black character (VBC) white paper in an effort to iron out that bit of inconsistency in the industry, and the ASRT work group aims to help standardize some of the mumbo jumbo that centers around the speech recognition industry.

All Together Now

The ASRT work group was initially slated to be an initiative between MTIA, the American Health Information Management Association (AHIMA) and the Association for Healthcare Documentation Integrity (AHDI). There are representatives from each association on the work group, and MTIA has taken the reins of the group.

According to George Catuogno, leader of the ASRT work group and president of Sten-Tel, Springfield, MA, the first release of the speech recognition adoption white paper is set for the end of August. Catuogno has plans for future releases, as the white paper will be an evolving document. The work group is chatting with HIT associations as well as the American Medical Association to see if the organizations would be interested in chiming in as subsequent versions of the paper are released. The first paper will mainly be geared toward the MTIA and AHDI audiences, the MTSOs and the MTs in the industry. With that in mind, most of the initial work will be focused on back-end speech recognition, where a medical editor reviews a document that has been run through a speech recognition engine.

Not Yet Up to Star Trek Speed

When speech recognition first hit the market, early expectations were that the technology would eliminate transcription, be capable of handling anything and work much like the computers in Star Trek that understood language. Those expectations soon diminished, according to Klaus Stanglmayr, strategic product marketing manager, Philips Speech Recognition Systems, and ASRT work group member. “I think that over the years people realized, or are realizing, that speech recognition is not perfect, but it’s a useful tool,” Stanglmayr explained. “It’s a productivity tool. If you use it properly, if you use it the way it was designed, it will get the results that you need.”

As for eliminating transcription, that’s not likely, either, according to all of the experts who spoke with ADVANCE. In back-end speech recognition, an MT edits the document after the dictation runs through the speech engine. While MTs may make the transition to editor, they won’t be replaced by computers anytime soon. That leaves the task of training the MT work force for a completely new role, one of editor. The white paper will provide advice on training the work force for the transition to editor. “If you believe the marketing, you think, the industry is going to implement this, roll this out, people are going to get more productive and everybody wins,” said work group member Christopher Rehm, MD, chief medical officer, Spheris, Franklin, TN. “The reality is that you’ve got a work force that’s trained to do transcription, and you’ve got to go through a re-training effort for speech recognition editing.”

Training may be only part of the struggle when implementing speech recognition. Another misconception about speech recognition that the white paper hopes to address is the cost of implementing the technology. A company or facility that implements speech recognition technology may not factor in the costs of implementation, so the anticipated cost savings may not live up to expectations. The white paper won’t lay out the costs in dollars and cents, according to Nick van Terheyden, MD, chief medical officer, M*Modal, Pittsburgh, PA. The group plans to present the components that need to be considered when implementing speech recognition, and will also address factors that feed into the overall cost equation, allowing readers of the white paper to come to their own conclusions.

Measuring the Impact

Likewise, when it comes to productivity measures, the white paper will present facts and let readers draw their own conclusions, rather than claiming that MTs will be X percent more productive when using speech recognition. Speech recognition technology varies from one vendor to the next, and not every implementation will produce the same results. No one disputes that productivity increases overall after the adjustment period, once the MTs are settled into their new roles as medical editors. Just how much productivity increases is debatable.

Isaac Aronov, chief technology officer, Integrated Document Solutions (IDS), Ft. Lauderdale, FL, watched the MTs employed by IDS have their productivity increase by 40-50 lines per hour after the employees got used to the technology. Some MTs become more productive than others, Aronov said, and actually, MTs with more experience tend to struggle more than new MTs. “The back-end speech recognition is a tool to empower the MTs who just came out of school or have very little experience,” Aronov explained.

Stanglmayr noted similar results. Philips published productivity studies on its speech recognition engine, and the productivity increases depend on a variety of different factors, he explained. “If you have a very seasoned, well-trained person with a lot of experience, you will probably see lower productivity gains than with someone who is relatively new to transcription,” Stanglmayr said.

In addition to looking at the factors affecting productivity gains, the work group will also set out to nail down some solid definitions surrounding speech recognition technology. Dr. van Terheyden explained that many terms in the industry are used to reference the technology and how it works, and there needs to be a point of reference for core definitions to make sure that everyone is talking in the same terminology. Dr. Rehm agreed that there has to be set terminology. “There are terms thrown around that people think are written-in-stone terminology, but really it’s company-specific terminology,” Dr. Rehm explained.

A Large Undertaking

Undoubtedly, trying to address all of the inconsistencies and misconceptions that surround speech recognition technology will not be an easy task. Catuogno noted that the white paper will become more of a reference guide for adoption of speech recognition technology as the paper evolves. He acknowledged that the paper will be broad-based and geared toward a wide audience as well. “We feel it would be beneficial to have a broad-based adoption white paper that folks could reference and consider an authoritative source of information to answer questions,” Catuogno said.

The goal of the paper, according to Dr. van Terheyden, will be to bring together the different agendas—the vendor side, the MTSO side and the consumer side, for now—to settle on a common document. “We want to bring all of it together so that everybody can say, hey, you know what, this is a fair representation. It actually defines things clearly and we can all sign off on this,” Dr. van Terheyden explained.

The primary desire, he added, is to get a clear understanding of the technology, its benefits and disadvantages, the economic component and its role in the health care documentation world. If the white paper then becomes the go-to guide on speech recognition, that would help bring the industry to consensus. “I think that would be highly attractive for everybody, so that we’re all working from the same information as opposed to fragmented pieces that may be disconnected,” Dr. van Terheyden said.

Lynn Jusinski is an associate editor with ADVANCE.

MTs/Editors: Sit Tight

You’re not going anywhere. All of the speech recognition experts who spoke with ADVANCE agreed that the MT (or medical editor) is necessary to the transcription industry, and that will remain a fact into the future, despite the use of speech recognition.

Isaac Aronov, chief technology officer with Integrated Document Solutions (IDS), Ft. Lauderdale, FL, told of when his company integrated its own speech recognition system. An MT was asked to write up problems she had with the system, so that the technology folks at IDS could try to make improvements. She presented them with the problem of follow-up, followup and follow up, three different spellings to which the MT gave three different connotations. “That’s where the MT has to be the arbitrator and the judge to decide which word goes where,” Aronov explained. “The speech engine will try to pick up on the context, but it does not have the mental capacity of a human being to be able to put a word in there.”

Yaniv Dagan, CEO of IDS, agreed, adding that having a person there to verify and edit is a must. “The transcriptionists’ value cannot be diminished,” he said. “The MT many times plays the role of the magician. She has to verify, she has to understand what a doctor is trying to say.”

According to Nick van Terheyden, MD, chief medical officer of M*Modal, Pittsburgh, PA, the Star Trek days where the computer recognizes and understands everything that’s being said, and does so perfectly, are “at best, a long way off.” Dr. van Terheyden talked about a blended model of front-end (no MT involved, and the clinician edits) and back-end (complete with a medical editor) speech recognition. Physicians who use the front-end technology and edit their own work may not want to handle every single report themselves, and in a blended model they can choose to pass some reports along for back-end speech recognition and review by a medical editor.

Christopher Rehm, MD, chief medical officer, Spheris, Franklin, TN, agreed that human intelligence is a necessary ingredient in the transcription world and added that he believes there will always be not just editing, but also traditional transcription. “There are going to be either physicians who don’t dictate in a manner that’s conducive to back-end or front-end speech recognition, or they have particular formatting or template requirements that are so involved or detailed that the engine doesn’t handle it, so the MT also has to arrange everything anyway, so it’s more productive just to type it,” Dr. Rehm said. “The reality is that we are a long, long way from traditional transcription disappearing.”

—Lynn Jusinski