Dictator v. Dictator 1

No. 2, Volume 1
October 1997

Roger Fletcher learned Chinese in Hong Kong when it was still a British colony and he was still a member of the British armed forces. After becoming a civilian, he put his knowledge of Chinese to use, initially as a court interpreter in Cantonese. He also undertook translation work from Chinese, and the proportion of this increased over the years until he became a full-time translator.
He was always less cautious than many translators in embracing new technology and immediately saw the potential of dictation software for his own work. IBM also saw the potential of a translator case study for publicity purposes and subsequently persuaded him to become an accredited consultant and reseller as well. The British Computer Society awarded him a medal in their 1995 awards for “Recognition of innovation in the application of speech recognition technology using IBM VoiceType Dictation.”

Roger can be reached at 100016.234@compuserve.com

First Impressions of ViaVoice
Continuous Dictation Software from IBM

by Roger Fletcher

Since speech recognition software for computers became available three years ago, the Holy Grail for all manufacturers has been continuous speech dictation. The earlier versions required the speaker to insert a short pause of at least one tenth of a second between words. Although for many people this was an easy skill to acquire, it nevertheless posed a psychological barrier for others. In only three years we have moved from an isolated word dictation model requiring a dedicated adapter to a continuous speech model using an industry standard sound card.
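   As a rough illustration of why the pause mattered to the isolated word model, the sketch below splits a stream of sound levels into words wherever it finds at least one tenth of a second of silence. It is only a toy under stated assumptions: the function name, the 10 ms frame length and the loudness threshold are all hypothetical, and it does not describe how any IBM product actually works.

```python
def split_on_pauses(energies, frame_ms=10, threshold=0.1, min_pause_ms=100):
    """Return (start, end) frame ranges of words separated by long pauses.

    energies     -- one loudness value per frame (hypothetical input format)
    frame_ms     -- assumed frame length in milliseconds
    threshold    -- assumed level below which a frame counts as silence
    min_pause_ms -- minimum pause between words (one tenth of a second)
    """
    min_pause = min_pause_ms // frame_ms
    words = []
    start = None        # first frame of the word being collected
    last_voiced = None  # most recent frame that was not silent
    silent_run = 0      # length of the current run of silent frames
    for i, level in enumerate(energies):
        if level >= threshold:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                # The pause is long enough: close the current word.
                words.append((start, last_voiced + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The stream ended while a word was still open.
        words.append((start, last_voiced + 1))
    return words


# Two utterances separated by a 120 ms pause; the 30 ms gap inside the
# second one is too short to count as a word boundary.
frames = [0] * 5 + [1] * 20 + [0] * 12 + [1] * 15 + [0] * 3 + [1] * 10 + [0] * 5
print(split_on_pauses(frames))
```

   A speaker who shortens the pause below the minimum simply has two words merged into one, which is why the habit of pausing had to be learnt in the first place.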
   My own experience has been limited to IBM products, and my experience of continuous speech products to the beta version of IBM ViaVoice UK English.
   Believing that the only fair way to test new software is to use it in a real situation, I recently used it to translate the Chinese radiation protection regulations signed by Premier Li Peng himself. This amounted to over 4,000 English words in translation, which I completed in less than a day.
   However, the experience was not without its problems, one of which was shedding the ingrained habit of inserting pauses between words. When I did this, little extra words were inserted in the pauses! When I spoke at normal conversational speed, the accuracy improved greatly, but every time I paused to think, the problem occurred again. This was clearly going to be a major problem with translation.
   It then occurred to me that I had raced through the enrolment at great speed in order to be able to begin the translation. Wondering if this had been a significant factor in the performance, I later re-enrolled and tried dictating again. The problem had largely disappeared! It is therefore clear that the manner of enrolment is of crucial importance, and some guidance may be useful here. In the UK English version, there are 473 sentences to read. This can be accomplished in two stages, 100 sentences followed by the remaining 373. I recommend reading the sentences at a moderate and comfortable speed while taking care to pronounce the words correctly. Clearly if we pronounce “and” as in Rock ’n’ Roll, we can expect recognition problems.
   However, any continuous dictation product is likely to be more sensitive to incidental sounds such as an intake of breath at the beginning of a sentence. I then discovered that the old style of isolated word speech also worked for difficult passages where much thought was required. What did cause the insertion of extra words was hesitation and lingering over a word. The program has learnt during enrolment how much time a particular word takes, and if we take three times as long while thinking, it will clearly try to interpret this as several words.
   As an indication of performance, I am dictating this article at normal conversational speed, and the first error occurred with the name of Li Peng, which is not entirely surprising. Nor is it surprising that the names IBM and ViaVoice should appear correctly, but it is worth noting that the Holy Grail was also correct and correctly capitalised.
   I have subsequently done several industrial chemistry translations, and the word urea caused problems, coming out as either “your ear” or even worse “your rear.” However, after a few corrections in context, the probability of correct recognition increased significantly. Other delightful examples were “force for us” instead of “phosphorus” and, not unreasonably, “potassium van a date” for “potassium vanadate.” After all, the previous isolated speech models knew what constituted a word because of the silence before and after the utterance, but continuous speech programs have no such obvious clue. Under the circumstances, I am very pleasantly surprised at just how well it manages this task, particularly with general vocabulary, which has already been thoroughly analyzed in the process of creating the program.
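   The "potassium van a date" effect can be imitated with a toy word splitter. The sketch below is an assumption-laden illustration, not IBM's method: it works on letters rather than sounds, the function name is hypothetical, and the word probabilities are invented for the example. It simply picks the most probable way of cutting an unspaced stream into known words.

```python
import math


def best_segmentation(stream, lexicon):
    """Split an unspaced string into the most probable sequence of known words.

    lexicon maps each known word to an illustrative probability.
    Returns a list of words, or None if the stream cannot be split at all.
    """
    n = len(stream)
    # best[i] holds (log-probability, start of last word) for stream[:i].
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(end):
            word = stream[start:end]
            if word in lexicon and best[start][0] > -math.inf:
                score = best[start][0] + math.log(lexicon[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk backwards through the recorded word starts to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(stream[start:end])
        end = start
    return words[::-1]


# Invented probabilities for a vocabulary that lacks "vanadate".
lexicon = {"potassium": 0.001, "van": 0.01, "a": 0.05, "date": 0.01}
print(best_segmentation("potassiumvanadate", lexicon))
# -> ['potassium', 'van', 'a', 'date']

# Once the technical term is known, the two-word reading wins.
lexicon["vanadate"] = 0.0005
print(best_segmentation("potassiumvanadate", lexicon))
# -> ['potassium', 'vanadate']
```

   With no silence to mark the boundaries, the splitter can only choose among words it already knows, so an unknown technical term is bound to come out as a string of short everyday words.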
   Before doing the chemical translations mentioned above, I decided to give the so-called Vocabulary Expander a workout. I opened the Expander window and from the File menu opened a text file I had created by concatenating half a dozen previous chemical translations. I clicked on the Analyze menu item, and a drop-down list of words in the translation file but not in the ViaVoice vocabulary appeared. I selected those I wished to add by clicking with the mouse while holding down the Ctrl key. After clicking the Add button, I was prompted to record pronunciations of the added words as required. In the space of half an hour, I added some 200 words.
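   The analysis step described above amounts to listing the words in a text that are missing from a known vocabulary. A minimal sketch of that idea follows; it is an assumption about the general technique, not the Vocabulary Expander's actual code, and the function name, the word pattern and the tiny vocabulary are all hypothetical.

```python
import re
from collections import Counter


def out_of_vocabulary(text, vocabulary):
    """List words found in text but not in vocabulary, most frequent first.

    Words are lower-cased before comparison; ties are broken alphabetically.
    """
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    counts = Counter(w for w in words if w not in vocabulary)
    return sorted(counts, key=lambda w: (-counts[w], w))


# A tiny stand-in for the recogniser's built-in vocabulary.
known = {"the", "of", "and", "is", "a", "with", "reacts", "then", "added"}
sample = "Urea reacts with potassium vanadate. The urea is then added."
print(out_of_vocabulary(sample, known))
# -> ['urea', 'potassium', 'vanadate']
```

   Sorting by frequency puts the terms most worth adding at the top of the list, which is helpful when a batch of earlier translations yields a couple of hundred candidates.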
   One final word about system requirements: the minimum specified by IBM is a Pentium 166 MMX with 32 MB RAM for Windows 95 or 48 MB RAM for Windows NT. I have been using it on a Pentium Pro 200 with 32 MB RAM and Windows NT. While processing the enrolment, I kept receiving messages about low system resources, but it limped through to the end. After upgrading to 64 MB RAM, performance was very noticeably faster.
   I have been an enthusiastic advocate of dictation software ever since it appeared, while many others, doubtless correctly, regarded it as an immature technology. With that experience behind me, I can now say that dictating general texts with ViaVoice is, in comparison with previous versions, a much more pleasant and relaxed affair—and at one tenth of the original price, very much better value for money. However, I have lingering doubts as to whether the continuous speech product is yet a complete replacement for the isolated word model when dictating highly technical texts.

Editor’s Note: A review of another dictation software package, “NaturallySpeaking from Dragon Systems” by William J. Grimes, also appears in this issue of the Translation Journal.