Since speech recognition software for computers became available three years ago, the Holy Grail for all manufacturers has been continuous speech dictation. The earlier versions required the speaker to insert a short pause of at least one tenth of a second between words. Although for many people this was an easy skill to acquire, it nevertheless posed a psychological barrier for others. In only three years we have moved from an isolated word dictation model requiring a dedicated adapter to a continuous speech model using an industry-standard sound card.
My own experience has been limited to IBM products and, for continuous speech, to the beta version of IBM ViaVoice UK English.
Believing that the only fair way to test new software is to use it in a real situation, I recently used it to translate the Chinese radiation protection regulations signed by Premier Li Peng himself. The translation ran to more than 4,000 English words, which I completed in less than a day.
However, the experience was not without its problems, one of which was shedding the ingrained habit of pausing between words. Whenever I lapsed into it, short extra words were inserted in the pauses! When I spoke at normal conversational speed, the accuracy improved greatly, but every time I paused to think, the problem returned. This was clearly going to be a major problem for translation, where pauses for thought are unavoidable.
It then occurred to me that I had raced through the enrolment in order to begin the translation as quickly as possible. Wondering whether this had significantly affected performance, I later re-enrolled and tried dictating again. The problem had largely disappeared! The manner of enrolment is therefore of crucial importance, and some guidance may be useful here. In the UK English version, there are 473 sentences to read. This can be done in two stages: 100 sentences followed by the remaining 373. I recommend reading the sentences at a moderate, comfortable speed while taking care to pronounce the words correctly. If we pronounce "and" as in "rock 'n' roll", we can expect recognition problems.
However, it is likely that any continuous dictation product will be more sensitive to, for example, an intake of breath at the beginning of a sentence. I then discovered that the old style of isolated word dictation also worked for difficult passages where much thought was required. What did cause extra words to be inserted was hesitating and lingering over a word. The program has learnt during enrolment how long a particular word takes to say, and if we take three times as long while thinking, it will naturally try to interpret the drawn-out sound as several words.
As an indication of performance, I am dictating this article at normal conversational speed, and the first error occurred with the name Li Peng, which is not entirely surprising. Nor is it surprising that the names IBM and ViaVoice appeared correctly, but it is worth noting that "Holy Grail" was recognized and correctly capitalised.
I have subsequently done several industrial chemistry translations, and the word "urea" caused problems, coming out as either "your ear" or, even worse, "your rear". However, after a few corrections in context, the probability of correct recognition increased significantly. Other delightful examples were "force for us" instead of "phosphorus" and, not unreasonably, "potassium van a date" for "potassium vanadate". Such errors are understandable: the earlier isolated word models knew where a word began and ended from the silence before and after it, whereas continuous speech programs have no such obvious clue. Under the circumstances, I am very pleasantly surprised at just how well it manages this task, particularly with general vocabulary, which was thoroughly analysed in the process of creating the program.
Before doing the chemical translations mentioned above, I decided to give the so-called Vocabulary Expander a workout. I opened the Expander window and, from the File menu, opened a text file I had created by concatenating half a dozen previous chemical translations. When I clicked on the Analyze menu item, a drop-down list appeared showing the words that occurred in the file but not in the ViaVoice vocabulary. I selected those I wished to add by clicking on them while holding down the Ctrl key. After I clicked the Add button, I was prompted to record pronunciations of the added words where required. In the space of half an hour, I added some 200 words.
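The core idea behind the Analyze step, listing words that occur in a text but are missing from the recognizer's vocabulary, can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not IBM's implementation: the function name find_unknown_words, the simple tokenization rule and the ordering by frequency are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

def find_unknown_words(text, vocabulary):
    """Return words in `text` that are missing from `vocabulary`, most frequent first.

    Hypothetical illustration only; it does not reproduce how ViaVoice's
    Vocabulary Expander actually works internally.
    """
    # Assumed tokenization: lowercase alphabetic runs, allowing internal
    # apostrophes or hyphens (e.g. "don't", "tri-sodium").
    tokens = re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())
    known = {word.lower() for word in vocabulary}
    counts = Counter(token for token in tokens if token not in known)
    # Most frequent candidates first, ties broken alphabetically.
    return sorted(counts, key=lambda word: (-counts[word], word))

sample = "Urea and potassium vanadate react; urea is added."
vocab = {"and", "potassium", "react", "is", "added"}
print(find_unknown_words(sample, vocab))  # ['urea', 'vanadate']
```

Putting the most frequent unknown words first mirrors the practical aim of the exercise: the terms a translator dictates most often are the ones most worth adding and recording pronunciations for.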
One final word about system requirements: the minimum specified by IBM is a Pentium 166 MMX with 32 MB RAM for Windows 95 or 48 MB RAM for Windows NT. I have been using it on a Pentium Pro 200 with 32 MB RAM and Windows NT, which is below the specified minimum for NT. While the enrolment was being processed, I kept receiving messages about low system resources, but it limped through to the end. After I upgraded to 64 MB RAM, performance was very noticeably faster.
I have been an enthusiastic advocate of dictation software ever since it appeared, while many others, doubtless correctly, regarded it as an immature technology. With that experience behind me, I can now say that dictating general texts with ViaVoice is, in comparison with previous versions, a much more pleasant and relaxed affair, and at one tenth of the original price, very much better value for money. However, I have lingering doubts as to whether the continuous speech product is yet a complete replacement for the isolated word model when dictating highly technical texts.
Editor's Note: A review of another dictation software package appears in this issue of the Translation Journal.