Volume 15, No. 3 
July 2011


Michael Wilkinson
 
 




Translation Journal
 
Translators' Tools


WordSmith Tools:

The best corpus analysis program for translators?

by Michael Wilkinson


1 Introduction


In recent years I have written several articles discussing some of the ways in which corpus analysis programs, and especially concordancers, can be useful performance-enhancing aids in translating. Some of these articles are viewable online (see, for example, Wilkinson 2005a; 2005b; 2006). Moreover the usefulness of concordancers has been increased by the availability of user-friendly freeware that enables translators to rapidly compile their own specialized corpora (see Wilkinson 2010). In these articles I have made use of WordSmith Tools version 4 (Scott 2004) and version 5 (Scott 2008). But is WordSmith the best corpus analysis program for translators? In this article I shall examine the pros and cons of WordSmith in meeting the translator's needs.


2 Corpus Analysis Tools

There are currently a variety of corpus analysis tools available. Some are commercially marketed; some are available as free downloads; some are especially designed for working with bilingual and multilingual parallel corpora; and some have been designed to access specific corpora (e.g. XAIRA, formerly called SARA, has been set up for the 100 million word British National Corpus).

However, although some of the corpus analysis tools available today can serve as very useful translation aids, they have been designed primarily with linguistic researchers, computational linguists, lexicographers and language teachers in mind rather than translators. Over a decade ago Jääskeläinen & Mauranen (2000) argued that a tailor-made translators' concordancer is needed before corpora can be adopted on a larger scale in the translation industry.

Most corpus analysis programs include a concordancer, which finds all the occurrences of a search word or search pattern and displays them in the centre of your screen as a keyword-in-context (KWIC) display, together with a span of co-text to the left and right (see e.g. Figure 7).
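For readers who like to see the mechanics, the idea behind a KWIC display can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function name `kwic`, the `width` parameter and the sample sentences are invented for the example); it is not how WordSmith or any other program mentioned here is implemented.

```python
import re

def kwic(text, pattern, width=30):
    """Return one keyword-in-context line per match of `pattern` in `text`."""
    # Collapse line breaks and repeated spaces so that each hit fits on one line.
    flat = " ".join(text.split())
    lines = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left co-text so that the search word forms one column.
        lines.append(f"{left:>{width}} {m.group(0)} {right}")
    return lines

# Invented sample text, loosely in the style of a tourist brochure.
sample = ("The hotel offers stunning views of the bay. "
          "Enjoy breathtaking views from the terrace, "
          "or take in the views across the valley.")

for line in kwic(sample, r"\bviews\b", width=25):
    print(line)
```

Each printed line has the search word in the same column, with up to 25 characters of co-text on either side, which is the essence of the display described above.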

In addition to the concordancer, many programs, including WordSmith, comprise other features, such as a tool for generating word-lists, or a key word tool that can locate and identify words that occur with an unusually high frequency in your corpus when it is compared with another corpus. However the concordancer is of most direct use to the translator, and this is the tool that I shall concentrate on in this article.
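As a rough illustration of what a word-list tool and a key word tool do, here is a minimal Python sketch. It assumes a very simple tokeniser and uses the log-likelihood statistic, which is one measure commonly used for this kind of comparison; the function names and the two tiny "corpora" are invented for the example, and the sketch makes no claim to reproduce WordSmith's own calculations.

```python
import math
import re
from collections import Counter

def word_list(text):
    """Frequency list of lower-cased word forms (simplistic tokeniser)."""
    return Counter(re.findall(r"\w+", text.lower()))

def log_likelihood(a, b, size_a, size_b):
    """Log-likelihood score for a word occurring `a` times in a study corpus
    of `size_a` tokens and `b` times in a reference corpus of `size_b` tokens."""
    expected_a = size_a * (a + b) / (size_a + size_b)
    expected_b = size_b * (a + b) / (size_a + size_b)
    ll = 0.0
    if a:
        ll += a * math.log(a / expected_a)
    if b:
        ll += b * math.log(b / expected_b)
    return 2 * ll

def key_words(study, reference, top=5):
    """Words relatively more frequent in `study` than in `reference`, best first."""
    s, r = word_list(study), word_list(reference)
    n_s, n_r = sum(s.values()), sum(r.values())
    scored = [(w, log_likelihood(c, r[w], n_s, n_r))
              for w, c in s.items()
              if c / n_s > r[w] / n_r]  # keep only words over-represented in the study corpus
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]

# Two invented miniature corpora, for illustration only.
study = "the beach the bay the beach hotel beach"
reference = "the report the data the model the results"

print(word_list(study).most_common(3))
print(key_words(study, reference))
```

With real corpora the reference corpus would normally be much larger than the study corpus; the comparison of relative frequencies is what allows for the difference in size.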


3 About WordSmith Tools

WordSmith Tools can be regarded as dating back to 1993, when its DOS-based predecessor, MicroConcord, developed by Mike Scott and Tim Johns, was released. Version 1 of WordSmith Tools, developed by Mike Scott, was released in 1996; version 2 in 1997; version 3 in 1999; version 4 in 2004; and version 5 in 2008. At the time of writing, version 6 is in beta format, and is due to be released later in 2011.


4 The Translator's Needs

What then would be the ideal features of a translator-customised concordancer? Based on my own decade-long experience of using concordancers as translation aids, and on feedback from the translation students I have taught over the past five years, I would argue that a good concordancer should have all or most of the features listed in Figure 1.

  • It should be easy to learn and fast to use.
  • Selecting corpora for analysis should be straightforward.
  • Entering searches for a single search pattern or several search patterns should be easy.
  • Several wildcards should be available to replace words or characters when searching.
  • The KWIC display should appear rapidly, with the search term centralised and with co-text to left and right.
  • The KWIC display should be clear and "tidy".
  • It should be easy to sort the concordance lines (e.g. arranging words to left or right alphabetically, as well as the centre node if several terms have been searched for simultaneously).
  • It should be easy to view a concordance line within a wider context - at least as a full sentence or within a full paragraph, and preferably within the full text that it derives from.
  • The program should be stable; i.e. it should not freeze or crash.
  • It should be affordable.
Figure 1: The ideal qualities of a concordancer for the translator


Below I will discuss whether WordSmith Tools 5.0 fulfils the requirements listed above.


4.1 Learnability & Usability

Most of the student groups in a workshop experiment conducted by Varantola (2003) at the University of Tampere had problems with WordSmith, the main complaint being that the software tended to "play up" at times and cause frustration. Similarly, in an experiment conducted by Zanettin (2002) at the University of Bologna, students complained about WordSmith's lack of user-friendliness. Furthermore, Jääskeläinen & Mauranen (2000), in their study of professional translators in the wood-processing industry using a purpose-made corpus and WordSmith, found that although the translators regarded their special-field corpus as a useful resource, they often complained that the concordancer was complicated and difficult to use.

Perhaps part of the problem in the experiments mentioned above was with WordSmith 3.0 (Scott 1999). Firstly, its interface was rather "cluttered", with an excessive number of shortcut buttons. Not all of the functions needed by the translator were immediately self-evident, and users were perhaps slightly overwhelmed by all the features available to them. Secondly, it was perhaps less stable than later versions of the program. In addition, students using the program between 1999 and 2003 were perhaps not as computer-literate as those using the software more recently.

WordSmith 4.0 and WordSmith 5.0 tidied up the interface considerably, making it more intuitive and user-friendly. Moreover these later versions seem to be more stable than their predecessor.

Most of my students find the more recent versions of WordSmith easy to learn, especially after a bit of guidance from the teacher. However some have complained that if they do not use the program for several months, it takes some time to get back into the routine of using it. Of course this could be said of most software. Fortunately, for those who do not have a teacher to guide them, or whose skills have got rusty, the WordSmith website has an excellent Step-by-step guide (see Figure 2), which includes a tutorial on the basics of concordancing. This guide is available in various languages. (Note that WordSmith can handle a variety of languages, including Chinese, Japanese and Russian.)


4.2 Selecting a corpus

Choosing a corpus to analyse is very straightforward:


Figure 2: Choosing texts

One slightly irritating feature is that each time you choose texts a box appears informing you that "Different texts can be chosen for the various Tools. You're choosing for Concord." You are given the choice of OK/Cancel/Help. This feature is unnecessary. (You don't get this message if you first open the Concord program and choose your texts from there rather than via the main Controller.)

Figure 3 shows the view I get after choosing the 101 files of my self-compiled corpus of texts from British and American tourist brochures.


Figure 3: Selecting a corpus

I now press the OK button, and I'm ready to exploit the corpus. (You can de-select the corpus by pressing the "Clear" button.)


4.3 Searching & Wildcards

From the WST Controller I now open the Concord tool and select "New..." from the drop-down menu. In the "Getting started" window (Figure 4), I enter a search word or search pattern and press "OK":


Figure 4: Entering a search pattern

Figure 4 also shows some, but not all, of the very useful wildcards at my disposal.
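To give a feel for what a wildcard search does behind the scenes, here is a minimal Python sketch that turns a wildcard pattern into a regular expression. It assumes only two common conventions, namely that * stands for any run of characters within a word and ? for exactly one character; WordSmith's own wildcard set is larger and its symbols may behave differently, so this is an illustration rather than a description of the program. The function name and sample sentence are invented.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simple wildcard pattern into a compiled regular expression.

    Assumed conventions (not necessarily WordSmith's):
      *  any run of word characters, possibly empty
      ?  exactly one word character
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))  # treat everything else literally
    # Anchor at word boundaries so that "view*" does not match inside "review".
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

text = "Breathtaking views, a scenic viewpoint and rooms with a sea view."

print(wildcard_to_regex("view*").findall(text))  # ['views', 'viewpoint', 'view']
print(wildcard_to_regex("s?a").findall(text))    # ['sea']
```

A single pattern such as view* thus retrieves the singular, the plural and compounds in one search, which is what makes wildcards so useful to the translator.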



4.4 The KWIC display

After entering my search pattern, I press OK, and the KWIC display appears within seconds. Unfortunately the display generated by the default settings includes numerous columns that are of no use to the translator, and which thus reduce the amount of useful co-text on either side of the search pattern (Figure 5). Holding-and-dragging with the mouse is needed for each new concordance window in order to make maximum room for co-text.

In addition to the concordance line, the only extra information that the translator might need is the name of the file that the concordance line belongs to. In this respect, the default displays of the concordancers mentioned in the last section of this article are better.


Figure 5: Superfluous columns

Admittedly, this problem of superfluous columns can be eliminated by adjusting (and saving) the layout settings (as has been done in Figure 7), but it would be nice if the translator was spared this hassle.


4.5 Sorting & viewing in context

Sorting the KWIC lines is effortless. The Concordance Sort box is just a couple of mouse-clicks away (Figure 6); here one can, for example, sort collocates alphabetically to the left and right of the search word. However one has to ensure that the "Ascending" box is ticked, since otherwise the sort appears in reverse alphabetical order. This box has an unfortunate habit of toggling between ascending and descending. It would be an improvement if this box was always ticked by default.


Figure 6

Figure 7 shows the results generated by the same search pattern as in Figures 4 & 5, but now the layout has been customized to remove the superfluous columns, and the lines have been sorted with L1 as the main sort, L2 as the second sort, and L3 as the third sort.
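The three-level sort described above (L1 as the main sort, then L2, then L3) can be sketched in Python as follows. The sketch assumes a text already split into tokens, and the helper names and sample text are my own inventions; it illustrates the sorting principle only and is not WordSmith's implementation.

```python
def concordance(tokens, node):
    """Yield (left_tokens, node_token, right_tokens) for each hit of `node`."""
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            yield tokens[:i], tok, tokens[i + 1:]

def left_sort_key(hit, depth=3):
    """Sort key made of L1, then L2, then L3 (case-insensitive)."""
    left = hit[0]
    # left[-1] is L1, left[-2] is L2, and so on; pad with "" near the text start.
    return tuple(left[-k].lower() if k <= len(left) else ""
                 for k in range(1, depth + 1))

# Invented sample text, pre-tokenised by splitting on spaces.
text = ("stunning views of the bay . the best views in town . "
        "truly stunning views await . some of the best views anywhere")
tokens = text.split()

# Ascending sort; passing reverse=True would give the descending order.
hits = sorted(concordance(tokens, "views"), key=left_sort_key)

for left, node, right in hits:
    print(f"{' '.join(left[-3:]):>22}  {node}  {' '.join(right[:3])}")
```

After sorting, the lines with "best" immediately to the left of the search word are grouped together, followed by those with "stunning", so that recurring left-hand collocates become visible at a glance.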


Figure 7

Double clicking on any line will show us the full context:


Figure 8: Viewing line 184 in context

When I have finished examining the results of my search, I can do a new search by clicking on "File" and selecting "New...". Concord now asks me whether I want to start another window, and whether I want to save my Concordance list:


Figure 9a


Figure 9b

This is another rather irritating feature. After having consulted the concordance lines, the translator rarely wants to keep the window open, let alone save the results. Admittedly this only requires clicking "No" and then "No" again, but it is two clicks too many, and when one is doing a lot of searches this mounts up. The two "rival" corpus analysis programs mentioned at the end of this article avoid this. One of them has a "Save" tab on the KWIC display that the translator can use if s/he wishes.


4.6 Stability

WordSmith Tools is available as a stand-alone version for individual users or as a multi-user network version for organisations, including educational institutions. The network installation of version 4 used by my university between 2005 and 2008 did tend to crash quite often. The reason is still a mystery. The network installation of version 5, which has been used on campus since 2009, also seems to play up for some students some of the time, but does seem to be more stable than its predecessor.

Since 2006, over a hundred of my translation students have purchased a stand-alone single-user licence for either WordSmith Tools version 4 or version 5. The vast majority of them have found that the program works smoothly and crashes only very rarely, if at all. My experience has been the same: version 5 in particular has been very stable. When there have been minor glitches, my students or I have pointed these out to Mike Scott, and he has been very quick to fix them.


4.7 Affordability

A single-user licence for WordSmith costs £50 plus £9.50 VAT. At the exchange rates prevailing at the time of writing (May 26, 2011) this amounts to €75 or $105. Most of my student translators are unwilling to pay this amount. However large discounts are given for bulk purchases - 40% for a bundle of 10 single licences, and even bigger discounts for bigger bundles. The purchase must be made by one person, a group leader or course tutor, who makes one payment covering all. Many of my student translators are keen to purchase the program at the 40% discount rate.


5 Other features and considerations

The concordancer of WordSmith can do a lot more than has been covered here. For example, you can restrict a concordance search by specifying a context word or context words which must (or must not) be present within a certain number of words of your search word. (For more about this feature see the section about Advanced Searching in Wilkinson 2007.) There is also a collocates display that shows the collocates of the search word in frequency order, as well as a cluster display that shows groups of words in your concordance which repeatedly occur together.
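The principle behind a context-word restriction and a collocates display can be sketched in Python as follows. The sketch assumes a tokenised text and a fixed span of tokens on either side of the search word; the function names, parameters and sample text are invented for the illustration and do not correspond to WordSmith's settings.

```python
from collections import Counter

def hits_with_context(tokens, node, context, span=5, must_be_present=True):
    """Positions of `node` where `context` does (or does not) occur
    within `span` tokens on either side."""
    positions = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        if (context in window) == must_be_present:
            positions.append(i)
    return positions

def collocates(tokens, node, span=5):
    """Words found within `span` tokens of `node`, most frequent first."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span])
    return counts.most_common()

# Invented sample text, pre-tokenised by splitting on spaces.
text = ("enjoy stunning views of the sea from your room . "
        "political views differ . stunning views of the mountains")
tokens = text.split()

# Hits of "views" with "stunning" within two words, and hits without it.
print(hits_with_context(tokens, "views", "stunning", span=2))
print(hits_with_context(tokens, "views", "stunning", span=2, must_be_present=False))

# Immediate neighbours of "views", in frequency order.
print(collocates(tokens, "views", span=1))
```

Requiring "stunning" as a context word keeps the tourism sense of "views" and drops the "political views" line, which is exactly the kind of filtering that helps when a search word has several senses in the corpus.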

Note also that you can customise the font of the concordance lines as well as the colours of, for example, the search word and the sort words to draw attention to collocates (as can be seen in Figure 7).

WordSmith 5 has been programmed with the Windows operating system in mind, which seems to be the standard among corpus tools. However it can be run on a Mac by running Windows there, either in a dual-boot set-up such as Boot Camp or with virtualization software such as Parallels.

You can download the program from the WordSmith website and try it out in demo mode (which does everything the full program does but only shows you a sample of the output) without purchasing a licence.


6 The best corpus analysis program?

Although WordSmith 5 has not been made specifically with translators in mind, it does go a long way in meeting the needs of the translator outlined in Figure 1. Its concordancer is the best I have come across to date, though there are several other viable options available.

Of the other commercially marketed programs, I think that one of the best alternatives to WordSmith is MonoConc Pro (MP 2.2; Barlow 2004). More information about MP 2.2, a demo version and basic instructions are available online.

For those on a zero budget, the best freeware program I have come across is AntConc (Anthony 2011). You can download the latest version, as well as instructions for using it, from the AntConc website listed in the references. Although not as sleek in appearance as WordSmith, and though somewhat slower, it fulfils all of the needs of the translator listed in Figure 1. I agree with Maher et al. (2008), who in their article on acquiring or enhancing a translation specialism describe AntConc as "a user-friendly concordancer with an intuitive interface".

 


References:

Anthony, L. (2011). AntConc (Version 3.2.2). Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/

Barlow, Michael (2004). MonoConc Pro 2.2. Athelstan Publications.

Jääskeläinen, Riitta and Mauranen, Anna (2000). "Project SPIRIT: Development of a Corpus on the Timber Industry". Unpublished research report, University of Joensuu, Savonlinna School of Translation Studies.

Maher, Ailish et al (2008). "Acquiring or enhancing a translation specialism: the monolingual corpus-guided approach", in The Journal of Specialised Translation, Issue 10, 2008. Online at: http://www.jostrans.org/issue10/art_maher.php

Scott, Mike (1999). WordSmith Tools version 3, Oxford: Oxford University Press.

Scott, Mike (2004). WordSmith Tools version 4, Oxford: Oxford University Press.

Scott, Mike (2008). WordSmith Tools version 5, Liverpool: Lexical Analysis Software. http://www.lexically.net/wordsmith/index.html

Varantola, Krista (2003). "Translators and Disposable Corpora", in Zanettin, F., Bernardini, S. and Stewart, D. (eds.), Corpora in Translator Education. Manchester: St Jerome, pp. 55-70.

Wilkinson, Michael (2005a). "Using a Specialized Corpus to Improve Translation Quality," in Translation Journal, Volume 9, No 3. Online at: http://translationjournal.net/journal/33corpus.htm

Wilkinson, Michael (2005b). "Discovering Translation Equivalents in a Tourism Corpus by Means of Fuzzy Searching," in Translation Journal, Volume 9, No 4. Online at: http://translationjournal.net/journal/34corpus.htm

Wilkinson, Michael (2006). "The corpus analysis tool - an under-exploited translation aid" in Kääntäjä 7/2006. Online at: http://www.lexically.net/wordsmith/corpus_linguistics_links/Wilkinson.doc

Wilkinson, Michael (2007). "Corpora, Serendipity & Advanced Search Techniques", in The Journal of Specialised Translation, Issue 7, 2007. Online at: http://www.jostrans.org/issue07/art_wilkinson.php

Wilkinson, Michael (2010). "Quick Corpora Compiling Using Web as Corpus", in Translation Journal, Volume 14, No 3. Online at: http://translationjournal.net/journal/53corpus.htm

Zanettin, Federico (2002). "Corpora in Translation Practice". Paper presented at the First International Workshop on Language Resources for Translation Work and Research, Gran Canaria, 28 May 2002.


Acknowledgement

Thanks to Mike Scott for permission to use screenshots from WordSmith Tools 5 and from the WordSmith Tools website.