Abstract
The types-tokens ratio (TTR), which is calculated by dividing the number of different word forms (types) in a text by the total number of words
(tokens), roughly characterizes the lexical variety of the text. This makes it intriguing to compare this parameter in original texts and in their translations from the theoretical and practical points of view. After analyzing our own empirical material, four Spanish-Ukrainian translations and
four Ukrainian-Spanish translations compared with their respective originals, along with the results of other researchers in different language
combinations, we found that TTR modifications show common tendencies depending on the typological characteristics of the source language and target
language, and the direction of translation, rather than the lexical variety of the text.
1. Introduction
There have been a number of attempts to describe the quantitative characteristics of vocabulary, both of a language or sublanguage system and of the
language of a particular author, from the point of view of the number and frequency of types, tokens, and lexemes. We will not even attempt to
offer a more or less exhaustive list of these. Gradually, with the development of corpus linguistics, theorists of translation studies have picked up
several quantitative ideas from linguistics and, trying to make their criteria for evaluating the differences between original and target texts and the
lexical-stylistic adequacy of the translation objective, started to calculate those parameters, which was easily accomplished using computing techniques.
Within the last decade there has been a boom in the amount of research into quantitative parameters in translation studies due to the use of electronic
corpora. Although attempts to create corpora started earlier, corpus-based research did not emerge until the late 1990s (Kruger, 70), being theoretically
generalized in translation studies by M. Baker (Baker, 175-186) and other theoreticians. The availability of corpora, their relatively easy compilation
and compatibility with personal computers meant that investigations carried out by individual researchers, even with manually compiled corpora, became
possible and popular.
With the use of electronic text analysis tools, it became possible to calculate the number of words and also word-forms (‘types,’ also called
‘orthographic words’) of a text automatically, literally with one click of a mouse, which was previously possible only by the method of total
continuous extraction of samples or the like. Logically, the number of types compared to the total number of the words in a text will give a coefficient
indirectly indicating the lexical richness of a text. “The higher the ratio, the more varied the vocabulary, i.e. the implication is that
there is little repetition” (A.Kruger, 74). This coefficient, obtained by dividing the number of types by the number of tokens (also called
‘running words’), was first named TTR (types/tokens ratio), presumably by M.Templin in 1957 in the area of language didactics (cit. after Rhea,
2007, 476), highlighting a wide field for investigations in particular and general translation studies with the purpose of unveiling one more universal
parameter of translation.
For example, the sentence “I have to buy some bread, because I have no bread” is stylistically awkward, and its TTR is low (three word forms
are repeated, TTR = 8/11 = 0,73), whereas “I’ve run out of bread, so I need to buy some” is much better stylistically and richer lexically,
and its TTR is higher (only one word form is repeated, TTR = 10/11 = 0,91).
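The calculation for the first example sentence can be sketched as follows. This is a minimal illustration, not the tool used in the study: the helper names `tokenize` and `type_token_ratio` are our own, and the tokenizer simply lowercases the text and keeps runs of Latin letters, which is an assumption about how word forms are delimited.

```python
import re

def tokenize(text):
    """Lowercase the text and return its word forms (tokens).

    Assumption: a token is any run of the letters a-z.
    """
    return re.findall(r"[a-z]+", text.lower())

def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)

sentence = "I have to buy some bread, because I have no bread"
tokens = tokenize(sentence)
types = set(tokens)

print(len(types), len(tokens), round(type_token_ratio(tokens), 2))  # 8 11 0.73
```

Note that this simple tokenizer would split a contraction such as “I’ve” into two tokens, so the counts for the second sentence depend on the tokenization convention adopted.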
However, this rule is only applicable with reservations expressed by words such as ‘likely’ and ‘indirectly.’ And we should add another implication to that of A.Kruger: that there are few repetitions of different types of the same lexeme. As the number of types may be quite extensive
due to the large number of grammatical forms for a lexeme in inflectional or incorporating languages, TTR should be very sensitive to the variety of grammatical
forms in the text. For instance, in the present tense of the indicative mood a verbal lexeme has two inflected forms in English; in Ukrainian, as well as in
Spanish, it has six. This is why a high TTR may indirectly indicate not only lexical richness, but also grammatical (morphological) richness. A natural
question is: which of the factors, lexical or grammatical richness, is more significant in a TTR? In spite of this doubt, it would be hard to deny that for the same language a text characterized by a higher TTR is certainly richer from the lexical point of view. However, the same statement is
questionable when comparing texts in different languages, as usually happens in translation.
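The effect of inflection on the number of types can be illustrated with a single verb. The sketch below lists the present indicative forms of English “speak” and Spanish “hablar” for the six person/number slots and counts the distinct forms; the choice of verbs and the helper name `distinct_forms` are ours, for illustration only.

```python
# Present indicative forms for the six person/number slots
# (1sg, 2sg, 3sg, 1pl, 2pl, 3pl).
english_speak = ["speak", "speak", "speaks", "speak", "speak", "speak"]
spanish_hablar = ["hablo", "hablas", "habla", "hablamos", "habláis", "hablan"]

def distinct_forms(paradigm):
    """Count the different word forms (types) that one lexeme contributes."""
    return len(set(paradigm))

print(distinct_forms(english_speak))   # 2
print(distinct_forms(spanish_hablar))  # 6
```

One lexeme thus contributes two types in English but six in Spanish, without any difference in lexical variety.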
2. Related works and discussion
It has been stated that translated texts in a language differ from non-translated texts in that language by a lower TTR (V.Pápai, 157), which can suggest that they are less
rich lexically.
For instance, V.Pápai, having researched explicitation strategies in translation using four English-Hungarian fiction translations in her work
“Explicitation: a universal of translated text?”, argues that TTR is lower in translated texts than in non-translated texts in Hungarian
(V.Pápai, 159). But this does not necessarily mean that TTR should be lower in a translated text compared with the original. For instance, A.Kutuzov,
from Tyumen State University, shows that in English-Russian translation the TTR becomes higher (A.Kutuzov, 10). Meanwhile, A.Kruger demonstrates that in
English-German translations the TTR is lower than in the original (the empirical base was four Shakespeare texts) (A.Kruger, 74). So, a preliminary
theoretical analysis suggests that TTR changes show a noticeable dependence on the language combination. As these changes may also depend on the
translation direction, in the present research we attempt to examine this hypothesis using both Spanish-Ukrainian and Ukrainian-Spanish
translations, and to reveal the regularities of these dependencies.
Before introducing our results, we should first consider the strengths and weaknesses of the TTR comparison method as a way of describing the lexical
richness of a text. It must be accepted that this method is too simple and approximate. It is undeniable that this ratio is sensitive to text or corpus
length. The longer a text, the more likely it is that words will be repeated, which lowers the ratio; hence, in short texts this ratio is not
representative. Nevertheless, this ratio is widely used, since it can be easily calculated by any text analysis tool and the functioning of these tools does not depend
on the language system.
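The sensitivity to text length can be seen even on a toy example. The sketch below computes the TTR of growing prefixes of one short invented text; the text and the helper name `type_token_ratio` are our own assumptions, chosen only to make the effect visible.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)

# Invented toy text, already lowercased and split on spaces.
tokens = "the cat sat on the mat and the dog sat on the rug".split()

for n in (4, 8, len(tokens)):
    print(n, round(type_token_ratio(tokens[:n]), 2))
# 4 1.0
# 8 0.75
# 13 0.62
```

As the prefix grows, repeated words accumulate and the ratio falls, which is why TTR values are best compared between texts of similar length.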
However, a high number of types does not necessarily mean a high number of lexemes. To be more exact, if we want to calculate the lexical variety of a
text, we should divide the number of lexemes (i.e. their respective lemmas used in a text) by the number of tokens. Since it is quite time-consuming to count
the lexemes, their number is usually not taken into account. Let us note in passing that the number of lemmas can also be calculated by specific
software known as a lemmatizer, which is designed for each language separately and usually requires the time-consuming processing of a great number of
morphological rules, exceptions, and vocabulary. It is usually not freeware. Thus, the TTR seems to show easily, indirectly, and roughly the variety of
word forms, rather than the lexical richness of a text; it is “a simple indication of the superficial lexical complexity of a text” (Munday 1998:4)
and, we might add, of its grammatical complexity. In spite of the above, we do not deny by any means its theoretical usefulness. A. Kutuzov, for
instance, after researching the variation of TTR from the original to the translated text, concludes that their graphs are
extremely similar from chapter to chapter (A.Kutuzov, 8-9). A. Kutuzov’s method by itself can be another useful tool to ‘measure’ the adequacy of translation.
Unfortunately, we cannot afford to concentrate here on other important and interesting uses of TTR, although they do exist.
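The difference between the types/tokens ratio and the lemmas/tokens ratio discussed above can be sketched as follows. The lemma dictionary `toy_lemmas` is a hypothetical hand-made stand-in for a real lemmatizer, and the token list is invented; neither comes from the corpus of this study.

```python
def type_token_ratio(tokens):
    """Distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens)

def lemma_token_ratio(tokens, lemma_of):
    """Distinct lemmas divided by the number of tokens.

    lemma_of maps a word form to its lemma; forms missing from the
    mapping are treated as their own lemma.
    """
    lemmas = {lemma_of.get(token, token) for token in tokens}
    return len(lemmas) / len(tokens)

# Hypothetical toy lemma dictionary standing in for a lemmatizer.
toy_lemmas = {"walks": "walk", "walked": "walk"}

tokens = ["she", "walks", "and", "he", "walked", "while", "they", "walk"]

print(type_token_ratio(tokens))               # 1.0
print(lemma_token_ratio(tokens, toy_lemmas))  # 0.75
```

Here every word form is different, so the TTR reaches its maximum, while the lemmas/tokens ratio shows that one lexeme is in fact used three times.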
3. Hypothesis
As shown above, the number of types in a text may depend on two basic factors: the number of lexemes and the number of different grammatical forms. Hypothetically, in
inflectional and incorporating languages the TTR should be higher, as the same lexeme will present a large number of types, while in ‘more
analytic’ languages the TTR should be lower and tend to approach the ‘lemmas/tokens’ ratio, since most lemmas would present only one type
(number of types ≈ number of lemmas). This hypothesis (hypothesis #1), which we consider both plausible and logical, should not meet serious contradictions,
although it is still to be proven in the area of contrastive linguistics. It needs to be tested by comparing original (untranslated) texts in different
languages with the same or similar content, such as international agreements, constitutions, laws, similar literary genres etc. Nevertheless, as indicated
above, we have seen a clear dependence of the changes in TTR on the language combination and translation direction. Ch.Ho-Jeong has
observed in English-Korean and Korean-English translations that several changes, such as contraction/expansion of the text, depend on the
direction of translation (Ch.Ho-Jeong, 362). On the other hand, E. Kelih, investigating the translation of a Russian novel into 11 Slavic languages (E.Kelih,
179), implicitly shows that the TTR changes depend on the source and target languages. Let us note in passing that we deduced this by attentively reading
his article, because the researcher miscalculated the TTR by confusing the divisor and the dividend.
Our actual hypothesis (hypothesis #2) refers to translation studies, not to contrastive linguistics: when the degree of synthetism of the language increases from the original to the translation, the TTR rises, and, vice versa, when the degree of synthetism decreases from the original to the translation, the TTR decreases. If hypothesis #2 is correct, it may also
indirectly confirm hypothesis #1.
4. Empirical test
If hypotheses #1 and #2 are correct, then the TTR should rise when translating from a more analytic language into an inflectional one and, vice versa, should decrease when
translating from an inflectional or incorporating language into a more analytic one (naturally, there should be
room for exceptions due to the influence of extralinguistic factors). In the case of Spanish and Ukrainian texts, Spanish is the more analytic language
of the two. After analyzing four Spanish-Ukrainian and four Ukrainian-Spanish fiction translations, we obtained the following results:
Table 1. TTR changes in Spanish-Ukrainian and Ukrainian-Spanish translation.
Work | Total types in the original | Total tokens in the original | Types/tokens ratio in the original | Total types in the translation | Total tokens in the translation | Types/tokens ratio in the translation | TTR change | Translator | Confirmation of the hypothesis
Spanish-Ukrainian translation
G.García Márquez “El amor en los tiempos del cólera” | 15 352 | 145 108 | 0,1058 | 28 357 | 126 394 | 0,2244 | 0,47 (rises) | V.Shovkun | +
B. Pérez Galdós “Doña Perfecta” | 11 117 | 65 177 | 0,1705 | 15 827 | 54 474 | 0,2905 | 0,59 (rises) | Zh.Konyeva | +
P.A. de Alarcón “El sombrero de tres picos” | 5 572 | 25 768 | 0,2162 | 7 303 | 20 622 | 0,3541 | 0,61 (rises) | Zh.Konyeva | +
P.A. de Alarcón “El sombrero de tres picos” | 5 572 | 25 768 | 0,2162 | 6 881 | 20 117 | 0,3420 | 0,63 (rises) | L.Dobryans’ka, L.Kolesnyk | +
Ukrainian-Spanish translation
І.Франко “Захар Беркут” | 13 352 | 50 372 | 0,2651 | 10 049 | 61 472 | 0,1635 | -0,62 (decreases) | S.Ryzvaniuk | +
М. Коцюбинський “Тіні забутих предків” | 6 197 | 15 766 | 0,3811 | 5 639 | 26 027 | 0,2167 | -0,57 (decreases) | J.Borysyuk | +
О.Довженко “Зачарована Десна” | 6 081 | 15 956 | 0,3811 | 5 523 | 19 828 | 0,2785 | -0,73 (decreases) | R.Hupalo | +
Ю. Яновський “Вершники” | 10 122 | 27 123 | 0,3732 | 8 647 | 38 325 | 0,2263 | -0,6 (decreases) | S.Ryzvaniuk | +
As we see from Table 1, the TTR decreases in all instances of the Ukrainian-Spanish translation direction and it rises in all instances of the
Spanish-Ukrainian translations of our corpus. This tendency does not seem to depend on the translator.
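The direction of the TTR change can be recomputed from the type and token counts given in Table 1. The sketch below is only an illustration of that check: the tuple layout, the language codes "es" and "uk", and the helper name `direction` are our own assumptions, and the sketch tests only the direction of change, not the printed ratio values.

```python
# (work, source language, types in original, tokens in original,
#  types in translation, tokens in translation); counts as given in Table 1.
rows = [
    ("El amor en los tiempos del cólera", "es", 15352, 145108, 28357, 126394),
    ("Doña Perfecta", "es", 11117, 65177, 15827, 54474),
    ("El sombrero de tres picos (Konyeva)", "es", 5572, 25768, 7303, 20622),
    ("El sombrero de tres picos (Dobryans'ka, Kolesnyk)", "es", 5572, 25768, 6881, 20117),
    ("Захар Беркут", "uk", 13352, 50372, 10049, 61472),
    ("Тіні забутих предків", "uk", 6197, 15766, 5639, 26027),
    ("Зачарована Десна", "uk", 6081, 15956, 5523, 19828),
    ("Вершники", "uk", 10122, 27123, 8647, 38325),
]

def direction(row):
    """Return 'rises' if the translation has a higher TTR than the original."""
    _, _, types_o, tokens_o, types_t, tokens_t = row
    return "rises" if types_t / tokens_t > types_o / tokens_o else "decreases"

for row in rows:
    # Hypothesis #2: TTR rises from Spanish into Ukrainian and decreases
    # from Ukrainian into Spanish.
    expected = "rises" if row[1] == "es" else "decreases"
    print(row[0], direction(row), direction(row) == expected)
```

For all eight rows the recomputed direction agrees with the one expected under hypothesis #2.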
5. Data interpretation and generalization
As we can see from Table 1, the results for the eight randomly chosen texts and their respective translations show that TTR rises in Spanish-Ukrainian
translation and decreases in the opposite direction. This seems to be, if not a universally valid rule, then at least quite a clear tendency for this pair of
languages. As this conclusion is valid solely for Spanish-Ukrainian and Ukrainian-Spanish translations, in order to extrapolate from these
particular results to a general tendency, we propose a table which will clearly indicate that tendency. We have gathered several
researchers’ results in Table 2.
Table 2. TTR changes in translation within different language combinations.
No. | Direction of translation | Degree of synthetism of the target language | TTR | Researcher | Confirmation of the hypothesis
1 | English-Russian | rises | rises | A. Kutuzov (Kutuzov, 10) | +
2 | English-German | rises | decreases | A. Kruger (Kruger, 74) | -
3 | Spanish-English | decreases | decreases | J. Munday (Munday 1998, 4) | +
4 | English-Chinese | decreases | decreases | Y. Tsai (Tsai, 75) | +
5 | English-Polish | decreases | decreases | R. Uzar (Uzar, 259) | +
6 | Russian-Macedonian | decreases | decreases | E. Kelih (Kelih, 179) | +
7 | Russian-Serbian | decreases | decreases | E. Kelih (Kelih, 179) | +
8 | Russian-Bulgarian | decreases | decreases | E. Kelih (Kelih, 179) | +
9 | Russian-Slovene | decreases | decreases | E. Kelih (Kelih, 179) | +
10 | Russian-Croatian | decreases | decreases | E. Kelih (Kelih, 179) | +
11 | Spanish-Ukrainian | rises | rises | S. Fokin (the present study) | +
12 | Ukrainian-Spanish | decreases | decreases | S. Fokin (the present study) | +
13 | Finnish-Russian | decreases | decreases | M. Kopotev (Kopotev, 379) | +
Therefore, the general picture mostly confirms the hypothesis. Exception number 2 (English-German translation, A. Kruger’s data) may have an
explanation in extralinguistic factors. However, we consider that this will remain a tendency and not a general rule, because translation
can be a conscious process, so that translators could sometimes consciously influence the TTR for their own reasons, for example, trying
to show the richness of their vocabulary or that of their native language, while it is quite absurd to imagine that a translator would try to artificially increase the number of grammatical forms in the translated text.
We cannot deny that translated texts show a lower TTR in comparison with the original texts, but ‘the third code,’ evidently, is not the only
factor that influences the changes in TTR in translation; the typological differences between the source language and the target language turn out to be
a much more powerful factor.
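The confirmation column of Table 2 follows from a simple comparison: a row confirms hypothesis #2 when the TTR changes in the same direction as the degree of synthetism. A minimal sketch of that check, with the rows transcribed from Table 2 and variable names of our own choosing, is given below.

```python
# (direction of translation, change in degree of synthetism, change in TTR),
# transcribed from Table 2.
rows = [
    ("English-Russian", "rises", "rises"),
    ("English-German", "rises", "decreases"),
    ("Spanish-English", "decreases", "decreases"),
    ("English-Chinese", "decreases", "decreases"),
    ("English-Polish", "decreases", "decreases"),
    ("Russian-Macedonian", "decreases", "decreases"),
    ("Russian-Serbian", "decreases", "decreases"),
    ("Russian-Bulgarian", "decreases", "decreases"),
    ("Russian-Slovene", "decreases", "decreases"),
    ("Russian-Croatian", "decreases", "decreases"),
    ("Spanish-Ukrainian", "rises", "rises"),
    ("Ukrainian-Spanish", "decreases", "decreases"),
    ("Finnish-Russian", "decreases", "decreases"),
]

# A row confirms the hypothesis when both changes point the same way.
confirmed = [pair for pair, synthetism, ttr in rows if synthetism == ttr]
exceptions = [pair for pair, synthetism, ttr in rows if synthetism != ttr]

print(len(confirmed), "of", len(rows), "rows confirm; exceptions:", exceptions)
# 12 of 13 rows confirm; exceptions: ['English-German']
```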
6. Conclusion
TTR can be a useful parameter for comparing translations with the respective originals from the practical and theoretical points of view. Changes in the TTR in
translation can indirectly indicate modifications in the lexical variety; thus, it can be important for roughly evaluating this aspect of the adequacy of
translation, as well as the translator’s and the author’s idiostyle. Much more important, from our point of view, is its theoretical significance. Apart from
the universal, reported in the literature, that translated text is characterized by a lower TTR than the original (and consequently is less varied lexically), the change in the TTR in
translation follows a common tendency. When translating from an analytic language into a more synthetic one, the TTR rises; translating in the opposite
direction, it decreases. While this is a strong tendency, it is not a universal law because of the strong influence of extralinguistic factors. In order
to make this kind of research more precise, the ratio of the number of lemmas used to the number of tokens (lemmas/tokens ratio) should be applied when
evaluating the lexical richness of a text in the original and the translation, although this method is further complicated by the lack of lemmatizers for
several languages and their high cost.
References
Baker, M. (1996). Corpus-based translation studies: The challenges that lie ahead. In Terminology, LSP, and translation: studies in language
engineering in honour of Juan C. Sager. – Amsterdam: John Benjamins. – P. 175-186.
Dovzhenko, A. (1972). El Desná encantado / Traducido por R. Hupalo. – Kiev: Dnipro.– 86 p.
Franko, I. (1983). Zakhar Bérkut / Traducido por S. Ryzvaniuk. – Kiev: Dnipro. – 199 p.
García Márquez G. (1986). El amor en los tiempos del cólera. – La Habana: Arte y literatura sólo para Cuba. – 460 p.
Ho-Jeong, Ch. (2006). Target Text Contraction in English-into-Korean Translations: A Contradiction of Presumed Translation Universals? In Meta:
journal des traducteurs, vol. 51 – n° 2. – P. 343-367.
Janovskyj, J. (1982). Los jinetes. / Traducido por S. Ryzvaniuk. – Kiev: Dnipro. – 127 p.
Kelih, E. (2009). Preliminary Analysis of a Slavic Parallel Corpus. – NLP, Corpus Linguistics, Corpus Based Grammar Research. Fifth
International Conference Smolenice, Slovakia, 25-27 November 2009. – Bratislava: Tribun. – P. 175-183. (Accessed online on 21 November
2012 at http://www.uni-graz.at/emmerich.kelih/Publikationen/2009_slovko_slavic_parallel_corpora_kak_zakaljalas_stal_kelih.pdf)
Kotsiubinskiy, M. (1972). La sombra de los antepasados olvidados y otros relatos / Traducido del por J. Borysiuk. – K.: Dnipro. – 330 p.
Kruger, A. (2002). Corpus-based translation research: its development and implications for general, literary and Bible translation. In Acta
Theologica, Supplementum 2. – P. 70-106.
Kutuzov, A. (2010). Change of word types to word tokens ratio in the course of translation (based on Russian translations of K. Vonnegut's novels).
In International Computational Linguistic Conference “Dialog-21” (Accessed online on 21 November 2012 at http://arxiv.org/ftp/arxiv/papers/1003/1003.0337.pdf)
Munday, J. (1998). A computer-assisted approach to the analysis of translation shifts. In Meta: journal des traducteurs, vol. 43, n° 4. –
P. 542-556.
Pápai, V. (2004). Explicitation: a universal of translated text? In Translation universals: Do they exist? / Edited by Anna Mauranen, Pekka Kujamäki. – Amsterdam: John Benjamins B.V. – P. 145-164.
Pérez Galdós, B. (1964). Doña Perfecta. – Москва: Радуга. – 276 с.
Rhea, P. (2007). Language disorders from infancy through adolescence: assessment & intervention, 3rd edn. – St. Louis:
Mosby/Elsevier. – 784 p.
Tsai, Y. (2010). Text Analysis of Patent Abstracts. In The Journal of Specialised Translation. – Issue 13. – National Taiwan University.
– P. 61-80. (Accessed online on 21 November 2012 at http://www.jostrans.org/issue13/art_tsai.pdf)
Uzar, R. (2002). A Corpus Methodology for Analysing Translation. In Cadernos de Tradução. – Universidade Federal de Santa Catarina.
– P. 237-265.
Аларкон, П.-А. (1958). Трикутний капелюх / пер. Л. Добрянської і Л. Колесник. – К.: Державне видавництво художньої літератури. – 80 с.
Аларкон, П.-А. (1983). Трикутний капелюх / пер. Ж.Конєвої. – К.: Дніпро. – 176 с.
Ґарсія Маркес, Ґ. (1999). Кохання в час холери. – Львів: Класика. – 346 с.
Довженко, О. (1957). Зачарована Десна. Кіноповісті. – Київ: Радянський письменник. – С.459-507.
Копотев, М. (2010). Я никогда не буду так говорить. Языковая компетенция и языковая рефлексия американской финки из СССР. In Slavica Helsingiensia 40 Instrumentarium of Linguistics. Sociolinguistic Approach to Non-Standard Russian. – Helsinki. (Accessed online on 21 November 2012 at http://www.helsinki.fi/slavicahelsingiensia/preview/sh40/pdf/26-sh40.pdf)
Коцюбинський, М. (1989). Тіні забутих предків. In Подарунок на іменини. Оповідання, новели, повісті. – К.
Перес Гальдос, Б. (1978). Донья Перфекта; Сарагоса / пер. Ж. Конєвої. – Київ: Дніпро. – 350 с.
Франко, І. (1994). Захар Беркут: Роман / Микола Костомаров. Чернигівка: Повість. – Київ: Укр. Центр духовної культури. – 312 с.
Яновський, Ю. (1984). Оповідання, романи, п'єси. – Київ: Наукова думка. – 578 с.