Transliteration Using Virtual Texts

Volume 3, No. 1
January 1999

Michael Walker is a writer, cultural theorist, visual artist, and poet. He lives in San Francisco, California. Walker is perhaps best known to the academic and biomedical communities for his work on the reformation of Mongolia’s health care system, and has also authored a number of journal articles on various aspects of Mongolian technological and legal reform. His other areas of research interest include HIV/AIDS education and prevention, geography and navigation, feminist themes in literature, and how the arts and sciences interact in various cultural settings. With regard to languages, his interests include English, Spanish, Chinese, Mongolian, Russian, Hebrew, Hurrian, Arabic, and Persian. His personal interests include soccer, running, hockey, cooking, and music. Mike can be reached at: mikewalker@geocities.com

Use of “Virtual” Texts and HTML in Transliteration

by Michael C. Walker

Among the most difficult problems facing the translator is that of transliterating from a non-Latin script into a Latin-based script. The growing importance in commerce, science, engineering, and politics of nations that use writing systems other than those based on the Latin (Roman) alphabet has increased the need for translations that, by the nature of the linguistic differences involved, require transliteration. Transliteration has traditionally necessitated the creation of a number of working texts in both the source language and the recipient language, so that the intricacies caused not only by translating from one language to another but also by dealing with radically different writing systems can be resolved. Such processes, though indispensable to quality translation, typically made transliteration a long and often laborious undertaking. In the past, when a source document in a non-Latin script had to be translated and transliterated into more than one Latin-based recipient language, a number of different translators would often need to use the same working documents, creating the potential for grave errors in translation and a lack of uniformity across the recipient languages.
   The advent of the Internet, HyperText Markup Language (HTML), and associated technologies has allowed for new methods of approaching transliteration and new concepts in the creation of the working documents that are so often indispensable to the translator. In the virtual environment, these working documents can be truly dynamic: they can be changed many times during the translation process with minimal effort, and variorum facsimiles can be produced for different purposes while preserving the integrity of the original document. HTML, a complex language in its own right, has allowed for the resolution of many technical and linguistic problems inherent to transliteration, as current HTML protocols allow a variety of scripts to be used in the Internet environment. While the provision for different scripts was created in response to a need for viewing non-Latin scripts on the Internet, the environment fostered by HTML is very conducive to the simultaneous use of two or more distinct scripts in the same document. HTML can therefore be used effectively outside of Internet programming applications as a platform for transliteration. When the end goal of the transliteration/translation process is to present the document on the Internet, the use of HTML is even more logical, as it provides for the definitive creation of a finished text while actively encouraging a dynamic environment for the generation of that text.
   Aside from the technical aspects of transliteration in the virtual environment, it is worth considering the methodological and pedagogical import of this modality. The issue central to any translation that involves transliteration is that the final written document will not be in the same script as the original, and therefore there may be certain characters (or even complete words and phrases) that cannot be rendered verbatim in the recipient language because of differences between the scripts involved. The adept translator must be aware of the unique problems that transliteration can present; furthermore, in cases where dissimilarities between the scripts preclude a verbatim rendering, the translator has an obligation to provide a transliteration that both functions as a comprehensible document in the recipient language and preserves the integrity of the source language as far as possible. A perfect example of how such a situation can cause long-lasting problems is the transliteration of the Russian composer Rachmaninoff's name into English: in transliterating from Cyrillic, there should be no instance where a surname ends in “off,” yet this mistransliteration is so prevalent in English that it has become the standard spelling of the composer's name. When virtual environments are used for the transliteration process, such errors should be easier to avoid, given that a complete style guide for both languages involved can be established before transliteration begins and the HTML-based environment can be programmed to prevent transliterations that are not in keeping with established standards.
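   As a rough illustration of how an agreed style guide might be enforced by software, the following Python sketch applies a single letter-by-letter table to Russian text and checks a proposed spelling against it. The table is a simplified, hypothetical one chosen for this example (it does not reproduce any particular published romanization standard), and the names STYLE_GUIDE, transliterate, and conforms are likewise assumptions made for illustration.

```python
# Hypothetical, simplified style guide: one Latin rendering per Russian letter.
# It is an illustration only, not a published romanization standard.
STYLE_GUIDE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text, table=STYLE_GUIDE):
    """Render Cyrillic text in Latin letters using the agreed table."""
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)  # not covered by the guide: pass through unchanged
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


def conforms(source, proposed, table=STYLE_GUIDE):
    """Return True only if the proposed spelling matches the style guide."""
    return transliterate(source, table) == proposed


print(transliterate("Рахманинов"))             # Rakhmaninov
print(conforms("Рахманинов", "Rachmaninoff"))  # False
```

   Under this table the traditional English spelling is flagged as non-conforming, which is the kind of check a translator could run before a spelling becomes entrenched in a set of working documents.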
   The fact that a virtual environment allows nearly limitless working copies of documents to be produced with relative ease also plays a significant role in the transliteration process, as the translator can selectively change various working editions of the document in real time, producing variant versions of the manuscript for different purposes or readers if necessary. The flexibility of Internet-based texts is not exclusive to translatory functions: scholars of novels and other literary works have found the HTML environment very helpful in the editing, study, and display of literature. Perhaps the most notable and illustrative example of this use is the effort made by several scholars to produce definitive working variorum copies of James Joyce's novels Finnegans Wake and Ulysses. Anyone familiar with Joyce's works can appreciate the benefits of the HTML environment as applied to them: the ability not only to create more extensive footnotes than are possible in traditional textual formats, but also to hyperlink such annotations to sites outside the immediate domain. Thus, a teaching text or facsimile edition for scholarly purposes can be produced and alterations can be made as needed. Additionally, collaborative authorship and editorship can be easily facilitated via the Internet and the HTML environment.
   Returning to applications relevant to translation work, the study of the Bible (in Greek and Hebrew) and other sacred texts, notably Egyptian documents in the Coptic language, has been encouraged by the ability of the Internet to unite far-flung scholars. The case of Coptic, which is technically a non-Latin script although it contains characters that were later incorporated into Roman scripts, is an excellent example of the benefits of working in HTML when dealing with more than one script. In the uses I have seen, the editors of the Coptic documents most often provide excerpts of the Coptic text alongside the translated text in an effort to offer immediate comparison between the source text and the translation. Of course, this has been done in many book-based transliterations of sacred texts since well before the advent of the computer, but the difference here is that the HTML environment allows the editor to make changes efficiently and at next to no cost, whereas a monograph would require at the very least the publication of a list of errors or a retraction, if not an entirely new edition. This advantage of HTML-based publishing has not been lost on those who create textbooks that can require frequent and costly updates, such as medical manuals, several of which have been published as virtual, web-based texts on the Internet.
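   The side-by-side presentation described above can be sketched in a few lines of Python that generate an HTML table from aligned segments. This is only a minimal illustration under stated assumptions: the function name parallel_table, the language codes, and the sample segments are mine, not taken from any of the projects mentioned.

```python
from html import escape


def parallel_table(pairs, source_lang, target_lang):
    """Build an HTML table with the source text beside its translation.

    `pairs` is a list of (source_segment, translated_segment) tuples.
    The `lang` attribute marks the language of each cell for the browser.
    """
    rows = []
    for source, target in pairs:
        rows.append(
            "<tr>"
            f'<td lang="{escape(source_lang)}">{escape(source)}</td>'
            f'<td lang="{escape(target_lang)}">{escape(target)}</td>'
            "</tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical sample data: a Greek phrase beside its English rendering.
segments = [
    ("Ἐν ἀρχῇ ἦν ὁ λόγος", "In the beginning was the Word"),
    ("A & B", "A & B"),  # special characters are escaped automatically
]
print(parallel_table(segments, "grc", "en"))
```

   Because the table is regenerated from the list of segments, correcting a reading means editing one entry and rebuilding the page rather than issuing an erratum.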
   Another aspect of HTML-based technologies that makes these platforms useful for translation applications is the ability to easily build an index of all the text residing on a given site (or within a discrete document), to use such an index to perform searches of the text and, when necessary, to change specific words and phrases in the text. The ability to make “universal” changes to a text, that is, to change every instance of a given word or phrase, is a standard feature of HTML-based programs. Such an option is often very useful to the translator working with a transliterated text: when a specific word or phrase is found to need correction, it can be updated throughout the entire document. This dynamic handling of text, and the ability of the software (owing to the implementation of the HTML language) to recognize words as discrete entities with their own specific properties, saves considerable time in the editing of transliterated documents and also encourages editing to be an active part of the translation process.
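   To make the indexing and “universal” change concrete, here is a minimal Python sketch using only the standard library. It is an illustration rather than a description of any particular HTML editor: build_index and replace_everywhere are hypothetical names, and the replacement routine assumes simple, well-formed markup in which no “<” or “>” appears outside a tag.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextIndexer(HTMLParser):
    """Collect every word in the visible text, with its running position."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.index = defaultdict(list)
        self._position = 0

    def handle_data(self, data):
        for word in re.findall(r"\w+", data):
            self.index[word.lower()].append(self._position)
            self._position += 1


def build_index(html_text):
    """Map each lower-cased word to the list of positions where it occurs."""
    parser = TextIndexer()
    parser.feed(html_text)
    parser.close()
    return dict(parser.index)


def replace_everywhere(html_text, old, new):
    """Replace whole-word matches in the text, leaving the tags untouched."""
    parts = re.split(r"(<[^>]*>)", html_text)
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return "".join(
        part if part.startswith("<") else pattern.sub(new, part)
        for part in parts
    )


page = '<p class="Rachmaninoff">Rachmaninoff wrote. <b>Rachmaninoff</b> played.</p>'
print(build_index(page)["rachmaninoff"])  # [0, 2]
print(replace_everywhere(page, "Rachmaninoff", "Rakhmaninov"))
```

   Note that the attribute value inside the opening tag is deliberately left alone; only the text a reader would see is changed.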
   HTML has evolved from a single markup language intended for the creation of sites on the Internet (within the viewing and operational constraints of the World Wide Web) into a family of variants and companion technologies, much as major languages produce various dialects over time. Several of these are currently known collectively as “dynamic HTML.” JavaScript, a scripting language used alongside HTML, is perhaps the best known of them; however, it is certainly not the only advanced extension available. As the Internet and the World Wide Web continue to develop, there is no doubt that HTML will continue to receive serious attention as a programming platform and that further outgrowths of the language will appear. Admittedly, the creation of virtual texts with HTML (whether on the Internet or outside its confines) is only one area of HTML usage, and a small one at that, so little attention has been paid from a programming perspective to the specific needs of persons creating such texts. Fortunately, improvements demanded by other areas of HTML usage have had cross-over benefits for those using the language in translatory applications.
   A prime example of using HTML in the transliteration of a text can be found in the work of the translator Joan Sorren, who constructed an HTML-based Web environment to facilitate her translations of contemporary Hebrew documents into English. Sorren was working primarily with literary texts by modern Israeli authors, so there were few problems of variance in the Hebrew language due to historical or chronological differences; her main challenge was the sheer amount of text she often had to handle and the tight deadlines set by those who had contracted her for these translations. The HTML environment allowed Sorren to create a key-map, or programmed guide of keyboard assignments of Hebrew characters correlated to HTML commands. As Sorren was working on an Apple Macintosh computer with a standard “QWERTY” Latin keyboard layout, she found the user-defined assignment of the Hebrew characters most useful. This way, Sorren could use the keyboard however she desired and was not constrained by any differences between the hardware and the software she was using, as the software was itself user-defined. Instead of creating (or purchasing) a specific translation program, Sorren at first used a standard word processor (Microsoft Word) and page-layout program (Adobe PageMaker) to input and manipulate her text. The problem encountered in this approach was that Word was not able to produce a uniform formatting for Hebrew (even with Apple Computer's Hebrew operating system software installed on Sorren's computer) that was completely searchable and indexable to the standards Sorren had set for her work. “I felt that I was doing what the software wanted and not the other way around [and] if I was going to use the computer in transliteration, it had to out-perform my conventional means of working, which the word processing software just wasn't doing for me,” Sorren commented on her initial misgivings about the computer-based environment.
   However, Sorren's husband, a software engineer, suggested that she try constructing a translating environment in HTML using Adobe Systems' PageMill software so that she could directly import information that proved cumbersome in Microsoft Word and manipulate it in the more dynamic environment of HTML. Sorren found that the creation of this environment, with her husband's assistance, was not that difficult and that HTML allowed her true flexibility in manipulating and exporting her Hebrew text. Using Hebrew fonts in Word, Sorren would enter the text from her original sources (mainly unpublished monographs and fair copies) and then import this text into the HTML-based working environment. Sorren and her husband had customized a version of HTML encoding, based on the standardized ISO-8859-7 Greek encoding, with which she could display and manipulate Hebrew within the web-based environment. Although commercially available encoding standards for Hebrew do exist, Sorren preferred to work with a customized version so that she would have as much versatility in the control of the text as possible.
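   The article's sources do not record the details of Sorren's key-map, so the following Python sketch should be read as one possible interpretation rather than a reconstruction. It assumes that each Latin key is assigned a Hebrew letter and that the “HTML commands” are numeric character references; the key assignments and the names KEY_MAP, keys_to_hebrew, and to_char_refs are all hypothetical.

```python
# Hypothetical key assignments: Latin key -> Hebrew letter.
# In this sketch a capital letter selects the final form of a letter.
KEY_MAP = {
    "w": "\u05e9",  # shin
    "l": "\u05dc",  # lamed
    "v": "\u05d5",  # vav
    "m": "\u05de",  # mem
    "M": "\u05dd",  # final mem
}


def keys_to_hebrew(keys, key_map=KEY_MAP):
    """Convert a sequence of keystrokes to Hebrew letters.

    Keys without an assignment are passed through unchanged.
    """
    return "".join(key_map.get(key, key) for key in keys)


def to_char_refs(text):
    """Write every non-ASCII character as an HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" if ord(ch) > 127 else ch for ch in text)


word = keys_to_hebrew("wlvM")   # the four letters of the Hebrew word "shalom"
print(to_char_refs(word))       # &#1513;&#1500;&#1493;&#1501;
```

   Because the table is ordinary data, a translator could rearrange it to suit her own typing habits without touching the rest of the program.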
   The Sorrens provided for the representation of standard English characters in the same encoding they were using for the Hebrew, as neither language would require all the available “slots” for characters. Similar approaches have been taken successfully for the encoding of Cyrillic (usually Russian) fonts alongside English fonts. I qualify these as “English,” as opposed to Latin, because the ancillary characters, accent marks, and diacritics needed to represent other Latin-based languages often cannot be included alongside an encoding standard for Cyrillic, Greek, Hebrew, or any other non-Latin script; the Sorrens reported that in their customized encoding there was no “room” left for any characters beyond the Hebrew alphabet and the upper- and lower-case English alphabet. Scripts that contain a much greater variety of characters than the Latin alphabet, such as those used for Arabic or Sanskrit, and logographic writing systems such as Chinese could not possibly be encoded in such a manner that ample space was left for an English/Latin representation. In fact, in encoding Chinese, additional software is required to provide for the vast number of characters used in this language. As HTML and its related programming languages were predominantly developed in the United States (and, to a lesser degree, in the United Kingdom and Germany), precedence was given to Latin-based languages, with the inclusion of all other writing systems being secondary. Apple Computer, as well as numerous third-party developers, has risen to the challenge of providing useful solutions for representing non-Western scripts within the computing environment in general and, specifically, within the programming environments fostered by HTML.
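   The idea of a fixed set of “slots” shared by two scripts can be illustrated with a small Python sketch of a single-byte encoding. The layout below is hypothetical and is not the Sorrens' actual table: it keeps the ASCII range as it is, assigns the 27 Hebrew letter forms to an arbitrary block of upper slots, and rejects any character that has not been given a slot.

```python
HEBREW_START = 0x05D0   # Unicode code point of alef
HEBREW_COUNT = 27       # 22 letters plus 5 final forms
SLOT_BASE = 0xE0        # hypothetical first slot used for Hebrew

# Lower half: ASCII as-is. Upper slots: Hebrew letters, one byte each.
ENCODE = {chr(i): i for i in range(0x80)}
ENCODE.update(
    {chr(HEBREW_START + i): SLOT_BASE + i for i in range(HEBREW_COUNT)}
)
DECODE = {byte: ch for ch, byte in ENCODE.items()}


def encode(text):
    """Encode text to bytes; fail loudly if a character has no slot."""
    try:
        return bytes(ENCODE[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(f"no slot assigned for {exc.args[0]!r}") from None


def decode(data):
    """Decode bytes produced by encode() back into text."""
    return "".join(DECODE[byte] for byte in data)


mixed = "Shalom \u05e9\u05dc\u05d5\u05dd"
assert decode(encode(mixed)) == mixed   # English and Hebrew share one table

try:
    encode("caf\u00e9")                 # an accented Latin letter has no slot
except ValueError as err:
    print(err)
```

   In a table like this one, every additional accented letter or diacritic has to be given a slot of its own, which is why an encoding built around one non-Latin script and plain English tends to leave other Latin-based languages unserved.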
   With the integration of voice-recognition software and better text-reading software/hardware combinations (systems that facilitate the scanning of printed or handwritten text and its conversion to a manuscript that can be manipulated in the word-processing environment), the use of HTML-based platforms for translation and transliteration is likely to grow markedly. Voice-recognition software in particular holds some intriguing possibilities for translation work, as it is conceivable that a translator could read aloud the work to be translated (in the source language) and have the computer transcribe the oral input into text. Of course, when working with non-Latin writing systems, many questions germane to the rendering of the oral material would need to be answered. Would the computer transcribe into a phonetic approximation of the source language, or would it be better (in most instances) to render the oral material in the appropriate script of the source language? Clearly, such issues would best be solved on a case-by-case basis during the transliteration process, but some input is needed from translators to make the emerging software applicable to translatory functions in the first place. One reason that word-processing software manufacturers (save those who design software predominantly for such applications) have paid little attention to translatory applications in the past is that translators have not been especially vocal in the computer industry and have remained something of an unknown market. The opportunities being produced by new technologies allow the translator a great deal of latitude in how he or she chooses to approach the physical task of translation.
   Examples such as that of Joan Sorren should be quite encouraging to translators who are beginning to explore computers as a more sophisticated tool than just a word-processing or page-layout environment. What appears to be needed at this point to achieve the highest level of efficacy in computer-assisted translation is a better understanding on the part of translators of the resources available, and also a more pronounced involvement of translators in the ongoing standardization of methods for rendering non-Latin texts in the HTML environment. ISO (International Organization for Standardization) standards have been set for Russian (and other languages using Cyrillic), Greek, Japanese, Chinese, Korean, and the central European languages (such as Polish and Czech, which are Latin-based but use unique diacritics). Comprehensive standards for languages written in the Hebrew and Arabic scripts, such as Hebrew, Arabic, and Persian, are still lacking, and some of the scripts that have been standardized are perhaps adequate for the average user who only wishes to view text on the Internet but are woefully inadequate for the purposes of translators. The Cyrillic standards are a prime example of this quandary, as they were designed primarily for Russian and are not necessarily applicable to all other languages written in Cyrillic. Kazakh and Kirghiz, for example, use Cyrillic-based scripts, but these languages contain special characters not found in contemporary Russian, so a typical Russian font or encoding system cannot represent them correctly.
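   The limitation described for Kazakh can be checked directly. The short Python sketch below tests which characters of a text cannot be stored in a given encoding; it uses ISO-8859-5, one of the standardized Cyrillic encodings, as the example, and the helper name unencodable is my own.

```python
def unencodable(text, encoding):
    """Return the distinct characters of `text` that `encoding` cannot store."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# A Russian word fits entirely within ISO-8859-5 ...
print(unencodable("Москва", "iso8859_5"))     # []

# ... but the Kazakh name of Kazakhstan contains a letter it lacks.
print(unencodable("Қазақстан", "iso8859_5"))  # ['Қ', 'қ']
```

   A translator could run such a check over a whole source document before committing to an encoding, rather than discovering missing letters after the text has been converted.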
   While the problems caused by standardization have received some interest from academic linguists, it is promising to note that the commercial sector has also risen to these challenges. The Russian corporation Paratype (URL: http://www.paratype.com) produces a number of fonts that can be used to represent the less-common Cyrillic scripts and offers software to assist in the HTML implementation of these products. Additionally, Paratype has created fonts for the Georgian and Armenian scripts, for which computer fonts have been rather hard to obtain. The dedication shown by Paratype to less-common languages should not be too surprising, given that without the Internet and the increasing “smallness” of our world it is doubtful that this company would have grown into the international success that it is today. Those interested in transliteration assisted by the HTML environment would do well to become informed about new directions in encoding standardization for HTML and how these standards can, in a practical sense, affect the work of the translator. The software industry is at a point where it can afford to cater to special-interest segments of business, translation being one of them, but software developers must first know that a viable market exists.
   I can easily anticipate that over the next decade we will see developments in HTML that will seem nearly miraculous when compared to the humble origins of the World Wide Web only about a decade ago. HTML and the environments it facilitates can nourish the work of the translator and transliterator in a way that has never before been possible, so the evolution of HTML on a technological level is certain to be exciting. A virtual text, as a concept, is so attractive because such texts can go in nearly any direction, and the technology that makes them possible is something that anyone reading this article on his or her computer already has access to. So, to paraphrase Celine Dion, truly, “these are special times” for translators.
    The author would like to thank Joan and Lawrence Sorren for their assistance in preparing this article. There are numerous books, CD-ROMs, and websites dedicated to HTML programming, just as there are many commercially available programs that assist in the building of websites and the use of HTML. Adobe Systems' PageMill is one of the most widely used of such products and allows easy transfer of textual data between the Internet and word processors. Of the websites dedicated to HTML development and use, http://www.stars.com is one of the very best, replete with articles on advances in HTML and a wealth of tutorials. Paratype can be reached by telephone at (007) 095 332 4001 or on the Internet at either http://www.paratype.com or http://www.paratype.ru