Volume 3, No. 1
January 1999

Michael Walker is a writer, cultural theorist, visual artist, and poet.
He lives in San Francisco, California. Walker is perhaps best known to
the academic and biomedical communities for his work on the reform of
Mongolia's health care system, and he has also written a number of
journal articles on various aspects of Mongolian technological and legal
reform. His other research interests include HIV/AIDS education and
prevention, geography and navigation, feminist themes in literature,
and how the arts and sciences interact in various cultural settings.
His language interests include English, Spanish, Chinese, Mongolian,
Russian, Hebrew, Hurrian, Arabic, and Persian. His personal interests
include soccer, running, hockey, cooking, and music.
Mike can be reached at: mikewalker@geocities.com
 
 
Use of Virtual Texts and HTML in Transliteration
by Michael C. Walker

Among the most difficult problems facing the translator is that of
transliterating from a non-Latin script into a Latin-based script. The
growing importance in commerce, science, engineering, and politics of
nations that use textual systems other than those based on the Latin
(Roman) alphabet has increased the need for translations that, because
of the linguistic differences involved, require transliteration.
Transliteration has traditionally necessitated the creation of a number
of working texts in both the source language and the recipient language,
so that the intricacies arising not only from translating from one
language to another but also from dealing with radically different
textual systems can be resolved. Such processes, though indispensable to
quality translation, typically made transliteration a long and often
laborious undertaking. In the past, when a source document in a
non-Latin script had to be translated and transliterated into more than
one Latin-based recipient language, a number of different translators
often had to use the same working documents, a scenario with the
potential for grave errors in translation and a lack of uniformity
across the recipient languages.
    The advent of the Internet, HyperText Markup Language (HTML), and
associated technologies has allowed for new methods of approaching
transliteration and new concepts in the creation of the so-called
working documents that are often indispensable to the translator. In
the virtual environment, these working documents can be truly dynamic:
they can be changed many times during the translation process with
minimal effort, and variorum facsimiles can be produced for different
purposes while preserving the integrity of the original document. HTML,
a complex language in its own right, has allowed for the resolution of
many technical and linguistic problems inherent to transliteration,
because current HTML standards allow a variety of script systems to be
used in the Internet environment. While the provision for different
scripts was created in response to a need to view non-Latin scripts on
the Internet, the environment fostered by HTML is very conducive to the
simultaneous use of two or more distinct scripts in the same document.
Therefore, HTML can be used effectively outside of Internet programming
applications as a platform for transliteration. When the end goal of the
transliteration/translation process is to present the document on the
Internet, using HTML is even more logical, as it provides for the
definitive creation of a finished text while proactively encouraging a
dynamic environment for the generation of that text.
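The following is a minimal sketch of the kind of mixed-script document described above. It assumes present-day conventions, namely UTF-8 and the HTML `lang` and `dir` attributes, rather than the single-byte encodings of 1999 that this article discusses. The function name `bilingual_page` is hypothetical. The function writes one Hebrew paragraph, marked right-to-left, next to its English rendering.

```python
import html


def bilingual_page(title: str, source_text: str, source_lang: str,
                   source_dir: str, target_text: str,
                   target_lang: str = "en") -> str:
    """Return a minimal UTF-8 HTML page holding two scripts side by side.

    `lang` names each block's language; `dir` tells the browser whether
    the block runs right-to-left ("rtl", e.g. Hebrew) or left-to-right.
    All text is escaped so characters such as "<" cannot break the markup.
    """
    esc = html.escape
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{esc(target_lang)}">\n'
        '<head>\n<meta charset="utf-8">\n'
        f"<title>{esc(title)}</title>\n</head>\n<body>\n"
        f'<p lang="{esc(source_lang)}" dir="{esc(source_dir)}">'
        f"{esc(source_text)}</p>\n"
        f'<p lang="{esc(target_lang)}" dir="ltr">{esc(target_text)}</p>\n'
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hebrew source paragraph (right-to-left) beside its English rendering.
    print(bilingual_page("Working copy", "שלום עולם", "he", "rtl",
                         "Hello, world"))
```

Keeping each script in its own element with its own `lang` and `dir` lets the two texts coexist in one file without either one's direction or font choice leaking into the other.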
    Aside from the technical aspects of transliteration in the virtual
environment, it is worth considering the methodological and pedagogical
import of this modality. The central issue in any translation that
involves transliteration is that the final written document will not be
in the same script as the original document. Therefore certain
characters, or even complete words and phrases, may not be translatable
verbatim into the recipient language because of differences in the
scripts involved. The adept translator must be aware of the unique
problems that transliteration can present. Furthermore, where
dissimilarities between the scripts preclude a verbatim translation, the
translator has an obligation to provide a transliteration that both
functions as a comprehensible document in the recipient language and
preserves the integrity of the source language as far as possible. A
perfect example of how such a situation can cause long-lasting problems
is the transliteration of the Russian composer Rachmaninoff's name into
English. In transliterating from Cyrillic, there should be no instance
where a surname ends in "-off", yet this mistransliteration is so
prevalent in English that it has become the standard spelling of the
composer's name. When virtual environments are used for the
transliteration process, such errors should be easier to avoid. A
complete style guide for both languages can be established before
transliteration begins, and the HTML-based environment can be programmed
to prevent transliterations that are not in keeping with the established
standards.
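The sketch below illustrates programming the environment to enforce a style guide. It uses a hypothetical, simplified Russian-to-English table, which is not BGN/PCGN, ISO 9, or any other official system, and it flags any proposed rendering that departs from that table. The names `STYLE_GUIDE`, `transliterate`, and `check_rendering` are assumptions made for this example.

```python
# Hypothetical, simplified Russian-to-English style guide (not an
# official standard); each Cyrillic letter has exactly one rendering.
STYLE_GUIDE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(word: str) -> str:
    """Render a single Cyrillic word according to STYLE_GUIDE."""
    out = []
    for ch in word:
        latin = STYLE_GUIDE.get(ch.lower())
        if latin is None:
            raise ValueError(f"no style-guide entry for {ch!r}")
        # Keep initial capitals: "Р" -> "R", "Х" -> "Kh".
        out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)


def check_rendering(cyrillic: str, proposed: str) -> list[str]:
    """Return a list of problems with a proposed English rendering."""
    problems = []
    expected = transliterate(cyrillic)
    if proposed != expected:
        problems.append(f"expected {expected!r}, got {proposed!r}")
    if cyrillic.lower().endswith("ов") and proposed.lower().endswith("off"):
        problems.append("final '-ов' is rendered '-ov' here, not '-off'")
    return problems


if __name__ == "__main__":
    print(transliterate("Рахманинов"))                    # Rakhmaninov
    print(check_rendering("Рахманинов", "Rachmaninoff"))  # two problems
```

This particular table renders х as "kh", so the common spelling "Rachmaninov" would also be flagged. Teams can adopt different conventions. The point is that once a convention is chosen, the environment applies it uniformly.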
    The fact that a virtual environment allows nearly limitless working
copies of a document to be produced with relative ease also plays a
significant role in the transliteration process. The translator can
selectively change various working editions of the document in real
time, producing variant versions of the manuscript for different
purposes or readers if necessary. The flexibility of Internet-based
texts is not exclusive to translation: scholars of novels and other
literary works have found the HTML environment very helpful in the
editing, study, and display of literature. Perhaps the most notable and
illustrative example of this use is the effort by several scholars to
produce definitive working variorum copies of James Joyce's novels
Finnegans Wake and Ulysses. Anyone familiar with Joyce's works can
appreciate the benefits of the HTML environment as applied to them. It
makes possible not only more extensive footnotes than traditional
printed formats allow, but also hyperlinks from those annotations to
sites outside the immediate domain. Thus, a teaching text or facsimile
edition for scholarly purposes can be produced and altered as needed.
Additionally, collaborative authorship and editorship can be easily
facilitated via the Internet and the HTML environment.
    Returning to applications relevant to translation work, the study of
the Bible (in Greek and Hebrew) and other sacred texts, notably Egyptian
documents in the Coptic language, has been encouraged by the ability of
the Internet to unite far-flung scholars. Coptic is technically a
non-Latin script, although it contains characters that were later
incorporated into Roman scripts. Its case is an excellent example of the
benefits of working in HTML when dealing with more than one script. In
the uses I have seen, the editors of Coptic documents most often provide
excerpts of the Coptic text alongside the translated text to allow
immediate comparison between the source text and the translation. Of
course, this was done in many book-based transliterations of sacred
texts even before the advent of the computer. The difference here is
that the HTML environment allows the editor to make changes efficiently
and at next to no cost. A monograph, by contrast, would require at the
very least the publication of an errata list or a retraction, if not an
entirely new edition. This advantage of HTML-based publishing has not
been lost on those who create textbooks that require frequent and costly
updates, such as medical manuals. Several such manuals have been
published as virtual, web-based texts on the Internet.
    Another aspect of HTML-based technologies that makes these platforms
useful for translation is the ability to easily build an index of all
the text residing on a given site (or within a discrete document). That
index can then be used to search the text and, when necessary, to change
specific words and phrases. The ability to make universal changes to a
text, that is, to change every instance of a word or phrase within the
text, is built into HTML-based programs. Such an option is often very
useful to the translator working with a transliterated text: when a
specific word or phrase is found to need correction, it can be updated
throughout the entire document. This dynamic handling of text saves
considerable time in editing transliterated documents. So does the
ability of the software (through its implementation of HTML) to
recognize words as discrete entities with their own specific properties.
Together they also encourage editing to be a proactive part of the
translation process.
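The index-and-replace workflow described above can be sketched in a few lines. This is an illustration built on assumptions, not a description of how any particular HTML editor works. It treats each working copy as a plain-text string keyed by a hypothetical file name, builds a word index, and makes a whole-word correction across every copy. It operates on raw strings rather than parsed HTML, so a real tool would need to restrict changes to text content and leave tags and attributes alone.

```python
import re
from collections import defaultdict

# \w is Unicode-aware in Python 3, so Hebrew or Cyrillic words match too.
WORD = re.compile(r"\w+")


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the names of the documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in WORD.findall(text):
            index[word.lower()].add(name)
    return dict(index)


def replace_everywhere(docs: dict[str, str], old: str,
                       new: str) -> tuple[dict[str, str], int]:
    """Replace whole-word occurrences of `old` with `new` in every document.

    The \\b anchors stop the pattern from matching inside longer words.
    Returns the updated documents and the total number of replacements;
    the input dictionary is left unchanged.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b")
    updated, total = {}, 0
    for name, text in docs.items():
        # A function replacement inserts `new` literally (no escape handling).
        updated[name], count = pattern.subn(lambda _m: new, text)
        total += count
    return updated, total


if __name__ == "__main__":
    docs = {
        "ch1.html": "Rachmaninoff left Russia in 1917.",
        "ch2.html": "A concerto by Rachmaninoff, not a Rachmaninoffian pastiche.",
    }
    index = build_index(docs)
    print(sorted(index["rachmaninoff"]))  # ['ch1.html', 'ch2.html']
    updated, n = replace_everywhere(docs, "Rachmaninoff", "Rachmaninov")
    print(n)  # 2 ("Rachmaninoffian" is left untouched)
```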
    HTML has evolved from a single markup language intended for creating
sites on the Internet (within the viewing and operational constraints of
the World-Wide-Web) into a family of related sub-types and companion
technologies, much as major languages produce various dialects over
time. Currently, HTML is often combined with Cascading Style Sheets and
scripting languages in what is collectively known as dynamic HTML.
JavaScript is perhaps the best known of these companion technologies,
but it is certainly not the only one. As the Internet and the
World-Wide-Web continue to develop, HTML will no doubt continue to
receive serious attention as a programming platform, and further
outgrowths of the language will appear. Admittedly, the creation of
virtual texts with HTML (whether on the Internet or outside its
confines) is only one area of HTML usage, and a small one at that. As a
result, little attention has been paid from a programming perspective to
the specific needs of people creating such texts. Fortunately,
improvements demanded by other areas of HTML usage have had crossover
benefits for those using the language in translation.
    A prime example of using HTML in the transliteration of a text can be
found in the work of the translator Joan Sorren. She constructed an
HTML-based Web environment to facilitate her translations of
contemporary Hebrew documents into English. Sorren was working primarily
with literary texts by modern Israeli authors, so there were few
problems of variance in the Hebrew language due to historical or
chronological differences. Her main challenges were the sheer amount of
text she often had to handle and the tight deadlines set by those who
had contracted her for these translations. The HTML environment allowed
Sorren to create a key-map, a programmed guide of keyboard assignments
correlating Hebrew characters to HTML commands. Sorren was working on an
Apple Macintosh computer with a standard QWERTY Latin keyboard layout,
so she found this user-defined assignment of the Hebrew characters most
useful. Sorren could use the keyboard however she desired and was not
constrained by any differences between her hardware and software,
because the key assignments were themselves user-defined. Instead of
creating (or purchasing) a specific translation program, Sorren used a
standard word processor (Microsoft Word) and page-layout program (Adobe
PageMaker) to input and manipulate her text. The problem with this
approach was that Word could not produce uniform formatting for Hebrew
that was completely searchable and indexable to the standards Sorren had
set for her work. This was true even with Apple Computer's Hebrew
operating system software installed on her computer.
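A key-map of the kind Sorren used might look like the following sketch. The article does not give her actual assignments. The phonetic mapping here is therefore entirely hypothetical: it is neither Sorren's map nor the standard Israeli keyboard layout. Each typed Latin key becomes an HTML numeric character reference for the corresponding Hebrew letter. Such references identify characters by their code points, independent of the file's own byte encoding, so the text displays correctly wherever a Hebrew font is available.

```python
import html

# Hypothetical phonetic key-map (Latin key -> Hebrew letter). Upper-case
# keys are used here for the five final (sofit) letter forms.
KEYMAP = {
    "a": "\u05D0",  # alef
    "b": "\u05D1",  # bet
    "g": "\u05D2",  # gimel
    "d": "\u05D3",  # dalet
    "h": "\u05D4",  # he
    "v": "\u05D5",  # vav
    "z": "\u05D6",  # zayin
    "x": "\u05D7",  # het
    "t": "\u05D8",  # tet
    "y": "\u05D9",  # yod
    "k": "\u05DB",  # kaf
    "l": "\u05DC",  # lamed
    "m": "\u05DE",  # mem
    "n": "\u05E0",  # nun
    "s": "\u05E1",  # samekh
    "e": "\u05E2",  # ayin
    "p": "\u05E4",  # pe
    "c": "\u05E6",  # tsadi
    "q": "\u05E7",  # qof
    "r": "\u05E8",  # resh
    "w": "\u05E9",  # shin
    "u": "\u05EA",  # tav
    "K": "\u05DA",  # final kaf
    "M": "\u05DD",  # final mem
    "N": "\u05DF",  # final nun
    "P": "\u05E3",  # final pe
    "C": "\u05E5",  # final tsadi
}


def keys_to_html(keys: str) -> str:
    """Turn a string of typed keys into HTML for the mapped Hebrew text.

    Mapped keys become numeric character references such as "&#1488;"
    (alef, U+05D0); unmapped keys (spaces, digits, punctuation) are kept
    but HTML-escaped so that "<" or "&" cannot break the markup.
    """
    out = []
    for key in keys:
        letter = KEYMAP.get(key)
        out.append(f"&#{ord(letter)};" if letter else html.escape(key))
    return "".join(out)


if __name__ == "__main__":
    print(keys_to_html("wlvM"))  # &#1513;&#1500;&#1493;&#1501; (shalom)
```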
    "I felt that I was doing what the software wanted and not the other way
around [and] if I was going to use the computer in transliteration, it
had to out-perform my conventional means of working, which the word
processing software just wasn't doing for me," Sorren commented on her
initial misgivings about the computer-based environment. However,
Sorren's husband, a software engineer, suggested that she try
constructing a translation environment in HTML using Adobe Systems'
PageMill software. She could then directly import material that proved
cumbersome in Microsoft Word and manipulate it in the more dynamic
environment of HTML. Sorren found that creating this environment with
her husband's assistance was not that difficult and that HTML allowed
her true flexibility in manipulating and exporting her Hebrew text.
Using Hebrew fonts in Word, Sorren would enter the text from her
original sources (mainly unpublished monographs and fair copies) and
then import this text into the HTML-based working environment. Sorren
and her husband had created a customized character encoding for HTML,
based on the standard ISO-8859-7 (Greek) encoding, with which she could
display and manipulate Hebrew within the web-based environment. Although
commercially available encoding standards for Hebrew do exist, Sorren
preferred to work with a customized version so that she would have as
much control over the text as possible.
    The Sorrens represented the standard English characters in the same
encoding they were using for Hebrew, as neither language required all of
the available character slots. Similar approaches have been taken
successfully to encode Cyrillic (usually Russian) fonts alongside
English fonts. I describe these as English, as opposed to Latin, for a
specific reason. The ancillary characters, accent marks, and diacritics
needed to represent other Latin-based languages often cannot be included
alongside an encoding for Cyrillic, Greek, Hebrew, or any other
non-Latin script. The Sorrens reported that in their customized encoding
there was no room left for any characters beyond the Hebrew alphabet and
the upper- and lower-case English alphabet. Some alphabetic scripts
contain a much greater variety of characters than the Latin alphabet,
such as Arabic or Sanskrit, and some systems are logographic, such as
Chinese. These could not possibly be encoded in a way that leaves ample
space for an English/Latin representation. In fact, encoding Chinese
requires additional software to accommodate the vast number of
characters used in the language. HTML and its related programming
languages were predominantly developed in the United States (and to a
lesser degree in the United Kingdom and Germany). As a result,
precedence was given to Latin-based languages, and all other textual
systems were treated as secondary. Apple Computer and numerous
third-party developers have risen to the challenge of providing useful
solutions for representing non-Western scripts within the computing
environment in general and, specifically, within the programming
environments fostered by HTML.
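The slot arithmetic behind this paragraph can be checked with Python's built-in codecs. The sketch below does not reproduce the Sorrens' custom encoding, which the article does not document. Instead it uses the standard ISO-8859-8 (Hebrew) codec, a single-byte encoding that, like theirs, places the Hebrew alphabet alongside the basic English alphabet. It shows that an accented Latin letter does not fit and that Chinese needs a multi-byte encoding (GB 2312 here). The helper names `fits` and `free_slots` are hypothetical.

```python
def fits(text: str, encoding: str) -> bool:
    """True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def free_slots(encoding: str) -> int:
    """Count byte values 0-255 that a single-byte codec leaves undefined."""
    free = 0
    for value in range(256):
        try:
            bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            free += 1
    return free


if __name__ == "__main__":
    mixed = "שלום hello"
    # One byte per character: Hebrew and English share the same 256 slots.
    print(len(mixed), len(mixed.encode("iso8859_8")))  # 10 10
    print(fits("café", "iso8859_8"))    # False: no slot for é
    print(free_slots("iso8859_8") > 0)  # True: some byte values are unassigned
    # Chinese needs a multi-byte encoding; GB 2312 uses two bytes per character.
    print(len("中文".encode("gb2312")))  # 4
```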
    Voice-recognition software is improving, and so are text-reading
software/hardware combinations (systems that scan printed or handwritten
text and convert it into a manuscript that can be manipulated in a word
processor). With their integration, the use of HTML-based platforms for
translation and transliteration is likely to grow markedly.
Voice-recognition software in particular holds some intriguing
possibilities for translation work. A translator could conceivably read
the work to be translated aloud (in the source language) and have the
computer transcribe the oral input into text. Of course, when working
with non-Latin textual systems, many questions about how to render the
oral material would need to be answered. Would the computer transcribe
into a phonetic approximation of the source language, or would it be
better (in most instances) to render the oral material into the
appropriate script of the source language? Clearly, such issues would be
best solved case by case during the transliteration process. Even so,
some input is needed from translators to make the emerging software
applicable to translation in the first place. Word-processing software
manufacturers (save those who design software predominantly for such
applications) have paid little attention to translation in the past.
One reason is that translators have not been especially vocal in the
computer industry and have remained something of an unknown market. The
opportunities being produced by new technologies allow the translator a
great deal of latitude in how he or she chooses to approach the physical
task of translation.
    Examples such as that of Joan Sorren should be quite encouraging to
translators who are beginning to explore computers as a more
sophisticated tool than just a word processing or page-layout
environment. Two things appear to be needed at this point to achieve the
highest level of efficacy in computer-assisted translation. The first is
a better understanding on the part of translators of the resources
available. The second is more pronounced involvement of translators in
the ongoing standardization of methods for rendering non-Latin texts in
the HTML environment. ISO (International Organization for
Standardization) standards have been set for Russian (and other
languages using Cyrillic), Greek, Japanese, Chinese, Korean, and the
central European languages (such as Polish and Czech, which are
Latin-based but use unique diacritics). Comprehensive standards are
still lacking for the Semitic languages Hebrew and Arabic, and for
Persian (which is written in the Arabic script). Some of the script
types that have been standardized are perhaps adequate for the average
user who only wants to view text on the Internet, but they are woefully
inadequate for the purposes of translators. The Cyrillic standards are a
prime example of this quandary, as they were designed primarily for
Russian and are not necessarily applicable to all other languages
written in Cyrillic. Kazakh and Kirghiz, for example, are written in
Cyrillic-based scripts, but these languages contain special characters
not found in contemporary Russian. A typical Russian font or encoding
system therefore cannot correctly represent these languages.
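The Kazakh problem can be demonstrated directly. The sketch below uses a hypothetical helper, `unencodable`. It checks the Kazakh spelling of Kazakhstan, Қазақстан, against two Russian-oriented single-byte encodings that Python ships: ISO-8859-5 and KOI8-R. Both handle an ordinary Russian word, but neither can represent the Kazakh letter қ (Cyrillic ka with descender).

```python
def unencodable(text: str, encoding: str) -> list[str]:
    """Return the distinct characters of `text` that `encoding` cannot hold,
    in order of first appearance."""
    missing: list[str] = []
    for ch in text:
        if ch in missing:
            continue
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for enc in ("iso8859_5", "koi8_r"):
        print(enc, unencodable("Москва", enc))     # [] (Russian fits)
        print(enc, unencodable("Қазақстан", enc))  # ['Қ', 'қ']
```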
    While the problems of standardization have received some interest
from academic linguists, it is promising to note that the commercial
sector has also risen to these challenges. The Russian corporation
Paratype (URL: http://www.paratype.com) produces a number of fonts that
can be used to represent the less common Cyrillic scripts. It also
offers software to assist in the HTML implementation of these products.
Additionally, Paratype has created fonts for Georgian and Armenian, two
scripts for which computer fonts have been rather hard to obtain.
Paratype's dedication to less common languages should not be too
surprising. Without the Internet and the increasing smallness of our
world, it is doubtful that this company would have grown into the
international success that it is today. Those interested in
transliteration assisted by the HTML environment would do well to become
informed about new directions in encoding standardization for HTML. They
should also learn how these standards, in a practical sense, can affect
the work of the translator. The software industry is at a point where it
can afford to cater to special-interest segments of business, such as
translation, but software developers must first know that a viable
market exists.
    I can easily anticipate that over the next decade we will see
developments in HTML that will seem nearly miraculous when compared to
the humble origins of the World-Wide-Web only about a decade ago. HTML
and the environments it facilitates can nourish the work of the
translator and transliterator in a way that has never before been
possible, so the evolution of HTML on a technological level is certain
to be exciting. A virtual text, as a concept, is so attractive because
such texts can go in nearly any direction. Moreover, the technology that
makes them possible is something that anyone reading this article on his
or her computer already has access to. So, to paraphrase Celine Dion,
truly, these are special times for translators.
    
The author would like to thank Joan and Lawrence Sorren for their
assistance in preparing this article. There are numerous books, CD-ROMs,
and websites dedicated to HTML programming, just as there are many
commercially available programs that assist in building websites and
using HTML. Adobe Systems' PageMill is one of the most widely used of
these products and allows easy transfer of textual data between the
Internet and word processors. Of the websites dedicated to HTML
development and use, http://www.stars.com is one of the very best,
replete with articles on advances in HTML and a wealth of tutorials.
Paratype can be reached by telephone at (007) 095 332 4001 or on the
Internet at either http://www.paratype.com or http://www.paratype.ru
 
© Copyright Translation Journal and the Author 1998
URL: http://accurapid.com/journal/07translit.htm