Volume 3, No. 1
January 1999
Michael Walker is a writer, cultural theorist, visual artist, and poet.
He lives in San Francisco, California. Walker is perhaps best known to
the academic and biomedical communities for his work on the reformation
of Mongolia's health care system, and has also authored a number of
journal articles on various aspects of Mongolian technological and legal
reform. His other areas of research interest include HIV/AIDS education
and prevention, geography and navigation, feminist themes in literature,
and how the arts and sciences interact in various cultural settings.
With regard to languages, his interests include: English, Spanish,
Chinese, Mongolian, Russian, Hebrew, Hurrian, Arabic, and Persian. His
personal interests include: soccer, running, hockey, cooking, and music.
Mike can be reached at: mikewalker@geocities.com
Use of Virtual Texts and HTML in Transliteration
by Michael C. Walker
Among the most difficult problems facing the translator is the issue of
transliterating from a non-Latin script into a Latin-based script. The
growing importance in the realms of commerce, science, engineering, and
politics of nations that utilize writing systems other than those based
on the Latin (Roman) alphabet has precipitated an increase in the need
for translations that, by the nature of linguistic differences, require
transliteration. Transliteration has traditionally necessitated the
creation of a number of working texts in both the source language and
the recipient language so that the intricacies caused not only by
translating from one language to another but also by dealing with radically
different writing systems can be resolved. Such processes, though
indispensable to the provision of quality translation, typically made
transliteration into a long and often laborious undertaking. In the
past, when a source document in a non-Latin script had to be translated
and transliterated into more than one Latin-based recipient language,
a number of different translators would often need to utilize the same
working documents, establishing a scenario with the potential for grave
errors in translation and a lack of uniformity across the recipient
languages.
The advent of the Internet, HyperText Markup Language (HTML), and
associated technologies has allowed for new methods of approaching
transliteration and new concepts in the creation of the so-called
working documents that are often so indispensable to the translator.
These working documents -in the virtual environment- are frequently
capable of being truly dynamic in the sense that they can be changed a
multitude of times during the translation process with minimal effort
and that variorum facsimiles can be produced for different purposes
while preserving the integrity of the original document. HTML, a
complex language in its own right, has allowed for the resolution of
many technical and linguistic problems inherent to transliteration as
current HTML protocols allow for the utilization of a variety of script
systems in the Internet environment. While the provision for different
scripts was created in response to a need for viewing non-Latin scripts
on the Internet, the environment fostered by HTML is very conducive to
the simultaneous use of two or more distinct scripts in the same
document. Therefore, HTML can be effectively used outside of Internet
programming applications as a platform for transliteration. When the end
goal of the transliteration/translation process is to present the
document on the Internet, the utilization of HTML is even more logical
as it provides for the definitive creation of a finished text while
proactively encouraging a dynamic environment for the generation of that
text.
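The simultaneous use of two scripts in one document can be sketched in a few lines. The example below is only an illustration, assuming a present-day Unicode-capable setup rather than any tool named in this article: the function name bilingual_block and the sample word are hypothetical, while lang and dir are standard HTML attributes for marking the language and writing direction of a passage.

```python
from html import escape

def bilingual_block(source, translation, source_lang="he", source_dir="rtl"):
    """Return an HTML fragment that pairs a source passage with its translation.

    The source paragraph is marked with its language and writing direction,
    so that a right-to-left script and a left-to-right script can sit in the
    same document. Markup characters in both passages are escaped.
    """
    return (
        '<div class="segment">\n'
        f'  <p lang="{escape(source_lang)}" dir="{escape(source_dir)}">{escape(source)}</p>\n'
        f'  <p lang="en" dir="ltr">{escape(translation)}</p>\n'
        '</div>'
    )

# Hebrew "shalom", written with escapes so the letter order is unambiguous.
shalom = "\u05e9\u05dc\u05d5\u05dd"
fragment = bilingual_block(shalom, "peace")
print(fragment)
```

Each call produces one source/translation pair, so a working document is simply a sequence of such fragments that can be regenerated whenever either side changes.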
Aside from technical aspects of transliteration in the virtual
environment, it is meaningful to consider the methodological and
pedagogical import of this modality. The real issue central to any
translation that involves transliteration is the fact that the final
written document will not be in the same script as the original
document, and therefore there may be certain characters -or even
complete words and phrases- which cannot be translated verbatim into the
recipient language due to differences in the scripts involved. The adept
translator must be aware of the unique problems that can be presented by
transliteration; furthermore, in cases where linguistic dissimilarities
in the scripts involved preclude a verbatim translation, the translator
has an obligation to provide a transliteration that will both function
as a comprehensible document in the recipient language and
preserve the integrity of the source language as far as is possible. A
perfect example of how such a situation can cause long-lasting problems
is that of the transliteration of the Russian composer Rachmaninoff's
name into English; in transliterating from Cyrillic, there should be no
instance where a surname ends in "off", yet this mistransliteration is
so prevalent in English that it has indeed become the standard spelling
of the composer's name. When utilizing virtual environments for the
transliteration process, such errors should be easier to avoid, given
that a complete style guide for both languages involved can be
established before transliteration begins and the HTML-based environment
can be programmed to prevent transliterations that are not in keeping
with established standards.
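A style guide of this kind can be expressed as a lookup table that refuses anything it has no rule for. The sketch below is a simplified illustration, not an established standard: the Latin equivalents are an assumed house style covering only the 32 basic Russian letters, and the names TABLE and transliterate are hypothetical.

```python
# Lowercase Russian letters U+0430..U+044F, in alphabetical order.
CYRILLIC = [chr(code) for code in range(0x0430, 0x0450)]

# Assumed house-style equivalents, in the same order; the two signs map to "".
LATIN = [
    "a", "b", "v", "g", "d", "e", "zh", "z", "i", "y", "k", "l", "m", "n",
    "o", "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch",
    "", "y", "", "e", "yu", "ya",
]

TABLE = dict(zip(CYRILLIC, LATIN))

def transliterate(text):
    """Transliterate Russian text, rejecting Cyrillic letters with no rule."""
    out = []
    for ch in text:
        latin = TABLE.get(ch.lower())
        if latin is None:
            # Any other character in the basic Cyrillic block is an error,
            # so a non-standard spelling cannot slip through unnoticed.
            if "\u0400" <= ch <= "\u04ff":
                raise ValueError(f"no style-guide rule for {ch!r}")
            out.append(ch)  # spaces, punctuation, Latin text pass through
        else:
            out.append(latin.capitalize() if ch.isupper() else latin)
    return "".join(out)

# The composer's surname in Cyrillic, written with escapes.
surname = "\u0420\u0430\u0445\u043c\u0430\u043d\u0438\u043d\u043e\u0432"
print(transliterate(surname))
```

Under this table the surname comes out ending in "ov", and a letter outside the table stops the process instead of being guessed at.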
The fact that a virtual environment can allow for the production of
nearly limitless working copies of documents with relative ease also
plays a significant role in the transliteration process as the
translator can selectively change various working editions of the
document in real time, producing variant versions of the manuscript for
different purposes/readers, if necessary. The flexibility of
Internet-based texts is not exclusive to translatory functions: scholars
of novels and other literary works have found the HTML environment very
helpful in the editing, study, and display of literature. Perhaps the
most notable and illustrative example of this use would be the effort
made by several scholars to produce definitive working variorum copies
of James Joyce's novels Finnegans Wake and Ulysses. Anyone familiar
with Joyce's works can appreciate the benefits of the HTML environment
as applied to these works: the ability not only to create more extensive
footnotes than are possible in traditional textual formats, but also to
hyperlink such annotations to sites outside of the immediate domain.
Thus, a teaching text or facsimile edition for scholarly purposes can be
produced and alterations can be made as needed. Additionally,
collaborative authorship/editorship can be easily facilitated via the
Internet and the HTML environment.
Returning to applications relevant to translation work, the study of the
Bible (in Greek and Hebrew) and other sacred texts -notably, Egyptian
Coptic documents in the Coptic language- has been encouraged by the
ability of the Internet to unite far-flung scholars. The case of Coptic
-which is technically a non-Latin script although it contains
characters that were later incorporated into Roman scripts- is an
excellent example of the benefits of working in HTML when dealing with
more than one script. In the uses I have seen, the editors of the Coptic
documents most often will provide excerpts of the Coptic text alongside
the translated text as an effort to offer immediate comparison between
the source text and the translation. Of course, such has been done in
many book-based transliterations of sacred texts even before the advent
of the computer, but the difference here is that the HTML environment
allows the editor to make changes efficiently and with next to no cost,
whereas a monograph would require at the very least the publication of a
list of errors or a retraction, if not an entirely new edition. This
advantage of HTML-based publishing has not been lost on those who create
textbooks that can require frequent and costly updates, such as medical
manuals, several of which have been published as virtual, web-based
texts on the Internet.
Another aspect of HTML-based technologies that makes these platforms
useful for translation applications is the ability to easily build an
index of all the text residing on a given site (or within a discrete
document) and to use such an index to perform searches of the text and,
when necessary, to make changes in the text to specific words and
phrases. The ability to make universal changes to a text, that is, to
change all instances where one word/phrase appears within the text, is
a standard feature of HTML-based editing programs. Such an option is often very useful to
the translator working with a transliterated text as when a specific
word/phrase is found in need of correction, that word/phrase can be
updated as required throughout the entire document. This dynamic
handling of text and the ability of the software (due to the
implementation of the HTML language) to recognize words as discrete
entities with their own specific properties saves considerable time in
the editing process of transliterated documents and also encourages
editing to be a proactive part of the translation process.
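The indexing and universal replacement described here can be sketched as follows. This is a minimal illustration of the idea, not the behavior of any particular HTML editor: the function names build_index and replace_everywhere are hypothetical, and the text is treated as plain characters with no markup.

```python
import re
from collections import defaultdict

WORD = re.compile(r"\w+")

def build_index(text):
    """Map each word (lowercased) to the character offsets where it occurs."""
    index = defaultdict(list)
    for match in WORD.finditer(text):
        index[match.group().lower()].append(match.start())
    return dict(index)

def replace_everywhere(text, old, new):
    """Replace every whole-word occurrence of `old`; return (text, count)."""
    # The lookarounds keep the match from firing inside a longer word.
    pattern = re.compile(rf"(?<!\w){re.escape(old)}(?!\w)")
    # A function replacement inserts `new` literally, backslashes included.
    return pattern.subn(lambda match: new, text)

draft = "Rachmaninoff wrote; Rachmaninoff played."
print(build_index(draft)["rachmaninoff"])
corrected, count = replace_everywhere(draft, "Rachmaninoff", "Rakhmaninov")
print(count, corrected)
```

Because the index records where each word sits, the translator can review every occurrence before committing to the change, and the count returned by the replacement confirms that none was missed.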
HTML has evolved from one primary markup language intended for the
creation of sites on the Internet (within the viewing and operational
constraints of the World-Wide-Web) into a variety of language sub-types,
similar to how major languages will over time produce various dialects.
Currently, there are several extensions of HTML which are collectively
known as "dynamic HTML." JavaScript, a scripting language embedded in
HTML pages, is perhaps the best-known of these; however, it certainly is
not the only one available. As the Internet and World-Wide-Web continue to
develop, there is no doubt that HTML will continue to receive serious
attention as a programming platform and that further outgrowths of the
language will appear. Admittedly, the creation of virtual texts with
HTML (whether on the Internet or outside of its confines) is only one
area of HTML usage and a small one at that, so little attention has been
paid from a programming perspective to the specific needs of persons
creating such texts. Fortunately, improvements demanded by other areas
of HTML usage have had cross-over benefits for those using the language
in translatory applications.
A prime example of using HTML in the transliteration of a text can be
found in the work of the translator Joan Sorren, who constructed an
HTML-based Web-environment to facilitate her translations of
contemporary Hebrew documents into English. Sorren was working primarily
with literary texts by modern Israeli authors so there were few problems
in variance of the Hebrew language due to historical/chronological
differences; her main challenge was the sheer amount of text she often
had to approach and the tight deadlines set by those who had contracted
her for these translations. The HTML-environment allowed Sorren to create
a key-map, or programmed guide of keyboard assignments of Hebrew
characters correlated to HTML commands. As Sorren was working on an
Apple Macintosh computer with a standard QWERTY-configured Latin
keyboard layout, she found the user-defined assignation of the Hebrew
characters most useful. This way, Sorren could use the keyboard however
she desired and was not constrained by any differences between the
hardware and the software she was utilizing as the software was, in
fact, user-defined itself. Instead of creating (or purchasing) a
specific translation program, Sorren used a standard word processor
(Microsoft Word) and page-layout program (Adobe PageMaker) to input and
manipulate her text. The problem encountered in this approach was that
Word was not able to produce a uniform formatting for Hebrew (even with
Apple Computer's Hebrew operating system software installed on Sorren's
computer) that was completely searchable and indexable to the standards
Sorren had set for her work.
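A key-map of the sort described can be pictured as a small table from Latin keys to Hebrew characters, with the result written out as HTML numeric character references. The sketch below is an assumption for illustration only: the key assignments are hypothetical and cover just a few letters, and nothing here reproduces Sorren's actual map.

```python
from html import escape

# Hypothetical key assignments: Latin key -> Hebrew letter.
KEYMAP = {
    "a": "\u05d0",  # alef
    "b": "\u05d1",  # bet
    "g": "\u05d2",  # gimel
    "d": "\u05d3",  # dalet
    "h": "\u05d4",  # he
    "v": "\u05d5",  # vav
    "l": "\u05dc",  # lamed
    "M": "\u05dd",  # final mem
    "m": "\u05de",  # mem
    "S": "\u05e9",  # shin
}

def keys_to_hebrew(keys):
    """Translate typed keys into Hebrew; unmapped keys pass through as typed."""
    return "".join(KEYMAP.get(key, key) for key in keys)

def to_character_references(text):
    """Escape markup, then write each non-ASCII character as &#NNNN;."""
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in escape(text, quote=False)
    )

typed = "SlvM"  # shin, lamed, vav, final mem
hebrew = keys_to_hebrew(typed)
print(to_character_references(hebrew))  # &#1513;&#1500;&#1493;&#1501;
```

Since the table is ordinary data, the translator can reassign any key without touching the rest of the environment, which is the flexibility the user-defined approach is meant to provide.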
"I felt that I was doing what the software wanted and not the other way
around [and] if I was going to use the computer in transliteration, it
had to out-perform my conventional means of working, which the word
processing software just wasn't doing for me," Sorren commented on her
initial misgivings about the computer-based environment. However,
Sorren's husband -a software engineer- suggested that she try
constructing a translating environment in HTML using Adobe Systems'
PageMill software so that she could directly import information that
proved cumbersome in Microsoft Word and manipulate it using the more
dynamic environment of HTML. Sorren found that the creation of this
environment -with her husband's assistance- was not that difficult and
that HTML allowed her true flexibility in manipulating and exporting her
Hebrew text. Using Hebrew fonts in Word, Sorren would enter the text
from her original sources (mainly unpublished monographs and
fair-copies) and then import this text over to the HTML-based working
environment. Sorren and her husband had customized a version of HTML
encoding based on the standardized ISO-8859-7 Greek encoding with which
she could display and manipulate Hebrew within the web-based
environment. Although commercially available encoding standards for
Hebrew do exist, Sorren preferred to work with a customized version so
that she would have as much versatility in the control of the text as
possible.
The Sorrens provided for the representation of English standard
characters in the same encoding as they were using for the Hebrew, as
neither language would require all the available slots for various
characters. Similar approaches have been successfully taken for the
encoding of Cyrillic (usually Russian) fonts alongside English fonts. I
qualify these as "English," as opposed to "Latin," as often the ancillary
characters, accent marks, and diacritics needed for the representation
of other Latin-based languages cannot be included along with an encoding
standard for Cyrillic, Greek, Hebrew, or any other non-Latin script; the
Sorrens reported that in their customized encoding, there was no room
left for any characters beyond the Hebrew alphabet and the upper and
lower-case English alphabet. Written alphabetic scripts that contain a
much greater variety of characters than the Latin alphabet -such as
Arabic or Sanskrit- and logographic writing systems such as Chinese
could not possibly be encoded in such a manner that ample space was left
for the inclusion of an English/Latin representation. In fact, in
encoding Chinese, additional software is required to provide for the
vast number of characters utilized in this language. As HTML and its
related programming languages were predominately developed in the United
States (and to a lesser degree, in the United Kingdom and Germany),
precedence was drawn from Latin-based languages with the inclusion of
all other writing systems being secondary. Apple Computer as well as
numerous third-party developers have risen to the challenge of providing
useful solutions for representing non-Western scripts within the
computing environment in general and specifically, within the
programming environments fostered by HTML.
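The idea of two scripts sharing one single-byte encoding can be sketched as a table of slots. The layout below is hypothetical and is not the Sorrens' customized encoding, whose details are not given here: it assumes the plain ASCII range in the lower half and the 27 Hebrew letters (including final forms) placed from byte 0xE0 upward. Any character without a slot, such as an accented Latin letter, is rejected.

```python
# 27 Hebrew letters, alef through tav, including the five final forms.
HEBREW = [chr(code) for code in range(0x05D0, 0x05EB)]

# Hypothetical slot layout: ASCII in bytes 0x00-0x7F, Hebrew from 0xE0.
BYTE_TO_CHAR = {byte: chr(byte) for byte in range(0x80)}
BYTE_TO_CHAR.update({0xE0 + i: letter for i, letter in enumerate(HEBREW)})
CHAR_TO_BYTE = {char: byte for byte, char in BYTE_TO_CHAR.items()}

def encode(text):
    """Encode text to bytes, rejecting characters that have no slot."""
    try:
        return bytes(CHAR_TO_BYTE[ch] for ch in text)
    except KeyError as exc:
        raise ValueError(f"no slot for {exc.args[0]!r}") from None

def decode(data):
    """Decode bytes produced by encode() back into text."""
    return "".join(BYTE_TO_CHAR[byte] for byte in data)

mixed = "peace = \u05e9\u05dc\u05d5\u05dd"
raw = encode(mixed)
print(len(raw), decode(raw) == mixed)
```

One byte per character keeps the text compact and easy to search, but the table is fixed in advance, which is why a script with thousands of characters cannot be handled this way.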
With the integration of voice-recognition software and better text-reading
software/hardware combinations (systems that facilitate the
scanning in of printed or handwritten text and the conversion of such
text to a manuscript that can be manipulated in the word processing
environment), the implementation of HTML-based platforms as modalities
for translation/transliteration is likely to grow markedly.
Voice-recognition software in particular holds some intriguing
possibilities for translation work, as it would be conceivable for a
translator to orally read the work to be translated (in the source
language) and have the computer transcribe the oral input into text. Of
course, when working with non-Latin writing systems, many questions
germane to the rendering of the oral material would need to be answered.
Would the computer transcribe into a phonetic approximation of the
source language or would it be better (in most instances) to render the
oral material into the appropriate script of the source language?
Clearly, such issues would be best solved on a case-by-case basis during
the transliteration process, but some input is needed from translators
to make the emergent software applicable to translatory functions in the
first place. One reason that little attention has been paid to
translatory applications in the past by word-processing software
manufacturers (save those who design software predominately for such
applications) is that translators have not been especially vocal in the
computer industry and have remained something of an unknown market. The
opportunities being produced by new technologies really allow the
translator a great deal of latitude in how he or she chooses to approach
the physical task of translation.
Examples such as that of Joan Sorren should be quite encouraging to
translators who are beginning to explore computers as a more
sophisticated tool than just a word processing or page-layout
environment. What appears to be needed at this point in time to
facilitate the highest level of efficacy in computer-assisted
translations is a better understanding on the part of translators of the
resources available and also a more pronounced involvement of
translators in the ongoing standardization process for methods of
rendering non-Latin texts in the HTML environment. ISO (International
Organization for Standardization) standards have been set for Russian (and other
languages using Cyrillic), Greek, Japanese, Chinese, Korean, and the
central European languages (such as Polish and Czech, which are
Latin-based but utilize unique diacritics). Comprehensive standards for
such Semitic languages as Hebrew, Arabic, and Persian are still lacking
and some of the script-types that have been standardized are perhaps
competent for the average user who only desires to view text on the
Internet, but are woefully inadequate for the purposes of translators.
The Cyrillic standards are a prime example of this quandary as such
standards were designed primarily for Russian and are not necessarily
applicable for all other Cyrillicized languages. Kazakh and Kirghiz, for
example, are Cyrillic-based in their script systems but these languages
contain special characters not found in contemporary Russian, so a
typical Russian font or encoding system cannot facilitate the correct
representation of these languages.
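The gap can be checked mechanically. The sketch below assumes Python's built-in codecs for two Russian-oriented encodings, KOI8-R and ISO-8859-5, and reports which characters of a text have no place in each; the function name unencodable and the sample words are illustrative only.

```python
def unencodable(text, encoding):
    """Return the distinct characters of `text` that `encoding` cannot hold."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

russian = "\u043c\u0438\u0440"             # Russian word "mir"
kazakh = "\u049b\u0430\u0437\u0430\u049b"  # Kazakh word "qazaq"

for name in ("koi8_r", "iso8859_5"):
    print(name, unencodable(russian, name), unencodable(kazakh, name))
```

The Russian word passes in both encodings, while the Kazakh word fails on its first and last letter (U+049B), one of the special characters that a Russian-oriented encoding has no slot for.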
While the problems caused by standardization have received some interest
from academic linguists, it is promising to note that the commercial
sector has also risen to these challenges. The Russian corporation
Paratype (URL: http://www.paratype.com) produces a number of fonts that
can be utilized to represent the less-common Cyrillic scripts and offers
software to assist in the HTML implementation of these products.
Additionally, Paratype has created fonts in the Georgian and Armenian
languages, two scripts for which computer fonts have been rather hard to
obtain. The dedication shown by Paratype to less-common languages should
not be too surprising given that without the Internet and the increasing
smallness of our world, it is doubtful that this company would have
grown into the international success that it is today. Those interested
in transliteration assisted by the HTML environment would do well to
become informed about new directions in encoding standardization for
HTML and how these standards, in a practical sense, can affect the work
of the translator. The software industry is at a point where it can
afford to cater to special interest segments of business -translation
being one of these- but software developers must first know that a
viable market exists.
I can easily anticipate that over the next decade we will see
developments in HTML that shall seem to be nearly miraculous when
compared to the humble origins of the Internet only about a decade ago.
HTML and the environments it facilitates can nourish the work of the
translator and transliterator in a way that has never before been
possible, so the evolution of HTML on a technological level is
certain to be exciting. A virtual text -as a concept- is so attractive
because such texts can go in nearly any direction and the technology
that makes them possible is something that anyone reading this article
on his or her computer already has access to, so to paraphrase Celine
Dion, truly, these are special times for translators.
The author would like to thank Joan and Lawrence Sorren for their
assistance in preparing this article. There are numerous books, CD-ROMs,
and websites dedicated to HTML programming just as there are many
commercially available programs that assist in the building of websites
and the use of HTML. Adobe Systems' PageMill is one of the most
widely-used of such products and allows easy transmission of textual
data between the Internet and word-processors. Of the websites dedicated
to HTML development and use, http://www.stars.com is one of the very
best, replete with articles on advances in HTML and a wealth of
tutorials. Paratype can be reached by telephone at (007) 095 332 4001 or
on the Internet at either http://www.paratype.com or
http://www.paratype.ru
© Copyright Translation Journal and the Author 1998
URL: http://accurapid.com/journal/07translit.htm