Software and Technology Review
Multilingual Computing in Middle East Studies

Josef W. Meri, University of California, Berkeley

Reprinted from the Middle East Studies Association Bulletin, Vol 34, No 1, Summer 2000 (with changes in orthography to HTML standards).
Copyright 2000.

Linguist’s Software Semitic Transliterator (www.linguistsoftware.com)
P.O. Box 580, Edmonds, WA 98020-0580, U.S.A., 1-425-775-1130, fonts@linguistsoftware.com; MSRP: USD 99 (6 typefaces for USD 249.95)

Microsoft Products: (www.microsoft.com)
Microsoft Windows 2000 Professional Operating System, Academic Edition (students, faculty, staff); MSRP: USD 139 (upgrade from Windows 95, 98/ME); USD 199 (Full); Market Price: USD 129 (upgrade from Windows 95, 98)

Microsoft Office 2000 Professional Academic Edition (students, faculty, staff); Includes Word, Excel, Outlook, PowerPoint, Access, Internet Explorer 5; MSRP: USD 199 (Full), Market Price: USD 179 (Full)

Microsoft Office 2000 Proofing Tools MSRP: USD 70

FRUSTRATION CHARACTERIZES the experience of academic PC users of Middle Eastern languages who need to work with several Middle Eastern and European languages in a single word-processing document and who need to exchange documents with colleagues. Colleagues collaborating long-distance on projects have had to contend with converting unintelligible Arabic, Hebrew and Persian documents and e-mail. Equally frustrating is deciding which transliteration font, word processor or operating system to use for Middle Eastern languages while ensuring that colleagues can open, edit and print your documents. Promising solutions and products of the 1990s have failed to provide the level of multilingual text editing capabilities required by academic users. Until recently, those who required extensive use of more than one RTL (Right-to-Left) Middle Eastern language (Arabic, Hebrew, Persian, Syriac and Ottoman Turkish) had two options. One was a self-contained solution such as a Macintosh with WorldScript add-ins and a word-processing program like Nisus Writer. The other was an archaic, bug-ridden PC word processor or add-in program, sparse in features and capabilities and requiring internal fonts incompatible with more universal applications such as MS Word. Now academic users have a real alternative in the Microsoft Windows 2000 Professional operating system and Microsoft Office 2000 Professional. In addition to reviewing these and other software products, this essay explores a number of recent innovations and solutions in software technology that will benefit non-specialist and specialist users alike.

The Unicode Standard
While the issue of international standards in multilingual computing has no direct bearing on the average academic user who desires practical solutions that work, such a user may be left behind as colleagues, publishers, and academic institutions adopt new and more efficient standards in word-processing and the exchange of data. The internationally recognized Unicode encoding standard (www.unicode.org), a means of representing the world's modern and ancient languages in a single character set (currently supporting approximately 6,000 languages, including the International Phonetic Alphabet and special characters for transliterating Middle Eastern languages), has rendered obsolete the 256-character extended ASCII character sets.1 Unicode includes support for Arabic and Hebrew scripts with additional support for Persian, Ottoman Turkish, Urdu and Yiddish characters. Support for ancient Near Eastern scripts like Egyptian Hieroglyphics and Ancient Aramaic is forthcoming. Users of Syriac and Windows 2000 will benefit from the eleven free Unicode-enabled fonts (beta release) available through the Syriac Computing Institute. However, users need to register on-line (www.egroups.com/group/syrcom). Further information on the Syriac project is available in the July 2000 issue of the electronic journal Hugoye: Journal of Syriac Studies (syrcom.cua.edu/Hugoye). Formal adoption of the Unicode standard by software manufacturers is beginning to produce significant results. No longer are the multilingual needs of academic users being ignored. Windows 2000 Professional (reviewed below), Microsoft's flagship operating system intended for organizational users, is the first operating system to effectively integrate Unicode multilingual support. This OS contains a built-in Unicode script processor called Uniscribe, which supports complex tasks relating to how particular characters are displayed. PC users can receive and convert Macintosh Arabic and Hebrew documents on a Windows 2000 platform.
Word 2000, part of MS Office 2000 (reviewed below) is designed to work with Unicode-enabled as well as traditional TrueType and Bitmap fonts. The reader should be warned, however, that not all software packages provide Unicode support. For example, it is still not possible to enter transliterated characters for Middle Eastern languages and Arabic and Hebrew scripts in bibliographic software like EndNote. Although web-based library and bibliographic databases are common, Unicode support for transliterated Middle Eastern languages along with Arabic and Hebrew scripts has yet to be implemented. Nevertheless, with many library database formats available, the most effective way to support the display of non-Roman and transliterated scripts is through adopting the Unicode standard.
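The difference between Unicode and the older single-byte character sets can be illustrated with a short script. The following is a minimal sketch in Python, which is not one of the products reviewed here; the sample words and the helper names fits_in and describe are assumptions chosen only for illustration. It checks whether a transliterated term and a line mixing Arabic and Hebrew script can be stored in several encodings.

```python
import unicodedata

# Transliterated Arabic term written with precomposed Unicode letters
# (h with dot below, i with macron); the sample word is an illustration only.
TRANSLITERATED = "\u1e25ad\u012bth"

# An Arabic-script word next to a Hebrew-script word, to mimic a document
# that mixes two right-to-left scripts.
MIXED = "\u062d\u062f\u064a\u062b / \u05e9\u05dc\u05d5\u05dd"


def fits_in(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def describe(text):
    """List each character as (code point, official Unicode character name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


if __name__ == "__main__":
    for code_point, name in describe(TRANSLITERATED):
        print(code_point, name)
    # cp1256 is the Windows Arabic code page, cp1255 the Windows Hebrew one.
    for encoding in ("ascii", "latin-1", "cp1256", "cp1255", "utf-8"):
        print(encoding, fits_in(TRANSLITERATED, encoding), fits_in(MIXED, encoding))
```

Under these assumptions, only the Unicode encoding (UTF-8) holds both samples; each single-byte code page covers at most one of the scripts involved.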

In the field of academic publishing, full implementation of the Unicode standard is a year or two away as desktop publishing applications do not provide multilingual Unicode support. Publishers are receptive to this standard and realize the potential for making the publishing process from the manuscript stage to the printing process more efficient and cost-effective.

Windows 2000 Professional (NT 5)
Gone are the days of the invidious blue screen system crash to which Windows 95 and 98 were susceptible. While Microsoft Windows 98/ME (Arabic, Hebrew and enabled versions) was adequate for users who required only one RTL language, Windows 2000 has broken the language barrier with its implementation of the Unicode standard and support for a wide range of Windows, PC and Macintosh encoding formats. With support for 120 language groups (including variations on Arabic, English, French, German and so forth), from Afrikaans, Arabic and Basque to Sanskrit and Thai, Windows 2000 currently offers the best solution for those who need to use more than one Middle Eastern language. Syriac will also be officially supported in MS Office 10 and in the Windows NT operating system slated for release next year. No other operating system offers the functionality, the multilingual capabilities and the versatility to exchange documents in the various encoding formats that make texts readable.

Microsoft Windows 2000 is essentially an operating system for organizations. Indeed, most users will not have reason to take advantage of the many administrative and utility features packed into the OS. However, it has much to offer academic users and institutions. A multilingual version intended for academic institutions is slated for release in the second quarter of this year. The only difference between this and the Professional release is the ability to change the user interface and menus. However, for most individual users this feature is not needed. Windows 2000 can be deployed in language and learning labs, particularly for advanced students who need to produce essays in Middle Eastern languages as well as for visiting scholars who may prefer working in their native language. Windows 2000 is also an ideal platform for the development of language learning applications, which take advantage of Unicode support and the advanced features of Windows 2000.

Academic institutions in the Middle East will find the multilingual or localized versions of Windows 2000 (Arabic, Hebrew, Turkish) to be superior to Windows 98/ME in every respect. Most users who only use either Arabic or Hebrew along with European languages will continue to benefit from Windows 98/ME Arabic or Hebrew. Unfortunately, users running Word 2000 under Windows 98/ME cannot easily use Persian and Hebrew, because in Windows 98/ME languages are OS dependent. In Windows 2000, by contrast, additional language groups can be installed as needed through Regional Options in the Control Panel folder (e.g., Arabic [includes Persian and Urdu], Armenian, Greek, Hebrew, Turkic, and so forth). Additional input languages and keyboard layouts can be added by clicking the “Input Locales” tab.

Before upgrading to Windows 2000, users should consider the hardware requirements. In practice, at least 64 MB of RAM, a Pentium 233 processor and a 2-3 GB hard drive are needed, even though Microsoft recommends only a 2 GB hard drive and a Pentium 133. While it is possible to install Windows 2000 over Windows 95 or 98, a clean install is highly recommended.

Transliteration
Many of us have experienced the difficulty of using extended ASCII transliteration fonts and converting them for simultaneous use on Macintosh and the PC. Cross-platform exchange of documents containing complex scripts was virtually impossible. Currently, one of the best commercial transliteration fonts available for Arabic, Hebrew, and Persian is Linguist’s Software’s Semitic Transliterator, which is available in six different typefaces. Linguist’s has been dedicated to providing specialized fonts and other products, along with reliable technical support, to linguists, Semiticists and scholars of Biblical languages. Although not Unicode-compliant, Semitic Transliterator, which was updated in 1999 for compatibility with the latest application software, is useful for scholars of Biblical languages and Ancient Near Eastern languages such as Akkadian. The installation diskettes come with additional transliteration fonts and keyboard drivers for Akkadian. Although the set-up process can be discouraging for the computer novice, Linguist’s simplifies it by providing two detailed step-by-step manuals covering installation and troubleshooting issues. In addition to the font, Linguist’s keyboard drivers need to be installed, and Linguist’s provides a keyboard chart showing the key combinations. As with older versions, the updated version of Semitic Transliterator also requires disabling AutoCorrect features in Word 97 and Word 2000. Unfortunately, the updated TrueType font displays poorly in Word 2000 in 12 point at 100% magnification; working at 125% or higher magnification is recommended. Despite this drawback, laser printing produces high-quality output. Experienced users of Windows 2000 need not install the accompanying keyboard drivers, but instead may opt to access the special characters by creating a simple macro using the Symbol menu option in Word 2000 and assigning keyboard shortcuts.
Although a Macintosh version of this font is available, the character mappings are not identical and finding and replacing characters in converted documents is not a simple process. While Semitic Transliterator is not Unicode-compliant, another font is.

Monotype Corporation’s Arial Unicode MS (Helvetica style), which is included with Microsoft Office 2000, accommodates the scripts of many of the world's languages. Arial Unicode MS produces high quality output to screen and printer. Unfortunately, Microsoft and Monotype currently do not have plans to add the necessary Unicode ranges to Courier New, Times New Roman and other standard fonts. Arial Unicode MS does include IPA and transliteration characters with macron, breve, circumflex, caron, dieresis, dot and other diacritic marks. It also includes all the characters for transliterating Arabic, Hebrew, Persian and Ottoman Turkish. Nevertheless, accessing these special characters even in Windows 2000 requires a little effort. The easiest way is to create keyboard shortcuts (assigning Alt, Ctrl, Alt Gr and Shift key combinations to a character) in a new document using the Symbol sub-menu item on the Insert menu and saving the modified configuration as a Word template. Fortunately, users need not disable AutoCorrect features. Arial Unicode MS also works well in MS Access 2000 with a keyboard macro. One foreseeable problem with adopting a new transliteration font is converting word-processing files that contain another font. There is no easy solution. One might create a find-and-replace macro in MS Word 2000 or perform a global search and replace. In order to avoid unintentionally replacing characters from European languages, it is best to approve each change. It is to be hoped that font manufacturers in conjunction with academic institutions will continue to produce Unicode-enabled fonts to meet the multilingual needs of the academic community.
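The find-and-replace conversion described above can be sketched outside Word as well. The following Python sketch is an illustration under stated assumptions and is not a feature of any product reviewed here: the font name LegacyTranslit and the mapping table are hypothetical, and a document is modeled simply as a list of (font name, text) runs. Real legacy fonts each have their own character assignments, which would have to be taken from the vendor's keyboard chart.

```python
# Hypothetical name of an older, non-Unicode transliteration font.
LEGACY_FONT = "LegacyTranslit"
UNICODE_FONT = "Arial Unicode MS"

# Hypothetical mapping: the character slot the legacy font repurposes,
# followed by the Unicode character that should replace it.
LEGACY_TO_UNICODE = {
    "\u00e4": "\u0101",  # a-dieresis slot -> a with macron
    "\u00ef": "\u012b",  # i-dieresis slot -> i with macron
    "\u00fc": "\u016b",  # u-dieresis slot -> u with macron
    "\u00a7": "\u1e25",  # section-sign slot -> h with dot below
}
TABLE = str.maketrans(LEGACY_TO_UNICODE)


def convert_runs(runs):
    """Remap only the runs set in the legacy font; leave other runs untouched."""
    converted = []
    for font, text in runs:
        if font == LEGACY_FONT:
            converted.append((UNICODE_FONT, text.translate(TABLE)))
        else:
            converted.append((font, text))
    return converted


# A German phrase in an ordinary font followed by a transliterated word
# typed in the legacy font.
SAMPLE = [
    ("Times New Roman", "Universit\u00e4t T\u00fcbingen: "),
    (LEGACY_FONT, "\u00a7ad\u00efth"),
]

if __name__ == "__main__":
    for font, text in convert_runs(SAMPLE):
        print(font, text)
```

Restricting the replacement to runs in the legacy font is one way to avoid the problem noted above: the umlauts in the German phrase are left alone, while the same character slots inside the transliterated word are remapped.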

Microsoft Office 2000 Professional
Office 2000 Professional includes a word processor, database, presentation manager, spreadsheet and e-mail applications in addition to the multilingual RTL-capable Internet Explorer 5 web browser (also downloadable for free from the Microsoft web-site). All Office applications allow the input of Arabic and Hebrew. This review focuses on two of these applications: MS Word 2000 (word-processing) and Outlook 2000 (e-mail). In the past, PC users have used programs containing built-in Arabic and Hebrew fonts (incompatible with Microsoft Word or add-in programs) which allowed the importation of a limited amount of text in a Middle Eastern language. Word 97 and Word 2000 as stand-alone applications with Windows 98/ME make it possible to use only one Middle Eastern language apart from Turkish. For Arabic or Hebrew this meant using enabled or localized versions, because RTL languages are operating-system dependent. The powerful combination of Office 2000 and Windows 2000 allows users to customize the user interface via the Microsoft Office Language Settings panel (accessible through the Windows Start menu) in one of sixty-three languages and variations, including Arabic and Hebrew. Institutional users can acquire the MultiLanguage Pack (a set of seven CDs available through Microsoft institutional licensing programs) with which they can install proofing tools (spelling checker, grammar checker, hyphenation, thesaurus, translation dictionary for Arabic and Hebrew, and localized templates) and change the interface language (display, menus and help features) of all Office 2000 applications except for versions of Outlook (e-mail) other than the Arabic and Hebrew ones. Individual users can purchase Microsoft Office 2000 Proofing Tools from Microsoft’s on-line shop or an academic retailer for around USD 70. Unlike the MultiLanguage Pack, the Proofing Tools do not include the ability to customize the interface language.
Among the most useful Office language features are the translation dictionaries (Arabic-English, English-Arabic, Hebrew-English, English-Hebrew). While the Hebrew and Arabic spelling checker is fairly impressive, the grammar checker needs improvement. For instance, it cannot detect errors in subject-verb agreement. The English (US) version of Office 2000 comes with proofing tools for English, French, and Spanish. In switching languages, Word 2000 automatically detects the text language of a document. With Windows 2000 and Office 2000, users can choose from over fifty Arabic/Persian and fifty Hebrew fonts, including OpenType (OTF) fonts, the latest in font technology, which cover a number of languages. OpenType fonts include Arial, Courier New, Lucida Sans Unicode, Tahoma and Times New Roman. MS Office 2000 users can also install a visual keyboard, which can be displayed using any font in all MS applications. Clicking on a key will copy the character to the application window. Users who only require Arabic or Hebrew can purchase an Arabic or Hebrew localized version of Office 2000 instead of the additional proofing tools.

In Word 2000, the user can open and save documents in various PC and Macintosh formats with conversion to and from Macintosh Arabic and Hebrew formats (though no converters are currently available for NisusWriter documents). To save, select “Save As” from the File menu and select “encoded text” under “Save as type.” Files can be opened the same way by first ensuring that conversion confirmation is checked in Word Options and then selecting the appropriate encoding. For Arabic and Hebrew Macintosh formats, document formatting is lost in the process. Documents can be given file names in Arabic, Hebrew, Persian, Turkish and the other languages installed on Windows 2000. Office also comes with a stand-alone code page and text layout conversion utility (Conv Text), which includes a variety of PC formats for converting Arabic and Hebrew text files.
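What an encoded-text conversion does to a plain-text file can be sketched in a few lines. The following Python sketch is an illustration only and does not reproduce Word's converters or the Office conversion utility; the function names and the sample word are assumptions. It uses two Arabic encodings that ship with Python, the Windows code page (cp1256) and the ISO character set (iso8859_6), and shows why the source encoding must be chosen correctly when a file is opened.

```python
import tempfile
from pathlib import Path


def convert_bytes(data, source_encoding, target_encoding="utf-8"):
    """Decode bytes from one character encoding and re-encode them in another."""
    return data.decode(source_encoding).encode(target_encoding)


def convert_file(source, target, source_encoding, target_encoding="utf-8"):
    """Convert a plain-text file between encodings; no formatting is involved."""
    text = Path(source).read_text(encoding=source_encoding)
    Path(target).write_text(text, encoding=target_encoding)


ARABIC = "\u0633\u0644\u0627\u0645"  # sample word in Arabic script

windows_bytes = ARABIC.encode("cp1256")    # Arabic (Windows)
iso_bytes = ARABIC.encode("iso8859_6")     # Arabic (ISO)


def demo_roundtrip():
    """Write a Windows-encoded file, convert it to UTF-8, and read it back."""
    with tempfile.TemporaryDirectory() as folder:
        source = Path(folder) / "windows.txt"
        target = Path(folder) / "unicode.txt"
        source.write_bytes(windows_bytes)
        convert_file(source, target, "cp1256", "utf-8")
        return target.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(windows_bytes.hex(), iso_bytes.hex())
    # Reading Windows-encoded bytes as ISO text yields the wrong letters.
    print(windows_bytes.decode("iso8859_6") == ARABIC)
    print(demo_roundtrip() == ARABIC)
```

The same word is stored as different bytes in the two encodings, so opening a file with the wrong one produces unintelligible text rather than an error message, which is the situation the conversion confirmation dialog is meant to prevent.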

Outlook
Sending Arabic and Hebrew e-mail has never been easier than with Outlook 2000. Using Outlook 2000 will facilitate closer cooperation among colleagues in the Middle East and elsewhere. This e-mail program comes with an impressive repertoire of productivity features, including an appointment/meeting calendar with international holidays, a task manager and an address book. Users who wish to send multilingual text can do so by using a Unicode-enabled font such as Arial Unicode MS or Times New Roman, or by attaching a Word 2000 document to the e-mail message. In addition to Unicode (UTF-7, UTF-8), Outlook supports sending and receiving e-mail in Arabic (ISO, Windows) and Hebrew (ISO-logical, Windows) encodings. Colleagues who do not own Outlook 2000 can use the free scaled-down version, Outlook Express (downloadable from the Microsoft site), on an Arabic- or Hebrew-enabled platform or on Windows 2000. The principal fault with Outlook 2000 is that, unlike other MS Office applications, its visual display is not Unicode-based and cannot be updated with the institutional MultiLanguage Pack. In addition, the address book in editions other than the Arabic and Hebrew ones cannot store Arabic and Hebrew names. Although MS Office 2000 Professional with the Proofing Tools is a bit pricey, the combination is worth it for someone working in multiple Middle Eastern languages.
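How an e-mail message declares its encoding can be sketched with Python's standard email package. This is an illustration under stated assumptions, not a description of how Outlook works internally; the function names, the subject line and the sample words are hypothetical. Each message labels its body with a character set (Windows Hebrew, Windows Arabic or UTF-8), and the receiving side uses that label to decode the text.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(body, charset):
    """Create a plain-text message whose body is stored in the given charset."""
    msg = EmailMessage()
    msg["Subject"] = "Encoding test"
    # base64 keeps the encoded bytes intact on mail routes that are not 8-bit clean.
    msg.set_content(body, charset=charset, cte="base64")
    return msg


def roundtrip(body, charset):
    """Serialize a message, parse it again, and return (declared charset, text)."""
    raw = build_message(body, charset).as_bytes()
    parsed = message_from_bytes(raw, policy=policy.default)
    return parsed.get_content_charset(), parsed.get_content().rstrip("\r\n")


HEBREW = "\u05e9\u05dc\u05d5\u05dd"   # sample word in Hebrew script
ARABIC = "\u0633\u0644\u0627\u0645"   # sample word in Arabic script

if __name__ == "__main__":
    print(roundtrip(HEBREW, "windows-1255"))
    print(roundtrip(ARABIC, "windows-1256"))
    # Only a Unicode encoding can carry both scripts in one message body.
    print(roundtrip(HEBREW + " " + ARABIC, "utf-8"))
```

A message that mixes Hebrew and Arabic in one body needs a Unicode encoding, since each Windows code page covers only one of the two scripts.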

Conclusions

Adoption of the Unicode standard in institutional computing and in the publishing market will lead to further globalization of information technology. Users desire the ability to access databases and download multilingual text in universal and readily exchangeable formats. Companies like Microsoft have made this possible. A pressing need exists for Unicode support in bibliographic software as well as library databases and for users to be able to access and exchange data in as few formats as possible. With the adoption of these innovative solutions, colleagues in the Middle East and the West will be able to work more closely on collaborative projects.

1 The author would like to thank Dr. Kenneth Whistler of the Unicode consortium and Sybase for this figure.