<<

TECHNICAL UPDATE David Herren Middlebury College

WORLDSCRIPT, , all of the information that a computer stores AND NISUS is recorded in binary code, something which looks like this: 01101011. Why is it this way? WorldScript is a relatively new piece of Well, because computers really don't even the operating system that is par­ understand zeros and ones in the same way ticularly interesting to those who word that we do. They only know about whether process in foreign languages, and of even a tiny voltage is off or on, and humans un­ greater interest to those who work in the derstand this concept better in terms of less commonly taught languages such as zeros and ones. Now, if zeros and ones were Arabic, Chinese, Japanese, and Russian. all that we humans needed to keep track of, Unicode is a proposed ~~replacement" for that would be great, but again, it's not ''just ASCII. Finally, 'll take a look at a word pro­ that simple." At the very least, we need to cessing program for the Macintosh that keep track of the characters we use to write takes particular advantage of WorldScript. our languages. How, then, are characters stored other than as zeros and ones? WorldScript: What Is It? Part 1: Background To store the characters or letters that we use in our languages, the computer needs Well, I really wish I could just tell you to use a series of zeros and ones to repre­ 11 '\ and be done with it, but it's not just that sent the letters. Most current computers use simple," contrary to a well-known politician a single byte to do this. What is a byte? Easy. from Texas. Before I describe WorldScript, I It consists of eight bits. OK, while true, that want to cover why it was needed and some still doesn't tell you anything. Our number computing history. system is base 10. In other words, we count from zero to nine in a 11 COlumn" before Computers and Memory

The average computer today uses David Herren is Special Assistant in memory to store and work with informa­ Technology to the Vice President for Lan­ tion. This much is common knowledge. guages and the Director of the Language Remember, however, that a computer is re­ Schools at Middlebury College in ally a stupid box and the only thing that it Middlebury, Vermont. truly understands are zeros and ones. Thus,

Vol. 27, No.1, Winter 1994 93 Technical Update rolling over to the next highest unit of ten. 256 character system where we began our A computer is base 2 (binary}, so it counts discussion. from zero to 1 before rolling over, and a byte refers to the fact that the computer uses 8 Now along the way, some standards had columns of zeros or ones (8 bits), and can to be applied: 01001101 had to mean the thus "roll over" 8 times. same thing regardless of the brand of com­ puter, or we'd really have problems So, if a computer uses 8 bits to record transferring information. This standard is characters, like this: 01001010, how many known as the ASCII (American Standard possible characters can be represented in Code for Information Exchange) table, and this system? That too is easy (grin): it's 2 to almost everyone agreed upon its use. the 8th power, or 28 as mathematicians (Where IBM was when this standardization might write it. That means 2 times 2 times 2 occurred and why they chose a different times 2 times 2 times 2 times 2 times 2, or, table for their mainframe computers, EB­ 256 possible characters. Thus, in a one-byte CDIC, instead of ASCII, is a mystery to me). system for representing the characters of a language, we could have 256 different char­ What's the Problem? acters. This is quite enough for most western languages, but clearly inadequate for the Well, clearly the problem is two-fold. thousands of characters needed for Japanese The "A" in ASCII stands for American, but or Chinese. Once again, however, it's not the Russians, the French and others also "just that simple." Computers often used to wanted computers that would work with only store characters using 4 bits of infor­ their languages. By using a full byte to rep­ mation resulting in only 128 possible resent character sets, there were finally characters. This is due to historical reasons enough slots, but the problem of the many including the fact that computers were Eastern languages remained-256 charac­ largely developed in the United States and ters doesn't even come close to being not , and that memory used to be re­ sufficient for Japanese or Chinese. Apple ally expensive. One hundred twenty-eight Computer made foreign character sets a characters were still adequate for English central focus of development-the original since we only have 26lower case letters, 26 Macintosh could word process in any of the upper case letters, 10 digits, special charac­ common western languages right out of the ters like "<>/[]{}®#," and a handful of box, and could print those character sets marks. The computer itself without jumping through any special needed a few characters for computer stuff hoops. Japanese and Chinese, however, re­ like carriage returns, tabs, line feeds, es­ mained a problem. Enter WorldScript. capes, nulls, and several other really interesting things, totaling 32 items. Thus it WorldScript: What Is It? was possible to fit all of this stuff in 128 Part II: The Answer "slots"-a beautiful system until someone decided to work in Russian, or Polish, or WorldScript is a new feature of the Spanish, or French, etc. These languages Macintosh operating system version 7.1 either have more characters in the , which involves an to the system like Russian, or include those troublesome implementing two-byte character represen­ accented which the computer must tation. In other words, instead of having consider as separate characters. Enter the 8 only 8 columns of zeros and ones to repre­ bit system and suddenly we're back to the sent the characters of a language, a Macintosh running WorldScript uses 16 94 IALL Journal of Language Learning Technologies Technical Update

columns of zeros and ones. How many char­ a green crescent representing Arabic; and a acters are possible now? Well, 2 to the 16th small rising sun and Apple logo icon repre­ 16 power, or, 2 , or, 2 times 2 times 2 times 2 senting Japanese. times 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 times 2 To create texts using these alternate times 2, or, 65,536 possible characters. That scripts, all I have to do is open my should be enough for most languages, WorldScript compatible word processor (see though one report I've read said that Chi­ the final section on Nisus) and begin typ­ nese has something on the order of 80,000 ing. Since I generally work in English, my characters. I am hopeful, however, that computer assumes that I want to begin most 65,000 characters will be sufficient for most documents in English. To switch into Rus­ undergraduate students. sian, for example, I can do any one of three things: (a) I can select one of the Russian Do I Need WorldScript? fonts that came with WorldScript (older Russian fonts don't work correctly for the You need WorldScript if you work on a most part), and suddenly the icon in the Macintosh in English and one of the follow­ menu changes to the Russian flag and ing: Arabic, Chinese, Japanese, Korean, or my typing comes out in Russian; (b) I could Russian. Although some Russian colleagues select the Russian flag I find in the new flag may argue with me on this, if they would menu, or; (c) I could hold down the com­ adopt WorldScript, their font problems mand key {83) on the keyboard and press would go away). You may need WorldScript the spacebar. This last system will toggle me for other languages as well, but I'm just not through all of the installed scripts. I find this familiar with all the languages that don't to be the easiest way to change scripts since use alphabets or that read right to left in­ my hands are already on the keyboard. stead of left to right. When working in just two languages like Arabic and English, for example, toggling How Do I Use WorldScript? back and forth is quite easy. In the case of Arabic, the direction of your word process­ Using WorldScript is very easy. Once ing changes with each switch as well (for you've decided that you need it, you just example, left to right switches to right to need to acquire the necessary extensions, left).

\ scripts and keyboard layouts. (See the sec­ tion below and the section about Nisus for WorldScript comes with an assortment the easiest way to get these.) Drop all the of TrueType and bitmap fonts for each of icons on top of your .1 system the supported languages. It's intelligent folder and reboot your computer. You won't enough to know that if I choose a particu­ notice much that is different except for a lar Russian font earlier in the document, small blue diamond icon at the right end of then I'll probably want to return to that font the menu bar. Depending upon how many when I toggle back and forth from English scripts you've installed, you'll find several to Russian, and it does this for me each time selections in the new menu, probably start­ I press {83)-spacebar. ing with a small blue diamond icon, followed by the icons representing the lan­ The real beauty here is that one can word guage of any installed scripts. On my own process in just about any language without computer now, for example, I see a blue dia­ having to reboot the computer to run the mond, indicating that I have the .S. system Arabic, Chinese, or Japanese version of the and keyboard map installed; a Russian flag; Macintosh operating system. This makes it

Vol. 27, No. 1, Winter 1994 95 Technical Update much easier to use in laboratories since any­ Where Can I Get WorldScript? one can read the menus-regardless of the primary language of the machine, the menu There are at least two ways to get selections for switching to the language you WorldScript. If you work in Japanese and want always appear in your language Chinese, you'll have to buy the extensions­ without rebooting. they aren't free-though this might change in the near future. I had a heck of a time How Do I 'JYpe in Japanese or Chinese? finding the Chinese and Japanese exten­ sions, and so I am going to include the Typing in Japanese and Chinese is un­ official Apple part numbers here. Even if believably easy. In fact, one could type in your campus computer reseller is not famil­ simple Japanese without being able to read iar with WorldScript, these part numbers Japanese. All you'd need to know is the pro­ should get the right software: nunciation of the words you want to type. • Japanese Language Kit: M1648/LL/ A For example, to type Fujiyama, I type: "fu" • Kit: M2368LL/ A and the computer converts the letters "f" Pricing will vary of course, but these kits and "u" into the Hirigana character for the should be available for under $150 each. Be sound "fu." To continue, I type "ji" which prepared when they arrive-the Japanese is then converted to Hirigana, "" which Kit includes something on the order of 12 is also converted, and finally "rna." What megabytes of fonts and the whole installa­ appears on screen at this point are the four tion is approximately 17 megabytes. The Hirigana characters of Japanese which rep­ Chinese Kit is even larger depending upon resent the of the word. Of course whether or not one installs both simplified most Japanese wouldn't use the and/ or traditional systems, and both come representation; they would use the with several optional, and very large, characters (the borrowed Chinese charac­ TrueType fonts. The Chinese Kit was only ters). To convert to Kanji, all I do is press officially released in October of 1993 and the spacebar and those four characters are first appeared on the Apple Higher Educa­ converted to the two Kanji which represent tion price sheet on October 21, 1993. Fujiyama. In ambiguous homophonic cases, WorldScript allows the user to select the If you work in Russian, Arabic, or any particular Kanji they intend for the context. of the western languages, it's even easier to get WorldScript, and it's free-sort of. All The Chinese system is similar, in that one you have to do is buy an academic copy of can type in , a system for represent­ Nisus, the greatest word processor ever ing the sounds and tones of Chinese. Thus, written for the Macintosh (in my opinion, to type the character which represents the of course). See the section on Nisus for more act one performs when petting one's cat, I information. would type "fu3"-the sound "fu" and the 3rd . If I don't know the tone, I can just UNICODE: A STANDARD type "fu" and I'll be offered all the charac­ ters which are pronounced "fu" and I'll Unicode is actually a company, Unicode, have to select the correct one. The Chinese Inc., formed by a group of computer com­ system supports not only Pinyin entry, but panies whose members include Microsoft, Dayi, Parrot, Zhuyin, and input IBM, Aldus, NeXT, Apple, GO, Sun, Meta­ methods for Traditional Chinese, and phor, Lotus, Novell, and the Research Pinyin, Wubi Xing, Wubi Hua, and Qiiwei Libraries Group. This group has been work­ for Simplified Chinese. ing on a two-byte standard different from

96 IALL Journal of Language Learning Technologies Technical Update

WorldScript. WorldScript grew out of Nisus: The Incredible Word Processor earlier solutions for the Macintosh and Japa­ nese or Chinese, but is internally One Macintosh word processing pro­ "unicode-aware" so that migration to gram stands far above the others: Nisus Unicode won't be all that difficult once the from Nisus Software. Nisus is simply in­ standard is fully agreed upon. Reportedly, credible. At the time of this writing it is the however, Unicode will support only 27,000 only Macintosh word processing program possible characters, rather than the 65,000 capable of producing Arabic, Chinese, Japa­ \ theoretically possible. Nonetheless, it is rela­ nese, Hebrew, Korean, and Russian all in the tively complete and includes many same document using the English version mathematical symbols, shapes (like fancy of the Macintosh operating system and "bullets", etc.) and support for many lan­ WorldScript. (, unfortu­ guages including , Oriya, and nately, simply doesn't work properly with . (If you know what these lan­ WorldScript. For example, if you pull down guages are, write to me and tell me the font menu, the WorldScript fonts appear something about them.) For more informa­ as garbage characters. When you attempt tion about Unicode and the proposed to type in one of the installed scripts, they standard, contact the work in the entry window, but when placed at [email protected]. in the document, the returns to gar­ bage. You can type correct WorldScript in As an interesting aside, Apple's new Nisus, then paste it into Word and it appears handheld computer line, the Newton, is correctly. However, if you attempt to edit Unicode-based and not ASCII- based. Thus, it, it reverts to garbage characters again.) though the current 1.04 release of the New­ ton operating system doesn't implement Why is Nisus so different? Well, for one alternate script recognition, the core tech­ thing it's much, no much, MUCH faster nology is there and it shouldn't be too than Microsoft Word 5.1. It handles graph­ difficult to add support for languages other ics much better than Word and its graphic than English. When you also take into ac­ editor is better than Word's. It will place count the fact that the Newton's graphics behind your text, in front of your handwriting recognition system is at least text, within your text, or flow the text partly supported by carefully watching the around the graphics. It includes the most as you write upon its screen, sophisticated search and replace engine of you can begin to imagine what a wonder­ any word processor I have ever encoun­ ful tool it is for Chinese and Japanese. tered. For example, just the other day I instructed Nisus to search a 30-page docu­ Well, all of this is wonderful and you're ment of mine for all text between quotation probably ready to run right out and buy marks that appeared in the New Century WorldScript (or a Newton) to begin word Schoolbook font. I wanted Nisus to replace processing in multiple script systems. That that text with the same text found in each would be great except for the fact that to instance between quotation marks (i.., not the best of my knowledge, there is only one with a set phrase), but deleting the quota­ word processing program that fully sup­ tion marks and changing the font to ports WorldScript-Nisus from Nisus Chicago. All of this unless the text included Software (formerly Paragon Concepts). numbers. This feat was accomplished mak­ ing a few simple and intuitive menu selections, and took about 2 seconds. Nisus also implements true style sheets and not

Vol. 27, No. 1, Winter 1994 97 Technical Update just the paragraph formats supported by left text (including Hebrew) very well. In Word and called style sheets. Middlebury's School of Arabic, it is the stan­ dard. It is bundled with most Macintosh Nisus is also different because of its very computers sold in Korea and has dominated clever design. All Macintosh files have two the Far Eastern market as Microsoft Word "forks," a data fork and a . The and WordPerfect have in the West. designers at Nisus Software decided that Nisus files should contain nothing but pure Of course, the best thing about Nisus is ASCII text in the data fork of the file. For­ the way it works with the less commonly I matting information and graphics are stored taught foreign languages. It does. Word in the resource fork of the file and the Nisus doesn't, and Microsoft hasn't announced program uses the resource fork to display support of WorldScript. The WordPerfect the file in its formatted form. Most word Corporation has announced support of processing programs completely ignore the WorldScript but all reports indicate that resource fork and store all formatting and there are spacing problems with the two­ data in the data fork. (That's why when you byte character sets, and that it doesn't try to recover such a file when it becomes support right-to-left languages such as ­ corrupted, the data that you recover is of­ brew or Arabic at all. In addition to Nisus, ten filled with garbage-bad plan if you ask other programs which support WorldScript me). The real genius of Nisus' file format is include: HyperCard, the FirstClass telecom­ that absolutely any word processing pro­ munications system (we can send messages gram or text editor on any kind of computer in Arabic and Japanese with ease on our can read Nisus files directly. The files may FirstClass conferencing system), and a little not display all the special formatting, but known text editor, Style. they don't have to be converted before they can be read, and they won't be filled with Getting used to Nisus if you've been garbage characters or little boxes. (Nisus can using a different word processor will take a also open and/ or save files in other com­ few days. Fortunately, I found it to be much mon formats such as Word, MacWrite, or more intuitive than other word processors WordPerfect, and it supports ' XTND I've used and often I could do things from technology as well). a single menu that used to take me two or three dialog boxes. Any keyboard com­ Another very nice feature of Nisus is its mands can be changed or added, and Nisus size. On my Macintosh, Word took just over will even print a list of all of your keyboard 6 megabytes of disk space. For that 6 mega­ shortcuts. bytes I got a spelling checker in English, a tutorial, some sample documents, and the The only drawback I see with Nisus is application. My complete installation of that it is copy-protected for the "complete Nisus is under 5 megabytes, but it includes flag" edition-the version which works the spelling and thesaurus dictionaries for with all languages. In order to save or print, English, Spanish, French, Italian and Ger­ one has to install a hardware key on the man, a tutorial, sample documents, and the Macintosh. That key is commonly referred application, which is only 564K. If you're to as the"dongle" and it plugs into the same tight on disk space, Nisus can run from a port of the Macintosh as the keyboard. The single BOOK floppy disk. keyboard cable then plugs into the dongle. Without the dongle, Nisus works as a demo Nisus is very popular in the Arabic­ only and you can't save or print. In practice speaking world because it handles right to the dongle hasn't been a problem for me

98 IALL Journal of Language Learning Technologies Technical Update

since I plugged it in and forgot about it. If foreign language dictionaries for one you travel with a powerbook, however, foreign language (other than Chinese and you'll have something else to carry around. Japanese, to which Apple retains exclusive Still, if you work in English and Japanese, license). it's the only game in town. Nisus is avail­ able from: Nisus Software, 107 South Cedros Contributions/suggestions for the "Techni­ Solana Beach, 92075; (800) 922-2993. If cal Update" column may be sent directly to you indicate an academic connection, the David Herren. Mailing address: The Language \ cost is $99 and they will include all of the Schools, Middlebury College, Middlebury, VT necessary WorldScript extensions and 05753; email: [email protected]

Vol. 27, No. 1, Winter 1994 99