
INTERNATIONALIZATION: Getting Started Guide, April/May 2007

Think Internationalization in Everyday Design

Change Your Encoding, Change Your Company

Unicode 5.0 From 50,000 Feet

Pierre Cadieux: A Career in Internationalization

New Internationalization Features of Microsoft Vista


Getting Started: Internationalization

In five years, how much has your life changed? Since we published our first Getting Started Guide for internationalization in 2002, changes in internationalization have been coming at the speed of the internet. Unicode has proceeded from version 3.0 to 5.0. The .NET platform has grown exponentially and moved into version 2.0. Perhaps half a billion more people have internet access as China and other countries get online.

But some things haven’t changed. If you want to sell products and services in other cultures, these questions still apply. Is it a good product? How adaptable is it for other languages and cultures? Can you easily change the language and script of the display or written text? Can you change culture-specific graphics to adapt to meaningful images for other cultures? Can your product handle other number, currency, date, time, address and telephone number formats?

In this Guide, we update your knowledge of internationalization: questions to ask, points to keep in mind, pitfalls to watch for, checklists for action and what resources are available. And brace yourself for the changes coming!

The Editors

Editor-in-Chief, Publisher: Donna Parrish
Managing Editor: Laurel Wagers
Translation Department Editor: Jim Healey
Copy Editor: Cecilia Spence
News: Kendra Gray
Illustrator: Doug Jones
Production: Sandy Compton
Editorial Board: Jeff Allen, Julieta Coirini, Bill Hall, Aki Ito, Nancy A. Locke, Ultan Ó Broin, Angelika Zerfaß
Advertising Director: Jennifer Del Carlo
Advertising: Kevin Watson, Bonnie Merrell
Webmaster: Aric Spence
Assistant: Shannon Abromeit
Intern: Callie Welch
Special Projects: Terri Jadick

Advertising: [email protected], www.multilingual.com/advertising, 208-263-8178
Subscriptions, customer service, back issues: [email protected], www.multilingual.com/subscribe
Submissions: [email protected]. Editorial guidelines are available at www.multilingual.com/editorial
Reprints: [email protected]

This guide is published as a supplement to MultiLingual, the magazine about language, localization, web and international software development. It may be downloaded at www.multilingual.com/gsg

CONTENTS

Think Internationalization in Everyday Design, page 3
Alan Horvath
Alan Horvath, managing director of STAR Group America, LLC, has a B.S. in business administration with a major in computer and information science.

Change Your Encoding, Change Your Company, page 5
Adam Asnes
Adam Asnes is founder of Lingoport, which develops Globalyzer software and provides internationalization development services.

Unicode 5.0 From 50,000 Feet, page 7
Richard Gillam
Richard Gillam is author of Unicode Demystified: A Practical Programmer’s Guide to the Encoding Standard. He works on the Global Name Recognition team at IBM.

Pierre Cadieux: A Career in Internationalization, page 10
Nancy A. Locke
Nancy A. Locke is a freelance translator, localization educator, multilingual desktop publishing specialist and a member of the MultiLingual editorial board.

New Internationalization Features of Microsoft Vista, page 13
Bill Hall
Bill Hall is an internationalization consultant, author of Globalization Handbook for the Microsoft .NET Platform and a member of the MultiLingual editorial board.

page 2 The Guide From MultiLingual


Think Internationalization in Everyday Design

ALAN HORVATH

Entering the global marketplace can be a daunting task for any company. It can be expensive and time consuming, so proper planning and execution are critically important. It is essential to understand the complete globalization process and this important fact: internationalize first, then localize!

This article describes some of the most common challenges within the internationalization process and how you can avoid time-consuming errors.

A matter of words

So many words are used to describe the actual process of preparing products for foreign markets that it is easy to become confused. Here are a few definitions to help clarify the process.

Globalization (g11n) includes a company’s decision to enter foreign markets, internationalize its products and localize its products.

Internationalization (i18n) is the process of designing a product such that it can handle multiple languages, cultural conventions and local infrastructures without the need for re-design.

Localization (l10n) is the process of physically, culturally and/or linguistically adapting a product for a target locale.

Translation is the process of converting words from one language to another. An experienced translator will be able to convey the technical details accurately while instilling native nuance and style in the translated text.

Software internationalization

Let’s say you’ve written a software application for sale in the United States. Throughout your code, you’ve hard-coded terms such as dollars and cents. You’ve also hard-coded symbols such as $ and used a period (.) for the decimal point and a comma (,) to separate numbers into groups of three. Now your company wants to market your software abroad, but the terms and symbols you used are inappropriate for other currencies.

There are two ways to prepare to enter foreign markets with your software product.

The first approach involves making separate copies of the software and replacing the terms and symbols with ones appropriate for each country. But what happens if a bug is reported? Now you have multiple versions of code to correct! And if you decide to add a new feature to your product, you’ll have to make changes to all versions.

For companies that intend to enter multiple markets, this method is out of the question. It underestimates the time needed to modify files that were never meant to be localized.

And there are other issues to consider. Does your code support Unicode so that you can handle a variety of scripts as easily as your native language? Does your code correctly search and sort characters in all of the languages you expect to use? These issues and many others must be addressed before entering international markets.

The second and preferred approach consists of internationalizing your software for the global market first and then proceeding with a more streamlined localization process. Therefore, before you begin translation of your product’s interface, text and files, you need to ensure that the product is internationally ready.

Ten tips for software internationalization

Software is a component that requires a great deal of attention. Once you have made the decision to go global, every design decision, either with the code or the user interface, will be affected.

1) Use Unicode functions and methods.

2) Third-party tools. Choose your tools carefully. Some tools, APIs and add-ins might not support Unicode. If you must use them, use the right character conversions.

3) User interface separation. Isolate your translatable resources. Hard-coded strings are very tricky to translate, and, since the code is constantly changing, they cannot be translated in parallel with the software code development. Maintain one core code base for all of the languages.

4) Concatenation. String concatenation is another source of problems for localization. Word orders and plural forms might be different in other languages. Adjectives might not agree in gender, number and case, thus creating catastrophic results.

5) Ambiguity. Keep translators in mind when writing the resource files. Add comments when a sentence could be ambiguous. It will improve the quality as well as reduce the amount of time needed for translation.

6) Expansion. Keep in mind that translated strings may expand compared to the source-language string. This will be particularly important with dialog boxes and menus. Leave at least 20% to 30% expansion room.

7) Design. Be careful when you create icons and bitmaps. You should avoid text in either, because the translated words might not fit and conversion can be expensive. Avoid any symbols with cultural connotations. They might be indecipherable or offensive in other countries. If required, make sure your product runs on different platforms: PC, Mac, Linux and so on.

8) Terminology. Check for terminology consistency. If you are not consistent in your software, the rest of the package will be even more inconsistent. Terminology management tools will help keep terminology consistent.

9) Locale testing. Check your code in the destination market to ensure that all locale issues are handled correctly.

10) Translation kit. Once your product is ready for translation, you should create a translation kit. This set of files should contain everything needed to translate and recompile the language resources as well as test the application.

You should also consider legal issues that may arise as you enter different markets. For example, there have been instances where software companies had their terms and agreements embedded in the product, but some of the terms were determined to be illegal in different jurisdictions. In many cases, the company’s

April/May 2007 • www.multilingual.com/gsg page 3


attorneys were required to define new clauses for certain parts of their contracts. Therefore, you should consider the legal circumstances in all target markets before beginning the translation process. Also note, software containing encryption technology can be subject to restriction, while communication software may be subject to telecommunication regulation in the target country.

Multibyte character support

Delivering translated products to Asian countries for the first time can be an exciting and challenging time for organizations, but getting it wrong can be very costly.

English software typically uses about 100 different characters to represent words and numbers. Asian languages, on the other hand, can use more than 10,000 symbols to display messages. To accommodate this, software systems used to rely on what are termed multibyte or double-byte character systems (DBCS) to store text/characters. Today, Unicode has replaced these systems, but be aware that occasionally you may need to provide multibyte character support.

A common error in software development is to use third-party software (software libraries, DLL, OCX and so on) that is not Unicode enabled. Sometimes the development teams do not realize this until they begin the translation process. As a result, the product then has to be re-engineered to fix the issues at additional expense.

Documentation internationalization

Documentation internationalization does not require as much planning as does software internationalization. Some simple guidelines can help you prepare documents that can be easily localized.

Desktop publishing software. With so many choices available, it can be difficult to choose the appropriate desktop publishing application. Your choice will be affected by the type of documents you are creating, the languages that you are targeting and whether or not translation tools will be used. You will also have to decide if the documents will be printed, electronically published or both.

If you are designing marketing documents, applications such as FrameMaker, InDesign and QuarkXPress produce good results. The major drawback is that the process needed to translate these documents is more complex. Leading TM tools do an excellent job of handling these file types.

In addition, if you plan on targeting Asian countries, you might have trouble finding a vendor with in-house expertise using the desktop publishing tools you selected, due to the high price of the Asian software versions and the software’s complexity.

Nevertheless, if you decide to use these applications, you should keep the following guidelines in mind:

• Avoid creating unlinked text boxes throughout the document. Some files that we have processed in the past contained hundreds of unconnected stories, and processing the files with some translation tools becomes very cumbersome. Instead, flow your text from one box to the other. When the file is translated, the expanded text will move from one box to the other automatically, even from one page to the other.
• Leave plenty of white space (20% to 30%) in the pages to allow for expansion. It is a good rule for all types of documentation, but particularly important with marketing material because the number of pages for such documents is usually fixed.
• Be careful when selecting screenshots for your marketing material. The component that you choose might be the last piece translated, and the localized screenshot might not be available at the time you go to print.
• Make sure that any in-line graphics are anchored to the surrounding text so that they can move when a paragraph goes, for example, from one page to another.

If you are designing long technical documents, you have more choices. You can still use applications such as InDesign or QuarkXPress, but they are not recommended because they lack the large-document handling features of applications such as FrameMaker or Word for Windows.

Graphics. Remember that the outsourcing of graphics creation and translation is expensive, particularly in Asian languages. However, careful planning and clever design of your graphics can eliminate a big part of your graphic localization costs. Removing the text from the actual graphics can reduce your costs by more than 95%. Remove all localizable callout text from graphics and include it in the documentation’s text so that it can be added to a translation memory for re-use. Replace the callout text with numbered (not lettered) callouts, arranged clockwise on graphics, and cross-reference the numbers to the text in the main document. This allows automatic re-use of the same graphics in all localized versions.

Electronic vs. printed publishing. The obvious reason for electronic publishing is cost. It is much cheaper to create, package and ship a CD/DVD than a box of 15 manuals. The electronic book can always be printed if needed. Nevertheless, parts of your document set, such as a “Setup Guide” or an “Administrator’s Guide,” should still be provided in print format.

One-to-one page correspondence. This is an option that is rarely used but might make sense for your organization. Its main objective is to streamline the support operation. Suppose someone in a foreign country has a problem with your application, and first-tier support in that country cannot solve the problem and needs assistance from the next level. If you maintain a one-to-one page correspondence, everyone can be looking at the solution on the same page of your document. This approach also allows you to create standardized packaging materials since all your manuals will have the same dimensions. This option, however, requires more planning and may require you to sacrifice aesthetics.

An alternate approach to document internationalization. Consider using an information management system as an alternative to the traditional approach to document creation and management. In these systems, information can be entered simultaneously from any number of locations in any number of languages. Information is entered and stored in a pure and intelligent way, with no regard for final layout. The information exists only once and can be used to produce any number of publications in virtually any form (such as Help, HTML, DOC or XML) and in any language. By the very nature of this approach, information that is created and managed in this way is already “internationalized.” The result is an overall reduction in translation/localization costs with a simultaneous increase in quality and consistency.

Summary

Product internationalization is the most important step in the globalization process. Products must be designed to handle multiple languages, cultural conventions and local infrastructures without the need for re-design. Incorporating these guidelines into your internationalization strategy will save you time and money and go a long way in streamlining an otherwise complex process.
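The hard-coded “$” scenario and tips 3 and 4 above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the author’s method: translatable sentences sit in per-locale catalogs outside the program logic, each sentence carries named placeholders and separate singular/plural variants so translators can reorder words instead of patching concatenated fragments, and separators and currency placement come from a per-locale table rather than from the code. The catalog text, the `CONVENTIONS` table, the one/other plural rule and all function names are hypothetical; the German conventions are simplified (real locale data normally puts a non-breaking space before “€”), and production code would usually draw on maintained locale data such as CLDR.

```python
"""Sketch: externalized catalogs, named placeholders and locale-driven number formats.

All catalog text and locale conventions are hand-written, simplified examples
(hypothetical); real products usually take locale data from a source such as CLDR.
"""

# Tip 3: translatable text lives outside the program logic, one catalog per locale.
# Tip 4: whole sentences with named placeholders and separate plural variants,
# so translators control word order and agreement instead of code concatenation.
CATALOGS = {
    "en-US": {
        "cart_total": {
            "one": "Your {count} item costs {total}.",
            "other": "Your {count} items cost {total}.",
        },
    },
    "de-DE": {
        "cart_total": {
            "one": "Ihr {count} Artikel kostet {total}.",
            "other": "Ihre {count} Artikel kosten {total}.",
        },
    },
}

# Replaces a hard-coded "$", "." and "," with per-locale settings (simplified).
CONVENTIONS = {
    "en-US": {"decimal": ".", "group": ",", "currency": "${amount}"},
    "de-DE": {"decimal": ",", "group": ".", "currency": "{amount} €"},
}

FALLBACK = "en-US"


def plural_category(count: int) -> str:
    """English/German-style rule only; many languages need more categories."""
    return "one" if count == 1 else "other"


def format_number(value: float, locale: str, places: int = 2) -> str:
    """Format a number with the locale's grouping and decimal separators."""
    conv = CONVENTIONS.get(locale, CONVENTIONS[FALLBACK])
    neutral = f"{value:,.{places}f}"  # e.g. "1,234,567.89"
    # translate() swaps both separators in one pass, so they cannot collide.
    return neutral.translate(str.maketrans({",": conv["group"], ".": conv["decimal"]}))


def format_currency(value: float, locale: str) -> str:
    """Place the formatted amount inside the locale's currency pattern."""
    conv = CONVENTIONS.get(locale, CONVENTIONS[FALLBACK])
    return conv["currency"].format(amount=format_number(value, locale))


def message(key: str, locale: str, count: int, **values: str) -> str:
    """Look up a message, falling back to the source locale if it is untranslated."""
    variants = CATALOGS.get(locale, {}).get(key) or CATALOGS[FALLBACK][key]
    return variants[plural_category(count)].format(count=count, **values)
```

For example, `message("cart_total", "de-DE", count=3, total=format_currency(1234.5, "de-DE"))` returns “Ihre 3 Artikel kosten 1.234,50 €.”, while a locale with no catalog, such as "fr-FR", falls back to the English sentences. Keeping whole sentences in the catalog is the design choice that matters: the translator, not the code, decides word order and agreement.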


Change Your Encoding, Change Your Company

ADAM ASNES

It’s a mark of greatness when a company can effectively develop products and compete worldwide. Yet software internationalization is often one of life’s painful, forgotten labors that suddenly demands intense and panicked attention as it leaps out and grinds globalization plans to a halt. You’d think that enabling technology so that it’s easily leveraged for any market opportunity would be a pretty glamorous and exciting pursuit. With rare exception, the first, second or twentieth-plus time a company does this is still a painful effort that holds back global top-line revenue opportunities.

Of course, it doesn’t have to be this way, but when you look at the nature of how software is actually developed and comes to market, unless internationalization is a very firm requirement at the project’s outset, it shouldn’t be a surprise that it gets overlooked until it’s an ugly problem. This is not one of those “if only everyone always internationalized” diatribes. I hope to describe the business issues around internationalization, including the fundamentals of what it does for a company, the competitive implications, funding the effort, and managing and maintaining global market requirements.

One point I want to get beyond quickly is the belief that you can just force your translations without internationalizing software first. I get asked about this a few times per month, often by managers who are even in the localization business. Incidentally, developers never ask this.

In the case of some limited products, it may be possible not to internationalize, but it’s a bad idea anyway. You risk having a product that doesn’t work or works poorly. In the best scenario, your software can’t be leveraged across markets or even maintained from release to release. Not internationalizing is like throwing lots of money and resources away for an inferior result which has no future. For complex applications, it’s simply not going to work. Developers nearly always accept this, but it’s an abstract concept which management can have trouble understanding. Internationalization, when done well, allows you to support any locale requirement quickly. You have one product to support over time that’s good for the whole world. Your translations are easily updated from release to release.

Business issues

Several common events push a company to expand its product development to include locale-support requirements.

1. Somebody sold something. There has been some new marketing partnership or a new powerful customer opportunity that requires multi-locale support. A classic example is that the company gains a business contract that will necessitate supporting Japanese or another language. In some cases we’ve seen new license deals for entire countries, such as in health care or education. It’s a big hurry-up to meet the customer demands.

2. Localization is realized as a competitive necessity. Perhaps the company has already invested in global sales efforts and finds growth is limited given a poor competitive position without internationalization.

3. A global company has just purchased another company or intellectual property and wants to make the new product useful for its worldwide sales efforts and product line.

4. The CEO is mandating a new global initiative. This is an important new step for the company’s evolution. You can’t go to a management conference these days without hearing about globalizing revenue opportunities, and for good reason.

Top-line and bottom-line considerations

Software internationalization can have dramatic effects on top-line revenues and revenue plans as well as on bottom-line costs and profitability. It’s never just about minimizing a cost. You have to look at the whole picture to calculate return on investment and in terms of long-term changes in process.

When valuing any internationalization effort, give attention to top-line business questions such as these:

• How much does your company have riding on success in its target markets?
• What are the revenue projections over one, two and more years?
• What is the top-line cost of not having a product ready for a specific market opportunity?
• What is the impact to your company’s strategic partners or sales force if a product doesn’t work well or isn’t ready for a particular market?
• What is the result for your company’s equity value by stretching into new markets effectively?

And, of course, to get an effort funded, these bottom-line business issues need to be answered:

• What will it cost?
• How long will it take?
• Who is going to do the work?
• Do we have to give up other feature requests to prioritize internationalization?
• How can we improve the process?
• What expensive surprises do we need to watch for?
• How do we maintain the internationalized product going forward?

Learn your CFO’s language. He or she will want to understand the return on investment and may consider amortizing the effort as a capital expense. The decision isn’t about the technical issues of bits and bytes. Globalization is never just about one customer, sale or “language.” It’s a new engineering and company process that opens opportunities.

Development issues

At Lingoport, we’ve picked up the pieces often enough to see that internationalization projects have typically been frustratingly late, as in quarters to years, and rife with cost overruns.

Managing an internationalization effort, especially for the first time, can be challenging for a development team. Given the top-line sales and marketing objectives, there’s typically a shortage of time. Compounding this is that understanding the scope of requirements and detail of tasks for an internationalization effort is not an obvious thing for your development team. It’s tempting for many teams to just start out looking at embedded strings as the most obvious problem. While they are important and can be tedious, there’s much more involved. The issue of who does the work isn’t always obvious or easy. Chances are your development team members aren’t sitting on their hands looking for something to do. You’ll need to balance internationalization demands with new feature development, too.
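Teams that start with embedded strings soon find that a plain text search is not enough, because user-facing text has to be told apart from docstrings, debug output and SQL. The toy Python sketch below, which is not a description of Globalyzer or any other product, parses source with the standard `ast` module and reports string literals, skipping docstrings, arguments to `debug(...)` calls and literals that begin like SQL statements. The skip rules (`DEBUG_CALLS`, `SQL_PATTERN`) and the function names are hypothetical heuristics chosen for illustration.

```python
"""Toy finder for embedded (hard-coded) strings in Python source code.

The filtering rules are hypothetical heuristics for illustration: skip docstrings,
arguments to logger.debug(...)-style calls, and literals that look like SQL.
"""

import ast
import re

DEBUG_CALLS = {"debug"}  # e.g. logger.debug(...); extend to suit your codebase
SQL_PATTERN = re.compile(r"^\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)
DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def _call_name(call: ast.Call) -> str:
    """Return 'debug' for both debug(...) and logger.debug(...)."""
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    if isinstance(call.func, ast.Name):
        return call.func.id
    return ""


def find_embedded_strings(source: str) -> list[tuple[int, str]]:
    """Return (line number, text) for string literals that may need externalizing."""
    tree = ast.parse(source)
    skipped: set[int] = set()  # ids of literal nodes judged not to be user-facing
    for node in ast.walk(tree):
        if isinstance(node, DOC_OWNERS) and node.body:
            first = node.body[0]
            if isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant):
                skipped.add(id(first.value))  # docstring
        if isinstance(node, ast.Call) and _call_name(node) in DEBUG_CALLS:
            for arg in node.args:
                skipped.update(id(inner) for inner in ast.walk(arg))
    found = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Constant) and isinstance(node.value, str)):
            continue
        text = node.value
        if id(node) in skipped or not text.strip() or SQL_PATTERN.match(text):
            continue
        found.append((node.lineno, text))
    return sorted(found)
```

On a small function that logs with `logger.debug`, runs a `SELECT` and returns `"Welcome back, " + user`, only the greeting is reported; it is also a concatenated fragment of the kind tip 4 of the previous article warns about. Heuristics like these produce both false positives and misses, which is why detection on a real codebase needs far more than a sketch.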


Building requirements starts with identifying target locale requirements. The obvious issue is language, but there’s more to it regarding culturally sensitive formatting of data such as dates, times, numerical values, addresses and more. Changes to your database tend to have far-reaching effects throughout your application. Changes to programming logic or the graphical user interface further complicate things. As you might imagine, clearly built requirements will be an important pivot for all your efforts. Make sure that whoever is working on this isn’t doing it for the first or second time. There are so many pitfalls that even globalization architects who have led similar efforts many times are still learning.

For some companies, internationalization can be performed in stages. For example, it can start with supporting storage, retrieval and processing of customer data for Unicode or ISO-Latin encoding. A second possibility might be more completely internationalizing but limiting the team to addressing Western European locale requirements. Others may require a full Unicode enablement to support “double-byte” locales such as Japan, China and Korea. You can separate the ideal from the practical if need be and consider optimizing business decisions regarding budget, long-term plans and competitive market needs.

Figuring out the scope of an internationalization effort, assigning resources and planning expenses can be challenging. Architectural changes, third-party product issues such as graphics and reporting tools, installers, databases and more must be accounted for. You have to find and fix internationalization issues buried within your hundreds of thousands to millions of lines of code. Without a strong detection, extraction and refactoring tool, this alone has the potential of being an error-prone and time-consuming iterative process. Embedded strings must be quickly and easily distinguished and filtered from programmatic elements such as debug statements or SQL queries. Externalization should be automated to reduce the potential for human error.

Every programming language has its own locale-limiting methods and functions as well as character encoding issues. As simple as HTML (including JSP, ASP, ASPX and so on) may be as a language, it takes some sophisticated programming to comb through it. C++ has hundreds of locale-limiting issues that are highly dependent on the target encoding and supported operating systems. Even Java and C# don’t internationalize themselves, though they were built to be considerably more internationalization friendly than most other languages. You also have to effectively distribute the knowledge of internationalization complexity to your development team. Our team created a tool to help analyze source and cut time and expenses. We now offer it as a standalone product and adapt it for scalable use among large development teams.

No tool will help you find what’s not in your code. You still have to be savvy with your architecture, with a long-term eye towards your product life cycle.

Localization, testing and beyond

Unless your company is only internationalizing to support managing multi-locale customer data but not localizing the database, you’ll likely be interested in when the role of localization comes in. It’s quite reasonable to dovetail string extraction efforts with your localization vendor so that your localized releases aren’t dependent on first completing the entire internationalization effort.

For initial testing you can use pseudo-localization. To do this, add new characters from your target locales to either side of a string, expanding the string as needed. This lets you make sure that your product supports extended characters, resizing and the like, without having to wait for localization testing or needing to have your tester speak the target language. You’ll want to use pseudo-localization for the interface, as well as passing data and locale-formatted variables through your application’s database and functions.

Once you’ve received the translations from your localization company, you’ll need to perform linguistic testing as well. Expect that some translations may need to be adjusted: they may be technically correct but not the best choice given the specifics of your interface, product domain or word usage.

Finally, you need to create a sustained plan for systematically auditing new code development, making sure that new code doesn’t break internationalization requirements over the years. Make sure you have strong documentation on your internationalization architecture and procedures. That way you have a legacy that can be clearly followed over the years.

Through it all, I can’t overestimate the need to communicate. Most development efforts fail due to lack of clear requirements and ongoing communication. You’re going to have to blend internationalization objectives with new features. That means a clear development path, source control practices, testing processes, education, tools and cooperation among developers. You’ll have a whole world of new clients and worldwide stakeholders to support. And that fundamentally changes a company, with the opportunity to further make it great.

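The pseudo-localization step described in the article above (adding target-locale characters to either side of a string and expanding it as needed) can be sketched in a few lines of Python. The sample characters, the 30% default (borrowed from the 20% to 30% expansion allowance recommended in the first article) and the function names are assumptions for illustration, not part of any specific tool.

```python
"""Pseudo-localization sketch: wrap each UI string in characters from a target locale.

The sample characters, the 30% default and the function names are illustrative
assumptions, not part of any particular tool.
"""

import math


def pseudo_localize(text: str, sample: str = "日本語", expansion: float = 0.3) -> str:
    """Surround text with target-locale characters, lengthening it by about `expansion`."""
    if not sample:
        raise ValueError("sample must contain at least one character")
    # At least one character on each side; otherwise roughly `expansion` of the length.
    extra = max(2, math.ceil(len(text) * expansion))
    padding = (sample * (extra // len(sample) + 1))[:extra]
    left = padding[: extra // 2]
    right = padding[extra // 2 :]
    return left + text + right


def pseudo_localize_catalog(catalog: dict[str, str], **options) -> dict[str, str]:
    """Apply pseudo-localization to every string in a resource catalog."""
    return {key: pseudo_localize(value, **options) for key, value in catalog.items()}
```

Because the sketch only adds characters around each string, placeholders such as `{name}` stay intact, and running it over a whole catalog with `pseudo_localize_catalog` gives testers an interface that looks foreign and longer without any translation. Any text that still appears unwrapped on screen is probably hard-coded. Some pseudo-localizers also replace letters with accented look-alikes; that would be an extension, not something this sketch does.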


Unicode 5.0 From 50,000 Feet

RICHARD GILLAM

By now, there’s probably no one reading this magazine who hasn’t at least heard of Unicode. In its 15-year history, Unicode has become the character encoding standard of choice in new applications. It’s the default encoding of HTML and XML; it’s the fundamental character type in programming languages such as Java, C# and JavaScript; and it’s the internal character encoding in the Windows and Macintosh operating systems. Virtually all Unix flavors include support for it, too. Unicode is to computing in the twenty-first century what the American Standard Code for Information Interchange (ASCII) was to computing in the twentieth century.

If you’re just getting into software internationalization, Unicode is something you want to know about. It can make your life much easier, but it’s important to keep in mind just what Unicode is and isn’t. Just what it means to say you’re Unicode-based or Unicode-compatible can be rather squishy and is highly dependent on just what your application does. More importantly, supporting Unicode is neither necessary nor sufficient for writing an internationalized program. Unicode and internationalization are related, but very different, concepts. Unicode makes it easier to write internationalized programs, but you can write them without using Unicode. And you can very easily write Unicode-based programs that still aren’t internationalized.

Many articles in this guide will help you get up to speed on just what it means to write an internationalized program. The purpose of this article is to help you get up to speed on just what it means to support Unicode and which problems it does and doesn’t solve.

At first glance, Unicode can be quite an intimidating beast. The latest version, Version 5.0, sprawls across a 1,400-page book and a CD full of appendices, character property databases and other supplemental material, and comprises nearly 100,000 character assignments. This article can’t possibly cover all that, so what we’ll try to do is take the proverbial “50,000-foot view” of what Unicode is and what problems it solves. To go further, there are several good “introduction to Unicode” resources and a useful “cheat sheet,” and the standard itself is actually quite accessibly written.

Character encoding standards

Unicode is a character encoding standard. Computers don’t have any innate knowledge of text or characters or images or sounds; all computers really understand are numbers. To represent text in software, you adopt a convention where each character you need to represent is given a number. You decide that in your application, any time you see, say, the number 1 in a memory location you know is supposed to hold text, you interpret it as the letter A. When you see the number 2, it’s B and so on. Sequences of these numbers represent sequences of characters.

Further, text is so common that rather than having each developer adopt his or her own convention for representing text with numbers, the industry issues standards, official documents that define conventions for assigning numbers to characters. If two applications follow the same standard for representing text, they can pass text back and forth between each other, and they’ll both be able to do things with it properly.

The problem, of course, is that there are so many different standards. Most modern computing systems use ASCII or something based on it. ASCII was published in the 1960s by what is now the American National Standards Institute (ANSI) and uses the values from 32 to 126 to represent the 26 uppercase and lowercase letters of the English alphabet, the 10 digits, and various punctuation marks and symbols. The values from 0 to 31 and the value 127 were reserved for various control signals, and byte values from 128 to 255 weren’t used.

However, ASCII only includes codes for the letters in the English alphabet. Speakers of other languages don’t have codes for the letters of their alphabets. Even languages that use the Latin alphabet, such as French, are missing codes for the accented versions of the letters that they use. Since the byte values from 128 to 255 weren’t standardized by ASCII, computer vendors, national governments and other bodies came up with other standards that used these code values for the letters of other alphabets.

Now there’s a plethora of character encoding standards out there, each of which defines code values for a single language or a small group of related languages. There are several problems with this:

1) The standards are mutually incompatible. While you can usually count on the value 65 representing the capital letter A, the value 215 can represent lots of different characters, depending on the encoding standard.

2) Because most legacy encoding standards only encode a small number of characters for a small number of languages, mixing languages in a single document frequently requires changing from one encoding standard to another in the middle of the document, and there are oftentimes no mechanisms in the software for doing that or for reliably interchanging such documents with other applications.

April/May 2007 • www.multilingual.com/gsg page 7


3) Often, encoded text travels across media without any external indication of the encoding standard it follows. Software receiving a message of unknown encoding has to guess or simply assume. Many times, the sending software intends a particular numeric value to represent some character, and the receiving software interprets it as something totally different, producing garbage. If you’ve ever received an e-mail message with strange characters where you expected dashes or quotation marks, you’ve seen this problem in action.

Unicode was designed to solve these problems. The idea was to use a larger data type than a byte for each character and then give every character in every language its own unique numeric representation. This means you can mix languages freely in a document without the software having to worry about mixed encodings, and you can send text from one system to another without worrying about it getting mangled on the other end — as long as the sending and receiving systems both support Unicode.

It should be clear that Unicode doesn’t solve all your internationalization problems. You still have to translate the text. You still have to remember to call number-formatting and date-formatting routines that can produce different output for users of different languages. All Unicode does is make it possible to represent text in many different languages without having to keep track of the encoding or deal with data loss in interchange.

What Unicode does

Unicode is unique among character encoding standards in the sheer number of characters to which it assigns numbers — nearly 100,000 in the most recent version. Those 100,000 character assignments cover all of the characters in all of the writing systems for all the languages in common business use today, as well as the characters needed for many minority languages and obsolete writing systems, and a whole host of mathematical, scientific and technical symbols. Whatever the character you need, the chances are overwhelming that Unicode has it, and, if it doesn’t, no other encoding standard in reasonably wide use is going to have it either. This comprehensiveness makes it possible to represent text in any language Unicode covers, or any combination of those languages, without having to specify which character encoding standard your application or document is following, without having to change encoding standards in the middle of your document, and without going without characters because you can’t change encodings.

More importantly, Unicode is unique in approaching the business of assigning numbers to characters with far more rigor than any other encoding standard has attempted. The further away from the Latin alphabet you get, the less clear-cut using numbers to represent text becomes. In many writing systems, the letters don’t march in a nice orderly fashion from the left-hand side of the page to the right. In some, they go from right to left. In some, they knot together in complex ways. In some, they’re adorned with various accent, tone or vowel marks that attach to the letters in many different places. Straightening this out into a one-dimensional sequence of numbers is complex, and the right answer is often ambiguous.

It’s also not always clear just when two different squiggles are the same character and when they’re different. For example, in many writing systems, the shape of a letter can change dramatically depending on the letters around it, or two letters can merge into a totally different shape when they appear together. Different encoding decisions may need to be made for different scripts, yet you still have to be able to mix them in a document and have things work sensibly. There are many characters with similar appearances, leading to potential security issues that have to be addressed. You also can’t infer much about a character from its position in the code space or its appearance in the code charts. There are too many characters for that, with more being added all the time.
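The questions above, about when two different code sequences are “the same character” and about look-alike characters, can be explored with Python’s standard unicodedata module. This is a minimal sketch, not a complete solution: same_text is a hypothetical helper, and it assumes canonical (NFC) normalization is the notion of sameness you need, which will not suit every application. It shows a precomposed and a decomposed “é” comparing equal after normalization, compatibility normalization (NFKC) folding a ligature and an Arabic contextual letter form back to ordinary letters, and a Latin/Cyrillic look-alike that normalization deliberately leaves distinct.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# One "é" written two ways: precomposed U+00E9, or "e" + COMBINING ACUTE ACCENT (U+0301).
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"
print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: canonically equivalent

# Compatibility normalization (NFKC) folds presentation forms back to plain letters:
# the "fi" ligature U+FB01, and the initial form of Arabic HEH (U+FEEB -> U+0647).
print(unicodedata.normalize("NFKC", "\ufb01ve") == "five")   # True
print(unicodedata.normalize("NFKC", "\ufeeb") == "\u0647")   # True

# Look-alikes are NOT unified: Latin "a" vs. CYRILLIC SMALL LETTER A (U+0430).
latin = "bank"
spoofed = "b\u0430nk"
print(same_text(latin, spoofed))     # False: visually similar, different characters
print(unicodedata.name(spoofed[1]))  # CYRILLIC SMALL LETTER A
```

Because normalization does not catch look-alikes, spoofing checks need separate data; the Unicode Consortium publishes security guidance and confusable-character tables for that purpose.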

page 8 The Guide From MultiLingual


Because of these and many other issues, the Unicode standard goes far beyond any other character encoding standard in describing just how those 100,000 character assignments get used together to represent real text and how software should carry out various processes on the characters. For example, since you can’t infer things from a character’s position in the encoding space, the standard includes a very large database of character properties that lays out in tremendous detail the exact meaning of a character code: Is it a letter, a digit or a punctuation mark? If it’s a letter, is it uppercase or lowercase? Which character is its partner in the opposite case? If the character is a number, which numeric value does it represent? If it’s a diacritical mark, how does it attach to its base character? Is the character part of a right-to-left writing system? Does it join cursively to other characters? What other characters, both in Unicode and in other standards, is it equivalent to? How does it sort when compared to other characters? And so on and so on.

Unicode also includes many rules on how to do different things with encoded text. For example, because there are more assigned character codes than can fit in a single 16-bit word, Unicode includes methods of representing text using sequences of 8-, 16- and 32-bit values. There are also rules and guidelines not just for how to display a sequence of character codes, but for determining when two strings are the same, locating line and word boundaries, mapping strings to equivalent representations in other encoding standards, performing regular-expression searches or language-sensitive sorts, using Unicode in programming-language identifiers and much, much more.

What Unicode means for text handling

All of these things make it possible to handle more languages, and to handle them well, than any other character encoding standard. The various rules and guidelines in the standard help with many of the processes needed in handling internationalized text, making them easier or more powerful. There are many comprehensive software-internationalization packages that use Unicode as their base. Because of this, it’s frequently possible to write internationalized software without having to know all the nitty-gritty details of the Unicode standard. You can often stand on the shoulders of experts who have done most of the heavy lifting for you.

Tremendous blood, sweat and tears have gone into those 100,000 character assignments and their accompanying rules, guidelines and property databases. Unicode is not just the largest collection of characters ever encoded in a single standard. It’s the most comprehensive collection of rules, guidelines and best practices ever compiled for handling text in computer software.

You could write an internationalized application without using Unicode, but why would you?
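To make the property questions listed above concrete, here is a small sketch using Python’s standard unicodedata module, which exposes part of the Unicode Character Database. The describe helper and its field names are hypothetical, and the module covers only some of the properties the article mentions: it does not report cursive joining behavior or collation order, for example. The data also reflect the Unicode version bundled with your Python, which is newer than the Unicode 5.0 discussed here.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Hypothetical helper: a few Unicode Character Database properties of one character."""
    return {
        "code point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<no name>"),
        "category": unicodedata.category(ch),            # Lu, Ll, Nd, Mn, ...: letter? digit? mark?
        "other case": ch.swapcase(),                     # case partner (unchanged if uncased)
        "numeric": unicodedata.numeric(ch, None),        # numeric value, or None
        "combining class": unicodedata.combining(ch),    # nonzero for combining marks
        "bidi class": unicodedata.bidirectional(ch),     # L, R, AL, AN, ...: directionality
        "decomposition": unicodedata.decomposition(ch),  # equivalent sequence, if any
    }


# Latin A, precomposed é, vulgar fraction one half, Arabic-Indic digit five,
# Hebrew alef, and the combining acute accent.
for ch in ["A", "\u00e9", "\u00bd", "\u0665", "\u05d0", "\u0301"]:
    print(describe(ch))
```

Running it prints one dictionary per sample character: for instance, category Lu and case partner “a” for “A”, numeric value 0.5 for “½”, bidi class R for the Hebrew alef and combining class 230 for the combining acute accent.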

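The 8-, 16- and 32-bit encoding forms mentioned above, and the garbled text described earlier in this article, can both be reproduced with Python’s built-in codecs. This is a sketch under stated assumptions: code_units is a hypothetical helper, the big-endian codec names are used only so that no byte order mark is counted, and Latin-1, Windows-1251, ISO 8859-7 and Windows-1252 stand in for the many legacy encodings.

```python
def code_units(text: str) -> dict:
    """Hypothetical helper: code units needed for text in each Unicode encoding form."""
    return {
        "UTF-8": len(text.encode("utf-8")),            # 8-bit units
        "UTF-16": len(text.encode("utf-16-be")) // 2,  # 16-bit units (the -be codec adds no BOM)
        "UTF-32": len(text.encode("utf-32-be")) // 4,  # 32-bit units
    }


# U+1D11E (MUSICAL SYMBOL G CLEF) lies beyond 16 bits, so UTF-16 needs a surrogate pair.
for sample in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
    print(f"U+{ord(sample):04X}", code_units(sample))

# Legacy encodings: the same byte value (215) means different characters.
for legacy in ["latin-1", "cp1251", "iso8859-7"]:
    print(legacy, bytes([215]).decode(legacy))  # ×, Ч, Χ

# An em dash sent as UTF-8 but read as Windows-1252 turns into three characters.
garbled = "\u2014".encode("utf-8").decode("cp1252")
print(garbled)  # â, €, then a right double quotation mark
```

The last line reproduces the e-mail symptom described at the start of the article: each of the three bytes of a UTF-8 em dash becomes a separate Windows-1252 character.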

Pierre Cadieux: A Career in Internationalization

NANCY A. LOCKE

In the language industry, precise terminology — as both a means and an end — ranks high on the list of priorities. Ironically, the persistent ambiguity of the terminology that describes the processes used by the language industry also ranks high as a challenge. Buzzilicious euphemisms aside, the fuzziness of the terms globalization, localization and internationalization — perfectly serviceable words in other contexts that were then co-opted and re-purposed by the language industry — can be downright annoying. In a recent interview, Montréal-based internationalization consultant Pierre Cadieux describes how he wound up in internationalization, a field that by any measure is relatively new and, despite increasing awareness, not well understood. Not surprisingly, terminology on many levels emerges as a core issue.

Development of an internationalization engineer

When people dream of being their own bosses, a dream that often includes ditching the commute, the cube farm and “casual Fridays” to work from the casual comfort of home, they may well dream of what for Cadieux is a reality. When he is not traveling, he works from his home on a quiet, tree-lined street within walking distance of the antique shops, boutiques, restaurants and cafés in the city’s chic Plateau district. Red brick walls, hardwood floors, cathedral ceilings, exposed beams, a fireplace and, in the courtyard, a hot tub create a warm living environment. The office, a large room equipped with a battery of computers, multiple screens and cold steel bookcases, is all business.

Cadieux holds bachelor’s and master’s degrees in computer science, both from the Université de Montréal, in a concentration called Systems, which he describes as “compilers, operating systems, basically hard-core programming.” He spent eight years focusing on programming languages ranging from assembly languages to Pascal and FORTRAN. When asked if his university training included anything resembling internationalization, Cadieux laughs. “That would be a no,” he confirms. He adds that twenty years later he suspects that internationalization still does not figure in his alma mater’s computer science curriculum. He also suspects that the reason the subject is not taught is that it’s misunderstood. “They probably think it’s too easy, too simple, not worth a whole course.”

Luckily, since its launch in 2002, the Localization Certificate Program offered by the Faculty of Continuing Education at the Université de Montréal has recognized the importance of teaching internationalization. The program requires students to take an introductory course in the subject taught by Cadieux. Although the course is designed as a multidisciplinary program that accepts students with translation, computer science and business orientations, Cadieux says that an overwhelming majority of his students are translators.

The road to internationalization

During his seven years as a student, Cadieux also worked a minimum of 20 hours a week as a consulting programmer, so the transition from student life to a professional career posed no problems. He says, for example, that as a student, “I was a subcontractor to a teacher who got a contract on the space shuttle project trying to figure out the motion equations for the Canadian arm. And he figured it out.”

Cadieux also gained on-the-job training in building teams to tackle larger projects. At one point, he employed four people in order to fulfill a contract for Farrés Mattar, an “amazing” physicist whose focus was optical lasers.

After completing his master’s degree in 1982, and because he already had a solid background and track record in scientific programming, Cadieux was hired by Météo, French shorthand for the Meteorological Service of Canada. Although at the time Météo boasted one of the first fully functioning machine translation (MT) applications, Cadieux’s work did not bring him into contact with that aspect of the important government agency. He did, however, get to work on a supercomputer.

“I was one of the first, if not the first Canadian to actually develop software on a Cray,” he says proudly. According to a Wikipedia entry, in the 1970s and 1980s “IBM Corp. and Cray Research competed to be the maker of the fastest computer on earth. Cray won every time. . . .” The problem was that Météo’s Cray hadn’t been delivered yet, so Cadieux worked remotely on a Cray housed at the manufacturer’s site in Minnesota — an arrangement that slowed things down some. “It was slow and painful,” he admits. “But I managed to port my software to Cray, test it and it worked.”

After six months, having completed what he had set out to do at Météo, Cadieux went on to join former university colleagues who had found jobs with Alis Technologies. Compared to the huge and complex governmental environment, Alis was a small, young and dynamic company that more closely embodied Cadieux’s notions of an optimal work environment.

“My dream has always been to travel and work and be with a company where I could make a difference,” he says.

As “employee number 8” at Alis, Cadieux made a difference. His responsibilities included tasks well beyond programming. “Needless to say, I designed, I developed, I documented, I supported, I sold, I traveled.” At Alis, in addition to


realizing at least part of his career objectives, Cadieux also encountered internationalization for the first time.

By 1983, Alis, founded by Bachir Halimi, had already created a niche for itself in the globalization, internationalization, localization and translation domain — the “GILT space” in Cadieux’s parlance — by creating bidirectional, specifically Arabic-language, products. One of the company’s first notable accomplishments was the “arabization” of Multiplan, one of the first spreadsheet programs introduced by Microsoft and a precursor of the now ubiquitous Excel. The success of the Multiplan project and the superior macro-driven flexibility of the Planet technology developed by Cadieux led to Alis winning the contract for the arabization of MS-DOS in 1987. Cadieux describes his brainchild as “a first attempt at a general-purpose text-rendering” tool that went on to include multilingual keyboards and menus and became a “general-purpose library to create bidirectional and multilingual devices.”

In search of new challenges, Cadieux left Alis in 1995 and worked three years as the chief architect of the medical knowledge base for Purkinje, a developer of electronic medical record systems, where he first recognized the centrality of terminology to GILT processes. When asked the provenance of the name Purkinje, Cadieux waxes poetic. Jan Evangelista Purkinje (1787–1869) was a Czech doctor who discovered certain fibers that occur in both the heart and the brain. “It’s such a beautiful name for a computer science company for that reason,” Cadieux says.

At Purkinje, Cadieux says, the company’s “so-called medical knowledge base” was “really a terminology base, a massive amount of medical terminology, about 300,000 terms that were structured in such a way that a doctor could use a pen computer and quickly enter clinical information.” While a real departure from Alis, the job at Purkinje had some interesting similarities, including creating basic terminology. Programmers, Cadieux says, work in one programming language, while software architects, as the job title suggests, must lay the groundwork of a durable structure. An essential aspect of developing software architecture is “establishing some basic concepts and some basic terms and defining them.”

While working at Purkinje, Cadieux also grew to understand and appreciate the essentially “organic” nature of human language. Working closely with translators has only increased his respect for the complexities and difficulties involved in translation and language-based processes.

Entering internationalization for real

After leaving Purkinje, Cadieux reworked his curriculum vitae and decided to highlight his experience in internationalization. The rapid and enthusiastic response to the service offering somewhat surprised him. In short order, he won a contract to work with Bowne Global Solutions (BGS). Unfortunately, the dot-com debacle, which had a devastating impact across the board on the emerging language industry, cut short the company’s plans to develop a production unit in Los Angeles headed up by Cadieux.

In November 2000, however, an internationalization workshop in Montréal —


organized and animated by Cadieux and promoted by BGS — created new possibilities. The workshop attracted the attention of the Localisation Industry Standards Association (LISA), which invited Cadieux to present workshops at its events. Eventually, Cadieux was also invited to be the technical editor of The Globalization Insider, an online newsletter published by LISA. In collaboration with Bert Esselink, he developed and published a formula and a definition in an effort to clarify two oft-used and frequently misused terms:

Formula:
Globalization = Internationalization + N × Localization
(N is the number of targeted locales)

Definition:
Internationalization of a thing consists in any and all preparatory tasks that will facilitate subsequent localization of said thing.

Since his first workshop in 2000, workshops have become an important source of revenue for Cadieux’s company, i18n inc., along with internationalization audits and the development of software tools designed to support translation and localization professionals, with an emphasis on redundancy analysis.

Cadieux still takes a lively interest in the issues related to bidirectional languages, and he has developed a fascination with and an expertise in Asian languages. Living in Québec means that the official French-English language pair gets a fair amount of his attention. While Cadieux’s client list is heavy on software developers, his visibility at LISA, Unicode and other industry conferences has also made him the “go-to guy” when language service providers are asked to provide internationalization services.

The future of internationalization

As for future trends in internationalization, Cadieux focuses on Unicode, which forms “the foundation” of internationalization. He describes Unicode as “a major success which I respect enormously.” That said, he notes that work remains to be done.

To start with, he explains that any effort to create a “universal” standard requires that a certain level of specificity be compromised. And, logically, the burden of compromise falls most heavily on the new languages now being added. In addition, he says that the emergence of multiple encoding forms, such as UTF-8, UTF-16 and UTF-32 (the latter two with big-endian and little-endian variants), and the inconsistent application of existing rules create incompatibilities that will need to be addressed and may require a significant revamping, if not tomorrow, at some point.

Aside from Unicode, Cadieux stresses the importance of m-computing (mobile computing). Smaller devices mean smaller display screens, which represent a future challenge. Also, smaller mobile devices tend to be geared toward more personal needs, which will require a more significant internationalization effort.

Finally, Cadieux has noticed a significant, if slow, boom developing in the demand for internationalization services. Slowly but surely, he says, companies are waking up to the need for internationalization and the importance of addressing it earlier in the development cycle. Despite lingering confusion, the upsurge in demand may mean that the terminology and processes employed by the language industry are finally gaining currency.


New Internationalization Features of Microsoft Vista

BILL HALL

Vista is Microsoft’s latest operating system for the desktop. Along with Vista’s new appearance and features are additions to its supported locales, a newer version (3.0) of .NET, and a fix to a relatively unknown but useful pair of internationalization APIs for C++ programmers. Left over from Windows XP is a promising but not yet complete set of routines providing interesting information about world regions.

New locales in Vista

The locales supported on an operating system are important to companies that create software products, since a wider audience becomes potentially available. Nonprofit and charitable organizations are also interested for a variety of reasons, including providing an opportunity to those otherwise lacking the resources to participate in the world of computing. On Windows, the place to look is the Regional and Language Options applet, where you can find a list of currently supported locales. Vista now provides 205, an increase of almost 30% over Windows XP. In addition, the locale naming system has been regularized to the format found in .NET — a combination of language and region, or of language, script and region. As a result, approximately 70 of the previous locales have new names.

For example, a typical older form of a locale name in Windows XP is Afrikaans, whereas the new one is Afrikaans (South Africa); in place of Serbian (Cyrillic) are both Serbian (Cyrillic, Serbia) and Serbian (Latin, Serbia). In fact, the only remaining Windows locale with a single, language-only name is Persian, formerly Farsi. For the most part, a display name follows the Language (Region) or Language (Script, Region) pattern provided by the DisplayName property of the corresponding CultureInfo object in Microsoft .NET. Although the new locale names are certainly much more informative and consistent, the name changes may affect your previous globalization efforts if your code used the names as identifiers for retrieving information or for some other purpose. If so, you may want to review your current strategy when you begin a Vista project. Table 1 shows a few examples of old and new formats.

Table 1: A few of the differences between the old (Windows XP) and new (Vista) formats.

Windows XP           Vista                          Comments
Afrikaans            Afrikaans (South Africa)       Language only replaced by language and region.
Serbian (Cyrillic)   Serbian (Cyrillic, Serbia)     Region information is available.
Zulu                 isiZulu (South Africa)         Change in language name. “isi” is a prefix for language.
Farsi                Persian                        The only locale with a language-only name in Vista.
Faeroese             Faroese (Faroe Islands)        Spelling change of language. Both are acceptable forms.
Norwegian (Bokmål)   Norwegian, Bokmål (Norway)     Language and dialect format.
Sesotho sa Leboa     Northern Sotho (South Africa)  Change in language name and a reference to region.

Of course, the other interesting items are the new locales. The list is fairly extensive (40+ new locales), so we have broken it down into related groups. The information is contained in a series of tables with some comments about the language and occasionally the region itself. You might want to keep in mind that in the future, your organization may be required to support some of these locales.

Minority languages in Europe

Alsatian (France) — Low Alemannic German language spoken in the Alsace region of France.
Breton (France) — Celtic language of Brittany with about 250,000 speakers. Closely related to Cornish (revived in the twentieth century) and Welsh.
Corsican (France) — Corsican is closely related to Italian. The language is used at all levels of education in Corsica. A regional radio broadcast service is available, and books and occasional newspaper articles are published in the language.
Frisian (Netherlands) — Second official language in The Netherlands. Genetically the language most closely related to English: “Good butter en green tzieze (cheese) is good English en good Friese.” The modern versions of both languages have diverged over the several hundred years of separation.
Irish (Ireland) — Although a minority language today, Irish is recognized by the constitution as the national and first official language of Ireland. It is also an official language of the European Union and is recognized in Northern Ireland.
Lower Sorbian (Germany) — Spoken by a Slavic minority in Brandenburg by about 10,000 people. Sometimes called Wendish or Lusatian. There is also a Lusatian community in Texas.
Luxembourgish (Luxembourg) — Luxembourgish is a West Germanic language spoken in Luxembourg. It is one of three official languages; the other two are French and German. About 300,000 people speak Luxembourgish worldwide.
Occitan (France) — Occitan is spoken in Occitania (Southern France and Monaco), in a few valleys of Italy, and in the Aran Valley in Spain. Fewer than 500,000 proficient speakers live in France.
Romansh (Switzerland) — One of four national languages of Switzerland. The number of speakers is about 50,000 to 70,000 in the canton of Grisons. It is the smallest of the official languages of Switzerland.
Swedish (Finland) — Wikipedia gives a count of about 300,000 speakers. The current Finnish alphabet recognizes the special characters required for Swedish. It may seem strange to classify Swedish as a minority language, but the concept is relative to the region where it is spoken.
Upper Sorbian (Germany) — Spoken by a Slavic minority in Saxony of about


40,000 people. Sometimes called Wendish or Lusatian. There is Turkmen (Turkmenistan) — Until 1991 a republic of the Soviet also a Lusatian community in Texas. Union. Turkmen is a member of the Turkic family of languages.

New in the Balkans

Bosnian (Cyrillic, Bosnia and Herzegovina) — Windows XP supported a Bosnian locale based on Latin script. Windows Vista has extended the script support to Cyrillic.

New Russian locales

Bashkir (Russia) — Spoken in the Republic of Bashkortostan and other parts of the Russian Federation. A Turkic language, it is currently written with Cyrillic characters.

Yakut (Russia) — Sometimes known as Sakha. A member of the North Turkic language family, which also includes Shor, Tuvan, and Dolgan. Uses the Cyrillic script.

Additional locales in the Indian subcontinent

Generally, if you can handle Hindi, you should be able to manage these languages as well.

Assamese (India) — Indian language spoken in the state of Assam (Northeast India) as well as Bangladesh and Bhutan. The language is Indo-Aryan and is written with a version of the Bengali script. About 20 million speakers.

Bengali (Bangladesh) — An Indo-Aryan language with two literary styles (elegant and current) and about 200 million speakers. It is spoken in Bangladesh, West Bengal, and some western countries such as the United States and United Kingdom.

Nepali (Nepal) — Indo-Aryan language spoken in Nepal, Bhutan and some parts of India and Burma.

Oriya (India) — An Eastern Indo-Aryan language spoken mainly in the state of Orissa as well as other regions such as West Bengal and Jharkhand. About 31 million speakers.

Sinhala (Sri Lanka) — Sinhala is the Indo-Aryan language of about 12 million non-Tamil people of Sri Lanka.

Minority languages in the People’s Republic of China (PRC)

If you have been involved with GB 18030 certification, you will certainly recognize these four languages.

Mongolian (Traditional Mongolian, PRC) — Mongolian is written in both Cyrillic and traditional scripts. Few speakers know the traditional script, but it is beginning to be taught once again in schools. In the Inner Mongolia Autonomous Region of China, the classical Mongol script is still used.

Tibetan (PRC) — The Tibetan script is derived from the Brahmi script, but the language is classified as a member of the Sino-Tibetan family. The number of speakers is estimated at about 7.3 million.

Uighur (PRC) — Uighur is a language in the eastern branch of the Turkic group. Its speakers in the PRC live in the western part of China. The estimated population is over 8 million.

Yi (PRC) — Yi is a family of related Tibeto-Burman languages spoken by the Yi people, with about 6 million speakers.

New locales for Central Asia

Dari (Afghanistan) — Dari is a member of the Indo-Iranian group of languages, which also includes Pashto.

Pashto (Afghanistan) — Together, Dari and Pashto are the two official languages of Afghanistan.

Tajik (Cyrillic, Tajikistan) — Indo-European language of the Iranian group and the official language of the country. The language is normally written in Cyrillic, but other scripts are possible.

New Southeast Asian locales

Filipino (Philippines) — Before Vista, the only locale available for the Philippines was English (Philippines).

Khmer (Cambodia) — Language of the Khmer people and the official language of Cambodia. The script is based on the Pallava script of India. The language is not tonal.

Lao (Lao People’s Democratic Republic) — A tonal language of the Tai family. Lao is written in a script closely related to Thai. A second script, Tham, is also used; it was derived from the script used in Lan Na before the Thai script was standardized.

New locales for Africa

One has a name change; the others are new. The Setswana locale in Vista was the Tswana locale in Windows XP, so Setswana (South Africa) is not a new locale.

Amharic (Ethiopia) — The official language of Ethiopia.

Hausa (Latin, Nigeria) — Hausa speakers are located in Niger and the north of Nigeria. Hausa also acts as a lingua franca in West Africa.

Igbo (Nigeria) — Spoken in Nigeria by about 18 million people. Dialects are many but generally mutually intelligible. Interest is building in standardizing the written language.

Kinyarwanda (Rwanda) — Main spoken language of Rwanda. Also spoken in the eastern Congo and southern Uganda.

Kiswahili (Kenya) — Bantu language spoken by the people of eastern and central Africa. It is a national language in Kenya, Tanzania, and Uganda. Also called Swahili.

Tamazight (Latin, Algeria) — Berber languages (Tamazight) are spoken in Morocco and Algeria. Latin characters are the norm. Much discussion is taking place on standardizing the numerous language variations.

Wolof (Senegal) — Language spoken in Senegal, Gambia and Mauritania. Native language of the Wolof people.

Yoruba (Nigeria) — Native tongue of the Yoruba. Spoken in Nigeria, Benin, and Togo and in communities of Brazil, Sierra Leone (called Oku), and Cuba (called Nago).

New English locales

An Indian film director once said that “English is just another Indian language.” The concept has apparently been extended to the other two regions as well.

English (India) — Don’t expect to find the same formats for number, currency, time and date (long and short)!

English (Malaysia) — There is a strong demand for English teachers in Malaysia.

English (Singapore) — A government campaign encourages the local population to speak good English rather than “Singlish.”

A useful addition to the Spanish locales

Spanish (United States) — A locale that has long been needed.

New indigenous languages of Central and South America

K’iche (Guatemala) — Part of the Mayan language family, spoken by nearly a million people in the highlands of Guatemala. Has many dialects, but most are mutually intelligible.

Mapudungun (Chile) — Spoken in central Chile and west central Argentina by the Mapuche people. About 400,000 speakers.
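The English (India) warning is easy to underestimate. One well-known difference is digit grouping: Indian usage keeps the last three digits together and then groups the remaining digits in pairs (the lakh and crore system), so one million is written 10,00,000 rather than 1,000,000. The Python sketch below contrasts the two rules. The helper group_digits is hypothetical and hard-codes both conventions for illustration only; a real application should take grouping rules from the operating system's locale data instead.

```python
def group_digits(n: int, style: str = "western") -> str:
    """Insert comma separators into an integer (hypothetical helper).

    "western": groups of three digits, e.g. 1,234,567
    "indian":  last three digits, then groups of two, e.g. 12,34,567
    """
    if style not in ("western", "indian"):
        raise ValueError(f"unknown grouping style: {style!r}")
    if n < 0:
        return "-" + group_digits(-n, style)

    digits = str(n)
    if len(digits) <= 3:
        return digits

    # The rightmost three digits always form one group.
    head, tail = digits[:-3], digits[-3:]
    step = 3 if style == "western" else 2

    # Peel groups of `step` digits off the right end of what remains.
    groups = []
    while head:
        groups.insert(0, head[-step:])
        head = head[:-step]
    return ",".join(groups + [tail])


for value in (1234567, 100000, 10000000):
    print(value, group_digits(value), group_digits(value, "indian"))
```

Run as is, the loop prints 1,234,567 against 12,34,567, then 100,000 against 1,00,000 (one lakh), then 10,000,000 against 1,00,00,000 (one crore).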



Indigenous languages from Greenland and North America

Greenlandic (Greenland) — Eskimo-Aleut language spoken in Greenland. Sometimes called the Eastern Eskimo language. It is written using Latin letters, with a number of doubled vowels and consonants.

Inuktitut (Latin, Canada) — The language is used to some degree in schools and local government and on radio and television. It can be written using Latin letters.

Inuktitut (Syllabics, Canada) — A parallel locale, but using the Inuktitut syllabary.

Mohawk (Mohawk) — Mohawk belongs to the Iroquoian group of Native American languages. The script uses a basic collection of Latin letters. It has been taught in schools since the early 1970s, and there has been a standard form of the written language since 1993.

Geographical information

GetGeoInfo is a promising Windows API that seems to be a work in progress. It was available in Windows XP and appears not to have been upgraded since; the documentation states explicitly that the official-language and time zone information is not yet implemented. Even so, the information that is available is useful. Especially interesting was the hope of being able to read time zone and official language information for countries around the globe. But in case you are interested in what works, here are a couple of examples of the current output.

                        Azerbaijan                Bermuda
GEO_NATION              5                         20
GEO_LATITUDE            40.356                    32.303
GEO_LONGITUDE           47.869                    -64.752
GEO_ISO2                AZ                        BM
GEO_ISO3                AZE                       BMU
GEO_RFC1766             fr-az                     fr-bm
GEO_LCID                0000040C                  0000040C
GEO_FRIENDLYNAME        Azerbaijan                Bermuda
GEO_OFFICIALNAME        Republic of Azerbaijan    Bermuda
GEO_TIMEZONES           0                         0
GEO_OFFICIALLANGUAGES   0                         0

You have probably noticed the peculiar RFC 1766 identifiers (fr-az and fr-bm) as well. The documentation says you can fix this, but it is tiresome to determine exactly what they mean, especially when you have a tight publishing deadline.

Finding grapheme boundaries without using .NET

If you program in .NET, you know that there is a pair of classes, TextElementEnumerator and StringInfo, that work together to help you find grapheme boundaries in Unicode strings. Earlier versions of Windows also had a pair of C++ APIs, CharNext and CharPrev, that provided some of this information, and if you are not using .NET, they may be your only means (unless you fall back on ICU). In Windows XP they were present but did not work, although they had been functional in earlier versions of Windows. I don’t know when they broke, but apparently I was one of the first to notice. I passed the information on to Mihai Nita at Adobe, who verified the problem and sent it on to Microsoft. While the APIs are still not functional in XP, they work reasonably well in Vista.

Figure 1: Non-spacing characters in a string of Unicode elements.

It is easy to understand the usefulness of CharNext and CharPrev if they are your only tools. Let’s create a string of letters with the first having two non-spacing diacritic elements, in this case a combining candrabindu followed by a combining minus sign below. In the output of the demonstration program in Figure 1, you can see the values as well as the composed form of the string. Also note that only three characters actually appear in the output string even though the string length is five. The first character is clearly a grapheme cluster having three elements. The next two are simply single characters (B and C).

Now let’s test CharNext using this string. In the program, you enter the Unicode code points as a series of 16-bit hex values as parameters. Thus, the execution line is charw 0041 0310 0320 0042 0043. The important output is shown in Figure 2.

Result in XP        Result in Vista
0041 0310 1         0041 0042 3
0310 0320 1         0042 0043 1
0320 0042 1         0043 0000 1
0042 0043 1
0043 0000 1

Figure 2: Results of running CharNext.

In the XP case, the movement from beginning to end is one Unicode element at a time, ending at the null character. In the Vista result, the first movement crosses the three Unicode elements (“A,” the moon-dot and the minus sign below) that together make up the first grapheme. The next two moves traverse “B” and “C” and stop at the ending null character.

Admittedly, the output is not particularly pretty. The important point is that in the Vista results, the program treated the first letter and its two diacritic marks together as a single grapheme having three Unicode elements. So, the functionality of CharNext and CharPrev has been restored.

References
www.microsoft.com/globaldev/vista/whats_new_vista.mspx
www.microsoft.com/globaldev/vista/vistahome.mspx

An appendix to this article, which is a table showing old and new forms of existing locales in Windows XP and Vista, is part of the downloadable Getting Started Guide available online at www.multilingual.com/gsg
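The CharNext experiment can be imitated outside Windows to show what a grapheme-aware step function does. The Python sketch below is not CharNext and not the full Unicode grapheme cluster algorithm (UAX #29, which .NET’s StringInfo and ICU’s BreakIterator implement); it assumes a simplified rule in which a cluster is one character followed by any run of combining marks (Unicode general category M). The function names next_boundary and step_sizes are hypothetical. For the five-element test string 0041 0310 0320 0042 0043 it yields moves of 3, 1 and 1, the same pattern as the Vista column of Figure 2, whereas stepping one element at a time would reproduce the XP column. Python strings index code points rather than 16-bit units; for this string, where every character is in the Basic Multilingual Plane, the two coincide.

```python
import unicodedata


def next_boundary(s: str, i: int) -> int:
    """Return the index just past the grapheme cluster that starts at s[i].

    Simplified rule (an assumption, not full UAX #29): a cluster is one
    character followed by any run of combining marks (Mn, Mc or Me).
    """
    if i >= len(s):
        return len(s)
    j = i + 1
    while j < len(s) and unicodedata.category(s[j]).startswith("M"):
        j += 1
    return j


def step_sizes(s: str) -> list[int]:
    """Number of elements crossed by each move from the start to the end of s."""
    sizes, i = [], 0
    while i < len(s):
        j = next_boundary(s, i)
        sizes.append(j - i)
        i = j
    return sizes


# "A" + combining candrabindu + combining minus sign below + "B" + "C"
test = "\u0041\u0310\u0320\u0042\u0043"
print(len(test), step_sizes(test))  # 5 [3, 1, 1]
```

The simplified rule is enough for this example, but it ignores cases UAX #29 handles, such as Hangul syllable sequences and emoji, so production code should rely on a full implementation.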



Old and new forms of existing locales in Windows XP and Vista

Windows XP                          Vista
Afrikaans                           Afrikaans (South Africa)
Albanian                            Albanian (Albania)
Armenian                            Armenian (Armenia)
Azeri (Cyrillic)                    Azeri (Cyrillic, Azerbaijan)
Azeri (Latin)                       Azeri (Latin, Azerbaijan)
Basque                              Basque (Basque)
Belarusian                          Belarusian (Belarus)
Bulgarian                           Bulgarian (Bulgaria)
Catalan                             Catalan (Catalan)
Croatian                            Croatian (Croatia)
Croatian (Bosnia and Herzegovina)   Croatian (Latin, Bosnia and Herzegovina)
Czech                               Czech (Czech Republic)
Danish                              Danish (Denmark)
Divehi                              Divehi (Maldives)
English (Philippines)               English (Republic of the Philippines)
English (Trinidad)                  English (Trinidad and Tobago)
Estonian                            Estonian (Estonia)
Farsi                               Persian
Faeroese                            Faroese (Faroe Islands)
Finnish                             Finnish (Finland)
FYRO Macedonian                     Macedonian (Former Yugoslav Republic of Macedonia)
Galician                            Galician (Galician)
Georgian                            Georgian (Georgia)
Greek                               Greek (Greece)
Gujarati                            Gujarati (India)
Hebrew                              Hebrew (Israel)
Hindi                               Hindi (India)
Hungarian                           Hungarian (Hungary)
Icelandic                           Icelandic (Iceland)
Indonesian                          Indonesian (Indonesia)
Xhosa                               isiXhosa (South Africa)
Zulu                                isiZulu (South Africa)
Japanese                            Japanese (Japan)
Kannada                             Kannada (India)
Kazakh                              Kazakh (Kazakhstan)
Konkani                             Konkani (India)
Korean                              Korean (Korea)
Kyrgyz                              Kyrgyz (Kyrgyzstan)
Latvian                             Latvian (Latvia)
Lithuanian                          Lithuanian (Lithuania)
Maltese                             Maltese (Malta)
Maori                               Maori (New Zealand)
Marathi                             Marathi (India)
Mongolian (Cyrillic)                Mongolian (Cyrillic, Mongolia)
Norwegian (Bokmål)                  Norwegian, Bokmål (Norway)
Norwegian (Nynorsk)                 Norwegian, Nynorsk (Norway)
Polish                              Polish (Poland)
Punjabi                             Punjabi (India)
Romanian                            Romanian (Romania)
Russian                             Russian (Russia)
Sanskrit                            Sanskrit (India)
Serbian (Cyrillic)                  Serbian (Cyrillic, Serbia)
Serbian (Latin)                     Serbian (Latin, Serbia)
Slovak                              Slovak (Slovakia)
Slovenian                           Slovenian (Slovenia)
Swedish                             Swedish (Sweden)
Syriac                              Syriac (Syria)
Tamil                               Tamil (India)
Tatar                               Tatar (Russia)
Telugu                              Telugu (India)
Thai                                Thai (Thailand)
Turkish                             Turkish (Turkey)
Ukrainian                           Ukrainian (Ukraine)
Urdu                                Urdu (Islamic Republic of Pakistan)
Uzbek (Cyrillic)                    Uzbek (Cyrillic, Uzbekistan)
Uzbek (Latin)                       Uzbek (Latin, Uzbekistan)
Vietnamese                          Vietnamese (Vietnam)
Welsh                               Welsh (United Kingdom)
Northern Sotho                      Sesotho sa Leboa (South Africa)



An invitation to subscribe to MultiLingual

This guide is a component of the magazine MultiLingual, formerly MultiLingual Computing & Technology. With a new look and a new sense of purpose, MultiLingual continues to lead the world in tracking the latest developments in the electronic universe and informing its readers about them.

In addition to the coverage that it provided before, the new magazine provides more insights from industry leaders, an improved news section and expanded calendar, as well as basic industry terminology and references.

MultiLingual’s eight issues a year are filled with news, technical developments and language information for people who are interested in the role of language, technology and translation in our twenty-first-century world. A ninth issue, the Resource Directory and Index, provides listings of companies in the language industry and a key to the previous year’s content. Four issues each year include Getting Started Guides like this one, which are primers for moving into new territories both geographically and professionally.

The magazine itself covers a multitude of topics.

Translation

How are translation tools changing the art and science of communicating ideas and information between speakers of different languages? Translators are vital to the development of international and localized software. Those who specialize in technical documents, such as manuals for computer hardware and software, industrial equipment and medical products, use sophisticated tools along with professional expertise to translate complex text clearly and precisely. Translators and people who use translation services track new developments through articles and news items in MultiLingual.

Language technology

From multiple keyboard layouts and input methods to Unicode-enabled operating systems, language-specific encodings, and systems that recognize your handwriting or your speech in any language, language technology is changing day by day. And this technology is also changing the way in which people communicate on a personal level, changing the requirements for international software and changing how business is done all over the world. MultiLingual is your source for the best information and insight into these developments and how they will affect you and your business.

Global web

Every website is a global website, and even a site designed for one country may require several languages to be effective. Experienced web professionals explain how to create a site that works for users everywhere, how to attract those users to your site and how to keep the site current. Whether you use the internet and the World Wide Web for e-mail, for purchasing services, for promoting your business or for conducting fully international e-commerce, you’ll benefit from the information and ideas in each issue of MultiLingual.

Managing content

How do you track all the words and the changes that occur in a multilingual website? How do you know who’s doing what and where? How do you respond to customers and vendors promptly and in their own languages? The growing and changing field of content management and globalization management systems (CMS and GMS), customer relationship management (CRM) and other management disciplines is increasingly important as systems become more complex. Leaders in the development of these systems explain how they work and how they work together.

Internationalization

Making software ready for the international market requires more than just a good idea. How does an international developer prepare a product for multiple locales? Will the pictures and colors you select for a user interface in France be suitable for users in Brazil? Elements such as date and currency formats sound like simple components, but developers who ignore the many international variants find that their products may be unusable. You’ll find sound ideas and practical help in every issue.

Localization

How can you make your product look and feel as if it were built in another country for users of that language and culture? How do you choose a localization service vendor? Developers and localizers offer their ideas and relate their experiences with practical advice that will save you time and money in your localization projects.

And there’s much more

Authors with in-depth knowledge summarize changes in the language industry and explain its financial side, describe the challenges of computing in various languages, explain and update encoding schemes and evaluate software and systems. Other articles focus on particular countries or regions, translation and localization training programs, and the uses of language technology in specific industries: a wide array of current topics from the world of multilingual computing.

MultiLingual is a critical business asset in our electronic world. Readers of MultiLingual explore language technology and its applications, project management, basic elements and advanced ideas with the people who are building the future.

Subscribe to MultiLingual at www.multilingual.com/subscribe

