Getting Started Guide: Internationalization
April/May 2007

• Think Internationalization in Everyday Design
• Change Your Encoding, Change Your Company
• Unicode 5.0 From 50,000 Feet
• Pierre Cadieux: A Career in Internationalization
• New Internationalization Features of Microsoft Vista
Getting Started: Internationalization

In five years, how much has your life changed? Since we published our first Getting Started Guide for internationalization in 2002, changes in internationalization have been coming at the speed of the internet. Unicode has proceeded from version 3.0 to 5.0. The .NET platform has grown exponentially and moved into version 2.0. Perhaps half a billion more people have internet access as China and other countries get online.

But some things haven’t changed. If you want to sell products and services in other cultures, these questions still apply. Is it a good product? How adaptable is it for other languages and cultures? Can you easily change the language and script of the display or written text? Can you change culture-specific graphics to adapt to meaningful images for other cultures? Can your product handle other number, currency, date, time, address and telephone number formats?

In this Guide, we update your knowledge of internationalization: questions to ask, points to keep in mind, pitfalls to watch for, checklists for action and what resources are available. And brace yourself for the changes coming!

The Editors

Editor-in-Chief, Publisher: Donna Parrish
Managing Editor: Laurel Wagers
Translation Department Editor: Jim Healey
Copy Editor: Cecilia Spence
News: Kendra Gray
Illustrator: Doug Jones
Production: Sandy Compton
Editorial Board: Jeff Allen, Julieta Coirini, Bill Hall, Aki Ito, Nancy A. Locke, Ultan Ó Broin, Angelika Zerfaß
Advertising Director: Jennifer Del Carlo
Advertising: Kevin Watson, Bonnie Merrell
Webmaster: Aric Spence
Assistant: Shannon Abromeit
Intern: Callie Welch
Special Projects: Terri Jadick

Advertising: [email protected], www.multilingual.com/advertising, 208-263-8178
Subscriptions, customer service, back issues: [email protected], www.multilingual.com/subscribe
Submissions: [email protected]. Editorial guidelines are available at www.multilingual.com/editorialWriter
Reprints: [email protected]

This guide is published as a supplement to MultiLingual, the magazine about language technology, localization, web globalization and international software development. It may be downloaded at www.multilingual.com/gsg

CONTENTS

Think Internationalization in Everyday Design
Alan Horvath
Alan Horvath, managing director of STAR Group America, LLC, has a B.S. in business administration with a major in computer and information science.

Change Your Encoding, Change Your Company
Adam Asnes
Adam Asnes is founder of Lingoport, which develops Globalyzer software and provides internationalization development services.

Unicode 5.0 From 50,000 Feet
Richard Gillam
Richard Gillam is author of Unicode Demystified: A Practical Programmer’s Guide to the Encoding Standard. He works on the Global Name Recognition team at IBM.

Pierre Cadieux: A Career in Internationalization
Nancy A. Locke
Nancy A. Locke is a freelance translator, localization educator, multilingual desktop publishing specialist and a member of the MultiLingual editorial board.

New Internationalization Features of Microsoft Vista
Bill Hall
Bill Hall is an internationalization consultant, author of Globalization Handbook for the Microsoft .NET Platform and a member of the MultiLingual editorial board.
page 2 The Guide From MultiLingual
Think Internationalization in Everyday Design
ALAN HORVATH
Entering the global marketplace can be a daunting task for any company. It can be expensive and time consuming, so proper planning and execution are critically important. It is essential to understand the complete globalization process and this important fact: internationalize first, then localize!

This article describes some of the most common challenges within the internationalization process and how you can avoid time-consuming errors.

A matter of words

So many words are used to describe the actual process of preparing products for foreign markets that it is easy to become confused. Here are a few definitions to help clarify the process.

Globalization (g11n) includes a company’s decision to enter foreign markets, internationalize its products and localize its products.

Internationalization (i18n) is the process of designing a product such that it can handle multiple languages, cultural conventions and local infrastructures without the need for re-design.

Localization (l10n) is the process of physically, culturally and/or linguistically adapting a product for a target locale.

Translation is the process of converting words from one language to the other. An experienced translator will be able to convey the technical details accurately while instilling native nuance and style in the translated text.

Software internationalization

Let’s say you’ve written a software application for sale in the United States. Throughout your code, you’ve hard-coded terms such as dollars and cents. You’ve also hard-coded symbols such as $ and used a period (.) for the decimal point and a comma (,) to separate numbers into groups of three. Now your company wants to market your software abroad, but the terms and symbols you used are inappropriate for other currencies.

There are two ways to prepare to enter foreign markets with your software product.

The first approach involves making separate copies of the software and replacing the terms and symbols with ones appropriate for each country.

But what happens if a bug is reported? Now you have multiple versions of code to correct! And if you decide to add a new feature to your product, you’ll have to make changes to all versions.

For companies that intend to enter multiple markets, this method is out of the question. It underestimates the time needed to modify files that were never meant to be localized.

And there are other issues to consider. Does your code support Unicode so that you can handle a variety of scripts as easily as your native language? Does your code correctly search and sort characters in all of the languages you expect to use? These issues and many others must be addressed before entering international markets.

The second and preferred approach consists of internationalizing your software for the global market first and then proceeding with a more streamlined localization process.

Therefore, before you begin translation of your product’s interface, text and files, you need to ensure that the product is internationally ready.

Ten tips for software internationalization

Software is a component that requires a great deal of attention. Once you have made the decision to go global, every design decision, whether in the code or the user interface, will be affected.

1) Use Unicode functions and methods.

2) Third-party tools. Choose your tools carefully. Some tools, APIs and add-ins might not support Unicode. If you must use them, use the right character conversions.

3) User interface separation. Isolate your translatable resources. Hard-coded strings are very tricky to translate, and, since the code is constantly changing, they cannot be translated in parallel with the software code development. Maintain one core code base for all of the languages.

4) Concatenation. String concatenation is another source of problems for localization. Word orders and plural forms might be different in other languages. Adjectives might not agree in gender, number and case, thus creating catastrophic results.

5) Ambiguity. Keep translators in mind when writing the resource files. Add comments when a sentence could be ambiguous. It will improve the quality as well as reduce the amount of time needed for translation.

6) Expansion. Keep in mind that translated strings may expand compared to the source-language string. This will be particularly important with dialog boxes and menus. Leave at least 20% to 30% expansion room.

7) Design. Be careful when you create icons and bitmaps. You should avoid text in either because the translated words might not fit, and the cost of conversion can be expensive. Avoid any symbols with cultural connotations. They might be indecipherable or offensive in other countries. If required, make sure your product runs on different platforms: PC, Mac, Linux and so on.

8) Terminology. Check for terminology consistency. If you are not consistent in your software, the rest of the package will be even more inconsistent. Terminology management tools will help keep terminology consistent.

9) Locale testing. Check your code in the destination market to ensure that all locale issues are handled correctly.

10) Translation kit. Once your product is ready for translation, you should create a translation kit. This set of files should contain everything needed to translate and recompile the language resources as well as test the application.

You should also consider legal issues that may arise as you enter different markets. For example, there have been instances where software companies had their terms and agreements embedded in the product, but some of the terms were determined to be illegal in different jurisdictions. In many cases, the company’s
April/May 2007 • www.multilingual.com/gsg page 3
attorneys were required to define new clauses for certain parts of their contracts. Therefore, you should consider the legal circumstances in all target markets before beginning the translation process.

Also note that software containing encryption technology can be subject to export restriction, while communication software may be subject to telecommunication regulation in the target country.

Multibyte character support

Delivering translated products to Asian countries for the first time can be an exciting and challenging time for organizations, but getting it wrong can be very costly.

English software typically uses about 100 different characters to represent words and numbers. Asian languages, on the other hand, can use more than 10,000 symbols to display messages.

To accommodate this, software systems used to rely on what are termed multibyte or double-byte character systems (DBCS) to store text. Today, Unicode has replaced these systems, but be aware that occasionally you may need to provide multibyte character support.

A common error in software development is to use third-party software (software libraries, DLL, OCX and so on) that is not Unicode enabled. Sometimes the development teams do not realize this until they begin the translation process. As a result, the product then has to be re-engineered to fix the issues at additional expense.

Documentation internationalization

Documentation internationalization does not require as much planning as does software internationalization. Some simple guidelines can help you prepare documents that can be easily localized.

Desktop publishing software. With so many choices available, it can be difficult to choose the appropriate desktop publishing application. Your choice will be affected by the type of documents you are creating, the languages that you are targeting and whether or not translation tools will be used. You will also have to decide if the documents will be printed, electronically published or both.

If you are designing marketing documents, applications such as FrameMaker, InDesign and QuarkXPress produce good results. The major drawback is that the process needed to translate these documents is more complex. Leading TM tools do an excellent job of handling these file types. In addition, if you plan on targeting Asian countries, you might have trouble finding a vendor with in-house expertise using the desktop publishing tools you selected due to the high price of the Asian software versions and the software’s complexity.

Nevertheless, if you decide to use these applications, you should keep the following guidelines in mind:

• Avoid creating unlinked text boxes throughout the document. Some files that we have processed in the past contained hundreds of unconnected stories, and processing the files with some translation tools becomes very cumbersome. Instead, flow your text from one box to the other. When the file is translated, the expanded text will move from one box to the other automatically, even from one page to the other.

• Leave plenty of white space (20% to 30%) in the pages to allow for expansion. It is a good rule for all types of documentation, but particularly important with marketing material because the number of pages for such documents is usually fixed.

• Be careful when selecting screenshots for your marketing material. The component that you choose might be the last piece translated, and the localized screenshot might not be available at the time you go to print.

• Make sure that any in-line graphics are anchored to the surrounding text so that they can move when a paragraph goes, for example, from one page to another.

If you are designing long technical documents, you have more choices. You can still use applications such as InDesign or QuarkXPress, but it is not recommended because they lack the large-document handling features of applications such as FrameMaker or Word for Windows.

Graphics. Remember that the outsourcing of graphics creation and translation is expensive, particularly in Asian languages. However, careful planning and clever design of your graphics can eliminate a big part of your graphic localization costs. Removing the text from the actual graphics can reduce your costs by more than 95%. Remove all localizable callout text from graphics and include it in the documentation’s text so that it can be added to a translation memory for re-use. Replace the callout text with numbered (not lettered) callouts, arranged clockwise on graphics, and cross-reference the numbers to the text in the main document. This allows automatic re-use of the same graphics in all localized versions.

Electronic vs. printed publishing. The obvious reason for electronic publishing is cost. It is much cheaper to create, package and ship a CD/DVD than a box of 15 manuals. The electronic book can always be printed if needed. Nevertheless, parts of your document set, such as a “Setup Guide” or an “Administrator’s Guide,” should still be provided in print format.

One-to-one page correspondence. This is an option that is rarely used but might make sense for your organization. Its main objective is to streamline the support operation. Suppose someone in a foreign country has a problem with your application. First-tier support in that country cannot solve the problem and needs assistance from the next level. If you maintain a one-to-one page correspondence, everyone can be looking at the solution on the same page of your document. This approach also allows you to create standardized packaging materials since all your manuals will have the same dimensions. This option, however, requires more planning and may require you to sacrifice aesthetics.

An alternate approach to document internationalization. Consider using an information management system as an alternative to the traditional approach to document creation and management. In these systems, information can be entered simultaneously from any number of locations in any number of languages. Information is entered and stored in a pure and intelligent way, with no regard for final layout. The information exists only once and can be used to produce any number of publications in virtually any form (such as Help, HTML, DOC, XML) and in any language. By the very nature of this approach, information that is created and managed in this way is already “internationalized.” The result is an overall reduction in translation/localization costs with a simultaneous increase in quality and consistency.

Summary

Product internationalization is the most important step in the globalization process. Products must be designed to handle multiple languages, cultural conventions and local infrastructures without the need for re-design. Incorporating these guidelines into your internationalization strategy will save you time and money and go a long way in streamlining an otherwise complex process.
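To make the hard-coded separator problem and tips 3 and 4 from the list above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `LOCALE_DATA` table, the `file_deleted` key, and the `format_number` and `translate` helpers are hypothetical names, and a real product would normally take its locale conventions from the platform or from a library built on the Unicode CLDR data rather than from a hand-written table. The point is only that separators and whole-sentence templates live in resources, not in program logic.

```python
# Hypothetical locale resources, kept apart from program logic (tip 3).
# In a real product these would live in external resource files.
LOCALE_DATA = {
    "en-US": {
        "decimal": ".",
        "group": ",",
        "messages": {"file_deleted": "Deleted {filename} from {folder}."},
    },
    "de-DE": {
        "decimal": ",",
        "group": ".",
        "messages": {"file_deleted": "{filename} wurde aus {folder} gelöscht."},
    },
    "ja-JP": {
        "decimal": ".",
        "group": ",",
        # Word order differs: the folder comes first in the Japanese sentence.
        "messages": {"file_deleted": "{folder}から{filename}を削除しました。"},
    },
}


def format_number(value, locale):
    """Format a number with the locale's own decimal and grouping marks."""
    data = LOCALE_DATA[locale]
    us_style = f"{value:,.2f}"  # US-style separators as an intermediate form
    # Swap separators through a placeholder so the two replacements
    # cannot interfere with each other.
    return (
        us_style.replace(",", "\0")
        .replace(".", data["decimal"])
        .replace("\0", data["group"])
    )


def translate(key, locale, **params):
    """Fill a whole-sentence template instead of concatenating fragments (tip 4)."""
    template = LOCALE_DATA[locale]["messages"][key]
    return template.format(**params)


if __name__ == "__main__":
    for locale in LOCALE_DATA:
        print(locale, format_number(1234567.891, locale))
        print(locale, translate("file_deleted", locale, filename="a.txt", folder="docs"))
```

Because each translator receives a complete sentence with named placeholders, the order of the placeholders can change per language without any change to the code. This sketch deliberately ignores plural forms and grammatical agreement, which need more than simple substitution.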
Change Your Encoding, Change Your Company

ADAM ASNES

It’s a mark of greatness when a company can effectively develop products and compete worldwide. Yet software internationalization is often one of life’s painful forgotten labors that suddenly grabs intense and panicked attention as it leaps out and grinds globalization plans to a halt. You’d think that enabling technology so that it’s easily leveraged for any market opportunity would be a pretty glamorous and exciting pursuit. With rare exception, the first, second or twentieth-plus time a company does this is still a painful effort that holds back global top-line revenue opportunities.

Of course, it doesn’t have to be this way. But when you look at the nature of how software is actually developed and comes to market, unless internationalization is a very firm requirement at the project’s outset, it shouldn’t be a surprise that it gets overlooked until it’s an ugly problem. This is not one of those “if only everyone always internationalized” diatribes. I hope to describe the business issues around internationalization, including the fundamentals of what it does for a company, the competitive implications, funding the effort, and managing and maintaining global market requirements.

One point I want to get beyond quickly is the belief that you can just force your translations without internationalizing software first. I get asked about this a few times per month, often by managers who are even in the localization business. Incidentally, developers never ask this.

In the case of some limited products, it may be possible not to internationalize, but it’s a bad idea anyway. You risk having a product that doesn’t work or works poorly. In the best scenario, your software can’t be leveraged across markets or even maintained from release to release. Not internationalizing is like throwing lots of money and resources away for an inferior result which has no future. For complex applications, it’s simply not going to work. Developers nearly always accept this, but it’s an abstract concept which management can have trouble understanding. Internationalization, when done well, allows you to support any locale requirement quickly. You have one product to support over time that’s good for the whole world. Your translations are easily updated from release to release.

Business issues

Several common events push a company to expand its product development to include locale supporting requirements.

1. Somebody sold something. There has been some new marketing partnership or a new powerful customer opportunity that requires multi-locale support. A classic example is that the company gains a business contract that will necessitate supporting Japanese or another language. In some cases we’ve seen new license deals for entire countries, such as in health care or education. It’s a big hurry up to meet the customer demands.

2. Localization is realized as a competitive necessity. Perhaps the company has already invested in global sales efforts and finds growth is limited given a poor competitive position without internationalization.

3. A global company has just purchased another company or intellectual property and wants to make the new product useful for its worldwide sales efforts and product line.

4. The CEO is mandating a new global initiative. This is an important new step for the company’s evolution. You can’t go to a management conference these days without hearing about globalizing revenue opportunities and for good reason.

Top-line and bottom-line considerations

Software internationalization can have dramatic effects on top-line revenues and revenue plans as well as on bottom-line costs and profitability. It’s never just about minimizing a cost. You have to look at the whole picture to calculate return on investment and in terms of long-term changes in process.

When valuing any internationalization effort, give attention to top-line business questions such as these:

• How much does your company have riding on success in its target markets?
• What are the revenue projections over one, two and more years?
• What is the top-line cost of not having a product ready for a specific market opportunity?
• What is the impact to your company’s strategic partners or sales force if a product doesn’t work well or isn’t ready for a particular market?
• What is the result for your company’s equity value by stretching into new markets effectively?

And, of course, to get an effort funded, these bottom-line business issues need to be answered:

• What will it cost?
• How long will it take?
• Who is going to do the work?
• Do we have to give up other feature requests to prioritize internationalization?
• How can we improve the process?
• What expensive surprises do we need to watch for?
• How do we maintain the internationalized product going forward?

Learn your CFO’s language. He or she will want to understand the return on investment and may consider amortizing the effort as a capital expense. The decision isn’t about the technical issues of bits and bytes.

Globalization is never just about one customer, sale or “language.” It’s a new engineering and company process that opens opportunities.

Development issues

At Lingoport, we’ve picked up the pieces enough to see that internationalization projects have been typically frustratingly late, as in quarters to years, and rife with cost overruns.

Managing an internationalization effort, especially for the first time, can be challenging for a development team. Given the top-line sales and marketing objectives, there’s typically a shortage of time. Compounding this is that understanding the scope of requirements and detail of tasks for an internationalization effort is not an obvious thing for your development team. It’s tempting for many teams to just start out looking at embedded strings as the most obvious problem. While they are important and can be tedious, there’s much more involved. The issue of who does the work isn’t always obvious or easy. Chances are your development team members aren’t sitting on their hands looking for something to do. You’ll need to balance internationalization demands with new feature development, too.
Building requirements starts with identifying target locale requirements. The obvious issue is language, but there’s more to it: culturally sensitive formatting of items such as dates, times, numerical values, addresses and more. Changes to your database tend to have far-reaching effects into your application. Changes to programming logic or the graphical user interface further complicate things. As you might imagine, clearly built requirements will be an important pivot for all your efforts. Make sure that whoever is working on this isn’t doing it for the first or second time. There are so many pitfalls that even globalization architects who have led similar efforts many times are still learning.

For some companies, internationalization can be performed in stages. For example, it can start with supporting storage, retrieval and processing of customer data for Unicode or ISO-Latin encoding. A second possibility might be more completely internationalizing but limiting the team to addressing Western European locale requirements. Others may require a full Unicode enablement to support “double-byte” locales such as Japan, China and Korea. You can separate the ideal from the practical if need be and consider optimizing business decisions regarding budget, technologies, long-term plans and competitive market needs.

Figuring out the scope of an internationalization effort, assigning resources and planning expenses can be challenging. Architectural changes, third-party product issues such as graphics and reporting tools, installers, databases and more must be accounted for. You have to find and fix internationalization issues buried within your hundreds of thousands to millions of lines of code. Without a strong detection, extraction and refactoring tool, this alone has the potential of being an error-prone and time-consuming iterative process. Embedded strings must be quickly and easily distinguished and filtered from programmatic elements such as debug statements or SQL queries. Externalization should be automated to avoid further human error potential. Every programming language has its unique locale-limiting methods and functions as well as character encoding issues. As simple as HTML (including JSP, ASP, ASPX and so on) may be as a language, it takes some sophisticated programmatic language to comb through it. C++ has hundreds of locale-limiting issues that are highly dependent on the target encoding and supported operating systems. Even Java and C# don’t internationalize themselves, though they were built to be considerably more internationalization friendly than most other languages. You also have to effectively distribute the knowledge of internationalization complexity to your development team. Our team created a tool to help analyze source and cut time and expenses. We now offer it as a standalone product and adapt it for scalable use among large development teams.

No tool will help you find what’s not in your code. You still have to be savvy with your architecture, with a long-term eye towards your product life cycle.

Localization, testing and beyond

Unless your company is only internationalizing to support managing multi-locale customer data but not localizing the database, you’ll likely be interested in when the role of localization comes in. It’s quite reasonable to dovetail string extraction efforts with your localization vendor so that your localized releases aren’t dependent on first completing the entire internationalization effort. For initial testing you can use pseudo-localization. To do this, add new characters from your target locales to either side of a string, expanding the string as needed. This lets you make sure that your product supports extended characters, resizing and the like, without having to wait for localization testing or needing to have your tester speak the target language. You’ll want to use pseudo-localization for the interface, as well as passing data and locale-formatted variables through your application’s database and functions. Once you’ve received the translations from your localization company, you’ll need to perform linguistic testing as well. Expect that some translations may need to be adjusted. They may be technically correct but not the best choice given the specifics of your interface, product domain or word usage.

Finally, you need to create a sustained plan for systematically auditing new code development, making sure that it doesn’t break internationalization requirements over the years. Make sure you have strong documentation on your internationalization architecture and procedures. That way you have a legacy that can be clearly followed over the years.

Through it all, I can’t overstate the need to communicate. Most development efforts fail due to lack of clear requirements and ongoing communication. You’re going to have to blend internationalization objectives with new features. That means a clear development path, source control practices, testing processes, education, tools and cooperation among developers. You’ll have a whole world of new clients and worldwide stakeholders to support. And that fundamentally changes a company with the opportunity to further make it great.
Unicode 5.0 From 50,000 Feet
RICHARD GILLAM
By now, there’s probably no one reading this magazine who hasn’t at least heard of Unicode. In its 15-year history, Unicode has become the character encoding standard of choice in new applications. It’s the default encoding of HTML and XML; it’s the fundamental character type in programming languages such as Java, C# and JavaScript; and it’s the internal character encoding in the Windows and Macintosh operating systems. Virtually all Unix flavors include support for it, too. Unicode is to computing in the twenty-first century what the American Standard Code for Information Interchange (ASCII) was to computing in the twentieth century.

If you’re just getting into software internationalization, Unicode is something you want to know about. It can make your life much easier, but it’s important to keep in mind just what Unicode is and isn’t. Just what it means to say you’re Unicode-based or Unicode-compatible can be rather squishy and is highly dependent on just what your application does. More importantly, keep in mind that supporting Unicode is neither necessary nor sufficient for writing an internationalized program. Unicode and internationalization are related, but very different concepts. Unicode makes it easier to write internationalized programs, but you can write them without using Unicode. And you can very easily write Unicode-based programs that still aren’t internationalized.

Many articles in this guide will help you get up to speed on just what it means to write an internationalized program. The purpose of this article is to help you get up to speed on just what it means to support Unicode and which problems it does and doesn’t solve.

At first glance, Unicode can be quite an intimidating beast. The latest version, Version 5.0, sprawls across a 1,400-page book and a CD full of appendices, character property databases, and other supplemental material and comprises nearly 100,000 character assignments. This article can’t possibly cover all that, so what we’ll try to do is take the proverbial “50,000-foot view” of what Unicode is and what problems it solves. To go further, there are several good “introduction to Unicode” resources and a useful “cheat sheet,” and the standard itself is actually quite accessibly written.

Character encoding standards

Unicode is a character encoding standard. Computers don’t have any innate knowledge of text or characters or images or sounds; all computers really understand are numbers. To represent text in software, you adopt a convention where each character you need to represent is given a number. You decide that in your application, any time you see, say, the number 1 in a memory location you know is supposed to hold text, you interpret it as the letter A. When you see the number 2, it’s B and so on. Sequences of these numbers represent sequences of characters.

Further, text is so common that rather than having each developer adopt his or her own convention for representing text with numbers, the industry issues standards, official documents that define conventions for assigning numbers to characters. If two applications follow the same standard for representing text, they can pass text back and forth between each other, and they’ll both be able to do things with it properly.

The problem, of course, is that there are so many different standards. Most modern computing systems use ASCII or something based on it. ASCII was published in the 1960s by what is now the American National Standards Institute (ANSI) and uses the values from 32 to 126 to represent the 26 uppercase and lowercase letters of the English alphabet, the 10 digits, and various punctuation marks and symbols. The values from 0 to 31 and the value 127 were reserved for various control signals, and byte values from 128 to 255 weren’t used.

However, ASCII only includes codes for the letters in the English alphabet. Speakers of other languages don’t have codes for the letters of their alphabets. Even other languages that use the Latin alphabet, such as French, are missing codes for the accented versions of the letters that they use. Since the byte values from 128 to 255 weren’t standardized by ASCII, computer vendors, national governments and other bodies came up with other standards that used these code values for the letters of other alphabets.

Now there’s a plethora of character encoding standards out there, each of which defines code values for a single language or a small group of related languages. There are several problems with this:

1) The standards are mutually incompatible. While you can usually count on the value 65 representing the capital letter A, the value 215 can represent lots of different characters, depending on the encoding standard.

2) Because most legacy encoding standards only encode a small number of characters for a small number of languages, mixing languages in a single document frequently requires changing from one encoding standard to another in the middle of the document, and there are often no mechanisms in the software for doing that or for reliably interchanging such documents with other applications.
3) Often, encoded text travels across media without any external indication of the encoding standard it follows. Software receiving a message of unknown encoding has to guess or simply assume. Many times, the sending software intends for a particular numeric value to represent some character, and the receiving software interprets it as something totally different, thus leading to garbage. If you’ve ever received an e-mail message with strange characters where you expect dashes or quotation marks to be, you’ve seen this problem in action.

Unicode was designed to solve these problems. The idea was to use a larger data type than a byte for each character and then give every character in every language its own unique numeric representation. This means you can mix languages freely in a document without the software having to worry about mixed encodings, and you can send text from one system to another without worrying about it getting mangled on the other end, as long as the sending and receiving systems both support Unicode.

It should be clear that Unicode doesn’t solve all your internationalization problems. You still have to translate the text. You still have to remember to call number-formatting and date-formatting routines that can produce different output for users of …

Whatever the character you need, the chances are overwhelming that Unicode has it, and, if it doesn’t, no other encoding standard in reasonably wide use is going to have it either. This comprehensiveness makes it possible to represent text in any Unicode-encoded language or combination of languages without having to worry about specifying which character encoding standard your application or document is following and without having to be concerned about changing that encoding standard in the middle of your document or going without characters because you can’t change encodings.

More importantly, Unicode is unique in approaching the business of assigning numbers to characters with far more rigor than any other encoding standard has attempted. The further away from the Latin alphabet you get, the less clear-cut using numbers to represent text becomes. In many writing systems, the letters don’t march in a nice orderly fashion from the left-hand side of the page to the right. In some, they go from right to left. In some, they knot together in complex ways. In some, they’re adorned with various accent, tone or vowel marks that attach to the letters in many different places. Straightening this out into a one-dimensional sequence of numbers is complex, and the right answer is often ambiguous.
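To make the encoding-mismatch problems described earlier a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the article names no particular legacy encodings, so the three codecs below (Latin-1, ISO 8859-5 for Cyrillic and ISO 8859-7 for Greek) are simply convenient examples, UTF-8 stands in for "a Unicode encoding," and the helper names decode_byte and garble are made up for this example.

```python
# Illustrative sketch only. The codec names are Python's standard names for
# three example legacy encodings; the article itself does not name any.
LEGACY_CODECS = ("latin-1", "iso8859_5", "iso8859_7")


def decode_byte(value: int, codec: str) -> str:
    """Interpret a single byte value under the given legacy encoding."""
    return bytes([value]).decode(codec)


def garble(text: str, sent_as: str, read_as: str) -> str:
    """Encode text one way and decode it another, as a receiver that
    assumes the wrong encoding would."""
    return text.encode(sent_as).decode(read_as)


if __name__ == "__main__":
    for codec in LEGACY_CODECS:
        # 65 is "A" in all three standards; 215 depends on which one you pick.
        print(codec, decode_byte(65, codec), decode_byte(215, codec))

    # A left curly quotation mark sent as UTF-8 but read as Windows-1252
    # (an assumed sender/receiver pairing) turns into three stray characters.
    print(garble("\u201cquoted", "utf-8", "cp1252"))

    # When both ends agree on one Unicode encoding, Latin, Cyrillic and
    # Greek letters can share a single string and survive the round trip.
    mixed = "A \u0437 \u03a7"
    print(garble(mixed, "utf-8", "utf-8") == mixed)
```

Run as a script, the loop shows the value 65 decoding to A every time, while 215 decodes to a multiplication sign, a Cyrillic letter or a Greek letter depending on the codec. The second print shows the kind of garbage that appears where a quotation mark should be, and the last line shows mixed scripts passing through intact once both sides use the same Unicode encoding.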