Internationalization 360° Testing

Mahipalsinh Rana
Member of Technical Staff, Sun Microsystems


Agenda

- Introduction to I18n – 40 minutes
- Internationalization (I18n) 360° testing – 60 minutes
- Testing standalone applications – 15 minutes
- Quiz – 10 minutes
- Testing web applications – 15 minutes
- Quiz – 10 minutes
- I18n testing automation – 30 minutes
- Advanced I18n testing, references – 15 minutes
- Q/A – 15 minutes

Introduction

- Understanding internationalization (I18n)
- Why I18n testing?
- Myths about I18n testing
- Scope of I18n testing
- Terminology in I18n technology
  - Character set / character repertoire
  - Character code / code point, coded character set
  - Encoding, Unicode, UTF-8, UTF-16, UTF-32
  - Glyphs, fonts, Input Method Engine (IME)
  - Locale

"Everyone has the right ... to seek, receive and impart information and ideas through any media and regardless of frontiers." – Universal Declaration of Human Rights

Why Globalization

- Example: Sun Portal Server in Chinese (screenshot)
- Example: Yahoo.com in Kannada (screenshot)
- "Visitors linger twice as long as they do at English-only URLs."
- "Business users are 3 times more likely to buy when addressed in their language."
- "Customer service costs drop when instructions are displayed in the user's native language."
  – 'Strategies for Global Sites', Donald DePalma, Forrester Research Inc.
- "One large IT company discovered that a significant percentage of inquiries were coming from South Korea – they created a Korean website and revenues rose by 8 percent."
  – 'Global eCommerce', Donald J. Plumley, Bowne Global Solutions

What's with the acronyms?

- Internationalization → i18n. Why? There are 18 letters between the "i" and the "n".
- By the same logic:
  - Localization → L10n
  - Globalization → G11n
  - Translation → T9n
  - and you can call me M5l → Mahipal

Don't they all look the same?

- Localization
- Internationalization
- Globalization
- Translation

How do they differ and relate?

- Globalization encompasses i18n and l10n.
- Internationalization enables localization.
- An expert in i18n may not be an expert in l10n.

LISA* Definitions

- Globalization (G11n)
  "Globalization addresses the business issues associated with taking a product global. In the globalization of high-tech products this involves integrating localization throughout a company, after proper internationalization and product design, as well as marketing, sales, and support in the world market."
- Internationalization (I18n)
  "Internationalization is the process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign. Internationalization takes place at the level of program design and document development."
- Localization (L10n)
  "Localization involves taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold."

*Localization Industry Standards Association

Why I18n testing?

- I18n testing is required to enable product localization in multiple languages. It covers:
  - Removing barriers to localization
  - Enabling Unicode
  - Keeping UI strings independent of the code
  - Handling legacy character encodings
  - Separating localizable elements from the source
  - Enabling code to support local, regional, language or culturally related preferences

Myths about I18n Testing

- It is mistaken for translation testing
- Only a language expert can perform i18n testing
- It is done after the product is released
- It is confused with product localization
- It is only about string messages

Terminology of I18n

- What is a character set / character repertoire?
- What is a character code / code point, coded character set?
- What is Unicode?
- What is meant by encoding?
- UTF-8, UTF-16, UTF-32
- What is a glyph?
- What is a font?
- What is an Input Method Engine (IME)?
- What is a locale?

What is a Character, a Character Set?

- A character is an abstract, minimal unit of text. It has no fixed shape (that would be a glyph), and on its own it has no numeric value; a value is assigned only by a character code.
- "A" is a character, and so is "$", a currency symbol.
- A character set (repertoire) is a collection of characters.
- Examples: the phrase "Making the World Wide Web world wide!" written in several languages and scripts [image]

What is Character Code/Code Point, Coded Character Set?

- Character code: a mapping that defines a one-to-one correspondence between the characters in a character repertoire and a set of non-negative integers. Examples of character codes:
  - ASCII, ISO Latin 1 (ISO 8859-1), ISO 10646, and the Windows character sets, which exist in different variations or "code pages" (CP), e.g. Windows code page 1252
- A code point is the unique non-negative integer assigned to a character in a character code.
- A coded character set is a character set in which each character has been assigned a unique code point.

Example: the ASCII character set, one of the earliest character sets [image]

- ASCII is a 7-bit set. 8-bit extensions such as ISO 8859-1 cover most of the characters needed by Western European languages, but what about the rest of the world?

Unicode

- The answer is Unicode. It has characters from almost every written script in the world:
  - European alphabetic scripts: Latin, Greek, Cyrillic, Armenian, Georgian, Runic, Ogham, modifier letters
  - Middle Eastern scripts: Hebrew, Arabic, Syriac, Thaana
  - South and Southeast Asian scripts: Devanagari, Bengali, Gujarati, Panjabi, Oriya, Tamil, Telugu, Kannada, Malayalam
  - East Asian scripts: Han, Hiragana, Katakana, Hangul, Bopomofo, Yi
  - Symbols: currency symbols, letterlike symbols, mathematical operators, number forms, technical symbols, geometric shapes
  - Additional scripts: Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Mongolian
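When test data is supposed to be in a given script, the JDK can report which Unicode script each code point belongs to. A minimal Java sketch, assuming Java 7 or later (for `Character.UnicodeScript`); the sample characters and the `scriptOf` helper are chosen only for illustration:

```java
import java.lang.Character.UnicodeScript;

public class ScriptDemo {
    // Unicode script of the first code point in s (e.g. LATIN, DEVANAGARI, HAN).
    static UnicodeScript scriptOf(String s) {
        return UnicodeScript.of(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // Sample characters: Latin A, Cyrillic Zhe, Hebrew Alef, Arabic Alef,
        // Devanagari Ka, Kannada Ka, Han "hao", Hiragana A, Hangul "han".
        String[] samples = {"A", "\u0416", "\u05D0", "\u0627", "\u0915",
                            "\u0C95", "\u597D", "\u3042", "\uD55C"};
        for (String s : samples) {
            System.out.printf("U+%04X  %s%n", s.codePointAt(0), scriptOf(s));
        }
    }
}
```

A check like this can catch input that was silently replaced by "?" on its way through the product, because "?" reports the COMMON script instead of, say, KANNADA.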

What is Character Encoding?

- A mapping from the non-negative integers of a coded character set (the code points) to sequences of code units of a specified width, such as 8-bit, 16-bit or 32-bit integers.
- The most commonly used code units are bytes, but 16-bit or 32-bit integers can also be used for internal processing.
- Examples: UTF-8, UTF-16, UTF-32

UTF-8, UTF-16, UTF-32

- UTF-32 simply represents each Unicode code point as the 32-bit integer of the same value.
- UTF-16 uses sequences of one or two unsigned 16-bit code units to encode Unicode code points. Values U+0000 to U+FFFF are encoded in one 16-bit unit with the same value; supplementary characters (U+10000 to U+10FFFF) are encoded in two code units, called a surrogate pair.
- UTF-8 uses sequences of one to four bytes to encode Unicode code points: U+0000 to U+007F in one byte, U+0080 to U+07FF in two bytes, U+0800 to U+FFFF in three bytes, and U+10000 to U+10FFFF in four bytes.
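The byte sequences for A (U+0041), א (U+05D0) and 好 (U+597D) can be reproduced with the JDK's built-in charsets. A minimal Java sketch: `hex` is a helper written for this example, UTF-16 and UTF-32 are shown big-endian without a byte order mark (the `UTF-16BE` and `UTF-32BE` charsets), and U+1F600 is an added sample of a supplementary character:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hex dump of s encoded with cs, e.g. "D7 90".
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-32BE is not in StandardCharsets, so it is looked up by name.
        Charset utf32 = Charset.forName("UTF-32BE");
        // A, Hebrew Alef, Han "hao", and U+1F600 (written as a surrogate pair in Java source).
        String[] samples = {"A", "\u05D0", "\u597D", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %-11s  UTF-16BE: %-11s  UTF-32BE: %s%n",
                    s.codePointAt(0),
                    hex(s, StandardCharsets.UTF_8),
                    hex(s, StandardCharsets.UTF_16BE),
                    hex(s, utf32));
        }
    }
}
```

For U+1F600 this yields four UTF-8 bytes (F0 9F 98 80) and a two-unit UTF-16 surrogate pair (D8 3D DE 00), matching the ranges described above.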

Relation between character set and encoding (UTF-16 and UTF-32 shown in big-endian byte order)

Character  | A           | א           | 好
Code point | U+0041      | U+05D0      | U+597D
UTF-8      | 41          | D7 90       | E5 A5 BD
UTF-16     | 00 41       | 05 D0       | 59 7D
UTF-32     | 00 00 00 41 | 00 00 05 D0 | 00 00 59 7D

- Different encodings yield different byte sequences for the same character of a character set.

Unicode: Character Set, Code Points, Encodings

- Universal character set/repertoire: other character sets (ASCII, and the sets used for French, Japanese, Korean, Devanagari, ...) can be treated as subsets of this huge repertoire.
- Unicode code points: each Unicode character is assigned a code point in the range U+0000 to U+10FFFF.
- UTF encodings: UTF-8, UTF-16 and UTF-32 are the encoding formats used to store and process these code points.

What is a Glyph?

- A glyph is the visual appearance of a character.
- It is important to distinguish the character concept from the glyph concept. A glyph is a particular shape that a character may have when it is rendered or displayed.
- Example: LATIN SMALL LETTER E WITH ACUTE (U+00E9), é, is drawn with a different glyph in each font. Conversely, in scripts such as Devanagari several characters can combine into a single rendered shape [image: a Hindi word built from its component characters].

What is a Font?

- A repertoire of glyphs makes up a font: a font is a numbered set of glyphs.
- The numbers correspond to the code positions of the characters that the glyphs present.
- A font that includes the characters of a language must be available for an application to display text in that language.

What is an Input Method Engine (IME)?

- Input methods capture a sequence of keystrokes and form one or more characters from it, as input for a language.
- An Input Method Engine (IME) is a program or component that lets users enter complex characters and symbols using a standard Western keyboard. It is also referred to as an Input Method Environment.

What is a Locale?

- A locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in the user interface.
- The locale naming convention is usually language[_territory][.encoding][@modifier].
- Example for Hindi with UTF-8 encoding: hi_IN.UTF8
- Encoding:
  - Native encodings: ISO 8859-*, Shift_JIS, GB18030, Big5, ISO-2022
  - Unicode encodings: UTF-8, UTF-16
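The naming convention above can be checked mechanically, for example when a test configuration lists POSIX locale names. A minimal Java sketch, assuming names of the form shown (lowercase language, optional uppercase territory); `split` and `toLocale` are hypothetical helpers, and the encoding and modifier parts are reported but not mapped, because `java.util.Locale` does not carry an encoding:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PosixLocaleDemo {
    // language[_territory][.encoding][@modifier], e.g. "hi_IN.UTF8"
    private static final Pattern POSIX_NAME =
            Pattern.compile("([a-z]{2,3})(?:_([A-Z]{2}))?(?:\\.([^@]+))?(?:@(.+))?");

    // Hypothetical helper: {language, territory, encoding, modifier}; absent parts are null.
    static String[] split(String name) {
        Matcher m = POSIX_NAME.matcher(name);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a POSIX locale name: " + name);
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    // Hypothetical helper: keeps language and territory only; Locale has no encoding field.
    static Locale toLocale(String name) {
        String[] parts = split(name);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts[1] != null) {
            builder.setRegion(parts[1]);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"hi_IN.UTF8", "ja_JP.eucJP", "de_DE@euro"}) {
            String[] p = split(name);
            System.out.printf("%-12s language=%s territory=%s encoding=%s modifier=%s -> %s%n",
                    name, p[0], p[1], p[2], p[3], toLocale(name).toLanguageTag());
        }
    }
}
```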

- Behavior affected by locale:
  - Language cultural data: sorting, searching, text boundaries, text conversion, indexing
  - Country cultural data: calendar; date, time, number and currency formats; personal name and mailing address layout

I18n 360° Testing Approach

- What is the traditional approach?
- What is the 360° approach?
- Case study
  - Requirements phase
  - Design phase
  - Implementation phase
  - QA phase
  - Documentation

Traditional Approach of I18n Testing

- Generally starts after the development team releases a build.
- In some cases it starts even after the product is released, because international versions ship as a separate release.
- The major focus is on functionality testing.
- I18n testing covers only:
  - Messages
  - Date/calendar
  - Sorting/searching
- Major architectural flaws related to i18n are caught quite late.
- Support for adding new languages is not considered.
- Global cultural requirements are not considered.
- All issues are reported in the QA phase, when they take longer to fix.
- Documentation does not cover usage in non-English environments.

What is the 360° Approach?

- Starts as early as product planning
- I18n has a role to play in every phase of the product life cycle
- No corner is left untouched
- Helps design and build better products for global customers

Case Study – Railway Reservation System

- Use cases:
  - Search trains/fares
  - Make a reservation
  - Check passenger status
  - Check the train schedule
  - User management
- We will use this case study to illustrate key points in this workshop.

Requirements phase

- What are the global market requirements?
  - Which languages and regions will be supported?
  - Which date formats will be supported?
  - Which calendars will be supported? (Gregorian, Vikram Samvat, lunar, etc.)
  - Which cultural requirements must be taken care of?

Case Study – Requirements phase

- Who are the customers of this website?
- What languages will our target customers use?
- Which payment methods should be made available?
- Should we display information (e.g. seat availability) visually or textually?
- What kind of Internet access and computers will our target customers use? How will that affect l10n?
- Should email confirmations/alerts be sent in the local language?

Design phase

- What approach will be used to support multiple languages?
  - Browser based and/or command-line based
  - List of languages on the website
- User interface design
  - How will UI messages be internationalized?
  - How will UI components be internationalized?
  - Buttons and drop-down boxes must be sized to accommodate localized (multibyte) values
  - Review images for cultural sensitivity
- Example: a sentence like "Every N days" contains a variable part N that comes from a text field. The design should externalize the whole sentence as a single string with a placeholder, not as three strings concatenated programmatically, because other languages may need to place N elsewhere in the sentence.
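Externalizing the whole sentence lets the placeholder move with the translation. A minimal Java sketch using `java.text.MessageFormat`; the translations are illustrative and kept in a `Map` (Java 9+ `Map.of`) only to stay self-contained, whereas a real product would load them from resource bundles:

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

public class SentencePatternDemo {
    // Illustrative translations: each value is the WHOLE sentence with a {0} placeholder.
    static final Map<String, String> EVERY_N_DAYS = Map.of(
            "en", "Every {0} days",
            "ja", "{0}\u65E5\u3054\u3068"); // Japanese puts the number first

    // Format the sentence for the locale, falling back to English.
    static String everyNDays(Locale locale, int n) {
        String pattern = EVERY_N_DAYS.getOrDefault(locale.getLanguage(), EVERY_N_DAYS.get("en"));
        return new MessageFormat(pattern, locale).format(new Object[] {n});
    }

    // Anti-pattern for comparison: concatenation hard-codes English word order.
    static String everyNDaysConcatenated(int n) {
        return "Every " + n + " days";
    }

    public static void main(String[] args) {
        System.out.println(everyNDays(Locale.ENGLISH, 3));
        System.out.println(everyNDays(Locale.JAPANESE, 3));
        System.out.println(everyNDaysConcatenated(3));
    }
}
```

With concatenation, no translation can put the number before the words; with a pattern, the translator controls the position.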

- I18n-compliant product architecture
  - Consideration of encoding, charset and bidi (bidirectional text)
  - Standard i18n mechanisms or a customized i18n solution for each technology used in the product, e.g. Java i18n, JSP i18n, AJAX i18n, JRuby i18n
- How will locale fall-back be handled?
- How will i18n of dates, calendars, sorting and searching be done?
- How will i18n of error messages and system error messages be done?
- How will i18n of log messages be done?
- Input/output should handle multibyte characters.
- Which features do not need i18n?
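One possible answer to the locale fall-back question, sketched in Java with `Locale.LanguageRange` and `Locale.lookup` (Java 8+), which implement RFC 4647 lookup over Accept-Language style priority lists. The precedence used here (user-preferred locale, then browser-preferred locale, then platform locale, then English) follows the web-application testing slide; the supported-language list and the `resolve` helper are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleFallback {
    // Hypothetical set of languages the product ships resource bundles for.
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.ENGLISH, Locale.forLanguageTag("hi"), Locale.JAPANESE);

    // Try each source in order of precedence; each may hold an Accept-Language style list.
    static Locale resolve(String userPreference, String browserAcceptLanguage, Locale platform) {
        String[] sources = {
            userPreference,
            browserAcceptLanguage,
            platform == null ? null : platform.toLanguageTag()
        };
        for (String ranges : sources) {
            if (ranges == null || ranges.isEmpty()) {
                continue;
            }
            // lookup() truncates "hi-IN" to "hi" when only "hi" is supported; null means no match.
            Locale match = Locale.lookup(Locale.LanguageRange.parse(ranges), SUPPORTED);
            if (match != null) {
                return match;
            }
        }
        return Locale.ENGLISH; // last-resort default
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "hi-IN,hi;q=0.9,en;q=0.8", Locale.US));
        System.out.println(resolve("ja", "hi-IN", Locale.US));
        System.out.println(resolve(null, "fr-FR,fr;q=0.9", Locale.JAPAN));
        System.out.println(resolve(null, "fr", Locale.GERMANY));
    }
}
```

`LanguageRange.parse` throws `IllegalArgumentException` on a malformed header, so a production version would need to catch that and move on to the next source.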

Case Study – Design phase

- How can customers change the language?
- How will messages on the website be shown in the customer's local language?
- How will different encodings be handled?
- How will UI components on the website handle messages in different languages?
- How can users register in their local languages?
- How will i18n of the various technologies be done?

Implementation phase – Interaction with Development Team

- Setting up common conventions
  - Naming convention for localizable files
  - Directory in which to store localizable files
  - How to mark non-localizable text in property files or HTML files
- Educating developers about i18n best practices
  - Most technologies have a standard way of doing i18n
  - Defining a customized i18n solution for technologies that do not have a standard one

Implementation phase – Code review

- The best way to find the earliest and most common i18n issues
- Should be done to catch the following:
  - Message externalization
  - Date and calendar i18n
  - Encoding handling, HTTP content headers
  - Searching technique i18n
  - Sorting technique i18n
  - Input fields should give clear hints about which characters are allowed
  - I18n implementation should be common across modules for the same technology, e.g. Java i18n should be done the same way in every module

Implementation phase – Unit testing

- Incorporate i18n checks into developer-level (unit) testing
  - Lets developers see for themselves if they broke i18n
  - Helps prevent regressions
  - Improves product quality tremendously

Case Study – Implementation phase

- Find any hard-coded messages in the code
- Check the encoding declared in HTML or JSP pages
- Verify how dates are displayed
- Check whether button and drop-down sizes are sufficient for localized text
- Include i18n test cases in developer testing

QA phase – I18n Test Case Writing

- I18n test plan
  - Which build to start i18n testing on
  - How much testing is required
  - Which areas need more i18n focus and which need less
- I18n test case writing and review
  - Review the base team's test cases for functional coverage
  - Test cases should capture the flow of multibyte data through the product
  - Test cases should cover culture-specific issues, e.g. date format changes in various languages
  - Include negative test cases for i18n, e.g. fields that do not accept multibyte data

QA phase – Configuration Matrix

- Configuration matrix for i18n testing
  - Which locales to test
  - Which encodings to test
  - Which platforms to test; install the OS with l10n support
  - Which features to test
- Hint: test features that the base testing team has already tested

QA phase – Cultural Differences

- Language- and culture-specific representation of data
- Example: name and address formats are specific to the language [image: a name and address written in Hindi]
- Address format examples:
  - town, province postalcode: China, India
  - town province postalcode: USA, Canada, Australia
  - postalcode town-province: Brazil
  - postalcode town, province: México

- Symbolism differs from place to place. For example, a check mark means "incorrect" in some parts of the world. Make sure you do not send the wrong message through your use of colors, symbols, examples, etc.
- Be cautious with humor; it doesn't travel well.
- When dealing with graphics, consider how to handle text. Ideally, text is overlaid on a graphic rather than embedded in it. If the text must be part of the graphic, develop it in layers, with the text on a separate layer, so that at translation time it can easily be removed and replaced over complicated backgrounds.
- Example image: "Fast relief, when you need it most!"
- Examples used in the text should be understandable by the audience of the translated version.
- Color also has different connotations in different parts of the world. For example, a black wedding kimono is not as strange in Japan as it may seem to a European. [image]
- Culture-specific order [image]

QA phase – Human Interface

- Input
  - Entering data in different languages: is one keystroke equal to one character for non-English languages?
  - The application should parse multibyte input data and process it accordingly.
  - The operating system allows data entry in various languages.
  - Applications can also provide a built-in input feature, e.g. Orkut.
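One keystroke is not one character, and one character is not one byte. A common defect is a field or buffer limited by bytes that cuts a multibyte character in half. A minimal Java sketch, assuming the limit is expressed in UTF-8 bytes; `truncateUtf8` is a hypothetical helper that trims whole code points only (it does not keep combining sequences such as Hindi conjuncts together):

```java
import java.nio.charset.StandardCharsets;

public class InputLengthChecks {
    // Bytes needed to encode one code point in UTF-8.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Hypothetical helper: longest prefix of s whose UTF-8 form fits in maxBytes,
    // never cutting a code point in half.
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        String name = "\u65E5\u672C\u8A9E"; // Japanese "nihongo", 3 characters
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(name.length() + " chars, " + utf8.length + " UTF-8 bytes");
        System.out.println(truncateUtf8(name, 7)); // keeps the first two characters (6 bytes)
        // Cutting the byte array blindly leaves a partial sequence, decoded as U+FFFD.
        System.out.println(new String(utf8, 0, 7, StandardCharsets.UTF_8));
    }
}
```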

QA phase – Human Interface

- Output
  - Displaying data in different languages: what you enter, what is stored in memory and what gets displayed; is this all a one-to-one mapping?
  - No. It becomes complex and includes many-to-one mappings; text rendering, reordering and string layout all become complex.
  - One character is not equal to one glyph.
  - Example: languages like Hindi use Complex Text Layout (CTL), where several characters can combine into one glyph and one character can be drawn with several glyphs.

QA phase – Text Processing

- What are the considerations when you have to process text in different languages?
  - Text boundaries: character/word/sentence/line boundaries
    - Chinese and Japanese do not put spaces between words.
    - A user-perceived character in a CTL script may consist of multiple code points.
  - Text input/output, encoding conversion
    - Text transferred between applications or external files should have a consistent encoding; otherwise encoding conversion is involved.
  - Text layout and direction: vertical and bidi
    - Some Asian countries still use vertical writing.
    - Arabic and Hebrew use a bidirectional writing system.
  - Text sorting and searching
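Two of these considerations can be checked directly with the JDK: character boundaries (`java.text.BreakIterator`) and locale-sensitive sorting (`java.text.Collator`). A minimal Java sketch; the Hindi word and the word list are illustrative samples. The exact character count for the Hindi word depends on the JDK's segmentation rules (typically 4, possibly 3 where newer Unicode conjunct rules apply), but it is always fewer than its 6 code points:

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TextProcessingChecks {
    // Count user-perceived characters using the JDK's character BreakIterator.
    static int countUserCharacters(String text, Locale locale) {
        BreakIterator it = BreakIterator.getCharacterInstance(locale);
        it.setText(text);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    // Sort with String.compareTo, i.e. by UTF-16 code unit values.
    static List<String> sortBinary(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null);
        return copy;
    }

    // Sort with the collation rules for the given locale.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"; // Hindi "namaste"
        System.out.println("UTF-16 units: " + namaste.length());
        System.out.println("Code points:  " + namaste.codePointCount(0, namaste.length()));
        System.out.println("Characters:   "
                + countUserCharacters(namaste, Locale.forLanguageTag("hi-IN")));

        List<String> words = Arrays.asList("zebra", "\u00E4pfel", "apfel");
        System.out.println(sortBinary(words));                   // [apfel, zebra, äpfel]
        System.out.println(sortForLocale(words, Locale.GERMAN)); // [apfel, äpfel, zebra]
    }
}
```

A binary sort puts "äpfel" after "zebra" because U+00E4 is numerically larger than "z"; the collator treats ä as a variant of a, which is what German users expect.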

QA phase – Presentation Matters

- Vertical text should be displayed correctly for the languages that use it.
  - Text proceeds downwards syllable by syllable, not letter by letter. [image]
- Right-to-left layout
  - Example: the BBC site in a left-to-right and a right-to-left language [image]
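Before checking right-to-left layout visually, a test can detect whether a string needs bidirectional processing at all and how it splits into directional runs. A minimal Java sketch using `java.text.Bidi`; the Hebrew sample and the helper names `analyze` and `needsBidi` are illustrative:

```java
import java.text.Bidi;

public class BidiChecks {
    // Analyze a paragraph; base direction comes from its first strong character (LTR if none).
    static Bidi analyze(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
    }

    // True if the text contains characters that require bidi analysis (e.g. Hebrew, Arabic).
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    public static void main(String[] args) {
        String mixed = "Hello \u05E9\u05DC\u05D5\u05DD"; // "Hello " + Hebrew "shalom"
        Bidi bidi = analyze(mixed);
        System.out.println("needs bidi: " + needsBidi(mixed));
        System.out.println("mixed: " + bidi.isMixed() + ", base LTR: " + bidi.baseIsLeftToRight());
        for (int r = 0; r < bidi.getRunCount(); r++) {
            System.out.printf("run %d: [%d, %d) level %d%n",
                    r, bidi.getRunStart(r), bidi.getRunLimit(r), bidi.getRunLevel(r));
        }
        System.out.println("all RTL: " + analyze("\u05E9\u05DC\u05D5\u05DD").isRightToLeft());
    }
}
```

Even embedding levels mark left-to-right runs and odd levels right-to-left runs; a layout test would then confirm the screen shows the Hebrew run in reverse visual order.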

QA phase – Format

- Data formatting differs across languages and regions.
  - Date/time formats, calendars
    - Date/time formats differ across languages and countries.
    - Some countries use a local calendar as their official calendar.
  - Number/currency formats
    - Show numbers in the format of the user's preferred language.
    - Such numbers should be parsed with the number parser for the user's preferred language.

QA phase – Message

- What are the considerations when dealing with messages in your application?
  - Externalizing UI messages and error messages from the program into resource files for localization
  - Moving static content (help files, docs) into language-specific directories
  - Message formatting: when a message contains more than one placeholder, remember that translated messages may reorder the placeholders.
  - Message encoding: messages should be encoded in the encoding that the application expects.
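Three of these checks can be sketched in Java: a number formatted for one locale and parsed back with the same locale, a date converted to a non-Gregorian calendar, and a UTF-8 message file loaded with and without the right decoding. The fare value, date, Hindi message and helper names are illustrative. `ThaiBuddhistDate` is used because `java.time` ships Thai Buddhist, Hijrah, Japanese and Minguo chronologies but, as far as the JDK goes, no Vikram Samvat or Saka calendar. `Properties.load(InputStream)` always decodes ISO-8859-1, so non-Latin text must be read through a UTF-8 `Reader` (since Java 9, `PropertyResourceBundle` reads UTF-8 files by default):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.text.NumberFormat;
import java.text.ParseException;
import java.time.LocalDate;
import java.time.chrono.ThaiBuddhistDate;
import java.time.temporal.ChronoField;
import java.util.Locale;
import java.util.Properties;

public class FormatAndMessageChecks {
    // Format a number for display in the user's locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Parse user input with the same locale that was used to display it.
    static double parseNumber(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable number: " + text, e);
        }
    }

    // Year of a date in the Thai Buddhist calendar (ISO year + 543).
    static int thaiBuddhistYear(LocalDate date) {
        return ThaiBuddhistDate.from(date).get(ChronoField.YEAR);
    }

    // Load a .properties payload as raw bytes (ISO-8859-1) or through a UTF-8 reader.
    static Properties loadMessages(byte[] data, boolean asUtf8) {
        Properties p = new Properties();
        try {
            if (asUtf8) {
                p.load(new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
            } else {
                p.load(new ByteArrayInputStream(data));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        for (Locale l : new Locale[] {Locale.US, Locale.GERMANY}) {
            String shown = formatNumber(1234567.5, l);
            System.out.println(l + ": " + shown + " -> " + parseNumber(shown, l));
        }
        // The same input means different numbers in different locales.
        System.out.println(parseNumber("1.234", Locale.US));      // 1.234
        System.out.println(parseNumber("1.234", Locale.GERMANY)); // 1234.0
        System.out.println(thaiBuddhistYear(LocalDate.of(2008, 3, 1))); // 2551

        // Illustrative Hindi message file saved as UTF-8 ("svagat" = welcome).
        byte[] file = "welcome=\u0938\u094D\u0935\u093E\u0917\u0924\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadMessages(file, false).getProperty("welcome")); // garbled
        System.out.println(loadMessages(file, true).getProperty("welcome"));  // correct
    }
}
```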

QA phase – Pseudo-localization

- Can be used when the product is yet to be localized
- Create a localized resource bundle by adding a localized character at the beginning and end of each English message
- An effective way of finding hard-coded strings: any text that appears in the UI without the added characters was not read from the resource bundle
- Example English resource bundle, MyMessages.properties:
    welcome=Welcome to I18n World
    startProcess=Start the process
- Hindi pseudo-localized resource bundle, MyMessages_hi.properties, with a Hindi character (here अ) added at the beginning and end of each message:
    welcome=अWelcome to I18n Worldअ
    startProcess=अStart the processअ

Case Study – QA phase

- Access the website in a non-English language
- Register as a non-English user
- Book a ticket for a non-English passenger
- Verify that the site displays non-English characters correctly
- Ensure that the website gives correct responses to non-English input
- Check whether the website appears in the user's language
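The pseudo-localization step described in the QA phase can be automated. A minimal Java sketch that wraps every English message in a marker character and flags UI text that lacks the markers; the marker (Devanagari letter अ, U+0905) and the helper names are arbitrary choices for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeSet;

public class PseudoLocalizer {
    // Marker wrapped around every message (Devanagari letter A, chosen arbitrarily).
    static final String MARK = "\u0905";

    // Build a pseudo-localized bundle from the English one.
    static Properties pseudoLocalize(Properties english) {
        Properties pseudo = new Properties();
        for (String key : english.stringPropertyNames()) {
            pseudo.setProperty(key, MARK + english.getProperty(key) + MARK);
        }
        return pseudo;
    }

    // Text shown in the running UI without the markers did not come from the bundle.
    static boolean looksHardCoded(String uiText) {
        return !(uiText.startsWith(MARK) && uiText.endsWith(MARK));
    }

    // Parse .properties content held in a string.
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        Properties english = parse("welcome=Welcome to I18n World\nstartProcess=Start the process\n");
        Properties pseudo = pseudoLocalize(english);
        for (String key : new TreeSet<>(pseudo.stringPropertyNames())) {
            System.out.println(key + "=" + pseudo.getProperty(key));
        }
        System.out.println(looksHardCoded(pseudo.getProperty("welcome"))); // false
        System.out.println(looksHardCoded("Start the process"));           // true
    }
}
```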

Documentation

- How to install the product in a non-English environment
- How to configure features in a non-English environment
- How to add a new language to the product
- Verify that i18n-specific hints and processes are documented correctly
- Hints to translators about culture-specific images in the documentation
- Case study

Testing Standalone Applications

- Setting up a localized environment
  - Operating system with l10n support
  - Starting the product in a non-English environment
- Language selection
- Application testing with multibyte data
- Quiz

Testing Web Applications

- Setting up a localized environment
  - Operating system with l10n support
  - Starting the product in a non-English environment
- Browser preferred language
- Content negotiation
- Precedence of language (user-preferred locale, browser-preferred locale, platform locale)
- Application testing with multibyte data
- Quiz

Automation Testing

- Automation framework
  - The automation tool should support multibyte data
  - Leverage the core testing team's automation
- Scope of testing to be automated
  - Regression testing
- Demo

Advanced I18n Testing

- Speech based
  - Higher recognition accuracy can be obtained by tailoring voice input to regional dialects.
  - Voice output in the wrong dialect can make an application sound "foreign".
  - Applications that support regional dialects have a better impact.
- Indic and bidi specific issues
  - Titles and names
  - Different ways of expressing currency
  - Presentation/styling issues
  - Calendars: Vikram Samvat, Saka, Hijri/Islamic

Advanced I18n Testing – Internationalized Domain Names (IDN)

- There is a lot of demand for non-ASCII domain names, e.g. http://räksmörgås.josefsson.org/mål/franzén.html (non-ASCII domain name and path)

New standards have come out of the IETF recently that make this possible, and W3C staff contributed to their development. There are still some hurdles to overcome with regard to security and deployment, but it is possible to use these now. For more information see http://www.w3.org/International/articles/idn-and-iri/ .
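Before a non-ASCII address goes on the wire, the host name is converted to its ASCII-compatible (Punycode) form and the path is percent-encoded as UTF-8. A minimal Java sketch using `java.net.IDN` (which implements IDNA as defined in RFC 3490) and `java.net.URI`; the helper names are illustrative:

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

public class IdnDemo {
    // Unicode host name -> ASCII-compatible encoding ("xn--..." labels).
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    // Build an ASCII-only URI: Punycode host, UTF-8 percent-encoded path.
    static String toAsciiUri(String host, String path) {
        try {
            return new URI("http", toAsciiHost(host), path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String host = "r\u00E4ksm\u00F6rg\u00E5s.josefsson.org";
        String path = "/m\u00E5l/franz\u00E9n.html";
        String ascii = toAsciiHost(host);
        System.out.println(ascii);
        System.out.println(toAsciiUri(host, path));
        System.out.println("round trip ok: " + IDN.toUnicode(ascii).equals(host));
    }
}
```

A useful automated check is the round trip: converting the ASCII form back with `IDN.toUnicode` should give the original host name.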

References

- W3C Internationalization: http://www.w3.org/International/
- Sun Software Globalization: http://developers.sun.com/techtopics/global/
- Software Globalization – Architecture, Design, Testing: http://developers.sun.com/techtopics/global/technology/arch/
- Software Globalization – JES: http://developers.sun.com/techtopics/global/products_platforms/jes/
- Sun Software Product Internationalization Taxonomy: http://developers.sun.com/dev/gadc/des_dev/i18ntaxonomy
- Subscribe to the Software Globalization Newsletter: http://developers.sun.com/dev/gadc/subscribe/index.html
- Technical articles on Java Internationalization: http://java.sun.com/developer/technicalArticles/Intl/
- Java Internationalization Tutorial: http://java.sun.com/docs/books/tutorial/i18n/index.html
- The Java Tutorial's Weblog: http://blogs.sun.com/thejavatutorials/

Last but not least!

"Maintain that rapport with your development team."

Q/A

Thank you

Mahipalsinh Rana