Pango: Internationalized Text Handling

Owen Taylor
Red Hat, Inc.
http://people.redhat.com/otaylor

Abstract

Pango is a library for laying out and displaying internationalized text. It handles almost every writing system in the world, and can work on top of multiple different display systems, including traditional X fonts and client-side OpenType fonts. Pango is used for all text handling in the soon-to-be-released version 2.0 of the commonly used GTK+ widget toolkit.

1 Introduction

Although technically inclined users have historically been willing to learn English in order to use free software, free software can gain acceptance in the wider community of users only if it is able both to display a user interface in the user's native language and to let the user manipulate text in that language.

A number of separate areas must be addressed when making a program suitable for handling text in multiple languages (a process known as internationalization). First, the user must be able to input text in their native language; this may simply be a matter of changing the keyboard mapping, but it may also be a more complicated process involving dictionary lookup by separate programs known as input methods. Then, the program must preserve the structure of the text during any manipulation it performs. Common assumptions, such as assuming each character is one byte, may mangle a stream of internationalized characters. The program must also pick appropriate translations of any messages it displays to the user, and finally, it must be able to render strings in the user's native language correctly. This paper will concentrate on the last aspect, rendering.

Rendering internationalized text is often assumed to be simply a matter of fonts. All languages are thought to be like English, where it is simply a matter of picking the right symbols from the font and displaying them in the order they occur in the memory representation. In this view, internationalized rendering is a simple matter: all you need is a character set that includes all the characters in the world's languages (Unicode [Unicode] fits this role) and a font for that character set. Of course, things are not that simple.

First, a number of languages, notably Arabic and Hebrew, are written from right to left instead of from left to right, so the rendering process needs to be able to deal with that ordering. In fact, text in these languages usually consists of a mix of right-to-left text and left-to-right text (numbers, foreign words). So a complicated reordering process is needed between the in-memory representation and the actual drawing process. Figure 1 shows a schematic representation of the reordering process.

Figure 1: Transformations while displaying complex-text languages. Text in memory: "he said: GO TO 123 MAPLE STREET". Text as displayed: "he said: TEERTS ELPAM 123 OT OG" (runs: 1: left-to-right, 2: right-to-left, 3: left-to-right).

Arabic also introduces some other complications. The shape of each character is different depending on whether it occurs at the beginning of a word, in the middle of a word, at the end of a word, or by itself, so the right glyph needs to be selected in each context.

Another group of languages that needs special attention is the languages of South Asia, often known as complex text languages. In these languages, the characters making up a syllable interact in complex ways to produce the final rendered form. This can involve reordering, combining characters to make ligatures that look very different from the original characters, and stacking multiple glyphs vertically on top of each other. A group of interacting characters in one of these languages is known as a cluster (see Figure 2).

Figure 2: Transformations while displaying complex-text languages. Cluster formation: T + RA + I becomes TRA + I; reordering: TRA + I becomes TRI.

Algorithms such as line breaking also need knowledge of a language. For Western languages, a simple job of line breaking can be done by breaking on white space, but other languages, such as the languages of East Asia or Thai, are written without any white space at all, so linguistic information is needed. For Thai, correct line breaking actually requires first splitting the text into words using a dictionary.

So, to properly render many of the world's languages, we have to deal with a rendering process that is considerably more complex than just slapping down glyphs in the order that characters appear in memory. It is also interesting to note that many of the same issues, such as ligatures, alternate glyph selection, and repositioning of characters (kerning), are important when doing a high-quality job of displaying English or Western European languages. So if we can properly handle these issues for internationalization, we gain, as a side benefit, a much higher level of typographic sophistication.

Clearly, you don't want to handle all of these details in your application. To have a system where all these details, and many more, are properly handled for each language, we need to move to a higher level of abstraction: a system where the application programmer simply presents a chunk of text, and all the details of wrapping lines, laying out the text, choosing glyphs, and rendering are handled for the programmer. The Pango library has been designed for this purpose; it encapsulates all the necessary detailed knowledge about various languages and scripts and presents the application programmer with a generic model of lines and paragraphs.

2 Architecture

There are several principles that underlie the design of Pango. The first is that Unicode is used as a common character set throughout the system. Although, as mentioned above, supporting Unicode is not enough to handle internationalized rendering, by standardizing on Unicode, and on UTF-8 as the encoding of Unicode, there is no need for the application, the toolkit, Pango, and Pango's language-specific modules to negotiate the encoding to be used.

The second principle of Pango is modularity. The code specific to each language is contained in a separate, dynamically loaded module. This has several benefits. First, it reduces the amount of code contained in the main library. Second, it allows modules for specific languages to be developed and distributed by teams familiar with those languages, instead of tying the development of support for a particular language to the release cycle of the core system.

The third principle of Pango is rendering-system independence. The same tasks need to be performed whether using X fonts to draw to the screen, drawing into an off-screen buffer in some other fashion, or printing to paper. Pango provides an API that can be used for all of these purposes to format the text; only in the final rendering step do rendering-system-dependent calls need to be used. Because some of the intermediate steps (for instance, positioning glyphs with respect to each other) do depend on knowledge from the rendering system, these portions need to be rendering-system dependent. So each language module is split into pieces: a rendering-system-independent module, called the language module, that knows how to do tasks such as finding the permissible breaks within a line; and a rendering-system-dependent module for each supported rendering system, called the shaper module.

Figure 3: Architecture of Pango. Components shown: Application, Toolkit, Pango (Pango Core), Arabic language module with X and PS shapers, X rendering backend, PS rendering backend, Xlib, X Server, Printer.

At the lowest level, the process of rendering text consists of a number of steps:

• Itemization. In this step, the input Unicode string is analyzed and broken into items, each of which is handled by a single language module and has a single direction (left-to-right or right-to-left). If the application has applied additional markup on the string to control things like style or font size, it may further subdivide each item into pieces that have …

… used to convert characters to glyphs. Until we have done this conversion, we cannot compute the on-screen size of the text we need to draw, which the line-breaking algorithm requires.

• Line Breaking. The results of shaping and boundary resolution are used to choose where to break lines that need to be wrapped. If breaking lines involves dividing items, then we'll need to call pango_shape() again to do final glyph selection and positioning, since breaking words may require different glyphs to be selected. (The line-breaking algorithm described here is a simple one that works for most uses. High-quality text display, as for publishing, may require a more sophisticated algorithm that globally optimizes for the best line-break positions, taking into account changes in the glyphs that occur during shaping.)

• Rendering. The result of the shaping and line-breaking process is a set of glyph strings, each a list of glyphs from the font along with positioning information for each glyph. Since rendering is specific to the type of fonts being used and how they are being rendered, the core Pango library doesn't handle rendering; the rendering-system-specific libraries included with Pango, such as libpangox for X fonts and libpangoft2 for TrueType and PostScript fonts via the FreeType library, include basic rendering routines, or applications can do their own rendering.

Since we claim rendering-system independence, we should briefly mention how we handle font information. Pango has an abstract class PangoFont, which represents a font in some rendering system.

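The introduction notes that Thai line breaking must first split text into words with a dictionary, and the Line Breaking step chooses break positions from the measured size of the text. The following sketch combines both ideas in the simplest possible way, under clearly stated assumptions: it is not Pango's algorithm; it uses greedy longest-match segmentation (a well-known but naive heuristic; practical segmenters handle ambiguous splits more carefully) against a tiny made-up dictionary, with Latin text written without spaces standing in for Thai; and it measures a word's width as its character count instead of the width of its shaped glyphs. The names `segment` and `fill_lines` are hypothetical.

```python
# Toy dictionary-based line breaking (hypothetical; not Pango's code).
# Assumptions: Latin text written without spaces stands in for Thai, the
# dictionary is made up, and width defaults to character count rather than
# the width of shaped glyphs.
from typing import Callable, Iterable, List, Set


def segment(text: str, dictionary: Set[str]) -> List[str]:
    """Split unspaced text into words by greedy longest match.

    Characters not covered by any dictionary word become one-character
    words, so the loop always makes progress.
    """
    longest = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        for length in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def fill_lines(words: Iterable[str], max_width: int,
               width: Callable[[str], int] = len) -> List[str]:
    """Greedy line filling: start a new line when the next word won't fit.

    Word boundaries are the only break opportunities; a word wider than
    max_width is placed on a line of its own rather than split. Words are
    joined without spaces, since the stand-in script is written without them.
    """
    lines: List[str] = []
    current: List[str] = []
    used = 0
    for word in words:
        w = width(word)
        if current and used + w > max_width:
            lines.append("".join(current))
            current, used = [], 0
        current.append(word)
        used += w
    if current:
        lines.append("".join(current))
    return lines


words = segment("thequickbrownfox", {"the", "quick", "brown", "fox"})
print(words)                  # ['the', 'quick', 'brown', 'fox']
print(fill_lines(words, 10))  # ['thequick', 'brownfox']
```

The `width` parameter is where real shaped sizes would come in (for example, a function summing per-glyph advances), which reflects the point that the on-screen size of text is only known after characters have been converted to glyphs.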