Internationalization

Content authoring that supports worldwide use of the Web
Richard Ishida

Objectives

Discuss various aspects of content authoring that support universal access to the Web.

You will understand:

  • how to declare and use characters & character encodings in X/HTML & CSS
  • how to declare the language of the document or parts of the document
  • how to use Chinese or other scripts in URIs according to the latest standards

You will see some of the new CSS style capabilities that may soon be available for Asian languages.

Outline

  • Character sets & encoding
      • Character set vs. character encoding
      • The Document Character Set
      • Choosing an encoding
      • Serving HTML & XHTML
      • Declaring the document encoding
      • Entities & Numeric Character References
      • Care & feeding of characters
  • Identifying language
  • IDN & IRIs
  • East Asian typography

Character set vs. character encoding

An important initial distinction:

  • Character set: the set of atomic text elements you will use for a particular purpose.
  • Character encoding: the way these abstract characters are mapped to numbers for manipulation in a computer.

A single character set, such as Unicode, may have more than one character encoding.

    Character     A            א            好           𣎴
    Code point    U+0041       U+05D0       U+597D       U+233B4
    UTF-8         41           D7 90        E5 A5 BD     F0 A3 8E B4
    UTF-16        00 41        05 D0        59 7D        D8 4C DF B4
    UTF-32        00 00 00 41  00 00 05 D0  00 00 59 7D  00 02 33 B4

The Document Character Set

What is the Document Character Set?

  • the logical model that describes how XML and HTML are processed
  • for XML and HTML (from version 4.0), it is the Universal Character Set (UCS), defined by both the ISO/IEC 10646 and Unicode standards
  • it does not mean that all HTML and XML documents have to be encoded as Unicode!

What it does mean:

  • documents can only contain characters defined by Unicode
  • any encoding can be used for your document as long as it is properly declared and is a subset of the Unicode repertoire
  • values of numeric character references (such as &#x01F5; and &#501; for ǵ) are interpreted as Unicode characters, no matter what encoding you use for your document

See GEO FAQ: Document character set
http://www.w3.org/International/questions/qa-doc-charset.html

Choosing an encoding

Consider using Unicode

  • supports many languages, enabling the use of a single encoding across all pages and forms, regardless of language
  • eliminates the need for server-side logic to determine the character encoding for each page served or each incoming form submission
  • allows many more languages to be mixed on a single page than almost any other choice

Although there are other multi-script approaches (such as ISO-2022 and GB18030), Unicode generally provides the best combination of user agent and script support.
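As a rough illustration of one character set having several encodings, the sketch below recomputes the byte values in the code point table for the same four code points, and then checks that the two numeric character references for ǵ resolve to the same Unicode character. The helper names (`code_point`, `hex_bytes`, `encoding_table`) are made up for this example and are not part of the original slides; only Python standard-library calls are used.

```python
import html

# Sample characters taken from the code point table:
# U+0041, U+05D0, U+597D and the supplementary-plane character U+233B4.
SAMPLES = ["\u0041", "\u05D0", "\u597D", "\U000233B4"]


def code_point(ch: str) -> str:
    """Format a single character as a U+XXXX code point label."""
    return f"U+{ord(ch):04X}"


def hex_bytes(text: str, encoding: str) -> str:
    """Encode text and return its bytes as space-separated uppercase hex."""
    return text.encode(encoding).hex(" ").upper()


def encoding_table(chars):
    """Return one row per character: code point, UTF-8, UTF-16, UTF-32 bytes."""
    # The big-endian codecs are used so that no byte order mark is emitted.
    return [
        (
            code_point(ch),
            hex_bytes(ch, "utf-8"),
            hex_bytes(ch, "utf-16-be"),
            hex_bytes(ch, "utf-32-be"),
        )
        for ch in chars
    ]


for row in encoding_table(SAMPLES):
    print(" | ".join(row))

# Numeric character references always name Unicode code points,
# whatever encoding the document itself is stored in.
print(html.unescape("&#x01F5;") == html.unescape("&#501;") == "\u01F5")
```

The same abstract character yields a different byte sequence under each encoding, while the code point stays fixed. The example also shows Latin, Hebrew and Chinese text all fitting in a single Unicode encoding, which is the practical argument for choosing one.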
If you don't use Unicode

  • select an encoding that maximizes the opportunity to directly represent characters and minimizes the need to represent characters by character escapes
  • select commonly supported encodings & check that user agents adequately support the encoding selected
  • consider a solution that minimizes complexity when dealing with multiple languages and scripts
  • note that support for a given encoding does not necessarily imply support for all writing systems that the encoding supports

Serving HTML & XHTML

XHTML & MIME types

    Document   Served as                                          Handled as
    HTML       text/html                                          HTML
    XHTML      text/html + compatibility guidelines               HTML
    XHTML      application/xhtml+xml, application/xml, text/xml   XML

For XHTML served as text/html, use the compatibility guidelines in the XHTML Spec, Appendix C!

'Standards' vs 'Quirks' modes

Test file for Standards Mode:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>xhtml document</title>
    <style type="text/css">
    body { background: white; color: black; font-family: arial, sans-serif; font-size: 40px; }
    p { font-size: 50%; }
    h1 { font-size: 24px; }
    </style>
    </head>
    <body>
    <h1>Test file for Standards Mode</h1>
    <div style="margin: 50px; width: 300px; padding: 100px; border: 10px solid teal;">
    <p> Here is some text in a p in a div. </p>
    </div>
    <table border="1">
    <tr><td><p>Here is some text...</p></td>
    <td><p>...in a p tag</p></td>
    </tr>
    <tr><td>Here is some ...</td>
    <td>... that's not.</td>
    </tr>
    </table>
    </body>
    </html>

The same test file with an XML declaration added as its first line (the rest of the file is unchanged):

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Summary of assumptions & recommendations

  • we assume standards mode and relatively up-to-date user agents
  • use XHTML where possible
  • XHTML served as XML is still not widely supported
  • XHTML served as XML should be served as application/xhtml+xml
  • we assume that some people will not want to use the XML declaration, i.e. <?xml version="1.0" encoding="utf-8"?>

Declaring the document encoding

Basic scenarios

The encoding can be declared in three places: the HTTP header, the XML declaration (<?xml ..) and the <meta> element (<meta ..).
The serving scenarios considered are HTML, XHTML (text/html) and XHTML (XML).

Where appropriate, declare the page's character encoding by setting the charset parameter in the HTTP Content-Type header.

  + user agents can easily find the information
  + highest priority in case of conflict, so should be used where transcoding is done by the server
  - more difficult for content authors to change the setting, especially when dealing with an ISP
  - server settings may get out of sync with the document
  - doesn't cater for documents read from CD or hard disk
  - may not facilitate processing, e.g. XSLT or translation

How to set the header:

  • method depends on the server
  • server may have default settings
  • users may be able to override default settings

E.g. .htaccess files in Apache:

    AddType 'text/html; charset=UTF-8' html

    <Files ~ "events\.html">
    ForceType 'text/html; charset=UTF-8'
    </Files>

For XHTML served as text/html, where practical use an XML declaration with an encoding attribute.

  + useful when editing or processing the file as XML, e.g. using XSLT
  + helps developers, testers, or translation production managers who want to perform a visual check of a document
  + allows the document to be read correctly when not on the server
  - knocks Internet Explorer documents into Quirks mode
  - not actually needed for HTML documents

    <?xml version="1.0" encoding="UTF-8"?>

Character set name:

  • not required for UTF-8 or UTF-16, but useful anyway
  • find a name at http://www.iana.org/assignments/character-sets
  • the XHTML spec says you should use the preferred name when aliases are available
  • avoid using unregistered names (i.e. x-… )

For XHTML served as application/xhtml+xml, always use an XML declaration with an encoding attribute.

  + useful when processing the file as XML, e.g. using XSLT
  + developers, testers, or translation production managers may want to perform a visual check of a document

For HTML documents and XHTML documents served as text/html, always use the <meta> element to explicitly declare the document's character encoding.

    <meta http-equiv="Content-type" content="text/html;charset=UTF-8" />

Character set name:

  • required for all encodings, including UTF-8 or UTF-16
  • the XHTML spec says you should put it as near as possible to the top of the file (ie.
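To make the three declaration points concrete, here is a minimal sketch of how a tool might read the declared encoding of a page. It is an illustration only: the function names are hypothetical, the regular expressions are deliberately simplistic stand-ins for a real HTTP or HTML parser, and the lookup order is partly an assumption. The slides state that the HTTP header has the highest priority in case of conflict; checking the XML declaration before the `<meta>` element is assumed here and not taken from the source.

```python
import re
from typing import Optional

# Matches charset=NAME, with or without quotes around the name.
CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE)
# Matches an XML declaration with an encoding attribute at the start of the file.
XML_DECL_RE = re.compile(
    r"<\?xml[^>]*\bencoding\s*=\s*[\"']([A-Za-z0-9._:-]+)[\"']", re.IGNORECASE
)
# Matches a <meta http-equiv="Content-Type" ...> tag, attributes in any order.
META_RE = re.compile(
    r"<meta[^>]*http-equiv\s*=\s*[\"']content-type[\"'][^>]*>", re.IGNORECASE
)


def from_http_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    match = CHARSET_RE.search(content_type)
    return match.group(1).lower() if match else None


def from_xml_declaration(document: str) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, if any."""
    match = XML_DECL_RE.match(document.lstrip("\ufeff"))
    return match.group(1).lower() if match else None


def from_meta(document: str) -> Optional[str]:
    """Return the charset named in a Content-Type <meta> element, if any."""
    tag = META_RE.search(document)
    if not tag:
        return None
    match = CHARSET_RE.search(tag.group(0))
    return match.group(1).lower() if match else None


def declared_encoding(content_type: Optional[str], document: str) -> Optional[str]:
    """HTTP header first, then XML declaration, then <meta> (assumed order)."""
    return (
        from_http_header(content_type)
        or from_xml_declaration(document)
        or from_meta(document)
    )


page = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html><head><meta http-equiv="Content-type" '
    'content="text/html;charset=iso-8859-1" /></head></html>'
)
print(declared_encoding("text/html; charset=UTF-8", page))
print(declared_encoding(None, page))
```

The sample page deliberately carries conflicting in-document declarations so that the effect of the lookup order is visible. A file read from disk has no HTTP header at all, which is why the in-document declarations still matter.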
