Web Internationalization – Standards and Practice

Objectives

• Describe the standards that define the Web Internationalization architecture & principles for I18N on the web Standards and Practice • Scope limited to markup languages • Provide practical advice for working with international data on the web, including the design and implementation of Internationalization and Conference 34 multilingual web sites and localization

Tex Texin, XenCraft considerations Copyright © 2002-2010 Tex Texin and Yves Savourel. • Be introductory level

Web Internationalization – Standards and Practice Slide 2

Legend For This Presentation Web Internationalization Agenda

Icons used to indicate current product support: • Emphasis on Character Processing • Updates for HTML5 Internet Firefox 3.6 Opera 10 Explorer 8 This presentation and part 2 and example code are Supported: available at:

Partially supported: www.xencraft.com/training/webstandards.html

Not supported: Richard Ishida and W3C test I18n features for Caution numerous browsers and versions (X)HTML: Highlights a note for users or developers to be careful. www.w3.org/International/tests/

Web Internationalization Slide 3 Web Internationalization – Standards and Practice Slide 4

Internationalization and Unicode Conference 1 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Web I18n Part 1- Character Processing A Simple HTML Example Page

Character Encodings Negotiation Reference Processing Model Character Escaping Unicode in Markup Normalization Identifiers

Web Internationalization – Standards and Practice Slide 6 Web Internationalization Slide 7

A Simple HTML Example Page A Simple HTML Example Page

Here is how the same HTML looks in Japan Here is how the same HTML looks in Japan The browser has no information about the encoding of the web page. It uses a default value which in this case, is very wrong and even confuses the markup (see beauté).

Web Internationalization Slide 8 Web Internationalization Slide 9

Internationalization and Unicode Conference 2 Tex Texin, XenCraft Web Internationalization – Standards and Practice

A Simple HTML Example Page Character Encodings

Here is how the same HTML looks in Japan Encoding disagreement is one problem for text. We also consider the following problems and solutions. Some of the problems may (See: Character Model for the World Wide Web www.w3.org/TR/charmod/ ) not be obvious to the reader. Problem Solution Changing the euro symbol Encoding disagreement Encoding negotiation to a bullet, might cause a Encoding diversity Reference processing model Encoding limitations Character escaping significant financial error. Unicode vs. markup Markup preferred on the web

String matching Early uniform normalization String indexing Character counting guidelines

Web Internationalization Slide 10 Web Internationalization Slide 11

Character Encodings Start by identifying needed symbols

Use Example Symbols First though: What are character encodings? Letters ABCDEF…abcdef…ÂÃÄ Æ ß Ñ ACR ≠ A + ˚ : . , ; ? ! ( ) - … Numeric, 0 1 2 3 4 5 6 7 8 9 + − ± * × / † < = Arithmetic > % ‰ # ¼ ½ ¾ ⅓ ⅔ ⅣⅲⅫ CCS U+233B4 U+2260 U+0041 U+030A $ ¢ £ ¥ € ₣ ₤ ₧ ₫ ₪₩ © ® ™ ℅ ° Business ℃ ℉ ∞ CEF D84C DFB4 2260 0041 030A Mathematics ∀ ∂ ∃ ∇ ∈ ∉ ∋∏ ∑ ≦ ≥ ≡ ≠ ≈ ⊆ ⊇ Other applications: ¶ § ♀ ♂ ♠ ♣ ♥ ♦ ← ↑ ↔ ↵ CES D8 4C DF B4 22 60 00 41 03 0A Proofreading, ♧ ♨ ♬ ♭♯ Å ☞ ☜ games, music… The Character Encoding Model – Unicode Tech. Report 17 Web Internationalization Slide 12 Web Internationalization Slide 13

Internationalization and Unicode Conference 3 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Character Encodings Character Encodings

ACR = Abstract Character Repertoire CCS = Coded Character Set

ACR ≠ A + ˚ ACR ≠ A + ˚

CCS U+233B4 U+2260 U+0041 U+030A Maps each character to a non-negative unique number. • The set of characters you need to represent – Note this example uses numbers. – (aka Character Set). – The “U+” indicates use of Unicode‟s numbering. • Characters may be composable. E.g. Å = A + ˚ – The grapheme Å consists of two characters A + ˚ – Unicode calls these “Unicode Scalar Values”

Web Internationalization Slide 14 Web Internationalization Slide 15

Character Encodings Character Encodings

CEF = Character Encoding Form CEF = Character Encoding Form

ACR ≠ A + ˚ ACR ≠ A + ˚

CCS U+233B4 U+2260 U+0041 U+030A CCS U+233B4 U+2260 U+0041 U+030A

CEF D84C DFB4 2260 0041 030A CEF D84C DFB4 2260 0041 030A 16-bit 16-bit

CEF F0 A3 8E B4 E2 89 A0 41 CC 8A Note the relationship between the CEFs is not so simple. 8-bit Map CCS to fixed width units (e.g. 32, 16, or 8 –bit) Map CCS to fixed width units (e.g. 32, 16, or 8 –bit) Note the relationship between the CEFs is not so simple. Web Internationalization Slide 16 Web Internationalization Slide 17

Internationalization and Unicode Conference 4 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Character Encodings Character Encodings

CES = Character Encoding Scheme • Many character sets exist and in popular use • Many encoding schemes, even for 1 character ACR ≠ A + ˚ set ISO 8859-1 ≈ IBM 850 ISO-2022-JP, Shift_JIS, EUC-JP (JIS X-0208-1997) CCS U+233B4 U+2260 U+0041 U+030A UTF-8 = UTF-16 = UTF-32

CEF 16-bit D84C DFB4 2260 0041 030A • Given just bytes, the character set and the encoding scheme can be indeterminate. CES D8 4C DF B4 22 60 00 41 03 0A UTF-16BE How can a browser know how to decode a web page? CES: Mapping the CEF(s) to serialization of bytes Web Internationalization Slide 18 Web Internationalization Slide 19

Encoding Identification Character Encoding Names

• Given just bytes, encoding is IANA (Internet Assigned Numbers Authority) indeterminate. – Maintains registry of official names for character • How can an encoding be identified? sets (actually encodings) used on the internet and in MIME (mail) • Registry Names • There are 2 requirements: – ASCII, printable characters – Agreement on names for encodings – Case-insensitive – Mechanisms for labeling text with encoding – Maximum length 40 characters – Aliases (alternative names) are also registered – The preferred name is indicated www.iana.org/assignments/character-sets

Web Internationalization Slide 20 Web Internationalization Slide 21

Internationalization and Unicode Conference 5 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Unregistered Encoding Names Character Encoding Names

• Conventions for Unregistered Character • IANA Name and Alias Examples Encoding Names – ISO_8859-1:1987 (ISO_8859-1, ISO-8859-1, latin1, – Name begins with “x-” L1, IBM819, CP819, csISOLatin1) – Example: “x-Tex-Yves-encoding” – Windows-1252, GB2312, BIG5, BIG5-HKSCS – Useful for private encodings or very new – SHIFT_JIS, HP-Legal encodings – Extended_UNIX_Code_Packed_Format_for_Japanese – Adobe-standard-encoding – Not useful on the web, except for private – UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32 exchange • Note- Registry contains many useless names • Note- Preferred names indicated. Use them.

Web Internationalization Slide 22 Web Internationalization Slide 23

Markup and Encoding Names HTTP and Encoding Names

• HTTP Mechanism for labeling HTTP with encoding • HTML • XML HTTP Response • CSS

• Links 200 OK HTTP/1.1 – HTML Content-Type: text/; charset=UTF-8 --- Blank Line – HTML <… HREF> document – XML <… HREF> ...

Web Internationalization Slide 24 Web Internationalization – Standards and Practice Slide 25

Internationalization and Unicode Conference 6 Tex Texin, XenCraft Web Internationalization – Standards and Practice

HTML, XML & Encoding Names New in HTML 5:

HTML – Use “preferred MIME name” in IANA registry – HTML does not specify a default. – Use BOM instead for UTF-16. – Supported by most browsers XML – Simpler and less error-prone then CONTENT="text/html; charset=UTF-8"> – Alternative declaration: Begin with (U+FEFF), for UTF-16 or UTF-8 – Byte Order Mark now recognized – Note UTF-16 MUST begin with a BOM – UTF-32, EBCDIC, others not recommended – The default encoding is UTF-8. – UTF-7, SCSU, et al must not be supported

Web Internationalization – Standards and Practice Slide 26 Web Internationalization – Standards and Practice Slide 27

CSS2 and Encoding Name LINKs and Encoding Name

CSS2 Declaring the charset of a LINKed document – Only used in the first line of external style • HTML sheets

– CSS 2.1 added Unicode Byte Order Mark (BOM, U+FEFF) as an encoding indicator. Unicode – Encoding is unspecified if BOM and @charset • XML conflict.

Web Internationalization – Standards and Practice Slide 28 Web Internationalization Slide 29

Internationalization and Unicode Conference 7 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Notes: Declaring Encoding Names HTML5: LINKs and Encoding Name

• Charset on links can be incorrect if the Declaring the charset of a LINKed document document‟s encoding on the server changes • Deprecated in HTML5 • The encoding for the – Place it is as early as possible in the document. – Else, prior statements may be decoded incorrectly. Unicode • Note: – Transcoders do not generally correct charset ID

Web Internationalization Slide 30 Web Internationalization X Slide 31

HTML4 Encoding Priorities HTML5 Encoding Priorities

Prioritization is used to resolve conflicts. • From high to low priority, HTML5 uses the • From high to low priority, HTML uses the encoding encoding of: of: 1. User override for charset 1. HTTP “Content-Type” charset 2. HTTP “Content-Type” charset 2. 3. Byte Order Mark 3. LINK or other syntax for external documents 4. Either (Must only be one) 4. Charset-detecting heuristics • • Many user agents (browsers) support a user override 5. Character set-detecting heuristics for charset (highest priority)

Web Internationalization Slide 32 Web Internationalization Slide 33

Internationalization and Unicode Conference 8 Tex Texin, XenCraft Web Internationalization – Standards and Practice

CSS2 Encoding Priorities XML Encoding Priorities

Prioritization is used to resolve conflicts. • Encoding name processing is more carefully • From high to low priority, CSS 2.1 external specified for XML. style sheets use the encoding of: • As with HTML, protocol or external information can supercede declaration, BOM or default of 1. HTTP “Content-Type” charset UTF-8. 2. BOM/@charset rule in the style sheet • XML Appendix E (non-normative): Prioritization 3. LINK or other syntax in referencing should be specified by protocols. document – Recommends use of BOM or encoding declaration for files (rather than an external source). 4. Charset of the referencing document – Refers to RFC 3023 • RFC 3023 specifies several encoding scenarios based on 5. Assume UTF-8 MIME media type: text/xml, application/xml, etc.

Web Internationalization Slide 34 Web Internationalization – Standards and Practice Slide 35

Web I18n Part 1- Character Processing Character Encoding Negotiation

Character Encodings Unix user Windows user Character Encoding Negotiation Reference Processing Model Character Escaping GB2312 1252 Unicode in Markup Normalization html html Identifiers

Web Internationalization Slide 36 Web Internationalization Slide 37

Internationalization and Unicode Conference 9 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Typical Browser-Server HTTP Sequence Character Encoding Negotiation

Accept Charset Get URL HTML Browser Server 1. Browser issues GET URL x,y,z Pages 2. Server sends RESPONSE 3. Browser displays document in RESPONSE GET / HTTP/1.1 Accept-Language: en-us,en,hr;q=0.5 4. Browser POSTs Form with user data (text) Accept-Charset: iso-8859-1,utf-8;q=0.75,*;q=0.5 5. Web Server receives data, database The browser‟s HTTP GET request can list the languages and application stores text. the encodings it can make use of, to guide the server. •“q” is a relative measure of the usefulness (quality) of an entry.

Which encoding is sent by the server? The above example indicates: •US English preferred, other English, Croatian are also ok. Which encoding is returned by the browser? •ISO 8859-1 preferred, then UTF-8, then anything else.

Web Internationalization Slide 38 Web Internationalization Slide 39

Character Encoding Negotiation Character Encoding Negotiation

Accept Charset Get URL HTML Browser Server • Most browsers let you set your language x,y,z Pages preferences and priorities Response Browser Server CHARSET=x • Encoding capabilities are not settable (since they are software dependent). 200 OK HTTP/1.1 Content-Type: text/html; charset=iso-8859-1 – Microsoft IE doesn‟t send ACCEPT-CHARSET. --- Blank Line – (U.S.) NS 7: ISO-8859-1, UTF-8;q=0.66, *;q=0.66 HTML document ...

– Opera 6.0 sends: The server returns a document. Windows-1252;q=1.0, UTF-8;q=1.0, UTF-16; q=1.0, The encoding is declared in the RESPONSE header. iso-8859-1;q=0.6, *;q=0.1 (Web administrators or content authors need to inform the server about document encodings.)

Web Internationalization Slide 40 Web Internationalization Slide 41

Internationalization and Unicode Conference 10 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Character Encoding Negotiation Character Encoding Negotiation

Accept Charset Get URL HTML Accept Charset Get URL HTML Browser Server Browser Server x,y,z Pages x,y,z Pages Response Browser Server CHARSET=x Response Browser Server CHARSET=x Form Data Set Browser Server O/S Charset =z The browser adapts the document for operating System display. The browser also accepts user data in HTML

and can send it to the server as a Form Data Set.

A Form Data Set is a series of control name/current value pairs, for “successful” controls.

There are 3 ways browsers submit form data sets.

Web Internationalization Slide 42 Web Internationalization Slide 43

Form Data Set Form Data Set Submission

• GET + HTTP URI Name: Form Data Set appended to URI +”?” encoded as Male – “application/x-www-form-urlencoded“ Female

• POST + HTTP URI Form Data Set sent in body, encoded as either

Form Data Set = Control Name/Current Value Pairs 1) “application/x-www-form-urlencoded“ or Name/Tex 2) “multipart/form-data” (MIME, RFC 2045) sex/m

Web Internationalization Slide 44 Web Internationalization Slide 45

Internationalization and Unicode Conference 11 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Form Data Set- GET Method Submission Form Data Set Encoding

Name=Value&Name2=Value2&Name3=Value3 – Pairs of control names and current values. Name: – Names separated from values by = Male – Name/value pairs separated by & Female – Spaces replaced by + – Line breaks represented as CR LF: %0D%0A
– Non-alphanumeric and non-ASCII characters and ‘+’, ‘&’, ‘=’, are replaced by %HH – Browsers map current encoding byte values to %HH – If the server doesn‟t know browser‟s character This simple form will submit a an HTTP GET with: encoding, it may decode form data incorrectly. http://www.xencraft.com/cgitest?Name=Tex&sex=m

Web Internationalization Slide 46 Web Internationalization Slide 47

Form Data Set Encoding Character Encoding Negotiation

Accept Charset Get URL HTML Application/x-www-form-urlencoded Browser Server x,y,z Pages Response Browser Server CHARSET=x

Submit Form Example comparing two character encodings: (GET or POST) Browser Server O/S Charset =z Charset=ISO-8859-1 CHARSET=x Name=Fran%E7ois+Ren%E9+Strau%DF encoding=x-www-form-urlencoded

Charset=UTF-8 Modern browsers send x-www-form-urlencoded data to Name=Fran%C3%A7ois+Ren%C3%A9+Strau%C3%9F the server in the CHARSET that was determined to be that of the *form*, however that determination was made (HTTP, , default, user override).

Web Internationalization Slide 48 Web Internationalization Slide 49

Internationalization and Unicode Conference 12 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Character Encoding Negotiation Character Encoding Negotiation

Accept Charset Get URL HTML Returning data in the encoding received Browser Server x,y,z Pages • Generally works in principle Response Browser Server CHARSET=x • Document „charset” must be correctly Submit Form identified (and has often been wrong) (POST) Browser Server • Fails with multiple encodings handled by a O/S Charset =z CHARSET=x single CGI multipart/form-data (MIME) • Fails with transcoding proxies (not allowed to Each control name/current value pair is a separate change URIs). part. Each part can be a different charset or content-type encoding. • Recommend using UTF-8 in both directions Supports file uploading (RFC1867).

Web Internationalization Slide 50 Web Internationalization Slide 51

Character Encoding Negotiation Character Encoding Negotiation

Multipart/form-data Other solutions to identifying encodings: • More efficient than x-www-form-urlencode for • XFORMS fixes the failure cases: non-ASCII data, binary data, and files http://www.w3.org/MarkUp/Forms/ • Does not have the length limit that browsers http://www.w3.org/TR/xforms/ impose on (can be as low as 250 for • TIP: Use with older browsers: some devices) • Hidden fields containing encoding name or • Is now well supported carefully chosen text (tracks transcodings). • Recommended for POST of all form data CGI performs analysis. – e.g. Microsoft‟s _CHARSET_

Web Internationalization Slide 52 Web Internationalization Slide 53

Internationalization and Unicode Conference 13 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Web I18n Part 1- Character Processing Reference Processing Model

Character Encodings • Different encoding schemes require different Character Encoding Negotiation decoding/parsing/processing methods Reference Processing Model – Single, and Multi-byte character sets (e.g. EUC) – Character encoding-switching schemes (ISO 2022) Character Escaping – Forward combining (accent-base letter) Unicode in Markup – Backward combining (base letter-accent) Normalization – Logical ordering/Visual ordering Identifiers • Variety bothers implementers and spec writers • Adopting a single universal encoding obsoletes most of the existing data • Instead, use a character abstraction Web Internationalization Slide 54 Web Internationalization Slide 55

Reference Processing Model Reference Processing Model

• Logically, characters are Unicode characters – Specifications are in terms of Unicode characters – Implementations do NOT have to use Unicode, In/out Any encoding only behave as if they did on the wire HTML • Benefits Abstraction – Removes ambiguity, simplifies specifications C Layer using internal S – Allows flexibility for common local encodings Unicode S – Backward compatible for older HTML browsers – Supports internationalization (large character set) Any encoding XML for internal – Removes dependencies/orientation on byte values implementation

Web Internationalization Slide 56 Web Internationalization Slide 57

Internationalization and Unicode Conference 14 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Reference Processing Model Web I18n Part 1- Character Processing

• Examples using Reference Processing Model Character Encodings – HTML 4.0 declares Unicode as its SGML Document Character Encoding Negotiation Character Set Reference Processing Model – CSS “sequence of characters from UCS” – XML “A character is an atomic unit of text as specified by Character Escaping ISO/IEC 10646” Unicode in Markup • Any encoding can be used internally, but Unicode often makes the most sense. Normalization • XML requires parsers to accept UTF-8 and UTF-16, Identifiers making Unicode best internal choice • Some Recommendations require Unicode – e.g. DOM requires UTF-16 Web Internationalization Slide 58 Web Internationalization Slide 59

Character Escaping Character Escaping

Mechanisms to represent characters • Useful for: • Numeric Character References (NCRs) – syntax-significant characters – HTML and XML – e.g. < (<), > (>), & (&), " (") Hexadecimal: &#xhhhhhh; – characters outside current encoding Decimal &#dddd; – eliminating visual or other ambiguity – CSS2 ­ (soft-), “\hh ” (note terminating ), \hhhhhh - (hyphen-minus) • Character Entity References (HTML only) å Å (note case-sensitivity) (space)   (no-break space)

Web Internationalization Slide 60 Web Internationalization Slide 61

Internationalization and Unicode Conference 15 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Character Escaping Character Escaping

• Relies on Reference Processing Model • Don‟t use Windows 1252 code points instead – Always references Unicode scalar value of Unicode, for values 128-159 (0x80-0x9F) • Same value regardless of encoding • e.g. Euro is € or € not € • Same value for UTF-8, UTF-16, UTF-32 • www.i18nguy.com/markup/ncrs.html • One value for supplementary characters, not two • Don‟t simulate characters with special fonts E.g. 𒍅 not �� (e.g. Symbol), or you can get erroneous: – Simplifies transcoding (no parsing or conversion) • Display, depending on font availability – Allows any Unicode character in any document (if • Font fallbacks it is legal in the language of the document) • Searches by Search engines • Behavior from Style sheets • Database contents

Web Internationalization Slide 62 Web Internationalization Slide 63

Windows 1252 vs Unicode & ISO 8859-1 Selecting A Character Encoding

1252 is identical to Unicode and ISO 8859-1 except in 80-9F. • Choose an encoding that minimizes the need Unicode and ISO 8859-1 assigns control codes. Windows-1252 assigns Euro, Smart quotes, TM, and others to escape characters. – Unicode is always a candidate. – Unicode is supported by all but the oldest browsers. – Is the largest character set, and can be expanded. – Therefore it is often the best choice both for minimizing escapes and anticipating future character requirements. • e.g. New currency symbols

Web Internationalization Slide 65

Internationalization and Unicode Conference 16 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Web I18n Part 1- Character Processing Unicode Vs. Markup

Character Encodings • +100,000 characters as of Unicode 5.0 Character Encoding Negotiation – Should we use them all? Reference Processing Model – Are there any we shouldn‟t use? Character Escaping – Does Unicode‟s capabilities, needed for , Unicode in Markup interfere with markup? • Markup can do some things better than Normalization character codes. Not all Unicode characters are Identifiers needed.

Web Internationalization Slide 66 Web Internationalization Slide 67

Unicode Vs. Markup Unicode Vs. Markup

Potential problem areas Solution types • Redundancies impact searching • Restrict characters so they cannot be used – “Å” A-ring “A+˚ ” A+ring “Å” Angstrom • Replace redundancies (normalization) • Formatting characters vs. Markup • Replace with Markup – E.g. Bidi controls, interlinear annotation characters – Extensible – presentation can be separate from content • Characters with style vs. Markup – E.g. Superscript, subscript Joint W3C and Unicode recommendations in: • Object Replacement Character vs. Markup “Unicode in XML and other Markup Languages” http://www.w3.org/TR/unicode-xml/ – Better to use markup to include an image http://www.unicode.org/unicode/reports/tr20/

Web Internationalization Slide 68 Web Internationalization Slide 69

Internationalization and Unicode Conference 17 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Web I18n Part 1- Character Processing String Indexing

Character Encodings • How long is this string? ≠ Å Character Encoding Negotiation Reference Processing Model Character Escaping Unicode in Markup String Indexing Normalization Identifiers

Web Internationalization Slide 70 Web Internationalization Slide 71

String Indexing String Indexing

• Which units should be used for counting? Character Model recommendations Graphemes • Character counting is recommended for most ≠ A + ˚ 3 programming interfaces (e.g. XML Path)

Characters • Code unit counting may be used for internal U+233B4 U+2260 U+0041 U+030A 4 efficiency (e.g. DOM) • Graphemes may be useful for user interaction, Code units D84C DFB4 2260 0041 030A once a suitable definition exists 5 • Avoid creating API with single unit arguments Bytes D8 4C DF B4 22 60 00 41 03 0A e.g. “SS” = Uppercase(“ß”) 10

Web Internationalization Slide 72 Web Internationalization Slide 73

Internationalization and Unicode Conference 18 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Normalization Early Uniform Normalization

• Representing data in Unicode characters can have more than 1 more than 1 way leads representation to errors • Canonical equivalence • E.g. The Mars Climate – Indistinguishable, fundamental equivalence Orbiter mission was – E.g. combining sequences, singletons disastrous. Information – “Å” U+00C5 (A-ring pre-composed) expected to be metric, – “A+˚ ” U+0041 + U+030A (A + combining ring above) was sent in English units – “Å” U+212B (Angstrom) • Solution- Adopt a • Compatibility equivalence standard representation- – E.g. Formatting differences, ligatures Normalize – “カ” U+FF76 “カ” U+30AB (KA half and full width) – “fi” U+FB01 (ligature fi)

Web Internationalization Slide 74 Web Internationalization Slide 75

Early Uniform Normalization Early Uniform Normalization

has defined canonical and Text on the web SHOULD be Fully Normalized. compatibility decomposition formats and 4 Fully Normalized text is either: different sets of rules for normalization: 1. Unicode text in Normalization Form NFC, and “ Unicode Normalization Forms” 2. Does not contain character escapes or includes http://www.unicode.org/unicode/reports/tr15/ that upon expansion would undo point 1, and • The W3C Character Model recommends 3. Does not begin with a composing character. Normalization Form C (NFC) or: – Brings canonical equivalences to composed form – Leaves compatibility forms as distinct 1. Legacy encoded text, which transcoded to Unicode satisfies the above. – Most legacy text is composed, and is unchanged

Web Internationalization Slide 76 Web Internationalization Slide 80

Internationalization and Unicode Conference 19 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Early Uniform Normalization New “International” Features in HTML5

• Examples of Fully Normalized Text • Some attributes that now apply to all elements: “suçon”, “suçon”, – dir (Direction) “sub¸on”, “sub̧on” – lang (Language) Note- Unicode does not have a composed b-cedilla. • lang attribute now supports empty string • Examples that are not Fully Normalized indicating the primary language is unknown “suc¸on”, “suçon” • xml:lang supported, provided it has same value Reason: should use composed character “ç” as lang “¸on”, “̧on” • hreflang attribute added to Reason: should not begin with – For consistency with and

Web Internationalization Slide 81 Web Internationalization Slide 88

New “International” Features in HTML5 New “International” Features in HTML5

• Ruby annotation support: , , • Native support for IRI and IDNA – For IRI, document encoding must be UTF-8 or UTF-16 or the query component must %hh escape 2010 ( yyyy ) 10 ( mm ) any non-ASCII characters 16 ( dd ) • Encodings – New encoding declaration – Support for BOM – Charset attribute removed on and

Web Internationalization Slide 89 Web Internationalization Slide 90

Internationalization and Unicode Conference 20 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Questions Coffee Break

Web Internationalization Slide 91 Web Internationalization Slide 92

Web Internationalization Agenda Language Identification

• Part 1 – Character Processing Same mechanism for all: • Coffee Break • Tags (identifiers) defined by RFC 3066. • Part 2 2-letter and 3-letter language codes (ISO- – Layout and Typography 639) with optional 2-letter country codes – Designing International Web Sites (ISO-3166) separated by a character „-‟ (not „_‟). • Tags are case insensitive (even in XML). • In mark-up: the language attribute is inherited by the children of the element where the attribute is defined.

Web Internationalization Slide 93 Web Internationalization Slide 94

Internationalization and Unicode Conference 21 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Language Identification Language Identification

RFC 3066 Rules: • RFC 3066 does not cover all needs. • 3-letter codes should be used only for the – e.g. Latin-Amer. Spanish, Script distinctions languages that have no 2-letter code. – Addressed case-by case through registrations • Always use the Terminological form of the • No clear distinction of the identifiers of a 3-letter codes, not the Bibliographical “language” and a “locale”. form. – (See past IUC locales talks for more information.) • Avoid user-defined codes (x-myCode) • Standards groups considering these issues: IETF, ISO TC37, SIL, W3C, et al

Web Internationalization Slide 95 Web Internationalization Slide 96

Language Identification: BCP47 Language Identification

• BCP47 evolution • HTTP: Content-Language header – RFC 4646, 4647 • HTML: LANG attribute (e.g. in ) • RFC 4646 replaces RFC 3066. • XML: xml:lang attribute • language-country becomes language-script-country • Registry expanded to include all valid entries • XHTML 1.0: Both lang and xml:lang • New matching rules in RFC 4647 – RFC 5646 just released

Verba.

• 7,000 three-letter ISO 639-3, ISO 639-5 language codes, 7 region codes • 220 'extended language' subtags, for backwards • XHTML 1.1: xml:lang attribute compatibility.

Web Internationalization Slide 97 Web Internationalization Slide 98

Internationalization and Unicode Conference 22 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Language Identification Language Identification – Input file

The lang() function in XPath:

• True if the selected node has xml:lang set to the given language code. Message 100 in English. • Match is done as a sub-string from the start of Message 200 [insertion in French] in American the value: English. Message 200 en Québecquois. 'en' matches 'en', and 'en-us'. • Match is case insensitive: Message 300 en français. 'en' matches 'EN', 'En-us', etc. Message 400 in British English.  Example: Input, Languages.xsl, Output.

Web Internationalization Slide 99 Web Internationalization Slide 100

Language Identification – Style-sheet Language Identification – IE Output

Message 100 in English. (en) Message 200 [insertion in French] in American English. (en-us) en Message 400 in British English. (EN-GB)

()

Web Internationalization Slide 101 Web Internationalization Slide 102

Internationalization and Unicode Conference 23 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Language Identification – FF Output Language Identification- Opera Output

Message 100 in English. (en)Message 200 [insertion in French] Message 100 in English. Message 200 [insertion in French] in in American English. (en-us)Message 400 in British English. American English. Message 200 en Québecquois. Message 300 (EN-GB) en français. Message 400 in British English.

Web Internationalization Slide 103 Web Internationalization Slide 104

Language Identification – CSS Language Identification – CSS

There are two methods to refer to the language attribute in CSS: Test Language and CSS *:lang(zh) { font-family:SimSun }

Text in English.

• The attribute selector.

Text in American English.

Text in generic English.

*[lang|=fr] { font-weight:bold }

Texte en québecquois.

Texte en français.

Text in British English.

• Both use the same matching mechanism as the lang() function in XPath.  Example: LanguagesCSS.htm

Web Internationalization Slide 105 Web Internationalization Slide 106

Internationalization and Unicode Conference 24 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Language Identification – FF Output Quotes – HTML

Text in English. • The element for in-line quotations (auto- quotation marks expected). Text in American English. • The

element for paragraph- Text in generic English. type quotations (indented, and no auto- Texte en québecquois. quotation marks expected).

Texte en français.  Example: Input, Output: Quotes.htm. Text in British English.

Web Internationalization Slide 107 Web Internationalization Slide 108

Quotes – Using CSS Quotes – HTML Input

...

English text with English quoted • CSS allows control of the type of quote to use text.

Text en Français avec English quoted according to the language. text.

Text en Français avec English *[lang|=fr] { quote:'\ab\a0' '\a0\bb' } quoted text containing a quote itself.

qo:before { content:open-quote }

Quotes in Finnish.

Quotes in Polish.

qo:after { content:close-quote }

Quotes in Japanese.

Quotes in German.

Quotes in Dutch.

A paragraph using • Examples blockquote.
 HTML: Input, CSS, Output: QuotesWithCSS.htm.  XML: Input, CSS File, Output: Quotes.xml.

Web Internationalization Slide 109 Web Internationalization Slide 110

Internationalization and Unicode Conference 25 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Some Unicode Characters Quotes – CSS Style-sheet

q:before { content: open-quote; } U+2018 „ Left Single Quotation Mark q:after { content: close-quote; } blockquote:before { content: open-quote; } U+2019 ‟ Right Single Quotation Mark blockquote:after { content: close-quote; } U+201C “ Left Double Quotation Mark [lang|='en'] > * { /* English */ quotes: "\201C" "\201D" } U+201D ” Right Double Quotation Mark U+201E „ Double Low 9 Quotation Mark [lang|='fr'] > * { /*guillemets*/ quotes: "\AB\A0" "\A0\BB" } U+201F ‟ Double High Reversed 9 Q. M. [lang|='fi'] > * {/*same direction*/ quotes: "\201D" "\201D" } U+300C 「 Left Corner Bracket [lang|='de'] > * { /* German */ quotes: "\201E" "\201C" } U+300D 」 Right Corner Bracket [lang|='ja'] > * { /* Japanese */ quotes: "\300C" "\300D" }

U+00AB « Left Pointing Double Angle Q. M. [lang|='nl'] > * { /* Dutch */ quotes: "\2018" "\2019" }

U+00BB » Right Pointing Double Angle Q. M. [lang|='pl'] > * { /* Polish */ quotes: "\201E" "\201D" } U+00A0 No Break Space

Web Internationalization Slide 111 Web Internationalization Slide 112

Quotes – Firefox 3 Output Quotes – Opera Output

English text with “English quoted text”. English text with “English quoted text”. Text en Français avec « English quoted text ». Text en Français avec « English quoted text ». Text en Français avec « English quoted text containing a Text en Français avec « English quoted text containing a “ quote” itself ». ”quote" itself ». ”Quotes” in Finnish. ”Quotes” in Finnish. „Quotes” in Polish. „Quotes” in Polish. 「Quotes」 in Japanese. 「Quotes」 in Japanese. „Quotes“ in German. „Quotes“ in German. „Quotes‟ in Dutch. „Quotes‟ in Dutch. “A paragraph using blockquote.” “A paragraph using blockquote.”

Web Internationalization Slide 113 Web Internationalization Slide 114

Internationalization and Unicode Conference 26 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Casing Casing TextTransform.htm

• CSS2 provides the property Style: text-transform with 5 values: uppercase, lowercase, capitalize, conversion (making it useless from an i18n viewpoint). CSS3 (working draft) forces Unicode casing conformance. This property is deprecated in XSL 1.0.  Example: Source, Output: TextTransform.htm.

Web Internationalization Slide 115 Web Internationalization Slide 116

Casing TextTransform.htm Casing – IE and Opera Output

Original = This text should be all uppercased.
Original = This text should be all uppercased. Transformed = This text should be all Transformed = THIS TEXT SHOULD BE ALL UPPERCASED. uppercased.

Original = THIS TEXT SHOULD BE ALL LOWERCASED.
Original = THIS TEXT SHOULD BE ALL LOWERCASED. Transformed = THIS TEXT SHOULD BE ALL Transformed = this text should be all lowercased. LOWERCASED.

Original 1 = tHIS tEXT sHOULD bE cAPITALIZED.
Transformed = tHIS tEXT sHOULD bE Original 1 = tHIS tEXT sHOULD bE cAPITALIZED. cAPITALIZED.
Transformed = THIS TEXT SHOULD BE CAPITALIZED. Original 2 = this text should be capitalized.
Original 2 = this text should be capitalized. Transformed = this text should be Transformed = This Text Should Be Capitalized. capitalized.

[de] Original = ß (sharp-s), ö (o-diaeresis)
[de] Original = ß (sharp-s), ö (o-diaeresis) Transformed = ß (sharp-s), ö (o- Transformed = ß (SHARP-S), Ö (O-DIAERESIS) diaeresis)

[tr] Original = i (i-with-dot)
Transformed = i (i-with-dot)

[tr] Original = i (i-with-dot) Transformed = I (I-WITH-DOT)

Web Internationalization Slide 117 Web Internationalization Slide 118

Internationalization and Unicode Conference 27 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Casing – Firefox Output Casing – Clipboard Copy (unchanged)

Original = This text should be all uppercased. Original = This text should be all uppercased. Transformed = THIS TEXT SHOULD BE ALL UPPERCASED. Transformed = This text should be all uppercased.

Original = THIS TEXT SHOULD BE ALL LOWERCASED. Original = THIS TEXT SHOULD BE ALL LOWERCASED. Transformed = this text should be all lowercased. Transformed = THIS TEXT SHOULD BE ALL LOWERCASED.

Original 1 = tHIS tEXT sHOULD bE cAPITALIZED. Original 1 = tHIS tEXT sHOULD bE cAPITALIZED. Transformed = THIS TEXT SHOULD BE CAPITALIZED. Transformed = tHIS tEXT sHOULD bE cAPITALIZED. Original 2 = this text should be capitalized. Original 2 = this text should be capitalized. Transformed = This Text Should Be Capitalized. Transformed = this text should be capitalized.

[de] Original = ß (sharp-s), ö (o-diaeresis) [de] Original = ß (sharp-s), ö (o-diaeresis) Transformed = SS (SHARP-S), Ö (O-DIAERESIS) Transformed = ß (sharp-s), ö (o-diaeresis)

[tr] Original = i (i-with-dot) [tr] Original = i (i-with-dot) Transformed = I (I-WITH-DOT) Transformed = i (i-with-dot)

Web Internationalization Slide 119 Web Internationalization Slide 120

Numbered Lists Numbered Lists NumberedLists.htm

With CSS2 • CSS2 offers the list-style-type ...  Example NumberedLists.htm ...

Web Internationalization Slide 121 Web Internationalization Slide 122

Internationalization and Unicode Conference 28 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Numbered Lists NumberedLists.htm Numbered Lists – Firefox Output

...

List numbered in Hebrew:

  1. Item 1
  2. ...
  3. Item 6

List numbered in Georgian:

  1. Item 1
  2. ...
  3. Item 6

List numbered in Armenian:

  1. Item 1
  2. ...
  3. Item 6

List numbered in Han character (cjk-ideographic):

  1. Item 1
  2. ...
  3. Item 6

Web Internationalization Slide 123 Web Internationalization Slide 124

Numbered Lists Number Formatting

With XSL The function format-number() • XSL provides more flexibility as the format in XSL allows the formatting of numbers based and the type of the numbers can be changed on a given pattern. using . • Uses same patterns as Java 1.1 java.text.DecimalFormat patterns. • Use to  Example: Input, ListNumbers.xsl, Output. overwrite the default symbols (i.e. decimal separator, grouping separator, etc.).

 Example: Input, XSL File, Output.

Web Internationalization Slide 125 Web Internationalization Slide 126

Internationalization and Unicode Conference 29 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Text Flow Text Flow

Bi-directional Text in HTML Bi-directional Text for XML (CSS2) • The dir attribute: • Use the direction and unicode-bidi – dir="ltr" (default), dir="rtl" properties. The unicode-bidi property – Affects the default value of align. specifies the behavior for inline levels elements – Inherited (use it in to set the base for the (15 maximum levels of embedding). whole document). • Based on Unicode bidi algorithm (UAX#9) • The element: para.bidi { direction:rtl; – Overrides implicit directional properties of content. unicode-bidi:embed } – Requires the dir attribute.  Example: BidiText.htm

Web Internationalization Slide 127 Web Internationalization Slide 128

Text Flow – Bidi Example Source (1/2) Text Flow – Bidi Example Source (2/2)

-Pepper Creek LLC, Using dir="ltr חברת span":
.שנוסדה זה-עתה, מונה יותר מ550- עובדים ,Pepper Creek LLC חברת

.שנוסדה זה-עתה, מונה יותר מ550- עובדים <"p dir="rtl> Using dir="rtl":
Using dir="ltr" (wrong):
.שנוסדה זה-עתה, מונה יותר מ550- עובדים

.שנוסדה זה-עתה, מונה יותר מ550- עובדים

Web Internationalization Slide 129 Web Internationalization Slide 130

Internationalization and Unicode Conference 30 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Text Flow – Bidi Output Text Flow

Vertical Text

• Use the writing-mode property (CSS3). • For example, to display top-to-bottom, and right-to-left text use:

div.vertical { writing-mode:tb-rl }

 Example in HTML, and in SVG.

Web Internationalization Slide 131 Web Internationalization Slide 132

Text Flow – Vertical, HTML Text Flow – Vertical, HTML Output

Example of horizontal text (rl-tb).

Example of vertical text (tb-rl).

Example of vertical text with horizontal insert.

Web Internationalization Slide 133 Web Internationalization Slide 134

Internationalization and Unicode Conference 31 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Text Flow – Vertical, SVG Text Flow – Vertical, SVG in HTML

Horizontal Text type="image/svg+xml" width="330" height="330" /> Example of vertical text

Web Internationalization Slide 135 Web Internationalization Slide 136

Text Flow – Vertical, SVG Output Ruby Annotation

Annotation in smaller characters running above or below a base text.

• Used in Japanese for pronunciation of characters (Furigana). • W3C Ruby Module: element with for the base text, for the ruby text. and for complex annotations.

 Example: Ruby.htm

Web Internationalization Slide 137 Web Internationalization Slide 138

Internationalization and Unicode Conference 32 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Ruby Annotation – HTML (1/2) Ruby Annotation – HTML (2/2)

Simple Ruby test:

Ruby complex:

日本語 にほんご 10 31 2002

Ruby with parenthesis text, used if ruby is not

implemented:

Month Day Year
日本語 [[にほんご]] Expiration Date

Web Internationalization Slide 139 Web Internationalization Slide 140

Ruby Annotation – IE Output Ruby Annotation – Firefox Output

Web Internationalization Slide 141 Web Internationalization Slide 142

Internationalization and Unicode Conference 33 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Sorting Sorting

XSL offers the Version 2.0 of XSL has new features for element to collate lists of items. http://www.w3.org/TR/xslt20/#dt-collation • Use lang (not xml:lang) to specify the language to use for the sorting rules. • case-order attribute specifies whether to sort • Results depend on the implementation of the uppercase or lowercase first. XSL engine. • collation attribute names an implementer-  Example: Sorting.xml sorted for English defined collation to use. and Norwegian. (Sorting_EN.xsl and – if given, lang and case-order are ignored. Sorting_NO.xsl).

Web Internationalization Slide 144 Web Internationalization Slide 145

Summary Standards Text Layout Web Internationalization Agenda

Feature Lang() y y N • Part 1 – Character Processing Lang pseudo-class N Y Y • Coffee Break Lang attr selector N Y Y Quote:qo N Y ½ • Part 2 Text-transform Y Y Y – Layout and Typography Css list-style-type N Y ½ Xsl number Y N N – Designing International Web Sites Xsl format-number Y Y Y Html bi-directional text Y Y Y Css bi-directional text Y Y Y Vertical text (SVG losing ground) Y/N N N Ruby annotation ½ N N Css3 combined sort N N N Xsl:sort Y Y N

Web Internationalization Slide 146 Web Internationalization Slide 147

Internationalization and Unicode Conference 34 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Requirements Domain Names

• Requirements are not always compatible: Easier if each language has its own domain – Business requirements. name: www.xyzcorp.fr, www.xyzcorp.de, etc. • Web site ranked high in search engines;  One domain = One language. • Single look and feel across sites in different languages – Localization requirement. Unfortunately: • Avoid changing links in localized pages; – Most site have only one address for many • To have locale-specific content; etc. languages. – Even „country-specific‟ sites may have • Solutions depend on technologies used several languages: www.xyzcorp.ca – (static Web site, client-side scripting, server-side scripting, databases, multiple addresses, etc).  English, French, Inuktitut.

Web Internationalization Slide 148 Web Internationalization Slide 149

Directories and Files Directories and Files

\ One possible solution: +----- index.html +----- index_fr.html • Home page of the „main‟ language is the entry | +----- en point of the directory structure. | +----- about.html (e.g. index.html) | +----- products.html | +----- menubar.png • Language home pages are also at the root and | +----- fr have a language identifier in their name. | +----- about.html (e.g. index_fr.html) | +----- products.html | +----- menubar.png • Other pages have identical names across | +----- common languages, but are in different language +----- logo.png directories. +----- background.jpg

Web Internationalization Slide 150 Web Internationalization Slide 151

Internationalization and Unicode Conference 35 Tex Texin, XenCraft Web Internationalization – Standards and Practice

Directories and Files Directories and Files

• Allow search engines to retrieve meaningful • Use cookies if you want to remember the information (but emphasis for the main preferred language of the user and redirect language). him/her to the relevant set of files. • Maximize the use of relative URLs (no link • Use common directory for shared files. change, except to the home page). • Use meaningful directory and file names. If scripting is available, you can have the links • Avoid translating directory and file names. resolved at run-time. – However, this hurts SEO. • Allows room for locale-specific content if • Treat the source language just like another necessary. language as much as possible.

Web Internationalization Slide 152 Web Internationalization Slide 153

Language Selection Good Practices – IDs

• List box of language names in native language IDs are VERY useful for re-use of translation, – Make sure characters display correctly (fonts) and for re-use of text across documents. – Graphics are always displayed correctly. – in HTML IDs can be set for all elements containing text, except the element. – Make sure to provide an ID attribute for the • Destination Choice translatable elements of your XML vocabularies, so – The same page in the new it can be utilized for re-use, leveraging, etc. language. – The main page in the new language. (for country-specific sites, etc.)</p><p>Web Internationalization Slide 154 Web Internationalization Slide 155</p><p>Internationalization and Unicode Conference 36 Tex Texin, XenCraft Web Internationalization – Standards and Practice</p><p>Good Practices – Attributes Good Practices – Embedded Data</p><p>When creating new XML vocabularies: Avoid Data that are not text content (e.g. scripts, SQL using attributes for storing translatable text. queries, etc.).</p><p>– Impossible to add needed bidi tags in an attribute. – Keep them outside of the document if possible – Cause segmentation issues in many tools. (e.g. using include mechanisms). – Much more difficult to have metadata for attributes – At least, make sure elements with such data are than for elements. identified for the localizer (who might need to – You cannot set different languages for two apply a process different than for the rest of the attributes in the same element. document content). – More tricky to set unique IDs for attributes. – Internationalize your scripts/queries/etc.</p><p>Web Internationalization Slide 156 Web Internationalization Slide 157</p><p>Good Practices – Use Style-sheets Good Practices – CDATA Sections</p><p>• Separate the function of a term (a title, a link, Avoid CDATA sections if possible. an important term) from its display (bolded, underlined, in 12-points Courier, etc.) – Translation tools do not handle CDATA well. – Type of display for the target language(s) may be – Keeping track on inline CDATA leads to different than for the source language. meaningless inline codes in segments (and can affect leveraging). – Force author/developer to think about the structure – NCRs are not allowed in CDATA. This may cause of the document. problems if the document is converted to an • Avoid <br/> -like elements when possible: encoding where some characters need to be written Use styles to format, not tags. as NCRs. By the way: CDATA does NOT preserve spaces.</p><p>Web Internationalization Slide 158 Web Internationalization Slide 159</p><p>Internationalization and Unicode Conference 37 Tex Texin, XenCraft Web Internationalization – Standards and Practice</p><p>Additional Resources Conclusions</p><p>• W3C Internationalization Work Group 1. Web technologies are among the best http://www.w3.org/International ways to store, manipulate and represent • Unicode in XML and other Markup Languages data in different languages. http://www.w3.org/TR/unicode-xml • Character Model for the World Wide Web http://www.w3.org/TR/charmod 2. Implementation of Web standards is • Richard Ishida‟s paper on “Localisation incomplete and inconsistent Considerations in DTD Design” http://www.w3.org/People/Ishida/writing.html#dtd • XML Internationalization FAQ http://www.opentag.com/xmli18nfaq.htm</p><p>Web Internationalization Slide 160 Web Internationalization Slide 161</p><p>Acknowledgements</p><p>• Parts of this presentation are based on “Weaving the multilingual Web: Standards and their implementations” by François Yergeau and Martin Dürst, given at previous Unicode conferences. • Yves Savourel (ENLASO Corporation) created the test programs and the initial versions of the best practices content. • Thanks to Richard Ishida and Martin Dürst for their extensive review.</p><p>Web Internationalization Slide 162</p><p>Internationalization and Unicode Conference 38 Tex Texin, XenCraft</p> </div> </article> </div> </div> </div> <script type="text/javascript" async crossorigin="anonymous" src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-8519364510543070"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.6.1/jquery.min.js" crossorigin="anonymous" referrerpolicy="no-referrer"></script> <script> var docId = '79dda7190f896bf477bd78776ddae0f8'; var endPage = 1; var totalPage = 38; var pfLoading = false; window.addEventListener('scroll', function () { if (pfLoading) return; var $now = $('.article-imgview .pf').eq(endPage - 1); if (document.documentElement.scrollTop + $(window).height() > $now.offset().top) { pfLoading = true; endPage++; if (endPage > totalPage) return; var imgEle = new Image(); var imgsrc = "//data.docslib.org/img/79dda7190f896bf477bd78776ddae0f8-" + endPage + (endPage > 3 ? ".jpg" : ".webp"); imgEle.src = imgsrc; var $imgLoad = $('<div class="pf" id="pf' + endPage + '"><img src="/loading.gif"></div>'); $('.article-imgview').append($imgLoad); imgEle.addEventListener('load', function () { $imgLoad.find('img').attr('src', imgsrc); pfLoading = false }); if (endPage < 7) { adcall('pf' + endPage); } } }, { passive: true }); </script> <script> var sc_project = 11552861; var sc_invisible = 1; var sc_security = "b956b151"; </script> <script src="https://www.statcounter.com/counter/counter.js" async></script> </html>