XML and Semantic Web Technologies
XML and Semantic Web Technologies
II. XML / 1. Unicode, URIs, and XML Syntax
Lars Schmidt-Thieme
Information Systems and Machine Learning Lab (ISMLL) Institute of Economics and Information Systems & Institute of Computer Science University of Hildesheim http://www.ismll.uni-hildesheim.de
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 1/42 XML and Semantic Web Technologies
II. XML / 1. Unicode, URIs, and XML Syntax
1. Unicode
2. Uniform Resource Identifiers (URIs)
3. XML Syntax
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 1/42 XML and Semantic Web Technologies / 1. Unicode Semantic Web Layer Cake
Figure 1: Semantic Web Layers (Berners-Lee).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 1/42 XML and Semantic Web Technologies / 1. Unicode Coded Character Sets
name codes examples ASCII code 0–127 64 A 7→ ISO-8859-1, ISO-LATIN-1 0–255 0–127 as ASCII, 196 7→ ISO-8859-7 0–255 0–127 as ASCII, 225 α 7→ Unicode 0–(232 1) 0–255 as ISO-8859-1 −
Unicode is organized in 256 groups à 256 planes à 256 rows à 256 cells.
Plane 0 (codes 0–65535) is called basis multilingual plane (BMP).
Non ISO-8859-1 characters are mapped to higher codes, e.g., 945 α. 7→
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 2/42 XML and Semantic Web Technologies / 1. Unicode Unicode
Assigned characters of the Unicode standard (v6.0.0, 2011) can be found at http://www.unicode.org/charts/.
Unicode also specifies character classes for each charac- ter, as letters (capital and small), • digits, • punctuation, • control characters. •
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 3/42 XML and Semantic Web Technologies / 1. Unicode Unicode / Scripts
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 4/42 XML and Semantic Web Technologies / 1. Unicode Unicode / Symbols
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 5/42 XML and Semantic Web Technologies / 1. Unicode Character Encoding Schemata
coded character codes character byte characters character set (natural numbers) encoding schema sequences
Character Encoding Schemata are trivial for 1-byte coded character sets.
Direct representations of Unicode: UCS-2: direct representation of codes 0–65535 with 2 bytes. UCS-4: direct representation of all codes with 4 bytes.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 6/42 XML and Semantic Web Technologies / 1. Unicode
Drawbacks of direct representations: bytecode 0x00 occurs (that marks string endings in • C), e.g., in UCS-4: A 65 (0, 0, 0, 65) 7→ 7→
uniform blow-up of storage space, but most texts • mostly use ASCII or ISO-8859-1.
error-prone, as if one byte is lost, all following data will • be decoded incorrectly.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 7/42 XML and Semantic Web Technologies / 1. Unicode Unicode Transformation Formats (UTF)
Unicode Transformation Formats (UTF) use a variable number of bytes for coding a character. UTF-8: 0x00-0x7f (bit sequences 0...... ) code ASCII characters directly, 0xc0-0xfd (bit sequences 11...... ) mark the start of a multi-byte character repre- sentation (and code its length and leading bits of its code), 0x80-0xbf (bit sequences 10...... ) code continuations of multi-byte character representations, 0xfe, 0xff (bit sequences 1111111.) are not used.
bit sequence bytes free bits character codes 0...... 1 7 0x00– 0x7f 110..... 10...... 2 5 + 6 = 11 0x80– 0x7ff 1110.... 10...... 10...... 3 4 + 2 6 = 16 0x800– 0xffff · 11110... 10...... 10...... 10...... 4 3 + 3 6 = 21 0x10000– 0x1fffff · 111110.. 10...... 10...... 10...... 10...... 5 2 + 4 6 = 26 0x200000– 0x3ffffff · 1111110. 10...... 10...... 10...... 10...... 10...... 6 1 + 5 6 = 31 0x4000000–0xffffffff ·
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 8/42 XML and Semantic Web Technologies
II. XML / 1. Unicode, URIs, and XML Syntax
1. Unicode
2. Uniform Resource Identifiers (URIs)
3. XML Syntax
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 9/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Uniform Resource Identifiers (URIs)
URIs are used to identify resources.
Example:
http://www.ismll.uni-hildesheim.de/lehre/xml-09s/index.html
URIs are defined in RFC 3986 (01/2005).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 9/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs)
Generic URI syntax
/FE1!G/ IH
"!$# % '&)(*",+-.!0/132*(4!05687*/ BD D
¦=<>?<>
¡ ¡ ¤ ¤
>Q
¢¡£¡¥¤§¦ ¡
M7ON P .!G O# N R
L
AWV
SUT
9BAC:
@
:-
@KJ
9¢¨;:
¨ ©
| {z } | {z } | {z } | {z } | {z }
:- 9BA-X:3V J*T
| {z }
¨ © ZY[¨ F©FX\?© A£:
@ @KJ
| {z }
¤¦¢¨ ! ¡¦"¢$#&%(' £¢*)!+ -,.¢/ ' 102%43 ,
¡£¢¥¤§¦©¨
56 879 ;:< =?>6 @:BADCFE
£ | {z } | {z }
Figure 6: Typical parts of URIs.
Generic URI syntax: URI := scheme : scheme-specific-part h i h i h i
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 10/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Hierarchical URIs
An URI is called hierarchical iff scheme-specific-part :=( // authority [ path ] h i h i h i | path )[ ? query ][ # fragment h i h i h i ] path :=( / path-segment )+ h i h i otherwise its called opaque.
The path-segments . and .. have special meaning: context path and parent path.
A hierarchical URI is called server-based iff authority :=[ userinfo @ ] host [ : port ] h i h i h i h i otherwise it is called registry-based.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 11/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Fragment identifiers
Fragment identifiers are used to identify parts of the resource identified by an URI.
Example: http://www.informatik.uni-freiburg.de/xml/books.html#R03
1
2
3
4 XML und Datenmodellierung, 2004.
5
6 Learning XML, 2003.
7
8 Figure 7: HTML document at http://www.informatik.uni-freiburg.de/xml/books.html.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 12/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Relative (hierarchical) URIs A relative URI is defined as: relativeURI ::=( // authority [ path ] h i h i h i | path h i | relativePath )[ ? query ] h i h i relativePath := path-segment ( / path-segment )* h i h i h i .------. | .------. | | | .------. | | | | | .------. | | | | | | | .------. | | | | | | | | |
Figure 8: A Base URI is the context for resolving relative URIs [RFC 2396]. Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 13/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) URI schemes URI schemes are managed by Internet Assigned Numbers Authority (IANA).
Scheme Name Description Reference Type ftp File Transfer Protocol RFC 1738 server-based http Hypertext Transfer Protocol RFC 2616 server-based mailto Electronic mail address RFC 2368 opaque file Host-specific file names RFC 1738 server-based pop Post Office Protocol v3 RFC 2384 server-based dav dav RFC 2518 server-based tel telephone RFC 2806 opaque https Hypertext Transfer Protocol Secure RFC 2818 server-based urn Uniform Resource Names RFC 2141 opaque . . . .
66 URI schemes (as of 2009-04-06; http://www.iana.org/assignments/uri-schemes.html). URI registrations are regulated by RFC 4396 (2/2006). Example: tel:+(49)-761-203-8164
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 14/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) URI types by URI semantics
Uniform Resource Identifier (URI)
Uniform Resource Uniform Resource Uniform Resource Locator (URL) Name (URN) Characteristics (URC)
Figure 9: URI types.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 15/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Uniform Resource Names (URNs) URNs are special kinds of URIs that map other namespaces into URN-space, • are required to remain globally unique and persistent • (even when the resource ceases to exist or becomes unavailable). have scheme urn. •
URN := urn: namespace : namespace-specific-part h i h i h i
Examples: urn:isbn:0-395-36341-1 urn:newsml:reuters.com:20000206:IIMFFH05643_2000-02-06_17-54-01_L06156584:1U
A book or a news item (identified by an URN) may be retrieved from different locations (URLs).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 16/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs)
URN Namespaces Value Reference ietf 1 RFC 2648 pin 2 RFC 3043 issn 3 RFC 3044 oid 4 RFC 3061 newsml 5 RFC 3085 oasis 6 RFC 3121 xmlorg 7 RFC 3120 publicid 8 RFC 3151 isbn 9 RFC 3187 nbn 10 RFC 3188 web3d 11 RFC 3541 mpeg 12 RFC 3614 mace 13 RFC 3613 fipa 14 RFC 3616 swift 15 RFC 3615 . . . 40 URN namespaces (as of 2008-12-09; http://www.iana.org/assignments/urn-namespaces) Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 17/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Characters Allowed in URIs
In URIs only some characters may be used literally in non-syntactic parts ("data").
All others have to be escaped using their code (in some character encoding):
dataChars := alphanum | - | _ | . | ! | ~ | * | ’ | ( | ) h i h i escapedChar := % hexDigit hexDigit h i h i h i
Codes have been interpreted as codes in different character encodings, depending on the URI scheme.
UTF-8 is recommended by RFC 2718 and already used by some schemes (e.g., urn, imap, pop).
Example: http://www.informatik.uni-freiburg.de/login.jsp?name=Hans%20Meyer
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 18/42 XML and Semantic Web Technologies / 2. Uniform Resource Identifiers (URIs) Internationalized Resource Identifiers (IRIs)
IRIs allow more characters to be used literally (RFC 3987; 01/2005).
In IRIs only data characters that can be misinterpreted as syntactic characters and • some bidirectional formatting characters • have to be escaped.
All other data characters are used literally (in some character encoding, e.g., UTF-8).
Example: http://www.informatik.uni-freiburg.de/login.jsp?name=Hans%20Müller
Schemes still are restricted to US ASCII characters.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 19/42 XML and Semantic Web Technologies
II. XML / 1. Unicode, URIs, and XML Syntax
1. Unicode
2. Uniform Resource Identifiers (URIs)
3. XML Syntax
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 20/42 XML and Semantic Web Technologies / 3. XML Syntax W3C development process
W3C specifications are called Recommendations.
Stages of W3C recommendations: completion date stage XML 1.0 XML 1.1 Working Draft 1996/11/14 2001/12/13 1997/11/17 Last Call Working Draft 2002/04/25 Candidate Recommendation 2002/10/15 Proposed Recommendation 1997/12/08 2003/11/05 Recommendation 1998/02/10 2004/04/15 Working Draft 2000/08/14 Recommendation (2nd edition) 2000/10/06 2006/08/16 Proposed Edited Recommendation 2003/10/30 Recommendation (3rd edition) 2004/02/04 Recommendation (4th edition) 2006/08/16 Recommendation (5th edition) 2008/11/26
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 20/42 XML and Semantic Web Technologies / 3. XML Syntax
Every XML document consists of a prolog and a single element, called root ele- ment.
document := prolog element ( Comment | PI | S )* h i h i h i h i h i h i
prolog := h i ( Comment | PI | S )* h i h i h i ( DoctypeDecl ( Comment | PI | S )* )? h i h i h i h i In all productions matching " can be replaced by ’. • = may be surrounded by spaces (i.e., match S ? = S ?). • h i h i
S := (#x20 | #x9 | #xD | #xA)+ h i
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 21/42 XML and Semantic Web Technologies / 3. XML Syntax A minimal XML document
1
2
In XML 1.1 the version attribute is mandatory.
If the version attribute is missing, version 1.0 is assumed.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 22/42 XML and Semantic Web Technologies / 3. XML Syntax Elements and Attributes
element := emptyElementTag h i h i | STag content ETag h i h i h i emptyElementTag := < Name ( S Name = " AttValue " )* S ? /> h i h i h i h i h i h i STag := < Name ( S Name = " AttValue " )* S ? > h i h i h i h i h i h i ETag := h i h i h i Name s h i start with a unicode letter or _ • (: is also allowed, but used for namespaces). may contain unicode letters, uncode digits, -, ., or . • · A wellformed document requires, that start and end tag of each element match, • that for each tag the same attribute never occurs twice. •
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 23/42 XML and Semantic Web Technologies / 3. XML Syntax Not-wellformed Documents (1/2)
1
2
3
4
5
6
7
8
9
10
11
12 Figure 11: More than one root element.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 24/42 XML and Semantic Web Technologies / 3. XML Syntax Not-wellformed Documents (2/2)
1
2
3
4
5
6 Figure 12: Miss-nested elements.
1
2
3
4
5 Figure 13: Same attribute occurs twice.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 25/42 XML and Semantic Web Technologies / 3. XML Syntax Element content
The contents of an element can be made up from 6 different things: 1. other elements, 2. Character data, 3. References, 4. CDATA sections, 5. Processing instructions, and 6. comments.
content := CharData ? h i h i (( element | Reference | CDSect | PI | Comment ) h i h i h i h i h i CharData ? )* h i
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 26/42 XML and Semantic Web Technologies / 3. XML Syntax Character data
CharData may contain any characters except h i <, &, or the sequence >]]
Attribute values may not contain ", if delimited by ", • ’, if delimited by ’, •
These characters can be expressed by references.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 27/42 XML and Semantic Web Technologies / 3. XML Syntax Character data
1
2
3 x^2 = y has no real solution for y < 0.
4 But there are solutions for y = 0 & for y > 0.
5 Figure 14: Forbidden characters in character data.
1
2
3 x^2 = y has no real solution for y < 0.
4 But there are solutions for y = 0 & for y > 0.
5 Figure 15: Using references in character data.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 28/42 XML and Semantic Web Technologies / 3. XML Syntax References
Reference := EntityRef | CharRef h i h i h i CharRef := [0-9]+ ; h i | [0-9a-fA-F]+ ; EntityRef := & Name ; h i h i
There are five predefined entity references: < > & ' " < > & ’ "
All other entities known from HTML (as ä) are not predefined in XML.
Custom entities can be defined in the document type declaration.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 29/42 XML and Semantic Web Technologies / 3. XML Syntax CDATA sections
CDATA sections allow the literal usage of all characters (except the sequence ]]>).
CDSect := h i h i
CDATA sections are typically used for longer text containing < or &.
CDATA sections are flat, i.e., there is no possibility to structure them with elements (as < or & are interpreted literally).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 30/42 XML and Semantic Web Technologies / 3. XML Syntax Character data and CDATA sections
1
2
3 x^2 = y has no real solution for y c; 0.
4 But there are solutions for y = 0 for y e; 0.
5 Figure 16: Using numeric character references.
1
2 3 x^2 = y has no real solution for y < 0. 4 But there are solutions for y = 0 & for y > 0. 5 ]]>
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 31/42 XML and Semantic Web Technologies / 3. XML Syntax Attribute values
1
2
3
4
5 Figure 18: Literal usage of attribute delimiter.
1
2
3
4
5 Figure 19: Using different attribute delimiters.
1
2
3
4
5 Figure 20: Using references in attribute values. Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 32/42 XML and Semantic Web Technologies / 3. XML Syntax Comments
Comments can occur in the prolog and in the contents of elements.
Comments are not allowed to contain the character sequence --.
Comment := h i h i
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 33/42 XML and Semantic Web Technologies / 3. XML Syntax Comments
1
2
3
4
5
6
7
8
9
10
11 Figure 21: Comments in the prolog and in the contents of elements.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 34/42 XML and Semantic Web Technologies / 3. XML Syntax
1
2
3
4
5
6
7 Figure 22: Comments in tags are not allowed.
1
2
3
4
5
6
7
8
9
10 Figure 23: -- is not allowed in comments. Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 35/42 XML and Semantic Web Technologies / 3. XML Syntax Processing Instructions
Processing instructions (PIs) allow documents to contain instructions for applica- tions.
PI := h i h i h i h i
The name of a PI must be different from xml.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 36/42 XML and Semantic Web Technologies / 3. XML Syntax Character encoding schemata
Character encoding schemata are specified by the name they are registered with at IANA (http://www.iana.org/assignments/character-sets), e.g., US-ASCII ISO-8859-1 ISO-10646-UCS-2 or csUnicode (UCS2) ISO-10646-UCS-4 or csUCS4 (UCS4) UTF-8 UTF-16 ...
If no encoding is specified in the XML declaration, UTF-8 is assumed.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 37/42 XML and Semantic Web Technologies / 3. XML Syntax Character encoding schemata
1
2
3 Grüß Gott !
4 Figure 24: Non-wellformed document (assumed to be ISO-8859-1 coded).
1
2
3 Grüß Gott !
4 Figure 25: XML document coded in ISO-8859-1.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 38/42 XML and Semantic Web Technologies / 3. XML Syntax Language and Whitespaces
There are two predefined attributes, xml:lang • and
xml:space, • that can be used with any element. xml:lang specifies the language of the character contents of elements and at- tributes with (RFC 3066) an ISO language code • (http://www.loc.gov/standards/iso639-2/langcodes.html)
or
an IANA language code • (http://www.iana.org/assignments/language-tags).
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 39/42 XML and Semantic Web Technologies / 3. XML Syntax Language Attribute
Example ISO and IANA language codes: language code meaning source de ISO German de-CH ISO German, Swiss variant de-DE ISO German, German variant en ISO English en-US ISO US English en-GB ISO Britain English tlh ISO Klingon de-1901 IANA German, traditional orthography de-1996 IANA German, orthography of 1996 . . .
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 40/42 XML and Semantic Web Technologies / 3. XML Syntax Language Attribute
1
2
3
Guten Morgen!
4
Good morning!
5
USD | 0 | 1 | ... |
EUR | 0 | 0.839818 | ... |
9 Figure 26: Language attribute.
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 41/42 XML and Semantic Web Technologies / 3. XML Syntax References
Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany, Course on XML and Semantic Web Technologies, summer term 2012 42/42