Document Type Definition (DTD) Appendix B Appendix B Document Type Definition (DTD) 1629

1628 Document Type Definition (DTD) Appendix B Appendix B Document Type Definition (DTD) 1629 B.3 Document Type Declaration Outline DTDs are introduced into XML documents using the document type declaration (i.e., B B.1 Introduction DOCTYPE). A document type declaration is placed in the XML document’s prolog (i.e., all B.2 Parsers, Well-Formed and Valid XML Documents lines preceding the root element), begins with <!DOCTYPE and ends with >. The docu- B.3 Document Type Declaration ment type declaration can point to declarations that are outside the XML document (called the external subset) or can contain the declaration inside the document (called the internal Document Type B.4 Element Type Declarations subset). For example, an internal subset might look like B.4.1 Sequences, Pipe Characters and Occurrence Indicators Deﬁnition (DTD) B.4.2 EMPTY, Mixed Content and ANY <!DOCTYPE myMessage [ <!ELEMENT myMessage ( #PCDATA )> B.5 Attribute Declarations ]> B.6 Attribute Types The first myMessage is the name of the document type declaration. Anything inside B.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN) the square brackets ([]) constitutes the internal subset. As we will see momentarily, ELE- B.6.2 Enumerated Attribute Types MENT and #PCDATA are used in “element declarations.” B.7 Conditional Sections External subsets physically exist in a different file that typically ends with the.dtd extension, although this file extension is not required. External subsets are specified using B.8 Whitespace Characters Objectives either keyword the keyword SYSTEM or the keyword PUBLIC. For example, the DOC- B.9 Internet and World Wide Web Resources TYPE external subset might look like • To understand what a DTD is. Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • To be able to write DTDs. <!DOCTYPE myMessage SYSTEM "myDTD.dtd"> • To be able to declare elements and attributes in a DTD. which points to the myDTD.dtd document. The PUBLIC keyword indicates that the DTD • To understand the difference between general entities is widely used (e.g., the DTD for HTML documents). The DTD may be made available in B.1 Introduction well-known locations for more efficient downloading. We used such a DTD in Chapters 9 and parameter entities. and 10 when we created XHTML documents. The DOCTYPE • To be able to use conditional sections with entities. In this appendix, we discuss Document Type Definitions (DTDs), which define an XML document’s structure (e.g., what elements, attributes, etc. are permitted in the document). • To be able to use NOTATIONs. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" An XML document is not required to have a corresponding DTD. However, DTDs are of- • To understand how an XML document’s whitespace "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ten recommended to ensure document conformity, especially in business-to-business is processed. (B2B) transactions, where XML documents are exchanged. DTDs specify an XML docu- uses the PUBLIC keyword to reference the well-known DTD for XHTML version 1.0. To whom nothing is given, of him can nothing be required. ment’s structure and are themselves defined using EBNF (Extended Backus-Naur Form) XML parsers that do not have a local copy of the DTD may use the URL provided to down- Henry Fielding grammar—not the XML syntax introduced in Appendix A. load the DTD to perform validation. Like everything metaphysical, the harmony between thought Both the internal and external subset may be specified at the same time. For example, and reality is to be found in the grammar of the language. the DOCTYPE Ludwig Wittgenstein Grammar, which knows how to control even kings. B.2 Parsers, Well-Formed and Valid XML Documents <!DOCTYPE myMessage SYSTEM "myDTD.dtd" [ <!ELEMENT myElement ( #PCDATA )> Molière Parsers are generally classified as validating or nonvalidating. A validating parser is able ]> to read a DTD and determine whether the XML document conforms to it. If the document conforms to the DTD, it is referred to as valid. If the document fails to conform to the DTD contains declarations from the myDTD.dtd document, as well as an internal declaration. but is syntactically correct, it is well formed, but not valid. By definition, a valid document Software Engineering Observation B.1 is well formed. The document type declaration’s internal subset plus its external subset form the DTD. 0.0 A nonvalidating parser is able to read the DTD, but cannot check the document against the DTD for conformity. If the document is syntactically correct, it is well formed. In this appendix, we use a Java program we created to check a document conformance. Software Engineering Observation B.2 This program, named Validator.jar, is located in the Appendix B examples directory. The internal subset is visible only within the document in which it resides. Other external Validator.jar uses the reference implementation for the Java API for XML Pro- documents cannot be validated against it. DTDs that are used by many documents should be cessing 1.1, which requires crimson.jar and jaxp.jar. placed in the external subset. 0.0 © 2002 by Prentice-Hall, Inc. © 2002 by Prentice-Hall, Inc. © 2002 by Prentice-Hall, Inc. 1630 Document Type Definition (DTD) Appendix B Appendix B Document Type Definition (DTD) 1631 1632 Document Type Definition (DTD) Appendix B B.4 Element Type Declarations 1  Occurrence Indicator Description Elements are the primary building blocks used in XML documents and are declared in a 2  DTD with element type declarations (ELEMENTs). For example, to declare element 3 <!ELEMENT myMessage ( message )> 4 <!ELEMENT message ( #PCDATA )> myMessage, we might write Plus sign ( + ) An element can appear any number of times, but must appear at least once (i.e., the element appears one or more times). <!ELEMENT myElement ( #PCDATA )> C:\>java -jar Validator.jar welcome.xml Asterisk ( * ) An element is optional, and if used, the element can appear any Document is valid. number of times (i.e., the element appears zero or more times). The element name (e.g., myElement) that follows ELEMENT is often called a generic Question mark ( ? ) An element is optional, and if used, the element can appear only identifier. The set of parentheses that follow the element name specify the element’s al- Fig. B.2 Validation by using an external DTD. once (i.e., the element appears zero or one times). lowed content and is called the content specification. Keyword PCDATA specifies that the element must contain parsable character data. These data will be parsed by the XML parser, therefore any markup text (i.e., <, >, &, etc.) will be treated as markup. We will discuss Fig. B.4 Occurrence indicators. the content specification in detail momentarily. 1 <?xml version = "1.0"?> 2 A plus sign indicates one or more occurrences. For example, Common Programming Error B.1 3  4  Attempting to use the same element name in multiple element type declarations is an error. 0.0 <!ELEMENT album ( song+ )> 5 6 <!DOCTYPE myMessage SYSTEM "welcome.dtd"> specifies that element album contains one or more song elements. Figure B.1 lists an XML document that contains a reference to an external DTD in the 7 The frequency of an element group (i.e., two or more elements that occur in some com- DOCTYPE. We use Validator.jar to check the document’s conformity against its DTD. 8  9 <myMessage> bination) is specified by enclosing the element names inside the content specification with The document type declaration (line 6) specifies the name of the root element as 10 </myMessage> parentheses, followed by either the plus sign, asterisk or question mark. For example, MyMessage. The element myMessage (lines 8–10) contains a single child element named message (line 9). <!ELEMENT album ( title, ( songTitle, duration )+ )> C:\>java -jar Validator.jar welcome-invalid.xml Line 3 of the DTD (Fig. B.2) declares element myMessage. Notice that the content error: Element "myMessage" requires additional elements. indicates that element album contains one title element followed by any number of specification contains the name message. This indicates that element myMessage con- songTitle/duration element groups. At least one songTitle/duration group tains exactly one child element named message. Because myMessage can have only an Fig. B.3 Invalid XML document. must follow title, and in each of these element groups, the songTitle must precede element as its content, it is said to have element content. Line 4, declares element message the duration. An example of markup that conforms to this is whose content is of type PCDATA. <album> Common Programming Error B.2 B.4.1 Sequences, Pipe Characters and Occurrence Indicators <title>XML Classical Hits</title> Having a root element name other than the name specified in the document type declaration DTDs allow the document author to define the order and frequency of child elements. The <songTitle>XML Overture</songTitle> is an error. 0.0 comma (,)—called a sequence—specifies the order in which the elements must occur. For <duration>10</duration> example, If an XML document’s structure is inconsistent with its corresponding DTD, but is <songTitle>XML Symphony 1.0</songTitle> syntactically correct, the document is only well formed—not valid. Figure B.3 shows the <duration>54</duration> messages generated when the required message element is omitted. <!ELEMENT classroom ( teacher, student )> </album> specifies that element classroom must contain exactly one teacher element followed which contains one title element followed by two songTitle/duration groups. The asterisk (*) character indicates an optional element that, if used, can occur any number 1 <?xml version = "1.0"?> by exactly one student element.

Document Type Definition (DTD) Appendix B Appendix B Document Type Definition (DTD) 1629

XML a New Web Site Architecture

Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing

Semantic Web

Defining Data with DTD Schemas

HTML 4.01 Specification

Document Type Definition: Structure

XHTML+Rdfa 1.1 - Third Edition Table of Contents

Fuzzy GML Modeling Based on Vague Soft Sets

Graduate Project GML and Complex Features 21-E4dc1/020 2/66 19/10/2001 -2A16-DED4.Doc

Chapter 7. XML

XBRL: a New Tool for Electronic Financial Reporting

Doctype Must Be Declared First