Document Type Definition (DTD) Appendix B

1628 Document Type Definition (DTD) Appendix B Outline B B.1 Introduction B.2 Parsers, Well-Formed and Valid XML Documents B.3 Document Type Declaration Document Type B.4 Element Type Declarations B.4.1 Sequences, Pipe Characters and Occurrence Indicators Deﬁnition (DTD) B.4.2 EMPTY, Mixed Content and ANY B.5 Attribute Declarations B.6 Attribute Types B.6.1 Tokenized Attribute Type (ID, IDREF, ENTITY, NMTOKEN) B.6.2 Enumerated Attribute Types B.7 Conditional Sections B.8 Whitespace Characters Objectives B.9 Internet and World Wide Web Resources • To understand what a DTD is. • To be able to write DTDs. Summary • Terminology • Self-Review Exercises • Answers to Self-Review Exercises • To be able to declare elements and attributes in a DTD. • To understand the difference between general entities B.1 Introduction and parameter entities. • To be able to use conditional sections with entities. In this appendix, we discuss Document Type Definitions (DTDs), which define an XML document’s structure (e.g., what elements, attributes, etc. are permitted in the document). • To be able to use NOTATIONs. An XML document is not required to have a corresponding DTD. However, DTDs are of- • To understand how an XML document’s whitespace ten recommended to ensure document conformity, especially in business-to-business is processed. (B2B) transactions, where XML documents are exchanged. DTDs specify an XML docu- To whom nothing is given, of him can nothing be required. ment’s structure and are themselves defined using EBNF (Extended Backus-Naur Form) Henry Fielding grammar—not the XML syntax introduced in Appendix A. Like everything metaphysical, the harmony between thought and reality is to be found in the grammar of the language. Ludwig Wittgenstein Grammar, which knows how to control even kings. B.2 Parsers, Well-Formed and Valid XML Documents Molière Parsers are generally classified as validating or nonvalidating. A validating parser is able to read a DTD and determine whether the XML document conforms to it. If the document conforms to the DTD, it is referred to as valid. If the document fails to conform to the DTD but is syntactically correct, it is well formed, but not valid. By definition, a valid document is well formed. A nonvalidating parser is able to read the DTD, but cannot check the document against the DTD for conformity. If the document is syntactically correct, it is well formed. In this appendix, we use a Java program we created to check a document conformance. This program, named Validator.jar, is located in the Appendix B examples directory. Validator.jar uses the reference implementation for the Java API for XML Pro- cessing 1.1, which requires crimson.jar and jaxp.jar. © 2002 by Prentice-Hall, Inc. © 2002 by Prentice-Hall, Inc. Appendix B Document Type Definition (DTD) 1629 1630 Document Type Definition (DTD) Appendix B B.3 Document Type Declaration B.4 Element Type Declarations DTDs are introduced into XML documents using the document type declaration (i.e., Elements are the primary building blocks used in XML documents and are declared in a DOCTYPE). A document type declaration is placed in the XML document’s prolog (i.e., all DTD with element type declarations (ELEMENTs). For example, to declare element lines preceding the root element), begins with <!DOCTYPE and ends with >. The docu- myMessage, we might write ment type declaration can point to declarations that are outside the XML document (called the external subset) or can contain the declaration inside the document (called the internal <!ELEMENT myElement ( #PCDATA )> subset). For example, an internal subset might look like The element name (e.g., myElement) that follows ELEMENT is often called a generic identifier. The set of parentheses that follow the element name specify the element’s al- <!DOCTYPE myMessage [ <!ELEMENT myMessage ( #PCDATA )> lowed content and is called the content specification. Keyword PCDATA specifies that the ]> element must contain parsable character data. These data will be parsed by the XML parser, therefore any markup text (i.e., <, >, &, etc.) will be treated as markup. We will discuss The first myMessage is the name of the document type declaration. Anything inside the content specification in detail momentarily. the square brackets ([]) constitutes the internal subset. As we will see momentarily, ELE- MENT and #PCDATA are used in “element declarations.” Common Programming Error B.1 External subsets physically exist in a different file that typically ends with the.dtd Attempting to use the same element name in multiple element type declarations is an error. 0.0 extension, although this file extension is not required. External subsets are specified using either keyword the keyword SYSTEM or the keyword PUBLIC. For example, the DOC- Figure B.1 lists an XML document that contains a reference to an external DTD in the TYPE external subset might look like DOCTYPE. We use Validator.jar to check the document’s conformity against its DTD. <!DOCTYPE myMessage SYSTEM "myDTD.dtd"> The document type declaration (line 6) specifies the name of the root element as MyMessage. The element myMessage (lines 8–10) contains a single child element which points to the myDTD.dtd document. The PUBLIC keyword indicates that the DTD named message (line 9). is widely used (e.g., the DTD for HTML documents). The DTD may be made available in Line 3 of the DTD (Fig. B.2) declares element myMessage. Notice that the content well-known locations for more efficient downloading. We used such a DTD in Chapters 9 specification contains the name message. This indicates that element myMessage con- and 10 when we created XHTML documents. The DOCTYPE tains exactly one child element named message. Because myMessage can have only an element as its content, it is said to have element content. Line 4, declares element message <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" whose content is of type PCDATA. "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> uses the PUBLIC keyword to reference the well-known DTD for XHTML version 1.0. Common Programming Error B.2 XML parsers that do not have a local copy of the DTD may use the URL provided to down- Having a root element name other than the name specified in the document type declaration load the DTD to perform validation. is an error. 0.0 Both the internal and external subset may be specified at the same time. For example, If an XML document’s structure is inconsistent with its corresponding DTD, but is the DOCTYPE syntactically correct, the document is only well formed—not valid. Figure B.3 shows the messages generated when the required message element is omitted. <!DOCTYPE myMessage SYSTEM "myDTD.dtd" [ <!ELEMENT myElement ( #PCDATA )> ]> 1 <?xml version = "1.0"?> contains declarations from the myDTD.dtd document, as well as an internal declaration. 2 3  Software Engineering Observation B.1 4  5 The document type declaration’s internal subset plus its external subset form the DTD. 0.0 6 <!DOCTYPE myMessage SYSTEM "welcome.dtd"> 7 8 <myMessage> Software Engineering Observation B.2 9 <message>Welcome to XML!</message> The internal subset is visible only within the document in which it resides. Other external 10 </myMessage> documents cannot be validated against it. DTDs that are used by many documents should be placed in the external subset. 0.0 Fig. B.1 XML document declaring its associated DTD. © 2002 by Prentice-Hall, Inc. © 2002 by Prentice-Hall, Inc. Appendix B Document Type Definition (DTD) 1631 1632 Document Type Definition (DTD) Appendix B 1  Occurrence Indicator Description 2  3 <!ELEMENT myMessage ( message )> 4 <!ELEMENT message ( #PCDATA )> Plus sign ( + ) An element can appear any number of times, but must appear at least once (i.e., the element appears one or more times). C:\>java -jar Validator.jar welcome.xml Asterisk ( * ) An element is optional, and if used, the element can appear any Document is valid. number of times (i.e., the element appears zero or more times). Question mark ( ? ) An element is optional, and if used, the element can appear only Fig. B.2 Validation by using an external DTD. once (i.e., the element appears zero or one times). Fig. B.4 Occurrence indicators. 1 <?xml version = "1.0"?> 2 3  A plus sign indicates one or more occurrences. For example, 4  <!ELEMENT album ( song+ )> 5 6 <!DOCTYPE myMessage SYSTEM "welcome.dtd"> specifies that element album contains one or more song elements. 7 The frequency of an element group (i.e., two or more elements that occur in some com- 8  9 <myMessage> bination) is specified by enclosing the element names inside the content specification with 10 </myMessage> parentheses, followed by either the plus sign, asterisk or question mark. For example, <!ELEMENT album ( title, ( songTitle, duration )+ )> C:\>java -jar Validator.jar welcome-invalid.xml error: Element "myMessage" requires additional elements. indicates that element album contains one title element followed by any number of songTitle/duration element groups. At least one songTitle/duration group Fig. B.3 Invalid XML document. must follow title, and in each of these element groups, the songTitle must precede the duration. An example of markup that conforms to this is <album> B.4.1 Sequences, Pipe Characters and Occurrence Indicators <title>XML Classical Hits</title> DTDs allow the document author to define the order and frequency of child elements. The <songTitle>XML Overture</songTitle> comma (,)—called a sequence—specifies the order in which the elements must occur. For <duration>10</duration> example, <songTitle>XML Symphony 1.0</songTitle> <duration>54</duration> <!ELEMENT classroom ( teacher, student )> </album> specifies that element classroom must contain exactly one teacher element followed which contains one title element followed by two songTitle/duration groups.

Document Type Definition (DTD) Appendix B

XML a New Web Site Architecture

Rdfa in XHTML: Syntax and Processing Rdfa in XHTML: Syntax and Processing

Semantic Web

Defining Data with DTD Schemas

HTML 4.01 Specification

Document Type Definition: Structure

XHTML+Rdfa 1.1 - Third Edition Table of Contents

Fuzzy GML Modeling Based on Vague Soft Sets

Graduate Project GML and Complex Features 21-E4dc1/020 2/66 19/10/2001 -2A16-DED4.Doc

Chapter 7. XML

XBRL: a New Tool for Electronic Financial Reporting

Doctype Must Be Declared First