Learning XML
Erik T. Ray
First Edition, January 2001
ISBN: 0-59600-046-4, 368 pages

XML (Extensible Markup Language) is a flexible way to create "self-describing data" - and to share both the format and the data on the World Wide Web, intranets, and elsewhere. In Learning XML, the author explains XML and its capabilities succinctly and professionally, with references to real-life projects and other cogent examples. Learning XML shows the purpose of XML markup itself, the CSS and XSL styling languages, and the XLink and XPointer specifications for creating rich link structures.

Preface
    What's Inside
    Style Conventions
    Examples
    Comments and Questions
    Acknowledgments

1 Introduction
    1.1 What Is XML?
    1.2 Origins of XML
    1.3 Goals of XML
    1.4 XML Today
    1.5 Creating Documents
    1.6 Viewing XML
    1.7 Testing XML
    1.8 Transformation

2 Markup and Core Concepts
    2.1 The Anatomy of a Document
    2.2 Elements: The Building Blocks of XML
    2.3 Attributes: More Muscle for Elements
    2.4 Namespaces: Expanding Your Vocabulary
    2.5 Entities: Placeholders for Content
    2.6 Miscellaneous Markup
    2.7 Well-Formed Documents
    2.8 Getting the Most out of Markup
    2.9 XML Application: DocBook

3 Connecting Resources with Links
    3.1 Introduction
    3.2 Specifying Resources
    3.3 XPointer: An XML Tree Climber
    3.4 An Introduction to XLinks
    3.5 XML Application: XHTML

4 Presentation: Creating the End Product
    4.1 Why Stylesheets?
    4.2 An Overview of CSS
    4.3 Rules
    4.4 Properties
    4.5 A Practical Example

5 Document Models: A Higher Level of Control
    5.1 Modeling Documents
    5.2 DTD Syntax
    5.3 Example: A Checkbook
    5.4 Tips for Designing and Customizing DTDs
    5.5 Example: Barebones DocBook
    5.6 XML Schema: An Alternative to DTDs

6 Transformation: Repurposing Documents
    6.1 Transformation Basics
    6.2 Selecting Nodes
    6.3 Fine-Tuning Templates
    6.4 Sorting
    6.5 Example: Checkbook
    6.6 Advanced Techniques
    6.7 Example: Barebones DocBook

7 Internationalization
    7.1 Character Sets and Encodings
    7.2 Taking Language into Account

8 Programming for XML
    8.1 XML Programming Overview
    8.2 SAX: An Event-Based API
    8.3 Tree-Based Processing
    8.4 Conclusion

A Resources
    A.1 Online
    A.2 Books
    A.3 Standards Organizations
    A.4 Tools
    A.5 Miscellaneous

B A Taxonomy of Standards
    B.1 Markup and Structure
    B.2 Linking
    B.3 Searching
    B.4 Style and Transformation
    B.5 Programming
    B.6 Publishing
    B.7 Hypertext
    B.8 Descriptive/Procedural
    B.9 Multimedia
    B.10 Science

Glossary
Colophon

The arrival of support for XML - the Extensible Markup Language - in browsers and authoring tools has followed a long period of intense hype. Major databases, authoring tools (including Microsoft's Office 2000), and browsers are committed to XML support. Many content creators and programmers for the Web and other media are left wondering, "What can XML and its associated standards really do for me?" Getting the most from XML requires being able to tag and transform XML documents so they can be processed by web browsers, databases, mobile phones, printers, XML processors, voice response systems, and LDAP directories, just to name a few targets.
The basic advantages of XML over HTML are that XML lets a web designer define tags that are meaningful for the particular documents or database output to be used, and that it enforces an unambiguous structure that supports error-checking. XML supports enhanced styling and linking standards (allowing, for instance, simultaneous linking to the same document in multiple languages) and a range of new applications.

For writers producing XML documents, this book demystifies files and the process of creating them with the appropriate structure and format. Designers will learn what parts of XML are most helpful to their team and will get started on creating Document Type Definitions. For programmers, the book makes syntax and structures clear. It also discusses the stylesheets needed for viewing documents in the next generation of browsers, databases, and other devices.

Preface

Since its introduction in the late 90s, Extensible Markup Language (XML) has unleashed a torrent of new acronyms, standards, and rules that have left some in the Internet community wondering whether it is all really necessary. After all, HTML has been around for years and has fostered the creation of an entirely new economy and culture, so why change a good thing?

The truth is, XML isn't here to replace what's already on the Web, but to create a more solid and flexible foundation. It's an unprecedented effort by a consortium of organizations and companies to create an information framework for the 21st century that HTML only hinted at.

To understand the magnitude of this effort, we need to clear away some myths. First, in spite of its name, XML is not a markup language; rather, it's a toolkit for creating, shaping, and using markup languages. This fact also takes care of the second misconception, that XML will replace HTML.
Actually, HTML is going to be absorbed into XML, and will become a cleaner version of itself, called XHTML. And that's just the beginning, because XML will make it possible to create hundreds of new markup languages to cover every application and document type.

The standards process will figure prominently in the growth of this information revolution. XML itself is an attempt to rein in the uncontrolled development of competing technologies and proprietary languages that threatens to splinter the Web. XML creates a playground where structured information can play nicely with applications, maximizing accessibility without sacrificing richness of expression.

XML's enthusiastic acceptance by the Internet community has opened the door for many sister standards. XML's new playmates include stylesheets for display and transformation, strong methods for linking resources, tools for data manipulation and querying, error checking and structure enforcement tools, and a plethora of development environments. As a result of these new applications, XML is assured a long and fruitful career as the structured information toolkit of choice.

Of course, XML is still young, and many of its siblings aren't quite out of the playpen yet. Some of the subjects discussed in this book are quasi-speculative, since their specifications are still working drafts. Nevertheless, it's always good to get into the game as early as possible rather than be taken by surprise later. If you're at all involved in web development or information management, then you need to know about XML.

This book is intended to give you a bird's-eye view of the XML landscape that is now taking shape. To get the most out of this book, you should have some familiarity with structured markup, such as HTML or TeX, and with World Wide Web concepts such as hypertext linking and data representation. You don't need to be a developer to understand XML concepts, however.
We'll concentrate on the theory and practice of document authoring without going into much detail about writing applications or acquiring software tools. The intricacies of programming for XML are left to other books, while the rapid changes in the industry ensure that we could never hope to keep up with the latest XML software. Nevertheless, the information presented here will give you a decent starting point from which to jump in any direction you want to go with XML.

What's Inside

The book is organized into the following chapters:

Chapter 1 is an overview of XML and some of its common uses. It's a springboard to the rest of the book, introducing the main concepts that will be explained in detail in following chapters.

Chapter 2 describes the basic syntax of XML, laying the foundation for understanding XML applications and technologies.

Chapter 3 shows how to create simple links between documents and resources, an important aspect of XML.

Chapter 4 introduces the concept of stylesheets with the Cascading Style Sheets language.

Chapter 5 covers document type definitions (DTDs) and introduces XML Schema. These are the major techniques for ensuring the quality and completeness of documents.

Chapter 6 shows how to create a transformation stylesheet to convert one form of XML into another.

Chapter 7 is an introduction to the accessible and international side of XML, including Unicode, character encodings, and language support.

Chapter 8 gives you an overview of writing software to process XML.

In addition, there are two appendixes and a glossary:

Appendix A contains a bibliography of resources for learning more about XML.

Appendix B lists technologies related to XML.

The Glossary explains terms used in the book.

Style Conventions

Items appearing in the book are sometimes given a special appearance to set them apart from the regular text.
Here's how they look:

Italic
    Used for citations to books and articles, commands, email addresses, URLs, filenames, emphasized text, and first references to terms.

Constant width
    Used for literals, constant values, code listings, and XML markup.

Constant width italic
    Used for replaceable parameter and variable names.

Constant width bold
    Used to highlight the portion of a code listing being discussed.

Examples

The examples from this book are freely downloadable from the book's web site at http://www.oreilly.com/catalog/learnxml.