XML Tutorial Description
Introduction to XML
Bebo White
[email protected]
InterLab 2006
FermiLab October 2006

Tutorial Description
With your HTML knowledge, you have a solid foundation for working with markup languages. However, unlike HTML, XML is more flexible, allowing for custom tag creation. This course introduces the fundamentals of XML and its related technologies so that you can create your own markup language.

Topics*
• XML well-formed documents
• Validation concepts
• DTD syntax and constructs
• W3C Schema syntax and constructs
• XSL(T) syntax and processing
• XPath addressing language
• Development and design considerations
• XML processing model
• XML development and processing tools
* Tutorial plus references

What Is Markup?
• Information added to a text to make its structure comprehensible
• Pre-computer markup (punctuational and presentational)
  • Word divisions
  • Punctuation
  • Copy-editors' and typesetters' marks
  • Formatting conventions

Computer Markup (1/3)
• Any kind of codes added to a document
• Typesetting (presentational markup)
  • Macros embedded in ASCII
  • Commands to define the layout
  • MS Word, TeX, RTF, Scribe, Script, nroff, etc.
  • *Hello* → Hello
  • /Hello/ → Hello
• Declarative markup
  • HTML (sometimes)
  • XML

Computer Markup (2/3)
• Declarative markup (cont)
  • Names and structure
  • Framework for indirection
  • Finer level of detail (most human-legible signals are overloaded)
  • Independent of presentation (abstract)
  • Often called “semantic”

Computer Markup (3/3)
• Semantic Markup
  • Authors put annotations into their texts to help the publisher understand what type of text this is (e.g. “this is a heading”)
  • Annotations are agreed between author and publisher
  • Publisher decides on the layout
• Descriptive markup
  • Describing content, not the layout
  • Markup to support search in documents
  • Words in headings are more important than in footnotes
• Markup for machines vs. markup for humans

Markup – ISO-Definitions
• Markup – Text that is added to the data of a document in order to convey information about it
• Descriptive Markup – Markup that describes the structure and other attributes of a document in a non-system-specific way, independently of any processing that may be performed on it
• Processing Instruction (PI) – Markup consisting of system-specific data that controls how a document is to be processed

Markup Language Features
• Stylistic (appearance)
  • <I> <B> <U>
• Structural (layout)
  • <P> <BR> <H2>
• Semantic (meaning)
  • <TITLE>
  • <META NAME=keywords CONTENT = " …... " >
• Functional (action)
  • <BLINK>
  • <A HREF = "[link]">Click here</A>

Hypertext Markup Language (HTML)

Hypertext Markup Language
• HTML – The Markup Language used to represent Web pages for viewing by people
  • Rendered and viewed in a Web Browser
• Documents
  • Easy to write – Markup your data with tags
  • Platform independent
  • Can contain links to images, documents, and other pages
• HTML is an application/instance of SGML (Standard Generalized Markup Language, ISO 8879:1986 – used for defining Markup Languages)
• For further information: http://www.w3.org/MarkUp/

Some Problems (1/2)
• Not extensible

Some Problems (2/2)
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
    <html><head>
    <title>The Some Problems Example</title>
    </head><body>
    <H1>Separation Of Concerns</h1>
    There are a lot of problems using HTML for
    <WebEngineering>Web Application development</WebEngineering>,
    if you do not separate concerns.
    <P>
    The <b>Bold</b> and <i>Italic</i> example: <br>
    While rendering <b> is easy nowadays.<i> The semantic of
    this markup </b> is not </i> clear.
    </BODY></HTML>
• REMEMBER: Do not develop Applications in this manner!

Observations on HTML
• Powerful for Presentation (Focus on Client-Side)
  • Cascading Style Sheets (CSS)
  • Allows for dynamic behavior using scripting/DHTML
  • Allows for proprietary extension (ActiveX, plug-ins, etc.)
• Easy to write and generate, but:
  • Difficult to parse
  • No support for extending semantics, e.g. using your own tags
  • Difficult to apply disciplined approaches

eXtensible Markup Language (XML)

XML (1/2)
• The eXtensible Markup Language
• XML is a universal format for structured documents and data on the Web
• XML is a standard, interoperable way to describe data for flexible processing
  • Multi-format delivery
  • Schema-aware information retrieval
  • Transformation and dynamic data customization
  • Archival: standardized, self-describing

XML (2/2)
• http://www.w3.org/XML/
• XML looks like markup (e.g., HTML) but in this context the interpretation of data is the job of the application
• XML tags/elements/attributes are not predefined
• XML uses a Document Type Definition (DTD) or an XML Schema to describe data
• XML with a DTD or XML Schema is designed to be self-descriptive

XML History
• 1996 Development started
• 1997 Public Drafts
  • E.g. provided in paper form at WWW6, Santa Clara, CA
• February, 1998 W3C REC
• Based on experience: simplified form of SGML
• XML derived from SGML – both are used for defining Markup Languages
• XML = 80% of SGML's capabilities, 20% of SGML's complexity

The W3C Standards* Process
• World Wide Web Consortium (W3C)
• Development is organized into WGs.
  • Working Group (~10) - set agenda/decide
  • Special Interest Group (~100) - discuss/recommend
  • W3C members (~500) - vote
  • W3C Director (TimBL) - may veto
  • The public - comment on public WDs; adopt/reject

XML Facts
• Important for Web development because it removes two constraints:
  • Dependence on a single, inflexible Document Type (HTML);
  • The complexity of full SGML, whose syntax allows many powerful but hard-to-program options
• XML was not designed to do anything by itself (it only carries data)
• XML is free and extensible
• XML complements (not replaces) HTML

XML and HTML
• XML was designed to “carry” data
• Two different goals:
  • XML – describe data and focus on what it is
  • HTML – display data and focus on how it looks

XML Characteristics
• Well-Formed – An XML document is well-formed if it complies with the following rules:
  • Elements have an open and close tag: <tag>content</tag>
  • Empty elements are closed by “/”, e.g. <emptyelem/>
  • Attribute values are quoted
• Valid – An XML document is valid if it is well-formed and its content conforms to the rules in its document type definition or schema
  • Validity allows an application to make sure the XML data is complete, is formatted properly, and has appropriate attribute values.

The Two Worlds of XML
• Markup of documents: the original
  • This perspective is our focus here
  • Document representation was the primary problem XML was created to solve
• Data exchange and protocol design
  • XML turned out to fill important gaps
  • Relational databases needed a way to share records and multi-table data
  • Protocol designers wanted a way to encapsulate structured data

The Two Worlds United
• Documents and “semi-structured” data share features
  • Hierarchical structure
  • String content
  • Variations in structure
• Their applications also share needs
  • Need for a lingua franca, independent of APIs
  • Ability to cope with international characters
  • “Fit” with WWW and HTTP.

XML is More General
• Tags label arbitrary information units
  • More suited to multiple purposes
  • “Looking right” is needed but not enough
• Supports custom information structures
  • If you have “price” or “procedure”, you can make a tag for it, and validate its usage
• Can support many different information models
  • E.g., molecular models, vector graphics, etc.
• More “teeth” to enforce consistent syntax
  • Works hard to avoid semi-interoperable docs
• XHTML is an XML-conforming flavor of HTML
  • Clean existing HTML is already close...

Better Rendering than HTML
• Fully internationalized
  • Also better for visually-impaired users
• Supports multiple renderings
  • Customize to the user, time, situation, device
• Separates formatting from structure
  • And processing other than rendering
• Large documents don’t break it
  • Easy to trade off server/client work
  • Artificial “next tiny bit” links no longer necessary
  • No searches that fail because big doc was split

XML Treats Documents like Databases
• XML brings benefits of DBs to documents
  • Schema to model information directly
  • Formal validation, locking, versioning, rollback...
• But
  • Not all traditional database concepts map cleanly, because documents are fundamentally different in some ways

XML Example
• A way of representing information
• XML documents (application of XML) are composed of elements and attributes
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <order OrderID="10643">
      <item>
        <room id="Room10"/>
      </item>
      <item>
        <room id="Room11"/>
      </item>
      <OrderDate ts="2005-10-17T00:00:00"/>
      <price>200.00 dollars</price>
    </order>
[Figure: tree view of the same document: order contains item (room), item (room), OrderDate, and price (200.00 dollars)]

What is Structure
• To Relational Database theorists, structure is:
  • Tables with fixed sets of non-repeating named fields that have little internal structure
  • E-R diagrams with fixed number of nodes
• Structured documents are different:
  • The order of SECs, Ps, etc. matters (a lot)
  • Many hierarchical layers (which text crosses)
  • Text/graphic data mixes with aggregate objects
  • Optional or repeatable sub-parts abound
  • Interaction with natural language phenomena
• These are very different

When Structure is Essential
• Large scale data
• Data with individual parts you care about
  • (like price-tag, tool-list, citation, author,...)
• Need for good navigation tools
• Mission-critical information
• Information that must last
• Multi-author publishing process
• Multiple delivery media
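
The well-formedness rules and the order document from the XML Example slide can be tried out with a few lines of Python. This is only a minimal sketch, not part of the original tutorial: it assumes Python's standard `xml.etree.ElementTree` module, and the helper name `is_well_formed` is hypothetical. It checks well-formedness only; it does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# The order document from the "XML Example" slide, written with straight quotes.
# It is passed as bytes so the parser honours the declared ISO-8859-1 encoding.
ORDER_XML = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<order OrderID="10643">
  <item><room id="Room10"/></item>
  <item><room id="Room11"/></item>
  <OrderDate ts="2005-10-17T00:00:00"/>
  <price>200.00 dollars</price>
</order>"""


def is_well_formed(text):
    """Hypothetical helper: True if `text` parses as XML (well-formed)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each check mirrors one rule from the "XML Characteristics" slide.
checks = {
    "order example": is_well_formed(ORDER_XML),
    # Overlapping tags, as in the HTML "Some Problems" example.
    "overlapping tags": is_well_formed("<p><b>bold <i>both</b> italic</i></p>"),
    # Attribute value without quotes.
    "unquoted attribute": is_well_formed("<room id=Room10/>"),
    # Element opened but never closed.
    "missing close tag": is_well_formed("<item><room id='Room10'></item>"),
}

# Once a document is well-formed, elements and attributes can be read directly.
order = ET.fromstring(ORDER_XML)
rooms = [room.get("id") for room in order.iter("room")]

if __name__ == "__main__":
    for name, ok in checks.items():
        print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
    print("OrderID:", order.get("OrderID"))
    print("rooms:", rooms)
    print("price:", order.findtext("price"))
```

Only the first check passes. The other three fragments each break one of the listed rules, so the parser rejects them outright, which is the "teeth" that a tolerant HTML browser does not have.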