Appendix A: Introduction to XHTML
A. Salminen and F. Tompa, Communicating with XML, DOI 10.1007/978-1-4614-0992-2, © Springer Science+Business Media, LLC 2011

HTML (HyperText Markup Language) is the authoring language created for the World Wide Web to support the distribution of information over the Internet by means of Web pages. Authoring an HTML document requires only a plain text editor. For a document to be available to anyone in the world, it must be stored on a Web server, which associates the document with an address and responds to page requests from software applications called Web clients or user agents. The user agent commonly used by humans for browsing the Web and for accessing resources is called a Web browser.

An HTML document is a structured document consisting of three components: the HTML version information, the header, and the body. The header contains the title of the page and possibly some metadata. Web browsers usually show the title as the window title. The body includes the content intended to be displayed in the browser window as the Web page. The structure of the Web page is indicated by means of textual elements including headings at several levels, paragraphs, various types of lists, and tables. Multimedia features enable page authors to include images, video clips, other HTML pages, and other objects in their pages. For example, a page may include interactivity provided by software applications called applets.

The language includes hyperlinks (or links) so that pages can be connected to each other and to other kinds of digital information resources. A link has a direction and two ends called anchors, a source anchor and a destination anchor. The example below creates two links: one from the document to the description of the p element in the HTML 4 specification, another from the end of the document to the top of the same document. Element a is used both for the source and destination anchors.

    <html>
    <head>
      <title>Simple html example</title>
    </head>
    <body>
      <h3><a name="beginning">Example with two links</a></h3>
      <p> A paragraph on a Web page is delimited by the element
        <a href="http://www.w3.org/TR/html401/struct/text.html#h-9.3.1">p</a>.</p>
      <p><a href="#beginning">top</a></p>
    </body>
    </html>

XHTML (Extensible HyperText Markup Language) provides the same capabilities as HTML 4, but instead of being an SGML application, XHTML is an XML application. This means that XHTML pages follow the rules of XML and may thus be processed using XML software.

The XHTML 1.0 specification defines three language variants, each corresponding to one of the three variants of HTML 4: Strict, Transitional, and Frameset. Strict omits presentational elements and attributes, on the principle that styling information is provided in style sheets; it is the variant recommended for most uses. Transitional retains presentational elements and attributes to support legacy pages written for browser environments where such markup was commonly used. The Frameset variant adds the elements for describing frames.

The example above could be a piece of markup in either an HTML or an XHTML document. However, HTML allows the same information to be provided in ways that are not accepted in XHTML. For example, in HTML the end-tags of the p elements could be omitted, and both upper case and lower case letters could be used in element and attribute names. XML is case-sensitive and XHTML defines its element and attribute names in lower case, so in XHTML all names must be written in lower case. As in all XML documents, end-tags are required for all non-empty elements.

An important feature of XHTML is that the specification is modularized so that various XHTML languages can be defined from the standard building blocks for different types of devices.
Thus XHTML is not a single language with three variants, but instead a family of current and future languages and modules that reformulate, subset, and extend HTML 4. The document XHTML Modularization defines sets of elements in modules so that arbitrary module combinations can be used, and possibly extended, to define XHTML-conforming languages for various platforms. This is intended to enable easy exchange of data across platforms.

The core modules must be present in any language that is considered a member of the XHTML family. These modules contain the basic structural and text elements, as well as the anchor element a enabling the creation of links. For example, all elements in the previous example are included in the core modules. In addition, special modules are available for images, frames, forms, and scripting, among other features. As an example, XHTML 1.1 reformulates XHTML 1.0 Strict using modules. XHTML Basic, in turn, is designed especially for small appliances, such as mobile phones, car navigation systems, and digital book readers, to replace the various HTML variants that have been designed for those devices.

Appendix B: History of XML

The history leading to the development of XML is summarized in Table B.1, and some milestones in the development of XML are depicted in Table B.2. The story behind these two tables is provided by the narrative in this appendix, which is divided into four parts: Origins of the Internet, Origins of SGML, From the Internet to the World Wide Web, and From SGML to XML. A list of historical readings, which have also served as our main information sources, can be found at the end of the appendix.

B.1 Origins of the Internet

Computer networking started in the United States in the 1960s, at the time of the Cold War. The Soviet Union’s success in launching Sputnik into space in the previous decade spurred technological research and development in the U.S. In 1958 the U.S. Department of Defense established a new organization called ARPA (the Advanced Research Projects Agency, which was subsequently called DARPA, the Defense Advanced Research Projects Agency) to support research on computer networking. The ARPANET network was created in 1969, interconnecting four computers at four universities by the end of that year. At first ARPANET was used for file transfer and remote computing. To encourage collaborative development of technical specifications, a practice called RFC (Request for Comments) was initiated. This practice was based on open publication of specifications and an open request for comments.

Besides ARPANET, several other computer networks were developed in the 1970s. Their architectures and connection protocols differed from those of the ARPANET. The idea of “Internetworking Architecture,” a network of interconnected networks with various architectures, evolved. To implement this idea, a new transmission protocol was needed. TCP/IP (Transmission Control Protocol / Internet Protocol) was developed for this purpose, and it formed the underpinnings of today’s Internet. This stimulated wide interest in further development of network technologies in many countries. Because the researchers needed effective means of communication to support collaboration, electronic mail was created.

B.2 Origins of SGML

Automated text processing of documents was well-established by the 1960s [20], and the problems with embedded formatting instructions were soon identified. In a meeting of the Canadian Government Printing Office in 1967, William Tunnicliffe pointed out the importance of separating information content in documents from their format. The idea of “generic document coding” was an important feature in the language Charles Goldfarb, Ed Mosher, and Ray Lorie designed at IBM.
In his article at the ACM SIGPLAN/SIGOA conference in 1981, Charles Goldfarb adopted the term Generalized Markup Language (GML) for the language developed at IBM and the term descriptive markup for the document coding style in GML. In the same article Goldfarb emphasized the importance of rigorous markup if the processing of sets of documents of certain types were to be automated.

The work towards standardized rules for descriptive markup started in a committee of the International Organization for Standardization (ISO). The first chair of the committee was William Tunnicliffe, and the concrete development work was done primarily by Charles Goldfarb. The work resulted in the Standard Generalized Markup Language (SGML), accepted as ISO standard 8879 in 1986.

B.3 From the Internet to the World Wide Web

Even though the U.S. Department of Defense was the principal force behind the Internet, the technical specification documents were public and development work was open to university researchers worldwide. Therefore networking soon expanded to other countries. The World Wide Web (WWW) was introduced in 1991, and that same year the Internet Society was organized to coordinate the development of the Internet. WWW was a hypermedia application designed by Tim Berners-Lee and Robert Cailliau for CERN (the European Organization for Nuclear Research) to support collaboration among physicists working around the world.

The information published in the World Wide Web was accessible throughout the Internet via documents linked to each other by hyperlinks. The system was based on a client–server architecture and incorporated three major technological innovations: HTML (HyperText Markup Language) as the presentation language for documents and hyperlinks, HTTP (Hypertext Transfer Protocol) as the rules for exchanging documents between computers, and URI (Uniform Resource Identifier) as the notation for identifying objects uniquely throughout the Internet.
In 1992, more than one million computers were connected to the Internet. The WWW proved to be extremely convenient, and its adoption expanded rapidly. New kinds of businesses evolved, based on the connectivity of people and of software applications all over the world.