The Internet Is the World's Largest Experiment in Anarchy
Client/Server, Web Style
The Internet is the world's largest experiment in anarchy.
- Eric Schmidt, CEO, Novell
The Web builds client/server order on top of what Novell's Eric Schmidt calls "the world's largest experiment in anarchy." In its enormously popular first incarnation, the Web is simply a global hypertext system. Hypertext is a software mechanism that links documents to other related documents on the same machine or across networks. A linked document can itself contain links to other documents, and this can go on forever. A link can also point to external resources such as image files, sound clips, or executable programs. The Web could eventually link every document produced on this planet.
The beauty of the Web is in its simplicity. The Web client/server model achieves its intergalactic reach by using highly portable protocols on top of TCP/IP. Portability, platform-independence, and content-independence are emphasized at every level; they're the mantras of the Web architects.
So what makes hundreds of thousands of distributed servers behave like a single application? This magic is created by introducing four new technologies on top of the existing Internet infrastructure: graphical Web browsers, the HTTP RPC, HTML-tagged documents, and the URL global naming convention. We explain what all this means in the next sections.
The Web Protocols: How They Play Together
In many ways, the Web is the mother of all data warehouses.
- Richard Hackathorn (October, 1997)
The first-generation Web applications were mostly read-only and static, which means that they were simple. By dealing with read-only documents, Web designers were able to avoid the thorny issues of distributed multiserver application design, such as security, transactions, and synchronized updates. A Web server simply returns documents when clients ask for them by name. Because it is so simple and useful, the Web was able to mushroom.
Although most of us don't think of the Web in this way, it has become the most visible demonstration of the power of intergalactic client/server computing today. The first-generation Web applications are built on the following technologies and protocols:
The Internet is the global backbone. The Web achieves its global reach by using the Internet as its backbone. The Internet is the world's largest public network. It consists of over 100,000 interconnected networks that span 70 countries. These networks extend deep inside corporations as well as people's homes. The number of networks is expected to double each year. The beauty of the Web is that its application protocols are Internet-ready: they run on top of TCP/IP. In less than five years, the Web has become the graphical user interface of the Internet. Today, most people who access the Internet use the Web as their front-end. The Web subsumes most of the existing Internet application protocols, including the Simple Mail Transfer Protocol (SMTP), Telnet, File Transfer Protocol (FTP), Network News Transfer Protocol (NNTP), and Gopher.
The intranet is the private backbone. The Web model of client/server is also the foundation for private corporate intranets. In many cases, intranets are becoming the new enterprise networks.
URLs are used to globally name and access all Web resources. The Uniform Resource Locator (URL) provides a consistent intergalactic naming scheme to identify all Web resources, including documents, images, sound clips, and programs. URLs fully describe where a resource lives and how to get to it. URLs support the newer Web protocols, such as HTTP, as well as older Internet protocols such as FTP, Gopher, WAIS, and News.
HTTP is used to retrieve URL-named resources. The Web provides an RPC-like protocol, called the Hypertext Transfer Protocol (HTTP), for accessing resources that live in URL space. HTTP is a stateless RPC that 1) establishes a client/server connection, 2) transmits and receives parameters including a returned file, and 3) breaks the client/server connection. HTTP clients and servers use the Internet's MIME data representations to describe (and negotiate) the contents of messages.
HTML is used to embed hyperlinks and to describe the logical structure of Web documents. The Hypertext Markup Language (HTML) is the lingua franca of Web documents. The Web is a gigantic collection of HTML documents linked together to form the world's largest file server. A Web document (or page) is a plain ASCII text file with embedded HTML commands. The HTML commands, also known as tags, are used to describe the structure of a document, provide font and graphics information, and define hyperlinks to other Web pages and Internet resources. HTML documents live on HTTP servers, also known as Web servers. The beauty of HTML is that it is simple and portable. In addition, server programs can easily generate HTML-tagged text files in response to client requests.
Web browsers are universal clients. A Web browser is a minimalist client that interprets information it receives from a server, and displays it graphically to a user. The client is simply there to interpret the server's commands and render the contents of an HTML page to a user. The browser executes the HTML commands to properly display text and images on a specific GUI platform; it also navigates from one page to another using the embedded hypertext links. HTTP servers produce platform-independent content that clients can then request. A server does not know a PC client from a Mac client; all Web clients are created equal in the eyes of their Web servers. Browsers are there to take care of all the platform-specific details.
So this is the Web client/server story in a nutshell. The devil is in the details, which we cover next. If you don't care about the details then just read the next section and move on. You'll at least know the names of the key protocols and client/server pieces. This alone should make you quite dangerous.
Your First Web Client/Server Interaction
The model we see emerging is a universal client, able to navigate the local network or the Internet and to go to any application at any point in time.
- Marc Andreessen
Figure 26-2 shows how the Web client and server pieces play together:
Figure 26-2: A Web Client/Server Interaction.
1. You select a target URL. A Web client/server interaction starts when you specify a target URL from within your Web browser. You do this either by clicking on a hypertext link, picking a URL off a list, or by explicitly typing in the URL (this is typically your last resort).
2. Browser sends an HTTP request to server. The browser takes the URL you specified, embeds it inside an HTTP request, and then sends it to the target server.
3. Server comes to life and processes the request. On the receiving side, the HTTP server spins in a loop, waiting for requests to arrive on its well-known port (the default port is 80 for HTTP). The incoming request causes a socket connection to be established between the client and the server. The server receives the client's message, finds the requested HTML file, ships it back to the client along with some status information, and then closes the connection.
4. Browser interprets the HTML commands and displays the page contents. The browser displays a status indicator while waiting to receive the requested resource. When it finally receives the resource, the browser looks at its type. If it's an HTML file, it interprets the tags and displays the contents in its window. Otherwise, it invokes a helper application that's associated with that particular resource type and hands it the returned file. The helper displays the contents in its own window or in an agreed-upon area within the browser's window. For example, most browsers don't know what to do with a video file, so they hand it to a video player (the helper), which then plays the movie in a separate window.
This primitive client/server interaction accounts for over 90% of today's Web transactions. In the next chapter, we go over the CGI protocols and browser forms and explain how they extend the basic model. But before moving to the next chapter, we must first cover the details of the URL, HTTP, and HTML protocols.
You must understand how these protocols work before you tackle CGI, forms, and Web objects.
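The four-step interaction described above can be sketched in a few lines of Python. This is a minimal in-memory simulation, not a real network server: there are no sockets, and the DOCUMENTS table, the function names, and the www.example.com host are all hypothetical names chosen for illustration. It only shows the shape of the exchange: a request that names a resource, and a response that carries status information, a MIME type, and the returned file.

```python
# Hypothetical in-memory document store: path -> (MIME type, body).
DOCUMENTS = {
    "/index.html": ("text/html", "<HTML><BODY>Hello, Web</BODY></HTML>"),
}


def build_request(path, host):
    """Step 2: the browser embeds the target path in an HTTP request."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"


def make_response(status, mime_type, body):
    """Format a status line, two headers, a blank line, and the body."""
    return (
        f"HTTP/1.0 {status}\r\n"
        f"Content-Type: {mime_type}\r\n"
        f"Content-Length: {len(body.encode('ascii'))}\r\n"
        "\r\n"
        f"{body}"
    )


def handle_request(raw_request, documents=DOCUMENTS):
    """Step 3: the server looks up the named document and ships it back."""
    request_line = raw_request.split("\r\n", 1)[0]
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return make_response("501 Not Implemented", "text/plain", "Unsupported method")
    if path not in documents:
        return make_response("404 Not Found", "text/plain", "Not found")
    mime_type, body = documents[path]
    return make_response("200 OK", mime_type, body)


def parse_response(raw_response):
    """Step 4: the browser reads the status code and MIME type first."""
    head, _, body = raw_response.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_code, headers, body


request = build_request("/index.html", "www.example.com")
status, headers, body = parse_response(handle_request(request))
print(status, headers["Content-Type"])
```

Because each call to handle_request depends only on the request text it is given, the sketch also reflects the stateless nature of the exchange: nothing is remembered between one request and the next.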
So What Exactly Is a URL?
Information wants to be free: free of platform, output destination, formatting instructions, proprietary file formats, and physical location.
- Art Fuller
A URL provides a general-purpose naming scheme for specifying Internet resources using a string of printable ASCII characters. The printable characters enable you to send URLs in mail messages, print them on your business card, or display them on billboards. A typical URL consists of four parts (see Figure 26-3):
Figure 26-3: The URL Structure.
The protocol scheme tells the Web browser which Internet protocol to use when accessing a resource on a server. In addition to HTTP, URLs support all major Internet protocols, including Gopher, FTP, News, Mailto, and WAIS. HTTP is the Web's native protocol; it points to Web pages and server programs. Gopher is a precursor to the Web; it displays information on servers as a hierarchy of menus. FTP is the oldest Internet protocol for retrieving files. News is a discussion-group protocol that lets you specify a newsgroup or article. Mailto lets you send mail to a designated e-mail address. WAIS lets you specify the domain name of a target database to be searched as well as a list of search criteria. Of course, the protocol scheme that you choose will directly affect the interpretation of path information within the URL.
The server name is usually an Internet-host domain name that identifies the site on which the server is running. Note that you can also use numeric IP addresses and include an optional user name or password. However, numeric IP addresses are hard to remember. Also, putting a password in a URL is not a secure way to access a resource.
The port number identifies a program that runs on a particular server. You explicitly specify a port number after a server name using a colon (:) as the separator. If you do not specify a port number, the browser will direct the call to a well-known port. For example, HTTP resources are on port 80, Gopher uses port 70, FTP files are on port 21, and so on.
The path to a target resource starts with the forward slash (/) after the host and port number. The interpretation of this field varies depending on the resource you access. The most common representation consists of a set of directory paths that lead to a particular file.
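To make the four parts concrete, here is a small sketch that splits a URL with urlsplit from Python's standard urllib.parse module. The describe_url helper, the WELL_KNOWN_PORTS table, and the example.com URLs are assumptions made for illustration; the port numbers are the ones quoted in the text above.

```python
from urllib.parse import urlsplit

# Well-known ports named in the text; used only when the URL omits a port.
WELL_KNOWN_PORTS = {"http": 80, "gopher": 70, "ftp": 21}


def describe_url(url):
    """Return the four URL parts: scheme, server name, port, and path."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        # No explicit ":port" after the server name, so fall back to the
        # well-known port for this protocol scheme (None if we don't know it).
        port = WELL_KNOWN_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "server": parts.hostname,
        "port": port,
        "path": parts.path,
    }


print(describe_url("http://www.example.com:8080/docs/index.html"))
print(describe_url("ftp://ftp.example.com/pub/readme.txt"))
```

The first URL carries an explicit port after the colon; the second relies on the well-known FTP port.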
The World's Shortest HTML Tutorial
Personal Web pages are the '90s equivalent of home video, except that you don't have to visit someone else's house to fall asleep; you can do so in the comfort of your own home.
- Ray Valdes
HTML is an elaborate protocol, and we can't possibly cover it completely in the next few pages. Instead, we will hit on some of the highlights of HTML to give you a feeling for what it does. We also cover some of HTML's more advanced features in the next two chapters. These include forms for data entry, tables, dynamic HTML, and the new OBJECT tag.
If you get a sinking feeling in the next few sections that HTML, to quote Yogi Berra, is "like deja vu all over again," you're probably right. The technology is definitely late-70s retro. HTML is rooted in the ISO SGML standard, which has been around for quite some time. However, bear in mind that retro is a small price to pay to achieve universality. HTML was designed to be fully portable across every conceivable GUI, operating system, CPU architecture, and file system. You can print and view HTML documents on machines that have barebones displays; it's truly a minimalist protocol.
How To Mark Up Text in HTML
An HTML document is an ordinary text file whose appearance is controlled by magical tags that are embedded within the text. Whenever you want to highlight some text (for example, an italicized word or a link to another page) you place either a single tag or a pair of tags around it. Tags are case-insensitive commands surrounded by angle brackets. A tag pair consists of a command, then some text, and finally the inverse command, which is written as the command with a slash in front of it. The first tag in the pair applies the command while the second tag turns it off. This means that the command only applies to the text enclosed within the tag pair. Figure 26-4 shows some of the more common HTML text markup tags and the output they produce on a typical Web browser. Note that we use the <BR> tag to force line breaks.
Figure 26-4: HTML: Visual Markup Examples.
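As a rough illustration of how tag pairs switch a command on and off, the sketch below uses HTMLParser from Python's standard html.parser module to log each tag as it is encountered. The TagLogger class, the log_tags helper, and the sample markup are hypothetical; they are not taken from Figure 26-4. Note that the parser reports tag names in lowercase, which is one way to see that tags are not case-sensitive.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record, in order, every opening tag, closing tag, and run of text."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # An opening tag such as <B> turns a command on.
        self.events.append(("on", tag))

    def handle_endtag(self, tag):
        # The matching tag with a slash, such as </B>, turns it off.
        self.events.append(("off", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def log_tags(html):
    logger = TagLogger()
    logger.feed(html)
    logger.close()
    return logger.events


for event in log_tags("<B>bold</B> and <i>italic</i><BR>next line"):
    print(event)
```

The singleton <BR> tag produces an "on" event with no matching "off" event, while <B> and <i> each produce both.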
The General Structure of an HTML Document
Right now, people just want to connect easily. Once that's accomplished, they'll see that content is what it's about.
- Christine Comaford
Figure 26-5 shows the structure of an HTML document. The structure is there primarily to help a browser understand how a document is organized. A well-structured HTML document begins with <HTML> and ends with </HTML>. In addition, every HTML document should have a header section at the top, bracketed by <HEAD> and </HEAD> tags, and a body section bracketed by <BODY> and </BODY> tags. The header contains information that describes the contents of the body, its title, URL, and whether the document is searchable. A Web browser displays the text marked off by the <TITLE> and </TITLE> tags, typically in the title bar of its window.
The <BODY> and </BODY> tags contain the portion of the document that you see displayed within the client area of a Web browser's window. This is where you view an HTML document's contents. HTML lets you further structure a document's body using a hierarchy of headings that can be nested up to six levels deep. You specify the headings in descending order using the tags <H1> through <H6>. When a Web browser encounters a heading tag, it terminates the current paragraph and displays the heading text using a left-aligned, visually distinct font. Figure 26-6 shows a screen capture of a particular Web browser's rendition of the HTML in Figure 26-5.
Figure 26-6: A Web Browser's Rendition of My Document's HTML.
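The document skeleton described above can be sketched as a small generator. This is an illustrative helper only: build_document, its parameters, and the sample title and headings are assumptions and do not reproduce the listing in Figure 26-5. The sketch also does no escaping of special characters, which a real generator would need.

```python
def build_document(title, headings):
    """Build a bare HTML skeleton.

    headings is a list of (level, text) pairs, where level runs from 1 to 6
    to match the tags <H1> through <H6>.
    """
    lines = [
        "<HTML>",
        "<HEAD>",
        f"<TITLE>{title}</TITLE>",
        "</HEAD>",
        "<BODY>",
    ]
    for level, text in headings:
        if not 1 <= level <= 6:
            raise ValueError("HTML headings run from H1 to H6")
        lines.append(f"<H{level}>{text}</H{level}>")
    lines += ["</BODY>", "</HTML>"]
    return "\n".join(lines)


print(build_document("My Document", [(1, "Chapter"), (2, "Section")]))
```

The header section comes first and the body follows it, with the whole document wrapped in the outermost <HTML> pair.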
How To Structure the Flow of Text in an HTML Document
You typically break the text within a document into paragraphs that you place under various headings. You use the <P> tag to indicate a new paragraph; <HR> to draw a horizontal line; and <PRE> to introduce a block of preformatted style text, such as a program listing or a table. Note that <P> and <HR> are singleton tags; they do not require </P> or </HR> pairs. Figure 26-7 is an example of an HTML document that uses the <P>, <HR>, and <PRE> tags. Figure 26-8 shows how this document is rendered by a typical Web browser. Notice that all the text between <P> tags is reflowed by the Web browser to fit the window size, fonts, and general shape of the document. Most Web browsers use one line height's worth of white space to set each paragraph apart from the next. The horizontal rule is drawn with some amount of white space above and below it. You should use horizontal rules whenever possible. They help create a uniform appearance and are also much more efficient to transfer over the network than a bitmap or a line's worth of underscore characters.
Figure 26-7: HTML for "The Essential Distributed Objects."
Figure 26-8: A Web Browser's Rendition of "The Essential Distributed Objects" HTML.
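The difference between reflowed paragraph text and preformatted text can be approximated in a few lines. This is only a rough sketch: Python's standard textwrap module stands in for a browser's layout engine, and the function names and sample strings are hypothetical. Real browsers also take fonts and window shape into account, which this sketch ignores.

```python
import textwrap


def render_paragraph(text, width):
    """Reflow text as a browser does between <P> tags.

    Runs of spaces, tabs, and line breaks collapse to single spaces,
    and the result is wrapped to fit the given width.
    """
    return textwrap.fill(" ".join(text.split()), width=width)


def render_preformatted(text):
    """Leave <PRE> text untouched so its spacing and line breaks survive."""
    return text


table = "Name      Qty\nWidget      3\n"
print(render_paragraph(table, width=40))
print(render_preformatted(table))
```

Running the same small table through both functions shows why preformatted text is the safer choice for tabular material: the reflowed version loses its column alignment.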
Finally, note that the text marked off by <PRE> and </PRE> is not very attractive, but it respects the layout we specified in the HTML. We basically told the browser that the <PRE> text is off limits. Consequently, the browser can't reflow the text, make it pretty, or use an attractive proportional font to display it. Instead, it displays the text "as is," using a nonproportional font that maintains the white space, tabs, carriage returns, and strings of ASCII spaces. <PRE> tags are the only way to present tabular information on browsers that do not support HTML 3.2's table tags. We cover table tags in the next chapter.
HTML Lists
HTML lets you specify a variety of list format types in your text, including unordered lists using the <UL> tag; ordered lists using the <OL> tag; definition lists using the <DL> tag; directory lists using the