Infrastructure Architecture Essentials, Part 5: Content Delivery and Distribu
Infrastructure architecture essentials, Part 5: Content delivery and distribution network design
Concepts and techniques

Sam Siewert, Principal Software Architect/Adjunct Professor, University of Colorado
http://www.ibm.com/developerworks/library/ar-infraarch5/

Summary: Discover the methods for content delivery and distribution of Web-based media in the Web 2.0 world.
Date: 11 Nov 2008
Level: Intermediate

The concept of Web caches has existed as long as the Web and evolved from storage of frequently accessed files on an individual's personal computer or proxy server to Internet-based Web cache servers provided by companies as a paid subscription service for content providers. As multimedia has become more prevalent on the Web, content delivery networks (CDNs) have become a critical component of the Internet and an enabler of Web 2.0 applications like IPTV, mobile Web and TV devices, and content-rich Web databases such as Wikimedia.

Frequently used acronyms
ACL: Access control list
GUI: Graphical user interface
HTML: Hypertext Markup Language
HTTP: Hypertext Transfer Protocol
I/O: Input/output
ISP: Internet service provider
RAID: Redundant array of independent disks
SOA: Service-Oriented Architecture
UDP: User Datagram Protocol
URI: Uniform resource identifier
XML: Extensible Markup Language

Continuing the Infrastructure architecture essentials series, this fifth article provides an overview of CDNs and distribution networks. It shows how they have evolved from simple Web caches for optimizing access to content on the Web to much more sophisticated and even intelligent content-management systems.
Most of us who have used the Web since the beginnings of Web browsing (circa the early 1990s) recall that browsers have always had the option to cache frequently or recently accessed Web content. By the late 1990s, many new Web-based companies had launched Internet-based Web cache servers as a paid subscription service for Web content providers. This network of cache servers became known as a content distribution network and has vastly improved the performance of user access to Web content. The basic distribution services have since evolved significantly to include security for content providers and multimedia streaming. The result has become known as a content delivery network, with the focus broadened from distribution alone to delivery services for rich content, including video, audio, and multimedia databases.

For a Web services architect designing a content-delivery system today, the scope of a CDN is daunting both in scale and in the diversity of services, media types, and access performance these systems will be expected to provide.

One of the stated goals of Web 2.0 (the second-generation World Wide Web) is collective intelligence. Although collective intelligence is perhaps an academic goal of Web 2.0, significant business opportunity exists, as shown by the emergence of social networks, viral video, IPTV, mobile Web services and media, and advertisement insertion in all of these. Content providers will increasingly look to well-designed CDNs to reach users. This article arms the systems designer and solution architect with methods for designing content distribution and delivery systems, with an eye toward what these systems may become.
The reach and health of the Internet today

Collective intelligence
Over time, the Internet has evolved from a simple network on which to share files and communicate through e-mail to the World Wide Web of information, which quickly became backed by sophisticated cache servers and eventually CDNs. Given the wealth of content, human knowledge, and interaction on the Web, it has become a nexus for collective intelligence. Exactly what collective intelligence is can be debated, but most would agree that intelligence requires rich media that matches human sensory experience and cognition, including video, audio, and text that can be used interactively. Web 2.0 promises to enrich content with much more real-time, high-definition video and audio along with massive knowledge databases that can support new levels of human coordination, cooperation, collaboration, and cognition.

The Massachusetts Institute of Technology (MIT) Center for Collective Intelligence states that, "Our basic research question is: How can people and computers be connected so that—collectively—they act more intelligently than any individuals, groups, or computers have ever done before?" From this stated goal, it is clear that access, interaction, and the reach of Web-based content and services will be critical for collective intelligence.

Between 2000 and 2008, use of the Internet grew by more than 100 percent in North America, by more than 200 percent in Europe, and by more than 400 percent in Asia. Growth in Africa and in the Middle East exceeded 1000 percent in each region. Despite this continued high growth rate, saturation has still not been reached even in North America, where more than 70 percent of the population uses the Internet (see Resources). Of course, access and reach are not the only measures of success. The quality of the Internet experience, measured by broadband data rates, access latency, and richness of content, is also important.
A list of the worldwide top 500 Web sites reveals an interesting trend (see Resources). Looking at the Alexa ranking, it's not surprising that the top two sites are Web search services. Perhaps more revealing is the fact that the top 10 now includes social networking, encyclopedia, blogger, and viral video Web services. It is also interesting to note that many of the emerging sites are ingesting user content, including text from bloggers as well as images (for example, photobucket.com) and video (for example, youtube.com). The trend is clear: the second Internet revolution, often referred to as Web 2.0, will include rich content, greater user interaction, real-time media, and more user collaboration, such that users not only consume content but also generate it.

Elements of a CDN

Web services have evolved from scalable Web servers, as shown in Figure 1, to much more complex systems such as the Wikimedia CDN (see Resources). The evolution from single-site scalable Web servers to distributed content servers, as shown in Figure 2, has numerous advantages.

Figure 1. Example of traditional Web services

Figure 2. Example of content delivery services

First, the content servers can be placed geographically so that clients in a given region are served with lower latency and so that less traffic encounters congestion on backbone networks. The servers in early CDNs were often called edge servers because of their placement closer to users, and they were considered distribution servers because one of the main goals was to reduce network congestion and improve the user experience.
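The idea of serving each client from a geographically close content server can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge-site names and coordinates in `EDGE_SERVERS` are hypothetical, and great-circle distance is used as a rough stand-in for network latency. The article does not describe how any particular CDN selects a server, and a real deployment would more likely route on measured network conditions than on distance alone.

```python
import math

# Hypothetical edge sites: name -> (latitude, longitude) in degrees.
# These names and locations are illustrative assumptions, not from the article.
EDGE_SERVERS = {
    "us-east": (40.71, -74.01),       # New York
    "eu-west": (51.51, -0.13),        # London
    "ap-northeast": (35.68, 139.69),  # Tokyo
}

EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(client, servers=EDGE_SERVERS):
    """Return the name of the edge site closest to the client location."""
    return min(servers, key=lambda name: great_circle_km(client, servers[name]))
```

For example, `nearest_edge((48.86, 2.35))` (a client in Paris) selects the London site rather than sending the request across the Atlantic, which is the backbone traffic the edge placement is meant to avoid.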
More recently, as the richness of Internet content has grown to include more real-time media, CDNs have become known as content delivery networks, because many of these systems are becoming more specialized to provide not only better distribution of files but also better streaming of real-time media.

Another, more recent trend has been the extension of CDNs to the users themselves, such that personal computers can participate in distribution with peer-to-peer (P2P) services (see Resources). Among the major decisions for a CDN designer are the degree to which the CDN will include P2P features or remain a more traditional distributed system of servers, and what forms of media the CDN will handle. No matter what the goals of the CDN design are, a CDN should include the following basic services:

Web caches: For example, Squid
Database servers: For example, MySQL or commercial databases such as IBM® DB2®
Web servers: For example, Apache
CDN management: Content authoring, transcoding, and management tools; IT configuration, monitoring, and management; bug tracking; and so on
High-access storage: RAID arrays, solid-state disk (SSD) drives, and P2P distributed storage and access
Network attached storage (NAS) heads: File servers to support heterogeneous clients, including Linux®, UNIX®, Mac OS X, and Windows®

It is clear that centralized Web services, as shown in Figure 1, will lead to network congestion as worldwide clients are routed through backbone networks to reach a single-server site. Even if that Web server can be scaled in terms of network bandwidth, processing, and storage I/O in order to keep up, the user experience will suffer because of wait times in queues and backbone network loading.
One of the very first solutions to this problem in Web 1.0 was to use local Web caches in individual client browsers. This worked well when total content on the Internet was not so rich and varied, but today most users browse a much broader range of content on many more sites than they did in the 1990s. Such browsing of extensive content led to the use of proxy servers that cache Web content for major client locations such as a university (and in some cases also filter content access). Neither client caches nor proxy caches solved the problem of wait time for users of the most popular content providers as Internet access and total content have grown.
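The behavior of a browser or proxy cache described above can be sketched as a small least-recently-used (LRU) cache placed in front of an origin server. This is a minimal sketch, not how Squid or any browser actually implements caching: the `ProxyCache` class, its `capacity` parameter, and the `fetch_from_origin` callback are hypothetical names introduced for illustration, and real Web caches also honor HTTP freshness and validation rules that are omitted here.

```python
from collections import OrderedDict


class ProxyCache:
    """Toy LRU cache in front of an origin fetch function (illustrative only)."""

    def __init__(self, fetch_from_origin, capacity=2):
        self._fetch = fetch_from_origin   # callable: url -> response body
        self._capacity = capacity         # maximum number of cached URLs
        self._entries = OrderedDict()     # url -> body, oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(url)
            self.hits += 1
            return self._entries[url]
        # Cache miss: go to the origin, then store the result.
        self.misses += 1
        body = self._fetch(url)
        self._entries[url] = body
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return body
```

With a small capacity, repeated requests for the same URL are served locally, but once clients request more distinct URLs than the cache can hold, earlier entries are evicted and later requests for them go back to the origin. That is the limitation the paragraph above points to as users browse a broader range of content.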