only for RuBoard - do not distribute or recompile

Web Caching
Duane Wessels
Publisher: O'Reilly
First Edition June 2001
ISBN: 1-56592-536-X, 318 pages

A properly designed cache, by reducing network traffic and improving access times to popular web sites, is a boon to network administrators and web users alike. This book hands you all the technical information you need to design, deploy and operate an effective web caching service. It also covers the important political aspects of web caching, including privacy and security issues.

Web Caching

Preface
    Audience
    What You Will and Won't Find Here
    Caching Resources
    Conventions Used in This Book
    How To Contact Us
    Acknowledgments
1. Introduction
    1.1 Web Architecture
    1.2 Web Transport Protocols
    1.3 Why Cache the Web?
    1.4 Why Not Cache the Web?
    1.5 Types of Web Caches
    1.6 Caching Proxy Features
    1.7 Meshes, Clusters, and Hierarchies
    1.8 Products
2. How Web Caching Works
    2.1 HTTP Requests
    2.2 Is It Cachable?
    2.3 Hits, Misses, and Freshness
    2.4 Hit Ratios
    2.5 Validation
    2.6 Forcing a Cache to Refresh
    2.7 Cache Replacement
3. Politics of Web Caching
    3.1 Privacy
    3.2 Request Blocking
    3.3 Copyright
    3.4 Offensive Content
    3.5 Dynamic Web Pages
    3.6 Content Integrity
    3.7 Cache Busting and Server Busting
    3.8 Advertising
    3.9 Trust
    3.10 Effects of Proxies
4. Configuring Cache Clients
    4.1 Proxy Addresses
    4.2 Manual Proxy Configuration
    4.3 Proxy Auto-Configuration Script
    4.4 Web Proxy Auto-Discovery
    4.5 Other Configuration Options
    4.6 The Bottom Line
5. Interception Proxying and Caching
    5.1 Overview
    5.2 The IP Layer: Routing
    5.3 The TCP Layer: Ports and Delivery
    5.4 The Application Layer: HTTP
    5.5 Debugging Interception
    5.6 Issues
    5.7 To Intercept or Not To Intercept
6. Configuring Servers to Work with Caches
    6.1 Important HTTP Headers
    6.2 Being Cache-Friendly
    6.3 Being Cache-Unfriendly
    6.4 Other Issues for Content Providers
7. Cache Hierarchies
    7.1 How Hierarchies Work
    7.2 Why Join a Hierarchy?
    7.3 Why Not Join a Hierarchy?
    7.4 Optimizing Hierarchies
8. Intercache Protocols
    8.1 ICP
    8.2 CARP
    8.3 HTCP
    8.4 Cache Digests
    8.5 Which Protocol to Use
9. Cache Clusters
    9.1 The Hot Spare
    9.2 Throughput and Load Sharing
    9.3 Bandwidth
10. Design Considerations for Caching Services
    10.1 Appliance or Software Solution
    10.2 Disk Space
    10.3 Memory
    10.4 Network Interfaces
    10.5 Operating Systems
    10.6 High Availability
    10.7 Intercepting Traffic
    10.8 Load Sharing
    10.9 Location
    10.10 Using a Hierarchy
11. Monitoring the Health of Your Caches
    11.1 What to Monitor?
    11.2 Monitoring Tools
12. Benchmarking Proxy Caches
    12.1 Metrics
    12.2 Performance Bottlenecks
    12.3 Benchmarking Tools
    12.4 Benchmarking Gotchas
    12.5 How to Benchmark a Proxy Cache
    12.6 Sample Benchmark Results
A. Analysis of Production Cache Trace Data
    A.1 Reply and Object Sizes
    A.2 Content Types
    A.3 HTTP Headers
    A.4 Protocols
    A.5 Port Numbers
    A.6 Popularity
    A.7 Cachability
    A.8 Service Times
    A.9 Hit Ratios
    A.10 Object Life Cycle
    A.11 Request Methods
    A.12 Reply Status Code
B. Internet Cache Protocol
    B.1 ICPv2 Message Format
    B.2 Opcodes
    B.3 Option Flags
    B.4 Experimental Features
C. Cache Array Routing Protocol
    C.1 Membership Table
    C.2 Routing Function
    C.3 Examples
D. Hypertext Caching Protocol
    D.1 Message Format and Magic Constants
    D.2 HTCP Data Types
    D.3 HTCP Opcodes
E. Cache Digests
    E.1 The Cache Digest Implementation
    E.2 Message Format
    E.3 An Example
F. HTTP Status Codes
    F.1 1xx Intermediate Status
    F.2 2xx Successful Response
    F.3 3xx Redirects
    F.4 4xx Request Errors
    F.5 5xx Server Errors
G. U.S.C. 17 Sec. 512. Limitations on Liability Relating to Material Online
H. List of Acronyms
I. Bibliography
    I.1 Books and Articles
    I.2 Request For Comments
Colophon

Web Caching

Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved.

Printed in the United States of America.

Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. The association between the image of a rock thrush and web caching is a trademark of O'Reilly & Associates, Inc.

While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

Preface

When I first started using the Internet in 1986, my friends and I were obsessed with anonymous FTP servers. What a wonderful concept! We could download all sorts of interesting files, such as FAQs, source code, GIF images, and PC shareware. Of course, downloading could be slow, especially from the busy sites like the famous WSMR-SIMTEL20.ARMY.MIL archive. In order to download files to my PC, I would first ftp them to my Unix account and then use Zmodem to transfer them to my PC through my 1200 bps modem.

Usually, I deleted a file after downloading it, but there were certain files, like HOSTS.TXT and the "Anonymous FTP List," that I kept on the Unix system. After a while, I had some scripts to automatically locate and retrieve a list of files for later download.
Since our accounts had disk quotas, I had to carefully remove old, unused files and keep the useful ones. Also, I knew that if I had to delete a useful file, Mark, Mark, Ed, Jay, or Wim probably had a copy in their account.

Although I didn't realize it at the time, I was caching the FTP files. My Unix account provided temporary storage for the files I was downloading. Frequently referenced files were kept as long as possible, subject to disk space limitations. Before retrieving a file from an FTP server, I often checked my friends' "caches" to see if they already had what I was looking for.

Nowadays, the World Wide Web is where it's at, and caching is here too. Caching makes the Web feel faster, especially for popular pages. Requests for cached information come back much faster than requests sent to the content provider. Furthermore, caching reduces network bandwidth consumption, which translates directly into cost savings for many organizations.

In many ways, web caching is similar to the way it was in the Good Ol' Days. The basic ideas are the same: retrieve and store files for the user. When the cache becomes full, some files must be deleted. Web caches can cooperate and talk to each other when looking for a particular file before retrieving it from the source.

Of course, web caching is significantly more sophisticated and complicated than what I did in my early Internet years. Caches are tightly integrated into the web architecture, often without the user's knowledge. The Hypertext Transfer Protocol was designed with caching in mind. This gives users and content providers more control (perhaps too much) over the treatment of cached data.

In this book, you'll learn how caches work, how clients and servers can take advantage of caching, what issues are important, how to design a caching service for your organization, and more.

Audience

The material in this book is relevant to the following groups of people:

Administrators
    This book is primarily written for those of you who are, or will be, responsible for the day-to-day operation of one or more web caches. You might work for an ISP, a corporation, or an educational institution. Or perhaps you'd like to set up a web cache for your home computer.

Content providers
    I sincerely hope that content providers take a look at this book, and especially Chapter 6, to see how making their content more "cache aware" can improve their users' surfing experiences.

Web developers
    Anyone developing an application that uses HTTP needs to understand how web caching works. Many users today are behind firewalls and caching proxies. A significant amount of HTTP traffic is automatically intercepted and sent to web caches. Failure to take caching issues into consideration may adversely affect the operation of your application.

Web users
    Usually, the people who deploy caches want them to be transparent to the end user. Indeed, users are often unaware that they are using a web cache. Even so, if you are "only" a user, I hope that you find this book useful and interesting. It can help you understand why you sometimes see stale web pages and what you can do about it. If you are concerned about your privacy on the Internet, be sure to read Chapter 3.