PHP Or Other Scripting Languages - and Scale Them to Compete with Established "Store Bought" Enterprise Web Technologies
Total Page:16
File Type:pdf, Size:1020Kb
Building Scalable Web Sites By Cal Henderson ............................................... Publisher: O'Reilly Pub Date: May 2006 Print ISBN-10: 0-596-10235-6 Print ISBN-13: 978-0-59-610235-7 Pages: 348 Table of Contents | Index Slow websites infuriate users. Lots of people can visit your web site or use your web application - but you have to be prepared for those visitors, or they won't come back. Your sites need to be built to withstand the problems success creates. Building Scalable Web Sites looks at a variety of techniques for creating sites that can keep users cheerful even when there are thousands or millions of them. Flickr.com developer, Cal Henderson, explains how to build sites so that large numbers of visitors can enjoy them. Henderson examines techniques that go beyond sheer speed, exploring how to coordinate developers, support international users, and integrate with other services from email to SOAP to RSS to the APIs exposed by many Ajax-based web applications. This book uncovers the secrets that you need to know for back-end scaling, architecture and failover so your websites can handle countless requests. You'll learn how to take the "poor man's web technologies" - Linux, Apache, MySQL and PHP or other scripting languages - and scale them to compete with established "store bought" enterprise web technologies. Toward the end of the book, you'll discover techniques for keeping web applications running with event monitoring and long- term statistical tracking for capacity planning. If you're about to build your first dynamic website, then Building Scalable Web Sites isn't for you. But if you're an advanced developer who's ready to realize the cost and performance benefits of a comprehensive approach to scalable applications, then let your fingers do the walking through this convenient guide. Building Scalable Web Sites By Cal Henderson ............................................... Publisher: O'Reilly Pub Date: May 2006 Print ISBN-10: 0-596-10235-6 Print ISBN-13: 978-0-59-610235-7 Pages: 348 Table of Contents | Index Copyright Preface Chapter 1. Introduction Section 1.1. What Is a Web Application? Section 1.2. How Do You Build Web Applications? Section 1.3. What Is Architecture? Section 1.4. How Do I Get Started? Chapter 2. Web Application Architecture Section 2.1. Layered Software Architecture Section 2.2. Layered Technologies Section 2.3. Software Interface Design Section 2.4. Getting from A to B Section 2.5. The Software/Hardware Divide Section 2.6. Hardware Platforms Section 2.7. Hardware Platform Growth Section 2.8. Hardware Redundancy Section 2.9. Networking Section 2.10. Languages, Technologies, and Databases Chapter 3. Development Environments Section 3.1. The Three Rules Section 3.2. Use Source Control Section 3.3. One-Step Build Section 3.4. Issue Tracking Section 3.5. Scaling the Development Model Section 3.6. Coding Standards Section 3.7. Testing Chapter 4. i18n, L10n, and Unicode Section 4.1. Internationalization and Localization Section 4.2. Unicode in a Nutshell Section 4.3. Unicode Encodings Section 4.4. The UTF-8 Encoding Section 4.5. UTF-8 Web Applications Section 4.6. Using UTF-8 with PHP Section 4.7. Using UTF-8 with Other Languages Section 4.8. Using UTF-8 with MySQL Section 4.9. Using UTF-8 with Email Section 4.10. Using UTF-8 with JavaScript Section 4.11. Using UTF-8 with APIs Chapter 5. Data Integrity and Security Section 5.1. Data Integrity Policies Section 5.2. Good, Valid, and Invalid Section 5.3. Filtering UTF-8 Section 5.4. Filtering Control Characters Section 5.5. Filtering HTML Section 5.6. Cross-Site Scripting (XSS) Section 5.7. SQL Injection Attacks Chapter 6. Email Section 6.1. Receiving Email Section 6.2. Injecting Email into Your Application Section 6.3. The MIME Format Section 6.4. Parsing Simple MIME Emails Section 6.5. Parsing UU Encoded Attachments Section 6.6. TNEF Attachments Section 6.7. Wireless Carriers Hate You Section 6.8. Character Sets and Encodings Section 6.9. Recognizing Your Users Section 6.10. Unit Testing Chapter 7. Remote Services Section 7.1. Remote Services Club Section 7.2. Sockets Section 7.3. Using HTTP Section 7.4. Remote Services Redundancy Section 7.5. Asynchronous Systems Section 7.6. Exchanging XML Section 7.7. Lightweight Protocols Chapter 8. Bottlenecks Section 8.1. Identifying Bottlenecks Section 8.2. External Services and Black Boxes Chapter 9. Scaling Web Applications Section 9.1. The Scaling Myth Section 9.2. Scaling the Network Section 9.3. Load Balancing Section 9.4. Scaling MySQL Section 9.5. MyISAM Section 9.6. MySQL Replication Section 9.7. Database Partitioning Section 9.8. Scaling Large Database Section 9.9. Scaling Storage Chapter 10. Statistics, Monitoring, and Alerting Section 10.1. Tracking Web Statistics Section 10.2. Application Monitoring Section 10.3. Alerting Chapter 11. APIs Section 11.1. Data Feeds Section 11.2. Mobile Content Section 11.3. Web Services Section 11.4. API Transports Section 11.5. API Abuse Section 11.6. Authentication Section 11.7. The Future About the Author Colophon Colophon Index Building Scalable Web Sites by Cal Henderson Copyright © 2006 O'Reilly Media, Inc. All rights reserved. Printed in the United States of America. Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (safari.oreilly.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 [email protected] . Editor: Simon St.Laurent Production Editor: Adam Witwer Copyeditor: Adam Witwer Proofreader: Colleen Gorman Indexer: John Bickelhaupt Cover Designer: Karen Montgomery Interior Designer: David Futato Illustrators: Robert Romano and Jessamyn Read Printing History: May 2006: First Edition. Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Building Scalable Web Sites, the image of a carp, and related trade dress are trademarks of O'Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps. While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. ISBN: 0-596-10235-6 [M] Preface The first web application I built was called Terrania. A visitor could come to the web site, create a virtual creature with some customizations, and then track that creature's progress through a virtual world. Creatures would wander about, eat plants (or other creatures), fight battles, and mate with other players' creatures. This activity would then be reported back to players by twice-daily emails summarizing the day's events. Calling it a web application is a bit of a stretch; at the time I certainly wouldn't have categorized it as such. The core of the game was a program written in C++ that ran on a single machine, loading game data from a single flat file, processing everything for the game "tick," and storing it all again in a single flat file. When I started building the game, the runtime was destined to become the server component of a client-server game architecture. Programming network data-exchange at the time was a difficult process that tended to involve writing a lot of rote code just to exchange strings between a server and client (we had no .NET in those days). The Web gave application developers a ready-to-use platform for content delivery across a network, cutting out the trickier parts of client-server applications. We were free to build the server that did the interesting parts while building a client in simple HTML that was trivial in comparison. What would have traditionally been the client component of Terrania resided on the server, simply accessing the same flat file that the game server used. For most pages in the "client" application, I simply loaded the file into memory, parsed out the creatures that the player cared about, and displayed back some static information in HTML. To create a new creature, I appended a block of data to the end of a second file, which the server would then pick up and process each time it ran, integrating the new creatures into the game. All game processing, including the sending of progress emails, was done by the server component. The web server "client" interface was a simple C++ CGI application that could parse the game datafile in a couple of hundred lines of source. This system was pretty satisfactory; perhaps I didn't see the limitations at the time because I didn't come up against any of them. The lack of interactivity through the web interface wasn't a big deal as that was part of the game design. The only write operation performed by a player was the initial creation of the creature, leaving the rest of the game as a read-only process. Another issue that didn't come up was concurrency. Since Terrania was largely read-only, any number of players could generate pages simultaneously. All of the writes were simple file appends that were fast enough to avoid spinning for locks. Besides, there weren't enough players for there to be a reasonable chance of two people reading or writing at once. A few years would pass before I got around to working with something more closely resembling a web application.