Web Client Programming with Perl
Automating Tasks on the Web
By Clinton Wong
1st Edition March 1997

This book is out of print, but it has been made available online through the O'Reilly Open Books Project.

© 2001, O'Reilly & Associates, Inc.

Table of Contents

Preface
1. Introduction
   Why Write Your Own Clients?
   The Web and HTTP
   The Programming Interface
   A Word of Caution
2. Demystifying the Browser
   Behind the Scenes of a Simple Document
   Retrieving a Document Manually
   Behind the Scenes of an HTML Form
   Behind the Scenes of Publishing a Document
   Structure of HTTP Transactions
3. Learning HTTP
   Structure of an HTTP Transaction
   Client Request Methods
   Versions of HTTP
   Server Response Codes
   HTTP Headers
4. The Socket Library
   A Typical Conversation over Sockets
   Using the Socket Calls
   Server Socket Calls
   Client Connection Code
   Your First Web Client
   Parsing a URL
   Hypertext UNIX cat
   Shell Hypertext cat
   Grep out URL References
   Client Design Considerations
5. The LWP Library
   Some Simple Examples
   Listing of LWP Modules
   Using LWP
6. Example LWP Programs
   Simple Clients
   Periodic Clients
   Recursive Clients
7. Graphical Examples with Perl/Tk
   A Brief Introduction to Tk
   A Dictionary Client: xword
   Check on Package Delivery: Track
   Check if Servers Are up: webping
A. HTTP Headers
   General Headers
   Client Request Headers
   Server Response Headers
   Entity Headers
   Summary of Support Across HTTP Versions
B. Reference Tables
   Media Types
   Character Encoding
   Languages
   Character Sets
C. The Robot Exclusion Standard
Index

Preface

The World Wide Web has been credited with bringing the Internet to the masses. The Internet was previously the stomping ground of academics and a small, elite group of computer professionals, mostly UNIX programmers and other oddball types, running obscure commands like ftp and finger, archie and telnet, and so on.

With the arrival of graphical browsers for the Web, the Internet suddenly exploded. Anyone could find things on the Web. You didn't need to be "in the know" anymore--you just needed to be properly networked. Equipped with Netscape Navigator or Internet Explorer or any other browser, everyone can now explore the Internet freely.

But graphical browsers can be limiting. The very interactivity that makes them the ideal interface for the Internet also makes them cumbersome when you want to automate a task. It's analogous to editing a document by hand when you'd like to write a script to do the work for you. Graphical browsers require you to navigate the Web manually.
In an effort to diminish the amount of tedious pointing-and-clicking you do with your browser, this book shows you how to liberate yourself from the confines of your browser.

Web Client Programming with Perl is a behind-the-scenes look at how your web browser interacts with web servers. Readers of this book will learn how the Web works and how to write software that is more flexible, dynamic, and powerful than the typical web browser. The goal here is not to rewrite the browser, but to give you the ability to retrieve, manipulate, and redistribute web-based information in an automated fashion.

Who This Book Is For

I like to think that this book is for everyone. But since that's a bit of an exaggeration, let's try to identify who might really enjoy this book.

This book is for software developers who want to expand into a new market niche. It provides proof-of-concept examples and a compilation of web-related technical data.

This book is for web administrators who maintain large amounts of data. Administrators can replace manual maintenance tasks with web robots to detect and correct problems with web sites. Robots perform tasks more accurately and quickly than human hands.

But to be honest, the audience that's closest to my heart is that of computer enthusiasts, tinkerers, and motivated students, who can use this book to satisfy their curiosity about how the Web works and how to make it work for them. My editor often talks about when she first learned UNIX scripting and how it opened a world of automation for her. When you learn how to write scripts, you realize that there's very little that you can't do within that universe. With this book, you can extend that confidence to the Web. If this book is successful, then for almost any web-related task you'll find yourself thinking, "Hey, I could write a script to do that!"

Unfortunately, we can't teach you everything.
There are a few things that we assume that you are already familiar with:

● The concept of client/server network applications and TCP/IP.
● How the Internet works, and how to access it.
● The Perl language. Perl was chosen as the language for examples in this book due to its ability to hide complexity. Instead of dealing with C's data structures and low-level system calls, Perl introduces higher-level functions and a straightforward way of defining and using data. If you aren't already familiar with Perl, I recommend Learning Perl by Randal Schwartz, and Programming Perl (popularly known as "The Camel Book") by Larry Wall, Tom Christiansen, and Randal Schwartz. Both of these books are published by O'Reilly & Associates, Inc. There are other fine Perl books as well. Check out http://www.perl.com for the latest book critiques.

Is This Book for You?

Some of you already know why you picked up this book. But others may just have a nagging feeling that it's something useful to know, though you may not be entirely sure why. At the risk of seeming self-serving, let me suggest some ways in which this book may be helpful:

● Some people just like to know how things tick. If you like to think the Web is magic, fine--but there are many who don't like to get into a car without knowing what's under the hood. For those of you who desire a better technical understanding of the Web, this book demystifies the web protocol and the browser/server interaction.
● Some people hate to waste even a minute of time. Given the choice between repeating an action over and over for an hour, or writing a script to automate it, these people will choose the script every time. Call it productivity or just stubbornness--the effect is the same. Through web automation, much time can be saved. Repetitive tasks, like tracking packages or stock prices, can be relegated to a web robot, leaving the user free to perform more fruitful activities (like eating lunch).
● If you understand your current web environment, you are more likely to recognize areas that can be improved. Instead of waiting for solutions to show up in the marketplace, you can take an active role in shaping the future direction of your own web technology. You can develop your own specialized solutions to fit specific problems.
● In today's frenzied high-tech world, knowledge isn't just power, it's money. A reasonable understanding of HTTP looks nice on the resume when you're competing for software contracts, consulting work, and jobs.

Organization

This book consists of seven chapters and three appendices, as follows:

Chapter 1, Introduction
   Discusses basic terminology and potential uses for customized web clients.

Chapter 2, Demystifying the Browser
   Translates common browser tasks into HTTP transactions. By the end of the chapter, the reader will understand how web clients and servers interact, and will be able to perform these interactions manually.

Chapter 3, Learning HTTP
   Teaches the nuances of the HTTP protocol.

Chapter 4, The Socket Library
   Introduces the socket library and shows some examples of how to write simple web clients with sockets.

Chapter 5, The LWP Library
   Describes the LWP library that will be used for the examples in Chapters 6 and 7.

Chapter 6, Example LWP Programs
   A cookbook-type demonstration of several example applications.

Chapter 7, Graphical Examples with Perl/Tk
   A demonstration of how you can use the Tk extension to Perl to add a graphical interface to your programs.

Appendix A, HTTP Headers
   Contains a comprehensive listing of the headers specified by HTTP.

Appendix B, Reference Tables
   Lists URLs that you can use to learn more about HTTP and LWP.

Appendix C, The Robot Exclusion Standard
   Describes the Robot Exclusion Standard, which every good web programmer should know intimately.

Source Code in This Book Is Online

In this book, we include many code examples.
While the code is all contained within the text, many people will prefer to download examples rather than type them in by hand.