CHAPTER 6 Delivering Dynamic Content
ANY WEB SITE of moderate complexity will need to deal with the issue of generating dynamic content. Although the development of Web applications is often seen purely as a programming problem, particularly by Web developers, there are many aspects to generating content on the server that are crucial from the administration point of view: in particular, performance, resource usage, and of course security. Indeed, much of Apache's functionality, and much of the content of the rest of this book, is ultimately concerned with the generation of content dynamically.

Technically, there are two categories of dynamic content, distinguished by where the content is actually generated:
Client-side content: Client-side content consists of executable code that's downloaded to the browser, just like any other Web content, and executed there. Examples of client-side elements are Java applets, JavaScript code embedded into HTML or XML, and proprietary technologies such as Macromedia Flash. In general, Web server administrators don't need to be overly concerned with client-side elements, beyond the need to ensure that the correct MIME type is attributed to the file as it's downloaded.

Server-side content: Server-side content, on the other hand, is derived from program code executed on the server. Usually, this will be simple HTML or XML, or sometimes it'll be another document type such as a spreadsheet or PDF document. However, there's no reason why you can't generate client-side JavaScript code, or any other kind of client-side dynamic content, from a server-side program.

In this chapter, you're primarily concerned with server-side dynamic content because this is where the primary complexity and concerns of Web server administrators are. There are many diverse technologies available to implement server-generated content, ranging from simple scripts to distributed networking applications, all of which allow Web content developers to build applications that function across the Internet.

The oldest mechanism for supporting server-side dynamic content is the Common Gateway Interface (CGI). In this chapter, I'll show the various ways you can configure Apache to use CGI scripts and how to write CGI scripts to handle and respond to HTTP requests.

CGI is technically a protocol and not a type of program: It defines how Apache (and other Web servers) communicate with external applications. CGI scripts are merely stand-alone programs that Apache executes and then communicates with using the CGI protocol. Underneath their layers of abstraction, almost all techniques for generating
P. Wainwright, Pro Apache © Peter Wainwright 2004
dynamic content involve a communication between Apache and something else that's either identical to or based on the CGI protocol. Many of the issues raised in this chapter with regard to CGI also need to be addressed when you use any server-side technology. Security and performance are always issues with dynamic content, no matter what tool you use. Even if you don't plan to implement CGI scripts, there are important lessons that a good understanding of CGI can teach.

Apache can do a lot more than simply execute CGI scripts, however. I'll also discuss other ways of handling dynamic content in Apache, including generic support for dynamic content using the Action, SetHandler, and AddHandler directives. I'll also cover input and output filters, a new and powerful feature of Apache 2 that allows a request to be passed through a chain of handlers.

Security is an important issue for any Web site that generates content dynamically. CGI, in particular, is a potential minefield of security hazards; I'll look at how security holes can be created in CGI and how to prevent them. I'll also discuss CGI wrappers and how they can be used to improve security in some circumstances, as well as the Apache 2 perchild MPM as an alternative to using a CGI wrapper.

Finally, I'll tackle one popular approach to improving the performance of dynamic content and CGI using the third-party mod_fastcgi module and compare the approach it offers to other solutions.

I'll start with the most basic way that Apache can assemble and deliver dynamic content: Server-Side Includes (SSIs), which allow documents to embed other documents and the output of CGI scripts within themselves.
Server-Side Includes

SSIs provide a simple mechanism for embedding dynamic information into an otherwise static HTML page. The most common and by far the most useful application of this is embedding other documents, as well as the output of CGI scripts, into a page. It's also possible to use SSIs to include information such as the current time and other server-generated information in the content of a Web page. Documents that are processed this way are known as server-parsed documents.

SSIs are more powerful than they're often perceived to be. Although they've been largely superseded by more modern technologies such as stylesheets, templating engines, and XML, SSIs have the advantage of being much simpler to set up than most of these alternatives; they can be surprisingly powerful, given that they're a much older technology. You can set and retrieve variables, evaluate sections conditionally, and nest SSIs within each other, anticipating all the attributes of a simple templating system.

Apache will automatically cache SSIs in memory too, so they can be extremely fast; indeed, faster than running an external templating engine if your requirements are simple. This makes building pages out of composite SSI documents that in turn include further documents quite practical.

Of course, there's nothing to stop a Web author using technologies such as XML and templates alongside SSI. This is particularly true for Apache 2, where you can pass the output of another application through the SSI processing engine using an output filter. Also, from Apache 1.3.10, you can use SSIs in directory indices produced by mod_autoindex and error documents triggered through the ErrorDocument directive.
Enabling SSI

SSI functionality is provided by mod_include. Therefore, this module has to be present on the server. mod_include provides no directives in Apache 1.3 (apart from XBitHack, which isn't a fundamental part of its configuration) but does provide a handler, server-parsed, and also defines a MIME type, application/x-server-parsed, which triggers the handler on any file recognized as being of that type, as discussed in Chapter 5. The handler is preferred in modern Apache 1.3 configurations.

Apache 2 provides the Includes output filter, which is more flexible than the handler and is now the preferred way to enable SSIs. Its main benefit is that it can be used to process the SSIs contained in the output of other modules or external applications. Apache 2 also provides directives to change the start and end syntax of SSI commands, as well as set the format of the SSI date and error texts.

mod_include is enabled by specifying Includes or IncludesNOEXEC as an argument to the Options directive. For example:
Options +Includes
IncludesNOEXEC is identical to Includes except that all commands that cause script execution are disabled. It's a popular option for hosting virtual sites where users aren't trusted to write their own scripts but are still allowed to include static files:
# allow static content to be included, but do not run CGIs
Options +IncludesNOEXEC
Note that if a CGI script is actually included with this option, then the actual code will be sent to the client. Because this is a serious security issue, check carefully before switching from Includes to IncludesNOEXEC. Because the Options directive works in container directives, you can use it to enable SSI selectively. For example, in a
This enables mod_include for ssidocs, but you still have to tell Apache when to use it; that is, when a request should be passed to mod_include. You can do this by using a MIME-type association, by registering a handler, or (in Apache 2) by using a filter.

To register a MIME type that'll trigger mod_include, define a type for the file extension you want processed. For example, to define the extension .shtml, which is the conventional extension for SSI, you could use this:
AddType application/x-server-parsed .shtml
Alternatively, and preferably, you can relate the extension directly to the server-parsed handler:
AddHandler server-parsed .shtml
As with the Options directive, you can restrict this to a
This is quite convenient if you have a collection of standard includes you use for templates and don't want to keep appending .shtml to their names. You know they're includes because they're in the include directory, and because they're likely to be always embedded into other documents, the client will never see their names directly.

You could also choose to treat all HTML files as potentially containing SSI commands. This causes Apache to waste a lot of time looking for SSI directives that aren't there if most of your documents are actually plain HTML. However, if you use SSI a lot and know that most or all documents contain SSI commands, you can disguise the fact that they're server parsed with this:
AddHandler server-parsed .shtml .html .htm
If you want to use SSIs with another handler (for example, to process the output of a CGI script), then you can't use the server-parsed handler directly because you can't associate more than one handler with a resource at the same time. There can be, as they say, only one. Instead, in Apache 2, you can use the Includes filter. This has the same effect as the handler but processes the response from whatever handler actually generated it, rather than generating the response itself:
AddHandler cgi-script .cgi
AddOutputFilter Includes .cgi
This configuration will cause all files ending in .cgi to execute as CGI scripts, the output of which will then be passed through mod_include to evaluate any SSIs contained in the output of the script, before sending the response to the client. I'll cover CGI configuration in detail shortly.

Actions, handlers, and filters are covered in more detail later in the chapter, but this is all you need to get SSIs up and running. Now all that remains is to add some SSI commands into the documents to give mod_include something to chew on.
Format of SSI Commands

SSI commands are embedded in HTML and look like HTML comments to a non-SSI-aware server. They have the following format:
<!--#command parameter="value" parameter="value" ... -->
When Apache sees this, assuming SSIs are enabled, it processes the command and substitutes the text of the include with the results. It's a good idea, though not always essential, to leave a space between the last parameter-value pair and the closing --> to ensure that the SSI is correctly parsed by Apache.

If the included document contains SSIs, then mod_include will parse them too, allowing you to nest included documents several levels deep. Because Apache will cache included documents where possible, includes are a lot faster than you might otherwise think, especially if you include the same document multiple times.

A non-SSI-enabled server just returns the document to the user, but because SSI commands look like comments, browsers don't display them.
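To make the command format concrete, here is a minimal Python sketch of how a parser might find SSI commands in a document and substitute their results. It is not how mod_include is implemented; the function names, the regular expressions, and the handler table are all assumptions made for illustration, and only the error text is taken from mod_include's default message.

```python
import re

# Matches <!--#command attr="value" ... --> (the space before --> is optional here).
SSI_COMMAND = re.compile(r'<!--#(\w+)\s*(.*?)\s*-->', re.DOTALL)
# Matches one attr="value" pair; a backslash may escape a character in the value.
SSI_ATTRIBUTE = re.compile(r'(\w+)="((?:\\.|[^"\\])*)"')

# Default error text returned in place of a command that cannot be processed.
ERROR_MESSAGE = "[an error occurred while processing this directive]"


def find_ssi_commands(document):
    """Return a list of (command, [(attribute, value), ...]) tuples."""
    commands = []
    for match in SSI_COMMAND.finditer(document):
        name, rest = match.group(1), match.group(2)
        commands.append((name, SSI_ATTRIBUTE.findall(rest)))
    return commands


def process(document, handlers):
    """Replace each SSI command with the text its handler returns.

    `handlers` is a hypothetical mapping of command name to a function that
    takes a dict of attributes and returns the replacement text.
    """
    def replace(match):
        handler = handlers.get(match.group(1))
        if handler is None:
            return ERROR_MESSAGE
        return handler(dict(SSI_ATTRIBUTE.findall(match.group(2))))

    return SSI_COMMAND.sub(replace, document)
```

For instance, registering an echo handler that looks up a variable name in a dictionary is enough to turn a heading containing an echo command into plain HTML, while a command with no registered handler is replaced by the error text.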
SSIStartTag and SSIEndTag

In Apache 2, you can also redefine the prefix and suffix of SSI commands, which gives you the intriguing possibility of having SSIs that look like XML. You do this through the SSIStartTag and SSIEndTag directives. For example, you can create an ssi namespace and convert SSI commands into closed XML tags that belong in that namespace with this:
SSIStartTag "
This changes the SSI syntax into an XML-compliant tag:
One application of this is to have different Includes filters that process different sets of SSI tags. It also makes it trivial to remove mod_include and add an XSLT stylesheet processor to do the same thing if that's the direction you decide to go. Alternatively, you can whip up a simple SSI Document Type Definition (DTD) and validate your documents with it while retaining mod_include to do the donkey work.

Note that if you do change the syntax, then SSIs using the old syntax are no longer valid and will appear as comments in the output. Conveniently, in this example, if the new syntax isn't recognized, then it'll create XML tags in the output that will also be disregarded by the client. A client-side JavaScript library that processes the tags on arrival is also a viable approach. Having SSI commands in XML form makes this an easier prospect because the XML parser built into most browsers can be used for the task.
The SSI Command Set

SSIs provide a limited set of commands that you can embed into documents. Despite being limited, they cover enough features to enable quite a lot. The original SSI commands were augmented by an extended set of commands called Extended Server-Side Includes (XSSI). XSSI added the ability to set and retrieve variables from the environment and to construct conditional sections using those variables.

If compiled with the -DUSE_PERL_SSI flag, mod_perl and mod_include can also collaborate to provide a perl SSI command that allows mod_perl handlers to be called via SSIs:
Alternatively, if you're already using mod_perl, you can use the Apache::SSI Perl module, which implements SSIs in the same way as mod_include but doesn't require it. CGI scripts that have been cached via Apache::Registry can also be used via Apache::Include. You'll see mod_perl in detail in Chapter 12.

Online Appendix E contains an expanded list of all standard SSI commands with examples.
SSI Variables

Variables for SSI commands can derive from one of three sources:
• Existing environment variables
• Variables set by mod_include
• Pattern-match results from the last-executed regular expression
Any variable available in Apache's environment can be substituted into and evaluated in an SSI command. This includes the standard server variables, variables set by the client request, variables set by other modules in the server, and variables set by mod_include's own set command. In addition, mod_include defines the variables in Table 6-1 itself.
Table 6-1. mod_include Predefined Variables

Variable                Description
DATE_GMT                The current date in Greenwich Mean Time (GMT).
DATE_LOCAL              The current date in local time.
DOCUMENT_NAME           The leaf name of the URL corresponding to the requested document. This means the index file's URL in the case of a directory request.
DOCUMENT_PATH_INFO      The extra path information of the originally requested document, if any. See also the AcceptPathInfo and AllowEncodedSlashes directives. Compare to PATH_INFO.
DOCUMENT_URI            The URL corresponding to the originally requested document, URL decoded. This means the index file's URL in the case of a directory request. It doesn't include the query string or path information. Compare to REQUEST_URI.
LAST_MODIFIED           The modification date of the originally requested document. This can be important for server-parsed documents because, as mentioned earlier, the Last-Modified header isn't automatically assigned.
QUERY_STRING_UNESCAPED  The query string of the originally requested document, URL decoded, if any. Compare to QUERY_STRING (which is encoded).
USER_NAME               The name of the authenticated user, if set.
In Apache 2, the results of regular expression matches performed by directives such as AliasMatch or RewriteRule are also available as the variables $0 to $9. These aren't environment variables and aren't passed to other modules such as mod_cgi but are made especially available to SSI commands. This allows you to write SSI commands such as this:
The contents of the numbered variables depend on the last regular expression that was evaluated, which can vary wildly depending on what Apache did in the course of processing the request. This is still the case even if the last regular expression to be evaluated didn't return any values (in other words, no parentheses were present). The value of $0 will always contain the complete matched text, however.

Variables can be substituted into any SSI attribute string, such as (but not limited to) set value, include virtual, and if expr:
In more recent versions of mod_include (since Apache 2.0.35), the var attribute of the set and echo commands can also be constructed from variable substitution:
If the surrounding text isn't distinguishable from the variable name, curly brackets can indicate the start and end of the variable name:
To include literal dollar, quote, or backslash characters, precede them with a backslash to escape them:
If an undefined variable is evaluated in content that's sent back to the client (as opposed to a comparison, for instance), mod_include will return the string (none). You can change this with the new SSIUndefinedEcho directive, which takes an alternative string as an argument. For example, to return a simple hyphen (-), use this:
SSIUndefinedEcho -
Alternatively, to return a string including spaces, or nothing at all, use quotes:
SSIUndefinedEcho "[undefined value]"
SSIUndefinedEcho ""
This directive never actually affects the value of any variable, only what's returned to the client in the event an undefined variable is evaluated. It doesn't, for example, alter the results of expressions constructed from expanding the contents of one or more undefined variables.
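The substitution rules in this section can be sketched in a few lines of Python. This is an illustration only, not mod_include's implementation: the function names are hypothetical, and the sketch assumes that an undefined variable expands to an empty string inside an attribute string, while only the echo path uses the undefined-echo text.

```python
import re

# A backslash-escaped character, ${name}, or $name, tried in that order.
TOKEN = re.compile(r'\\(.)|\$\{(\w+)\}|\$(\w+)')


def expand(text, variables):
    """Substitute variables into an attribute string.

    Assumption: undefined variables expand to the empty string here,
    regardless of the undefined-echo setting.
    """
    def replace(match):
        escaped, braced, bare = match.groups()
        if escaped is not None:
            return escaped  # \$ becomes a literal $, and so on
        return variables.get(braced or bare, "")

    return TOKEN.sub(replace, text)


def echo(name, variables, undefined_echo="(none)"):
    """Return a variable's value for output, or the undefined-echo text."""
    value = variables.get(name)
    return undefined_echo if value is None else value
```

With a variable DOC set to index, expanding "${DOC}_page" gives index_page, whereas "$DOC_page" names a different (undefined) variable and expands to nothing, which is why the curly brackets matter.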
Passing Trailing Path Information to SSIs (and Other Dynamic Documents)

Apache 2 provides two directives, AcceptPathInfo and AllowEncodedSlashes, that control how Apache deals with extra information in a URL path. In Apache 1.3, a request that matches a URL but contains additional path information will retrieve the document corresponding to the shorter URL. For example, take a URL of this format:

http://www.alpha-complex.com/briefingroom.html/green-clearance
If the whole URL doesn't match a document but there's a document that matches:

http://www.alpha-complex.com/briefingroom.html

then the rest of the path, /green-clearance, would be placed in the PATH_INFO variable, which is covered in more detail in relation to CGI scripts a little later in the chapter. For now you just need to know why you may need to worry about this when writing server-parsed documents. The problem is that unless your document is dynamically generated, this information will not be used because the document, being static, has no intelligence with which to respond to the extra information.

Apache 2 considers this to be erroneous, or at least highly suspect, behavior, so it doesn't permit it by default. If the handler of a URL allows additional path information, such as CGI scripts, then Apache 2 will still set the PATH_INFO variable and invoke the handler (in this case, the cgi-script handler). However, it'll reject a URL containing additional path information for other handlers, including server-parsed documents, unless you explicitly tell Apache that you want to allow it. This is only a problem if you want to access your SSIs using additional path information; if not, you can safely ignore this issue.

The logic behind this is that a longer URL is essentially wrong and shouldn't trigger a response corresponding to a shorter (and therefore different) URL if the document is static. A CGI script is entitled to accept or reject a URL when it sees the PATH_INFO variable, so Apache will pass it on and let the script decide. The problem is that although a server-parsed document might include something capable of detecting and handling PATH_INFO information, it might also include only static content, which can't. Because there's no way for Apache to know this ahead of time (it'd have to parse the entire document including all nested included documents to see if a CGI script were present), it falls to you to tell Apache in which situations the URL is permissible.

The default behavior of rejecting additional path information will affect server-parsed documents that include CGI scripts that rely on this information. To re-enable it, you can change the value of the AcceptPathInfo directive from default to on:
AcceptPathInfo on
If you happen to have already put together server-parsed content that calls CGI (or FastCGI) scripts that rely on extra path information, this will keep them happy. If not, you probably don't need to worry.

A closely related directive introduced in Apache 2.0.46 is AllowEncodedSlashes. This overrides Apache 2's default behavior of rejecting a URL that contains additional path information with URL-encoded slashes. For a Unix-based server, this means encoded forward slash (/) characters, or %2f. For a Windows server, encoded backslashes (\), or %5c, are also rejected. To tell Apache that you want to allow this for a given location, you can use this:
AllowEncodedSlashes on
Note that this directive has no useful effect unless you also specify AcceptPathInfo on because without this, Apache will reject a long URL irrespective of what characters it contains.

Strictly speaking, both of these directives apply to the handling of PATH_INFO in any context where it's not allowed by default, which is, generally speaking, everywhere except CGI scripts. Consequently, you may also need to use one or both of them if you're implementing other kinds of dynamic documents (for example, PHP pages) to handle URLs such as http://www.alpha-complex.com/a_php_page/extra/info/123.
NOTE See Chapter 12 for more about PHP and other scripting applications.
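The way a long URL is divided into a document and trailing path information can be sketched as follows. This Python fragment is only a model of the behavior described above, not Apache's code: the set of known documents stands in for the filesystem lookup, the function names are hypothetical, and the accept check models only the Apache 2 default for handlers such as server-parsed documents (a CGI script would be handed the extra path regardless).

```python
def split_path_info(url_path, documents):
    """Split url_path into (document, path_info) using the longest match.

    `documents` is a hypothetical set of URL paths that map to real files.
    Returns (None, None) when no prefix of the path names a document.
    """
    parts = url_path.split("/")
    # Try the full path first, then successively shorter prefixes.
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in documents:
            return candidate, url_path[len(candidate):]
    return None, None


def accepts(url_path, documents, accept_path_info=False):
    """Model the default for non-CGI handlers: extra path info is rejected
    unless it has been explicitly allowed."""
    document, path_info = split_path_info(url_path, documents)
    if document is None:
        return False
    return path_info == "" or accept_path_info
```

Given a document set containing only /briefingroom.html, the path /briefingroom.html/green-clearance splits into that document plus /green-clearance, and is accepted only when accept_path_info is true.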
Setting the Date and Error Format

If mod_include can't parse a server-side include command (for example, because the SSI to be included isn't there), then it'll return an error message in place of the result of the SSI command. The default message is as follows:
[an error occurred while processing this directive]
You can change this message with the config SSI command like this:
However, it's usually more convenient to set it in the server configuration, which you can do with the SSIErrorMsg directive. For example, to have SSIs fail silently, use this:
SSIErrorMsg "<!--Dang, SSI fubared-->"
This affects only what goes to the client. The error itself will be logged only to the error log.
Similarly, you can configure the time format with SSITimeFormat using a format string compatible with the strftime system call (which does the actual work underneath). This is the same format string used by the logrotate script, and an example is as follows:
SSITimeFormat "%l:%M on %A, %d %B"
This has the same effect as, but is more convenient than, using the equivalent config command:
<!--#config timefmt="%l:%M on %A, %d %B" -->
Here, %l:%M is the time on a 12-hour clock, and %A, %d, and %B are the day, date, and month respectively, with the day and month names in full.
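Because the format string is passed to strftime, you can preview what a given format produces before putting it in the configuration. The following Python sketch does this with the standard library; the function name is hypothetical, and it uses %I (the standard zero-padded 12-hour field) as an assumption for portability, because not every platform's strftime supports other 12-hour variants.

```python
from datetime import datetime

# %I: hour on a 12-hour clock, %M: minutes, %A: full weekday name,
# %d: day of the month, %B: full month name.
TIME_FORMAT = "%I:%M on %A, %d %B"


def format_ssi_time(moment, time_format=TIME_FORMAT):
    """Format a datetime the way a strftime-style SSI time format would."""
    return moment.strftime(time_format)
```

With the default C locale, formatting 5 March 2004 at 15:07 this way gives "03:07 on Friday, 05 March".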
Templating with SSIs
An immediately obvious use of SSIs is to put common elements of several documents into a single shared document that can then be used in each document. By editing the shared document, you can alter the appearance or content of all the documents that use it.

XSSI allows you to go a step further. Because SSI variables are just environment variables, you can use them in conjunction with variables set by the server's environment and other modules such as mod_setenvif and mod_rewrite to create dynamic, context-sensitive pages. Although not as powerful as an embedded scripting language such as mod_perl, XSSI can still achieve quite a lot without adding additional modules to the server.

For example, the following is a generic header you can use for the top of all your HTML documents, rather than writing them out explicitly each time. It incorporates the DTD, the HTML header with the title and stylesheet link, and the start of the body, including a dynamically generated navigation bar pulled in from a CGI script:
<?xml version="1.0" encoding="utf-8"?>
Now you can replace the top of all your documents with two SSIs: one to set the title and one to include the header:
<!--#set var="TITLE" value="The Computer is Your Friend" -->
Not only is this a lot shorter, it also makes it easy to change any aspect of the header (for example, the background color) and have it immediately reflected in all documents without any additional editing. You can create a footer SSI in the same way.

You can use SSIs for more than just shortcuts with the use of conditional SSIs, however. For example, you can implement a printable page feature that allows a client to redisplay a page with the normal logo and navigation bar removed. You can also insert the URL of the page so that the printout will show where the page came from.

First, create a variable that will be set by the server whenever a printable URL is seen. Because you can use a regular expression for this, you could use a number of different techniques for identifying a URL as printable. However, for this example, choose any URL that starts with /print/:
SetEnvIf Request_URI ^/print/ PRINTABLE=1
Second, rewrite your pages so that they'll display the usual button bar header if you're viewing a page normally or a cut-down version for the printable one. Because the logic is the same each time, you can create a common header to do it and include it in every page. Call this SSI navigation and put it in the /include directory next to the header SSI created earlier:
http://<!--#echo var="REQUEST_URI"-->
<!--#endif -->
Likewise, if you have a footer, you can switch it off for the printable page with this:
<!--#if expr="!$PRINTABLE" -->
Now just replace the SSI command in the header that includes /cgi/navigation.cgi with one that includes /include/navigation:
You could just as easily include the text of the navigation SSI directly into your header, of course. However, this illustrates how easy it is to build complex pages using a collection of simple integrated SSI components. Keeping the navigation SSI separate from the header SSI makes it easier to maintain them. It also allows for the possibility of creating more than one header, each of which can use the navigation SSI. You can even add it to the footer if you want to put a navigation bar on both the top and bottom of your documents.
In this solution, the printable version of the document has a different URL than the original by the addition of /print to the front of the URL. For this to actually work, you also need to make this URL valid, either by adding a symbolic link called print in the document root that points to the current directory (so that the path /print/document.html works equally well as /document.html) or by using an alias:
Alias /print/ /
You can also use mod_rewrite to map the URL and set the variable at the same time, depending on your needs and preferences:
RewriteEngine on
RewriteRule ^/print/(.*)$ /$1 [E=PRINTABLE:yes]
This illustrates how effective a simple application of SSIs in conjunction with a few environment variables can be. As an example of how far you can go with this, you can use cookie-based sessions and set the logged-in user details as environment variables to produce customized pages, all using SSIs.
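The logic of the printable-page example can be traced in a short Python sketch: a URL beginning with /print/ is mapped back to the real document and a PRINTABLE variable is set, and the page template then chooses its sections based on that variable. The function names and the section labels are hypothetical stand-ins for the rewrite rule and the conditional SSIs; this is a model of the idea, not of mod_rewrite itself.

```python
import re

# Same pattern as the rewrite rule in the text: capture everything after /print/.
PRINT_RULE = re.compile(r"^/print/(.*)$")


def rewrite(url_path, env=None):
    """Return (new_path, env), setting PRINTABLE when the rule matches."""
    env = dict(env or {})
    match = PRINT_RULE.match(url_path)
    if match:
        env["PRINTABLE"] = "yes"
        return "/" + match.group(1), env
    return url_path, env


def page_parts(env):
    """Choose the sections a page would emit (labels are illustrative only)."""
    if env.get("PRINTABLE"):
        # Printable view: show where the page came from, drop the navigation.
        return ["source-url", "body"]
    return ["navigation", "body", "footer"]
```

A request for /print/document.html is therefore served from /document.html with the cut-down layout, while /document.html itself gets the full layout.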
Caching Server-Parsed Documents

In principle, Apache generates Last-Modified and ETag headers that contain the date and time of when the file was last changed for every document it sends to a client. On the basis of this header, proxies can decide whether to save the document. Unfortunately, SSIs make determining the correct modification time problematic at best and impossible if a CGI script is involved. As a result, Apache treats a document processed by mod_include as dynamic and will simply not transmit a Last-Modified header when delivering it. The consequence is that clients and intermediate proxies will usually not cache server-parsed documents.

There are three ways to get around this limitation, excluding just taking the hit (so to speak):
• The first possibility involves using the XBitHack directive of mod_include, which I'll discuss in a moment.
• The second possibility is offered by mod_expires, which is discussed in detail in Chapter 4. With this you can add an Expires header to the document, which will cause proxies to consider it cacheable up until the expiry date.
• The third possibility is to sidestep mod_include entirely and write your own server-side parser, from which you can set your own Last-Modified header. I'll give an example of a simple SSI parser that just detects and parses include virtual SSI commands later.
Identifying Server-Parsed Documents by Execute Permission

Rather than teaching Apache to identify server-parsed documents by extension or location, you can selectively turn SSIs on or off for individual files by setting their execute permissions. This isn't an ideal solution and is appropriately enabled with the XBitHack directive. It works by treating all documents that have both a MIME type of text/html and that are also considered executable by the filing system as SSI documents. The default for XBitHack is off:
XBitHack off
You can switch it on by simply specifying this:
XBitHack on
XBitHack also has a third option, full, that additionally causes Apache to ignore that the document was server parsed and adds a Last-Modified header to it, allowing proxies to cache the document:
XBitHack full
The logic behind this is that you're telling Apache that the SSIs included in the document aren't going to change, so the unparsed document's own modification time can validly be used as the modification time of the parsed result. Whether this is actually true is entirely another matter, of course.

The benefit of XBitHack is that it allows you to selectively process HTML documents rather than brute-force process all HTML through mod_include whether it contains SSIs or not.

The drawback is that it's terribly vulnerable to mistakes; you can lose the executable permission on a document and not immediately notice because the filename hasn't changed, resulting in unparsed documents containing SSI commands being sent to the client. It also severely limits what you can treat as server parsed; the MIME type must be text/html. Furthermore, the resulting MIME type is also text/html. This is noticeably different from the Includes filter, which will handle any MIME type (because, being a filter, it doesn't care). Finally, it doesn't work at all on Windows platforms because they have no method to identify executable files except through their extension.
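The selection rule that XBitHack applies can be expressed as a small predicate. The Python sketch below is an approximation for illustration: the function name is hypothetical, and it assumes that "executable" means the owner execute bit is set, and that the full setting selects files in the same way as on.

```python
import stat


def xbithack_parses(mode, mime_type, xbithack="on"):
    """Return True if a file with these permission bits would be server parsed.

    Assumptions: the owner execute bit decides executability, and "full"
    selects files in the same way as "on".
    """
    if xbithack == "off":
        return False
    return mime_type == "text/html" and bool(mode & stat.S_IXUSR)
```

A file with mode 0755 and type text/html qualifies, whereas the same file after losing its execute bit (mode 0644) silently stops being parsed, which is exactly the kind of mistake described above.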
CGI: The Common Gateway Interface

Dynamically generated Web pages have become obligatory on most Web sites, handling everything from shopping basket systems to the results of an interactive form. CGI provides the basic mechanism to process user-defined Web page requests.

CGI is a protocol that defines how the Web server (in this case, Apache) and external programs communicate with each other. The generic term for programs that are written to this protocol is CGI script.

Strictly speaking, CGI scripts aren't necessarily scripts at all; many are written in C or C++, and compiled into binary executables. However, when CGI was originally introduced, most programs were written as Unix shell scripts. Today, scripting languages are popular choices for CGI because they're often more portable and maintainable than C or C++. They also provide an easy environment to prototype scripts for reimplementation as high-performance C code once their design concept is proven. The most popular CGI scripting languages are Perl and Python, but Ruby and Tcl are also popular choices. I'll use the word script throughout this chapter, regardless of the language in which the program was written.

In this section, you'll see how to configure Apache to enable CGI, have CGI scripts triggered by events, and ensure that you have CGI enabled exactly where you intend and nowhere else. You'll also look at how Apache communicates with CGI scripts through environment variables. Finally, you'll apply some of this knowledge to implement and debug some example CGI scripts. Much of this is also applicable to handlers, filters, and FastCGI scripts, which I'll discuss at the end of the chapter.
CGI and the Environment

The primary mechanism by which Apache and CGI scripts communicate is the environment. CGI scripts are loaded and executed whenever Apache receives a request for them. They discover information about the nature of the particular request through environment variables, which tell the script the URL it was called with, any additional parameters, the HTTP method, and general information about both the server and the requesting agent. Additional data may also be provided via the script's standard input; the entity body of a POST request is passed to a script in this manner.

From this range of data, a CGI script has to extract the information it needs to do its job. It then communicates back to Apache through its standard output, returning a suitable chunk of dynamic content, correctly prefixed with headers, to satisfy the request. Apache then makes any further adjustments it sees fit (for instance, passing the response through one or more filters), stamps the response with the server token string and date, and sends it to the client.
Important Environment Variables

To provide CGI scripts with the information they need to know, Web servers define a standard set of environment variables on which all CGI scripts can rely. This standard set is defined by the CGI protocol, so scripts know that they can rely on them being set correctly, regardless of what Web server executed them. Apache also adds its own environment variables, which vary depending on which modules have had a chance to process the request before it reached the CGI script. The most important of these variables for CGI scripts are the following:

• REQUEST_METHOD: How the script was called, GET or POST
• PATH_INFO: The relative path of the requested resource
• PATH_TRANSLATED: The absolute path of the requested resource
• QUERY_STRING: Additional supplied parameters, if any
• SCRIPT_NAME: The actual name of the script
Depending on how CGI scripts are called (with GET or POST) and how CGI scripts have been enabled, the variables presented to a CGI script may vary. For example, if ScriptAlias has been used, SCRIPT_NAME will contain the alias (that is, the name the client called the script by) rather than the real name of the script. Likewise, QUERY_STRING is normally populated only for scripts called with GET, although it's also possible to request a URL that carries a query string using the POST method, for example when the amount of information is large.
NOTE Online Appendix D contains a full list of standard and Apache-contributed environment variables.
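As a rough illustration of how a script might read the variables listed above, the following Python sketch collects them from the environment and reports them as plain text. The function names important_cgi_variables and render_report are invented for this example; only the variable names come from the CGI protocol.

```python
#!/usr/bin/env python3
"""Report the CGI variables discussed above (illustrative sketch only)."""
import os
import sys

# Names taken from the list above; anything unset is reported as empty.
IMPORTANT = ("REQUEST_METHOD", "PATH_INFO", "PATH_TRANSLATED",
             "QUERY_STRING", "SCRIPT_NAME")


def important_cgi_variables(environ):
    """Return a dict of the important CGI variables found in environ."""
    return {name: environ.get(name, "") for name in IMPORTANT}


def render_report(environ):
    """Build a complete CGI response: header, blank line, then the body."""
    lines = ["Content-Type: text/plain", ""]
    for name, value in important_cgi_variables(environ).items():
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # When run by the Web server, the request details are in os.environ.
    sys.stdout.write(render_report(os.environ))
```

Passing the environment in as an argument, rather than reading os.environ inside each function, makes the script easy to exercise from the command line without a Web server.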
Viewing the Environment with a CGI script

While I'm on the subject of environment variables, a constructive example to display them is useful. The following is a simple CGI script that'll print all environment variables in Unix:

#!/bin/sh
echo "Content-type: text/plain"
echo
env
A similar script for Windows platforms is as follows:

@ECHO OFF
ECHO Content-Type: text/plain
TYPE lf
set

Note that this uses a DOS hack; I use this to get around DOS batch files' inability to output a simple Unix-style linefeed character (DOS outputs a carriage return followed by a linefeed). The TYPE command takes the content of a file and outputs it (like the Unix cat command), so I simply create a file (called lf) in the same directory as the batch file, which contains a single Unix linefeed. This ensures that the obligatory two linefeed characters in a row appear between the headers and the content.

If you happen to have mod_include enabled, you can also use the printenv SSI command:

<!--#printenv -->
This includes an unformatted list of all known environment variables into the output page. Because it's unformatted, <pre>...</pre> tags are a great help in making the result actually legible. Otherwise, for these examples to work, you need to tell Apache how to recognize CGI scripts.
Configuring Apache to Recognize CGI Scripts

To enable Apache to recognize a CGI script, it's necessary to do more than simply put it somewhere on the server and expect Apache to run it. CGI scripts are handled by Apache with mod_cgi (or the equivalent mod_cgid for threaded Unix Apache 2 servers) and can be enabled in one of two basic ways.

You can use Apache's ScriptAlias directive, which simultaneously allows a URL inside the document root to be mapped onto any valid directory name and marks that directory as containing executable scripts. Alternatively, you can use the ExecCGI option specified in an Options directive:
Options ExecCGI
This allows greater flexibility in defining what Apache treats as CGI and allows CGI scripts to be identified individually, by wildcard or regular expression matching, by MIME type, and by extension. It also allows CGI scripts to be triggered when certain types of file content are accessed.
Setting Up a CGI Directory with ScriptAlias

If you only want to allow CGI scripts in one systemwide directory, you can enable those using ScriptAlias. One good use for this is to improve the security of Web sites that allow users to upload their own Web pages. By placing the aliased directory outside the document root of the server, you can allow users to run the CGI scripts you supply but not interfere with them or add their own. This works because they don't have access to the CGI directory, and Apache will not recognize scripts in the document root as CGI scripts.

The standard Apache configuration comes with a ScriptAlias directive to specify a directory where CGI scripts can be kept. The default (for a Unix server) is as follows:
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
Apache then interprets any incoming request URL of the following form as a request to run a CGI script and will attempt to run a script by that name if it's present:
http://www.mydomain.com/cgi-bin/
The trailing / on both the alias and target directory might appear redundant, but it forces scripts to appear in the directory rather than just extend the name. Without the trailing slashes, you'd have this:
ScriptAlias /cgi-bin "/usr/local/apache/cgi-bin"
This would allow users to ask for URLs such as /cgi-bin-old/badscript.cgi and have Apache execute them as /usr/local/apache/cgi-bin-old/badscript.cgi. This would be terribly unfortunate if you happen to have put some problematic scripts in a neighboring directory while you test them, safe in the incorrect knowledge that they were out of harm's way.
Hiding CGI Scripts with ScriptAlias and PATH_INFO

The usual method of passing parameters to CGI scripts in a URL is by a query string. The following URL:
http://www.alpha-complex.com/cgi-bin/script.cgi?document.html
would create the following value for QUERY_STRING:
document.html
This approach has a few problems:
• It's obvious to the client that a CGI script is being used to generate the output and easy for unscrupulous users to play with the data in the URL. This can be significant if the client chooses to alter its behavior when it sees a CGI in the URL; for example, some search engines won't follow URLs with CGI scripts when indexing a Web site, which can prevent important content from being found.
• Badly intentioned clients can try to break a CGI script or get past security measures if they know there's a script present. I'll cover this issue in some depth later in the chapter.

By hiding the script, you can reduce, if not avoid, these problems. By default, Apache 1.3 and Apache 2 will both accept a request with extra path information and relay it to CGI scripts. Even if you were to allow CGI scripts with no extension, the question mark gives away the presence of the script. However, ScriptAlias has a useful side effect; when it matches a URL that's longer than the alias, the URL is split in two, and the latter part is presented to the CGI script in the PATH_INFO environment variable. This allows you to pass one parameter to a CGI script invisibly. For example, you could define a subtler ScriptAlias such as this:
ScriptAlias /directory/ "/usr/local/apache/secret-cgi-bin/"
If you place a CGI script called subdirectory in secret-cgi-bin, you can request this:

http://www.alpha-complex.com/directory/subdirectory/document.html

The script will execute with the following value for the PATH_INFO environment variable:
/document.html
The CGI script is called with the remainder of the URL and can generate whatever dynamic output it pleases. But as far as the client is concerned, it has requested and received an ordinary HTML document.

A problem with this approach that you might need to be aware of on Apache 2 servers is that directives provided by mod_mime, such as AddOutputFilter, won't process the results of URLs that were handled using additional path information based on their extensions. Instead, as discussed in Chapter 5, the additional part of the path is disregarded for the purposes of determining the MIME type. To tell mod_mime to use all the path information when considering which filters to apply to a processed URL, you must use this:

ModMimeUsePathInfo on
There are two more directives you might need to consider if you're using server-parsed documents to handle URLs with additional path information. Although it's fine to disguise CGI scripts this way, Apache 2 doesn't accept requests for server-parsed documents that contain extra path information unless you also specify this:

AcceptPathInfo on

You can also restore default behavior with default or force the feature off entirely with off. This would prevent hidden scripts from working entirely. If you need to allow encoded slashes in the path information, which Apache 2 by default rejects, you also need this:

AllowEncodedSlashes on

Note that these directives are only significant if you're attempting to hide a server-parsed document (including both SSIs and other dynamic documents such as PHP pages); they don't matter for ordinary CGI scripts. See the earlier "Passing Trailing Path Information to SSIs (and Other Dynamic Documents)" section for a more detailed analysis of this issue.
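To make the hidden-script technique described in this section a little more concrete, here is a minimal Python sketch of a script that could be installed as subdirectory and takes its one parameter from PATH_INFO. The function names, the validation rule, and the generated page are all assumptions made for illustration; a real script would decide for itself what a valid document name looks like.

```python
#!/usr/bin/env python3
"""Hidden CGI script sketch: the parameter arrives in PATH_INFO."""
import html
import os
import sys


def document_name(environ):
    """Return the requested document name from PATH_INFO, or None.

    Only a single plain path component is accepted, so values such as
    '/../etc/passwd' are rejected rather than trusted.
    """
    name = environ.get("PATH_INFO", "").lstrip("/")
    if not name or "/" in name or name.startswith("."):
        return None
    return name


def respond(environ):
    """Build the CGI response for the requested document."""
    name = document_name(environ)
    if name is None:
        # Status is a CGI header that asks the server to set the HTTP status.
        return ("Status: 404 Not Found\n"
                "Content-Type: text/plain\n\nNo such document\n")
    body = f"<html><body><h1>{html.escape(name)}</h1></body></html>\n"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(respond(os.environ))
```

Because PATH_INFO is supplied by the client, the sketch treats it as untrusted input and escapes it before echoing it back.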
Using ScriptAliasMatch to Define Multiple Directories

You can have as many ScriptAliased directories as you like. If you were dividing a Web site into logical areas, you could use this:

ScriptAlias /area_one/cgi-bin/ "/usr/local/apache/cgi-bin/"
ScriptAlias /area_two/cgi-bin/ "/usr/local/apache/cgi-bin/"
ScriptAlias /area_three/cgi-bin/ "/usr/local/apache/cgi-bin/"

However, you could write this more efficiently using the ScriptAliasMatch directive, which allows you to specify a regular expression instead of a relative URL:
ScriptAliasMatch /area_.*/cgi-bin/ "/usr/local/apache/cgi-bin/"
Here .* means "match anything zero or more times." A better solution that doesn't match intermediate directories might be [^/]+, that is, "match anything except slash one or more times." See Online Appendix F for more information about writing regular expressions.

Incidentally, administrators who are looking to achieve this kind of effect with virtual hosts might want to investigate the VirtualScriptAlias and VirtualScriptAliasIP directives of mod_vhost_alias in Chapter 7 as an interesting alternative.
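The difference between the two patterns is easy to check outside Apache. The following Python sketch uses the re module, whose syntax agrees with Apache's for these simple expressions; the leading ^ anchor, the helper name is_cgi_url, and the sample paths are assumptions added for the illustration.

```python
import re

# Python's re syntax is close enough to Apache's for these two patterns;
# the leading ^ anchor is an addition made for this illustration.
GREEDY = re.compile(r"^/area_.*/cgi-bin/")
ONE_SEGMENT = re.compile(r"^/area_[^/]+/cgi-bin/")


def is_cgi_url(pattern, url_path):
    """Return True if url_path matches the given alias pattern."""
    return pattern.search(url_path) is not None


for path in ("/area_one/cgi-bin/test.cgi",
             "/area_one/private/cgi-bin/test.cgi"):
    # Show how each pattern treats the path.
    print(path, is_cgi_url(GREEDY, path), is_cgi_url(ONE_SEGMENT, path))
```

The first path is accepted by both patterns. The second, which has an intermediate directory, is accepted only by the .* version, which is exactly the behavior the stricter [^/]+ form is meant to prevent.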
Improving Security in ScriptAliased Directories

You've already seen that it pays to terminate a ScriptAlias directive with slashes for both the alias and target directories. To improve the security of a CGI directory further, you can use a Directory container tag to prevent using .htaccess files that could weaken the server's security:
Note that because this CGI directory is outside the document root (by default, /usr/local/apache/htdocs), you must use
Setting Up a CGI Directory with ExecCGI: A Simple Way

ScriptAlias is actually identical to Apache's more generic Alias directive, but it also marks the directory as containing CGI scripts. It's thus, effectively, a shorthand method for specifying several directives such as this:
Alias /cgi-bin/ "/usr/local/apache/cgi-bin/"
Declaring Individual Files As CGI Scripts

To make specific files into CGI scripts rather than a whole directory, you can use a
You can also put a
You can even mimic a
On its own, this setup will apply to files within the Web site, hence the use of /home/web/alpha-complex in the pathnames; in the previous example, the cgi-bin directory is still inside the document root. However, if you want these files to exist outside the document root instead, you could do so by adding an Alias or AliasMatch directive.
Defining CGI Scripts by Extension

The SetHandler directive in Apache causes a handler to be called for any file in its enclosing
AddHandler cgi-script .cgi .pl .py .exe .bat
NOTE For this to work, ExecCGI must also have been specified as an option; ScriptAlias is the only directive that enables CGI without ExecCGI being specified.
Defining CGI Scripts by Media Type

mod_cgi and mod_cgid recognize files with a MIME type of application/x-httpd-cgi as being CGI scripts, and Apache will automatically pass a file to them for execution if this media type is associated with the file. Apache's AddType directive gives you the ability to relate file extensions to MIME types. This gives you another way to define CGI scripts by extension:
AddType application/x-httpd-cgi .cgi
However, although this works, it's deprecated in favor of associating the handler with the extension directly using AddHandler.
Setting Up a CGI Directory with ExecCGI: A Better Way

Another way to specify a directory for CGI scripts is to use ExecCGI and SetHandler inside a
Specifying ExecCGI as an option enables files to be interpreted as CGI scripts, and SetHandler then marks the whole directory (and any subdirectories) as a CGI-enabled location. You could allow a mixture of files in this directory by defining a file extension so that only files that also end with .cgi are considered to be CGI scripts:
You could also achieve the same effect using AddType:
Triggering CGI Scripts on Events

Apache allows CGI scripts to intercept access requests for certain types of resources and reply to the client with their own generated output instead of delivering the original file (if indeed there is one). This can be useful for hiding the existence of the script from the client.

There are three basic approaches, depending on the required behavior. You can have Apache pass a request to a specified CGI script based on the following:
• The media type (MIME type) of the requested resource
• The file extension of the requested resource
• The HTTP method of the request sent by the client
All of these approaches use combinations of the Action or Script directives supplied by mod_actions. I'll cover this module in detail later on in the chapter; at this point, you should be interested in its application with CGI configuration.
Configuring Media Types to Be Processed by CGI

You can configure Apache to call a CGI script when a particular media type is accessed. You do this by adding an Action directive for the MIME type of the media you want to handle. For example, you can parse HTML files with the following:
Action text/html /cgi-bin/parse-html.cgi
What parse-html.cgi does is entirely up to you. It could, for example, check the timestamp of the passed URL against the time stamp of a template and database record and then regenerate the HTML if it's out-of-date as a caching mechanism. Alternatively, it could process directives the way mod_include does. The one thing it must do is return a valid response to Apache, preferably (but not necessarily) related to whatever was originally asked for.

Note that the second argument to Action is a URL, not a file. This URL is treated by Apache using the usual directives for CGI scripts, which is to say that ScriptAlias and ExecCGI are important and do affect where and if the script executes. In this example, a ScriptAlias for /cgi-bin/ would apply if it were in the configuration.
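As one possible shape for a handler like parse-html.cgi, the Python sketch below relies on the file that was originally requested being named in PATH_TRANSLATED, which is how mod_actions is documented to hand the request over. Everything else is hypothetical: the @@GENERATED@@ marker, the handle function, and the choice to substitute a timestamp are invented purely to show a handler reading the original file and returning a valid response.

```python
#!/usr/bin/env python3
"""Sketch of an Action handler script such as parse-html.cgi."""
import os
import sys
import time

PLACEHOLDER = "@@GENERATED@@"  # hypothetical marker, invented for this sketch


def handle(environ, now=None):
    """Return a CGI response for the file Apache asked us to process."""
    path = environ.get("PATH_TRANSLATED", "")
    if not path or not os.path.isfile(path):
        return ("Status: 404 Not Found\n"
                "Content-Type: text/plain\n\nNot found\n")
    with open(path, encoding="utf-8") as source:
        text = source.read()
    # Replace the marker with a UTC timestamp (current time if now is None).
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    return "Content-Type: text/html\n\n" + text.replace(PLACEHOLDER, stamp)


if __name__ == "__main__":
    sys.stdout.write(handle(os.environ))
```

Whatever processing the handler performs, the important part is the last line of handle: the output still begins with a Content-Type header and a blank line, so Apache receives a valid response.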
Configuring File Extensions to Be Processed by CGI

The first argument to Action doesn't have to be a MIME type. You can also have Apache call a CGI script when a file with a given extension is seen. However, instead of using the built-in handler cgi-script supplied by mod_cgi or mod_cgid, you can define your own handler with Action and relate an extension to it using AddHandler as before:

Action my-handler /cgi-bin/myhandler.cgi
AddHandler my-handler .myextension
The choice of handler name is arbitrary. You define the meaning of it with the Action directive, after which AddHandler is free to use it as a handler name alongside Apache's built-in handlers.
Configuring User-Defined Media Types to Be Processed by CGI

In parallel with defining CGI scripts by media type, which I covered previously, you can also configure a CGI script to be triggered on a file extension by defining your own media type. This is no different from configuring an existing media type to be processed by CGI, except that you have to define the MIME type for the media first:

AddType text/my-parsed-file .myextension
Action text/my-parsed-file /cgi-bin/parsemyfiletype.cgi

The MIME type can have almost any name, but it helps for all concerned if you choose something sensible for the first component, for example, image for graphic files rather than text. Custom MIME types that don't relate to documents or images that a generic browser might be expected to have a stab at handling itself should generally have a major part of application and a minor part beginning with x-, for example, application/x-alphacomplex. Alternatively, if the response constitutes text that can at least be read, even if it can't be correctly interpreted, a major part of text is permissible. This affects what browser clients will do when presented with a MIME type they don't know how to handle, rendering it in the window or offering a Save dialog box.

One possible use of this approach is to convert files from formats that browsers can't handle into ones they can, for example, converting foreign image formats into GIFs. This allows you to keep files in their original format but still allow browsers to access them without additional software. Ideally, you'd also add an ability to cache the results of such a conversion because it's wasteful to repeatedly convert static data.

Configuring HTTP Methods to Be Processed by CGI

It's also possible, though considerably less common, to configure Apache to respond to a specific HTTP request method. One example might be for a script designed to handle PUT requests, a method for which Apache normally has no valid response. To give it one, you can use this:
Script PUT /cgi-bin/put.cgi
This would cause put.cgi to be called whenever a client sends the server a PUT command. You could use this to define a special directory for clients to upload files to, rather like an FTP incoming directory. You might also want to use a
Also note that this directive can't be used to catch GET requests unless they have a query string, that is, a normal URL, followed by a ?, followed by a list of keywords or parameters. If you want to force Apache to pass GET requests to a script this way, then you can use mod_rewrite to artificially add a query string if one isn't present:

RewriteRule !\? - [C]
RewriteRule (.*) $1?no_query

The first rule attempts to match a ? anywhere in the requested URL. Because ? has special significance in regular expressions, you escape it with a \. The test is negated by prefixing it with an exclamation, so it only succeeds if a question mark isn't present. It's chained via the [C] flag to the second rule, which only executes if a question mark isn't found. It matches the entire URL and adds a query string containing the (arbitrarily chosen) keyword no_query. You can now write the put.cgi script to handle this special case by checking explicitly for this keyword.

Like many other solutions, this one is a little expensive if you apply it globally. It's a good idea to enclose all of the previous directives within
ISINDEX-Style CGI Scripts and Command Line Arguments

Technically, the CGI protocol isn't the only way that Apache can communicate with a CGI script. HTML supports an element called
http://www.alpha-complex.com/isindex-script.cgi?findme
is converted into this command:
$ /home/sites/alpha-complex.com/cgi-bin/isindex.cgi findme
The query string is automatically URL decoded so that when you read it from the command line, you don't need to decode it yourself. However, although this is interesting from a historical point of view, it's not particularly useful because creating a proper HTML form and URL decoding the query string isn't a great deal more effort. It's also a potential security risk because it's notoriously difficult to parse command line arguments safely. Theoretically Apache isn't vulnerable to any security issues surrounding the passing of
CGICommandArgs off
With this directive specified, you can still extract the query string using the QUERY_STRING environment variable, albeit still URL-encoded, so you've lost nothing by removing this feature unless you have a legacy CGI script that requires you use it this way.
Writing and Debugging CGI Scripts

Having configured Apache to run CGI scripts when and where you want it to, you'll now turn to actually implementing a CGI script. As I commented earlier, although CGI scripts are traditionally written in a scripting language, it's perfectly permissible for CGI programs to be written in any language; of the compiled languages, C is probably the most popular for CGI programming. As long as a program obeys the requirements of the HTTP and CGI protocols, it can be a CGI script.

To start with, you'll create a very simple CGI script and then make it progressively more interactive. At the same time, you'll also learn about adding headers to the HTTP response sent by the script to change the browser's behavior and to debug badly behaved scripts. Strictly, this is the CGI response, which passes between the script and the server; HTTP is between the server (Apache) and the client. However, the CGI response must be a valid HTTP response, so the distinction is, for the most part, moot.

A Minimal CGI Script

The first thing a CGI script must send in its output is a Content-Type header with a MIME type, followed by two linefeeds. The following minimal (albeit not terribly useful) CGI script illustrates how a Unix shell script might do it:
#!/bin/sh
echo "Content-Type: text/html"
echo
echo "<html><body><h1>Good Morning Alpha Complex!</h1></body></html>"

Without a Content-Type header, a browser won't know how to handle the rest of the script output and will ignore it. In this case, you're sending a short but functional HTML document, so you send text/html to let the client know an HTML document is coming. Clients detect the end of the header section and the start of the actual content by looking for a blank line. In this example, you use a second echo with no argument to achieve the second linefeed.

The Windows shell, COMMAND.COM, is fairly limited. If you intend to use shell-scripted CGI on Windows, you might want to look into the Cygwin tools, which provide an effective Unix Bash shell environment that will run on Windows. Check out http://cygwin.com/ for freely downloadable versions of these tools.

Alternatively, Perl and Python are both excellent portable languages for writing CGI scripts and are available for Windows and all other significant (and many insignificant) platforms. Some scripting languages, such as PHP when used for CGI, also take care of generating content headers automatically.
The first line of the script tells Apache what interpreter to use to run the script. This follows a standard Unix tradition understood by most Unix shells: if the first line of an executable text file starts with #! (known as a shebang line, from shell hash-bang), then the rest of the line is the path to the interpreter for the file, in this case, the Bourne shell sh.

On other platforms, including Windows, Apache also understands this convention even if the underlying platform doesn't; in Windows, you can choose either to use the interpreter specified in the #! line or, alternatively, to look up an interpreter in the registry based on the file extension of the script file. To use the script's interpreter line, write this:

ScriptInterpreterSource script

Instead, to look up the Windows registry with the script's file extension, write this:

ScriptInterpreterSource registry

This searches for SHELL\OPEN\COMMAND entries in the registry for programs capable of opening files with the given extension. However, this isn't very specific. In Apache 2, to refine the search to only commands that are truly eligible for running CGI scripts, you can instead search the alternate path SHELL\EXECCGI\COMMAND with this:

ScriptInterpreterSource registry-strict

As a security precaution, .cmd and .bat scripts are always run using the program named in COMSPEC (for example, command.com) irrespective of the setting of ScriptInterpreterSource.

Looking up extensions in the registry is only effective when the correlation between extension and language is constant; it'll work fine if your Perl scripts end in .pl and your Python scripts end in .py, but you may have trouble with more generic extensions such as .cgi, which can equally be a binary executable, shell script, or any other kind of executable file. Conversely, refusing to parse the shebang line and using the registry instead makes it a little bit harder for users to install dubious scripts on systems where overall platform security isn't as strong.

Being able to specify an interpreter allows you to use a different scripting language if you want, for example, Perl, which is probably the most popular language for CGI scripting on the Internet. (I don't count Java here because Java is compiled and isn't typically used for CGI programming; servlets are a more complex beast, covered in Chapter 12.) Here's the same minimal CGI script given previously, rewritten in Perl:
#!/usr/bin/perl -Tw
print "Content-Type: text/html\n\n";
print "<html><body><h1>Hello World</h1></body></html>";

Unlike the Unix echo command, Perl must be told to print linefeeds explicitly, which you do using the escape code convention derived from the C language. To produce a linefeed, use \n. You don't actually need the linefeeds that the echo program added to the end of the lines because HTML doesn't give them any meaning.

In this example, I've assumed that Perl lives in /usr/bin; the actual location may vary between platforms. I've also added an argument to the Perl interpreter call on the first line. This is a combination of two flags that all Perl scripts ought to have:
• -w tells Perl to generate warnings, which will appear in the error log.
• -T tells Perl to use taint checking, which is a security device specifically intended for detecting potential loopholes in CGI scripts. I'll mention taint checking again later.
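For comparison, the same minimal response can be sketched in Python, which the chapter names as another popular CGI language. The interpreter path on the first line and the function name minimal_response are assumptions for this example; the HTML body is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal CGI script: one header, a blank line, then the body."""
import sys


def minimal_response():
    """Return the complete response text sent on standard output."""
    header = "Content-Type: text/html\n\n"  # the blank line ends the headers
    body = "<html><body><h1>Hello World</h1></body></html>\n"
    return header + body


if __name__ == "__main__":
    sys.stdout.write(minimal_response())
```

As with Perl, the linefeeds have to be written explicitly; the two \n characters after the header are what separate it from the content.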
It's possible to create quite involved CGI scripts that don't require input from the user, for example, an on-the-fly list of names and addresses extracted from a file. However, for most CGI scripts to be useful, the user needs to be able to pass information to them. HTTP provides two primary ways for CGI scripts to gather information from a client request: the GET method and the POST method. You can extend Apache and your scripts to handle additional methods, including but not limited to PUT, which you saw earlier, and the rarely seen DELETE.
The GET Method
The vast majority of HTTP requests made by clients are GET requests, whether to access static data or to call a CGI script. Additional data such as form fields are transmitted to the CGI script in the URL path by specifying a query string, so called because of the question mark that separates it from the script path. The following is an example of an HTTP request using GET and a query string:
GET /cgi-bin/askname?firstname=John&surname=Smith HTTP/1.1
Host: www.alpha-complex.com

In this example, /cgi-bin/askname is the CGI script. Apache automatically detects the question mark and transfers the remainder of the URL, without the leading question mark, into the environment variable QUERY_STRING. This can then be accessed by the script. In the previous example, QUERY_STRING would be the following:

firstname=John&surname=Smith
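A script then has to split QUERY_STRING into its fields. As a minimal sketch, Python's standard urllib.parse.parse_qs does this; the wrapper function query_fields is a name invented for the example.

```python
from urllib.parse import parse_qs


def query_fields(environ):
    """Return the decoded form fields from QUERY_STRING as a dict of lists."""
    return parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)


fields = query_fields({"QUERY_STRING": "firstname=John&surname=Smith"})
print(fields["firstname"][0], fields["surname"][0])  # prints "John Smith"
```

Each value is a list because a field name may legitimately appear more than once in a query string.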
Because the URL must conform to fairly stringent requirements to guarantee that it can be interpreted correctly, several characters can't validly appear in it. These include many accented characters, the ?, &, and = characters (which have special significance in query strings), forward slashes, periods, and spaces. Obviously, you need to treat these characters specially if you're going to pass them as data and then interpret them correctly in a script.

Before these characters can be sent from the client to the server, they have to be encoded by being replaced by their character codes in hexadecimal in accordance with ISO 8859-1 (Latin-1), with a percent sign placed in front. This process is known as URL encoding. If, in the previous example, Müller is entered as the surname, a client would transmit M%FCller in the URL path to the Web server.

If an HTML form generated the query, then another type of encoding, form escaping, may also take place. This is designed to allow characters with meaning in HTML to be handled safely in forms; for example, it'll turn spaces into plus signs, and you can see this in the resulting queries sent to search engines. Without form escaping, a space is encoded as %20. A literal + sign must therefore be URL encoded to be sent safely.

Most programming languages have libraries available to handle CGI programming, including the decoding of URL-encoded text strings. C programmers can take advantage of libwww, available from the W3C Web site (http://www.w3.org). Perl programmers can take advantage of the immensely capable (and ever-evolving) CGI.pm module, which automatically decodes the query string and also transparently handles GET and POST methods. Tcl programmers can use either the ncgi package included with tcllib or Don Libes's cgi package available at http://expect.nist.gov/cgi.tcl/.
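The encoding rules above can be tried out with Python's standard urllib.parse module. This sketch assumes ISO 8859-1 as in the text (many current clients send UTF-8 instead), and the helper name decode_form_value is invented for the example.

```python
from urllib.parse import quote, unquote, unquote_plus


def decode_form_value(raw, charset="latin-1"):
    """Undo form escaping (+ for space) and URL encoding (%XX) in one step."""
    return unquote_plus(raw, encoding=charset)


# URL encoding: the u-umlaut is byte FC in ISO 8859-1.
print(quote("Müller", encoding="latin-1"))  # M%FCller

# Plain URL decoding leaves a + alone; form decoding turns it into a space.
print(unquote("John+Smith"))                # John+Smith
print(decode_form_value("John+Smith"))      # John Smith

# A literal plus sign survives only if it was sent as %2B.
print(decode_form_value("1%2B1"))           # 1+1
```

The last two calls show why the order of operations matters: plus signs are converted to spaces before the %XX sequences are decoded, so an encoded %2B comes out as a real plus sign.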
The POST Method

The second option for transmitting data to a CGI script is the POST method. In this case, the data is passed to the server in the body of the request. CGI scripts see this data as input to the program on the standard input. For example:

POST /cgi-bin/askname HTTP/1.1
Host: www.alpha-complex.com
Content-Length: 28

firstname=John&surname=Smith
POST requests are required to use the Content-Length or Transfer-Encoding header to define how much data is being sent in the body of the request. CGI scripts can use these headers, which are passed in environment variables, to determine how much data to expect on their standard input. POST bodies aren't subject to URL escaping as GET queries are, but they may still be subject to form escaping.
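Reading the body is then a matter of taking exactly that many bytes from standard input. The Python sketch below assumes the length arrives in the CONTENT_LENGTH environment variable, which is the name the CGI protocol uses for it; the function name read_post_body is invented for the example.

```python
import io
import os
import sys


def read_post_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes of request body from stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH", "0"))
    except ValueError:
        length = 0  # treat a missing or malformed length as "no body"
    return stream.read(length) if length > 0 else b""


if __name__ == "__main__":
    # Under a Web server the body arrives on standard input as raw bytes.
    body = read_post_body(os.environ, sys.stdin.buffer)
    sys.stdout.write("Content-Type: text/plain\n\n")
    sys.stdout.write(f"Received {len(body)} bytes\n")
```

Reading only the advertised number of bytes, rather than reading until end of file, avoids waiting on input that the server has no reason to close.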
Choosing Between GET and POST

The choice of GET or POST is mostly a personal preference. Although they differ substantially in how they're structured and processed, both methods in the end boil down to transferring data from the client to the server. Once it's there, it doesn't much matter how it arrived.

For large quantities of data, POST is advisable because servers aren't obliged to process URLs beyond a length of 256 characters; this limits the amount of data a GET-driven CGI script can receive, especially given the additional burden imposed by URL escaping. This is compounded if the client is using a proxy, which adds the proxy's URL details to the client's originating URL.

Conversely, because GET-driven scripts gather input data from the request URL, it's possible for a user to bookmark the resultant page in a Web browser. This can be incredibly useful, for example, in bookmarking the results of a particular set of criteria passed to a search engine; the user only has to recall the URL to regenerate the search on fresh data.

The data submitted via a POST request isn't generally retained by the browser, and retaining it is in fact explicitly forbidden if the server has sent cache control headers denying its reuse. If a client bookmarks the result so it can return later, the data wouldn't be resubmitted. Worse, the URL would be called up via a GET request, which may not even be a valid form of access.

A well-designed CGI script that doesn't need to allow for large quantities of input should therefore keep the URL as short as possible and use GET so users can bookmark the result, unless of course, you want to actually prevent clients from being able to do this. In that case, POST is your man.

There's no reason why a CGI script shouldn't be able to handle both GET and POST requests. This makes it less prone to malfunction if it receives parameters different from those it expects and also makes it more resilient because you can change how it's called. Smart programmers will use a CGI library, such as CGI.pm for Perl, and have the whole problem dealt with for them.
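A script that accepts both methods can be sketched in a few lines of Python. This is only an illustration under some assumptions: the body of a POST is taken to be ordinary URL-encoded form data, the character set defaults to UTF-8, and the function name request_fields is invented for the example.

```python
import io
from urllib.parse import parse_qs


def request_fields(environ, stream, charset="utf-8"):
    """Return the form fields of a request made with either GET or POST."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the fields are in the body, CONTENT_LENGTH bytes long.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stream.read(length).decode(charset)
    elif method == "GET":
        # GET: the fields are in the query string.
        raw = environ.get("QUERY_STRING", "")
    else:
        raise ValueError(f"unsupported method: {method}")
    return parse_qs(raw, keep_blank_values=True)


get_env = {"REQUEST_METHOD": "GET",
           "QUERY_STRING": "firstname=John&surname=Smith"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "28"}
post_body = io.BytesIO(b"firstname=John&surname=Smith")

# Both calls yield the same fields, whichever way the data arrived.
print(request_fields(get_env, io.BytesIO(b"")))
print(request_fields(post_env, post_body))
```

Once the fields are in a common form, the rest of the script no longer needs to know which method the client used.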
Interactive Scripts: A Simple Form The GET and POST requests to drive a CGI script are generated by the client, usually a browser program. Although a GET request can be generated in response to clicking a link (note that it's perfectly legal to say •• •