
Behavior Research Methods, Instruments, & Computers 1997, 29 (2), 189-193

Operate your own World-Wide Web server

WILLIAM C. SCHMIDT, RON HOFFMAN, and JOHN MACDONALD
Dalhousie University, Halifax, Nova Scotia, Canada

Although many researchers wishing to use the World-Wide Web for academic purposes rely on centralized Web services, they should be aware that it is neither expensive nor difficult to operate their own server. Doing so provides research-related benefits such as complete control over their host name and documents provided, the guaranteed ability to execute common gateway interface and server-side include programs, immediate access to their collected data, and the ability to better control who participates in their experiments. This paper surveys Web-server features likely to be of interest to psychologists and conceptually summarizes their operation and use. The basic steps required to set up a Web server on popular microcomputers are reviewed, and security issues concerning Web-server operation are discussed. An accompanying resource Web page can assist users in setting up their own servers.

Numerous applications of Web technology promise to assist psychologists. A wide range of uses for psychology has already been promoted; these include using the World-Wide Web (WWW) as an instructional aid (Welch & Krantz, 1996) and presenting lab and departmental Web pages (Krantz, 1995). Some of the more attractive uses of the WWW for psychological research involve interaction with remote participants for the purpose of gathering data in experiments conducted via the Web. Work along this line has already been reported for psychoacoustical experiments (Welch & Krantz, 1996) and for survey research (Schmidt, 1997). Other research and teaching applications are sure to follow.

This paper provides researchers who wish to conduct experiments via the Web with a conceptual-level description of the operation of a Web server and familiarizes the reader with features of servers that may be of use in creating well-implemented experiments. Specific Web-server software is not discussed (due to the rapidity with which the details associated with such software change and the variations that exist across computer platforms). However, a Web page has been created (http://or.psychology.dal.ca/~wcs/WebServerLinks.html) to help users get started in finding server software that satisfies their goals and is appropriate for their machines. This resource page also includes rudimentary directions for setting up a server on a number of popular platforms and provides links to other Web-server resource pages.

After the basic concepts underlying the client/server relationship are discussed, a number of software features provided by many Web-server programs are presented. These include the ability to use common gateway interface (CGI) programs, server-side includes (SSI), and image maps. The benefits of operating your own Web server, as opposed to entrusting this task to others, are then examined, and security issues pertaining to server operation are discussed.

THE CLIENT/SERVER RELATIONSHIP

Figure 1 depicts a small portion of the Web. A client is a computer that runs a Web browser, requests information from a Web server, and displays received information to the user according to a predefined convention (hypertext markup language, or HTML).

Within HTML code, links may be found. Such links provide URLs (universal resource locators, specially formatted addresses of documents on the Web) to other Web documents or programs, and following such a link presents the user with the content that it references. Because of the versatility of the URL, any HTML document can reference any document or program on other Web servers anywhere on the Web.

Web-server software runs in the background on the host computer, waiting for document requests to come from various clients. The server responds to such requests by sending HTML-formatted information to the client. The specific details of such requests conform to the hypertext transfer protocol (HTTP) standard. Clients send requests to receive the contents of a file containing HTML (which is then displayed) or the output of a program referenced by the URL (the so-called CGI program, the output of which is displayed).
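The request and response exchange described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the behavior of any particular server product: the function names, the in-memory DOCUMENTS store, and the example.org link are hypothetical, and real HTTP requests and responses carry more headers than are shown here.

```python
# Hypothetical in-memory store standing in for HTML files on the host.
DOCUMENTS = {
    "/doc.html": '<html><body><a href="http://example.org/next.html">Next</a></body></html>',
}


def build_request(path):
    """What a client sends: a request line followed by a blank line."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def handle_request(raw, documents):
    """What a server does: read the request line and answer with HTML."""
    request_line = raw.decode("ascii").split("\r\n", 1)[0]
    parts = request_line.split(" ")
    if len(parts) != 3 or parts[0] != "GET":
        status, body = "400 Bad Request", "<html><body>Bad request</body></html>"
    elif parts[1] in documents:
        status, body = "200 OK", documents[parts[1]]
    else:
        status, body = "404 Not Found", "<html><body>Not found</body></html>"
    payload = body.encode("utf-8")
    # Status line, headers, blank line, then the HTML document itself.
    head = (
        f"HTTP/1.0 {status}\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(payload)}\r\n\r\n"
    )
    return head.encode("ascii") + payload
```

Calling handle_request(build_request("/doc.html"), DOCUMENTS) returns the bytes that a browser would receive and render. A CGI request differs only in that the server would run a program and send back its output instead of a stored file.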

The writing of this paper was supported by NSERC Canada and the Izaak Walton Killam Foundation. A page maintained by the first author containing all of the links to the resources referenced in this paper, as well as topically related links, can be found at http://or.psychology.dal.ca/~wcs/WebServerLinks.html. W. C. Schmidt's address is: Department of Psychology, Dalhousie University, Halifax, NS, Canada B3H 4J1 (e-mail: [email protected]).

Copyright 1997 Psychonomic Society, Inc.

WEB-SERVER FEATURES

Different server programs do not have identical capabilities. For this reason, it is important for experimenters to understand precisely what their server software is capable of doing. Understanding these server features can assist experimenters in overcoming many obstacles to successful data collection by taking appropriate measures in the implementation of their experiments.

Figure 1. The client/server relationship. [Diagram not reproduced. Labels: Computer A, Computer B (doc.html), Computer C, Hypertext Link.]

Common Gateway Interface (CGI) Programs

The CGI program is the most powerful component behind experiments conducted via the Web, and much useful information about such programs has already been reported (see Kieley, 1996). These programs enable one to dynamically use information supplied by the user of the client computer, to transform that information, and to supply specific information back to the user. The capabilities of the CGI are limited only by programmers, their imagination, and the limitations inherent in HTML. Whenever a user clicks on a button, for instance, a CGI program is called to process the information submitted.

Whenever a CGI program is run, privileged information about the HTTP transaction or the environments of both the host and client computers becomes available to the CGI program. Inferences based on such information can help determine actions to take in experimental delivery. Table 1 lists some environmental information available to CGI programs and a few ways that such information could be used by designers of psychology experiments to improve the integrity of the data collected. For a complete list of environment information available to CGI programs, consult http://hoohoo.ncsa.uiuc.edu/cgi/env.html.

To be able to run CGIs at all, the experimenter needs to be using a Web server that is able to execute CGI programs, and must have permission to install CGI programs on the host. Not all servers have CGI capabilities, particularly some free or inexpensive server applications.

Server-Side Includes (SSI)

Just as CGI programs add the ability for a Web server to dynamically deliver information in response to a user's actions (e.g., clicking on a button), SSIs add the ability to execute programs or other directives, but at the time of document delivery and without any actions on the part of the user. SSIs allow the Web server to deliver documents with tailored information to the client. Built-in instructions or specially written programs similar to CGIs are

Table 1
Environment Variables Available to CGI and SSI Programs
(each entry gives the Variable Name, then its Description and Potential Use)

HTTP_REFERER
  Description: the URL of the document referencing the CGI
  Potential use: ensure that only clients using authentic on-host documents are sending data to your CGI

HTTP_USER_AGENT
  Description: the browser the client is using to send the request; contains not only browser name, but version and system the user is running (e.g., Windows version, MacOS CPU)
  Potential use: knowledge of the user's browser can assist in sending properly formatted HTML (e.g., lynx text browser versus graphical browsers)

REMOTE_ADDR
  Description: the IP address of the remote computer hosting the client that is making the request
  Potential use: track who is sending information to your CGI program, or whom your SSI is sending information to

REMOTE_HOST
  Description: if available, the name of the REMOTE_ADDR computer; if unavailable, same as REMOTE_ADDR
  Potential use: track who is sending information to your CGI program, or whom your SSI is sending information to

SERVER_NAME
  Description: the server's hostname
  Potential use: may be used by self-referencing URLs

AUTH_TYPE
  Description: if server supports password protection, then this variable holds protocol
  Potential use: useful to know how to decode user/password information

REMOTE_USER
  Description: if server supports password protection, and the script is protected, this is the username approved
  Potential use: useful to access identity of user from within CGI/SSI

SCRIPT_NAME
  Description: holds virtual path to the script
  Potential use: may be used by self-referencing URLs

PATH_INFO
  Description: URLs may contain additional info beyond the SCRIPT_NAME in the URL request; PATH_INFO holds this raw information
  Potential use: allows extra information to be passed
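A CGI program receives the variables of Table 1 through its process environment. The following Python sketch shows, under stated assumptions, how two of the uses suggested in the table might look in practice. The variable names follow the CGI convention (HTTP_REFERER, HTTP_USER_AGENT, REMOTE_HOST, REMOTE_ADDR); the host name lab.example.edu and all function names are hypothetical, and the browser test is a deliberately crude guess.

```python
import os
from urllib.parse import urlsplit

# Hypothetical host name of the experimenter's own server.
AUTHORIZED_HOST = "lab.example.edu"


def referer_is_authorized(environ, authorized_host=AUTHORIZED_HOST):
    """True only when HTTP_REFERER names a document on the authorized host."""
    referer = environ.get("HTTP_REFERER", "")
    return (urlsplit(referer).hostname or "") == authorized_host.lower()


def is_text_browser(environ):
    """Crude guess: treat any user agent that mentions Lynx as text-only."""
    return "lynx" in environ.get("HTTP_USER_AGENT", "").lower()


def describe_request(environ=os.environ):
    """Collect the decisions an experiment script might log or act on."""
    return {
        "authorized": referer_is_authorized(environ),
        "text_only": is_text_browser(environ),
        # Prefer the host name; fall back to the IP address.
        "client": environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "unknown"),
    }
```

Passing a plain dictionary instead of os.environ makes the functions easy to try outside a Web server. Note that the referring URL and the user agent are both supplied by the client, so a check of this kind screens out casual misuse rather than a determined one.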

executed by the Web server when they are encountered within an HTML document and are expanded inline. That is, before delivering information to the client, the server software filters each document requested, looking for specific keywords that instruct it to carry out SSI tasks. The extra information included can be dynamically computed at the time that the document is delivered, allowing specialized or up-to-date information to be delivered to targeted users.

The SSI Web-server component fits into the server-client relationship on the server side (see the SSI loop of Computer B in Figure 1). The client first makes a document request. This document is fetched and examined by the Web-server software for included instructions to the server. Any instructions found are executed and replaced by the HTML output of the SSI instruction on the server side, all before the client is sent any information. Once the HTML code to be sent to the client is assembled, the server transmits this information. It is unlikely that the client will distinguish between having been sent a dynamic SSI document or a regular static HTML document.

With the use of SSIs, specialized documents can be generated on the fly to satisfy certain goals of the experimenter. For example, suppose that the server administrator is conducting a survey, and wishes to present a differently formatted page to users of text-based versus graphics-based browsers. SSI programs have access to all of the environment information that CGI programs do (see Table 1), as well as to extra pieces of information about the client's request (see Table 2). Because of this environment information, an SSI program can determine the type of browser being used by the client (by examining the USER_AGENT environment variable; see Table 1) and select different HTML content to send, depending upon the browser in use. This method allows for the graphical delivery of information despite inconsistencies that different browsers (and sometimes different versions of the same browser) have in the presentation of certain types of graphic information. As another example, suppose that an experimenter wished to ask international users a set of questions that differed slightly from those asked of domestic users. Again, an appropriately written SSI program could determine the user's geographical locale by using the client's host name or IP address, and could then deliver the appropriate material. Other uses for SSIs include randomly selecting images, sounds, links, or questions to present, and updating internal hit counters.

Most SSI-capable servers provide facilities for a number of functions to Web administrators without programming, such as including the contents of files in the document being distributed to the client (i.e., embedding a file within another), the date or time that the client's request was answered, a file's last modification date, or any of the environment information listed in Table 2.

Not all Web servers support SSIs, so this is a feature to consider when selecting Web-server software. Keep in mind that there is a small time overhead in the delivery of SSI documents, because they must be interpreted by the server, and actions taken, prior to sending the client code. For this reason SSIs are not usually enabled by default; to be enabled, they generally require some access to the Web-server program. However, the time delay is negligible and unlikely to be noticeable in the low traffic that psychology experiments would produce.

Table 2
Additional Environment Variables Available to SSI Programs
(each entry gives the Variable Name, then its Description)

DOCUMENT_NAME
  Description: the html document name

DOCUMENT_URI
  Description: the path from server root to this document (such as /docs/tutorials/foo.shtml)

DATE_LOCAL
  Description: the current date, host's time zone

DATE_GMT
  Description: same as DATE_LOCAL but in Greenwich mean time

LAST_MODIFIED
  Description: the last modification date of the current document

Image Maps

Another feature that some Web-server software provides that may be of interest to psychology experimenters is the image map. Readers with graphical browsing experience on the Web are likely to be familiar with image maps: pictures that have different URLs associated with different locations or objects in the picture. By clicking on the object, one is transferred to the site of a designated URL associated with that object.

Image-map capabilities can be of use to psychology experimenters in a number of ways. Collecting the coordinates of the mouse cursor when the user clicks along a picture of a scale (categorical or continuous) can be implemented by way of an image map. Similarly, one could use image maps to collect categorical responses among a set of provided options.

There are two types of image maps: server side and client side. Client-side image maps do not involve the Web-server program, so no special Web-server feature is required. However, client-side image maps do not give the server the opportunity to collect information about the user's click. On the other hand, server-side image-map programs, which are CGIs, process information about the click and make decisions about how to respond. Psychologists may wish to use the server-side method so that they have control over precisely where participants clicked and can record such clicks as data points rather than simply have them trigger the presentation of new Web pages.

Password Protection and Domain Restriction

For some experiments conducted via the Web, it may be desirable to restrict document access to authorized or desired participants. Some Web servers have built-in password protection (or user-authentication) facilities that can fulfill precisely this demand. Servers with password-based document access require that the administrator create a user and password list for document directories that are to be accessed only by authorized users. For Web servers that have this feature and are run on microcomputers, utility programs are often supplied to assist in easily adding or removing users and/or changing passwords.
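The user and password list described above can be pictured as a small data structure with operations for adding, removing, and checking users. The Python sketch below is an illustration under stated assumptions, not the file format or utility of any particular server: the class PasswordList and its method names are hypothetical, and storing a salted PBKDF2 hash in place of the plain password is a design choice made for this example rather than something the servers discussed here are known to do.

```python
import hashlib
import hmac
import os


class PasswordList:
    """In-memory user list that stores a salted hash, never the password."""

    def __init__(self):
        self._users = {}  # user name -> (salt, derived key)

    @staticmethod
    def _derive(password, salt):
        # Standard-library key derivation; the iteration count is an assumption.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def add_user(self, name, password):
        """Add a user, or change the password of an existing one."""
        salt = os.urandom(16)
        self._users[name] = (salt, self._derive(password, salt))

    def remove_user(self, name):
        """Remove a user; unknown names are ignored."""
        self._users.pop(name, None)

    def is_authorized(self, name, password):
        """Compare the submitted password with the stored entry."""
        if name not in self._users:
            return False
        salt, expected = self._users[name]
        return hmac.compare_digest(expected, self._derive(password, salt))
```

An administrator's utility would call add_user and remove_user, and the server would call is_authorized with the name and password that the client submits for each protected directory.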

Whenever a client attempts to access a password-protected Web page, the user is asked for a user name and password, which are compared with the server's user and password list. The requested documents are delivered only to registered users with correct passwords.

Independent of password protection capabilities, some Web servers can deny clients access on the basis of their network addresses. This feature is known as domain restriction, and it could potentially assist psychology experimenters by allowing them to restrict the population that they access on the basis of the potential client's geographical information. For instance, suppose that one were restricting one's survey to members of a specific university. In this case, the experimenter could arrange the server so that it would not deliver the survey to clients whose domain of origin did not match what was desired. If the survey were to be restricted to Stanford University, then only clients from computers whose domain ended in .stanford.edu could be serviced. It should be noted that this method is rather coarse: users with telnet access to Stanford systems could participate, and Stanford University community members attempting to access the server from other domains would be rejected. Nonetheless, domain restriction provides an easy way to better control the population that contributes data to a Web research project.

BENEFITS OF OPERATING YOUR OWN SERVER

Many of the features discussed above (CGI and SSI program execution, domain restriction, and password protection) require that the Web-server program be appropriately configured. Access to accomplish this is not available to the casual user on multiuser systems such as those that are commonly supplied for free at many universities or companies, or for a fee by internet-service providers to smaller research institutions or individuals. For instance, on many multiuser systems, allowing any user to run CGIs or to reconfigure the server software in any way they wish is viewed as a security risk, because the user must be given access to a wider range of the system's resources than they would normally be allowed. Furthermore, reconfiguring the server changes its operation for other users. Such increased privileges jeopardize the integrity of the server system. Running CGIs on such systems requires placing the executable CGI program in an authorized directory (CGI-bin by default) and making a reference to it as appropriate from an HTML document. The CGI-bin directory is commonly located in an area where common users cannot gain access, and installing CGI programs or reconfiguring any aspect of the server is under the system administrator's control.

Requiring that a system administrator act as an intermediary between you, your experiment, and your data is a potentially awkward undertaking. This arrangement is particularly limiting during the early development of the data-collection process, which necessarily involves repeated testing and modification of CGI and SSI programs, or reconfiguring and testing the server on line. It is not technically difficult or expensive to operate one's own Web server, thereby excluding the intermediary and exerting complete control over one's experiments. All that is required to operate your own server is a popular microcomputer (i.e., a Macintosh or PC-compatible system) configured for internet access (see Note 1) and running appropriately chosen server software (which is often freely available).

Assuming that researchers are going to use CGI programs to collect data, or that they will take advantage of server features that can assist them in confidently collecting data through the Web, the benefits of operating their own Web servers are obvious. They will not have to rely on others to configure their experiments precisely as they want them, and they will have instant access to the ability to track the experiments under way. Operating their own servers saves researchers time and frustration, particularly during development. It can also enable researchers to attempt broader and more complex methods for collecting data over the WWW medium and potentially save them some money.

THE BASICS OF INSTALLING A SERVER ON A MICROCOMPUTER

An important first step in configuring your Web server is to find a server application that will fit your needs. Consult the earlier discussion on running CGI programs, SSIs, and image map use, as well as domain restriction and password protection, and determine what options you need in a server. Next, consult the Web page accompanying this paper (http://or.psychology.dal.ca/~wcs/WebServerLinks.html) for helpful links to resources that are available for comparing various server products. Acquire the server software for your platform, and follow the detailed instructions that should be provided with it.

The installation of server software on Macintosh and PC-compatible servers for Windows is quite straightforward. Recall that server applications are programs that await incoming requests from clients. On these microcomputer platforms, server programs need simply to be running in order to activate your Web site, and they generally do not interfere with the ability to run other applications. Web software is commonly distributed in installation packages that simply need to be executed, and the installer will set up everything the server needs to operate (except your HTML documents, the custom CGIs that you wish to use, and any resources that CGI or SSI programs may need). Configuring your server to do things like SSIs, image maps, password authentication, and domain restriction will require reading some documentation that provides details for your server package.

SECURITY ISSUES PERTAINING TO SERVER OPERATION

Web security is a burgeoning field. Of particular worry are security loopholes in servers housed on large multiuser systems. Small microcomputer servers are generally the most secure, because they do not have extensive schemes for protecting resources from multiple users, and the server software is not dependent on many shared system resources.

Security concerns on single-user microcomputer servers are primarily related to the execution of insecurely written CGI programs. Because such programs are flexible in their operation and can immediately access all of the system's resources, loopholes in their writing can sometimes lead to penetration by knowledgeable outside sources. One particular problem with many Windows single-user systems is that they require that the scripting language of any CGI scripts be located in the CGI directory. This directory is accessible to clients anywhere on the Web. If the scripting language accepts directives or entire programs via the command line, any Web user can submit a program for execution on the targeted server! To escape this problem, ensure that all scripting programs accessible as CGIs do not accept command-line instructions. For even more security, obtain a server that allows script languages and the CGI scripts that use them to be located in different places, and allow only CGI programs to exist in CGI-accessible directories.

Another potential problem, from a data-integrity viewpoint, is that information can be sent to a CGI program on your server from any client on the Web, regardless of where the HTML document submitting that information was served from. Figure 1 depicts the situation in which a document on Computer B (doc.html) sends information to a CGI program to be executed on Computer C, should the user of Computer A (the client) activate this link. As the figure depicts, the CGI program and the document responsible for referencing it do not need to be accessed from the same Web server. Hence, any Web user could create an HTML document that sends data to your CGI. Measures can be taken when writing CGI programs to ensure that the data they receive originated from authorized hosts (see environment variable HTTP_REFERER in Table 1).

More information dealing with WWW security, including a list of known loopholes in various Web-server applications, can be found at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html.

CONCLUSIONS

This paper has surveyed a number of Web-server software features (CGI and SSI execution, image maps, password protection, and domain restriction) that can assist psychology researchers in the collection of data and give them better control over delivering their experiments via the WWW. Experimenters are urged to consider operating their own server so that they can have immediate access to configuring all facets of the server software to further their experimental goals.

REFERENCES

KIELEY, J. M. (1996). CGI scripts: Gateways to World-Wide Web power. Behavior Research Methods, Instruments, & Computers, 28, 165-169.
KRANTZ, J. H. (1995). Linked Gopher and World-Wide Web services for the American Psychological Society and Hanover College Psychology Department. Behavior Research Methods, Instruments, & Computers, 27, 193-197.
SCHMIDT, W. C. (1997). World-Wide Web survey research: Benefits, potential problems, and solutions. Behavior Research Methods, Instruments, & Computers, 29, 274-279.
WELCH, N., & KRANTZ, J. H. (1996). The World-Wide Web as a medium for psychoacoustical demonstrations and experiments: Experience and results. Behavior Research Methods, Instruments, & Computers, 28, 192-196.

NOTE

1. Even if a researcher does not have these resources, a user account with about 6 MB of disk space on almost any internet-capable platform can suffice. Although this is a technically more involved method of setting up a server, it is inexpensive. See the resource page associated with this article (http://or.psychology.dal.ca/~wcs/WebServerLinks.html) for an introduction toward getting such a server up and running.

(Manuscript received September 23, 1996; revision accepted for publication January 9, 1997.)