Becoming an information provider on the World Wide Web: creating quality information sources

by Bruce Hunter, BA.

A Master's Dissertation, submitted in partial fulfilment of the requirements for the award of Master of Science degree of Loughborough University.

September 1997

Supervisor: Alan Poulter, MA, MSc, ALA

Department of Information and Library Studies

© B.W. Hunter, 1997

Abstract

The World Wide Web has great potential as an information source, but only if Web sites are suitably maintained. This dissertation aims to discover how it may be possible to aid information retrieval on the Web through appropriate design of Web sites and pages, and by using various tools which are available. The possibility of one tool in each category being preferable is considered, but the conclusion reached is that the choice depends on personal preference and expertise.

The World Wide Web is briefly discussed, describing its development from the rest of the Internet, and explaining important concepts.

The possible structures for Web sites are explained, and hierarchical structures are dealt with in more detail. The problems of navigating in a hypertext environment are mentioned, and solutions to these problems suggested. Next, the design of Web pages is considered, with emphasis placed on how page layout and typography can improve a page's usability, and methods for using graphics within documents to enhance information retrieval are discussed. Style sheets are a recent development which gives greater control over the appearance of Web pages, and the potential benefits of these are explored.

A selection of tools designed to help with the provision of information via the Web are examined. Web servers are an essential element in the transfer of documents, and their main functions are discussed. Four examples of servers are compared to determine if there are any major differences. HTML editors and image map creators simplify the creation of Web pages, and this study looks at how they can be used to design pages that fulfil the criteria from the previous section. Finally, tools to produce interactive and dynamic sites are examined. These include the Common Gateway Interface, Java, and JavaScript, all of which can help those using a Web site to locate specific information.

Acknowledgements

I would like to thank my supervisor, Alan Poulter, for his help during the writing of this dissertation. I would also like to thank my parents for their invaluable support and advice in the latter stages.

Contents

List of Tables

List of Figures

Chapter 1 - Introduction and Methodology

1.1 Introduction

1.2 Methodology

Chapter 2 - Introduction to the World Wide Web

2.1 The history of the Internet

2.2 How the World Wide Web works
2.2.1 The Hypertext Transfer Protocol
2.2.2 MIME types
2.2.3 Hypertext Markup Language

Chapter 3 - Designing a Web Site

3.1 Web Site Structure and Navigation
3.1.1 Structure
3.1.2 Hierarchies
3.1.3 Navigation

3.2 Page Design
3.2.1 Page Length
3.2.2 Identifying the Page
3.2.3 Page Layout and Typography

3.3 Graphics
3.3.1 Graphic File Formats
3.3.2 Graphics in Web Pages

3.4 Style Sheets

3.5 Miscellaneous

Chapter 4 - Tools to improve a Web site

4.1 Servers
4.1.1 Market Share
4.1.2 Features of Web Servers
4.1.3 Reviews of Some Web Servers
4.1.3.1 Apache
4.1.3.2 Microsoft Internet Information Server
4.1.3.3 Netscape Enterprise Server
4.1.3.4 NCSA

4.2 HTML Editors
4.2.1 HoTMetaL Pro 3.0
4.2.2 Luckman's WebEdit 2.0
4.2.3 Microsoft FrontPage 97

4.3 Image Maps
4.3.1 How Image Maps Work
4.3.2 Server-side and Client-side Image Maps
4.3.3 Image Map Creators
4.3.3.1 Web Hotspots
4.3.3.2 Mapedit
4.3.3.3 LiveImage

4.4 The Common Gateway Interface
4.4.1 Processes/Mechanisms
4.4.2 How Data is Sent
4.4.3 HTML Forms
4.4.4 Designing Applications
4.4.5 Programming Languages
4.4.6 Server Side Includes

4.5 Java & JavaScript
4.5.1 Java
4.5.1.1 The History of Java
4.5.1.2 How Java Works
4.5.1.3 Uses for Java
4.5.2 JavaScript
4.5.2.1 How JavaScript Works
4.5.2.2 Uses for JavaScript

Chapter 5 - Conclusions

5.1 Web Site Design

5.2 How Tools can Help
5.2.1 Web Servers
5.2.2 Editing Tools
5.2.3 Programming Tools

5.3 Final Comments

Bibliography

List of Tables

Table 4.1: Web server developers, market share July 1997

Table 4.2: Web server , market share July 1997

Table 4.3: Common operating systems and CGI programming languages

List of Figures

Figure 3.1: Web pages in a sequence.

Figure 3.2: Web pages in a grid.

Figure 3.3: Web pages in a hierarchy.

Figure 3.4: Web pages in a Web.

Figure 3.5: An example of a navigation bar.

Figure 3.6: Examples of good and bad page layout.

Figure 3.7: A selection of icons.

Figure 4.1: The client/server system.

Figure 4.2: Screen shot of HoTMetaL Pro.

Figure 4.3: Screen shot of Luckman's WebEdit.

Figure 4.4: Screen shot of Microsoft FrontPage.

Figure 4.5: Screen shot of Web Hotspots.

Figure 4.6: Screen shot of Mapedit.

Figure 4.7: HTML form showing checkbox, radio, text, submit, and reset inputs.

Figure 4.8: How a server processes a document containing Server Side Includes.

Figure 4.9: Example of JavaScript hierarchy.

Chapter 1 - Introduction and Methodology

1.1 Introduction

The World Wide Web has been in existence for several years now, and there are many Web sites of varying quality. People who publish information on the Web need many traditional publishing skills, but also have to take into account specific aspects of electronic publishing. This dissertation looks at how information can be made available over the World Wide Web and how this information can be made easily accessible to people more familiar with printed sources, with the aim of enabling them to fully utilise the potential of the Web.

The organisation of Web sites can be, and often should be, different from that of books or journals, and the layout of the pages is also critical for ease of use. To help Web authors and designers, a number of style guides have been written describing how to structure documents, pages, and Web sites. Style sheets are a recent development designed to overcome the limitations imposed by HTML on the appearance of Web pages. These can be used to achieve a consistent presentation amongst related pages and to design specific document layouts.

Web servers, which are essential for supplying documents on request and for executing CGI scripts, differ from each other in the facilities they support and in their ease of use.

Various software packages are available to simplify the task of creating and maintaining Web sites. HTML authoring tools can be used to achieve the required structure and layout and also carry out some site management tasks. Image map creators can be used to design image maps which provide the ability to link to information from pictures, supplementing plain textual descriptions.

The Common Gateway Interface can provide users with interactive searching facilities to help them find the specific information they require and can create documents 'on the fly'. Further improvements can be made to Web pages by using Java and JavaScript (the best supported of the scripting languages) to extend the range of interactive abilities.

A great many books, articles, and Web pages have been written about every aspect of running a Web site, from configuring servers to writing and designing Web pages. These information sources are mainly concerned with the technical aspects of the processes involved, and little specific attention is given to how to improve the quality of information provision. Web page design guides describe how to make pages look good using HTML, which is important, but many of them do not explain how this can be used to ensure that people visiting the site find the information they are looking for. Similarly, style sheets are generally discussed in relation to the appearance of documents without stating why an improvement of appearance is useful. The books and sites dealing with Web servers are largely focused on how to set up and run a server and do not include much advice on how this can be used to benefit the end user. Sources concerned with the programming aspects of the Common Gateway Interface, Java, and JavaScript concentrate more on how to write the scripts than on what can be done to improve information retrieval, with the exception of the database searching capabilities of the Common Gateway Interface.

The aim of this study is to discuss how to create and maintain a high quality information source on the World Wide Web. A well designed Web site should make it simple for people using it to retrieve the information they require (assuming of course that the information is in the site) and ways of ensuring maximum efficiency will be investigated. It is not sufficient simply to publish documents in whatever format they may be. Instead, Web pages should be designed with the specific medium in mind, and should utilise whichever of the many available enhancements are appropriate. The study deals with the quality of the site as a whole and the appearance of individual pages. Maintaining the quality of the information being presented will not be considered.

1.2 Methodology

Before considering what is involved in designing Web sites it is useful to have an understanding of how the World Wide Web operates. There are various processes and protocols in operation, as well as the markup language which is used to present the pages. The first section of the dissertation will examine the principal concepts of the World Wide Web and will offer definitions of some commonly used jargon. The information for this section will be gathered from books and Web pages dealing generally with using the Internet, and running a Web site.

The second section is an investigation of the importance of good design and style for a Web site. The following questions will be considered. How should a site be organised to aid the retrieval of information? How can pages be designed to make the reader feel comfortable with this medium? What typographical effects are possible with the markup used for Web pages? How can graphics be used to simplify information retrieval? Guides will be discussed in relation to improving the provision of information through suitable design. A selection of guides dealing with the design and style of Web sites and pages will be examined to determine the criteria for making a Web site a worthwhile information source and a valid alternative to printed materials.

The final section will examine how various software packages can help improve a Web site. What is required from a Web server today to simplify the process of delivering documents, and what extra features are on offer to increase users' satisfaction? Several different packages will be examined, using reviews and the developers' information, to determine which particular packages and platform may be preferable. Authoring tools and image map creators have been developed to make designing pages simpler. What they offer and the comparative merits of a selection of products will be discussed. Lastly, this section will look at some of the methods for making a site more interactive. The different abilities of the Common Gateway Interface, Java, and JavaScript will be explored to determine in what ways they can improve information provision.

The most up to date information on the constantly evolving software packages and programming languages, as well as on how best to utilise HTML, is, appropriately, to be found on the World Wide Web. Information used in this study was located through the use of the search engines Alta Vista and Yahoo. Many of the most useful sites, however, are not easily found through search engines, but are traced via links from other sites.

Another useful source of current information is the many mailing lists for Web site developers and administrators available by e-mail. These lists are forums where anyone interested in this field can ask questions, inform their peers of matters of interest to them, and discuss relevant issues. Where personal views of the merits, or otherwise, of specific Web servers, HTML editors, or image map creators are required, interviews are not necessary, since the opinions expressed in the messages sent to mailing lists are those of the authors and can be used instead.

Mailing lists dealing with the creation and management of Web sites include:

• Web-support: a UK list for the discussion of topics relating to the World Wide Web, including Web browsers, Web servers, the HTML language, and HTML documents and editors. The list covers Macintosh, DOS and Unix platforms(1).

• Website-info-mgt: a list for the UK HE community to discuss all aspects of managing an institutional web site, maintaining the information, producing web pages, dealing with organisational and management issues, and for sharing of relevant experiences of maintaining a large information system(2).

• Web4Lib Electronic Discussion: an American mailing list for the discussion of issues relating to the creation and management of library-based Web servers and clients(3).

Although these lists contain extensive discussion on issues that are either too technical for or not relevant to this dissertation, some useful information was found.

References

1. web-support. (URL: http://www.mailbase.ac.uk/lists/web-support/). 3 September 1997.

2. website-info-mgt. (URL: http://www.mailbase.ac.uk/lists/website-info-mgt/). 3 September 1997.

3. Web4Lib Electronic Discussion (DL SunSITE). (URL: http://sunsite.berkeley.edu/Web4Lib/). 3 September 1997.

Chapter 2 - Introduction to the World Wide Web

The World Wide Web is part of the Internet, an international network of computers accessible to all with a suitable computer and software. The Web is a hypertext system, where each resource (documents, images, video, audio) can have embedded links to other resources.

2.1 The history of the Internet

The Internet has its origin in 1969, when the American defence department wanted a secure and reliable computer network for US military research(1). ARPANET, as it was called, was developed to be resistant to nuclear attack, and for this reason was spread across America. In 1983 the military network was separated from the remainder, and this remainder was made available to non-military researchers(2). At first only the academic community had access, while commercial organisations used their own proprietary networks which were similar to the Internet, only on a smaller scale.

Since the start of the Internet, various tools have been developed. Among the earliest were e-mail, in 1971, FTP (File Transfer Protocol), and USENET News(3). The first electronic publishing involved collecting files into directories (archives) for people to download using FTP. This system was not very user-friendly as there was no way of finding out what files were available where, so 'Archie' was developed. Using Archie it was possible to search archives for a file with a particular name. This was an improvement, but it was still necessary to know the name of a file, or at least part of it. The next development was 'Gopher', devised at the University of Minnesota(4). This provided a menu of the files in an archive, along with a description of the contents. Menus could also contain links to other sites, so now the Internet could "offer an integrated network of information resources rather than presenting a disjointed set of information repositories"(5).

The World Wide Web is an extension of all the previous elements of the Internet. In 1989, Tim Berners-Lee proposed the Web as a project to help high-energy physicists working around the world to collaborate more easily, and it was intended to unify all the various protocols into one point of access. The initial project was at CERN in Switzerland, and set out to design a system that could be used for linking related information in a manner similar to the association of ideas in the human mind. Tim Berners-Lee also wrote the first server and browser programs, completed by Christmas 1990. These were introduced for internal use at CERN in Spring of 1991, and the software was made publicly available in January 1992(6). In 1993, the US National Center for Supercomputing Applications (NCSA) developed a new type of browser, called 'Mosaic'. Mosaic was a graphical Web browser, initially for Unix machines, that used icons, pop-up menus, and coloured text for links. It was now possible to place images on the page with the text, and there was support for multimedia applications, such as sound and animations. Mosaic was made freely available late in 1993 for Apple Macintosh, Windows-based machines, and X Windows on Unix, and this was when the World Wide Web took off.

2.2 How the World Wide Web works

The World Wide Web operates on a client-server system. Each user has a browser (the client) on which they can display information transmitted by a server. The browser handles the user's requests for documents. First it determines which host machine to connect to, then it fetches the document from the host and displays it on the user's screen. The server, on the other hand, waits for an incoming request and then sends back the required document. The connection between the browser and the server is only open for as long as this process takes, and then the server terminates the connection to free up its resources for other requests. Web browsers can use several different data transfer protocols (for example, HTTP, FTP, Gopher, NNTP (for USENET News), and Telnet) which tell the computers how to talk to each other and how to interpret the data. Whichever protocol is used, a Uniform Resource Locator (URL) is required. A URL uniquely identifies each unit of information, containing the protocol required, the name of the server, and instructions for accessing the information once a connection has been made. When there are in-line images in a page, a separate request is required for the page and each image.
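The three parts of a URL described above can be illustrated with a short Python sketch. It is only an illustration: the function name describe_url and the address www.example.org are assumptions made for the example and do not come from any of the sources discussed.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the three parts described in the text."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # e.g. http, ftp, gopher
        "server": parts.hostname,   # the host machine to connect to
        "path": parts.path or "/",  # how to reach the resource on that host
    }


# Hypothetical address, used purely for illustration.
print(describe_url("http://www.example.org/guides/index.html"))
```

A browser carries out a similar split before it decides which host to contact and which protocol to speak.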

2.2.1 The Hypertext Transfer Protocol

The principal protocol used for transmitting documents on the World Wide Web is the Hypertext Transfer Protocol (HTTP). Hypertext is a system whereby units of information (known as nodes) are linked to other nodes in a non-linear fashion, and it is possible to move between nodes along the links. Hypertext systems have been in existence for many years, but the Web is the largest, and most accessible. When a browser contacts a server it makes an HTTP request, most commonly a GET request. GET asks the server to return to the browser the contents of the document specified. Another common request method is POST, which enables more complex data to be sent to the server.

Following the browser's request, the server responds with a status code. This three-digit code may simply state that the document is on its way, or indicate that there is an error with the request. After the status code come the response headers. There are several fields in the header, most of which are optional, with the Content-Type field, which states the MIME type of the data (see section 2.2.2), being the only compulsory one. Other fields are used to identify the server, today's date and the document's date, and various pieces of information about the document, such as the size of the file. Finally, after the response headers, the server places a blank line followed by the document data.
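The exchange described above can be sketched in Python. This is a simplified illustration rather than a working browser or server: the function names are hypothetical, the response text is invented and written out by hand, and no network connection is made.

```python
def build_get_request(path: str) -> str:
    """Return the text of a minimal HTTP/1.0 GET request for one document."""
    return f"GET {path} HTTP/1.0\r\n\r\n"


def parse_response(raw: str):
    """Split a raw response into status code, header fields, and document data."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the headers
    lines = head.split("\r\n")
    status_code = int(lines[0].split()[1])     # "HTTP/1.0 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# An invented response, assumed for illustration only.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Server: ExampleServer/1.0\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 24\r\n"
    "\r\n"
    "<html>Hello, Web.</html>"
)

status, headers, document = parse_response(SAMPLE_RESPONSE)
print(build_get_request("/index.html"), end="")
print(status, headers["content-type"], len(document))
```

The sketch follows the order given in the text: status code first, then the header fields, then a blank line, then the document itself.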

2.2.2 MIME types

Every document served by a World Wide Web server has a MIME type. MIME stands for Multipurpose Internet Mail Extensions, and is "an extensible system developed for sending multimedia data, such as graphics and videos, over Internet mail"(7). HTTP uses MIME to describe the contents of a document, referring the browser or server to a list of document types specifying how to view or execute the file. MIME types are divided into application, audio, image, message, multipart, text, and video(8). Each of these types then can have one of several subtypes, for example text/plain or text/html. Since MIME is an extensible system, as and when new types are developed they can be added to the list in a browser, and set to use any necessary external applications.
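As a rough illustration of how a file is matched to a MIME type and how the type divides into a main type and a subtype, the following Python sketch uses the standard mimetypes module. The helper names and the file names are assumptions made for the example, and the fallback value is simply a common convention for unknown data.

```python
import mimetypes


def mime_type_for(filename: str) -> str:
    """Guess a MIME type from a file name, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


def split_mime(mime: str):
    """Separate the main type from the subtype, e.g. image/gif -> (image, gif)."""
    main_type, _, subtype = mime.partition("/")
    return main_type, subtype


# Hypothetical file names, used purely for illustration.
for name in ("index.html", "notes.txt", "logo.gif"):
    print(name, split_mime(mime_type_for(name)))
```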

2.2.3 Hypertext Markup Language

The Hypertext Markup Language (HTML) is the language that Web pages are written in. It is an application of the Standard Generalised Markup Language (SGML), "a system for formalizing the structure of documents and enabling documents to be interchanged between different document processing packages"(9). Primarily, HTML is a structuring language. HTML can be used to give formatting instructions to affect the way Web browsers will display a document, create hypertext links between documents or sections of a document, or insert files of many non-textual media into documents.

HTML uses 'tags', with or without various attributes, to tell the browser what to do. There are tags for a hierarchy of six headings, paragraph breaks, line ends, and more complex layout elements such as lists and tables. How the text appears can be controlled, as can text and background colours. Graphics can be placed at any point in a document, and links can be attached to any element on the screen. HTML can create forms for the user to fill in, complete with check boxes, pop-up menus, and buttons. Using frames it is even possible to display several documents on the screen at the same time.
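To make the idea of tags and attributes concrete, the Python sketch below holds a very small page as a string and lists the tags it contains together with their attributes. The page content, the file names in it, and the class name TagLister are all invented for illustration.

```python
from html.parser import HTMLParser

# A small, invented page showing a heading, a paragraph with a link,
# a list, and an in-line image with a text alternative.
PAGE = """<html>
<head><title>Example page</title></head>
<body bgcolor="#FFFFFF">
<h1>Main heading</h1>
<p>A paragraph with a <a href="contents.html">link</a>.</p>
<ul><li>First item</li><li>Second item</li></ul>
<img src="logo.gif" alt="Site logo">
</body>
</html>"""


class TagLister(HTMLParser):
    """Collect every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


lister = TagLister()
lister.feed(PAGE)
for tag, attrs in lister.tags:
    print(tag, attrs)
```

Tags such as h1 and p carry no attributes here, while a, body, and img each rely on an attribute to do their work.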

There are various versions of HTML. The original specifications were retrospectively named version 1.0, and there have since been versions 2.0 and 3.0. Version 3.2 is the latest, but there are also several tags which were created by browser manufacturers in an effort to make their browser do things the others couldn't. Hopefully, in the near future, all the differences will be resolved to produce an undisputed standard for HTML.

References

1. Tseng, Gwyneth, Alan Poulter & Debra Hiom. The library and information professional's guide to the internet, 1996, p. 9.

2. Ibid., p. 9.

3. Ford, Andrew & Tim Dixon. Spinning the Web, 1996, p. 3.

4. Ibid., p. 4.

5. Ibid., p. 5.

6. Stein, Lincoln D. How to set up and maintain a World Wide Web site, 1995, p. 2.

7. Ibid., p. 42.

8. Spainhour, Stephen & Valerie Quercia. Webmaster in a nutshell: a desktop quick reference, 1996, p. 112.

9. Ford, ref. 3, p. 42.

Chapter 3 - Designing a Web Site

Anybody can publish on the World Wide Web. The quality of information and the standard of the presentation are very variable, so it is obviously an advantage for information providers to make their sites stand out as well designed and of good quality. To achieve this, a user-centred design is needed, where the requirements of those people who will use the site are foremost in the planning. It is essential, therefore, to know who the intended audience is, and to determine the specific information needs they have. The Yale Style Manual identifies the three main types of user and their needs as follows(1):

• Web surfers have to be caught as they pass. First, an attractive appearance is needed to entice them in and then a statement of contents to inform them of the site's purpose.

• Novice and occasional users who seek out a site require a clear structure and easy access to overviews of the arrangement of the site.

• Expert and frequent users, people familiar with the WWW and with specific goals in mind, look for detailed menus and structure outlines to help them quickly find the information they require.

Once the make-up of the audience is known, the specific information that the provider wants to supply can be gathered and the structure of the site can be planned. A clear structure helps users navigate around the site, while logical and consistent layout of the individual pages makes scanning for information easier.

3.1 Web Site Structure and Navigation

The structure of a Web site is important not only from the point of view of the user trying to find the information he/she wants, but also for the information provider to be able to arrange the information logically and sensibly. Consequently, a detailed initial structure is needed for a site to develop without problems(2).

3.1.1 Structure

Large bodies of information need to be subdivided. People can only retain a few pieces of information in short-term memory, and "smaller, discrete units of information are more functional and easier to navigate through than undifferentiated units"(3). There are four basic steps in organising information(4):

• Divide it into logical units.

• Establish a hierarchy of importance and generality.

• Use the hierarchy to structure relationships between the units.

• Analyse the functionality and aesthetics of the system, that is, how efficient and easy to use it is, and what the presentation should be like.

Sites can be arranged in various patterns:

• Sequence. This is the simplest way to organise, as it is just a linear sequence of pages. This is a suitable arrangement only if the site, or section, is quite small, as a Web page is not a very convenient format for reading large amounts of linearly arranged information.

Figure 3.1: Web pages in a sequence. From Yale Style Manual-Site structure (http://info.med.yale.edu/caim/manual/SITES/Site_structure.html)

• Grid. Pages are linked in a row and column type arrangement. The individual units must share a highly uniform structure of topics and subtopics, for example, in a historical information source the rows could correspond to specific years, and the columns to aspects of history (e.g. political or agricultural).

Figure 3.2: Web pages in a grid. From Yale Style Manual-Site structure (http://info.med.yale.edu/caim/manual/SITES/Site_structure.html)

• Hierarchy. A hierarchy from general at the top, moving down to more specific information is the best way to organise a body of information in most circumstances. It is easy to understand, as users are familiar with this type of structure. A hierarchy forces a good organisation on the site.

Figure 3.3: Web pages in a hierarchy. From Yale Style Manual-Site design (http://info.med.yale.edu/caim/manual/SITES/Site_design.html)

• Web. Here, pages contain links to most other pages in the site, and there are few restrictions on the paths taken around the information. A web-like structure is not very practical, and is hard for the user to understand(5).

Figure 3.4: Web pages in a Web. From Yale Style Manual-Site structure (http://info.med.yale.edu/caim/manual/SITES/Site_structure.html)

In general, most Web sites will contain elements of some or even all of these structures.

Whichever structure is used, it should be designed to aid scanning and locating information. Information must not be buried in a deep hypertext structure, but should be easily accessible, or else users may not have the patience to find it. The Apple Web Design Guide suggests making all information accessible in three clicks if possible(6), and Sano suggests that the most important, or most accessed information should go in the top two or three layers of a hierarchy(7). The individual documents are best presented in "short, uniformly-organized chunks of information"(8), as users expect to find a specific unit of information on a page with related units.
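One way of checking a planned structure against the three-click suggestion is to count how many links must be followed from the home page to reach each page. The Python sketch below does this with a breadth-first search. It is a minimal sketch under stated assumptions: the site map, the page names, and the function name click_depths are all hypothetical.

```python
from collections import deque


def click_depths(links: dict, home: str) -> dict:
    """Return the fewest clicks needed to reach each page from the home page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:          # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# A hypothetical site map: each page lists the pages it links to.
SITE = {
    "home": ["courses", "staff"],
    "courses": ["msc", "bsc"],
    "msc": ["msc-modules"],
    "msc-modules": ["msc-module-1"],
    "staff": [],
}

depths = click_depths(SITE, "home")
too_deep = sorted(page for page, depth in depths.items() if depth > 3)
print(depths)
print("More than three clicks from home:", too_deep)
```

Pages reported as too deep are candidates for an extra link from a higher-level menu, or for moving up the hierarchy.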

3.1.2 Hierarchies

The structure most commonly found and most useful for the majority of collections of information is a hierarchy. Hierarchies are clear and logical, if well planned, and are easily understood. At each level of a hierarchical subject-tree the choices lead to subsets of the subject, and users can narrow down the field in which they are searching. Clues are needed to indicate to the user what is in each of the branches, and these branches should contain related information. It is important not to have too shallow a hierarchy with too few levels, unless the site is very small, or there may be unrelated information in the same branch, but too deep a hierarchy can lead to frustration at the number of menus to be negotiated. The indexes and menus at each level should be organised according to the same criteria to create a familiar feel for the users. Each menu should have at least four or five links, and up to a dozen is acceptable for a list-based menu, as long as all the choices will fit on one screen of information(9).

The top page of a hierarchy, and the preferred point of entry, is the home page. The home page should be carefully designed, as it gives the first impression of the quality of the site. As well as some form of identification of the site and its purpose, a general menu should be offered. If there are many unconnected sections of information, then secondary home pages can be used for each area, and these should be designed with a similar layout and structure to the main page so that there is a consistent feel to the site.

3.1.3 Navigation

Navigation within hypertext is different to navigation within printed media, where a linear structure is usually followed. In hypertext it is not only possible but desirable to be able to go off at tangents and to follow individual chains of information, taking the person visiting the site to the precise information they want. This freedom, however, can lead to users complaining of being 'lost'. Edwards and Hardman identify three forms of being lost(10):

• Not knowing where to go next.

• Knowing where to go, but not knowing how to get there.

• Not knowing the present location in the overall structure of a document.

In an experiment to determine the best structure for hypertext documents, comparing hierarchical, index only, and a combination of the two, Edwards and Hardman found that a hierarchical structure gave the best cognitive representation of the structure, most "satisfaction of interaction", and the least feeling of being lost. McKnight and Simpson also found that navigation is more efficient when a more accurate cognitive map is formed in the mind of the user(11).

Users will find navigation within a site simplified if they have the use ofa navigational tool bar. This should be consistently placed at the top or bottom of every

Web page (both if it is a long page), and should contain icons or text linking to various pages in the site. Pages to be linked to should include those that users are likely to want to visit, such as the site's home page, the top of the current section, the page immediately above the current one in the hierarchy, and possibly a local search mechanism. There should be no pages where the only way out is by using the browser's 'back' button, as someone unfamiliar with Web browsers could get stuck there. If graphics icons are used, then a text alternative should also be provided for those unable or unwilling to display graphics. The navigation bar is especially useful for people who have entered the site from an external link and may find themselves on any page, as it can give them links to the home page and any contents pages.

Figure 3.5: An example of a navigation bar.

If a document involves a linear sequence of pages, it is important not to refer in the link to the 'previous' and 'next' pages. The previous page for the person reading the document may be anywhere in the World Wide Web. Nor should there be links labelled 'back' and 'forward', as these instructions may be confused with the similarly named controls on the browser. Likewise, within a hierarchy 'up' and 'down' should not be used on their own. In all these cases it is better to describe the destination of the links in absolute terms, stating to what information the link will take the reader.

It is helpful to users of a site to have an idea of the depth of the coverage, so that they can decide if it is appropriate to their needs. Providing some clues as to the size of the site can help avoid false expectations(12). Similarly, when a document is very long, whether it is all on one page or broken up into linked pages, the user should be made aware of this fact. When a very long page is presented there should be internal links to all the main sections from a contents section at the top. There should also be links up to the top from the end of each section to avoid the user having to scroll all the way back.

3.2 Page Design

Pages within a Web site should be designed to enable quick and easy retrieval of information. This requires the correct use of typography and graphic design, as well as careful attention to certain aspects unique to hypertext documents. The Yale Style Manual states that "a careful, systematic approach to page design can simplify navigation, reduce errors, and make it much easier for users to take full advantage of the information and features of your Web site"(13).


3.2.1 Page Length

A major consideration in page design is whether to make a body of information available as a single document, or to break it up into smaller chunks connected by hypertext links. Rees points out that readers are familiar with, and therefore more comfortable with, coherent articles where all the information is on one page(14). Books and journal articles are published this way, and it is easier for the reader to navigate within a page than between pages. He raises the point that, although a long page may take some time to download, having many pages will ultimately lead to slower retrieval of information, as users may have to download and scan through many pages to find what they want. Rees suggests that if the space is available on the server it might be useful to present the information as both a single complete article and as linked pages, thus satisfying the needs of different kinds of user. Isaacs, on the other hand, feels that "large unstructured documents which require continuous scrolling are generally not satisfactory, not only because of the time they take to load, but also because the reader may feel rudderless in a sea of words"(15). In practice the choice of single or multiple pages will depend on the type of document being published and its intended use. Shorter pages are preferable for reading on-line, whereas if the document is more likely to be printed and read later then it should be in a single page(16), unless it is very large, when it should be broken up into chunks of about 30K(17). The intended audience should also be taken into account. If they are largely home computer users with low bandwidth connections then large pages will be very slow to download, and so shorter pages would be more appropriate(18). If, on the other hand, the target audience are using computers with high bandwidth connections in universities, then the downloading time will be much faster.
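The effect of connection speed on a 30K chunk can be estimated with simple arithmetic. The following Python sketch is illustrative only: the line speeds are assumed nominal figures rather than values taken from the sources cited, "30K" is read as 30 kilobytes, and protocol overhead, latency, and modem compression are all ignored.

```python
def download_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Idealised transfer time: payload bits divided by line speed.

    Ignores protocol overhead, latency and modem compression, so real
    downloads will usually differ.
    """
    return size_bytes * 8 / bits_per_second


# Assumed figures, not taken from the sources cited in the text.
PAGE_BYTES = 30 * 1024          # "about 30K", read here as 30 kilobytes
LINE_SPEEDS = {                 # nominal speeds in bits per second
    "14.4k modem": 14_400,
    "28.8k modem": 28_800,
    "10 Mbit/s campus network": 10_000_000,
}

for name, speed in LINE_SPEEDS.items():
    print(f"{name}: {download_seconds(PAGE_BYTES, speed):.2f} s")
```

Under these assumptions a 30K page takes roughly 17 seconds on the slower modem and about 8.5 seconds on the faster one, while on the campus network the delay is negligible, which is consistent with the advice to match page length to the intended audience.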


Navigational pages, such as menus and content pages, should always be kept brief, no more than two screens, or users will have to scroll back and forth to see all the choices.

3.2.2 Identifying the Page

All pages within a Web site should have an HTML title which "accurately summarizes the content of the page"(19). Each title should be unique, and understandable out of context, in order to make the content clear quickly, as it will be used by search engines, and it goes in the bookmarks on users' browsers(20). If the title is ambiguous then some people may not investigate the page, while others will visit and be disappointed at not finding what they want.

Page headings should be clear and informative, since people reading the page may have arrived at the page from anywhere on the Web, and may not know what the site is about. "The title that appears in the header of the browser window should match the HTML page title"(21) to help the reader identify the page on a later occasion, and to avoid confusion. HTML headings should be used, in order, and only as headings, not for sizing the text. The header should also contain information identifying the site, including reference to the organisation or company if appropriate, and should be entirely visible without scrolling on an average monitor(22).
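The requirement that every page has a unique title is easy to check mechanically. The following is a minimal sketch using the HTML parser from Python's standard library; the function names and the sample file names are hypothetical, and the check covers only missing and duplicated titles, not whether a title is understandable out of context.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the first <TITLE> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html_text):
    """Return the page title with runs of white space collapsed."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.title.split())


def title_problems(pages):
    """pages maps a file name to its HTML; returns {problem: [file names]}."""
    by_title = defaultdict(list)
    for name, html_text in pages.items():
        by_title[page_title(html_text)].append(name)
    problems = {}
    for title, names in by_title.items():
        if not title:
            problems["missing title"] = sorted(names)
        elif len(names) > 1:
            problems[f"duplicate title: {title}"] = sorted(names)
    return problems
```

Passing a dictionary of file names and page contents to `title_problems` returns the pages that share a title or have none, which could be run over a whole site before publication.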


3.2.3 Page Layout and Typography

In order to create a consistent site a standard page layout should be determined and used for the whole site. This will make the site more predictable, and users will feel more comfortable when they find each page familiar in style.

[Figure: two sample layouts, labelled "Poor page layout" and "Better page layout".]

Figure 3.6: Examples of good and bad page layout. From Yale Style Manual - Balanced page designs. (http://info.med.yale.edu/caim/manual/PAGES/Balanced_pages.html)

Since the top of the page, the first four inches or so, is all that is visible without scrolling, as many as possible of the key points dealt with in the page should be placed there to ensure they get seen. Where information is in menus it should be positioned near the start of the page, and arranged according to importance, top-to-bottom, left-to-right, and grouped "to make relationships explicit"(23). Lists should be used where possible, as they are a simple way of making the different options clear, and these should be kept short. If a list has more than seven rows, it may help scanning if there is a white space after every fifth row(24). Lastly, "group all minor, illustrative, parenthetic, or footnote links at the end of the document where they are available but not distracting"(25), unless the information can be placed in the article instead(26).

Typography is just as important in Web sites as it is in traditional forms of publishing. "Good typography depends on the visual contrast between one font and another, and the contrast between text blocks and the surrounding empty space"(27). A limited number of fonts should be used within a page, with as few styles for headings and subtitles as possible. If the font or style is constantly changing then it may make the page harder to read. Bold should be used to emphasise text, but only for important words or phrases and never for whole blocks of text, or there is no contrast(28). White space should be placed around headings, paragraphs of text, and graphics in order to separate the various elements and make the page easier on the eye(29). The Yale Style guide suggests using narrow columns of text such as in a newspaper or journal(30). These are easier to read, as the human eye can only span about three inches without the head being moved. Narrow columns can be achieved by using HTML tables, which is the only option for page layout in the current version of HTML, as otherwise the user's browser window size and settings will determine the dimensions. To prevent resizing, cell widths can be defined for the tables with absolute values. Tables can also be used to create margins and larger spaces between text, as well as to position text and graphics more precisely on the page.


Text in a hypertext document should read just as easily as a paper version.

Poor writing undermines confidence in the value of the information presented and poor quality information reflects on the institution from which it is issued(32).

Two features characteristic of Web pages need comment. First, links. When a sentence has a hypertext link in it the structure of the sentence should be unaffected, and the links should be placed on the most relevant word or phrase. The expressions 'click here', or 'select this link for ...' should never be used, as they disrupt the flow of the text. It is also important to provide sufficient context with the link to inform users of what it will give them. Second, colours. The colours used for unvisited and visited links are also important. If these colours are specified, they should be similar in tone to the text colour so as not to stand out too much and distract the reader's attention from the rest of the text(31). Similarly, the colours used for text and background should be chosen to create contrast, but primary colours should be avoided, especially in the background, where subtle shades are preferable. If the colours are too strong or bright, some people may be put off. It should go without saying that the text itself must be well written.

The bottom of each page should present information about the document's origin and the date it was written and published. It is useful to provide author or contact person information and an e-mail address to which comments can be sent, as this is the best method of discovering if the readers are satisfied with the site, or feel things should be altered. Any affiliation to a company or organisation that the author has should be made clear to avoid accusations of bias or a hidden agenda. An unambiguous date format should be used, such as 1st July 1997 rather than 1/7/97, which may be understood as 1st July or 7th January in different countries. Many Web documents are regularly updated, and others are replaced by newer versions. When this happens, the date indicated should be that of the latest version to be published. Finally, the URL of the page should be included, as many people print or save pages to read them later, and this may be the only record of the URL available to them.
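The ambiguity of an all-numeric date can be demonstrated directly. The sketch below, which uses only Python's standard library, reads the same string under the day-first and month-first conventions and then writes each result with the month spelled out; the helper name `unambiguous` is hypothetical, and the output omits the ordinal suffix ("1 July" rather than "1st July") for simplicity.

```python
from datetime import date, datetime

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def unambiguous(d: date) -> str:
    """Write a date with the month spelled out and a four-digit year."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


ambiguous = "1/7/97"
# The same characters parsed under two different national conventions.
day_first = datetime.strptime(ambiguous, "%d/%m/%y").date()
month_first = datetime.strptime(ambiguous, "%m/%d/%y").date()

print(unambiguous(day_first))    # 1 July 1997
print(unambiguous(month_first))  # 7 January 1997
```

A page footer generated with a function of this kind cannot be misread, whichever convention the reader is used to.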

When deciding on the layout of the page, especially if graphics are involved, it is important to know what the current average monitor size and resolution are. Graphics should be sized so that all the necessary details can be seen on a 14-inch screen, as many users do not have access to larger monitors. Currently, most screens are 640x480 pixels(33), although the actual size of the browser window will be smaller than this, as space on the screen is taken up by the tool bars and scroll bar. Many users may also still have old, lower resolution displays. Pages should be designed so that they are not any wider than that, unless what is being displayed requires more width, for example, information in a matrix. It is also necessary to accommodate a variety of browsers, computers, and operating systems. Pages should be tested in as many different browsers as possible, including text-only browsers such as Lynx and older graphical browsers such as Mosaic, and text alternatives put in the HTML for users with text-only browsers, or with the images turned off. Where an image is used as a link, a redundant text link should also be used to enable all users to follow that link(34). Alternatives should be clearly available for any pages that are very graphics dependent or that use frames or JavaScript, since a considerable proportion of users still have slow connections or old browsers that do not support recent developments.


3.3 Graphics

"The function of graphics should be to help convey your information, to enhance content rather than simply make it look pretty"(35). Graphics should be used to reinforce or support information, not detract from it. When a picture, diagram or illustration goes in conjunction with text, it should be placed on the page at the side of the information, or at least below it, so that it is clear that the text and image belong together.

3.3.1 Graphic File Formats

There are currently two graphic file formats supported by most browsers. These are:

• Graphics Interchange Format (GIF). This is limited to 8-bit colour palettes, giving 256 colours.

• Joint Photographic Experts Group (JPEG). This allows 24-bit palettes, giving over 16 million colours, or 'true colour'.

GIFs are supported by all graphic browsers, and are preferable for diagrams, line drawings, and solid colour graphics. No data is lost in compression, so the image can appear exactly as created, although subject to the capabilities of the user's display.

Images can be saved as interlaced GIFs, and when downloaded these appear at first in low resolution building up to full resolution over several passes. While these images do not finish loading any faster than conventional GIFs, the picture is recognisable before it is fully loaded, enabling the user to see more quickly what it is. The GIF format also supports transparent backgrounds, enabling images to be placed on a page with a background colour or image without a rectangular box appearing around the image and obscuring the background.

JPEG images, on the other hand, are better for photographs and other images where full colour is required. The compression ratio can be higher, creating smaller files and faster downloading, but some of the original detail can be lost in compression.

3.3.2 Graphics in Web Pages

Graphics can be used for icons and for illustrations.

Icons should be used with consistency throughout a site. If a certain image is used on one page to represent something, then the same one should be used on every page so that the user knows what to expect. Each icon should be easily distinguishable from all the others, especially if it is to be used for navigational purposes. Icons alone, however, are not enough, since not all users will know what they mean, even if a key is provided. It is necessary to have textual labels, either as part of the icons or beside them, to indicate their function, rather than forcing people to guess their meaning.

Figure 3.7: A selection of icons


Large illustrations are very slow to download, especially over modems using the standard telephone network. Unless the image is important as part of the information and needs to be seen in full detail, it should be kept small to avoid frustrating those with low-bandwidth connections. If a large illustration is essential, it is a good idea to provide a 'thumbnail' image with a link to the full-size picture so that users can choose whether or not to view it, depending on the speed of the connection and their personal interest. It is helpful to give an indication of the size of the file in this instance. "Provide enough information to let users know whether it is worth their time and trouble to download"(36). Wherever possible, images should be reused, as most browsers will cache the file and reload it from there, thus avoiding another long wait for the user.

Web pages can be made to appear to download more quickly by specifying the WIDTH and HEIGHT of images, because then the layout of the page is set before the graphics are downloaded. The text will be displayed first, and the user can start reading while waiting for the images, without the page being rearranged as they do so.

Images should fit into the browser window whenever possible. Currently this means no more than 475 pixels wide (see section 3.2.2)(37).

Some people visiting a site will be using text-only browsers and others will have images turned off to reduce downloading times. To cater for these the ALT attribute should be used for all graphics, including those used as links, to provide information about what is not being displayed.
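The recommendations in this section on ALT text, WIDTH and HEIGHT, and maximum image width lend themselves to an automatic check. The following Python sketch is one possible approach, built on the standard library's HTML parser; the class and function names are hypothetical, and the 475-pixel limit is simply the figure quoted above. It treats an empty ALT as missing, although an author may sometimes leave ALT empty deliberately for purely decorative images.

```python
from html.parser import HTMLParser

MAX_WIDTH = 475  # pixel limit quoted in the text for a 640x480 screen


class ImageChecker(HTMLParser):
    """Records a message for every <IMG> that breaks one of the guidelines."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)      # attribute names arrive in lower case
        src = attributes.get("src", "(no SRC)")
        # An absent or empty ALT is reported (an assumption of this sketch).
        if not attributes.get("alt"):
            self.problems.append(f"{src}: missing ALT text")
        for name in ("width", "height"):
            if name not in attributes:
                self.problems.append(f"{src}: missing {name.upper()}")
        width = attributes.get("width") or ""
        if width.isdigit() and int(width) > MAX_WIDTH:
            self.problems.append(f"{src}: wider than {MAX_WIDTH} pixels")


def check_images(html_text):
    """Return a list of guideline problems found in one page."""
    parser = ImageChecker()
    parser.feed(html_text)
    parser.close()
    return parser.problems
```

An empty list means every image on the page carries ALT text and explicit dimensions and fits the assumed window width.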


3.4 Style Sheets

When the World Wide Web was first developed there was no facility using HTML for selecting fonts or controlling most of the common typographical effects(38). Selection of fonts is now possible, but the browser is relied on to create them and the range of sizes is limited. Advanced layout can only be achieved by using HTML tables, which is a poor solution as the text must be broken up, and therefore is not all kept together in the code. Another problem for Web authors has been that Netscape and Microsoft have both developed their own HTML extensions to improve typography in an effort to outdo each other. Despite all this, "there are no mechanisms for an author to specify how she [sic] wants specific elements rendered, or to control aspects of page layout"(39).

Style sheets are a means of avoiding all these problems. They allow Web authors to separate document structure and content from appearance, with the result that, even if a browser does not support style sheets, it will still be able to display the document in a simple HTML form.

Style sheets are an add-on to HTML and "describe how documents are presented on screens, in print, or perhaps how they are pronounced"(41). They act as a template for the layout and design of each Web page, allowing the author to specify margins, leading, colours, fonts, link styles, alignment, sizes of text and images, and many other features(42). Elements of a style sheet are called definitions, and each definition consists of a listing of properties for such tags as headings and paragraphs, stating size, font, colour, indentation, etc.


Developed from a first draft in May 1995, Cascading Style Sheets, level 1 (CSS1), by Håkon Lie, was accepted by the World Wide Web Consortium in December 1996 as their recommendation for a core specification. Several more specific specifications are now being worked on. At the core of all of them are the following requirements:

A style sheet language for the Web must:

• Allow designers to express common typographical effects

• Allow externally linked, as well as internal and inline style sheets

• Be interoperable across common Web applications

• Support visual, as well as non-visual, output media

• Be extensible to support the requirements of tomorrow's Web design(40)

Style sheets have been used in the typesetting of printed materials for a long time, and CSS contains many of the same elements. The following are just some of the capabilities:

• Font properties can be selected in order to give a distinctive appearance to any part of a document. Not only can the font be changed, but the size and thickness of letters can be altered and features such as bold, italics, and small capitals can be specified. Through such local changes, readability can be enhanced by improving contrast and by using appropriately sized type.

• The appearance of the text as a whole can be controlled. Word spacing and letter spacing can be set, as can vertical and horizontal alignment and the indentation of paragraphs. CSS even provides for actions such as capitalising the first letter of each word.

• Colours can be set for both text and background. The background colour can be changed for specific portions of a document, which could be used for highlighting important sections, or to provide contrast between sections.


• Various elements are available to alter the appearance of sections of text. Some are similar to the HTML tags that render text indented or in a preformatted manner, but with more flexibility. Other elements control the presentation of lists, allowing a greater variety of bullets and numbering styles, as well as text wrapping options.

• Margins for documents and padding for individual elements on a page (sections of text, or images) can be set for the top, bottom, left, and right. Borders in various styles and colours can be placed around page elements, the width of each of the four sides being individually adjustable. Again this could be used to differentiate elements or to highlight important information. Since the width and height of an element, including text, can be dictated, the need to use tables to achieve text layout is eliminated.

There are three methods of implementing style sheets in a document: external, document level, and inline(43).

• External Style Sheets are best if several documents need to have the same style. The style sheet is contained in its own file, separate from the HTML documents, and each document refers to this file.

• Document Level Style Sheets are contained in the same file as the HTML markup, but are not mixed in with the body of the code. Rather, they are located in the head of the document. This has the advantage for the user of no delay while a second file is downloaded.

• Inline Styles involve placing the style attribute in individual tags within the HTML code. This is the appropriate method if an author wants to vary the appearance of a page from one section to the next, drawing the reader's eye to particularly important parts, or distinguishing different elements.

If more than one of the above types of style sheet is present then inline styles take precedence, with document level next and then external last in importance(44).

Similarly, if a reader has his or her own style sheets, or if the browser has defaults set, then the author's style sheet takes priority, followed by the reader's, and finally the browser's default. This is why Cascading Style Sheets were given their name.
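The order of precedence just described can be expressed as a simple lookup. The Python sketch below is a deliberate simplification for illustration: it models only the ordering given in the text (inline, document level, external, then reader, then browser default), and it ignores other parts of the real CSS cascade such as selector specificity and declarations marked as important. The source names and the `resolve` function are hypothetical.

```python
# Sources in the order the text gives them, strongest first.
PRECEDENCE = [
    "author inline",
    "author document level",
    "author external",
    "reader",
    "browser default",
]


def resolve(prop, sheets):
    """Return (value, source) for the first source that sets `prop`.

    `sheets` maps a source name from PRECEDENCE to a dict of
    property -> value. Returns (None, None) if no source sets it.
    """
    for source in PRECEDENCE:
        declarations = sheets.get(source, {})
        if prop in declarations:
            return declarations[prop], source
    return None, None


sheets = {
    "browser default": {"color": "black", "font-size": "12pt"},
    "reader": {"font-size": "18pt"},
    "author external": {"color": "navy", "font-size": "10pt"},
    "author inline": {"color": "red"},
}
print(resolve("color", sheets))      # ('red', 'author inline')
print(resolve("font-size", sheets))  # ('10pt', 'author external')
```

In this example the inline colour overrides the external one, and the author's external font size overrides the reader's preference, exactly as the ordering in the text requires.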

Other CSS-based proposals include:

• CSS Positioning: extends CSS to include control over the positioning of HTML elements at the pixel level.

• Aural Cascading Style Sheets: a set of CSS properties for speech synthesisers and other aural output devices. "Aural CSS allows the attachment of properties controlling such speech factors as pitch, volume, pause control and spatial audio qualities which can greatly enrich the aural presentation of a page"(45).

• Web Printing Extensions: intended to improve the printing of Web pages, including factors such as page breaks.

As style sheets are separate from the HTML code, it is possible to have a single sheet applying to many documents. This creates a simple way to apply a uniform style to all the documents in a Web site or section of a site. Alternatively, different style sheets could be applied to the same document. If an author wants to change the appearance of a document, he or she can simply use a different style sheet, or if the person reading a document has special needs, they can define their own styles, such as larger text for the visually impaired or changes to the voice and intonation on a speech synthesiser(46). Using style sheets, therefore, the information on a Web site can be made available to a wider audience, as well as improving the look of the pages.

Currently not many browsers support style sheets, but new versions of the major browsers are including this support. Style sheets have been under progressive development for about two years now, and the inevitable Netscape and Microsoft versions will, hopefully, be unified and absorbed into the final specifications, ensuring that users of all browsers will be able to see the same documents in the near future.

3.5 Miscellaneous

As with any published work, documents in a Web site should be of a high standard, with well written content. All the documents should be spell-checked and proof read, and the factual content should be accurate and verified. Unlike printed materials, Web pages can be rewritten and updated, and so people expect the information to be current as well. If an old document is left on the site for a particular reason and the information it deals with is out-of-date, then this should be made clear by putting the date of publication on the page.

One of the benefits of the World Wide Web is the way in which hypertext documents can be linked to each other. If external links are provided, these should be checked to ensure the information is accurate and of at least as high a quality as that on the site being prepared. The links should be tested periodically to check that the page the user is directed to is still at the same address.
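Periodic testing of external links can be automated. The following Python sketch is a minimal illustration using the standard library's `urllib`; the function names are hypothetical. It asks each server for the page headers only (a HEAD request), which some servers may refuse, and it follows redirects silently, so a page that has moved but still redirects would not be reported. It says nothing about whether the content at the address is still accurate, which still needs a human reader.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code          # e.g. 404 when the page has gone
    except (urllib.error.URLError, OSError):
        return None                # DNS failure, refused connection, timeout


def broken_links(urls, fetch=fetch_status):
    """Return {url: status} for every link that did not answer 200."""
    report = {}
    for url in urls:
        status = fetch(url)
        if status != 200:
            report[url] = status
    return report
```

The `fetch` parameter allows the network call to be replaced, for instance by a lookup in a table of known results, so that the reporting logic can be tried out without contacting any servers.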


References

1. Lynch, Patrick J. Yale Style Manual - Purpose of your site. (URL: http://info.med.yale.edu/caim/manual/INTRO/Purpose.html), 9 June 1997.

2. Isaacs, Margaret. Guide to good practices for WWW authors - The Web Site. (URL: http://info.mcc.ac.uk/CGU/SIMA/Isaacs/Website.html), 15 June 1997.

3. Lynch, Patrick J. Yale Style Manual - Site Design. (URL: http://info.med.yale.edu/caim/manual/SITES/Site_design.html), 9 June 1997.

4. Ibid.

5. Lynch, Patrick J. Yale Style Manual - Site Structure. (URL: http://info.med.yale.edu/caim/manual/SITES/Site_structure.html), 9 June 1997.

6. Apple web design guide. (URL: http://applenet.apple.com/hi/web/web.html), 2 June 1997.

7. Sano, Darrell. Designing large-scale web sites, 1996, p. 106.

8. Lynch, ref. 3.

9. Ibid.

10. Edwards, Deborah M. & Lynda Hardman. 'Lost in hyperspace': cognitive mapping and navigation in a hypertext environment. In: Ray McAleese (ed.). Hypertext: theory into practice. Oxford: Intellect, 1993, pp. 90-105.

11. McKnight, Cliff & Annette Simpson. Navigation in hypertext: structural cues and mental maps. In: Ray McAleese & Catherine Green (eds.). Hypertext: state of the art. Oxford: Intellect, 1990, pp. 73-83.


12. McKnight, Cliff, Andrew Dillon & John Richardson. A comparison of linear and hypertext formats in information retrieval. In: Ray McAleese & Catherine Green (eds.). Hypertext: state of the art. Oxford: Intellect, 1990, pp. 10-19.

13. Lynch, Patrick J. Yale Style Manual - Page Design. (URL: http://info.med.yale.edu/caim/manual/SITES/Page_design.html), 9 June 1997.

14. Rees, Gareth. Style guide. (URL: http://www.cl.cam.ac.uk/users/gdr11/style-guide.html), 16 May 1997.

15. Isaacs, Margaret. Guide to good practices for WWW authors - The Web Page. (URL: http://info.mcc.ac.uk/CGU/SIMA/Isaacs/Webpage.html), 15 June 1997.

16. Lynch, Patrick J. Yale Style Manual - Page length. (URL: http://info.med.yale.edu/caim/manual/SITES/Page_length.html), 9 June 1997.

17. Degener, Jutta. What is good hypertext writing. (URL: http://kbs.cs.tu-berlin.de/~jutta/writing-html.html), 16 May 1997.

18. Lynch, ref. 16.

19. Levine, Rick. Guide to web style: printing summary. (URL: http://www.sun.com/styleguide/tables/Printing_Version.html), 9 May 1997.

20. Lynch, Patrick J. Yale Style Manual - Editorial style. (URL: http://info.med.yale.edu/caim/manual/SITES/Editorial_style.html), 9 June 1997.

21. Levine, ref. 19.

22. Lynch, Patrick J. Yale Style Manual - Headers footers. (URL: http://info.med.yale.edu/caim/manual/SITES/Headers_footers.html), 9 June 1997.


23. Detweiler, Mark C. & Richard C. Omanson. Ameritech Web interface standards & guidelines. (URL: http://www.ameritech.com/news/testtown/library/standard/web_guidelines/Principles.html), 15 June 1997.

24. Detweiler, Mark C. & Richard C. Omanson. Ameritech Web interface standards & guidelines. (URL: http://www.ameritech.com/news/testtown/library/standard/web_guidelines/Text.html), 15 June 1997.

25. Lynch, ref. 20.

26. Rees, ref. 14.

27. Lynch, Patrick J. Yale Style Manual - Typography. (URL: http://info.med.yale.edu/caim/manual/SITES/Typography.html), 9 June 1997.

28. Apple web design guide. (URL: http://applenet.apple.com/hi/web/web.html), 2 June 1997.

29. Detweiler, Mark C. & Richard C. Omanson. Ameritech Web interface standards & guidelines. (URL: http://www.ameritech.com/news/testtown/library/standard/web_guidelines/Text.html), 15 June 1997.

30. Lynch, Patrick J. Yale Style Manual - Typography. (URL: http://info.med.yale.edu/caim/manual/SITES/Typography.html), 9 June 1997.

31. Lynch, ref. 20.

32. Isaacs, Margaret. Guide to good practices for WWW authors - Content. (URL: http://info.mcc.ac.uk/CGU/SIMA/Isaacs/Content.html), 15 June 1997.


33. Apple web design guide. (URL: http://applenet.apple.com/hi/web/web.html), 2 June 1997.

34. Ibid.

35. Isaacs, ref. 15.

36. Detweiler, Mark C. & Richard C. Omanson. Ameritech Web interface standards & guidelines. (URL: http://www.ameritech.com/news/testtown/library/standard/web_guidelines/Graphics.html), 15 June 1997.

37. Ibid.

38. Packet: Simson Garfinkel - Technology. (URL: http://www.packet.com/packet/garfinkel/97/10/index2a.html), 14 July 1997.

39. Tilton, Eric. Composing good HTML. (URL: http://www.cs.cmu.edu/~tilt/cgh/), 14 July 1997.

40. W3C activity: Style sheets. (URL: http://18.23.0.23/pub/WWW/Style/Activity), 17 April 1997.

41. Web style sheets. (URL: http://www.december.com/cmc/mag/editorial/style.html), 14 July 1997.

42. D.J. Quad's ultimate Style Sheets tutorial. (URL: http://quadzilla.com/stylesheets/), 29 April 1997.

43. Wilson, Brian. Cascading style sheet frequently asked questions. (URL: http://www.blooberry.com/html/style/stylefaq.htm), 14 July 1997.

44. Wilson, Brian. Cascading style sheets - what the cascade means. (URL: http://www.blooberry.com/html/style/cascade.htm), 14 July 1997.


45. Wilson, Brian. Proposed cascading style sheet Extensions. (URL: http://www.blooberry.com/html/style/extensions.htm), 14 July 1997.

46. Packet: Simson Garfinkel - Technology. (URL: http://www.packet.com/packet/garfinkel/97/10/index2a.html), 14 July 1997.


Chapter 4 - Tools to improve a Web site

Writing a simple Web page is a very easy task. The basic HTML tags dealing with the structure of the page are not difficult to master, and the rest is straightforward word processing. This, however, will only produce a bland page with the text and possibly some images. To write a complex page with the layout of the various elements precisely determined is not so simple, and if the HTML is written in a text editor then a great deal of planning, sketching of layouts, and calculating heights and widths in pixels is required. This is made even more time consuming if tables or forms are included in the page. Image maps, with hot spots that can be clicked to activate a hypertext link, can also be created by hand, but this would require exact measurements of the image, and a lot of hard work calculating the co-ordinates, in pixels, of the vertices of each shape, or the centre of a circle.

To simplify the creation of HTML documents various editors and converters can be used. HTML editors today are often like a word processor, with most of the structural tags added automatically. The more complex parts of the process can usually be achieved simply by selecting options from menus and toolbars, just like any other editing software. HTML converters also provide the necessary markup, but in this case a document with another format, for example a word processed document or a spreadsheet, is converted into a form which can be displayed on a Web browser.

Image map creators can be used to simply draw the required shapes onto the image, and then enter the URL of the link. All the co-ordinates and necessary HTML tags are automatically put into a suitable file.
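The kind of output an image map creator produces can be sketched in a few lines. The Python example below is an illustration only: it assumes the client-side form of image map, in which the hot spots are written as AREA tags inside a MAP element, whereas a real tool might equally write a server-side map file. The function names, file names, and co-ordinates are all hypothetical.

```python
from html import escape


def area_tag(shape, coords, href, alt):
    """Build one <AREA> tag for a client-side image map.

    shape  -- "rect", "circle" or "poly"
    coords -- rect: (left, top, right, bottom); circle: (cx, cy, radius);
              poly: (x1, y1, x2, y2, ...) listing each vertex in turn
    """
    expected = {"rect": 4, "circle": 3}
    if shape in expected and len(coords) != expected[shape]:
        raise ValueError(f"{shape} needs {expected[shape]} numbers")
    if shape == "poly" and (len(coords) < 6 or len(coords) % 2):
        raise ValueError("poly needs at least three x,y pairs")
    if shape not in ("rect", "circle", "poly"):
        raise ValueError(f"unknown shape: {shape}")
    coord_text = ",".join(str(int(c)) for c in coords)
    return (f'<AREA SHAPE="{shape}" COORDS="{coord_text}" '
            f'HREF="{escape(href)}" ALT="{escape(alt)}">')


def image_map(name, areas):
    """Wrap a list of (shape, coords, href, alt) tuples in a <MAP> element."""
    lines = [f'<MAP NAME="{escape(name)}">']
    lines += [area_tag(*area) for area in areas]
    lines.append("</MAP>")
    return "\n".join(lines)


print(image_map("sitemap", [
    ("rect", (0, 0, 100, 40), "home.html", "Home page"),
    ("circle", (150, 20, 15), "search.html", "Search"),
]))
```

The image itself would then refer to the map by name, for example with `USEMAP="#sitemap"` on the IMG tag. Each AREA carries ALT text so that the links remain usable when images are not displayed.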


HTML alone can only provide static documents, with no facility for interactive features such as searching other Web pages, or databases. The Common Gateway Interface (CGI) is a mechanism whereby many interactive tasks can be performed, including searches. This requires the use of executable programs on the computer running the server, and a means for the browser to communicate requests to the server to execute the correct program in the required manner. Other, still evolving means of producing interactive pages include Java and JavaScript. These are programming languages which are executed within the browser, Java programs being sent as separate files from the HTML document, and JavaScript embedded in the document.
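The exchange between server and CGI program can be illustrated with a small search script. The Python sketch below shows only the general convention: for a GET request the server passes the query in the QUERY_STRING environment variable, and the program replies with a header block, a blank line, and then the document. The catalogue, the field name `q`, and the function name are hypothetical stand-ins for a real database and form.

```python
from html import escape
from urllib.parse import parse_qs

# Hypothetical catalogue standing in for a real database.
CATALOGUE = {
    "opening hours": "hours.html",
    "staff directory": "staff.html",
    "style guide": "style.html",
}


def search_response(environ):
    """Build the text a CGI search script would write to standard output.

    `environ` is the process environment; for a GET request the server
    places everything after the '?' of the URL in QUERY_STRING.
    """
    query = parse_qs(environ.get("QUERY_STRING", ""))
    term = query.get("q", [""])[0].strip().lower()
    hits = [(title, url) for title, url in CATALOGUE.items()
            if term and term in title]
    if hits:
        items = "".join(f'<LI><A HREF="{escape(url)}">{escape(title)}</A>\n'
                        for title, url in hits)
    else:
        items = "<LI>No matching pages\n"
    body = ("<HTML><HEAD><TITLE>Search results</TITLE></HEAD>\n"
            f"<BODY><H1>Results for {escape(term)}</H1>\n"
            f"<UL>\n{items}</UL></BODY></HTML>\n")
    # A CGI reply is a header block, a blank line, then the document.
    return "Content-Type: text/html\n\n" + body
```

In a deployed script the result would be written to standard output using the real environment, for example `sys.stdout.write(search_response(os.environ))`. The search term is escaped before being echoed back, so that characters typed by the user cannot be interpreted as markup.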

The tools mentioned above can all be used to create or enhance the content of a Web site, but they cannot by themselves make the documents accessible to other people. For that a Web server is needed. The server is the software which, using the correct protocols, listens for requests from browsers and then returns the requested document, script, file, or any type of data that can be converted into binary code. As the server is the most important piece of software, and indeed the only essential one, it will be considered first.


4.1 Servers

The World Wide Web operates on a client/server system. The client is the individual user at their terminal, requesting documents or files through his or her browser. "The Web server is the software responsible for accepting browser requests, retrieving the specified file (or executing the specified CGI script), and returning its contents (or the script's results)"(1). A server uses a TCP/IP (Transmission Control Protocol/Internet Protocol) connection to receive requests and return the results. Currently this is the only type of connection that is supported by all servers and browsers(2). Servers are often known by the name HTTPd, after a Unix convention of naming daemons by the name of the service followed by the letter 'd'. The service in this case is the Hypertext Transfer Protocol (HTTP).

[Diagram labels: Browser; Server; Request document]

Figure 4.1: The client/server system.
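The request and response cycle of Figure 4.1 can be sketched in a few lines of Python. This is only an illustration built on the standard library's http.server and http.client modules, not a description of how any of the servers reviewed below is implemented. The handler name, the page text, the path /index.html, and the use of a free port on the local machine are all assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical document; a real server would read files from disk.
PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"


class DocumentHandler(BaseHTTPRequestHandler):
    """Illustrative handler: returns the same small page for every GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def fetch_once():
    """Start a server on a free local port, request one document, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        # The "browser" side: open a TCP/IP connection and send an HTTP request.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/index.html")
        response = connection.getresponse()
        status, body = response.status, response.read()
        connection.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_once() plays both roles in turn: the server thread listens for the request, and the client code receives the status code and the bytes of the document.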


4.1.1 Market Share

There are various platforms that server software can be run on. The most common is Unix, but the percentage of servers on Windows NT is steadily increasing, and in July 1997 the figure was 21.1%(3). Other servers run on Windows 95, Windows 3.1, and Macintosh.

In the Netcraft Web Server Survey for July 1997, "A survey of Web Server software usage on Internet connected computers"(4), 1,203,096 sites were polled with an HTTP request for the server name. The top server developers globally were Apache (42.62%), Microsoft (16.9%), Netscape (11.76%), NCSA (5.63%), and O'Reilly (3.24%), as shown in Table 4.1. In the United Kingdom, the top server is thttpd, with 39.33%, against Apache's 33.15%(5), whereas thttpd only manages 2.4% worldwide(6).

Developer     Jun97   Percent    Jul97   Percent   Change
Apache       489695     43.83   512768     42.62    -1.21
Microsoft    186097     16.66   203316     16.90     0.24
Netscape     135387     12.12   141504     11.76    -0.36
NCSA          68278      6.11    67705      5.63    -0.48
O'Reilly      38167      3.42    39038      3.24    -0.18

Table 4.1: Web server developers, market share July 1997


Server                     Jun97   Percent    Jul97   Percent   Change
Apache                    489695     43.83   512768     42.62    -1.21
Microsoft-IIS             167759     15.02   186799     15.53     0.51
NCSA                       68278      6.11    67705      5.63    -0.48
Netscape-Enterprise        50423      4.51    54438      4.52     0.01
Netscape-Communications    36244      3.24    37319      3.10    -0.14
Netscape-Commerce          39359      3.34    37227      3.09    -0.25

Table 4.2: Web server software, market share July 1997

As can be seen from Table 4.1, all of the top developers, apart from NCSA, experienced an increase in the use of their products during the last month, as the World Wide Web expanded. Microsoft, however, is the only producer increasing its market share, while all the others are decreasing. These trends have been continuing for some months.
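The arithmetic behind these observations can be checked with a short Python sketch. The site counts and the July total of 1,203,096 are copied from the survey figures quoted above; the function names are invented for this illustration, and no June total is given in the text, so only the July shares are recomputed.

```python
TOTAL_SITES_JULY_1997 = 1_203_096  # number of sites polled in the July 1997 survey

# Site counts per developer, copied from Table 4.1.
JUNE_COUNTS = {
    "Apache": 489_695,
    "Microsoft": 186_097,
    "Netscape": 135_387,
    "NCSA": 68_278,
    "O'Reilly": 38_167,
}
JULY_COUNTS = {
    "Apache": 512_768,
    "Microsoft": 203_316,
    "Netscape": 141_504,
    "NCSA": 67_705,
    "O'Reilly": 39_038,
}


def market_share(count, total=TOTAL_SITES_JULY_1997):
    """Return a developer's share of all polled sites as a percentage."""
    return 100.0 * count / total


def july_shares():
    """Return {developer: July share rounded to two decimal places}."""
    return {name: round(market_share(count), 2) for name, count in JULY_COUNTS.items()}


def count_change(name):
    """Return the change in the number of sites between June and July."""
    return JULY_COUNTS[name] - JUNE_COUNTS[name]
```

A developer's count can rise while its share falls, because the total number of sites grew faster than that developer's own installed base. Apache, for example, gained over 23,000 sites in the month yet lost 1.21 percentage points of share.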

4.1.2 Features of Web Servers

Web servers today offer far more than simple file delivery. Most packages allow access to databases, and some integrate the server software with the operating system (for example, Microsoft IIS with Windows NT), to give even more flexibility.

A good Web server today should have many features:

Servers should be simple to install and easy to customise. Often the ability to customise will have to be weighed against the ease with which it can be done. Apache is one of the most customisable servers, but changes are made through rewriting the program code, so some programming knowledge is required. On the other hand, Windows-based packages, such as Netscape Enterprise and Microsoft IIS, are simpler to work with, but more restricted in flexibility. Whatever the customising procedures involved, comprehensive documentation should be supplied for everything that may need to be done. This may be printed, or on-line.

Some Web servers can be accessed via Web browsers for remote administration. This is especially useful if the server is a long way from the administrator's office, or if more than one person has access, as all the relevant people can make adjustments to the setup or contents from their own workplace. Site management tools should be intuitive, that is, able to perform many tasks automatically, including the checking and repair of broken links. Content management may also be possible through the server, saving the time and effort of moving pages between server and editor.

Security is increasingly important, especially where remote administration is possible. The amount of sensitive information being transmitted is constantly increasing and at the same time ever greater numbers of people have software capable of accessing a server, and the ability to tamper with it. All Web servers allow the site administrator to install procedures for authentication of the users. A user ID and password may be required before access is given, and some servers restrict access by IP address or host name. Encryption can also be employed, and all Web servers support Secure Sockets Layer (SSL) 2.0. Some, such as Netscape's and Microsoft IIS, implement SSL 3.0, which supports more key-exchange and encryption algorithms than SSL 2.0. "SSL creates a secure, encrypted channel between the server and browser by using certificate authentication"(6).
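Restriction of access by IP address, as mentioned above, amounts to comparing the address of the requesting machine with a list of permitted networks. The following Python sketch shows the idea using the standard ipaddress module. It does not reproduce the mechanism of any particular server, and the permitted networks are hypothetical (192.0.2.0/24 is a range reserved for documentation, standing in here for an organisation's own network).

```python
import ipaddress

# Hypothetical policy: only clients on these networks may reach the restricted area.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),  # stand-in for an internal network
    ipaddress.ip_network("127.0.0.1/32"),  # the server machine itself
]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """Return True if client_ip lies inside one of the allowed networks."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # refuse anything that is not a valid IP address
    return any(address in network for network in allowed)
```

A check of this kind identifies machines rather than people, which is one reason why it is normally combined with a user ID and password.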

As well as plain text and graphics based pages, Web servers can use various programming languages, such as Java and JavaScript, to provide more active pages. CGI programs can also create interactive Web pages and provide search facilities to help navigate around a site or a local database. These topics are discussed in sections 4.5 and 4.4 respectively.

4.1.3 Reviews of Some Web Servers

4.1.3.1 Apache

The Apache Group is an association of server users with no outside sponsors or institutional affiliation who wanted a better product than was available, and set out "to provide a secure, efficient and extensible server which provides HTTP services in sync with the current HTTP standards"(7). The Apache server, developed by the Group from the NCSA 1.3 code early in 1995(8), is a "full-featured, general purpose HTTP server"(9). The software is available free to all users, and comments and suggestions for improvements are welcomed.

Apache is currently the most popular server, accounting for over 40% of those in use in July 1997(10). When version 1.2.0 was released at the start of June 1997, over 91,000 sites moved to the new version in the first month(11). It is easy to use, reliable, highly configurable through the source code (requiring knowledge of Unix), and much liked by its users. Although there is no official technical support, known bugs and fixes are listed at the Apache Web site. There is a mailing list "to inform people of new code releases, bug fixes, security fixes, and general news and information about the Apache server"(12), and questions can also be posted to comp.infosystems.www.servers.unix, where many Apache developers field questions.

Since everything is accessed through the Unix shell, Apache does not have a management interface. It cannot therefore be administered from a remote browser. Whenever it is necessary to reconfigure the server, the whole code must be recompiled, so the server will be down while this takes place. There are some basic site management capabilities, but no utilities to handle content management or creation. A 'Redirect' directive operates to tell the server that a document has been relocated to a different server, and 'Alias' to indicate a new path on the local site.

The public-domain version of Apache has very few security options. Restricted access by IP address is possible, but if further security, such as encryption, is required, then the commercial version, 'Stronghold Apache', must be bought.

4.1.3.2 Microsoft Internet Information Server

Microsoft Internet Information Server (IIS) is a Web server that runs on Windows NT and is integrated into the operating system. IIS is free and is supplied with all purchases of NT Server 4.0, but it will only run on Windows NT. With a market share in July 1997 of 15.53%(13) and rising, IIS is now the second most popular Web server software, and the most common not running on Unix. As well as the WWW service, IIS provides FTP and Gopher services, all of which are controlled by the Internet Service Manager (ISM) application.

IIS is very easy to install, configure, and administer, because everything is done through graphical menus and many of the modules in Windows NT are used, for example, User Manager to maintain users and groups, and the Event Viewer and Performance Monitor to view such things as CGI requests and bytes sent per second. IIS can be remotely administered by running an HTML version of ISM from a browser.

Through plug-ins, IIS supports all of the common programming languages used on the World Wide Web, for instance Java and Perl, and it includes native scripting engines for VBScript and JScript. Active Server Pages (ASP) allows programmers to create compile-free language-independent scripts. ASP requests are all returned as standard HTML. There are also facilities for accessing databases in various formats.

Content management and site management are performed through FrontPage 97. Complex Web pages and Active Server Pages can be built without having to write any HTML code. Index Server 1.1 can be used to "index and search site content and perform advanced searches on document properties"(14), and the NetShow add-in provides audio, video, text, and images for users on low-bandwidth connections, using multicasting and data-streaming techniques(15). Reports on server usage can be produced automatically to presentation quality. These consist of log-file data, and are produced in HTML format, complete with graphs and tables. The data in the reports can be easily customised. "The Internet Server API (ISAPI) is designed for custom server extensions like content indexing, log analysis, data input forms, bulletin boards, database access and third-party applications such as document management, accounting systems, and Web-site creation and management tools"(16).

Security in IIS is based on Windows NT's in-built security features. It is possible to restrict access to a directory or URL by user, group or IP address. Authentication and SSL 3.0 support are also available.

4.1.3.3 Netscape Enterprise Server

Netscape Enterprise Server 3.0, a commercial server running on Windows NT or Unix, was released on the 18th of May 1997(17). Enterprise is based on Netscape FastTrack Server(18), to which a robust Web development platform and site and content management tools have been added. Installation is very simple and self-explanatory, with the installation program prompting for all the information it requires. Almost every feature can be fine tuned to suit individual needs, allowing the site administrator to control "everything from performance tuning to creation of default configuration styles for documents"(19). The management interface, Netscape Server Manager (NSM), which can be run from any browser supporting frames and JavaScript, permits remote administration to be performed on any type of platform.


Enterprise supports all of the standard utilities, such as CGI and JavaScript, and also comes with Netscape API (NSAPI), which offers great scope for site development. A 300-page JavaScript guide is provided(20), giving all authors the capacity to create dynamic pages. The package contains built-in search engines and automatic cataloguing tools. Navigator Gold is included for writing Web pages, and has support for HTTP 1.1. Pages can be changed on the server and then resaved, all with the click of a button. LiveWire provides JavaScript methods for accessing databases which are using any of the common data types.

Access to the server can be limited by user, group, IP address, or host/domain name. SSL 3.0 and encryption are both supported, as is Lightweight Directory Access Protocol (LDAP). Stronger ciphers are available if required. Intelligent agents can be programmed to notify the site administrator or the authors by e-mail whenever a document changes.

For maintaining the Web sites, the administrator has three tools: Site Manager, which provides "tools for tracking all site elements and verifying internal and external hyperlinks"(21); Application Manager, which allows the author to create, edit, debug and execute Web applications using simple tools; and Communicator, for "drag-and-drop, HTTP-based Web publishing"(22) and for updating documents and organising them into directories.


4.1.3.4 NCSA

The NCSA server is one of the older packages, is free, and runs on Unix. Its popularity is waning, however, because Apache has improved on the original code and servers running on Windows NT are increasing their market share. In July 1997 NCSA still accounted for 5.63% of servers in use, but the figure is declining.

The NCSA server supports all of the major functions of the other servers, but has few of the frills. HTTP 1.0 is accepted, and it can handle older browsers using HTTP 0.9(23). CGI capabilities and Imagemap support are also provided. Security is covered by limiting access to the server at the directory level through the use of passwords. It is also possible to restrict access to specific URLs. NCSA features Server Side Includes, "a system under which you can place keywords, directives, and snippets of executable code directly into a hypertext document in order to change its appearance on the fly"(24).

4.2 HTML Editors

In the early days of the World Wide Web, HTML had to be typed out manually using a text editor. This was not only very time consuming, but it was also easy to make mistakes in the markup. Determining matters of layout, such as what size to make the images and where they should go, or the sizes of tables and frames, was very much trial and error: entering the code, saving, viewing on a browser, and then re-editing.


Nowadays there are dozens of HTML editors available, providing a wide range of facilities from support for tables and frames to spell checkers.

At the basic level, an HTML editor should automatically create the basic structural tags, provide line break and paragraph tags, and create links. However, it is when the more advanced features are included that an editor can greatly simplify creating a high quality Web site. Many of the suggestions made in the chapter on style are facilitated by a good editor. It should offer a table designer, a frame designer, and a form designer, and possibly contain an image map editor and support for JavaScript. An editor for ordinary images is also useful, avoiding the need to change applications while manipulating graphics. Some editors contain site management tools so that the structure of the site can be altered while writing or editing the pages. A built-in previewer removes the need to save the file and re-open it in a browser, and may have options for more than one of the popular browsers. Using a previewer with multiple options would help ensure that the pages are equally accessible to all users.

It is important that the HTML mark-up is correct, and there are many different versions of HTML. Currently most browsers should support versions 1.0 and 2.0, and many also support version 3.0. There are, however, various Netscape and Microsoft specific enhancements which are displayed differently or not at all on browsers other than those for which they were developed. An HTML editor should include some form of HTML validation, and preferably one that includes the Netscape and Microsoft extensions. This will help make sure that the pages being written are going to be seen as intended by the majority of browsers. HTML 3.2 is under development, and any new editor should also include the latest innovations. A good editor should also support style sheets, enabling greater control over the design of the pages.

A useful feature of many editors is dialog boxes for links and images. With these, when a link or image is inserted a dialog prompts the entry of the various possible attributes, such as height, width and an alternative for images, or provides a browsing feature to find the file a link points to. Many Web pages will contain material that has already been written and formatted in a normal word processor, and it is convenient to be able to import these into an HTML editor and have all the typographical mark-up added automatically. Finally, if a spell checker is included, then it is possible to ensure each document is free of spelling mistakes without having to transfer it to a standard word processor.

As stated earlier, numerous HTML editors are available, both commercially and for free. This section will now look at three of the more recent ones: HoTMetaL Pro 3.0, Luckman's WebEdit 2.0, and Microsoft FrontPage 97.

4.2.1 HoTMetaL Pro 3.0

HoTMetaL Pro is "a full-featured, powerful, and versatile program. It offers both HTML and WYSIWYG editing, an excellent graphics editor, and the best HTML checking"(26). The interface is a combination of text and graphical, giving the option of editing documents at the mark-up level.


Figure 4.2: Screen shot of HoTMetaL Pro. From Carl Davis's HTML Editor Reviews. (http://www.techsmith.com/community/htmlrev/hotmetal.gif)

HoTMetaL Pro includes a frames editor and graphical editing of forms, and provides easy editing of tables. Tables are inserted and edited in much the same way as with a word processor, making it simple to arrange the layout of information on each page in the most appropriate manner. There is a built-in image editor (MetalWorks) which "allows users to quickly modify graphics including: transparency, image map creation, resizing, and bit plane (number of colors) optimization"(27). Hundreds of JavaScript applets (sections of executable code) are provided, and these can be easily added to a document, or new ones can be written(28).


HoTMetaL Pro comes with site management tools which can provide a detailed tree structure of the site layout. This view of the layout will simplify the process of structuring the site, as all the pages can be saved into the correct directories without the site administrator having to refer to a structural diagram or reorganise the documents and directories afterwards. Broken links and files which have no links to them can be located, ensuring all the information can be accessed(29). Site-wide searching can be performed on all the documents saved, and Style Sheets can be created.
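The two checks mentioned here, locating broken links and locating files to which nothing links, can be illustrated with a simplified Python sketch. It is not the method used by HoTMetaL Pro, whose internals are not described in the sources consulted. The function and class names are invented, the site is represented as a dictionary of file names and HTML text, and only plain relative links within a single directory are considered.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the HREF value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_site(pages, home="index.html"):
    """pages maps file name -> HTML text. Returns (broken_links, orphans).

    External URLs, mail links and in-page fragments are skipped in this sketch.
    """
    broken = []        # (page, target) pairs whose target file does not exist
    linked_to = set()  # pages that at least one other page links to
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            if "://" in href or href.startswith(("#", "mailto:")):
                continue
            target = href.split("#", 1)[0]  # drop any fragment after the file name
            if target in pages:
                if target != name:
                    linked_to.add(target)
            else:
                broken.append((name, target))
    # The home page is the entry point, so it is never reported as an orphan.
    orphans = sorted(set(pages) - linked_to - {home})
    return broken, orphans
```

Given a three-page site in which index.html links to a missing file and nothing links to old.html, check_site reports the missing target as a broken link and old.html as an orphan.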

Documents can be previewed using a selection of browsers to ensure the presentation is suitable for all users. HTML validation is performed both during editing and when a file is imported. The editor can state which level of HTML is being used, and it points out any browser specific tags. One of the drawbacks of HoTMetaL Pro is the automatic rules-checking, which prevents the inclusion of new tags as and when they are developed. This feature can, however, be disabled(30). HoTMetaL Pro also contains a spell checker, a thesaurus, and a search and replace facility.

Word processor files can be imported and converted into HTML files. HoTMetaL Pro accepts over 20 different file formats, including Word and WordPerfect(31). Files can be converted individually or in batches as required. Many sample files and templates are included which simplify the task of creating the desired appearance for a site.


HoTMetaL Pro 4.0, soon to be released, adds several new features, among which are a step by step site maker, support for the latest Web enhancements, and a new improved user interface(32).

4.2.2 Luckman's WebEdit 2.0

Luckman's WebEdit is a text-style editor, although the user can have a split screen with the HTML code on the left and the resulting page displayed on the right. Editing capabilities are provided for forms, frames, and tables. Tables can be imported into the editor in a spreadsheet-type grid, and the editor adds the required HTML tags(33).

A rudimentary image map editor is included which allows the user to visually divide images into different hotspots.

Figure 4.3: Screen shot of Luckman's WebEdit. From Carl Davis's HTML Editor Reviews. (http://www.techsmith.com/community/htmlrev/webedit.gif)


WebEdit contains some basic site management tools, including the ability to manage files, which will help site administrators to maintain the desired structure. A link validation wizard checks that documents' links are still correct, both internal links and those to different servers(34). The validation of external links is useful, as it helps ensure that all the information pointed to from a site is still accessible. There is also a table of contents generator which creates internal links to all the sections on a page, thereby simplifying navigation through a document. A simple quick previewer is built into the editor, so that an instant check can be kept on the appearance of a document. It is a drawback that the previewer does not support all of the tags in use, only including up to HTML 2.0(35). Fortunately, any Windows-based browser can be linked up to WebEdit, several browsers if required, and documents can be tested on these at the click of a button. WebEdit supports all the current versions of HTML up to 3.0, as well as Netscape 2.0 and Explorer 2.0 enhancements. HTML tags are validated against these standards, and a help file on HTML is included which lists each tag and the standard that supports it. Another feature removes all of the tags in a document or section, which enables the user to reformat the document or, if desired, transfer it to a word processor(36).

Luckman's WebEdit makes considerable use of dialog boxes for images, links, figures, forms, and tables(37). The link and in-line image dialogs contain every possible option, including the Netscape specific and Explorer specific ones, and a URL builder(38). Unfortunately attributes can be added to tags only when inserting the tags, and they cannot be modified. This may be a problem if changes are made to a site, although it is still possible to edit the HTML code. Finally, a spell checker is available in several languages, and further languages are being added all the time.


4.2.3 Microsoft FrontPage 97

Microsoft FrontPage is a graphical editor which contains everything needed to start a Web site, including a personal Web server with CGI extensions for searching, collecting feedback, and creating discussion groups(39). There is support for Internet Database Connector, simplifying the task of creating pages to retrieve information from databases, and access to all the packages is integrated into the editor. FrontPage comes with a WYSIWYG frames editor, and it is possible to "graphically manipulate forms and tables inside the displayed page"(40). Graphics can be added to pages by the use of Microsoft Image Composer and Microsoft GIF Animator, and over 2000 sample images are provided(41). Java applets and ActiveX controls can be written, enabling the creation of interactive, dynamic pages.

FrontPage includes comprehensive site management tools, which allow viewing and manipulating of links, directories, and individual files(42). Links can be automatically updated as files are moved or renamed, thus maintaining the required site structure without having to manually edit the code or worry about broken links.

HTML 3.2 is supported, as well as the various Microsoft developments such as Explorer HTML enhancements and VBScript. Files are viewed using the Microsoft Internet Explorer browser. Although they will be best suited to this browser, files can also be previewed in other browsers. As a Microsoft product, FrontPage is biased towards other Microsoft products; for example, it integrates with the various packages in Office and omits some common features such as JScript(43).


Figure 4.4: Screen shot of Microsoft FrontPage. From Carl Davis's HTML Editor Reviews. (http://www.techsmith.com/community/htmlrev/frontpage.gif)

Dialogs are provided for forms, links, and images; the latter two allow browsing for locations. Files can be imported from most Microsoft applications and other common word processors, as can existing HTML documents or directories, even straight from the World Wide Web. As FrontPage integrates with Microsoft Office 97, all the familiar tools, such as spell checking and thesaurus, can be used(44).


4.3 Image Maps

Image maps are images which have been divided into sections, each of which when clicked provides the user with a separate link, or other effect. A common use for the image map is as a navigation bar. Any style of navigation bar can be produced, and only one image needs to be downloaded. An image map can add to the appearance of a Web page and can be useful for providing a graphical set of links to other areas in a Web site rather than using a text menu to list the options. If the image is large, however, the downloading time can increase, and it is important to be aware of users who have text only browsers or who have images turned off. For these reasons it is important always to include a textual menu in addition to any image map.

4.3.1 How Image Maps Work

Three elements are needed for an image map to work. First, an image is required, of any type that browsers will display. Second, a set of map data must be created, either by hand or using an image map creator. Map data consist of the co-ordinates of the regions that are the hot spots, along with the URLs to which the hot spots point. Each region must include a shape descriptor chosen from 'rectangle', 'circle', 'polygon' and 'background'. The 'circle' descriptor also allows ovals, and the 'polygon' descriptor can have any number of vertices. The 'background' descriptor is for any region of the image not covered by a hot spot. Third, once the co-ordinate data have been calculated, an instruction must be placed in the Web page to display the image and inform the browser that there is map data in the document or on the server, depending on which type of image map is being used.

4.3.2 Server-side and Client-side Image Maps

There are two categories of image map, server-side and client-side.

Server-side image maps, which were the first to be developed, store the map data on the server, and the browser simply sends the selected co-ordinates with the HTTP request. A program on the server will then compare these co-ordinates with the map data and determine which link has been requested. The method the server uses to fulfil the request varies, as there are two standards for server-side maps developed by NCSA and CERN(45). The main drawback of server-side maps lies in the fact that the server has to do the processing, and this can be slow if the server is busy.
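The comparison performed by the server-side program can be sketched as follows. This is an illustrative Python example and follows neither the NCSA nor the CERN file format: the map data structure, the URLs, and the function names are assumptions, and for simplicity the 'circle' descriptor is treated as a true circle given by its centre and radius.

```python
import math

# Hypothetical map data: (shape descriptor, URL, co-ordinates in pixels).
MAP_DATA = [
    ("rectangle", "/products.html", [(0, 0), (100, 50)]),        # two opposite corners
    ("circle", "/contact.html", [(160, 25), 20]),                # centre and radius
    ("polygon", "/news.html", [(200, 0), (260, 0), (230, 50)]),  # list of vertices
    ("background", "/index.html", []),                           # anywhere else
]


def point_in_polygon(x, y, vertices):
    """Ray-casting test: count the edges crossed by a horizontal ray from (x, y)."""
    inside = False
    j = len(vertices) - 1
    for i in range(len(vertices)):
        xi, yi = vertices[i]
        xj, yj = vertices[j]
        if (yi > y) != (yj > y):  # this edge spans the height of the point
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def resolve_click(x, y, map_data=MAP_DATA):
    """Return the URL of the first hot spot containing (x, y), else the background URL."""
    background = None
    for shape, url, coords in map_data:
        if shape == "rectangle":
            (x1, y1), (x2, y2) = coords
            if min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2):
                return url
        elif shape == "circle":
            (cx, cy), radius = coords
            if math.hypot(x - cx, y - cy) <= radius:
                return url
        elif shape == "polygon":
            if point_in_polygon(x, y, coords):
                return url
        elif shape == "background":
            background = url
    return background
```

Every click requires a round trip to the server before the link is even known, which is the source of the slowness described above.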

Client-side image-maps were developed as a solution to the slowness of server-side maps. With a client-side map the map data are stored in the HTML code and can be placed anywhere in the document. When a user selects a region of a map, the browser interprets and processes the map data without any interaction with the server. Once the link has been determined, the browser then sends a standard request to the server.
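For a client-side map the same information is written into the page itself, using the MAP and AREA elements and the USEMAP attribute of the image. The Python function below is a hedged sketch of what an image map creator might emit; the function name, file names and co-ordinates are invented for illustration, and the shape keywords follow the HTML 3.2 convention of "rect", "circle" and "poly".

```python
from html import escape


def client_side_map(name, image_src, areas, alt_text="Navigation"):
    """Build the IMG and MAP markup for a client-side image map.

    areas is a list of (shape, coords, url) tuples, where shape is "rect",
    "circle" or "poly" and coords is a flat list of pixel values.
    """
    lines = [
        f'<IMG SRC="{escape(image_src)}" ALT="{escape(alt_text)}" USEMAP="#{escape(name)}">',
        f'<MAP NAME="{escape(name)}">',
    ]
    for shape, coords, url in areas:
        coord_text = ",".join(str(value) for value in coords)
        lines.append(
            f'  <AREA SHAPE="{shape}" COORDS="{coord_text}" HREF="{escape(url)}">'
        )
    lines.append("</MAP>")
    return "\n".join(lines)
```

Because the browser holds the co-ordinates, it can work out the target URL itself and only then contact the server with an ordinary request.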


4.3.3 Image Map Creators

While image map data can be calculated manually, this can be very time consuming, especially for polygons with many vertices. Image map creators are software packages which greatly speed up the process of creating maps. The required image is loaded and the hot spots selected using standard selection tools. Some packages allow free-hand selecting, and then convert the co-ordinates into a polygon. References for the links must then be entered, possibly in answer to a prompt.

4.3.3.1 Web Hotspots

Web Hotspots is a powerful image map editor from Automata(46). It can be purchased on its own or as part of a more complete image editor. All of the basic tools are included, along with a free-form tool which allows the user to create irregularly shaped areas. Once a free-form area has been drawn, the edges can be moved, so perfect mouse control at the initial drawing stage is not necessary. Other useful features include the ability to test the map while editing, including checking external hyperlinks, ensuring there are no broken links(47). Support is given for both server-side and client-side mapping, although only for NCSA server-side maps. LOWSRC images for browsers with slow connections can be saved automatically.


Figure 4.5: Screen shot of Web Hotspots.

4.3.3.2 Mapedit

Mapedit, from Boutell, is "a graphical editor for World Wide Web image maps"(48) which can be used to create client-side and server-side image maps. It supports both the NCSA and CERN standards, and in the latest release all common image file formats can be used. Only limited tools are offered, and the interface is very simple, but Mapedit does everything needed to create image maps(49). Although there is no free-hand selector, it is possible to add points to a polygon hot spot. There is a test feature to check all the hot spots have the correct URLs attached, but no facility to verify the links. A limited amount of JavaScript can be added while creating or editing the map, using onMouseOver and onMouseOut attributes that control the browser's behaviour as the cursor passes over each hot spot.

Figure 4.6: Screen shot of Mapedit

4.3.3.3 LiveImage

LiveImage is a powerful tool for creating image maps. The creator of LiveImage originally produced the freeware image map creator 'Map This', before developing and improving the software for Mediatech(50). LiveImage uses wizards to handle many aspects of the map creation process, including selecting and creating the image and publishing the final HTML document. As well as importing image files, it is possible to create graphics within the editor. A two-frame view is provided, with "the map displayed on the right side and a hierarchical view of all the file's hot spots on the left"(51). All of the standard shapes can be drawn, and using the zoom-in tool it is possible to draw a polygon that is almost free-hand. LiveImage produces server-side and client-side maps, and generates all the necessary code automatically.

4.4 The Common Gateway Interface

Using the Hypertext Markup Language (HTML) alone, only static pages with links can be created. These pages cannot change unless the document is rewritten. Servers on their own can only send the document and tell the browser what form it takes.

This is all very well if the required information is easily found and on a small number of pages. If, however, this is not the case, then the Common Gateway Interface (CGI) can be used to provide interactive elements in a Web site to increase the likelihood of finding what is required. CGI "lets Web servers execute other programs and incorporate their output into the text, graphics, and audio sent to a Web browser. The server and the CGI program work together to enhance and customize the World Wide Web's capabilities"(52).

The Common Gateway Interface started as a means of linking databases to the World Wide Web(53). A program was needed to transmit information to a database, gather the results, and send them back to the client. This is still one of the main uses of CGI, but many other types of programs (scripts) can be written to invoke external programs, or create pages on the fly.

Using CGI scripts, a Web site can become truly interactive, allowing users to perform searches on the contents of the site, linking to external databases, or perhaps accessing constantly changing data, such as share prices or the availability of a product, that would otherwise need a frequently rewritten page.

There are security risks with CGI programs, since outside people are running programs on the computer and sending data to it. For this reason, CGI scripts are usually kept in a special directory, commonly called CGI-BIN, so that access can be restricted, and so that the server knows to execute the programs rather than display them.

4.4.1 Processes/Mechanisms

Most HTTP servers will only serve documents, and are not able to process data that has been input by users(54). This requires a separate program along with a mechanism to deliver the data to this program and back to the client. The mechanism of the CGI process can be summarised as follows:

1. Your browser decodes the first part of the URL and contacts the server.

2. Your browser supplies the remainder of the URL to the server.

3. The server translates the URL into a path and file name.

4. The server realizes that the URL points to a program instead of a static file.

5. The server prepares the environment and launches the script.

6. The script executes and reads the environment variables and STDIN.

7. The script sends the proper MIME headers to STDOUT for the forthcoming content.

8. The script sends the rest of its output to STDOUT and terminates.

9. The server notices that the script has finished and closes the connection to your browser.

10. Your browser displays the output from the script(55).

STDIN (Standard Input) and STDOUT (Standard Output) are used to pass data into and out of the server and script. MIME (Multipurpose Internet Mail Extensions) headers tell the browser what type of content the file contains (text, audio, etc.). Even if no other data is returned by a CGI script, it must return a header message explaining the MIME type.
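The output side of this process (steps 7 and 8) can be illustrated with a minimal sketch. It is written in Python purely for illustration; the function name write_cgi_response and the sample HTML are hypothetical and do not come from any of the tools discussed here. The sketch assumes only that the server passes on whatever the script writes to STDOUT.

```python
import sys


def write_cgi_response(body, content_type="text/html", out=sys.stdout):
    """Write a minimal CGI response: MIME header, blank line, then the body."""
    # The header must come first so that the browser knows the content type.
    out.write("Content-Type: " + content_type + "\r\n")
    # A blank line separates the headers from the content that follows.
    out.write("\r\n")
    out.write(body)


if __name__ == "__main__":
    # Hypothetical page content, used only to show the shape of the output.
    write_cgi_response("<html><body><p>Hello from a CGI script.</p></body></html>\n")
```

Even a script with nothing else to return would still write the header line and the blank line that follows it.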

4.4.2 How Data is Sent

There are two methods of sending data from the client to the server. The first of these is 'GET', where the data is sent in the URL. For example, the URL http://info.lboro.ac.uk/cgi-bin/program?query_string would send the data in query_string to the server at info.lboro.ac.uk where it would be forwarded to the script program(56). The second is 'POST'. In this case, the data is contained in the body of the HTTP request sent by the client to the server. POST is more complex than GET, but allows more complex data to be sent(57).
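The difference between the two methods, as seen by the script, can be sketched as follows. This is an illustrative Python example, not taken from the sources cited: the function name read_form_data is hypothetical, and the sketch assumes the conventional CGI environment variables REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH and URL-encoded form data.

```python
import io
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return the submitted fields as a dict mapping each name to a list of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the data is in the request body, which arrives on STDIN.
        # CONTENT_LENGTH says how much of it there is to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET: the data is the part of the URL after the '?'.
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw)


# The same (made-up) fields sent by each method.
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "paper=Guardian&rating=Good"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "26"}
print(read_form_data(get_env, io.StringIO("")))
print(read_form_data(post_env, io.StringIO("paper=Guardian&rating=Good")))
```

In a real script the two arguments would be os.environ and sys.stdin; they are passed in here only so that the function can be tried without a server.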


4.4.3 HTML Forms

The data that will be sent to the script is usually collected using HTML forms. The <FORM> tag requires two attributes, action and method(58). The action attribute tells the browser the URL of the CGI program to which the data is being sent, and the method attribute states whether GET or POST is being used. The <INPUT> tag allows the actual input of the data, providing eight types of input: text, password, hidden, checkbox, radio, image, submit, and reset(59).

Figure 4.7: HTML form showing checkbox, radio, text, submit, and reset inputs.


4.4.4 Designing Applications

A CGI application has a very simple structure. First the server instructs the program to begin, and the program determines the method of input(60). The script must decode the query string and separate the variables. Next, the script must process the input and return the required output. What this entails will vary depending on what the script does, but it must always create a header containing the content-type. Finally the script must clear up after itself and terminate. If any system resources have been allocated to the application, they must be freed, as each CGI application is a one-off operation.
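The decoding and output stages of this structure can be sketched in a few lines. The example below is a hedged illustration in Python; the names decode_query and run_application are hypothetical, and the "processing" step simply echoes the fields back as plain text. It assumes the input is a URL-encoded query string, in which '&' separates fields, '=' separates a name from its value, '+' stands for a space, and %XX encodes other characters.

```python
from urllib.parse import unquote_plus


def decode_query(query_string):
    """Split a URL-encoded query string into a list of (name, value) pairs."""
    pairs = []
    for field in query_string.split("&"):
        if not field:
            continue  # skip empty segments such as a trailing '&'
        name, _, value = field.partition("=")
        # '+' stands for a space and %XX for an encoded character.
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs


def run_application(query_string):
    """Decode the input, process it, and return the complete CGI output."""
    fields = decode_query(query_string)
    # Processing step (illustrative only): list each variable and its value.
    lines = ["%s = %s" % (name, value) for name, value in fields]
    # The content-type header and a blank line must precede the body.
    return "Content-Type: text/plain\r\n\r\n" + "\n".join(lines) + "\n"


print(run_application("paper=The+Times&comment=Very%20good%21"))
```

A script this small allocates nothing that needs explicit clearing up; one that opened files or database connections would close them before terminating.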

The process of writing CGI scripts is greatly simplified by the use of libraries of code. Many libraries containing whole scripts, modules, and routines in all the commonly used programming languages are publicly available, and these scripts can be used or modified for specific tasks. It is wise to build up a personal library of scripts, especially if many of them contain common elements.

4.4.5 Programming Languages

CGI programs can be written in any language that can be run on the computer that is running the server(61). Table 4.3 indicates which platforms support which commonly used languages(62).


Language       Macintosh   OS/2   VMS   Win-NT   UNIX
AppleScript    ✓           X      X     X        X
UNIX Shell     X           ✓      X     X        ✓
C/C++          ✓           ✓      ✓     ✓        ✓
Visual Basic   X           X      X     ✓        X
Perl           ✓           ✓      ✓     ✓        ✓
TCL            X           X      ✓     ✓        ✓
Java           ✓           X      X     ✓        ✓
JavaScript     ✓           X      X     ✓        ✓
VBScript       X           X      X     ✓        X

Table 4.3: Common operating systems and CGI programming languages

There are three types of scripting language: interpreted languages, compiled languages, and compiled interpreted languages.

Interpreted languages require a separate program called an interpreter to perform the programmed tasks. They are written in ASCII form, and include AppleScript, UNIX Shell Scripts, and Perl.

Compiled languages are more complicated, as once the code has been written it must be processed by a compiler to convert it into binary code that can be executed by the computer. Compiled languages do, however, take up less space and run faster than interpreted languages. Examples of compiled languages are C, C++, and Visual Basic.

Java is an example of a compiled interpreted language, where the interpreter is located on the client's computer as opposed to with the server, and the applications are run there rather than on the server's computer. JavaScript and Visual Basic Script are also run on the client side.

4.4.6 Server Side Includes

Server Side Includes (SSI) are "directives you can place into an HTML document to execute other programs or to output data, such as file statistics or the contents of environment variables"(63). SSI are not really CGI applications, but they can be used as an alternative for simple tasks such as placing the current date or time in a static page. Not all servers support Server Side Includes yet, although this will probably change. Since the server has to parse each document that is encoded with SSI, there will be more work for the server(64), so it is best only to use Server Side Includes for simple, commonly used applications.
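The extra parsing work that SSI places on a server can be pictured with a small sketch. This is an illustrative Python example, not the implementation used by any particular server: the function name expand_includes is hypothetical, and the directive form <!--#echo var="NAME" --> is assumed here in the style of NCSA-type servers rather than taken from the sources cited.

```python
import re
from datetime import datetime

# Matches directives of the assumed form <!--#echo var="NAME" -->
DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_includes(document, variables):
    """Replace each echo directive with the value of the named variable."""
    def substitute(match):
        # Unknown variables become a placeholder rather than raising an error.
        return variables.get(match.group(1), "(none)")
    return DIRECTIVE.sub(substitute, document)


# A static page containing one directive, expanded as a server would before sending it.
page = '<p>Page served on <!--#echo var="DATE_LOCAL" -->.</p>'
print(expand_includes(page, {"DATE_LOCAL": datetime.now().strftime("%d %B %Y")}))
```

Every document marked for SSI has to be scanned in this way before it is sent, which is why the technique is best kept for simple tasks.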

Figure 4.8: How a server processes a document containing Server Side Includes. From CGI Programming: Chapter 5 (http://www.ora.com/info/cgi/ch05.htm)


4.5 Java & JavaScript

4.5.1 Java

Java is an object oriented programming language (see 4.5.1.2), related to C++, that can be used to write applications which can be transmitted over the Web. It can be used to produce two types of program: applications which run on a computer under the operating system, and applets which are "designed to be transmitted over the Internet and executed by a Java-compatible Web browser"(65). These applets are the key to Java's usefulness.

4.5.1.1 The History of Java

Java was created by a team of programmers at Sun Microsystems. It was conceived in 1991, and the first version, then called Oak, was implemented in autumn of 1992.

Oak was originally designed to be a "platform-independent language that could be used to create software to be embedded in various consumer electronic devices, such as microwave ovens and remote controls"(66). When the World Wide Web came along, it was clear to the developers that Oak had potential for Web applications, so after further development it was released as Java in 1995. Since Java is platform-independent, it is well suited to the Web, where browsers and servers can be running on many different computers and operating systems.


4.5.1.2 How Java Works

As stated earlier, Java is an object oriented language. Computer programs consist of code and data, and with object oriented programming a program is organised around its data (i.e. objects) and a set of interfaces to that data. In effect, the data controls the access to the code(67). Java, and other object oriented languages, such as C++, use hierarchical classifications. A program is broken down into its component parts, or objects, and these are put into classes. Each class defines the common structure and behaviour of the objects it contains, and those of any sub-classes, although each sub-class and each object in a class can have specific properties as well. Java comes with built-in class libraries containing methods (sections of code performing specific tasks) for many common functions such as input and output features, graphics facilities, networking, and generating sounds(68). These methods can be incorporated into applets, greatly simplifying the task of programming.
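The idea of classes and sub-classes can be shown with a short sketch. It is written in Python rather than Java purely for brevity, and the class names Shape and Rectangle are hypothetical; the same hierarchy could be expressed in Java or C++.

```python
class Shape:
    """Common structure and behaviour shared by every shape."""

    def __init__(self, name):
        self.name = name

    def area(self):
        raise NotImplementedError("each sub-class supplies its own area")

    def describe(self):
        # Defined once here and inherited unchanged by all sub-classes.
        return "%s with area %.1f" % (self.name, self.area())


class Rectangle(Shape):
    """A sub-class that adds properties of its own: width and height."""

    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


print(Rectangle(2, 3).describe())
```

The describe method is written once in the parent class, while each sub-class contributes only what is specific to it.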

In order to be platform-independent, the output of a Java compiler is not executable code. Rather, it is bytecode, that is "a set of instructions designed to be executed by a virtual machine that the Java run-time system emulates"(69). Applets are therefore run within a Java compatible browser, which acts as an interpreter for the bytecode, and never interact with the rest of the computer. Combined with integral bytecode verification, which ensures that the code received is the same as the code that left the server, this means that Java protects the end-user's computer from viruses and prying parties. Since the bytecode is interpreted, all that is required for full platform-independence is a suitable Java run-time system which can interpret the code for the particular operating system.


4.5.1.3 Uses for Java

Java can be used to develop Web pages beyond static information and introduce interactive applications. At the basic level, features such as text scrolling across the screen or animations become possible. These will not necessarily add to the quality of a site, but they will grab the attention of a potential viewer. Java can also control the behaviour of a suitable browser, for example, requesting the loading of other documents, and it can create connections to other machines over the Internet. This browser control could be useful in tandem with a search mechanism, since the applet could automatically download the relevant document.

HTML forms do not provide a particularly good interface, but Java can be used to produce better forms. It can also perform many of the functions of the Common Gateway Interface, not only decreasing the amount of time spent waiting for a server to respond, but also freeing up the server for other requests. As the Java code for each application is downloaded at run-time, no software distribution is needed and many varied functions can be performed. Also, no software updates are needed for new versions, as it is the code that is changed and not the interpreter(70). This all means that a greater range of services can be provided to users with Java-enabled browsers.

There is huge potential for Java. Possible uses in the future include "sharing applications, exchanging files (such as spreadsheets or charts), and creating greater interactivity across the Internet"(71). Ford and Dixon write that "it is in the area of real time client-server applications that Java is likely to win out: with every browser a potential client, it will be possible to distribute online services to a sufficiently wide market to justify the costs of the custom programming involved"(72). When Java-enabled browsers have become commonplace, it should be possible to provide information in all formats that computers can present, with dynamic and interactive content.

4.5.2 JavaScript

JavaScript is a scripting language that can be embedded in HTML documents.

Originally called 'Mocha', JavaScript was developed by Netscape, who wanted to create a scripting language which could be written within the tags