Computing Science 165-3 Study Guide

Introduction to Multimedia and the Internet

by Greg Baker

Faculty of Applied Sciences Centre for Distance Education Continuing Studies

© Simon Fraser University, Summer 2004

Contents

I Introduction

Course Introduction
    Learning Resources
    Requirements
    Evaluation
    Schedule
    Getting through CMPT 165
    About the Author

II The Web and Web Pages

1 The World Wide Web
    1.1 Basics of the Internet
    1.2 Protocols
    1.3 How Web Pages Travel
    1.4 MIME Types
    1.5 Fetching a Web Page
    Summary

2 Markup and HTML
    2.1 Describing Documents
    2.2 HTML Basics
    2.3 More HTML
    2.4 in HTML
    2.5 Images in HTML
    2.6 More HTML
    2.7 Validating HTML
    Summary

3 Text and Graphics
    3.1 How Computers Store Data
    3.2 Text and Character Sets
    3.3 Graphics and Image Types
    3.4 Bitmap vs. Vector Images
    3.5 File Formats
    3.6 File Formats, Common
    Summary

4 Cascading Style Sheets
    4.1 CSS
    4.2 Classes and IDs
    4.3 CSS Properties
    4.4 Specifying Colours
    4.5 CSS example
    4.6 Logical versus Physical
    4.7 Why Logical HTML and CSS?
    Summary

5 Design
    5.1 General Design
    5.2 Design Principles and HTML/CSS
    5.3 Web Page Design
    5.4 Usability
    5.5 Web Site Design
    Summary

6 XML
    6.1 What is XML?
    6.2 Some XML Languages
    6.3 Styling XML
    6.4 Validating XML
    6.5 XHTML and HTML
    Summary

III Internet Programming

7 Programming Introduction
    7.1 What is Programming?
    7.2 Starting with Python
    7.3 Example Program
    7.4 Expressions and Variables
    7.5 User Input
    7.6 Types
    7.7 Conditionals
    7.8 Python Libraries
    7.9 Debugging
    Summary

8 Web Programming
    8.1 Making Web Pages with Python
    8.2 HTML Forms
    8.3 CGI
    8.4 Debugging CGI Scripts
    Summary

9 More Programming
    9.1 Functions
    9.2 Local Variables
    9.3 Iteration
    9.4 Lists
    9.5 Handling Errors
    9.6 Solving Problems 1: Making Change
    9.7 Solving Problems 2: Displaying HTML Source
    9.8 Coding Style
    Summary

10 More Web Programming
    10.1 Security Basics
    10.2 Cookies
    10.3 Performance
    Summary

11 Internet Internals
    11.1 HTTP
    11.2 HTTP Tricks
    11.3 DNS
    11.4 Security and Encryption
    11.5 URLs
    Summary

IV Appendices

A Technical Instructions
    A.1 Installing Software
    A.2 SFU Computing Account
    A.3 CMPT 165 Server Account
    A.4 Creating Web Pages
    A.5 Transferring Web Pages

B Software
    B.1 Mozilla
    B.2 TextPad
    B.3 The GIMP
    B.4 Python
    B.5 Secure File Transfer
    B.6 Validators

List of Figures

1 CMPT 165 course schedule

1.1 How information might get from the SFU web server to a home computer
1.2 The conversation that a web client and web server might have when you view a web page
1.3 The parts of a simple URL
1.4 Some example MIME types

2.1 An XHTML document
2.2 The display of Figure 2.1 in a browser
2.3 HTML terms to output “garçon”
2.4 A sample page from the XHTML reference
2.5 A hyperlink on a web page
2.6 URLs from http://www.sfu.ca/~somebody/pics/index.html
2.7 Some non-valid XHTML
2.8 Part of the validation results of Figure 2.7

3.1 Prefixes for storage units
3.2 Scaling (a) a vector image and (b) a bitmapped image
3.3 Colour dithering
3.4 An image with a low-quality lossy compression
3.5 Various types of transparency in images

4.1 A cascading style sheet
4.2 HTML source of a page that references a style sheet
4.3 The style sheet, style., referenced by Figure 4.2
4.4 The display of Figure 4.2, with the style sheet
4.5 The display of Figure 4.2, without the style sheet
4.6 Two possible appearances for contents of or .

5.1 A design illustrating proximity
5.2 A design illustrating alignment
5.3 A design illustrating repetition
5.4 A design illustrating contrast

6.1 A recipe marked up in XML
6.2 A sample SVG file that represents a happy face
6.3 The display of Figure 6.2
6.4 Part of the style for Figure 6.1
6.5 Part of the display of Figure 6.1 after the application of CSS from Figure 6.4

7.1 An example Python program
7.2 Three sample executions of the program in Figure 7.1
7.3 A Python program that gets input from the user
7.4 A program with type conversion
7.5 A program with an if block
7.6 Guessing game: testing random
7.7 Guessing game: testing user input
7.8 Guessing game: checking their guess
7.9 Guessing game: checking greater or less

8.1 A program that generates a simple web page
8.2 A web script that does a little more
8.3 Sample output from Figure 8.2
8.4 Sample display of Figure 8.2 in a browser
8.5 Figure 8.1 rewritten with triple-quoted strings
8.6 The body of an HTML file with a form
8.7 The display of Figure 8.6 in a browser
8.8 An HTML form for simple user input
8.9 A web script that uses the input from Figure 8.8

9.1 A program with a function defined
9.2 A program that takes advantage of local variables
9.3 Using a for loop
9.4 Using a while loop
9.5 A version of the guessing game using a while loop
9.6 A version of the guessing game using a for loop
9.7 Using a list to store some integers
9.8 Catching an exception
9.9 Asking until we get correct input
9.10 Making change: first attempt
9.11 Making change: fixed multiple coins
9.12 Making change: details filled in
9.13 Displaying source: basic CGI skeleton
9.14 Displaying source: using urllib
9.15 Displaying source: inserting entities
9.16 Sample display of Figure 9.15 in a browser
9.17 Displaying source: fixed entity replacement

B.1 The main toolbox in the GIMP
B.2 The GIMP’s right-click menu with the option for saving the current file selected
B.3 The main tool buttons in the GIMP

Part I Introduction

Course Introduction

Welcome to CMPT 165, Multimedia and the Internet. This course is an introduction to the Internet and WWW for non-computer science majors. It isn’t intended to teach you how to use your computer, starting with “Turn it on.” You are expected to have a basic knowledge of how to use your computer. You should be comfortable using it for simple tasks such as running programs, finding and opening files, and so forth. You should also be able to use the Internet.

Here are some of the goals set out for students in this course. By the end of the course, you should be able to

• Explain some of the underlying technologies of multimedia and the Internet.
• Create well-designed web sites using modern web technologies that can be viewed in any web browser.
• Design computer programs to complete a specific task.
• Use your programming skills to create dynamically generated web sites.
• Begin to combine these skills to develop full-featured web sites.

Keep these goals in mind as you progress through the course.

Learning Resources

Study Guide

The Study Guide is intended to guide you through this course. It will help you learn the content and determine what information to focus on in the texts.


The readings for each section are listed at the beginning of the units. You should also look at the key terms listed at the end of each unit. In some places, there are references to other sections. A reference to “Topic 3.4,” for example, means Topic 4 in Unit 3.

In the Study Guide, side-notes are indicated with this symbol. They are meant to replace the asides that usually happen in lectures when students ask questions. They aren’t strictly part of the “course.”

Required Texts

The only required text for this course is How to Think Like a Computer Scientist: Learning With Python. It will be included in your course package and will arrive with your study guide. This book is a nice introduction to Python programming, which we will use in the second half of the course. The title of the book is a little misleading. The book does not discuss computer science; it only covers computer programming.

Recommended Texts

There are several other texts recommended for this course. Whether or not you buy or read these is up to you; you certainly aren’t expected to buy all of them. They are all available in the reserve section of the Library, so you can look at them there if you’re not sure whether you want to buy them.

The recommended texts provide more information than we can put in the Study Guide. If you are particularly interested in specific topics, you might consider buying the corresponding recommended texts.

The Internet Book by Douglas E. Comer provides more background on the internals of the Internet, which are explored in Units 1 and 11.

Jeffrey Zeldman’s Designing with Web Standards covers design of web pages with XHTML and CSS, which are discussed in Unit 2 and Unit 4. This is an excellent book for those who are really interested in web page design. It is one of the few (or at least one of the first) books that discuss web page design with techniques made possible by newer web browsers.

The Non-Designer’s Design Book by Robin Williams (no, not that Robin Williams) is recommended for the material in Unit 5, which is also relevant to Assignment 2. Either the first or second edition of this book is acceptable.

If you aren’t near campus, you can use the library’s Telebook service. Telebook services will mail you books, journal articles, and other library material required to complete course assignments. For more information about how to use Telebook, see http://www.lib.sfu.ca/kiosk/other/telebook.htm. Also, see the Telebook material in the red folder.

Web Materials

The address for the web materials for this course can be found in your Red Folder. To access some parts of the site, you will need to log in to WebCT using your SFU Unix ID. Use the same username and password you use for Webmail and my.sfu.ca. Open University students should check their course package for instructions on how to get a WebCT account.

If you feel that you need more information about WebCT, look at the material in your course package. In addition, the SFU Library offers introductory sessions in using WebCT that are available to students in any course that uses WebCT.

The course web site has several main sections:

• Administrative Information: In this section, you will find information about the course, the course supervisor, and your tutor markers. You will also find information on due dates and grading.
• References: Here, you will find information about the various references for this course, including links to the online references that you are expected to use for the course.
• Tools: The “Tools” section contains links to all of the software tools you will need for the course. It lists software that you can download and install on your computer as well as tools that you will use online. More details on the required software can be found in Appendices A and B.
  Inside “Tools,” you will find a link to “Discussions,” where you can discuss the course topics with other students. The course supervisor and TMs do not generally respond to discussions; they are intended for student-to-student communication. See “Getting Help,” below, for more information.
• Materials: This section contains supplementary material directly related to the course content. You can download all of the code that is used in this guide if you’d like to try it for yourself or modify it. You will also find an archive of the course email list and some sample exams.
• Links: The links are organized into categories corresponding to units in this guide. You can use them to supplement the material the guide and other readings present. If you have suggestions for additions to this list, feel free to email the course supervisor.
• Assignments: Here, you will find all of the assignments and exercises for this course.

Online References

There are several online references that are as important as the texts. You can find links to them on the course web site section titled “References.”

These resources are very important to your success in this course. They aren’t meant to be read from beginning to end like the readings in the textbook. You should use them to get an overall picture of the topic and as references as you do the assignments.

The Red Folder

The red folder in your Centre for Distance Education package contains the assignment deadlines and exam dates for the course. It also contains important information from the Centre for Distance Education, in particular, the Student Handbook, which includes information about how to set up your SFU computing account.

Email Communications

The TMs and course supervisor will use email to send announcements and tips during the semester. You should read your SFU email account regularly.

You can also contact the TMs and course supervisor by email if you need help. See the “Getting Help” section at the end of the Introduction for details.

CMPT 165 Web Server

Throughout the course, you will be creating web pages. You have web space available on a web server set up specifically for this course. Appendix A contains information on working with this server. You must use this server for all web pages in this course. Work on other web servers will not be marked.

Open University students will not have accounts automatically set up on this server. They must immediately email the course supervisor, giving their name and email address so their account can be activated. Not doing so will not be accepted as an excuse for late exercises/assignments.

Requirements

Computer Requirements

You need to have access to a computer with the minimum requirements noted on the back of the course outline.

You will also need a connection to the Internet through a dialup connection (one is provided with your SFU registration) or through another type of connection like a cable modem or a DSL line. A high-speed connection will make your life easier, but it isn’t required.

Software Requirements

All of the software that you need for this course can be downloaded free. Links to obtain the course software are on the course web site under “Tools.” Appendix A contains instructions for installing the software. Appendix B contains other instructions you need for assignments.

• A web browser: You will be creating web pages in this course. To see them and the other pages you need to read, you need to have a web browser installed on your computer. We strongly recommend that you use Mozilla or Netscape 7. These browsers are the only ones that have near-perfect support for the style sheets we will be using.
  Netscape 7 and Mozilla are actually the same browser with some minor differences. Mozilla has been released by a group of developers for free use. The same program was modified slightly and released by Netscape. In this course, we recommend Mozilla because it has fewer superfluous features.
  Internet Explorer and earlier versions of Netscape don’t support style sheets as well as the recommended browsers. Since we use style sheets a lot, you will find other browsers harder to work with.
• A text editor: You will be making your web pages and style sheets with a text editor. For Windows, we recommend TextPad; for a Macintosh, BBEdit. You could get by with Notepad or SimpleText, but a better editor will make your life easier.
• A graphics program: To create graphics for your web pages, you will use a graphics editor. You should use GIMP for Windows or GraphicConverter for a Mac. You won’t be able to use Windows Paint. If you already have a program like Photoshop or PhotoPaint, you can use it, but the TMs will not have access to these programs, so they won’t be able to help with any software problems.
• Python: The second half of the course will involve programming with the Python programming language. Python is a good language to work with when you’re learning to program, and it can be downloaded free.
• Other software: There is some other software that you may need to transfer files and install certain software. See Appendix A for instructions and descriptions.

Evaluation

Marking

Your mark in this course will be based on your exercises, assignments, and two exams. The marks will be weighted as follows:

Exercises      10%
Assignments    20%
Midterm exam   20%
Final exam     50%

Each of the exercises and assignments will be weighted equally. You must attain an overall passing grade on the weighted average of the exams in order to pass the course.
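To see how these weights combine, here is a small sketch in Python, the language used later in the course. It assumes Python 3 and that every component mark is a percentage. The sample marks are invented, and the passing mark of 50 is only a placeholder, since this guide does not state the actual threshold.

```python
# Weights from the marking scheme above, as fractions of the final mark.
WEIGHTS = {"exercises": 0.10, "assignments": 0.20, "midterm": 0.20, "final": 0.50}

# Assumption: the guide does not state the passing mark; 50 is a placeholder.
PASS_MARK = 50.0


def course_mark(marks):
    """Weighted overall mark, where each component mark is a percentage."""
    return sum(WEIGHTS[part] * marks[part] for part in WEIGHTS)


def exam_average(marks):
    """Weighted average of the two exams only (midterm 20, final 50)."""
    exam_weight = WEIGHTS["midterm"] + WEIGHTS["final"]
    weighted_sum = (WEIGHTS["midterm"] * marks["midterm"]
                    + WEIGHTS["final"] * marks["final"])
    return weighted_sum / exam_weight


# Invented marks for one hypothetical student.
marks = {"exercises": 90, "assignments": 80, "midterm": 60, "final": 70}

print("Course mark:", round(course_mark(marks), 1))
print("Exam average:", round(exam_average(marks), 1))
print("Exam requirement met:", exam_average(marks) >= PASS_MARK)
```

Because the final exam carries 50 of the 70 exam percentage points, it counts for much more than the midterm in the exam average.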

Exercises and Assignments

There are four sets of exercises and four assignments in this course. You can find them on the course web site. The due dates and instructions on how to submit them are also there.

The exercises consist of short problems. They will make sure you’re on the right track and have the basics down. They shouldn’t take you too long to complete.

The assignments are more work. You will have to figure out more on your own and explore the concepts in the course.

You will submit both exercises and assignments by email, sending the web address where you have put your web page(s) and any other information required by the assignment. The exact email address that you should send your work to will be indicated on the exercise set or assignment.

You must place your assignments in the web space provided for the course. You can find information on uploading the files in Appendix A.

Late exercises and assignments will be penalized 10 percent per day (including weekends). The lateness will be assessed on the basis of the later of the email sent to the TM and the “last modified” date on the files you’ve put in your web space. So, you shouldn’t change your web pages after the due date, or they will be counted as late. You also shouldn’t delete any of your assignments until the end of the course.

Exercises will not be accepted more than three days after the due date since solutions have been posted by then.

Exams

There will be two examinations in this course. They are closed-book, and you aren’t allowed any notes, calculators, or other aids. The exams have a mixture of short and longer answer questions.

The 50-minute midterm exam will be held in week 7 or 8. You will be responsible for the material in Units 1 to 6 in this guide and the readings given for those units.

The final exam will be three hours long and will cover all of the course material.

Academic Dishonesty

We take academic dishonesty very seriously in this course. Academic dishonesty includes (but is not limited to) the following: copying another student’s assignment; allowing others to complete work for you; allowing others to use your work; copying part of an assignment from an outside source; cheating in any way on a test or examination.

If you are unclear on what academic dishonesty is, please read Policy 10.02. It can be found in the “Policies & Procedures” section on the SFU web site.

Cheating on an assignment will result in a mark of 0 on the assignment and a recommendation for a further mark penalty in the course. At the course supervisor’s option, an F may be recommended for the course. Any academic dishonesty will also be recorded in your file at the University.

Any academic dishonesty on the midterm or final will result in a recommendation that an F be given for the course and possibly a recommendation that the student be suspended or expelled from the University.

Copyright

When you create web pages, you must keep in mind other people’s copyright. Whenever someone publishes something, it is illegal for others to copy the material except with the permission of the copyright owner. That means you can’t just copy text or images from any web site.

When you are creating your pages (except where the assignment states otherwise), you can use images from other sites only if the site indicates that you are allowed to do so or if you have sought and received permission. Some sites, particularly clip art sites and other graphics collections, state that you’re allowed to use their images for your own sites.

If you do use anything from other sites for this course, you must indicate where it came from by providing a link to the original source. Failure to do so is academic dishonesty and will be treated as described above.

Schedule

To give you an idea of the pace at which you should be going through the material, please look at the course schedule in Figure 1. Exact assignment due dates are in your Red Folder. The readings listed here aren’t the whole story. Specific chapters and sections are listed in each unit.

Getting through CMPT 165

Self-directed courses require a lot of motivation. You will be mostly on your own as you work your way through this course. For each Unit, the following order of activities is suggested:

1. Read the unit in the Study Guide and do the “Check-Up Questions.”
2. Do the readings listed at the start of the unit. Also have a look at the relevant links for the unit on the web site.
3. Have a look at Appendices A and B to see if they discuss any technical skills that are relevant to the material.
4. If there’s an assignment or a set of exercises, do it.

After doing the readings, look at the Summary at the end of the unit. It will give you an idea of what you should have learned. The list of terms is particularly useful. These terms are italicized when they are first used. Also, look at the Learning Outcomes listed at the start of the unit for a more detailed list.

You should also pay attention to the course web site. There are useful resources there that can give you help if you need it. If the content of the web site is updated during the course, email will be sent to the course list.

Getting Help

There are several ways for you to get help in this course. You can check the discussion board on the course web site. You may find that someone else has asked the same question and that it has been answered.

Part  Unit  Week    Topic                          Resources                                            Submit
I           1       Course Introduction            Study Guide
II    1     1       The World Wide Web             Study Guide, Internet Book
      2     2,3     Markup and HTML                Study Guide, online HTML reference                   Exer 1
      3     4       Text and Graphics              Study Guide                                          Assign 1
      4     5,6     More HTML and Style Sheets     Study Guide, online CSS reference and colour chart   Exer 2
      5     6,7     Design                         Study Guide, Part 1 of Design Book                   Assign 2
      6     7,8     XML                            Study Guide
            7–8     Midterm Exam
III   7     9,10    Programming Introduction       Study Guide, Think Python                            Exer 3
      8     11      Web Programming Introduction   Study Guide                                          Assign 3
      9     11,12   More Programming               Study Guide, Think Python                            Exer 4
      10    12      More Web Programming           Study Guide
      11    13      Internet Internals             Study Guide, Internet Book                           Assign 4
            14–15   Final Exam

Figure 1: CMPT 165 course schedule

You should also feel free to contact the TMs by email or phone. You can phone them during office hours or email at any time. Information on contacting the TMs will be in your course package.

You can also contact the course supervisor (preferably by email) for administrative problems only. You should feel free to contact the supervisor if you’re having serious problems.

About the Author

I hesitated about writing this section. It seems sort of self-congratulatory: a few paragraphs about who I am aren’t so bad, but they set a dangerous precedent. The next thing you know, I’ll have 8 × 10 glossy photos of myself and I’ll start writing an autobiography.

On the other hand, I’m reminded that students in an on-campus class would learn something about me over the course of the semester. In an effort to duplicate that experience, I relented. . . .

I’m a lecturer at SFU in the School of Computing Science. I started in September of 2000. I finished my M.Sc. in Computing Science at SFU just before that (and I do mean just). My undergraduate degree is in Math and Computer Science from Queen’s University (Kingston, not Belfast). I am neither a professor nor a doctor, which causes problems for students who are accustomed to using these titles. For the record, I prefer “Greg.”

When I’m not working, I enjoy food and cooking. Unfortunately, I’m getting to an age where enjoyment of food is going to mean buying new pants more often than I’d like. Vancouver is a great place to be into food. The mix of cultures and available ingredients makes for very interesting cuisine. I’m always interested in good restaurant suggestions if you have any you would like to make.

I’d like to thank Tim Beamish and Rong She, the TAs for the summer 2001 offering of CMPT 165, for helping me prepare this Study Guide. I’d also like to thank the students in the summer 2001 offering who suffered through its draft version.

If you have any comments on the course or restaurant suggestions, please feel free to contact me at [email protected].

Part II The Web and Web Pages


Unit 1 The World Wide Web

Learning Outcomes

• Describe the basic structure of the Internet.
• Define some basic terms from the Internet and WWW.
• Give examples of protocols used on the Internet.
• Describe the way a web page gets to your computer.
• Understand error and status messages you get when using the Internet.

Learning Activities

• Read this unit and do the “Check-Up Questions.”
• Browse through the links for this unit on the course web site.
• (optional) Have a look at the relevant chapters in The Internet Book: Chapters 2 and 10 to 13.
• Do Exercise 1.

Topic 1.1 Basics of the Internet

As I mentioned in the Introduction, we will assume that you have some familiarity with computers and the Internet. In particular, we’re going to assume that you have used the Internet and know what the World Wide Web (WWW) is. You don’t have to be an expert, but you should be comfortable using your computer and navigating the web.


[Diagram: a home computer, the ISP’s gateway, other gateways, other SFU servers and gateways, the SFU web server, the UBC web server, other computers nearby, and other computers at SFU or elsewhere in Vancouver]

Figure 1.1: How information might get from the SFU web server to a home computer

Connecting to the Internet

So, what is the Internet anyway? Basically, it is a huge network of connected computers. One computer is connected to the next, and they pass information along from one to another. That’s really all there is to it: many different computers, all passing information around so it eventually gets to its destination.

When a desktop computer is connected to the Internet, it is probably only connected to one other computer, its gateway. The gateway will be connected to one or more other computers and will pass along whatever information is sent or received.

The gateway and other computers that form the Internet’s infrastructure or backbone may be connected to many other machines. Their job is to receive information and pass it along in the right direction. They might be computers that are more-or-less like desktop PCs, or they might be specialized devices called routers designed specifically for this job.

For example, Figure 1.1 shows the possible route that a web page might travel to get from SFU’s web server (www.sfu.ca) to a computer connected to a cable modem or ADSL. The actual route from www.sfu.ca to a home computer would have about 8 to 10 steps. Routes to other computers on the Internet might have 20 or even more steps.

The various computers on the Internet can be connected to each other in many different ways. When a home computer is connected to the Internet, it is probably connected by one of the following methods:

• modem: Information is carried to and from your computer over your phone line, the same way a voice call is transmitted. The modem at either end of the phone call translates the sound of the phone call to and from computer data.
• ADSL: Information is also transmitted over the phone line but it does not use a traditional “phone call.” Information is encoded differently than it is in a modem connection, which lets ADSL transmit data faster than a modem.
• cable modem: With a cable modem, the data to and from your computer is carried over your TV cable.

When a computer connects to a network in an office building or at SFU (or if you have a “home network” set up), different connection methods are used. These are typically faster than the “home” connections listed above.

• ethernet: The connector for Ethernet looks like a fat phone jack. Since it uses wires, it’s generally used for desktop computers that don’t have to move around. It is several times faster than either ADSL or a cable modem.
• wi-fi: Wireless networking has recently become affordable and so more popular. Wi-fi (Wireless Fidelity) is also known as wireless LAN (Wireless Local Area Network), AirPort, and 802.11. There are different versions (e.g. 802.11b and 802.11g) with different data transmission rates. Wi-fi is often used for laptops, which makes it possible to move the machine without having to worry about network cables. Wi-fi is available at SFU and increasingly in other locations.

There are also many other connection types used to make connections between buildings and across cities and over long distances across countries.
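As a rough illustration of the hop-by-hop passing described in this section, the sketch below (assuming Python 3) finds a route through a small made-up network loosely based on Figure 1.1. The machine names and links are invented for the example. The search used here is only a stand-in for real routing: actual gateways and routers each choose the next step on their own, without a map of the whole network.

```python
from collections import deque

# Hypothetical network loosely modelled on Figure 1.1; every machine name
# and link here is invented for illustration.
LINKS = {
    "home computer": ["ISP gateway"],
    "ISP gateway": ["home computer", "ISP router"],
    "ISP router": ["ISP gateway", "Vancouver exchange"],
    "Vancouver exchange": ["ISP router", "SFU gateway", "UBC web server"],
    "SFU gateway": ["Vancouver exchange", "SFU web server"],
    "SFU web server": ["SFU gateway"],
    "UBC web server": ["Vancouver exchange"],
}


def find_route(links, start, destination):
    """Return one shortest list of machines from start to destination, or None."""
    routes = deque([[start]])   # partial routes still to be extended
    visited = {start}           # machines we have already reached
    while routes:
        route = routes.popleft()
        here = route[-1]
        if here == destination:
            return route
        for neighbour in links[here]:
            if neighbour not in visited:
                visited.add(neighbour)
                routes.append(route + [neighbour])
    return None


route = find_route(LINKS, "SFU web server", "home computer")
print(" -> ".join(route))
print("steps:", len(route) - 1)
```

Each machine in the route only needs a direct connection to the machines before and after it, which is why the home computer can reach the web server while being connected to nothing but its gateway.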

Clients and Servers

You may have noticed that in the discussion above, the home computer, gateway, and the SFU web server are all described only as computers connected to the Internet. That’s true: all of them are connected to the Internet in fundamentally the same way; only the speed of the connection is different.

So, why is www.sfu.ca a web server when your computer isn’t? The only real difference is that www.sfu.ca will answer requests for web pages. The gateway and your home computer won’t. What makes it possible is the web server software installed on the server. This program runs all the time and answers requests for web pages. If you ran similar software on your computer, people could view web pages on it as well.

If you did install web server software on your computer, you’d run into some problems. First, the SFU web server has an easy-to-remember name: www.sfu.ca. When home computers are connected to the Internet, they tend to have names like akjx74wuc23nf.bc.hsia.telus.net or h24-84-78-194.vc.shawcable.net, which are a lot harder to remember and type in correctly. Second, www.sfu.ca has a very fast connection to the Internet, enough to serve up a lot of web pages at the same time. It’s also on all the time, which your computer might not be.

When you want to access web pages, you use a web browser. A web browser is one example of client software. A web client (like Mozilla or Internet Explorer) is the software you use to make the request to a web server. The client has to transmit the request to the server, receive the response, and then process the information so you can use it.

Client and server software let people use computers to transfer information over the Internet. Everything that you transfer is done by interacting with client software for that type of connection.

For example, when you load a web page with Mozilla (a web client), it will ask the web server for the web page itself and then for any images or other files that it needs to display the page. Figure 1.2 shows the conversation that might happen to load a web page with one image on it.

[Diagram: Your Computer and the Web Server exchange four messages: 1. Please send me the web page; 2. Here it is; 3. Send me this image; 4. Here it is.]

Figure 1.2: The conversation that a web client and web server might have when you view a web page

The term “web server” may be a little confusing here because it’s used to refer to both the computer itself (www.sfu.ca is a computer sitting in a closet somewhere in Strand Hall) and the software that’s running on the computer (www.sfu.ca runs the Apache web server).

Check-Up Question

• Find out what kind of connection is used between your computer and your ISP. If you have a home network, does it use ethernet or wi-fi?

Topic 1.2 Protocols

When a client and server talk to each other, they have to agree on how they will exchange the various pieces of information that are required to get the job done. The “language” that the client and server use to exchange this information is called a protocol. For example, when they are exchanging web pages, the client and server need to encode the requests and responses shown in Figure 1.2. They also must be able to indicate errors and other behind-the-scenes messages (like “Page not found” or “Page moved to here”). Web pages are transferred using a protocol called the HyperText Transfer Protocol (HTTP). We will discuss HTTP more in Topic 1.3 and in Unit 11.

Information on the Internet

There are many different kinds of information that travel over the Internet. The one you’re probably most familiar with is web traffic: web pages and all of the graphics, sounds, and other files that go with them. But a lot of other stuff travels around as well. The web is just one of many ways that information can travel across the Internet.

Here are some other ways information can be exchanged between computers on the Internet that you might be familiar with:

• Email. Even if you check your email on the web (with SFU’s Webmail or something similar), the mail itself still has to get from the sender to receiver. If someone at Hotmail sends email to your SFU account, it has to travel from the computers at Hotmail to the mail server at SFU.
• Instant Messaging. The various instant messaging services each have their own protocols (which is why they don’t generally work together). These include ICQ, AOL Instant Messenger, Yahoo! Messenger, and MSN Messenger.
• Peer-to-peer file transfer. These methods of file transfer sidestep the traditional client-server model and let people transfer files directly from one client to another. (Technically, the “client” software is performing the duties of both a client and a server.) Because of their dubious legality, they tend to come and go, but they have included Napster, Gnutella, Audiogalaxy and Kazaa.
• FTP. FTP stands for “File Transfer Protocol.” It is an older method of transferring files but it is still often used to do things like installing web pages onto a web server.
• Network gaming. Games that allow network play generally act as a client; a server is run by the company that produced the game. The client and server exchange information about the game: moves that are made, changes in the game’s “map,” and other information to make sure the game works properly for all users.

Each of the examples above uses a different protocol to exchange information. That is why you need different client software for each of them: they need clients that speak different languages.

There are many other services that you might not think of. There are protocols for file sharing that allow you to use files and printers as if they were on your machine (like Windows File/Print Sharing). There are also protocols for things like clock synchronization and remote access to computers, which you might never run across.

Check-Up Question

• Are there more examples of protocols/client software that you use?

Topic 1.3 How Web Pages Travel

We have covered web servers and web clients in Topic 1.1. The two computers have to talk to each other in order to deliver web pages.

As noted in Topic 1.2, the “language” a client and server use to talk to each other is called a protocol, and the protocol used to transfer web pages is called HTTP. You may have noticed that web addresses start with http://; this part of the address tells your web browser that you want it to use HTTP to talk to the server. A web server’s job is to use HTTP to communicate with any clients that connect to it.

URLs

A URL (Uniform Resource Locator) is the proper name for an Internet address. URLs are also sometimes called URIs (Uniform Resource Identifiers). Here are some examples:

http://www.sfu.ca/
http://www.sfu.ca/~somebody/page.html
http://www.w3.org/Addressing/
https://my.sfu.ca/
ftp://ftp.mozilla.org/pub/mozilla/releases/
mailto:[email protected]

(Actually, all of these are “absolute” URLs. Relative URLs will be introduced in Unit 2.)

The first three URLs are HTTP URLs: they refer to information that is on the WWW. The others refer to information that you access using different protocols.

The fourth one, https://my.sfu.ca/, works almost like a web page, but it uses a “secure” connection to transmit the data. All of the information passed back and forth between the client and the server is encrypted so none of the computers in between can eavesdrop. HTTPS is often used for sensitive information like banking, passwords, and email.

The last two items in the list are URLs for information accessed by other protocols. They are an FTP URL (information on an FTP server) and an email URL (a link that will let you send an email to that address).

The protocol that should be used to access a particular URL is indicated by its scheme. The scheme is the part of the URL before the colon (:). For web URLs, the scheme is “http”, or “https” for secure web traffic. The URL schemes “ftp” and “mailto” indicate FTP and email URLs. There are many other less common URL schemes used by specific applications.

http://www.sfu.ca/~somebody/page.html

scheme: http
server: www.sfu.ca
path: /~somebody/page.html

Figure 1.3: The parts of a simple URL

For HTTP and FTP URLs, the first part after the scheme indicates the server that should be contacted to exchange information. The rest of the URL after the server is the path of the file. We will discuss the other parts of an HTTP URL in Topic 2.4.
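As a rough illustration of the parts labelled in Figure 1.3, Python’s standard urllib.parse module can split a URL into its scheme, server, and path. This is only a sketch: the function name url_parts and the dictionary keys are choices made for this example, not something defined in the course materials.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return the scheme, server, and path of a URL as a dictionary."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # the part before the colon
        "server": parts.netloc,   # the computer to contact
        "path": parts.path,       # the file being asked for on that server
    }


for example in ("http://www.sfu.ca/~somebody/page.html",
                "ftp://ftp.mozilla.org/pub/mozilla/releases/"):
    print(example, "->", url_parts(example))
```

Trying the function on the URLs of a few pages you visit often is one way to check your answers to the Check-Up Question for this topic.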

Check-Up Question

• Find the URL of some web pages that you visit often (in the location bar of Mozilla). Identify the scheme, server, and path of each.

Topic 1.4 MIME Types

Any type of data can be transferred over HTTP, including web pages themselves (which are written in a language called HTML, as we will see in Unit 2). Graphics on web pages are also sent by HTTP (in formats called GIF, PNG, and JPEG, which will be discussed in Unit 3). All of the other files you get from the web are also sent by HTTP: video, audio, Office documents, and so on.

When your web browser receives these files, it has to know what to do with them. It treats graphics data very differently from a text file. The browser also has to know what program to open for files it can’t handle itself, like Acrobat files, Office documents, and MP3 audio files.

If you use a Windows computer, you are probably used to looking at the file extension to figure out what type of file you have. For example, MS Word documents are usually named something like essay.doc; the .doc indicates that it is a Word document. Other extensions include .html or .htm for web pages and .pdf for Acrobat files.

Unfortunately, the file extension can’t be used on the web to decide the type of the file. For one thing, some operating systems don’t use file extensions (e.g. MacOS), so we don’t know for sure they will be available. It’s also possible that the browser wouldn’t even know a file name when the data is sent. We will also see other reasons later in the course.

MIME type            File Contents       How it might be handled
text/html            HTML (web page)     display as a web page in browser
application/pdf      Acrobat file        open in Adobe Acrobat
application/msword   MS Word document    open in Word
image/jpeg           JPEG image          display in browser
audio/mpeg           MP3 audio           open with WinAmp
video/quicktime      Quicktime video     open with Windows Media

Figure 1.4: Some example MIME types

The above information applies to files sent as email attachments as well. Since email is older than the web, when HTTP was created, it could use the solutions that were already in use for email.

Instead of using file extensions, the type of data sent by email or HTTP is indicated by a MIME type. MIME stands for Multipurpose Internet Mail Extensions. The MIME type indicates what kind of file is being sent. It tells the web browser (or email program) how it should handle the data.

MIME types are made up of two parts. The first is the type, which indicates the overall kind of information: text, audio, video, image, and so on. The second is the subtype, which is the specific kind of information.

For example, a GIF format image will have the MIME type image/gif, indicating that it is an image (which gives the browser a hint what to do with it, even if it doesn’t know the subtype) and that the particular type of image is GIF. Most web browsers know how to display GIF images themselves, so they wouldn’t have to use another application to handle it. There are more examples in Figure 1.4.
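The two-part structure of a MIME type can be sketched in a few lines of Python. The function names split_mime_type and first_guess, and the wording of the hints, are made up for this illustration; real browsers decide how to handle data in more elaborate ways.

```python
def split_mime_type(mime_type):
    """Split a MIME type such as 'image/gif' into (type, subtype)."""
    type_, subtype = mime_type.split("/", 1)
    return type_, subtype


def first_guess(mime_type):
    """Use only the overall type as a hint, ignoring the subtype."""
    type_, _subtype = split_mime_type(mime_type)
    hints = {
        "text": "show as text",
        "image": "show as a picture",
        "audio": "play as sound",
        "video": "play as video",
    }
    # Anything else (for example application/pdf) needs another program.
    return hints.get(type_, "hand off to another program")


print(split_mime_type("image/gif"))
print(first_guess("image/gif"))
print(first_guess("application/pdf"))
```

The point of the sketch is only that the type before the slash already says something useful, even when the subtype after it is unfamiliar.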

Microsoft’s Internet Explorer browser doesn’t handle MIME types correctly. It often ignores them and then tries to guess the type itself (and is sometimes wrong). This is one of the reasons that we suggest you don’t use IE for this course.

When you put a file on a web server, the server usually determines the MIME type based on the file’s extension. This isn’t always the case, since some web servers handle things differently, but for the moment you can assume that the MIME type will come from the file’s extension. However, you should remember that the type and the extension are different things.
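Python’s standard mimetypes module does the same kind of extension-to-type lookup that a web server might do. The sketch below assumes nothing about how the SFU server is configured; the helper name mime_type_for and the sample file names are made up for the example.

```python
import mimetypes


def mime_type_for(file_name):
    """Guess a MIME type from the file name's extension (None if unknown)."""
    mime_type, _encoding = mimetypes.guess_type(file_name)
    return mime_type


for name in ("index.html", "essay.doc", "report.pdf", "sfu.jpg", "README"):
    print(name, "->", mime_type_for(name))
```

A file name with no extension gives no guess at all, which is one reason to remember that the type and the extension are different things.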

Check-Up Questions

• You can determine the MIME type of a file with Mozilla in the “Page Info” window (in the “View” menu). Try going to a web page and have a look; you should see text/html.
• When you’re viewing an image you will see the image’s MIME type if you right-click and “View Image.” Try it out with a few images.

Topic 1.5 Fetching a Web Page

Let’s look back at Figure 1.2, now that we know a few more details about the process.

Suppose that the web page that is being requested is at the URL http://www.sfu.ca/about/index.html and that one of the graphics on the page is the JPEG image at http://www.sfu.ca/hp/images/sfu.jpg. We can now give a more detailed description of what happens in the conversation indicated by each of the four arrows in Figure 1.2.

1. The web browser contacts the server specified in the URL, www.sfu.ca. It asks for the file with path /about/index.html.
2. The server responds with an “OK” message, indicating that the page has been found and will be sent. It indicates that the MIME type of the file is text/html, so it’s an HTML page. Then it sends the contents of the file, so the browser can display it.
3. The browser notices that the web page contains an image with URL http://www.sfu.ca/hp/images/sfu.jpg. It contacts www.sfu.ca and asks for the path /hp/images/sfu.jpg.
4. The server again responds with an “OK” and gives the MIME type image/jpeg, which indicates a JPEG format image. Then it sends the actual contents of the image file.

Once the web browser has the HTML for the web page and all the graphics that go with it, it can draw the page on the screen for you to see.
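The four steps can be imitated in Python without touching the network. This is a simplified sketch, not a description of what a real browser or the SFU server does: the FILES table and its contents are hypothetical placeholders, the function names are made up, and the request text follows the general HTTP/1.1 “GET” format, which this unit has not covered in detail.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the files on the server: path -> (MIME type, contents).
# The contents are placeholders, not the real SFU pages.
FILES = {
    "/about/index.html": ("text/html", "<html>...</html>"),
    "/hp/images/sfu.jpg": ("image/jpeg", b"...image bytes..."),
}


def build_request(url):
    """Return text an HTTP/1.1 client could send to ask for this URL."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the server's top page
    return f"GET {path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


def answer(path):
    """Play the server's part: return (status, MIME type, contents)."""
    if path in FILES:
        mime_type, contents = FILES[path]
        return "200 OK", mime_type, contents
    return "404 Not Found", "text/html", "<html>Page not found</html>"


for url in ("http://www.sfu.ca/about/index.html",
            "http://www.sfu.ca/hp/images/sfu.jpg"):
    print(build_request(url).splitlines()[0])       # steps 1 and 3: the request
    status, mime_type, _contents = answer(urlsplit(url).path)
    print("  ->", status, mime_type)                # steps 2 and 4: the response
```

Asking answer for a path that is not in the table gives the “Page not found” kind of message mentioned in Topic 1.2.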

Summary

This unit gives you a brief introduction to how the WWW and the Internet work. By now you should have a better understanding of what goes on behind the scenes when you use the Internet for various tasks. This information will help you later in the course and should be useful anytime you’re using the Internet and things go wrong: you can often fix or work around problems if you understand what’s going on. We will be coming back to many of the same topics in Unit 11. By then, we will be able to go into more technical detail.

Key Terms

• World Wide Web (WWW)
• Internet
• gateway
• client
• server
• web server
• client software
• web browser
• protocol
• HTTP
• URL
• scheme
• path
• MIME type
• subtype

Unit 2 Markup and HTML

Learning Outcomes

• Discuss various ways of describing documents.
• Compare them for use in various tasks.
• Create a web page that includes links and images using HTML.
• List some common HTML tags.
• Carry out validation on an HTML page.

Learning Activities

• Read this unit and do the “Check-Up Questions.”
• Browse through the links for this unit on the course web site.
• Review the XHTML Reference pages found in the “Online References” section of the course web site.

Topic 2.1 Describing Documents

There are two basic ways of editing documents on a computer. The first is “What You See Is What You Get” or WYSIWYG (pronounced wissy-wig). You’re probably already familiar with it even if you don’t know the term. In a WYSIWYG program, what you see on the screen when you are editing the document looks the same as the finished product. Application programs like MS Word and WordPerfect are WYSIWYG; you view and edit the document on the screen, and it looks exactly as it will when printed.

The other way to edit a document is to use markup. With markup, you use a “markup language,” which looks a lot like a programming language. It has special codes to indicate the appearance of the document. The editing is usually done in a text editor. You can’t see the finished form of the document until it has been processed, and then you can look at it with a viewer. Some examples of markup are HTML (for web pages, which we will use later), XML (used behind-the-scenes in many programs to store data), LaTeX (often used in mathematics and computer science), and WordPerfect’s “Reveal Codes” feature.

Most people use WYSIWYG editing for most of their work. Some of the most important reasons they do are that

• it’s easier to learn, and
• the user gets immediate feedback.

There are also good reasons for using markup instead:

• No information about the document is hidden.
• There can be more than meets the eye (e.g., bibliography references, different display for the screen, printers, etc.).
• It is easy to separate content from appearance.
• All you need to edit it is a text editor.

HTML is the markup language used for web pages. There are several web page editors available for HTML that claim to be WYSIWYG, but there is no such thing as a WYSIWYG web editor. HTML doesn’t describe every aspect of the document’s appearance, so it can never be WYSIWYG. Web page editors that claim to be WYSIWYG only give authors a false sense of security.

Physical vs. Logical Markup

We will soon see two distinct forms of markup. The first is physical markup (or visual markup). Physical markup directly describes how the document should appear. That is, you specify a particular appearance for part of the document. For example, markup that indicates that text should be bold or in a 16pt font is visual markup.

The other form of markup is logical markup (or structural markup). Logical markup describes the purpose or meaning of the text. For example, we might indicate that some text is the heading for a new chapter or paragraph. When you use logical markup, you should describe the content of the document without worrying about its appearance.

To determine the appearance of logical markup, you use a style. A style is a set of rules like “a section heading should be in a bold, 16pt font, with some space above and below it.” Once an appropriate style has been created, you only need to indicate where the section headings are in the document.

Benefits of Logical Markup

Using logical markup with styles has several benefits. The main one is that if you want to change the appearance of the document, you only need to change the style; the changes should then be made throughout the document.

In the example, if we wanted to change the way section headings look, we would change the style file to something like “section headings should be an 18pt font, centred, with some space above and below.” If we had used physical markup, we would have to hunt through the entire document and make the change to each section heading.

When you use logical markup, it is also possible to have different styles for different situations. For example, a document could have an on-screen style for viewing on a monitor, a “large-type” style for the visually impaired, and another style for use when it is printed. It is also possible to have a “style” for speaking the document out loud, which could be used by a text-to-speech program for the blind.

HTML has both logical and physical markup (as we will see). MS Word actually allows logical formatting with its underused and underappreciated “Styles” feature.

On the web, logical markup can be used by programs that aren’t traditional web browsers. For example, visually impaired people often use a speech browser. Logical markup can be used to tell the browser to use an appropriate tone of voice or pause at appropriate times.

Also, some search engines use logical markup to figure out what your pages are actually about. Using logical markup properly can help these programs categorize your page correctly and bring you visitors who are interested in your topic.
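The main benefit (change the style once, and every heading changes) can be mimicked with a small Python sketch. Everything here is invented for illustration: the document contents, the property names in the style, and the render function. It is not how HTML or a word processor actually stores styles.

```python
# A document described logically: each part is labelled by its purpose.
document = [
    ("heading", "Clients and Servers"),
    ("paragraph", "A web server answers requests for web pages."),
    ("heading", "Protocols"),
    ("paragraph", "A protocol is the language a client and server share."),
]

# A style: one rule per purpose. The property names are made up for this sketch.
style = {
    "heading": {"weight": "bold", "size": "16pt"},
    "paragraph": {"weight": "normal", "size": "12pt"},
}


def render(document, style):
    """Describe how each part of the document would look under the style."""
    lines = []
    for purpose, text in document:
        rule = style[purpose]
        look = ", ".join(f"{name}={value}" for name, value in rule.items())
        lines.append(f"[{look}] {text}")
    return lines


print("\n".join(render(document, style)))

# Changing one rule restyles every heading; the document itself is untouched.
style["heading"] = {"weight": "normal", "size": "18pt", "align": "centred"}
print("\n".join(render(document, style)))
```

With physical markup, the equivalent change would mean editing each heading in the document by hand.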

Check-Up Questions

• How do you usually create your documents? Using WYSIWYG or markup? Do you use physical or logical formatting?
• Does your word processor have a feature that allows you to see the markup language that underlies your document?
• Do you know how to use any logical formatting features of your word processor (like Word’s “Styles”)?

Topic 2.2 HTML Basics

The documents that we create in this course will be done in HTML (HyperText Markup Language). It’s really a very descriptive name: HTML is a markup language that can be used to describe hypertext pages (documents with links). In particular, HTML is used to describe web pages.

The version of HTML we will be using in this course is XHTML 1.0. XHTML stands for “eXtensible HyperText Markup Language.” For the moment, we will use “HTML” and “XHTML” synonymously.

If you have used HTML before, you might not be familiar with XHTML. It is a newer version of HTML and has a few syntax differences, but it is basically the same. We will discuss the difference between XHTML and the older HTML standards more in Topic 6.5.

If you haven’t written web pages in HTML before, you can think of “HTML” and “XHTML” as the same thing, at least until we get to Topic 6.5.

The markup of an HTML page is done with tags. HTML tags consist of one or a few letters, surrounded by angle brackets (<>). Some tags are:

<b>   bold text
<p>   a paragraph
<h1>  a level 1 (the largest) heading

HTML tags have both an opening and closing version. For the bold text tag, <b> is the opening tag and </b> is the closing tag. The text that should be modified is put between the tags. So, this HTML:

this is <b>bold text</b>.

Will be displayed like this:

this is bold text. (The words “bold text” appear in bold.)

The material between the opening and closing tags is referred to as the tag’s contents. In the example above, the <b> tag has contents “bold text.”
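To see the opening tag, contents, and closing tag as separate pieces, the fragment can be run through Python’s standard html.parser module. The class name TagLister and the labels it records are made up for this sketch, and the fragment assumes that the bold text tag is written as <b>.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record each opening tag, closing tag, and piece of text that is seen."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("opening tag", tag))

    def handle_endtag(self, tag):
        self.events.append(("closing tag", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def list_parts(html):
    """Return the parts of an HTML fragment in the order they appear."""
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.events


for kind, value in list_parts("this is <b>bold text</b>."):
    print(kind, repr(value))
```

The text recorded between the opening and closing b tags is exactly the tag’s contents.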

Some HTML Tags

Here are some common HTML tags:

<h1>. . .</h1>  a level 1 heading; every page should probably have one of these at the top.

<h2>. . .</h2>  the second largest heading, used for sections within the main document.

<h6>. . .</h6>  the smallest heading.

<p>. . .</p>  a paragraph; these tags should be wrapped around each paragraph.

<i>. . .</i>  italic text (visual markup).

<em>. . .</em>  emphasized text, meaning text that should stand out, usually displayed in italics (structural markup).

<html>. . .</html>  the HTML document; should be wrapped around the whole document.

<head>. . .</head>  the header, which holds information about the page. Only a few tags, like <title>, are allowed in the <head>.

<title>. . .</title>  the page’s title, usually displayed in the browser’s title bar; must be in the <head>. . .</head>.

<body>. . .</body>  the body of the page, which contains the material that should be displayed on the web page itself.

You should also have a look at the full list of XHTML tags. There is a link to it in the “References” section of the course web site. We will be using it as our definitive resource for tags and entities.

<html>
<head>
<title>The page’s title</title>
</head>
<body>
<h1>Sample page</h1>
<p>This is a paragraph, with some <b>bold text</b>.</p>
</body>
</html>

Figure 2.1: An XHTML document

Figure 2.2: The display of Figure 2.1 in a browser

A Full XHTML Document

Figure 2.1 is a full XHTML document (although we will be making some additions to it in Topic 2.7). The text you see there is typed into a text editor and saved as a .html file. Its display in Mozilla 1.6 under Windows is shown in Figure 2.2. Remember that other browsers might display this same HTML differently.

XHTML documents could also be saved with a . extension, but that can cause problems for browsers that don’t know about XHTML (Internet Explorer, in particular). Stick with .html for this course.

This Figure can be found in the “Examples” section of the course web page.

More on Closing Tags

In older versions of HTML (before XHTML), it was possible to leave out closing tags in certain circumstances. In XHTML, you can’t omit them: all tags must be closed.

There are a few XHTML tags that aren’t allowed to have any contents. For example, the <br> tag inserts a line break into your page; the next words after <br> appear on the next line. It doesn’t make any sense to have anything between the <br> and the closing tag, </br>. So, the two must always appear right beside each other: <br></br>.

Since empty tags occur frequently in XHTML, there is a short form for the closing tag that can be used for empty tags. These are exactly equivalent in XHTML:

<br></br>
<br />

Finishing the opening tag with the slash indicates that you want to close the tag right away. The online XHTML reference indicates which tags are empty and can use this short form for the closing tag. The reference is described later, in Topic 2.3.

Nesting Tags

The closing tags must be arranged to ensure the contents are properly nested. It is illegal to write something like the following:

<b>bold, <i>italic</b> text</i>

Instead, the tags should look like this:

<b>bold, <i>italic text</i></b>

so that the <i>. . .</i> is entirely within the <b>. . .</b>, not partially overlapping.

While we’re on the subject of nesting tags, note that some tags can only occur inside others. For example, a paragraph, <p>, must be inside the body, <body>. The tag <li>, which stands for “list item,” must go in one of the list structures, for example <ol>, which indicates an ordered list. These tags are used in this way:

<ol>
<li>item 1</li>
<li>item 2</li>
</ol>

That would be rendered in most browsers as:

1. item 1
2. item 2

Again, the online XHTML reference in the “Online References” section of the course web site indicates which tags can contain which others.
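The nesting rule amounts to this: each closing tag must match the most recently opened tag that is still open. A minimal Python sketch of that check is below. It uses the standard html.parser module; the class and function names are made up, the sample fragments assume <b> and <i> for bold and italic, and it is far less thorough than the validation covered in Topic 2.7 (for example, it knows nothing about which tags may contain which others).

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track which tags are open and note any closing tag that is out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # tags opened but not yet closed, oldest first
        self.problems = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # a short-form empty tag opens and closes at once

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"</{tag}> does not match the last opened tag")


def is_properly_nested(html):
    """Return True if every tag is closed, and closed in the right order."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return not checker.problems and not checker.open_tags


print(is_properly_nested("<b>bold, <i>italic text</i></b>"))
print(is_properly_nested("<b>bold, <i>italic</b> text</i>"))
print(is_properly_nested("<ol><li>item 1</li><li>item 2</li></ol>"))
```

The overlapping fragment fails because the bold tag is closed while the italic tag inside it is still open.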

HTML Pitfalls

There are a few more points about HTML that you should be aware of:

• All tags in XHTML must be in lower case. So, <body> is okay, but <BODY> and <Body> aren’t. Older versions of HTML allowed uppercase tags, but XHTML doesn’t.
• The browser ignores spacing in your HTML document. It doesn’t matter if you press return to jump to the next line. All spacing on the finished web page must be done with markup. Any combination of spaces, returns, and tabs in your HTML will be treated the same as a single space by the browser.
• HTML is sent over the Internet to the user’s web browser. As a result, the author doesn’t get to decide what the page looks like: the user’s browser does. Authors shouldn’t assume that the page will look the same for everyone as it does for them. For example, pages might look very different in Mozilla, Internet Explorer, WebTV, and so on. If you use the tags correctly, the page should still get its point across. This is the reason WYSIWYG editors don’t fit well with HTML: HTML is never WYSIWYG.
• If a browser doesn’t know what to do with a particular tag, it ignores it entirely. The contents of the tag will be displayed, without any change in their appearance, which can cause unexpected results in different browsers. Again, if you’ve used the HTML tags as they are intended, your page should still be readable.
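The point about spacing can be imitated with one regular expression in Python. This is only a rough sketch of the effect described above; the function name collapse_whitespace is made up, and real browsers apply additional rules.

```python
import re


def collapse_whitespace(text):
    """Treat any run of spaces, tabs, and returns as a single space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


source = "This   text is\n\n\nspread over\tseveral lines."
print(collapse_whitespace(source))
```

However the words are spaced out in the source, the result reads as one run of text, which is why spacing on the finished page has to come from markup.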

Check-Up Questions

• Type in Figure 2.1 using a text editor or download it from the “Examples” section of the course web site. Save it with the file name first.html, and then open it in your web browser. It should look something like Figure 2.2.
• Make some changes to your file first.html, and click the “Reload” or “Refresh” button in your browser to view the changes. Try some of the other tags mentioned above.

Topic 2.3 More HTML

Attributes

Some tags can be modified with attributes. To modify the way a tag behaves, attributes and their associated values are put in the opening tag, in the general form <tag attribute="value">. (The word “attribute” here should be pronounced with the stress on the first syllable, not on the second as in the verb that is spelled the same way.)

For example, the tag <hr /> is used to put a horizontal rule (line) on a page. By default, the rule goes all the way across the page. Its length can be modified with the width attribute. So, <hr width="50%" /> will draw a line across half of the screen. Note that <hr> is another empty tag, so we can put the closing slash in the opening tag. Note also that the value must be enclosed in quotes.

Also, a tag can have several attributes, in the general form <tag attribute1="value1" attribute2="value2">. The order of the attributes does not matter.

The attributes that each tag can have are described in the online XHTML reference. See the link in the “References” section of the course web site.
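The claim that attribute order does not matter can be checked with a short Python sketch using the standard html.parser module. The class and function names are made up, and the align attribute is used only as an assumed second attribute for the horizontal rule tag; it is not taken from the text above.

```python
from html.parser import HTMLParser


class AttributeReader(HTMLParser):
    """Collect each tag together with a dictionary of its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


def read_tags(html):
    """Return a list of (tag, attributes) for every opening tag found."""
    reader = AttributeReader()
    reader.feed(html)
    reader.close()
    return reader.tags


one = read_tags('<hr width="50%" align="left" />')
two = read_tags('<hr align="left" width="50%" />')
print(one)
print(one == two)   # True: the same attributes, whatever the order
```

Storing the attributes in a dictionary reflects the idea that a tag has a set of named values rather than an ordered list.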

Entities

All HTML tags are enclosed in <>. So, if we want a < on a web page, we can’t just type it into the HTML because the browser will assume that it is indicating the start of a tag. To print this symbol or other special characters, we use entities.

[Figure: an HTML fragment that displays “garçon”, with these parts labelled: opening tag, attribute, value, contents, entity, closing tag]

Figure 2.3: HTML terms to output “garçon”

All entities start with an “&” and end with a “;”. The entity for a less-than sign is &lt;. So, to display “7 < 10” on a web page, we would type 7 &lt; 10 in our HTML document. There are many other entities, for example:

• &gt; produces a “>”
• &amp; produces an “&”
• &quot; produces a “"”
• &aacute; produces an “á”

Figure 2.3 shows the parts of an HTML fragment and how they fit together. In the “Online References” section of the course web site, you can find a list of all of the entities and see how they display in your browser.
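Python’s standard html module can convert in both directions, which gives a quick way to experiment with entities. This sketch only demonstrates the idea; it is not part of the course’s XHTML reference.

```python
import html

# From plain text to HTML: special characters become entities.
print(html.escape("7 < 10"))              # 7 &lt; 10

# From HTML back to the characters a browser would display.
print(html.unescape("7 &lt; 10"))         # 7 < 10
print(html.unescape("&gt; &amp; &quot;"))
print(html.unescape("gar&ccedil;on"))     # garçon
print(html.unescape("&aacute;"))          # á
```

Escaping text this way before placing it in a page keeps a stray < from being mistaken for the start of a tag.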

Comments

It is possible to include comments in HTML. Comments are never displayed by the browser. They are used to place notes for yourself or others in the HTML itself. A comment starts with <!-- and ends with -->:

<!-- This is a comment. -->

Comments can also be used to disable some HTML that you don’t want displayed but don’t want to delete either.

XHTML Reference

There are many more XHTML tags and entities than have been mentioned here, and you should explore them further. You can reach the definitive reference through the “References” section of the course web site. Figure 2.4 shows part of the reference page for one of the list tags.

Figure 2.4: A sample page from the XHTML reference

The Syntax line gives the general usage of the tag. Empty tags will be displayed with the short-form closing tag.

The Attribute Specifications lines give a list of possible attributes for this tag and their values.

The Contents line indicates what may be put inside the tag. In Figure 2.4, we see that “one or more” <li> tags can go in the list.

The Contained in line gives the tags that this tag can be placed in. Figure 2.4 lists the tags that the list can be placed inside.

The text below these lines describes what the tag is for, how it should be used, and its attributes.

As you explore these pages, note that some of the tags and attributes are set in lighter coloured text, like “type” and “compact” in Figure 2.4 (if you can see this subtle difference after the printing and copying process). These tags and attributes are deprecated, which means that using them is discouraged because they will be removed from future versions of XHTML. There are better ways to accomplish the same task, often with style sheets, which are discussed in Unit 4.

There is also a list of the entities on these pages. Some browsers do not support all of the entities, so you should be somewhat conservative about which ones you use. If you can, look at these pages in a few different web browsers to see which entities they display correctly.

Check-Up Questions

• Modify your first.html so you use some entities.
• Have a look at the XHTML reference pages and familiarize yourself with some of the tags that are available. While you’re there, have a look at the entities.
• Modify your first.html so you use two or three tags with attributes.

Topic 2.4 Links in HTML

The links (or hyperlinks) on web pages are the key to the way the web works. Links are usually displayed in a different colour of text and underlined (see Figure 2.5).

In HTML, you use the <a> tag to create a link. The contents (see Figure 2.3) of the <a> tag are the text of the link. The href attribute indicates the URL of the link destination. So, Figure 2.5 would be created with the following HTML:

<a href="http://www.sfu.ca/">This is what a link usually looks like.</a>

In this case, the link would take the user to SFU’s main web page. It is also possible to put some other tags inside a link:

<a href="http://www.sfu.ca/">This is a <em>slightly</em> more interesting link.</a>

Here, the word “slightly” would be emphasized in the link.

Figure 2.5: A hyperlink on a web page

Relative URLs

The value of the href attribute is a URL, which we discussed in Topic 1.3. All of the URLs that we have seen so far are absolute URLs because they start with a scheme. An absolute URL indicates everything about the location of a document that is needed to fetch it.

You can imagine that it would be a lot of work to type the absolute URL for every link on every web page on a web site. It would also be a problem if you wanted to move a web site to another location: you would have to fix every single link’s URL.

Both of these problems are addressed by relative URLs. Instead of typing the full URL every time, relative URLs let you indicate just the changes from the current URL. Relative URLs don’t start with a scheme name (http://) and don’t specify a server either.

The destination of a link with a relative URL depends on the URL of the page it’s on. Suppose we are currently looking at the page http://www.sfu.ca/~somebody/pics/index.html. Assume that the examples of relative URLs below are on that page.

First, drop the filename from the current URL (everything after the last slash), so we have http://www.sfu.ca/~somebody/pics/. Then, the following rules are applied:

• If the relative URL starts with a slash, drop the rest of the path as well: http://www.sfu.ca.
• For every ../ at the start of the relative URL, drop another directory from the end of the URL. So, if there is one, we’d have: http://www.sfu.ca/~somebody/.
• Put the rest of the relative URL on the end of the current URL.

Figure 2.6 contains examples of URLs that could be links on the page http://www.sfu.ca/~somebody/pics/index.html and the URL the user would be taken to if they clicked on each one.

Link URL                 Destination
img.png                  http://www.sfu.ca/~somebody/pics/img.png
../file.html             http://www.sfu.ca/~somebody/file.html
/test.html               http://www.sfu.ca/test.html
../../test.html          http://www.sfu.ca/test.html
dir/img.png              http://www.sfu.ca/~somebody/pics/dir/img.png
http://www.cs.sfu.ca/    http://www.cs.sfu.ca/ (absolute URL)

Figure 2.6: URLs from http://www.sfu.ca/~somebody/pics/index.html
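The rows of Figure 2.6 can be reproduced with the urljoin function from Python’s standard urllib.parse module, which combines the URL of the current page with a link URL. This is offered as a way to experiment, on the assumption that a browser resolves these simple cases the same way; the variable names are made up for the example.

```python
from urllib.parse import urljoin

# The page that the links are on.
current_page = "http://www.sfu.ca/~somebody/pics/index.html"

link_urls = [
    "img.png",
    "../file.html",
    "/test.html",
    "../../test.html",
    "dir/img.png",
    "http://www.cs.sfu.ca/",   # already absolute, so it is left unchanged
]

for link in link_urls:
    print(link, "->", urljoin(current_page, link))
```

Changing current_page to a different location, such as a file on your hard drive, shows how the relative links follow the page when it moves.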

            Whenever possible, you should use relative URLs on your web pages. Then, if you move your page to a different location (say, from your hard drive to your web space), the links will still work. It also takes less typing.

Check-Up Question

▶ Put some links on your first.html page. Try both an absolute and a relative link.

            Topic 2.5 Images in HTML

We will discuss creating and editing images in Unit 3. For the moment, we will just look at how to put an image that has already been created on a web page.

Images are inserted with the <img> tag. This tag has two required attributes: that is, an <img> tag without them is illegal. The first required attribute is src, which is used to indicate the URL where the image can be found. The second is alt, which is used to specify alternate text for the image.

The alternate text is used in several situations. It can be displayed if the image cannot be loaded for some reason (network congestion, bad URL, etc.), if the image hasn’t been downloaded yet, or if the browser does not support images. Browsers that don’t support images are rare, but there are some still in use. For example, the visually impaired, who cannot use a graphical browser, often use a text browser and a speech synthesizer.

The <img> tag is empty, so the closing tag is unnecessary. An image like Figure 2.5 is inserted with a tag of the general form <img src="URL" alt="alternate text" />.

When you are creating alt text, you should try to write text that gives users as much of the meaning of the image as possible. Note that <img> is empty, so you can close it with the short-form slash.

You can specify the size of the image, in pixels, using the height and width attributes. With this information, the browser can display the page before the images are downloaded, since it knows how much space it needs to leave for them. As a result, your page will be displayed faster, especially for people with a slow connection, so specifying the size is a good idea. To do so, add height and width attributes, each with a value in pixels, to the <img> tag.

When an image is inserted in this way, the browser treats it like a character (admittedly, a funny shaped one) in the current paragraph. If you want an image that “floats” along the left or right margin, you should use style sheets (see Unit 4).

Check-Up Question

▶ Put an image on your first.html page. You can download an image from any web site for this (right-click or shift-click on the image and select “Save”). Have a look at the discussion of copyright in the Introduction before you start taking images from outside sources.

            Topic 2.6 More HTML

Tag Types

As we mentioned earlier, some tags cannot go inside others. At first, which ones can go inside which others may seem arbitrary. You will understand the reasoning better if you think about the four classes of tags.

• Top level tags: The top level tags are the ones that define the overall document structure. The ones we’ve discussed are <html>, <head>, and <body>.
• Head tags: The head tags go inside the page’s <head> tag. These tags give information about the page (rather than things to display on the page itself). We’ve seen <title>, and there are a few others, for example, <link> and <meta>.
• Block level tags: Block level tags are placed inside the <body>. The contents of these tags take up some vertical space on the page: that is, they are placed below the previous block level group, not beside it. Some block level tags are <p>, <hr>, <h1>, <blockquote>, and <ol>.
• Inline tags: The inline tags are placed inside the block level tags. They affect the way part of a line looks; they are placed beside the previous text. Some inline tags are <em>, <tt>, <img>, and <a>.

Most of the restrictions on tag placement are the result of these distinctions: block level tags go inside the body; inline tags go inside block level tags.

The lang attribute

The lang attribute can be applied to most HTML tags. It is used to specify the language of the text in the element.

Specifying the language is important for several reasons. Search engines can use the language to categorize your pages properly. It also allows browsers to render the text properly or even to provide an automatic translation for the user.

You can specify the language of almost any tag, but using it with the <html> tag will specify the language for the whole page. You might also want to specify the language of a paragraph (<p>), a quote (<q>), or a long quote (<blockquote>).

Some of the language codes you might use are en (English), fr (French), zh (Chinese), and ja (Japanese). So, to indicate that an entire web page is in English, you should use <html lang="en">; to indicate a paragraph in French, use <p lang="fr">. A full list of language codes can be found in the “Other Links” section of the course web site.

Versions of XHTML after 1.0 will use the attribute xml:lang to specify the language. You should probably use both to be safe (both are valid in XHTML 1.0): <html lang="en" xml:lang="en">

Check-Up Questions

▶ Have a look at the XHTML reference and see if you can categorize the tags as head/block/inline. (Note: there are a few tags that don’t really fit into these categories.)
▶ Have a look at the language codes reference, and find the codes for any other languages you know.

Topic 2.7 Validating HTML

Web browsers should always do their best to display a page, even if everything isn’t exactly correct. For example, most browsers will properly display the following as bold and italics, even though it isn’t correct:

<b><i>improperly nested tags</b></i>

Browsers want to be able to display as many pages as possible. Because browsers let some errors pass, many people creating web pages wrongly assume that every web browser will display the incorrect HTML that they produce, and so they develop bad habits. When you are creating web pages, you should assume that web browsers are very picky about what they display. That will give you the best chance that your pages will work in all browsers.

To help solve this problem, several people have developed HTML validators (or just validators). These programs check your page against the formal definition of HTML and report any errors. Putting your web page through a validator will help you find errors that your browser let pass and discourage bad habits.

For a validator to work, you must tell it what version of HTML you are using (there have been several). To do so, you insert a document type declaration in your HTML file as the first line, even before the <html>. The document type we will be using is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Non-Validating HTML

            Non-Validating HTML

            This is not the way to make a bold paragraph.

            Here are some badly nested tags

            Figure 2.7: Some non-valid XHTML

(This declaration can be on one line or split across two. You can copy and paste it from one of the examples on the web site; you don’t have to type it yourself or memorize it.) This line indicates that the document is written in XHTML version 1.0 as defined by the World Wide Web Consortium (W3C). You should add this line to the start of every web page from now on.

In addition, for XHTML documents, you have to specify a namespace for the document. To do so, you add an attribute to the <html> tag:

<html xmlns="http://www.w3.org/1999/xhtml">

The namespace is new with XHTML; it wasn’t present in older HTML versions. You don’t need to worry about why it’s there; just make sure you include it.

Once the HTML version and namespace are specified on your web page, you can visit one of the HTML validators on the web and give it the URL of your page. Links to validators can be found on the course web site in the “Online Tools” section. The validator will retrieve your web page (either from a web server or from your computer) and check it for errors. See Figure 2.7 for an example of some non-valid HTML and Figure 2.8 for a validator’s output. Figure 2.7 can be found in the “Examples” section under “Study Tools” on the course web site. Links have been added so you can click it and go to the validator’s output. You can find some instructions for working with the validators in Topic B.6.

Figure 2.8: Part of the validation results of Figure 2.7
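A real validator checks a page against the full definition of XHTML, which is far more than we can show here. One kind of error, improperly nested tags, can be caught by any XML parser, because XHTML is an XML language (see Unit 6). The following is only a rough sketch in Python under that assumption; the function name is_well_formed is our own, and passing this check does not mean a page is valid XHTML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise.

    This only checks that tags are properly nested and closed. It does NOT
    check the page against the XHTML 1.0 Strict definition, which is what a
    real validator does.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# The improperly nested example from this topic, wrapped in a paragraph.
bad = "<p><b><i>improperly nested tags</b></i></p>"
# The same text with the closing tags in the right order.
good = "<p><b><i>properly nested tags</i></b></p>"

print(is_well_formed(bad))   # False
print(is_well_formed(good))  # True
```

Unlike a browser, the parser refuses the badly nested version outright. That is the “picky” attitude you should assume when you write your own pages.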

Check-Up Question

▶ Make a copy of your HTML from exercise set 1 and try to validate it. If necessary, try to fix the errors so that it does validate.

            Summary

After doing this unit, you should be comfortable creating valid web pages using a text editor. You should know some of the basic tags and be learning more.

Key Terms

• WYSIWYG
• markup
• structural/logical markup
• visual/physical markup
• style
• HTML
• XHTML
• tag
• opening/closing tags
• attribute
• entity
• link
• absolute and relative URLs
• head-level tag
• block-level tag
• inline tag
• doctype
• valid HTML
• validator

Unit 3 Text and Graphics

Learning Outcomes

• Describe how data is stored in a computer.
• Describe how text is stored and what a character set is.
• Compare the basic types of computer graphics.
• Identify an appropriate format for a given image.

Learning Activities

• Read this unit and do the “Check-Up Questions.”
• Browse through the links for this unit on the course web site.
• Install your graphics program as described in Appendix A.
• Do Assignment 1.

            Topic 3.1 How Computers Store Data

            When computers store information on a hard drive or in memory, the only things that they can store literally are bits. A bit can be either a zero or a one. A computer’s memory is just a string of bits, for example, 001011010100. . . . All data must be represented by a string of ones and zeros when it is stored or transmitted by a computer. Luckily, it is possible to represent a wide variety of information using bits. We will explore some of the more common types of information in this unit.


            Numbers

The first thing that we need to consider is how computers represent numbers. Once we know how to store numbers in a computer, we can represent everything else as groups of numbers and then use the method described here to convert them to bits.

First, let’s look more closely at the way you count with regular numbers: 1, 2, 3, 4, 5. . . . Consider the number 165. We know what each one of the digits in that number means. The 1 is one hundred, 6 is six tens, and 5 is five ones:

$$165 = (1 \times 100) + (6 \times 10) + (5 \times 1).$$

When you go from one place to the next, the value it represents is multiplied by 10. Each digit represents the number of 1s, 10s, 100s, 1000s. . . . The reason the values increase by a factor of 10 is that there are ten possible digits in each place: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. This is called decimal or base 10 arithmetic. (The “dec-” prefix in Latin means 10.)

We can apply the same logic and get a counting system with bits, binary or base 2 (“bin-” means 2). The rightmost bit will be the number of 1s, the next will be the number of 2s, then 4s, 8s, 16s, and so on. Binary values are often written with a little 2 (a subscript), to indicate that they are base 2 values: $101_2$. To convert binary values to decimal, we can do the same thing we did above, substituting 2s for the 10s:

$$\begin{aligned}
10100101_2 &= (1 \times 2^7) + (0 \times 2^6) + (1 \times 2^5) + (0 \times 2^4) \\
&\quad + (0 \times 2^3) + (1 \times 2^2) + (0 \times 2^1) + (1 \times 2^0) \\
&= 128 + 32 + 4 + 1 \\
&= 165.
\end{aligned}$$

So, 10100101 is the base 2 representation of the number 165. We can represent any whole number this way.

Usually, computers look at bits in fixed groups. One common size to look at is an eight bit group, called a byte. A single byte can represent numbers from 0 ($00000000_2$) to 255 ($11111111_2$).

You should be able to convince yourself that if we look at a group of $n$ bits, there are $2^n$ possible values that can be stored in those bits. So, $n$ bits can represent any number from 0 to $2^n - 1$. Other common groupings are of 16 bits (0–65535) and 32 bits (0–4294967295).
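The conversion above can be tried out with a few lines of Python. This is only a sketch of the place-value method described in this topic; the function names binary_to_decimal and largest_value are our own.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to a number using place values.

    The rightmost bit counts 1s, the next counts 2s, then 4s, and so on.
    """
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total


def largest_value(n):
    """Return the largest number that fits in n bits: 2^n - 1."""
    return 2 ** n - 1


print(binary_to_decimal("10100101"))  # 165
print(bin(165))                       # 0b10100101 (Python's built-in conversion)
print(largest_value(8))               # 255
print(largest_value(16))              # 65535
print(largest_value(32))              # 4294967295
```

Python’s built-in int("10100101", 2) does the same conversion; the loop is written out here only so that each step of the place-value sum is visible.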

Prefix        Symbol   Factor
(no prefix)            $2^0 = 1$
kilo-         k        $2^{10} = 1024 \approx 10^3$
mega-         M        $2^{20} = 1048576 \approx 10^6$
giga-         G        $2^{30} = 1073741824 \approx 10^9$
tera-         T        $2^{40} = 1099511627776 \approx 10^{12}$

            Figure 3.1: Prefixes for storage units
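The factors in Figure 3.1 are easy to work with in Python. The sketch below is only an illustration: the dictionary PREFIX_POWERS and the functions to_bytes and to_bits are names we made up, and the 12 megabyte and 60 gigabyte figures are the ones used in this topic.

```python
# Power of 2 for each prefix in Figure 3.1.
PREFIX_POWERS = {"": 0, "kilo": 10, "mega": 20, "giga": 30, "tera": 40}


def to_bytes(amount, prefix):
    """Convert an amount with a storage prefix (e.g. 12, "mega") to bytes."""
    return amount * 2 ** PREFIX_POWERS[prefix]


def to_bits(amount, prefix):
    """Convert an amount with a storage prefix to bits (8 bits per byte)."""
    return to_bytes(amount, prefix) * 8


print(to_bytes(12, "mega"))  # 12582912
print(to_bits(12, "mega"))   # 100663296

# A drive sold as 60 "gigabytes" counted in units of 1000, measured in
# units of 1024 instead.
print(round(60 * 10 ** 9 / 2 ** 30, 2))  # 55.88
```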

There are some tricks used to represent negative numbers and numbers with fractional parts (like 13.25 or -0.675) using bits. We won’t go into those in this course, but you should know that it’s possible.

When measuring storage, the number of bits or bytes quickly becomes large. Figure 3.1 shows the prefixes that are used for storage units and what they mean. For example, “12 megabytes” is

$$12 \times 2^{20} \text{ bytes} = 12582912 \text{ bytes} = 12582912 \times 8 \text{ bits} = 100663296 \text{ bits}.$$

Note that the values in Figure 3.1 are slightly different than the usual meaning of the metric prefixes. One kilometre is exactly 1000 metres, not 1024 metres. When we measure values in computers, the 1024 version of the metric prefixes is usually used.

That statement isn’t entirely true. Hard drive makers, for instance, generally use units of 1000 because people would generally prefer a “60 gigabyte” drive to a “55.88 gigabyte” drive ($60 \times 10^9 \approx 55.88 \times 2^{30}$).

Topic 3.2 Text and Character Sets

Of course, computers can store text (like the text that makes up an HTML page or the text of an essay). In order to do so, we must be able to convert text to bits. We convert the text to numbers and then convert the numbers to bits, as above.

There are different ways to translate between characters and numbers. Such a translation is called a character set. The most common is ASCII (pronounced ask-key, it stands for American Standard Code for Information Interchange). ASCII indicates, for example, that the letter ‘A’ corresponds to the number 65; ‘a’ translates to 97; ‘;’ to 59, and so on. You can find a table of all of the characters in the ASCII character set linked from the course web site.

Every PC uses ASCII, which defines 128 characters (numbered 0 to 127). There are variants of the ASCII code for different regions. There are also several extensions to ASCII that define 256 characters (numbered 0 to 255), which allow accented characters and a few other things missing from the original specification. This is the highest number of characters that can be defined if we limit ourselves to one byte per character.

More and more applications support the Unicode character set. The goal of Unicode was to create a character set that can be used for any language. Unicode allows up to $2^{32}$ characters and includes support for the standard ASCII characters along with characters for writing Chinese, Japanese, Greek, and many other languages and special symbols. The current Unicode standard defines 95221 characters.

Unicode support varies depending on the application and operating system. You can use Unicode characters in XHTML pages. The problem is often in entering them; your keyboard probably doesn’t have a key for ‘ψ’ (lowercase Greek letter psi). If your operating system has support for entering languages with different character sets, you can enter them that way. If not, you can use HTML entities to create the characters. For example, a times symbol, ‘×’, is Unicode character 215. It can be created with the entity &#215;. You can use character entities for any Unicode character in XHTML.
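Python can show the number behind each character, which makes the idea of a character set easy to explore. The sketch below uses the built-in functions ord and chr and the standard html module; the helper numeric_entity is a name we made up for this illustration.

```python
import html


def numeric_entity(character):
    """Return the numeric HTML entity for a single character."""
    return "&#{};".format(ord(character))


# The ASCII examples from this topic.
print(ord("A"), ord("a"), ord(";"))  # 65 97 59

# Unicode character 215 is the times symbol.
print(chr(215))                 # ×
print(numeric_entity("×"))      # &#215;
print(html.unescape("&#215;"))  # ×

# A character that most keyboards don't have a key for.
print(numeric_entity("ψ"))      # &#968;
```

For the ASCII characters, the Unicode numbers are the same as the ASCII ones, which is why ord gives 65 for ‘A’.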

Text Files

When you save a file in a word processor, you really have no way of knowing how that file is converted to bits to be written to the hard disk. The word processor has to encode all of the formatting, text, figures, and everything else that can be included in a word processor document. As a result, word processors have very complicated sets of rules for storing information in files so that it can be opened later, preserving all of the formatting and everything else.

In this course, you have been using a text editor to work with your HTML files. Text editors are very different from word processors. In a text editor, there is no formatting or any other data types, just characters. The reason is that when a text editor opens a file, it takes one byte at a time from the file and converts it to a character in the editor. That means that when you’re using a text editor, you are directly editing the characters corresponding to the bytes stored on disk.

You can actually open up any kind of file in a text editor. If you open up an MS Word document with a text editor, you will see the characters that correspond to the actual bytes stored by Word when it saved the file. (Many text editors, however, won’t actually let you do this, or at least they will warn you that something isn’t right. They are assuming that opening a Word file in a text editor is a stupid thing to do. They’re probably right.)

The term text file generally refers to any file that you can edit with a text editor. This includes HTML files and the CSS and Python we will be writing later in the course. This is the reason you don’t need a special application to edit HTML files. Any text editor can read the characters from the disk and let you change them. What you see in the text editor is exactly what gets transmitted to the web browser when you view the web page.
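To see what “one byte per character” means, we can save a tiny piece of HTML and then read the file back as raw bytes. This is only a sketch: the file name example.html and the text in it are made up for the illustration, and it assumes the file is saved with the ASCII character set.

```python
import os
import tempfile

text = "<p>Hi</p>"

# Save the text the way a text editor would, then read the raw bytes back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.html")
    with open(path, "w", encoding="ascii") as output_file:
        output_file.write(text)
    with open(path, "rb") as input_file:
        raw = input_file.read()

# Each byte is a number from 0 to 255 ...
byte_values = list(raw)
print(byte_values)  # [60, 112, 62, 72, 105, 60, 47, 112, 62]

# ... and each number corresponds to one character.
characters = [chr(value) for value in byte_values]
print("".join(characters))  # <p>Hi</p>
```

There is nothing in the file except the nine characters: no fonts, no formatting, no hidden information.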

            Fonts

Character sets allow the computer to keep track of characters internally by storing them as numbers. In order to display a character on the screen or anywhere else, the computer has to know what that character looks like.

There are, of course, many ways to draw a letter. A font (or typeface) is a collection of drawings, one for each character. Some fonts that you might know of are Times, Arial, Helvetica, and Courier.

A font must contain an image of each character. Each of these is called a glyph, so a font is actually a collection of glyphs.

With large character sets like Unicode, it’s difficult for a single font to represent every possible character. It’s also not very practical. Do you really want to take up the space on your hard drive needed for every font to have a full set of glyphs for Linear B and Ugaritic? Generally, systems that support Unicode will try their best to find a glyph for the characters that are being used, even if they have to look in another font for it.

Modern operating systems generally come with (or you can download) fonts that can fill in some of the common Unicode characters that aren’t in most fonts (like Chinese and Korean characters and mathematical symbols).

            Check-Up Question

▶ Have a look at the “Code Charts” on the Unicode web site (linked from the “Other Links” section of the course web site). If you happen to speak/write a language that is usually hard to represent in a computer, see if it can be written with Unicode.

            Topic 3.3 Graphics and Image Types

The term computer graphics refers to using a computer to create or manipulate any kind of picture, image, or diagram. There are many different ways to create computer graphics, and you should choose a program that fits your needs. We cannot cover them all here, so we will discuss some of the common concepts and terms.

There are two basic ways to store an image in a computer: vector graphics and bitmapped graphics.

You are probably most familiar with bitmapped graphics (sometimes called raster graphics). Bitmapped graphics are used in paint programs. When you use bitmapped graphics, the image is stored in an array of dots or pixels. Each pixel in the grid is assigned a colour. If there are enough pixels, the image will closely resemble the intended image.

The other basic image type is vector graphics. Vector graphics are used in drawing programs. A vector format stores a description of the image, using various shapes like circles, lines, curves, and so on. For example, a vector image could contain the information “a green line from here to here, a red filled circle with centre here and with this radius.”

Some of the bitmap graphics programs you might know are Photoshop, Photo-Paint, Paint Shop Pro, GIMP, and Graphic Converter. Since most web graphics are bitmap graphics, these are the programs you might want to use for this course.

Some vector programs are Corel Draw, Adobe Illustrator, and Acrobat. Most vector graphics programs have the ability to include bitmaps in vector images, but it doesn’t make them paint programs: the bitmaps are just one type of object that can be inserted into the vector image.

Check-Up Questions

▶ What graphics programs do you have on your computer? What have you used in the past? Do those programs use bitmap or vector images?
▶ Open up the bitmap graphics program you’re using for this course (probably the GIMP; you can find instructions in Appendices A and B). Open a new image and draw something.

            Topic 3.4 Bitmap vs. Vector Images

            Bitmap and vector images each have strengths and weaknesses when it comes to storing various kinds of images and to being manipulated in certain ways. Because of the way they store their images, they are also edited differently.

Bitmap Images

• are very flexible: they can represent any image
• are created by scanners, digital cameras, and similar devices
• are used very commonly (on the web, etc.)
• can be displayed on the screen directly if one image pixel is to be the same size as one screen pixel
• take a lot of memory, since the colour of each pixel must be stored (they can be compressed when saved)

Vector Images

• let you change parts of the image easily since they are stored as separate shapes
• can be rotated or have their size, colour, line width, and so on changed smoothly (see Figure 3.2)
• are limited to shapes the program was designed to handle
• usually take less memory and disk space than a bitmap
• must be converted to a bitmap for display.

            Figure 3.2: Scaling (a) a vector image and (b) a bitmapped image
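The difference shown in Figure 3.2 comes from how the two kinds of image are stored. The following rough sketch in Python stores a vector image as a list of shape descriptions and a bitmap as a grid of pixel values. The data layout and the functions scale_vector and scale_bitmap are assumptions made only for this illustration; real graphics programs are much more sophisticated.

```python
def scale_vector(shapes, factor):
    """Scale a vector image by multiplying every number in each shape."""
    scaled = []
    for shape in shapes:
        new_shape = {}
        for key, value in shape.items():
            if isinstance(value, (int, float)):
                new_shape[key] = value * factor  # positions and lengths
            else:
                new_shape[key] = value           # kind, colour, etc.
        scaled.append(new_shape)
    return scaled


def scale_bitmap(pixels, factor):
    """Scale a bitmap by a whole-number factor by repeating each pixel."""
    scaled = []
    for row in pixels:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        for _ in range(factor):
            scaled.append(list(wide_row))
    return scaled


# "A red filled circle with centre here and with this radius."
drawing = [{"kind": "circle", "colour": "red", "cx": 10, "cy": 10, "radius": 5}]
print(scale_vector(drawing, 2))

# A tiny 2 x 2 bitmap: 0 is black, 1 is white.
bitmap = [[0, 1],
          [1, 0]]
for row in scale_bitmap(bitmap, 2):
    print(row)
```

The scaled circle is still a perfect circle, just described with bigger numbers. The scaled bitmap has no new detail: each pixel has simply become a larger square block, which is why enlarged bitmaps look jagged.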

            Check-Up Questions

▶ If you have created any computer graphics in the past, did you use a bitmap or vector program? Looking at these strengths and weaknesses, do you think you made the right choice?

            Topic 3.5 File Formats

There are many ways to store images. Each is called a file format. Once we have decided to use a bitmap or vector format, the program we’re using must still be told how to store and represent the information we need on the disk. Since there are many choices to make here, many different image formats have been created.

Each file format has a different way of converting the image information into a string of bits that can be stored on a disk or transmitted over the Internet. We won’t discuss the details of how each format works. The process is far too complicated for this course. We will just discuss some of the capabilities of common formats.

The differences in file formats are the reason you can’t take a GIF file, rename it .jpg , and expect things to work. To convert a file from one type to another, a program must convert the image data from one type to the other.

The program Graphic Converter for the Macintosh can handle 145 different graphics formats. Some have more strengths than others and are used more often.

Some common bitmap formats are GIF (graphics interchange format), JPEG (joint photographic experts group), PNG (portable network graphic), BMP (Windows bitmap), and TIFF (tagged image file format); most bitmap programs can read and write all of these examples.

Some common vector formats are SVG (scalable vector graphics), EPS (encapsulated postscript), CMX (Corel meta exchange), PICT (Macintosh Picture), and WMF (Windows metafile). Most vector editing programs can read and write most of these types.

Since there are so many formats to choose from, it can be difficult to decide on one. Here are some things to keep in mind:

• type: does the format store the type of image you want to use? (bitmap or vector)
• portability: can others use images in this format?
• colour depth: does the format store the number of colours you need?
• compression: compression can make a smaller file, at the cost of the compression/decompression time.
• transparency: do you need some parts of the image to be clear?

The first two of these considerations should be self-explanatory. The other three need some explanation.

Colour Depth

Image formats can only store so many different colour values, depending on the number of bits assigned to each pixel. The format must store the level of each of the three primary colours (or components) in order to specify the colour (we will explore this aspect more in Unit 4). The bit depth indicates the number of colours that can be used in the image.

A 24-bit image can indicate any one of $2^{24}$ colours for each pixel. Each of the three components is stored in 8 bits, so it can take any one of $2^8 = 256$ levels. This is usually enough colours because most people cannot distinguish between two colours that differ only by one unit. Thus, full-colour photos look good when stored in 24-bit colour.

Similarly, 15-bit colour can have any one of $2^{15}$ colours, with $2^5 = 32$ possible values for each component in each pixel. It’s sometimes also referred to as 16-bit colour. Full colour images still look good in 15-bit colour, but you can usually tell the difference between the same image in 15- and 24-bit.

Formats with fewer colours are usually paletted or indexed images. Here, some colours are chosen to be in the image’s palette. The colours in the palette are the only ones that can be used in the image. For each pixel, we only need to store a number that refers to a position in the palette. For example, an 8-bit image has a palette of $2^8 = 256$ colours. An 8-bit paletted image can use up to 256 colours chosen from the $2^{24}$ possible 24-bit colours. A 1-bit image can have $2^1 = 2$ colours, usually just black and white.

Using an image format with more colours gives you more flexibility, but it usually also means a bigger image file because more information must be stored for each pixel. It is often desirable to work in 24-bit colour and then convert to a lower bit depth to create a smaller file for storage or transmission.

If you try to store an image in a file that can’t hold as many colours as the image has, some decisions must be made. Usually, the graphics program will choose a set of colours that closely matches the ones in your image. For colours that aren’t found in the palette, the program can either pick the closest colour or do dithering.

Dithering is the process of creating a pattern of pixels in two palette colours that fools the eye into seeing a colour that is between the two. See Figure 3.3 for an example, using black and white pixels to make a gray square. If you hold the page back, the dithered square on the right should look gray. If everything has gone well in the printing process, it should be about the same colour as the pure gray square on the left. If you look very closely at the pure gray square, you can probably see some fine dithering in it as well; this dithering is done during the printing process since printers and copiers only have black ink to work with.
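The ideas in this section can be sketched in a few lines of Python. Everything here is a simplified illustration under our own assumptions: colours are (red, green, blue) triples with levels from 0 to 255, “closest” is measured by a simple squared distance, and the dithering pattern is a plain checkerboard. Real graphics programs use more careful methods, and all of the function names are made up for this example.

```python
def colours_for_depth(bits):
    """Number of different colours available at a given bit depth."""
    return 2 ** bits


def closest_palette_colour(colour, palette):
    """Pick the palette colour nearest to the given (r, g, b) colour."""
    def distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(colour, candidate))
    return min(palette, key=distance)


def checkerboard(width, height):
    """A black (0) and white (255) checkerboard, like the dithered square."""
    return [[255 if (x + y) % 2 == 0 else 0 for x in range(width)]
            for y in range(height)]


def average_level(pixels):
    """The average gray level of a grid of pixels."""
    values = [value for row in pixels for value in row]
    return sum(values) / len(values)


print(colours_for_depth(24))  # 16777216
print(colours_for_depth(15))  # 32768
print(colours_for_depth(8))   # 256
print(colours_for_depth(1))   # 2

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]
print(closest_palette_colour((200, 30, 30), palette))  # (255, 0, 0)

print(average_level(checkerboard(4, 4)))  # 127.5
```

The last line shows why dithering works: a checkerboard of pure black and pure white pixels averages out to a level halfway between them, which the eye sees as gray from a distance.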

            Compression

It is also possible to create a smaller file with compression. Without compression, bitmap images can be very large. For example, a 640 × 480 pixel image with 24-bit colour depth would take

$$640 \times 480 \times 24 \text{ bits} = 7372800 \text{ bits} = 900 \text{ kB}.$$

Images this large would take a long time to transfer over the Internet. This is the reason the uncompressed Windows BMP format is not used on the web.

Figure 3.3: Colour dithering

There are many different ways to compress bitmap image data. The various methods fall into two categories: lossless compression and lossy compression.

Most image compression techniques are lossless. When data is compressed and then uncompressed with a lossless algorithm, it is exactly the same.

When an image is compressed with a lossy algorithm and then uncompressed, it may be slightly different. This technique is often acceptable for images, particularly if the changes are so slight that they can’t be seen with the naked eye.

The JPEG format was created to store photographs at a high quality in very small files. It uses a lossy compression format. JPEG images might change some small details, but the changes can’t usually be seen, except at the lowest quality settings.

See Figure 3.4 for an example of a small image (a) that has been compressed and uncompressed with a low-quality lossy format (b). Note that this isn’t a full-colour image; that’s why the JPEG compression has done such a bad job with it. It wasn’t made for that job.

The GIF format uses the LZW compression algorithm, and the Unisys corporation has a patent on this method. The PNG format was created so there would be a free lossless format available for use on the web. In this course, we recommend that you use PNGs so that you do not have to worry about misusing this patented format.
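The difference between lossless and lossy compression can be tried out in Python. The sketch below is only an illustration under several assumptions: it uses the standard zlib module as a stand-in for an image format’s lossless algorithm, the pixel values are made up, and the function lossy_round is a toy way of throwing away detail. It is not how JPEG actually works.

```python
import zlib


def uncompressed_size_bits(width, height, bits_per_pixel):
    """Size of an uncompressed bitmap, in bits."""
    return width * height * bits_per_pixel


def lossy_round(data, step=16):
    """Toy lossy step: round every value down to a multiple of step."""
    return bytes((value // step) * step for value in data)


bits = uncompressed_size_bits(640, 480, 24)
print(bits)             # 7372800
print(bits / 8 / 1024)  # 900.0 (kilobytes)

# Made-up pixel data with long runs of the same value.
pixels = bytes([200] * 500 + [37] * 500)

# Lossless: what comes back is exactly what went in.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
print(len(pixels), len(packed))
print(restored == pixels)  # True

# Lossy: small differences between values are thrown away for good.
noisy = bytes([200, 201, 199, 202, 37, 38, 36, 35])
approximate = lossy_round(noisy)
print(list(approximate))   # [192, 192, 192, 192, 32, 32, 32, 32]
```

After the lossy step, there is no way to tell whether a value of 192 was originally 199 or 202. That is the trade: the simplified data compresses much better, but the original detail is gone.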

            Figure 3.4: An image with a low-quality lossy compression

            Transparency

            Another feature that is important for some images is transparency. Some image formats can indicate that part of the image is to be transparent so whatever is behind it will show through. This technique is often used on web pages so that images appear to be shapes other than rectangles—the image is still a rectangle, but the background shows through some of it, making it seem otherwise.

Most image formats don’t support transparency, so all images appear rectangular, as in Figure 3.5(a).

            Some formats, GIF in particular, support simple transparency. Some of the pixels in the image are marked as transparent, so the background is visible through those parts of the image. This format can be used to make an image appear any shape; an example can be seen in Figure 3.5(b).

Finally, some image formats support a more general method of transparency called an alpha channel. The image has the image information and a mask, called the alpha channel, that indicates how transparent each part of the image is.

Formats and programs that support alpha channel transparency can do partial transparency, as seen in Figure 3.5(c). The PNG format supports full transparency, as do graphics programs like Photoshop and GIMP. Some web browsers that support PNG graphics can do full transparency; some can’t and so they ignore any advanced transparency information in the image.
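One common way to think about an alpha channel is as a number between 0 (completely transparent) and 1 (completely opaque) for each pixel, which is used to mix the image’s colour with whatever is behind it. The sketch below shows that mixing for a single pixel. The function name blend and the way the alpha value is stored are our own assumptions for the illustration; file formats and programs store and apply alpha in their own ways.

```python
def blend(foreground, background, alpha):
    """Mix a foreground pixel with the background behind it.

    foreground, background: (red, green, blue) colours, levels 0 to 255.
    alpha: 0.0 is fully transparent, 1.0 is fully opaque.
    """
    return tuple(round(alpha * f + (1 - alpha) * b)
                 for f, b in zip(foreground, background))


red = (255, 0, 0)
white = (255, 255, 255)

print(blend(red, white, 1.0))   # (255, 0, 0): the image hides the background
print(blend(red, white, 0.0))   # (255, 255, 255): only the background shows
print(blend(red, white, 0.25))  # (255, 191, 191): mostly background
```

Simple transparency, like GIF’s, corresponds to allowing only the values 0 and 1: each pixel is either there or not. An alpha channel allows everything in between.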

            Figure 3.5: Various types of transparency in images

            Topic 3.6 Common File Formats

            Since some image formats are used more commonly than others, we will describe some of them and what they are usually used for. These formats are the ones common on the World Wide Web:

            • JPEG: JPEG (Joint Photographic Experts Group, pronounced jay-peg) is a bitmap format with lossy compression that is intended for either 24-bit colour or 8-bit gray-scale photographs: the kind of thing that would come out of a digital camera or from scanning a photograph. It does an excellent job of storing this kind of information but a poor job with other kinds of images. Figure 3.4 is an example of what happens when an image that isn’t a photograph is compressed with JPEG.

            • GIF: The GIF (pronounced jiff) bitmap format uses a lossless compression algorithm. GIF is well supported on the web and can be used for simple animations and simple transparency. It is, however, limited to 256 colours in an image (8-bit colour). It is also burdened by a patent on its compression algorithm.

            • PNG: The PNG (pronounced ping) bitmap format was created as a free replacement for GIF. It uses a better and free compression algorithm. It can also support up to 24-bit colour and full transparency. The PNG format came along after GIF, so it wasn’t supported in some older browsers. All recent browsers support PNG, so you should be able to use it safely.
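The colour depths mentioned in the list above translate directly into numbers of colours and file sizes. The following Python sketch works through the arithmetic; the function names and the 640 by 480 image size are assumptions chosen for illustration.

```python
def colours_for_depth(bits):
    """Number of distinct colours a pixel can take at a given colour depth."""
    return 2 ** bits


def uncompressed_bytes(width, height, bits):
    """Size of the raw pixel data, before any compression is applied."""
    return width * height * bits // 8


gif_colours = colours_for_depth(8)    # 8-bit colour, the GIF limit
full_colours = colours_for_depth(24)  # 24-bit colour
raw_8bit = uncompressed_bytes(640, 480, 8)
raw_24bit = uncompressed_bytes(640, 480, 24)

print(gif_colours, full_colours)
print(raw_8bit, raw_24bit)
```

A 24-bit image needs three times the raw storage of an 8-bit image of the same dimensions, which is part of why compression matters for photographs.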

            The following formats, while not common on the web, are often used in other types of digital multimedia:

            • BMP: The Windows BMP format is a bitmap format. It is usually uncompressed (for speed), but it can use a simple compression algorithm.

            • TIFF: TIFF (Tagged Image File Format) is a common bitmap format among people who do publishing and graphic design. It can be compressed in several different ways and can hold colour depths up to 24 bits.

            • EPS: EPS (Encapsulated PostScript) is a vector format. It is quite widely supported and can be used in most drawing programs.

            • SVG: SVG (Scalable Vector Graphics) is a vector format that was created by the WWW Consortium in the hopes of making vector graphics possible on web pages. SVG is discussed further in Topic 6.2.

            Most computer graphics programs also have a file format of their own, usually called a native format. For example, Photoshop uses PSD files, the GIMP uses XCF files, Corel Draw uses CDR, and so on. These file formats use either no compression or a lossless compression scheme. They are designed so that they can store any of the information that particular program can produce. So, you should be able to save these files and open them back up without losing any information. That is a good reason to use them while you are working on an image. When you’re ready to publish it or pass it along to others, you should probably convert it into one of the more universal formats described above.

            Check-Up Questions

            • If you have created any computer graphics in the past, what file format(s) did you use? Do you think you made the right choice(s)?

            • What is the native format that your image editing program uses (if any)?

            Summary

            This unit isn’t intended to teach you everything about image editing or even how to use a particular program. You should know some of the basic terms and ideas behind computer graphics. With this background knowledge, you can easily learn more about graphics and how to use a graphics program for your assignments.

            Key Terms

            • bit
            • byte
            • binary
            • character set
            • ASCII
            • Unicode
            • text file
            • font
            • glyph
            • bitmap graphics
            • vector graphics
            • colour depth
            • paletted or indexed colour
            • dithering
            • compression
            • lossless compression
            • lossy compression
            • transparency
            • alpha channel

            Unit 4 Cascading Style Sheets

            Learning Outcomes

            • Create and apply style sheets.
            • Specify colours for HTML and style sheets.
            • List some basic CSS properties.
            • Use some more advanced HTML on web pages.
            • Design web pages with valid, logical XHTML and create a visual style with CSS.

            Learning Activities

            • Read this unit and do the “Check-Up Questions.”
            • Browse through the links for this unit on the course web site.
            • Browse the CSS reference pages in “Online Tools.”
            • Have a look at the RGB colour chart.
            • (optional) Review Designing with Web Standards, Chapters 1 to 3, 5 to 7, and 9.
            • Do Exercise 2.

            Topic 4.1 CSS

            In Topic 2.1, we discussed logical markup and how its appearance is determined with styles. The styles used in HTML are called cascading style sheets or CSS.

            Style sheets were added to HTML after it was initially created. Before that, you had to accept the default style of the web browser. CSS is now quite well supported by the major browsers (at least CSS version 1 is). Netscape 6 and later, and Mozilla have near-perfect support for CSS features, which is why they are recommended for this course. has good support for CSS1.

            Each browser gives a default style to each tag, which it uses unless you override it. That’s why you didn’t have to create a style sheet in order to view HTML you created.

            A style sheet is usually placed in a separate file from your HTML document so that many HTML pages can reference it. That way, if you want to change the appearance of an entire site, you only need to change the style sheet, not each individual file.

            A style can be applied to any tag. Applying style rules to parts of your HTML page will allow you to change the appearance of your structural markup.

            Writing Style Sheets

            A style sheet is a collection of rules that make changes to the browser’s default style. There are three parts of each rule: a selector, some properties, and values. Those parts are put together like this:

            selector {
              property: value;
              property: value;
            }

            There can be any number of property: value; parts in each rule. As in HTML, spacing in style sheets doesn’t matter: the lines above could have been written on a single line and the effect would be the same.

            If you’re putting your style sheet into a separate file, you should write only your rules in a text editor and save them in a file whose name ends with “.css”. This extension tells the web server that it should be sent with the right MIME type, text/css.

            h1 {
              text-align: center;
            }
            code {
              font-family: sans-serif;
              font-weight: bold;
            }

            Figure 4.1: A cascading style sheet

            The selector specifies what the rule will be modifying. The selector can be the name of a tag, like h1 or hr. These selectors will make the rule change the appearance of all <h1> or <hr> tags.

            The property indicates what is to be changed. For example, color is used to change the colour of whatever is selected, and font-size is used to change the size of the text.

            Finally, the value indicates what the property should be changed to. Each property has a different set of possible values. For color, we could use the value red and for font-size we could use large. A full listing of the CSS properties and their values can be found on the CSS reference pages, which are linked from the course web site in the “Online References” section.

            Figure 4.1 shows a sample style sheet. This CSS will cause all <h1> lines to be centred and all <code> text to be displayed in a bold sans-serif font.
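To make the three parts of a rule concrete, here is a minimal Python sketch that assembles the style sheet from Figure 4.1 out of selectors, properties, and values. The function name format_rule and the way the rules are stored are assumptions made for illustration; in practice you would simply type the CSS into a text editor.

```python
def format_rule(selector, properties):
    """Return one CSS rule as text: selector { property: value; ... }"""
    lines = [selector + " {"]
    for prop, value in properties.items():
        lines.append("  " + prop + ": " + value + ";")
    lines.append("}")
    return "\n".join(lines)


# Each entry pairs a selector with its property/value pairs.
rules = [
    ("h1", {"text-align": "center"}),
    ("code", {"font-family": "sans-serif", "font-weight": "bold"}),
]

style_sheet = "\n".join(format_rule(selector, props) for selector, props in rules)
print(style_sheet)
```

Each selector appears once, followed by braces that hold as many property: value; pairs as the rule needs.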

            Applying a Style Sheet

            There are several ways of applying a style sheet to a web page. If the CSS is in a separate file, which we suggest for this course, you can add this HTML inside the <head> of your page:

            <link rel="stylesheet" href="style.css" type="text/css" />

            This line of code assumes that the style sheet is called style.css and is in the same directory as the HTML file. The <link> tag is used to reference a file that is related to this HTML page. The rel attribute gives the type of relation; in this case, the relation is “a style sheet for this page.” The href attribute is the URL where the file can be found; it can be a relative link (as in the example) or an absolute link. Finally, the type attribute gives the MIME type of the file, since not all web servers send the right one for CSS files.

            It’s also possible to indicate an “alternate” stylesheet that the user could select if they don’t like your default. In Mozilla, alternate stylesheets can be selected from “Use Style” in the “View” menu.

            CSS rules can also be inserted into an HTML document with the