Computing Science 165-3 Study Guide •
Introduction to Multimedia and the Internet
by Greg Baker
Faculty of Applied Sciences Centre for Distance Education Continuing Studies • c Simon Fraser University, Summer 2004 ° 2 Contents
I Introduction 11
Course Introduction 13 Learning Resources ...... 13 Requirements ...... 17 Evaluation ...... 18 Schedule ...... 21 Getting through CMPT 165 ...... 21 About the Author ...... 23
II The Web and Web Pages 25
1 The World Wide Web 27 1.1 Basics of the Internet ...... 27 1.2 Protocols ...... 31 1.3 How Web Pages Travel ...... 32 1.4 MIME Types ...... 34 1.5 Fetching a Web Page ...... 36 Summary ...... 37
2 Markup and HTML 39 2.1 Describing Documents ...... 39 2.2 HTML Basics ...... 42 2.3 More HTML ...... 47 2.4 Links in HTML ...... 50 2.5 Images in HTML ...... 52
3 4 CONTENTS
2.6 More HTML ...... 53 2.7 Validating HTML ...... 55 Summary ...... 57
3 Text and Graphics 59 3.1 How Computers Store Data ...... 59 3.2 Text and Character Sets ...... 61 3.3 Graphics and Image Types ...... 64 3.4 Bitmap vs. Vector Images ...... 65 3.5 File Formats ...... 66 3.6 File Formats, Common ...... 71 Summary ...... 73
4 Cascading Style Sheets 75 4.1 CSS ...... 76 4.2 Classes and IDs ...... 78 4.3 CSS Properties ...... 82 4.4 Specifying Colours ...... 83 4.5 CSS example ...... 86 4.6 Logical versus Physical ...... 86 4.7 Why Logical HTML and CSS? ...... 90 Summary ...... 91
5 Design 93 5.1 General Design ...... 93 5.2 Design Principles and HTML/CSS ...... 98 5.3 Web Page Design ...... 100 5.4 Usability ...... 102 5.5 Web Site Design ...... 105 Summary ...... 106 CONTENTS 5
6 XML 109 6.1 What is XML? ...... 109 6.2 Some XML Languages ...... 111 6.3 Styling XML ...... 113 6.4 Validating XML ...... 115 6.5 XHTML and HTML ...... 116 Summary ...... 117
III Internet Programming 119
7 Programming Introduction 121 7.1 What is Programming? ...... 122 7.2 Starting with Python ...... 123 7.3 Example Program ...... 124 7.4 Expressions and Variables ...... 127 7.5 User Input ...... 129 7.6 Types ...... 130 7.7 Conditionals ...... 133 7.8 Python Libraries ...... 136 7.9 Debugging ...... 137 Summary ...... 141
8 Web Programming 143 8.1 Making Web Pages with Python ...... 144 8.2 HTML Forms ...... 147 8.3 CGI ...... 149 8.4 Debugging CGI Scripts ...... 150 Summary ...... 152
9 More Programming 153 9.1 Functions ...... 154 9.2 Local Variables ...... 156 9.3 Iteration ...... 158 9.4 Lists ...... 162 9.5 Handling Errors ...... 165 9.6 Solving Problems 1: Making Change ...... 167 9.7 Solving Problems 2: Displaying HTML Source ...... 171 6 CONTENTS
9.8 Coding Style ...... 176 Summary ...... 177
10 More Web Programming 179 10.1 Security Basics ...... 179 10.2 Cookies ...... 181 10.3 Performance ...... 182 Summary ...... 183
11 Internet Internals 185 11.1 HTTP ...... 185 11.2 HTTP Tricks ...... 188 11.3 DNS ...... 192 11.4 Security and Encryption ...... 193 11.5 URLs ...... 195 Summary ...... 197
IV Appendices 199
A Technical Instructions 201 A.1 Installing Software ...... 202 A.2 SFU Computing Account ...... 204 A.3 CMPT 165 Server Account ...... 204 A.4 Creating Web Pages ...... 205 A.5 Transferring Web Pages ...... 205
B Software 207 B.1 Mozilla ...... 208 B.2 TextPad ...... 208 B.3 The GIMP ...... 208 B.4 Python ...... 213 B.5 Secure File Transfer ...... 213 B.6 Validators ...... 214 List of Figures
1 CMPT 165 course schedule ...... 22
1.1 How information might get from the SFU web server to a home computer ...... 28 1.2 The conversation that a web client and web server might have when you view a web page ...... 30 1.3 The parts of a simple URL ...... 34 1.4 Some example MIME types ...... 35
2.1 An XHTML document ...... 44 2.2 The display of Figure 2.1 in a browser ...... 44 2.3 HTML terms to output “gar¸con” ...... 48 2.4 A sample page from the XHTML reference ...... 49 2.5 A hyperlink on a web page ...... 51 2.6 URLs from http://www.sfu.ca/∼somebody/pics/index.html . . 52 2.7 Some non-valid XHTML ...... 56 2.8 Part of the validation results of Figure 2.7 ...... 57
3.1 Prefixes for storage units ...... 61 3.2 Scaling (a) a vector image and (b) a bitmapped image . . . . 66 3.3 Colour dithering ...... 69 3.4 An image with a low-quality lossy compression ...... 70 3.5 Various types of transparency in images ...... 71
4.1 A cascading style sheet ...... 77 4.2 HTML source of a page that references a style sheet . . . . . 87 4.3 The style sheet, style.css, referenced by Figure 4.2 ...... 87 4.4 The display of Figure 4.2, with the style sheet ...... 88
7 8 LIST OF FIGURES
4.5 The display of Figure 4.2, without the style sheet ...... 88 4.6 Two possible appearances for contents of or . 89
5.1 A design illustrating proximity ...... 95 5.2 A design illustrating alignment ...... 96 5.3 A design illustrating repetition ...... 97 5.4 A design illustrating contrast ...... 99
6.1 A recipe markup up in XML ...... 110 6.2 A sample SVG file that represents a happy face ...... 112 6.3 The display of Figure 6.2 ...... 112 6.4 Part of the style for Figure 6.1 ...... 114 6.5 Part of the display of Figure 6.1 after the application of CSS from Figure 6.4 ...... 114
7.1 An example Python program ...... 125 7.2 Three sample executions of the program in Figure 7.1 . . . . 126 7.3 A Python program that gets input from the user ...... 129 7.4 A program with type conversion ...... 132 7.5 A program with an if block ...... 134 7.6 Guessing game: testing random ...... 138 7.7 Guessing game: testing user input ...... 139 7.8 Guessing game: checking their guess ...... 139 7.9 Guessing game: checking greater or less ...... 140
8.1 A program that generates a simple web page ...... 144 8.2 A web script that does a little more ...... 145 8.3 Sample output from Figure 8.2 ...... 145 8.4 Sample display of Figure 8.2 in a browser ...... 146 8.5 Figure 8.1 rewritten with triple-quoted strings ...... 146 8.6 The body of an HTML file with a form ...... 147 8.7 The display of Figure 8.6 in a browser ...... 148 8.8 An HTML form for simple user input ...... 149 8.9 A web script that uses the input from Figure 8.8 ...... 150
9.1 A program with a function defined ...... 155 9.2 A program that takes advantage of local variables ...... 157 9.3 Using a for loop ...... 158 9.4 Using a while loop ...... 159 LIST OF FIGURES 9
9.5 A version of the guessing game using a while loop ...... 160 9.6 A version of the guessing game using a for loop ...... 161 9.7 Using a list to store some integers ...... 164 9.8 Catching an exception ...... 165 9.9 Asking until we get correct input ...... 166 9.10 Making change: first attempt ...... 169 9.11 Making change: fixed multiple coins ...... 170 9.12 Making change: details filled in ...... 170 9.13 Displaying source: basic CGI skeleton ...... 172 9.14 Displaying source: using urllib ...... 173 9.15 Displaying source: inserting entities ...... 174 9.16 Sample display of Figure 9.15 in a browser ...... 175 9.17 Displaying source: fixed entity replacement ...... 175
B.1 The main toolbox in the GIMP ...... 209 B.2 The GIMP’s right-click menu with the option for saving the current file selected ...... 210 B.3 The main tool buttons in the GIMP ...... 211 10 LIST OF FIGURES Part I Introduction
11
Course Introduction
Welcome to CMPT 165, Multimedia and the Internet. This course is an introduction to the internet and WWW for non-computer science majors. It isn’t intended to teach you how to use your computer, starting with “Turn it on.” You are expected to have a basic knowledge of how to use your com- puter. You should be comfortable using it for simple tasks such as running programs, finding and opening files, and so forth. You should also be able to use the Internet. Here are some of the goals set out for students in this course. By the end of the course, you should be able to Explain some of the underlying technologies of multimedia and the • Internet. Create well-designed web sites using modern web technologies that can • be viewed in any web browser. Design computer programs to complete a specific task. • Use your programming skills to create dynamically generated web sites. • Begin to combine these skills to develop full-featured web sites. • Keep these goals in mind as you progress through the course.
Learning Resources
Study Guide The Study Guide is intended to guide you through this course. It will help you learn the content and determine what information to focus on in the texts.
13 14
The readings for each section are listed at the beginning of the units. You should also look at the key terms listed at the end of each unit. In some places, there are references to other sections. A reference to “Topic 3.4,” for example, means Topic 4 in Unit 3. In the Study Guide, side-notes are indicated with this symbol. They are meant to replace the asides that usually happen in lec- tures when students ask questions. They aren’t strictly part of the “course.”
Required Texts The only required text for this course is How to Think Like a Computer Scientist: Learning With Python. It will be included in your course package and will arrive with your study guide. This book is a nice introduction to Python programming, which we will use in the second half of the course. The title of the book is a little misleading. The book does not discuss computer science; it only covers computer programming.
Recommended Texts There are several other texts recommended for this course. Whether or not you buy or read these is up to you—you certainly aren’t expected to buy all of them. They are all available in the reserve section of the Library—you can look at them there if you’re not sure if you want to buy them. The recommended texts provide more information than we can put in the Study Guide. If you are particularly interested in specific topics, you might consider buying the corresponding recommended texts. The Internet Book by Douglas E. Comer provides more background on the internals of the Internet, which are explored in Units 1 and 11. Jeffrey Zeldman’s Designing with Web Standards covers design of web pages with XHTML and CSS which are discussed in Unit 2 and Unit 4. This is an excellent book for those who are really interested in web page design. It is one of the few (at least one of the first) books that discuss web page design with techniques made possible by newer web browsers. The Non-Designer’s Design Book by Robin Williams (no, not that Robin Williams) is recommended for the material in Unit 5, which is also relevant to Assignment 2. Either the first or second edition of this book is acceptable. 15
If you aren’t near campus, you can use the library’s Telebook service. Telebook services will mail you books, journal articles, and other library material required to complete course assignments. For more information about how to use Telebook, see http://www.lib.sfu.ca/kiosk/other/telebook. htm . Also, see the Telebook material in the red folder.
Web Materials The address for the web materials for this course can be found in your Red Folder. To access some parts of the site, you will need to log in to WebCT using your SFU Unix ID. Use the same username and password you use for Webmail and my.sfu.ca . Open University students should check their course package for instructions on how to get a WebCT account. If you feel that you need more information about WebCT, look at the material in your course package. In addition, the SFU Library offers intro- ductory sessions in using WebCT that are available to students in any course that uses WebCT. The course web site has several main sections: Administrative Information • In this section, you will find information about the course, the course supervisor, and your tutor markers. You will also find information on due dates and grading. References • Here, you find information about the various references for this course, including links to the online references that you are expected to use for the course. Tools • The “Tools” section contains links to all of the software tools you will need for the course. It lists software that you can download and install on your computer as well as tools that you will use online. More details on the required software can be found in Appendices A and B Inside “Tools,” you will find a link to “Discussions,” where you can discuss the course topics with other students. The course supervisor and TMs do not generally respond to discussions; they are intended for student-to-student communication. See “Getting Help,” below for more information. 16
Materials • This section contains supplementary material directly related to the course content. You can download all of the code that is used in this guide if you’d like to try it for yourself or modify it. You will also find an archive of the course email list and some sample exams. Links • The links are organized into categories corresponding to units in this guide. You can use them to supplement the material the guide and other readings present. If you have suggestions for additions to this list, feel free to email the course supervisor. Assignments • Here, you will find all of the assignments and exercises for this course.
Online References There are several online references that are as important as the texts. You can find links to them on the course web site section titled “References.” These resources are very important to your success in this course. They aren’t meant to be read from beginning to end like the readings in the text- book. You should use them to get an overall picture of the topic and as references as you do the assignments.
The Red Folder The red folder in your Centre for Distance Education package contains the assignment deadlines and exam dates for the course. It also contains impor- tant information from the Centre for Distance Education, in particular, the Student Handbook, which includes information about how to set up your SFU computing account.
Email Communications The TMs and course supervisor will use email to send announcements and tips during the semester. You should read your SFU email account regularly. 17
You can also contact the TMs and course supervisor by email if you need help. See the “Getting Help” section at the end of the Introduction for details.
CMPT 165 Web Server Throughout the course, you will be creating web pages. You have web space available on a web server set up specifically for this course. Appendix A contains information on working with this server. You must use this server for all web pages in this course. Work on other web servers will not be marked. Open University students will not have accounts automatically set up on this server. They must immediately email the course supervisor, giving their name and email address so their account can be activated. Not doing so will not be accepted as an excuse for late exercises/assignments.
Requirements
Computer Requirements You need to have access to a computer with the minimum requirements noted on the back of the course outline. You will also need a connection to the Internet through a dialup connec- tion (one is provided with your SFU registration) or through another type of connection like a cable modem or a DSL line. A high-speed connection will make your life easier, but it isn’t required.
Software Requirements All of the software that you need for this course can be downloaded free. Links to obtain the course software are on the course web site under “Tools.” Appendix A contains instructions for installing the software. Appendix B contains other instructions you need for assignments.
A web browser: You will be creating web pages in this course. To • see them and the other pages you need to read, you need to have a web browser installed on your computer. We strongly recommend that you 18
use Mozilla or Netscape 7. These browsers are the only ones that have near-perfect support for the style sheets we will be using. Netscape 7 and Mozilla are actually the same browser with some minor differences. Mozilla has been released by a group of developers for free use. The same program was modified slightly and released by Netscape. In this course, we recommend Mozilla because is has fewer superfluous features. Internet Explorer and earlier versions of Netscape don’t support style sheets as well as the recommended browsers so they are harder to work with. Since we use them a lot, you will find other browsers harder to work with. A text editor: You will be making your web pages and style sheets • with a text editor. For Windows, we recommend TextPad; for a Mac- intosh, BBEdit. You could get by with Notepad or Simpletext, but a better editor will make your life easier. A graphics program: To create graphics for your web pages, you will • use a graphics editor. You should use GIMP for Windows or Graphic- Converter for a Mac. You won’t be able to use Windows Paint. If you already have a program like Photoshop or PhotoPaint, you can use it, but the TMs will not have access to them, so they won’t be able to help with any software problems. Python: The second half of the course will involve programming with • the Python programming language. Python is a good language to work with when you’re learning to program, and it can be downloaded free. Other software: There is some other software that you may need to • transfer files and install certain software. See Appendix A for instruc- tions and descriptions.
Evaluation
Marking You mark in this course will be based on your exercises, assignments, and two exams. The marks will be weighted as follows: 19
Exercises 10% Assignments 20% Midterm exam 20% Final exam 50% Each of the exercises and assignments will be weighted equally. You must attain an overall passing grade on the weighted average of the exams in order to pass the course.
Exercises and Assignments There are four sets of exercises and four assignments in this course. You can find them on the course web site. The due dates and instructions on how to submit them are also there. The exercises consists of short problems. They will make sure you’re on the right track and have the basics down. They shouldn’t take you too long to complete. The assignments are more work. You will have to figure out more on your own and explore the concepts on the course. You will submit both exercises and assignments by email. Sending the web address where you have put your web page(s) and any other information required by the assignment. The exact email address that you should send your work to will be indicated on the exercise set or assignment. You must place your assignments in the web space provided for the course. You can find information on uploading the files in Appendix A. Late exercises and assignments will be penalized 10 percent per day (in- cluding weekends). The lateness will be assessed on the basis of the later of the email sent to the TM and the “last modified” date on the files you’ve put in your web space. So, you shouldn’t change your web pages after the due date, or they will be counted as late. You also shouldn’t delete any of your assignments until the end of the course. Exercises will not be accepted more than three days after the due date since solutions have been posted by then.
Exams There will be two examinations in this course. They are closed-book and you aren’t allowed any notes, calculators, or other aids. The exams have a 20 mixture of short and longer answer questions. The 50-minute midterm exam will be held in week 7 or 8. You will be responsible for the material in Units 1 to 6 in this guide and the readings given for those units. The final exam will be three hours long and will cover all of the course material.
Academic Dishonesty We take academic dishonesty very seriously in this course. Academic dishon- esty includes (but is not limited to) the following: copying another student’s assignment; allowing others to complete work for you; allowing others to use your work; copying part of an assignment from an outside source; cheating in any way on a test or examination. If you are unclear on what academic dishonesty is, please read Policy 10.02. It can be found in the “Policies & Procedures” section in the SFU web site,. Cheating on an assignment will result in a mark of 0 on the assignment and a recommendation for a further mark penalty in the course. At the course supervisor’s option, an F may recommend for the course. Any aca- demic dishonesty will also be recorded in your file at the University. Any academic dishonesty on the midterm or final will result in a recom- mendation that an F be given for the course and possibly a recommendation that the student be suspended or expelled from the University.
Copyright When you create web pages, you must keep in mind other people’s copyright. Whenever someone publishes something, it is illegal for others to copy the material except with the permission of the copyright owner. That means you can’t just copy text or images from any web site. When you are creating your pages (except where the assignment states otherwise), you can use images from other sites that indicate that you are allowed to do so or if you have sought and received permission. Some sites, particularly clip art sites and other graphics collections, state that you’re allowed to use their images for your own sites. If you do use anything from other sites for this course, you must indicate where it came from by providing a link to the original source. Failure to do so is academic dishonesty and will be treated as described above. 21
Schedule
To give you an idea of the pace at which you should be going through the material, please look at the course schedule in Figure 1. Exact assignment due dates are in your Red Folder. The readings listed here aren’t the whole story. Specific chapters and sections are listed in each unit.
Getting through CMPT 165
Self-directed courses require a lot of motivation. You will be mostly on your own as you work your way through this course. For each Unit, the following order of activities is suggested: 1. Read the unit in the Study Guide and do the “Check-Up Questions.” 2. Do the readings listed at the start of the unit. Also have a look at the relevant links for the unit on the web site. 3. Have a look at Appendices A and B to see if they discuss any technical skills that are relevant to the material. 4. If there’s an assignment or exercises, do it. After doing the readings, look at the Summary at the end of the unit. It will give you an idea of what you should have learned. The list of terms is particularly useful. These terms are italicized when they are first used. Also, look at the Learning Outcomes listed at the start of the unit for a more detailed list. You should also pay attention to the course web site. There are useful resources there that can give you help if you need it. If the content of the web site is updated during the course, email will be sent to the course list.
Getting Help There are several ways for you to get help in this course. You can check the discussion board on the course web site. You may find that someone else has asked the same question and that it has been answered. 22
Part Unit Week Topic Resources Submit I 1 Course Study Guide Introduction II 1 1 The World Study Guide, Wide Web Internet Book 2 2,3 Markup and Study Guide, online Exer 1 HTML HTML reference 3 4 Text and Study Guide Assign 1 Graphics 4 5,6 More HTML Study Guide, online Exer 2 and Style CSS reference and Sheets colour chart 5 6,7 Design Study Guide, Part 1 Assign 2 of Design Book 6 7,8 XML Study Guide 7–8 Midterm Exam III 7 9,10 Programming Study Guide, Think Exer 3 Introduction Python 8 11 Web Study Guide Assign 3 Programming Introduction 9 11,12 More Study Guide, Think Exer 4 Programming Python 10 12 More Web Study Guide Programming 11 13 Internet Study Guide, Assign 4 Internals Internet Book 14–15 Final Exam
Figure 1: CMPT 165 course schedule 23
You should also feel free to contact the TMs by email or phone. You can phone them during office hours or email at any time. Information on contacting the TMs will be in your course package. You can also contact the course supervisor (preferably by email) for ad- ministrative problems only. You should feel free to contact the supervisor if you’re having serious problems.
About the Author
I hesitated about writing this section. It seems sort of self-congratulatory— a few paragraphs about who I am aren’t so bad, but they set a dangerous precedent. The next thing you know, I’ll have 8 10 glossy photos of myself and I’ll start writing an autobiography. × On the other hand, I’m reminded that students in an on-campus class would learn something about me over the course of the semester. In an effort to duplicate that experience, I relented. . . . I’m a lecturer at SFU in the School of Computing Science. I started in September of 2000. I finished my M.Sc. in Computing Science at SFU just before that (and I do mean just). My undergraduate degree is in Math and Computer Science from Queen’s University (Kingston, not Belfast). I am neither a professor nor a doctor, which causes problems for students who are accustomed to using these titles. For the record, I prefer “Greg.” When I’m not working, I enjoy food and cooking. Unfortunately, I’m getting to an age where enjoyment of food is going to cause buying new pants more often than I like. Vancouver is a great place to be into food. The mix of cultures and available ingredients makes for very interesting cuisine. I’m always interested in good restaurant suggestions if you have any you would like to make. I’d like to thank Tim Beamish and Rong She, the TAs for the summer 2001 offering of CMPT 165, for helping me prepare this Study Guide. I’d also like to thank the students in the summer 2001 offering who suffered through its draft version. If you have any comments on the course or restaurant suggestions, please feel free to contact me at [email protected]. 24 Part II The Web and Web Pages
25
Unit 1 The World Wide Web
Learning Outcomes Describe the basic structure of the Internet. • Define some basic terms from the Internet and WWW. • Give examples of protocols used on the Internet. • Describe the way a web page gets to your computer. • Understand error and status messages you get when using the Internet. • Learning Activities Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • (optional) Have a look at the relevant chapters in The Internet Book: • Chapters 2 and 10 to 13. Do Exercise 1. •
Topic 1.1 Basics of the Internet
As I mentioned in the Introduction, we will assume that you have some familiarity with computers and the Internet. In particular, we’re going to assume that you have used the Internet and know what the World Wide Web (WWW ) is. You don’t have to be an expert, but you should be comfortable using your computer and navigating the web.
27 28 UNIT 1. THE WORLD WIDE WEB
UBC web home server SFU web computer server
ISP's other SFU servers, other gateway computers gateways, etc. nearby other computers at SFU or elsewhere in Vancouver
Figure 1.1: How information might get from the SFU web server to a home computer
Connecting to the Internet So, what is the Internet anyway? Basically, it is a huge network of con- nected computers. One computer is connected to the next, and they pass information along from one to another. That’s really all there is to it—many different computers, all passing information around so it eventually gets to its destination. When a desktop computer is connected to the Internet, it is probably only connected to one other computer, its gateway. The gateway will be connected to one or more other computers and will pass along whatever information is sent or received. The gateway and other computers that form the Internet’s infrastructure or backbone may be connected to many other machines. Their job is to receive information and pass it along in the right direction. They might be computers that are more-or-less like desktop PCs, or they might be specialized devices called routers designed specifically for this job. For example, Figure 1.1 shows the possible route that a web page might travel to get from SFU’s web server (www.sfu.ca) to a computer connected to a cable modem or ADSL. The actual route from www.sfu.ca to a home computer would have about 8 to 10 steps. Routes to other computers on the Internet might have 20 or even more steps. The various computers on the Internet can be connected to each other in many different ways. When a home computer is connected to the Internet, it is probably connected by one of the following methods: modem: Information is carried to and from your computer over your • phone line, the same way a voice call is transmitted. The modem at 1.1. BASICS OF THE INTERNET 29
either end of the phone call translates the sound of the phone call to and from computer data. ADSL: Information is also transmitted over the phone line but it does • not use a traditional “phone call.” Information is encoded differently than it is in a modem connection, which lets ADSL transmit data faster than a modem. cable modem: With a cable modem, the data to and from your com- • puter is carried over your TV cable. When a computer connects to a network in an office building or at SFU (or if you have a “home network” set up), different connection methods are used. These are typically faster than the “home” connections listed above. ethernet: The connector for Ethernet looks like a fat phone jack. Since • it uses wires, it’s generally used for desktop computers that don’t have to move around. It is several times faster than either ADSL or a cable modem. wi-fi: Wireless networking has recently become affordable and so more • popular. Wi-fi (Wireless Fidelity) is also known as wireless LAN (Wire- less Local Area Network), AirPort, and 802.11. There are different ver- sions (eg. 820.11b and 802.11g) with different data transmission rates. Wi-fi is often used for laptops, which makes it possible to move the ma- chine without having to worry about network cables. Wi-fi is available at SFU and increasingly in other locations. There are also many other connection types used to make connections between buildings and across cities and over long distances across countries.
Clients and Servers You may have noticed that in the discussion above, the home computer, gate- way, and the SFU web server are all described only as computers connected to the Internet. That’s true—all of them are connected to the Internet in fundamentally the same way, only the speed of the connection is different. So, why is www.sfu.ca a web server when your computer isn’t? The only real difference is that www.sfu.ca will answer requests for web pages. The gateway and your home computer won’t. What makes it possible is the web server software installed on the server. This program runs all the 30 UNIT 1. THE WORLD WIDE WEB
1. Please send me the web page
¥¡¥¡¥
¦¡¦¡¦ £¡£¡£
2. Here it is ¤¡¤¡¤
£¡£¡£
¤¡¤¡¤ ¡ ¡ ¡
3. Send me this image ¢¡¢¡¢
¡ ¡ ¡
¢¡¢¡¢
¡ ¡ ¡
¢¡¢¡¢ ¡ ¡ ¡ 4. Here it is ¢¡¢¡¢
Your Computer Web Server
Figure 1.2: The conversation that a web client and web server might have when you view a web page time and answers requests for web pages. If you ran similar software on your computer, people could surf web pages on it as well. If you did install web server software on your computer, you’d run into some problems. First, the SFU web server has an easy to remember name: www.sfu.ca. When home computers are connected to the Internet, they tend to have names like akjx74wuc23nf.bc.hsia.telus.net or h24-84-78-194.vc. shawcable.net—a lot harder to remember and type in correctly Second, www.sfu.ca has a very fast connection to the Internet, enough to serve up a lot of web pages at the same time. It’s also on all the time, which your computer might not be. When you want to access web pages, you use a web browser. A web browser is one example of client software. A web client (like Mozilla or Internet Explorer) is the software you use to make the request to a web server. The client has to transmit the request to the server, receive the response, and then process the information so you can use it. Client and server software let people use computers to transfer informa- tion over the Internet. Everything that you transfer is done by interacting with client software for that type of connection. For example, when you load a web page with Mozilla (a web client), it will ask the web server for the web page itself and then for any images or other files that it needs to display the page. Figure 1.2 shows the conversation that might happen to load a web page with one image on it. 1.2. PROTOCOLS 31
The term “web server” may be a little confusing here because it’s used to refer to both the computer itself (www.sfu.ca is a computer sitting in a closet somewhere in Strand Hall) and the software that’s running on the computer (www.sfu.ca runs the Apache web server).
Check-Up Question I Find out what kind of connection is used between your computer and your ISP. If you have a home network, does it use ethernet or wi-fi?
Topic 1.2 Protocols
When a client and server talk to each other, they have to agree on how they will exchange the various pieces of information that are required to get the job done. The “language” that the client and server use to exchange this information is called a protocol. For example, when they are exchanging web pages, the client and server need to encode the requests and responses shown in Figure 1.2. They also must be able to indicate errors and other behind-the-scenes messages (like “Page not found” or “Page moved to here”). Web pages are transferred using a protocol called the HyperText Transfer Protocol (HTTP). We will discuss HTTP more in Topic 1.3 and in Unit 11.
Information on the Internet There are many different kinds of information that travel over the Internet. The one you’re probably most familiar with is web traffic—web pages and all of the graphics, sounds, and other files that go with them. But, a lot of other stuff travels around as well—the web is just one of many ways that information can travel across the Internet. Here are some other ways information can be exchanged between com- puters on the Internet that you might be familiar with: Email. Even if you check your email on the web (with SFU’s Webmail • or something similar), the mail itself still has to get from the sender to receiver. If someone at Hotmail sends email to your SFU account, it has to travel from the computers at Hotmail to the mail server at SFU. 32 UNIT 1. THE WORLD WIDE WEB
Instant Messaging. The various instant messaging services each have • their own protocols (which is why they don’t generally work together). These include ICQ, AOL Instant Messenger, Yahoo! Messenger, and MSN Messenger. Peer-to-peer file transfer. These methods of file transfer sidestep the • traditional client-server model and let people transfer files directly from one client to another. (Technically, the ”client” software is performing the duties of a client and a server.) Because of their dubious legality, they tend to come and go, but they have included Napster, Gnutella, Audiogalaxy and Kazaa. FTP. FTP stands for “File Transfer Protocol.” It is an older method • of transferring files but it is still often used to do things like installing web pages onto a web server. network gaming. Games that allow network play generally act as a • client; a server is run by the company that produced the game. The client and server exchange information about the game: moves that are made, changes in the game’s “map,” and other information to make sure the game works properly for all users. Each of the examples above uses a different protocol to exchange informa- tion. That is why you need different client software for each of them—they need clients that speak different languages. There are many other services that you might not think of. There are protocols for file sharing that allow you to use files and printers as if they were on your machine (like Windows File/Print Sharing). Other protocols for things like clock synchronization, and remote access to computers you might never run across.
Check-Up Question I Are there more examples of protocols/client software that you use?
Topic 1.3 How Web Pages Travel
We have covered web servers and web clients in Topic 1.1. The two computers have to talk to each other in order to deliver web pages. 1.3. HOW WEB PAGES TRAVEL 33
As noted in Topic 1.2, the “language” a client and server use to talk to each other is called a protocol, and the protocol used to transfer web pages is called HTTP. You may have noticed that web addresses start with http:// ; this part of the address tells your web browser that you want it to use HTTP to talk to the server. A web server’s job is to communicate with HTTP to any clients that connect to it.
URLs
A URL (Uniform Resource Locator) is the proper name for an Internet ad- dress. URLs are also sometimes called URIs (Uniform Resource Identifiers). Here are some examples: http://www.sfu.ca/ http://www.sfu.ca/∼somebody/page.html http://www.w3.org/Addressing/ https://my.sfu.ca/ ftp://ftp.mozilla.org/pub/mozilla/releases/ mailto:[email protected] (Actually, all of these are “absolute” URLs. Relative URLs will be introduced in Unit 2.) The first three URLs are HTTP URLs—they refer to information that is on the WWW. The others refer to information that you access using different protocols. The fourth one, https://my.sfu.ca/ works almost like a web page, but it uses a “secure” connection to transmit the data. All of the information passed back and forth between the client and the server is encrypted so none of the computers in between can eavesdrop. HTTPS is often used for sensitive information like banking, passwords, and email. The last two items in the list are URLs for information accessed by other protocols. They are an FTP URL (information on an FTP server) and an email URL (a link that will let you send an email to that address). The protocol that should be used to access a particular URL is indicated by its scheme. The scheme is the part of the URL before the colon (:). For web (HTTP) URLs, the scheme is “http” or “https” for secure web traffic. The URL schemes “ftp” and “mailto” indicate FTP and email URLs. There are many other less common URL schemes used by specific applications. 34 UNIT 1. THE WORLD WIDE WEB
http://www.sfu.ca/~somebody/page.html
scheme server path
Figure 1.3: The parts of a simple URL
For HTTP and FTP URLs, the first part after the scheme indicates the server that should be contacted to exchange information. The rest of the URL after the server is the path of the file. We will discuss the other parts of an HTTP URL in Topic 2.4.
Check-Up Question I Find the URL of some web pages that you visit often (in the location bar of Mozilla). Identify the scheme, server, and path of each.
Topic 1.4 MIME Types
Any type of data can be transferred over HTTP, including web pages them- selves (which are written in a language called HTML, as we will see in Unit 2). Graphics on web pages are also sent by HTTP (in formats called GIF, PNG, and JPEG, which will be discussed in Unit 3). All of the other files you get from the web are also sent by HTTP: video, audio, Office documents, and so on. When your web browser receives these files, it has to know what to do with them. It treats graphics data very differently from a text file. The browser also has to know what program to open for files it can’t handle itself, like Acrobat files, Office documents, and MP3 audio files. If you use a Windows computer, you are probably used to looking at the file extension to figure out what type of file you have. For example, MS Word documents are usually named something like essay.doc; the .doc indicates that it is a Word document. Other extensions include .html or .htm for web pages and .pdf for Acrobat files. Unfortunately, the file extension can’t be used on the web to decide the type of the file. For one thing, some operating systems don’t use file exten- sions (e.g. MacOS), so we don’t know for sure they will be available. It’s also 1.4. MIME TYPES 35
MIME type File Contents How it might be handled text/html HTML (web page) display in browser as a web page in browser application/pdf Acrobat file open in Adobe Acrobat application/msword MS Word document open in Word image/jpeg JPEG image display in browser audio/mpeg MP3 audio open with WinAmp video/quicktime Quicktime video open with Windows Media
Figure 1.4: Some example MIME types possible that the browser wouldn’t even know a file name when the data is sent. We will also see other reasons later in the course. The above information applies to files sent as email attachments as well. Since email is older than the web, when HTTP was created, it could use the solutions that were already in use for email. Instead of using file extensions, the type of data sent by email or HTTP is indicated by a MIME type. MIME stands for Multipurpose Internet Mail Extensions. The MIME type indicates what kind of file is being sent. It tells the web browser (or email program) how it should handle the data. MIME types are made up of two parts. The first is the type, which indicates the overall kind of information: text, audio, video, image, and so on. The second is the subtype, which is the specific kind of information. For example, a GIF format image will have the MIME type image/gif— indicating that it is an image (which gives the browser a hint what to do with it, even if it doesn’t know the subtype) and the particular type of image is GIF. Most web browsers know how to display GIF images themselves, so they wouldn’t have to use another application to handle it. There are more examples in Figure 1.4.
Microsoft’s Internet Explorer browser doesn’t handle MIME types correctly. It often ignores them and then tries to guess the type itself (and is sometimes wrong). This is one of the reasons that we suggest you don’t use IE for this course.
When you put a file on a web server, the server usually determines the MIME type based on the file’s extension. This isn’t always the case. Some web servers handle things differently, but for the moment the MIME type 36 UNIT 1. THE WORLD WIDE WEB will come from the file’s extension. However, you should remember that the type and the extension are different things.
Check-Up Questions I You can determine the MIME type of a file with Mozilla in the “Page Info” window (in the “View” menu). Try going to a web page and have a look—you should see text/html. I When you’re viewing an image you will see the image’s MIME type if you right-click and “View Image.” Try it out with a few images.
Topic 1.5 Fetching a Web Page
Let’s look back at Figure 1.2, now that we know a few more details about the process. Suppose that the web page that is being requested is at the URL http: //www.sfu.ca/about/index.html and that one of the graphics on the page is the JPEG image at http://www.sfu.ca/hp/images/sfu.jpg . We can now give a more detailed description of what happens in the conversation indicated by each of the four arrows in Figure 1.2. 1. The web browser contacts the server specified in the URL, www.sfu.ca. It asks for the file with path /about/index.html . 2. The server responds with an “OK” message, indicating that the page has been found and will be sent. It indicates that the MIME type of the file is text/html—it’s an HTML page. Then it sends the contents of the file, so the browser can display it. 3. The browser notices that the web page contains an image with URL http://www.sfu.ca/hp/images/sfu.jpg . It contacts www.sfu.ca and asks for the path /hp/images/sfu.jpg . 4. The server again responds with an “OK” and gives the MIME type image/jpeg, which indicates a JPEG format image. Then it sends the actual contents of the image file. Once the web browser has the HTML for the web page and all the graphics that go with it, it can draw the page on the screen for you to see. 1.5. FETCHING A WEB PAGE 37
Summary
This unit gives you a brief introduction to how the WWW and how the Internet works. By now you should have a better understanding of what goes on behind the scenes when you use the Internet for various tasks. This information will help you later in the course and should be useful anytime you’re using the Internet and things go wrong—you can often fix or work around problems if you understand what’s going on. We will be coming back to many of the same topics in Unit 11. By then, we will be able to go into more technical detail.
Key Terms World Wide Web (WWW) protocol • • Internet HTTP • • gateway URL • • client scheme • • server path • • web server MIME type • • client software subtype • • web browser • 38 UNIT 1. THE WORLD WIDE WEB Unit 2 Markup and HTML
Learning Outcomes Discuss various ways of describing documents. • Compare them for use in various tasks. • Create a web page that includes links and images using HTML. • List some common HTML tags. • Carry out validation on an HTML page. •
Learning Activities Read this unit and do the “Check-Up Questions.” • Browse through the links for this unit on the course web site. • Review the XHTML Reference pages found in the “Online References” • section of the course web site.
Topic 2.1 Describing Documents
There are two basic ways of editing documents on a computer. The first is “What You See Is What You Get” or WYSIWYG (pronounced wissy-wig). You’re probably already familiar with it even if you don’t know the term. In a WYSIWYG program, what you see on the screen when you are editing the document looks the same as the finished product. Application
39 40 UNIT 2. MARKUP AND HTML programs like MS Word and WordPerfect are WYSIWYG; you view and edit the document on the screen, and it looks exactly as it will when printed. The other way to edit a document is to use markup. With markup, you use a “markup language,” which looks a lot like a programming language. It has special codes to indicate the appearance of the document. The editing is usually done in a text editor. You can’t see the finished form of the document until processed, and then you can look at it with a viewer. Some examples of markup are HTML (for web pages, which we will use later), XML A (used behind-the-scenes in many programs to store data), LTEX (often used in mathematics and computer science), and WordPerfect’s “Reveal Codes” feature. Most people use WYSIWYG editing for most of their work. Some of the most important reasons they do are that it’s easier to learn, and • the user gets immediate feedback. • There are also good reasons for using markup instead: No information about the document is hidden. • There can be more than meets the eye (e.g., bibliography references, • different display for the screen, printers, etc.). It is easy to separate content from appearance. • All you need to edit it is a text editor. • HTML is the markup language used for web pages. There are several web page editors available for HTML that claim to be WYSIWYG, but there is no such thing as a WYSIWYG web editor. HTML doesn’t describe every aspect of the document’s appearance, so it can never be WYSIWYG. Web page editors that claim that they are only give authors a false sense of security.
Physical vs. Logical Markup We will soon see two distinct forms of markup. The first is physical markup (or visual markup). Physical markup directly describes how the document should appear. That is, you specify a particular appearance for part of the document. For example, markup that indicates that text should be bold or in a 16pt font is visual markup. 2.1. DESCRIBING DOCUMENTS 41
The other form of markup is logical markup (or structural markup). Log- ical markup describes the purpose or meaning of the text. For example, we might indicate that some text is the heading for a new chapter or para- graph. When you use logical markup, you should describe the content of the document without worrying about its appearance. To determine the appearance of logical markup, you use a style. A style is a set of rules like “a section heading should be in a bold, 16pt font, with some space above and below it.” Once an appropriate style has been created, you only need to indicate where the section headings are in the document.
Benefits of Logical Markup
Using logical markup with styles has several benefits. The main one is that if you want to change the appearance of the document, you only need to change the style; the changes should then be made throughout the document. In the example, if we wanted to change the way section headings look, we would change the style file to something like “section headings should be a 18pt font, centred, with some space above and below.” If we had used physical markup, we would have to hunt through the entire document and make the change to each section heading. When you use logical markup, it is also possible to have different styles for different situations. For example, a document could have an on-screen style for viewing on a monitor, a “large-type” style for the visually impaired, and another style for use when it is printed. It is also possible to have a “style” for speaking the document out loud, which could be used by a text-to-speech program for the blind. HTML has both logical and physical markup (as we will see). MS Word actually allows logical formatting with it’s underused and underappreciated “Styles” feature. On the web, logical markup can be used by programs that aren’t tra- ditional web browsers. For example, visually impaired people often use a speech browser. Logical markup can be used to tell the browser to use an appropriate tone of voice or pause at appropriate times. Also, some search engines use logical markup to figure out what your pages are actually about. Using logical markup properly can help these pro- grams categorize your page correctly and bring you visitors who are interested in your topic. 42 UNIT 2. MARKUP AND HTML
Check-Up Questions I How do you usually create your documents? Using WYSIWYG or markup? Do you use physical or logical formatting? I Does your word processor have a feature that allows you to see the markup language that underlies your document? I Do you know how to use any logical formatting features of your word processor (like Word’s “Styles”)?
Topic 2.2 HTML Basics
The documents that we create in this course will be done in HTML (Hy- perText Markup Language). It’s really a very descriptive name: HTML is a markup language that can be used to describe hypertext pages (documents with links). In particular, HTML is used to describe web pages. The version of HTML we will be using in this course is XHTML 1.0. XHTML stands for “eXtensible HyperText Markup Language.” For the mo- ment, we will use “HTML” and “XHTML” synonymously. If you have used HTML before, you might not be familiar with XHTML. It is a newer version of HTML and has a few syntax differences, but it is basically the same. We will discuss the dif- ference between XHTML and the older HTML standards more in Topic 6.5.
If you haven’t written web pages in HTML before, you can think of “HTML” and “XHTML” as the same thing, at least until we get to Topic 6.5. The markup of an HTML page is done with tags. HTML tags consist of one or a few letters, surrounded by triangular braces (<>). Some tags are: bold text
a paragraph
a level 1 (the largest) heading HTML tags have both an opening and closing version. For the bold text tag, is the opening tag and is the closing tag. The text that should be modified is put in the tags. So, this HTML: 2.2. HTML BASICS 43
this is bold text.
Will be displayed like this:
this is bold text.
The material between the opening and closing tags is referred to as the tags contents. In the example above, the tag has contents “bold text.”
Some HTML Tags
Here are some common HTML tags:
. . .
a level 1 heading—every page should probably have one of these at the top.. . .
the second largest heading—sections within the main document.. . .
the smallest heading.. . .
a paragraph—these tags should be wrapped around each paragraph. . . . italic text (visual markup). . . . emphasized text—text that should stand out, usually displayed in italics (structural markup). . . . the HTML document—should be wrapped around the whole document. . . . the header—information about the page, only a few tags, like
Sample page
This is a paragraph, with some bold text.
Figure 2.1: An XHTML document
Figure 2.2: The display of Figure 2.1 in a browser
A Full XHTML Document Figure 2.1 is a full XHTML document (although we will be making some additions to it in Topic 2.7). The text you see there is typed into a text editor and saved as a .html file. Its display in Mozilla 1.6 under Windows is shown in Figure 2.2. Remember that other browsers might display this same HTML differently.
XHTML documents could also be saved with a .xhtml extension, but that can cause problems for browsers that don’t know about XHTML (Internet Explorer, in particular). Stick with .html for this course. 2.2. HTML BASICS 45
This Figure can be found in the “Examples” section of the course web page.
More on Closing Tags In older versions of HTML (before XHTML), it was possible to leave out closing tags in certain circumstances. In XHTML, you can’t omit them: all tags must be closed. There are a few XHTML tags that aren’t allowed to have any contents. For example, the
tag inserts a line break into your page; the next words after
appear on the next line. It doesn’t make any sense to have anything between the
and the closing tag, . So, the two must always appear right beside each other:
. Since empty tags occur frequently in XHTML, there is a short form for the closing tag that can be used for empty tags. These are exactly equivalent in XHTML:
Finishing the opening tag with the slash indicates that you want to close the tag right away. The online XHTML reference indicates which tags are empty and can use this short form for the closing tag. The reference is described later, in Topic 2.3.
Nesting Tags The closing tags must be arranged to ensure the contents are properly nested. It is illegal to write something like the following: bold, italic text Instead, the tags should look like this: bold, italic text so that the . . . is entirely within the . . . , not partially over- lapping. While we’re on the subject of nesting tags, note that some tags can only occur inside others. For example, a paragraph,
, must be inside the body, 46 UNIT 2. MARKUP AND HTML
. The tag- , which indicates an ordered list. These tags are used in this way:
- item 1
- item 2
- tags can go in a
- . The Contained in line gives the tags that they can be placed in. Figure 2.4 indicates that
- can be placed inside of the
HTML Pitfalls There are a few more points about HTML that you should be aware of: All tags in XHTML must be in lower case. So,
is okay, but • and aren’t. Older versions of HTML allowed uppercase tags, but XHTML doesn’t. The browser ignores spacing in your HTML document. It doesn’t mat- • ter if you press return to jump to the next line. All spacing on the finished web page must be done with markup. Any combination of spaces, returns, and tabs in your HTML will be treated the same as a single space by the browser. HTML is sent over the Internet to the user’s web browser. As a result, • the author doesn’t get to decide what the page looks like—the user’s browser does. Authors shouldn’t assume that the page will look the same for everyone as it does for them. For example, pages might look very different in Mozilla, Internet Explorer, WebTV, and so on. If you use the tags correctly, the page should still get its point across. This is the reason WYSIWYG editors don’t fit well with HTML— HTML is never WYSIWYG. If a browser doesn’t know what to do with a particular tag, it ignores it • entirely. The contents of the tag will be displayed, without any change in their appearance, which can cause unexpected results in different browsers. Again, if you’ve used the HTML tags as they are intended, your page should still be readable. 2.3. MORE HTML 47Check-Up Questions I Type in Figure 2.1 using a text editor or download it from the “Examples” section of the course web site. Save it with the file name first.html, and then open it in your web browser. It should look something like Figure 2.2. I Make some changes to your file first.html, and click the “Reload” or “Re- fresh” button in your browser to view the changes. Try some of the other tags mentioned above.
Topic 2.3 More HTML
Attributes Some tags can be modified with attributes. To modify the way a tag be- haves, attributes and their associated values are put in the opening tag, for example:
is used to put a horizontal rule (line) on a page. By default, the rule goes all the way across the page. Its length can be modified with the width attribute. So,
will draw a line across half of the screen. Note that
is another empty tag, so we can put the closing slash in the opening tag. Note also that the value must be enclosed in quotes. Also, a tag can have several attributes, for example:
Entities All HTML tags are enclosed in <>. So, if we want a < on a web page, we can’t just type it into the HTML because the browser will assume that it is indicating the start of a tag. To print this symbol or other special characters, we use entities. 48 UNIT 2. MARKUP AND HTML
garçon
opening tag closing tag attribute value contents entity
Figure 2.3: HTML terms to output “gar¸con”
All entities start with an “&” and end with a “;”—the entity for a less- than sign is <. So, to display “7 < 10” on a web page, we would type 7 < 10 in our HTML document. There are many other entities, for example: > produces a “>” • & produces an “&” • " produces a “"” • á produces an “´a.” • Figure 2.3 shows the parts of an HTML fragment and how they fit to- gether. In the “Online References” section of the course web site, you can find a list of all of the entities and see how they display in your browser.
Comments It is possible to include comments in HTML. Comments are never displayed by the browser. They are used to place notes for yourself or others in the HTML itself. A comment starts with : Comments can also be used to disable some HTML that you don’t want displayed but don’t want to delete either.
XHTML Reference There are many more XHTML tags and entities than have been mentioned here, and you should explore them further. You can visit the definitive reference from the course web site from the “References” section. Figure 2.4 shows part of the reference page for the
- tag. 2.3. MORE HTML 49
Figure 2.4: A sample page from the XHTML reference
The Syntax line gives the general usage of the tag. Empty tags will be displayed with the short-form closing tag. The Attribute Specifications lines give a list of possible attributes for this tag and their values. The Contents line indicates what you may be put inside the tag. In Figure 2.4, we see that “one or more”