Characterizing Web Spam Using Content and HTTP Session Analysis

Steve Webb, James Caverlee, Calton Pu
College of Computing, Georgia Institute of Technology, Atlanta, GA 30332
[email protected] [email protected] [email protected]

CEAS 2007 - Fourth Conference on Email and Anti-Spam, August 2-3, 2007, Mountain View, California USA

ABSTRACT

Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using content and HTTP session analysis techniques on the Webb Spam Corpus, a collection of about 350,000 web spam pages. Our content analysis results are consistent with the hypothesis that web spam pages are different from normal web pages, showing far more duplication of physical content and URL redirections. An analysis of session information collected during the crawling of the Webb Spam Corpus shows significant concentration of hosting IP addresses in two narrow ranges as well as significant overlaps among session header values. These findings suggest that content and HTTP session analysis may contribute a great deal towards future efforts to automatically distinguish web spam pages from normal web pages.

1. INTRODUCTION

Web spam has grown to a significant percentage of all web pages (between 13.8% and 22.1% of all web pages [2, 8]), threatening the dependability and usefulness of web-based information in a manner similar to how email spam has affected email. Unfortunately, previous research on the nature of web spam [2, 5, 8, 10, 11, 13] has suffered from the difficulties associated with manually classifying and separating web spam pages from legitimate pages. As a result, these previous studies have been limited to a few thousand web spam pages, which is insufficient for an effective content analysis (as customarily performed in email spam research).

In this paper, we provide the first large-scale experimental study of web spam pages by applying content and HTTP session analysis techniques to the Webb Spam Corpus [12], a collection of almost 350,000 web spam examples that is two orders of magnitude larger than the collections used in previous evaluations. Our main hypothesis in this study is that web spam pages are fundamentally different from "normal" web pages. To evaluate this hypothesis, we characterize the content and HTTP session properties of web spam pages using a variety of methods.

The web spam content analysis is composed of two parts. The first part quantifies the amount of duplication present among web spam pages. Previous studies [1, 4, 6] have shown that only about two thirds of all web pages are unique; thus, we expected to find a similar degree of duplication among our web spam pages. To evaluate duplication in the corpus, we constructed clusters (equivalence classes) of duplicate or near-duplicate pages. Based on the sizes of these equivalence classes, we discovered that duplication is twice as prevalent among web spam pages (i.e., only about one third of the pages are unique).

The second part of the content analysis focuses on a categorization of web spam pages. Specifically, we identify five important categories of web spam: Ad Farms, Parked Domains, Advertisements, Pornography, and Redirection. The Ad Farms and Parked Domains categories consist of pages that are comprised exclusively of advertising links. These pages exist solely to generate traffic for other sites and money for web spammers (through pay-per-click advertising programs). The Advertisements category contains pages that advertise specific products and services, and the pages in the Pornography category are pornographic in nature. The Redirection category consists of pages that employ various redirection techniques. Within the Redirection category, we identify seven redirection techniques (HTTP-level redirects, 3 HTML-based redirects, and 3 JavaScript-based redirects), and we find that 43.9% of web spam pages use some form of HTML or JavaScript redirection.

The third component of our research is an evaluation of the HTTP session information associated with web spam. First, we examine the IP addresses that hosted our web spam pages and find that 84% of the web spam pages were hosted on the 63.* – 69.* and 204.* – 216.* IP address ranges. Then, we evaluate the most commonly used HTTP session headers and values. As a result of this evaluation, we find that many web spam pages have similar values for numerous headers. For example, we find that 94.2% of the web spam pages with a "Server" header were hosted by Apache (63.9%) or Microsoft IIS (30.3%). These results are particularly interesting because they suggest that HTTP session information might be extremely valuable for automatically distinguishing between web spam pages and normal pages.

The rest of the paper is organized as follows. Section 2 describes our web spam corpus and summarizes its collection methodology. In Section 3, we report the results of a content analysis of web spam, which consists of two parts. The first part evaluates the amount of duplication that appears in web spam. The second part identifies concrete web spam categories and provides an extensive description of the redirection techniques being used by web spammers. In Section 4, we report the results of an analysis of web spam HTTP session information, which identifies the most common hosting IP addresses and HTTP header values associated with web spam. Section 5 summarizes related work, and Section 6 concludes the paper and provides future research directions.

2. THE WEBB SPAM CORPUS

In our previous research [12], we developed an automatic technique for obtaining web spam examples that leverages the presence of URLs in email spam messages. Specifically, we extracted almost 1.2 million unique URLs from more than 1.4 million email spam messages. Then, we built a crawler to obtain the web pages that corresponded to those URLs. Our crawler attempted to access each of the URLs; however, many of the URLs returned HTTP redirects (i.e., 3xx HTTP status codes). The crawler followed all of these redirects until it finally accessed a URL that did not return a redirect.

Our crawler obtained two types of information for every successfully accessed URL (including those that returned a redirect): the HTML content of the page identified by the URL and the HTTP session information associated with the page request transaction. As a result, we created a file for every successfully accessed URL that contains all of this information. After our crawling process was complete, we had 348,878 web spam pages and 223,414 redirect files (i.e., files that correspond to redirect responses). These files are collectively referred to as the Webb Spam Corpus, and they provide the basis for our analysis in this paper. For a more detailed description of our collection methodology and the format of the files in the Webb Spam Corpus, please consult [12].

We acknowledge that our collection of web spam examples is not representative of all web spam; however, it is two orders of magnitude larger than any other available source of web spam to date, and as such, it currently provides the most realistic snapshot of web spammer behavior. Thus, although the characteristics of our corpus might not be indicative of all web spam, our observations still provide extremely useful insights about the techniques being employed by web spammers.

3. CONTENT ANALYSIS

In this section, we provide the results of our large-scale analysis of web spam content. This analysis consists of two parts. The first part, discussed in Section 3.1, quantifies the [...]

[Figure 1: Number and size of the shingling clusters. Axes: cluster size (x) and number of clusters (y), each with ticks at powers of ten from 1 to 100000.]

In our previous work [12], we identified the existence of duplicate URLs in the corpus (i.e., multiple web spam pages with the same URL), and we explained that these duplicate URLs are the result of multiple unique HTTP redirect chains that lead to the same destination. Specifically, we found that the corpus contains 263,446 unique URLs, which means about one fourth of the web spam pages have a URL that is the same as one of the web spam pages in the remaining three fourths.

To identify content duplication, we computed MD5 hashes for the HTML content of all of the web spam pages in our corpus. After evaluating these results, we found 202,208 unique MD5 values. Thus, 146,670 of the web spam pages (42%) have the exact same HTML content as one of the pages in a collection of 202,208 unique web spam pages. Many of these duplicates are explained by the URL duplication that exists in the corpus (described above), but since each of the duplicate URLs represents a distinct entry point (i.e., a unique HTTP redirect chain) to a given page, we consider them to be functionally equivalent to content duplicates.

To evaluate the amount of near-duplication in our corpus, we used the shingling algorithm that was developed by Fetterly et al. [3, 4, 6] to construct equivalence classes of duplicate and near-duplicate web spam pages. First, we preprocessed every web spam page in the corpus. Specifically, the HTML tags in each page were replaced by white space, and every page was tokenized into a collection of words, where a word is defined as an uninterrupted series of alphanumeric characters.