Recommendation Models for Web Users



Recommendation Models for Web Users
Dr. Şule Gündüz Öğüdücü
[email protected]

What is Web Mining?
• The use of data mining techniques to automatically discover and extract information from Web documents and services (Etzioni, 1996).
• Web mining research integrates work from several research communities (Kosala and Blockeel, 2000), such as:
  • Databases (DB)
  • Information retrieval (IR)
  • Sub-areas of machine learning (ML)
  • Natural language processing (NLP)

World-Wide Web (WWW): Incentives
• The WWW was initiated at CERN (the European Organization for Nuclear Research) by Tim Berners-Lee.
• GUIs: Berners-Lee's browser (1990); Erwise and Viola (1992); Midas (1993); Mosaic (1993), a hypertext GUI for the X Window System.
• HTML: a markup language for rendering hypertext.
• HTTP: a hypertext transport protocol for sending HTML and other data over the Internet.
• CERN HTTPD: a server of hypertext documents.
• The WWW is a huge, widely distributed, global information source for:
  • Information services: news, advertisements, consumer information, financial management, education, government, e-commerce, health services, etc.
  • Hyper-link information
  • Web page access and usage information
  • Web site contents and organizations
• 1994: Netscape was founded; the 1st World Wide Web Conference was held; the World Wide Web Consortium (http://www.w3.org/) was founded by CERN and MIT.
• See also: Mining the Web (Chakrabarti and Ramakrishnan); the TouchGraph GoogleBrowser hyper-link visualization (http://www.touchgraph.com/TGGoogleBrowser.html).

Mining the World Wide Web
• The Web is growing and changing very rapidly: 12.52 billion pages as of 6 December 2006 (http://www.worldwidewebsize.com/).
• Only a small portion of the information on the Web is truly relevant or useful to a given Web user.
• The WWW provides rich sources for data mining.

Application Examples: e-Commerce
Goals include:
• Targeting potential customers for electronic commerce
• Enhancing the quality and delivery of Internet information services to the end user
• Improving Web server system performance
• Facilitating personalization/adaptive sites
• Improving site design
• Fraud/intrusion detection
• Predicting users' actions

Challenges on WWW Interactions
• Finding relevant information: 99% of the information is of no interest to 99% of the people.
• Query interfaces are limited and keyword-oriented.
• Creating knowledge from the information available.
• Personalization of the information: customization to individual users is limited.

Web Mining Taxonomy
• Web mining searches for usage patterns, Web structures, and regularities and dynamics of Web contents.
• It divides into three areas: Web structure mining, Web content mining, and Web usage mining.

Web Usage Mining
• Also known as Web log mining: the application of data mining techniques to discover usage patterns from the secondary data derived from the interactions of users while surfing the Web.
• Information scent analogy — information: food; WWW: forest; user: animal. Web usage mining means understanding the behavior of an animal that looks for food in the forest.

Terms in Web Usage Mining
• User: a single individual who is accessing files from one or more Web servers through a browser.
• Page file: a file that is served through the HTTP protocol to a user.
• Page: the set of page files that contribute to a single display in a Web browser.
• Click stream: the sequence of pages followed by a user.
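The terms above map naturally onto simple data structures. A minimal sketch in Python (the class names and fields are my own, purely illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class PageView:
    """One page shown in the browser (possibly assembled from several page files)."""
    url: str
    timestamp: datetime

@dataclass
class ServerSession:
    """The part of one user's click stream seen by a single Web server."""
    user_id: str
    clickstream: list = field(default_factory=list)  # ordered PageViews

    def pages(self):
        """The sequence of visited page URLs, in order."""
        return [view.url for view in self.clickstream]

session = ServerSession("u1")
session.clickstream.append(PageView("/index.html", datetime(2024, 1, 1, 10, 0)))
session.clickstream.append(PageView("/products.html", datetime(2024, 1, 1, 10, 2)))
print(session.pages())  # → ['/index.html', '/products.html']
```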
• Server session (visit): the set of pages that a user requested from a single Web server during a single visit to the Web site.

Methodology
1. Data collection
2. Data preparation and cleaning
3. Pattern extraction
4. Recommendation

Recommendation Process
• Offline process:
  • Data collection and cleaning (input): server-level, client-level, or proxy-level collection; cleaning; user identification; session identification; calculation of visiting-page time.
  • Pattern extraction: statistical analysis, association rules, clustering, classification, sequential patterns.
  • Output: a set of usage patterns.
• Online process: using the usage patterns and a representation of the current user, a recommender system selects a recommendation set from a set of items (movies, books, CDs, Web documents, ...).
• Goal: the right information is delivered to the right people at the right time.

What Information Can Be Included in the User Representation?
• The order of the visited Web pages
• The visiting-page time
• The content of the visited Web pages
• The change of user behavior over time
• Differences in usage and behavior between geographic areas
• The user profile

Data Collection
• Server-level collection
• Client-level collection
• Proxy-level collection

Web Server Log File Entries
• IP address, user ID, timestamp, method/URL, status, size

Use of Log Files
• Who is visiting the Web site?
• What paths do users take through the Web pages?
• How much time do users spend on each page?
• Where and when do visitors leave the Web site?
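A server log entry with these fields can be pulled apart with a regular expression. A sketch assuming the Common Log Format (the sample line is illustrative):

```python
import re
from datetime import datetime

# One Common Log Format line: IP, user ID, timestamp, request, status, size.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Parse one log line into a dict of fields, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    d["status"] = int(d["status"])
    d["ts"] = datetime.strptime(d["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return d

line = ('192.0.2.1 - alice [10/Oct/2023:13:55:36 +0000] '
        '"GET /index.html HTTP/1.0" 200 2326')
entry = parse_line(line)
print(entry["ip"], entry["url"], entry["status"])
```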
Data Preprocessing (1)
• Data cleaning:
  • Remove log entries with filename suffixes such as gif, jpeg, GIF, JPEG, jpg, JPG.
  • Remove page requests made by automated agents and spider programs.
  • Remove log entries with a status code in the 400 or 500 series.
  • Normalize the URLs: determine which URLs correspond to the same Web page.
• Data integration:
  • Merge data from multiple server logs.
  • Integrate semantics (meta-data).
  • Integrate registration data.

Data Preprocessing (2)
• Data transformation:
  • User identification: users with the same client IP address are treated as identical.
  • Session identification: a new session is created when a new IP address is encountered, or when the visiting-page time exceeds 30 minutes for the same IP address.
• Data reduction: sampling, dimensionality reduction.

Discovery of Usage Patterns
• Pattern discovery is the key component of Web mining; it brings together algorithms and techniques from research areas such as data mining, machine learning, statistics, and pattern recognition.
• Applications: improving system performance, enhancing the security of the system, providing support for marketing decisions.
• Separate subsections: statistical analysis, association rules, clustering, classification, sequential patterns.

Statistical Analysis
• By applying different kinds of statistical analysis (frequency, median, mean, etc.) to the session file, one can extract statistical information such as:
  • The most frequently accessed pages
  • The average view time of a page
  • The average length of a path through the site
• Examples: PageGather (Perkowitz et al., 1998); discovery of user profiles (Larsen et al., 2001).
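The cleaning and session-identification rules above can be sketched as follows. The image-suffix filter, the 400/500 filter, and the 30-minute timeout come from the slides; the record layout is an assumption for illustration:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)
IMAGE_SUFFIXES = (".gif", ".jpeg", ".jpg")

def preprocess(records):
    """records: (ip, timestamp, url, status) tuples, in any order.
    Returns a list of sessions, each an ordered list of page URLs."""
    sessions, open_session = [], {}   # open_session: ip -> (last time, session index)
    for ip, ts, url, status in sorted(records, key=lambda r: r[1]):
        # Data cleaning: drop image requests and 4xx/5xx responses.
        if url.lower().endswith(IMAGE_SUFFIXES) or 400 <= status < 600:
            continue
        # Session identification: new IP, or >30 min since this IP's last request.
        if ip in open_session and ts - open_session[ip][0] <= SESSION_TIMEOUT:
            idx = open_session[ip][1]
            sessions[idx].append(url)
        else:
            sessions.append([url])
            idx = len(sessions) - 1
        open_session[ip] = (ts, idx)
    return sessions

logs = [
    ("1.2.3.4", datetime(2024, 1, 1, 10, 0), "/index.html", 200),
    ("1.2.3.4", datetime(2024, 1, 1, 10, 1), "/logo.GIF", 200),      # cleaned out
    ("1.2.3.4", datetime(2024, 1, 1, 10, 5), "/products.html", 200),
    ("1.2.3.4", datetime(2024, 1, 1, 11, 0), "/index.html", 200),    # >30 min gap
]
print(preprocess(logs))  # → [['/index.html', '/products.html'], ['/index.html']]
```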
Association Rules
• Sets of pages that are accessed together with a support value exceeding some specified threshold.
• Application: finding related pages that are accessed together regardless of the order.
• Example: Web caching and prefetching (Yang et al., 2001).

Clustering
• A technique to group together objects with similar characteristics: clustering of sessions, of Web pages, or of users.
• Application: facilitating the development and execution of future marketing strategies.
• Examples: clustering of user sessions (Gunduz et al., 2003); clustering of individuals with a mixture of Markov models (Sarukkai, 2000).

Classification
• A technique to map a data item into one of several predefined classes.
• Application: developing a usage profile belonging to a particular class or category.
• Example: WebLogMiner (Zaiane et al., 1998).

Sequential Patterns
• Discover frequent subsequences as patterns.
• Applications: analysis of customer purchase behavior; optimization of Web site structure.
• Examples: WUM (Spiliopoulou and Faulstich, 1998); Longest Repeated Subsequence (Pitkow and Pirolli, 1999).

Online Module: Recommendation
• The discovered patterns are used by the online component of the model to provide dynamic recommendations to users based on their current navigational activity.
• The produced recommendation set is added to the last requested page as a set of links before the page is sent to the client browser.

Probabilistic Models of Browsing Behavior
• It is useful to build models that describe the browsing behavior of users.
• Such models can generate insight into how we use the Web, provide a mechanism for making predictions, and help in pre-fetching and personalization.
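The association-rule step can be illustrated by counting the support of page pairs across sessions, i.e. the fraction of sessions in which both pages appear, regardless of order (the session data and threshold below are illustrative):

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(sessions, min_support=0.5):
    """Return page pairs whose support (fraction of sessions containing
    both pages, order ignored) meets the threshold."""
    counts = Counter()
    for session in sessions:
        # set() ignores repeat visits; sorted() gives a canonical pair order.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products"],
    ["/index", "/help"],
    ["/products", "/cart"],
]
print(frequent_pairs(sessions))
```

With the threshold at 0.5, only pairs occurring in at least half of the sessions survive, e.g. ("/index", "/products") with support 2/4.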
Markov Models for Page Prediction
• The general approach is to use a finite-state Markov chain; each state can be a specific Web page or a category of Web pages.
• If we are only interested in the order of visits (and not in time), each new request can be modeled as a transition between states.
• For simplicity, consider an order-dependent, time-independent finite-state Markov chain with M states.
• Let s be a sequence of observed states of length L, e.g. s = ABBCAABBCCBBAA with three states A, B and C; s_t is the state at position t (1 <= t <= L). In general,

    P(s) = P(s_1) ∏_{t=2}^{L} P(s_t | s_{t−1}, ..., s_1)

• Under a first-order Markov assumption, this simplifies to

    P(s) = P(s_1) ∏_{t=2}^{L} P(s_t | s_{t−1})

  which provides a simple generative model for producing sequential data.
• Issues: self-transitions; time-independence.
• If we denote T_ij = P(s_t = j | s_{t−1} = i), we can define an M × M transition matrix.
• Properties: a strong first-order assumption, but a simple way to capture sequential dependence.
• If each page is a state and there are W pages, the transition matrix is O(W²); W can be of the order of 10^5 to 10^6 for the CS department of a university.
• To alleviate this, we can work at the level of page categories: T_ij = P(s_t = j | s_{t−1} = i) now represents the probability that an individual user's next request will be from category j, given that the current request was from category i.
• We can add an end state E to the model. E.g., for three categories with an end state, the first row of the transition matrix is ( P(1|1)  P(2|1)  P(3|1)  P(E|1) ), and the remaining rows are defined analogously.
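The first-order model can be fit from session data by maximum likelihood: T_ij is the fraction of transitions out of state i that go to state j. A sketch with an end state E appended to every session (the session data is illustrative):

```python
from collections import defaultdict

def fit_transitions(sessions):
    """Maximum-likelihood first-order transition probabilities,
    with an end state "E" appended to every session."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for cur, nxt in zip(s, s[1:] + ["E"]):
            counts[cur][nxt] += 1
    T = {}
    for i, row in counts.items():
        total = sum(row.values())          # number of transitions out of state i
        T[i] = {j: c / total for j, c in row.items()}
    return T

def predict_next(T, state):
    """Most probable next state (possibly the end state "E")."""
    return max(T[state], key=T[state].get)

sessions = [list("ABBCA"), list("ABBCC"), list("BBAA")]
T = fit_transitions(sessions)
print(T["B"])                # transition probabilities out of state B
print(predict_next(T, "B"))
```

Each row of T sums to 1, so the estimate is a proper transition matrix; prediction is just the argmax of the current state's row.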