Reorganisation of Adaptive Websites Using Web Usage Mining Techniques Dr

Total Page:16

File Type:pdf, Size:1020Kb

Reorganisation of Adaptive Websites Using Web Usage Mining Techniques Dr International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 Reorganisation of Adaptive Websites using Web Usage Mining Techniques Dr. Ananthi Sheshasaayee#1, V.Vidyapriya#2 #1Head and Associate Professor, Department of Computer Science, Quaid-E-Millath Government College for Women (A), Chennai #2.Research Scholar, Department of Computer Science, Quaid-E-Millath Government College for Women (A), Chennai Abstract— Web Usage Mining is that area of Web Mining In practice, the three Web mining tasks above could which deals with the extraction of interesting knowledge be used in isolation or combined in an application, from logging information produced by Web servers. The especially in Web content and structure mining since motive of mining is to find users’ access models the Web documents might also contain links. Web automatically and quickly from the vast Web log data, such content mining is the process of extracting knowledge as frequent access paths, frequent access page groups and user clustering. Through web usage mining, the server log, from the content of documents or their descriptions. registration information and other relative information left Web document text mining, resource discovery based by user access can be mined with the user access mode on concepts indexing or agent; based technology may which will provide foundation for decision making of also fall in this category. Web structure mining is the organizations. Adaptive web sites are web sites that process of inferring knowledge from the World Wide automatically improve their organization and presentation Web organization and links between references and by learning from their user access patterns. User interaction referents in the Web. Finally, web usage mining, also patterns may be collected directly on the website or may be known as Web Log Mining, is the process of mined from Web server logs. Through this paper we present extracting interesting patterns in web access logs [2]. the various web usage mining techniques to extract the useful and relevant information on the web for adaptive web sites. Keywords - Web mining, Web usage mining, Web Log, Adaptive web site, Web site reorganisation. I. INTRODUCTION Web mining is a very interesting research topic which combines of the activated research areas: Data Mining and World Wide Web. With the huge Figure 1: Classification of Web mining amount of information available online, the World Wide Web is a fertile area for data mining research. II. ADAPTIVE WEB SITES The Web mining research relates to several research An adaptive website adjusts the structure, content, communities, such as database, information retrieval, or presentation of information in response to measured and AI. The World Wide Web is a popular and user interaction with the site, with the objective of interactive medium to disseminate information today. optimizing future user interactions. Adaptive websites The Web is huge, diverse, and dynamic and thus raises are web sites that automatically improve their the scalability, multimedia data, and temporal issues organization and presentation by learning from their respectively. user access patterns. User interaction patterns may be Web data mining can be defined as the discovery collected directly on the website or may be mined and analysis of useful information from the web log from Web server logs. A model or models are created file. Although Web mining puts down the roots deeply of user interaction using artificial intelligence and in data mining, it is not equivalent to data mining. The statistical methods. The models are used as the basis unstructured feature of Web data triggers more for tailoring the website for known and specific complexity in the process of Web mining. An patterns of user interaction. The adaptive web site is exponential growth in on-line information combined concerned with mining the log file of a Web site for with the almost unstructured web data necessitates the knowledge about the Web site and its users, and using development of powerful yet computationally efficient the knowledge to assist users to navigate and search web data mining tools. Web mining is the use of data the Web site effectively and efficiently. mining techniques to automatically discover and extract information from Web documents and services III. ADAPTIVE WEB SITES: USAGE MINING [1]. Web mining can be classified into three areas of Web usage mining focuses on techniques that could interest based on which part of predict user behaviour during the interaction of the the Web to mine: Web content mining, Web structure user with the web. mining, and Web usage mining as shown in Figure 1. A. Concept of web usage mining ISSN: 2249-2593 http://www.ijcotjournal.org Page 53 International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 Discovery of meaningful patterns from data IV. APPROACHES IN WEB USAGE MINING generated by client-server transactions on one or more The web usage mining generally includes the Web servers includes following sources of data: following several steps: data collection, data pre- Automatically generated data stored in server treatment or data pre-processing and knowledge access logs, referrer logs, agent logs, and discovery and pattern analysis [2]. Client-side cookies. A. Data Collection E-commerce and product-oriented user Data collection is the first step of web usage events. mining, the data authenticity and internality will User profiles and/or user ratings. directly affect the following works smoothly Meta-data, page attributs page content, site carrying on and the final recommendation of structure. characteristic service’s quality. Therefore it must use scientific, reasonable Web usage mining focuses on techniques that could and advanced technology to gather various data. predict user behaviour while the user interacts with the At present, towards web usage mining technology, Web. The mined data in this category are the the main data origin has three kinds: server data, secondary data on the Web as the result of client data and middle data. interactions. These data could range very widely but B. Data Pre-processing generally we could classify them into the usage data Some databases are insufficient, inconsistent that reside in the Web clients, proxy servers and and including noise. The data pre-treatment is to servers. The Web usage mining process can be carry on an unification transformation to those regarded as a three-phase process (as shown in Figure databases. The result is that the database will to 2), consisting of the data preparation or pre-processing, become integrate and consistent, thus establish the pattern discovery and pattern analysis phases. database which may mine. In the data pre- treatment work, mainly include data cleaning, user identification, session identification and path Web usage mining (Log Data) completion as shown in Figure 4. Server Session Pre- Pattern Discovery Pattern Path File User/Session Page View processing Analysis Raw Data Completion Indentification Indentification Usage Data Cleaning Figure 2: Phases of Web usage mining In the first phase, Web log data are pre-processed in Episode Indentification order to identify users, sessions, page views, and so on. Usage In the second phase, statistical methods, as well as Site Structure Statistics data mining methods (such as association rules, & Content sequential pattern discovery, clustering, and classification) are applied in order to detect interesting Episode File patterns. These patterns are stored so that they can be further analysed in the third phase of the Web usage Figure 4: Pre-processing of Web Usage Data mining process. B. Web Log Format 1) Data Cleaning: The purpose of data cleaning A web server log file contains requests made to the is to eliminate irrelevant items, and these web server, recorded in chronological order. The most kinds of techniques are of importance for any popular log file formats are the Common Log Format type of web log analysis not only data mining. (CLF) and the extended CLF. A common log format According to the purposes of different mining file is created by the web server to keep track of the applications, irrelevant records in web access requests that occur on a web site. A standard log file log will be eliminated during data cleaning. has the following format as shown in Figure 3. Since the target of Web Usage Mining is to get the user’s travel patterns, following two kinds of records are unnecessary and should be removed: The records of graphics, videos and the format information the records have filename suffixes Figure 3: Common Web Log Format ISSN: 2249-2593 http://www.ijcotjournal.org Page 54 International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 of GIF, JPEG, CSS, and so on, which can caching or the proxy servers caching without found in the URI field of the every record. leaving any record in server’s access log. The records with the failed HTTP status code. By examining the Status field of every record As a result, the user access paths are incompletely in the web access log, the records with status preserved in the web access log. To discover user’s codes over 299 or fewer than 200 are removed. travel pattern, the missing pages in the user access 2) User and Session Identification: The task of path should be appended. The purpose of the path user and session identification is find out the completion is to accomplish this task. The better different user sessions from the original web results of data pre-processing, we will improve the access log. User’s identification is, to identify mined patterns' quality and save algorithm's running who access web site and which pages are time. It is especially important to web log files, in accessed. The goal of session identification is respect that the structure of web log files are not the to divide the page accesses of each user at a same as the data in database or data warehouse.
Recommended publications
  • Intelligent Web Caching Using Adaptive Regression Trees, Splines, Random Forests and Tree Net
    Intelligent Web Caching Using Adaptive Regression Trees, Splines, Random Forests and Tree Net Sarina Sulaiman, Siti Mariyam Shamsuddin Ajith Abraham Soft Computing Research Group Machine Intelligence Research Labs (MIR Labs) Faculty of Computer Science and Information Systems Washington 98092, USA Universiti Teknologi Malaysia, Johor, Malaysia http://www.mirlabs.org [email protected], [email protected] [email protected] Abstract— Web caching is a technology for improving network sometimes httpd accelerator, to replicate the fact that it caches traffic on the internet. It is a temporary storage of Web objects objects for many clients but normally from one server [1]. This (such as HTML documents) for later retrieval. There are three paper investigates the performance of Classification and significant advantages to Web caching; reduced bandwidth Regression Trees (CART), Multivariate Adaptive Regression consumption, reduced server load, and reduced latency. These Splines (MARS), Random Forest (RF) and TreeNet (TN) for rewards have made the Web less expensive with better classification in Web caching. performance. The aim of this research is to introduce advanced machine learning approaches for Web caching to decide either to The rest of the paper is organised as follows: Section 2 cache or not to the cache server, which could be modelled as a describes the related works, followed by introduction about the classification problem. The challenges include identifying attributes machine learning approaches in Section 3. Section 4 on ranking and significant improvements in the classification experimental setup and Section 5 illustrates the performance accuracy. Four methods are employed in this research; evaluation of the proposed approaches. Section 6 discusses the Classification and Regression Trees (CART), Multivariate Adaptive result from the experiment and finally, Section 7 concludes the Regression Splines (MARS), Random Forest (RF) and TreeNet article.
    [Show full text]
  • A Study of Design Aspects of Web Personalization for Online Users in India
    A Study of Design Aspects of Web Personalization for Online Users in India A Thesis submitted to Gujarat Technological University for the Award of Doctor of Philosophy in Management By Darshana Desai Enrollment No: 129990992004 Under supervision of Prof. Dr. Satendra Kumar GUJARAT TECHNOLOGICAL UNIVERSITY AHMEDABAD August - 2017 A Study of Design Aspects of Web Personalization for Online Users in India A Thesis submitted to Gujarat Technological University for the Award of Doctor of Philosophy in Management By Darshana Desai Enrollment No: 129990992004 under supervision of Prof. Dr. Satendra Kumar GUJARAT TECHNOLOGICAL UNIVERSITY AHMEDABAD August – 2017 © DARSHANA DESAI DECLARATION I declare that the thesis entitled A study of design aspects of web personalization for online users in India submitted by me for the degree of Doctor of Philosophy is the record of research work carried out by me during the period from October 2012 to December 2016 under the supervision of Prof. Dr. Satendra Kumar Professor and Research Head at C. K Shah Vijapurwala Institute of Management & Research(CKSVIM), Vadodara, Gujarat and this has not formed the basis for the award of any degree, diploma, associateship, fellowship, titles in this or any other University or other institution of higher learning. I further declare that the material obtained from other sources has been duly acknowledged in the thesis. I shall be solely responsible for any plagiarism or other irregularities, if noticed in the thesis. Signature of the Research Scholar: ………………………….. Date: …………………. Name of Research Scholar: Darshana Desai Place: Vadodara iv. CERTIFICATE I certify that the work incorporated in the thesis entitled A study of design aspects of web personalization for online users in India submitted by Shri / Smt.
    [Show full text]
  • A Data Warehouse-Based System for Adaptive Website Recommendations
    AWESOME – A Data Warehouse-based System for Adaptive Website Recommendations Andreas Thor Erhard Rahm University of Leipzig, Germany {thor, rahm}@informatik.uni-leipzig.de Abstract There are many ways to automatically generate rec- ommendations taking into account different types of in- Recommendations are crucial for the success of formation (e.g. product characteristics, user characteris- large websites. While there are many ways to de- tics, or buying history) and applying different statistical or termine recommendations, the relative quality of data mining approaches ([JKR02], [KDA02]). Sample these recommenders depends on many factors approaches include recommendations of top-selling prod- and is largely unknown. We propose a new clas- ucts (overall or per product category), new products, simi- sification of recommenders and comparatively lar products, products bought together by customers, evaluate their relative quality for a sample web- products viewed together in the same web session, or site. The evaluation is performed with products bought by similar customers. Obviously, the AWESOME (Adaptive website recommenda- relative utility of these recommendation approaches (rec- tions), a new data warehouse-based recommen- ommenders for short) depends on the website, its users dation system capturing and evaluating user and other factors so that there cannot be a single best ap- feedback on presented recommendations. More- proach. Website developers thus have to decide about over, we show how AWESOME performs an which approaches they should support and where and automatic and adaptive closed-loop website op- when they should be applied. Surprisingly, little informa- timization by dynamically selecting the most tion is available in the open literature on the relative qual- promising recommenders based on continuously ity of different recommenders.
    [Show full text]
  • Applications of Machine Learning in Software Engineering
    APPLICATIONS OF MACHINE LEARNING IN SOFTWARE ENGINEERING Manthan Patel1, Vivek Kumar Prasad2 1Student, Department of Computer Engineering, Nirma University, (India) 2Professor, Department of Computer Engineering, Nirma University, (India) ABSTRACT Machine Learning, the branch of Artificial Intelligence, is an ability for machine to learn, unlearn and relearn things from human activities as well as environment and gives more relevant result or drives towards it. It is more feasible as well as cost effective because it nullifies the manual efforts. It became a well-known topic due to its usages and therefore many people are trying to learn it. This paper mainly focuses on a brief about the different types of applications in software engineering and challenges for development. Keywords: Applications, Challenges, Machine learning I. INTRODUCTION The goal of learning for anything or anyone is to solve the problem from previous records or experiences. Learning for machines is nothing but to become more familiar with the thing in such a way that can help in many ways like weather prediction, recommendation based on the taste, deciding route in traffic, diagnosing samples with most accurate output, etc. Mainly a machine learning does the statistical analysis of data and extract and use most relevant data for solving the current situation or for prediction of future events. If we consider an example of data analysis from a warehouse which produce goods, the machine will take raw data as input and manipulate them such that the data will help the owner to understand as well as predict about profit making products among them so that the overall lose can be reduced.
    [Show full text]
  • Maquetación 1
    DIGIBÍS ® Newsletter. No. 19. January-June, 2018 Information about enriched digitization, software for Libraries, Archives and Museums and international standards . Archivo Carmen Martín Gaite Con el módulo de descripciones archivísticas de DIGIBIB Adapted to all devices The Municipal Archive of Castellón SUMMARY Editorial 3 INTERNATIONAL DIGICLIC DIGIBÍ S ® Newsletter Europeana CEO Tachi Hernando de Larramendi 3rd International EuropeanaTech Conference 4 Project Manager Europeana Business Plan 2018: Xavier Agenjo Bullón CFO democratizing culture 6 Nuria Ruano Penas IT Manager Jesús L. Domínguez Muriel Art Director EVENTS Antonio Otiñano Martínez Sales Manager Courses Javier Mas García Technology Coordinator Course-workshop of the Ignacio Francisca Hernández Carrascal Larramendi Foundation in the AEF 7 Administration Department María Luz Ruiz Rodríguez (coord.) José María Alcega Barroeta IT Department Feli Matarranz de Antonio (coord.) GLAM APPLICATIONS Alejandra Arri Pacheco Andrés Felipe Botero Zapata Julio Diago García Virtual Libraries Carlos Henche Hernández Luis Panadero Guardeño Linked data in the Fernando Román Ortega Innovation Department Virtual Library of Castilla y León 8 Paulo César Juanes Hernández (coord.) Noemí Barbero Urbano Virtual archive of the María Isabel Campillejo Suárez Susana Hernández Rubio municipality of Castellón 10 Montserrat Martínez Guerra Digitization Department The Polymath Virtual Library Francisco Viso Parra (coord.) María José Escuté Serrano present in Wikidata 12 Amando Martínez Catalán Javier Ramos
    [Show full text]