Reorganisation of Adaptive Websites Using Web Usage Mining Techniques Dr

International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 Reorganisation of Adaptive Websites using Web Usage Mining Techniques Dr. Ananthi Sheshasaayee#1, V.Vidyapriya#2 #1Head and Associate Professor, Department of Computer Science, Quaid-E-Millath Government College for Women (A), Chennai #2.Research Scholar, Department of Computer Science, Quaid-E-Millath Government College for Women (A), Chennai Abstract— Web Usage Mining is that area of Web Mining In practice, the three Web mining tasks above could which deals with the extraction of interesting knowledge be used in isolation or combined in an application, from logging information produced by Web servers. The especially in Web content and structure mining since motive of mining is to find users’ access models the Web documents might also contain links. Web automatically and quickly from the vast Web log data, such content mining is the process of extracting knowledge as frequent access paths, frequent access page groups and user clustering. Through web usage mining, the server log, from the content of documents or their descriptions. registration information and other relative information left Web document text mining, resource discovery based by user access can be mined with the user access mode on concepts indexing or agent; based technology may which will provide foundation for decision making of also fall in this category. Web structure mining is the organizations. Adaptive web sites are web sites that process of inferring knowledge from the World Wide automatically improve their organization and presentation Web organization and links between references and by learning from their user access patterns. User interaction referents in the Web. Finally, web usage mining, also patterns may be collected directly on the website or may be known as Web Log Mining, is the process of mined from Web server logs. Through this paper we present extracting interesting patterns in web access logs [2]. the various web usage mining techniques to extract the useful and relevant information on the web for adaptive web sites. Keywords - Web mining, Web usage mining, Web Log, Adaptive web site, Web site reorganisation. I. INTRODUCTION Web mining is a very interesting research topic which combines of the activated research areas: Data Mining and World Wide Web. With the huge Figure 1: Classification of Web mining amount of information available online, the World Wide Web is a fertile area for data mining research. II. ADAPTIVE WEB SITES The Web mining research relates to several research An adaptive website adjusts the structure, content, communities, such as database, information retrieval, or presentation of information in response to measured and AI. The World Wide Web is a popular and user interaction with the site, with the objective of interactive medium to disseminate information today. optimizing future user interactions. Adaptive websites The Web is huge, diverse, and dynamic and thus raises are web sites that automatically improve their the scalability, multimedia data, and temporal issues organization and presentation by learning from their respectively. user access patterns. User interaction patterns may be Web data mining can be defined as the discovery collected directly on the website or may be mined and analysis of useful information from the web log from Web server logs. A model or models are created file. Although Web mining puts down the roots deeply of user interaction using artificial intelligence and in data mining, it is not equivalent to data mining. The statistical methods. The models are used as the basis unstructured feature of Web data triggers more for tailoring the website for known and specific complexity in the process of Web mining. An patterns of user interaction. The adaptive web site is exponential growth in on-line information combined concerned with mining the log file of a Web site for with the almost unstructured web data necessitates the knowledge about the Web site and its users, and using development of powerful yet computationally efficient the knowledge to assist users to navigate and search web data mining tools. Web mining is the use of data the Web site effectively and efficiently. mining techniques to automatically discover and extract information from Web documents and services III. ADAPTIVE WEB SITES: USAGE MINING [1]. Web mining can be classified into three areas of Web usage mining focuses on techniques that could interest based on which part of predict user behaviour during the interaction of the the Web to mine: Web content mining, Web structure user with the web. mining, and Web usage mining as shown in Figure 1. A. Concept of web usage mining ISSN: 2249-2593 http://www.ijcotjournal.org Page 53 International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 Discovery of meaningful patterns from data IV. APPROACHES IN WEB USAGE MINING generated by client-server transactions on one or more The web usage mining generally includes the Web servers includes following sources of data: following several steps: data collection, data pre- Automatically generated data stored in server treatment or data pre-processing and knowledge access logs, referrer logs, agent logs, and discovery and pattern analysis [2]. Client-side cookies. A. Data Collection E-commerce and product-oriented user Data collection is the first step of web usage events. mining, the data authenticity and internality will User profiles and/or user ratings. directly affect the following works smoothly Meta-data, page attributs page content, site carrying on and the final recommendation of structure. characteristic service’s quality. Therefore it must use scientific, reasonable Web usage mining focuses on techniques that could and advanced technology to gather various data. predict user behaviour while the user interacts with the At present, towards web usage mining technology, Web. The mined data in this category are the the main data origin has three kinds: server data, secondary data on the Web as the result of client data and middle data. interactions. These data could range very widely but B. Data Pre-processing generally we could classify them into the usage data Some databases are insufficient, inconsistent that reside in the Web clients, proxy servers and and including noise. The data pre-treatment is to servers. The Web usage mining process can be carry on an unification transformation to those regarded as a three-phase process (as shown in Figure databases. The result is that the database will to 2), consisting of the data preparation or pre-processing, become integrate and consistent, thus establish the pattern discovery and pattern analysis phases. database which may mine. In the data pre- treatment work, mainly include data cleaning, user identification, session identification and path Web usage mining (Log Data) completion as shown in Figure 4. Server Session Pre- Pattern Discovery Pattern Path File User/Session Page View processing Analysis Raw Data Completion Indentification Indentification Usage Data Cleaning Figure 2: Phases of Web usage mining In the first phase, Web log data are pre-processed in Episode Indentification order to identify users, sessions, page views, and so on. Usage In the second phase, statistical methods, as well as Site Structure Statistics data mining methods (such as association rules, & Content sequential pattern discovery, clustering, and classification) are applied in order to detect interesting Episode File patterns. These patterns are stored so that they can be further analysed in the third phase of the Web usage Figure 4: Pre-processing of Web Usage Data mining process. B. Web Log Format 1) Data Cleaning: The purpose of data cleaning A web server log file contains requests made to the is to eliminate irrelevant items, and these web server, recorded in chronological order. The most kinds of techniques are of importance for any popular log file formats are the Common Log Format type of web log analysis not only data mining. (CLF) and the extended CLF. A common log format According to the purposes of different mining file is created by the web server to keep track of the applications, irrelevant records in web access requests that occur on a web site. A standard log file log will be eliminated during data cleaning. has the following format as shown in Figure 3. Since the target of Web Usage Mining is to get the user’s travel patterns, following two kinds of records are unnecessary and should be removed: The records of graphics, videos and the format information the records have filename suffixes Figure 3: Common Web Log Format ISSN: 2249-2593 http://www.ijcotjournal.org Page 54 International Journal of Computer & Organization Trends –Volume 4 Issue 3 May to June 2014 of GIF, JPEG, CSS, and so on, which can caching or the proxy servers caching without found in the URI field of the every record. leaving any record in server’s access log. The records with the failed HTTP status code. By examining the Status field of every record As a result, the user access paths are incompletely in the web access log, the records with status preserved in the web access log. To discover user’s codes over 299 or fewer than 200 are removed. travel pattern, the missing pages in the user access 2) User and Session Identification: The task of path should be appended. The purpose of the path user and session identification is find out the completion is to accomplish this task. The better different user sessions from the original web results of data pre-processing, we will improve the access log. User’s identification is, to identify mined patterns' quality and save algorithm's running who access web site and which pages are time. It is especially important to web log files, in accessed. The goal of session identification is respect that the structure of web log files are not the to divide the page accesses of each user at a same as the data in database or data warehouse.

Load more