International Journal of Computer Trends and Technology (IJCTT) – volume 10 number 1 – Apr 2014
ISSN: 2231-2803, http://www.ijcttjournal.org, pages 18-20

Journey from Data Mining to Web Mining to Big Data

Richa Gupta
Department of Computer Science
University of Delhi

ABSTRACT: This paper describes the journey to big data, starting from data mining and moving through web mining. It briefly discusses each of these methods and lists their applications. It also states the importance of mining big data today using fast and novel approaches.

Keywords- data mining, web mining, big data

1 INTRODUCTION

Data is a collection of values and variables that are related in a certain sense and differ in some other sense. The size of data has always been increasing. Storing this data without using it in any way is simply a waste of storage space and storing time. Data should be processed to extract useful knowledge from it.

2 DATA MINING

Data mining is the analysis of data from different perspectives and its summarization into useful information that can be used for business solutions and for predicting future trends. Mining the information helps organizations make proactive, knowledge-driven decisions and answer questions that were previously time consuming to resolve.

Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It is a fairly recent topic in computer science but applies many older computational techniques from statistics, information retrieval, machine learning and pattern recognition.

Data mining is important because a user is usually looking for patterns and not for the complete data in the database; it is better to read only the wanted data than to read unwanted data as well. Data mining extracts only the required patterns from the database in a short time span [2].

Based on the type of patterns to be mined, data mining tasks can be classified into summarization, classification, clustering, association and trend analysis [1].

Summarization is the abstraction or generalization of data. Data is summarized and abstracted to give a smaller set which provides an overview of the data and some useful information [1].

Classification is the method of classifying objects into certain groups based on their attributes. The classes are formed by analyzing the relationship between the attributes and the classes of objects in the training set [1].

Association is the discovery of togetherness or connection of objects. The association is based on certain rules, known as association rules. These rules reveal the associative relationship among objects, that is, they find the correlation in a set of objects [1].

Clustering is the identification of clusters or groups for a set of objects whose classes are unknown. Clustering should be done such that the similarities between objects of the same cluster are maximized and the similarities between different clusters are minimized.

Trend analysis is the discovery of interesting patterns in the dimension of time. It is the matching of objects' changing trends, such as increasing streaks [1].

Data mining tools can be classified into three categories: traditional data mining tools, dashboards and text-mining tools.

- Traditional data mining tools help companies establish data patterns and trends using complex algorithms and techniques, usually situated on a single computer. The majority are available in Windows and Unix versions. They normally handle data using offline tools.
- Dashboards reflect data changes, enabling the user to see how the business is performing.
- Text-mining tools mine data from different kinds of text, for example Microsoft Word, Acrobat PDF and plain text files. These tools scan the content and convert the selected data into a format that is compatible with the tool's database [2].

2.1 APPLICATIONS OF DATA MINING

- Artificial neural networks.
- Business applications. Here it is used for database marketing, retail data analysis, stock selection, credit approval etc.
- Science applications. It is used in astronomy, molecular biology, medicine, geology etc.
- Health care management
- Tax fraud detection

3 WEB MINING

As usage of the web increased, so did the demand for data mining. Web mining is the application of data mining techniques to discover usage patterns from large Web repositories. It reveals interesting and unknown knowledge about both users and websites which can be used for analysis. It is used to understand customer behaviour, evaluate the effectiveness of a particular website and help quantify the success of a marketing campaign [3, 4]. Web mining can be classified into three types based on the type of data:

- Web content mining: the process of extracting useful information and knowledge from web contents, data and documents. Content may consist of text, images, audio, video or structured records such as lists and tables. Web content mining is approached from two different points of view: the Information Retrieval view and the Database view [5].
- Web structure mining: the process of using graph theory to analyse the node and connection structure of a website. It tries to discover the underlying link structures of the web. It can be used to generate information on the similarity or the difference between different websites [4].
- Web usage mining: it attempts to discover useful knowledge from the data obtained from web user sessions. It tries to find usage patterns in the web data in order to understand and better serve the needs of Web-based applications. Some applications of web usage mining are adaptive websites, web personalization and recommendation, and business intelligence.

3.1 APPLICATIONS OF WEB MINING

- E-commerce and e-services, where it is of great use
- E-learning
- Self-organizing websites
- Digital libraries
- E-government
- Security and crime investigation

4 BIG DATA

There has been a lot of growth in the amount of data generated by the web these days. The data has become so large that it is difficult to analyse it with the help of our traditional mining methods. The term big data has been coined for data that exceeds the processing capability of these methods [6]. It has three key characteristics:

- Volume: the size of data is now larger than terabytes and petabytes. This large scale makes it difficult to analyse using conventional methods.
- Velocity: big data methods should mine a large amount of data within a pre-defined period of time. The traditional methods of mining may take a very long time to mine such a volume of data.
- Variety: big data comes from various sources. Big data methods are designed to handle structured, semi-structured as well as unstructured data, whereas the traditional methods were designed to handle structured data, and not of such large volume.

Big data is a general term for the massive amounts of digital data being collected from various sources that are too large and too raw in form. Big data deals with new challenges like complexity, security and risks to privacy. Big data is redefining data management, from extraction, transformation and processing to cleaning and reducing [7].

4.1 APPLICATIONS OF BIG DATA

- Social networking sites, to find usage patterns
- Google search
- Astronomy
- Sensor networks
- Government data
- Web logs
- Mobile phones
- Natural disaster and resource management
- Scientific research

5 CONCLUSION

In this paper, we have reviewed the journey of how big data evolved. The paper defines the traditional mining methods as data mining; then, with the advancement of the web, came the concept of web mining. Later on, the size and variety of data pushed us to think ahead and develop new and faster methods of mining data which use the parallel computing capability of processors. This is known as big data. We have also provided the applications of the different methods of mining.

REFERENCES

[1] http://academic.csuohio.edu/fuy/Pub/pot97.pdf
[2] http://www.theiia.org/intAuditor/itaudit/archives/2006/august/data-mining-101-tools-and-techniques/
[3] Bodyan G.C, Shestakov T.V, "Web Mining in Technology Management", Engineering Universe for Scientific Research and Management, Vol 1 Issue 2, April 2009
[4] http://en.wikipedia.org/wiki/Web_mining
[5] Jaideep Srivastava, P. Desikan, Vipin Kumar, http://dmr.cs.umn.edu/Papers/P2004_4.pdf
[6] Richa Gupta, Sunny Gupta, Anuradha Singhal, "Big Data: Overview", International Journal of Computer Trends and Technology (IJCTT), Vol 9, No.5, March 2014
[7] Gang-Hoon Kim, Silvana Trimi, Ji-Hyong Chung, "Big-Data Applications in the Government Sector", Communications of the ACM, Vol. 57, No.3, Pages 78-85
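
As an illustration of the association rules described in Section 2, the following minimal Python sketch computes support and confidence, two measures commonly used to score such rules, over a toy list of transactions. The function names, the thresholds and the transactions are hypothetical and are not taken from the paper or from reference [1]. The sketch only enumerates rules with one item on each side and is not a full mining algorithm.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for one-item rules.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in combinations(items, 2):
        s = support({x, y}, transactions)
        if s < min_support:
            continue  # the pair occurs together too rarely to be interesting
        for a, c in ((x, y), (y, x)):
            conf = confidence({a}, {c}, transactions)
            if conf >= min_confidence:
                rules.append((a, c, s, conf))
    return rules


# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

rules = pair_rules(transactions)
for a, c, s, conf in rules:
    print(f"{a} -> {c}: support={s:.2f}, confidence={conf:.2f}")
```

With this toy data the only rule that passes both thresholds is butter -> bread, with support 0.5 and confidence 1.0: every transaction containing butter also contains bread, while the reverse rule holds in only two of the three transactions that contain bread.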