September 2018
Total Page:16
File Type:pdf, Size:1020Kb
International Journal of Management, IT & Engineering Vol. 8 Issue 9(1), September 2018, ISSN: 2249-0558 Impact Factor: 7.119 Journal Homepage: http://www.ijmra.us, Email: [email protected] Double-Blind Peer Reviewed Refereed Open Access International Journal - Included in the International Serial Directories Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., Open J-Gage as well as in Cabell’s Directories of Publishing Opportunities, U.S.A Data Mining: Architecture, Techniques and its uses ANJU DEVI Abstract Data Mining refers to extracting useful information or knowledge from large amount of data. It is beneficial in every field like business, medicine, education sectors, health care, industrial, management, engineering, web data, banking, customer relationship management, fraud detection etc. It is also known as knowledge discovery process. Data mining is an integral part of Knowledge Discovery in Database (KDD). In this paper different types, architecture of data mining are describe in details with the help of block diagram. Its techniques also define which are summarization, classification, association rules, prediction, clustering and regression etc. Keywords: Data mining, Architecture, Aspects, Techniques and uses Introduction of Data Mining Data mining is a field of research which are very popular today. A large amount of data is available in every field of life such as: banking, medicine, insurance, education sectors etc. due to advances of digitization techniques. Data mining is a process of selecting interested pattern to from a large amount of data or information. Data Mining is similar to Data science. It is an interdisciplinary turf about scientific methods, process, and system to take out knowledge from data in various forms. Data mining is defined as a set of techniques for automatically analyzing interesting data in the data. Many organizations now have huge amounts of data stored in databases that needs to be analyzed. Traditionally, data has been analyzed by hand to discover interesting knowledge. In general, data mining techniques are designed understand the past or predict the future. Data mining techniques are used to take decisions based on facts rather than instinct. There are various types of techniques are used classification, association rules, prediction, clustering and regression etc. In this paper data mining is introduce in details. Aspects of Data Mining The sequences of steps identified in extracting knowledge from data are: Data mining is just one part of the process of discovering useful knowledge from data, referred to collectively as Knowledge Discovery in Databases (KDD). In brief, KDD outlines a process of knowledge discovery as a series of iterative steps: Department of Computer Science and IT, GNG College, Santpura, Yamuna Nagar 1 International Journal of Management, IT & Engineering http://www.ijmra.us, Email: [email protected] International Journal of Management, IT & Engineering Vol. 8 Issue 9(1), September 2018, ISSN: 2249-0558 Impact Factor: 7.119 Journal Homepage: http://www.ijmra.us, Email: [email protected] Double-Blind Peer Reviewed Refereed Open Access International Journal - Included in the International Serial Directories Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., Open J-Gage as well as in Cabell’s Directories of Publishing Opportunities, U.S.A 1) Data Cleaning 2) Data Integration 3) Data selection, 4) Data transformation, 5) Data Mining, 6) Pattern Evaluation, and 7) Knowledge 1. Data Cleaning Basically, we collect the different types of data which includes errors, noise, value missing etc. So, we need to clean up the data. In data cleaning step, noisy and inconsistent data is removed. 2. Data Integration In this step integrate or combines the different sources. 3. Data Selection In this step, data relevant to the analysis task are retrieved from the database. In the first step we are unable to collect the data. For this, we select only those data which we think useful for data mining. 4. Data Transformation Basically, the data even after cleaning is not ready for mining. In this step, data is transformed into appropriate forms for mining. Thus, the techniques used in data transformation are: aggregation, normalization etc. 5. Data Mining After transformation the data mining steps has come. Now we can apply data mining techniques on the data. It takes only the interesting patterns. Data mining uses the techniques clustering and association analysis. 6. Pattern Evaluation Generally in this step, data patterns are evaluated. 7. Knowledge Presentation As this step is beneficial to us. Generally, in this step, knowledge is represented. Data Mining Techniques Data mining is the combination of two tasks predictive and descriptive. In predictive task includes the classifications, prediction and time series analysis and on the other hand descriptive task includes: association, clustering and summarization. A medical practitioner annoying to detect a disease based on the health check test results of a patient can be considered as a analytical data mining task. Descriptive data mining tasks usually finds data relating patterns and comes up with fresh, important information from the accessible data set. 2 International Journal of Management, IT & Engineering http://www.ijmra.us, Email: [email protected] International Journal of Management, IT & Engineering Vol. 8 Issue 9(1), September 2018, ISSN: 2249-0558 Impact Factor: 7.119 Journal Homepage: http://www.ijmra.us, Email: [email protected] Double-Blind Peer Reviewed Refereed Open Access International Journal - Included in the International Serial Directories Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., Open J-Gage as well as in Cabell’s Directories of Publishing Opportunities, U.S.A (I) Classification Classification is a derivation of a model to determine the class of an object based on its attributes. A collection of records will be available, each record with a set of attributes. A classification model has relationships between attributes and objects of the class. Additionally, you can use classification as a feeder to, or the result of, other techniques. (II) Prediction Prediction derives the relationship between a thing you know and a thing you need to predict for future reference. Prediction is nothing but finding out the knowledge or some pattern from the large amounts of data. (III) Time - Series Analysis Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series is important because it imparts knowledge about what has taken place in the past and what will take place in time to come. Examples of time series are heights of ocean tides etc. Time series are drawn by line charts. Time series are used in signal processing, pattern recognition, weather forecasting, earthquake prediction, control engineering and largely in any domain of applied science and engineering which involves temporal measurements. Stock market prediction is an important application of time- series analysis. (IV) Association Association discovers the connection among a set of items. Association identifies the relationships between objects. Such type of connections of objects is called association rules. (V) Clustering Cluster or you can say groups are used to identifying the classes, for a set of objects whose classes are unknown. Clustering is used to identify data objects that are similar to one another. The main advantage of clustering over classification is that, it is flexible to change and helps single out useful features that differentiate groups. Clustering requirements in scalability, high dimensionality, ability to handle noisy data etc. There are various types of clustering methods like as: density based, hierarchy, partitioned, model based etc. (VI) Summarization A set of relevant data is summarized which result in a smaller set that gives aggregated information of the data. Data Summarization is a term for a short winding up of a big paragraph. Data summarization has the great importance in the data mining. Data Architecture The major components of any data mining system are data warehouse server, data source data mining engine, graphical user interface, pattern evaluation module and knowledge base. 3 International Journal of Management, IT & Engineering http://www.ijmra.us, Email: [email protected] International Journal of Management, IT & Engineering Vol. 8 Issue 9(1), September 2018, ISSN: 2249-0558 Impact Factor: 7.119 Journal Homepage: http://www.ijmra.us, Email: [email protected] Double-Blind Peer Reviewed Refereed Open Access International Journal - Included in the International Serial Directories Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., Open J-Gage as well as in Cabell’s Directories of Publishing Opportunities, U.S.A 1. Data Sources There are so many documents present. That is a database, data warehouse, World Wide Web. That is the actual sources of data. Sometimes, data may reside even in plain text files or spreadsheets. World Wide Web or the Internet is another big source of data. Organizations usually store data in databases or data warehouses. Data warehouses may contain one or more databases. 2. Database or Data Warehouse Server The database data warehouse server contains the actual data that is ready to be processed. Hence, the server handles retrieving the applicable information. That is based on the data mining request of