UNSTRUCTURED DATA MINING AND ITS APPLICATIONS
1Jagruti Jangal Wagh, 2Jidnyasa Dharmik Gondane, 3Ashvini Tulshiram Dukare
1,2Student, Department of Computer Engineering, Cummins College of Engineering for Women, Pune, Savitribai Phule Pune University

Abstract
Information is generated in various forms. About 90% of the data in the world has been generated over the last two years. If this data is left unmanaged, it becomes overwhelming, making it difficult to retrieve information from it when needed. Structured data is information, usually text files, displayed in titled columns and rows that can be easily processed and ordered by data mining tools. Information systems may differ in application and form, but they serve a common purpose: to convert data into a useful form from which knowledge can be obtained. Data comprises basic, unfiltered and generally unrefined information. Information is more refined data that is useful for some form of analysis. Knowledge resides in the user and emerges only when human insight and experience are applied to information and data. However, most of the data generated today is unstructured. Unstructured data is data that has no identifiable internal structure. A leading industry analyst on the confluence of unstructured and structured data sources published an article stating that "80% of business-related information originates in unstructured form, basically text". There is therefore a great need to convert this unstructured data into a usable form. Data mining is the analysis step of the "Knowledge Discovery in Databases" (KDD) process, the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, statistics, machine learning and database systems. The basic goal of a data mining process is to extract information from a large data set and transform it into an understandable form for future use. This paper therefore highlights unstructured data mining and its applications.

Index Terms: Data Mining, Information, Structured data, Unstructured data.

I. INTRODUCTION
Every day, data is generated and collected in huge amounts, but it often remains unutilized, with no useful information or meaningful insights drawn from it. These insights are vital in strategic and operational decision-making processes such as marketing, customer engagement and branding. Data generated by channels such as marketing, distribution, customer engagement, social channels and web content is in different forms, structured as well as unstructured, and is available in multiple systems. This unstructured data needs to be converted into a more useful form. Doing so requires approaches that convert free-form, unstructured data into some form of structured or semi-structured data and analyze it to obtain meaningful insights that address business problems and support business decision making. Depending on the type of input data, reports can be generated in the form of charts, bar graphs, etc. Dashboards can also be explored as an effective visualization mechanism.
Structured data is data in an organized form, that is, in the form of rows and

ISSN (PRINT): 2393-8374, (ONLINE): 2394-0697, VOLUME-3, ISSUE-3, 2016 36

INTERNATIONAL JOURNAL OF CURRENT ENGINEERING AND SCIENTIFIC RESEARCH (IJCESR)

columns and can be easily used by a computer program. Relationships exist between entities of data, such as classes and their objects. Unstructured data is data that does not conform to a data model or is not in a form that can be used easily by a computer. Examples include memos, chat rooms, videos, images, research documents and the body of an email. Gartner estimates that almost 80% of the data generated in any enterprise today is unstructured. Roughly 10% of data is in the structured and semi-structured category [1].

II. DATA MINING
Data mining can be viewed as a result of the natural evolution of information technology. The database and data management industry evolved through the development of several critical functionalities, such as data collection and database creation, and data management, including the storage and retrieval of data. Data mining is the analysis step of the KDD ("Knowledge Discovery in Databases") process; it is an interdisciplinary subfield of computer science and is the process of discovering patterns in large data sets.
The steps involved in the knowledge discovery process are as follows:
• Data Cleaning: Inconsistent data and noise are removed.
• Data Integration: Multiple data sources are combined.
• Data Selection: Data relevant to the analysis task is retrieved from the database.
• Data Transformation: Data is transformed or consolidated into forms appropriate for mining by performing aggregation operations.
• Data Mining: Intelligent methods are applied to extract data patterns.
• Pattern Evaluation: The data patterns are evaluated.
• Knowledge Presentation: The mined knowledge is presented [1].

Fig. 1: Data mining as a step in the process of knowledge discovery [Han, Kamber & Pei, Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann]

The goal of the data mining process is to extract information from a data set and transform it into an understandable form for future use.

III. STRUCTURED VS UNSTRUCTURED DATA MINING
Structured data is data that can be organized easily. Despite its simplicity, many experts in the data industry estimate that structured data accounts for only 20% of the available data. It is usually stored in databases and is analytical and clean. Some examples of structured data are:
• Machine generated:
  Sensor data - manufacturing sensors, medical devices and GPS data
  Point-of-sale data - credit card information, product information, etc.
  Call detail records - caller and recipient information, time of call
  Web server logs - page requests, server activity
• Human generated:
  Input data - data given as input to a computer: age, gender, code, etc.
Structured data has always played, and will continue to play, a crucial role in data analysis. It functions as the backbone of critical business insights. Without structured data, it would be difficult to know where to find the insights hidden in unstructured data sets.
Unstructured data is data that is not organized in a predefined manner and does not have a predefined data model. Social media plays a very important role in unstructured data: according to one study, 73% of online adults use social networking sites. One of the many ways in which businesses use this data is to gather and analyse brand sentiment. In addition to social media, many other forms of unstructured data are common:
  Word documents, text files and PDFs - books, letters, audio and video files, other written documents
  Audio files - customer service recordings, voicemails, phone calls
  Presentations - SlideShares, PowerPoints
  Videos - YouTube uploads, personal videos
  Images - pictures, memos
  Messaging - text or instant messages
In all these examples, the data can provide compelling and useful insights. With the right tools, unstructured data can add a depth to data analysis that could not otherwise be achieved easily. Unstructured data enhances the ability of any business to derive greater insights from its data sets. It is the most important piece of the data pie of any business, and widely accessible tools can help businesses use this data to its greatest potential.
Text mining is the process of deriving high-quality information from text. High-quality information is typically derived by identifying trends and patterns through means such as statistical pattern learning. In other words, text mining refers to the use of data mining techniques to discover useful patterns in text. The primary difference is that, unlike in data mining, the data used in text mining is unstructured.

IV. CHALLENGES IN UNSTRUCTURED DATA MINING
The challenges of unstructured data run the gamut from gathering it to storing it to using it to make decisions:
• Usability: For unstructured data to be usable, businesses have to come up with a way to locate, extract, organize and store the data.
• Volume: Data size is growing exponentially, which creates new challenges in dealing with scale. The volume of unstructured data is growing at a rate of approximately 60% per year. For many businesses, that is more than they can keep up with, which presents challenges for both using and securing the data.
• Relevance: One way in which relevance comes into play is a lack of insight into the history of certain pieces of data.
• Heterogeneity: The difficulties of big data analysis arise from its large scale and from the presence of mixed data based on different rules or patterns in the collected and stored data. Complicated heterogeneous mixture data has several patterns and rules, and the properties of these patterns may vary greatly.
• Incompleteness: Incomplete data creates uncertainty during analysis and must be managed as part of it. Incomplete data refers to missing data field values in some samples.
• Quality: By nature, a large volume of unstructured data is unverified. This presents serious challenges for consumers and enterprises. On a consumer level, people could be negatively impacted by companies that make decisions based on unstructured data. On an enterprise level, making business decisions based on inaccurate data could be extremely costly.
• Requirements for real-time data analysis have become predominant, for example in weather prediction, stock trading and time series analysis.
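The usability and incompleteness challenges listed above, together with the text mining idea from Section III, can be illustrated with a minimal Python sketch. It is not a method proposed in this paper: the sample messages, the field names (id, channel, text), the stop-word list and all function names are hypothetical and chosen only for illustration. The sketch converts free-form text into structured rows of term counts, reports how often a field is missing, and lists the most frequent terms.

```python
import re
from collections import Counter

# Hypothetical sample records (assumption); a real system would read
# them from files, feeds or a database.
MESSAGES = [
    {"id": 1, "channel": "email", "text": "The delivery was late and the support was slow."},
    {"id": 2, "channel": "chat", "text": "Great support, fast delivery!"},
    {"id": 3, "channel": None, "text": "Delivery arrived damaged."},
    {"id": 4, "channel": "social", "text": None},
]

# Illustrative stop-word list only (assumption), not a standard one.
STOP_WORDS = {"the", "and", "was", "a", "an", "is"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def to_structured(records):
    """Convert free-text records into structured rows with per-record term counts."""
    rows = []
    for rec in records:
        if not rec.get("text"):
            # Incomplete record: there is no text to mine, so skip it.
            continue
        rows.append({
            "id": rec["id"],
            "channel": rec.get("channel"),
            "terms": Counter(tokenize(rec["text"])),
        })
    return rows


def missing_rate(records, field):
    """Fraction of records whose given field is missing (None or empty)."""
    if not records:
        return 0.0
    return sum(1 for r in records if not r.get(field)) / len(records)


def top_terms(rows, n=3):
    """Aggregate term counts over all rows and return the n most frequent terms."""
    total = Counter()
    for row in rows:
        total.update(row["terms"])
    return total.most_common(n)


if __name__ == "__main__":
    rows = to_structured(MESSAGES)
    print("structured rows:", len(rows))
    print("missing text rate:", missing_rate(MESSAGES, "text"))
    print("top terms:", top_terms(rows, 2))
```

Skipping records with no text is only one possible way of managing incompleteness; depending on the analysis, such records might instead be kept and flagged so that the missing values remain visible.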


V. APPLICATIONS OF UNSTRUCTURED DATA MINING
• Disease cure and bioinformatics: One of the most important application trends deals with the mining and interpretation of biological structures and sequences. Data mining tools are increasingly being used to find genes related to the cure of diseases such as cancer and AIDS.
• Business trends: Today's business environment is more dynamic than ever, so businesses must be able to react quickly, remain profitable and offer higher-quality services than before. Here, data mining serves as a fundamental technology for processing customers' transactions more accurately, quickly and meaningfully. The business analyst obtains meaningful insights that address business problems and support business decision making.
• Finance: Most banks and financial institutions offer a wide variety of banking services (such as savings, checking, and individual and business customer transactions), credit services (such as mortgage, business and automobile loans) and investment services (such as mutual funds). Some also offer stock services. Financial data collected in the banking and financial industry is often relatively reliable, complete and of high quality, which facilitates systematic data analysis. For example, it can help detect fraud by identifying a group of individuals who stage accidents to collect insurance money.
• Forensics, security and intelligence: Manual monitoring of masses of mostly unstructured data is not feasible. The key is the ability to automatically identify persons' names, credit card numbers, addresses, bank account numbers and other entities.

VI. FUTURE SCOPE AND CONCLUSION
Considering the exponential growth of data, the scope of the system can be extended to cloud storage and its integration with external data sources. Implementing data mining techniques through cloud computing will allow users to obtain meaningful information and insights from a virtually integrated data warehouse, which would eventually reduce the cost of storage and infrastructure. Data mining in the cloud offers tremendous potential for extracting and analyzing useful information in different fields of human activity, such as business, advertising, economics, health care, medicine, biology, heredity and pharmacy. With this technology, just a few mouse clicks should be enough to reach the desired information about clients: the regularity of their purchases of certain items, their behavior, wellbeing, purchasing power, location and so on.
A seamless user experience can be provided: analytics solutions must integrate with other tools in order to become standard. Many companies use Salesforce for CRM, for instance. But while they live inside Salesforce, a lot of their metrics and data do not. As analytics becomes more integrated, a company can use Salesforce for CRM and a tool like Tableau for analytics, bring the Salesforce data into Tableau, and then use links in the analytics to jump right back into Salesforce. For any end user or salesperson, the experience will be seamless.
Multimedia data mining is the mining and analysis of various types of data, including video, images, animation and audio. Its main objective is to mine data that contains several different kinds of information. Because multimedia data mining involves the areas of text mining and hypertext/hypermedia mining, these fields are closely related, and much of the information that describes those areas also applies to multimedia data mining. The field is new, but it holds much promise for the future.
Thus, unstructured data mining would be of great help to business analysts in drawing useful insights in a very short span of time and proposing decisions that benefit their firms.

VII. ACKNOWLEDGEMENT
We would like to express our gratitude to Neeta Maitre Ma'am for her kind co-operation and encouragement, which helped us complete this paper.


REFERENCES

[1] Han, Kamber and Pei. Data Mining: Concepts and Techniques, 3rd Edition. Morgan Kaufmann.
[2] Nurdatillah Hasim and Norhaidah Abu Haris. 2015. A study of open-source data mining tools for forecasting. ACM, New York, NY, USA, Article 79, 4 pages. DOI=10.1145/2701126.2701152
