Pattern Recognition Letters 67 (2015) 109–112


Editorial

Data science, big data and granular mining

Introduction

With the evolution of various modern technologies, a huge amount of data is being constantly generated and collected around us. We are in the midst of what is popularly called the information revolution and are living in a so-called world of knowledge. Intentionally and/or accidentally, the generation of these data is inevitable. As a result, large data, broadly characterised by three Vs – large volume, velocity and variety (popularly known as "big data") [1], has become a buzzword, and the analysis, access and storage of these data are now central to scientific innovation, public health and welfare, public security, and so on. Moreover, big data are highly complex in nature and mining them is not straightforward. Most of the information is heterogeneous, time varying, redundant, uncertain and imprecise. Reasoning about, understanding and mining useful knowledge from these data is becoming a great challenge. It is also true that large integrated data sets can potentially provide a much deeper understanding of both nature and society, and open up new avenues for research activities.

Long before the data space, there have been histories of physical and social spaces to describe various phenomena in nature and human civilization, respectively. These two spaces eventually led to the natural and social sciences, where research activities related to understanding processes and concepts, such as discovering various phenomena in the physical world and understanding human interactions, are carried out. In the recent past, digitalization of various observable facts of these spaces has produced a huge amount of data.

With the over-flooding of data called "big data", people have started realizing the existence of a data space along with the natural and social spaces, and have shown remarkable interest in exploring it. Once the data are generated, they will not evolve accordingly if no special mechanism is arranged. The data may have a powerful effect on the real world even if they are fabricated, e.g., rumours spread via mobile phones or through social media. Big data has promising usefulness in many fields such as commerce and business, biology, medicine, public administration, material science, and cognition in the human brain, just to name a few. The objective of big data, as it stands, is to develop complex procedures running over large-scale data repositories for extracting useful knowledge hidden therein and delivering accurate predictions of various kinds within a reasonable time period. Scientists from academia, industry and the open-source community have been striving for improved reasoning about and understanding of big data, and to provide better scope for solving several social and natural problems. It may be mentioned here that while analytics over big data is playing a leading role, there has been a shortage of deep analytical talent globally.

Mining of big data can be made effective with methodologies that can deal with their characteristics, such as heterogeneity, dynamism or time variation, redundancy, uncertainty and impreciseness. Heterogeneity in data comes from the characterization of information in different ways, such as using numerical, categorical, text, image or audio/video data. Dynamism in data arises because the mechanism that generates related data changes at different times or under different circumstances, which adds new uncertainty and difficulties for analysis. High dimensionality is due to the inappropriate collection of data for a task: although a large candidate set of attributes is provided, most of them are irrelevant or redundant. These superfluous attributes deteriorate the learning performance of decision-making algorithms.

In the recent past, research interest has grown in a relatively new area called granular computing (GrC) [2,3,7], due to the needs and challenges of various application domains, such as document analysis, financial gaming, organization and retrieval of huge databases of multimedia, medical data, remote sensing, and biometrics. Several aspects of GrC [6] play important roles in the design of different decision-making models with acceptable performance.

GrC, based on technologies like fuzzy sets [9], rough sets [8], computing with words, etc., provides powerful tools for multiple-granularity and multiple-view data analysis, which is of vital importance for understanding the complexity in big data. Different users may require understanding of big data at different granularity levels and need to analyse the data from different viewpoints. GrC has exhibited some capability and advantages in intelligent data analysis, pattern recognition, and uncertain reasoning for data of noticeable size. However, very few research activities have so far gone deep into the analysis of heterogeneous, complex and big data.

Granular computing: components, characteristics and features

GrC by its name is not new, but its use in various domains has gained popularity recently. It is a computing paradigm of information processing that works with the process of information granulation/abstraction. GrC is an umbrella term that accommodates theories, methods, techniques and tools of information granulation. In processing large-scale information, GrC plays an important role in finding simple, cost-effective approximate solutions and provides an improved description of real-world intelligent systems. In the present scenario, the synergetic integration of GrC and computational intelligence has become a hot area of research for developing and experimenting with various efficient decision-making models for complex problems.

Broadly, the prime motivation in the development of GrC-based methodologies is three-fold: (i) GrC does not go for excessive precision of the solution, which is in fact an inherent characteristic of the human reasoning process, (ii) the structural representation of the problem in the work-flow of GrC makes the solution process more efficient, and (iii) the computing method provides transparency in the information processing steps. The human reasoning process normally follows the principle of information granulation and performs computation and operations on information granules. Most often, modelling and monitoring complex systems do not require a high level of precision, which is in fact often expensive and unnecessary. Further, a task with incomplete and imprecise information poses problems in differentiating distinct elements. This motivates one to go for a granular representation of information and processes. As a result, the GrC framework provides efficient and intelligent information processing systems for various real-life decision-making applications. The said framework can be modelled with principles of neural networks, interval analysis, fuzzy sets and rough sets, both in isolation and in integration, among other theories.

One of the key advantages in computing with granules is that granular processing mimics the human way of understanding a particular problem, where different levels of information are required. GrC works in a similar fashion, dealing with the structured representation of real-world problems. In fact, this form of characterization is inherent in any task of the universe, and the relationships among the different levels of structures and multiple levels of understanding provide ample scope to understand and interpret them. In many cases, including large data sets, a problem may be difficult to solve at one level, but subsequent derivations of the same problem at other levels may lead to a partial or full solution. This concept of hierarchical problem abstraction is useful for knowledge discovery where the data set in question is large with many attributes. The paradigm of GrC offers ample opportunities for efficient methodologies of knowledge discovery, where it can be combined with fuzzy sets and rough sets (for example), thereby developing a systematic modelling framework with a focus on the overall interpretation of the system. Such transparency in the working process provides improved interaction between expert knowledge and the system design, leading to better system performance.

The components of GrC that drive the complete process are: granules, granulation, granular relationship and computing with granules. A granule is considered a building block, which plays a significant role in the process of GrC. Constructing appropriate granules for a particular task is crucial, because the different sizes and shapes of granules decide the success rate of the GrC models. A granule is a collection of entities drawn together by indistinguishability, similarity, proximity or functionality. The significance of a granule in GrC is similar to that of any object, subset, class, or cluster in a universe. Depending on the size and shape, and with a certain level of granularity, granules may characterise a specific aspect of the problem. Granules with different levels of granularity represent the model differently and regulate the decision-making process accordingly. In the formation of different granules, a subset of objects having similarity among them, in terms of some relation or attribute, is considered, which is a kind of clustering approach. Granules can be formed using fuzzy sets, rough sets, random sets, and interval sets.

The process of formation and representation of granules is called granulation. Granulation is performed in two ways: integrating and dispersing the granular structure. Integration involves developing larger and/or higher level granules from smaller and/or lower level granules, whereas dispersion involves decomposing larger and/or higher level granules into smaller and/or lower level granules. These processes of integration and dispersion are also known as bottom-up and top-down approaches, respectively, in the development of granules. Although these two processes work in opposite directions, broadly they are highly correlated with each other. At a higher level, a granule represents a more abstract concept by ignoring irrelevant details, while at a lower level it is a more specific one that reveals more detailed information. A challenging problem in GrC is to coordinate and/or swap between different levels of granularity. Several attempts have been made to develop machine learning models using these approaches.

To improve the decision-making process using GrC, one may have to represent and interpret the granular relationships, such as inter- and intra-relationships, meticulously. The intra-relationship among the elements of a granule and the inter-relationship among the elements of two different granules have a strong influence on GrC-based models. These relationships provide the essential information on whether to go for integration or dispersion of granules.

It is worthwhile mentioning the importance of methods for reasoning (judgment) about properties of computations over granules. These properties are constructed (induced) using a higher level granulation over computations on granules.

Granular computing: fuzzy and rough sets

Among the different technologies required for performing GrC, the ones based on fuzzy and rough sets are the most successful and considered to be the best choice for decision-making processes. In fact, the concept of granulation is inherent in those theories. Fuzzy set theory starts with the definition of a membership function and granulates the features, thereby producing a fuzzy granulation of the feature space. The fuzziness in granules and their values characterise the ways in which human concepts of granulation are formed, organised and manipulated. In fact, fuzzy information granulation does not refer to a single fuzzy granule; rather it is about a collection of fuzzy granules which result from granulating a crisp or fuzzy object. Depending on the problem and on whether the granules and the process are fuzzy or crisp, one may have operations like granular fuzzy computing or fuzzy granular computing. The number of concepts formed through fuzzy granulation determines whether the corresponding granulation is relatively fine or coarse, and the choice of this number is an application-specific optimization problem. Rough set theory provides an effective model to acquire knowledge in an information system, with upper approximation and lower approximation as its core concepts, and to make decisions according to the definition of the indistinguishability (indiscernibility) relation and attribute reducts. Different variants of the conventional rough sets are available, obtained mainly by redefining the indistinguishability relation and the approximation operators. The rough set approach (RS) can be used to granulate a set of objects into information granules (IGs). The size of the information granules is determined, e.g., by how many attributes, and how many discrete values each attribute takes, in the subset of the whole attribute set that is selected to do the granulation. Generally, the larger the number of attributes and the more values each attribute takes, the finer are the resulting IGs. From the perspective of knowledge transformation, the task of analysing data and solving problems with fuzzy sets or rough sets is actually to find a mapping from the information represented by the original finest-grained data to the knowledge hidden behind a set of optimised, coarser and more abstract information granules. These theories are also integrated synergistically within themselves and with other knowledge acquisition models, which yield, for example, rough neural computing [5], neural fuzzy computing [11] and rough fuzzy computing [4,10]. While the former two enable forming various knowledge-based granular neural networks for improved learning and performance, rough-fuzzy computing provides a paradigm, stronger than fuzzy sets or rough sets alone, that can handle uncertainty arising both from the overlapping characteristics of concepts/classes/regions and from granularity in the domain of discourse.
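To make the rough set notions above concrete, the following minimal Python sketch (purely illustrative and not taken from the editorial or the cited works; the objects, attributes and target concept are hypothetical) granulates a toy information system with respect to a chosen attribute subset, shows how adding attributes yields finer information granules, and brackets a concept by its lower and upper approximations.

    from itertools import groupby

    # Toy information system: objects described by categorical attributes
    # (all names and values here are invented for illustration only).
    objects = {
        "x1": {"colour": "red",  "size": "big"},
        "x2": {"colour": "red",  "size": "big"},
        "x3": {"colour": "red",  "size": "small"},
        "x4": {"colour": "blue", "size": "small"},
        "x5": {"colour": "blue", "size": "small"},
    }

    def granulate(attrs):
        """Partition the universe into indiscernibility classes (information
        granules) with respect to the chosen subset of attributes."""
        key = lambda obj: tuple(objects[obj][a] for a in attrs)
        ordered = sorted(objects, key=key)
        return [set(group) for _, group in groupby(ordered, key=key)]

    def approximations(granules, concept):
        """Lower approximation: union of granules fully contained in the concept.
        Upper approximation: union of granules that overlap the concept at all."""
        lower = set().union(*(g for g in granules if g <= concept))
        upper = set().union(*(g for g in granules if g & concept))
        return lower, upper

    # Fewer attributes give coarser granules; adding attributes refines them.
    print(granulate(["colour"]))          # two coarse granules: {x4, x5} and {x1, x2, x3}
    print(granulate(["colour", "size"]))  # three finer granules: {x4, x5}, {x1, x2}, {x3}

    # A target concept, e.g. the objects carrying a particular decision label.
    concept = {"x1", "x2", "x4"}
    lower, upper = approximations(granulate(["colour", "size"]), concept)
    print(lower)          # {x1, x2}: certainly in the concept
    print(upper)          # {x1, x2, x4, x5}: possibly in the concept
    print(upper - lower)  # {x4, x5}: the boundary (rough) region

The boundary region (upper minus lower approximation) is precisely where the chosen granularity leaves membership undecided, which is the kind of uncertainty due to limited discernibility that rough sets are meant to capture.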

Big data: challenges and relevance of granular mining

Various aspects of our day-to-day activities have been influenced and regularised by the presence of big data. It has not only revolutionised individuals but also affected science, and the planning and policies of governments. Although accomplishment in this domain is at an initial stage, several technical challenges are posed that need to be addressed to fully realise the potential of big data. Generally, realising the usefulness of big data involves multiple levels of operational steps, such as acquisition, information extraction and cleaning, data integration, modelling and analysis, and interpretation and deployment. Fuzzy sets, rough sets, neural networks, interval analysis and their synergistic integrations in the granular computing framework have been found to be successful in most of these tasks. Research challenges around big data arise from various aspects and issues, such as heterogeneity, inconsistency and incompleteness, timeliness, privacy, visualization and collaboration.

One may note that managing uncertainty in decision-making and prediction is very crucial for mining any kind of data, no matter how small or big. While fuzzy logic (FL) is well known for modelling uncertainty arising from vague, ill-defined or overlapping concepts/regions, rough sets (RS) model uncertainty due to granularity (or limited discernibility) in the domain of discourse. Their effectiveness, both individually and in combination, has been established worldwide for mining audio, video, image and text patterns, as present in the big data generated by different sectors. Fuzzy sets and rough sets can be further coupled, if required, with (probabilistic) uncertainty arising from randomness in the occurrence of events, in order to result in a much stronger framework for handling real-life ambiguous applications. In the case of big data, the problem becomes more acute because of the manifold characteristics of some of the Vs, like high variety, dynamism, streaming, time variation, variability and incompleteness. This possibly demands the judicious integration of the three aforesaid theories for efficient handling.

In this issue

This special issue on "granular mining and knowledge discovery" presents some novel approaches and methodologies reflecting the state of the art of granular mining and knowledge discovery. After a rigorous review process of several submissions from all over the world, only six articles spanning a segment of the research in granular mining and its applications were selected for publication in this issue. The concepts behind the models described here would be useful to big data researchers and practitioners.

A brief description of the six contributions is given below, in the order they appear in the issue. In their article "Data granulation by the principles of uncertainty", L. Livi and A. Sadeghian present a data granulation framework that elaborates on the principles of uncertainty. It is based on the assumption that a procedure of information granulation is effective if the uncertainty conveyed by the synthesised information granule is in a monotonically increasing relationship with the uncertainty of the input data. The proposed framework aims to offer a guideline for the synthesis of information granules and to build the groundwork to compare, and quantitatively judge over, different granulation procedures. The authors have also provided suitable case studies to introduce a new data granulation technique based on the minimum sum of distances, which is designed to generate type-2 fuzzy sets. The method, with automatic membership function elicitation, is based entirely on the dissimilarity values of the input data, which makes it widely applicable.

The article "Clustering in augmented space of granular constraints: a study in knowledge-based clustering" by W. Pedrycz, A. Gacek and X. Wang provides a study on fuzzy clustering in an augmented space of granular constraints. The authors describe the constraints as a collection of Cartesian products of fuzzy sets. The role of information granules using order-2 fuzzy sets is underlined with regard to the results of clustering produced in the transformed space. A generalization of the proposed approach, with detailed algorithmic development, is also discussed for clustered data with non-numeric information.

C. Sengoz and S. Ramanna, in their article "Learning relational facts from the web: a tolerance rough set approach", have proposed a granular model that structures categorical noun phrase instances as well as semantically related noun phrase pairs from a given corpus. In addition, they put forward a semi-supervised tolerant pattern learning algorithm that labels categorical instances as well as relations. The authors have tried to address the issue of labelling a large number of unlabelled data with a small number of available labelled data. In this study, the model treats the noun phrases as described by sets of their co-occurring contextual patterns. For this, they have used the ontological information from the never ending language learner (NELL) system.

In the article "A new method for constructing granular neural networks based on rule extraction and extreme learning machine", the authors X. Xu, G. Wang, S. Ding, X. Jiang and Z. Zhao introduce a framework of granular neural networks named rough rule granular extreme learning machine (RRGELM), and develop its comprehensive design process. The proposed model is based on rough decision rules that are extracted from the training samples. In the granular neural network, the linked weights between the input neurons and granular-neurons are determined by the confidences of the rough decision rules, while the linked weights between the output neurons and granular-neurons can be initialised as the contributions of the rough rules to the classification. The network is then trained with the extreme learning machine algorithm and its efficacy is demonstrated on several data sets.

The next article, authored by S. Kundu and S. K. Pal, concerns mining social networks. In their article titled "Fuzzy-rough community in social networks", a novel algorithm for community detection in a social network is described. The method identifies fuzzy-rough communities where a node can be a part of many groups with different associated membership values. The algorithm runs on a new framework of social network representation based on fuzzy granulation theory. An index, called normalised fuzzy …, is defined to quantify the goodness of the detected communities. It may be mentioned here that the data available from online social networking sites are dynamic, large scale, diverse and complex, with all the characteristics of big data in terms of velocity, volume, and variety.

A. Zhu, G. Wang and Y. Dong describe a robust system in "Detecting natural scenes text via auto image partition, two-stage grouping and two-layer classification" to detect natural scene text according to text region appearances. The framework of this system includes three parts: auto image partition, two-stage grouping and two-layer classification. The first part partitions the images into unconstrained sub-images through the statistical distribution of sampling points. The designed two-stage grouping method performs grouping in each sub-image in the first stage and connects different partitioned image regions in the second stage to group connected components (CCs) into text regions. Then a two-layer classification mechanism is designed for classifying candidate text regions.

These articles provide a wide range of methods with various characteristic features and different applications of granular computing-based mining and knowledge discovery. We hope that the publication of this issue will encourage further research activities and motivate the development of novel approaches for uncertainty handling and for addressing the challenging issues encountered in real-life problems, including those in big data.

In conclusion, we would like to thank all the authors who submitted research manuscripts to this issue, as well as the anonymous reviewers for the time spent evaluating manuscripts and providing constructive comments and suggestions. We would like to extend a special note of appreciation to Dr. Gabriella Sanniti di Baja, Editor-in-Chief of Pattern Recognition Letters, and to Yanhong Zhai and Jefeery Alex, the publication supporting team, for their constant support.

References

[1] E.E.S. (Eds.), Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley and Sons, N.Y., 2015.
[2] T.Y. Lin, Granular computing: practices, theories, and future directions, in: R.A. Meyers (Ed.), Encyclopedia of Complexity and Systems Science, Springer, Heidelberg, 2009, pp. 4339–4355.
[3] W. Pedrycz, A. Skowron, V. Kreinovich (Eds.), Handbook of Granular Computing, John Wiley and Sons, 2008.
[4] S.K. Pal, A. Skowron (Eds.), Rough-Fuzzy Hybridization: A New Trend in Decision Making, Springer-Verlag, Singapore, 1999.
[5] S.K. Pal, L. Polkowski, A. Skowron (Eds.), Rough-Neural Computing: Techniques for Computing with Words, Springer-Verlag, Berlin, 2004.
[6] S.K. Pal, S.K. Meher, Natural computing: a problem solving paradigm with granular information processing, Appl. Soft Comput. 13 (2013) 3944–3955.
[7] J.T. Yao, A.V. Vasilakos, W. Pedrycz, Granular computing: perspectives and challenges, IEEE Trans. Cybern. 43 (2013) 1977–1989.
[8] Z. Pawlak, Rough sets, Int. J. Comput. Inf. Sci. 11 (1982) 341–356.
[9] L.A. Zadeh, Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets Syst. 90 (1997) 111–127.
[10] P. Maji, S.K. Pal, Rough-Fuzzy Pattern Recognition: Applications in Bioinformatics and Medical Imaging, Wiley-IEEE CS Press, N.Y., 2012.
[11] S.K. Pal, Granular mining and rough-fuzzy pattern recognition: a way to natural computation (feature article), IEEE Intell. Inf. Bull. 13 (1) (2012) 3–13.

Sankar K. Pal∗
Center for Soft Computing Research, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India

Saroj K. Meher
Systems Science and Informatics Unit, Indian Statistical Institute, Bangalore Centre, Bangalore 560059, India

Andrzej Skowron
Institute of Mathematics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland

∗Corresponding author.
E-mail addresses: [email protected] (S.K. Pal), [email protected] (S.K. Meher)