
Procedia Computer Science 103 (2017) 295–302

XIIth International Symposium «Intelligent Systems», INTELS’16, 5-7 October 2016, Moscow, Russia

Granular models and methods based on the spatial granulation

S. Butenkov^{a,*}, A. Zhukov^{a}, A. Nagorov^{b}, N. Krivsha^{c}

^{a} Scientific Research Center of Super- and Neurocomputer, Taganrog, Russia
^{b} Kabardino-Balkarian State University, Nalchik, Russia
^{c} South Federal University, Rostov-na-Donu, Russia

Abstract

Automatic data processing and the extraction of rules from large datasets have gained considerable interest in recent years. Several approaches have been proposed, based mainly on statistical and fuzzy-set methods. In this paper, we propose a new view of the approaches used to represent large datasets and to find rules in them, one that makes use of the so-called Information Granulation and Computing with Words methods. The basic ideas and principles of Granular Computing have been studied, explicitly or implicitly, in many fields in isolation. With the recently renewed and fast-growing interest, it is time to extract the commonality from this diversity of fields and to study systematically and formally the domain-independent principles of Granular Computing in a unified model. A framework of granular computing can be improved by applying the inherited principles of Space Granulation and Computing with Shapes. In this paper, we examine such a framework from new perspectives of granular computing, based on Space Granulation and Computing with Shapes, for granulated data processing and problem solving.

© 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the XIIth International Symposium “Intelligent Systems”.

Keywords: granular computing; spatial granulation; intelligent system; computing with words.

1. Introduction

Human-like data perception and analysis involve Information Granularity (IG) [1,2,3]. The consideration of granularity is motivated by the need for information simplification, clarity, low cost, approximation and tolerance of uncertainty [2]. As an emerging field of study, Granular Computing (GrC) attempts to formally investigate and model the group of granule-oriented methods and the corresponding information processing paradigm [4]. Ever since the introduction of the terms IG and GrC, we have witnessed rapid development of, and fast-growing interest in, the topic [1,4]. Many models and methods of granular computing have been proposed and investigated. The study of concrete models and methods is useful for the development of a field in its early stage. It is equally important, if not more so, to study a general theory that avoids the constraints of a concrete model. The basic notions and principles of granular computing, though under different names, have in fact appeared in many related fields, such as artificial intelligence, interval computing, quantization, data compression and processing, and many others [5,6,7,8]. However, granular computing has not been fully explored. It is time to extract the commonality from these different fields and to explore formally the domain principles of GrC in a graphic framework. This paper is organized as follows. Section 2 introduces the basic ideas of Granular Computing. Section 3 gives the mathematical background and the basic formalisms for the affine space. Section 4 presents Cartesian granules in affine space and a common model for such granules. In Section 5 some results on a new kind of affine space measure are presented and used for granulated data processing. Finally, we conclude with a discussion of space granulation, some extensions and ideas for future work.

* Corresponding author. E-mail address: [email protected]

1877-0509 © 2017 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the XIIth International Symposium “Intelligent Systems”. doi: 10.1016/j.procs.2017.01.111

2. Basic ideas of Granular Computing

Several key formal frameworks contribute to GrC and form its mathematical content. GrC can be realized in any of these frameworks [8,9].

2.1. Set theory and interval analysis

Sets are fundamental concepts of mathematics and science [10]. Likewise, interval analysis ultimately rests upon the concept of sets, which in this case are collections of elements on the real line. Multidimensional constructs are built upon Cartesian products of numeric intervals and give rise to computing with hyperboxes [4]. Conceptually, sets (intervals) are rooted in a two-valued logic with its fundamental predicate of membership. Interval analysis is a cornerstone of reliable computing, which in turn is closely associated with digital computing, where any variable has a finite accuracy (implied by the fixed number of bits used to represent numbers). Intervals offer a straightforward mechanism of abstraction: all elements lying within a certain interval become indistinguishable and are therefore treated as identical. Here we are concerned with more complex and inherently multifaceted concepts and notions, where fuzzy sets [1] could be incorporated into the formal description and quantification of such problems, yet not in so immediate a manner [2]. All of these notions incorporate components that could be quantified with fuzzy sets, yet this translation is not as straightforward and immediate as in problems where fuzzy sets are used explicitly.

2.2. Shadowed sets

Fuzzy sets are associated with collections of numeric membership grades. Shadowed sets [4] are based upon fuzzy sets and form a more general and highly synthetic view of the numeric concept of membership. Using shadowed sets, we quantize numeric membership values into three categories: complete belongingness, complete exclusion and unknown (which may conveniently be referred to as the "don't know" condition, or a shadow). This helps us contrast these three fundamental constructs of information granules [1]. In a nutshell, a shadowed set can be regarded as a general and far more concise representation of a fuzzy set, which is of particular interest for further computing (where it can substantially reduce the overall processing effort). Shadowed sets are isomorphic with a three-valued logic, and operations on shadowed sets are the same as in this logic. The underlying principle is to retain the vagueness of the arguments (the shadows of the shadowed sets being used in the aggregation) [5].
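The three-category quantization described above can be sketched in a few lines. The sketch is illustrative only: the threshold `alpha`, the function names and the choice of a Kleene-style conjunction for the three-valued logic are our assumptions, not values or operators fixed by the text.

```python
def shadowed_value(membership, alpha=0.3):
    """Map a membership grade in [0, 1] to one of three categories.

    alpha is a hypothetical threshold (0 < alpha < 0.5), chosen for illustration.
    """
    if not 0.0 <= membership <= 1.0:
        raise ValueError("membership grade must lie in [0, 1]")
    if membership >= 1.0 - alpha:
        return 1        # complete belongingness
    if membership <= alpha:
        return 0        # complete exclusion
    return None         # shadow: the "don't know" condition


def shadowed_and(a, b):
    """Three-valued conjunction (assumed Kleene-style); None means 'unknown'."""
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None     # the vagueness of the arguments is retained
    return 1


grades = [0.05, 0.25, 0.5, 0.8, 1.0]
categories = [shadowed_value(g) for g in grades]
print(categories)
```

Only three symbols survive the quantization, which is where the reduction in processing effort comes from.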

2.3. Fuzzy sets as granular representatives of numeric data

In general, a fuzzy set reflects numeric data that are put together in some context [2]. Using its membership function, we attempt to embrace these data in a concise manner. The development of a fuzzy set A is supported by the following experiment-driven and intuitively appealing rationale: a) we expect that A reflects (or matches) the available experimental data to the highest extent, and b) the fuzzy set is kept specific enough that it comes with well-defined semantics. These two requirements point to the multiobjective nature of the construct: we want to maximize the coverage of the experimental data (as articulated by (a)) and minimize the spread of the fuzzy set (as captured by (b)). Together they give rise to a certain optimization problem [5]. Furthermore, and quite legitimately, we assume that the non-zero membership grades of the fuzzy set to be constructed occupy a contiguous region of the universe of discourse in which this fuzzy set is defined [1].
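The trade-off between requirements (a) and (b) can be illustrated with a small sketch. It rests entirely on our own simplifying assumptions: a crisp interval stands in for the contiguous region occupied by the fuzzy set, coverage is the fraction of data points inside the interval, specificity is one minus the relative interval width, and the two objectives are combined by a product. The function name `best_interval` and the sample data are hypothetical.

```python
def best_interval(data):
    """Return (a, b, score) for the interval with endpoints among the data
    that maximises coverage * specificity (an assumed scalarisation)."""
    xs = sorted(data)
    span = xs[-1] - xs[0]
    if span == 0:
        return xs[0], xs[0], 1.0
    best = None
    for i, a in enumerate(xs):
        for b in xs[i:]:
            coverage = sum(a <= x <= b for x in xs) / len(xs)   # requirement (a)
            specificity = 1.0 - (b - a) / span                   # requirement (b)
            score = coverage * specificity
            if best is None or score > best[2]:
                best = (a, b, score)
    return best


a, b, score = best_interval([1.0, 4.8, 5.0, 5.1, 5.3, 9.0])
print(a, b, round(score, 3))
```

On this sample the two outliers are left out, because covering them would cost more specificity than it gains in coverage.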

2.4. Rule-based systems as granular models

Granular data models, as the name stipulates, are modeling constructs built at the level of information granules. Mappings between the granules express the relationships captured by such models. The granularity of information that is explicitly built into the construct gives the model interesting and useful features, including evident transparency and flexibility. Fuzzy rule-based systems (models) are typical and commonly encountered examples of granular models. These systems are highly modular and easily expandable fuzzy models composed of a family of conditional statements (rules) in which fuzzy sets occur in the conditions and conclusions [2,8]. In general, we may talk about rules embracing information granules expressed in any other formalism. Such models support the principle of locality and the distributed nature of modeling, as each rule can be interpreted as an individual local descriptor of the data (problem) that is invoked by the fuzzy sets defined in the space of conditions (inputs) [7]. To provide common formalisms, both algebraic and geometric, we give a common approach to Space Granulation in the abstract vector space [3,9].

3. Basic Formalisms of affine Space

In Information Granulation Theory especially, spatially or temporally adjacent data values are thought to form abstract patterns, which can be regarded as ordered sets of real numbers.

3.1. Abstract vector space

Let $K$ be an arbitrary number field and $n \in \mathbb{N}$ a number. Then an $n$-dimensional vector over the number field $K$ is a tuple of $n$ elements of $K$. The tuple elements are called the vector coordinates. The $n$-dimensional number space over the number field $K$ is the set of all $n$-dimensional number vectors over that field [10]. To use the common data types, we recall the standard definitions for the abstract vector space [10].

Definition 1. An arbitrary set of elements $L$ is called a vector space over the number field $K$ if:
a) there is an algebraic operation which, for each pair of elements $a, b \in L$, gives a result called the sum, $c = a + b$, $c \in L$;
b) there is another algebraic operation which, for an element $a \in L$ and a number $k \in K$, gives the result $c = ka$, $c \in L$;
c) both operations satisfy the following axioms:

I. For arbitrary elements $a, b, c \in L$ the following statements hold:
a) $a + b = b + a$ (commutativity),
b) $(a + b) + c = a + (b + c)$ (associativity).

II. $L$ contains an element $0$ (the zero element) such that $a + 0 = a$ for all elements $a \in L$.

III. For every element $a \in L$ there is an element $-a$, called the inverse of $a$, such that $a + (-a) = 0$.

IV. For arbitrary elements $a, b \in L$ and arbitrary numbers $k_1$ and $k_2$ from the number field $K$ the following statements hold:
a) $k_1 (k_2 a) = (k_1 k_2) a$,
b) $(k_1 + k_2) a = k_1 a + k_2 a$,
c) $k_1 (a + b) = k_1 a + k_1 b$.

V. For each element $a \in L$ the statement $1 \cdot a = a$ holds.

Every element of a set that satisfies axioms I-V is an abstract vector [10].
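As a quick sanity check of Definition 1, the axioms can be spot-checked on tuples of rational numbers, i.e. on the $n$-dimensional number space over the field of rationals. The sketch below is only a finite test on sample vectors and scalars, not a proof, and all names in it (`add`, `scale`, `check_axioms`) are ours.

```python
from fractions import Fraction
from itertools import product


def add(a, b):
    """Coordinate-wise sum of two tuples."""
    return tuple(x + y for x, y in zip(a, b))


def scale(k, a):
    """Product of the number k and the tuple a."""
    return tuple(k * x for x in a)


def check_axioms(vectors, scalars):
    """Spot-check axioms I-V on the given sample vectors and scalars."""
    zero = tuple(Fraction(0) for _ in vectors[0])
    for a, b, c in product(vectors, repeat=3):
        assert add(a, b) == add(b, a)                                   # I.a
        assert add(add(a, b), c) == add(a, add(b, c))                   # I.b
    for a in vectors:
        assert add(a, zero) == a                                        # II
        assert add(a, scale(Fraction(-1), a)) == zero                   # III
        assert scale(Fraction(1), a) == a                               # V
    for k1, k2 in product(scalars, repeat=2):
        for a, b in product(vectors, repeat=2):
            assert scale(k1, scale(k2, a)) == scale(k1 * k2, a)         # IV.a
            assert scale(k1 + k2, a) == add(scale(k1, a), scale(k2, a)) # IV.b
            assert scale(k1, add(a, b)) == add(scale(k1, a), scale(k1, b))  # IV.c
    return True


vectors = [
    (Fraction(1), Fraction(2)),
    (Fraction(-3, 2), Fraction(0)),
    (Fraction(5), Fraction(7, 3)),
]
scalars = [Fraction(2), Fraction(-1, 3)]
ok = check_axioms(vectors, scalars)
print(ok)
```

Exact rational arithmetic is used so that the equalities are not disturbed by floating-point rounding.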

Consider an $m$-dimensional vector space whose $m$ axes have the basis vectors (orts) $e_1, e_2, \ldots, e_m$. Each abstract vector $V$ can be represented by the decomposition $V = \sum_{i=1}^{m} v_i e_i$, where the $v_i$ are the vector coordinates. If the basis vectors $e_1, e_2, \ldots, e_m$ cannot be matched against one another in the current problem, the abstract space is an affine abstract space; otherwise it is a Euclidean (metric) abstract space [10]. For many important families of physical and scientific data the problem of the space metric is crucial. Most spatial data analysis and processing methods are based on a space metric [5,8,9]. Because a number of very different approaches to the measure problem exist, many popular metrics are introduced loosely and from incorrect basic assumptions [4]. A very common approach to data modeling is affine modeling [8,9].

3.2. Affine space and its formalisms

Affine space is very useful for studying general properties of figures when arbitrary coordinate transformations are required [10]. In the affine space we provide very general geometric methods for data representation [9]. The main notion in our approach is the determinant in affine space.

Definition 2. The $n$-dimensional determinant in affine space is a function of $n$ $n$-dimensional vectors, $F({}^{1}a, {}^{2}a, \ldots, {}^{n}a) = |{}^{1}a, {}^{2}a, \ldots, {}^{n}a|$, that satisfies the following axioms:
a) $F({}^{1}a, {}^{2}a, \ldots, {}^{n}a)$ is a linear function of each argument.
b) If there is a pair of linearly dependent vectors among ${}^{1}a, {}^{2}a, \ldots, {}^{n}a$, then $F({}^{1}a, {}^{2}a, \ldots, {}^{n}a) = 0$.
c) $F({}^{1}e, {}^{2}e, \ldots, {}^{n}e) = 1$.

Note that we put the vector numbers on the left side of the symbol so that they are not confused with the vector index. From the geometric point of view, the value of the $n$-dimensional determinant defines a certain quantity associated with the figure built on the $n$ vectors of the determinant [9] (oriented area, oriented volume, etc.). In our papers a special case of geometric models is provided for affine space vectors. It is necessary for the geometric interpretation of the original data vectors [3,9].
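The three axioms of Definition 2 can be checked numerically on a small example. The sketch below uses a plain Laplace expansion (adequate only for small $n$) and integer vectors so that all equalities are exact; the helper names `det` and `F` and the sample vectors are our own illustrative choices.

```python
def det(rows):
    """Determinant by Laplace expansion along the first row (small n only)."""
    n = len(rows)
    if n == 1:
        return rows[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in rows[1:]]
        total += (-1) ** j * rows[0][j] * det(minor)
    return total


def F(*vectors):
    """Determinant as a function of n n-dimensional vectors (taken as rows)."""
    return det([list(v) for v in vectors])


a1, a2, a3, b = (2, 1, 0), (0, 3, 1), (1, 1, 1), (1, 0, 4)

# (a) linearity in the first argument: F(2*a1 + b, a2, a3) = 2*F(a1, a2, a3) + F(b, a2, a3)
combo = tuple(2 * x + y for x, y in zip(a1, b))
linear_ok = F(combo, a2, a3) == 2 * F(a1, a2, a3) + F(b, a2, a3)

# (b) a pair of linearly dependent arguments gives zero
dependent = F(a1, tuple(3 * x for x in a1), a3)

# (c) normalisation on the basis vectors
unit = F((1, 0, 0), (0, 1, 0), (0, 0, 1))

print(linear_ok, dependent, unit)
```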

3.3. Data Space Partition model

According to the main ideas of the Information Granulation Theory of L.A. Zadeh, an information granule is a subset of the universe associated with a similarity (indistinguishability, etc.) relation. The complete set of granules containing all elements of the universe $G$ is called the universe granulation [1]. We must supplement the basic concepts of L.A. Zadeh with definitions for Space Granulation (data space granulation) [9].

Definition 3. A partition of the universe $G$ is a finite family of subsets $G^i \subseteq G$, $i = 1, \ldots, n$ (atomic granules), which satisfies the following set of axioms:

1. $G^i \neq \emptyset$, $i = 1, \ldots, n$;

2. $G^i \cap G^j = \emptyset$ if $i \neq j$; $i = 1, \ldots, n$; $j = 1, \ldots, n$;

3. $\bigcup_{i} G^i = G$, $i = 1, \ldots, n$.

Each subset of the partition is called an equivalence granule [2]. A subset $G^i \subseteq G$ is called a compound granule (a non-atomic granule) if it consists of several atomic granules associated by a common indistinguishability relation [9]. The conversion of the original data vector space into a granulated space is implemented by the following set of mathematical tools.

Definition 4. A covering $W$ of a finite universe $G$ is a finite set of subsets $G^i$ satisfying axioms 1 and 3.

Definition 5. A partition $S$ (or covering $W$) is called a conjunctional partition (covering) if each equivalence class from $S$ ($W$) is a compound granule.

Definition 6. A partition $S_1$ is a refinement of a partition $S_2$ (or $S_2$ is a generalization of $S_1$), denoted $S_1 \preceq S_2$, if each granule of $S_1$ is contained in some granule of $S_2$. A covering $W_1$ is a refinement of a covering $W_2$ (or $W_2$ is a generalization of $W_1$), denoted $W_1 \preceq W_2$, if each granule of $W_1$ is contained in some granule of $W_2$. Thus, the basis of GrC is the design of isomorphic coverings and partitions by means of refinement procedures [11].
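For a finite universe, Definitions 3, 4 and 6 translate directly into set operations. The following sketch is a minimal illustration; the function names and the sample universe are hypothetical, and granules are represented as Python sets.

```python
def is_covering(universe, granules):
    """Axioms 1 and 3: every granule is non-empty and the union is the universe."""
    return all(granules) and set().union(*granules) == set(universe)


def is_partition(universe, granules):
    """Axioms 1-3: a covering whose granules are pairwise disjoint."""
    disjoint = all(
        not (g & h)
        for i, g in enumerate(granules)
        for h in granules[i + 1:]
    )
    return is_covering(universe, granules) and disjoint


def is_refinement(fine, coarse):
    """Definition 6: each granule of `fine` lies inside some granule of `coarse`."""
    return all(any(g <= h for h in coarse) for g in fine)


G = {1, 2, 3, 4, 5, 6}
S1 = [{1, 2}, {3}, {4, 5}, {6}]          # a fine partition
S2 = [{1, 2, 3}, {4, 5, 6}]              # a coarser partition
W = [{1, 2, 3}, {3, 4}, {4, 5, 6}]       # a covering with overlapping granules

print(is_partition(G, S1), is_partition(G, S2), is_refinement(S1, S2))
print(is_covering(G, W), is_partition(G, W))
```

Here `S1` refines `S2` but not the other way round, and `W` is a covering that fails axiom 2.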

4. Cartesian granules in affine space

The general definition of the Cartesian information granule by L.A. Zadeh [1] may be refined for Cartesian granules in affine spaces.

Definition 7. Let $G^1, \ldots, G^n$ be arbitrary information granules of dimension $m = 1$ for the numerical variables $U_1, \ldots, U_n$; then the Cartesian product $G_n = G^1 \times \cdots \times G^n$ is the Cartesian granule of dimension $n$.

A cloud of points $A$ in affine space, with projections $\mathrm{proj}_{x_1} A$ and $\mathrm{proj}_{x_2} A$ on the different axes, must be covered by the Cartesian product of the projections, $G_2^+$, of dimension $n = 2$. A similar technique is called information encapsulation by L.A. Zadeh in [1]. The following definition is also provided in [2].

Definition 8. The Cartesian granule $G^+$, defined as $G^+ = G_x \times G_y$, is the encapsulation of the original information granule $G$ in the sense of the supremum of the family of Cartesian granules that contain the granule $G$.

From the geometric point of view, the Cartesian granule is the Cartesian product on the basic axes of the affine space. An example of space encapsulation for $n = 2$ is presented in Figure 1.

Fig. 1. Example of a space encapsulation on the plane by the $G_2^+$ Cartesian granule

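The Cartesian granule model (algebraic model) was introduced in our papers as a special case of the determinant [11]. In Cartesian coordinates the model parameters are defined by the $n$ affine space vectors ${}^{i}x$, $i = 1, \ldots, n$:

$$G_n = F({}^{1}x, {}^{2}x, \ldots, {}^{n}x, 1) = \begin{vmatrix}
{}^{1}x_{1} & {}^{2}x_{1} & \cdots & {}^{n}x_{1} & 1 \\
{}^{1}x_{2} & {}^{2}x_{2} & \cdots & {}^{n}x_{2} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
{}^{1}x_{n} & {}^{2}x_{n} & \cdots & {}^{n}x_{n} & 1 \\
{}^{1}x_{n+1} & {}^{2}x_{n+1} & \cdots & {}^{n}x_{n+1} & 1
\end{vmatrix}. \tag{1}$$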

The basic vectors of model (1) are the corners of the Cartesian granule shown in Figure 1 [12]. The model calculation method (granule encapsulation) is based on the determinant properties [10] and uses the same model (1). For the space with $n = 2$ we can provide a general formula based on (1). We use the $m$ points $({}^{i}x_1, {}^{i}x_2)$, $i = 1, 2, \ldots, m$, of the original cluster $A$ (see Figure 1) to calculate the parameters of the encapsulating granule $G_2^+$ by means of algebraic operations:

$$G_2^+(A) = \begin{vmatrix}
\min {}^{i}x_1 & \min {}^{i}x_2 & 1 \\
\min {}^{i}x_1 & \max {}^{i}x_2 & 1 \\
\max {}^{i}x_1 & \max {}^{i}x_2 & 1
\end{vmatrix}, \quad i = 1, 2, \ldots, m. \tag{2}$$
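Formula (2) can be evaluated directly. The sketch below builds the three corner rows from the coordinate-wise minima and maxima of a hypothetical cluster `A` and evaluates the $3 \times 3$ determinant; the helper names `det3` and `encapsulate` are ours. With this ordering of the corners the determinant is an oriented area and comes out non-positive, so its absolute value is taken as the area of the encapsulating rectangle.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def encapsulate(points):
    """Corner matrix of formula (2) for a cluster of 2-D points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [
        [min(xs), min(ys), 1],
        [min(xs), max(ys), 1],
        [max(xs), max(ys), 1],
    ]


A = [(2, 1), (5, 3), (3, 7), (4, 4)]      # hypothetical data cluster
corners = encapsulate(A)
oriented_area = det3(corners)
area = abs(oriented_area)
print(corners, oriented_area, area)
```

For this cluster the rectangle spans 2..5 along the first axis and 1..7 along the second, so the area equals 3 * 6 = 18.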

Similar formulas may be provided for an arbitrary space dimension $n$ [9]. Next, a refinement of the encapsulating granule (2) may be calculated according to different criteria of granule quality [3,7]. For the refinement of the granulated space we introduce a general formula based on (1) and (2):

$$G_2^+(G_2^i, G_2^j) = \begin{vmatrix}
\min({}^{i}x_1, {}^{j}x_1) & \min({}^{i}x_1, {}^{j}x_1) & 1 \\
\min({}^{i}x_2, {}^{j}x_2) & \max({}^{i}x_2, {}^{j}x_2) & 1 \\
\max({}^{i}x_3, {}^{j}x_3) & \max({}^{i}x_3, {}^{j}x_3) & 1
\end{vmatrix}. \tag{3}$$

The determinant in affine space presented here is related to the measure of the figure that covers the original data cluster (area, volume, etc.). On the basis of the granule measure we provide a family of derivative measures on the granule models (1), which gives a collection of formulas for granule manipulation (Granular Computing, or Computing with Figures, as opposed to Computing with Words by Zadeh [2]).
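As an illustration of granule manipulation based on a measure, the sketch below encapsulates a pair of Cartesian granules by a single granule and compares their measures. It assumes, for simplicity, that each granule is stored as a pair of corner tuples (lower corner, upper corner) rather than as the determinant model itself; this representation and the names `join` and `measure` are our own choices.

```python
def join(g, h):
    """Smallest axis-aligned box containing the boxes g and h.

    Each box is a pair (lo, hi) of corner tuples of equal dimension.
    """
    lo = tuple(min(a, b) for a, b in zip(g[0], h[0]))
    hi = tuple(max(a, b) for a, b in zip(g[1], h[1]))
    return lo, hi


def measure(g):
    """Area (n = 2), volume (n = 3), etc.: the product of the side lengths."""
    result = 1
    for a, b in zip(*g):
        result *= b - a
    return result


g_i = ((0, 0), (2, 1))     # hypothetical granule with area 2
g_j = ((3, 2), (5, 4))     # hypothetical granule with area 4
g_plus = join(g_i, g_j)
print(g_plus, measure(g_plus), measure(g_i), measure(g_j))
```

The same two functions work unchanged for boxes of any dimension, since they only iterate over coordinates.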

5. Set Measures in affine space and Space Granulation

The measure concept is the basis of a number of mathematical methods and of many different techniques [6].

Definition 9. A non-negative set function $m: P(X) \to \mathbb{R}$ is called a set measure if it satisfies the following family of axioms:

$A \subseteq X \Rightarrow m(A) \geq 0$;

$m(\emptyset) = 0$;

if $A, B \in P(X)$, then $m(A \cup B) = m(A) + m(B) - m(A \cap B)$.

Here $P(X)$ is the set of all subsets of $X$ and $\mathbb{R}$ is the set of real numbers. Consider the family of measures related to specific subsets of the granulated space $\mathbb{G}$ [7]. According to [7,13], for a single granule $G \in \mathbb{G}$ we obtain the commonality measure $\mathrm{Glob}(G) = m(G) / m(\mathbb{G})$, which defines the relative size of the granule $G$. For a pair of granules $G^i, G^j \in \mathbb{G}$ we obtain the associativity measure $\mathrm{AS}(G^i, G^j) = m(G^i \cap G^j) / m(\mathbb{G})$. For the same pair $G^i, G^j \in \mathbb{G}$ the covering measure is obtained as $\mathrm{CV}(G^i, G^j) = m(G^i \cap G^j) / m(G^j)$. For the general set measures above we provide a geometric method of measure realization [11]. On the basis of model (1) we can introduce a wide spectrum of measure formulas [12]. For example, three basic measures for the two-dimensional granule $G_2$ may be provided as:

$$K^{1}(G_2) = F({}^{1}x, 1) = \begin{vmatrix} {}^{1}x_1 & 1 \\ {}^{1}x_2 & 1 \end{vmatrix}, \tag{4}$$

$$K^{2}(G_2) = F({}^{2}x, 1) = \begin{vmatrix} {}^{2}x_1 & 1 \\ {}^{2}x_2 & 1 \end{vmatrix}, \tag{5}$$

$$K^{3}(G_2) = F({}^{1}x, {}^{2}x, 1) = \begin{vmatrix} {}^{1}x_1 & {}^{2}x_1 & 1 \\ {}^{1}x_2 & {}^{2}x_2 & 1 \\ {}^{1}x_3 & {}^{2}x_3 & 1 \end{vmatrix}. \tag{6}$$

The geometric meaning of formulas (4), (5) and (6) is obvious [14]. Similar measures may be provided on the $n$-dimensional granule $G_n$ as $K^{1}(G_n), K^{2}(G_n), \ldots, K^{n+1}(G_n)$, etc. A very important granule measure is the similarity measure $\mathrm{SIM}(G_n^i, G_n^j)$ for a pair of $n$-dimensional granules $G_n^i, G_n^j \in \mathbb{G}$ that are encapsulated by the $n$-dimensional granule $G_n^+(G_n^i, G_n^j) \in \mathbb{G}$. It is obtained according to (6) from the three measures

$K^{n}(G_n^+)$, $K^{n}(G_n^i)$, $K^{n}(G_n^j)$. (7)

In [12] it is proved that the function of a pair of granules $\mathrm{SIM}(G_n^i, G_n^j)$ satisfies all the measure axioms and is a similarity measure in $n$-dimensional affine space. Note that data granulation by measure (7) does not require a distance measure in the data space [10], in contrast to the well-known data clustering methods. Most such methods are defined only for Euclidean space [4].
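The set measures Glob, AS and CV introduced in this section can be tried out on finite sets. In the sketch below the measure $m$ is taken to be the counting measure (the number of elements), which satisfies the axioms of Definition 9; this choice, the reading of the three measures as ratios, the function names and the sample sets are assumptions made for illustration only.

```python
from fractions import Fraction


def m(s):
    """Counting measure: the number of elements of a finite set."""
    return len(s)


def glob(g, space):
    """Commonality measure: relative size of the granule g."""
    return Fraction(m(g), m(space))


def assoc(g, h, space):
    """Associativity measure of the pair (g, h)."""
    return Fraction(m(g & h), m(space))


def cover(g, h):
    """Covering measure of the pair (g, h), relative to h."""
    return Fraction(m(g & h), m(h))


U = set(range(10))             # hypothetical granulated space
Gi = {0, 1, 2, 3}
Gj = {2, 3, 4, 5, 6, 7}

# third axiom of Definition 9 for the counting measure
additive = m(Gi | Gj) == m(Gi) + m(Gj) - m(Gi & Gj)

print(glob(Gi, U), assoc(Gi, Gj, U), cover(Gi, Gj), additive)
```

Replacing `m` by a geometric measure of boxes (area or volume) would give the geometric realization of the same three ratios.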

6. Conclusion

The main results obtained in this paper relate to the methods of Information Granulation by L.A. Zadeh and their extension to a geometric interpretation of the general data space [1]. The formulas introduced for the Cartesian granule models of data are built on the general affine vector space [10], which is more widely applicable than Euclidean space methods for broad classes of data [13]. The general granule model introduced here allows the results of determinant theory to be used for the manipulation of data space granules: granule separation, comparison and encapsulation [2]. These procedures of geometric granule manipulation are called Computing with Figures, as opposed to Computing with Words by L.A. Zadeh [2]. Like the main Theory of Information Granulation, these results may be used in Data Mining, Pattern Recognition and other Intelligent Data Analysis problems [4] in the case of multidimensional data [13]. The basic formalisms obtained are the basis of very efficient (greedy) data processing algorithms for multidimensional data. All derived formulas are applicable to the affine data space, which imposes fewer mathematical restrictions than Euclidean data space methods.

References

1. Zadeh LA. Fuzzy sets and information granularity. In: Gupta N, Ragade R, Yager R, editors. Advances in Fuzzy Set Theory and Applications. Amsterdam: North-Holland; 1979, p. 3-18.
2. Zadeh LA. Toward a Theory of Fuzzy Information Granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 1997; 90, p. 111-127.
3. Butenkov S. Granular Computing in Image Processing and Understanding. Proceedings of IASTED International Conf. on AI and Applications “AIA-2004” 2004, p. 78-83.
4. Pedrycz W. Granular Computing – the emerging paradigm. Journal of Uncertain Systems 2007; 1(1), p. 38-61.
5. Bargiela A, Pedrycz W. Granular Computing: an Introduction. Boston: Kluwer Academic Publishers, 2002.
6. Lin TY. Granular computing, LNCS 2639, Berlin: Springer, 2003; p. 16-24.
7. Yao YY. Granular computing: basic issues and possible solutions. Proceedings of the 5th Joint Conference on Information Sciences, 2000; p. 186-189.
8. Yao YY. Modeling data mining with granular computing. Proceedings of COMPSAC 2001; p. 638-643.
9. Butenkov S, Zhukov A. Information granulation on the basis of algebraic systems isomorphism. Proceedings of International Algebraic Conference in memoriam of A.I. Kostrikin, Nalchik, June 12-18, 2009; p. 106-113.
10. Maltsev AI. Algebraic Systems, Moscow: Nauka; 1970.
11. Butenkov SA. Mathematic Models for the Intelligent Multidimensional Data analysis. Proceedings of International Conf. in Mathematic System Theory MTS-2009, Moscow, January 26-30, 2009; p. 93-101.
12. Butenkov SA. The Intellectual Data Analysis Development for the Information Granulation. Proceedings of IV International Conference “Integrated Models and Soft Computing in Artificial Intelligence”. Colomna, May 28-30, 2007; 1, p. 188-194.
13. Rogozov YI, Beslaneev ZO, Nagorov AL, Butenkov SA. Data Models based on the Information Granulation theory. Proceedings of V International Conference “System Analysis and Information Technologies” SAIT-2013, Krasnoyarsk, September 19-25, 2013; p. 24-33.
14. Nagorov AL. Intellectual Modelling for the Transport Problems in Physic. Proceedings of Intelligent Techniques Congress IS-IT’12, Moscow: FIZMATLIT, 2012; 3, p. 234-239.