<<

Available online at www.sciencedirect.com ScienceDirect

Cognitive Systems Research 59 (2020) 231–246 www.elsevier.com/locate/cogsys

Consciousness as a logically consistent and prognostic model of reality q

Evgenii Vityaev ⇑

Sobolev Institute of Mathematics, Koptuga 4, Novosibirsk, Russia Novosibirsk State University, Pirogova 2, Novosibirsk, Russia

Received 31 May 2019; received in revised form 16 September 2019; accepted 18 September 2019 Available online 20 September 2019

Abstract

The work demonstrates that brain might reflect the external world causal relationships in the form of a logically consistent and prog- nostic model of reality, which shows up as consciousness. The paper analyses and solves the problem of statistical ambiguity and pro- vides a formal model of causal relationships as probabilistic maximally specific rules. We suppose that brain makes all possible inferences from causal relationships. We prove that the suggested formal model has a property of an unambiguous inference: from consistent pre- mises we infer a consistent conclusion. It enables a set of all inferences to form a consistent model of the perceived world. Causal rela- tionships may create fixed points of cyclic inter-predictable properties. We consider the ‘‘natural” classification introduced by John St. Mill and demonstrate that a variety of fixed points of the objects’ attributes forms a ‘‘natural” classification of the external world. Then we consider notions of ‘‘natural” categories and causal models of categories, introduced by Eleanor Rosch and Bob Rehder and demon- strate that fixed points of causal relationships between objects attributes, which we perceive, formalize these notions. If the ‘‘natural” classification describes the objects of the external world, and ‘‘natural” the perception of these objects, then the theory of inte- grated information, introduced by G. Tononi, describes the information processes of the brain for ‘‘natural” concepts formation that reflects the ‘‘natural” classification. We argue that integrated information provides high accuracy of the objects identification. A computer-based experiment is provided that illustrates fixed points formation for coded digits. Ó 2019 Elsevier B.V. All rights reserved.

Keywords: Clustering; ; Natural classification; Natural concepts; Integrated information; Concepts

1. Introduction The work analyses and solves such problem of causal reflection of the external world as a statistical ambiguity The work demonstrates that the human brain may (Section 2.3). The problem is solved in such a way that reflect the external world causality in the form of a logically it is possible to obtain a formal model of causal relation- consistent and prognostic model of reality that shows up as ships, which provides a consistent and prognostic model consciousness. of the external world. To discover these causal relation- ships by the brain, a formal model of neuron that is in line with Hebb rule (Hebb, 1949), is suggested. We sup- q № The work is supported by the Russian Science Foundation grant 17- pose that brain makes all possible inferences/predictions 11-01176 in part concerning the mathematical results in 2.4–2.6 and by Russian Foundation for Basic Research # 19-01-00331-a in other parts. from those causal relationships. We prove (see Section 2.5) ⇑ Address: Sobolev Institute of Mathematics, Koptuga 4, Novosibirsk, that the suggested formal model of causal relationships Russia. has a property of an unambiguous inference/predictions, E-mail address: [email protected]. https://doi.org/10.1016/j.cogsys.2019.09.021 1389-0417/Ó 2019 Elsevier B.V. All rights reserved. 232 E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 namely, consistent implications are drawn out from con- 2. The process of brain reflection of objects by causal rela- sistent premises. It enables a set of all inferences/predic- tions marked by blue lines; tions, which brain makes from causal relationships, to 3. Formation of the systems of interconnected causal rela- form a consistent and predictive model of the perceived tionships, indicated by green ovals. world. What is particularly important is that causal rela- tionships may create fixed points of cyclic inter- In G. Tononi’s theory only the third point of reflection predictable properties that create a certain ‘‘resonance” is considered. The totality of the excited groups of neurons of inter-predictions. In terms of interconnections between form a maximally integrated conceptual structure that neurons, these are cellular assemblies of neurons that defined by G. Tononi as qualia. Integrated information is mutually excite each other and form the systems of highly also considered as a system of cyclic causality. Using inte- integrated information. In the formal model these are log- grated information, the brain is adjusted to perceiving ically consistent fixed points of causal relationships. We ‘‘natural” objects of the external world. argue (Section 2.1) that if attributes of the external world In terms of integrated information, phenomenological objects are taken regardless of how persons perceive them, properties are formulated as follows. In brackets an inter- a complex of fixed points of the objects’ attributes forms a pretation of these properties from the point of view of ‘‘natural” classification of the external world. If the fixed ‘‘natural” classification is given. points of causal relationships of the external world objects, which persons perceive, are taken, they form 1. Composition – elementary mechanisms (causal relation- ‘‘natural” concepts described in cognitive sciences ships) can be combined into the higher-order ones (‘‘nat- (Section 2.2). 
ural” classes in the form of causal loops produce a If the ‘‘natural” classification describes objects of the hierarchy of ‘‘natural” classes); external world, and ‘‘natural” concepts are the perception 2. Information – only mechanisms that specify ‘‘differences of these objects, then the theory of integrated information that make a difference” within a system shall be taken (Tononi, 2004; Tononi, Boly, Massimini, & Koch, 2016; into account (only a system of ‘‘resonating” causal rela- Ozumi, Albantakis & Tononi, 2014) describes the informa- tionships, forming a class and ‘‘differences that make a tion processes of the brain when these objects are difference” is important. See illustration in the computer perceived. experiment below); G. Tononi defines consciousness as a primary , 3. Integration – only information irreducible to non- which has the following phenomenological characteristics: interdependent components shall be taken into account composition, information, integration, exclusion (Ozumi (only system of ‘‘resonating” causal relations, indicating et al., 2014). For a more accurate determination of these an excess of information and perception of highly corre- properties G. Tononi introduces the concept of integrated lated structures of ‘‘natural” object is accounted for); information: ‘‘integrated information characterizing the 4. Exclusion – only maximum of integrated information reduction of uncertainty is the information, generated by counts (only values of attributes that are ‘‘resonating” the system that comes in a certain state after the causal at the fix-point and, thus, mostly interrelated by causal interaction between its parts, which is superior information relationships, form a ‘‘natural” class or ‘‘natural” generated independently by its parts themselves” (Tononi, concept). 2004). The process of reflection of causal relationships of the These properties are defined as the intrinsic properties of external world (Fig. 1) shall be further considered. It the system. We consider these properties as the ability of includes: the system to reflect the complexes of external objects’ cau- sal relations, and consciousness as the ability of a complex 1. The objects of the external world (cars, boats, berths) hierarchical reflection of a ‘‘natural” classification of the which relate to certain ‘‘natural” classes; external world.

Fig. 1. Brain reflection of causal relationships between objects attributes. E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 233

Theoretical results on consistency of inference and con- conditional probability and use maximum available infor- sistency of fixed points of our formal model are supposing mation). Work (Vityaev, 2013) shows that the semantic that a probability measure of events is known. However, if probabilistic inference might be considered as a formal we discover causal relationships on the training set, and model of neuron that satisfy the Hebb rule, in which the intend to predict properties of a new object out of the train- semantic probabilistic reasoning discover all most precise ing set and belonging to a wider general population, or to conditional relationships. A set of such neurons might cre- recognize a new object as a member of some ‘‘natural” con- ate a consistent and prognostic model of the external cept, there might be inconsistencies. Here, a certain crite- world. Causal relationships may form fixed points of cyclic rion of maximum consistency is employed (see inter-predictable properties that produce a certain ‘‘reso- Section 2.8), which is based upon information measure, nance” of mutual predictions. Cycles of inferences about close in to an entropic measure of integrated causal relations, are mathematically described by the ‘‘fixed information (Tononi, 2004). The process of recognizing points”. These points are characterized by the property, ‘‘natural” classes is described in Section 2.9. that using inferences with respect to considered properties In Section 3 a computer model of ‘‘fixed points” discov- they don’t predict some new property. ering for the coded digits is provided. Neuron transmits its excitation to the other neurons Further we shell describe the idea of the work in more through multiple both excitatory and inhibitory synapses. detail before the complicated mathematical part. Inhibitory synapses may slow down other neurons activity. Causality is a result of physical determinism: ‘‘for every It is important for ‘‘inhibiting” alternate perception isolated physical system some fixed state of a system deter- images, attributes, properties, etc. Within our formal mines all the subsequent states” (Carnap, 1966). But, an model it is accomplished by discovering ‘‘inhibitory” con- automobile accident shall be taken as an example ditional relationships that predict absence of an attribute/ (Carnap, 1966). What was the reason for it? It might be a property of the object (the perceived object shall not have road surface condition or humidity, position of sun with the respective attribute/property) as compared to the other respect to drivers’ looks, reckless driving, psychological objects, where this characteristic is found. A formal model state of driver, functionality of brakes, etc. It is clear that specifies it by predicates’ negations for corresponding attri- there is no any certain cause in this case. bute/property. Absence of inconsistencies at the fixed point In the of science causality is reduced to fore- means that there are no two causal relationships simultane- casting and explaining. ‘‘Causal relation means predictabil- ously predicting both availability of some attribute/prop- ity ... in that if the entire previous situation is known, an erty with an object, and its absence. event may be predicted ..., if all the facts and laws of nat- The structure of the external world objects was analyzed ure, related to the event, are given” (Carnap, 1966). It is in the form of ‘‘natural” classifications (Section 2.1). 
It was clear that nobody knows all the facts, which number in case noted that ‘‘natural” classes of animals or plants have a of an accident is potentially infinite, and all the laws of nat- potentially infinite number of different properties (Mill, ure. In case of a human being and animals, the laws are 1983). Naturalists, who were building ‘‘natural” classifica- obtained by training (inductive reasoning). Therefore, tions, also noted that construction of a ‘‘natural” classifica- causality is reduced to predicting by Inductive-Statistical tion was just an indication: from an infinite number of (I-S) inference, which involves logical inference of predic- attributes you need to pass to the limited number of them, tions from facts and statistical laws with some probabilistic which would replace all other attributes (Smirnof, 1938). assessment. This means that in ‘‘natural” classes these attributes are When discovering causal relationships and laws on real strongly correlated, for example, if there are 128 classes, data or by training, a problem of statistical ambiguity and their attributes are binary, then the independent ‘‘indi- appears – contradictions (contradictory predictions) may cator” attributes among them will be about 7 attributes as be inferred from these causal relationships. See example 27 = 128, and others can be predicted based on these 7 in the Section 2.3. To avoid inconsistences, Hempel attributes. The set of mutually predicted properties, (1965, 1968) introduced a requirement of maximal speci- obtained at the fixed point, produces some ‘‘natural” class. ficity (see Sections 2.3 and 2.4), which implies that a statis- High correlation of attributes for ‘‘natural” classes was tical law should incorporate maximum of information, also confirmed in cognitive studies (see Section 3). Eleanor related to the predictable property. Rosch formulated the principles of categorization, one of Section 2.5 presents a solution to the problem of statis- which is as follows: ‘‘the perceived world is not an unstruc- tical ambiguity. Following Hempel, a definition of maxi- tured set of properties found with equal probability, on the mally specific rules is given (Sections 2.4 and 2.5), for contrary, the objects of perceived world have highly corre- which it is proved (Section 2.6) that inductive statistical lated structure” (Rosch, 1973, 1978; Rosch, Mervis, Gray, inference that uses only maximum specific rules, does not Johnson, & Boyes-Braem, 1976). Therefore, directly per- result in inconsistencies (see also Vityaev, 2006). A special ceived objects (so called basic objects) are observed and semantic probabilistic inference is developed that discover functional ligaments, rich with information. Later, Bob all maximum specific rules, which might be considered as Rehder suggested a theory of causal models, in which the the most precise causal relationships (that have maximum relation of an object to a category is based not on a set 234 E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 of attributes but on the proximity of generating causal and specie of one or another animal, only by killing it” mechanisms: ‘‘the object is classified as a member of a cer- (Mill, 1983). tain category to the extent that its properties are likely to James. St. Mill gives the following definition of ‘‘natu- have been generated by this category of causal laws” ral” classification: it is such a classification, which combi- (Rehder, 2003). 
Thus, the structure of causal relationships nes the objects into the groups, about which we can between the attributes of objects is taken as a basis of cat- make the greatest number of common propositions, and egorization. Therefore, brain perceives a ‘‘natural” object it is based on such properties, which are causes of many not as a set of attributes, but as a ‘‘resonant” system of cau- others. He also defines an ‘‘image” of a class, which is sal relationships, closing upon itself through simultaneous the forerunner of ‘‘natural” concepts: ‘‘our concept of inference of the total aggregate of the ‘‘natural” concept class, the way in which this class is represented in our mind, features. At the same time, ‘‘resonance” occurs, if and only is the concept of some sample, having all attributes of this if these causal relationships reflect some integrity of some class in the highest ... degree”. ‘‘natural” class, in which a potentially infinite number of John St. Mill’s arguments were confirmed by naturalists. attributes mutually presuppose each other. To formalize For example, Rutkovskii (1884) writes about inexhaustible causal models, Bob Rehder proposed to use causal graph- number of general properties of the ‘‘natural” classes: ‘‘The ical models (CGMs). However, these models are based on more essential attributes are similar in comparable objects, uˆdeploymenty´ of Bayesian networks, which do not allow the more likely they are the similar in other attributes”. cycles and cannot formalize cyclic causal relationships. Smirnof (1938) makes a similar statement: ‘‘The taxonomic It should be specially noted that the ‘‘resonance” of problem is in indication: ”from an infinite number of attri- mutual predictions of the properties of objects is carried butes we need to pass to a limited number of them, which out continuously in time and therefore the predicted prop- would replace all other attributes.‘‘ As a result, there were erties are properties that have just been perceived as stimuli formulated a problem of specifying ”natural‘‘ classifica- from the object. Therefore, it is always a prediction in time. tions, which is still being discussed in the literature The absence of contradictions in the predictions is also the (Wilkins & Ebach, 2013). absence of contradictions between the predicted stimulus These researches are uncover a high degree of over and the really received stimulus. determination of information for ‘‘natural” classes. If for Here we suggest a new mathematical apparatus for for- specification of 128 classes with binary characteristics only malizing cyclic causal relationships. 7 binary attributes are required, since 27 = 128, then for hundreds or even thousands of attributes as it is the case 2. Materials and methods of ‘‘natural” classes, they are much more highly overdeter- mined. In this case it is possible to find, for example, 20 2.1. ‘‘Natural” classification attributes (indicators) for our classes, which are identical for the objects of the same class and entirely different in The first fairly detailed philosophical analysis of ‘‘natu- other classes. As each class is described by the fixed set ral” classification was carried out by John St. Mill (1983). of values of any 7 from 20 binary attributes, then remain- This analysis describes all the basic properties of ‘‘natural” ing 13 attributes are uniquely determined. 
It implies that C7 classifications, which were later confirmed by naturalists there are, at least, 13* 20 = 1007760 dependencies between who were building ‘‘natural” classifications and also by attributes. This property of ‘‘natural” classification was those researchers in the field of cognitive sciences, who called as ‘‘taxonomic saturation” (Kogara, 1982). For such were investigating ‘‘natural” concepts. systems of attributes there is no a problem of a ‘‘curse of According to John St. Mill (1983) ‘‘artificial” classifica- dimensionality”, when as dimensionality of the attribute tions differ from the ‘‘natural” ones in that they may be space increases, accuracy of Machine Learning algorithms based on any one or more attributes, so that different decreases. Conversely it was shown in (Nedelko, 2015) that classes differ only in inclusion of objects, having different accuracy of the classification algorithm increases, if it prop- values of these attributes. But if the classes of ‘‘animals” erly incorporates redundancy of information. The classifi- or ‘‘plants” are considered, they differ by such a large cation and recognition algorithm in Sections 2.8 and 2.9 (potentially infinite) number of properties that they can’t incorporate this redundant information. As it is shown in be enumerated. Furthermore, all these properties are based the works on ‘‘natural” concepts in (see on statements, confirming this distinction. At the same next section), human perception also incorporates this time, among the properties of some ‘‘natural” classes, there redundancy of information that allows to identify objects are both observable and non-observable attributes. To take with very high accuracy. into account hidden attributes, their causal exhibitions in the observed attributes should be found. ‘‘For example, 2.2. Principles of categorization in cognitive sciences the natural classification of animals should be based mainly on their internal structure; however, it would be strange, as In the works of Eleanor Rosch (1973, 1978; Rosch et al., noted by A. Comte, if we were able to determine the family 1976) the principles of categorization of ‘‘natural” E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 235 categories, confirming statements of John St. Mill and nat- account the theoretical, causal and ontological knowledge, uralists, are formulated on the basis of the experiments: relating to the objects of classes. For example, people do ‘‘Cognitive Economy. The first principle contains the not only know that birds have wings and can fly, and build almost common-sense notion that, as an organism, what their nests in trees, but also facts, that the birds build their one wishes to gain from one’s categories is a great deal of nests in trees, because they can fly, and fly because they information about the environment while conserving finite have wings. resources as much as possible. To categorize a stimulus Many researchers believe that the most important theo- means to consider it, for purposes of that categorization, retical knowledge is the knowledge of causal dependencies not only equivalent to other stimuli in the same category because of its functionality. It allows the organism to inter- but also different from stimuli not in that category. On fere in external events and to gain control over the environ- the one hand, it would appear to the organism’s advantage ment. 
Studies have shown that people’s knowledge of to have as many properties as possible predictable from categories isn’t limited by the list of properties, and knowing any one property, a principle that would lead to includes a rich set of causal relationships between these formation of large numbers of categories with as fine dis- properties, which can be ranked. The importance of these criminations between categories as possibley´. properties also depends on the category causal relation- uˆPerceived World Structure. The second principle of ships. It was shown in some experiments (Ahn, Kim, categorization asserts that ... perceived world – is not an Lassaline, & Dennis, 2000; Sloman, Love, & Ahn, 1998), unstructured total set of equiprobable co-occurring attri- that property is more important in classification, if it is butes. Rather, the material objects of the world are per- more involved in causal network of relationships between ceived to possess ... high correlational structure. ... In the attributes. In the other experiments it was shown, that short, combinations of what we perceive as the attributes the property is more important, if it has more reasons of real objects do not occur uniformly. Some pairs, triples, (Rehder & Hastie, 2001). etc., are quite probable, appearing in combination some- Considering causal dependencies, Bob Rehder proposed times with one, sometimes another attribute; others are the theory of causal models, according to which: uˆpeople’s rare; others logically cannot or empirically do not occury´. intuitive theories about categories of objects consist of a It is understood that the first principle is not possible model of the category, in which both the category’s fea- without the second one – the cognitive economy is not pos- tures and the causal mechanisms among those features sible without a structured world. uˆNaturaly´ objects (basic are explicitly represented. In other words, theories might objects) are rich with information ligaments of observed make certain combinations of features either sensible and and functional properties, which form a natural disconti- coherent ... in light of the relations linking them, and the nuity, creating categorization. These ligaments form a degree of coherence of a set of features might be an impor- ‘‘prototypes” of the objects of different classes (‘‘images” tant factor determining membership in a categoryy´ of John St. Mill): uˆCategories can be viewed in terms of (Rehder, 2003). their clear cases if the perceiver places emphasis on the cor- In the theory of causal models, the relationship of the relational structure of perceived attributes ... By proto- object to a category is based not on a set of attributes or types of categories we have generally meant the clearest similarity based on attributes, but on the basis of similarity cases of category membershipy´ (Rosch, 1973, 1978). of generating causal mechanisms: uˆSpecifically, a to-be- uˆRosch and Mervis have shown that the more prototypical classified object is considered a category member to the of a category a member is rated, the more attributes it has extent that its features were likely to have been generated in common with other members of the category and the by the category’s causal laws, such that combinations of fewer attributes in common with members of the contrast- features that are likely to be produced by a category’s cau- ing categoriesy´ (Rosch et al., 1976). 
sal mechanisms are viewed as good category members and Further, the theory of ‘‘natural” concepts, suggested by those unlikely to be produced by those mechanisms are Eleanor Rosch, was called the Prototypical Theory of Con- viewed as poor category members. As a result, causal- cepts (). Its main features are described as model theory will assign a low degree of category member- follows: uˆThe prototype view (or probabilistic view) keeps ship to objects that have many broken feature correlations the attractive assumption that there is some underlying (i.e., cases where causes are present but effects absent or common set of features for category members but relaxes vice versa). Objects that have many preserved correlations the requirement that every member have all the features. (i.e., causes and effects both present or both absent) will Instead, it assumes there is a probabilistic matching pro- receive a higher degree of category membership because cess: Members of the category have more features, perhaps it is just such objects that are likely to be generated by cau- weighted by importance, in common with the prototype of sal lawsy´ (Rehder, 2003). this category than with prototypes of other categoriesy´ To represent causal knowledge, some researchers have (Ross, Taylor, Middleton, & Nokes, 2008). used Bayesian networks and causal models (Cheng, 1997; In further studies it was found that the models based on Gopnik, Glymour, Sobel, Schulz, & Kushnir, 2004; attributes, similarities and prototypes are not sufficient to Griffiths & Tenenbaum, 2009). However, these models can- describe classes. It is therefore necessary to take into not simulate cyclic causality because Bayesian networks do 236 E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 not support cycles. In his work Bob Rehder (Rehder & and C2^C3 explains why Jane Jones did not (:E). The set Martin, 2011) proposed a model of causal cycles, based of statements {C1, C2, C3} is consistent. However, the con- on a ‘‘disclosure” of causal graphical models. The ‘‘disclo- clusions contradict each other, making these arguments sure” is fulfilled by creating a Bayesian network which rival ones. Hempel hoped to solve this problem by forcing unfolds over time. But this work did not formalize the all statistical laws in an argument to be maximally specific – cycles of causality. they should contain all relevant information with respect to Our model is directly based on cyclic causal relation- the domain in question. In our example, then, the state- ships, represented by fundamentally new mathematical ment C3 invalidates the first argument L1, since this argu- models – fixed points of predictions on causations. To for- ment is not maximally specific with respect to all malize such fixed points, a probabilistic generalization of information about Jane Jones. So, we can only explain formal concepts was defined (Vityaev, Demin, & :E, but not E. Let us consider the problem of statistical Ponomaryov, 2012, 2015). Formal concepts emerging in ambiguity the notion of maximal specificity in more the Formal Concept Analysis (FCA) may be specified as detail. fixed points of deterministic implications (with no excep- There are two types of predictions (explanations): tions) (Ganter & Wille, 1999; Ganter, 2003). We generalize deductive-nomological (D-N) and inductive-statistical (I- formal concepts for probabilistic case through introducing S). 
The employed laws in D-N predictions are supposed probabilistic implications and defining fixed points for to be true, whereas in I-S predictions they are supposed probabilistic implications (Vityaev et al., 2012, 2015). Gen- to be statistical. eralization is made so, that in certain conditions probabilis- Deductive-nomological model may be presented in the tic formal concepts and formal concepts coincide. A form of a scheme computer experiment is performed (Vityaev et al., 2012, L1;...;Lm C ;...;C 2015), which demonstrates that probabilistic formal con- 1 n : G cepts might ‘‘restore” formal concepts after superimposi- tion of noise. where L1; ...; Lm set of laws; C1; ...; Cn set of facts; G pre- The formalization given in the Section 2.6 is more gen- dicted statement; L1; ...; Lm; C1; :::; Cn ‘ G; set eral, then in works (Vityaev et al., 2012, Vityaev and L1; ...; Lm; C1; ...; Cn is consistent; L1; ...; Lm0G, Martinovich, 2015). When there is a fixed set of objects, C1; ...; Cn0G; laws L1; ...; Lm contain only universal quan- and there is no general population, for which we need to tifiers; set of facts C1,...,Cn quantifier-free formulas. recognize a new object from general population as belong- Inductive-statistical model is identical to deductive- ing to one of available formal concepts, formalization in nomological with the difference that laws are statistical Section 2.6 provides probabilistic formal concepts. But and shall meet the Requirement of Maximal Specificity when set of objects is a sample from the general popula- (RMS), to avoid in-consistencies. tion, and it is necessary to recognize new objects to the Requirement of Maximal Specificity is defined by Carl one of the probabilistic formal concepts, this formalization Hempel (1965, 1968) as follows: ” provides ‘‘natural classification of the objects of general F )G; pGðÞ¼;F r Fa population. ðÞ ½r : GaðÞ 2.3. Statistical ambiguity problem Rule F ) G is the maximum specific with the state of knowledge K, if for each class H, for which both of the A problem of inconsistent predictions obtained from below statements belong to K inductively deduced knowledge is called a problem of sta- xHx Fx ; Ha tistical ambiguity. The classical example is the following. 8 ðÞðÞ) ðÞ ðÞ Suppose that we have the following statements. there is a statistical law pGðÞ; H ¼ r0 in K such that r ¼ r0. RMS requirement implies that if both F and H contain L1 Almost all cases of streptococcus infection clear up object a, and H is a subset of F, then H has more specific quickly after the administration of penicillin. information about object a, than F and, therefore, law L2 Almost no cases of penicillin resistant streptococcus pGðÞ; H shall prevail over law pGðÞ; F . However, for maxi- infection clear up quickly after the administration of mum specific rules law pGðÞ; H should have the same prob- penicillin. ability as law pGðÞ; F and thus H not adds any additional C1 Jane Jones had streptococcus infection. information. C2 Jane Jones received treatment with penicillin. The Requirement of Maximal Specificity had not been C3 Jane Jones had a penicillin resistant streptococcus investigated by Hempel and its successors formally, and infection. it had not been proved that it can avoid inconsistencies. The next section contains a formal definition of the maxi- On the base of L1 and C1^C2 one can explain why Jane mal specificity, for which we prove that I-S inference that Jones recovered quickly (E). 
The second argument with L2 use only maximal specific rules, is consistent. E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 237

2.4. Requirement of maximal specificity Statistical laws shall now be defined. For the sake of simplicity a probability shall be determined on empirical Let us introduce a language of the first order L of signa- system M ¼ hiA; W as on a general population, where A ture I ¼ hiP 1; :::; P k , which contains only unitary predi- is a set of objects of a general population, and W is a set cates symbols for the objects properties and stimulus of unitary predicates, defined on A. description. Let UðÞI shall denote a set of all atomic for- Probability l : A ! ½0; 1 shall be defined on A (general mulas. Atomic formulas or their negations shall be called case is considered in Halpern, 1990): liters, and a multitude of all liters shall be denoted as Lit. P Closure of all liters with respect to logical operations lðÞ¼a 1; lðÞa –0; a 2 A: a A &; ; shall be called a set of sentences RI. 2 P _ : ðÞ B b ; B # A: ð2Þ Also we need an empirical system M A; W of the sig- lðÞ¼ lðÞ ¼ hi b 2 B nature I for representing the set of objects A and set of W P ; :::; P n n predicates ¼ hi1 k defined on A. Set of all M- Probability l on ðÞA , shall be naturally defined: true sentences from RIðÞshall be called a theory ThðMÞ n l ðÞa ; :::; an ¼ lðÞa ::: lðÞan : of M. It shall be further supposed that theory ThðMÞ is a 1 1 set of universally quantified formulas. It is known that Interpretation of language L shall be determined as map- any set of universal formulas is logically equivalent to the ping I : I ! W , where predicate P j 2 W ,j ¼ 1; :::; k corre- set of rules as follows sponds to each predicate symbol P j 2 I. Mapping m : X ! A of a set of variables X of language L to the set C ¼ ðÞA1&:::&Ak ) A0 ; k P 0; A0 R fgA1; :::; Ak ; ð1Þ of objects shall be called attribution. Composition of map- ; ; :::; k where A0 A1 Ak liters. Formulas of type A0, ¼ 0 are pings mIu, where u 2 RIðÞspecifies a formula obtained considered as rulesðÞ T ) A0 , where T truth. Hence, with- from u by replacement of predicate symbols of by predi- out loss of generality, it can be assumed that theory ThðMÞ cates W through interpretation I and replacement of vari- is a set of the (1) type rules. ables from u with objects from A by attributing m. A rule might be true on empirical system M only Probability g of sentence uðÞ2a; ...; b RIðÞshall be because the premise of the rule is always false. Further- defined as: more, a rule may also be true, since its certain ‘‘sub-rule” n that involves a part of premise, is true on empirical system. guðÞ¼l ðÞfgðÞja1;...;an M mIu; mðÞ¼a a1;...;mðÞ¼b an : These observations are summarized in the following ð3Þ theorem. Teopeva1.(Vityaev, 2006). Rule C ¼ ðA1&:::& Ak ) A0Þ logically follows from rules: Definition 3. Rule C ¼ ðÞA1&:::&Ak ) A0 of (1) type, with ... ; :::; A ; A A ; :::; A &:::& > 1. (Ai1& &Aih)Ai0)C, f Ai1 ih i0 gfg1 k , gðA1 AkÞ 0 and conditional probability 0 h 0 strictly higher than condi- ... ; :::; A ; :::; A tional probabilities of all its sub-rules, shall be called a 2. (Ai1& &Aih)A0)C, f Ai1 Aih gfg1 k , 0 h 0 are probabilistic laws. Definition 1. Sub-rule of rule C is the any rule of 1 or 2 type, specified in theorem 1. A set of all probabilistic laws shall be designated as LP.

Definition 4. Probabilistic law C ¼ ðÞA1&:::&Ak ) A0 , Corollary 1. If sub-rule of rule C is true on M, then rule C is which is not a sub-rule of any other probabilistic law, true on M. shall be called the strongest probabilistic law (SPL-rule) on M. A set of all SPL-rules shall be designated as SPL. Definition 2. Any rule C, true on M, each sub-rule of which is already not true on M, is the law of empirical system M. It shall be proved that a concept of probabilistic law Rule ðÞ) A0 is true on M, if M A0. Rule ðÞ) A0 true generalizes a concept of a law, which is true on M. on M, is a law on M. Theorem 3. (See prof in Appendix A). Law SPL LP. Let Law be the set of all laws on M. Then, theory ThðMÞ logically follows from Law. Definition 5 (Vityaev, 2006). A semantic probabilistic inference (SP-inference) of some SPL-rule, predicting liter Theorem 2 (Vityaev, 2006). (See prof in Appendix A). A , shall be such sequence of probabilistic laws Law C Th M 0 ‘ ThðMÞ and for each rule 2 ðÞits sub-rule C @C @:::@C , as: exists, which is a law on M. 1 2 n 238 E. Vityaev / Cognitive Systems Research 59 (2020) 231–246

: C A v 1 1 ¼)ðÞ0 Teope a4(Vityaev, 2006). (See prof in Appendix A). i i G ; 2: C ; C ; ...; Cn 2 LP; Ci ¼ A &...&A ) A ; ki P 0; Any maximum specific rule MSðÞ¼ðÞF ) G 1 2 1 ki 0 i i F 2 RIðÞ; G 2 Lit meets the requirement of maximal 3: Ci is a sub rule of Ci ; i:e: A &...&A noþ1 1 ki specificity. iþ1 iþ1 A &...&A ; ki < ki ; 1 ki þ1 þ1 2.6. Fixed points of predictions based on MSR rules. : C > C ; 4 gðÞiþ1 gðÞi Solution of a statistical ambiguity problem 5: Cn SPS rule: ð4Þ It shall be proved that any I-S inference is consistent for any set of rules L1; ...; Lm MSR. To do it, an inference A set of all SP-inferences, predicting liter A0 shall be by MSR rules and fixed points of inference as per MSR semantic proba- considered. This set may be presented as a rules shall be considered. bilistic inference tree for liter A0. Definition 8. An immediate successor operator Pr shall be Lemma 2. (See prof in Appendix A). Any probabilistic law defined by rules from P MSR with a set of liters L as C ¼ ðÞA1&:::&Ak ) A0 belongs to some semantic probabilis- follows: tic inference, which predicts liter A0 and, hence, to the tree of PrP ðÞ¼L L [ fA0jC semantic probabilistic inference of liter A0. ¼ ðÞA1&...&Ak ) A0 ; fgA1; ...; Ak L; C 2 Pg

Definition 6. A maximum specific rule MSðÞA0 on M for Definition 9. A fixed point of operator Pr of immediate predicting liter A0 shall be an SPL-rule that has a maxi- mum value of conditional probability among all SPL- successor with respect to a set of liters L shall be a closure of this set of liters with respect to the immediate rules of a semantic probabilistic inference tree for liter 1 successor operator PrP ðÞL , whence it follows that A0. If there are several rules with identical maximum value, 1 L L all of them are maximum specific. PrP PrP ðÞ ¼ . A set of all maximum specific rules shall be denoted as Definition 10. A set of liters L ¼ fgL1; :::; Lk is called a com- MSR. patible set, if gðL1&:::&LkÞ > 0. Proposition 1. Law MSR SPL LP. Definition 11. A set of liters L is consistent, if it does not contain simultaneously atom G and its negation :G. 2.5. Resolving the problem of statistical ambiguity Proposition 2. (See prof in Appendix A). If L is compatible, A requirement of maximal specificity shall be defined in L is consistent. the general case. It shall be supposed that statement H in the Hempel’s formulation of the requirement of maximal It shall be shown that an immediate prediction retains specificity is a sentence H 2 RIðÞof the language L. the property of compatibility. Teopeva5(Vityaev & Martinovich, 2015). (See prof in Definition 7. Rule C ¼ ðÞF ) G ; F 2 RIðÞ, G 2 Lit meets Appendix A). If L is compatible, PrP ðÞL is also compatible, the requirement of maximal specificity (RMS), if it follows P MSR. H RI Fa&Ha from 2 ðÞand ðÞ ðÞfor some a 2 A that rule To prove the theorem, the following lemma shall first be 0 & C ¼ ðÞF H ) G has the same probability proved. gðÞ¼G=F&H gðÞ¼G=F r. Lemma 5. (See prof in Appendix A). If for rules In other words, RMS states that there is no sentence A ¼ A ) G , B ¼ B ):G , A ¼ A &:::&Ak; 1 H 2 RIðÞ, which would increase (or decrease, see the G=F r B ¼ B &:::&Bm, g A &: B > 0, k P 0, m > 0 inequality Lemma below) conditional probability gðÞ¼of the 1 rule by adding it to the premise of the rule. g G= A &: B > g G= A is valid, a rule exists that has a H RI Lemma 3. (See prof in Appendix A). If statement 2 ðÞ strictly higher conditional probability than rule A. decreases probability of rule i.e. gðÞG=F&H < gðÞG=F then the statement :H increases it and gðÞG=F&:H > gðÞG=F . Corollary 2. If L is compatible, then PrP ðÞL is consistent for P MSR. Lemma 4. (See prof in Appendix A). For each rule C ¼ ðÞA1&:::&Ak ) A0 of form (1), there shall be found a Corollary 3 (Solution to a problem of statistical ambigu- 0 0 probabilistic law C ¼ ðÞB &:::&Bk0 ) A , k < k, on M, ity). I-S inference is consistent for any set of laws 1 0 ; ...; C ; :::; C : for which lðÞC0 P lðÞC . P ¼fL1 LmgMSR and set of facts fg1 n E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 239

1 Corollary 4. Fixed points PrP ðÞL for a compatible set of the questions of making ‘‘natural” classification and fixed liters L are compatible and consistent. points by data samplings should be considered. By a sam-

1 pling from a general population we shall mean a sub- A set of all fixed points L ¼ PrMSRðÞN , obtained by all model, BB ¼ hiB; W , where B is a set of objects randomly maximum specific rules on all compatible sets of liters N, selected from general population A. A frequency probabil- shall be designated using Class(M). ity lB shall be specified within a sampling, by assuming lBðÞ¼a 1=N; N ¼jBj, according to which probability g 2.7. ‘‘Natural” classification as a fixed points of prediction B on sentences from RIðÞis determined. On sampling by a maximum specific rules BB ¼ hiB; W , as on a sub-model, a theory Th(B), a set of laws Law(B), a set of probabilistic laws LP(B), a set of Probabilistic formal concepts and ‘‘natural” classifica- the strongest probabilistic laws SPL(B), and a set of maxi- tion shall be specified, using fixed points pursuant to max- mum specific laws MSR(B) might be obtained. imum specific rules. If empirical system M ¼ hiA; W is defined on some fixed set of objects A, the below defined Proposition 3 (ThðMÞThðBÞ.). Since each set of liters, probabilistic formal concepts and ‘‘natural” classifications Sb ¼ fgL 2 Litj Bb L is compatible, since it is obtained on for this set are in agreement. It can be shown (Vityaev, real objects, which have a nonzero probability, hence, a set 1 Morozova, Sutyagin, & Lapardin, 2005; Vityaev & ClassðBÞ¼ L ¼ PrMSRðÞjN N Sb; b 2 B shall be a set Martynovich, 2015), that this definition of ‘‘natural” classi- of all fixed points on sampling BB. Classes Class(B) shall be fication satisfies all the requirements, which natural scien- the analogues of probabilistic formal concepts (Vityaev tists imposed on ‘‘natural” classification. et al., 2012, 2015). However, a potential identification and ” Definition 12. Probabilistic formal concepts and ‘‘natiral” recognition of ‘‘natural classes of a general population by ” classification (Vityaev & Martynovich, 2015). ‘‘natural classes discovered on a sampling, is of concern, rather than probabilistic formal concepts, defined on sampling BB M. In this case, probability l of a general population is unknown, but frequency probability lB 1. A set of all fixed points of Class(M) shall be called a set within a sampling is known. of all probabilistic formal concepts and ‘‘natural” B classes of this empirical system M. Maximum specific laws MSR(B) within sampling B 2. Each class/concept L specifies in empirical system M a may be not the same on a general population. They can set of objects, which belong to a class/concept be considered as approximations of laws MSR(M) in the following sense. Determining maximum specific laws using MLðÞ¼fgb 2 A j Bb L , where Bb ¼ hib; W is a sub- semantic probabilistic inference shall be recalled. In the model of model M ¼ hiA; W generated by object b 2 A. process of inferring a premise of a rule is stepwise devel- 3. A lawful model of class/concept C=hL,Z i shall be L oped through strictly enhancing its conditional probability defined as a set of liters L 2 ClassðMÞ of fixed point and involving as much relevant information as possible, on and a set of rules ZL MSR, applicable to liters from L. accordance with a requirement of maximal specificity, to 4. A generating set of some class/concept C ¼ hiL; ZL shall 1 ensure maximum probable and consistent prediction. Sam- be such a subset of liters N L, that L ¼ PrMSRðÞN . pling BB might facilitate building a tree of semantic proba- 5. 
A set S of atomic sentences P ðÞa ,j=1,...s, s 6 k shall j bilistic inference through increasing the premise and be called a system-forming, if for each class from applying some statistical criteria for checking the strict Class(M) there is a generating set of liters, which are increase of the conditional probability l on general popu- obtained from the system-forming set of atomic sen- lation M ¼ hiA; W . For this purpose, an exact indepen- tences by taking or not taking the negation. dence criterion of Fischer for contingency tables shall be 6. ADE systematics shall be defined as a set used. Here, by sampling BB ¼ hiB; W a set LPaðÞM of S; ZS ; ZL R ¼ fgi Li2ClassðÞM , where S is a system- probabilistic laws with some confidence level a might be forming set of atomic sentences, ZS a law of systematiza- discovered, where each probabilistic inequality shall be sta- tion, which defines an order of taking negations for tistically validated with confidence level a. By the set Z M atomic sentences from S, fgLi L Class , a set of the LPaðÞ, a set of the strongest probabilistic laws i2 ðÞM M rules for fixed points from Class(M). SPLaðÞmight be found, with confidence level a. Inference by SPLaðÞM rules may be inconsistent, hence, to build fixed points by set SPLaðÞM , it is necessary to use a 2.8. Method of ‘‘natural” data classification weaker criterion of consistency of probabilistic laws in mutual predictions, which assumes occurrence of In the case of ‘‘natural” classification, an empirical sys- inconsistencies. An operator of direct inference PrUSPL M L shall be tem M ¼ hiA; W as a general population, is unknown. aðÞðÞ Only a data sampled from a general population are known. defined for this case. A set of laws, verified within a set Therefore, to develop a method for ‘‘natural” classification, of liters L shall be defined 240 E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 8 9 C SPL M ; &:::& A ; þ þ þ SatðLÞ¼fC j 2 aðÞC ¼ ðÞA1 Ak ) 0 <>L[ðÞA0 ; if d ðÞL > d ðÞL ; d ðÞL > 0 => þ fgA1; :::; Ak L; A0 2 Lg; PrUSPL ðÞM ðÞ¼L LAðÞ; if d ðÞL P d ðÞL ; d ðÞL > 0 : a :> 0 ;> L; and a set of laws, disprovable at a set of liters L else

FalðLÞ¼fC jC 2 SPLaðÞM ; C ¼ ðÞA1&:::&Ak ) A0 ; r L L Fixed point P USPLaðÞM ðÞ¼ shall be obtained in the fgA1; :::; Ak L; :A0 2 Lg; ðÞ::A0 ¼ A0 : third case, when adding/deleting an element does not increase the criterion. A set of all such fixed points A criterion Kr of mutual consistency between laws from obtained on all compatible sets of liters L, shall be defined SPLa shall be defined on a set of liters L as: ” M X X as a set of ‘‘natural classes ClassaðÞ. Kr L C C ; C To obtain a regular model of class L Class M a set SPLaðÞM ðÞ¼ mðÞ mðÞwhere mðÞ 2 að Þ C2SatðÞ L C2FalðÞ L of regularities ZL interpredicting the class attributes should ¼logðÞ 1 g ðÞC : be specified. Regularities SatðLÞ represent such regularities. B At the fixed point L disprovable predictions across regular- C Function logðÞ 1 gBðÞincorporates not a probabil- ities FalðLÞ are overlapped with the verified predictions ity itself, but its closeness to 1, since it characterizes a pre- across regularities SatðLÞ. Since, theoretically, with the dictive power of regularity more accurately. known measure l, there shouldn’t be any inconsistencies, Kr Criterion SPLaðÞM specifies not only information mea- regularities FalðLÞ shall be considered obtained due to ran- sure of consistency between regularities, but also an infor- domness of sampling BB. Regularities from SPLaðÞM that mation measure of mutual integration between causal were not included into any set SatðLÞ of any class relationships within a set of liters L. Therefore, this mea- L 2 ClassaðMÞ, are considered as obtained as a result of sure is very close in meaning to entropy measure of inte- retraining by virtue of sampling BB randomness, therefore, grated information stated in (Tononi, 2004). they will be deleted from SPLaðÞM . Operator PrUSPL M ðÞL functions as follows: it either aðÞ Definition 13. A set C ¼ hiL; SatðÞ L shall be called a regular adds one of liters Lit to a set L, which is predicted by reg- model of class L 2 Class ðMÞ. ularities from SatðLÞ, or deletes one of liters of a set L, pre- a dictable by disprovable laws from FalðLÞ. Here, a For ‘‘natural” classes, definitions of generating set, sys- compatibility of laws (their information measure) applica- tem forming set, and systematics from definition 12 remain ble to a set L shall strictly increase, i.e. at every step an unchanged. inequality shall be satisfied The functions of information measure in terms of accu- racy of identifying objects of the external world shall be KrSPL ðÞPrUSPL ðÞL > KrSPL ðÞL : a a a noted: Here, that liter is added/deleted, which results in a maximum increase in the criterion (information measure). 1. It enables to cope with retraining and exclude regulari- If adding/deleting a liter does not increase a criterion, a ties, which are probably obtained due to randomness set L remains unchanged and is a fixed point. Changes of sampling; in criterion, when adding/deleting an element, are, 2. Extraction of the maximum value of information mea- respectively: sure facilitates extracting all overdetermined informa- þ tion, inherent in ‘‘natural” classes; d ðÞ¼L max KrSPL ðÞM ðÞL [ A0 KrSPL ðÞM ðÞL ; A L ; A RL a a 02PrSPLa ðÞ 0 3. Recognition of objects of the external world using overdetermined information and makes this recognition d ðÞ¼L max KrSPL ðÞM ðÞLfA0 KrSPL ðÞM ðÞL : A L ; A L a a 02PrSPLa ðÞ 02 maximally accurate and solves the problem of ‘‘dimen- r L sionality curse”. Operator P USPLaðÞM ðÞadds/deletes that element, which maximizes the value of a criterion. 
Added/deleted elements are defined as follows: 2.9. Recognition of ‘‘natural” classes A þ Kr L A ; ðÞ0 ¼ argmax SPLaðÞM ðÞ[ 0 A02PrSPL M ðÞL ; A0RL aðÞ Regular models of classes allow recognizing them on A Kr L A : control objects, chosen from a general population. For this ðÞ0 ¼ argmax SPLaðÞM ðÞf 0 A Pr L ; A L 02 SPLaðÞM ðÞ 02 purpose, regular models of classes C ¼ hiL; SatðÞ L in the form of regular matrices shall be defined. For each liter In each case of employing, operator PrUSPL M ðÞL adds/ aðÞ A L the power of its prediction shall be estimated pur- deletes that element, which increases a criterion to the max- 0 2 þ þ þ suant to regularities from SatðLÞ and liters from L as imum, i.e. adds element ðÞA0 ,ifd ðÞL > d ðÞL , d ðÞL > 0 X þ Kr A C and deletes element ðÞA0 ,ifd ðÞL > d ðÞL , d ðÞL > 0. SatðLÞðÞ¼0 mðÞ r L C2SatðÞ L ; C¼ðÞA1&:::&Ak)A0 Thus, operator P USPLaðÞM ðÞis determined as follows: E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 241

Definition 14. Regular matrix of class C ¼ hiL; SatðÞ L shall 3. Results DE M L; Kr A be defined as a tuple C ¼ SatðLÞðÞ0 A L . 02 Let’s illustrate the formation of fixed points for the ‘‘natural” classes/concepts by the computer experiment C L; Sat L Proposition 4. For each class ¼ hiðÞan equation is on coded digits. Digits shall be encoded, as it is shown in true Fig. 2. Attributes of digits shall be enumerated, as indi- cated in Table 1. A training set, consisting of 360 shuffled X digits (12 digits of Fig. 2, which are duplicated in 30 copies KrSPL M L KrSat L A : aðÞðÞ¼ ðÞðÞ0 without specifying where any digit is) shall be formed. On A L 02 this set 55,089 maximum specific rules were found by Using regular matrices, new objects of a general popula- semantic probabilistic inference. They are general state- tion may be recognized and relegated to the ‘‘natural” ments about the objects, mentioned by John St. Mill and classes. Pertinence of object b 2 B to class C ¼ hiL; SatðÞ L W. Whewell. shall be assessed. To do that, it shall be estimated, to what Let XaðÞbe a set of properties (stimulus) of an object a, degree object b 2 B is conform with regularities of class defined by a set of predicates, and SatðÞ L . P i &:::&P i P i MSR X the set of the maximum 1 k ) 0 2 aðÞ specific rules satisfied for properties X, P i ; :::; P i X Definition 15. Pertinence of object b 2 B to class 1 k C ¼ hiL; SatðÞ L shall be defined as with confidential level a. Then we can define operator r P UMSRaðÞX of direct inference. The fixed point is reached n when PrU nþ1 Xa PrUMSR X Xa , for some n, X X MSRaðÞX ðÞ¼ðÞ aðÞðÞðÞ Score b=C Kr A Kr A : r n ðÞ¼ SatðÞ L ðÞ0 SatðÞ L ðÞ0 where P UMSRaðÞX n multiple application of the operator. A L; A S A02ðÞSb\L 02 : 02 b Table 1 Recognition by regular matrices is made the same way Encoding of digital’s fields. as by weight matrices. To do that, for all objects of positive 1234 and negative sampling shall be calculated the ScoreðÞ b=C 5678 for each class, and a threshold value shall be computed that 9101112 provides the required values of the first and second type 13 14 15 16 17 18 19 20 errors. Then new objects of a general set might be recog- 21 22 23 24 nized and relegated to the class, if value ScoreðÞ b=C of this Fields codes. object is higher than the threshold value.

Fig. 2. Digits coding. 242 E. Vityaev / Cognitive Systems Research 59 (2020) 231–246

Fig. 3. Fix-point for digit 6.

r Since in each application of operator P UMSRaðÞX the value other. It is worth mentioning, that regularities, used in Kr of criterion MSRaðÞX increases and at the fixed point the fixed point, are discovered on all digits, but fixed point reaches a local maximum, then the fixed point, when allocates only one digit. This illustrates the phenomenolog- reflecting some ‘‘natural” object, has a maximum informa- ical property of ‘‘information” which states that ‘‘differ- Kr ences that make a difference”. tion measure MSRaðÞX and property of uˆexclusiony´ by G. Tononi. Thus, the system of causal relationships perceives (is According to these regularities, exactly 12 fixed points conscious of) whole object. Thus, digits are identified not are discovered, accurately corresponding to the digits. An by regularities themselves, but their system relationship. example of a fixed point for digit 6 is shown in Fig. 3.It The fixed points forms a ‘‘prototype” by Eleanor Rosch, shall be considered, what this fixed point is. or ‘‘image” by John St. Mill. The program does not know The first regularity of digit 6 in Fig. 3, represented in the in advance which possible combinations of attributes are first box after the brace, means that if in the square 13 correlated with each other. (Table. 1) there is an attribute 6 (it is denote as 13–6), then it must be an attribute 2 in square 3 (it is denote by (3–2)). 4. Discussion The predicted attribute is indicated by a dotted line. This regularity shall be written as (13–6 ) 3–2). It is easy to Theoretical results obtained in the paper suggest that it see that this regularity is really implemented in all the dig- is possible to create a mathematically precise system of its. The second regularity means, that from the attribute reflection of reality, based on the most specific rules and (9–5) and the denial of the value 5 of the first attribute fixed points that use them. First of all, the question arises :(1–5) (the first attribute must not be equal to 5) follows about the functioning of the neuron – is it really detects attribute (4–7). Denial is denoted by a dash line, as shown causal relations in accordance with the semantic probabilis- in the lower part of Fig. 3. Thus, regularity (9–5&:(1–5) ) tic inference. Unfortunately, this is still impossible to ver- 4–7) is obtained. The following 3 regularities in the first ify, because connecting several contacts to one neuron is row of the digit 6, will be (13–6 ) 4–7), (17–5&:(13–5) a deadly number for the neuron itself. However, it can be ) 4–7), (13–6 ) 16–7), respectively. shown that the reflection of causal relationships is able to Fig. 3 shows, that these regularities and the attributes of model a multitude of cognitive functions in accordance digit 6 form a fixed point. They mutually predict each with existing physiological and psychological theories. E. Vityaev / Cognitive Systems Research 59 (2020) 231–246 243

The organization of purposeful behavior is modeled by Since gðÞ¼C 1, rule C cannot be a sub-rule of any other causal relationships between actions and their results probabilistic law, since in this case its conditional probabil- (Vityaev, 2015), which fully corresponds to the theory of ity would be strictly less than conditional probability of functional systems (Anokhin, 1974). Fixed points ade- this rule, what is impossible j quately model the process of perception (Vityaev & Neupokoev, 2014). A set of causal relationships models Proof of the lemma 2. If a probabilistic law has a form expert knowledge (Vityaev, Perlovsky Kovalerchuk, & C ¼)ðÞA0 , it shall be verified, if it is the strongest proba- Speransky, 2013). Therefore, the verification of this formal bilistic law. If it is true, then a semantic probabilistic infer- model for compliance with the actual processes of the brain ence is found, if no, there is a probabilistic law, for which seems to be an important task. this probabilistic law is a sub-rule. It shall be taken as the next rule of semantic probabilistic inference, and it shall be Declaration of Competing Interest again verified, if it is the strongest law, and so on. A sub- rule, which is a probabilistic law, shall be found for prob- A &:::&A A k P The author declare that they have no known competing abilistic law C ¼ ðÞ1 k ) 0 , 1. It always exists, A financial interests or personal relationships that could have since rule C ¼)ðÞ0 is a probabilistic law. Since it is a appeared to influence the work reported in this paper. sub-rule, its conditional probability will be less than a con- ditional probability of the rule as such. It shall be added as the previous rule of a semantic probabilistic inference, and Appendix A. Mathematical proofs the procedure shall be continued.

Proof of Lemma 3. Introduce the notation a = g(G&F&H), b = g(F&H), c = g(G&F&¬H), d = g(F&¬H). Then the original inequality g(G/F&H) < g(G/F) is rewritten as a/b < (a + c)/(b + d), from which it follows that (a + c)/(b + d) < c/d, that is, g(G/F) < g(G/F&¬H). □

Proof of Theorem 3. The second inclusion follows from the definition; consider the first one. If rule C ∈ L has the form C = (⇒ A0), it belongs to LP by definition. Suppose that rule C = (A1 & ... & Ak ⇒ A0) is a law on M. We first prove that g(A1 & ... & Ak) > 0. If rule C is a law on M, the sub-rule (A2 & ... & Ak ⇒ ¬A1) is not always true on M and, thus, in some cases the conjunction A2 & ... & Ak & A1 is true, whence it follows that g(A2 & ... & Ak & A1) > 0. Then the conditional probabilities of all sub-rules are defined, since the inequality g(Ai1 & ... & Aih) ≥ g(A1 & ... & Ak) > 0 follows from {Ai1, ..., Aih} ⊆ {A1, ..., Ak}.

Next we prove that g(C) = 1:

g(C) = g(A0/A1 & ... & Ak) = g(A0 & A1 & ... & Ak) / g(A1 & ... & Ak) = g(A0 & A1 & ... & Ak) / (g(A0 & A1 & ... & Ak) + g(¬A0 & A1 & ... & Ak)).

Since rule C is true on M, there are no cases on M when (¬A0 & A1 & ... & Ak) is true; hence g(¬A0 & A1 & ... & Ak) = 0 and g(C) = 1.

Finally, we prove that the conditional probability of each sub-rule of rule C = (A1 & ... & Ak ⇒ A0) is strictly less than g(C) = 1. Each sub-rule (Ai1 & ... & Aih ⇒ L) of rule C, where L is either the literal ¬A or A for sub-rules of the 1st and 2nd type, is false on M. It means that g(Ai1 & ... & Aih & ¬L) > 0. It follows from the last inequality that

g(L/Ai1 & ... & Aih) = g(Ai1 & ... & Aih & L) / g(Ai1 & ... & Aih) = g(Ai1 & ... & Aih & L) / (g(Ai1 & ... & Aih & ¬L) + g(Ai1 & ... & Aih & L)) < 1.

Since g(C) = 1, rule C cannot be a sub-rule of any other probabilistic law: in that case its conditional probability would have to be strictly less than the conditional probability of that rule, which is impossible. □
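A quick numeric check of Lemma 3 (a sketch; the counts a, b, c, d are arbitrary illustrative values):

# Lemma 3 numerically: if g(G/F&H) < g(G/F), then g(G/F) < g(G/F&-H).
a, b = 2, 10             # g(G&F&H), g(F&H):   g(G/F&H)  = a/b = 0.2
c, d = 6, 10             # g(G&F&-H), g(F&-H): g(G/F&-H) = c/d = 0.6
g_G_given_F = (a + c) / (b + d)                 # = 0.4
assert a / b < g_G_given_F < c / d              # the mediant lies in between
print(a / b, g_G_given_F, c / d)                # 0.2 0.4 0.6

The inequality is the mediant property: g(G/F) is a weighted average of g(G/F&H) and g(G/F&¬H), so it cannot lie below both of them.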

Proof of Lemma 4. Rule C = (A1 & ... & Ak ⇒ A0) is either a probabilistic law, or there is a sub-rule R' = (P1 & ... & Pk' ⇒ A0), {P1, ..., Pk'} ⊂ {A1, ..., Ak}, 0 ≤ k' < k, such that the inequality μ(R') ≥ μ(C) is satisfied. Similarly, rule R' is either a probabilistic law, or there is a sub-rule with identical properties for it. □

Proof of Theorem 4. It should be proved that for any sentence H ∈ RI(), if F(a) & H(a), a ∈ A, is true on M, then g(G/F&H) = g(G/F) = r. From the truth of F(a) & H(a) on M it follows that g(F&H) > 0 and, hence, the conditional probability is defined.

First consider the case when H is a literal (B or ¬B). Assume the opposite: g(G/F&H) ≠ r. Then, by Lemma 3, the inequality g(F&B ⇒ G) > r or g(F&¬B ⇒ G) > r is satisfied for one of the rules (F&B ⇒ G), (F&¬B ⇒ G). Then, by Lemma 4, there exists a probabilistic law C' that is a sub-rule of that rule and whose conditional probability is not lower, so g(C') > r. Hence, by Lemma 2, the probabilistic law C' belongs to some tree of the semantic probabilistic inference and has a higher value of conditional probability than the maximally specific rule MS(G) predicting G, which contradicts the maximal specificity of MS(G).

Now consider the case when the sentence H is a conjunction of two atoms B1 & B2; for literals the theorem is already proved. Again assume the opposite: one of the inequalities g(G/F&B1&B2) > r, g(G/F&B1&¬B2) > r, g(G/F&¬B1&B2) > r, g(G/F&¬B1&¬B2) > r is valid. Then, by Lemma 4 and Lemma 2, a probabilistic law C' exists that belongs to the tree of a semantic probabilistic inference, is a sub-rule of one of these rules and satisfies g(C') > r. This is impossible, since the rule C = (F&H ⇒ G) is maximally specific. Hence, for each of these conditional probabilities only equality = or inequality < with respect to r may take place. The latter case is impossible due to the following equality:

r = g(G&F)/g(F) = (g(G&F&B1&B2) + g(G&F&¬B1&B2) + g(G&F&B1&¬B2) + g(G&F&¬B1&¬B2)) / (g(F&B1&B2) + g(F&¬B1&B2) + g(F&B1&¬B2) + g(F&¬B1&¬B2)).

The case when the sentence H is a conjunction of several atoms or their negations is proved by induction.

In the general case, the sentence H ∈ RI() may be presented as a disjunction of conjunctions of atoms or their negations. To complete the proof, it suffices to consider the case when H is a disjunction of two non-intersecting sentences D ∨ E, g(D&E) = 0, for each of which the theorem is already proved, i.e. g(G/F&D) = g(G/F&E) = g(G/F) = r. Then

g(G/F&(D ∨ E)) = g(G&F&(D ∨ E)) / g(F&(D ∨ E)) = (g(G&F&D) + g(G&F&E)) / (g(F&D) + g(F&E)) = r.

The case of a disjunction of multiple non-intersecting sentences is proved by induction. □

Proof of Proposition 2. If L is compatible, an atom G ∈ L and its negation ¬G ∈ L cannot occur simultaneously, since otherwise g(&L) ≤ g(G & ¬G) = 0, where &L is the conjunction of the literals from L. □

Proof of Lemma 5. The conditional probability is written as

g(G/ Ā & ¬B̄) = g(G/ Ā & (¬B1 ∨ ... ∨ ¬Bm)).

The disjunction ¬B1 ∨ ... ∨ ¬Bm is presented as a disjunction of conjunctions B1^i1 & ... & Bm^im, where ī = (i1, ..., im), i1, ..., im ∈ {0, 1}; zero means that the corresponding atom occurs with negation, and unity that it occurs without negation. The disjunction does not involve the tuple (1, ..., 1), which corresponds to the conjunction B1 & ... & Bm. Then the conditional probability is rewritten as

g(G / ⋁_{ī ≠ (1,...,1)} Ā & B1^i1 & ... & Bm^im).

It shall be proved that if g(G/ Ā & ¬B̄) > g(G/ Ā), then at least one of the inequalities

g(G/ Ā & B1^i1 & ... & Bm^im) > g(G/ Ā), (i1, ..., im) ≠ (1, ..., 1),

is satisfied. Assume the opposite: all the inequalities

g(G/ Ā & B1^i1 & ... & Bm^im) ≤ g(G/ Ā), (i1, ..., im) ≠ (1, ..., 1),

hold simultaneously in those cases when g(Ā & B1^i1 & ... & Bm^im) > 0. Since g(Ā & ¬B̄) > 0, there are cases when g(Ā & B1^i1 & ... & Bm^im) > 0. Then

g(G & Ā & B1^i1 & ... & Bm^im) ≤ g(G/ Ā) · g(Ā & B1^i1 & ... & Bm^im), (i1, ..., im) ≠ (1, ..., 1),

and therefore

g(G / ⋁_{ī ≠ (1,...,1)} Ā & B1^i1 & ... & Bm^im) = Σ_{ī ≠ (1,...,1)} g(G & Ā & B1^i1 & ... & Bm^im) / Σ_{ī ≠ (1,...,1)} g(Ā & B1^i1 & ... & Bm^im) ≤ g(G/ Ā),

which contradicts the inequality g(G/ Ā & ¬B̄) > g(G/ Ā). Therefore, the assumption is not valid, and there is a rule

(Ā & B1^i1 & ... & Bm^im ⇒ G), (i1, ..., im) ≠ (1, ..., 1),

that has a strictly higher assessment of conditional probability than the rule (Ā ⇒ G). □
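As a worked instance of the decomposition used in the proof of Lemma 5, consider the case m = 2 (this expansion is implicit in the proof above):

¬B1 ∨ ¬B2 = (¬B1 & ¬B2) ∨ (¬B1 & B2) ∨ (B1 & ¬B2),

g(G/ Ā & (¬B1 ∨ ¬B2)) = (g(G & Ā & ¬B1 & ¬B2) + g(G & Ā & ¬B1 & B2) + g(G & Ā & B1 & ¬B2)) / (g(Ā & ¬B1 & ¬B2) + g(Ā & ¬B1 & B2) + g(Ā & B1 & ¬B2)),

so that g(G/ Ā & ¬B̄) is a weighted average of the three conditional probabilities g(G/ Ā & B1^i1 & B2^i2); hence, if it exceeds g(G/ Ā), at least one of the three must exceed g(G/ Ā) as well.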
Proof of Theorem 5. It should be proved that, every time a rule from the set MSR is applied, a compatible set of literals is obtained again. Assume the opposite: when some rule A = (A1 & ... & Ak ⇒ G), {A1, ..., Ak} ⊆ L, k > 1, is applied to a set of literals L = {L1, ..., Ln}, a literal G is obtained for which μ(L1 & ... & Ln & G) = 0.

Since the rule satisfies the inequalities g(G/A1 & ... & Ak) > g(G), g(A1 & ... & Ak) > 0 and g(G) > 0, it follows that g(G & A1 & ... & Ak) > g(G) · g(A1 & ... & Ak) > 0.

Add to rule A the negations of the literals {B1, ..., Bt} = {L1, ..., Ln} \ {A1, ..., Ak}, obtaining the rule (A1 & ... & Ak & ¬(B1 & ... & Bt) ⇒ G). Introduce the designations &Ai = A1 & ... & Ak, &Bj = B1 & ... & Bt, &L = L1 & ... & Ln. By assumption, μ(L1 & ... & Ln & G) = 0, while g(&Ai & &Bj) = g(&L) > 0.

It shall be proved that in this case g(&Ai & ¬&Bj) > 0. Assume the opposite, g(&Ai & ¬&Bj) = 0; then g(G & &Ai & ¬&Bj) ≤ g(&Ai & ¬&Bj) = 0, whence it follows that

0 = μ(&L & G) = g(G & &Ai & &Bj) = g(G & &Ai) − g(G & &Ai & ¬&Bj) = g(G & A1 & ... & Ak) > g(G) · g(A1 & ... & Ak) > 0.

A contradiction is obtained. Then

g(G/&Ai & ¬&Bj) = g(G & &Ai & ¬&Bj) / g(&Ai & ¬&Bj) = (g(G & &Ai) − g(G & &Ai & &Bj)) / (g(&Ai) − g(&Ai & &Bj)) = (g(G & &Ai) − g(&L & G)) / (g(&Ai) − g(&Ai & &Bj)) = g(G & &Ai) / (g(&Ai) − g(&Ai & &Bj)) > g(G & &Ai) / g(&Ai) = g(G/A1 & ... & Ak).

Then, by virtue of Lemmas 2, 4 and 5, there exists a probabilistic law with a higher conditional probability than rule A, which contradicts the maximal specificity of rule A. □
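The mechanism of this proof can be checked on a toy distribution: if applying a rule (A ⇒ G) to the compatible set L = {A, B} were to yield an incompatible set, i.e. g(A & B & G) = 0, then the extended rule (A & ¬B ⇒ G) has a strictly higher conditional probability, so (A ⇒ G) was not maximally specific in the first place. The distribution below is an arbitrary illustrative choice (a sketch, not data from the paper).

# Toy check of the key step in the proof of Theorem 5: whenever
# g(A & B & G) = 0 while g(A & B) > 0, the sub-rule relation yields a
# strictly stronger rule (A & not-B => G) than (A => G).
weights = {  # probability of each world (A, B, G)
    (1, 1, 1): 0.0, (1, 1, 0): 0.2, (1, 0, 1): 0.3, (1, 0, 0): 0.1,
    (0, 1, 1): 0.1, (0, 1, 0): 0.1, (0, 0, 1): 0.1, (0, 0, 0): 0.1,
}

def g(pred):
    """Measure of the set of worlds satisfying pred."""
    return sum(w for world, w in weights.items() if pred(*world))

p_G_given_A      = g(lambda a, b, c: a and c) / g(lambda a, b, c: a)
p_G_given_A_notB = g(lambda a, b, c: a and not b and c) / g(lambda a, b, c: a and not b)

assert g(lambda a, b, c: a and b and c) == 0   # L plus G would be incompatible
assert p_G_given_A_notB > p_G_given_A          # a strictly stronger rule exists
print(p_G_given_A, p_G_given_A_notB)           # 0.5 0.75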

References

Ahn, W., Kim, N., Lassaline, M., & Dennis, M. (2000). Causal status as a determinant of feature centrality. Cognitive Psychology, 41, 361–416.
Anokhin, P. K. (1974). Biology and neurophysiology of the conditioned reflex and its role in adaptive behaviour. Oxford: Pergamon Press.
Carnap, R. (1966). Philosophical foundations of physics. New York: Basic Books.
Cheng, P. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367–405.
Ganter, B. (2003). Formal concept analysis: Methods, and applications in computer science. Dresden, Germany: TU Dresden.
Ganter, B., & Wille, R. (1999). Formal concept analysis – Mathematical foundations. Heidelberg: Springer.
Gopnik, A., Glymour, C., Sobel, D., Schulz, E., & Kushnir, T. (2004). A theory of causal learning in children: Causal maps and Bayes nets. Psychological Review, 111, 3–23.
Griffiths, T., & Tenenbaum, J. (2009). Theory-based causal induction. Psychological Review, 116, 56.
Halpern, J. (1990). An analysis of first-order logics of probability. Artificial Intelligence, 46, 311–350.
Hebb, D. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
Hempel, C. (1965). Aspects of scientific explanation. In Aspects of scientific explanation and other essays in the philosophy of science. New York: The Free Press.
Hempel, C. (1968). Maximal specificity and lawlikeness in probabilistic explanation. Philosophy of Science, 35, 16–33.
Kogara, V. (1982). Classification functions. In Classification theory and data analysis, Part 1. Novosibirsk (in Russian).
Mill, J. (1983). System of logic, ratiocinative and inductive.
Nedelko, V. (2015). On the question of the effectiveness of boosting in the classification problem. Siberian Journal of Pure and Applied Mathematics, 15(2), 72–89 (in Russian).
Oizumi, M., Albantakis, L., & Tononi, G. (2014). From the phenomenology to the mechanisms of consciousness: Integrated information theory 3.0. PLoS Computational Biology, 10(5).
Rehder, B. (2003). Categorization as causal reasoning. Cognitive Science, 27, 709–748.
Rehder, B., & Hastie, R. (2001). Causal knowledge and categories: The effects of causal beliefs on categorization, induction, and similarity. Journal of Experimental Psychology: General, 130, 323–360.
Rehder, B., & Martin, J. (2011). Towards a generative model of causal cycles. In Proceedings of the 33rd annual meeting of the Cognitive Science Society (Vol. 1, pp. 2944–2949). Boston, MA.
Rosch, E. (1973). Natural categories. Cognitive Psychology, 4, 328–350.
Rosch, E. (1978). Principles of categorization. In E. Rosch & B. Lloyd (Eds.), Cognition and categorization (pp. 27–48). Hillsdale: Lawrence Erlbaum Associates.
Rosch, E., Mervis, C., Gray, W., Johnson, D., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.
Ross, B., Taylor, E., Middleton, E., & Nokes, T. (2008). Concept and category learning in humans. In J. Byrne (Ed.), Learning and memory: A comprehensive reference (Vol. 2, pp. 535–556). Oxford: Elsevier.
Rutkovskii, L. (1884). Elementary logic textbook. St. Petersburg (in Russian).
Sloman, S., Love, B., & Ahn, W. (1998). Feature centrality and conceptual coherence. Cognitive Science, 22, 189–228.
Smirnof, E. (1938). Constructions of forms from the taxonomic view. Zool. Jour., 17(3), 387–418 (in Russian).
Tononi, G. (2004). An information integration theory of consciousness. BMC Neuroscience, 5, 42.
Tononi, G., Boly, M., Massimini, M., & Koch, C. (2016). Integrated information theory: From consciousness to its physical substrate. Nature Reviews Neuroscience, 17, 450–461.
Vityaev, E. (2006). The logic of prediction. In S. Goncharov, R. Downey, & H. Ono (Eds.), Mathematical logic in Asia: Proceedings of the 9th Asian Logic Conference, Novosibirsk, Russia, August 16–19, 2005 (pp. 263–276). Singapore: World Scientific.
Vityaev, E. (2013). A formal model of neuron that provides consistent predictions. In A. Chella, R. Pirrone, R. Sorbello, & K. Johannsdottir (Eds.), Biologically inspired cognitive architectures 2012: Proceedings of the third annual meeting of the BICA Society (Advances in Intelligent Systems and Computing, Vol. 196, pp. 339–344). Heidelberg: Springer.
Vityaev, E. (2015). Purposefulness as a principle of brain activity. In M. Nadin (Ed.), Anticipation: Learning from the past (Cognitive Systems Monographs, Vol. 25, Chapter 13, pp. 231–254). Springer.
Vityaev, E., Demin, A., & Ponomaryov, D. (2012). Probabilistic generalization of formal concepts. Programming and Computer Software, 38(5), 219–230.
Vityaev, E., & Martinovich, V. (2015). Probabilistic formal concepts with negation. In A. Voronkov & I. Virbitskaite (Eds.), Perspectives of System Informatics (PSI 2014), LNCS 8974 (pp. 385–399). Springer.
Vityaev, E., & Martynovich, V. (2015). "Natural" classification and systematic formalization as a fix-point of predictions. Siberian Electronic Mathematical Reports, 12, 1006–1031 (in Russian).
Vityaev, E., Morozova, N., Sutyagin, A., & Lapardin, K. (2005). Natural classification as a law of nature. In Structural regularities analysis (Computational Systems, Issue 174, pp. 80–92). Novosibirsk (in Russian).
Vityaev, E., & Neupokoev, N. (2014). Perception formal model based on fix-point of predictions. In V. Red'ko (Ed.), Approaches to mind modeling (pp. 155–172). Moscow: URSS Editorial (in Russian).
Vityaev, E., Perlovsky, L., Kovalerchuk, B., & Speransky, S. (2013). Probabilistic dynamic logic of cognition. Biologically Inspired Cognitive Architectures, 6, 159–168.
Wilkins, J., & Ebach, M. (2013). The nature of classification: Relationships and kinds in the natural sciences. Palgrave Macmillan.