ORIGINAL RESEARCH published: 15 July 2015 doi: 10.3389/fncom.2015.00090

Pattern activation/recognition theory of mind

Bertrand du Castel *

Schlumberger Research, Houston, TX, USA

In his 2012 book How to Create a Mind, Ray Kurzweil defines a "Pattern Recognition Theory of Mind" which states that the neocortex uses millions of pattern recognizers, plus modules to check, organize, and augment them. In this article, I further the theory to go beyond pattern recognition and include also pattern activation, thus encompassing both sensory and motor functions. In addition, I treat checking, organizing, and augmentation as patterns of patterns instead of separate modules, therefore handling them the same as patterns in general. Henceforth I put forward a unified theory I call "Pattern Activation/Recognition Theory of Mind." While the original theory was based on hierarchical hidden Markov models, this evolution is based on their precursor: stochastic grammars. I demonstrate that a class of self-describing stochastic grammars allows for unifying pattern activation, recognition, organization, consistency checking, metaphor, and learning, into a single theory that expresses patterns throughout. I have implemented the model as a probabilistic programming language specialized in activation/recognition grammatical and neural operations. I use this prototype to compute and present diagrams for each stochastic grammar and corresponding neural circuit. I then discuss the theory as it relates to artificial network developments, common coding, neural reuse, and unity of mind, concluding by proposing potential paths to validation.

Keywords: recurrent, neural, stochastic, autapse, self-description, metaphor, grammar, hylomorphism

Edited by: Hava T. Siegelmann, Rutgers University, USA

Reviewed by: Cees van Leeuwen, KU Leuven, Belgium; Kyle I. Harrington, Harvard Medical School, USA

*Correspondence: Bertrand du Castel, Schlumberger Research, 5599 San Felipe St., Houston, TX 77056, USA; [email protected]

Received: 25 October 2014; Accepted: 25 June 2015; Published: 15 July 2015

Citation: du Castel B (2015) Pattern activation/recognition theory of mind. Front. Comput. Neurosci. 9:90. doi: 10.3389/fncom.2015.00090

Introduction

In his book How to Create a Mind (Kurzweil, 2012), Ray Kurzweil proposes to model the brain with a unified processing paradigm (pp. 5, 7, 23) that consists in a hierarchy of self-organizing pattern routines (pp. 29, 172) "...constantly predicting the future and hypothesizing what we will experience" (p. 31) and "...involved in our ability to recognize objects and situations" (p. 32). Furthermore, Kurzweil proposes that the model be implemented with hierarchical hidden Markov models (HHMMs, p. 144) whose parameters are learned via genetic algorithms (p. 146).

Importantly, Kurzweil calls his model "Pattern Recognition Theory of Mind." While I fully concur with the hypothesis of a unified processing paradigm involving patterns, I suggest in this article that the model Kurzweil presents in his book does not fully achieve his goal and that it can be more ambitious. First, I note that Kurzweil adds to the central pattern recognition processing two modules: a "critical thinking module" (p. 175) that checks for consistency of patterns, and an "open question module" (p. 175) that uses metaphors to create new patterns. These two modules invoke additional apparatus, separate from that of his pattern recognition modules. Second, Kurzweil proposes that self-organization of pattern recognition modules is achieved via linear programming (p. 174), which is yet another apparatus.


I suggest that while functions of consistency checking, metaphorical creation, and self-organization are indeed needed, their treatment should not require extra mechanisms. The theory should handle them just as any other pattern, since they are actually patterns of patterns, an assertion that I will formally substantiate further on. Moreover, I consider that limiting pattern processing to recognition is limiting the scope of a unified processing paradigm. Indeed, patterns of activation are not different from patterns of recognition; for example, drawing a circle or recognizing a circle both involve the same pattern (circle), and so the same unified pattern mechanism should apply in both cases, with the difference being determined only by the type of action (Mumford and Desolneux, 2010). In fact, Kurzweil does mention that HHMMs can be used in both activation and recognition (p. 143); but he did not include that in his model, nor did he consider them being used at the same time in activation and recognition, a key element of my analysis going forward.

In this article, I therefore advance an evolution of the model, extending it to both activation and recognition. I call it "Pattern Activation/Recognition Theory of Mind." It is based on stochastic grammars, i.e., the superset model from which HHMMs were originally derived (Fine et al., 1998). Stochastic grammars differ from HHMMs in that they are fully recursive, or, said otherwise, fully recurrent. I will demonstrate that they are capable of self-description, allowing them to handle both patterns and patterns of patterns. Consequently, they can account for pattern consistency, since consistency is a pattern of patterns; similarly, stochastic grammars can use metaphors to create new patterns, in a process which also involves patterns of patterns. Since the self-describing stochastic grammars I present can both activate and recognize patterns, they meld in a single model the motor and sensory experiences of the brain.

More generally, I will show that in addition to providing a unified processing paradigm, self-describing stochastic grammars capable of both activation and recognition have a natural mapping to neuronal circuits. Recursion (recurrence), which enables self-description, is a property of neural circuits, an example being a neuron whose axon feeds back to the neuron itself via an autapse (van der Loos and Glaser, 1972). In addition, stochastic grammar probabilities express the stochastic nature of synapses, with excitation (more likely to trigger; probability above .5) and inhibition (less likely to trigger; probability below .5). Furthermore, activation is what an axon does, while recognition is what a dendrite does. Self-description allows reading, modifying, and creating neuronal circuitry, modeled by stochastic grammars both activating and recognizing other stochastic grammars. This is equivalent to saying that patterns can both activate and recognize other patterns, thus providing a general mechanism to create, modify, and exercise patterns and patterns of patterns for both activation and recognition.

In order to make the correspondence between stochastic grammars and neural circuits both clear and explicit, I present each grammar of the text together with an associated neural diagram. Neural properties to watch for are those that map stochastic grammar properties: hierarchy, recurrence, directionality, probabilities, and, most importantly for this presentation, learning afforded by self-description. Diagrams are produced by running a prototype implementation of the model in the form of a probabilistic programming language belonging to the lineage of PRISM (Sato and Kameya, 1997) and Church (Goodman et al., 2008), but dedicated to the self-describing activation/recognition grammatical and neural operations described in this article.

In summary, I present here a model that augments and completes Kurzweil's software ambition and has a natural interpretation in terms of the brain's hardware. Subsequently, I discuss the theory as it relates to artificial network developments, common coding, neural reuse, and unity of mind, and I propose potential paths to validation. Finally, Kurzweil places evolution at the center of his learning model, with genetic algorithms. I prefer here to use instead swarming as a learning mechanism, as evolution is a phylogenetic property (which happens over time), while swarming is an ontogenetic property (which happens in real time). Swarming is readily achieved with neuronal circuits in lower and higher animals, and therefore readily asserted in neural circuits. With this considered, I can now describe the Pattern Activation/Recognition Theory of Mind.

Materials and Methods

I first introduce activation/recognition grammars, a class of probabilistic context-free grammars that can function both separately and simultaneously in activation and recognition. I show that activation/recognition grammars have the power needed for biologically-inspired computations, that is, the power of Turing machines, while providing an additional level of expressiveness and theory that maps neural circuitry. I demonstrate with an example from vision how a probabilistic version of these grammars can learn colors through reinforcement by expressing a swarm of lower grammars. I then show with metaphor and composition that this mechanism generalizes up to a self-describing grammar that provides the root of a hierarchical and recursive unified model of pattern activation and recognition in neural circuits.

Activation/Recognition

I now introduce grammars that can simultaneously activate and recognize patterns.

A grammar executes a set of rules until rules no longer apply (Chomsky, 1957). Grammars herein are probabilistic (a.k.a. stochastic) context-free grammars without null symbol (Booth, 1969; Chi, 1999). They are coded in a minimal subset of Wirth's syntax notation (Wirth, 1977), augmented with probabilities. These grammars are not traditional, because they are not limited to functioning solely either in activation or recognition; instead, they can function both separately and simultaneously in activation and recognition, which I will show is a small but critically consequential specification in regard to biologically-inspired interpretation of grammar theory and practice.

Grammar "Draw = DrawSquare DrawCircle DrawTriangle." produces a square, a circle, and a triangle. Conversely, when presented with a square, a circle, and a triangle, grammar "Spot = SpotSquare SpotCircle SpotTriangle." recognizes them.
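To make the notation concrete, the sketch below interprets such a grammar in plain Python. It is only an illustration of the activation/recognition convention used in this article, not the prototype implementation described later; the Draw/Spot prefixes, the function names, and the treatment of input as a list of labeled stimuli are assumptions of the sketch.

```python
# A minimal sketch (not the article's prototype): a grammar is a dictionary
# mapping a non-terminal to its sequence of symbols. Terminals prefixed with
# "Draw" act as actuators (they produce output); terminals prefixed with
# "Spot" act as sensors (they consume, i.e., recognize, input).

def run(grammar, start, stimuli):
    """Expand `start` depth-first; return (productions, ok) where `ok` is
    False if some sensor failed to recognize the next stimulus."""
    produced = []
    stimuli = list(stimuli)          # copy: sensors consume from the front

    def expand(symbol):
        if symbol in grammar:                     # non-terminal: structural only
            return all(expand(s) for s in grammar[symbol])
        if symbol.startswith("Draw"):             # actuator terminal
            produced.append(symbol[len("Draw"):].lower())
            return True
        if symbol.startswith("Spot"):             # sensor terminal
            expected = symbol[len("Spot"):].lower()
            return bool(stimuli) and stimuli.pop(0) == expected
        raise ValueError("unknown symbol: " + symbol)

    return produced, expand(start)

# "Mix = DrawSquare SpotCircle DrawTriangle." recognizes a circle and
# produces a square and a triangle.
mix = {"Mix": ["DrawSquare", "SpotCircle", "DrawTriangle"]}
print(run(mix, "Mix", ["circle"]))   # (['square', 'triangle'], True)
print(run(mix, "Mix", ["square"]))   # (['square'], False): SpotCircle fails
```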


Production and recognition can mix, as with grammar "Mix = DrawSquare SpotCircle DrawTriangle." This grammar recognizes a circle, and produces a square and a triangle.

Formally, the three grammars above are unified in a single paradigm of production and recognition; this is captured by a common rule "A = B C D." that expresses, with arbitrary symbols (de Saussure, 1916), the abstract pattern constituted by any three entities. This pattern is then specialized with rules that terminate abstract symbols with activation and recognition functions. The first grammar above is thus formally written "A = B C D. B = DrawSquare. C = DrawCircle. D = DrawTriangle." while the second one is written "A = B C D. B = SpotSquare. C = SpotCircle. D = SpotTriangle." and the third one "A = B C D. B = DrawSquare. C = SpotCircle. D = DrawTriangle." (Figure 1).

In activation/recognition grammar rules, sequencing does not convey order, except that recognition comes ahead of production. Grammars "A = B C. B = DrawSquare. C = DrawCircle." and "A = B C. B = DrawCircle. C = DrawSquare." are equivalent. They both produce a square and a circle, in any order. The number of rules involved in a derivation conveys order, whether for production or recognition. In the two previous examples, the number of rules needed to produce a square and a circle is the same for both grammars: two rules are involved, so the grammars are equivalent. But the grammar "A = B C. B = DrawSquare. C = D. D = DrawCircle." is not equivalent to the preceding grammars, as drawing a square requires the application of two rules, whereas drawing a circle requires the application of three rules; consequently, the square is ordered ahead of the circle (Figure 2).

Further varying the rules, an infinity of situations can be conveyed. For example, grammar "A = Look B. B = See C. C = Describe." expresses looking for an object and, on seeing it, describing it (Stock and Stock, 2004). Abstract symbols such as A, B, and C, that expand to other symbols, are called non-terminal and have no other role than structural. Symbols such as DrawSquare, SpotCircle, Look, and See, that do not expand further, are called terminal and have a functional role; as explained later, they can also be placeholders for further grammatical expansion. Terminals can be actuators or sensors; DrawSquare and Look are actuators, as they produce output, while SpotCircle and See are sensors, as they recognize input (Figure 3).

Taken as a whole, the above account of grammars differs from tradition (du Castel, 1978) only in that it melds activation and recognition, or, otherwise stated, actuation and sensing, or action/perception (Clark, 2013). Grammars have long been known to work alternatively in production ("performance") and recognition ("competence"), but their working simultaneously in both modes is new; and with this addition, activation/recognition grammars find a new role in biologically-inspired computation.

FIGURE 1 | Three grammars are shown in the top row, while the second row shows a graphical form of the grammars, and the third row presents their associated neural circuits. Arrows in grammatical descriptions indicate when a grammar is activating (down arrow), and when it is recognizing (up arrow). In neural descriptions, an inverted triangle denotes a soma, while a segment cut by an arc denotes an axonal extension and a synapse, further connecting to a dendrite.


FIGURE 2 | Four grammars are shown with their neural counterpart. The two first grammars are equivalent, as their ordering is not set by rule sequencing but rather by the number of rules applied; the production of the square requires just two rule applications, the same as for the circle. The third grammar is not equivalent to the two preceding ones, as three rule applications are required for the circle, and two for the square. The fourth grammar is different from the three preceding grammars, as the square requires three rule applications, and is therefore ordered after the circle, which takes two.

FIGURE 3 | This grammar and its neural equivalent demonstrate variety of purpose; here, actions are described instead of geometrical figures. A, B, and C are non-terminals, while Look, See, and Describe are terminals. Look and Describe are in production, See is in recognition. The particular grammatical and neuronal pattern of this example will be repeatedly found further down in self-describing grammars and their neural equivalents.

Expressivity

I now add probabilities to activation/recognition grammars, and I show that these grammars have the same power as Turing machines, while presenting additional modeling capabilities.

Rules can have alternates, marked by the symbol "|" (or) and weighted with probabilities. Grammar "A = .3 B | .7 C. B = DrawSquare. C = DrawCircle." prints a square with probability .3 and a circle with probability .7, such that the mean of the grammar's distribution is 30% squares and 70% circles. Grammar "A = .3 B | .7 C. B = SpotSquare. C = SpotCircle." recognizes a square with probability .3, and a circle with probability .7, such that the grammar's expectation is 30% squares and 70% circles. Note that grammar probabilities sum up to 1 for convenience of presentation, but this is not necessary; weights can substitute for probabilities without affecting the descriptive value of a grammar (Smith and Johnson, 2007). For style, unspecified probabilities are read as equipartition, such that grammar "A = B | C | D | E." is actually "A = .25 B | .25 C | .25 D | .25 E." (Figure 4).

Completing this enumeration of fundamental properties of activation/recognition grammars, non-terminal symbols can be used in feedback loops (Bellman, 1986; Buzsáki, 2006; Joshi et al., 2007). Grammar "A = DrawSquare A." repeats producing small squares to infinity, while grammar "A = SpotCircle B. B = DrawSquare A." only repeats as long as circles are recognized, drawing as many squares as there are circles (Figure 5).


FIGURE 4 | Three grammars and their neural equivalent are presented, with probabilities. In grammars, alternate paths are marked by a horizontal bar, with corresponding probabilities indicated for each path. In neural circuits, probabilities are associated with synapses.
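As an illustration of how weighted alternates determine a grammar's distribution, the following sketch samples the grammar "A = .3 B | .7 C. B = DrawSquare. C = DrawCircle." many times; the encoding of alternates as (probability, body) pairs is an assumption of this sketch, not the article's notation or prototype.

```python
import random

# Illustrative only: alternates are (probability, body) pairs, and each
# expansion samples one alternate according to its weight.
grammar = {
    "A": [(0.3, ["B"]), (0.7, ["C"])],
    "B": [(1.0, ["DrawSquare"])],
    "C": [(1.0, ["DrawCircle"])],
}

def derive(symbol, out):
    if symbol not in grammar:                 # terminal actuator: emit it
        out.append(symbol)
        return
    weights = [p for p, _ in grammar[symbol]]
    body = random.choices([b for _, b in grammar[symbol]], weights=weights)[0]
    for s in body:
        derive(s, out)

counts = {"DrawSquare": 0, "DrawCircle": 0}
for _ in range(10_000):
    out = []
    derive("A", out)
    counts[out[0]] += 1
print(counts)   # roughly 30% squares and 70% circles
```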

FIGURE 5 | Two recursive (recurrent) grammars are shown with their neural circuits. In the grammatical schema, the recursion is only shown for three levels for reason of presentation (in further figures it is typically cut to one or two for the sake of clarity). The neural diagram does not suffer from the same limitation of presentation, so it is faithful to the original grammatical formulation. The recurring synapse of the first circuit connects the soma onto itself, so it categorizes as autapse. The second neural circuit has two synapses terminating on the same soma, one being recurrent. The recurrent synapse of the second circuit is categorized as regular, as it connects one soma to a different one.
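The recursion of grammar "A = SpotCircle B. B = DrawSquare A." can likewise be written directly as mutually recursive functions, one per rule; this is only a minimal sketch of the recurrence, with my own encoding of stimuli and productions.

```python
# Tiny sketch of "A = SpotCircle B. B = DrawSquare A.": the recursion on A
# is a recursive call, mirroring the feedback (autapse-like) loop of the
# circuit. A failed sensor simply ends the recursion.
def A(stimuli, produced):
    if stimuli and stimuli[0] == "circle":        # SpotCircle (sensor)
        return B(stimuli[1:], produced)
    return produced

def B(stimuli, produced):
    return A(stimuli, produced + ["square"])      # DrawSquare (actuator), then A again

print(A(["circle", "circle", "circle"], []))      # ['square', 'square', 'square']
```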

Equipped with alternates and recursion, activation/recognition grammars can express Turing machines (Turing, 1936). For proof, I consider the seminal Turing example, which is the generation of the infinite sequence 001011011101111... The following activation/recognition grammar transliterates Turing's specification using its exact original notation: "b = PeRPeRP0 RRP0LL o. o = 1RPxLLL o | 0 q. q = Any RR q | None P1 L p. Any = 0 | 1. p = xER q | eR f | None L L p. f = Any RR f | None P0 L L o." In the terminology of activation/recognition grammars, terminals Pe (Print e), R (Move right), P0 (Print 0), L (Move left), Px (Print x), P1 (Print 1), and E (Erase), are actuators, while terminals 1 (Read 1), 0 (Read 0), None (Read None; meaning blank square), x (Read x), and e (Read e), are sensors. Therefore, mixed production and recognition overcomes computational limitations of traditional context-free grammars (Chomsky and Schützenberger, 1963), allowing them to perform most biologically-inspired computations, if not all (Penrose, 1989; Siegelmann, 1995) (Figure 6).

Turing's infinite sequence is actually an enumeration of the natural number set, perhaps not a biological object. However, recognition of numbers is certainly a common and early human activity (Libertus et al., 2011). Grammar "A = B | C. B = 0 | 1. C = B A." expresses any digit sequence; the first rule differentiates finishing a sequence and pursuing one, the second rule produces digits, and the third rule adds to sequences. Following the same pattern, but with different terminals, grammar "A = B | C. B = DrawSquare | DrawCircle. C = B A." produces a number of squares and circles in sequences mapping the production of digits, grammar "A = B | C. B = SpotSquare | SpotCircle. C = B A." recognizes such a sequence, and grammar "A = B | C. B = SpotSquare | DrawCircle. C = B A." mixes identifying squares and producing circles. The non-terminal grammar part common to digits and geometrical figures expresses a metaphor from one domain (digits) to another (figures), a subject I will come back to further on (Figure 7).

I have shown that activation/recognition grammars have the same power as Turing machines, to formally demonstrate that these grammars are universal. However, they are more than Turing machines. They constitute a higher-level model that expresses in particular hierarchical and recursive (recurrent) concepts that are not explicit in Turing machines. It is this extra modeling capability that I use to build the theoretical apparatus that on one hand meets the extra requirements I have put on Kurzweil's original theory, and on the other finds a natural expression in neural circuits.

Now, while it is yet possible to suggest that they are natural models of biologically-inspired processes, a plausible origin has to be found for activation/recognition grammars.

Learning

I now show that probabilistic activation/recognition grammars can learn with swarms, using color learning as an example.

The use of probabilities allows producing and recognizing patterns expressing gradients. Assuming base colors red and green, grammars can produce from them composite colors, such as yellow grammar "A = B A. B = .5 C | .5 D. C = PrintGreenPoint. D = PrintRedPoint." and orange grammar "A = B A. B = .61 C | .39 D. C = PrintGreenPoint. D = PrintRedPoint." or gold grammar "A = B A. B = .54 C | .46 D. C = PrintGreenPoint. D = PrintRedPoint." Replacing actuators by sensors, "C = ReadGreenPoint. D = ReadRedPoint." lets grammars recognize, instead of produce, corresponding colors with variations respecting probability distributions. Yellow, orange, and gold grammars are very similar, since they differ only by their gradient probabilities (Figure 8).

The purpose of this section is to study reinforcement learning with grammar swarms. Swarms are well-understood (Kennedy and Eberhart, 1995), and grammar swarms have been studied in production (von Mammen and Jacob, 2009). Here I will consider one aspect of swarm learning, albeit a critical one, in recognition. I will consider the capability of a swarm to select for the orange color. The problem is equivalent to that of a swarm exploring for food (Dasgupta et al., 2010); instead of sampling a geometrical space, the swarm samples the green-red spectrum. The swarm is made of color grammars with different probabilities. The swarm is presented with varied colors; when a color is presented, the grammar of the swarm closest to that color recognizes it.

FIGURE 6 | Here is the Turing grammar with its neural circuit. Note that the grammar has actually been simplified for the sake of presentation, as ordering should be introduced following the precepts previously enunciated regarding the number of rules executed.


FIGURE 7 | Varying terminals. The first grammar and its neural circuit produce binary digit sequences. The second grammar is the same as the first one except for its terminals. Instead of producing digits 0 and 1, it recognizes squares and produces circles. It does that following the same pattern, where production of 0 is replaced by recognition of a square, and production of 1 by production of a circle. As will be discussed further down, this constitutes a metaphor, applying a set pattern (expressed by the non-terminal part of the grammar) from one domain (digits) to another (geometrical figures) by varying the terminals.
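A small sketch of the point made by Figure 7: the non-terminal skeleton "A = B | C. B = ... | ... . C = B A." stays fixed while its two terminals are rebound, first to digits and then to geometrical figures. The helper function and the uniform stopping probability are assumptions of the sketch, not part of the article's prototype.

```python
import random

# The same skeleton with different terminal bindings: this is the basic
# mechanism the article calls a metaphor (illustrative Python only).
def sequence(t1, t2, stop_prob=0.3):
    out = [random.choice([t1, t2])]          # rule B: pick one terminal
    while random.random() > stop_prob:       # rule A: finish or pursue (rule C recurses)
        out.append(random.choice([t1, t2]))
    return out

random.seed(1)
print(sequence("0", "1"))                    # digit domain
print(sequence("DrawSquare", "DrawCircle"))  # geometrical-figure domain
```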

FIGURE 8 | Yellow, orange, and gold grammars are shown with their neural circuits. The only difference between these three grammars is probabilities, which express the mixture of green and red constitutive of each color.
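The gradient grammars can be pictured as sampling points with the grammar's probabilities; the sketch below produces an "orange" field of green and red points with weights .61/.39 and then recovers the weights from the points, standing in for the Print/Read terminal pair. The point encoding is my own and purely illustrative.

```python
import random

# Sketch of a gradient grammar such as the orange grammar
# "A = B A. B = .61 C | .39 D. C = PrintGreenPoint. D = PrintRedPoint.":
# production samples green/red points with the grammar's probabilities, and
# the sensing variant estimates those probabilities from observed points.
def produce_color(p_green, n_points=1000):
    return ["green" if random.random() < p_green else "red" for _ in range(n_points)]

def recognize_color(points):
    return sum(p == "green" for p in points) / len(points)   # estimated green weight

orange_points = produce_color(0.61)
print(round(recognize_color(orange_points), 2))   # close to 0.61
```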

If the recognized color is close to orange, the described grammar produces orange in return. This is the essence of swarm learning, as the swarm can then be modified to concentrate more and more on the exact orange color, a process that I do not describe here but that can be performed by the methods presented in the next section.

Color grammars presented above differ only by their probabilities. Grammars can themselves be represented by grammars (Harris, 1968). Therefore, patterns (such as individual color grammars, with set probabilities) can be represented by other patterns (such as general color grammars, with varying probabilities). My demonstration consists in showing that a grammar can describe another one in a way that expresses the swarm capability sought.

For this, I first need to introduce a simple case of a grammar describing another one. The most simple grammar I can consider is "A = PrintGreenPoint." The key elements of the grammar are A and PrintGreenPoint. The equal sign ("=") and dot (".") are markers that are necessary in the linear form of grammar writing, but that do not need to be reproduced in a grammar-describing grammar, because their function is accomplished by the structure of that grammar (cf. Methods Summary further below). Thus, grammar "A = PrintGreenPoint." is fully described by grammar "A = QuoteA B. B = QuotePrintGreenPoint." This describing grammar uses the quote operator. This operator allows specifying a symbol that is not otherwise active, which is necessary to avoid confusion between the non-terminal node A of the describing grammar and the terminal node A referring to the described grammar. This operator, recognized as such by Harris (Harris, 1968), is that of Gödel's proposition-describing propositions (Gödel, 1931/1986); it was also identified, independently, by McCarthy (1960) and Wirth (1977). Without this operator, the describing grammar would go into inappropriate recursion. Instead it describes the simple grammar as desired, expressing that the recursion domain of the described grammar is different from that of the describing grammar (Figure 9).

FIGURE 9 | Simple description. The first grammar is the very simple grammar. The second grammar describes the first grammar, using the quote operator, to point to appropriate nodes of the grammar described. The second neural circuit combines the neural circuits of the describing and described grammars. The grammatical function of the quote operator is fulfilled by the projection of the neural circuit of the describing grammar to the neural circuit of the described grammar.

I now consider the slightly more complex grammar "A = .61 B | .39 C. B = PrintGreenPoint. C = PrintRedPoint." It is a subset of the orange grammar, with probabilities expressing the proper mix of green and red that produces orange. Grammar "A = B C D. B = QuoteA E. E = F G. F = QuotePoint61 QuoteB. G = QuotePoint39 QuoteC. C = QuoteB H. H = QuotePrintGreenPoint. D = QuoteC I. I = QuotePrintRedPoint." describes this orange grammar subset, including probabilities (Figure 10).

The more complex orange grammar is just an expansion of the simpler one to multiple points. For the sake of simplicity, I will therefore consider the simplified orange grammar for my demonstration. Varying the probabilities of the grammar, it is possible to produce any color grammar of the green-red spectrum. For example, it is possible to produce a grammar that recognizes colors with probabilities .5 and .5, as easily as one recognizing probabilities .6 and .4, or one recognizing .9 and .1. It is then possible to produce these three grammars at once with grammar "A = B | C | D. B = .5 E | .5 F. E = SpotGreenPoint. F = SpotRedPoint. C = .6 G | .4 H. G = SpotGreenPoint. H = SpotRedPoint. D = .9 J | .1 K. J = SpotGreenPoint. K = SpotRedPoint." This assembly of grammars, all similar but for one element, is called a grammar swarm (von Mammen and Jacob, 2009). When presented with a color, this swarm of three grammars activates the grammar closest in color to the input (Figure 11).

The swarm is then a tool of color detection. Let's assume that the swarm receives as input the color orange, that is, a mix of .61 green and .39 red. The branch of the swarm that detects that color is the second branch of the grammar since its probabilities, namely .6 and .4, are the closest to those of orange (the distribution of .6 and .4 is closest to that of .61 and .39). For that detection to be confirmed, this swarm branch needs to be connected to an orange production which acknowledges the detection and therefore presents an input for reinforcement. This capability is present once the swarm expands into production rules, as in "A = B C. B = .6 D | .4 E. D = SpotGreenPoint. E = SpotRedPoint. C = F G H. F = QuoteA J. J = K L. K = QuotePoint6 QuoteC. L = QuotePoint4 QuoteD. G = QuoteC M. M = QuotePrintGreenPoint. H = QuoteD N. N = QuotePrintRedPoint." When the second branch is triggered, it allows closure of the reinforcement loop, which is validated by presentation of the expected orange color (Figure 12).

Any other gradient can be substituted to red-green, be it a different spectrum, tone, or other probabilistically-determined base for classification. However, this example of linking input to output for reinforcement learning needs to be augmented with more powerful capabilities. Consequently, I turn to demonstrating that the model is both general and organized.

Self-Description

I now show that probabilistic activation/recognition grammars can self-describe, allowing patterns to activate/recognize other patterns, such as with metaphor and composition.


FIGURE 10 | Probabilistic description. The first grammar is a subset of the orange grammar. The second grammar describes the first grammar. The second neural circuit combines the neural circuits of the describing and described grammars. Probabilities quoted in the describing grammar project to the described grammar, just as probabilities referred by the describing circuit project to the circuit described.

FIGURE 11 | A grammar swarm. This is a swarm of three color grammars, side by side with its neural equivalent.
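The selection step performed by the swarm, as described above, amounts to picking the branch whose probabilities are closest to those of the input; the sketch below uses a simple absolute-difference distance, which is my own choice since the article does not specify a measure.

```python
# Sketch of swarm selection: each branch of the swarm grammar carries its own
# green/red probabilities (.5/.5, .6/.4, .9/.1), and the branch whose
# distribution is closest to the observed input "recognizes" the color.
swarm = {"branch1": 0.5, "branch2": 0.6, "branch3": 0.9}   # green weights

def closest_branch(observed_green):
    return min(swarm, key=lambda b: abs(swarm[b] - observed_green))

print(closest_branch(0.61))   # 'branch2', the .6/.4 branch, as in the text
```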


FIGURE 12 | Reinforcement learning. The first grammar is the target orange grammar. The second grammar is the branch of the grammar swarm that is triggered when presented with a color close to orange. The first part of the second grammar is the triggering mechanism; the second part is the description of the target orange grammar. In the corresponding neural circuit, the top part is the trigger, the bottom right part is the description, and the bottom left part is the activated circuitry resolving to orange.

Grammar-describing grammars can be recursively defined upward in an increasing generalization. The fixed point of that recursion is an ultimate grammar that can produce and recognize any grammar, including self. This root grammar "A = .5 B | .5 C. B = Symbol D. D = .5 E | .5 F. E = Weight G. G = .5 Symbol | .5 H. H = Symbol G. F = E D. C = B A." defines the most general pattern, from which all other patterns derive (cf. Methods Summary, further below). Like all activation/recognition grammars, this top grammar can function in production, recognition, and mixed mode. The top grammar provides for a unified model since it covers all patterns (Figure 13).

This begs studying grammar learning over the full activation/recognition grammatical landscape. Kurzweil's two separate modules that I have not yet integrated in the theory are dealing with augmentation and organization. Augmentation is related to metaphor, the capability of creating new patterns as well as applying existing patterns to new domains. Organization is related to the general problem of associating patterns with one another. Metaphors allow reusing circuitry: for example, if a particular circuit counts items, one cannot imagine this circuit being replicated for the infinite number of possible items that can be counted. Therefore, there needs to be a mechanism that allows reusing the counting circuit to apply it to different items, which is what I will present.


FIGURE 13 | The root grammar can describe any grammar, including itself. Like the grammar it corresponds to, the neural circuit self-describes, providing root circuitry from which all circuits can derive.

Organizational capabilities are an extension of metaphoric capabilities. If a circuit counts items, it should not only be capable of counting any item, it should also be able to count anything; for example, the number of times an orange color is produced. In the following, I show that self-description allows handling both metaphor and organization within the Pattern Activation/Recognition Theory of Mind.

Regarding metaphors, a pattern can be described by another one, while the describing grammar can also perform other operations, a feature I have already used for swarms. For the sake of presentation, I consider a very simple pattern, that of counting two squares, done by grammar "A = B C. B = DrawSquare. C = DrawSquare." This grammar is described by grammar "A = B C D. B = QuoteA E. E = QuoteB QuoteC. C = QuoteB F. F = DrawSquare. D = QuoteC G. G = DrawSquare." This describing grammar can be augmented to draw a circle each time it detects a square, with "A = B C D. B = QuoteA E. E = QuoteB QuoteC. C = QuoteB F. F = UnquoteDrawSquare DrawCircle. D = QuoteC G. G = UnquoteDrawSquare DrawCircle." The unquote operator is the reverse of the quote operator. With the addition of the new rules, instead of producing a square, the described grammar forwards the drawing operation to the describing grammar that then produces a circle. In other words, the describing grammar retargets the counting pattern from one domain (squares) to another (circles), which is the essence of metaphors, defined as "a cross-domain mapping in the conceptual system" (Lakoff, 1979). Of course, this is illustrating only the basic mechanism which metaphors rely on, while I have published elsewhere with Yi Mao a more complete account, using Montague grammars (Montague, 1974; du Castel and Mao, 2006) (Figure 14).

Regarding organization, I now consider two grammars previously discussed, digit grammar "A = B | C. B = 0 | 1. C = B A." and orange grammar "A = .61 B | .39 C. B = PrintGreenPoint. C = PrintRedPoint." Digit grammar outputs digits, and orange grammar outputs points. Let's imagine that instead of producing the digit 1, I want the digit grammar to produce an orange point, therefore producing a pattern of orange points that follows the pattern of 1's of the digit grammar. A way to do this could be to just replace digit 1 of the digit grammar by an orange grammar expansion, as in "A = B | C. B = 0 | D. C = B A. D = .61 E | .39 F. E = PrintGreenPoint. F = PrintRedPoint." While this static combination produces the desired result, it does not provide a plausible model for the brain. Like in the case of metaphor discussed above, this process of combining grammars directly would accrue an infinity of combinations in the brain, not a possibility. What is needed is rather a means to combine the two circuits while preserving them, so that they can be combined in an infinite number of ways in a dynamic fashion. Here again, we turn to self-description, by invoking a grammar that describes these two grammars and combines them while keeping them intact. This is achieved by grammar "A = B C D. B = QuoteA E. E = QuoteB QuoteC. C = QuoteB F. F = Quote0 J. J = Unquote1 K. D = QuoteC G. G = QuoteB H. H = QuoteA. K = L M N. L = QuoteD O. O = P Q. P = QuotePoint61 QuoteE. Q = QuotePoint39 QuoteF. M = QuoteE R. R = QuotePrintGreenPoint. N = QuoteF S. S = QuotePrintRedPoint." In addition to providing a means to reuse circuitry, this dynamic combination, by leaving the two circuits combined intact, allows them to be learned independently of each other. Independent learning helps limit the size of spaces explored by swarms (Bengio et al., 2009) (Figure 15).

While I have previously studied swarm learning for gradients, other variations can be considered, such as structural properties of grammars. In the same way as a grammar can generate a swarm of other grammars with different probabilities, it can generate other grammars with different structural properties, and then learn from the swarm, with branches that fulfill some target function.


FIGURE 14 | Actualization of a metaphor. The first grammar counts two squares, and is described by the second grammar. Each time the first grammar outputs a square, the second grammar transforms it into a circle, thereby allowing counting circles instead of squares. In the second neural circuit, the left part is the original square-counting circuit, the middle part activates the square-counting circuit, and the right part transforms counting squares into counting circles.

If learning the particular precedes the general, a possible but not necessary hypothesis for this argument, hierarchies of activation/recognition grammars are then learned from the bottom up, accumulating experience in the memory of validated grammars until reaching the top grammar. In that model, pattern circuits build up over time, and the discovery of new patterns is an incremental operation.

Methods Summary

I now provide a method allowing better understanding of stochastic self-description, and I explain computation of the grammatical and neural diagrams.

Since grammar patterns are formal and structural, complex ones are better understood by considering them together with an expressive representation. For example, consider self-describing grammar "A = .5 B | .5 C. B = Symbol D. D = .5 E | .5 F. E = Weight G. G = .5 Symbol | .5 H. H = Symbol G. F = E D. C = B A." For legibility, A can be recast as Rules, B as Rule, C as RulesSequence, D as Alternates, E as Alternate, F as AlternatesSequence, G as Symbols, and H as SymbolsSequence. Then the grammar is expressed as "Rules = .5 Rule | .5 RulesSequence. Rule = Symbol Alternates. Alternates = .5 Alternate | .5 AlternatesSequence. Alternate = Weight Symbols. Symbols = .5 Symbol | .5 SymbolsSequence. SymbolsSequence = Symbol Symbols. AlternatesSequence = Alternate Alternates. RulesSequence = Rule Rules." This method can be generalized to any grammar for a better understanding of its function.

Earlier versions of some of the grammars were posted online as unpublished works and are presented here in new versions with permission from the author (du Castel, 2013a,b).

The model is fully implemented as a prototype which I ran to produce the grammatical and neural diagrams of this article as well as all describing grammars. The prototype compiles grammars expressed in Wirth's format into both their graphical expression and the expression of their self-description. These compilations drive execution of the stochastic grammars in two modes: one provides the grammatical diagrams, while the other provides the neural diagrams. This unity of execution ensures that both diagrams are indeed issued from a single description, as presented by the model, and that there is in fact a one-to-one correspondence between the grammars and their neural representations.
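As an illustration of the recast root grammar presented in the Methods Summary, the sketch below holds that grammar as data and samples it to generate the skeleton of an arbitrary grammar. Treating Symbol and Weight as terminals that emit fresh placeholder names and random weights is my own reading, for illustration only, and is not the article's prototype.

```python
import random
import itertools

# The recast root grammar as data: each non-terminal maps to its alternates
# (two equiprobable alternates where the grammar writes .5 | .5).
ROOT = {
    "Rules":              [["Rule"], ["RulesSequence"]],
    "RulesSequence":      [["Rule", "Rules"]],
    "Rule":               [["Symbol", "Alternates"]],
    "Alternates":         [["Alternate"], ["AlternatesSequence"]],
    "AlternatesSequence": [["Alternate", "Alternates"]],
    "Alternate":          [["Weight", "Symbols"]],
    "Symbols":            [["Symbol"], ["SymbolsSequence"]],
    "SymbolsSequence":    [["Symbol", "Symbols"]],
}

fresh = ("S%d" % i for i in itertools.count(1))   # placeholder symbol names

def sample(symbol):
    if symbol == "Symbol":
        return next(fresh)                        # emit a fresh symbol
    if symbol == "Weight":
        return "%.2f" % random.random()           # emit a random weight
    return " ".join(sample(s) for s in random.choice(ROOT[symbol]))

random.seed(3)
print(sample("Rules"))   # a randomly drawn grammar skeleton of symbols and weights
```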


FIGURE 15 | Composition. Grammars 1 and 3 are combined by grammar 2. The neural circuit of grammar 2 activates the neural circuit of grammar 1, and then the neural circuit of grammar 3, joining them to make them interdependent.
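The quote/unquote mechanism behind Figures 14 and 15 can be sketched as executing a grammar held as data while selectively retargeting its terminals, leaving the described grammar intact; the dictionary encoding and the remap argument below are assumptions of this sketch, not the article's operators.

```python
# Sketch of the quote/unquote idea used for metaphor and composition: the
# described grammar is held as data (quoted), and a describing layer executes
# it while intercepting chosen terminals and forwarding them to another
# action, leaving the described grammar itself unchanged.
counting = {"A": ["B", "C"], "B": ["DrawSquare"], "C": ["DrawSquare"]}   # counts two squares

def execute(grammar, symbol, emit, remap=None):
    remap = remap or {}
    if symbol in grammar:                        # quoted non-terminal: expand its rule
        for s in grammar[symbol]:
            execute(grammar, s, emit, remap)
    else:                                        # terminal: unquote, possibly retargeted
        emit(remap.get(symbol, symbol))

out = []
# Metaphor: retarget the square-counting pattern to circles without editing it.
execute(counting, "A", out.append, remap={"DrawSquare": "DrawCircle"})
print(out)   # ['DrawCircle', 'DrawCircle']
```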

The prototype is quite elaborate in its treatment of grammars, as it handles all the cases of this article and more, but is less so in its handling of sampling and distributions, where it is far from the state of the art. While the prototype implementation is not in a form appropriate for general use, I present next a screen shot showing both the core execution code and the results of the execution on a very simple example (Figure 16).

My implementation of self-describing activation/recognition stochastic grammars, which affords universal computation, meta-circularity, and probabilities, belongs, however modestly, to a general class of languages well-represented by PRISM (Sato and Kameya, 1997), which is a probabilistic offspring of Prolog, and Church (Goodman et al., 2008), which is a probabilistic offspring of Lisp. From a theoretical perspective, a difference between the three is that PRISM is oriented toward logic, Church toward general modeling, while the present implementation is oriented toward properties discussed here: activation/recognition symmetry, grammar/neural diagrams, grammatical inference, metaphors, and composition. Being universal, the three languages support each other's computational model, but in each case less naturally than with the original. To further illustrate this point, a language like Mathematica (Wolfram, 1988) presents similar capabilities but is rather a natural fit to mathematics at large. Perhaps anecdotally, I suggest that the less natural character of advanced logic, probabilistic modeling, or sophisticated mathematics transcribed by self-describing activation/recognition stochastic grammars reflects well the reality of human performance, where each of these activities involves pragmatic graduate-level experience, whereas grammars are arguably inherently a construct of early childhood and a more natural human experience.


FIGURE 16 | This is the implementation. The left side shows executing code, while the right side shows the result of execution for a simple grammatical and neural description.

This said, from a practical perspective, my prototype could be implemented as a new module of PRISM and Church, focused on the self-describing activation/recognition grammatical and neural functions of this article but benefiting from the statistical capabilities and support found in these two systems.

Results

While grammars are one of the oldest subjects of scientific study, neither activation/recognition grammars nor the theoretical model of stochastic grammars recursively describing hierarchies of other grammars has, to my knowledge, been studied before. This model describes circuits in stochastic relationships; furthermore, it describes that very description, and keeps doing so recursively up to self-description. At each recursion, patterns can be produced, recognized, and learned via reinforcement and composition, building up from past descriptions.

Following is a recapitulation of the figures in this article showing grammars, their schema, and their neural circuit, with comments now framed by the whole of this article:

Figure 1: Activation/recognition grammars correspond to neural circuits, with recognition represented by cell input, and activation by cell output.
Figure 2: Activation/recognition grammars express both parallel and sequential execution in neuronal circuits.
Figure 3: Circuits of activation/recognition grammars represent concepts in both space and time.
Figure 4: Probabilities of activation/recognition grammars map synaptic probabilities.
Figure 5: Recursion in activation/recognition grammars corresponds to recurrence in neural circuits, via autapses or regular synapses.
Figure 6: An activation/recognition grammar and its neural circuit can represent a Turing machine, asserting universal computation.
Figure 7: Activation/recognition grammars with varying terminals express a metaphor in neural circuits.
Figure 8: By varying probabilities, activation/recognition grammars and their circuits can express gradient patterns.
Figure 9: An activation/recognition grammar can describe another one, and its neural circuit activates the circuit of the grammar described.
Figure 10: Activation of one circuit by another projects probabilities from the latter to the former.
Figure 11: An activation/recognition grammar can express a swarm of similar circuits.
Figure 12: A branch of a swarm of neural circuits enables reinforcement learning by forwarding results to another circuit.
Figure 13: The root grammar can describe any other grammar, including self, and ditto for the corresponding root neural circuit.


Figure 14: A self-describing neural circuit activates a metaphor by directing the output of another neural circuit.
Figure 15: Any two circuits can be connected dynamically by a third one using self-description.

With this general exhibit of properties of activation/recognition grammars and their neural circuits, a model emerges that allows neural computations to be expressed in the well-researched mathematical framework of grammar theory. With their corresponding neural circuits, self-describing activation/recognition grammars form a biologically-inspired model that can produce, recognize, and learn neurons and circuits of neurons from a single source, suggesting a unitary model for their development and function.

Discussion

Theory

My goal was to provide a unified model that augments and completes Kurzweil's theory. Activation/recognition grammars further that theory by allowing both activation and recognition of patterns; this corresponds to neural activation and recognition functions, including motor and sensory stimuli.

Stochastic grammars are one step up from Kurzweil's HHMMs, allowing full recursion, which maps naturally with neural circuits that are analogically recurrent (Gilbert and Li, 2013). The combination of activation/recognition with probabilities warrants adaptive pattern processing that exploits the stochastic nature of synaptic connections (Staras and Branco, 2009).

By using activation/recognition stochastic grammars with swarming patterns, it is possible to include learning in the model. By combining self-description with these properties, learning extends from simple patterns to patterns of patterns, allowing the treatment of consistency checking as yet another pattern.

Stochastic grammars can both describe patterns and generate new ones, using a combinatorial learning process that allows varying patterns for purpose. New patterns are modifications of existing ones, which expresses Kurzweil's metaphoric model of discovery.

Validation

I now discuss, in regard to validation, three claims of the new theory, which are: patterns are governing throughout; activation and recognition share descriptions; and patterns are self-describing. In terms of neural computation, the algebra is that of stochastic grammars; neural circuits implement their rules; and computation stems from recursively nested descriptions.

Back-propagation (LeCun et al., 1998) and spiking (Maass, 1997) networks are perhaps paragons of current computational artificial neural networks. The former provides strong results on recognition benchmarks while aiming at biological plausibility (Bengio et al., 2015); the latter roots more in biology while aiming at matching the former in performance (Schmidhuber, 2015). Neither addresses activation/recognition symmetry nor self-description.

I suggest, however, that activation/recognition symmetry and self-description are not in conflict with current artificial networks, but are rather complementary to them, so that the way forward may be a mix of current practice blended with the new requirements. Another possibility is that a new brand of analytical (Carpenter and Grossberg, 1987; Herbort and Butz, 2012) neural networks based on stochastic grammars and their distributional and sampling properties emerges with millions of diachronically and synchronically learned patterns (Rodríguez-Sánchez et al., 2015). Additionally, the same principles can be applied to structuring biological artificial neural networks (Markram, 2012).

The symmetry of activation and recognition has been theorized in particular with the introduction of common coding (Prinz, 1990), and of a shared circuit model (Hurley, 2007) supported by the discovery of mirror neurons (Gallese et al., 1996). The new theory supports activation and recognition sharing a same circuit as well as them sharing a same description but with different circuitry. It can therefore be proposed as an underlying formalism for both common coding (same description) and the shared circuit model (same circuit). Neurophysiological evidence for these models would then favor the new theory as well. Nevertheless, I think that definite validation entails mapping the circuitry presented in this paper (Mikula and Denk, 2015).

While stochastic grammatical self-description is new, it finds a home in theories of neural reuse (Anderson, 2010) and cultural recycling (Dehaene and Cohen, 2007). In both theories, it has been recognized that co-optation of an existing circuit for a new function does not negate pre-existing functions. For example, hand gestures accompanying speech reflect the source domain of a spoken metaphor (McNeill, 1992; Marghetis et al., 2014). This is in accord with the activation mechanism of this article, which only adds, not substitutes, to the output of a neuron. New and old functions coexist, and may or may not find an outward expression, depending on activation of downward circuitry. When it does, behavior consistent with the theory ensues, such as gestures associated with speech. General validation, however, would consist in a broad understanding of conditions under which situations of co-opted neural output actually affect behavior.

With the above examples a comprehensive underpinning model of neural circuitry emerges, but ultimate validation of the theory relates to the unity of the brain. Indeed, the theory suggests that neural circuits are composed by self-description into circuitry of increasing generality, up to the root stochastic grammar describing all stochastic grammars. Should neuroscience reach the point where neural circuits can be followed along that path (Laumonnerie et al., 2015), direct validation of the theory would then be effected, arguably advancing a formal version of Aristotle's hylomorphism.

Conclusion

I must emphasize that the Pattern Activation/Recognition Theory of Mind should reflect more than the neuronal circuitry that I have considered. In particular, neuron/interneuron distinctions (DeFelipe et al., 2013), columns and layers arrangements, modular specialization, and brain waves, are to be part of an elaboration of the model. While I do not know that any one or more of these properties would invalidate the model at this point, I would, a contrario, like them to reinforce it.

But crucially, the model predicts that we should find in the brain self-descriptive neural circuits of the kind modeled by the theory. That would be the discovery that shows that this new model of the brain answers the millenary quest for understanding the unity of mind.

Acknowledgments

I am indebted to four colleagues for discussions that encouraged me as I was finding my way to this article. In chronological order, Harry Barrow, with stochastic grammars and vision, Mike Sheppard, with Bayesian reasoning and teleology, Craig Beasley, with marine biology and modeling, and Tom Zimmerman, with swarms and learning, pushed me to explore to its theoretical limit my interest in the role of grammars in neuroscience. Additionally, Susan Rosenbaum and Eric Schoen facilitated the publication of this article, and Michel Koenig encouraged me to surmount obstacles and deliver. I also thank Clémente du Castel for her relentless editing. Finally, I am indebted to three Frontiers reviewers for comments that prompted me to improve upon the paper by providing neural schemas for each individual grammar, by developing the validation section, and by presenting the computational implementation of the model, while bringing caution to the biologically-inspired perspective.

References

Anderson, M. L. (2010). Neural reuse: a fundamental organizational principle of the brain. Behav. Brain Sci. 33, 245–313. doi: 10.1017/S0140525X10000853
Bellman, R. E. (1986). The Bellman Continuum, ed R. S. Roth (Singapore: World Scientific).
Bengio, Y., Lee, D. H., Bornschein, J., and Lin, Z. (2015). Towards Biologically Plausible Deep Learning. arXiv:1502.04156.
Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009). "Curriculum learning," in Proceedings of the 26th Annual International Conference on Machine Learning, eds D. Pohoreckyj, L. Bottou, and M. L. Littman (Montreal: ACM), 382.
Booth, T. L. (1969). "Probabilistic representation of formal languages," in IEEE Record of Tenth Annual Symposium on Switching and Automata Theory (Waterloo), 74–81.
Buzsáki, G. (2006). Rhythms of the Brain. Oxford: Oxford University Press.
Carpenter, G. A., and Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Comput. Vision Graphics Image Process. 37, 54–115. doi: 10.1016/S0734-189X(87)80014-2
du Castel, B. (1978). Form and interpretation of relative clauses in English. Linguist. Inq. 9, 275–289.
du Castel, B. (2013a). Formal Biology. Available online at: http://www.researchgate.net/publication/259343850.
du Castel, B. (2013b). Probabilistic Grammars Model Recursive Neural Feedback. Available online at: http://www.researchgate.net/publication/259344086.
du Castel, B., and Mao, Y. (2006). "Generics and Metaphors Unified under a Four-Layer Semantic Theory of Concepts," in Acts of the Third Conference on Experience and Truth, ed T. Benda (Taipei: Soochow University Press).
Chi, Z. (1999). Statistical properties of probabilistic context-free grammars. Comput. Linguist. 25, 131–160.
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.
Chomsky, N., and Schützenberger, M.-P. (1963). "The algebraic theory of context-free languages," in Computer Programming and Formal Systems, eds P. Braffort and D. Hirschberg (North Holland; New York, NY), 118–161.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behav. Brain Sci. 36, 181–204. doi: 10.1017/S0140525X12000477
Dasgupta, S., Das, S., Biswas, A., and Abraham, A. (2010). Automatic circle detection on digital images with an adaptive bacterial foraging algorithm. Soft Comput. 14, 1151–1164. doi: 10.1007/s00500-009-0508-z
DeFelipe, J., López-Cruz, P. L., Benavides-Piccione, R., Bielza, C., Larrañaga, P., Anderson, S., et al. (2013). New insights into the classification and nomenclature of cortical GABAergic interneurons. Nat. Rev. Neurosci. 14, 202–216. doi: 10.1038/nrn3444
Dehaene, S., and Cohen, L. (2007). Cultural recycling of cortical maps. Neuron 56, 384–398. doi: 10.1016/j.neuron.2007.10.004
de Saussure, F. (1916). "Principes généraux," in Cours de Linguistique Générale, eds C. Bailly, A. Sechehaye, and A. Riedlinger (Lausanne: Payot), 98–101.
Fine, S., Singer, Y., and Tishby, N. (1998). The hierarchical hidden Markov model: analysis and applications. Mach. Learn. 32, 41–62. doi: 10.1023/A:1007469218079
Gallese, V., Fadiga, L., Fogassi, L., and Rizzolatti, G. (1996). Action recognition in the premotor cortex. Brain 119, 593–609. doi: 10.1093/brain/119.2.593
Gilbert, C. D., and Li, W. (2013). Neural correlations, population coding and computation. Nat. Rev. Neurosci. 14, 350–363. doi: 10.1038/nrn3476
Gödel, K. (1931/1986). "On formally undecidable propositions of Principia Mathematica and related systems," in Collected Works: Vol. I, ed S. Feferman (Oxford: Oxford University Press), 144–195.
Goodman, N. D., Mansinghka, V. K., Roy, D. M., Bonawitz, K., and Tenenbaum, J. B. (2008). "Church: a language for generative models," in Proceedings of the 24th Conference in Uncertainty in Artificial Intelligence, eds D. McAllester and P. Myllymaki (Corvallis: AUAI Press), 220–229.
Harris, Z. S. (1968). Mathematical Structures of Language. New York, NY: John Wiley and Son.
Herbort, O., and Butz, M. V. (2012). Too good to be true? Ideomotor theory from a computational perspective. Front. Psychol. 3:494. doi: 10.3389/fpsyg.2012.00494
Hurley, S. (2007). The shared circuits model: how control, mirroring and simulation can enable imitation, deliberation, and mindreading. Behav. Brain Sci. 31, 1–22. doi: 10.1017/S0140525X07003123
Joshi, P., and Sontag, E. D. (2007). Computational aspects of feedback in neural circuits. PLoS Comput. Biol. 3:e165. doi: 10.1371/journal.pcbi.0020165
Kennedy, J., and Eberhart, R. (1995). "Particle swarm optimization," in IEEE Record of International Conference on Neural Networks (Perth, WA), 1942–1948.
Kurzweil, R. (2012). How to Create a Mind: The Secret of Human Thought Revealed. New York, NY: Viking Books.
Lakoff, G. (1979). "The contemporary theory of metaphor," in Metaphor and Thought, ed A. Ortony (Cambridge: Cambridge University Press), 203–204.
Laumonnerie, C., Tong, Y. G., Alstermark, H., and Wilson, S. I. (2015). Commissural axonal corridors instruct neuronal migration in the mouse spinal cord. Nat. Commun. 6, 7028. doi: 10.1038/ncomms8028
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). "Gradient-based learning applied to document recognition," in Proceedings of the IEEE, Vol. 86 (Piscataway, NJ), 2278–2324.
Libertus, M. E., Feigenson, L., and Halberda, J. (2011). Preschool acuity of the approximate number system correlates with school math ability. Dev. Sci. 14, 1292–1300. doi: 10.1111/j.1467-7687.2011.01080.x
Maass, W. (1997). Networks of spiking neurons: the third generation of neural network models. Neural Netw. 10, 1659–1671. doi: 10.1016/S0893-6080(97)00011-7
Marghetis, T., Núñez, R., and Bergen, B. K. (2014). Doing arithmetic by hand: hand movements during exact arithmetic reveal systematic, dynamic spatial processing. Q. J. Exp. Psychol. 67, 1579–1596. doi: 10.1080/17470218.2014.897359
Markram, H. (2012). The human brain project. Sci. Am. 306, 50–55. doi: 10.1038/scientificamerican0612-50
McCarthy, J. (1960). Recursive functions of symbolic expressions and their computation by machine, Part I. Commun. ACM 3, 184–195. doi: 10.1145/367177.367199
McNeill, D. (1992). Hand and Mind: What Gestures Reveal About Thought. Chicago, IL: University of Chicago Press.
Mikula, S., and Denk, W. (2015). High-resolution whole-brain staining for electron microscopic circuit reconstruction. Nat. Methods 12, 541–546. doi: 10.1038/nmeth.3361
Montague, R. (1974). Formal Philosophy. New Haven, CT: Yale University Press.
Mumford, D., and Desolneux, A. (2010). Pattern Theory: The Stochastic Analysis of Real-World Signals. Natick, MA: A. K. Peters, Ltd.
Penrose, R. (1989). The Emperor's New Mind. Oxford: Oxford University Press.
Prinz, W. (1990). A Common Coding Approach to Perception and Action. Berlin: Springer.
Rodríguez-Sánchez, A., Neumann, H., and Piater, J. (2015). Beyond simple and complex neurons: towards intermediate-level representations of shapes and objects. Künstliche Intell. 29, 19–29. doi: 10.1007/s13218-014-0341-0
Sato, T., and Kameya, Y. (1997). "PRISM: a language for symbolic-statistical modeling," in Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (Nagoya), 1330–1339.
Schmidhuber, J. (2015). Deep learning in neural networks: an overview. Neural Netw. 61, 85–117. doi: 10.1016/j.neunet.2014.09.003
Siegelmann, H. (1995). Computation beyond the Turing limit. Science 268, 545–548. doi: 10.1126/science.268.5210.545
Smith, N., and Johnson, M. (2007). Weighted and probabilistic context-free grammars are equally expressive. Comput. Linguist. 33, 477–491. doi: 10.1162/coli.2007.33.4.477
Staras, K., and Branco, T. (2009). The probability of neurotransmitter release: variability and feedback control at single synapses. Nat. Rev. Neurosci. 10, 373–383. doi: 10.1038/nrn2634
Stock, A., and Stock, C. (2004). A short history of ideo-motor action. Psychol. Res. 68, 176–188. doi: 10.1007/s00426-003-0154-5
Turing, A. (1936). "On computable numbers, with an application to the Entscheidungsproblem," in Proceedings of the London Mathematical Society (London), 2, 230–265.
van der Loos, H., and Glaser, E. M. (1972). Autapses in neocortex cerebri: synapses between a pyramidal cell's axon and its own dendrites. Brain Res. 48, 355–360. doi: 10.1016/0006-8993(72)90189-8
von Mammen, S., and Jacob, C. (2009). The evolution of swarm grammars – growing trees, crafting art and bottom-up design. Comput. Intell. 4, 10–19. doi: 10.1109/MCI.2009.933096
Wirth, N. (1977). What can we do about the unnecessary diversity of notation for syntactic definitions? Commun. ACM 20, 822–823. doi: 10.1145/359863.359883
Wolfram, S. (1988). Mathematica: A System for Doing Mathematics by Computer. Boston, MA: Addison-Wesley.

Conflict of Interest Statement: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 du Castel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
