
Article

A Brain-Inspired Goal-Oriented Robot Navigation System

Qiuying Chen and Hongwei Mo *

College of Automation, Harbin Engineering University, Harbin 150001, China; [email protected]
* Correspondence: [email protected]

 Received: 10 September 2019; Accepted: 11 November 2019; Published: 14 November 2019 

Abstract: Autonomous navigation in unknown environments is still a challenge for robotics. Much effort has been devoted to developing truly autonomous goal-oriented robot navigation models based on the neural mechanisms of spatial cognition and mapping in animals' brains. Inspired by the Semantic Pointer Architecture Unified Network (SPAUN) neural model and neural navigation mechanisms, we developed a brain-like, biologically plausible mathematical model and applied it to robotic spatial navigation tasks. The proposed cognitive navigation framework adopts a one-dimensional ring attractor to model the head-direction cells, uses a sinusoidal interference model to obtain the grid-like activity pattern, and derives the optimal movement direction from the population activity. The application of adaptive resonance theory (ART) effectively reduces resource consumption and solves the stability-plasticity problem of dynamic network adjustment. This brain-like system model broadens the perspective for developing more powerful autonomous robotic navigation systems. The proposed model was tested under different conditions and exhibited superior navigation performance, proving its effectiveness and reliability.

Keywords: grid cell; place cell; autonomous navigation; brain-like system

1. Introduction

Animals have innate skills to navigate in unfamiliar environments [1]. It seems effortless for animals such as rats and primates but has been a fundamental long-term challenge for robots. Developing a brain-like navigation model based on the brain navigation mechanism of animals has therefore attracted increasing attention.

The hippocampus, one of the most important brain regions, which codes for direction, distance, and speed signals of movement, is thought to play a pivotal role in place learning, navigation, and exploration. Place cells in the CA1 field of the hippocampus have a conspicuous functional characteristic of location-specific activity. They have a high firing rate when the animal is in a particular location and may form a map-like representation of the environment, called the grid-cell space. Animals can use the grid-cell space for efficient navigation [2]. The grid cells found by Moser are also key cells for navigation [3–5]. Grid cells in the entorhinal cortex (EC) are arranged in strikingly periodic firing fields. A regularly tessellating triangular grid-like pattern was found to cover an entire environment, regardless of external cues. This suggests that grid cells support path integration by internally generating a mapping of the spatial environment, which has been demonstrated with attractor network models [6,7]. Head-direction (HD) cells exist in the postsubiculum and the EC [8]. Like a compass, they provide a biologically plausible directional input for the brain-like navigation system [9]. The discharge frequency of speed cells found in the medial entorhinal cortex (MEC) is regulated by running speed, which could dynamically represent the animal's location [10]. Integrating HD cells and speed cells, animals have knowledge of their direction and speed of motion and can track their idiothetic motion [11].

Appl. Sci. 2019, 9, 4869; doi:10.3390/app9224869 www.mdpi.com/journal/applsci

Navigation also involves memory and retrieval of environmental information, which is considered a cognitive mechanism for storing and processing information. A navigation system with spatial cognition and mapping can better achieve rapid and stable navigation. Anatomically, the neural responses of working memory are found in large areas of the parietal and frontal cortex. Accumulating evidence suggests that the posterior parietal cortex (PPC) and the dorsolateral prefrontal cortex (DLPFC) in the cerebral cortex are closely involved in working memory [12–14], temporary storage, and the manipulation of data related to cognitive control [15]. Adaptive resonance theory (ART) is a cognitive and neural theory of how the brain autonomously learns to categorize and recognize events in a changing world. In this paper, the ART mechanism was used to handle the retrieval and storage of environmental information.

Rewards are a positive incentive for the desired action: they increase the probability that the previous action is selected again and guide the animal to reach the final goal more easily. Anatomically, the orbitofrontal cortex (OFC) is involved in the processing of reward signals [16]. In this paper, rewards are represented by the value functions of Q-learning to aid navigation.

Numerous computational models have been proposed based on place cells and grid cells. Each place cell has its own firing field and, taken together, the firing fields of place cells tend to cover the entire surface of the environment. The firing fields of place cells have mainly been modeled as Gaussian functions [17–19]. The large majority of grid cell models fall into three classes: oscillatory interference [20–22], continuous attractor networks [6,7,23], and self-organizing models [24–26]. These models concentrate only on the formation of place cells and grid cells, do not really apply to navigation, and are not instructive for autonomous robot navigation.
Many robot navigation systems use a brain-like computational mechanism to complete navigation tasks. Milford developed a bionic navigation algorithm called RatSLAM [27,28]. It combines head-direction cells and place cells to form an abstract pose cell, which is mainly used for accurate positioning. Zeng and Si [29] proposed a cognitive mapping model in which HD cells and grid cells use the same working mechanism; however, a place cell network is not included, and their model could not provide metric mapping. Tang [30] developed a hierarchical brain-like cognitive model with a particular focus on the spatial coding properties of the EC, which does not include a navigation strategy or cognitive mapping. Moreover, none of the above models contains the biological mechanism of memory storage and retrieval, which has an important effect on effective navigation. In this paper, ART is adapted to deal with working memory; based on matching learning, ART is an effective approach to handle the dynamic stability of the network and obtain better navigation performance. Eliasmith [31] proposed a unified set of neural mechanisms, the Semantic Pointer Architecture Unified Network (SPAUN), which can perform a wide variety of tasks. The structure of SPAUN is divided into three layers: input information processing, working memory, and action execution. Inspired by the framework of SPAUN's neural structure and the neural mechanisms of biological brain navigation, we propose a novel three-layer brain-like overall structure for cognitive navigation, which is a supplement to SPAUN. The proposed navigation model can autonomously complete navigation tasks.

2. Mathematical Model

2.1. System Structure

Inspired by the SPAUN neural structure and the mechanism of neural navigation in animals' brains, a brain-like biologically plausible cognitive model was proposed and applied to robotic spatial navigation tasks. The model's architecture is shown in Figure 1.

[Figure 1 diagram: cerebral cortex (PPC, DLPFC; OFC: motivation and feedback; ART process) ↔ hippocampal region (HD cells and speed cells → EC III 1D-ring integration → EC II → DG → CA3 → CA1 → EC V/VI processing) → basal ganglia (action selection).]

Figure 1. System diagram of navigation model based on Semantic Pointer Architecture Unified Network (SPAUN) and brain neural system. This framework deals with data from input signals to final action generation to achieve autonomous navigation. The model consists of three layers: hippocampal region, dealing with input signals (head direction and instantaneous speed signals) by path integration to express position; cerebral cortex, creating and modifying the topological map of environmental information and passing the reward information to the decision-making layer; and basal ganglia, determining the final action by winner-take-all rule. Meaning of abbreviations: DG, dentate gyrus; CA3 and CA1, internal areas of the hippocampus; EC, entorhinal cortex; PPC, posterior parietal cortex; DLPFC, dorsolateral prefrontal cortex.

The model includes three layers: cerebral cortex, hippocampal region, and basal ganglia. The basal ganglia determine and perform the most highly recommended behavior, in this case, leading the robot to the target. The cerebral cortex mainly handles the working memory and reward function. When new knowledge is learned, the memory capacity of ART can increase adaptively without destroying the previous knowledge of the network. The ART method can build environmental information dynamically and stably. In this paper we used the ART network to dynamically process information, helping to achieve rapid and stable navigation.

Sitting between the cerebral cortex and basal ganglia is the hippocampal region, which includes the hippocampus and the EC. This plays a crucial role in spatial learning and navigation [32]; here, external information (HD and speed information) is converted into expressions of the internal environment. The hippocampal region is an area where grid cells, HD cells, and speed cells converge to transfer place information to the place cells.

Initially, speed cells and HD cells are inputs to layer III of the EC, which handles the ring attractor pre-processing. Then, they go through layer II of the EC, where they obtain the grid cells' activities. The grid cells in EC II pass through the perforant pathway into the dentate gyrus (DG). Then, the granular cells in the DG are associated with the CA3 of the hippocampus by mossy fibers. CA3 projects to the CA1 by Schaffer collaterals. The deep V/VI layer performs top-down feedback from the hippocampus to the EC to help stabilize memories.

2.2. Model Description

Motion information is obtained by integrating direction and speed signals, then updating the self-position. This paper uses a 1D ring attractor to model the HD cells:

$HD_i = \begin{bmatrix} \cos(\phi_{i,1}) & \sin(\phi_{i,1}) \\ \vdots & \vdots \\ \cos(\phi_{i,m}) & \sin(\phi_{i,m}) \end{bmatrix}$ (1)

th th where φi,j is the preferred direction of the j cell in the i ring attractor.

$\phi_{i,j} = \dfrac{2\pi j}{m}$ (2)

where $j = 1, \ldots, m$, and $m = 6$ is the number of neurons in a single attractor. Different rings have different preferred directions. There are three ring attractors, and the angle difference between them is $\pi/3$. Suppose that at time $t$ the robot is moving with speed $V(t)$. The velocity projected onto the different ring attractors is:

$v_i(t) = HD_i * V(t)$ (3)

Given the velocity, we can figure out the wavelength of the sine function:

$\lambda = \dfrac{v}{f}$ (4)
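The geometry of Equations (1)–(3) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; in particular, the convention of rotating each ring by π/3 and all variable names are our assumptions:

```python
import numpy as np

def ring_attractor(i, m=6, n_rings=3):
    """Eqs. (1)-(2): preferred directions phi_{i,j} = 2*pi*j/m for the i-th
    ring; rings are assumed to be rotated from one another by pi/3."""
    phi = np.arange(1, m + 1) * 2 * np.pi / m + i * np.pi / n_rings
    return np.column_stack([np.cos(phi), np.sin(phi)])  # the m x 2 matrix HD_i

def projected_speed(hd, velocity):
    """Eq. (3): v_i(t) = HD_i * V(t), projecting the 2D velocity onto each
    cell's preferred direction."""
    return hd @ velocity

hd0 = ring_attractor(0)
speeds = projected_speed(hd0, np.array([0.2, 0.0]))  # robot heading east at 0.2 m/s
```

Each row of `hd0` is a unit vector, so no projected speed can exceed the robot's actual speed.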

If two sinusoidal functions of different frequencies interfere with each other, they generate beat-frequency oscillations with a fixed spatial distance (as shown in Figure 2b). The beat frequency is equal to the difference between the two frequencies:

$f_b = f_i - f_0$ (5)

Then the wavelength of the periodic oscillation is:

$\lambda_i = \dfrac{v}{f_b} = \dfrac{v}{f_i - f_0}$ (6)

$\lambda_i$ is a fixed value and represents the distance between vertices in the grid-cell space. From Equation (6), we derive:

$f_i = f_0 + \dfrac{v}{\lambda_i}$ (7)

Set $f_0 = 7$ Hz as the reference frequency. The beat-frequency oscillation is integrated into the corresponding phase in the different ring attractors (as shown in Figure 2a):

$\alpha_i(t) = 2\pi \displaystyle\int_0^t f_i(T)\,dT$ (8)
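Under the assumption of constant speed, Equations (5)–(8) reduce to a few arithmetic steps. This sketch (with our own variable names) checks that choosing $f_i$ via Equation (7) indeed yields a beat wavelength equal to the desired grid spacing:

```python
import numpy as np

F0 = 7.0  # reference frequency f0 = 7 Hz, as set in the text

def cell_frequency(v, lam):
    """Eq. (7): f_i = f0 + v / lambda_i for a desired vertex spacing lambda_i."""
    return F0 + v / lam

def beat_wavelength(v, f_i):
    """Eqs. (5)-(6): lambda_i = v / f_b with beat frequency f_b = f_i - f0."""
    return v / (f_i - F0)

def phase(f_i, t):
    """Eq. (8): alpha_i(t) = 2*pi * integral of f_i over [0, t]; at constant
    speed f_i is constant, so the integral reduces to f_i * t."""
    return 2.0 * np.pi * f_i * t

fi = cell_frequency(0.1, 0.2)  # 0.1 m/s, 0.2 m grid spacing
```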

Converting phase-coded into rate-coded to express the position signal, the activity of the $j$-th cell in the $i$-th ring is:

$S_{i,j}(t) = \dfrac{\sin(\alpha_i(t) + \phi_{i,j}) + 1}{2}$ (9)

With comprehensive input of ring attractors with different preferred orientations, the amplitude of the beat-frequency oscillations will be recognized at each $x \pm k\lambda_i$ (where $x = V(t) \times t$). This means that when the robot is moving linearly, it will produce grid-like activity with vertex spacing $\lambda_i$ (as shown in Figure 2c). The activity of grid cells is:

$g_{i,j}(t) = \prod S_{i,j}(t)$ (10)


Figure 2. (a) Illustration of the ring attractor mechanism for grid cells. The attractor model was simplified into a one-dimensional linear ring attractor model. Suppose a mouse moves in one direction along a straight path. With different distances from the apex, each attractor can be expressed by the activity of a ring attractor. (b) Schematic diagram of the periodic oscillation obtained by interference of different frequency waveforms: $y_1 = \sin(14\pi x + \pi/2)$; $y_2 = \sin(16\pi x + 3\pi/2)$; $y_3 = y_1 + y_2$. The oscillating beat interference will produce a fixed-interval beat $y_4$, and the amplitude of the beat oscillation will be identified at each attractor. (c) Illustration of how the hexagon-like firing mode was generated: trajectory of random movement of the robot (blue) and grid-like firing mode generated by this paper's method (red).

The place cells accept the inputs of the grid cells' population with different activities. The place cells are activated when the inputs are greater than the threshold $h$:

$\psi(g) = \begin{cases} 1 & g > h \\ 0 & \text{otherwise} \end{cases}$ (11)

2.3. Navigation Strategy

There are two ways to guide the robot to the goal in an unknown environment: select the direction according to the reward signal or according to the Q-values of the action cells. Initially, the robot randomly explores the environment and receives a reward signal after reaching the target. If there are no reward signals within the search scope, it selects a direction according to the Q-values; otherwise, the robot follows the reward signals.
If the reward signals can be detected within the step range, the value of the reward signals will increase according to Equation (12). With each step, all nodes update their reward signals. The direction with the maximum sum of reward signals will be chosen.

$r(t) = r(t-1) + \Delta r$ (12)

If the reward signal is zero at the present position, the robot selects the direction according to the Q-values. In our system, there are eight action directions (at 45° intervals). Each action cell receives input from all activated place cells. According to a winner-takes-all mechanism, the direction with the maximum mean of Q-values is the optimal output direction.

$Q_o = \dfrac{\sum_i w_{i,o}\,\psi_i(g)}{\sum_i \psi_i(g)}$ (13)

where $w_{i,o}$ expresses the weight of place cell $i$ in action direction $o$. The weights of the winning neuron are modified according to the following equation:

$\Delta w_{i,o} = \alpha\,(r + \beta \max(Q_o) - w_{i,o})$ (14)

If both the Q-value and the reward signal are zero at the present position, the robot will move randomly. In this case, the robot keeps its direction with a probability of $1 - \xi$. With a probability of $\xi$, it randomly chooses a new direction with the aim of finding other, shorter routes.

2.4. Network Dynamic Adjustment

The stability-plasticity dilemma, whereby the brain learns quickly and stably without catastrophically forgetting its past knowledge [33], exerts a significant influence on stable robot navigation. In our system, nodes were sparsely added to the cognitive map during robot exploration. We combined the ART network [34] with the robot navigation system not only to solve the problem of stable plasticity of the network dynamic adjustment, but also to enable the robot to complete navigation tasks autonomously.

In this paper, ART (simultaneously working and learning) was slightly modified to evaluate whether nodes should be added to the map. The fundamental structure of ART is shown in Figure 3. The algorithm flow is as follows:

(1) The network receives input $x_i$.
(2) Compute the match degree: calculate the distance between $x_i$ and the other points $b_j$ in the map:

$net_j = \sum_{j=1}^{n} \left\| b_j - x_i \right\|$ (15)

(3) Choose the best matching neuron. Obtain the winning neuron by competition:

$net_{j^*} = \min_j \{ net_j \} = \min_j \left\{ \sum_{j=1}^{n} \left\| b_j - x_i \right\| \right\}$ (16)

(4) Inspect the vigilance threshold. If $net_{j^*} < \rho$, compare the rewards of the two nodes, add the node with the bigger reward to the map, then update the map connections. Else, if a random probability is greater than 0.25, add the winning neuron to the map.
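The four steps above can be sketched as follows. This is our reading of the modified ART rule (in particular, treating a no-match input as the candidate node to recruit), with illustrative names and the ρ = 12 and 0.25 values taken from the text:

```python
import numpy as np

RHO = 12.0     # vigilance threshold rho (value from Table 1)
P_GATE = 0.25  # random gate from step (4)

def art_update(nodes, rewards, x, r_x, rng):
    """One modified-ART step (Eqs. (15)-(16)): match input x against the
    stored map nodes and decide whether the cognitive map gains or updates
    a node. `nodes`/`rewards` are parallel lists; names are illustrative."""
    if not nodes:
        nodes.append(x); rewards.append(r_x)
        return
    nets = [float(np.linalg.norm(np.subtract(b, x))) for b in nodes]  # Eq. (15)
    j = int(np.argmin(nets))                                          # Eq. (16)
    if nets[j] < RHO:
        # resonance: keep whichever of the two nodes carries the bigger reward
        if r_x > rewards[j]:
            nodes[j], rewards[j] = x, r_x
    elif rng.random() > P_GATE:
        nodes.append(x); rewards.append(r_x)   # sparsely recruit a new node

rng = np.random.default_rng(0)
nodes, rewards = [], []
art_update(nodes, rewards, (0.0, 0.0), 0.0, rng)    # seeds the map
art_update(nodes, rewards, (1.0, 1.0), 0.0, rng)    # close match: map unchanged
art_update(nodes, rewards, (90.0, 90.0), 0.0, rng)  # distant input: may be added
```

Only inputs far from every stored node (relative to ρ) can create new nodes, which keeps the map sparse during exploration.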

[Figure 3 diagram: the input $x_i$ enters a comparison layer that exchanges expected values with a recognition layer under gain control; a reset signal governed by the vigilance parameter $\rho$ performs precision adjustment.]

Figure 3. The fundamental structure of adaptive resonance theory.
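The action-selection rules of Section 2.3 (Equations (11), (13), and (14)) can be sketched as below. Whether the update of Equation (14) is gated by the active place cells is our assumption, and all array shapes and values are illustrative:

```python
import numpy as np

ALPHA, BETA, H = 0.7, 0.7, 0.5   # alpha, beta, and threshold h from Table 1
N_ACTIONS = 8                    # eight movement directions at 45-degree steps

def active_places(g):
    """Eq. (11): psi(g) = 1 where the grid-cell input exceeds the threshold h."""
    return (np.asarray(g) > H).astype(float)

def q_values(w, psi):
    """Eq. (13): Q_o = sum_i w_{i,o} psi_i(g) / sum_i psi_i(g)."""
    return (w.T @ psi) / psi.sum()

def update_winner(w, psi, o, r, q):
    """Eq. (14): dw_{i,o} = alpha (r + beta max(Q) - w_{i,o}), applied here
    only to the active place cells of the winning direction o (our reading)."""
    w[:, o] += psi * ALPHA * (r + BETA * q.max() - w[:, o])
    return w

w = np.full((5, N_ACTIONS), 0.1)                 # 5 place cells, toy weights
psi = active_places([0.9, 0.2, 0.7, 0.6, 0.1])   # cells 0, 2, 3 are active
q = q_values(w, psi)
o = int(np.argmax(q))                             # winner-takes-all direction
w = update_winner(w, psi, o, 1.0, q)              # reward r = 1 at the goal
```

Only the weights of active place cells toward the winning direction move, so reward information spreads along the visited places.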


3. Results

3.1. Simulation Setup

Numerous researchers have used the Morris water maze experiment [35–38], in which a rat must find a hidden platform in its environment. This has become the standard paradigm for measuring navigation tasks. A dry variant of the Morris water maze experiment was adopted to verify our system model.

The navigation task was completed in a 100 × 100 virtual environment. The robot was placed in the lower left corner to find the target, as shown in Figure 4. Initially, the robot randomly explored the surroundings, and the reward value was set to 1 when it found the goal. Table 1 summarizes the parameters used in the model.


Figure 4. Environmental setup. The N/E at the top left is the environmental direction, N = north, E = east.

Table 1. Parameter setting.

Parameter   α     β     λ       ξ     h     ∆r      ρ
Value       0.7   0.7   20 cm   0.1   0.5   0.005   12

3.2. Simulation Result

The robot started random exploration with an empty cognitive map and continued to update it using the ART process. We report three types of simulation environments to determine the effectiveness of our system model: an ideal situation with no obstacles in the environment (Figure 5), simple obstacles set in the environment (Figure 6), and a changed start position in the simple-obstacle situation (Figure 7).

In the ideal situation with no obstacles, the robot explored until it found the target 30 times. The early path was relatively random due to the absence of reward signals, but with our navigation strategy, it could quickly find a shorter, approximately straight route to the goal after five trials. Moreover, the probability of 0.1 for random exploration was used to increase the opportunity of finding other shorter paths. For example, on the 8th and 16th paths, it tried new directions and avoided getting stuck in a local optimum.


Figure 5. Results of navigation in ideal no-obstacle condition: (a) Navigation trajectories of 30 runs; the first row is runs 1–6, and so on. (b) Number of steps needed to reach the goal. (c) Map nodes (blue circles) established during the exploration.

To further test the performance of the model, we added simple obstacles to the environment (Figure 6). In this situation, the robot must search for the goal and, at the same time, effectively avoid obstacles. Compared to the ideal situation with no obstacles, more exploration steps were needed in the early stage, but with the diffusion of rewards, the robot could quickly find a better path to the goal. In this paper, we aim to outline general navigation principles in different environments, not provide a specific obstacle-avoidance algorithm.


Figure 6. Results of navigation with an immobile obstacle. The red square represents the target, the blue line is the path, and obstacles are shown by "***": (a) Navigation trajectories of 30 runs; the first row is runs 1–6, and so on. (b) Number of steps needed to reach the goal. (c) Map nodes (blue circles) established during the exploration.

The main challenge of robot navigation in an unfamiliar environment is simultaneous localization and mapping (SLAM). After establishing the cognitive map, navigation becomes path planning. The A* algorithm, one of the most popular heuristic search algorithms, is widely used in path optimization. According to the obtained cognitive map shown in Figure 6c, we easily obtained a relatively shorter path using the A* algorithm, before (Figure 7a) and after (Figure 7b) changing the start point.
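As an illustration of this planning step, below is a minimal A* sketch on a 4-connected occupancy grid with a Manhattan heuristic; it is not the paper's implementation, and the toy grid and names are ours:

```python
import heapq, itertools

def astar(grid, start, goal):
    """Minimal A* on a 4-connected occupancy grid (1 = obstacle) with a
    Manhattan-distance heuristic; returns the cell sequence start -> goal."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    tie = itertools.count()                     # tie-breaker for the heap
    frontier = [(h(start), next(tie), 0, start, None)]
    parents, g_cost = {}, {start: 0}
    while frontier:
        _, _, g, cur, par = heapq.heappop(frontier)
        if cur in parents:                      # already expanded
            continue
        parents[cur] = par
        if cur == goal:                         # reconstruct the path
            path = []
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and not grid[nxt[0]][nxt[1]]
                    and g + 1 < g_cost.get(nxt, float("inf"))):
                g_cost[nxt] = g + 1
                heapq.heappush(frontier, (g + 1 + h(nxt), next(tie), g + 1, nxt, cur))
    return None

demo = astar([[0, 0, 0], [1, 1, 0], [0, 0, 0]], (0, 0), (2, 0))
```

With an admissible heuristic such as Manhattan distance on a unit-cost grid, A* returns a shortest path over the map nodes.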


Figure 7. Results of navigation according to the cognitive map; obstacles are shown by "***": (a) We got a shorter path (red line) from the start point to the target. (b) After changing the start point, we quickly got a shorter path (red line) to the target.

3.3.3.3. Contrastive Contrastive Analysis Analysis TheThe steps steps used used to to complete complete eacheach trialtrial areare anan importantimportant indicator indicator of of the the navigation navigation system. system. KulviciusKulvicius [39] [ 39[39]] added addedadded four fourfour pieces piecespieces of of olfactory olfactory info info informationrmationrmation to to toaid aid aid navigation. navigation. navigation. Compared Compared Compared with with with that, that, that, the the thenumbernumber number of of steps ofsteps steps required required required to to reach toreach reach the the thegoal goal goal are are arepr presented presentedesented in in Figure inFigure Figure 8. 8. 8In .In Inour our our system, system, system, having having having no no noolfactoryolfactory olfactory cues cues cues to to guide toguide guide navigation, navigation, navigation, the the first thefirst firstfew few trials fewtrialstrials needed needed needed more more moreexploratory exploratory exploratory steps, steps, steps,but but we we but could could we couldquicklyquickly quickly search search searchthe the relatively relatively the relatively shorter shorter shorterpath. path. This This path. in indicates Thisdicates indicates that that with with that the the withsame same the environmental environmental same environmental cues, cues, we we cues,couldcould weprovide provide could better providebetter performance. performance. better performance. Unlike Unlike in in Unlike[39], [39], the the in proposed [proposed39], the proposedmodel model was was model consistent consistent was consistent with with the the withbiologicalbiological the biological mechanism, mechanism, mechanism, indicating indicating indicating that that it it could could that effectively effectively it could e ffnavigate ectivelynavigate in navigatein the the experiment experiment in the experiment with with immobile immobile with immobileobstacles.obstacles. obstacles.

Figure 8. Comparison of steps required to reach the goal. Bar: Running steps of our model with an immobile obstacle. Nobar: Running steps of our model with no obstacle. Vis+Olf: Running steps of [39] based on visual and olfactory cues. Vis+Olf+Mark: Running steps of [39] based on visual, olfactory, and self-marking cues.

In terms of map building, nodes were sparsely added to the cognitive map during robot exploration. However, in [35] there was no judgment when new nodes were recruited, which leads to high node randomness and does not help further navigation. We added the ART network to solve the problem of stability and plasticity in dynamic network adjustment. Based on the matching learning method, the ART process can effectively save resources, dynamically express the environment, and autonomously complete navigation tasks.
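The matching-based recruitment rule can be sketched as follows. This is a minimal fuzzy-ART-style illustration, not the paper's exact implementation; the vigilance value, learning rate, and function name are hypothetical choices for the sketch.

```python
import numpy as np

def recruit_node(nodes, candidate, vigilance=0.9, lr=0.5):
    """ART-style match check: a new map node is recruited only when no
    existing node matches the candidate pattern above the vigilance level.
    `nodes` is a list of weight vectors; `candidate` is the new observation.
    Returns (node index, whether a new node was created)."""
    best_match, best_idx = 0.0, None
    for i, w in enumerate(nodes):
        # fuzzy-ART match function: |candidate AND w| / |candidate|
        match = np.minimum(candidate, w).sum() / (candidate.sum() + 1e-12)
        if match > best_match:
            best_match, best_idx = match, i
    if best_idx is not None and best_match >= vigilance:
        # resonance: refine the matched node instead of adding a new one
        nodes[best_idx] = lr * np.minimum(candidate, nodes[best_idx]) \
            + (1 - lr) * nodes[best_idx]
        return best_idx, False
    nodes.append(candidate.copy())  # mismatch reset: recruit a new node
    return len(nodes) - 1, True
```

Because a candidate that resonates with an existing node only updates that node's weights, the map grows sparsely: the node count increases only when the current observation is genuinely novel under the vigilance criterion.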


4. Conclusions

Due to the flexibility and autonomy of biological navigation, many efforts have been exerted to apply biological mechanisms to developing autonomous robot navigation systems. Although a variety of computational models have been proposed according to animals' goal-oriented behavior, rarely have they included the structure of multiple navigation cells. We propose a general structure for robot positioning, navigation, and memory mapping, and combine it with ART to form a new intelligent navigation system. The application of ART in robotics promotes the robot's cognitive ability.

The proposed model was simulated and verified under three different conditions. The results show that the model enables the robot to maintain the basic ability of building an overall topological and metric map of an unknown environment. It also has the following additional capabilities and advantages: under the guidance of population activities and reward signals, the robot can complete autonomous navigation tasks in an unknown environment and converge to an optimal path faster; by using the mechanism of forced exploration, new paths can be discovered so as to effectively avoid falling into a local optimum.

In conclusion, this paper makes three main contributions. First, we constructed a unified overall navigation mathematical model that combines the biological plasticity of brain-like learning processes with working memory, providing a valuable feedback tool for neuroscience. Second, we applied ART to aid navigation, which reduced resource consumption and broadened the perspective for developing more powerful autonomous robotic navigation systems. Third, in probing the potential mechanism of biological spatial navigation, we propose the hypothesis that there may be a brain region corresponding to the ART function, which awaits exploration by neurobiologists.
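The forced-exploration mechanism mentioned above can be sketched as an occasional override of the greedy direction choice. This is an illustrative epsilon-style sketch, not the paper's exact rule; the function name and the `force_every` schedule are assumptions.

```python
import random

def choose_direction(greedy_dir, directions, step, force_every=20):
    """Forced exploration: mostly follow the direction favored by the
    population activity (`greedy_dir`), but periodically force a random
    non-greedy direction so the planner can escape a locally optimal path.
    `force_every` controls how often exploration is forced (illustrative)."""
    if step % force_every == 0:
        # forced step: pick any direction other than the greedy one
        return random.choice([d for d in directions if d != greedy_dir])
    return greedy_dir
```

On forced steps the robot deliberately leaves the currently preferred path, which gives it a chance to discover shorter routes that the reward-driven greedy policy alone would never visit.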

Author Contributions: Methodology, Q.C.; Software, Q.C.; Supervision, H.M.; Writing—original draft, Q.C.; Writing—review & editing, H.M.

Funding: This research was funded by the National Key Research and Development Program of China [grant number 2018AAA0102700].

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Geva-Sagiv, M.; Las, L.; Yovel, Y.; Ulanovsky, N. Spatial cognition in bats and rats: From sensory acquisition to multiscale maps and navigation. Nat. Rev. Neurosci. 2015, 16, 94–108.
2. O'Keefe, J.; Dostrovsky, J. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res. 1971, 34, 171–175.
3. Giocomo, L.M.; Moser, M.B.; Moser, E.I. Computational models of grid cells. Neuron 2011, 71, 589–603.
4. Hafting, T.; Fyhn, M.; Molden, S.; Moser, M.B.; Moser, E.I. Microstructure of a spatial map in the entorhinal cortex. Nature 2005, 436, 801–806.
5. Stensola, H.; Stensola, T.; Solstad, T.; Frøland, K.; Moser, M.B.; Moser, E.I. The entorhinal grid map is discretized. Nature 2012, 492, 72–78.
6. Burak, Y.; Fiete, I.R. Accurate path integration in continuous attractor network models of grid cells. PLoS Comput. Biol. 2009, 5, 16.
7. McNaughton, B.L.; Battaglia, F.P.; Jensen, O.; Moser, M.B.; Moser, E.I. Path integration and the neural basis of the 'cognitive map'. Nat. Rev. Neurosci. 2006, 7, 663–678.
8. Ranck, J.B. Head-direction cells in the deep cell layers of dorsal presubiculum in freely moving rats. Soc. Neurosci. Abstr. 1984, 10, 599.
9. Taube, J.S.; Bassett, J.P. Persistent neural activity in head direction cells. Cereb. Cortex 2003, 13, 1162–1172.
10. Kropff, E.; Carmichael, J.E.; Moser, M.B.; Moser, E.I. Speed cells in the medial entorhinal cortex. Nature 2015, 523, 419–478.
11. Valerio, S.; Taube, J.S. Path integration: How the head direction signal maintains and corrects spatial orientation. Nat. Neurosci. 2012, 15, 1445–1453.
12. Allen, T.A.; Fortin, N.J. The evolution of episodic memory. Proc. Natl. Acad. Sci. USA 2013, 110, 10379–10386.
13. Liu, D.; Gu, X.; Zhu, J.; Zhang, X.; Han, Z.; Yan, W.; Cheng, Q.; Hao, J.; Fan, H.; Hou, R.; et al. Medial prefrontal activity during delay period contributes to learning of a working memory task. Science 2014, 346, 458–463.
14. Silva, M.A.D.; Huston, J.P.; Wang, A.; Petri, D.; Chao, O.Y.H. Evidence for a specific integrative mechanism for episodic memory mediated by AMPA/kainate receptors in a circuit involving medial prefrontal cortex and hippocampal CA3 region. Cereb. Cortex 2016, 26, 3000–3009.
15. Hoshi, E. Functional specialization within the dorsolateral prefrontal cortex: A review of anatomical and physiological studies of non-human primates. Neurosci. Res. 2006, 54, 73–84.
16. Kringelbach, M.L. The human orbitofrontal cortex: Linking reward to hedonic experience. Nat. Rev. Neurosci. 2005, 6, 691–702.
17. Arleo, A.; Gerstner, W. Spatial cognition and neuro-mimetic navigation: A model of hippocampal place cell activity. Biol. Cybern. 2000, 83, 287–299.
18. Hartley, T.; Burgess, N.; Lever, C.; Cacucci, F.; O'Keefe, J. Modeling place fields in terms of the cortical inputs to the hippocampus. Hippocampus 2000, 10, 369–379.
19. Solstad, T.; Moser, E.I.; Einevoll, G.T. From grid cells to place cells: A mathematical model. Hippocampus 2006, 16, 1026–1031.
20. Brandon, M.P.; Bogaard, A.R.; Libby, C.P.; Connerney, M.A.; Gupta, K.; Hasselmo, M.E. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science 2011, 332, 595–599.
21. Burgess, N.; Barry, C.; O'Keefe, J. An oscillatory interference model of grid cell firing. Hippocampus 2007, 17, 801–812.
22. Giocomo, L.M.; Zilli, E.A.; Fransen, E.; Hasselmo, M.E. Temporal frequency of subthreshold oscillations scales with entorhinal grid cell field spacing. Science 2007, 315, 1719–1722.
23. Fuhs, M.C.; Touretzky, D.S. A spin glass model of path integration in rat medial entorhinal cortex. J. Neurosci. 2006, 26, 4266–4276.
24. Grossberg, S.; Pilly, P.K. How entorhinal grid cells may learn multiple spatial scales from a dorsoventral gradient of cell response rates in a self-organizing map. PLoS Comput. Biol. 2012, 8, 31.
25. Mhatre, H.; Gorchetchnikov, A.; Grossberg, S. Grid cell hexagonal patterns formed by fast self-organized learning within entorhinal cortex. Hippocampus 2012, 22, 320–334.
26. Pilly, P.K.; Grossberg, S. Spiking neurons in a hierarchical self-organizing map model can learn to develop spatial and temporal properties of entorhinal grid cells and hippocampal place cells. PLoS ONE 2013, 8, 22.
27. Milford, M.J.; Wyeth, G.F. Mapping a suburb with a single camera using a biologically inspired SLAM system. IEEE Trans. Robot. 2008, 24, 1038–1053.
28. Milford, M.J.; Wyeth, G.F. Persistent navigation and mapping using a biologically inspired SLAM system. Int. J. Robot. Res. 2010, 29, 1131–1153.
29. Zeng, T.P.; Si, B.L. Cognitive mapping based on conjunctive representations of space and movement. Front. Neurorobot. 2017, 11, 16.
30. Tang, H.J.; Huang, W.; Narayanamoorthy, A.; Yan, R. Cognitive memory and mapping in a brain-like system for robotic navigation. Neural Netw. 2017, 87, 27–37.
31. Eliasmith, C.; Stewart, T.C.; Choo, X.; Bekolay, T.; DeWolf, T.; Tang, Y.; Tang, C.; Rasmussen, D. A large-scale model of the functioning brain. Science 2012, 338, 1202–1205.
32. Carpenter, F.; Manson, D.; Jeffery, K.; Burgess, N.; Barry, C. Grid cells form a global representation of connected environments. Curr. Biol. 2015, 25, 1176–1182.
33. Grossberg, S. How does a brain build a cognitive code? Psychol. Rev. 1980, 87, 1–51.
34. Grossberg, S. Adaptive resonance theory: How a brain learns to consciously attend, learn, and recognize a changing world. Neural Netw. 2013, 37, 1–47.
35. Erdem, U.M.; Hasselmo, M. A goal-directed spatial navigation model using forward trajectory planning based on grid cells. Eur. J. Neurosci. 2012, 35, 916–931.
36. Llofriu, M.; Tejera, G.; Contreras, M.; Pelc, T.; Fellous, J.M.; Weitzenfeld, A. Goal-oriented robot navigation learning using a multi-scale space representation. Neural Netw. 2015, 72, 62–74.
37. Rogers, J.; Churilov, L.; Hannan, A.J.; Renoir, T. Search strategy selection in the Morris water maze indicates allocentric map formation during learning that underpins formation. Neurobiol. Learn. Mem. 2017, 139, 37–49.
38. Erdem, U.M.; Milford, M.J.; Hasselmo, M.E. A hierarchical model of goal directed navigation selects trajectories in a visual environment. Neurobiol. Learn. Mem. 2015, 117, 109–121.
39. Kulvicius, T.; Tamosiunaite, M.; Ainge, J.; Dudchenko, P.; Wörgötter, F. Odor supported place cell model and goal navigation in rodents. J. Comput. Neurosci. 2008, 25, 481–500.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).