A computational perspective of the role of Thalamus in cognition

Nima Dehghani1,2,a) and Ralf D. Wimmer3
1) Department of Physics, Massachusetts Institute of Technology
2) Center for Brains, Minds and Machines (CBMM), Massachusetts Institute of Technology
3) Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
(Dated: October 18, 2018)

Thalamus has traditionally been considered as only a relay source of cortical inputs, with hierarchically organized cortical circuits serially transforming thalamic signals to cognitively-relevant representations. Given the absence of local excitatory connections within the thalamus, the notion of thalamic 'relay' seemed like a reasonable description over the last several decades. Recent advances in experimental approaches and theory provide a broader perspective on the role of the thalamus in cognitively-relevant cortical computations, and suggest that only a subset of thalamic circuit motifs fit the relay description. Here, we discuss this perspective and highlight the potential role for the thalamus, and specifically the mediodorsal (MD) nucleus, in dynamic selection of cortical representations through a combination of intrinsic thalamic computations and output signals that change cortical network functional parameters. We suggest that through the contextual modulation of cortical computation, thalamus and cortex jointly optimize the information/cost trade-off in an emergent fashion. We emphasize that coordinated experimental and theoretical efforts will provide a path to understanding the role of the thalamus in cognition, along with an understanding of how to augment cognitive capacity in health and disease.

Keywords: Thalamo-cortical system, Recurrent Neural Network, Reservoir Computing, Multi-objective Optimization, Cognitive Computing, Artificial Intelligence

a) Electronic mail: [email protected]
arXiv:1803.00997v3 [q-bio.NC] 17 Oct 2018

CORTICO-CENTRIC VIEW OF PERCEPTUAL AND COGNITIVE PROCESSING

Until recently, cognition [in mammalian, bird and reptilian nervous system] has been viewed as a cortico-centric process, with thalamus considered to play merely the role of a relay system. This classic view, driven largely by the visual hierarchical model of the mammalian cortex35, puts thalamus at the beginning of a feedforward hierarchy. The transmission of information from thalamus to early sensory cortex (V1 in the visual system, for example), and the gradually increasing complexity of representations from V2 to MT/IT and eventually prefrontal cortex (PFC), constitute the core of the perceptual representation under the hierarchical model. A recent comparative study of biologically-plausible Convolutional Neural Networks (CNN) and the visual ventral stream emphasizes the feature enrichment through the network of hierarchically interconnected computational modules (layers in the neural network and areas in the ventral stream)157.

The strictly static feedforward model has since morphed into a dynamic hierarchical model due to the discoveries of the role of feedback from higher cortical areas to lower cortical areas52. These dynamic hierarchies are even considered to be favorable for recursive optimizations, where the overall optimization can be achieved by breaking the problem into smaller ones and finding the optimum for each of these smaller problems. Such recursive optimization does not need to be confined to just one cortical area, and each of such distributed optimizations may even be solved differently90. This view of cortical computation is also paralleled by the growing use of recurrent neural networks (RNN) that can capture the dynamics of single neurons or neural populations in a variety of tasks. As such, RNNs can (a) mimic context-dependent prefrontal responses89 or (b) reproduce the temporal scaling of neural responses in medial frontal cortex and caudate nucleus151. Although it is clear that higher level cortical feedback reaches all the way down to thalamus (Wimmer et al, other refs), the main attribute of perception/cognition remains cortico-centric under the umbrella of dynamic hierarchical models or RNN embodiment of cortical cognitive functions. Since the computation that is carried out by the system should match the computing elements at the appropriate scale25, a mismatch between these presumed computational systems and the underlying circuitry becomes vividly apparent. First, associative cortex (and not just sensory cortex) receives thalamic input. Second, certain thalamic territories receive their primary (driving) input from the cortex, rather than the sensory periphery, some of which are likely to be highly convergent on a single thalamic cell level, suggesting at least a cortical modulation of sensory input. Third, thalamic projections can be broad and diffuse, suggesting a modulatory, rather than a relay function. These points indicate that thalamus may play a central part in cognitive processes. But what does including thalamus add that could not be achieved in cortical loops? One advantage that the thalamus could bring into the cortical equation is flexibility. For example, the same sensory cue may have different meanings according to the particular context a subject is in. A recent study118 shows that thalamus may be well suited to reorganize functional connectivity in frontal cortices in response to such contextual changes, allowing for a more flexible switching of rule-to-action mapping. We suggest that the unique cognitive capability of the thalamo-cortical system is tightly bound to parallel processing and contextual modulation that are enabled by the diversity of computing nodes (including thalamic and cortical structures) and the complexity of the computing architecture (Fig.1). We will start with a brief overview of the thalamic architecture, followed by experimental evidence and a computational perspective of the thalamic role in contextual cognitive computation.
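The flexibility argument in this section (the same sensory cue mapped to different actions depending on context) can be made concrete with a toy sketch. It assumes, purely for illustration, two fixed "cortical" subnetworks whose contributions are scaled by a context-dependent gain standing in for a modulatory signal. The names `SUBNETWORKS` and `select_action`, the cue and action labels, and the gain values are all hypothetical and are not taken from the cited studies.

```python
# Toy sketch (hypothetical names and values): a context-dependent gain
# re-routes the same cue to different actions without changing the
# fixed cue-to-action wiring of either subnetwork.

# Fixed "cortical" wiring: each subnetwork implements one rule.
SUBNETWORKS = {
    "rule_A": {"tone": "go_left", "light": "go_right"},
    "rule_B": {"tone": "go_right", "light": "go_left"},
}


def select_action(cue, context_gain):
    """Return the action with the largest gain-weighted drive.

    context_gain maps each subnetwork name to a non-negative gain;
    it plays the role of the contextual (modulatory) signal.
    """
    drive = {}
    for name, mapping in SUBNETWORKS.items():
        action = mapping[cue]
        drive[action] = drive.get(action, 0.0) + context_gain[name]
    return max(drive, key=drive.get)


# Same cue, two contexts: only the gains differ.
context_1 = {"rule_A": 1.0, "rule_B": 0.1}
context_2 = {"rule_A": 0.1, "rule_B": 1.0}
action_in_context_1 = select_action("tone", context_1)
action_in_context_2 = select_action("tone", context_2)
```

The point of the design is that the wiring in `SUBNETWORKS` never changes; only the gain vector does, which is one simple way to read "reorganizing functional connectivity" without rewiring.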

[Figure 1 graphic. Axes: cognitive capability; parallel processing; computing node (diversity & complexity). Plotted systems: logic gates, genetic logic gate, DNA computing, FPGA, CNN, Physarum machine, reservoir computing, thalamo-cortical system.]

Figure 1. Cognitive computing morphospace: Morphospace of a few example biological and synthetic computing engines in a multidimensional layout. The thalamo-cortical system stands out as a unique system with high cognitive capacity, massive parallel processing and extreme diversity of the computing nodes. Other computing systems occupy less desirable domains of this morphospace. Logic Gates: NAND, NOR; Genetic Logic Gate: synthetic biology adaptation of a logic gate; FPGA: field-programmable gate array (configurable integrated circuit); CNN: Convolutional Neural Network; Physarum Machine: programmable amorphous biological computer experimentally implemented in slime mould; Reservoir Computing: a reservoir of recurrent neurons that dynamically change their activity to nonlinearly map the input to a new space; DNA computing: a computing paradigm where many different DNA molecules are used to perform a large number of logical computations in parallel; Connection Machine: the first commercial supercomputer designed for problems of Artificial Intelligence (AI) using hardware-enabled parallel processing. For a multi-dimensional representation of global/local computation, serial/parallel processing and complex/simple computation, see137. For an idealized landscape of computation involving the degree of relevance of space, agent (cell) diversity and distributed computing, see138.

THALAMIC ARCHITECTURE: ANATOMICAL AND FUNCTIONAL FEATURES

Traditionally, thalamic nuclei (see Fig.2) are defined as collections of neurons that are segregated by gross features such as tracts and various types of tissue staining procedures59. This gross anatomical classification has been equated with a functional one, where individual thalamic nuclei give rise to a set of defined functions59,60. More recent fine anatomical studies challenge this notion, showing that within individual nuclei, single cell input/output connectivity patterns are quite variable.

Figure 2. Schematic layout of thalamic nuclei: Three cross sections of monkey thalamus. AD: anterodorsal nucleus; AM: anteromedial nucleus; AV: anteroventral nucleus; CM: centromedian nucleus; CN: caudate nucleus; H: habenular nucleus; IL: intralaminar nuclei; LD: lateral dorsal nucleus; LGN: lateral geniculate nucleus; MD: mediodorsal nucleus; MGN(D): medial geniculate nucleus (dorsal); MGN(V): medial geniculate nucleus (ventral); PU: pulvinar; TRN: thalamic reticular nucleus (not a relay nucleus); VA: ventral anterior nucleus; VL: ventral lateral nucleus; VPI: ventral posterior nucleus (inferior); VPL: ventral posterior nucleus (lateral); VPM: ventral posterior nucleus (medial). Redrawn from133.

Thalamus lacks lateral excitatory connections and rather receives inputs from other subcortical structures and/or cortex. In fact, a major feature of expansion across evolution is the invasion of the thalamus by cortical inputs45,121. Most (90-95%) afferents to the relay nuclei are not from the sensory organs60,133. Recent anatomical studies have shown a great diversity of cortical input type, strength and inferred degree of convergence17, even within individual thalamic nuclei (see Fig.3 for an example view of cell and network architecture diversity).

Excitatory inputs mostly arrive as feedback from layer 6 of the cortex as well as from the brainstem reticular formation. In addition to the diversity of excitatory inputs, thalamic circuits receive a diverse set of inhibitory inputs (note that local GABAergic thalamic interneurons are mostly absent in non-LGN relay nuclei). The two major systems of inhibitory control are the thalamic reticular nucleus (TRN), a shell of inhibitory neurons surrounding thalamic excitatory nuclei, and the extra-thalamic inhibitory system (ETI), a group of inhibitory projections from across the fore-, mid- and hindbrain (see47 for a review on thalamic inhibition). Perhaps a major differentiating feature of these two systems (TRN and ETI) is temporal precision. One of the key characteristics of thalamus is the lack of direct local loops. Only a very small group of inhibitory neurons with local connections exists in thalamus142. A mechanistic consequence of this architecture is the differential control of thalamic response gain and selectivity (Fig.3 F and G), with the TRN controlling the first, as observed in sensory systems111, and ETI controlling the latter, as observed in motor systems147. For example, basal ganglia control of thalamic responses, a form of ETI control, would be implemented through thalamic disinhibition, which is not only dependent on ETI input, but also on a special type of thalamic conductance that enables high frequency 'bursting' upon release from inhibition29,43.

Overall, the variety of thalamic inputs (both excitatory and inhibitory), combined with intrinsic thalamic features such as excitability and morphology, will determine the type of intrinsic computations that thalamus performs. This view appears to be consistent with recent observations of confidence encoding in both sensory69 and motor systems58,84.

Thalamic relay nuclei mostly project to the cortical middle layers in a topographic fashion. However, the majority of thalamic structures, such as mediodorsal (MD), posteromedial complex (POm) and pulvinar for example, also project more diffusely to the cortical superficial layers (see Fig.3 for an example of thalamic cell and circuit diversity). These diffuse projections seem poorly suited to relay information in a precise manner. Rather, they might have a modulatory role in cortical function. Further, a great degree of diversity can be observed at the level of thalamic axonal terminals within the cortex. While the idea of a thalamic relay was consolidated by observing that the main LGN neurons thought to be associated with form vision (M and P pathways) exhibit spatially compact cortical terminals, recent anatomical studies of individual neurons across the thalamus show a variety of terminal sizes and degrees of spatial spread and an intricate computational architecture (Fig.3). This complexity of the architecture and diversity of the computing nodes are among the key factors that set apart the thalamo-cortical system from other conventional and unconventional computing engines (Fig.1). Part of the complication in understanding how these anatomical types give rise to different functions is their potential for contacting different sets of excitatory and inhibitory cortical neurons.

Specifically, among thalamic nuclei, mediodorsal thalamus (MD) seems to have a connectivity pattern that is distinctively different from the classic sensory nuclei. Cortico-thalamic projections to MD originate from both layers V and VI of PFC41,44,94. But, in contrast to relay nuclei, cortical input to MD terminates both extraglomerularly and within the synaptic glomeruli, suggesting that cortex plays a different role in MD activity (in comparison to LGN for example)127. Additionally, optogenetic and in vitro electrophysiology techniques have revealed that MD not only projects to layer I but also has additional terminations in layer III20. MD projections to PFC synapse with both excitatory and inhibitory cells20, where the triggered feedforward inhibition could play a variety of roles, from regulating dendritic action potentials67 to imposing a narrow temporal window within which the excitation can reach the target21. These elaborate input and output connectivities of the thalamocortical architecture point to the non-unitary computational role of thalamus. Among thalamic nuclei, specifically MD (and likely other MD-like nuclei) has the architecture necessary to be involved in modulatory computational roles rather than the well-known relay functions. In the next sections, we provide the experimental evidence and algorithmic designs pointing to this modulatory role. This idea of the thalamus controlling cortical state parameters is highlighted in Figs.4,5 and the next section.

MANY FACETS OF THALAMIC COMPUTATION

It is commonly thought that processes like attention, decision making and working memory are implemented through distributed computations across multiple cortical regions19,93,128. However, it is unclear how these computations are coordinated to drive relevant behavioral outputs. From an anatomical standpoint, the thalamus is strategically positioned to perform this function, but relatively little is known about its broad functional engagement in cognition. The thalamic cellular composition and network structure constrain how cortex receives and processes information. The thalamus is the major input to the cortex, and interactions between the two structures are critical for sensation, action and cognition61,99,131. Despite recent studies showing that the mammalian thalamus contains several circuit motifs, each with specific input/output characteristics, the thalamus is traditionally viewed as a relay to or between cortical regions133. That the active role of thalamus in cognition is beyond relay stems from a) the fact that sensory input to thalamus is much more limited in comparison to input from other structures, such as cortex and , b) the experimental evidence showing a number of nuclei modulating cortical neural processing according to behavioral context and c) the observation that lesions of certain nuclei such as pulvinar and MD cause severe attention and memory deficits125. Below we will discuss how the distinctive anatomical architecture and computational role of pulvinar and MD differ from relay nuclei such as LGN.

It is worth mentioning that this view of bona fide thalamic computations is quite distinct from the one in which thalamic responses reflect their inputs, with only linear changes in response size. This property of reflecting an input (with only slight modification of amplitude) was initially observed in the lateral geniculate nucleus (LGN), which receives inputs from the retina. LGN responses to specific sensory inputs (their receptive fields, RF) are very similar to those in the retina itself, arguing that there is little intrinsic computation happening in the LGN itself outside of gain control. Success in early vision studies53,54 might have inadvertently given rise to the LGN relay function being generalized across the thalamus. The strictly feedforward thalamic role in cognition requires reconsideration48; only a few thalamic territories receive peripheral sensory inputs and project cortically in a localized manner, as the LGN does36,59,64,115,131.

The largest thalamic structures in mammals, the MD and pulvinar, contain many neurons that receive convergent cortical inputs and project diffusely across multiple cortical layers and regions17,122. For example, the primate pulvinar has both territories that receive topographical, non-convergent inputs from the striate cortex122 and others that receive convergent inputs from non-striate visual cortical (and frontal) areas92. This same thalamic nucleus also receives inputs from the superior colliculus109, a subcortical region receiving retinal inputs. This suggests that the pulvinar contains multiple input 'motifs' solely based on the diversity of excitatory input. Such input diversity is not limited to the pulvinar, but is seen within many thalamic nuclei across the mammalian forebrain11. Local inactivation of pulvinar neurons results in reduced neural activity in primary visual cortex114, suggesting a feedforward role. Recent studies, however, indicate that pulvinar may have additional roles. For example, pulvinar inactivation was shown to increase low-frequency cortical oscillations158. Given that such activity is often associated with inattention and sleep, this study suggested that pulvinar may keep cortical regions in an activated state that would allow responsiveness to top-down input from other areas to modulate ongoing activity according to attentional demands. A different study showed that during perceptual decision making pulvinar neurons encode choice confidence, rather than stimulus category69. Together, these recent findings strongly argue for pulvinar functions beyond relaying information.

In the case of MD, direct sensory input is limited94 and the diffuse, distributed projections to cortex73 are poorly suited for information relay; this input/output connectivity suggests different functions. Recent studies12,118,126 have begun to shed light on the type of computation that MD performs. Taking advantage of the genetic accessibility of the mouse for neuronal manipulations, these studies have revealed that MD coordinates task-relevant activity in the prefrontal cortex (PFC) in a manner analogous to a contextual signal regulating distinct attractor states within a computing reservoir. Specifically, in a task where animals had to keep a rule in mind over a brief delay period (Fig.4A), PFC neurons show population-level persistent activity following the rule presentation, a sensory cue that instructs the animal to direct its attention to either vision or audition (Fig.4B,C). MD neurons show responses that are devoid of categorical selectivity (Fig.4D), yet are critical for selective PFC activity; optogenetic MD inhibition diminishes this activity, while MD activation augments it. The conclusion is that MD inputs enhance synaptic connectivity among PFC neurons or may adjust the activity of PFC neurons through selective filtering of the thalamic inputs. In other words, delay-period MD activity maintains rule-selective PFC representations by augmenting local excitatory recurrence126.

[Figure 3 graphic: panels A-G. Labels shown: cortical layers IIIb, IIIc, IVa, IVb, V; regions TRN, VA, VL; cell types Th-cx, RE, L-circ; scale bars 100 µm.]

Figure 3. Diversity and complexity of thalamo-cortical architectures: (A) Comparative size of a single RGC (retinal ganglion cell) terminal (red) and an LGN (black); Redrawn from36. (B) Complete terminal arbor of a single LGN neuron projecting to V1; Redrawn from115. (C) Projection of a TC (thalamo-cortical) neuron to cat motor cortex (with only 23 terminals in TRN versus 1632 terminals in VA/VL); Redrawn from64,104. (D,E) Axonal arborization of a single MD neuron (D) and a single POm neuron (E). Principal target layers are: layer I (green), layer II-IV (green-yellow), layer V (blue), layer VI (purple); Redrawn from73. Note the comparative size of panels A-E (scale bar at 100 µm). (F,G) Synaptic network of thalamo-cortical (Th-cx), reticular thalamic (RE) and local circuit (L-circ) thalamic interneuron cells in rodents (F) and felines/primates (G). Note that rodents do not have L-circ. The afferent excites Th-cx, which in turn sends the signal to cortex. The RE inhibitory effect on Th-cx cells varies depending on the excitatory drive to each Th-cx cell (F: compare the two neurons on the right versus the one on the left). Axonal collaterals of an RE cell could inhibit another RE cell (G: top RE inhibits the bottom RE), which releases the activity of L-circ, leading to inhibition of the weakly excited Th-cx (bottom) adjacent to the active Th-cx (top). Panels F and G are redrawn from143 based on experiments from141?.

In a related study using a delayed nonmatching-to-sample T-maze working memory task12, it was shown that MD amplification and maintenance of higher PFC activity indicated correct performance during the subsequent choice phase of the task. Interestingly, MD-dependent increased PFC activity was much more pronounced during the later (in delay) rather than the earlier part of the task. These findings indicate that PFC might have to recursively pull in MD to sustain cortical representations as working memory weakens with time. Together these studies indicate that PFC cognitive computation cannot be dissociated from MD activity. Further evidence for the critical role of the MD-PFC interaction for cognition is the disrupted fronto-thalamic anatomical and functional connectivity seen in neurodevelopmental disorders91,95,98,107,155.

Can MD select cortical subnetworks based on contextual modulation?

Why would a recurrent network (PFC) computation depend on its interaction with a non-recurrent (MD) non-relay network? What computational advantage would such a system have? Using a chemogenetic approach, a recent study suggested that information flow in the MD-PFC network can be unidirectional. While inactivating either the PFC-to-thalamus or the MD-to-cortex pathway impaired recognition of a change in reward value in rats performing a decision making task, only the inactivation of the MD-to-cortex pathway had an impact on the behavioral response to a change in action-reward relationship1. Given that a sensory stimulus may require a different action depending on the context in which it occurs, the ability to flexibly re-route the active PFC subnetwork to a different output may be crucial. In an architecture like the PFC-MD network, where MD can modulate PFC functional connectivity, MD might well be suited to re-route the ongoing activity in a context-dependent manner. In fact, in the mouse cognitive task described above (Fig.4A), a subset of MD neurons showed substantial spike rate modulation during task engagement compared to when the animal is in its home cage (see Fig.4E)126. In contrast, PFC neurons show very little difference in spike rates when the animal gets engaged in the task. This suggests that perhaps different subsets of MD neurons are capable of encoding task 'contexts', which has been shown experimentally118. Subsequently, each given subset could unlock a distinct cortical association; this hypothesis is now experimentally verified118. These MD subsets have to be able to shift the cortical states dynamically while maintaining the selectivity based on the subset of cortical connections they target. This idea would also fit with the paradigm shift indicating that thalamic neurons exert dynamical control over information relay to cortex8,105.

Overall, the anatomical and neurophysiological data show that the thalamic structure and cortico-thalamic network circuitry, and the interplay between thalamus and cortex, shape the frame within which thalamus plays the dual role of relay and modulator. Under this framework, different thalamic nuclei carry out a multitude of functions including but not limited to information relay. A suggestion of this comparative computational role of LGN, pulvinar and MD is depicted in Fig.8. The importance of (non-relay) thalamic nuclei's regulatory influence on cortical function is also reflected by the disorders that emerge due to thalamic dysfunctions. Specifically, lesions to pulvinar and MD lead to severe attention and memory deficits9,124. The disruption of MD-PFC communication is the likely cause of these cognitive impairments. As mentioned earlier, the back and forth interaction between MD and PFC is necessary during the task acquisition period and is reflected by an increase in (beta frequency) MD-PFC synchrony106. In addition, MD also regulates the neural synchrony of PFC neurons123. Decreasing MD spiking activity (by hyperpolarizing MD neurons) leads to disrupted MD-PFC synchrony and impaired performance in the delayed non-match to sample task106. Moreover, patients with schizophrenia show both beta and gamma frequency deficits146, a significant reduction of MD volume2,112 and of the total number of MD neurons112. While it remains unclear whether the loss of MD neurons is primary or secondary to PFC pathology112, it is evident that MD is regulating PFC plasticity, cognitive flexibility9 and contextual processing118,126.

IS THALAMUS A READ-WRITE MEDIUM FOR CORTICAL PARALLEL PROCESSING?

The connectivity patterns of relay and non-relay nuclei point to the dichotomy of algorithmic constraints that are imposed by these specific thalamic structures. There are stark contrasts between the thalamocortical and cortico-thalamic connectivity profiles of sensory (LGN-like) versus non-sensory thalamic nuclei. Anatomical tracing and radiographic studies have shown that while relay nuclei have preserved topographic focal projections to the middle cortical layers, major thalamic structures (MD, Pulvinar, POm) have a more diffuse projection to the superficial layers of cortex41,70,94. Among non-relay nuclei, MD shows interesting characteristics, projecting not only to layer I but also to the outer banks of layer III20. Only a small fraction of MD to PFC projections end in middle cortical layers, and more than 90% are modulatory and project diffusely to superficial layers of PFC149,150,159. In addition, cortico-thalamic projections to non-relay nuclei show a dual role for cortical influence on thalamus41,94. For example, layer VI/V PFC neurons not only project a huge number of axons directly to MD but also send collaterals to the reticular thalamic nucleus and thereby indirectly influence MD activity41,127,132. This massive reciprocal connectivity is metabolically very costly for the brain. If the brain were to operate as a simple pattern matching system, a far less costly feed-forward network (similar to the structure suggested by157) would have been more economical. The complex connectivity of non-relay thalamus and cortex points to computations that are beyond pattern matching. This view also matches what we know of cortical computation, namely that it involves multiple sources of expertise (each decoding a certain aspect of the incoming stimuli), along with the fact that these expertise modules operate in a highly parallel mode. To coordinate these parallel yet convergent cortical processing modules, there is a need for a system that is commonly visible to each module. In addition, collectively, these modules should keep track of the changes in the stimuli in the environment and integrate the information in time. This contextual processing would require small modifications of processed information in individual modules. Or, for handling the local constraints, it may need repetition of certain algorithms until satisfactory results are reached. The connectivity profile that we discussed provides the platform to run such computations.

[Figure 4 graphic: (A) task timeline with cueing, cue-free delay and response epochs for the attend-to-vision and attend-to-audition rules; (B-D) firing rate (Hz or z-score) versus time (s) under each rule; (E) MD and PFC firing rate (Hz) versus time, outside and inside the task.]

Figure 4. MD-PFC interactions during sustained rule representations. (A) Attentional control task design. (B) Example peri-stimulus time histogram (PSTH) for a neuron tuned to the attend-to-vision rule, signaled through a low-pass noise cue. (C) Examples showing that rule-specificity is maintained across distinct PFC rule-tuned populations. (D) PSTHs of four MD neurons showing consistent lack of rule specificity. (E) Example rasters and PSTHs of an MD and a PFC neuron when the animal is engaged in the task and outside of the behavioral arena. In contrast to PFC, MD neurons show the contextual difference as a change in firing rate. Figure is redrawn from?.

If the brain were to function as a simple pattern matching system without wiring and metabolic constraints, evolution would just expand the size and depth of the network to the point that it could potentially memorize a large number of possible patterns. Possibly, evolution would have achieved this approximation of arbitrary patterns by evolving a deep network. This would be a desirable solution since any system can be defined as polynomial Hamiltonians of low order, which can be accurately approximated by neural networks82. But cognition is much more than template matching and classification achieved by a neural network. The limits of template matching methods in dealing with (rotation, translation and scale) invariance in object recognition quickly became known to neuroscientists and in early works on computer vision. One of the early pioneers of AI, Oliver Selfridge, proposed the Pandemonium architecture to overcome this issue129. Selfridge envisioned serially connected distinct demons (an image demon, followed by a set of parallel feature demons, followed by a set of parallel cognitive demons and eventually a decision demon) that independently perceive parts of the input before reaching a consensus together through a mixture of serial culmination of evidence from parallel processing. This simple feedforward computational pattern recognition model is (in some ways) a predecessor to modern day connectionist feedforward neural networks, much like what we discussed earlier in the text. However, despite its simplicity, Pandemonium was a leap forward in understanding that the intensity of (independent parallel) activity along with a need for a summation inference are the keys to move from simple template matching to a system that has a concept about the processed input. A later extension of this idea was proposed by Allen Newell as the Blackboard model: “Metaphorically, we can think of a set of workers, all looking at the same blackboard: each is able to read everything that is on it and to judge when he has something worthwhile to add to it. This conception is just that of Selfridge’s Pandemonium: a set of demons independently looking at the total situation and shrieking in proportion to what they see that fits their natures”101. Blackboard AI systems, adapted based on this model, have a common knowledge base (blackboard) that is iteratively updated (written to and read from) by a group of knowledge sources (specialist modules), and a control shell (organizing the updates by knowledge sources)102.

Interestingly, this computational metaphor can also be extended to the interaction between thalamus and cortex, though the thalamic blackboard is not a passive one as in the blackboard systems50,96,97. Although the active blackboard was initially used as an analogy for LGN computation, the nature of MD connectivity and its communication with cortex seem much more suitable to the type of computations that are enabled by an active blackboard. Starting with an input, thalamus, as the common blackboard visible to processing (cortical) modules, initially presents the problem (input) for parallel processing by modules. Here, by module, we refer to a group of cortical neurons that form a functional assembly which may or may not be clustered together (in a column for example). By iteratively reading from (via thalamo-cortical projections) and writing to (via cortico-thalamic projections) this active blackboard, expert pattern recognition modules gradually refine their initial guess based on their internal processing and the updates of the common knowledge. This process continues until the problem is solved (Fig.5).

This iterative communication between non-relay thalamus and cortex suggests that cortico-thalamic projections return the results of computations (that were carried out in parallel cortical modules) back to the thalamus. The integration of these revisions in the next thalamic output happens via the synaptic input and dendritic arbors of the non-relay thalamic neurons. One of the major differences in synaptic organization of MD from that of the sensory nuclei is that cortical axons target MD neurons both extraglomerularly and within the synaptic glomeruli127. In sensory nuclei, these within-glomeruli synaptic sites are particularly designated for receiving major ascending sensory afferents139. In MD, such large terminals may even engulf multiple synaptic contacts110 and are positioned on proximal dendrites of thalamic neurons127. These within-glomeruli multi-synaptic structures exhibit fast kinetics, large postsynaptic currents and strong short-term depression110. Interestingly, the short-term depression is combined with fast recovery after repetitive stimulation110, enabling generation of synaptic activity patterns that can match the frequency of cortico-thalamic inputs arriving via these large terminals140. As a result, these potent synaptic structures provide the platform for PFC inputs to play a much more active role in shaping the thalamic response in non-relay (MD) nuclei in comparison to the role that cortical feedback to sensory thalamic nuclei may play127.

Distinctive biophysical characteristics of non-relay thalamocortical projections play a complementary role in the computational scheme echoing an active blackboard. Interestingly, activating MD does not generate spikes across the population of prefrontal cortical neurons it projects to, while activating LGN generates spikes in primary visual cortex118,126. Instead, MD activation results in overall enhancement of inhibitory tone, coupled with enhanced local recurrent connectivity within the PFC. Although thalamocortical feed-forward inhibition is also observed in somatosensory barrel cortex in response to thalamic stimulation40, MD-evoked inhibition in PFC exerts a more powerful inhibitory gain control27. This difference must be rooted in the particular cortical target pattern of MD projections. Specifically, MD directly targets parvalbumin-positive (PV), and not somatostatin (SOM), interneurons in layers I and III18,27,74,75,120, while (for example) VM only “weakly” activates a variety of layer I interneurons20. In fact, when SOM interneurons are silenced, MD-evoked feedforward inhibition is enhanced27. As a result, while VM plays a more important role in excitation/inhibition balance, MD plays the modulator role via varying the integration time window and temporal precision of cortical responses18. While individual pyramidal neurons harbor broad response dynamics, the feedforward inhibition regulates the population dynamics via graded recruitment of individual neurons66. Increased conductance, noisy voltage fluctuations, and depolarization are the not-necessarily exclusive factors that define how the changes in the background input, i.e. stimulus changes and their contextual relevance, affect the gain16,113.

evolves. Second, to avoid turning into an NP-hard (non-deterministic polynomial-time hardness) problem, there must exist a mechanism that stops this iterative computation once an approximation has been reached (Fig.5). Here, we propose a specific solution to the first problem and a plausible one for the latter issue. We suggest that phase-dependent contextual modulation serves to deal with the first issue and a multi-objective optimization of efficiency (computational information
In addition, while sensory thalamocorti- gain) and economy (computational cost, i.e. metabolic cal synapses (i.e. LGN to visual cortex) onto fast-spiking needs and the required time for computation) handles inhibitory neurons manifest much higher release proba- the second issue (Fig.6). In both cases, we suggest bility than those onto pyramidal cells, MD projections that thalamus plays an integral role in conjunction with show similar presynaptic release probability among the cortex. two inhibitory and excitatory cortical neurons27. The co-variation of excitatory and feed-forward inhibitory re- sponse sets the control for graded recruitment of pyra- 66 midal neurons into population response . This con- Computational constrains and the role of thalamus in trol in itself is evoked by prior cortical excitation of MD, phase-dependent contextual modulation and as a result, the altered activity of PV interneurons in PFC can bias the response towards passive vs flex- ible contextual processing in a manner that is distinc- As mentioned earlier, we know that hierarchical con- tively different from the observed response of the sen- volutional neural networks (HCNN), which can recapitu- sory cortices27. These mechanisms show us why the late certain properties of static hierarchical forward mod- els, can not capture any processes that need to store chemogenetic inhibition of MD leads to impaired working 156 memory and flexible goal-directed behaviors106,107. Sim- prior states . As a result, context-dependent process- ing can be extremely hard to implement in neural net- ilarly, schizophrenics show reduced MD-PFC functional 117 coupling95 and deficits in prefrontal PV interneurons81, work models . The most widely used ANNs (Feedfor- highlighting the importance of the modulatory effect of ward nets , i.e. multilayer perceptrons/Deep Learning MD on PFc function27. 
algorithms) face fundamental deficiencies: the ubiqui- tous training algorithms (such as back-propagation), i) have no biophysical plausibility, ii) have high computa- tional cost (number of operations and speed), and iii) I. COMPUTATIONAL AND METABOLIC CONSTRAINS require millions of examples for proper adjustment of the connections’ weights. These features render feedforward To process the changing stimuli and altering contex- NNs not suitable for temporal information processing. In tual cues, and in order to achieve cognitive flexibility, contrast, recurrent neural networks (RNNs) can univer- the thalamocortical system needs to harbor temporal sally approximate the state of dynamical systems38, and buffering. The mechanisms that we described here because of their dynamical memory are well suited for point to the ways in which such buffer may takes place, contextual computation. If the higher cortical areas were namely: a) changes in the stimuli/context are constantly to show some features of RNN-like networks, as man- reprocessed by the cortex and the outputs are rewritten ifested by the dynamical response of single neurons89, to thalamus, b) thalamus constantly reshapes the corti- then we anticipate that the local computation (interac- cal population dynamics. MD is changing the mode by tion between neighboring neurons) to be mostly driven which PFC neurons interact with one another, initiating by external biases. The thalamic projections could then and updating different attractor dynamics underlying play the role of bias where they seed the state of the distinct cognitive inputs. As a result, the thalamocor- network. From both anatomical studies and electrophys- tical system, collectively and at any instant, keeps an iological investigations46, we know that thalamus is at a updated description of the stimulus/context over some prime position to modify the signal based on the cogni- computational cycles up to the present (Fig.5). 
Perhaps tive processing that is happening in the cortex12,126.This the upper bound of these computational cycles is tightly thalamic-driven regulation entails “binding in time” since bound to the particulars of the cortico-thalamic and MD-like thalamus modifies its output cortex at a given thalamocortical connectivity and biophysical constrains. time and is itself influenced by what is perceived by the However, since we are dealing with a biological system cortex in time prior. But how can the “binding in time” with finite resources, this back and forth communication avoid locking-in the thalamic function to a set of inputs needs to have certain characteristics to provide a viable at a given time? How can thalamus constantly be both computational solution. First and foremost, the control ahead of cortex and yet keep track of the past informa- of interaction and its scheduling has to have a plausible tion? The secret may be embedded in the non-recurrent biological component and should bind solutions as time intrinsic structure of thalamus, the recurrent structure of 10 the higher cortical areas, , and the phase-sensitive detec- tion that biases and binds the locally recurrent activity in cortex, with large-scale feedback loops.
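The read/write cycle of the active blackboard, together with the requirement that the iteration must stop once an approximation is good enough or the time budget is spent, can be sketched as follows. This is a minimal illustration under our own assumptions and not a model taken from the literature cited here: the names `Module` and `run_blackboard`, the scalar estimate, the weighting scheme and the tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Module:
    """Hypothetical cortical 'expert': reads the blackboard, proposes a revision."""
    name: str
    propose: Callable[[float], float]  # maps the shared estimate to this module's guess
    weight: float                      # influence assigned to this module


def run_blackboard(initial: float, modules: List[Module],
                   tol: float = 1e-3, max_cycles: int = 50) -> Tuple[float, int, str]:
    """Iterate read/write cycles on a shared estimate (the 'blackboard').

    Stops when the estimate barely changes (a stand-in for 'just enough')
    or when the cycle budget is spent (a stand-in for 'just in time').
    """
    estimate = initial
    total_weight = sum(m.weight for m in modules)
    for cycle in range(1, max_cycles + 1):
        # Read: every module sees the same estimate.
        # Write: the weighted proposals are merged back onto the blackboard.
        revised = sum(m.weight * m.propose(estimate) for m in modules) / total_weight
        change = abs(revised - estimate)
        estimate = revised
        if change < tol:
            return estimate, cycle, "converged"
    return estimate, max_cycles, "budget exhausted"


def pull_towards(target: float) -> Callable[[float], float]:
    """Toy proposal rule: move halfway from the current estimate to a target."""
    return lambda x: x + 0.5 * (target - x)


if __name__ == "__main__":
    experts = [Module("assembly_1", pull_towards(2.0), 1.0),
               Module("assembly_2", pull_towards(4.0), 1.0)]
    print(run_blackboard(0.0, experts))
```

In this toy setting the two modules disagree, and the shared estimate settles between their targets. Lowering `max_cycles` forces an early, coarser answer, which is the kind of trade-off between accuracy and time that the text attributes to the thalamocortical loop.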

To expand the idea further, let’s revisit some core attributes of cognitive processing. Based on the observations of behavior, higher cognition requires “efficient computation”, “time delay feedback”, the capacity to “retain information” and “contextual” computational properties. Such computational cognitive processes surpass the computational capacity of simple RNN-like networks. The essential required properties of a complex cognitive system of this kind are: 1) input should be nonlinearly mapped onto the high-dimensional state, while different inputs map onto different states, 2) slightly different states should map onto identical targets, 3) only the recent past should influence the state, and the network is essentially unaware of the remote past, 4) a phase-locked loop should decode information that is already encoded in time and 5) the combination of 1-4 should optimize sensory processing based on the context. The first three attributes of such a system have close relevance to the constraints and computational properties of higher cortical areas (prefrontal). The same three are also the main features of reservoir computing, namely “separation property”, “approximation property” and “fading memory”56,57,85,86. Interestingly, an RC system can “non-linearly” map a lower-dimensional system to a high-dimensional space, facilitating classification of the elements of the low-dimensional space. The last two properties match the structure and computational constraints of the non-relay thalamic system as a contextual modulator that is phasically changing the input to the RC system. In fact, in an RC model of prefrontal cortex, addition of a phase neuron significantly improved the network’s performance in complex cognitive tasks. The phase neuron improves the performance by generating input-driven attractor dynamics that best match the input33. This advantageous phase-based bias effect is not limited to simulations or physiological RC-like neural circuitry. In a recent study, electronic implementation and numerical studies of a limited RC system of a single nonlinear node with delayed feedback showed efficient information processing3. The transient dynamical response of such a reservoir follows delay-dynamical systems, and only a limited set of parameters is required to set the rich dynamical properties of delay systems55. This system was able to effectively process time-dependent signals.

The phase neuron33 and the delayed dynamical RC3 both show properties that resemble the structure-function of MD-like thalamus as discussed here. Specifically, the phasic recruitment of cortical neurons is invoked by the combination of cortical influence on non-relay thalamic neurons through direct and indirect corticothalamic projections (see reticular nucleus inhibitory influence on thalamic neurons, Fig.??). In a given cycle of computation (Fig.5), cortical feedback leads to the release of GABA (via reticular nucleus) in MD68. It has been shown that the increased GABA alters the opening of T-type Ca2+ channels22, which, in the case of MD, results in enhanced MD-PFC interaction, yielding mutual drive of the corticothalamic and thalamocortical activity together68. Through this calcium-based low-threshold spiking, the gradual synchronization of MD and PFC ensues62,68. As a result, MD units and PFC show strong phase-locked synchrony106. This gradual phase-locking mechanism forms the basis of the temporal dynamics that can nonlinearly (through consecutive cycles of computation) change the cortical activity as a result of novel stimuli or unexpected contextual changes, as observed experimentally12,126. In fact, abnormal activity of T-type Ca2+ channels in MD leads to hypersynchrony in PFC neurons and frontal lobe-specific seizures68. The interaction between non-relay thalamus and cortex, collectively, is neither feedforward nor locally recurrent; rather, it includes a non-recurrent phase encoder that keeps copies of the past processing and modulates the sensory input relay and its next step of processing (Fig.5). The distinctive short-term dynamics are well matched with the divergent structure-function relationship of sensory and non-relay MD-like thalamic nuclei. These features further emphasize that perceptual and cognitive processing can not be solely cortico-centric operations.

BIOLOGICAL CONSTRAINTS AND THE ROLE OF THALAMUS IN COMPUTATIONAL OPTIMIZATION

Computation and optimization are two sides of the same coin. But how does the brain optimize the computations that would match its required objective, i.e. cognitive processing? There is a current trend of thinking that the brain optimizes some arbitrary functions, with the hope that the future discovery of these unknown functions may guide us to establish a link between the brain’s operations and deep learning90. This line of approach to the optimizational (and computational) operations of the brain has a few flaws. First, it avoids specifying what function the brain is supposed to optimize (and as a result it remains vague). Second, it refrains from addressing certain limitations that the brain has to cope with due to biological constraints. The first of these limitations is the importance of using just enough resources to solve the current perceptual problem. The second is the necessity to come up with a solution just in (the needed) time. The importance of “just-enough” and “just-in-time” computation in

[Figure 5 graphic. Panel A: static data passed through function/weight modules to classification. Panel B: dynamic data frames over time (tn, tn+dt, tn+2dt, tn+3dt), each processed by weight/pointer modules (pointers a to e) and function/weight modules. Panel C: non-recurrent module, reservoir and readout.]

Figure 5. Schematic representation of thalamic cognitive contextual computation. (A): In the case of static data, a set of function/weight modules can yield good classification. Function represents a polynomial (since any system that is known to be a polynomial Hamiltonian of low order can be accurately approximated by neural networks82) and the weights exemplify the connection matrix of an artificial neural network encapsulating this polynomial. Stacking multiple such modules can increase the accuracy of the polynomial approximation (such as in the case of CNNs). (B): Thalamo-cortical computation for contextual processing of dynamic data. Each data frame is processed by a weight/pointer module (thalamic MD-like structure) which, like a blackboard, is writable by different sets of neuronal assemblies in cortex. Thalamic pointers assign the assemblies; the modules’ weights adjust the influence of each assembly in further computational steps [inset C shows a non-recurrent thalamic nucleus (MD-like) modulating the weights in the PFC (reservoir and readout). Here, depending on the context (blue or red), the interactions between MD and Reservoir, between Reservoir and Readout, and between Readout and MD could pursue one of the two possible outcomes. Specifically, MD changes the weights in the Reservoir to differentially set assemblies that produce two different attractor states, each leading to one of the two possible network outputs]. In (B, C), each operation of the thalamic module is itself influenced not only by the current frame (t), but also by the computation carried out by the cortical module on the prior frame (t − 1). The cortical module is composed of multiple assemblies, each of which operates similarly to the function/weight module of the static case. These assemblies are locally recurrent and each cell may be recruited to a different assembly during each operation. This mechanism could explain why prefrontal cells show mixed selectivity in their responses to stimuli (as reported in39,116). Through this recursive interaction between thalamus and cortex, cognition emerges not as just a pattern matching computation, but through contextual computation of dynamic data (bottom right schematic drawing).
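Two of the reservoir properties invoked in the text, fading memory and context-dependent separation by an external bias, can be illustrated with a small sketch. It is a toy under our own assumptions, not the circuit of Figure 5: the reservoir size, the tanh units, the row-normalised random weights, and the constant bias vectors standing in for an MD-like "red" or "blue" context are all hypothetical choices.

```python
import math
import random


def make_reservoir(n, gain=0.9, seed=0):
    """Random recurrent weights whose rows are rescaled to absolute sum `gain` (< 1).

    With tanh units this makes the update a contraction in the max norm,
    a sufficient (not necessary) condition for fading memory.
    """
    rng = random.Random(seed)
    w = []
    for _ in range(n):
        row = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        scale = gain / sum(abs(v) for v in row)
        w.append([v * scale for v in row])
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return w, w_in


def step(state, u, w, w_in, bias):
    """One update: recurrent drive + input drive + external (context) bias."""
    return [math.tanh(sum(wij * sj for wij, sj in zip(row, state)) + wi * u + b)
            for row, wi, b in zip(w, w_in, bias)]


def run(state, inputs, w, w_in, bias):
    """Drive the reservoir with a sequence of scalar inputs; return the final state."""
    for u in inputs:
        state = step(state, u, w, w_in, bias)
    return state


def max_gap(a, b):
    """Largest unit-wise difference between two states."""
    return max(abs(x - y) for x, y in zip(a, b))


N, T = 20, 60
W, W_IN = make_reservoir(N)
_rng = random.Random(1)
INPUTS = [_rng.uniform(-1.0, 1.0) for _ in range(T)]
START_A = [_rng.uniform(-1.0, 1.0) for _ in range(N)]
START_B = [_rng.uniform(-1.0, 1.0) for _ in range(N)]
CONTEXT_RED = [1.0] * N    # hypothetical bias for one context
CONTEXT_BLUE = [-1.0] * N  # hypothetical bias for the other context

if __name__ == "__main__":
    end_a = run(START_A, INPUTS, W, W_IN, CONTEXT_RED)
    end_b = run(START_B, INPUTS, W, W_IN, CONTEXT_RED)
    end_blue = run(START_A, INPUTS, W, W_IN, CONTEXT_BLUE)
    print("gap between initial states:", max_gap(START_A, START_B))
    print("gap after identical input :", max_gap(end_a, end_b))     # remote past forgotten
    print("gap between contexts      :", max_gap(end_a, end_blue))  # context separates states
```

Under the same context and input history, the two trajectories converge regardless of where they started, while changing only the bias keeps the final states apart. A trained readout, omitted here, would map such context-dependent states onto different outputs.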

cortical computation should not be overlooked30. If the first condition is not met, the organism can not sustain continued activity, since the metabolic demand surpasses the dedicated energetic expenditure, and the animal can not survive. In fact, communication in neural networks is highly constrained by a number of factors, specifically the energetic demands of the network operations78. From estimates of the cost of cortical computation79, we know that the high cost of spiking forces the brain to rely on sparse communication and to use only a small fraction of the available neurons5,134. While, theoretically, cortex can dedicate a large number of neurons (and a very high-dimensional dynamical space) to solve any cognitive task, the metabolic demand of such high-energetic neural activity renders such a mechanism highly inefficient. As a result, the “law of diminishing returns” dictates that increased energetic cost caused by excessive pooling of active neurons into an assembly would be penalized103. The penalization for unnecessary high-energetic neural activity, in itself, should be driven by the nature of computation rather than being formulated as a fixed arbitrary threshold imposed by an external observer. On the other hand, a system can resort to low-cost computation at any given time but dedicate a long enough time to solve the task at hand. Naturally, such a system would not be very relevant to biological systems, since time is of the essence. If an animal dedicates a long stretch of its computational capacity to solving a problem, the environment has changed before it reaches a solution and the solution becomes obsolete. A deer would never gain an advantage from its brain fully analyzing the visual scene instead of spotting the approaching wolf and shifting resources to the most-needed task, i.e. escape. As a result, many of the optimization techniques and concepts that may be relevant to artificial neural networks are irrelevant to the embodied computational cognition of the brain. The optimization that the brain requires is not aiming for the best possible performance, but rather needs to reach a good mixture of economy and efficiency.

Not surprisingly, these constraints, i.e. efficiency and economy, are cornerstones of homeostasis and are observed across many scales in living systems145. The simple “integral feedback” acts as the mainstay of control feedback in such homeostatic systems (such as E. coli heat-shock or DNA repair after exposure to gamma radiation)26,31,32,71. A change in input leads to a change in the output and a proportional change in the controller, aiming to reset the output to the desired regime. When the integral feedback is disrupted, the system can no longer reach proper homeostasis and either efficiency or economy (or even both) will be sub-optimal31,32,145. Many different etiologies could be behind the integral feedback disruption, but the outcome is loss of robust response in uncertain environments. The presence of feedforward and feedback loops provides the means for robust and fast operation in processing fluctuating incoming inputs. This feedback regulation and operational robustness has an energetic and computational cost for the system. Although for simple systems it is feasible to relate the exact cost of an operation to the overall computational cost of the system, scaling the metabolic cost of feedback regulation to large networks remains a challenge, since it will involve multiple feedback loops, nonlinear dynamics and numerous uncertain parameters23.

[Figure 6 graphic. Panels A1 (Information) and A2 (Cost): iso-quant curves η1 to η4 and ζ1 to ζ4. Panel B: information and cost iso-quants overlaid. Panel C: Cost plotted against Information.]

Figure 6. Dynamic role of thalamo-cortical system in the information/cost optimization. (A) Iso-maps of information (A1) and cost (A2) in the domain specified by ω and λ (functions of cortical and thalamic activity). Information along each iso-quant curve (η1 for example) is constant and is achieved at a certain mixture of ω and λ. Optimal information can be obtained by moving outward (arrows, A1). Cost optimization can be achieved by moving inward (A2). (B) Since information and cost are both defined in the domain of ω and λ, thalamus and cortex jointly contribute to information and cost optimization. The points where the iso-quant curves’ tangents are equal (black dashed line) provide the optimal combination of information/cost (green curve). In any cycle of cognitive operation, depending on the prior state of the system (ω and λ), the nearest points on the green curve are the optimal solutions for ending that cycle. (C) Mapping of the optima curve to the information/cost domain shows all pareto efficient allocations (cyan curve). The slope of the pareto frontier shows how the system trades cost versus information: along the pareto curve, efficiency is constant but the exchange between information and cost is not. All allocations inside this curve could be improved as thalamus and cortex interact. The grey zone shows the biophysically non-permissible allocation of computation and resources.
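The construction in Figure 6C, keeping only the allocations for which information cannot be raised without also raising cost, can be sketched numerically. The functional forms below are placeholders of our own choosing (information with diminishing returns, cost accelerating with activity) and are not the definitions used in this paper; the rate grid and the names `information`, `cost`, `pareto_front` and `marginal_rates` are likewise hypothetical.

```python
import math


def information(f_th, f_cx):
    """Placeholder information gain: grows with activity, with diminishing returns."""
    return math.log1p(f_th) + math.log1p(f_cx)


def cost(f_th, f_cx):
    """Placeholder cost: accelerates with thalamic and cortical firing rates."""
    return f_th ** 2 + f_cx ** 2


def pareto_front(points):
    """Keep the (information, cost) pairs that no other pair dominates.

    A pair dominates another if it has at least as much information at no
    greater cost, and is strictly better in at least one of the two.
    """
    ordered = sorted(points, key=lambda p: (p[1], -p[0]))  # cheapest first
    front, best_info = [], float("-inf")
    for info, c in ordered:
        if info > best_info:  # more information than anything cheaper or equal in cost
            front.append((info, c))
            best_info = info
    return front


def marginal_rates(front):
    """Slope (change in information per change in cost) between neighbouring frontier points."""
    return [(i2 - i1) / (c2 - c1) for (i1, c1), (i2, c2) in zip(front, front[1:])]


RATES = range(0, 11)  # hypothetical firing-rate grid, arbitrary units
ALLOCATIONS = [(information(a, b), cost(a, b)) for a in RATES for b in RATES]
FRONT = pareto_front(ALLOCATIONS)

if __name__ == "__main__":
    print(len(ALLOCATIONS), "allocations,", len(FRONT), "on the frontier")
    for (info, c), rate in zip(FRONT, marginal_rates(FRONT)):
        print(f"cost={c:6.1f}  information={info:5.2f}  d_info/d_cost to next={rate:.4f}")
```

Every allocation off the returned list can be improved in at least one objective without hurting the other, which corresponds to the interior of the cyan curve. The printed slopes differ from point to point, so the exchange rate between information and cost is not constant along the frontier.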
Specifically in the case of a single neurons, branching 13 architecture, non-uniform ion channel distributions and frequency stimulation of MD can induce long-term de- conduction states of action potentials affect the rate of pression/potentiation or in mPFC (medial PFC); how- energy consumption63. However, this electrochemical en- ever, the exact sign and magnitude of the differen- ergy of single neuron operation does not linearly scale to tial modulation of thalamo-prefrontal functions under the spent energy at networks level152,153. The total en- low and high input drive depends on the lack or pres- ergy function of neural populations will depend not only ence of Muscarinic and Nicotinic modulation14. Just as on the energy function of single neurons and their cou- thalamic injury/modulation can change the cortical ac- pling in a given neural population, but also on the flow of tivity and metabolism, cortical injuries (due to stroke information between different populations152,154. When for example) can cause an attenuation of the excita- a large pool of neurons is recruited to form multiple as- tory feedback to thalamus and lead to thalamo-cortical semblies to perform a certain computation, it is the inter- dysrhythmia148. Regardless of where the initial injury actions between the assemblies that will define the collec- has occurred, the disrupted thalamo-cortical interaction tive behavior of the discrete components. Since the cou- is conjoined with a misbalance in metabolism. The re- pled processes show additive entropy productions28, the sultant out of balance activity leads to cognitive disor- total energetic optimality of the desired function would ders that can happen in form of disrupted information depend on the feedback loops between the assemblies and processing due to cortical hypersynchrony as a results of how these feedbacks control the intrinsic energy expendi- excessive thalamic spiking68 or faulty modulation of sen- ture of a given assembly. 
These attributes are inline with sory signals and loss of the normal correlation between the general principle of modular composition of biolog- glucose metabolism in the thalamus and PFC15,65. The ical systems24,25,51. From the dynamical systems’ per- exact celullar/subcellular mechanism that lies beneath spective, to understand the operational principles (here, the joint fluctuations of firing and metabolic of cortex of large assembly of neurons), we do not need to the and thalamus is not very well understood and number of strip down the assembly to its individual component level mechanisms may act (not necessarily exclusively). For (here, individual neurons)25. As a result the optimal con- example, reduced cortical feedback may lead to thalamic trol at the functional scale of modules where the interac- hyperpolarization, and the resultant de-inactivation of tion between the system’s modules take place23,25. voltage-gated T-type Ca channels may cause the neurons 148 The constrains that we discussed above, directly trans- to switch from tonic spiking to a pathological bursting . late to the computational operations of thalamocortical Or it could be that the thalamic drive of the inhibitory system as we discussed. Instead of just trying to deal neurons in the cortex not only directly affect the corti- 34 with one fitness function at a time (where the minima cal mode of firing but also change the glycogenolysis in of the landscape would be deemed as “the” optima), astrocytes through Vasoactive intestinal peptide (VIP) 87,88 the brain has to perform a multi-objective optimization, interneurons . Interestingly, and in contrast to the finding solutions to both metabolic cost (economy) and noradrenergic afferent fibers that span horizontally across just-in-time (efficiency) computation. Thus we can infer cortical domains, VIP neurons have a bipolar architec- 88 that a unique solution does not exist for such a problem. 
ture and therefore their effect is spatially limited , likely Rather, any optimization for computational efficiency correlated to the size of the functional assemblies that are will cost us economy and any optimization for economy recruited to perform a computational task. Whichever will cost us efficiency. In such case, a multi-objective op- the exact mechanism at the cellular level is, the collec- timization pareto frontier is desirable. Pareto frontier of tive activity of modulatory thalamus and cortex drives information/cost will be the set of solutions where any the optimization that inherently can not be controlled other point in the space is objectively worse for both by the information available at the scale of single neu- of the objectives42,72,145. As a result, the optimization rons and solely in cortex. mechanism should push the system to this frontier. The To formalize multiobjective optimization, consider a iterative dynamical interaction between thalamus and set of functions, ω and λ of fT h (firing rate of thala- cortex seems to provide an elegant solution for this prob- mic cell) and fCx (firing rate of cortical cells). Uncer- lem (We discuss this in more details below). tainty (or its opposite, information) and computational In addition to these theoretical rationals, we also wish cost (a mixture of time and metabolic expense) can both to point to some observations that support the emer- be mapped to this functional space of ω (fT h, fCx) and gent optimization in the thalamo-cortical system. For λ (fT h, fCx) (Fig.6A1,2). Let’s define computational cost and information as product and linear sum of cor- example, metabolic studies have shown that following dn dn tical and thalamic activity ( αf n .βf n , θ fT h + ψ fCx ; thalamic injuries, a misbalance in cortical metabolism T h Cx dt dt ensues6,7,77,80. 
Moreover, in healthy humans (and not with α, β, θ, ψ as coefficients) to reflect the logarithmic in mood disorder patients), the metabolic rate of thala- nature of information (entropy) and the fact that biolog- mus directly relates to the power of cortical oscillations83. ical cost is an accelerating function of the cost-inducing The misbalance in cortical metabolism have been ob- variables26. The hypothetical space of cost/information served in variety of nuclei damages, but are specially is depicted in Fig.6, where top panels show indifference pronounced in mediodorsal, centre median or pulv- maps of information (A1) and cost (A2). The example inar injuries6,7. In addition, low-frequency and high- simulations and parametric plots of the cost and infor- 14 mation functions defined as above are shown in Fig.7. In each indifference map, along each iso-quant curve, the total functional attribute is the same. For example, anywhere on the η1 curve, the uncertainty (or informa- tion) in our computational engine is the same. How- ever, different iso-quant curves represent different lev- els of the functional attribute. For example, moving outward increases information (reduces uncertainty) as η1 < η2 < η3 < η4 and thus if computational cost was not a constrain, the optimal solution would have existed on η4 or further away (Fig.6 A1). In contrast, moving inward would preserve the cost (ζ1 < ζ2 < ζ3 < ζ4) if the computational engine did not have the objective of reducing uncertainty (Fig.6A2). Since information and cost are interdependent and both depend on the in- teraction between thalamus and cortex, we suggest that information/cost optimization happens through an it- erative interaction between thalamus and cortex (note Figure 7. Dynamic parameter space of thalamo- the blackboard analogy and contextual modulation dis- cortical joint optimization of information/cost. Three cussed above). 
Since we defined both information and different realization of information/cost interaction as a func- tion of thalamic and cortical activity (ω, λ) and the corre- cost as a set of iso-quant curves in the functional space of sponding pareto curves (see Fig.6 for details of this opti- ω (fT h, fCx) and λ (fT h, fCx), they can be co-represented mization construct). Pareto curve shows the optimal set of in the same space (Fig.6B). Optimal solutions for in- both cost and information that can be obtained given the bio- formation/cost optimizations are simply the solutions to physical constrains of neurons and networks connecting them. where the tangents of the iso-quant curves are equal (see Every point on the pareto frontier shows technically efficient the tangents [black dashed lines] and points A, B, C and levels for a given parameter set of ω, λ (see text for more de- D, in Fig.6B). These points create a set of optimal solu- tails). All the points inside the curve are feasible but are not tions for the tradeoff between information and cost (green maximally efficient. The slope (marginal rate of transforma- curve). Mapping of the optimal solutions to the compu- tion between cost and information) shows how in order to tational efficiency space E, gives us the pareto efficient increase information, cost has to change. The dynamic na- curve (cyan curve, Fig.6C). Anywhere inside the curve ture of interaction between thalamus and cortex enables an emergent optimization of information/cost depending on the is not pareto efficient (i.e. information gain and compu- computational problem on hand and the prior state of the tational cost can change in such a way that, collectively, system. the system can be in a better state (on the pareto curve). Points outside of the pareto efficient curve are not avail- able to the current state of the system due to the coeffi- cients of ω and λ. 
A change in these coefficients can potentially shape a different co-representation of information and cost (see Fig.7, top row, for 3 different instances of ω and λ based on different α, β, θ, ψ values), and thus a different Pareto efficient curve (see Fig.7, bottom row). These different possible Pareto frontiers can be set based on the prior state of the system and the complexity of the computational problem at hand. For example, the modulatory MD-like thalamic triggering of feedforward inhibition of layer I interneurons and layers II/III pyramidals20 may tune the cortical activity to a sustained profile of during wakefulness49. Or, in contrast to relay thalamus, where maximum responsiveness to transient signals (such as sensory stimuli onset/offset) is needed13,119, the MD-like modulatory drive may be invoked for tasks where working memory and contextual processing are needed27,106. Nonetheless, the computational efficiency of the system cannot be pushed outward indefinitely because of the system’s intrinsic biophysical constraints (neurons and their wiring). The shaded region in Fig.7, bottom row, shows this non-permissible zone.

In the defined computational efficiency space E, composed of the two variables information and cost (as the objective functions, shown in bottom panels of Fig.6 and Fig.7), solving a computational problem is represented by a decrease in uncertainty. However, any change in uncertainty has an associated cost. The first derivative of the Pareto frontier gives the “marginal rate of substitution” as $\Delta\mathrm{info}/\Delta\mathrm{cost}$. This ratio varies among different points on the Pareto efficient curve. If we take two points on the Pareto curve in the computational efficiency space, such as A and C, the computational efficiency of these two points is equal: $E_A(\eta_1, \lambda_4) = E_C(\eta_3, \lambda_2)$. The changes in efficiency of point A with respect to information and cost are the partial derivatives $\frac{\partial E_A}{\partial \mathrm{info}}$ and $\frac{\partial E_A}{\partial \mathrm{cost}}$, respectively. As a result, $\frac{\partial E_A}{\partial \mathrm{info}}\, d_{\mathrm{info}} + \frac{\partial E_A}{\partial \mathrm{cost}}\, d_{\mathrm{cost}} = 0$, meaning that although there is constant efficiency along the Pareto curve, the tradeoff between information and cost is not constant. The optimization in this space is not based on some fixed built-in algorithm or on arbitrary thresholds set by an external observer. Rather, information/cost optimization is the result of back and forth interaction between thalamus and cortex. Based on the computational perspective that we have portrayed, thalamus seems to be poised to operate as an optimizer. Thalamus receives a copy of (sensory) input while relaying it, and receives an efferent copy from the processor (cortex), while trying to efficiently bind the information from past and present and sending it back to cortex. The outcome of such emergent optimization is a Pareto front in the economy-efficiency landscape (Fig.6,7). If the cortex were the sole conductor of cognitive processing, the dynamics of the relay and cortical processing would meander in the parameter space and would not yield any optimization that can provide a feasible solution to economic and just-in-time computation. Such a system is doomed to fail, either due to metabolic costs or due to computational freeze over time, and would thus be a more or less useless cognitive engine. In contrast, with the help of an optimizer that acts as a contextual modulator, the acceptable parameters will be confined to a manifold within the parameter space. Such a regime would be a sustainable and favorable domain for cognitive computing. This property shows another important facet of a thalamo-cortical computational cognitive system and the need to move past the cortico-centric view of cognition. An important consequence of this formalization is that it provides us with testable hypotheses for objectively evaluating information and cost optimization. By careful simultaneous measurements of thalamic and cortical collective activity, during different states and under different neurotransmitter modulatory effects, one should be able to examine the distinctive interaction of cortex with MD-like versus relay thalamic nuclei. We wish to emphasize, however, that while information processing is a fundamentally energy-consuming process10,108 and one can derive theoretical estimates of the energetic cost of the activity of a population of neurons, the exact translation of bits to watts in adaptive information processing systems (such as the thalamocortical system) can only be verified experimentally37. Without proper and careful measurements, it is impossible to predict how much more reliable the collective computation could get at the expense of energy4. Likewise, determining the degree to which energy is traded for accuracy/speed (or their combination) will be a hard challenge for the experimentalists measuring the collective activity76.

CONCLUDING REMARKS: REFRAMING THALAMIC FUNCTION ABOVE AND BEYOND INFORMATION RELAY

Lately, new evidence about the possible role of thalamus has started to challenge the cortico-centric view of perception/cognition. Anatomical studies and physiological measurements have begun to unravel the importance of the Cortico-Thalamo-Cortical loops in cognitive processes8?. Under this emerging paradigm, thalamus plays two distinctive roles: a) information relay, b) modulation of cortical function133, where the neocortex does not work in isolation but is largely dependent on thalamus. In contrast to cortical networks, which operate as specialized memory devices via their local recurrent excitatory connections, the thalamus is devoid of local connections, and is instead optimized for capturing state information that is distributed across multiple cortical nodes while animals are engaged in context-dependent task switching126. This allows the thalamus to explicitly represent task context (corresponding to different combinations of cortical states), and through its unique projection patterns to the cortex, different thalamic inputs modify the effective connections between cortical neurons12,126.

Here, we started with a brief overview of the architecture of thalamus and the back and forth communication between thalamus and cortex, then provided the electrophysiological evidence of thalamic modulatory function, and concluded with a computational frame that encapsulates the architectural and functional attributes of the thalamic role in cognition. In such a frame, the computational efficiency of the cognitive computing machinery is achieved through iterative interactions between thalamus and cortex embedded in the hierarchical organization (Fig.4,5). Under this emergent view, thalamus serves not only as a relay, but also as a read/write medium for cortical processing, playing a crucial role in contextual modulation of cognition (Fig.8). Such multiscale organization of computational processes is a necessary requirement for the design of intelligent systems24,135,136. Distributed computing in biological systems in most cases operates without central control100. This is well reflected in the computational perspective that we discussed here. We suggest that through the continuous contextual modulation of cortical activity, thalamus (along with cortex) plays a significant role in emergent optimization of computational efficiency and computational cost. This phenomenon has a deep relation with phase transitions in complex networks. Different states (phases) of the network are associated with the connectivity of the computing elements (see thalamic weight/pointer and cortical function/weight modules in Fig.5). Interestingly, intrinsic properties of complex networks do not define the phase transitions in the system. Rather, the interplay of the system with its external environment shapes the landscape where phase transitions occur130. This parallel between well-studied physical systems and the neuronal networks of the thalamo-cortical system shows the importance of the interplay between thalamus and cortex in cognitive computation and optimization. The proposed frame for contextual cognitive computation and the emergent information/cost optimization in the thalamo-cortical system can guide us in designing novel AI architectures.

Figure 8. The emergent view of thalamic role in cognition. (Top) In the traditional view, serial processing of information confines the role of thalamus to only a relay station. (Bottom) The view discussed in this manuscript considers thalamus as a key player in cognition, above and beyond relay to sensory cortices. By combining the efferent readout from cortex with sensory afferents, MD-like thalamic nuclei modulate further activity of the higher cortex. The contextual modulation enabled by MD is composed of distinctively parallel operations (individual circles represent the non-recurrent nature of these processes due to lack of local excitatory connections). Under this view, and the computational operatives discussed here, the thalamo-cortical system (and not just cortex) is in charge of contextual cognitive computing. The computation enabled by Pulvinar/PO-like nuclei is different from that of LGN and also from that of MD-like nuclei.

ACKNOWLEDGMENTS

We wish to thank Michael Halassa for helpful discussions.

REFERENCES

1. Alcaraz, F., Fresno, V., Marchand, A. R., Kremer, E. J., Coutureau, E., and Wolff, M. (2018). Thalamocortical and corticothalamic pathways differentially contribute to goal-directed behaviors in the rat. eLife, 7.
2. Alelú-Paz, R. and Giménez-Amaya, J. M. (2008). The mediodorsal thalamic nucleus and schizophrenia. Journal of psychiatry & neuroscience: JPN, 33(6):489.
3. Appeltant, L., Soriano, M. C., Van der Sande, G., Danckaert, J., Massar, S., Dambre, J., Schrauwen, B., Mirasso, C. R., and Fischer, I. (2011). Information processing using a single dynamical node as complex system. Nature Communications, 2:468.
4. Ay, N., Flack, J., and Krakauer, D. C. (2007). Robustness and complexity co-constructed in multimodal signalling networks. Philosophical Transactions of the Royal Society B: Biological Sciences, 362(1479):441–447.
5. Baddeley, R., Abbott, L. F., Booth, M. C. A., Sengpiel, F., Freeman, T., Wakeman, E. A., and Rolls, E. T. (1997). Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proceedings of the Royal Society B: Biological Sciences, 264(1389):1775–1783.
6. Baron, J., D’antona, R., Pantano, P., Serdaru, M., Samson, Y., and Bousser, M. (1986). Effects of thalamic stroke on energy metabolism of the cerebral cortex: a positron tomography study in man. Brain, 109(6):1243–1259.
7. Baron, J., Levasseur, M., Mazoyer, B., Legault-Demare, F., Mauguiere, F., Pappata, S., Jedynak, P., Derome, P., Cambier, J., and Tran-Dinh, S. (1992). Thalamocortical diaschisis: positron emission tomography in humans. Journal of Neurology, Neurosurgery & Psychiatry, 55(10):935–942.
8. Basso, M. A., Uhlrich, D., and Bickford, M. E. (2005). Cortical Function: A View from the Thalamus. Neuron, 45(4):485–488.
9. Baxter, M. (2013). Mediodorsal thalamus and cognition in non-human primates. Frontiers in Systems Neuroscience, 7:38.
10. Bennett, C. H. (2003). Notes on landauer’s principle, reversible computation, and maxwell’s demon. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 34(3):501–510. Quantum Information and Computation.
11. Bickford, M. E. (2016). Thalamic Circuit Diversity: Modulation of the Driver/Modulator Framework. Frontiers in Neural Circuits, 9.
12. Bolkan, S. S., Stujenske, J. M., Parnaudeau, S., Spellman, T. J.,

Rauffenbart, C., Abbas, A. I., Harris, A. Z., Gordon, J. A., and Kellendonk, C. (2017). Thalamic projections sustain prefrontal activity during working memory maintenance. Nature Neuroscience, 20(7):987–996.
13. Bruno, R. M. and Sakmann, B. (2006). Cortex is driven by weak but synchronously active thalamocortical synapses. Science, 312(5780):1622–1627.
14. Bueno-Junior, L. S., Lopes-Aguiar, C., Ruggiero, R. N., Romcy-Pereira, R. N., and Leite, J. P. (2012). Muscarinic and nicotinic modulation of thalamo-prefrontal cortex synaptic plasticity in vivo. PLOS ONE, 7:1–11.
15. Byne, W., Buchsbaum, M. S., Kemether, E., Hazlett, E. A., Shinwari, A., Mitropoulou, V., and Siever, L. J. (2001). Magnetic resonance imaging of the thalamic mediodorsal nucleus and pulvinar in schizophrenia and schizotypal personality disorder. Archives of General Psychiatry, 58(2):133–140.
16. Cardin, J. A., Palmer, L. A., and Contreras, D. (2008). Cellular mechanisms underlying stimulus-dependent gain modulation in primary visual cortex neurons in vivo. Neuron, 59:150–160.
17. Clasca, F., Rubio-Garrido, P., and Jabaudon, D. (2012). Unveiling the diversity of thalamocortical neuron subtypes. Eur J Neurosci, 35:1524–32.
18. Collins, D. P., Anastasiades, P. G., Marlin, J. J., and Carter, A. G. (2018). Reciprocal circuits linking the prefrontal cortex with dorsal and ventral thalamic nuclei. Neuron, 98(2):366–379.e4.
19. Corbetta, M. (1998). Frontoparietal cortical networks for directing attention and the eye to visual locations: identical, independent, or overlapping neural systems? Proceedings of National Academy of Science, 95:831–8.
20. Cruikshank, S. J., Ahmed, O. J., Stevens, T. R., Patrick, S. L., Gonzalez, A. N., Elmaleh, M., and Connors, B. W. (2012). Thalamic control of layer 1 circuits in prefrontal cortex. Journal of Neuroscience, 32(49):17813–17823.
21. Cruikshank, S. J., Lewis, T. J., and Connors, B. W. (2007). Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nature Neuroscience, 10.
22. Crunelli, V. and Leresche, N. (1991). A role for gabab receptors in excitation and inhibition of thalamocortical cells. Trends in Neurosciences, 14(1):16–21.
23. Csete, M. E. and Doyle, J. C. (2002). Reverse engineering of biological complexity. Science, 295(5560):1664–1669.
24. Dehghani, N. (2017). Design of the Artificial: lessons from the biological roots of general intelligence. ArXiv.
25. Dehghani, N. (2018). Theoretical principles of multiscale spatiotemporal control of neuronal networks: A complex systems perspective. Frontiers in Computational Neuroscience, 12:81.
26. Dekel, E. and Alon, U. (2005). Optimality and evolutionary tuning of the expression level of a protein. Nature, 436(7050):588–592.
27. Delevich, K., Tucciarone, J., Huang, Z. J., and Li, B. (2015). The mediodorsal thalamus drives feedforward inhibition in the anterior cingulate cortex via parvalbumin interneurons. Journal of Neuroscience, 35(14):5743–5753.
28. Demirel, Y. (2011). Energy Coupling, pages 419–440. MIT Press.
29. Deniau, J. M. and Chevalier, G. (1985). Disinhibition as a basic process in the expression of striatal functions. II. The striato-nigral influence on thalamocortical cells of the ventromedial thalamic nucleus. Brain Research, 334(2):227–233.
30. Douglas, R. J. and Martin, K. A. (2007). Mapping the Matrix: The Ways of Neocortex. Neuron, 56(2):226–238.
31. El-Samad, H. J., Goff, J. P., and Khamash, M. H. (2002). Calcium Homeostasis and Parturient Hypocalcemia: An Integral Feedback Perspective. Journal of Theoretical Biology, 214(1):17–29.
32. El-Samad, H. J., Kurata, H., Doyle, J. C., Gross, C. A., and Khamash, M. H. (2005). Surviving heat shock: Control strategies for robustness and performance. Proceedings of the National Academy of Sciences, 102(8):2736–2741.
33. Enel, P., Procyk, E., Quilodran, R., and Dominey, P. F. (2016). Reservoir Computing Properties of Neural Dynamics in Prefrontal Cortex. PLOS Computational Biology, 12(6):e1004967.
34. Fan, D., Duan, L., Wang, Q., and Luan, G. (2017). Combined effects of feedforward inhibition and excitation in thalamocortical circuit on the transitions of epileptic seizures. Frontiers in Computational Neuroscience, 11:59.
35. Felleman, D. J. and Essen, D. C. V. (1991). Distributed Hierarchical Processing in the Primate Cerebral Cortex. Cerebral Cortex, 1(1):1–47.
36. FitzGibbon, T., Eriköz, B., Grünert, U., and Martin, P. R. (2015). Analysis of the lateral geniculate nucleus in dichromatic and trichromatic marmosets. Journal of Comparative Neurology, 523(13):1948–1966.
37. Flack, J. (2017). Life’s Information Hierarchy, pages 283–302. Cambridge University Press.
38. Funahashi, K. and Nakamura, Y. (1993). Approximation of dynamical systems by continuous time recurrent neural networks. Neural Networks, 6(6):801–806.
39. Fusi, S., Miller, E. K., and Rigotti, M. (2016). Why neurons mix: high dimensionality for higher cognition. Current Opinion in Neurobiology, 37:66–74.
40. Gabernet, L., Jadhav, S. P., Feldman, D. E., Carandini, M., and Scanziani, M. (2005). Somatosensory integration controlled by dynamic thalamocortical feed-forward inhibition. Neuron, 48(2):315–327.
41. Giguere, M. and Goldman-Rakic, P. S. (1988). Mediodorsal nucleus: Areal, laminar, and tangential distribution of afferents and efferents in the frontal lobe of rhesus monkeys. Journal of Comparative Neurology, 277(2):195–213.
42. Godfrey, P., Shipley, R., and Gryz, J. (2006). Algorithms and analyses for maximal vector computation. The VLDB Journal, 16(1):5–28.
43. Goldberg, J. H., Farries, M. A., and Fee, M. S. (2013). Basal ganglia output to the thalamus: still a paradox. Trends in Neurosciences, 36(12):695–705.
44. Goldman-Rakic, P. S. and Porrino, L. J. (1985). The primate mediodorsal (md) nucleus and its projection to the frontal lobe. Journal of Comparative Neurology, 242(4):535–560.
45. Grant, E., Hoerder-Suabedissen, A., and Molnár, Z. (2012). Development of the Corticothalamic Projections. Frontiers in Neuroscience, 6.
46. Groh, A., Bokor, H., Mease, R. A., Plattner, V. M., Hangya, B., Stroh, A., Deschenes, M., and Acsády, L. (2014). Convergence of Cortical and Sensory Driver Inputs on Single Thalamocortical Cells. Cerebral Cortex, 24(12):3167–3179.
47. Halassa, M. M. and Acsády, L. (2016). Thalamic Inhibition: Diverse Sources Diverse Scales. Trends in Neurosciences, 39(10):680–693.
48. Halassa, M. M. and Kastner, S. (2017). Thalamic functions in distributed cognitive control. Nature Neuroscience, 20(12):1669–1679.
49. Harris, K. D. and Thiele, A. (2011). Cortical state and attention. Nature Reviews Neuroscience, 12:509.
50. Harth, E. M., Unnikrishnan, K. P., and Pandya, A. S. (1987). The inversion of sensory processing by feedback pathways: a model of visual cognitive functions. Science, 237(4811):184–187.
51. Hartwell, L. H., Hopfield, J. J., Leibler, S., and Murray, A. W. (1999). From molecular to modular cell biology. Nature, 402:C47.
52. Heeger, D. J. (2017). Theory of cortical function. Proceedings of the National Academy of Sciences, 114(8):1773–1782.
53. Hubel, D. H. and Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. The Journal of Physiology, 148(3):574–591.
54. Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1):106–154.
55. Ikeda, K. and Matsumoto, K. (1987). High-dimensional chaotic behavior in systems with time-delayed feedback. Physica D:

Nonlinear Phenomena, 29(1-2):223–235.
56. Jaeger, H. (2001). The echo state approach to analysing and training recurrent neural networks. Technical report.
57. Jaeger, H. (2007). Echo state network. Scholarpedia, 2(9):2330.
58. Jazayeri, M. and Shadlen, M. N. (2015). A Neural Mechanism for Sensing and Reproducing a Time Interval. Current Biology, 25(20):2599–2609.
59. Jones, E. G. (1981). Functional subdivision and synaptic organization of the mammalian thalamus. Int Rev Physiol, 25:173–245.
60. Jones, E. G. (1985). Principles of Thalamic Organization. In The Thalamus, pages 85–149. Springer US.
61. Jones, E. G. (1998). Viewpoint: the core and matrix of thalamic organization. Neuroscience, 85(2):331–345.
62. Jones, E. G. (2002). Thalamic circuitry and thalamocortical synchrony. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 357(1428):1659–1673.
63. Ju, H., Hines, M. L., and Yu, Y. (2016). Cable energy function of cortical axons. Scientific Reports, 6:29686.
64. Kakei, S., Na, J., and Shinoda, Y. (2001). Thalamic terminal morphology and distribution of single corticothalamic axons originating from layers 5 and 6 of the cat motor cortex. The Journal of Comparative Neurology, 437(2):170–185.
65. Katz, M., Buchsbaum, M. S., Siegel Jr, B. V., Wu, J., Haier, R. J., and Bunney Jr, W. E. (1996). Correlational patterns of cerebral glucose metabolism in never-medicated schizophrenics. Neuropsychobiology, 33(1):1–11.
66. Khubieh, A., Ratté, S., Lankarany, M., and Prescott, S. A. (2016). Regulation of cortical dynamic range by background synaptic noise and feedforward inhibition. Cerebral Cortex, 26(8):3357–3369.
67. Kim, H. G., Beierlein, M., and Connors, B. W. (1995). Inhibitory control of excitable dendrites in neocortex. Journal of Neurophysiology, 74(4):1810–1814.
68. Kim, J., Woo, J., Park, Y.-G., Chae, S., Jo, S., Choi, J. W., Jun, H. Y., Yeom, Y. I., Park, S. H., Kim, K. H., Shin, H.-S., and Kim, D. (2011). Thalamic t-type ca2+ channels mediate frontal lobe dysfunctions caused by a hypoxia-like damage in the prefrontal cortex. Journal of Neuroscience, 31(11):4063–4073.
69. Komura, Y., Nikkuni, A., Hirashima, N., Uetake, T., and Miyamoto, A. (2013). Responses of pulvinar neurons reflect a subject's confidence in visual categorization. Nature Neuroscience, 16(6):749–755.
70. Krettek, J. E. and Price, J. L. (1977). The cortical projections of the mediodorsal nucleus and adjacent thalamic nuclei in the rat. Journal of Comparative Neurology, 171(2):157–191.
71. Krishna, S., Maslov, S., and Sneppen, K. (2007). UV-Induced Mutagenesis in Escherichia coli SOS Response: A Quantitative Model. PLoS Computational Biology, 3(3):e41.
72. Kung, H.-T., Luccio, F., and Preparata, F. P. (1975). On Finding the Maxima of a Set of Vectors. Journal of the ACM, 22(4):469–476.
73. Kuramoto, E., Pan, S., Furuta, T., Tanaka, Y. R., Iwai, H., Yamanaka, A., Ohno, S., Kaneko, T., Goto, T., and Hioki, H. (2017). Individual mediodorsal thalamic neurons project to multiple areas of the rat prefrontal cortex: A single neuron-tracing study using virus vectors. Journal of Comparative Neurology, 525(1):166–185.
74. Kuroda, M., Yokofujita, J., and Murakami, K. (1998). An ultrastructural study of the neural circuit between the prefrontal cortex and the mediodorsal nucleus of the thalamus. Progress in Neurobiology, 54(4):417–458.
75. Kuroda, M., Yokofujita, J., Oda, S., and Price, J. L. (2004). Synaptic relationships between axon terminals from the mediodorsal thalamic nucleus and γ-aminobutyric acidergic cortical cells in the prelimbic cortex of the rat. Journal of Comparative Neurology, 477(2):220–234.
76. Lan, G., Sartori, P., Neumann, S., Sourjik, V., and Tu, Y. (2012). The energy–speed–accuracy trade-off in sensory adaptation. Nature Physics, 8:422.
77. Larson, C. L., Davidson, R. J., Abercrombie, H. C., Ward, R. T., Schaefer, S. M., Jackson, D. C., Holden, J. E., and Perlman, S. B. (1998). Relations between pet-derived measures of thalamic glucose metabolism and eeg alpha power. Psychophysiology, 35(2):162–169.
78. Laughlin, S. B. and Sejnowski, T. J. (2003). Communication in neuronal networks. Science, 301:1870–4.
79. Lennie, P. (2003). The Cost of Cortical Computation. Current Biology, 13(6):493–497.
80. Levasseur, M., Baron, J., Sette, G., Legault-Demare, F., Pappata, S., Mauguiere, F., Benoit, N., Dinh, S. T., Degos, J., Laplane, D., et al. (1992). Brain energy metabolism in bilateral paramedian thalamic infarcts: a positron emission tomography study. Brain, 115(3):795–807.
81. Lewis, D. A., Curley, A. A., Glausier, J. R., and Volk, D. W. (2012). Cortical parvalbumin interneurons and cognitive dysfunction in schizophrenia. Trends in Neurosciences, 35(1):57–67.
82. Lin, H. W., Tegmark, M., and Rolnick, D. (2017). Why Does Deep and Cheap Learning Work So Well? Journal of Statistical Physics, 168(6):1223–1247.
83. Lindgren, K. A., Larson, C. L., Schaefer, S. M., Abercrombie, H. C., Ward, R. T., Oakes, T. R., Holden, J. E., Perlman, S. B., Benca, R. M., and Davidson, R. J. (1999). Thalamic metabolic rate predicts eeg alpha power in healthy control subjects but not in depressed patients. Biological Psychiatry, 45(8):943–952.
84. Ma, W. J. and Jazayeri, M. (2014). Neural Coding of Uncertainty and Probability. Annual Review of Neuroscience, 37(1):205–220.
85. Maass, W., Natschlaeger, T., and Markram, H. (2003). Computational Models for Generic Cortical Microcircuits. In Computational Neuroscience. Chapman and Hall/CRC.
86. Maass, W., Natschläger, T., and Markram, H. (2002). Real-Time Computing Without Stable States: A New Framework for Neural Computation Based on Perturbations. Neural Computation, 14(11):2531–2560.
87. Magistretti, P. J. (2006). Neuron–glia metabolic coupling and plasticity. Journal of Experimental Biology, 209(12):2304–2311.
88. Magistretti, P. J. and Allaman, I. (2015). A cellular perspective on brain energy metabolism and functional imaging. Neuron, 86(4):883–901.
89. Mante, V., Sussillo, D., Shenoy, K. V., and Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474):78–84.
90. Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an Integration of Deep Learning and Neuroscience. Frontiers in Computational Neuroscience, 10.
91. Marenco, S., Stein, J. L., Savostyanova, A. A., Sambataro, F., Tan, H.-Y., Goldman, A. L., Verchinski, B. A., Barnett, A. S., Dickinson, D., Apud, J. A., Callicott, J. H., Meyer-Lindenberg, A., and Weinberger, D. R. (2012). Investigation of Anatomical Thalamo-Cortical Connectivity and fMRI Activation in Schizophrenia. Neuropsychopharmacology, 37(2):499–507.
92. Mathers, L. H. (1972). The synaptic organization of the cortical projection to the pulvinar of the squirrel monkey. The Journal of Comparative Neurology, 146(1):43–59.
93. Mesulam, M. M. (1990). Large-scale neurocognitive networks and distributed processing for attention, language, and memory. Ann Neurol, 28:597–613.
94. Mitchell, A. S. (2015). The mediodorsal thalamus as a higher order thalamic relay nucleus important for learning and decision-making. Neurosci Biobehav Rev, 54:76–88.
95. Mitelman, S. A., Byne, W., Kemether, E. M., Hazlett, E. A., and Buchsbaum, M. S. (2005). Metabolic Disconnection Between the Mediodorsal Nucleus of the Thalamus and Cortical Brodmann’s Areas of the Left Hemisphere in Schizophrenia. American Journal of Psychiatry, 162(9):1733–1735.
96. Mumford, D. (1991). On the computational architecture of the neocortex. I: The role of the thalamo-cortical loop. Biological

Cybernetics, 65(2):135–145.
97. Mumford, D. (1992). On the computational architecture of the neocortex. II The role of cortico-cortical loops. Biological Cybernetics, 66(3):241–251.
98. Nair, A., Treiber, J. M., Shukla, D. K., Shih, P., and Müller, R.-A. (2013). Impaired thalamocortical connectivity in autism spectrum disorder: a study of functional and anatomical connectivity. Brain, 136(6):1942–1955.
99. Nakajima, M. and Halassa, M. M. (2017). Thalamic control of functional cortical connectivity. Current Opinion in Neurobiology, 44:127–131.
100. Navlakha, S. and Bar-Joseph, Z. (2014). Distributed information processing in biological and computational systems. Communications of the ACM, 58(1):94–102.
101. Newell, A. (1962). Some problems of basic organization in problem solving programs.
102. Nii, P. (1986). The Blackboard Model of Problem Solving and the Evolution of Blackboard Architectures. The AI Magazine, 7(2):38–53.
103. Niven, J. E. and Laughlin, S. B. (2008). Energy limitation as a selective pressure on the evolution of sensory systems. Journal of Experimental Biology, 211(11):1792–1804.
104. Ohno, S., Kuramoto, E., Furuta, T., Hioki, H., Tanaka, Y., Fujiyama, F., Sonomura, T., Uemura, M., Sugiyama, K., and Kaneko, T. (2012). A morphological analysis of thalamocortical axon fibers of rat posterior thalamic nuclei: a single neuron tracing study with viral vectors. Cereb Cortex, 22:2840–57.
105. Parnaudeau, S., Bolkan, S. S., and Kellendonk, C. (2017). The Mediodorsal Thalamus: An Essential Partner of the Prefrontal Cortex for Cognition. Biological Psychiatry.
106. Parnaudeau, S., O’Neill, P.-K., Bolkan, S. S., Ward, R. D., Abbas, A. I., Roth, B. L., Balsam, P. D., Gordon, J. A., and Kellendonk, C. (2013). Inhibition of mediodorsal thalamus disrupts thalamofrontal connectivity and cognition. Neuron, 77(6):1151–1162.
107. Parnaudeau, S., Taylor, K., Bolkan, S. S., Ward, R. D., Balsam, P. D., and Kellendonk, C. (2015). Mediodorsal Thalamus Hypofunction Impairs Flexible Goal-Directed Behavior. Biological Psychiatry, 77(5):445–453.
108. Parrondo, J. M. R., Horowitz, J. M., and Sagawa, T. (2015). Thermodynamics of information. Nature Physics, 11:131.
109. Partlow, G. D., Colonnier, M., and Szabo, J. (1977). Thalamic projections of the in the rhesus monkey, Macaca mulatta. A light and electron microscopic study. The Journal of Comparative Neurology, 171(3):285–317.
110. Pelzer, P., Horstmann, H., and Kuner, T. (2017). Ultrastructural and functional properties of a giant synapse driving the piriform cortex to mediodorsal thalamus projection. Frontiers in Synaptic Neuroscience, 9:3.
111. Pinault, D. (2004). The thalamic reticular nucleus: structure function and concept. Brain Research Reviews, 46(1):1–31.
112. Popken, G. J., Bunney, W. E., Potkin, S. G., and Jones, E. G. (2000). Subnucleus-specific loss of neurons in medial thalamus of schizophrenics. Proceedings of the National Academy of Sciences, 97(16):9276–9280.
113. Prescott, S. A. and De Koninck, Y. (2003). Gain control of firing rate by shunting inhibition: Roles of synaptic noise and dendritic saturation. Proceedings of the National Academy of Sciences, 100(4):2076–2081.
114. Purushothaman, G., Marion, R., Li, K., and Casagrande, V. (2012). Gating and control of primary visual cortex by pulvinar. Nat Neurosci, 15:905–12.
115. Raczkowski, D. and Fitzpatrick, D. (1990). Terminal arbors of individual physiologically identified geniculocortical axons in the tree shrew's striate cortex. The Journal of Comparative Neurology, 302(3):500–514.
116. Rigotti, M., Barak, O., Warden, M. R., Wang, X.-J., Daw, N. D., Miller, E. K., and Fusi, S. (2013). The importance of mixed selectivity in complex cognitive tasks. Nature, 497(7451):585–590.
117. Rigotti, M., Rubin, D. B. D., Wang, X.-J., and Fusi, S. (2010). Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Frontiers in Computational Neuroscience, 4.
118. Rikhye, R. V., Wimmer, R. D., and Halassa, M. M. (2018). Towards an integrative theory of thalamic function. Nature Neuroscience (In press), xxx(xxx):xxx–xxx.
119. Rose, H. J. and Metherate, R. (2005). Auditory thalamocortical transmission is reliable and temporally precise. Journal of Neurophysiology, 94(3):2019–2030.
120. Rotaru, D. C., Barrionuevo, G., and Sesack, S. R. (2005). Mediodorsal thalamic afferents to layer iii of the rat prefrontal cortex: Synaptic relationships to subclasses of interneurons. Journal of Comparative Neurology, 490(3):220–238.
121. Rouiller, E. M. and Welker, E. (2000). A comparative analysis of the morphology of corticothalamic projections in mammals. Brain Research Bulletin, 53(6):727–741.
122. Rovo, Z., Ulbert, I., and Acsády, L. (2012). Drivers of the primate thalamus. J Neurosci, 32:17894–908.
123. Saalmann, Y. B. (2014). Intralaminar and medial thalamic influence on cortical synchrony, information transmission and cognition. Frontiers in Systems Neuroscience, 8:83.
124. Saalmann, Y. B. and Kastner, S. (2011). Cognitive and perceptual functions of the visual thalamus. Neuron, 71(2):209–223.
125. Saalmann, Y. B. and Kastner, S. (2015). The cognitive thalamus. Frontiers in Systems Neuroscience, 9:39.
126. Schmitt, L. I., Wimmer, R. D., Nakajima, M., Happ, M., Mofakham, S., and Halassa, M. M. (2017). Thalamic amplification of cortical connectivity sustains attentional control. Nature, 545(7653):219–223.
127. Schwartz, M. L., Dekker, J. J., and Goldman-Rakic, P. S. (1991). Dual mode of corticothalamic synaptic termination in the mediodorsal nucleus of the rhesus monkey. Journal of Comparative Neurology, 309(3):289–304.
128. Scott, B. B., Constantinople, C., Akrami, A., Hanks, T. D., Brody, C. D., and Tank, D. W. (2017). Fronto-parietal Cortical Circuits Encode Accumulated Evidence with a Diversity of Timescales. Neuron, 95:385–398.e5.
129. Selfridge, O. G. (1959). Pandemonium: a paradigm for learning in . In Blake, D. V. and Uttley, A. M., editors, Proceedings of the Symposium on Mechanisation of Thought Processes, pages 513–526, London. National Physical Laboratory.
130. Seoane, L. F. and Solé, R. (2015). Phase transitions in Pareto optimal complex networks. Physical Review E, 92(3).
131. Sherman, S. M. (2016). Thalamus plays a central role in ongoing cortical functioning. Nature Neuroscience, 16(4):533–541.
132. Sherman, S. M. and Guillery, R. W. (2002). The role of the thalamus in the flow of information to the cortex. Philosophical Transactions of the Royal Society B: Biological Sciences, 357(1428):1695–1708.
133. Sherman, S. M. and Guillery, R. W. (2013). Functional Connections of Cortical Areas. The MIT Press.
134. Shoham, S., O’Connor, D. H., and Segev, R. (2006). How silent is the brain: is there a “dark matter” problem in neuroscience? Journal of Comparative Physiology A, 192(8):777–784.
135. Simon, H. A. (1962). The Architecture of Complexity. Proceedings of the American Philosophical Society, 106(6):467–482.
136. Simon, H. A. (1969). The Sciences of the Artificial, chapter The Architecture of Complexity. MIT Press.
137. Sipper, M. (1999). The emergence of cellular computing. Computer, 32(7):18–26.
138. Solé, R. V. and Macia, J. (2013). Expanding the landscape of biological computation with synthetic multicellular consortia. Natural Computing, 12(4):485–497.
139. Spacek, J. and Lieberman, A. (1974). Ultrastructure and three-dimensional organization of synaptic glomeruli in rat somatosensory thalamus. Journal of , 117(Pt 3):487.
140. Steriade, M. and Deschenes, M. (1984). The thalamus as a neuronal oscillator. Brain Research Reviews, 8(1):1–63.

141 Steriade, M., Domich, L., and Oakson, G. (1986). Reticularis thalami neurons revisited: activity changes during shifts in states of vigilance. J Neurosci, 6:68–81.
142 Steriade, M. and Llinás, R. R. (1988). The functional states of the thalamus and the associated neuronal interplay. Physiological Reviews, 68(3):649–742.
143 Steriade, M. and Pare, D. (2007). Morphology and electroresponsive properties of thalamic neurons. In Gating in Cerebral Networks, pages 1–26. Cambridge University Press.
144 Steriade, M., Parent, A., and Hada, J. (1984). Thalamic projections of nucleus reticularis thalami of cat: A study using retrograde transport of horseradish peroxidase and fluorescent tracers. The Journal of Comparative Neurology, 229(4):531–547.
145 Szekely, P., Sheftel, H., Mayo, A., and Alon, U. (2013). Evolutionary Tradeoffs between Economy and Effectiveness in Biological Homeostasis Systems. PLoS Computational Biology, 9(8):e1003163.
146 Uhlhaas, P. J. and Singer, W. (2010). Abnormal neural oscillations and synchrony in schizophrenia. Nature Reviews Neuroscience, 11:100.
147 Urbain, N. and Deschênes, M. (2007). Motor Cortex Gates Vibrissal Responses in a Thalamocortical Projection Pathway. Neuron, 56(4):714–725.
148 van Wijngaarden, J. B. G., Zucca, R., Finnigan, S., and Verschure, P. F. M. J. (2016). The impact of cortical lesions on thalamo-cortical network dynamics after acute ischaemic stroke: A combined experimental and theoretical study. PLOS Computational Biology, 12(8):1–16.
149 Viaene, A. N., Petrof, I., and Sherman, S. M. (2011a). Properties of the thalamic projection from the posterior medial nucleus to primary and secondary somatosensory cortices in the mouse. Proceedings of the National Academy of Sciences, 108(44):18156–18161.
150 Viaene, A. N., Petrof, I., and Sherman, S. M. (2011b). Synaptic properties of thalamic input to the subgranular layers of primary somatosensory and auditory cortices in the mouse. Journal of Neuroscience, 31(36):12738–12747.
151 Wang, J., Narain, D., Hosseini, E. A., and Jazayeri, M. (2018). Flexible timing by temporal scaling of cortical responses. Nature Neuroscience.
152 Wang, R. and Wang, Z. (2014). Energy distribution property and energy coding of a structural neural network. Frontiers in Computational Neuroscience, 8:14.
153 Wang, R., Zhang, Z., and Chen, G. (2009). Energy coding and energy functions for local activities of the brain. Neurocomputing, 73(1):139–150.
154 Wang, R. and Zhu, Y. (2016). Can the activities of the large scale cortical network be expressed by neural energy? a brief review. Cognitive neurodynamics, 10(1):1–5.
155 Woodward, N. D., Giraldo-Chica, M., Rogers, B., and Cascio, C. J. (2017). Thalamocortical Dysconnectivity in Autism Spectrum Disorder: An Analysis of the Autism Brain Imaging Data Exchange. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 2(1):76–84.
156 Yamins, D. L. K. and DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3):356–365.
157 Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., and DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624.
158 Zhou, H., Schafer, R. J., and Desimone, R. (2016). Pulvinar-cortex interactions in vision and attention. Neuron, 89(1):209–220.
159 Zikopoulos, B. and Barbas, H. (2007). Parallel driving and modulatory pathways link the prefrontal cortex and thalamus. PLOS ONE, 2(9):1–19.