Toward a Neurobiologically Informed Framework

MH6445 Year 2 Progress Report 1 Toward a Neurobiologically Informed Framework for Modeling Human Cognition

Interdisciplinary Behavioral Science Center Grant P50-MH64445

Year 2 Progress Report May 2004

Our Interdisciplinary Behavioral Science Center is dedicated to the integration of principles and findings from neuroscience into an evolving theoretical framework for modeling human cognition. This framework---parallel distributed processing---emerged in the 1980's as an alternative to modular and symbolic approaches. The framework treats human cognition essentially as a set of emergent phenomena, arising from the interactions of many simple processing units, and accounts naturally for the inherently adaptive and context sensitive character of human cognitive function. Work within this framework over the past 20 years has led to explicit models that provide detailed accounts of many aspects of cognitive function. These models have provided insights into the role of context in perception, the basis of human sensitivity to systematic structure in experience, and the functional organization of a wide range of cognitive processes, including aspects of visual perception, attention, memory, language, and the control of cognition and action. There are, however, gaps in our understanding of the mechanisms, and there are controversies concerning the mechanistic basis of many specific aspects of cognition. There are three general aims of our research:

Center Aim 1. Explicit models embodying emergent accounts of cognition. An overall aim of our Center is to investigate the utility of treating cognition as a set of emergent phenomena. The approach stands in stark contrast to the modular view, in which cognition arises from a simpler concatenation of separate processes that can be encapsulated within functionally isolable subsystems. One reason for the appeal of the modular view is that it is conceptually tractable; the alternative that we offer is more complex and less intuitive. To make progress within our approach, it is necessary to formulate explicit accounts of particular target phenomena, implement these accounts in computational models, compare them to experimental data, and evaluate their successes and failures. It is only in this way that we can assess the accuracy of a specific account and understand how it can be improved upon.

Center Aim 2. Principles of learning, processing, and representation. Our framework relies on principles characterizing processing, representation, and learning which require further specification. We will place particular emphasis on the principles of learning, which in our framework occurs via connection adjustment. For example, we will re-consider the use of Hebbian learning rules, which are in line with considerable biological evidence but which had previously been neglected in our framework due to apparent computational limitations. We will also focus on the assumptions underlying the principles of processing and representation, including the use of distributed representations in which many processing elements participate in the representation of a particular cognitive entity; the organization of excitatory and inhibitory interactions among the processing elements; and the presence of intrinsic variability in these processes.

Center Aim 3. Incorporation of insights from neuroscience. In the past, the work of our group has concentrated on developing explicit computational models and assessing them against behavioral evidence. An important new direction is to incorporate neuroscience into the framework across all of the projects in our Center. Relative to other modeling approaches addressing human cognition, our framework is ideally suited to this development, since the basic elements of our models are analogous to the neurons and synapses that form the substrate of information processing in the brain. This new direction will take three forms: (1) We will draw on neuroscience to inform the specification of the principles and to constrain the development of models of specific phenomena; (2) we will compare the performance of the models to neurobiological as well as behavioral data; and (3) we will carry out experiments using neuroscience methods to test predictions of the models and to provide additional data that will shape future model development.
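As a concrete illustration of the "connection adjustment" described under Center Aim 2, the following is a minimal sketch of a plain Hebbian weight update, in which a connection grows in proportion to the product of pre- and postsynaptic activity. The function name, learning rate, and activity values are assumptions chosen for illustration; this is not the specific learning rule used in the Center's models.

```python
def hebbian_update(weights, pre, post, lr=0.1):
    """Return new weights after one Hebbian step.

    weights[i][j] connects presynaptic unit j to postsynaptic unit i.
    The change is lr * post[i] * pre[j] (illustrative, unnormalized form).
    """
    return [
        [w + lr * post_i * pre_j for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


# Hypothetical activities: three input units, two output units.
weights = [[0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0]]
pre = [1.0, 0.0, 1.0]
post = [1.0, 0.0]

weights = hebbian_update(weights, pre, post)
print(weights)
```

Only connections between co-active units change; connections into the silent output unit, or from the silent input unit, are left untouched.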

These aims will be addressed in eight projects. Each of the first seven designates a specific domain or aspect of human cognitive function, while the eighth examines the theoretical foundations of the framework itself. All projects are guided by observations arising from both behavior and neuroscience; some involve experimental investigations while others focus mostly on modeling but are coordinated with ongoing experimental investigations funded outside the Center. The eight sections that follow address progress in each of these projects. In addition to the eight projects, our Center also has a Core that provides basic administrative functions, shared computing resources, bridging across projects, and interdisciplinary research training for young investigators. A final section reports on the activities in the Core, emphasizing the progress of an interdisciplinary research trainee therein supported.

Project 1: Functional and Neural Organization of Semantic Memory

Investigators: David C. Plaut, Karalyn Patterson, Cathy J. Price, Matthew A. Lambon Ralph

This project is directed at understanding how the brain represents and processes conceptual knowledge about words, objects, actions, and people, through the coordinated use of functional neuroimaging, behavioral studies of brain-damaged patients and normal subjects, and computational modeling. We adopt the perspective that the fundamental role of semantic knowledge is to mediate among perceptual, linguistic, and action-based representations, abstracting away from the surface structure of each modality to capture the underlying relationships among entities in the world. Considerable evidence suggests that semantics is not a single, undifferentiated system; rather, a number of brain regions---largely within the temporal lobe but extending into parietal and frontal areas in some respects---appear to be partially specialized for different types of semantic information. Moreover, different categories of entities (e.g., animals vs. artifacts, objects vs. people) place distinctive demands on various parts of the system. This evidence has classically been interpreted as implicating modality- and/or category-specific modules within semantics. Our contrasting hypothesis, drawing on all three Center Aims, is that the relevant functional specialization within semantics is far more graded (Center Aim 1), emerging through the operation of two types of constraints on learning: 1) differences in the nature of the statistical structure among the surface representations of various categories of entities (Center Aim 2); and 2) differences in connectivity among brain areas, both within and between hemispheres (Center Aim 3).

We have instantiated some aspects of this perspective in PDP models that account for a variety of empirical findings. Extending the approach into a comprehensive account of semantic memory requires tightly coordinated empirical and computational work to determine more precisely the contribution of different brain areas to representing different types of semantic information, and to specify how this information interacts to support performance across a wide range of semantic tasks. Accordingly, the project has three specific aims: 1) to advance our understanding of the contributions of each cerebral hemisphere, and of specific cortical areas within each hemisphere, to semantic memory; 2) to advance our understanding of the extent and nature of modality- and category-specific organization in semantic memory; and 3) to develop a neuroanatomically constrained simulation model of the organization of semantic memory.

Work during the second year of the project focused primarily on Aims 1 and 2; subsections below are organized by primary methodology, and descriptions of future work are integrated with the summaries of related current work.

Functional Neuroimaging

A PET study with normal adults was designed to address two issues in current focus in the study of semantic memory. First, how are we to provide a sensible account of the various findings on category specificity---the most prominent of these being a significant advantage in some semantically impaired patients for knowledge/naming of manmade artefact concepts relative to natural-kind concepts? Second, how are we to resolve the apparent neuroanatomical discrepancy between findings from neuropsychology and from functional imaging with respect to the role of the temporal pole? Strikingly consistent results from patients with semantic dementia would lead one to conclude that a single (bilateral) region of anterior temporal cortex is critical for representation and processing of semantic knowledge across all stimulus modalities and all types of conceptual knowledge. From most functional imaging research on normal adults, by contrast, one would conclude that semantic memory relies on no single common region, and further that the widely distributed network responsible for different aspects of semantic processing is left-lateralized and includes posterior temporal and frontal cortex, but not the anterior temporal lobe.

This study has resulted in two articles. The first (Rogers, Hocking, Mechelli, Patterson, & Price, submitted) reports robust animal-specific activation in the lateral posterior fusiform gyri when stimuli were categorized at an intermediate level of specificity (e.g. dog or car), but equal activation for animals and vehicles when the same photographs were categorized at a more specific level (e.g. Labrador or BMW). The conclusion is that lateral posterior fusiform does not encode domain-specific representations of animals, or visual properties characteristic of animals. Instead, these regions are strongly activated whenever an item must be discriminated from many close visual or semantic competitors. Apparent category effects arise because, at an intermediate level of specificity, animals have more visual and semantic competitors than do artefacts.

The second article (Rogers, Hocking, Noppeney, Mechelli, Gorno-Tempini, Patterson, & Price, in preparation) reports that the anterior temporal lobes are most strongly activated by identification of concepts at a specific level. Specific relative to general categorization activated anterior temporal cortices bilaterally, despite matching of these conditions for difficulty. Critically, these activations corresponded exactly to the site of maximal atrophy in patients with relatively pure semantic impairment, and also to regions commonly activated by the recognition of individual faces.

The conclusions of this second article are further supported by the findings of another PET study, performed in Cambridge and partially supported by the IBSC grant (Kellenbach, Hovius, & Patterson, in press). Specifically, in direct comparisons between three tasks (judgments regarding object structure, typical object color or associative knowledge about the object), the associative decision task selectively activated the left anterior middle/superior temporal gyrus and temporal pole relative to both object structure and color, and also the homologous right temporal pole relative to color only.

A number of functional imaging studies have identified category-selective regions that respond preferentially to faces versus other objects---consistent with reports of category-specific deficits in brain-damaged patients. However, the neuronal interactions which mediate such category-selective activations are currently unknown. We investigated the extent to which these patterns of activation are modulated by bottom-up or top-down mechanisms, by combining functional Magnetic Resonance Imaging and Dynamic Causal Modeling (DCM). DCM is a newly developed analysis technique which can be used to estimate, and make inferences about, the influence that one region exerts over another and how this is affected by experimental changes (Friston, Harrison, & Penny, 2003; Penny et al., in press).
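To make the idea of one region's influence on another being "affected by experimental changes" more concrete, the sketch below forward-simulates a bilinear neuronal state equation of the kind used in DCM, dz/dt = (A + sum_j u_j B_j) z + C u. It is only a toy simulation under stated assumptions: the two regions, the input names, and all parameter values are hypothetical, and the hemodynamic model and Bayesian parameter estimation that an actual DCM analysis requires are omitted.

```python
def simulate_bilinear(A, B, C, inputs, dt=0.01):
    """Euler-integrate dz/dt = (A + sum_j u_j * B[j]) z + C u.

    A: fixed connectivity, B[j]: change in connectivity induced by input j,
    C: direct driving effect of inputs on regions.
    """
    n = len(A)
    z = [0.0] * n
    trajectory = []
    for u in inputs:
        # Effective connectivity under the current experimental inputs.
        J = [[A[i][k] + sum(u[j] * B[j][i][k] for j in range(len(u)))
              for k in range(n)] for i in range(n)]
        dz = [sum(J[i][k] * z[k] for k in range(n))
              + sum(C[i][j] * u[j] for j in range(len(u)))
              for i in range(n)]
        z = [z[i] + dt * dz[i] for i in range(n)]
        trajectory.append(z)
    return trajectory


# Hypothetical two-region model: region 0 = early visual, region 1 = occipito-temporal.
# Inputs: u[0] = visual stimulation, u[1] = "face" context (both assumed names).
A = [[-1.0, 0.0],
     [0.4, -1.0]]                  # forward connection 0 -> 1, self-decay on both
B = [[[0.0, 0.0], [0.0, 0.0]],     # visual stimulation does not modulate connections
     [[0.0, 0.0], [0.4, 0.0]]]     # face context strengthens the forward connection
C = [[1.0, 0.0],
     [0.0, 0.0]]                   # stimulation drives the early visual region only

steps = 1000
objects = simulate_bilinear(A, B, C, [[1.0, 0.0]] * steps)
faces = simulate_bilinear(A, B, C, [[1.0, 1.0]] * steps)
print(objects[-1][1], faces[-1][1])
```

In this toy setting the downstream region responds more strongly in the face context even though the early visual response is identical, because the context changes the strength of the forward connection rather than the input itself.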

In one study (Mechelli, Price, Noppeney, and Friston, 2003), we investigated the neuronal interactions that mediate category-selective activity during passive viewing and delayed matching tasks of faces and objects in 6 subjects. Category effects in occipito-temporal cortex were mediated by content-sensitive forward connections from early visual areas, suggesting that category-selective patterns of activation in the ventral pathway are engendered by the visual input. As pictures of faces and other objects contain different visual attributes, the functional specialization observed in occipito-temporal cortex is likely to be the result of a hierarchical, bottom-up, "feature" analysis.

In another study (Mechelli, Price, Friston, and Ishai, in press), we investigated the neuronal interactions that mediate category-selective activations during passive viewing and visual imagery in 5 subjects. Our results confirmed that, during visual perception, category-selective patterns of activation in extrastriate cortex are mediated by content-sensitive forward connections from early visual areas. However, during visual imagery, category-selective activations were mediated by content-sensitive backward connections from prefrontal cortex. Thus, our investigation revealed that the neuronal interactions which mediate category effects in the ventral occipito-temporal cortex are task-dependent---a finding that needs to be embraced by theories of category selectivity.

While the present studies helped us characterize the neuronal interactions that mediate category effects, the nature of the information processed in the category-selective regions is still debated. Future studies will therefore investigate whether category-selective activations reflect domain-specific representations, or whether they arise from secondary differences between categories. We have previously demonstrated that, in the lateral posterior fusiform cortex, greater activation for animals relative to tools does not reflect animal-specific representations but arises because, at an intermediate level of specificity, animals have more visual and semantic competitors than do artefacts (Rogers et al., submitted). Similarly, we will be testing the hypothesis that category-selective activations in other occipito-temporal regions reflect relative differences in one or more visual or semantic dimensions. This will require comparing different categories (e.g. animals and tools) while varying a number of variables of interest such as visual complexity, frequency, familiarity, typicality, age of acquisition and semantic relevance.

Behavioral Studies of Brain-Damaged Patients

An important yet puzzling aspect of category-specific semantic impairments is the degree to which they vary as a function of etiology, even among patients with comparable location and severity of brain damage. We are pursuing our examination of this issue via detailed comparisons of semantic dementia (SD) with both Herpes Simplex Viral Encephalitis (HSVE) and with transcortical sensory aphasia (TSA).

We have now recruited 23 patients with HSVE of whom 8 demonstrate measurable semantic impairment on standard tests. Of these, 7/8 exhibit a significant category-specific effect: 6 in the typical direction and 1 with relatively better knowledge of living than artefact concepts. A paper based on this single case of artefacts > natural kinds has been submitted for publication (Lowe, Knapp, & Lambon Ralph, submitted). Our future work on this comparative study will ask a) what are the critical lesions/damage for semantic impairment and/or category-specificity? b) how do these map onto the Rogers et al. (2004) PDP model of semantic memory? and c) how does the semantic impairment and its consequential deficits (e.g., naming) compare between HSVE and SD?

As noted above, our studies of semantic dementia suggest that the anterior temporal lobes bilaterally are critical in the formation and representation of amodal semantic knowledge. The syndrome associated with semantic impairments after stroke, however---transcortical sensory aphasia---is typically observed following either a left temporo-parietal or, more rarely, a left frontal infarction. We have begun to recruit stroke patients with measurable semantic impairment to compare with semantic dementia (SD). Patients are selected if they are impaired on nonverbal comprehension tests (picture Pyramids and Palm Trees) but not if they demonstrate a verbal-only comprehension impairment. We have currently recruited 12 such patients, of whom half would fit a TSA classification (i.e. preserved word repetition) and the other half would be better classified as Wernicke's or global aphasia. As in our comparison between SD and HSVE, we will investigate the distributions of lesions in these stroke patients and compare each case series directly to SD, using the same neuropsychological assessments. This comparison will also be important in Project 2, where we are investigating the impact of semantic impairment on language tasks such as reading, repetition, verbal short-term memory, etc.

We have also recently completed a first set of studies on the status of color knowledge in semantic dementia. The cortical regions that support color perception and color imagery are located in the posterior ventral temporal cortex, in areas that appear to be structurally and metabolically intact in SD. The assessment of color knowledge in this patient group thus provides a strong test of our hypothesis that anterior temporal lobe regions critically support knowledge for all different kinds of attributes, including modality-specific attributes represented elsewhere in cortex. Patients were tested on coloring black-and-white drawings of common objects, selecting which of two line drawings was colored correctly, and color naming. The results suggest that knowledge of object colors and color names is, indeed, susceptible to generalized semantic impairment, and that typical or domain-general aspects of color knowledge are somewhat more robust to this impairment than atypical or idiosyncratic aspects. This work is awaiting write-up, and we are planning a second series of tests to assess color perception, color categorization, and judgments of similarity to determine whether color concepts themselves are degraded in SD.

The ways in which patients with SD over-generalize towards more typical aspects of domain knowledge have been extensively documented, but the ways in which they under-generalize may also be revealing about the structure of conceptual knowledge. The best known example is perhaps the fact that, as semantic memory deteriorates, SD patients may continue to use their own specific versions of familiar objects in an appropriate fashion but will no longer recognize an equally good but unfamiliar instance (Bozeat et al., 2002). Other observations suggest to us another kind of under-generalization. For example, SD patients have an extremely impoverished ability to learn short sets of words, tested by free recall or even recognition, even if the words are still relatively "known" to the individual patient (Graham et al., 2002). This contrasts interestingly with their rather impressive ability to remember phonological sequences. Similarly, SD patients seem able to re-learn old or even learn new words (names of things) belonging to a category for purposes of performing category fluency, but only if they can reproduce them in the same fixed sequence (Graham et al., 1999). We are planning a series of experiments, involving both pictures/objects and words, to investigate the nature of this type of under-generalization.

More generally, in behavioral and neuroimaging experiments with both patients and normal subjects and also in some modeling work, we plan to explore the versatile task of category fluency. From the point of view of theoretical aspects of semantic memory, the advantage of this task is that it can be performed both for different types of concepts (e.g., living vs. manmade; natural taxonomic categories like birds or fruits vs. ad hoc categories like "things you can put in your pocket"; verbs or actions as well as nouns or names of things) and also at different levels of specificity (animals vs. water creatures vs. fish; vehicles vs. makes of car). From the point of view of modeling, the task represents a challenge in that---unlike most of the semantic tasks that we have worked with to date, in which each "trial" or output is essentially an independent event such as naming or sorting---category fluency is a task that unfolds over time and requires a considerable degree of executive function and working memory as well as semantic knowledge. The imaging studies should give us another opportunity to explore aspects of our hypotheses about a widespread network of areas that contribute to semantic processing, but an essential role for anterior temporal structures, especially with increasing demands for specificity.

Behavioral Studies of Normal Subjects

Because the level of specificity of the conceptual knowledge required to perform a semantic task seems to have such a pervasive impact, we have been attempting to further our understanding of this factor from several different perspectives. In addition to the imaging studies described above, we have completed a behavioral experiment addressed to the following apparent puzzle: the performance of semantic dementia patients in sorting or naming objects decreases monotonically with increasing level of specificity (i.e., superordinate > basic > subordinate), whereas normal subjects show an advantage for the basic level when making category verification judgments. Rogers and McClelland (in press) suggested that the superordinate advantage shown by SD patients is due to the fact that their degraded semantic representations are insufficiently precise to support basic-level responses while still enabling more general superordinate distinctions. This account suggests that healthy controls should show an SD-like pattern of behavior (i.e., most accurate categorization at general relative to more specific levels), if they can be made to respond before the semantic system has fully settled, when its internal state is within the right general region but has not yet arrived at the correct specific pattern. To test this idea, we used a version of the tempo-naming procedure (Kello & Plaut, 2000) applied to category verification. We predicted that the basic-level advantage displayed by normal individuals when asked to make rapid judgments with no response deadline would be replaced by a superordinate-level advantage (as shown in accuracy by SD patients) if they were required by the tempo procedure to make more rapid decisions. The results confirmed this prediction and are awaiting writing-up for publication.

We also have completed the first in a series of experiments using eye-tracking to explore the perceptual processing of stimuli on which participants are to make semantic judgments. The experiment employs the same procedure described above of category verification at three different levels; the addition of eye-tracking will enable us to assess whether patterns of eye movements in looking at a picture of (for example) a robin differ as a function of the judgment that the participant is about to make, viz. to decide that it is an animal (superordinate), a bird (basic) or a robin (subordinate). The technical challenges of this study---both to collect and to analyze the data---have required many months of trial and error and re-trial; but they are now largely solved, and data analysis from the first study is well under way. If and when we have established reliable differences in patterns of normal eye movements as a function of semantic level, our plan is to test several patients with mild SD. Our hypothesis is that the degradation of conceptual knowledge in SD will disrupt the structured patterns of object inspection that naturally (and unconsciously) occur in normal individuals in the service of different kinds of semantic judgments.

A final study (in collaboration with Faraneh Vargha Khadem and Mortimer Mishkin) employs an artificial learning environment, in which participants learn about the visual appearances, categories and names of satellites constructed to vary in typicality of their classes (scientific, communications or military satellites). This is another project that required a long gestation period of design and pilot testing. Furthermore, because it involves learning a whole new domain of knowledge, which is tested in a number of different ways at various points during learning, it requires multiple experimental sessions per participant and is therefore a considerably time-demanding paradigm. We are, however, excited about the potential of this study to inform us about various aspects of the learning and representation of conceptual knowledge. Our plan is to test small groups of participants from four different populations: young normal, older normal, patients with developmental amnesia (DA), and patients with SD. The DA component of this venture is a particular focus of interest because one of the goals of our program of research on semantic memory is to understand how these young patients, with substantial early reduction of hippocampal volume resulting in substantial anterograde amnesia for episodes, nevertheless appear to acquire a rather normal fund of conceptual knowledge. Data on the satellite-learning task have now been collected from 8 normal individuals (4 young and 4 older), and demonstrate dramatic effects of both typicality and age. We are about to embark on the first testing with DA cases.

Computational Modeling

The primary focus of effort in the project to date has been empirical; planned computational modeling, building on our past efforts (Lambon Ralph et al., 2001; Plaut, 2002; Rogers et al., 2004; Rogers & McClelland, in press), will incorporate insights from this empirical work concerning the relative contribution of left- and right-hemisphere semantics and the finer-grained distinctions among brain areas within each hemisphere. We will employ realistically structured surface representations for perceptual, linguistic, and action-based knowledge based on our previous analyses of the structure of these domains. We will also evaluate the implications of network architecture, processing, and learning principles to explore their impact on the emergence of graded modality- and category-specificity within learned internal semantic representations.

Supported Publications

Coccia, M., Bartolini, M., Luzzi, S., Provinciali, L., & Lambon Ralph, M. A. (in press). Semantic memory is an amodal, dynamic system: Evidence from the interaction of naming and object use in semantic dementia. Cognitive Neuropsychology.

Lambon Ralph, M. A., Patterson, K., Garrard, P., & Hodges, J. R. (2003). Semantic dementia with category specificity: A comparative case-series study. Cognitive Neuropsychology, 20, 307-326.

Rogers, T. T., Lambon Ralph, M. A., Garrard, P., Bozeat, S., McClelland, J. L., Hodges, J. R., & Patterson, K. (2004). The structure and deterioration of semantic memory: A neuropsychological and computational investigation. Psychological Review, 111, 205-235.

Rogers, T. T., Lambon Ralph, M. A., Hodges, J., & Patterson, K. (2003). Object recognition under semantic impairment: The effects of conceptual regularities on perceptual decisions. Language and Cognitive Processes, 18, 625-662.

Rogers, T. T., Lambon Ralph, M. A., Hodges, J. R., & Patterson, K. (2004). Natural selection: The impact of semantic impairment on lexical and object decision. Cognitive Neuropsychology, 21, 331-352.

Project 2: Interactive processes underlying language: Lexical processing

Investigators: David C. Plaut, Karalyn Patterson, Cathy J. Price, Matthew A. Lambon Ralph

The principal aim of this project is to demonstrate that a variety of lexical processes (e.g., reading, past tense verb morphology, language production, verbal short-term memory) are all based on predictable interactions amongst more basic functions in dedicated brain regions (Centre Aim 1), including semantic memory and phonology. The impact of conceptual knowledge on lexical processes links these studies closely to Project 1. In addition, we are exploring how plasticity-related changes after brain damage may reorganise the division of labour between the primary systems (computational modelling work linking to Project 7: cf. Centre Aim 2). During the past year, behavioural data have been collected from stroke and neurodegenerative patient populations as well as from normal participants, in both English and Japanese speakers; related neuroimaging studies have been conducted in normal subjects; and various aspects have been explored using computational models. This combination has allowed us to apply both behavioural and neuroscience methods in order to tackle this project (Centre Aim 3). Progress and future plans are summarised below in relation to a number of language activities.

Reading

Neuroimaging. In a first study, we compared the neural correlates of words with regular spellings (e.g. lamp), words with irregular spellings (e.g. yacht), and pseudo words (e.g. ledan), using functional MRI on 22 healthy right-handed subjects. Reading words and pseudo words relative to viewing false fonts increased activity in a left-lateralized network including the inferior prefrontal cortex, the superior temporal cortex and the anterior fusiform gyrus - consistent with previous studies. Three distinct regions within the left inferior prefrontal cortex expressed differential patterns of activation: the pars triangularis showed increased activation for irregular relative to pseudo words; the dorsal premotor cortex showed increased activation for pseudo relative to irregular words; the pars opercularis showed increased activation for irregular relative to regular words and for pseudo relative to regular words. The double dissociation between irregular and pseudo words, which activated the pars triangularis and the dorsal premotor cortex respectively, parallels the double dissociation between semantic and phonological processing previously reported in the very same areas (Mummery et al. 1999; Roskies et al. 2001; Devlin et al. 2003). Thus, our results are consistent with cognitive theories that have proposed that reading pseudo words increases phonological processing, whereas reading irregularly spelled words increases semantic processing (Plaut et al. 1996). In addition, the finding that the pars opercularis shows greater activity for irregular words and pseudo words is consistent with previous reports that this region is sensitive to the difficulty or time required to read the items (Fiez et al. 1999). However, it is currently unknown whether any of the areas involved in word and pseudo word reading are specific to orthographic stimuli. For instance, previous functional imaging studies have identified a common network for reading words and naming pictures (Price and Devlin, 2003).

We performed a second study to test whether, within the common network cited above, different neuronal populations respond to words and pictures. This was achieved using repetition priming: when two similar stimuli are presented within a short interval, reduced neuronal response is observed for the second ("target") relative to the first ("prime"). Neuronal populations involved in reading but not picture naming might therefore be primed with words (but not pictures). Conversely, neuronal populations involved in picture naming but not reading might be primed with pictures (but not words). Each subject was presented with a prime for 600 milliseconds, followed by a fixation cross for 200 milliseconds and the target for another 600 milliseconds. Each stimulus was either a word or a picture, resulting in 4 types of pairs: word-word, picture-picture, word-picture and picture-word. Sixteen healthy right-handed subjects were scanned while reading/naming the words/pictures overtly as soon as they appeared on the screen. We are now performing the analysis of the data to identify neuronal populations involved in either reading or naming specifically. In addition, the prime and target referred to items which were either semantically related (e.g. KEY-lock), phonologically related (e.g. rolling pin - ROLLER SKATE), identical (e.g. dog-DOG) or unrelated (e.g. RADIO-glove). Thus, this study may allow us to identify semantic and phonological activations within the reading system. For instance, neuronal populations involved in semantic but not phonological processing might be primed with semantically (but not phonologically) related items. Conversely, neuronal populations involved in phonological but not semantic processing might be primed with phonologically (but not semantically) related items.
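The trial structure just described can be laid out as a small sketch. The durations, the two stimulus modalities, and the four prime-target relations come from the description above; the function and variable names are hypothetical, and the full crossing of pair type with relation type is an assumption made for illustration, since the report does not state how the two factors were combined.

```python
from itertools import product

# Durations taken from the description above (milliseconds).
PRIME_MS, FIXATION_MS, TARGET_MS = 600, 200, 600

MODALITIES = ("word", "picture")
RELATIONS = ("semantic", "phonological", "identical", "unrelated")


def trial_timeline(prime_modality, target_modality):
    """Return (event, onset_ms, offset_ms) tuples for one prime-target trial."""
    events = [
        ("prime:" + prime_modality, PRIME_MS),
        ("fixation", FIXATION_MS),
        ("target:" + target_modality, TARGET_MS),
    ]
    timeline, onset = [], 0
    for name, duration in events:
        timeline.append((name, onset, onset + duration))
        onset += duration
    return timeline


# The four prime-target pair types (word-word, word-picture, ...).
pair_types = [f"{p}-{t}" for p, t in product(MODALITIES, repeat=2)]

# ASSUMPTION: every pair type is crossed with every relation type.
conditions = list(product(pair_types, RELATIONS))

for event in trial_timeline("word", "picture"):
    print(event)
```

Under these durations the target appears 800 ms after prime onset, and a single trial lasts 1400 ms from prime onset to target offset.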

While the present studies helped us identify functionally segregated areas within the reading system, little is known about how these regions interact with each other. Future directions will therefore include the investigation of functional integration within the reading system. Specifically, we will be testing a number of hypotheses which are motivated by the neuropsychological literature as well as previous functional imaging studies (Price et al. 2003). For instance, is the increased activation for irregular words in the pars triangularis mediated by increased connectivity from anterior fusiform areas? Conversely, is the increased activation for pseudowords in the dorsal premotor cortex mediated by increased connectivity from posterior fusiform areas? In addition, we will be testing the hypothesis that the frontal and temporal regions involved in both reading and naming may express different functional integration during the two tasks. These hypotheses will be addressed by combining functional MRI data with the analytical technique Dynamic Causal Modelling (Friston et al. 2003).

Cross-language neuropsychology. One aim of Project 2 was to test the applicability of our primary-systems/connectionist approach to languages other than English. The association between semantic dementia and surface dyslexia is an almost perfect one for English speaking patients. A recent study investigated the impact of impaired semantic representations on reading in a Japanese patient (Fushimi et al., 2003 - see abstract below for details). As predicted, the degree of consistency of character-sound correspondences affected his performance on both words and nonwords in a graded manner as did the frequency of the words.

Computational modeling. The IBSC has also partially funded explorations of the impact of plasticity-related changes on language processes. Initially, this work has used computational models of reading aloud to investigate how the remaining parts of the reading system can adjust to improve overall reading performance. We have attempted to mimic spontaneous recovery by taking a fully-trained reading model (a single-route model that translates orthography directly to phonology), damaging it through removal of hidden units or connections, and then re-exposing it to the full training set (Welbourne & Lambon Ralph, under revision - abstract below). After initial damage this model, like previous explorations, produced a global dyslexia with the performance on all types of word and nonword affected to very similar degrees. Interestingly, after a period of simulated recovery, the model's behaviour crystallises out into a pattern of surface dyslexia, very similar to that observed in the neuropsychological literature. This initial exploration highlights a number of key factors: (1) spontaneous recovery may reflect, in part, the internal reorganisation of computational resources/function (cf. plasticity-related changes - Project 7); (2) the emergent pattern of surface dyslexia reflects the interaction of plasticity-related changes with the intrinsic difficulties of the domain in question (low frequency, inconsistent words benefit the least), which is another key principle of Project 7 (and Centre Aim 2); (3) the model suggests that the neuropsychological profile observed in patients reflects not only the pattern of damage to underlying cognitive/language systems but also the influence of recovery on these systems and the interaction between them. In this specific case, the behavioural dissociation between regular and irregular word reading is produced through recovery alone and does not reflect a pre-existing modular distinction (Centre Aim 1) as expected from the assumptions underlying classical neuropsychology (transparency and subtractivity).
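The train, damage, re-expose sequence can be illustrated with a toy network. The sketch below is not the reading model from the cited work: it assumes a small one-hidden-layer network, random binary patterns standing in for orthography and phonology, and lesioning by silencing half of the hidden units. All sizes, the learning rate and the names are hypothetical, and it does not reproduce the surface-dyslexic pattern, which depends on the frequency and consistency structure of a real word corpus.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2, mask):
    # mask holds 1 for intact hidden units and 0 for lesioned ones
    H = np.tanh(X @ W1) * mask
    return H, sigmoid(H @ W2)

def error(X, Y, W1, W2, mask):
    # mean squared error over all patterns and output units
    return float(np.mean((forward(X, W1, W2, mask)[1] - Y) ** 2))

def train(X, Y, W1, W2, mask, epochs=5000, lr=0.1):
    # Full-batch gradient descent on cross-entropy; weights are updated in place.
    for _ in range(epochs):
        H, out = forward(X, W1, W2, mask)
        d_out = (out - Y) / len(X)
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2) * mask
        W2 -= lr * (H.T @ d_out)
        W1 -= lr * (X.T @ d_hid)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(40, 12)).astype(float)  # toy input patterns
Y = rng.integers(0, 2, size=(40, 8)).astype(float)   # toy target patterns
n_hidden = 30
W1 = rng.normal(0.0, 0.3, size=(12, n_hidden))
W2 = rng.normal(0.0, 0.3, size=(n_hidden, 8))
intact = np.ones(n_hidden)

train(X, Y, W1, W2, intact)                  # 1. train the intact model
err_trained = error(X, Y, W1, W2, intact)

lesion = intact.copy()                       # 2. damage: silence half the hidden units
lesion[rng.choice(n_hidden, size=n_hidden // 2, replace=False)] = 0.0
err_lesioned = error(X, Y, W1, W2, lesion)

train(X, Y, W1, W2, lesion)                  # 3. recovery: re-expose to the full set
err_recovered = error(X, Y, W1, W2, lesion)

if __name__ == "__main__":
    print(err_trained, err_lesioned, err_recovered)
```

Comparing the three error values shows the general logic: performance drops after damage and improves again when the surviving units are retrained on the original training set.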

This framework has also allowed us to investigate the potential impact of deliberate intervention during recovery (simulated speech therapy). We have attempted to do this by exposing the recovering network not only to the standard set of training patterns with their associated natural frequencies but also to a small subset of training patterns presented with very high frequency, analogous to a set of items used in sessions of speech therapy (see Welbourne & Lambon Ralph, submitted - abstract below). This work showed (a) that intervention produced benefits over and above those observed in spontaneous recovery alone; (b) that regular words produced better generalisation than irregular words; and (c) that early intervention led to better performance than late intervention. The timing of simulated therapy seems to be closely related to the plasticity-related changes that underpin age-of-acquisition effects in normal participants (see Project 7).
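One way to picture the simulated therapy regime is as a change in how often each pattern is sampled during recovery. The sketch below assumes that "very high frequency" can be approximated by multiplying the natural frequency of the therapy items by a boost factor; the boost value, the function name and the example frequencies are all hypothetical and are not taken from the cited work.

```python
import numpy as np

def presentation_probabilities(natural_freq, therapy_items, boost=50.0):
    """Per-item sampling probabilities for one recovery epoch.

    natural_freq  : non-negative natural frequencies, one per training pattern
    therapy_items : indices of the small subset treated as therapy items
    boost         : hypothetical multiplier standing in for "very high frequency"
    """
    weights = np.asarray(natural_freq, dtype=float).copy()
    weights[list(therapy_items)] *= boost
    return weights / weights.sum()

# Made-up frequencies, for illustration only.
freqs = np.array([120.0, 45.0, 8.0, 3.0, 1.0])
probs = presentation_probabilities(freqs, therapy_items=[3, 4])

if __name__ == "__main__":
    print(probs)
```

Under this assumption the therapy items are still drawn from the same training set, but they dominate the recovering network's experience for as long as the boost is applied.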

Future work in this aspect of the project will explore recovery and interventions in a full model of reading that also incorporates semantic representations. The inclusion of these representations is a major technical issue. Even when the form of the semantic representations has been decided, the translation of orthography/phonology to semantics is inherently difficult and time-consuming to train. Moreover, the introduction of semantic representations requires decisions regarding the architecture within the model. In a series of very small simulations, we are currently exploring the impact of connecting semantics to phonology and orthography in a variety of different ways. These include the "triangle" framework (containing all three possible pairwise connections between the three representation types) and the "junction" model (the three representations are connected through a set of shared hidden units).

Past tense morphology

Neuropsychology. The Bird et al. (2003) study on past-tense verb processing in phonologically impaired anterior aphasic patients, published last year, is a central component of IBSC Project 2. Taken with Patterson et al. (2001), these two papers demonstrated that the differences observed between regular and irregular verbs are consistent with the impact of semantic and phonological representations on this specific language activity (Centre Aim 1). The impact of phonological representations on past tense morphology, especially in relation to regular verbs, has been explored in two recent studies (partially supported by the IBSC). The first used an analysis of the patient errors (from Bird et al., 2003) to explore the nature of the patients’ phonological impairment (Braber et al., submitted – abstract below). This highlighted phonological complexity (in terms of CVC structure) as a critical aspect of the patients’ impairment. To explore this possibility further, an additional study was conducted that compared a new set of Broca aphasic patients for whom the degree of irregular > regular difference varied (Lambon Ralph et al., submitted – abstract below). The patients who demonstrated the largest regularity difference also exhibited the largest effect of CVC complexity on repetition accuracy as well as the greatest impact of syllabic stress pattern on both repetition and picture naming. In contrast, the impact of delay on repetition performance was not related to the size of the regularity effect. It would appear, therefore, that the critical factor is the patients’ ability to deal with atypical phonological patterns (in terms of CVC or stress) whereas the maintenance of phonological activation – which was impaired across all patients – is an unrelated aspect of their phonological impairment.

Future studies will tackle two issues. First, as noted in all of our recent papers on verb inflection, our phonological hypothesis is specifically geared to an understanding of the irregular > regular pattern in nonfluent aphasia; it does not explain why the patients are generally so poor at tasks concerning tense. We will investigate whether this is a semantic, syntactic, or specifically ‘morphological’ deficit. Secondly, we (and other researchers – e.g., Tyler et al., 2002) have now demonstrated that nonfluent aphasics are very impaired at judging that the spoken stem and past-tense forms of regular verbs (e.g. “press” and “pressed”) are different. Several important aspects of this phenomenon still need to be resolved: (a) In our study, which measured accuracy only, the patients were equally impaired at “press/pressed” and “chess/chest”. In Tyler’s study, which included RT measures as well, accuracy did not differ significantly between these two critical conditions, but RTs did: the patients, when correct, were faster to indicate their detection of the non-morphological difference [“chess/chest”]. We need to follow up this point with RT data based on our own materials; (b) Whether or not we replicate the Tyler et al. finding, more evidence is needed to determine whether the patients actually do not hear the auditory difference between “press” and “pressed”, or whether – at a subsequent stage – they fail to interpret an auditorily perceived difference as indicating two different words. We are hoping to use an EEG MMN (Mis-Match Negativity) paradigm [in collaboration with Dr Friedemann Pulvermuller] to test these two alternatives.

Cross-language studies. As with reading (see above), one aim of Project 2 was to extend our primary-systems/connectionist approach to other languages. Initial investigations of verb morphology in Japanese suggest that, as in English, this language domain can be explained in terms of the interaction between phonology and semantics (Fushimi et al., submitted - see abstract below for details). Future studies will investigate verb inflection in Japanese patients with either semantic or phonological deficits, to test the generality of our findings in English.

Language production. We have begun to use comparisons between stroke and progressive aphasia to investigate the range of primary brain systems that are implicated in language production. In work partially funded through the IBSC, we have been exploring the nature of the production deficits observed in patients with progressive nonfluent aphasia as opposed to those seen in patients with Broca's aphasia after stroke. Unlike their stroke counterparts, the progressive patients are influenced by the nature of the production task - they are much better (in terms of accuracy and fluency) when required to produce words in naturally-constrained tasks such as single-word repetition and reading than in more open situations such as spontaneous speech or text reading (for details see: Graham et al., in press; Patterson et al., submitted - abstracts below).

Verbal short-term memory. In all the language activities reviewed above, evidence for the role of semantic memory can be found. In the domain of verbal short-term memory, however, the loss of conceptual representations is particularly obvious in that patients produce numerous phoneme migration and substitution errors across the list of words to be recalled (e.g., "mug, dog, bed" → "bug, dog, med"). Following on from earlier work by Patterson and colleagues, we have been exploring the role of semantic and phonological representations in verbal short-term memory (partially funded by the IBSC). The details of these numerous studies can be found in the abstracts below (see Jefferies et al., various). To summarise briefly, we have been able to demonstrate that vSTM performance is governed by the patients' degree of semantic impairment. Although all SD patients reported in the literature do produce these characteristic errors in vSTM, not all show the predicted difference between relatively well-known versus degraded concepts. Much, if not all, of this inconsistency appears to be related to the set size of words from which lists were constructed in these studies. The known-degraded difference is strongest when lists are formed from a large number of different words.

The phonological errors produced in this task are reminiscent of those made by aphasic patients with predominantly phonological impairments. This raises the question of whether the SD patients do, in fact, have a subtle phonological impairment in addition to their primary semantic deficit. In a series of recent studies we have been able to demonstrate that the known-degraded difference in vSTM is independent of the patients' ability on classical phonological tests (e.g., rhyme judgement, nonword repetition, phoneme awareness tasks) and, furthermore, that when patients do fall down on these assessments it may be related to the impact of semantic impairment on their phonological representations.

In addition to studies of vSTM in SD patients, we have also extended the work into studies of normal participants. Although there is some evidence for the impact of semantic impairment on normal vSTM it tends to be rather small and thus hard to measure. We have been using a new mixed word-nonword paradigm in both immediate serial recall and matching tasks to explore the influence of semantic factors on normal subjects. In such circumstances, normal subjects make phonological errors just like those observed in semantic dementia not only on the nonwords but also on the words. Likewise they are less able to detect phoneme movements in matching span tasks if the lists contain both words and nonwords. As we would predict, the participants' accuracy in these tasks is modulated by semantic factors such as imageability. The fact that these effects are found in recall and matching tasks suggests that semantics and phonology are interacting continually to support this language activity. They are less consistent with proposals that argue for a late-stage recall "redintegration" mechanism that gives rise to lexical-semantic effects in vSTM.

Supported Presentations and Publications Including Publications in Progress

Braber, N., Patterson, K., Ellis, K., & Lambon Ralph, M. A. (under review). The relationship between phonological and morphological deficits in Broca's aphasia: Further evidence from errors in verb inflection.

Fushimi, T., Komori, K., Ikeda, M., Patterson, K., Ijuin, M., & Tanabe, H. (2003). Surface dyslexia in a Japanese patient with semantic dementia: Evidence for similarity-based orthography-to-phonology translation. Neuropsychologia, 41, 1644-1658.

Fushimi, T., Patterson, K., Ijuin, M., Sakuma, N., Kureta, Y., Tanaka, M., Kondo, T., Amano, S., & Tatsumi, I. (in preparation). Inflecting Japanese Verbs: Two separate mechanisms or one graded system?

Graham, N. L., Patterson, K., & Hodges, J. R. (2004). When more yields less: speaking and writing deficits in nonfluent progressive aphasia. Neurocase, in press.

Jefferies, E., Bateman, D., & Lambon Ralph, M. A. (submitted). The role of the temporal lobe semantic system in number knowledge: Evidence from late-stage semantic dementia. Neuropsychologia.

Jefferies, E., Frankish, C. R., & Lambon Ralph, M. A. (submitted). Lexical and semantic binding in short-term memory: Evidence from normal recall and semantic dementia. Journal of Memory and Language.

Jefferies, E., Jones, R., Bateman, D., & Lambon Ralph, M. A. (in press). A semantic contribution to nonword recall? Evidence for intact phonological processes in semantic dementia. Cognitive Neuropsychology.

Jefferies, E., Jones, R., Bateman, D., & Lambon Ralph, M. A. (in press). When does word meaning affect immediate serial recall in semantic dementia? Cognitive, Affective & Behavioral Neuroscience.

Jefferies, E., Patterson, K., Jones, R. W., Bateman, D., & Lambon Ralph, M. A. (2004). A category-specific advantage for numbers in verbal short-term memory: Evidence from semantic dementia. Neuropsychologia, 42, 639-660.

Lambon Ralph, M. A., Braber, N., McClelland, J. L., & Patterson, K. (in preparation). What underlies the neuropsychological pattern of irregular > regular past-tense verb production?

Patterson, K., Graham, N. L., Lambon Ralph, M. A., & Hodges, J. R. (submitted). Progressive non-fluent aphasia is not a progressive form of non-fluent (post-stroke) aphasia.

Welbourne, S. R., & Lambon Ralph, M. A. (under revision). Exploring the impact of plasticity-related recovery after brain damage in a connectionist model of single word reading. Cognitive, Affective & Behavioral Neuroscience.

Welbourne, S. R., & Lambon Ralph, M. A. (submitted). Towards a Platform for Modelling Rehabilitation in Patients with Acquired Dyslexia. Neuropsychological Rehabilitation.

Project 3: Interactive Processes in Language: Sentence Processing

Maryellen MacDonald, Julie Fiez, Matt Lambon-Ralph, Karalyn Patterson, David Plaut

Specific Aims. This project investigates the role of phonological (acoustic- and/or articulatory-based) representations in language comprehension. The sentence-level focus of the project complements the lexical focus of Project 2. A specific aim of the work is to investigate the relationship between language comprehension processes and verbal working memory (WM). We pursue an emergent account of verbal WM (addressing Center Aim 1) and argue against the commonly held view (e.g., Just & Carpenter, 1992) that WM is a computational workspace in which language processing activities are executed. Instead, we see verbal WM as the activation of information during the course of language production and comprehension (McClelland & Elman, 1986), and these same processes can be invoked when someone is confronted with a WM task. For example, phonological information is activated during the planning of articulation in production, during reading, and during auditory comprehension, and it can also be used when performing a working memory task with verbal material. The pursuit of this emergent view with direct reference to phonological information is motivated by previous claims that phonological information has little role in the comprehension process beyond initial word recognition; this view is echoed in the working memory literature, in which phonological WM (e.g. Baddeley's "phonological loop") is thought to have little role in normal comprehension processes. Thus the work has the potential to reframe issues in both language comprehension and WM.

The project includes behavioral and fMRI studies of WM and sentence comprehension, investigations of patients with impaired WM and/or comprehension, and computational modeling. The project thus addresses Center Aim 3 (constraints from neuroscience), and a specific aim is to integrate data and modeling from these very different domains in a broad account of sentence comprehension and the role of phonological activation in it.

Progress in Years 1-2. Our original plan was to begin with behavioral studies, followed by neuroimaging studies, then studies of patients and computational modeling. We have not yet begun the modeling work and are just starting patient work, reported in Future Directions below. We decided to conduct the initial behavioral and imaging sentence comprehension studies in parallel, using a common set of sentences that varied in the phonological overlap among the words. In verbal WM tasks, phonological similarity among items typically creates interference and reduces recall rates. If phonological activation plays a central role in sentence comprehension, then we should similarly observe interference effects from phonological similarity in comprehension tasks.

Materials. Sentences containing object relative clauses were chosen as the structure within which to manipulate phonological overlap, because this structure is well known for a clear locus of comprehension difficulty. We developed 24 pairs of experimental sentences as shown in (1) below. The relative clauses are identified with brackets; they are called object relative clauses because the noun being modified by the relative clause (e.g. player in (1a)) is the object of the action described in the clause. In the phonological overlap condition, two nouns (e.g. player and mayor) and two verbs (met and bet) have similar acoustic/articulatory properties. In the non-overlap condition, the first noun and first verb were changed to reduce this overlap; thus the conditions differ by only two words. Across conditions, the critical words were chosen to have equal frequency and length, and the sentences were matched for plausibility. All sentences were nine words long, and the two verbs in each sentence (the typical site of comprehension difficulty) were always words 6-7.

1a. Phonological Overlap: The player [that the mayor met] bet the editor.

b. Non-overlap: The coach [that the mayor aided] bet the editor.

We developed "filler" sentences with varying syntactic constructions. Presentation of fillers helps obscure the repetition of the object relative structure in the experimental items. We built in three levels of difficulty in a subset of the fillers (12 sentences at each difficulty level) in order to assess performance and brain activation as a function of comprehension difficulty, without phonological overlap.

The sentence comprehension task was a standard one in which participants read sentences one word at a time by pressing a key to advance to each word. The dependent measures were reading time per word and accuracy on a yes/no sentence comprehension question that followed each sentence.

For the behavioral studies, MacDonald's graduate student Dan Acheson developed a serial word recall task in which half of the word lists had phonological overlap. The word sets in the overlap and non-overlap conditions were matched for word length, frequency, and imageability.

Preliminary Behavioral Results. We have tested 70 young adult participants on both the serial recall and sentence comprehension tasks. We intend to test at least another 30 in order to have sufficient numbers to examine individual differences in the two tasks.

Consistent with previous studies, serial word recall was reliably higher in the non-overlap compared to the overlap condition. We also found clear effects of overlap in sentence comprehension. As shown in Figure 1 (at the end of this section), reading times were significantly longer at the two verbs (words 6-7) in the overlap compared to non-overlap conditions. Comprehension question accuracy was also reliably lower for the overlap sentences than for the non-overlap sentences.
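The Figure 1 caption describes the reading times as length-adjusted residuals. One common way to compute such a measure is to regress reading time on word length and keep the residuals. The sketch below assumes a single least-squares fit over all words; whether the fit was done per participant, and on which length measure, is not stated in the report. The function name is hypothetical, and the reading times in the usage example are made up for illustration.

```python
import numpy as np

def length_adjusted_rts(word_lengths, reading_times_ms):
    """Residual reading times after regressing RT on word length.

    Fits RT = slope * length + intercept by least squares and returns
    observed minus predicted RT for each word.
    """
    lengths = np.asarray(word_lengths, dtype=float)
    rts = np.asarray(reading_times_ms, dtype=float)
    slope, intercept = np.polyfit(lengths, rts, deg=1)
    return rts - (slope * lengths + intercept)

# Word lengths for "The player that the mayor met bet the editor";
# the reading times are invented purely to exercise the function.
lengths = [3, 6, 4, 3, 5, 3, 3, 3, 6]
made_up_rts = [310, 390, 340, 320, 380, 400, 430, 330, 395]
residuals = length_adjusted_rts(lengths, made_up_rts)

if __name__ == "__main__":
    print(np.round(residuals, 1))
```

A positive residual means a word was read more slowly than its length alone would predict, which is why the measure can dip below zero on the y-axis of Figure 1.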

We interpret these results to indicate an important role of phonological information in sentence comprehension and speculate that this same phonological activation is recruited in verbal WM tasks. Investigation of individual differences will begin shortly and will address questions such as whether the size of the phonological overlap effect in the WM task predicts effect size in comprehension.

Neuroimaging Overview & Method. The goal of the neuroimaging studies is to explore the neural basis of the phonological effects observed in our behavioral studies. In our first study, we presented a variant of the sentence comprehension task during scanning, and in a separate run tested participants on a delayed serial recall (DSR) task requiring phonological rehearsal of nonwords. Given the effects of overlap in our behavioral tasks, we hypothesized that (a) regions implicated in phonological rehearsal would be active during sentence comprehension, and (b) these regions would show a sensitivity to manipulations of phonological overlap. Moreover, if the syntactic complexity manipulation in the fillers modulates demands on domain-general working memory resources, then (a) regions implicated in central executive processing (e.g., dorsolateral prefrontal cortex) would be active during sentence comprehension and would further show an increase in activity as comprehension difficulty increases, and (b) similar increases would be seen for both manipulations of syntactic complexity and phonological overlap.

Ten participants have been tested. Data were acquired using a 3 T magnet and an EPI sequence designed to maximize the BOLD response (TE=30 ms, TR=3000 ms, flip angle of 80 degrees) and to provide full-brain coverage (36 slices, with approximately 3 mm cubic voxels).

A total of 144 sentences were presented in six runs, using a slow event-related design in which each sentence presentation was treated as a single trial. Each run consisted of 24 experimental and filler items presented in random order. Each word was presented for 333 ms, resulting in a total presentation duration of three seconds per sentence.

In experimental trials, sentence presentation was directly followed by a 12-second rest period. We feared that attention to a comprehension question could interfere with the BOLD signal related to sentence comprehension, so comprehension questions (followed by a rest period) were presented for a subset of the filler items only.

In the DSR task, five pronounceable nonwords were visually presented for 2400 msec each, followed by a blank screen (400 msec), resulting in a total presentation time of 15 seconds. Participants actively rehearsed the items during a 9-second rehearsal interval. A visual cue was then given to initiate overt articulation of nonwords (6 seconds), followed by a resting period of 18 seconds. There were three runs with ten trials each.
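The timing figures given for the two scanner tasks can be checked with a few lines of arithmetic. The sketch below takes every number from the text and assumes that TR is 3000 ms, which is how the acquisition parameters read. Note that the per-item figures for the DSR task (five items at 2400 ms plus 400 ms each) sum to 14 seconds, whereas the text gives a presentation time of 15 seconds; the report does not say what accounts for the remaining second, so the sketch uses the stated 15 seconds for the trial total.

```python
TR_MS = 3000  # repetition time; the unit is assumed to be milliseconds

# Sentence task, experimental trials
WORDS_PER_SENTENCE = 9
WORD_MS = 333
REST_MS = 12_000
sentence_ms = WORDS_PER_SENTENCE * WORD_MS            # close to three seconds
items_per_run = 144 // 6                              # sentences per run
sentence_trial_volumes = round((sentence_ms + REST_MS) / TR_MS)

# DSR task, one trial
ITEMS, ITEM_MS, BLANK_MS = 5, 2400, 400
items_sum_ms = ITEMS * (ITEM_MS + BLANK_MS)           # sum of the per-item figures
dsr_phases_s = {"presentation": 15, "rehearsal": 9, "articulation": 6, "rest": 18}
dsr_trial_s = sum(dsr_phases_s.values())
dsr_trial_volumes = dsr_trial_s * 1000 // TR_MS
dsr_total_trials = 3 * 10                             # three runs of ten trials

if __name__ == "__main__":
    print("sentence presentation (ms):", sentence_ms)
    print("items per run:", items_per_run)
    print("volumes per experimental sentence trial:", sentence_trial_volumes)
    print("DSR trial length (s):", dsr_trial_s, "=", dsr_trial_volumes, "volumes")
    print("DSR trials in total:", dsr_total_trials)
```

With these values both trial types span a whole number of volumes, which is consistent with the slow event-related design described above.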

Neuroimaging Results. We are in a preliminary stage of data analysis. We first compared regions of significant activation across tasks. In sentence comprehension, we contrasted filler sentences to a fixation baseline and found widespread activation along the superior temporal gyrus, extending into both anterior and posterior portions of Brodmann area (BA) 22. Posteriorly, additional foci of activity were also observed within the angular gyrus and the middle temporal gyrus. Frontally, there was robust activation in several subregions of Broca's area (BA 45, 44, and 47), and the supplementary motor area (SMA). These regions were also significantly active in the experimental object relative items. These findings are significant in their own right, because most prior imaging studies have focused on single word processing, and of those that have investigated sentence processing, most have addressed speech comprehension.

In the DSR task, we observed significant activation in dorsolateral prefrontal cortex, Broca's area, SMA, and the cerebellum, replicating previous work. Comparison of the regions of convergent activation across the two tasks reveals an interesting pattern, shown in Figure 2 at the end of this section. As hypothesized, the regions of common activation localize to areas associated with phonological processing, particularly speech production and articulatory rehearsal - Broca's area, SMA, and the cerebellum. Contrary to claims that sentence comprehension relies upon domain-general working memory resources, we did not find regions of common activation in areas associated with executive control and processing - specifically, while dorsolateral prefrontal cortex was engaged during our DSR task, it was not significantly active during sentence comprehension.
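The comparison of regions across the two tasks amounts to asking which voxels exceed threshold in both maps and which exceed it in only one. The sketch below shows that logic on small arrays. It assumes two statistic maps on the same voxel grid and a single shared threshold; the function names and the threshold value are hypothetical, and this is not the analysis pipeline used for Figure 2.

```python
import numpy as np

def common_activation(stat_map_a, stat_map_b, threshold):
    """Boolean mask of voxels that exceed threshold in both statistic maps."""
    a = np.asarray(stat_map_a)
    b = np.asarray(stat_map_b)
    if a.shape != b.shape:
        raise ValueError("maps must share the same voxel grid")
    return (a > threshold) & (b > threshold)

def task_specific(stat_map_a, stat_map_b, threshold):
    """Boolean mask of voxels above threshold in map A but not in map B."""
    a = np.asarray(stat_map_a)
    b = np.asarray(stat_map_b)
    if a.shape != b.shape:
        raise ValueError("maps must share the same voxel grid")
    return (a > threshold) & ~(b > threshold)

# Tiny made-up maps, for illustration only.
dsr_map = np.array([4.1, 0.3, 5.0, 3.6])
sentence_map = np.array([3.9, 4.4, 0.2, 3.2])
both = common_activation(dsr_map, sentence_map, threshold=3.0)
dsr_only = task_specific(dsr_map, sentence_map, threshold=3.0)

if __name__ == "__main__":
    print(both, dsr_only)
```

In the terms of Figure 2, voxels in the first mask correspond to the overlap between the two colours, and voxels in the second to regions shown in the DSR colour only.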

Our preliminary results for syntactic complexity within the filler sentences indicate that a number of regions, including the anterior superior temporal gyrus and an anterior region within Broca's area (at or near BA 47), were sensitive to these manipulations (Figure 3a). Interestingly, none of the regions showing sensitivity to syntactic complexity were active during DSR. Taken together with the convergence analysis described above, our results suggest that syntactic complexity effects may be neurally manifested within areas specifically involved in sentence comprehension, rather than affecting or recruiting more general processes (e.g., increasing demands on phonological processing or executive control).

We have not found robust effects of phonological overlap in the object relative sentences (Figure 3b). While disappointing, this result is consistent with previous inability to observe isolated effects of phonological similarity in DSR tasks. In a few regions, we have found a trend towards increased activity during the phonological overlap compared to the non-overlap condition. Interestingly, these increases have been found in regions that are also active during the DSR task. Such overlap would be consistent with our original hypothesis that general phonological representations support on-line sentence comprehension. As we proceed with data analyses, we will consider whether additional participants should be tested to increase power.

Future Plans

Behavioral work. We plan to submit a manuscript reporting the first behavioral study within six months. We will continue to examine the relationship between phonological overlap effects in WM and comprehension through examination of individual differences. We will also extend the phonological overlap effect to an additional sentence structure; the object relative materials were originally developed to be convertible (by rearranging some words) to another syntactic structure, subject relative clauses, which are typically somewhat easier to comprehend. We can therefore observe the effects of identical amounts of overlap on a different sentence type.

In work partially supported by the IBSC, MacDonald is investigating the relationship between sentence comprehension and production, including the role of phonological interference in sentence production tasks. This work is important to the IBSC in several respects. First, phonological interference effects in WM tasks most likely come from language production (not comprehension) processes, and thus an even better chance of relating phonological activation in WM and language processing may be through language production. Second, the continuing pursuit of a constraint-satisfaction account of syntactic comprehension processes has led to a new integration of language comprehension and production research. In this account, comprehension preferences previously attributed to a syntactic comprehension module are instead traced to sensitivity to distributional patterns of word orders in language use. These distributional patterns are themselves traced to the unique demands of the language production process, in which a message to be conveyed must be converted into a serially-ordered set of elements (phrases, words, articulatory gestures, etc.). Ongoing and planned work investigates how the demands of production (including difficulty from phonological interference among to-be-uttered words) yield particular patterns of word order choices in speaking and writing, effectively yielding high- vs. low-frequency syntactic patterns. Key comprehension results can then be reanalyzed as basic frequency effects, and the frequencies themselves are traced to planning processes in production.

Neuroimaging. We will complete data analysis and prepare a report of our first study over the next six months. We also hope to extend our initial findings by developing a fast event-related protocol that will allow us to probe activity at different points during sentence processing. This may be useful in identifying temporally-localized effects of phonological overlap during comprehension. Additional studies will investigate the relationship between reading skill, working memory span, and phonological awareness.

Patient studies. Our original proposal was to recruit patients who present with reduced phonological WM but apparently preserved sentence comprehension. We hypothesized that if phonological information were central to sentence comprehension, such patients should show comprehension impairments under careful testing. We are looking for such patients with the assistance of the Stroke Clinic at Hope Hospital, Manchester (UK). We expect this task to be a significant challenge on two grounds: (1) this specific dissociation is very rare; and (2) these patients have, by definition, few other cognitive and language impairments and thus will not seek speech pathology services. Fortunately, the stroke clinic runs a short neuropsychological assessment for all their patients, including phonological WM and single word comprehension measures. We are therefore planning to use these measures as a way to identify potentially suitable cases with whom we can complete more detailed neuropsychological assessments.

We will also investigate patients who show an association (rather than dissociation) between phonological WM and comprehension. Patients with progressive non-fluent aphasia, who are considered one of the subtypes of fronto-temporal dementia or FTD (Snowden et al., 1996), have a progressive deterioration of language fluency resulting from gradually increasing brain dysfunction in left frontal regions (Nestor et al., 2003). They invariably have reduced phonological WM and typically have measurable syntactic comprehension deficits on standard aphasia testing. Because the patients (sadly) deteriorate, these two putatively associated impairments can be tracked in parallel. Currently 8-10 cases of progressive non-fluent aphasia are being longitudinally followed in the Cambridge cohort of FTD patients. Next year, we will begin to test them on specially designed tasks of both phonological WM and language comprehension.


Figures For Project 3:

[Figure 1 plot. Y-axis: Length Adjusted Reading Times (ms), -100 to 150. X-axis: Word Position, 1-9. Series: OR with Overlap; OR without Overlap.]

Overlap: The player that the mayor met bet the editor.
No overlap: The coach that the mayor aided bet the editor.

Figure 1. Length adjusted self-paced word reading times for Object-Relative (OR) with and without phonological overlap. The two verbs in the sentence are words 6-7. The overlap manipulation affects words 2 and 6 only. Y-axis shows length-adjusted residual reading times.

Figure 2. Overlapping activation for delayed serial recall (blue) and sentence comprehension (green). Overlap is seen in dorsal Broca’s and premotor cortex, but not in dorsolateral PFC (blue only) or superior temporal cortex (green only).

[Figure 3 plots. Panel A: Syntactic Complexity (easy, mid, diff). Panel B: Phonological Overlap (overlap, no overlap). Y-axis: % signal change, -0.4 to +0.4. X-axis: Time Point, 1-6.]

Figure 3. Activation in temporal regions (such as the green ROI shown here) is specific to sentence processing. Graph A shows an effect over time of syntactic complexity in the filler sentences. These regions do not show effects of phonological overlap, however (Graph B).

Project 4: Mechanisms of Cognitive Control

Investigators: Jonathan D. Cohen, Todd Braver, Randall C. O’Reilly

Specific Aim 1: Modeling of working memory, reinforcement learning and performance monitoring.

Explicit model of error detection using reinforcement learning. One of the goals of work under this Aim was to explore the relationship between mechanisms of reinforcement learning and those responsible for performance monitoring. One model that has begun to pursue this goal is the reinforcement learning (RL) model of the error-related negativity (ERN). The ERN is an event-related potential (ERP) that is consistently associated with, and immediately follows, the commission of errors on speeded response tasks. One interpretation of this finding has been that it reflects the operation of an explicit error detection mechanism. Holroyd & Coles (2001) proposed a model in which the ERN was explained in terms of the operation of a temporal difference RL mechanism. In this model, the ERN reflects a negative temporal difference signal (indicating that circumstances are worse than expected) when a performance error has occurred. An alternative hypothesis is that the ERN reflects the operation of a conflict monitoring mechanism (Botvinick et al., 2001).
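To make the temporal difference account concrete, the following is a minimal sketch of a temporal difference error, assuming scalar value estimates and a discount factor `gamma`. The function name and the numbers are illustrative assumptions and are not taken from the Holroyd & Coles model.

```python
def td_error(reward, value_next, value_current, gamma=1.0):
    """Temporal difference error.

    Positive when circumstances are better than expected,
    negative when they are worse than expected.
    """
    return reward + gamma * value_next - value_current


# Hypothetical speeded-response trial: the system expects a good outcome
# (value 0.8). On an error trial no reward follows; on a correct trial
# a reward of 1.0 follows. Both trials end afterwards (value_next = 0).
delta_error = td_error(reward=0.0, value_next=0.0, value_current=0.8)
delta_correct = td_error(reward=1.0, value_next=0.0, value_current=0.8)
```

Under these assumed numbers the error trial yields a negative signal and the correct trial a small positive one, which is the sign pattern the RL account associates with the ERN.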

Recent work (Yeung et al., in press) has demonstrated that post-response conflict can be used as a reliable predictor of performance errors, thus providing a mechanism for error detection that does not require specific knowledge of the content of the response. Furthermore, post-response conflict exhibits a time course and sensitivity to task parameters that is very similar to the ERN. While the conflict monitoring hypothesis has been implemented in an explicit computational model (Botvinick et al., 2001; Yeung et al., in press), the RL-ERN model (Holroyd & Coles, 2001) posits an error detection mechanism (needed to calculate temporal differences) that had not yet been implemented. Furthermore, the Holroyd & Coles model relied on a processing mechanism that lacked realistic, continuous dynamics. Work over the past year has addressed these limitations. We have now implemented the RL-ERN model in a connectionist architecture that simulates the continuous dynamics of processing, and includes an explicit mechanism for error detection. The latter is composed of conjunction units that compare the state of stimulus categorization units with that of the response units. The RL mechanism allows the system to use feedback during training to assign positive and negative values to each stimulus-response conjunction, according to whether it represents a task-appropriate or inappropriate mapping. Once learned, this information can be used by the system as an internal mechanism for error detection (an error is detected when a negatively valued conjunction unit, representing a task-inappropriate stimulus-response mapping, is activated). The model is able to simulate a detailed set of findings regarding the evolution of the ERN under conditions of feedback-based learning, as well as the sensitivity of the ERN to the manipulation of task variables such as stimulus-response frequency. This work has been submitted to JEP: General (Holroyd et al., submitted).
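The conjunction-unit idea can be illustrated with a small sketch. It is a deliberate simplification: it uses discrete trials and a simple delta rule rather than the continuous dynamics and temporal difference learning of the actual model, and the stimulus names, response names, and learning rate are hypothetical.

```python
import itertools

# Hypothetical task: two stimulus categories, two responses.
STIMULI = ["left_target", "right_target"]
RESPONSES = ["left_hand", "right_hand"]
CORRECT = {"left_target": "left_hand", "right_target": "right_hand"}


def train_conjunction_values(n_trials=200, lr=0.1):
    """Learn a value for each stimulus-response conjunction from feedback."""
    values = {pair: 0.0 for pair in itertools.product(STIMULI, RESPONSES)}
    pairs = itertools.cycle(list(values))  # visit every conjunction in turn
    for _ in range(n_trials):
        stimulus, response = next(pairs)
        # Feedback is +1 for a task-appropriate mapping, -1 otherwise.
        feedback = 1.0 if CORRECT[stimulus] == response else -1.0
        values[(stimulus, response)] += lr * (feedback - values[(stimulus, response)])
    return values


def error_detected(values, stimulus, response):
    """An error is flagged when a negatively valued conjunction is active."""
    return values[(stimulus, response)] < 0.0


values = train_conjunction_values()
```

After training, `error_detected(values, "left_target", "right_hand")` flags the inappropriate mapping without any external feedback, which is the sense in which the learned values act as an internal error detector.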

Importantly, the development of this model now permits a precise and explicit examination of the relationship between the error detection and conflict monitoring hypotheses concerning the ERN, and their relationship to RL. One insight that has already emerged from this work is that both mechanisms involve a form of conflict monitoring, which differs according to the representations that are being compared and the dynamics over which this occurs. This, in turn, suggests behavioral and neurophysiological experiments that can test contrasting predictions of the two models. In these respects, this line of work accords well with all three center aims, by developing explicit models of cognitive processes - in particular, mechanisms of learning - and doing so in a manner that takes account of and can make predictions about neuroscientific findings.

PFC/basal ganglia working memory (PBWM) model. A second goal has been to further develop our model of PFC-basal ganglia interactions involved in working memory, that was described in the original application. The emergent dynamics of learning and cognitive control supported by this model are consistent with all three of the center aims: aim 1 (explicit models embodying emergent accounts of cognition) is clearly addressed, and future plans call for much more direct application of the model to empirical data; aim 2 (principles of learning, processing, and representation) is addressed by virtue of the model advancing novel mechanisms of learning and processing that go beyond those typically employed in cognitive modeling (and related work addresses novel forms of representation that emerge from these mechanisms); and aim 3 is also clearly addressed in that the model is based very directly on detailed neuroscience data of the basal ganglia and prefrontal cortex, representing a critical advance in biological plausibility over extant models (e.g., recurrent backpropagation) that deal with temporally extended information. The basic principle behind the PBWM model is that the basal ganglia provide a selective, dynamic, adaptive gating mechanism for modulating the working memory function of the prefrontal cortex. We published the original ideas for how the basal ganglia circuitry connecting with the prefrontal cortex could produce a dynamic gating mechanism in Frank et al. (2001). The latest work shows how neural circuitry associated with the basal ganglia and associated areas that produce dopamine neuromodulation (nucleus accumbens, ventral tegmental area, basolateral amygdala, and the substantia nigra pars compacta) can produce a powerful learning mechanism for training the matrisomes of the dorsal striatum to send appropriately-timed gating signals to the prefrontal cortex. 
Such a mechanism is critical for eliminating the persistent homunculi that have been associated with the control of working memory systems in extant theories. We have shown that these learning mechanisms can solve challenging working memory updating tasks, such as the 1-2-AX task.
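For readers unfamiliar with the 1-2-AX task, the sketch below scores a symbol sequence under the rule as it is commonly described: after a 1, an X that immediately follows an A is a target; after a 2, a Y that immediately follows a B is a target. The function name and the example sequence are illustrative assumptions. The point is that the most recent digit must be held across intervening letters while the previous letter is updated on every step, which is the kind of selective updating a gating mechanism has to learn.

```python
def target_responses(sequence):
    """Return, for each symbol, True if it is a target under the 1-2-AX rule."""
    last_digit = None  # outer-loop context: most recent '1' or '2'
    prev = None        # inner-loop context: the immediately preceding symbol
    out = []
    for sym in sequence:
        if sym in ("1", "2"):
            last_digit = sym  # update the outer-loop context
        is_target = (
            (last_digit == "1" and prev == "A" and sym == "X")
            or (last_digit == "2" and prev == "B" and sym == "Y")
        )
        out.append(is_target)
        prev = sym  # the inner-loop context is overwritten on every step
    return out


responses = target_responses(list("1AXBY2BYAX"))
```

In this example only the third symbol (X after A, under 1) and the eighth symbol (Y after B, under 2) are targets; the final A-X pair is not, because the active digit is 2.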

A paper based on the initial efforts on this model was submitted to Neural Computation. Based on the reviews received, and disappointing performance on other tasks, considerable additional revision of the learning mechanisms has been undertaken. This has required developing a new version of the temporal-differences (TD) learning mechanism, which does not use differences across time to simulate dopamine spikes. Instead, this new mechanism, called PVLV (perceived value and learned value) involves differences between two reward value estimates indexed by the current input state. These two estimates differ in their rate of learning. We have found that our tasks were not sufficiently predictable across time to support the demands of TD learning, and that this new PVLV mechanism works much better in such unpredictable environments. We are currently testing this PVLV mechanism on a broad range of reward-learning data, and writing up the new PBWM model based on it for resubmission to Neural Computation.
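The idea of a signal computed from two value estimates that learn at different rates can be sketched as follows. This is a toy illustration based only on the description above, not the PVLV algorithm itself; the class name, the delta-rule updates, and the learning rates are assumptions.

```python
from collections import defaultdict


class TwoRateValue:
    """Two reward value estimates per input state, learning at different rates."""

    def __init__(self, fast_lr=0.5, slow_lr=0.05):
        self.fast_lr = fast_lr
        self.slow_lr = slow_lr
        self.fast = defaultdict(float)  # quickly adapting estimate
        self.slow = defaultdict(float)  # slowly adapting estimate

    def step(self, state, reward):
        # The signal depends only on the current input state,
        # not on a comparison across successive time steps.
        signal = self.fast[state] - self.slow[state]
        # Both estimates move toward the received reward (delta rule).
        self.fast[state] += self.fast_lr * (reward - self.fast[state])
        self.slow[state] += self.slow_lr * (reward - self.slow[state])
        return signal


model = TwoRateValue()
signals = [model.step("cue", 1.0) for _ in range(20)]
```

With these assumed rates the signal is zero on the first presentation, rises while the fast estimate leads the slow one, and decays back toward zero as the slow estimate catches up.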

We have also been continuing to develop a new model that uses the PBWM learning mechanisms to explore a greatly expanded version of the cross-task generalization paradigms. Our goal is to develop a single model that can simulate a wide range of data across paradigms such as the Stroop, WCST, AX-CPT, Eriksen, Sternberg, etc. This work was funded by an NIH R01 starting Jan 1, 2004. An ONR grant supporting related work on PFC/BG mechanisms, integrated with paradigms such as visual search and navigation, was also renewed. Finally, graduate student Michael Frank in the lab has one paper in press in Journal of Cognitive Neuroscience, and another submitted to Behavioral and Brain Sciences describing applications of the basal ganglia modeling work to understand a wide range of phenomena in the literature on normal and damaged BG function (e.g., Parkinson's, ADHD, etc.). We are also conducting empirical tests of these models.

Development of representations in prefrontal cortex (PFC). A critical and longstanding challenge to models of prefrontal cortex function is the nature and development of representations in PFC that support its role in cognitive control. We have developed a framework in which to explore interactions between exposure to a broad range of tasks, and specializations that we have hypothesized are unique to PFC, including the capacity for active maintenance in the face of interference, and an adaptive gating mechanism that is driven by dopamine-mediated reinforcement learning. We have shown that when these specializations are included in a simulated PFC, and the system is exposed to a sufficient number of different types of tasks, representations can develop that are sufficient to support cognitive control that generalizes across specific stimuli and tasks. This is not true when the simulated PFC lacks either of the critical elements, or training does not include a broad-enough range of tasks. This same model was able to perform both the Stroop task and the WCST, two standard tasks used to measure prefrontal function, using the same prefrontal representations developed entirely through experience. This work has been submitted for publication (Rougier et al., in submission). This model represents an important advance in our understanding of how the prefrontal cortex supports flexible cognitive function. It helps advance center aims 1 and 2, by exploring interactions between specific learning and processing mechanisms, the types of representations to which they give rise, and the impact that these have on processing capabilities.

Specific Aim 2

fMRI study of feedback, error detection and conflict monitoring. One of our goals under Specific Aim 2 is to conduct empirical studies that will help further our understanding of how different types of information about performance (e.g., explicit feedback, error detection and conflict monitoring) are used to make adjustments in control, and their relationship to reinforcement learning. Toward this end, we have completed the fMRI component of proposed Experiment 8, in which subjects underwent scanning while performing a task in which they had to learn simple stimulus-response relationships. Behavioral data from the same paradigm formed an important testbed for the RL-ERN model described above. In this experiment, we used fMRI to test a prediction of this model: that the same brain areas responsive to errors in performance would also be responsive to negative feedback. Specifically, we used fMRI to compare ACC activity immediately following the response on correct vs. error trials, as well as following presentation of correct vs. error feedback stimuli. As predicted, we found a caudal and dorsal area of ACC that was more active for error trials than for correct trials, whether the source of the information was the response or the feedback stimuli. These activations were associated with the same conditions in which the response ERN and feedback ERN are largest (Holroyd & Coles, 2002). These results suggest that both the response ERN and the feedback ERN are generated in a similar region of ACC, and that this area is engaged in both error processing and reinforcement learning. Interestingly, this area was also activated on correct trials in the same experiment, under conditions of greatest conflict. These findings, now in press in Nature Neuroscience (Holroyd et al., in press), suggest that the same, or similar, regions of ACC may be involved in both error detection and conflict monitoring, further motivating the development of a model that integrates both mechanisms. This work helps advance center aim 3, by testing predictions of our model using combined behavioral and neuroscientific methods.

Relative contributions of striatal and frontal systems in reward evaluation. A central component of our theory of basal ganglia function is that dopaminergic inputs convey a predictive Hebbian learning signal. One challenge that has been made to this hypothesis is that, at least in rodent studies, reinforcement learning appears to exhibit time scale invariance (e.g., Gallistel 2001) - that is, animals seem to compute rate of reward, irrespective of absolute time courses of delivery. However, this observation is at odds with studies in humans, which demonstrate that people show differential sensitivity to rewards offered over different time scales. We have begun to explore the neural bases of this phenomenon, with the goal of better characterizing the time scale characteristics of striatal learning systems. As a first experiment, we conducted a study of intertemporal choice, in which subjects were offered choices between two rewards. For some pairs, one option offered an immediate reward (vs. a larger delayed reward), while others involved choices between two delayed rewards (one of which was more delayed but more valuable than the other). A common behavioral finding is that when the opportunity for immediate reward is available, subjects more steeply discount the delayed option than when the two choices are offered with a similar temporal offset, but both are in the future. We predicted that immediate reward opportunities would preferentially engage dopaminergic striatal systems, while frontal systems would be engaged by all choices. Our findings confirm this prediction, and furthermore demonstrate that when subjects opt for a delayed reward in the face of an immediate one, frontal activity is greater than striatal activity. The work has been submitted for publication (McClure et al., submitted). Our findings support the idea that dopaminergic learning mechanisms may not show time scale invariance. At the same time, they pose interesting new challenges to our theory regarding interactions between dopamine-mediated mechanisms of reinforcement learning and gating in the basal ganglia and prefrontal cortex. These will be the subject of further modeling work in the coming year. This work aligns with center aim 3, by generating neuroscientific data that can serve to inform and constrain the further development of our modeling efforts.

fMRI studies of hierarchical organization in PFC. The second goal of Specific Aim 2 was to use targeted brain imaging data (Center Aim #3) to constrain our theoretical model of PFC function and organization. In particular, we have aimed to test the idea that there is a hierarchical organizational structure in lateral PFC that aligns with the posterior-anterior axis. Two dimensions of variation are of interest: 1) temporal duration of activation maintenance; and 2) content of representations. Thus, we have hypothesized that anterior PFC regions will tend to have activity maintenance that is sustained for longer periods than posterior PFC regions. Moreover, we have suggested that anterior PFC will be involved in representing higher-order and more abstract information, such as task-level context and higher-order goals that require decomposition into subgoals, or the integration of information that is currently being actively maintained.

In a first study examining this question, a triple dissociation was observed in lateral PFC, between posterior-ventral, dorsolateral, and anterior PFC sub-regions (Braver & Bongiolatti, 2002). Posterior-ventral PFC was engaged by the requirement to transiently access task-relevant but weak semantic features of items during a working memory task. Dorsolateral PFC was engaged by the demand to actively maintain trial-level context information over a delay period. Anterior PFC was selectively engaged under conditions in which trial-level context had to be maintained while the subgoal task of semantic access was carried out, and the context and subgoal information had to be subsequently integrated to generate the appropriate response. A second study examined dissociations related to the temporal duration of active maintenance (Braver et al., 2003). Anterior PFC showed sustained increases in activity throughout an entire block of trials requiring random switching between two different tasks (as compared to single task blocks). In contrast, dorsolateral and posterior-ventral PFC showed more transient patterns of activity associated with the actual timing of switches between tasks.

Two more recent studies have examined the question of integration within working memory. One of these studies examined the role of anterior PFC in episodic retrieval, the cognitive domain in which anterior PFC activity has been most commonly observed (Reynolds et al., submitted). Additive factors logic was used to determine whether episodic retrieval and demands for goal-subgoal integration within working memory had interactive or additive effects on anterior PFC activity. Surprisingly, distinct subregions of anterior PFC were observed that were selectively engaged by either subgoal integration or episodic retrieval. A region of right anterior PFC was observed that showed sensitivity to both effects, but with different temporal dynamics within each trial. These results provide an important challenge for understanding the computations associated with episodic retrieval that engage anterior PFC.

A final study (Braver et al., in preparation) examined the question of how the requirements to integrate multiple sources of information within working memory might be distinct from simultaneously maintaining the same amount of information in a segregated form (Experiment 3 in Subaim 2.1). The study used a mental arithmetic paradigm that required storing and updating partial products during a 4-step problem. A secondary working memory load was imposed, in the form of a single digit presented prior to the trial. In the integration condition, the pre-trial digit had to be mentally integrated into the arithmetic problem, whereas in the dual-task condition the digit had to be just stored during the mental arithmetic phase and then subsequently recalled after completing the problem. Dorsolateral PFC regions were found to show increased activity during both of these conditions relative to single-task controls (single-digit recall and mental arithmetic). Interestingly, in the left anterior PFC increased activity was found selectively in the integration condition, during the mental arithmetic phase. In contrast, activity in right anterior PFC was increased selectively in the dual-task condition, during the phase of single-digit recall. This result suggests an important dissociation within anterior PFC for maintenance of "outer-loop" context information, between: a) sustained activity in anticipation of integration; and b) resuming the "outer-loop" task after an inner-loop subgoal has completed. Such a finding may necessitate important refinements to our computational model of hierarchical PFC representation.

Future Plans:

As outlined above, each line of work has natural and well-specified extensions that are currently being pursued. Under Specific Aim 1, we plan to use the newly developed model of error detection to compare the predictions that it makes with those of the conflict monitoring model regarding the ERN, and associated behavioral phenomena (e.g., post-error and post-conflict sequential adjustments in behavior). Data from experiments designed to test these predictions will be used to determine the contribution that each mechanism makes to performance monitoring, which will in turn be used to develop an integrated model that incorporates both types of mechanism. An additional direction of simulation work is to develop a model which integrates conflict and error monitoring, by examining how conflict-like representations might develop from error-based ones in the ACC via a dopaminergically-mediated learning mechanism.

Also under Specific Aim 1, we plan to develop further, and apply the PBWM model to a wide range of cognitive control/working memory tasks (as noted above), and explore the learning mechanisms in the context of the growing body of literature on the nature of dopamine signals in animal learning paradigms. We also plan to extend the model by incorporating a model of the orbital/ventromedial prefrontal cortex, which we hypothesize provides top-down biasing to the reward-learning areas of the basal ganglia, supporting such phenomena as the ability to favor larger but more temporally distant rewards in the face of more immediate but less valuable rewards. In the context of the PFC representation modeling, we plan to attack the issue of how this system can support generativity (the ability to perform truly novel behaviors), in addition to generalization.

Under Specific Aim 2, we plan to further explore the factors that influence engagement of striatal, orbitofrontal, and prefrontal cortical areas under conditions of intertemporal choice. These data will help further our understanding of the contribution that each of these systems makes to reward valuation, reinforcement learning, and decision making over a range of time scales, and the concomitant development of our models of these mechanisms. Additionally, further imaging studies of anterior PFC are proposed, which explore the 1-2 AX-CPT paradigm (Experiment one of Subaim 2.1) in terms of the duration over which task-level context must be maintained (Experiment 3) and the specific effects of stimulus-response mapping probabilities. For example, one specific hypothesis to be tested is that task-level context information is only actively maintained in anterior PFC if it can be used to substantially re-configure attention to particular trial level context information (e.g., compare conditions in which Task 1 vs. Task 2 alters the probability of receiving particular cue-probe sequences against conditions in which Task 1 vs. Task 2 alters stimulus-response mapping rules, but not the expected frequency in which different mappings might occur).

Supported Presentations and Publications Including Publications in Progress

Braver, T. S., Reynolds, J. R., & Donaldson, D. I. (2003). Transient and sustained cognitive control during task switching. Neuron, 39, 713-26.

Braver, T. S., DePisapia, N., & Slomski, J. A. (in preparation). Goal-subgoal coordination in anterior PFC: Hemispheric dissociations.

Holroyd, C. B., Nieuwenhuis, S., Yeung, N., Nystrom, L., Mars, R. B., Coles, M. G. H., & Cohen, J. D. (in press). Dorsal anterior cingulate cortex shows fMRI response to internal and external error signals. Nature Neuroscience.

Holroyd, C. B., Yeung, N., Coles, M. G. H., & Cohen, J. D. (submitted). A mechanism for error detection in speeded response time tasks.

McClure, S. M., Laibson, D. I., Loewenstein, G., & Cohen, J. D. (submitted). The Ant and the Grasshopper: Separate neural systems are involved in valuing delayed monetary rewards over short and long time scales.

Nieuwenhuis, S., Yeung, N., Holroyd, C. B., Schurger, A., & Cohen, J. D. (in press). Sensitivity of electrophysiological activity from medial frontal cortex to utilitarian and performance feedback. Cerebral Cortex.

O'Reilly, R. C., & Frank, M. J. (submitted). Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia.

Reynolds, J. R., McDermott, K. M., & Braver, T. S. (submitted). Direct comparison of anterior prefrontal cortex activity during subgoal processing and episodic retrieval.

Rougier, N. P., Noelle, D. C., Braver, T. S., Cohen, J. D., & O'Reilly, R. C. (submitted). Prefrontal Cortex and the Flexibility of Cognitive Control: Rules Without Symbols.

Project 5: Interactive Processes in Perception: Neurophysiology of Figure-Ground

Investigators: Tai Sing Lee and Carl Olson

The general aims of this project are to investigate how different visual areas interact during perceptual inference, and to understand the computational nature and development of such interaction through neurophysiological experiments. The overall plan is to first characterize the neural activities to specific stimuli in the early and higher order areas, and then in a later stage of the project to investigate the interaction between these areas during perceptual processing and learning. Our project has been progressing along several lines that are consistent with the general aims of the Center.

Theoretical studies on cortical inference:

We reviewed some recent neurophysiological findings in the context of several interactive computational theories. We found the evidence suggestive of the view that cortical inference can be modeled in terms of a hierarchical Bayesian system (Lee 2002, Lee 2003, Lee and Mumford 2003), in which the higher visual area provides top-down prior beliefs to constrain the computation in the early visual areas. The evidence, however, is mostly indirect, and we conclude that it will be important to plan experiments to gather more direct information on these issues. In the process, we found sequential Bayesian belief updating (or particle filtering) appealing as a potential paradigm for conceptualizing neural computation. To explore these ideas, we have developed an explicit model for computing visual motion parameters based on V1 neural activities using the particle-filtering paradigm. In this work (Kelly and Lee 2003), we derived the Volterra kernels of the transfer function of the V1 neurons, and then used particle filtering in conjunction with the derived Volterra kernels to decode the visual signals. The particle-decoding paradigm works by generating multiple hypotheses (samples) based on the history of the hypotheses, propagating them with priors, and performing importance sampling on them based on observations. The algorithm works well in decoding a class of slowly moving sinewave gratings.
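The propagate, weight, and resample cycle described above can be sketched with a generic bootstrap particle filter that tracks the phase of a slowly drifting grating. This is an illustrative sketch only: the observation model (a noisy cosine and sine of the phase) is a hypothetical stand-in for the Volterra kernel model of V1 responses, and the function names and all parameter values are assumptions.

```python
import math
import random


def particle_filter(observations, n_particles=500, drift=0.1,
                    process_sd=0.05, obs_sd=0.3, seed=0):
    """Estimate grating phase over time from noisy (cos, sin) observations."""
    rng = random.Random(seed)
    # Initial hypotheses: phases spread uniformly over the circle.
    particles = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_particles)]
    estimates = []
    for obs_c, obs_s in observations:
        # 1. Propagate each hypothesis with the motion prior.
        particles = [(p + drift + rng.gauss(0.0, process_sd)) % (2 * math.pi)
                     for p in particles]
        # 2. Weight each hypothesis by how well it predicts the observation.
        weights = [math.exp(-((obs_c - math.cos(p)) ** 2
                              + (obs_s - math.sin(p)) ** 2) / (2 * obs_sd ** 2))
                   for p in particles]
        # 3. Resample hypotheses in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # The circular mean of the particles is the phase estimate.
        c = sum(math.cos(p) for p in particles)
        s = sum(math.sin(p) for p in particles)
        estimates.append(math.atan2(s, c) % (2 * math.pi))
    return estimates


def simulate(n_steps=50, drift=0.1, obs_sd=0.3, seed=1):
    """Generate a drifting phase and noisy observations of it."""
    rng = random.Random(seed)
    phase, truth, obs = 0.0, [], []
    for _ in range(n_steps):
        phase = (phase + drift) % (2 * math.pi)
        truth.append(phase)
        obs.append((math.cos(phase) + rng.gauss(0.0, obs_sd),
                    math.sin(phase) + rng.gauss(0.0, obs_sd)))
    return truth, obs


truth, observations = simulate()
estimates = particle_filter(observations)
```

Resampling after every observation is the simplest choice; it keeps the sketch short at the cost of some loss of particle diversity.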

While the decoding work is interesting by itself, we see this as a test bed for understanding plausible cortical inference mechanisms. In our model, the Volterra kernel part could be considered as the brain's internal generative model of the V1 activities, and the filtering of the particles by the Volterra kernels corresponds to the generation of the top-down prediction of the various hypotheses. The interaction of the prediction and the observation in V1 contributes to importance sampling that re-distributes the particles in the hypothesis space. The particles are propagated in the feedforward connections from one area to another or the recurrent connections within each area. To model the neural circuit more realistically, we have to build a circuit for inferring velocity rather than the phase of the sinewave grating. We are now working on a model for motion velocity estimation, and a model for static image analysis that requires an interaction of the global context and local information, as suggested by some of our earlier studies (Lee and Nguyen 2001). This work is consistent with all three major aims of the Center and involves both computational study and neurophysiological study.

Studies in the early visual areas:

In our neurophysiological study of the early visual cortex, we have been investigating neural processes and representations (Center Aim 2) in the context of figure-ground and target detection (Specific Aim 2). We have found that when the monkeys were trained to look for a certain pop-out object, for example a convex ball in a field of concave distractors, V1 neurons responded to it more vigorously (and exhibited a stronger pop-out effect) than to the untrained ones. This year, we seek to understand the neural responses to this class of stimuli in a more systematic and general fashion. First, we investigated whether the pop-out effect in these stimuli can be influenced by the luminance contrast of the stimuli. We found that the pop-out effect is dependent on luminance contrast: it is strongest for intermediate stimulus contrast and weaker for very strong or very weak contrast. This suggests that contrast should be an important factor in our future study. We are in the process of gathering more data and working to understand the rationale and mechanisms responsible for this phenomenon. Second, we attempted to verify whether the pop-out effect we observed with shape-from-shading stimuli was in fact due to 3D interpretation of the stimuli. We tested the sensitivity of V1 and V2 neurons to pop-out in 3D disparity produced by random dot stereograms (RDS), and observed very significant pop-out responses in V1 and V2 to 3D RDS signals. Whether the two pop-out effects are consistent requires more study. In the coming year, we will also investigate whether the early visual neurons would respond consistently to geometrical structures defined by the stereo signals and the shading signals, and how they would respond when the two visual cues are consistent or when they are in conflict. This will illuminate the rules by which multiple fundamental cues interact during perceptual inference. Both of these series of studies are ongoing, but we are planning to submit one or two Neuroscience abstracts on these topics.

The second major thread of this project is to understand the neural basis and time course of perceptual learning in V1 and V2 (specific Aim 3 of the project, Aim 2 of the Center). In particular, we are asking how much practice it will take for an object to become a salient figure relative to others, and how do V1 neurons' activities evolve during this process? We have set up a 128 channel recording system and have implemented a Utah bionic array for chronic recording in an awake monkey early this year. The monkey survived the surgery, and was trained subsequently to perform the fixation task. Current recording revealed that all 98 channels exhibited some visual evoked responses to stimulus flashed on the screen. However, the signal to noise ratio has been too poor to carry out our study at present. We are still investigating ways to improve on the signal to noise ratio. Meanwhile, we are ordering a new array to do a second implant. The Utah array is the most efficient and promising technique for studying the effect of familiarity and statistical learning on the figure-ground process (specific Aim 3 and 4). Though we will keep trying, the lack of immediate success of the array, despite the presumed improvement in the manufacturing technology, suggests that a more cautious and conventional single-unit approach is warranted. In the coming year, we will pursue this neural plasticity component of the project using single-unit recording technique in parallel with further experiments with the array technology.

Studies of Inferotemporal Cortex

Repetition priming and repetition suppression have been a major focus of our studies of inferotemporal cortex over the past year. Repetition priming is a form of rapid perceptual learning, whereby prior exposure to an object allows faster recognition of that object on subsequent exposures. Repetition priming may be related to repetition suppression among neurons in inferotemporal cortex (a decline in response strength occurring over multiple presentations of the same stimulus). However, the relation of repetition priming to repetition suppression has never been studied at the single-neuron level. To study the relation between the two phenomena, we first had to develop a priming paradigm suitable for use in monkeys and demonstrate, for the first time, in its context, that monkeys exhibit repetition priming. We trained two monkeys to perform a symmetry decision task. In this task, the monkey reported whether a visual stimulus was symmetrical or asymmetrical by making either an upward or a downward saccade. Over the course of the experiment, each stimulus was presented twice, with lags between presentations ranging from zero to sixteen intervening different stimuli. Consistent with human behavioral studies of repetition priming, the monkey's reaction time was shortest for zero lag repeats and increased monotonically with lag. The next step was to monitor neuronal activity in inferotemporal cortex during task performance. We have so far collected data from 77 neurons in one monkey. The results demonstrate clearly that responses of IT neurons were suppressed by repetition at short lags, but not at long lags. We conclude that repetition priming observed behaviorally occurs in parallel to repetition suppression at the level of single neurons (McMahon and Olson, 2003 Abstract).

A second focus of research has been on understanding the mechanisms of a form of oscillatory activity which we recently discovered in inferotemporal cortex. Some inferotemporal neurons respond to visual stimuli by firing action potentials in a series of bursts at a frequency of around 5 Hz. We speculated that the oscillatory responses depend on competitive interactions among neurons selective for different stimuli. To test this hypothesis, we recorded neuronal responses to a preferred foveal stimulus (the 'object') presented either in isolation or against the backdrop of an already present peripheral non-preferred stimulus (the 'flanker'). The presence of the flanker enhanced the oscillatory component of the response to the object. Working in collaboration with other Center researchers whose orientation is computational (Samat Moldakarimov and Carson Chow), we have now constructed a model that can account for the experimental data. The model consists of two pools of neurons, one representing the object and the other the flanker, with reciprocal inhibition, self-excitation and adaptation. All synaptic connections exhibit synaptic depression depending on the pre-synaptic activity. We have found that depending on the strength of the inputs, synapses, adaptation, or depression, there is either oscillatory activity, both pools are active (on-on), both pools are quiescent (off-off), or one pool is active and the other is silent (on-off). The results both help to explain phenomena we have observed and make predictions for future experiments (Moldakarimov, et al, 2003 Abstract).
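The ingredients listed above (two pools, reciprocal inhibition, self-excitation, adaptation, and activity-dependent synaptic depression) can be sketched as a small firing-rate simulation. This is only an illustrative sketch, not the Center's model: the function name, the sigmoidal gain function, every parameter value, and the simple Euler integration are assumptions introduced here, and no claim is made about which regime (oscillatory, on-on, off-off, or on-off) these particular values produce.

```python
import numpy as np

def simulate_two_pools(i_object=1.0, i_flanker=0.6, t_object_on=0.5,
                       w_self=0.5, w_inhib=1.0, g_adapt=0.8,
                       tau_r=0.01, tau_a=0.2, tau_d=0.5, u_dep=0.5,
                       dt=0.001, t_max=2.0):
    """Euler-integrate rates of an object pool (column 0) and a flanker pool (column 1).

    All parameter values are hypothetical and chosen only for illustration.
    """
    n_steps = int(round(t_max / dt))
    rates = np.zeros((n_steps, 2))
    adapt = np.zeros(2)   # slow adaptation variable for each pool
    avail = np.ones(2)    # available synaptic resources (1 = fully recovered)
    for k in range(1, n_steps):
        t = k * dt
        r = rates[k - 1]
        # The flanker is present from the start; the object appears at t_object_on.
        ext = np.array([i_object if t >= t_object_on else 0.0, i_flanker])
        out = avail * r   # synaptic output scaled by depression
        # Self-excitation from the same pool, inhibition from the other pool.
        drive = ext + w_self * out - w_inhib * out[::-1] - g_adapt * adapt
        gain = 1.0 / (1.0 + np.exp(-(drive - 0.5) / 0.1))
        rates[k] = r + dt / tau_r * (gain - r)
        adapt += dt / tau_a * (r - adapt)
        # Resources recover with time constant tau_d and are used up by presynaptic activity.
        avail += dt * ((1.0 - avail) / tau_d - u_dep * avail * r)
    return rates

rates = simulate_two_pools()
```

Sweeping the input strengths, synaptic weights, adaptation gain, or depression rate in a sketch of this kind is one way to explore how the qualitative behavior of the two pools changes with those parameters.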

Both lines of research described above are well advanced but are not yet by any means complete. During the coming year, we will continue to pursue both within the general framework already laid down.

Supported Papers and Presentations

Kelly, R., & Lee, T.S. (2003) Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels. Advances in Neural Information Processing Systems, MIT Press. In Press.

Lee, T. S. (2002) Top-down influence in early visual processing: A Bayesian perspective. Physiology and Behavior 77(4-5): 645-650.

Lee, T. S. (2003) Computations in the early visual cortex. J. Physiology (Paris), 97(2-3), 121-139.

Lee, T. S., & Mumford, D. (2003) Hierarchical Bayesian inference in the visual cortex. Journal of Optical Society of America, A. 20(7): 1434-1448.

McMahon, D. B. T., & Olson, C. R. (2002). Linearly independent selectivity for shape and color in single neurons in monkey inferotemporal cortex. Program No. 260.9. Washington, DC: Society for Neuroscience, 2002.

McMahon, D. B. T., & Olson, C. R. (2003). Neural activity corresponding to repetition priming in monkey inferotemporal cortex. Program No. 385.22. Washington, DC: Society for Neuroscience, 2003.

Moldakarimov, S., Rollenhagen, J. E., Olson, C. R., & Chow, C. C. (2003). A model of low frequency oscillatory visual responses in macaque inferotemporal cortex. Program No. 701.20. Washington, DC: Society for Neuroscience, 2003.

Project 6: Basic Mechanisms and Cooperating Systems in Learning and Memory

Investigators: James L. McClelland, Julie A. Fiez, Randall C. O'Reilly

This project pursues the further exploration of the principles that govern learning in parallel distributed processing networks (Center Aim 2), incorporating findings and insights from cellular and systems neuroscience (Center Aim 3). There are four key ideas to be explored through the investigations in this project: 1) Learning and memory depend fundamentally on an essentially Hebbian mechanism of learning; that is, the brain tends to strengthen whatever response it makes to given inputs; 2) This Hebbian process takes place within dynamic attractor networks that are fundamentally competitive and interactive in character; 3) The tendency to strengthen elicited responses can lead to progress in learning, but it can also lead to failures; and if it is left to operate without modulation or control, a mechanism of this sort will be too weak to lead to sophisticated cognitive competencies. This leads to the idea that acquisition of cognitive competencies in a Hebbian system depends on the modulation of the Hebbian learning process in target brain regions in response to task demands, outcome information, and constraining input from other brain regions; 4) Overt manifestations of successes and failures in learning or memory reflect both the basic mechanisms involved and their integration within a system of interacting brain structures (Center Aim 1). Here these ideas are explored in two content areas, where simulations incorporating points 1--4 will address patterns of experimental findings.

Specific Aim 1: Modeling successes and failures of phonological learning in adulthood. Here we extend a simple preliminary model that captures aspects of phonological category learning, in an effort to provide a more detailed and more biologically constrained account of some of the factors that contribute to successes and failures in the acquisition of new phonological distinctions in adults. This work incorporates specific assumptions embodying points 1--3 above to address data from behavioral and functional imaging experiments.

Specific Aim 2: Modeling successes and failures of semantic learning in amnesia. This part of the work investigates patterns of findings seen in semantic learning tasks in amnesics and normal individuals, in hopes of further exploring the utility of points 1--3 above, and also addressing point 4 by explicit consideration of the nature of the contributions of different brain regions in these tasks.

Successes and Failures of Phonological Learning in Adulthood. The Case of Japanese Adults Learning English [r] and [l]

Computational investigations. The goal of the modeling project is to begin to build toward the formation of a biologically informed model of speech category learning. Our efforts (Vallabha and McClelland, in preparation) have focused on category formation and category structure in speech, also addressing broader issues in acquired equivalence and acquired discrimination in category learning in general. The eventual goal is to use the account as a basis for understanding the initial formation of a robust category discrimination structure that then provides the basis for acquisition of non-native spoken language distinctions in adulthood.

Briefly, research on speech perception has confirmed that speech sound categories have a graded internal structure, with some sounds being perceived as more typical exemplars of the category. Infants show sensitivity to this prototype structure by around the age of 6 months and also exhibit a "magnet effect" (Kuhl, 1991), wherein sounds are less discriminable near the category prototypes than near non-prototypes. These results are remarkably similar to the phenomenon of acquired similarity in category learning, viz. stimuli associated with the same outcome become less discriminable from each other (Goldstone, 1998). Subsequent research (Guenther et al., 1999) suggests that acquired similarity and acquired distinctiveness are both possible in auditory perceptual learning. Thus, it seems possible to explain the results from speech perception and category learning in a single unified model.

A biological motivation for this work comes from studies by Michael Merzenich and his colleagues (e.g., Merzenich & Jenkins, 1995). In one study (Recanzone et al, 1992), monkeys were trained to discriminate the frequency of vibrotactile stimuli. After training, neural recordings from area 3b showed changes in the receptive fields, and increased and more coherent firing rate activity among the neurons. Possibly, changes such as these are involved in category learning also. A final, overarching, consideration in the modeling work was to explore all these phenomena in the context of Hebbian learning.

Our model addresses learning of the English /r/-/l/ contrast in Japanese speakers with well-established pre-existing knowledge of the sound structure of their own language. It should be noted that the model is not a theory of speech perception or phonology, but rather a theory of category acquisition that can be embedded within a larger framework. The model consists of a network with four pools of units. The input pool (IP) projects fully to a representation pool (RP); this pool is fully and bidirectionally connected to a third, competitive learning pool (CP) thought to be part of the perceptual representation system. Processing and learning in this system are based exclusively on self-organization using competitive Hebbian learning rules with bidirectional symmetry of connections (e.g., Grossberg, 1976). In the first phase of training, the network is presented with unlabeled instances of the Japanese apico-alveolar tap and the Japanese velar approximant. These two sounds are acoustically closest to the American English /l/ and /r/, so their presentation allows us to simulate the warping of Japanese listeners’ perceptual space near /r/ and /l/. In the second phase, corresponding to /r/-/l/ training, a fourth, output pool (OP) is introduced. This pool has bidirectional connections to the representation pool and is used for mapping percepts onto overt “r” and “l” responses. The OP works like the CP, but has the additional property that it can be influenced by information about the correct response. Thus, the network supports /r/-/l/ training along a continuum from no supervision to partial supervision (e.g., feedback is provided on only some of the trials) to complete supervision. Feedback can also shape learning via reinforcement-based modulation of the Hebbian learning process. A question being addressed in the work is whether these approaches produce different learning outcomes.
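To make the notion of unsupervised competitive Hebbian learning concrete, the following is a minimal sketch of textbook winner-take-all competitive learning, in which only the winning unit's weights move toward the current input (a Hebbian term gated by the winner's activity, with a matching decay). It is not the four-pool network described above: it has a single competitive layer and no bidirectional connections, and the function name, learning rate, initialization from data samples, and toy inputs are all assumptions made for illustration.

```python
import numpy as np

def competitive_hebbian(samples, n_units=2, lr=0.1, epochs=20, seed=0):
    """Winner-take-all competitive learning on unlabeled input vectors.

    Each unit's weight vector starts at a randomly chosen sample; on every
    trial the closest unit wins and its weights move toward the input.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    start = rng.choice(len(samples), size=n_units, replace=False)
    weights = samples[start].copy()
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            winner = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Hebbian strengthening toward the input, applied to the winner only.
            weights[winner] += lr * (x - weights[winner])
    return weights

# Two toy "sound categories" presented without labels.
toy_inputs = np.array([[0.0, 0.0]] * 10 + [[1.0, 1.0]] * 10)
learned = competitive_hebbian(toy_inputs)
```

With these toy inputs, one unit ends up at each of the two clusters even though no category labels are ever provided, which is the sense in which exposure alone can carve up the input space.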

The model described above has been used to model a number of aspects of Japanese adults' speech perception as revealed by investigations by others (Yamada & Tohkura, 1992) and perceptual learning of the English /r/-/l/ contrast as revealed in the published experimental study of McCandliss, Fiez, Protopapas, Conway, and McClelland (2002). That study used a simple training regime in which the subject hears a sound derived from resynthesized naturally spoken minimal pairs (e.g., /rock/-/lock/) and must indicate with a button press whether the sound begins with /r/ or /l/. The study investigated the effects of using an adaptive training regime (starting with exaggerations of natural English /r/ and /l/ stimuli, increasing the exaggeration after each error and decreasing it after 8 successive correct responses) vs. fixed training with unexaggerated stimuli that are initially difficult for the subjects (A-X discrimination performance less than 70% correct). The study also investigated the role of feedback: half of the subjects in each of the adaptive and fixed conditions received feedback after each response and the other half received no feedback. Overall we have been highly successful in modeling nearly all aspects of the data, and a paper describing the work is now in preparation.
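The adaptive regime described above amounts to a one-up, eight-down staircase on stimulus exaggeration. A minimal sketch follows; the integer exaggeration scale, its bounds, the step size of one level, and the resetting of the streak counter after each step are assumptions made here for illustration and are not taken from McCandliss et al. (2002).

```python
def adaptive_staircase(responses, start_level=5, min_level=0, max_level=10,
                       n_correct_to_step=8):
    """Return the exaggeration level used on each trial.

    `responses` is a sequence of booleans (True = correct response).
    The level rises after every error and falls after a run of
    `n_correct_to_step` consecutive correct responses.
    """
    level = start_level
    streak = 0
    levels = []
    for correct in responses:
        levels.append(level)
        if correct:
            streak += 1
            if streak == n_correct_to_step:
                level = max(min_level, level - 1)  # make the task harder
                streak = 0
        else:
            level = min(max_level, level + 1)      # make the task easier
            streak = 0
    return levels

# Eight correct, one error, then nine correct responses.
trial_levels = adaptive_staircase([True] * 8 + [False] + [True] * 9)
```

Because a single error raises the exaggeration but eight consecutive correct responses are needed to lower it, a rule of this form moves toward unexaggerated stimuli only slowly.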

We have found through explorations of this model that two aspects of the McCandliss et al. data are an especially strong constraint. (1) The rate and robustness of learning is greatest in the fixed-feedback condition and least in the fixed-no-feedback condition, with the two adaptive conditions falling in between. Part of the reason for this ordering may be that the adaptive training is too conservative in moving toward the actual native range, and therefore the subjects may have progressed more slowly than they could have. (2) Learning in the fixed-no-feedback condition is slow, and the model can explain this by assuming that subjects in this condition become stuck in a single attractor state. However, subjects in this condition do eventually learn, and there are other signs that they are not really stuck. The dilemma is that measures preventing the network from being stuck tend to result in learning that is not slow enough. Fixes for this are easy to envision, but seem ad hoc, and we continue to examine the possibility of accounting for this aspect of the results on a more principled basis as we finalize the paper for publication.

An important future direction for the work is to examine the dynamics of speech processing in view of the constraints imposed by the fact that speech sounds must be processed very rapidly. In running speech, phonemes occur at rates in the range of 10/second, yet current models (ours included) make use of attractor dynamics that settle into fixed points on a relatively extended time scale, and require external reset before the next input can be processed. To begin to address this, a new modeling project involving Carson Chow, Jay McClelland, and Samat Moldakarimov is underway, bridging between the current effort and Project 8b. As discussed in that section, we are investigating how recurrent inhibitory circuitry might be used to allow an experience-dependent process to learn (a) to quickly and strongly activate an attractor-like state which then (b) engages inhibitory interneurons in proportion to the extent of excitatory activation so as to clear the state of the system in readiness for subsequent inputs. In this approach, experience would result in both faster settling and faster clearing of the network state, making way for subsequent inputs, in accordance with the effects of experience seen in responses to vibrotactile stimuli in the work of Recanzone et al. (1992).

Experimental Investigations. Ongoing work with collaborators outside the IBSC (Lori Holt and Erin Ingvalson) is continuing the behavioral investigation of perceptual learning in Japanese adults. Based on earlier reports in the literature (Yamada & Tohkura, 1992) we have identified a key perceptual cue for the discrimination of English [r] and [l]. This cue (an F3 transition that starts low and rises for /r/ vs. starts high and falls a little for /l/) is reliably used by native English speakers but is not used even by many native Japanese speakers who have extensive exposure to English and can discriminate /r/ vs. /l/ based on other correlated cues. We have now constructed a set of synthetic speech continua, each ranging from [r] to [l] in the presence of a different vowel. Strikingly, many of the Japanese adults that we have so far tested appear to be nearly completely insensitive to this cue, and efforts to train them using the fixed training with feedback procedure that worked so effectively in McCandliss et al. have proven unsuccessful (Ingvalson, Holt, and McClelland, 2003). Two ongoing thrusts of this work are to (1) further investigate the basis of Japanese adults’ difficulty in using the cue in question and (2) further investigate various learning procedures that might be used to enhance learning to use the F3 transitions. We have established that Japanese adults are able to discriminate the contrasting F3 transitions when these occur in isolation from the other formants normally present in full speech, and we have some evidence that they can learn to increase their reliance on this cue in full speech, using a fading procedure.

Semantic and Episodic Memory

Over the last forty years there have been many investigations of the learning of meaningful associations in amnesia, using cued recall tests. Earlier work (e.g., Cutting, 1978) indicated that amnesics (as well as normals) benefit considerably from the existence of a prior meaningful association between a cue and a target item, especially when the target is among the strongest pre-experimental associates of the cue. More recent work (Hayman et al., 1993; Hamman & Squire, 1995; Bayley and Squire, 2002) indicates that the presentation conditions of the material at the time of study play an important role in determining success in a cued recall test. Amnesic subjects can show substantial learning if they repeatedly study each cue-target pair without first attempting to generate the correct response. When subjects are encouraged to generate a response to each cue before studying the cue-target pair, performance is generally poorer, in some cases showing no apparent signs of learning at all.

A theory that has been developed by Kwok and McClelland (in preparation) addresses many aspects of these findings. The theory is based on three principal tenets:

1. Complementary learning systems. Building on the earlier proposal by McClelland, McNaughton, and O'Reilly (1995), the theory assumes that the brain employs complementary learning systems, one located in the medial temporal lobes and one located in the neocortex. The medial temporal lobes provide a fast learning mechanism that employs sparse patterns of neuronal activity to minimize overlap of similar memories. The neocortex learns much more slowly and relies on overlap of patterns representing similar material to encourage generalization and the discovery of systematic structure shared across items in memory.

2. The role of context in selection of situation-specific associations. Building on the ideas of Humphreys, Bain and Pike (1989), the theory relies on the idea that accurate performance in a cued recall task depends on associating a representation of the study context with a representation of each cue-target pair. Context, cue and target are all represented in the neocortex, and fast learning in the medial temporal lobes links the representations of context, cue, and target together. At test, the context-cue compound constrains the response of the system, allowing correct retrieval of the target that was studied with the cue rather than other material that may have been associated with the same cue in different contexts.

3. The role of Hebbian synaptic plasticity in strengthening both correct and incorrect responses that are elicited by cue information. In the theory, learning occurs by adjusting the strengths of connections among neurons according to Hebb's postulate: When neuron A participates in firing neuron B, strengthen the connection from A to B. Given that the target pattern is represented as a pattern of neural activity over one population of neurons, and the cue-plus-context are represented as patterns of activity over other populations, the effect of Hebb's rule is to strengthen the network's tendency to produce whatever pattern arises over the target population whenever the cue-plus-context are present. Target patterns generated in response to the cue-plus-context situation are strengthened, whether correct or incorrect, thereby providing a basis on which self-generated error responses can be strengthened.
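The third tenet can be illustrated with the simplest form of Hebb's rule, in which each connection changes in proportion to the product of presynaptic and postsynaptic activity. The sketch below is illustrative only: the binary patterns, the learning rate, and the function name are assumptions and are not taken from the Kwok and McClelland model.

```python
import numpy as np

def hebbian_update(weights, pre, post, lr=0.1):
    """Strengthen each connection by lr * post_activity * pre_activity."""
    return weights + lr * np.outer(post, pre)

# Hypothetical activity patterns: three cue-plus-context units, two target units.
cue_context = np.array([1.0, 0.0, 1.0])
correct_target = np.array([1.0, 0.0])
error_response = np.array([0.0, 1.0])

weights = np.zeros((2, 3))
# The rule has no access to correctness: whichever pattern is active
# over the target population is the one that gets strengthened.
after_error = hebbian_update(weights, cue_context, error_response)
after_study = hebbian_update(weights, cue_context, correct_target)
```

After the first update the cue-plus-context pattern drives the erroneous response unit, and after the second it drives the correct one; the arithmetic is identical in the two cases, which is the point of the tenet.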

Kwok and McClelland have implemented the theory in a simulation model that has been used to account for several findings in the literature including the following:

1. The role of strength of prior association on cued recall performance in normals and amnesics (Cutting, 1978). A key finding in the Cutting study is that study of pairs that have strong prior associations (e.g., dog-bone) leads to a large increase in the probability of producing the response word (bone) in a cued recall test for both normal and amnesic subjects. New learning works together with existing knowledge to enhance performance rather than simply providing a second independent opportunity to learn. The model captures these effects. For example, in the simulation, studying a pair of strongly associated words produces a large increase in cued recall of the response even after extensive damage to the MTL fast-learning system, while showing no evidence of improvement from presentation of members of a pair with no prior association.

2. The advantage of study-only relative to generate-study for the mild to moderately amnesic patients used in the experiments of Hamman and Squire (1995) and Hayman et al. (1993). Hamman and Squire (1995) found a study-only advantage relative to a generate-study condition using cues (such as Medicine Cures) and targets (such as HICCUPS) that are meaningful but of low generation probability in their mixed group of mild to moderate amnesics. As in many cued recall studies, normal controls showed a generate-study advantage. To model the Hamman and Squire data, Kwok and McClelland used very weak pre-experimental associates and assumed that the patients' amnesia was the result of substantial but sub-total damage to the fast learning system in the MTL. The use of a Hebbian learning rule in the model meant that both correct and incorrect responses elicited during the generate phase in the generate-study condition were strengthened. Damage to the MTL in the model resulted in slower learning of the actual studied association, providing a much greater likelihood that amnesics would produce and strengthen errors rather than correct responses during the generate phase.

3. Gradual strengthening of low-probability semantically related cue-target pairs under study-only conditions in the profound amnesic EP, who has a nearly complete lesion of the MTL (Bayley and Squire, 2002). This study used the same materials previously used in the Hamman and Squire study (previous testing of EP by Hamman and Squire with only a few exposures resulted in no evidence of learning). A set of cue-target compounds (e.g., Medicine Cures / HICCUPS) was presented 24 times over 12 weeks under study-only conditions and tested after every 12 study trials. Over the course of training there was a gradual increase in the probability of producing the target in response to the cue from 0 to 18.8%. For this case it was assumed that the amnesia was due to a total lesion of the MTL, leaving only the slow-learning neocortical system.

A property of the model is that there is a strengthening of any self-generated responses to the cue and a strengthening of the correct, experimenter-provided association when it is studied. The reason for the study-only advantage in the simulation of Hamman and Squire (1995) and for the gradual success in learning in the study-only condition of Bayley and Squire (2002) is that the study-only procedure prevents self-generation of incorrect prior associative responses, and therefore prevents their strengthening, allowing the effects of strengthening of the correct experimenter-defined responses to show through. In the generate-study condition, such strengthening also occurs, but incorrect self-generated responses that are also strengthened compete with correct responses, thereby obscuring the learning that is thought to have occurred.
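The logic of this account can be shown with a deliberately simplified toy in which each candidate response to a cue has a single scalar strength. This is not the Kwok and McClelland simulation: the scalar strengths, the deterministic rule that the strongest response is the one generated, and the specific numbers are assumptions chosen only to expose the argument.

```python
def train(condition, n_trials=12, lr=0.05, prior_strength=0.3):
    """Toy slow-learning system with two candidate responses to one cue.

    'target' is the experimenter-defined response; 'prior' is a
    pre-experimental associate. Every elicited response is strengthened.
    """
    strengths = {"target": 0.0, "prior": prior_strength}
    for _ in range(n_trials):
        if condition == "generate-study":
            # Generate phase: the currently strongest response is produced
            # and strengthened, whether or not it is correct.
            generated = max(strengths, key=strengths.get)
            strengths[generated] += lr
        # Study phase (both conditions): the correct pairing is strengthened.
        strengths["target"] += lr
    return strengths

study_only = train("study-only")
generate_study = train("generate-study")
```

In this toy the target gains exactly the same strength in both conditions, but in the generate-study condition the prior associate gains as well and remains the dominant response. A test that excludes the competitor, such as a word fragment compatible only with the target, would therefore be expected to reveal equivalent learning in the two conditions.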

Based on this account, it follows that it should be possible to find evidence that the correct experimenter-provided association is learned even under generate-study conditions, provided the competition from incorrect self-generated responses can be prevented from overshadowing the correct response. In an experimental study with the severely amnesic patient VC (Bird, Cipolotti, Kwok, and McClelland, in preparation) we sought to explore this issue by assessing learning of an experimenter-defined cue-target association under study-only vs. generate-study conditions using a fragment of the correct target word in addition to the associative retrieval cue. Thus, VC studied cue-target pairs like "Medicine Cures / HICCUPS" under both generate-study and study-only conditions, and received tests with both a standard recall cue (e.g., "Medicine Cures ____") to allow replication of the study-only advantage found in such experiments, and a cued-recall-with-fragment test using fragments compatible with the correct response but not with possible incorrect self-generated responses (e.g., receiving "Medicine Cures _IC___S"). The experiment produced very clear-cut results. First, the substantial study-only advantage with cued recall was replicated. Second, however, the difference between study-only and generate-study was completely eliminated in the cued-recall-with-fragment test. This latter test produced a high level of correct responding overall and a numerical tie in correct responses for the two conditions. Both conditions produced far more correct responses than a control condition with similar cues plus fragments but using items that had not been studied at all. Subsequent control experiments investigated the extent to which learning in the cued-recall-with-fragment condition was associative. To address this, two sets of 20 target words were studied using the standard study-only procedure. Half of the items were tested with the same cues used during study plus fragments, while the other half were tested with different cues plus fragments. For example, the item ‘BANANA’ might have been studied with ‘Grocer Sells _AN__A’ and tested with ‘Pudding Contains _AN__A’. Once again, never-studied cue-plus-fragment items were also included in the test. The results showed an advantage for items tested with the same cues relative to items tested with different cues, though the latter showed a slight advantage over never-studied controls, suggesting both associative and item-based effects of the study experience on performance in this task.

The Kwok and McClelland model has been used to simulate performance in this task, and shows similar effects to those seen in the experiment. However, the model in its current form is not as good as the experimental subjects at using the fragment cue to block retrieval of a competing response. The model also has some other shortcomings, perhaps springing from the simplicity of the hippocampal network implementation. However, it illustrates several important principles, including the importance of cooperation between multiple brain regions and the pluses and minuses of Hebbian learning.

Supported Presentations and Publications in Progress

Bird, C., Cipolotti, L., Kwok, K., & McClelland, J. L. (in preparation). Learning meaningful associations in amnesia: Experimental investigations of items learned under generate-study conditions.

Kwok, K., & McClelland, J. L. (in preparation). A computational account of the roles of hippocampus and neocortex in the acquisition of semantically related associations.

Vallabha, G. K., & McClelland, J. L. (2003). Learning the sounds of speech: A Hebbian account. Presented at the 44th Annual Meeting of the Psychonomic Society, November 6 - 9, Vancouver, Canada.

Vallabha, G.K., & McClelland, J.L. (in preparation). A Hebbian model of speech perceptual learning.

Project 7: Age and Experience Dependent Processes in Learning

Mark S. Seidenberg, Matt Lambon-Ralph, and James L. McClelland

Project 7 examines age and experience-dependent processes in learning, particularly language learning. There are prominent age-related phenomena in language learning: the "critical period" early in life, during which a first language is learned rapidly and seemingly effortlessly, contrasts with language learning later in life (e.g., learning a second language), which is typically highly effortful, with some components of language particularly difficult to master (e.g., phonology, morphology). The bases of these changes in learning capacity are unknown, but they are often interpreted by analogy to critical or sensitive period effects in domains such as vision (e.g., development of ocular dominance columns). Thus, they are seen as reflecting a loss of neural plasticity that limits learning after the "closing" of the critical or sensitive period caused by neurobiological developments such as dendritic proliferation and pruning.

The research in Project 7 considers these phenomena from a different perspective, examining the extent to which age-related effects in language learning and related domains can be explained in terms of an explicit theory of learning, derived from the PDP approach. The basic hypothesis is that, although various early neurobiological developments establish necessary preconditions for language and other types of learned behaviors, the phenomena associated with the "critical period" and subsequent loss of plasticity can be understood in terms of basic computational properties of learning in PDP networks.

The research involves developing explicit models that simulate major features of the characteristic developmental trajectory in language acquisition, including the critical period and subsequent decrease in plasticity (Center Aim 1: Explicit models embodying emergent accounts of cognition), and exploring how these phenomena derive from basic properties of PDP networks (Center Aim 2: Principles of learning, processing, and representation). Most of the research involves computational models and behavioral experiments linked to the models; however, some issues will also be addressed using fMRI, providing evidence about the brain bases of learning which will further constrain our accounts of these phenomena (Center Aim 3, Incorporation of constraints from neuroscience). Our longer term goal is to link our research to findings concerning the biological bases of human learning emerging from other projects, allowing us to move from relatively artificial neural networks to more biologically realistic ones.

The basic working theory will be described briefly, followed by a summary of ongoing research projects and future plans:

Early neurobiological development sets up preconditions for learning in humans and other species. Given appropriate experiential input, early language learning is rapid. In PDP networks this occurs for specifiable technical reasons related to the fact that the network is initialized with random weights and uses a logistic activation function. Weight changes are largest early in training, when unit outputs tend to fall toward the middle of the activation function, i.e., on the steep linear portion. Learning to perform a task correctly means that the weights have assumed values that push activations toward extremal values (e.g., -1, 1), producing small error signals. Except perhaps under extreme conditions in which the input to the network is radically changed, these weights are difficult to adjust further because of their contributions to successful performance, and small weights are effectively pruned.

The net result is a loss of plasticity associated with success in learning a task. Weights that are highly favorable for skilled performance of one task (e.g., using a first language) may be unfavorable for learning other tasks (e.g., learning a second language). We call this the Paradox of Success. This loss of plasticity is not a wholly negative thing. In the case of language, the knowledge that is acquired is systematic and represented in a way that enables similarity-based generalization, which is the basis for comprehending and producing novel utterances ("generativity"). This contrasts with the learning of unsystematic facts, such as most names for people, objects, and places, which afford little generalization but do not show critical period effects or a decline in plasticity over time (modulo effects of aging on, e.g., hippocampal function). Thus, a loss of plasticity may be the price we pay for the capacity to generalize.

On this view, the concept of "critical period" needs to be distinguished from "changes in plasticity." There is a change in plasticity associated with learning language; according to our analysis, however, this is related to facts about learning, not to a neurobiologically defined window of opportunity. This observation is supported by evidence from studies of the learning of species-typical behaviors in animals such as rats and zebra finches (Doupe & Kuhl, 2000). These types of learning also occur during a critical or "sensitive" period early in development; however, this period can be greatly extended if the animal is not exposed to systematic input (e.g., raised in white noise). Thus the "critical period" seems more related to what is learned than to when it is learned. It happens that this period of accelerated learning typically occurs early in an organism's life, but not because it is dictated by an intrinsic biological timetable.

This account invites reconsideration of much of the data seen as supporting the existence of a critical period for language, and raises many other issues (e.g., concerning second language learning). Note that we are not claiming that no critical periods exist for any species; clearly they do (e.g., critical periods for gene expression, imprinting in the ethological sense, and so on; see many articles in Bailey et al., 2001). We acknowledge that normal neurobiological development in the human also has time-dependent aspects (determined by genetic and experiential factors). Our more specific claim, however, is that it is a category error to equate language learning with the development of ocular dominance columns or other highly specific aspects of the development of sensory systems. Language is a complex system involving auditory perception, speech-motor production, and conceptual knowledge, and it involves multiple distinct types of information including phonology, morphology, syntax, the lexicon, and so on. The acquisition of this widely distributed system, involving many brain structures, is likely to differ in character from the development of highly specific neural structures subserving early vision. Thus, although we hope to link what is now a computational-behavioral account of these phenomena to evidence concerning early brain development in humans, our working assumption is that landmark features of language learning, including "critical period" phenomena, can be understood in terms of computational principles from the PDP approach.

Progress Report and Ongoing Research

We have begun to address many of these issues as part of the IBSC.

Age of acquisition effects. We initially developed these ideas in the context of the phenomenon called age of acquisition (AoA) effects. Studies of adult word processing (principally reading aloud) suggested that the age at which a word was learned (in the child's first language) had a long term effect on skilled performance, independent of factors such as word frequency (e.g., Morrison & Ellis, 1995). Thus, early words appeared to be entrenched, with poorer performance on later learned words, indicating a loss of plasticity or capacity to learn. These findings were suggestive insofar as entrenchment of early-learned patterns had been mentioned early in the development of the PDP framework (Munro, 1986). Lambon-Ralph and Seidenberg & Zevin independently began studying the validity of the AoA findings and developing a computational grounding for them. Our findings converged remarkably well: they suggested that age of acquisition effects in reading are due to confounding factors (such as length, concreteness, and frequency) that affect both word learning and skilled performance. Modeling allows these factors to be controlled exactly, since the same words can be used in conditions that vary the timing and frequency of exposure. The advantage for early-learned words is washed out by exposure to words that overlap with them in structure (Zevin & Seidenberg, 2002). At the same time, our models predict age of acquisition effects for more arbitrary mappings (such as learning the names of famous people), as in Ellis and Lambon-Ralph's (2000) simulations. Lambon-Ralph and Ehsan (in press) elegantly showed that whereas word naming does not show age of acquisition effects, naming pictures of the same objects does. Thus it is the nature of what is learned - the arbitrary vs. systematic mapping between codes - that determines whether early learned items show a long-term advantage.

Zevin and Seidenberg (2002) rationalized many of the seemingly conflicting AoA phenomena and made the further prediction that cumulative frequency of exposure to words, rather than the timing of exposure, affects skilled performance, which was confirmed in a behavioral study by Zevin and Seidenberg (2004). These results call into question the basis for "word frequency" effects that date back to the 19th century, as well as the almost reflexive reliance on frequency norms based on samples of adult texts (e.g., Kucera & Francis, 1967; the Celex database) in language research.

Future plans: A further prediction is that there should be age of acquisition effects on spoken word recognition, assuming confounding stimulus factors such as complexity and neighborhood size are controlled. Simple, monomorphemic spoken words involve largely arbitrary relations between form and meaning, the condition that is expected to produce early entrenchment. We will therefore conduct a study using spoken words similar to the Zevin and Seidenberg (2004) reading experiments. Both behavioral and event-related fMRI data will be collected (at the UW neuroimaging facility). The experiment will examine whether behavioral effects due to entrenchment of early-learned words are reflected in differing neural representations for these words, compared to later learned words of equal complexity.

Plasticity in non-human species. Studies of several nonhuman species suggest that the learning of species-typical behaviors (e.g., songs in zebra finches) can occur beyond the normal critical or sensitive period, if the animal is prevented from learning (by withholding systematic sensory input). These findings are consistent with the idea that it is learning, rather than endogenous neurobiological development, that is largely responsible for the critical period. Zevin, Seidenberg, and Bottjer (submitted) obtained complementary evidence concerning unlearning in adult zebra finches who had learned their songs normally. Placing the animals in white noise for an extended period of time results in unlearning of the song (i.e., production of deviant song). This shows that hearing the bird's own song is required for retaining it; thus the song "template" must be actively maintained, suggesting a lifelong process of adjustment and tuning, i.e., learning. Zevin et al. also found that the birds were capable of some relearning after re-exposure to their own songs (via tape) or to a new song tutor (adult male). These results suggest that the birds retain the ability to learn well past the critical period. However, the degree of learning was variable across individuals, and no bird's song returned to entirely normal.

Future plans: Seidenberg and Zevin will present an invited paper at the Attention & Performance XXI meeting in July integrating cross-species evidence concerning biological and experiential bases of critical/sensitive periods.

Computational modeling. We are conducting additional simulations that examine a broader range of conditions and phenomena related to plasticity. A current focus is on forging a theoretical, computational linkage between entrenchment effects (advantage for early-learned patterns, producing proactive interference) and "catastrophic interference" effects (advantage for later-learned patterns, producing retroactive interference, which is severe if different types of patterns are strictly blocked; McCloskey & Cohen, 1989). We know that catastrophic interference effects can be systematically reduced by relaxing the blocking of stimuli (Hetherington & Seidenberg, 1989), that is, providing even a small number of "refresher" trials on previously learned patterns. Conversely, the entrenchment of early-learned patterns may depend on continued experience with them, analogous to the zebra finch's need for continued exposure to retain its song. The computational bases of these phenomena have not been investigated sufficiently to be well understood. Our focus is on subsuming these additional phenomena within the theoretical framework that arose out of the AoA research.
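The contrast between strictly blocked training and blocked training with occasional refresher trials can be sketched with a toy example. This is a deliberately minimal illustration under strong assumptions, not the multilayer networks used in the cited studies: a single linear unit trained with the delta rule on two hypothetical overlapping patterns, A and B. The pattern values, learning rate, and trial counts are arbitrary choices made for the illustration.

```python
def output(w, x):
    """Linear unit: weighted sum of the inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train(w, schedule, lr=0.1):
    """Apply the delta rule once for each (input, target) pair in the schedule."""
    for x, target in schedule:
        err = target - output(w, x)
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w

# Two hypothetical patterns that share one input feature but need different outputs.
A = ([1.0, 1.0, 0.0], 1.0)
B = ([0.0, 1.0, 1.0], 0.0)

def phase_two(n_trials, refresher_every=None):
    """Build a block of B trials, optionally replacing every n-th trial with A."""
    schedule = []
    for t in range(n_trials):
        is_refresher = refresher_every and (t + 1) % refresher_every == 0
        schedule.append(A if is_refresher else B)
    return schedule

w_start = train([0.0, 0.0, 0.0], [A] * 200)          # learn A first
w_blocked = train(w_start, phase_two(200))            # then B only
w_refresh = train(w_start, phase_two(200, refresher_every=5))

err_blocked = abs(A[1] - output(w_blocked, A[0]))
err_refresh = abs(A[1] - output(w_refresh, A[0]))
print(f"error on A after blocked B training:   {err_blocked:.3f}")
print(f"error on A with refresher trials on A: {err_refresh:.3f}")
```

In this toy case, strictly blocked training on B degrades performance on A, whereas interspersing a refresher trial on A every fifth trial lets the weights settle on values that serve both patterns. Whether and how this carries over to the nonlinear networks of interest is exactly what the simulations described above are meant to establish.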

Future plans: This is ongoing research; we expect it to generate testable behavioral predictions about degree of entrenchment or unlearning as a function of amount and timing of experience and type of mapping (arbitrary, systematic). However, a longer term goal is to relate these phenomena to issues in math education, specifically (a) evidence that early math learners exhibit maladaptive entrenchment effects for specific formats in which problems are presented (McNeill & Alibali, in press). Thus children and even many college students incorrectly answer a problem such as 2 + 9 = 5 + ______ (they answer 16), because it is presented in a nonstandard format. Here entrenchment based on overlearning of specific problem types interferes with learning deeper principles such as equivalence. Our models can be adapted to this type of learning and make predictions about how to optimize the learning process to promote learning of systematic knowledge rather than specific problem exemplars. (b) Many current math curricula use a "spiral" teaching method in which topics such as arithmetic, fractions, estimation, and so on are taught for relatively brief units each year. For example, the child gets exposed to arithmetical problem solving, achieves a certain level of expertise, and moves on to another topic. Arithmetic is then returned to in subsequent school years, in theory at a deeper level. This resembles the conditions that create retroactive interference in our networks: blocked training on one type of problem, followed by blocked training on other, often unrelated problems, resulting in unlearning of the earlier material. Thus the savings across the widely spaced learning episodes typical of the "spiral" approach are likely to be small. Again we can examine these issues computationally with an eye toward identifying conditions that minimize both entrenchment of overly specific patterns and retroactive interference from subsequent learning.

Testing predictions about learning different aspects of language. Our theory suggests that difficulty of learning a second language depends on interactions among several factors: degree of competence in the first language (related to amount of experience) at the time of exposure to a second language; typological relations between languages (how similar/different they are); type of linguistic knowledge (e.g., phonology, morphology, syntax, lexical learning have different characteristics). This is a complex factor space which we are beginning to investigate in behavioral studies. In a study in progress we are examining English (L2) proficiency in native speakers of either Chinese or Serbian, which differ from English in different ways. Chinese has a minimal inflectional system; Serbian has a very rich one, which like English encodes tense on verbs and number on nouns. Chinese and English, in contrast, are more similar with respect to reliance on contrastive word order, which is relatively unconstrained in Serbian. Finally, neither Serbian nor Chinese has the type of determiner seen in English. We have tested 40 subjects (20 Serbian speakers, 20 Chinese) on an English grammar test that assesses knowledge of verb tense, word order, and determiners. The prediction is that subjects should perform better on those aspects of English that are also represented in their native language. However, we will also examine whether performance is affected by age, amount, and type of exposure to the second language.

Future plans. This research is ongoing and will be completed in the coming year. We expect it to develop into a more systematic research program concerning second language learning.

Additional research: We are conducting other studies related to Center and Project 7 goals that can only be mentioned briefly here.

Plasticity in damaged systems. This research uses computational models of reading to examine recovery of function following brain injury. The two articles by Welbourne and Lambon-Ralph (in press, submitted) simulate effects of brain injury on reading using the Plaut et al. (1996) model. They also examine different ways of retraining the model designed to capture both natural (spontaneous) recovery and recovery that can be attributed to specific therapeutic intervention.

Limitations on learning L2 phonology. These provide perhaps the classic example of a critical period effect/loss of plasticity in language learning. We have been developing a model that takes acoustic representations (of CVs) as input and produces articulatory representations as output, via interlevel hidden unit representations. The model is trained on English phonology and then tested on sounds recorded from a Zulu speaker. The model generates predictions about which types of novel sounds should be easy or difficult to discriminate, which have been tested in a behavioral study with UW undergraduates. In general, it appears that people's capacities to discriminate nonnative sounds may be better than the literature suggests, certainly compared to the much greater difficulty of reproducing the sounds.

Effectiveness of interventions with dyslexic readers. Harm et al. (2003) used a computational model of reading to examine the effectiveness of some prominent methods used to remediate dyslexia. The results indicated that methods that attempt to remediate deficits related to phonological representation will only be effective if introduced early, before the model/child has done a great deal of learning with these degraded representations (i.e., they become entrenched). Interventions that emphasize the mapping from orthography to phonology were more effective and not as dependent on timing. In effect it is easier to restructure phonology in ways that facilitate reading via teaching spelling-sound correspondences rather than by attempting to restructure phonology directly. These findings are consistent with the results of recent meta-analyses of the effectiveness of different remediation methods.

Supported Publications, Submissions, and Presentations

Lambon-Ralph, M. A., & Ehsan, S. (in press). Age of acquisition effects depend on the mapping between representations and the frequency of occurrence: Empirical and computational evidence. Visual Cognition.

Welbourne, S. R., & Lambon Ralph, M. A. (final revisions). Exploring the impact of plasticity-related recovery after brain damage in a connectionist model of single word reading. Cognitive, Affective & Behavioral Neuroscience.*

Welbourne, S. R., & Lambon Ralph, M. A. (submitted). Towards a platform for modeling rehabilitation in patients with acquired dyslexia. Neuropsychological Rehabilitation.*

Zevin, J. D., Seidenberg, M. S. & Bottjer, S. W. (submitted) Limits on reacquisition of song in adult zebra finches exposed to white noise. Journal of Neuroscience.

Zevin, J., & Seidenberg, M. S. (2004). How much but not when: Cumulative frequency, but not frequency trajectory, affects skilled word reading. Memory & Cognition, 32, 31-38.

The following papers related to Project 7 were also listed under Project 2:

*also listed under Project 2.

Project 8: Theoretical Foundations

J. L. McClelland, Randall C. O'Reilly, Michael Lewicki, and Carson Chow

In this project, we step back from the effort to bring our framework into direct contact with experimental data and consider the principles underlying the framework itself (Center Aim 2). Specifically, we consider the principles of learning, processing, and representation that guide the formulation of specific models in our framework. We seek to refine the core principles that form the basis of our efforts to develop explicit computational models of cognitive processes, and we seek to understand the implications of these principles and to consider whether they reflect principles arising at other levels of analysis.

David Marr pointed to three levels of analysis of cognitive function: the computational, the algorithmic, and the implementational. The computational level is the level of analysis at which the problem itself is stated, in terms of the goal of the computation and the information available to address this goal. The algorithmic level is the level at which the method of solution is stated, in terms of the representations and processes that are used to achieve the goal. The implementational level is the level at which one considers the details of the machinery that actually carries out the processes described at the algorithmic level. In this taxonomy, one can see our framework as offering principles of learning, processing, and representation formulated at an algorithmic level, addressing such matters as the activation functions used, the organization and connectivity of the units, and the learning algorithm, somewhat abstracted from the underlying details of biophysical implementation.

Marr felt it should be possible to derive appropriate representations and processes from computational principles alone, reflecting the once-predominant view that details of implementation were irrelevant to the selection of algorithm. We have argued instead that the underlying biological machinery provides constraints and affordances that shape the processes and representations used, and the basic tenets of our framework are intended to capture crucial aspects of the characteristics of that machinery. Instead of Marr's top-down approach, we therefore favor a more interactive one, in which top-down and bottom-up constraints conspire to shape the algorithms of cognition. We follow this interactive approach in the investigations described here. Incorporation of biological constraints is a central overall aim (Center Aim 3), and the work of this project is weighted toward their consideration while also considering the computational level.

Our effort is divided into three parts. Part 1 focuses on learning and begins at the algorithmic level, proposing an integrated learning algorithm intended to unify supervised and unsupervised learning. It considers computational effectiveness but places greater weight on biological plausibility, in that it is strongly shaped by our increasing understanding of mechanisms of synaptic plasticity in the brain. Part 2 focuses on representation, beginning at the computational level. It considers whether the representations used in the brain (e.g., the receptive field properties of neurons at various levels of visual processing) can be understood as appropriate solutions to the essential computational problem, namely that of inferring the structure in the world from sensory information. Part 3 focuses on the dynamics of processing and is grounded in our growing appreciation of the details of these dynamics as they are observed in real neurons. It considers (a) the implications of these details for behavior and cognition; and (b) whether these implications can be captured at the more abstract level of the parallel-distributed processing framework via suitable reformulation and extension of the principles.

All three parts ultimately target our algorithmic level, in that each has implications that may lead to improved statements of the principles of our framework. Furthermore, while each focuses on one aspect of the framework (either learning, representation, or processing), each has implications for the others. For example, interactive processing is essential to the operation of the integrated learning algorithm in Part 1, and in Part 3, biophysical details of processing at the neuronal level influence the learning procedure as it is formulated at the algorithmic level. Each part of this project has its own background, aims, and methods. We present the parts separately, returning at the end to the overall operational plan.

Project 8.1: Integration of Supervised and Unsupervised Learning (R. O'Reilly)

Neural network learning mechanisms can be divided into two broad categories, which go by various names: (a) unsupervised, self-organizing, or model learning, versus (b) supervised, error-driven, or task learning. As the names suggest, unsupervised learning does not require any supervisory signals to guide learning; the network simply organizes itself in response to the inputs that it receives. The computational objective is to develop an internal model of the environment that will enable the organism to more efficiently represent its structure and statistics. In contrast, supervised learning involves an external signal that drives learning, effectively telling the network what its output should be. This kind of learning is particularly useful for adapting networks to the contingencies of specific tasks. There are a number of good reasons to combine these learning mechanisms into one coherent algorithm (O'Reilly, 2001). In short, these mechanisms provide complementary strengths. Supervised learning provides the ability to learn arbitrary tasks, while unsupervised learning provides a useful tendency to encode general properties of the environment. Without unsupervised learning, supervised learning can become too narrowly focused on a specific task. Without supervised learning, unsupervised learning has difficulty focusing in on relevant aspects of the environment, and generally cannot learn arbitrary mappings.

This project seeks to develop biologically-based neural network learning mechanisms that combine unsupervised and supervised components, in order to take advantage of their complementary strengths. This work will build upon existing work on the Leabra algorithm, which combines both Hebbian and error-correcting learning (O'Reilly, 1996). The Leabra algorithm has been applied in over 40 different implemented models of a wide range of cognitive phenomena (including perception, attention, learning and memory, language, and higher-level cognition), demonstrating its wide applicability, and the Hebbian aspect has been shown to improve its generalization performance (O'Reilly, 2001). But Leabra is in essence a supervised learning algorithm; this leads to two difficulties. First, it requires a distinction to be made between two phases of processing, one when the teacher is present, and one when it is absent; and second, it is essentially undefined for cases in which there is no external teaching signal. We have formulated a successor to Leabra that overcomes these two difficulties, and we are investigating its computational effectiveness and its usefulness in addressing issues at the behavioral and cognitive levels.

The Leabra algorithm. The Leabra algorithm simply combines Hebbian and error-driven learning rules, using weighting constants to determine the relative contributions of each:

$$\Delta w_{ij} = k_h\,\bigl(y_j^{+}(x_i^{+} - w_{ij})\bigr) + k_e\,\bigl(y_j^{+}x_i^{+} - y_j^{-}x_i^{-}\bigr)$$

where $k_h$ and $k_e$ are learning rate constants, $x_i$ and $y_j$ are the sending and receiving unit activations, and Hebbian learning takes place in the outcome (+) phase. In the case where $k_h > 0$ and $k_e = 0$, the algorithm reduces to a common Hebbian learning rule, here called simple Hebbian Leabra.
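As a minimal sketch, the weight update for a single connection can be written directly from the rule. The function name and the sample activations are illustrative assumptions; this is not the PDP++ implementation.

```python
def leabra_dw(x_minus, y_minus, x_plus, y_plus, w, k_h, k_e):
    """Weight change for one connection from sender x to receiver y.

    The minus-phase values are the expectation-phase activations and the
    plus-phase values are the outcome-phase activations.
    """
    hebbian = y_plus * (x_plus - w)                     # outcome-phase Hebbian term
    error_driven = y_plus * x_plus - y_minus * x_minus  # plus/minus phase difference
    return k_h * hebbian + k_e * error_driven

# Receiver was weakly active in the expectation phase, strongly in the outcome phase.
dw_error_only = leabra_dw(1.0, 0.2, 1.0, 0.9, w=0.5, k_h=0.0, k_e=1.0)
# Setting k_e to zero leaves only the Hebbian term ("simple Hebbian Leabra").
dw_hebb_only = leabra_dw(1.0, 0.2, 1.0, 0.9, w=0.5, k_h=1.0, k_e=0.0)
print(dw_error_only, dw_hebb_only)
```

With the error-driven term alone, the weight grows because the receiver is more active in the outcome phase than in the expectation phase; with the Hebbian term alone, the weight moves toward the sending activation in proportion to the receiver's outcome-phase activity.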

Inhibitory competition. One additional neural mechanism employed in Leabra is inhibitory competition. Inhibitory competition arises when mutual inhibition among a set of units (i.e., as mediated by inhibitory interneurons) prevents all but a subset of them from becoming active at a time. Competition allows only the most strongly excited representations to prevail, with this selection process identifying the most appropriate representations for subsequent processing. Also, the Hebbian and error-driven learning mechanisms are affected by this selection process such that only the selected representations are refined over time, causing differentiation and distribution of representations (Kohonen, 1984; Rumelhart & Zipser, 1986; Grossberg, 1976). Leabra employs a k-winners-take-all (kWTA) inhibitory function, which allows only the top k (out of n total) neurons to become active at any time. This mechanism can be viewed as approximating the effects of recurrent inhibition in the cortex, as has been demonstrated in side-by-side comparisons (O'Reilly & Munakata, 2000). One characteristic of the inhibitory dynamics is that the feedback inhibition is activated in response to activation within the excitatory units in a layer. This is similar to the approach used in many biological and cognitive models.
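The selection effect of kWTA can be sketched in a few lines. This is a simplified, hard-threshold illustration that only captures the "top k of n" outcome; it is an assumption made for clarity and does not reproduce how Leabra actually computes the level of inhibition. The function name and sample values are hypothetical.

```python
def kwta(net_inputs, k):
    """Keep the k most strongly excited units and silence the rest."""
    if not 0 <= k <= len(net_inputs):
        raise ValueError("k must be between 0 and the number of units")
    # Unit indices ordered by excitation, strongest first.
    order = sorted(range(len(net_inputs)),
                   key=lambda i: net_inputs[i], reverse=True)
    winners = set(order[:k])
    return [v if i in winners else 0.0 for i, v in enumerate(net_inputs)]

print(kwta([0.2, 0.9, 0.5, 0.7], k=2))  # -> [0.0, 0.9, 0.0, 0.7]
```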

Limitations and proposed research. Although the Leabra model overcomes several objections to the use of error-driven learning mechanisms on the grounds of biological plausibility, it apparently requires some kind of external controller to manage expectation and outcome phases. Furthermore, as currently formulated, the algorithm is not well-defined for unsupervised learning problems, and in fact for that case a variant has been used in which the error-correcting feature has simply been switched off by setting ke to 0. We think it is unlikely that the brain distinguishes between learning situations in this way. The successor to Leabra will address these limitations.

The insight leading our research is a reframing of the activation phases in Leabra in purely temporal terms. In this reframing, activation states evolve in time, and the minus phase corresponds to an early phase of the process, while the plus phase corresponds to a later phase. Note how the algorithm can now be applied both to supervised and unsupervised learning. In the supervised case, input is presented to some units, and additional input is subsequently presented to others, contributing to processing in the later phase. In the unsupervised case, the input is all presented at once, but there is still a time-evolution of the activation process. In both cases, late activations may be different from earlier ones, with important consequences for learning. Mathematically, the two pieces of the Leabra learning rule can be integrated into a rule that applies equally to the supervised and unsupervised cases. Algebraic manipulation of the learning rule, replacing minus with e and plus with l produces the Early-vs-late-phase or ELP learning rule:

l l l l e e wij = kh(y j(x i- wij)) + ke(y jx i - y jx i)

The learning rule combines a Hebbian term reflecting the extent to which sender and receiver are l l co-active in the late phase y jx i with two countervailing terms. The first, Hebbian-derived term, l l kh(y j(x i- wij)) serves to normalize the Hebbian correlation computation as analyzed by Oja e e (1982). The second term y jx i representing the extent of sender and receiver activation in the early phase has been shown to implement back propagation. Thus, the algorithm remains firmly grounded in the computational theory of neural network learning while providing a single integrated framework for both the supervised and the unsupervised case. The work in this project has these aims:

Aim 1: Understand the implications of the ELP algorithm for unsupervised learning and map formation. One major focus will be on the implications of this early-phase term in the context of unsupervised learning. Specifically, we will explore how this term interacts with changes in activation states that occur naturally after an input stimulus is presented, in the context of inhibitory competition in a simple, two-layer net. As noted above, the inhibition is only engaged after excitatory neurons become activated, causing an early peak in activation. Therefore, there will be a subset of neurons that fall into the penumbra of activation for a given input, being activated in the early phase but not in the late phase. These neurons in the penumbra will experience overall weight decreases in proportion to the strength of their early phase activations.
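The predicted fate of penumbra units can be sketched by applying the ELP rule to a single connection. The function name, the learning rate constants, and the activation values are illustrative assumptions, not results from our simulations.

```python
def elp_dw(x_early, y_early, x_late, y_late, w, k_h, k_e):
    """ELP weight change: the Leabra rule with minus/plus recast as early/late."""
    hebbian = y_late * (x_late - w)
    phase_diff = y_late * x_late - y_early * x_early
    return k_h * hebbian + k_e * phase_diff

K_H, K_E = 0.01, 0.1  # hypothetical learning rate constants

# A winning unit stays active after inhibition engages.
winner = elp_dw(1.0, 0.6, 1.0, 0.9, w=0.5, k_h=K_H, k_e=K_E)
# Penumbra units are active early but are silenced in the late phase.
penumbra_weak = elp_dw(1.0, 0.2, 1.0, 0.0, w=0.5, k_h=K_H, k_e=K_E)
penumbra_strong = elp_dw(1.0, 0.6, 1.0, 0.0, w=0.5, k_h=K_H, k_e=K_E)
print(winner, penumbra_weak, penumbra_strong)
```

Because the late-phase activation of a penumbra unit is zero, both the Hebbian term and the late-phase product vanish, leaving a weight decrease proportional to the early-phase co-activation of sender and receiver.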

Aim 2: Evaluate the Ca++ concentration model for explaining the early-phase term in the ELP algorithm, and explore factors that influence the boundary between the phases. Both the supervised aspect of Leabra and the ELP unsupervised mechanism depend on the idea that the early-phase activation produces a negative weight change (LTD). We mentioned that this kind of LTD could occur if one assumes that early phase activations are brief enough so that calcium levels only reach the level of concentration that produces LTD, not LTP. We will test this idea by developing explicit models of calcium ion concentration as it develops over the course of activation settling.

Aims 3 and 4. The final two aims of this project are (3) to evaluate the ELP algorithm in supervised learning and in combined supervised/unsupervised learning situations, and (4) to apply the ELP algorithm to positive and negative priming effects.

Progress: We made progress on three fronts: (a) under Aim 2, evaluating the Ca++ concentration model for explaining ELP/Leabra learning; (b) devising a possible new formulation of the ELP/Leabra learning mechanism that might bridge between the detailed Ca++ model and the more abstract formulations; and (c) developing a new idea related to the overall theme of developing biologically-plausible models of supervised/error-driven learning.

On the first topic, Jilk, Cer and O'Reilly have replicated a detailed biophysical model of synaptic plasticity (Shouval, Bear, and Cooper, 2002; SBC) to establish a platform for exploring Ca++ concentration dynamics via NMDA and VDCC receptors, and their subsequent effects on de-/phosphorylating CaMKII (which then drives LTP/LTD). This model captures the LTD-then-LTP curve as a function of increasing Ca++, as discussed in the proposal. The original SBC model was designed to explore spike timing dependent plasticity (STDP), but we instead ran it with Poisson spike trains and computed average outcomes over many such trains. The first step was to determine to what extent this model produces a computationally effective Hebbian learning rule (which is relatively simple to analyze), and then move on to the multi-phase ELP/Leabra learning rules. We used the average LTP/LTD results to generate numerical lookup-table based synaptic learning rules, and compared them to the performance of the standard CPCA/Oja-style Hebbian learning rule. In general, the biophysically derived learning rules did not perform as well as the analytic rule, but their performance could be improved by adjusting various aspects of the biophysical model, which brought them closer to the CPCA/Oja rule. In particular, increasing the external Mg++ concentration was critical in making the model more sensitive to the activation of the postsynaptic neuron, and using a Ca++->CaMKII function based on the work of Lisman also helped. The work on ELP/Leabra showed that we could get the basic dynamic where transient activation in the early/minus phase produced LTD, while sustained activation across all phases produced LTP. Due to the large space required to define this learning rule (activations of sending and receiving units across both phases, plus the initial weight value), the initial code (written in a hybrid of C and Excel) became unwieldy, and we are currently in the process of porting it to PDP++, where we can have better graphical analysis tools available. These results were presented in a poster at the Computational Neuroscience conference in Alicante, Spain, last summer.

The new formulation of ELP/Leabra, which is awaiting implementation by my student Seth Herd, has the potential to provide a bridge between the detailed Ca++ model and the abstract formulations of ELP/Leabra. The basic idea is simply to treat the Ca++ as the pre-post activity coproduct, and compute the average of this value over a trial of settling. Then, this average can be subject to the standard non-monotonic LTD-then-LTP Ca++ curve (used in the above model) to derive weight changes. The insight, as in ELP, is that transient activation will result in a low average, and thus LTD, while sustained activation will result in a higher average, and LTP.
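The averaging idea can be illustrated with a minimal sketch. It is not the Center's implementation: the function names, the thresholds `theta_d` and `theta_p`, the piecewise-linear shape of the curve, and the activation values are all assumptions chosen only to make the transient-versus-sustained contrast concrete.

```python
def ca_curve(ca, theta_d=0.2, theta_p=0.5, depth=1.0, gain=1.0):
    """Hypothetical LTD-then-LTP curve (piecewise linear, continuous).

    Below theta_d: no change. Between theta_d and theta_p: a V-shaped dip
    (LTD) that is deepest midway. Above theta_p: LTP growing linearly.
    """
    if ca <= theta_d:
        return 0.0
    if ca < theta_p:
        mid = 0.5 * (theta_d + theta_p)
        half_width = 0.5 * (theta_p - theta_d)
        return -depth * (1.0 - abs(ca - mid) / half_width)
    return gain * (ca - theta_p)


def weight_change(pre_acts, post_acts, lrate=0.1):
    """Average the pre*post product over a settling trial, then apply the curve."""
    products = [x * y for x, y in zip(pre_acts, post_acts)]
    avg_ca = sum(products) / len(products)
    return lrate * ca_curve(avg_ca)


# Sender active throughout; receiver either transient or sustained.
# All activation values here are made up for illustration.
pre = [1.0] * 10
transient_post = [0.7] * 5 + [0.0] * 5   # active in the early phase only
sustained_post = [0.7] * 10              # active across the whole trial

dw_transient = weight_change(pre, transient_post)   # lower average -> LTD
dw_sustained = weight_change(pre, sustained_post)   # higher average -> LTP
```

With these assumed values the transient case averages to a level inside the LTD dip and the sustained case lands above the LTP threshold, so the two weight changes have opposite signs.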

The final model explored the potential role of dopaminergic modulation of learning in cortical networks based on error feedback (i.e., in the reinforcement learning framework). In existing models of this type (e.g., Mazzoni, Andersen & Jordan, 1991), a scalar reward value related to the performance of the network is used to modulate all of the units, effectively increasing their weights for correct responses, and decreasing them for incorrect ones. In this new model, this initially scalar value actually produces a vector error signal, by effectively backpropagating each unit's reward-modulated activation state through the entire network. This happens naturally in the GeneRec/CHL framework for error-driven learning, simply as a result of activation propagation through bidirectionally connected networks. In simulation tests comparing backpropagated reward modulation with standard static modulation in the context of the family trees problem, the backpropagated version produced significantly better learning compared to the static case. To learn at all (i.e., within 1000 epochs), some percent of trials required a vector output target (i.e., standard backprop-like error-driven learning), which acts to ``seed'' the network with some knowledge of the correct responses. The backpropagated reward modulation can leverage a relatively small percentage of seed trials (e.g., 2%) and result in accelerated learning, whereas the static case learned at the same rate as when no additional learning took place outside of the seeded trials. This work is currently being written up for publication.

Project 8.2: Computational Models for Investigating the Role of Feedback in Perceptual Inference (M. Lewicki)

The goal of this project is to develop theoretical models for studying the computational role of feedback in visual perception of natural images. This research tests the hypothesis that feedback subserves various forms of perceptual inference, allowing the visual system to arrive at correct (or plausible) interpretations of ambiguous sensory data. The underlying theoretical framework in which these models are developed is Bayesian inference. This idea has attracted considerable interest, because it could explain fundamental aspects of visual perception (see, for example, Knill and Richards, 1996). Most importantly, it could provide a theory that explains how the lower levels in the visual system use information fed back from higher levels.

Earlier models that have incorporated Bayesian inference (e.g. Lewicki and Sejnowski, 1997) allowed for hierarchical models of inference using Bayesian belief networks, but these were limited to binary patterns and thus could only provide an abstract model of the inference process. One of the main goals of this project is to develop these ideas in more biologically relevant models of visual function that provide insight into the neural coding and processing of natural images.

The fundamental hypothesis of the model proposed in this project is that feedback aids in computing optimal sensory representations that would otherwise be noisy or ambiguous. This computation is accomplished by using higher-order, contextual information fed back from higher-level areas to disambiguate lower-level sensory representations. It is essential in this model that the higher-level representations provide useful information about the higher-order visual context. Therefore, one of our primary goals for the first phase of this research was to develop computational models that use unsupervised methods to discover representations of higher-order context in natural images.

Current theoretical methods for learning sensory codes can explain how a population of visual neurons encodes natural images, but they suffer from two basic limitations. First, the transformation from the image to the representation is linear, essentially a set of filters, so only a limited class of computations can be achieved. Second, these models assume that the goal is to learn a set of statistically independent visual features (or filter outputs), which provides no means to capture higher-order statistical regularities.

Our recent results (Karklin and Lewicki, 2003a, 2003b, 2004) have extended unsupervised methods for learning optimal codes for natural images and overcome both of these limitations. The key issue is how the model represents and learns higher-order visual structure. In this model, higher-order structure is modeled by learning an efficient code for the variance structure of the outputs of the first-layer filters. Given a representation of natural images in terms of oriented Gabor filters, a salient statistical regularity is the covariation of the filter outputs in different visual contexts. Many classes of images will have the property that they tend to activate some filters and not others. For example, visual patterns like wood grain will tend to activate filters aligned to the grain. Although it is not possible to predict the exact values, the variance of the outputs of filters aligned with the grain will be larger than that of filters orthogonal to it. The statistical regularities among the filter output variances will depend on the particular class of images. In the model this structure is described by learning an efficient higher-order code. An interesting consequence of this hierarchical model is that it reduces to standard efficient coding models when the higher-level layers are inactive. When the higher layers are active, their activity describes how the distribution of images has changed from the default, and this represents the local image context. The higher-order structure in the model is an efficient, distributed code for changes in the statistical distribution of natural images for the particular image in view.
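To make the notion of a code for variance structure concrete, here is a toy generative sketch. The log-linear link between higher-order units and filter-output variances, the matrix `B`, and the two-filter setup are assumptions made for illustration; the text above does not specify the functional form used in the actual model.

```python
import math
import random


def sample_filter_outputs(B, v, rng):
    """Draw first-layer outputs u_i ~ N(0, sigma_i^2).

    Assumed link: log(sigma_i^2) = sum_j B[i][j] * v[j], so v = 0 gives
    unit variance for every filter (the "default" distribution).
    """
    outputs = []
    for row in B:
        log_var = sum(b * vj for b, vj in zip(row, v))
        outputs.append(rng.gauss(0.0, math.exp(0.5 * log_var)))
    return outputs


def empirical_variance(B, v, n=5000, seed=0):
    """Estimate each filter's output variance from n samples."""
    rng = random.Random(seed)
    sums = [0.0] * len(B)
    for _ in range(n):
        for i, u in enumerate(sample_filter_outputs(B, v, rng)):
            sums[i] += u * u
    return [s / n for s in sums]


# Hypothetical setup: filter 0 is aligned with a texture's grain, filter 1
# is orthogonal to it; one higher-order unit signals that the texture is present.
B = [[2.0], [-2.0]]
var_default = empirical_variance(B, [0.0])   # higher layer inactive
var_context = empirical_variance(B, [1.0])   # higher layer active
```

With the higher-order unit at zero, both filters have roughly equal variance, which corresponds to the reduction to a standard efficient coding model. Activating the unit raises the variance of the aligned filter and lowers that of the orthogonal one.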

By creating a theoretical framework for the unsupervised learning of higher-order structure, we are now much closer to our goal of developing theoretical models for the role of feedback. The next step in this project is to explore ways in which the higher-order representations can be used to improve the optimality of lower-level representations. This will involve developing methods for modeling noise and uncertainty. We are actively pursuing these issues. This research has also opened up a number of interesting avenues for future research that we are currently exploring. These include theoretical models for sensory gain control and models that learn efficient sensory codes based on model neurons with limited coding capacity. All of these pursuits are aimed at developing more accurate models of sensory processing and early vision.

Project 8.3: Biophysical Properties of Neurons and Their Implications (C. Chow)

A central tenet of neuroscience is that human behavior and cognition arise from the complex biophysical dynamics of neurons and their synaptic connections. However, it is computationally infeasible to attempt to model high level cognitive processes using detailed realistic models of neural cells. As a result, connectionist neural network models that capture the massively parallel nature of the brain using simplified computational units have been developed. These models, which are closely related to what mathematicians call rate models, have been successful in explaining certain aspects of cognition and behavior, although much remains unanswered. This section aims to address questions regarding the level of biophysical detail that must be included in any model to capture the salient aspects of behavior. Must biophysically realistic spiking models be used, or can rate models be adapted to capture the implications of these models for understanding behavior? We address Center Aim 1 (emergent mechanistic accounts) and Center Aim 3 (modeling with neural constraints) in the framework of three questions: (1) How can biophysically based models help to elucidate the relationship between neurophysiology and behavior? (2) What are the implications of adding biophysical plausibility to computational models of cognitive function? (3) How can relevant biophysical details be incorporated into reduced rate models?

Our approach is to begin with the neurophysiological constraints as specified in Center Aim 3 and examine which emergent aspects as probed in Center Aim 1 arise ``naturally'' out of the underlying dynamics. In this way, it may be possible to tease out which aspects of cognition and behavior are intimately tied with the neural implementation. The project will incorporate Center Aim 2 by examining which phenomena require highly specified synaptic weights that are acquired by learning and which require very little fine tuning.

Progress: In the past year, the work has focused on understanding the dynamics of visual responses of neurons in higher visual areas such as V4 and inferotemporal cortex (IT) (Moldakarimov, Chow, and Olson, 2003). In particular, we have been interested in reconciling three experimental observations of neuronal behavior when two images are presented in various configurations. The first phenomenon is binocular rivalry, where a different image is presented to each eye. Here the activity of neurons throughout the visual pathway, but in particular in V4 and IT, exhibits stochastic oscillations near 1Hz that are correlated with a given percept. The second observation is a 5Hz oscillation in IT neuronal activity when a preferred foveal stimulus (the 'object') is presented against the backdrop of an already present peripheral non-preferred stimulus (the 'flanker'). The third experimental observation is a 'normalization' of neural activity when an object and flanker are presented simultaneously. The resulting activity lies in between the activity of the object and flanker presented alone.

We considered a cortical circuit model consisting of two pools of excitatory neurons where one pool represents one image (object) and the other represents the other (flanker). The strength of the input to each pool depends on the contrast of the given image. The two pools mutually inhibit each other. To respect Dale's law and anatomy this is done through 'long-range' projections to inhibitory interneurons located in the other pool. The neurons exhibit spike frequency adaptation and the synapses exhibit synaptic depression. We have considered two versions of this cortical circuit. One version consists of biophysically plausible conductance-based Hodgkin-Huxley neurons with parameters taken from experimental data. The other version is a neural network rate model that is a reduction of the biophysical model.

We show that, depending on the parameter values, the three dynamical phenomena observed in experiments are generic behaviors of the cortical circuit model we consider. In particular, we find that the effective inhibition between pools is the major determinant of the ensuing activity. When the mutual inhibition is relatively weak, we see normalization when object and flanker are presented. When the inhibition is slightly stronger we see 5Hz oscillations, and when inhibition is stronger still we see slower 1Hz oscillations. For extremely strong inhibition we see a winner-take-all behavior where one pool is active and the other is silent. We show mathematically how and why these transitions will occur. We hypothesize that different levels of mutual inhibition will exist between various pools throughout the visual cortex and that what is observed in experiments will depend importantly on which images are presented and which neurons are probed.
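The dependence on inhibition strength can be illustrated with a heavily simplified rate sketch. This is not the reduced model from the project: it lumps the interneuron pathway into a direct negative weight, omits synaptic depression, and uses a threshold-linear gain with made-up parameter values. It is meant only to show the two ends of the range (both pools active versus winner-take-all) and does not attempt to reproduce the intermediate oscillatory regimes.

```python
def simulate_pools(w_inh, inputs=(1.0, 0.8), g_adapt=0.5,
                   tau=10.0, tau_adapt=200.0, dt=0.5, t_end=5000.0):
    """Euler-integrate two mutually inhibiting rate pools with adaptation.

    Returns the final rates (r_object, r_flanker). Time is nominally in ms;
    all parameter values are hypothetical.
    """
    r = [0.0, 0.0]   # firing rates of the object and flanker pools
    a = [0.0, 0.0]   # slow adaptation variables, one per pool
    steps = int(t_end / dt)
    for _ in range(steps):
        new_r, new_a = [], []
        for i in (0, 1):
            j = 1 - i
            # Net drive: external input minus inhibition from the other
            # pool minus the pool's own adaptation.
            drive = inputs[i] - w_inh * r[j] - g_adapt * a[i]
            target = max(0.0, drive)          # threshold-linear gain
            new_r.append(r[i] + dt * (target - r[i]) / tau)
            new_a.append(a[i] + dt * (r[i] - a[i]) / tau_adapt)
        r, a = new_r, new_a
    return tuple(r)


weak = simulate_pools(w_inh=0.3)     # weak mutual inhibition
strong = simulate_pools(w_inh=3.0)   # very strong mutual inhibition
```

In this sketch, weak inhibition leaves both pools active with the more strongly driven pool firing at the higher rate, while very strong inhibition silences the pool with the weaker input.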

The next step in the project is to attempt to obtain some experimental validation. An implication of the model is that the different behaviors observed are not a property of the pools representing images but arise from the competition between various pools that are connected to each other with varying degrees of effective inhibition. Hence, depending on the stimuli, a given neuron should be able to represent any of the observed behaviors. We would like to try to design a set of images that can be continuously modified so that the dynamics switch from 5Hz oscillations to normalization and back again. The theory group plans to collaborate with Carl Olson to design this experiment to test our hypothesis.

New Directions: A new project in collaboration with Moldakarimov and McClelland was recently started to look at using learned self-extinguishing attractors for timing and identification of transient sensory signals. Work by Recanzone, Merzenich, and colleagues suggests that experience in a task requiring precise timing of the interval between transient stimuli leads to coherent and short-lived neural responses. The responses become faster and higher in amplitude after training. Merzenich has further suggested that if experience with speech leads to similar increases in ``crispness'' of neural response it could facilitate the identification of speech sounds, since these often involve transients that occur in rapid succession. We are currently building a model designed to explore the possible neural processing and learning rules that might underlie these effects. Our model is based on a network of firing rate units of two types: excitatory and inhibitory. We assume long range excitatory connections and short range inhibition. We connect units via modifiable synapses among excitatory units, and from excitatory to inhibitory units. Synapses change according to a Hebbian-like learning rule. Connections from inhibitory to excitatory units are fixed. We have shown that applying transient signals to excitatory units over many trials can train the network to respond to the signal faster and with higher amplitude. One goal of the project is to build a neural mechanism to explain the Perceptual Magnet Effect (PME) in speech perception. PME is a distortion of perceptual space in which two sounds from the same category seem closer together than two sounds from different categories, even though the phonetic "distances" between the sounds in each pair are the same.

Supported Papers, Presentations and Submissions

Huber, D. E., & O'Reilly, R. C. (2003). Persistence and accommodation in short-term priming and other perceptual paradigms: Temporal segregation through synaptic depression. Cognitive Science, 27, 403-430.

Jilk, D. J., Cer, D., & O'Reilly, R. C. (2003). Effectiveness of Neural Network Learning Rules Generated by a Biophysical Model of Synaptic Plasticity. Poster presented at the Computational Neuroscience Conference, Alicante, Spain.

Karklin, Y., & Lewicki, M. S. (2003). A model for learning variance components of natural images. In Advances in Neural Information Processing Systems, volume 15. MIT Press.

Karklin, Y., & Lewicki, M. S. (2003). Learning higher-order structures in natural images. Network: Computation in Neural Systems, 14(3), 483-499.

Karklin, Y., & Lewicki, M. S. (2004). Modeling non-stationary distributions and higher-order structure with density components. Submitted to Neural Computation.

Moldakarimov, S., Rollenhagen, J. E., Olson, C. R., & Chow, C. C. (2003). A model of low frequency oscillatory visual responses in macaque inferotemporal cortex. Program No. 701.20. Washington, DC: Society for Neuroscience, 2003.*

*Also mentioned under Project 5.

IBSC Core: Integration, Computation, Training, Outreach, and Administration

Investigator: J. McClelland

The functions of the core are to foster integration across projects, to provide the necessary computational resources, to coordinate training and outreach activities in relation to the project, and to cover basic administrative functions necessary for all of the above.

Integration. Integration occurs through annual Investigator Meetings in Pittsburgh, held in May or June of each year, and through two research meetings held in Pittsburgh, one on topics related to language and semantic cognition and one on topics related to biologically informed models. A conference room at the Center for the Neural Basis of Cognition in Pittsburgh has been furnished with the necessary electronics to allow for remote participation, using telephone conference call technology for voice and web-based remote desktop software to allow a presenter anywhere (say, Randy O'Reilly, in Boulder, Colorado) to control a screen display accessible to participants in the conference room in Pittsburgh as well as in remote locations including Princeton, Washington, and London. This has turned out to be an effective way to maintain communication among the participants.

Additionally, IBSC Project Director J. McClelland has undertaken visits to many of the participating laboratories, including the laboratories of MacDonald and Seidenberg at the University of Wisconsin, Jonathan Cohen at Princeton, and of Cathy Price, Karalyn Patterson and Matt Lambon Ralph in England. Other cross- and within-project visits to foster collaboration occur routinely. Some of these are coordinated with the annual investigators meeting in Pittsburgh.

Computational Resources. The IBSC grant has provided funds for the purchase of a suite of 5 dual Pentium 4 Xeon processors plus one quad Pentium 4 Xeon. These are in continual use by participants in this project and other researchers at the Center for the Neural Basis of Cognition whose work relates to the goals of the IBSC grant. Funds in the Year 3 budget are available to upgrade these facilities and/or address focal needs of project participants for computing at their home institutions, although we find that remote use is fine for modeling work.

In addition to the hardware resources, the IBSC grant employs a programmer/software support expert who supports users in implementation and deployment of simulation software and who has responsibility for the management of the IBSC's computational resources. She also serves as the technician who manages the hardware and software for the remote participation of IBSC participants in research discussions, as discussed above. Finally, the Core is providing partial support for the further development of the PDP++ simulator under the supervision of Randy O'Reilly, who has hired a software engineer for this purpose.

Outreach Activities. A plan articulated in the IBSC proposal was to engage IBSC participants and others in symposia and workshops related to the goals of the IBSC project, with the inclusion of other participants. Four such activities have occurred or will occur during the current grant year:

1. A symposium at the 2003 meeting of the Society for Neuroscience, chaired and organized by J. McClelland, on the subject of The role of the temporal neocortex in learning and memory. Participants included IBSC researchers J. McClelland and K. Patterson, as well as other participants A. Martin and Y. Miyashita.

2. A workshop on dynamics of decision making preceding the March 2004 CoSyne Meeting at Cold Spring Harbor which was organized by others, and included presentations by IBSC participants J. McClelland and Jonathan Cohen.

3. A symposium at the March 2004 Meeting of the Experimental Psychology Society (UK) in honor of IBSC researcher Karalyn Patterson, who delivered the EPS's Bartlett lecture at the meeting. The symposium, which was coordinated with Patterson's lecture, addressed topics in semantic cognition, including development and disorders thereof. It was organized by IBSC participants M. Lambon Ralph and T. Rogers, with presentations by IBSC participants McClelland, Lambon Ralph, and Rogers as well as other participants F. Vargha-Khadem, Richard Wise, and John Hodges.

4. A symposium to occur in May, 2004, at the American Psychological Society Meeting in Chicago, Illinois to be coordinated with J. McClelland's William James Prize Lecture on Learning, Memory, and Cognitive Development in Parallel Distributed Processing Systems. The symposium is organized by IBSC researcher Karalyn Patterson and includes IBSC participants M. Lambon Ralph and T. Rogers as well as non-IBSC participant A. Cleeremans.

Training: The IBSC proposal laid out a plan to foster cross-disciplinary integration by training 1-2 IBSC Integrative Research Training Fellows who would bridge between laboratories employing computational and experimental approaches at the neural and behavioral levels. One Post-Doctoral Researcher, Stephen Gotts, has been supported for two years and will be supported for one more year. A report of Gotts’ progress follows. In the absence of other candidates satisfying the conditions of the Fellowship, the funds for the other slot have been used to fund cross-disciplinary graduate research training. In the current grant year a student in Carnegie Mellon's Computational and Statistical Learning Ph.D. program has been funded to foster integration of Machine Learning methods for analysis of functional brain imaging data under the supervision of leading Machine Learning researcher Thomas Mitchell. This work has only been under way for a very short time and will be described in a future progress report.

IBSC Integrative Research Training Fellowship Project: Neural Mechanisms Underlying Positive and Negative Repetition Priming

Fellowship Holder: Stephen J. Gotts

Overview

Human subjects often exhibit long-lasting behavioral facilitation for stimuli they have previously identified and behavioral slowing for stimuli they have previously ignored - phenomena referred to as positive and negative repetition priming, respectively. The main goal of this training project is to develop a computationally explicit theory of the neural mechanisms underlying repetition priming and how these mechanisms interact with attentional selection requirements. The basic hypothesis is that stimulus processing involves cooperative and competitive interactions among large groups of neurons located throughout a variety of cortical regions. Repetition priming effects are argued to reflect the operation of long-lasting activity-dependent neural plasticity mechanisms within these regions. Each time a stimulus is processed, small changes are made to the synaptic strengths that mediate processing, shaping long-term neural representations and altering behavioral performance.

Specific Aim 1: Develop neural network account of positive and negative repetition priming in visual selection

This specific aim proposed both behavioral experiments with humans and simulation work in order to explore the notion that neural representations of two simultaneously presented visual stimuli compete automatically with each other, a competition that is biased and eventually resolved through the top-down representation of a target cue (Desimone & Duncan, 1995). The argument was made that when this "biased competition" view of visual selective attention is combined in a neural network with a model of synaptic plasticity that is supported by neuroscience studies of LTP/LTD (Bienenstock, Cooper, & Munro, 1982, or BCM), it will be possible to account for the range of behavioral effects associated with positive and negative priming in both normal subjects and brain-injured patients. The idea is that top-down support from pre-frontal and/or parietal cortices for the target stimulus leads to higher firing rates in the neurons representing the target in early and ventral visual areas. These neurons then indirectly suppress firing rates in neurons representing the distractor stimulus through local inhibitory connections. This attentional modulation of neural activity then interacts with learning such that neurons representing the target are more likely to undergo LTP, producing subsequent behavioral facilitation (positive priming), and neurons representing the distractor are more likely to undergo LTD, producing later behavioral slowing (negative priming). Three experiments were proposed: Exp 1.1) acquire empirical data in a visual selection task with words to replicate basic effects that have been observed previously and to provide a dataset for use in detailed model fitting; Exp. 
1.2) develop a neural network model of visual selection that combines mechanisms of biased competition and the BCM rule in order to account for results from our human experiments; Exp 1.3) extend the neural network model to account for diminished or reversed negative priming observed in brain injured patients by reducing the strength of the top-down activity bias, leading to higher firing-rates in neurons representing the distractor and weaker LTD or even LTP.

In brief, we have made a good deal of progress in Experiment 1.1 and some progress in Experiments 1.2 and 1.3. Thus far, we have dedicated less time to the actual development of the neural network model and have focused more on the human experiments for a couple of reasons. The first is that it proved more difficult than we anticipated to get clean positive and negative priming results in Experiment 1.1 which were to be used in the development of the model. The second is that we came up with ideas about how to test the involvement of BCM-style learning more directly in psychophysical experiments (preliminary results discussed below), and this seemed important to pursue. Nevertheless, we have trained early versions of the model to perform visual selection with two simultaneously presented words in which selection can be done by location, color, or brightness (to correspond with different versions of Experiment 1.1 that we have piloted). We have not yet included the BCM rule in these versions to account for across-trial learning effects but have focused instead on known within-trial neural and behavioral effects such as reaction time slowing when a distractor is present versus absent and graded effects of brightness/luminance on neural activity and reaction time. In the coming year, emphasis will shift back more to developing the simulation work. We have begun exploring the BCM rule in small-scale versions of the model to account for the behavioral effects we have observed thus far. The work proposed in Experiment 1.3 in accounting for altered patterns of negative priming following brain injury really just involves some additional probing of the impact of reducing top down inputs which will be done in tandem throughout the development of the model; this is not expected to add much time. We have already explored the impact of reducing the top down inputs in the preliminary modeling work, and competitive interactions and reaction times are extended as expected.

There is an important addition to this specific aim involving an investigation of the neural basis of visual selective attention using neurophysiological recordings in rhesus macaque monkeys. I have been working in Bob Desimone's lab at the NIMH in Bethesda, MD, for a little over a year now learning to train animals in a selective attention task and to record neural activity from cortical areas V4 and two different prefrontal regions (the frontal eye fields, or FEF, and DLPFC in area 46). The possibility of me spending time in a neurophysiology laboratory was mentioned in the original proposal under the section on Training (p. 11). I have been coordinating the human behavioral work in Pittsburgh with a half-time research assistant who does the actual running of subjects, and I travel to Pittsburgh once or twice a month for meetings with the RA and David Plaut to discuss progress. This arrangement has been quite productive thus far, and I expect it to continue over the course of the coming year while I collect neural recording data in Bethesda. The focus of the project in Dr. Desimone's lab is to record simultaneously from area V4 (in which large attentional effects are found) and prefrontal regions during the performing of a selective attention task. This project (in collaboration with Leslie Ungerleider and Narcisse Bichot at NIMH, and Andrew Rossi at Vanderbilt University) should allow us to test details of the biased competition view of visual attention (discussed above) by examining the relative time course of prefrontal and ventral visual neural activity in cells that either represent information about the visual stimuli or information about the selection cue. Briefly, the behavioral task we are using is as follows. The monkey is trained to fixate centrally on a color cue (either red, green, or blue). As soon as the monkey initiates fixation, 3 drifting grating stimuli (one red, one green, and one blue) are presented parafoveally. 
The monkey is trained to release a bar when the stimulus matching the central color cue changes color, ignoring any color changes in the non-matching distractors. A current focus in the Desimone lab has been the study of neural synchrony as a correlate of attentional processing, in addition to changes in average firing rate, and we will be measuring both phenomena simultaneously at the different recording sites. This study should complement the human behavioral experiments nicely, and it will undoubtedly inform the building of the models in Experiments 1.2 and 1.3. Behavioral training is now complete for the first monkey, and we are beginning to record from cells. Training is also underway in a second monkey. Given that overview of the work in this specific aim, let me now discuss preliminary results from Experiment 1.1 (on which a good deal of the effort has been spent over the past year). It proved more difficult than we had anticipated to replicate behavioral effects of negative priming which led us to shift our efforts more toward developing the human experiments, as mentioned above. Several pilot versions of Experiment 1.1 involved a large stimulus set of words with each word occurring exactly once in each condition for each subject. Discussions with a couple of researchers who have run negative priming experiments emphasized the need for the distractors in stimulus displays to be difficult to ignore, and we shifted away from using color as a selection cue (which we think was too easy for subjects, and hence produced no negative priming) and employed a task in which subjects had to make a semantic decision (animate/inanimate) about the dimmer of two stimuli (e.g. words such as "lion" and "bench"), and we manipulated the relative luminance of the target and distractor words (4 different levels of luminance using shades of grey against a black background). 
Results with this version were suggestive, but effects still failed to reach significance when we employed a large stimulus set. We have now had success in two complementary versions of the experiment with a smaller stimulus set (one in which subjects make an animacy decision about the dimmer of two words, and one in which they make the decision about the brighter of two words). This is broadly consistent with reports by Strayer and colleagues demonstrating that it is easiest to observe negative priming when a small set of stimuli is used repeatedly with each subject.

In the experiment in which subjects make a decision about the brighter of two words, we are exploring a possible interaction of priming and the target/distractor luminance that may directly test involvement of BCM-like neural learning. Enhancing the luminance or visual contrast of stimuli is known to increase neural activity in visual cortical cells, and the biased competition theory of visual selective attention predicts that the relative luminance of target and distractor stimuli will influence performance in identifying the target. The relative luminance of target and distractor should also influence the direction and magnitude of priming effects to the extent that neural activity levels interact with neural plasticity. Our preliminary evidence is consistent with both of these expectations. Increasing the luminance of a distractor word while holding the luminance of the target word constant slows reaction times significantly [see Figure 1 at end of section; distractor luminance ranges from complete absence, 0, up to 3; target luminance is held fixed at 4].

Preliminary evidence also indicates that priming may interact with target/distractor luminance in a manner consistent with BCM-style learning. Figure 2A shows patterns of negative (on the left) and positive priming (on the right).

The behavioral conditions have been arranged from left to right in order of predicted increasing neural activity during the prime stimulus (priming is measured on the immediately subsequent probe trial). Hence, the negative priming comparisons (when the prime distractor was ignored and then later identified) are on the left in order of increasing distractor luminance (-1, -2, -3; the minus sign indicates negative priming comparisons). The positive priming comparisons follow on the right in order of decreasing distractor luminance (which should increase activity in neurons representing the target stimulus): +3, +2, +1. Results here are shown for conditions in which the target luminance category on the prime trial was 4. Note the similarity of the priming pattern to the shape of the phi function in the BCM theory, as well as to empirical results on LTP/LTD from Dudek and Bear (1992) (see Figure 2B). While positive priming and negative priming effects are currently significant when collapsed across distractor luminance, the interactions are not yet significant (N=50). It is not surprising that showing more detailed interactions is more difficult, and it may require a large number of subjects. We are currently running additional subjects to determine whether the interaction reaches statistical significance. Nevertheless, the preliminary results are very suggestive. In the coming year, we plan on examining this interaction in more detail. Future versions of the experiment will reintroduce color so that we can allow the distractor to be either brighter or dimmer than the target within the same experiment. We will also examine whether the relative word frequency of target and distractor can have a behavioral impact similar to the luminance manipulation.
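As a rough illustration of the shape referred to above, the sketch below uses one common quadratic form of the BCM modification function, phi(y) = y (y - theta), in which weight change is negative for activity below the threshold theta and positive above it. The function name bcm_phi, the threshold value, and the six activity levels are assumptions chosen purely for illustration; they are not fitted to the priming data and are not taken from the Center's simulations.

```python
def bcm_phi(activity, theta):
    """One common quadratic form of the BCM modification function.

    Returns a negative value (depression) when 0 < activity < theta and a
    positive value (potentiation) when activity > theta.
    """
    return activity * (activity - theta)


# Hypothetical modification threshold and postsynaptic activity levels.
# The six levels loosely mirror the left-to-right ordering of the six
# priming comparisons (increasing predicted activity); they are not data.
theta = 1.0
activity_levels = [0.25, 0.5, 0.75, 1.25, 1.5, 1.75]

for y in activity_levels:
    direction = "depression" if bcm_phi(y, theta) < 0 else "potentiation"
    print(f"activity={y:.2f}  phi={bcm_phi(y, theta):+.4f}  ({direction})")
```

Under these assumptions, the three lowest activity levels yield negative changes and the three highest yield positive changes, which is the qualitative sign pattern that the negative and positive priming comparisons are being compared against.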

Specific Aim 2: Develop spiking-neuron account of the relationship between practice-related reductions in neural activity, enhanced neural synchronization, and positive repetition priming.

This specific aim proposed two simulation experiments aimed at addressing short-term and long-term positive repetition priming effects and their associated decreases in neural activity, referred to as "repetition suppression". Exp 2.1 was to incorporate the short-term plasticity mechanism of synaptic depression into an artificial network of excitatory and inhibitory spiking neurons and show that temporary reductions in neural activity were associated with enhanced neural synchrony; Exp 2.2 was to evaluate the ability of spike-timing-dependent synaptic plasticity in the same network to enhance synchronization in the long term. Experiment 2.1 is more or less finished, with a final version of simulations running now that use cells with more complete Hodgkin-Huxley style current-voltage equations rather than simple integrate-and-fire cells (since the inhibitory cells behave more appropriately with these equations). The abstract of the manuscript that I am currently preparing (with Carson Chow) is included in an Appendix. Two basic effects help to explain why synchrony is enhanced by synaptic depression. The first is that firing-rate reductions by themselves lead to enhanced synchrony when the dynamical interactions of a network are dominated by excitation (excitatory networks synchronize at lower rates and de-synchronize at higher rates). The second is that, because synaptic depression is much stronger at excitatory than at inhibitory synapses and inhibitory neurons are naturally more synchronous due to rapid electrical coupling through gap junctions (which tend to enforce phase locking), the buildup of synaptic depression can shift the network to be dominated more by the synchronous dynamics of inhibition following stimulus repetition. Work has not yet begun on Experiment 2.2, and the addition of work in Specific Aim 1 (the neurophysiological experiment with monkeys in the Desimone lab) and the focus in the coming year on Experiments 1.2 and 1.3 will likely require postponement of this aspect of the work until a later time.
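To make the notion of synaptic depression concrete, the following is a minimal sketch of a single depressing synapse using a generic resource-depletion model: each presynaptic spike consumes a fixed fraction of the available synaptic resources, which then recover exponentially. This is not the network model used in Experiment 2.1; the function name depressing_synapse and the parameters use_fraction and tau_rec are hypothetical and their values are arbitrary.

```python
import math


def depressing_synapse(spike_times, use_fraction=0.5, tau_rec=0.8):
    """Return the synaptic efficacy at each presynaptic spike.

    spike_times  : increasing spike times in seconds
    use_fraction : fraction of available resources consumed per spike (assumed)
    tau_rec      : recovery time constant in seconds (assumed)
    """
    resources = 1.0          # fully recovered synapse
    last_time = None
    efficacies = []
    for t in spike_times:
        if last_time is not None:
            # Exponential recovery toward 1 since the previous spike.
            gap = t - last_time
            resources = 1.0 - (1.0 - resources) * math.exp(-gap / tau_rec)
        release = use_fraction * resources
        efficacies.append(release)
        resources -= release  # depletion caused by this spike
        last_time = t
    return efficacies


# A regular 20 Hz presynaptic train: efficacy declines spike by spike.
train = [0.05 * i for i in range(10)]
efficacies = depressing_synapse(train)
print([round(e, 3) for e in efficacies])
```

In this toy setting, repeated activation progressively weakens the synapse, and a long pause restores it. That captures only the single-synapse ingredient; the synchrony effects described above depend on network interactions that this sketch does not model.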

Two other papers I am submitting are related to the proposed work in this specific aim, although they serve more directly to follow up on prior work. The first involves using synaptic depression and its interaction with neuromodulation to address habituation-like effects rather than priming effects in a group of brain-damaged patients with acquired semantic impairments. Work in my thesis showed that simulations like those in Experiment 2.1 can also show effects opposite to priming if neuromodulatory levels are too low (i.e., enhanced synchrony in the model requires a certain range of neuromodulatory concentrations). A connectionist model of performance in these patients that I developed with David Plaut (Gotts & Plaut, 2002, in CABN) was used to make novel behavioral predictions that we evaluated in follow-up experiments with one such patient, J.H. (in collaboration with Lisa Cipolotti). The abstract of the paper we are submitting is included in the Appendix. The second paper (also with Plaut) is a review, within the context of connectionist models, of the relationship between learning mechanisms thought to generate repetition priming and perseverative naming errors in aphasia. The review suggests that neuromodulatory deficits in aphasia can cause abnormal priming, leading to a high number of perseverations in tasks such as picture naming (tying together a model by Plaut and Shallice, 1993, in JOCN, and empirical work I conducted previously with Lisa Cipolotti: Gotts, Incisa della Rocchetta, & Cipolotti, 2002, in Neuropsychologia).

Supported Papers Submitted for Publication

Gotts, S. J., & Chow, C. C. (submitted). Neural activity decreases and behavioral priming through synaptic depression and synchrony.

Gotts, S. J., & Plaut, D. C. (Submitted). Connectionist approaches to understanding aphasic perseveration.

Gotts, S. J., Plaut, D. C., & Cipolotti, L. (Submitted). Mechanisms underlying "access/refractory" semantic impairments: Testing predictions of a connectionist model.

Figure 1

Figure 2

Figure 3

Combined List of Supported Presentations, Publications, and Papers in Progress

Bird, C., Cipolotti, L., Kwok, K., & McClelland, J. L. (in preparation). Learning meaningful associations in amnesia: Experimental investigations of items learned under generate-study conditions.

Braber, N., Patterson, K., Ellis, K., & Lambon Ralph, M. A. (under review). The relationship between phonological and morphological deficits in Broca's aphasia: Further evidence from errors in verb inflection.

Braver, T. S., Reynolds, J. R., & Donaldson, D. I. (2003). Transient and sustained cognitive control during task switching. Neuron, 39, 713-26.

Braver, T. S., DePisapia, N., & Slomski, J. A. (in preparation). Goal-subgoal coordination in anterior PFC: Hemispheric dissociations.

Coccia, M., Bartolini, M., Luzzi, S., Provinciali, L., & Lambon Ralph, M. A. (in press). Semantic memory is an amodal, dynamic system: Evidence from the interaction of naming and object use in semantic dementia. Cognitive Neuropsychology.

Fushimi, T., Komori, K., Ikeda, M., Patterson, K., Ijuin, M., & Tanabe, H. (2003). Surface dyslexia in a Japanese patient with semantic dementia: Evidence for similarity-based orthography-to-phonology translation. Neuropsychologia, 41, 1644-1658.

Fushimi, T., Patterson, K., Ijuin, M., Sakuma, N., Kureta, Y., Tanaka, M., Kondo, T., Amano, S., & Tatsumi, I. (in preparation). Inflecting Japanese Verbs: Two separate mechanisms or one graded system?

Gotts, S. J., & Chow, C. C. (submitted). Neural activity decreases and behavioral priming through synaptic depression and synchrony.

Gotts, S. J., & Plaut, D. C. (Submitted). Connectionist approaches to understanding aphasic perseveration.

Gotts, S. J., Plaut, D. C., & Cipolotti, L. (Submitted). Mechanisms underlying "access/refractory" semantic impairments: Testing predictions of a connectionist model.

Graham, N. L., Patterson, K., & Hodges, J. R. (2004). When more yields less: speaking and writing deficits in nonfluent progressive aphasia. Neurocase, in press.

Holroyd, C. B., Nieuwenhuis, S., Yeung, N., Nystrom, L., Mars, R. B., Coles, M. G. H., & Cohen, J. D. (in press). Dorsal anterior cingulate cortex shows fMRI response to internal and external error signals. Nature Neuroscience.

Holroyd, C. B., Yeung, N., Coles, M. G. H., & Cohen, J. D. (submitted). A mechanism for error detection in speeded response time tasks.

Huber, D. E., & O'Reilly, R. C. (2003). Persistence and accommodation in short-term priming and other perceptual paradigms: Temporal segregation through synaptic depression. Cognitive Science, 27, 403-430.

Jefferies, E., Bateman, D., & Lambon Ralph, M. A. (submitted). The role of the temporal lobe semantic system in number knowledge: Evidence from late-stage semantic dementia. Neuropsychologia.

Jefferies, E., Frankish, C. R., & Lambon Ralph, M. A. (submitted). Lexical and semantic binding in short-term memory: Evidence from normal recall and semantic dementia. Journal of Memory and Language.

Jefferies, E., Jones, R., Bateman, D., & Lambon Ralph, M. A. (in press). A semantic contribution to nonword recall? Evidence for intact phonological processes in semantic dementia. Cognitive Neuropsychology.

Jefferies, E., Jones, R., Bateman, D., & Lambon Ralph, M. A. (in press). When does word meaning affect immediate serial recall in semantic dementia? Cognitive and Affective Behavioural Neuroscience.

Jefferies, E., Patterson, K., Jones, R. W., Bateman, D., & Lambon Ralph, M. A. (2004). A category-specific advantage for numbers in verbal short-term memory: Evidence from semantic dementia. Neuropsychologia, 42, 639-660.

Jilk, D. J., Cer, D., & O'Reilly, R. C. (2003). Effectiveness of neural network learning rules generated by a biophysical model of synaptic plasticity. Poster presented at the Computational Neuroscience Conference, Alicante, Spain.

Karklin, Y., & Lewicki, M. S. (2003). A model for learning variance components of natural images. In Advances in Neural Information Processing Systems, volume 15. MIT Press.

Karklin, Y., & Lewicki, M. S. (2003). Learning higher-order structures in natural images. Network: Computation in Neural Systems, 14(3), 483-499.

Karklin, Y., & Lewicki, M. S. (2004). Modeling non-stationary distributions and higher-order structure with density components. Submitted to Neural Computation.

Kelly, R., & Lee, T. S. (2003). Decoding V1 neuronal activity using particle filtering with Volterra kernels. Advances in Neural Information Processing Systems, MIT Press. In press.

Kwok, K., & McClelland, J. L. (in preparation). A computational account of the roles of hippocampus and neocortex in the acquisition of semantically related associations.