ARTICLE Communicated by Jeffrey Elman

Object Recognition and Sensitive Periods: A Computational Analysis of Visual Imprinting

Randall C. OReilly Mark H. Johnson Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 25213 USA

Using neural and behavioral constraints from a relatively simple bio- logical visual system, we evaluate the mechanism and behavioral im- plications of a model of invariant object recognition. Evidence from a variety of methods suggests that a localized portion of the domes- tic chick brain, the intermediate and medial hyperstriatum ventrale (IMHV), is critical for object recognition. We have developed a neural network model of translation-invariant object recognition that incor- porates features of the neural circuitry of IMHV, and exhibits behav- ior qualitatively similar to a range of findings in the filial imprinting paradigm. We derive several counter-intuitive behavioral predictions that depend critically upon the biologically derived features of the model. In particular, we propose that the recurrent excitatory and lat- eral inhibitory circuitry in the model, and observed in IMHV, produces hysteresis on the activation state of the units in the model and the prin- cipal excitatory in IMHV. Hysteresis, when combined with a simple Hebbian covariance learning mechanism, has been shown in this and earlier work (Foldidk 1991; O’Reilly and McClelland 1992) to produce translation-invariant visual representations. The hysteresis and learning rule are responsible for a sensitive period phenomenon in the network, and for a series of novel temporal blending phenom- ena. These effects are empirically testable. Further, physiological and anatomical features of mammalian visual cortex support a hysteresis- based mechanism, arguing for the generality of the algorithm.

1 Introduction

General approaches to the computational problem of spatially invariant object recognition have come and gone over the years, but the problem remains. In both the symbolic and neural network paradigms the re- search emphasis has shifted from underlying principles to special case performance on real world tasks such as handwritten digit recognition or

Neural Computation 6,357-389 (1994) @ 1994 Massachusetts Institute of Technology

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 358 Randall C. OReilly and Mark H. Johnson

assembly-line part recognition. We have adopted a different approach, which is to link the computational principles and behavior of a model to those of a biological object recognition system, wherein the research em- phasis is on the ability to use empirical data to constrain and shape the model. The biologically based approach is limited by the level of com- putational understanding about how specific neural properties produce specific behavioral phenomena. However, if it were possible to establish a mapping between a biological system and a model at both the behavioral and neural levels, then the relationship between neural computation and behavior could be understood in the simplified framework of the model. If the model makes unique and testable behavioral predictions based on its identified neural properties, then experimental research can be used to test the computational theory of object recognition embodied in the model. We have developed a model of invariant visual object recognition based on the environmental regularity of object existence: objects, though they might exhibit motion relative to the observer, have a tendency to persist in the environment. This regularity can be capitalized upon by introducing a corresponding persistence or hysteresis in the activation states of neurons responsible for object recognition. When combined with a Hebbian learning rule, these neurons become associated with the many different images of a given object, resulting in an invariant repre- sentation. This algorithm (Foldidk 1991; OReilly and McClelland 19921, as described in OReilly and McClelland (19921, relies on specific neu- ral properties that lead to hysteresis. Instead of attempting to test for evidence of the algorithm in the complex mammalian nervous system, we have taken the approach of studying the relatively well known and simpler vertebrate object recognition system of the domestic chick. Visual object recognition in the chick has been studied behaviorally for nearly 50 years under the guise of filial imprinting, which is the process whereby young precocial' birds learn to recognize the first conspicuous object that they see after hatching. The original work of Lorenz (1935, 1937) on imprinting has given rise to half a century of active research on this process by ethologists and psychologists, and more recently by neuroscientists interested in the neural basis of imprinting. Recently, the area of the chick brain that subserves this imprinting process has been identified, and some of its neurobiological properties studied. Therefore, it is now possible to assess a model of imprinting in the chick both with regard to its fidelity to these properties and the behavioral effects they produce. With a variety of neuroanatomical, neurophysiological, and biochem- ical techniques, Horn, Bateson and their collaborators have established that a particular region of the chick forebrain, referred to as the interme- diate and medial part of the hyperstriatum ventrale (IMHV), is essential

'That is, young which are capable of fending for themselves from birth.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 359

for imprinting (see Horn 1985; Horn and Johnson 1989; Johnson 1991 for reviews). This region receives input from the main visual projection areas of the chick, and may be analogous to mammalian association cortex on both embryological and functional grounds (Horn 1985). From a Golgi study of the histology and connectivity of the IMHV by Tombol et al. (19881, we were able to determine that the principal anatomical features necessary for hysteresis of the principal excitatory cells of this region as proposed by OReilly and McClelland (1992) are present. These features, recurrent excitatory connections and lateral inhibition, constitute an es- sential component of our object recognition model. In addition, evidence consistent with a Hebbian learning rule operating in this region has been found in morphometric studies of synaptic modification and changes in the density of postsynaptic NMDA receptors (e.g., Horn et al. 1985; Mc- Cabe and Horn 1988). Thus, both the hysteresis and the Hebbian learning properties of our model could be present in the region of the chick brain thought to be responsible for object recognition. Behavioral data are valuable to the extent that they can be accounted for by only a subset of possible models. Thus, the finding that chicks learn something about an object to which they have been exposed for a period of time is not a particularly strong constraint. However, there are several findings from the imprinting literature that appear to require more specialized mechanisms. Perhaps the best known of these is the critical or sensitive period for imprinting. A sensitive period means that a strong preference for a given object can be established only during a specific period of life, and that the animal is relatively unaffected by sub- sequent exposure to different objects. This kind of behavior is not typical of neural network models, which typically exhibit strong (even “catas- trophic”) interference effects from subsequent learning. Further, Lorenz’s original theory has been revised to reflect the fact that the termination of the sensitive period is experience-driven (Sluckin and Salzen 1961; Bate- son 19661, so that one cannot simply posit a hard-wired maturational process responsible for terminating the sensitivity of the network. Our model shows how a self-terminating sensitive period is a consequence of a system incorporating hysteresis and a covariance Hebbian learning rule. Another constraining behavioral phenomenon derives from research on temporal blending. It has been shown that chicks will blend two ob- jects if they appear in close temporal proximity to each other (Chantrey 1974; Stewart et al. 1977). This temporal dependency alone is consistent with the idea that the object recognition system uses hysteresis to develop invariant representations of objects. However, our model demonstrates a paradoxical interaction between stimulus similarity and temporal blend- ing, such that more similar stimuli experience relatively less blending. This phenomenon can be directly related to the same hysteresis and co- variance Hebbian learning rule properties as the sensitive period phe- nomenon.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 360 Randall C. OReilly and Mark H. Johnson

In order to claim that our model of object recognition is applicable to brains other than that of the domestic chick, we would need to find ev- idence of hysteresis and Hebbian associative learning in structures such as the mammalian visual cortex. Several lines of evidence are present. First, and most relevant for our model, the intracortical connectivity of the visual system contains recurrent excitatory connections and lateral inhibition (see Douglas and Martin 1990 for a review, and Douglas et 01. 1989; Bush and Douglas 1991 for models). There are also embryological similarities between the hyperstriaturn ventrale in chicks and neocortex in mammals (Horn 19851, and Johnson and Morton (1991) argue for a correspondence between IMHV and mammalian visual object recogni- tion centers in their relationship to "subcortical" processing. Finally, it is possible that the temporally extended parvocellular inputs to the form pathway of the mammalian visual system (e.g., Livingstone and Hubel 1988; Maunsell et al. 1990) are an additional source of hysteresis in mam- mals. With regard to the Hebbian associative learning, several studies have found evidence for associative LTP in visual cortex (e.g., Fregnac eta!. 1988). We adopt a multilevel approach to modeling the behavioral phenom- ena and neural substrate. The basic mechanism of translation-invariant object recognition can be specified and implemented with a relatively abstract neural network model, as it is based on rather general proper- ties of neural circuitry, and not on the detailed properties of individual neurons. Using an abstract neural network model offers the further ad- vantage of simplicity and explanatory clarity-since the model has only a few properties, the phenomena that result can be related more directly to these properties. However, caution is required in relating such a simple model to both detailed behavioral and neural data; in general this model makes qualitative rather than quantitative predictions. For this reason, it is necessary to also construct a more realistic neural model that is based on anatomical and physiological measurements of actual IMHV neurons. Such a model is planned for future research.

2 The Case for Visual Imprinting as Object Recognition

In order to use imprinting behavior in the chick as an example of an object recognition process, we must be confident that the IMHV region of the chick brain, known to be critical for imprinting, is also relevant for other object recognition tasks, as opposed to classical conditioning or operant learning tasks. Visual imprinting is studied using dark-reared chicks that are exposed to a conspicuous object for a training period that usually lasts for several hours. Hours or days later, the chick is given a preference test in which it is released in the presence of two objects-the training object and a novel object. The extent to which the chick attempts to approach the familiar object as opposed to the novel one results in a

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 361

preference score, which correlates with the strength of imprinting. We may infer from this behavior that chicks acquire information about the visual characteristics of objects to which they are exposed, making imprinting dependent upon object recognition. Several studies have shown that the stimulus must be moving in or- der to obtain strong imprinting (Sluckin 1972; Hoffman and Ratner 1973). This effect is unlikely to be simply due to the differential attraction of , since chicks will preferentially approach familiar objects even when static once they have been trained on it when moving. In the model, the input stimulus must be moving because this enables an in- variant representation to develop, and we propose that the motion during training is performing the same function for the chick. If IMHV is indeed a crucial site for imprinting then damage to it prior to imprinting should prevent the acquisition of preferences, and its destruction after imprinting should render a chick amnesic for existing preferences. These results have been confirmed (McCabe et al. 1982). Fur- thermore, lesions to IMHV had no effect on several other behaviors and learning tasks, demonstrating the relatively specific effects of such lesions (see Horn 1985; Horn and Johnson 1989; Johnson 1991 for reviews). For example, McCabe et al. (1982) showed that while small, localized lesions to IMHV impair the ability of chicks to acquire information about an imprinting object, these chicks were able to learn an association between a reward and a static pattern consisting of repeating visual features (e.g., dots, stripes). This suggests that neither low-level visual discrimination nor reward associations are impaired by IMHV lesions. Similarly, John- son and Horn (1986) trained chicks to press a particular pedal for a re- ward of exposure to a moving imprinting object. Intact chicks, and chicks with small lesions elsewhere in the forebrain, were able to both learn the operant component of the task, and imprinted onto the rewarding ob- ject. In , while chicks with IMHV lesions learned the operant component of the task, they were subsequently unable to recognize the rewarding object. Other evidence shows that IMHV is involved in other tasks that re- quire object recognition. For example, IMHV lesions impair the ability of chicks to recognize a particular individual member of their own species (Johnson and Horn 19871, and the ability to recognize individual mem- bers of the species in a mate choice situation (Bolhuis et al. 1989). Davies et al. (1988) demonstrated that chicks with IMHV lesions are unable to learn not to peck at an unpleasant-tasting colored bead in a one-trial pas- sive avoidance learning (PAL) task. The authors argued that this deficit comes from an inability to recognize the beads, since the intact chicks selectively withheld their pecking to the color of bead that had previ- ously been coated with the unpleasant-tasting substance. Perhaps the best description of the functional effect of IMHV lesions would be that it induces object agnosia (Horn 1985; Johnson 1991).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 362 Randall C. OReilly and Mark H. Johnson

3 The Neural Basis of Object Recognition in Imprinting

IMHV is a small “sausage-shaped region immediately around the mid- point between the anterior and posterior poles of the cerebral hemi- spheres. It receives input from the main visual projection areas of the chick, including a prominent input from the avian forebrain primary visual projection area (the hyperstriatum accessorium, HA), and is em- bryologically and functionally related to mammalian association cortex (Horn 1985). See Figure 1 for a diagram of this visual input to IMHV. In addition, IMHV is extensively connected to other regions of the avian brain (Bradley et al. 19851, including those thought to be involved in motor control, such as the archistriaturn. Recent cytoarchitectonic studies of IMHV have revealed that, in con- trast to the six-layered structure with many distinct cell types found in mammalian , there is no clear laminar structure of IMHV, and only four distinctive types of cells have been identified to date (Tomb61 ef al. 1988). Figure 2 shows the typical connectivity be- tween the four cell types that Tombol ef al. (1988) identified. There are two types of principal neurons (PNs) somewhat similar to mammalian pyramidal neurons (although they lack true apical dendrites) that are

Optic Tract From Contralateral Eye

Figure 1: Visual inputs to IMHV from the thalamic pathway. Shown is a saggital view (strictly diagrammatic, as some areas are out of the plane with others) of the chick brain, with visual information coming in through the optic tract, which then synapses in the optic nucleus of the . This then projects to area HA (hyperstriatum accessorium),which connects reciprocally with IMHV. This pathway may correspond to the +LGN eVl,V4 &IT pathway in mammals. There are other routes of visual input to IMHV, which are not shown in this figure (see Horn 1985). The brain of a 2-day-old chick is approximately 2 cm long.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 363

Figure 2: Schematic drawing summarizing the circuitry of IMHV at two levels of detail (simplified version in the box). Excitatory contacts are represented by open circles, and inhibitory ones by flat bars. Shown are the local circuit inhibitory neurons (LCN) and their reciprocal connectivity with the excitatory principal neurons (I"),and the recurrent excitatory connectivity between the principal neurons. In the detailed version, the thick solid lines are dendrites, while the axons are dashed or dotted lines. Both the inhibition and recurrent excitatory connectivity are used in the simplified model to produce hysteresis in the activation state of the IMHV. (After Tomb01 et al. 1988.)

probably excitatory. The type 1 PNs are spiny, large, and possess long bifurcating axons that probably project outside the region, while type 2 PNs are medium sized with thick spiny dendrites. The presence of a high density of spines is indicative of extensive afferentation, as is the case with the main input cells of the mammalian neocortex: the spiny stellate cells found in layer 4 (Douglas and Martin 1990). As shown in the figure, the two types of PN cells are interconnected such that they have a characteristic positive feedback loop. The other two classes of identified are medium and small local circuit neurons (LCNs) that are probably inhibitory, receiving excitatory input from the PNs and projecting inhibitory output back onto them. It is not known if they also receive excitatory input from the afferents that excite the PNs. Thus, the LCNs are probably performing at least feed- back inhibition, and possibly feedforward inhibition as well. Presumably, these inhibitory neurons are critical for moderating the intrinsic instabil- ity of the positive feedback present between the PNs. It should be noted that the types of PNs and LCNs and their characteristic interconnectivity found in IMHV are not commonly observed in neighboring regions of the chick brain that have been studied (Tomb01 et al. 1988).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 364 Randall C. OReilly and Mark H. Johnson

Two characteristics of the cytoarchitectonics of IMHV described above are incorporated in our model: the existence of positive feedback loops between the excitatory principal neurons, and the extensive inhibitory circuitry mediated by the local circuit neurons. We propose that these properties lead to a hysteresis of the activation state of PNs in IMHV, a feature that contributes to the development of translation invariant object-based representations. In our model, we assume that type 2 PNs are the main target of projections to IMHV from area HA, while type 1 PNs are probably the main source of projections from IMHV, for the reasons mentioned earlier. Changes in synaptic morphology and cell activity in IMHV have been recorded following imprinting. For example, Horn et al. (1985) showed that the mean length of the postsynaptic density of axospinous synapses within the left IMHV increased by 17% following 140 min of training, in- dicating a strengthening of synaptic efficacy, but no increase was found for shorter training or in dark-reared controls. In another study, McCabe and Horn (1988) measured the numbers of NMDA receptors, which are widely thought to be involved in associative LTP (Collingridge and Bliss 1987). They found a significant increase in the number of NMDA sites in the left IMHV of postimprinting chicks compared with dark-reared controls, which was positively correlated with the degree to which a chick preferred the familiar stimulus at testing. Other factors such as locomotor activity were not significantly correlated. Finally, electrophys- iological recording indicates that spontaneous activity levels in IMHV are relatively low, even during presentation of a stimulus, and that effects of training may be confined to IMHV (Payne and Horn 1984; Brown and Horn 1992). Significant effects are obtained only when several multicel- lular sites within IMHV are combined, suggesting that the representation is distributed within the region (Davey and Horn 1991).

4 The Self-organization of Invariant Representations

Despite the relative ease of its everyday execution, visual object recogni- tion is a difficult computational problem. One of the principal reasons for this difficulty is also a clue to a potential solution: there are a practically infinite number of different images that a given object can project onto the retina. Deciphering which of the many thousands of familiar ob- jects a given image represents is difficult because of this many-to-many correspondence. However, the ways in which a given object can pro- duce different images on the retina are limited to a few dimensions of variability, corresponding to translation, dilation, rotation, etc. The algorithm proposed by Foldihk (1991) and OReilly and McClel- land (1992) collapses across irrelevant dimensions by capitalizing on the idea that the visual environment naturally presents a sequence of im- ages of the same object undergoing a transformation along one or more

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 365

of these dimensions. The information contained in the translating se- quence of images is that all of them correspond to the same object in the world-one could refer to this information as "identity from persis- tence." In computational terms, the world imposes a temporal smooth- ness constraint on the existence of objects that can be used to regularize the ill-posed problem of visual object recognition (cf. Poggio et al. 1985; Yuille 1990). The temporal smoothness of the environment can be capitalized upon by a smoothness constraint (i.e., hysteresis, or the impact of previous ac- tivation states on subsequent ones) in the activation state of units in an artificial neural network, combined with an associative learning rule that causes temporally contiguous patterns of input activity to become repre- sented by the same subset of higher-level units. These higher-level units develop representations that are invariant over the differences between the temporally contiguous patterns.

4.1 The Algorithm. The specific biologically plausible implementa- tion of the general ideas described above, as proposed by OReilly and McClelland (1992) and used in the present simulations, takes advantage of commonly occurring features of neural anatomy to implement hys- teresis in the activation states of units. We propose that hysteresis comes from the combined forces of lateral inhibition, which prevents other units from becoming active, and recurrent, excitatory activation loops, which cause active units to remain so through mutual excitation. The need for recurrent excitatory activation loops and lateral inhibition requires an recurrent network in the tradition of McClelland (1981) and Hopfield (1984). For simplicity, the IAC activation function of McClelland (1981) was used, with stepsize 0.05, decay 1, and a range of -1 to 1 with a pro- vision that only positive activations are propagated to other units (see Appendix A for the exact equation used). Associative learning is implemented with a simple Hebbian covari- ance learning rule. The specific learning rule used in the present sim- ulations is a modification of the Competitive Learning scheme (Rumel- hart and Zipser 1986) (see Appendix B for the formulation), although other similar rules have been used with equal success (cf. Foldi6k 1991; OReilly and McClelland 1992). It is important that the learning rule be a covariance formulation (Sejnowski 19771, having both increase and decrease components that work together to shape the receptive fields of units both toward those inputs which excite them, and away from those that do not. However, the existence of an associative weight decrease phenomenon is a matter of some debate in the field. Nevertheless, there is considerable empirical evidence for a subtractive synaptic modification phenomenon known as associative long-term depression (LTD) in several different types of neurons and species (e.g., Stanton and Sejnowski 1989; Artola et al. 1990; Bradler and Barrioneuvo 1990; Fr6gnac et al. 1988).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 366 Randall C. OReilly and Mark H. Johnson

The temporal contiguity of the visual environment is captured by "visual" stimuli that appear in a series of different locations sequentially over time, all of which share the same features, and differ only in retino- topic location. Thus, only translational invariance is simulated, as this is the simplest form of invariance to model and analyze.

4.2 Network Model and Methods. The detailed architecture of the model (shown in Fig. 3) is designed around the anatomical connectivity of IMHV and its primary input area, HA. The input layer of the network, layer 0, represents area HA, which contains cells with properties simi- lar to the simple and complex retinotopic feature detectors described by Hubel and Wiesel in the cat visual cortex (Hubel and Wiesel 1962; Horn 1985). HA then projects to area IMHV, which we have divided concep- tually into two different layers2 according to the two types of PNs in IMHV described by Tombol et al. (1988). In the model, the first "IMHV layer represents the IMHV type 2 PNs, which are likely to receive affer- ent input, and layer 2 represents type 1 PNs, which are likely to produce efferent output from IMHV due to their long, bifurcating axons. Layer 2 sends outputs to layer 1 of the model, which in turn sends outputs to layer 2, creating the recurrent feedback loop. There were 10 x 8 or 80 units in layer 0, and 24 units each in layers 1 and 2. Strong lateral inhibition is present within each layer of the model, in the form of relatively large negative weights between all units in the layer. While they are not explicitly simulated, this reflects the influence of a large number of GABAergic inhibitory interneurons in IMHV (Tombol et al. 1988), and its relatively low levels of spontaneous activity. The strong inhibition was implemented in the model with fixed weight val- ues of 3.0, which enabled only one unit in each layer to become active at any time, making analysis of the system simpler. In the real system, it is assumed that inhibition does restrict activity to a relatively low level, but clearly the winner-take-all (WTA) extreme in the model is not realis- tic. The generalizability of the results to distributed patterns of activity throughout the system is an issue that will be addressed with the more detailed model. Preliminary results indicate that the same basic effects are found. Excitatory connections exist between the two principal cell types in IMHV. In the model, these excitatory connections were of the same strength as the excitatory connections from layer 0 to layer 1, and they were subject to the same learning rules. Both the inhibitory and excita- tory connections provide the hysteresis effect necessary for the learning algorithm. However, only the strength of the excitatory connections was adjusted by the learning rule. This is in accordance with biochemical and

*Note that the laminar distinction in the model between these two component cells of IMHV is not intended to suggest that the cells are arranged as such in the IMHV itself, but rather serves to reflect the functional distinction between the two types of principal neurons.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 367

A //// Layer 2 ///// (- IMHV PN 1)

'N 2)

A)

Figure 3: Network architecture used for the simplified model of IMHV, showing the three layers (layer 0 represents HA, layer 1 represents IMHV PN type 2, layer 2 represents IMHV PN type l), and their interconnectivity. The network is shown being trained on stimulus D, and the 3 different units numbered 1- 3 in layer 1 have partially invariant fields that capture local invariance over portions of the different positions of stimulus D. These different units project reciprocally to the layer 2 unit, which has a fully invariant representation of stimulus D by combining the partially invariant fields from layer 1.

neuroanatomical evidence consistent with excitatory axospinous connec- tions being the site of plasticity within IMHV, and the involvement of NMDA receptor sites in imprinting (Horn et al. 1985; McCabe and Horn 1988). While the units in layer 1 of the model received excitatory input from both layers 0 and 2, the units in layer 2 did not receive any excitatory input from higher layers that are known to exist in the chick but are not present in the model, resulting in lower levels of activation in this layer compared to layer 1. To compensate for this, layer 2 had a lower level of decay than the other layers (0.5 vs. l).3 As a result of being farther from the moving input, there is a greater degree of hysteresis in layer 2 than in layer 1 of the model, which is reflected in the kinds of receptive fields that form in the two layers. See OReilly and McClelland (1992) for more discussion of the graded invariance transformation over increasingly deeper layers.

31t is likely that the principal neurons (type 1 I") in IMHV receive reciprocal ex- citatory connections from the areas that they project to, since most of the regions that receive input from IMHV also have projections back to it (Bradley et al. 1985).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 368 Randall C. OReilly and Mark H. Johnson

Figure 4: Stimuli used in training the network, consisting of three feature bits active in any of eight different positions. For visual clarity, the positions were arranged along the horizontal axis, and the object features along the vertical, with 1 bit of overlap between two pairs of the four primary stimuli. Object AB was a hybrid stimulus used in some simulations that had 2 bits in common (overlap) with both A and B, while the others had 1 bit in common with their neighbor.

Training in the model consisted of presenting a set of feature bits (as- sumed to correspond to a given object) in sequential positions across the input layer 0 (simulated HA) with either right or left motion (randomly chosen). At each position of a stimulus, the weights between all units in the system were adjusted according to the Hebbian learning rule once the activation state of the network had reached equilibrium for each position (defined here as the point at which the maximum change in activation went below a threshold of 0.0005). The activation state was initialized to zero between different objects, but not between positions of a single ob- ject, enabling a state resulting from an object in one position to exert the desired hysteresis effect on the state with the object in the next position. Four "objects" were used in the simulations, which consisted of three active features in any of the eight possible retinotopic locations repre- sented in the input layer (see Fig. 4). The arrangement of the feature bits into columns simply represents eight different views of an object, which could in theory correspond to any transformation sequence, not just horizontal translation. There was 1 bit out of the three features in

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 369

common between neighboring stimuli, so that object A overlapped with B by 1 bit, as did C with D. B and C did not overlap. Since it is not possible to record preferential approach behavior from the network, some proxy for this kind of preference measure must be used. As layer 2 in the model represents the output of IMHV, we recorded the activation level over this layer when different stimuli were presented to determine the preference. Because of the extreme inhibitory compe- tition between units, the activation level of the single active unit is not indicative of the level of excitation this unit is receiving4 To circumvent this potential measurement artifact, we instead used the total excitatory input to layer 2 (i.e., for all units in the layer) as an indication of prefer- ence. The actual preference scores were computed by averaging the total excitatory input over all the different positions for each stimulus, result- ing in a raw preference score for each stimulus. These raw preference scores were converted into percentage preference measures between two stimuli (e.g., A over D)by dividing each raw score by the total for both stimuli. This is analogous to the preference score measure used in many behavioral studies (Horn 1985).

5 Basic Imprinting and Capacity Properties

Before addressing the more challenging behavioral phenomena with our model, it is first necessary to establish its basic imprinting and capacity properties. The latter is especially important because one does not want a putative sensitive period phenomenon to reduce to a simple capacity limitation of the model. The basic simulation of the imprinting effect involved presenting a single stimulus over many epochs while record- ing the development of the preference for this stimulus. The imprint- ing results for the model are shown in Figure 5, which indicates that a preference for the training stimulus over a novel, dissimilar, object does develop over time. In the model, this effect is simply due to the selective enhancement of weights to units that respond to the imprinted stimulus, and is not a surprising or novel result. The capacity simulation was run by training on all four stimuli (ob- jects A-D), and showing that the model developed individuated repre- sentations for each object that were invariant across the entire range of positions of the object. Each epoch of training consisted of sweeping each stimulus in turn across the input layer, covering all eight positions of each object. The order in which the stimuli were presented was ran- domized, with a "delay" (implemented by zeroing the activation states) between each sweep of the stimuli. Training continued for 100 epochs. The receptive fields for the two simulated IMHV layers developed invari- ant representations in a graded fashion. That is, the first layer developed

4This is an artifact of the winner-take-all nature of our model, and is probably not the case in the real system.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 370 Randall C. OReilly and Mark H. Johnson

Imprinting Preference Development

0.80 I

5haa; 0.500.60 8 10.40 - w (lrdned) vz (novel) o.30 MA D MC (novel) vs. D (novel) I 0.20 1 ' 0 2s so 75 100 12s 150 Epochs of Training on A

Figure 5: The basic imprinting effect, showing the preference for the imprinted stimulus A as compared to a novel stimulus, D. The preference for a control stimulus C as compared to D is also shown. This preference does not deviate from chance (0.5).

representations specific to one stimulus in any of several (but not all) different positions of that stimulus, making it partially invariant with respect to position. Layer 2, however, did have fully invariant represen- tations, so that a single pattern of activation coded for a single stimulus in any position. The development of invariance was quite robust over different pa- rameter settings, although the specific qualities of the representations depended on certain ranges of settings. In particular, larger levels of activation decay tended to reduce the hysteresis effect and caused lower levels of invariance to develop. Likewise, smaller levels of decay caused more invariance to develop on layer 1 (layer 2 was already fully invari- ant). These effects appeared over relatively large changes in decay (1 vs. 1.5, for example).

6 Reversibility and the Sensitive Period

6.1 Behavior. Lorenz (1937) originally claimed that an imprinted preference was irreversible. Jaynes (1956) pointed out that there are two senses in which imprinting could be irreversible. First, that after

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 371

imprinting a bird will never again direct its filial responses to a novel object. Alternatively, that while a bird can direct its filial responses to a second object, it always retains information about the first object. There is a considerable amount of evidence that imprinting is not irreversible in the first sense (e.g., Klopfer and Hailman 1964a,b; Kopfer 1967; Salzen and Meyer 1968; Kertzman and Demarest 1982). Much evidence supports the second, and weaker, form of the irreversibility claim (for review see Bolhuis and Bateson 1990; Bolhuis 1991). Thus, while imprinted pref- erences can be reversed by prolonged exposure to a second object, a representation of the original object remains. A number of factors affect whether imprinted preferences can be re- versed, and, if so, the extent of reversal. These factors include the length of exposure to the first object experienced by the chick, and length of subsequent exposure to a second object. For example, a prolonged ex- posure to the first object will prevent reversal of preference for a second object (Shapiro and Thurston 1978). With shorter exposure to the first object, a very brief period of exposure to a second object will not result in a reversal of preference, while a longer period of exposure to a sec- ond object will (Salzen and Meyer 1968). Data from Bolhuis and Bateson (19901, shown in Figure 6, illustrate this effect. The chicks were trained for either 3 or 6 days, and preference tests revealed a strong preference for the training object. Then, the chicks were exposed to a second object for 3 or 6 days, and preferences were reversed. The extent to which a reversal occurs depends on the length of exposure to the first stimulus (longer exposure reduces the subsequent extent of reversibility). While Bolhuis and Bateson (1990) did not systematically manipulate the length of exposure to the second stimulus, other studies have shown that this variable influences the degree of reversal (Salzen and Meyer 1968; Ein- siedel 1975).

6.2 Simulation. The implementation of reversibility in the model was straightforward: train on one stimulus for a variable amount of time, then train on a different one for a variable amount of time. For the basic effect, we trained on stimulus A for 100 epochs (100 sweeps across the input layer), and then trained again with stimulus D from various points of training on A. The results are shown in Figure 7a for a network which was exposed to A for 100 epochs, and then exposed to D for up to 300 epochs. Thus, reversibility occurs despite a relatively strong initial preference for A, with D being preferred to A after more than 150 epochs of exposure to D. This finding is consistent with those cited above. Fur- ther, the strength of preference for D increases with longer training on that object. Also shown in this figure is the continued preference for A, the initial training object, over a novel stimulus, B. This finding shows that a representation of the first object still exists, and is consistent with the second, weaker form of irreversibility observed in the chick (Cook 1993).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 372 Randall C. OReilly and Mark H. Johnson

Training Time and Reversibility in the Chick Dab horn Bolbub and Baleson, 1WO 1.0 r 1

7 0.8 - e 0.7 - 0.6 - 05 b' OA -

M6 OD first, 3 on sccond 0.1

0.0 I ' 1 2 Test Period

Figure 6: Data showing an initial preference for the first object after either 3 or 6 days recorded on test 1, and a reversal of this preference after a period of exposure of 3 or 6 days to the other stimulus, as recorded in test 2. A decreased level of reversal is observed with greater exposure to the first object.

That the network displayed reversibility is not terribly surprising, as many neural networks will continue to adapt their weights as the envi- ronment changes. However, unlike many networks that display a "catas- trophic" level of interference from subsequent learning (cf. McCloskey and Cohen 19891, this model retained a relatively intact preference for the initial stimulus over a completely novel one. The explanation for this preserved learning effect will be presented below. While reversibility seems to occur in the network and in chicks, there is also support for the idea of a self-terminating sensitive period where sufficiently long exposure to an imprinting stimulus will prevent the preference from being reversed (e.g., Shapiro and Thurston 1978). Indeed, in the model, a sensitive period is observed with just 25 more epochs of exposure to A before exposing the network to stimulus D. This additional exposure prevents any amount of further training on D from reversing the initial preference for A (Fig. 7b). This kind of sensitive period is self- terminating because it is determined solely by exposure to the object, and not by some other maturational change in the system.

6.3 Discussion. Both the sensitive period effect and the preserved preference effect described above can be explained by an interaction be-

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 373

a) Reversibility of Preference ENmt of Exposure lo D aner 100 Epochs on A

0.70 9- 0.60 ill

0.20" ' ' ' ' ' ' ' ' ' ' ' '1 0 50 100 I50 200 21 300 Epochs of Training on D

b) Self-TerminatingSensitive Period Elfecl or Expmure to D mer 125 Epochs on A 0.80

0.70

h* g 0.60 t $ 050 Y \I 4 2

d OAO 0.30

0.20 J 100 200 300 400 500 600 700 BOO 900 Epochs of Training on D

Figure 7 (a) The basic reversibility effect from training on D after initial im- printing on A. The comparison of the preference for A vs. D shows a reversal (from above 50% preference to below 50% preference). The preference for A over a second object B remains stable despite the training on D. (b) A sensitive period effect from 125 epochs of imprinting on A before exposing to D. The A vs. D preference does not reverse, indicating a preserved preference for A despite up to 900 epochs of exposure to D.

tween the subtractive component of the covariance learning rule and hysteresis effects. Consider a unit in layer 1 of the model network that has become active by the presentation of a stimulus in position 1 of the

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 374 Randall C. OReilly and Mark H. Johnson

input layer. This unit will increment its weights to the feature units that activated it, and to the unit in layer 2 that is also active. Also, according to the subtractive component of the covariance learning rule, it will de- crease its weights to all the other inactive units in the network, including those in the other positions of the same stimulus. When, due to hysteresis, the same layer 1 unit remains active for position 2 of the stimulus, the same subtraction will occur on weights from those input units in position 1 that were just increased on the previous time step, as they will now be inactive. Indeed, if this unit remains active for more than two different positions of an object, the net result will be to decrease its weights to each of these positions more than increase them. With the weight bounding procedure used in the model (see Ap- pendix B), the value of a weight represents the balance of increasing and decreasing forces. Thus, the weights to a unit representing an object over several different positions will reflect the conditional probability that the input unit is active given the unit is (Rumelhart and Zipser 1986): if a unit is active for N positions of an object, the weight from each position will equilibrate around a value of 1/N. Because the weights are initialized to a random value with a mean of 0.5, they will typically be decreasing initially whenever N > 2. This means that a unit that was active for a given object on one epoch will be less likely to be active the next epoch. As a result, a different unit will become active the next time, and adjust its weights for the same object. In this way, the repeated presentation of a given object will result in a recruitment process where multiple units will become tuned to the same object. It should be noted that this process happens gradually, with all of the recruited units taking turns becoming active, even when the system has reached equilibrium (i.e., all weights near their 1/N values). As layer 1 units become recruited, they are also always decreasing their weights to input units with which they are never contemporane- ously active. This makes them not respond to the features corresponding to objects other than the one they have been exposed to, which we refer to as a tuning process. In the presence of multiple objects, the influence of recruitment is balanced by this tuning process, because units tuned to a given object will be unavailable for recruitment for another object. However, when only one object is viewed for a period of time, recruit- ment and tuning work together to cause many units to respond very selectively to the imprinting stimulus. Thus, as training continues on the imprinting stimulus, both the tuning and recruitment effects get stronger, so that by a certain point (125 epochs in the present simulations), a ma- jority of the units become selective to the imprinting stimulus, and are unavailable for recruitment to any new training stimulus because their weights to the other object features are near zero. Once this majority has been established, no amount of exposure to a different stimulus will cause these units to be recruited to the new stimulus, and the balance of preference will not shift. Instead, the retraining will cause the minority

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 375

of initially unrecruited or less strongly recruited units to become selective to the new stimulus. The tuning and recruitment processes also explain the retention of preference for a trained stimulus over a novel one even after retraining. The recruited units, being tuned to a particular object, do not become recruited by the other object, and the original preference for the object is preserved. Thus, the two computationally important and biologically supported features of our model, hysteresis and a covariance Hebbian learning rule, are critical for its self-terminating sensitive period behavior. However, the specific properties of both of these components will affect the quanti- tative nature of the reversibility phenomenon in the chick. In particular, the level of recruitment will affect the exposure duration necessary to terminate the critical period. To demonstrate this, simulations were run that were identical to the previous ones in all respects except for the size of layers 1 and 2, which were doubled. In these simulations, the initial preference for stimulus A was reversible after 150 epochs of exposure to A, but not after 300 epochs of exposure to A. Examination of the devel- opment of the receptive fields for units in layer 1 of this network showed that it was not the number of units which were tuned to stimulus A that changed over time, but rather the contrast between the weights from the imprinted object and the weights from the other objects, supporting the distributed and gradual account of preference formation given above. Other control simulations showed that variables such as the magnitude of the initial random weights were not important for the effect.

6.4 Predictions. While the model's behavior is consistent with the empirical studies described, we state here the clear predictions that the model makes. The falsification of any of these would pose a challenge to the model.

0 The length of time training on the first stimulus will affect the amount of retraining time necessary to reverse the preference for the first stimulus in favor of the second.

0 A sufficiently long period of training on the initial stimulus will pre- vent any reversal effect regardless of how long the second stimulus is presented.

0 A neuron in IMHV that shows a preference for a given stimulus will tend to retain this preference even after retraining. Further, this propensity for retention of initial preference will be correlated with the selectivity and/or strength of the initial preference.

0 More neurons in IMHV will show a strong preference for the first training stimulus after the sensitive period for reversibility than before it.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 376 Randall C. OReilly and Mark H. Johnson

Early on in training, neurons in IMHV should exhibit phasic or noisy correlations (i.e., correlated for a period then not, then corre- lated again) with the stimulus presentation as the balance between weight increase and decrease forces is struck.

7 Generalization

At a behavioral level, a number of studies have shown that generalization from the training stimulus occurs following imprinting on an object to objects having the same shape or color (Jaynes 1956, 1958; Cofoid and Honig 1961). More recently, Bolhuis and Horn (1992) found evidence that generalization varies systematically with stimulus similarity. Generalization occurs in the model because of the tendency of over- lapping input patterns to activate the same units in higher layers accord- ing to the degree of overlap present in the input layer. This is a basic property of most neural network models. The central finding from the simulations is that generalization appears to vary in a nonlinear fashion with respect to stimulus similarity. Thus, for the basic imprinting simu- lation, generalization was present for stimulus AB (a hybrid of A and B, having 2/3 overlap with each), but not for stimulus B, which had only 1/3 overlap with A after initial training on A (see Fig. 8). This nonlinear- ity is due to the lateral inhibition, which causes weakly activated units to become inhibited by more active units. Similar results were found when generalization was tested in the re- versibility simulation. In this case, the initial exposure to A generalized to AB, and not B, and subsequent exposure to D caused the preference for A and AB to reverse. This reversed preference was less than the pref- erence for D over B, indicating that the non-linear generalization from the initial exposure is preserved following reversal (see Fig. 9a). The re- versibility simulations were also run with AB as the retraining stimulus, which is highly similar (2/3 overlap) to the original stimulus A. As can be seen in Figure 9b, the preference for A relative to a novel stimulus (D) increases slightly, rather than decreases after exposure to AB. This follows from the fact that the same units are active in both cases, causing learning to benefit both stimuli. Another interesting effect shown in the figure is that the retraining on AB generalizes to stimulus B, because of B’s 2/3 overlap with AB. Thus, initially quite distinct objects, A and B, can be made to elicit the same preference from the network, and activate the same units, by providing a ’%ridge” between them. Control simulations using different orderings of initial and subsequent stimuli (e.g., exposing to A and B, then AB) indicate that order is not important for this effect. Ryan and Lea (1989) describe findings in the chick consistent with our model’s behavior. They exposed chicks to a stimulus composed of four balls, all one of two colors. One group saw the stimulus gradually change by flipping the color of

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 377

Generalization in Imprinting Eflect d Expasure lo SUmulus A

0.00 I

hc)

0.60

0.20 0 1 SO 75 100 125 150 Epochs of Tralnlng on A

Figure 8: Generalization of imprinting on stimulus A to stimulus AB (2/3 over- lap with A), as revealed by a comparison with the measured preference for D. Note that stimulus B, which has a 1/3 overlap with A, does not experience significant imprinting from exposure to A, revealing a nonlinearity in general- ization.

one ball every 4 days, while the other group's stimulus was changed all at once. The chicks with the gradually changing stimulus developed an equally strong preference for both colors of the stimulus, while the others did not. The authors argue that these chicks had undergone cutegory enlargement. Finally, using stimulus AB for retraining prevented the termination of the critical period for imprinting, since, after retraining on AB, it will be preferred to A even after any number of epochs of initial exposure to A. However, the preference for A did not diminish relative to a novel stimulus (it actually increased slightly), and the level of preference of AB over A was not strong. This is a direct consequence of the fact that many of the same units in the model that were active for A are active for AB. Some evidence from imprinting on highly similar objects, such as two live hens, suggests that preferences remain fluid for a longer period (Kent 1987).

7.1 Predictions. Again, the model's behavior is consistent with the empirical studies described, but we consolidate here the clear predictions that the model makes.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 378 Randall C. OReilly and Mark H. Johnson

a) Generalization in Reversibility ErrKl of Exposure lo D ahr 100 Epocbr on A

0.70

*

0.50 0 b

f 0.40 0.30 HAB (66% MIarIly)vs.

0.20" ' ' . ' ' ' ' ' ' ' ' 'bl 0 so loo I50 200 250 300 Epochs of Training on D

b) Generalization from a Similar Stimulus

e* B (66% aImllarlly dAB)vs. D HCnovel va. D 0.20 I 0 I 15 50 7s 100 12s 150 Epoch of Training on AB

Figure 9: (a) Generalization of the reversibility effect from exposure to D after 100 epochs on A. Stimulus AB (2/3 overlap with A) shows both the initial preference, and the reversal of this preference from exposure to D. Note that stimulus B remains less preferred than A or AB, due to nonlinear generalization effects. (b) Training on a highly similar stimulus (AB)to the original imprinting stimulus A increases the preference of A relative to a novel stimulus (D).While there is a reversal of preference for A relative to AB, this effect is minor relative to the increase in preference compared to the novel stimulus. Also, the AB stimulus, being equally similar to A and B, causes the system to generalize to B as well, which shows a similar level of preference relative to the novel stimulus as A.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 379

0 The shape of the generalization window in chicks will be nonlinear with respect to stimulus similarity, with a rather sharp boundary to the range of stimuli that generalize and those that do not.

0 The generalization of preference should be preserved under reversal due to subsequent training on a novel, dissimilar stimulus.

0 The objects within the generalization window will activate many of the same principal neurons in IMHV, while those not in the window will not. This finding would establish the neural basis of general- ization as predicted from our model.

0 Secondary training on a stimulus that is very similar to the original imprinting stimulus, but also similar to another stimulus, will re- sult in a ”widening” of the imprinting preference, so that all three stimuli will now show a similar level of preference to a novel stim- ulus.

0 Retraining with a stimulus that is sufficiently similar to the original will not result in a decrease in preference relative to a completely novel stimulus, and only a weak reversal of preference over the initial stimulus.

0 The termination of the sensitive period will occur at a variable time depending on the degree of similarity between the retraining stim- ulus and the original imprinting stimulus.

8 Temporal Contiguity and Blending

8.1 Behavior. In a phenomenon related to generalization, a number of authors have reported that blending can occur when two objects are presented in close temporal or spatial contiguity. For example, Chantrey (1974) varied the interstimulus interval (ISI) between the presentation of two objects from 15 sec to 30 min. By subsequently attempting to train chicks on a visual discrimination task involving the two objects he was able to demonstrate that when the objects had been presented in close temporal proximity (under 30 sec gap between them) they became blended together to some extent. This result was replicated by Stewart et al. (1977) who also controlled for the total amount of exposure to the stimuli.

8.2 Simulations. According to our assumptions about the specific nature of the learning taking place in IMHV, temporal contiguity should be a very important variable in determining what the network considers to be the same object. In the “identity from persistence” algorithm, an object is defined as that which coheres over time. If this kind of learn- ing is indeed taking place in IMHV, then one would expect to find the kinds of blending effects found by Chantrey (1974) due to close temporal

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 380 Randall C. OReilly and Mark H. Johnson

contiguity. To simulate the effects of delay in the model, we used the reinitialization of the activation state as a proxy for a delay long enough to allow the neural activation states to decay to values near resting, or at least retain little information about previous states. The exact length of time that this corresponds to in the chick is not known, so we manip- ulated ”delay” as a binary variable. The delay condition networks were reinitialized between each sweep of a stimulus across the input layer, while the no-delay networks were not.5 In the model, we measure blending by looking directly at the unit’s weights to determine what they have encoded. If a unit has weights that have been enhanced from features belonging to two or more different stimuli, then this unit has a representation that blends the distinction between these objects. If the unit only has strong weights from features for one stimulus, then the unit has a representation that differentiates be- tween stimuli. Thus, to measure blending, we simply count the number of units that have nondifferentiating representations. In order to allow for noise in the weights, we take an arbitrary threshold at 0.36788 (l/e) times the strength of the strongest weight into a unit as a cutoff for con- sidering that weight to be enhanced. Figure 10 shows the level of blending for training on stimuli A and D in the delay and no-delay conditions. Clearly, the absence of a delay causes almost total blending in the representations that form, so that the network would be unable, based on the active units, to distinguish whether stimulus A or stimulus D was present in the environment. The explanation for this effect is simple-a different position of a given object and an entirely different object are no different to a naive network. Thus, the intermingling without a delay of two different stimuli causes the network to treat them as different “positions” of the same stimulus. Note that without hysteresis, this effect would not arise, making the blending phenomenon a crucial source of empirical support for our hysteresis- based algorithm. Intuitively, one might expect that the effect of similarity of stimuli on blending would be to increase it, so that more similar objects will be subject to greater blending effects. This was tested in the model by training with stimulus A in conjunction with AB or B, in the delay and no- delay conditions, and comparing the results to those found with stimulus D. Figure 11 shows that the effect of similarity on blending is not the same for both delay and no-delay conditions. In the delay condition, the more similar the stimuli were, the more blending occurred. This is the predicted direction of the effect. However, in the no-delay condition, the more similar the two stimuli, the less blending occurred: the level of blending in the no-delay condition was inversely proportional to the similarity of the stimuli.

5Note that all simulations to this point involving multiple stimuli per epoch have been run with a ”delay.”

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 381

Temporal Contiguity Efht d Delar a(& SUmuli A and D

"i0

Figure 10: Temporal contiguity manipulation (delay vs. no-delay) affects num- ber of blended representations, so that the no-delay condition causes consider- ably increased blending of the two stimuIi. Blending is measured by examining the receptive fields of the units in each of the two layers of the model.

8.3 Discussion. The inverse relationship between blending in the de- lay and no-delay conditions suggests a kind of inoculation effect might be operating. Since the delay condition represents a "best case" for stimulus discriminability, we can think of it as representing the baseline perfor- mance of the network for blending. This baseline is much worse for similar stimuli than dissimilar ones. When networks are run in the no- delay condition, the intrinsic level of blending (as revealed in the delay networks) somehow inoculates the system against the effects of no-delay induced blending6 The explanation for this effect, like that for the re- versibility effects described above, depends critically on hysteresis and the covariance formulation of the learning rule. When the stimuli overlap, units that respond to one stimulus will be more likely to subsequently respond to the other, due to generalization of the learning. However, as was the case with the recruitment process described previously, the more positions of stimuli a unit is active for, the more it decreases its weights to each individual stimulus position that ac- tivates it. Thus, as a unit is active more often with similar input stimuli, it experiences a greater tuning pressure. Given the competitive activation

6Keep in mind that delay and no-delay conditions are run on separate networks, with the delay network serving as a control condition. Thus, the delay condition does not itself inoculate the networks for the no-delay condition, rather it reveals something about the intrinsic blending present in the network.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 382 Randall C. OReilly and Mark H. Johnson

Temporal Contiguity Effect d Inter-Slhalllur SlmUulty 100 I 1 I

lo 0%t (A-D) 33% (A-B) 66% (A-AB) Degree of Slmllarlty

Figure 11: Temporal contiguity manipulation (delay vs. no-delay)interacts with similarity of the stimuli, so that in the delay condition, more similar stimuli (AB, B) show greater blending than dissimilar stimuli (D).The no-delay condition causes increased blending in inverse proportion to the amount of blending in the delay case. These data are the average of both layers 1 and 2, but each layer individually shows the same pattern.

dynamics in the network, a slight imbalance in weight strength between two similar stimuli can result in a unit being active for one stimulus and not the other. Such an imbalance can be caused by the weight tuning process, and once a unit begins to respond preferentially to one stimulus over the other, the imbalance is magnified, resulting eventually in a dif- ferentiated representation for one stimulus over the other. Paradoxically, increasing overlap in the input causes increased pressure to form differ- entiated representations in layers 1 and 2, explaining why the blending in the delay condition for stimuli A-AB is only around 50% even though AB is known to show a high degree of generalization to A. In the no-delay condition for these highly overlapping stimuli, the absence of a delay has relatively little effect because the units are already becoming active for both stimuli even with the delay. Thus, causing them to remain active across stimulus boundaries in the no-delay condition does not alter the balance of tuning and generalization very dramatically, re- sulting in only a small increase in blending for the 2/3 overlap condition. The inoculation analogy is an appropriate one. Think of the similarity of the stimuli and the no-delay manipulation as two similar viruses that cause the same disease, blended representations. These viruses cause the illness (blending), but the similarity virus also arouses natural defenses

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 383

against it (enhanced tuning due to increased activity levels). Thus, in the similar stimuli (A-AB) no-delay condition, the additional virus of the no-delay does not have much of an additional effect, because the nat- ural defenses are engaged by the similarity virus in such a way as to inoculate the system against the effects of this other virus. However, in a system without the similarity virus (e.g., the A-D condition), the no-delay virus causes much more rampant damage because its natural defenses are not otherwise aroused. Finally, the 1/3 overlap condition (A-B) falls somewhere in between these two extremes.

8.4 Predictions.

0 When dissimilar stimuli are used in a temporal blending experi- ment, significant blending will be observed in short delay condi- tions. However, when stimuli that share many visual features are used, evidence for blending will be more difficult to detect. We pro- pose that the difficulties in extending the Chantrey (1974) findings to other stimuli and experimental conditions (Stewart et al. 1977) is due to the relative similarity of the stimuli used.

0 The temporal blending effect should be apparent in neural record- ing studies. With two dissimilar training stimuli, a short temporal gap training condition should result in more cells responding to both stimuli. In contrast, a long temporal gap condition with the same two stimuli should reveal more neurons responsive to one or the other stimulus.

0 Some IMHV neurons initially activated by both of two similar stim- uli should become gradually more sensitive to one or the other as training proceeds, while others should remain active for both. The gradient of overall similarity in IMHV firing patterns should be towards more distinct patterns for the two stimuli as exposure in- creases, reflecting the tuning process responsible for inoculation of similar stimuli at work.

9 Discussion

Through a series of simulations and analysis of the mechanisms behind them, we have been able to evaluate a simple neural network model of invariant object recognition by relating its behavior and neural properties to those of the domestic chick during imprinting. The model exhibits a range of behaviors in different analogs of experimental manipulations from reversibility and sensitive periods to generalization and tempo- ral blending that qualitatively agree with empirical results. It is this range, combined with the counterintuitive predictions regarding tempo- ral blending and its ability to exhibit a self-terminating sensitive period,

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 384 Randall C. OReilly and Mark H. Johnson

that enables the behavioral data to support our theoretical claims. Fur- ther, the relative simplicity of the model has enabled its behaviors to be related to certain properties that the model incorporates from the neu- ronal structure of IMHV. This lends support to the idea that these neu- ral circuit properties have a functional role in object recognition within IMHV corresponding to their role in our model. Aside from the specific predictions made regarding each of the dif- ferent behavioral effects found in imprinting, we feel the model leads to several broader conclusions. To the extent that our account of IMHV fits the behavioral and neural data, it provides support for the compu- tational model of object recognition upon which our account is based. Given the challenging nature of the object recognition problem from a computational standpoint, the existence of a simplified animal model of object recognition would provide a valuable tool for further exploration and understanding of how neural systems recognize objects. Further, the sensitive period issues explored in our model may have utility outside the sphere of visual imprinting in birds. For example, sev- eral authors have pointed out similarities between some of the phenom- ena associated with imprinting, and observations about the development and plasticity of the primate cortex (eg, Rauschecker and Marler 1987). In particular, there may be strong parallels between the reversibility ef- fects examined in this paper, and the extent of recovery of visual acuity following monocular occlusion in the kitten (e.g., Mitchell 1991). Several important assumptions have been made in constructing the model. First, in applying a neural network model of translation invariant object recognition to imprinting phenomena in the chick, we have as- sumed (and presented evidence in support of the claim) that area IMHV is involved in the process of object recognition. Further, we have as- sumed that the response properties of neurons in IMHV are sufficient to determine the preference behavior of chicks, since our preference mea- sures in the model were taken directly from the units corresponding to those in IMHV. However, we do not wish to imply that IMHV is the only neural structure that influences the preferences of chicks. Indeed, it is well established that there are other areas of the chick brain that are involved in both imprinting (Horn 1985; Horn and Johnson 1989) and in passive avoidance learning (Kossut and Rose 1984; Rose 1991). A general issue that arises with a model such as the one presented in this paper concerns the extent to which it can be considered a “neural” model or purely an abstract psychological model. Clearly, the units in our model have nothing like the detail of a real neuron, or even of a relatively simplified point neuron model. For this reason we have been unable to make any quantitative predictions about patterns of firing of neuron types within IMHV, or about the temporal parameters which would produce blending effects. Instead, we have attempted to capture in a simplified computational framework a relatively few critical features of IMHV’s neural structure that we believe relate directly to its role as an

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 385

object recognition system. The resulting model is simple enough to make the analysis of its behavior possible, furthering our understanding of the system and allowing us to make several qualitative predictions that are open to experimental investigation.

Appendix A: IAC Activation Function

The IAC activation function (in a slightly modified form) contains an input and a decay term (with the relative contribution of these two factors controlled by modifying the level of the decay term, a) controlled by a rate parameter A:

Aai = XV(neti) - .(ai - rest)] (A.1) where f (neti) is the following input function:

neti(max - ai) neti > 0 f (neti)= (A.2) neti(ai- min) neti < 0 and net, is the net input to the unit: neti = 1o,w,i + C qwl, (A.3) i IJfi with I indexing over the other units in the same layer with inhibitory connections, and o, representing the positive-only activation value of unit j (0 otherwise).

Appendix B: Hebbian Weight Update Rule

The weight update rule used throughout was as follows: 1 (1.0 - w,,)a, > O,a, > 0 -Awl, = owll a, > O,a, < 0 (B.1) X { otherwise where X is the learning rate. The rule depends on the signs of the pre- and postsynaptic terms (a, and a,, respectively), as the inhibition within a layer will drive all but a single unit into the negative activation range. Without hysteresis, this unit would always be the one with the largest net input from the current input pattern, making this formulation in combination with the lateral inhibition roughly equivalent to the Competitive Learning scheme (Rumelhart and Zipser 1986). The hysteresis can cause an already-active unit that might not have the largest amount of input from the current pattern to remain active, which is the basis of the translation invariance learning, and in this way the system differs from Rumelhart and Zipser (1986). Note that it differs also from the Foldilk (1991) scheme in that the hysleresis or trace component of the activation is implemented in the activation function through the influence of lateral inhibition and mutual excitation, and not directly in the weight update function.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 386 Randall C. OReilly and Mark H. Johnson

Acknowledgments

We would like to thank Jeff Elman, Jay McClelland, and Yuko Munakata for helpful comments on the paper. R. C. OReilly was supported by an ONR graduate fellowship.

Note Added in Proof

Another neural network model on imprinting in the chick has recently come to our attention (Bateson & Horn, in press, Animal Behavior). A comparison between the two models will be the subject of a future paper.

References

Artola, A., Brocher, S., and Singer, W. 1990. Different voltage-dependent thresh- olds for inducing long-term depression and long-term potentiation in slices of rat visual cortex. Nature (London) 347, 69-72. Bateson, I? 1966. The characteristics and context of imprinting. Bid. Rev. 41, 177-220. Bolhuis, J. J. 1991. Mechanisms of avian imprinting: A review. Biol. Rev. 66, 303-345. Bolhuis, J. J., and Bateson, I? 1990. The importance of being first: A primacy effect in filial imprinting. Anim. Behav. 40, 472-483. Bolhuis, J. J., and Horn, G. 1992. Generalization of learned preferences in filial imprinting. Anim. Behav. 44, 185-187. Bolhuis, J. J., Johnson, M. H., Horn, G., and Bateson, P. 1989. Long-lasting effects of IMHV lesions on social preferences in domestic fowl. Behav. Neurosci. 103, 438-441. Bradler, J., and Barrionuevo, G. 1990. Heterosynaptic correlates of long-term potentiation induction in hippocampal CA3 neurons. 35(2), 265-271. Bradley, P., Davies, D., and Horn, G. 1985. Connections of the hyperstriatum ventrale in the domestic chick Gallus domesticus. 1. Anat. 140, 577-589. Brown, M. W., and Horn, G. 1992. Neurones in the intermediate and medial part of the hyperstriatum ventrale (IMHV) of freely moving chicks respond to visual and/or auditory stimuli. 1. Physiol. 452, 102P Bush, P. C., and Douglas, R. J. 1991. Synchronization of bursting discharge in a model network of neocortical neurons. Neural Comp. 3,19-30. Chantrey, D. 1974. Stimulus pre-exposure and discrimination learning by do- mestic chicks: Effect of varying interstimulus time. J. Comp. Physiol. Psychol. 87, 517-525. Cofoid, D., and Honig, W. 1961. Stimulus generalization of imprinting. Science 134, 1692-1694. Collingridge, G., and Bliss, T. 1987. NMDA receptors-their role in long-term potentiation. Trends Neurosci. 10, 288-293.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 387

Cook, S. 1993. Retention of primary preferences after secondary filial imprint- ing. Anim. Behav. 46, 405-407. Davey, J., and Horn, G. 1991. The development of hemispheric asymmetries in neuronal activity in the domestic chick after visual experience. Behazj. Brain Res. 45, 81-86. Davies, D., Taylor, D., and Johnson, M. H. 1988. Restricted hyperstriatal lesions and passive avoidance learning in the chick. J. Neurosci. 8, 4662-4668. Douglas, R. J., and Martin, K. A. C. 1990. Neocortex. In The Synaptic Organization of the Brain, G. M. Shepherd, ed., Chap. 12, pp. 389-438. Oxford University Press, Oxford. Douglas, R. J., Martin, K. A. C., and Whitteridge, D. 1989. A canonical micro- circuit for neocortex. Neural Comp. 1, 480-488. Einsiedel, A. A. 1975. The development and modification of object preferences in domestic white leghorn chicks. Dm.Psychobiol. 8(6), 533-540. Foldiiik, P. 1991. Learning invariance from transformation sequences. Neural Comp. 3(2), 194-200. Frkgnac, Y., Shulz, D., Thorpe, S., and Bienenstock, E. L. 1988. A cellular ana- logue of visual cortical plasticity. Nature (London) 333, 367-370. Hoffman, H., and Ratner, A. 1973. A reinforcement model of imprinting. Psy- chd. Rm. 80,527-544. Hopfield, J. J. 1984. Neurons with graded response have collective computa- tional properties like those of two-state neurons. Proc. Natl. Acad. Sci. U.S.A. 81, 3088-3092. Horn, G. 1985. Memory, Imprinting, and the Brain: An Inquiry into Mechanisms. Clarendon Press, Oxford. Horn, G., Bradley, P., and McCabe, B. J. 1985. Changes in the structure of synapses associated with learning. J. Neurosci. 5, 3161-3168. Horn, G., and Johnson, M. H. 1989. Memory systems in the chick Dissociations and neuronal analysis. Neuropsychologia 27, 1-22. Hubel, D., and Wiesel, T. N. 1962. Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex. J. Physiol. 160, 106-154. Jaynes, J. 1956. Imprinting: The interaction of learned and innate behavior. I. Development and generalization. 1. Comp. Physiol. Psychol. 49, 200-206. Jaynes, J. 1958. Imprinting: The interaction of learned and innate behavior. IV. Generalization and emergent discrimination. J. Comp. Physiol. Psychol. 51, 238-242. Johnson, M. H. 1991. Information processing and storage during filial imprint- ing. In Kin Recognition, P. Hepper, ed., pp. 335-357. Cambridge University Press, Cambridge. Johnson, M. H., and Horn, G. 1986. Dissociation of recognition memory and associative learning by a restricted lesion of the chick forebrain. Neuropsy- chologia 24, 329-340. Johnson, M. H., and Horn, G. 1987. The role of a restricted region of the chick forebrain in the recognition of conspecifics. Behav. Brain Res. 23, 269-275. Johnson, M. H., and Morton, J. 1991. Bioiogy and Cognitive Development: The Case of Face Recognition. Blackwell, Oxford. Kent, J. 1987. Experiments on the relationship between the hen and chick:

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 388 Randall C. OReilly and Mark H. Johnson

The role of the auditory mode in recognition and the effects of maternal separation. Behaviour 102, 1-14. Kertzman, C., and Demarest, J. 1982. Irreversibility of imprinting after active vs. passive exposure to the object. J. Cornp. Physiol. Psychol. 96, 130-142. Klopfer, P. H. 1967. Stimulus preferences and imprinting. Science 156, 1394- 1396. Klopfer, P. H., and Hailman, J. P. 1964a. Basic parameters of following and imprinting in precocial birds. Z. Tierpsychol. 21, 755-762. Klopfer, P. H., and Hailman, J. P. 1964b. Perceptual preferences and imprinting in chicks. Science 145, 1333-1344. Kossut, M., and Rose, S. 1984. Differential 2-deoxyglucose uptake into chick brain structures during passive avoidance training. J. Neurosci. 12, 971-977. Livingstone, M., and Hubel, D. 1988. Segregation of form, color, movement, and depth: Anatomy, physiology, and perception. Science 240, 740-749. Lorenz, K. 1935. Der Kumpan in der des Vogels. I. Ornithol. 83, 137- 213,289-413. Lorenz, K. 1937. The companion in the bird’s world. Auk 54, 245-273. Maunsell, J. H. R., Nealey, T. A., and DePriest, D. D. 1990. Magnocellular and parvocellular contributions to responses in the middle temporal visual area (MT) of the macaque monkey. J. Neurosci. 10(10), 3323-3334. McCabe, B. J., and Horn, G. 1988. Learning and memory: Regional changes in N-methybaspartate receptors in the chick brain. Proc. Natl. Acad. Sci. U.S.A. 85, 2849-2853. McCabe, 8.J., Cipolla-Neto, J., Horn, G., and Bateson, P. 1982. Amnesic effects of bilateral lesions placed in the hyperstriatum ventrale of the chick after imprinting. Exp. Brain Res. 48, 13-21. McClelland, J. L., and Rumelhart, D. E. 1981. An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychol. Rev. 88(5), 375-407. McCloskey, M., and Cohen, N. J. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In The Psychology of Learning and Motivation, G. H. Bower, ed., Vol. 24, pp. 109-164. Academic Press, San Diego, CA. Mitchell, D. E. 1991. The long-term effectiveness of different regimens of occlu- sion on recovery from early monocular deprivation in kittens. Phil. Trans. Royal Soc. (London) B 333, 51-79. OReilly, R. C., and McClelland, J. L. 1992. The self-organization of spatially in- variant representations. Parallel Distributed Processing and Cognitive Neu- roscience PDP.CNS.92.5, Carnegie Mellon University, Department of Psy- chology. Payne, J., and Horn, G. 1984. Differential effects of exposure to an imprinting stimulus on ’spontaneous’ neuronal activity in two regions of the chick brain. Brain Res. 232, 191-193. Poggio, T., Torre, V., and Koch, C. 1985. Computational vision and regulariza- tion theory. Nature (London) 317, 314-319. Rauschecker, J., and Marler, P. 1987. Irnprintingand Cortical Plasticity: Comparative Aspects of Sensitive Periods. Wiley, New York.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021 Object Recognition and Sensitive Periods 389

Rose, S. 1991. How chicks make memories: The cellular cascade from C-fos to dendritic remodelling. Trends Neurosci. 14, 390-397. Rumelhart, D. E., and Zipser, D. 1986. Feature discovery by competitive learn- ing. In Parallel Distributed Processing, Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland and PDP Research Group, eds., Chap. 5, pp. 151-193. MIT Press, Cambridge, MA. Ryan, C., and Lea, S. 1989. Pattern recognition, updating, and filial imprinting in the domestic chicken (gallus gallus). In Models of Behaviour: Behavioural Approaches to Pattern Recognition and Concept Formation. Quantitative Analyses of Behavior, M. Commons, R. Herrnstein, S. Kosslyn, and D. Mumford, eds., Vol. 8, pp. 89-110. Lawrence Erlbaum, Hillsdale, NJ. Salzen, E. A., and Meyer, C. 1968. Reversibility of imprinting. J. Comp. Physiol. 66, 269-275. Sejnowski, T. J. 1977. Storing covariance with nonlinearly interacting neurons. J. Math. Biol. 4, 303-321. Shapiro, L., and Thurston, K. 1978. The effect of enforced exposure to live models on the reversibility. Psychol. Record 28, 479-485. Sluckin, W. 1972. Imprinting and Early Learning, 2nd edn. Methuen, London. Sluckin, W., and Salzen, E. A. 1961. Imprinting and perceptual learning. Q. J. EXP. Psychol. 13, 65-77. Stanton, P. K., and Sejnowski, T. J. 1989. Associative long-term depression in the hippocampus induced by Hebbian covariance. Nature (London)339,215-218. Stewart, D., Capretta, P., Cooper, A., and Littlefield, V. 1977. Learning in do- mestic chicks after exposure to both discriminanda. J. Comp. Physiol. Psychol. 91, 1095-1109. Tombol, T., Csillag, A., and Stewart, M. G. 1988. Cell types of the hyperstria- tum ventrale of the domestic chicken Gallus domesticus: A golgi study. J. Hinforschung 29(3), 319-334. Yuille, A. L. 1990. Generalized deformable models, statistical physics, and matching problems. Neural Comp. 2(1), 1-24.

Received November 16, 1992; accepted September 10, 1993.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/neco.1994.6.3.357 by guest on 28 September 2021