Convolutional Neural Network Models of V1 Responses to Complex Patterns
bioRxiv preprint doi: https://doi.org/10.1101/296301; this version posted April 6, 2018. Made available under a CC-BY-NC-ND 4.0 International license.

Yimeng Zhang∗ · Tai Sing Lee · Ming Li · Fang Liu · Shiming Tang∗

∗ YZ and ST are co-corresponding authors.
YZ and TL: Center for the Neural Basis of Cognition and Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213.
ML, FL, and ST: Peking University School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Beijing 100871, China; IDG/McGovern Institute for Brain Research at Peking University, Beijing 100871, China.

Abstract In this study, we evaluated the convolutional neural network (CNN) method for modeling V1 neurons of awake macaque monkeys in response to a large set of complex pattern stimuli. CNN models outperformed all the other baseline models, such as Gabor-based standard models for V1 cells and various variants of generalized linear models. We then systematically dissected different components of the CNN and found two key factors that made CNNs outperform other models: thresholding nonlinearity and convolution. In addition, we fitted our data using a pre-trained deep CNN via transfer learning. The deep CNN's higher layers, which encode more complex patterns, outperformed lower ones, and this result was consistent with our earlier work on the complexity of the V1 neural code. Our study systematically evaluates the relative merits of different CNN components in the context of V1 neuron modeling.

Keywords convolutional neural network · V1 · nonlinear regression · system identification

1 Introduction

There has been great interest in the primary visual cortex (V1) since pioneering studies decades ago (Hubel and Wiesel, 1968, 1959, 1962). V1 neurons are traditionally classified as simple and complex cells, which are modeled by linear-nonlinear (LN) models (Heeger, 1992) and energy models (Adelson and Bergen, 1985), respectively. However, a considerable gap between the standard theory of V1 neurons and reality has been demonstrated repeatedly, in at least two respects. First, although standard models explain neural responses to simple stimuli such as gratings well, they cannot satisfactorily explain neural responses to more complex stimuli, such as natural images and complex shapes (David and Gallant, 2005; Victor et al., 2006; Hegdé and Van Essen, 2007; Köster and Olshausen, 2013). Second, more sophisticated analysis techniques have revealed richer structures in V1 neurons than those dictated by standard models (Rust et al., 2005; Carandini et al., 2005). As an additional yet novel demonstration of this gap, using large-scale calcium imaging techniques, we (Li et al., 2017; Tang et al., 2018) have recently discovered that a large percentage of neurons in the superficial layers of V1 of awake macaque monkeys respond strongly to highly specific complex features; this finding suggests that some V1 neurons act as complex pattern detectors rather than Gabor-based edge detectors as dictated by classical studies (Jones and Palmer, 1987a; Dayan and Abbott, 2001).

While our previous work (Tang et al., 2018) has shown the existence of complex pattern detector neurons in V1, a quantitative understanding of the relationship between input stimuli and neural responses for those neurons has been lacking. One way to better understand these neurons quantitatively is to build computational models that predict their responses given input stimuli (Wu et al., 2006). If we can find a model that accurately predicts neural responses to (testing) stimuli not used during training, a careful analysis of that model should give us insights into the computational mechanisms of the modeled neuron(s). For example, we can directly examine different components of the model (McIntosh et al., 2017; McFarland et al., 2013; Prenger et al., 2004), find stimuli that maximize the model output (Kindel et al., 2017; Olah et al., 2017), and decompose model parameters into simpler, interpretable parts (Rowekamp and Sharpee, 2017; Park et al., 2013).

A large number of methods have been applied to model V1 neural responses, such as ordinary least squares (Theunissen et al., 2001; David and Gallant, 2005), spike-triggered average (Theunissen et al., 2001), spike-triggered covariance (Touryan et al., 2005; Rust et al., 2005), generalized linear models (GLMs) (Kelly et al., 2010; Pillow et al., 2008), nested GLMs (McFarland et al., 2013), subunit models (Vintch et al., 2015), and artificial neural networks (Prenger et al., 2004). Compared to more classical methods, convolutional neural networks (CNNs) have recently been found to be more effective for modeling retinal neurons (Kindel et al., 2017) and V1 neurons in two studies concurrent to ours (McIntosh et al., 2017; Cadena et al., 2017). In addition, CNNs have been used to explain inferotemporal cortex and some other areas (Yamins et al., 2013; Kriegeskorte, 2015; Yamins and DiCarlo, 2016). Nevertheless, existing studies mostly treat the CNN as a black box, without much analysis of the reasons underlying its success relative to other models, and we try to fill that knowledge gap explicitly in this study.

To understand the CNN's success better, we first evaluated the performance of CNN models, Gabor-based standard models for simple and complex cells, and various variants of GLMs in modeling V1 neurons of awake macaque monkeys in response to a large set of complex pattern stimuli (Tang et al., 2018). We found that CNN models outperformed all the other models, especially for neurons that acted more like complex pattern detectors than Gabor-based edge detectors. We then systematically explored different variants of CNN models in terms of their nonlinear structural components, and found that thresholding nonlinearity and max pooling, especially the former, were important for the CNN's performance. We also found that convolution (spatially shifted filters with shared weights) in the CNN was effective for increasing model performance. Finally, we used a pre-trained deep CNN (Simonyan and Zisserman, 2014) to model our neurons via transfer learning (Cadena et al., 2017), and found that the deep CNN's higher layers, which encode more complex patterns, outperformed lower ones; this result was consistent with our earlier work (Tang et al., 2018) on the complexity of the V1 neural code. While some of our observations have been stated in alternative forms in the literature, we believe that this is the first study that systematically evaluates the relative merits of different CNN components in the context of V1 neuron modeling.
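To make these components concrete, the following is a minimal PyTorch sketch of a one-convolutional-layer CNN of the kind discussed above: convolution (spatially shifted filters with shared weights), a thresholding nonlinearity (ReLU), max pooling, and a linear readout that produces one neuron's predicted response. The channel count, kernel size, pooling size, and class name are illustrative assumptions, not the architecture evaluated in this study.

```python
# Minimal sketch (not the paper's exact architecture) of a one-convolutional-layer
# CNN: convolution (shared, spatially shifted filters), a thresholding nonlinearity
# (ReLU), max pooling, and a linear readout mapping pooled feature maps to one
# neuron's predicted response. Channel count, kernel size, and pool size are
# illustrative assumptions.
import torch
import torch.nn as nn

class V1LikeCNN(nn.Module):
    def __init__(self, n_channels=8, kernel_size=9, pool_size=4):
        super().__init__()
        self.conv = nn.Conv2d(1, n_channels, kernel_size)  # 20x20 input -> 12x12 maps
        self.relu = nn.ReLU()                               # thresholding nonlinearity
        self.pool = nn.MaxPool2d(pool_size)                 # 12x12 -> 3x3 maps
        self.readout = nn.Linear(n_channels * 3 * 3, 1)     # predicted response

    def forward(self, x):            # x: (batch, 1, 20, 20) stimuli
        h = self.pool(self.relu(self.conv(x)))
        return self.readout(h.flatten(1)).squeeze(-1)

model = V1LikeCNN()
responses = model(torch.rand(16, 1, 20, 20))  # 16 dummy stimuli -> 16 predicted responses
```

Dissecting such a model amounts to removing or replacing individual components (for example, the nonlinearity or the pooling) and measuring the change in predictive performance.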
2 Stimuli and neural recordings

2.1 Stimuli

Using two-photon calcium imaging techniques, we collected neural population data in response to a large set of complex artificial "pattern" stimuli. The "pattern" stimulus set contains 9500 binary (black and white) images of about 90 px by 90 px from five major categories: orientation stimuli (OT; bars and gratings), curvature stimuli (CV; curves, solid disks, and concentric rings), corner stimuli (CN; line or solid corners), cross stimuli (CX; lines crossing one another), and composition stimuli (CO; patterns created by combining multiple elements from the first four categories). The last four categories are also collectively called non-orientation stimuli (nonOT). See Figure 1 for some example stimuli. In this study, the central 40 px by 40 px parts of the stimuli were used as model input, as 40 pixels translated to 1.33 degrees in visual angle for our experiments and all recorded neurons had classical receptive fields of diameters well below one degree in visual angle around the stimulus center (Tang et al., 2018). The cropped stimuli were further downsampled to 20 px by 20 px for computational efficiency. Later, we use $x_t$ to represent the $t$-th stimulus as a 20 by 20 matrix, with 0 for background and 1 for foreground (there can be intermediate values due to downsampling), and $\tilde{x}_t$ to denote the vectorized version of $x_t$ as a 400-dimensional vector.
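As a concrete illustration, below is a minimal NumPy sketch of the preprocessing just described: a central 40 px by 40 px crop, downsampling to 20 px by 20 px, and vectorization into a 400-dimensional vector. The block-averaging used for downsampling and all function and variable names are our own assumptions, not the paper's actual pipeline.

```python
# Minimal sketch of the stimulus preprocessing described above, assuming each raw
# stimulus is a 2-D NumPy array with background 0 and foreground 1 (roughly 90x90 px).
# Function and variable names are ours, not from the paper's code.
import numpy as np

def preprocess_stimulus(img, crop=40, out=20):
    """Crop the central `crop` x `crop` patch and downsample it to `out` x `out`."""
    h, w = img.shape
    top, left = (h - crop) // 2, (w - crop) // 2
    patch = img[top:top + crop, left:left + crop].astype(float)
    # Downsample by block-averaging (2x2 blocks for 40 -> 20); averaging binary
    # pixels yields the intermediate values between 0 and 1 mentioned in the text.
    k = crop // out
    x_t = patch.reshape(out, k, out, k).mean(axis=(1, 3))  # 20 x 20 matrix x_t
    return x_t, x_t.ravel()                                # and its 400-d vectorization

x_t, x_t_vec = preprocess_stimulus(np.random.randint(0, 2, size=(90, 90)))
assert x_t.shape == (20, 20) and x_t_vec.shape == (400,)
```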
Stimulus type. Previous work modeling V1 neurons mostly used natural images or natural movies (Kindel et al., 2017; Cadena et al., 2017; David and Gallant, 2005), while we used artificial pattern images (Tang et al., 2018). While neural responses to natural stimuli arguably reflect neurons' true nature better, natural stimuli have the following problems in our current study: 1) public data sets (Coen-Cagli et al., 2015) of V1 neurons typically have much fewer images and neurons than our data set, and limited data may introduce bias into the results; 2) artificially generated images can be easily classified and parameterized, and this convenience allows us to classify neurons and compare models over different neuron classes separately (Section 2.2). While white noise stimuli (Rust et al., 2005; McIntosh et al., 2017) are another option, we empirically found that white noise stimuli (when limited) would not be feasible for finding