Published as a conference paper at ICLR 2021

GENERALIZATION IN DATA-DRIVEN MODELS OF PRIMARY VISUAL CORTEX

Konstantin-Klemens Lurz,1,2,* Mohammad Bashiri,1,2 Konstantin Willeke,1,2 Akshay K. Jagadish,1 Eric Wang,4,5 Edgar Y. Walker,1,2,4,5 Santiago A. Cadena,2,3 Taliah Muhammad,4,5 Erick Cobos,4,5 Andreas S. Tolias,4,5 Alexander S. Ecker,6 Fabian H. Sinz,1-5,**

1 Institute for Bioinformatics and Medical Informatics, University of Tübingen, Germany
2 International Max Planck Research School for Intelligent Systems, Tübingen, Germany
3 Bernstein Center for Computational Neuroscience, University of Tübingen, Germany
4 Department for Neuroscience, Baylor College of Medicine, Houston, TX, USA
5 Center for Neuroscience and Artificial Intelligence, Baylor College of Medicine, Houston, TX, USA
6 Department of Computer Science / Campus Institute Data Science, University of Göttingen, Germany

*[email protected], **[email protected]

ABSTRACT

Deep neural networks (DNNs) have set new standards at predicting responses of neural populations to visual input. Most such DNNs consist of a convolutional network (core), shared across all neurons, which learns a representation of neural computation in visual cortex, and a neuron-specific readout that linearly combines the relevant features in this representation. The goal of this paper is to test whether such a representation is indeed generally characteristic for visual cortex, i.e. generalizes between animals of a species, and what factors contribute to obtaining such a generalizing core. To push all non-linear computations into the core, where the generalizing cortical features should be learned, we devise a novel readout that reduces the number of parameters per neuron in the readout by up to two orders of magnitude compared to the previous state-of-the-art. It does so by taking advantage of retinotopy and learning a Gaussian distribution over the neuron's receptive field position. With this new readout we train our network on neural responses from mouse primary visual cortex (V1) and obtain a gain in performance of 7% compared to the previous state-of-the-art network. We then investigate whether the convolutional core indeed captures general cortical features by using the core in transfer learning to a different animal. When transferring a core trained on thousands of neurons from various animals and scans, we exceed the performance of training directly on that animal by 12%, and outperform a commonly used VGG16 core pre-trained on ImageNet by 33%. In addition, transfer learning with our data-driven core is more data-efficient than direct training, achieving the same performance with only 40% of the data. Our model with its novel readout thus sets a new state-of-the-art for neural response prediction in mouse visual cortex from natural images, generalizes between animals, and captures characteristic cortical features better than current task-driven pre-training approaches such as VGG16.
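To make the readout concrete before the main text, here is a minimal PyTorch sketch of a Gaussian readout of the kind the abstract describes. The class and parameter names are our own illustrative assumptions, and we use a diagonal covariance for brevity; this is a sketch, not the authors' exact implementation. Each neuron owns a receptive-field mean, a spread, and one feature-weight vector; during training a position is sampled from the neuron's Gaussian and the core's feature vector is bilinearly read out at that location.

import torch
import torch.nn.functional as F
from torch import nn

class GaussianReadout(nn.Module):
    # Per neuron: a 2D Gaussian over receptive-field position plus one
    # feature-weight vector -- far fewer parameters than a dense spatial mask.
    def __init__(self, n_neurons, n_channels):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_neurons, 2))         # RF centers in [-1, 1]^2
        self.log_sigma = nn.Parameter(torch.zeros(n_neurons, 2))  # per-axis spread (log scale)
        self.features = nn.Parameter(torch.randn(n_neurons, n_channels) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_neurons))

    def forward(self, core_out):  # core_out: (batch, channels, H, W)
        batch = core_out.shape[0]
        if self.training:  # sample the readout position (reparameterization trick)
            pos = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        else:              # at test time, read out at the mean position
            pos = self.mu
        # grid_sample expects a sampling grid of shape (batch, H_out, W_out, 2)
        grid = pos.clamp(-1, 1).expand(batch, -1, -1).unsqueeze(2)  # (batch, n_neurons, 1, 2)
        feats = F.grid_sample(core_out, grid, align_corners=True)   # (batch, C, n_neurons, 1)
        feats = feats.squeeze(-1).permute(0, 2, 1)                  # (batch, n_neurons, C)
        return F.elu((feats * self.features).sum(-1) + self.bias) + 1  # non-negative rates

Under this sketch each neuron costs 2 (mean) + 2 (spread) + C (feature weights) + 1 (bias) parameters, instead of an H x W x C spatial-feature map, which is where a per-neuron reduction of up to two orders of magnitude can come from.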
1 INTRODUCTION

A long-standing challenge in sensory neuroscience is to understand the computations of neurons in the visual system stimulated by natural images (Carandini et al., 2005). Important milestones towards this goal are general system identification models that can predict the response of large populations of neurons to arbitrary visual inputs. In recent years, deep neural networks have set new standards in predicting responses in the visual system (Yamins et al., 2014; Vintch et al., 2015; Antolík et al., 2016; Cadena et al., 2019a; Batty et al., 2016; Kindel et al., 2017; Klindt et al., 2017; Zhang et al., 2018; Ecker et al., 2018; Sinz et al., 2018) and have shown the ability to yield novel response characterizations (Walker et al., 2019; Bashivan et al., 2019; Ponce et al., 2019; Kindel et al., 2019; Ukita et al., 2019).

Such a general system identification model is one way for neuroscientists to investigate the computations of the respective brain areas in silico. Such in silico experiments offer the possibility of studying the system at a scale and level of detail that is impossible in real experiments, which have to cope with limited experimental time and adaptation effects in neurons. Moreover, all parameters, connections, and weights in an in silico model can be accessed directly, opening up the opportunity to manipulate the model or to determine its detailed tuning properties using numerical optimization methods. For the results of analyses performed on an in silico model to be reliable, however, one needs to make sure that the model does indeed replicate the responses of its biological counterpart faithfully. This work provides an important step towards obtaining such a generalizing model of mouse V1.

High-performing predictive models need to account for the increasingly nonlinear response properties of neurons along the visual hierarchy. As many of the nonlinearities are currently unknown, one of the key challenges in neural system identification is to find a good set of characteristic nonlinear basis functions, so-called representations. However, learning these complex nonlinearities from single-neuron responses is difficult given limited experimental data. Two approaches have proven promising in the past. Task-driven system identification networks rely on transfer learning and use nonlinear representations pre-trained on large datasets for standard vision tasks, such as object recognition (Yamins & DiCarlo, 2016). Single-neuron responses are predicted from a particular layer of a pre-trained network using a simple readout mechanism, usually an affine function followed by a static nonlinearity. Data-driven models share a common nonlinear representation among hundreds or thousands of neurons and train the entire network end-to-end on stimulus-response pairs from the experiment. Because the nonlinear representation is shared, it is trained via massive multi-task learning (one neuron, one task) and can be learned even from limited experimental data. Task-driven networks are appealing because they only need to fit the readout mechanisms on top of a given representation and are thus data-efficient in terms of the number of stimulus-response pairs needed to achieve good predictive performance (Cadena et al., 2019a).
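As a reference point for the task-driven approach just described, the sketch below shows a typical setup in PyTorch. The layer index, names, and the use of torchvision's VGG16 are our illustrative assumptions: the ImageNet-pretrained features are frozen and only an affine readout followed by a static nonlinearity is fit to the recorded neurons.

import torch
from torch import nn
from torchvision.models import vgg16

class TaskDrivenModel(nn.Module):
    # Frozen task-pretrained core; only a per-neuron affine readout
    # followed by a static nonlinearity is fit to neural data.
    def __init__(self, n_neurons, layer=16):
        super().__init__()
        self.core = vgg16(pretrained=True).features[:layer]  # ImageNet-pretrained features
        for p in self.core.parameters():
            p.requires_grad = False              # the representation stays fixed
        self.readout = nn.LazyLinear(n_neurons)  # affine map from flattened features

    def forward(self, images):                   # images: (batch, 3, H, W)
        with torch.no_grad():
            feats = self.core(images)
        return nn.functional.softplus(self.readout(feats.flatten(1)))  # static nonlinearity

In practice this dense affine map is heavily regularized or factorized, since flattened feature maps are large; the Gaussian readout sketched earlier is one way to collapse it to a handful of parameters per neuron.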
Moreover, as their representations are obtained independently of the neural data, a good predictive performance suggests that the nonlinear features are characteristic for a particular brain area. This additionally offers the interesting normative perspective that the functional representations in deep networks and biological vision could be aligned by common computational goals (Yamins & DiCarlo, 2016; Kell et al., 2018; Kubilius et al., 2018; Nayebi et al., 2018; Sinz et al., 2019; Güçlü & van Gerven, 2014; Kriegeskorte, 2015; Khaligh-Razavi & Kriegeskorte, 2014; Kietzmann et al., 2019). In order to quantify the fit of this normative hypothesis, it is important to compare a given representation to other alternatives (Schrimpf et al., 2018; Cadena et al., 2019a). However, while representations pre-trained on ImageNet are the state-of-the-art for predicting visual cortex in primates (Cadena et al., 2019a; Yamins & DiCarlo, 2016), recent work has demonstrated that pre-training on object categorization (VGG16) yields no benefit over random initialization for mouse visual cortex (Cadena et al., 2019b). Since a random representation should not be characteristic for a particular brain area, and other tasks that might yield more meaningful representations have not been found yet, this raises the question of whether there are better ways to obtain a generalizing nonlinear representation for mouse visual cortex.

Here, we investigate whether such a generalizing representation can instead be obtained from data-driven networks. For this purpose, we develop a new data-efficient readout which is designed to push nonlinear computations into the core, and we test whether this core has learned general characteristic features of mouse visual cortex by applying the same criterion as for the task-driven approach: the ability to predict a population of unseen neurons in a new animal (transfer learning). Specifically, we make the following contributions:

1. We introduce a novel readout mechanism that keeps the number of per-neuron parameters at a minimum and learns a bivariate Gaussian distribution for the readout position from anatomical data using retinotopy. With this readout alone, we surpass the previous state-of-the-art performance in direct training by 7%.
2. We demonstrate that a representation pre-trained on thousands of neurons from various animals generalizes to neurons from an unseen animal (transfer learning; see the sketch after this list). It exceeds the direct training condition by another 11%, setting a new state-of-the-art, and outperforms a task-driven representation trained on object recognition by about 33%.
3. We then show that this generalization can be attributed to the representation and not the readout mechanism, indicating that the data-driven core indeed captures generalizing features of cortex: a representation trained on a single experiment (4.5k examples) in combination with a readout trained on anatomically matched neurons from four experiments (17.5k examples) did not achieve this performance.
4. Lastly, we find that transfer learning with our data-driven core is more data-efficient than direct training, achieving the same performance with only 40% of the data.
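The transfer condition in contribution 2 reduces to a short recipe, sketched below under our own assumptions (the module interfaces match the earlier sketches; the Poisson objective is a standard choice for predicting non-negative firing rates, not necessarily the authors' exact loss): freeze the core pre-trained on other animals and fit only a fresh Gaussian readout on the new animal's stimulus-response pairs.

import torch

def transfer_to_new_animal(pretrained_core, new_readout, loader, epochs=100):
    # Freeze a core pre-trained on other animals; fit only the new readout.
    for p in pretrained_core.parameters():
        p.requires_grad = False
    pretrained_core.eval()
    opt = torch.optim.Adam(new_readout.parameters(), lr=1e-3)
    for _ in range(epochs):
        for images, responses in loader:   # stimulus-response pairs, new animal
            with torch.no_grad():
                feats = pretrained_core(images)
            pred = new_readout(feats)      # e.g. the GaussianReadout sketched above
            loss = (pred - responses * torch.log(pred + 1e-8)).mean()  # Poisson loss
            opt.zero_grad()
            loss.backward()
            opt.step()

Because only the readout's few parameters per neuron are trained, far fewer stimulus-response pairs are needed, consistent with the 40% data-efficiency result in contribution 4.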