Towards Building a Neural Networks Community
Editorial, Neural Networks 23 (2010) 1135–1138

December 31, 2010 is my last day as the founding editor-in-chief of Neural Networks and also my 71st birthday! Such vast changes have occurred in our field since the first issue of Neural Networks was published in 1987 that it seemed appropriate to my editorial colleagues for me to reflect on some of the changes to which my own efforts contributed. Knowing where we as a community came from may, in addition to its intrinsic historical interest, be helpful towards clarifying where we hope to go and how to get there. My comments will necessarily be made from a personal perspective.

The founding of Neural Networks is intimately linked to the founding of the International Neural Network Society (INNS; http://www.inns.org) and of the International Joint Conference on Neural Networks (IJCNN; http://www.ijcnn2011.org). These events, in turn, helped to trigger the formation of other groups of scientists and engineers who are interested in our field, as well as other conferences to which they could contribute to share their ideas.

There was little in the way of infrastructure in our community thirty years ago. It was not obvious where to submit a modeling article for publication, or what conferences to attend where one could report work to interested colleagues. It was also not clear where one could go to school to learn how to model, or even to learn about what was known in the field.

How did I come to be in a position to contribute to these developments? That happened in gradual stages over a period of over fifty years. My own beginnings in our field started accidentally in 1957, thirty years before Neural Networks was founded, when I was a 17 year-old Freshman at Dartmouth College. At that time, like thousands of other Freshmen, I took an introductory psychology course. The accident was my exposure to classical data about how humans and animals learn about the world. To me, these data were filled with philosophical paradoxes. And 17 year-olds of a certain disposition are particularly vulnerable to philosophical paradoxes!

For example, data about serial verbal learning in humans – that is, the learning of lists of language or action items through time – seemed to suggest that events can go "backwards in time". Indeed, the non-occurrence of future items in a list to be learned can dramatically change the distribution of performance errors at all previously occurring list positions. The so-called "bowed serial position curve" showed that the beginning and the end of a list can be learned more easily than the middle, with error gradients in performance that point forwards or backwards in time at the beginning or end of a list, and error gradients in the middle pointing in both directions. In what sort of space and time could events go backwards in time? What type of dynamics could represent the non-occurrence of an event? Exposure to such paradoxes triggered a passionate intellectual excitement that I had never before experienced.

Data about reinforcement learning in animals and about cognitive dissonance in humans kindled a similar passion. What sort of space and time representations could describe both a thought and a feeling? How could thoughts and feelings interact to guide our decisions and actions to realize valued goals? Just as in the case of serial learning, the non-occurrence of an event could have dramatic consequences. For example, the non-occurrence of an expected reward can trigger reset of cognitive working memory, emotional frustration, and exploratory motor activity to discover new ways to get the reward. What is an expectation? What is a reward? What type of dynamics could unify the description of cognition, emotion, and action? Within the reinforcement learning literature, there were also plenty of examples of irrational behaviors. If evolution selects successful behaviors, then why are so many behaviors irrational?

Such issues are still of current research interest. And all of them led me to the same theoretical method and modeling framework. The theoretical method, which I called the Method of Minimal Anatomies, showed how analysis of how an individual's behavior adapts autonomously in real time to a changing world can lead to models of how the brain works. In other words, brains embody a natural computational architecture for autonomous adaptation in real time to a changing world. By bridging the gap between behavioral experience and brain dynamics, the method opened a path towards solving the mind-body problem.

This method led me to introduce networks of neurons in which short-term memory (STM), medium-term memory (MTM), and long-term memory (LTM) traces exist. The STM and LTM traces are also known as cell activities, or potentials, and adaptive weights. In such a network, backward effects in time became easy to understand. These traces were described within Additive and Shunting Models that lie at the core of essentially all contemporary connectionist models of mind and brain. The Additive Model is now sometimes called the Hopfield model, based on Hopfield's popular 1984 article on this topic. When I introduced the model in 1957–58, both the paradigm and the equations were new, and even revolutionary.
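For reference, the standard forms of these two STM equations can be written as follows. The notation here is generic rather than quoted from any particular paper: x_i is the activity of cell i, I_i and J_i are excitatory and inhibitory inputs, f_j and g_j are signal functions, and z_{ji} and Z_{ji} are adaptive weights on the excitatory and inhibitory pathways from cell j to cell i.

```latex
% Additive Model: inputs and feedback signals add linearly to the rate of change of x_i
\[
  \frac{dx_i}{dt} = -A_i x_i + \sum_{j=1}^{n} f_j(x_j)\, z_{ji} + I_i
\]

% Shunting Model: excitatory and inhibitory terms multiply (B_i - x_i) and (x_i + C_i),
% which keeps each activity x_i automatically bounded within [-C_i, B_i]
\[
  \frac{dx_i}{dt} = -A_i x_i
    + (B_i - x_i)\Bigl[\, I_i + \sum_{j=1}^{n} f_j(x_j)\, z_{ji} \Bigr]
    - (x_i + C_i)\Bigl[\, J_i + \sum_{j=1}^{n} g_j(x_j)\, Z_{ji} \Bigr]
\]
```

The multiplicative, or shunting, terms in the second equation are what give it its automatic gain control and its bounded activities.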
The LTM laws to which I was led included gated steepest descent learning laws that allow learned increases and decreases in weights, in order to learn spatially distributed weight patterns. From the outset, it was clear that the Hebb hypothesis that learned weights can only increase was incorrect. When in the 1970s von der Malsburg and I introduced self-organizing maps, such a gated steepest descent law was used in it, and Kohonen adapted this law in his applications of self-organizing maps in the 1980s.

The MTM law described how signals can be gated, or multiplied, by chemical transmitters or post-synaptic sites subject to activity-dependent habituation. This law started to become broadly used in the 1990s due to the work of Abbott, Markram, and their colleagues, often under the name of synaptic depression.

Thus, while I was in college, it became clear that, to link mind and brain, one needed nonlinear neural networks operating on multiple temporal and spatial scales. There was nothing like this going on in psychology departments then, which is why, with some trepidation, I decided to get my Ph.D. in mathematics. When I graduated from Dartmouth in 1961, the best research group in mathematical psychology, and one of the best departments in applied mathematics, were at Stanford University. I therefore went to Stanford to begin my graduate studies. While at Stanford, I tried to get my nonlinear neural networks simulated with the help of one of the best programmers at Stanford's computation center, without success.

Computers in those days were not up to the task of simulating nonlinear neural networks operating on multiple time scales. The learning in such networks made them nonstationary in time. The long-range interactions between neurons could be said to be nonlocal. In other words, computers in those days had problems simulating large networks of neurons undergoing nonlinear, nonstationary, and nonlocal interactions—what I called the Three N's of brain dynamics. My mathematical training helped me to overcome this roadblock by giving me enough tools to prove foundational theorems in the 1960s and 1970s about how STM, MTM, and LTM interact, including the first global theorems about content addressable neural memories, and theorems about how the bowed serial position curve arises.
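Today, of course, a small network of this kind can be integrated numerically in a few lines of code. The sketch below is only an illustration of the three interacting time scales, not a model taken from this editorial: it combines a shunting STM equation, a habituative MTM transmitter gate, and a gated steepest descent LTM law in their standard forms, with an arbitrary signal function, parameters, and connectivity.

```python
# Illustrative sketch: a small network whose STM, MTM, and LTM traces
# evolve on fast, medium, and slow time scales.  The equations follow
# standard shunting / habituative-gate / gated-steepest-descent forms;
# the signal function, parameters, and connectivity are arbitrary choices.
import numpy as np

def f(x):
    # Rectified, faster-than-linear signal function.
    return np.maximum(x, 0.0) ** 2

def simulate(I, steps=5000, dt=0.001):
    n = I.size
    x = np.zeros(n)               # STM traces: cell activities (fast)
    y = np.ones(n)                # MTM traces: transmitter gates, fully accumulated at start
    z = np.full((n, n), 0.1)      # LTM traces: adaptive weights (slow)
    A, B = 1.0, 1.0               # passive decay rate and excitatory saturation level
    eps_y, eps_z = 0.05, 0.005    # MTM and LTM change more slowly than STM

    for _ in range(steps):
        s = f(x) * y                               # transmitter-gated output signals
        exc = I + z.T @ s                          # on-center: external input plus learned feedback
        inh = s.sum() - s                          # off-surround: signals from the other cells
        dx = -A * x + (B - x) * exc - x * inh      # shunting STM equation (activities stay in [0, B])
        dy = eps_y * ((1.0 - y) - 2.0 * f(x) * y)  # MTM: recover toward 1, habituate with use
        # LTM: gated steepest descent.  The weight z[j, i] changes only while
        # cell i is active, and then tracks the presynaptic signal f(x_j),
        # so it can both increase and decrease (unlike a purely Hebbian law).
        dz = eps_z * f(x)[None, :] * (f(x)[:, None] - z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return x, y, z

if __name__ == "__main__":
    x, y, z = simulate(np.array([0.2, 0.5, 0.3]))
    print("STM activities:", np.round(x, 3))
    print("MTM gates:     ", np.round(y, 3))
    print("LTM weights:\n", np.round(z, 3))
```

Because the three rate constants differ by roughly an order of magnitude each, the activities settle quickly, the transmitter gates habituate more slowly, and the weights drift slowly toward the gated signal pattern.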
The theoretical method also facilitated technology transfer to applications in neuromorphic engineering and technology. It did this by clarifying how brain mechanisms give rise to behavioral functions while an individual autonomously adapts to a changing world.

… scientific history to try to understand why it seemed so hard for people to understand discoveries which, at least to me, seemed natural and intuitively appealing. I gradually began to realize that I was contributing to a major scientific revolution whose groundwork was laid by great nineteenth-century physicists such as Hermann von Helmholtz, James Clerk Maxwell, and Ernst Mach. These interdisciplinary scientists were physicists as well as psychologists and physiologists. Their discoveries made clear that understanding the Three N's of the brain would require new intuitions and mathematics. As a result, the next generation of physicists exclusively pursued the major new paradigms of relativity theory and quantum theory, which could build on great new physical intuitions that were supported by known mathematics. Psychologists and physiologists were left with inadequate intuitions and mathematics with which to explain their data. Given inadequate tools with which to understand the Three N's, a century of controversy, along with anti-theoretical prejudices, unfolded in the mind-brain sciences as they collected huge, but typically unexplained, data bases.

Although these historical insights helped me to understand, and emotionally cope with, the frightening social forces to which I was exposed as a young man, it was still difficult to deal with them. Had I not been first in my class in school for many years, and were it not a time when the United States was investing large sums of money in student fellowships, I may not have survived the social pressures against the kind of research that I was doing.

This loneliness, combined with my passionate belief in the importance of this new paradigm shift, drove me to do everything that I could, when opportunities presented themselves, to build a community wherein future students and practicing scientists could readily learn about our field, as if it were the most natural