Herbert Jaeger

Neural Networks (AI) (WBAI028-05)
Lecture Notes

V 1.5, May 16, 2021 (revision of Section 8)

BSc program in Artificial Intelligence
Rijksuniversiteit Groningen, Bernoulli Institute

Contents

1 A very fast rehearsal of machine learning basics
   1.1 Training data
   1.2 Training objectives
   1.3 The overfitting problem
   1.4 How to tune model flexibility
   1.5 How to estimate the risk of a model
2 Feedforward networks in machine learning
   2.1 The Perceptron
   2.2 Multi-layer perceptrons
   2.3 A glimpse at deep learning
3 A short visit in the wonderland of dynamical systems
   3.1 What is a “dynamical system”?
   3.2 The zoo of standard finite-state discrete-time dynamical systems
   3.3 Attractors, Bifurcation, Chaos
   3.4 So far, so good ...
4 Recurrent neural networks in deep learning
   4.1 Supervised training of RNNs in temporal tasks
   4.2 Backpropagation through time
   4.3 LSTM networks
5 Hopfield networks
   5.1 An energy-based associative memory
   5.2 HN: formal model
   5.3 Geometry of the HN state space
   5.4 Training a HN
   5.5 Limitations
   5.6 Miscellaneous notes
6 Moving toward Boltzmann machines
   6.1 The Boltzmann distribution
   6.2 Sampling algorithms
   6.3 The Metropolis algorithm
   6.4 Simulated annealing: principle
   6.5 Simulated annealing: examples
7 The Boltzmann machine
   7.1 Architecture
   7.2 The stochastic dynamics of a BM
   7.3 The learning task
   7.4 The learning algorithm
   7.5 The restricted Boltzmann machine
8 Reservoir computing
   8.1 A basic demo
   8.2 RC in practice
   8.3 Online reservoir training
   8.4 The echo state property
   8.5 Physical reservoir computing
A Elementary mathematical structure-forming operations
   A.1 Pairs, tuples and indexed families
   A.2 Products of sets
   A.3 Products of functions
B Joint, conditional and marginal probabilities
C The argmax operator
D The softmax function
E Expectation, variance, covariance, and correlation of numerical random variables
Bibliography

Fasten your seatbelts – we enter the world of neural networks

Let’s pick up some obvious facts from the surface and take a look at the bottomless voids that open up underneath them. What makes you YOU is your brain. Your brain is a neural network. A neural network is a network made of neurons which connect to each other by synaptic links. Thus, one of the big questions of science and life is how YOU are a NETWORK of interconnected NEURONS.
The answer seemed clear enough 77 years ago for the pioneers of what we now call computational neuroscience. In the kick-start work on neural networks (McCulloch and Pitts, 1943), a neuron x was cast as a binary switch that could have two states — call them 0 and 1, or false and true — and this neuron x switches its state depending on the 0-1 states of the neurons y1, ..., yk which have synaptic links to x. A brain was thus seen as a Boolean circuit (a minimal code sketch of such a binary switching unit is given after the three points below). The final sentence in that paper is “Thus in psychology, introspective, behavioristic or physiological, the fundamental relations are those of two-valued logic.” In other words, brains (and you) are digital computers.

But. What followed is 77 years of scientific and philosophical dispute, sometimes fierce, and with no winners to the present day. The more we have been learning about neurons and brains and humans and robots and computers, the more confusing the picture became. As of now (the year 2020), it is not clear what a NEURON, a NETWORK, or a YOU really is:

NEURONS. Biological neurons are extremely complicated physico-chemical objects, with hundreds of fast and slow chemical and genetic processes interacting with each other in a tree-like 3D volume with thousands of branches and roots. Even a supercomputing cluster cannot simulate in real time all the detail of the nonlinear dynamical processes happening in a single neuron. It is a very much unanswered question whether this extreme complexity of a biological neuron is necessary for the functioning of brains, or whether it is just one of the whims of evolution and one could build equally good brains with much, much simpler neurons.

NETWORKS. It is generally believed that the power of brains arises from their synaptic interconnectivity architecture. A brain is a highly organized society (Minsky, 1987) of neurons, with many kinds of communication channels, local communication languages and dialects, kings and workers and slaves. However, the global, total blueprint of the human brain is not known. It is very difficult to experimentally trace neuron-to-neuron connections in biological brains. Furthermore, some neuron A can connect to another neuron B in many ways, with different numbers and kinds of synapses, attaching to different sites on the target neuron. Most of this connectivity detail is almost impossible to observe experimentally. The Human Connectome Project (https://en.wikipedia.org/wiki/Human_Connectome_Project), one of the largest national U.S. research programs in recent years, invested a gigantic concerted effort to find out more and found that progress is slow.

YOU. An eternal, and unresolved, philosophical and scientific debate is about whether YOU can be reduced to the electrochemical mechanics of your brain. Even if one assumes that a complete, detailed physico-chemical model of your brain were available (it isn’t), such that one could run a realistic, detailed simulation of your brain on a supercomputer (one can’t), would this simulation explain YOU - entirely and in your essence? The riddles here arise from phenomena like consciousness, the subjective experience of qualia (that you experience “redness” when you see a strawberry), free will and other such philosophical bummers. Opinions are split between researchers / philosophers who, on the one side, claim that everything about you can be explained by a reduction to physico-chemical-anatomical detail, and who, on the other side, claim that this is absolutely impossible. Both sides have very good arguments.
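To make the McCulloch-Pitts picture concrete, here is a minimal sketch of such a binary switching unit in Python. It uses the common textbook simplification of a weighted sum compared against a threshold; the function name mcp_neuron and the concrete weights and thresholds are illustrative assumptions, not the notation of the original 1943 paper.

    # A minimal sketch of a McCulloch-Pitts style binary threshold neuron.
    # This is the common textbook simplification (weighted sum vs. threshold);
    # names, weights and thresholds below are illustrative assumptions only.

    def mcp_neuron(inputs, weights, threshold):
        """Return 1 if the weighted sum of the 0-1 inputs reaches the threshold, else 0."""
        activation = sum(x * w for x, w in zip(inputs, weights))
        return 1 if activation >= threshold else 0

    # Such units can realize Boolean gates:
    print(mcp_neuron([1, 1], [1, 1], threshold=2))   # AND(1, 1) -> 1
    print(mcp_neuron([1, 0], [1, 1], threshold=2))   # AND(1, 0) -> 0
    print(mcp_neuron([0, 1], [1, 1], threshold=1))   # OR(0, 1)  -> 1

Wired together, units of this kind compute Boolean functions, which is why, in this view, a brain built from them looks like a Boolean circuit.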
When you start reading the literature on these debates, you spiral into vertigo. And, on top of that, let me add that it is also becoming unclear again these days what a COMPUTER is (I’ll say more about that in the last lecture of this course).

Given that the scientific study of neural networks is so closely tied up with fundamental questions about ourselves, it is no wonder that enormous intellectual energies have been spent in this field. Over the decades, neural network research has produced a dazzling zoo of mathematical and computational models. They range from detailed accounts of the functioning of small neural circuits, comprising a few handfuls of neurons (celebrated: the modeling of a 30-neuron circuit in crustaceans (Marder and Calabrese, 1996)), to global brain models of grammatical and semantic language processing in humans (Hinaut et al., 2014); from low-detail models of single neurons (the 1963 Nobel Prize for Alan Hodgkin and Andrew Huxley for an electrical-engineering-style formula describing the overall electrical dynamics of a neuron in good approximation (Hodgkin and Huxley, 1952)), to supercomplex geometrical-physiological compartment models on the high-detail side (Gouwens and Wilson, 2009); from statistical-physics-oriented models that can “only” explain how a piece of brain tissue maintains an average degree of activity (van Vreeswijk and Hansel, 2001), to AI-inspired models that attempt to explain every millisecond in speech understanding (Shastri, 1999); from models that aim to capture biological brains but thereby become so complex that they give satisfactory simulation results only if they are super delicately fine-tuned (Freeman, 1987), to the super general and flexible and robust neural learning architectures that have made deep learning the most powerful tool of modern machine learning applications (Goodfellow et al., 2016); or from models springing from a few most elegant mathematical principles like the Boltzmann machine (Ackley et al., 1985), to complex neural architectures that are intuitively put together by their inventors and which function well but leave the researchers without a glimmer of hope for mathematical analysis (for instance the “neural Turing machine” of Graves et al. (2014)).

In this course I want to unfold for you this world of wonder. My goal is to make you aware of the richness of neural network research, and of the manifold perspectives on cognitive processes afforded by neural network models. A student of AI should, I am convinced, be able to look at “cognition” from many sides — from application-driven machine learning