
Machine learning with neural networks An introduction for scientists and engineers

BERNHARD MEHLIG

Department of Physics University of Gothenburg Göteborg, Sweden 2021

ACKNOWLEDGEMENTS

This textbook is based on lecture notes for the course Artificial Neural Networks that I have given at Gothenburg University and at Chalmers Technical University in Gothenburg, Sweden. When I prepared my lectures, my main source was Introduction to the theory of neural computation by Hertz, Krogh, and Palmer [1]. Other sources were Neural Networks: a comprehensive foundation by Haykin [2], Horner's lecture notes [3] from Heidelberg, Deep Learning by Goodfellow, Bengio & Courville [4], and also the online book Neural Networks and Deep Learning by Nielsen [5]. I thank Martin Čejka for typesetting the first version of my hand-written lecture notes, Erik Werner and Hampus Linander for their help in preparing Chapter 8, Kristian Gustafsson for his detailed feedback on Chapter 11, Nihat Ay for his comments on Section 4.5, and Mats Granath for discussions about autoencoders. I would also like to thank Juan Diego Arango, Oleksandr Balabanov, Anshuman Dubey, Johan Fries, Phillip Gräfensteiner, Navid Mousavi, Marina Rafajlovic, Jan Schiffeler, Ludvig Storm, and Arvid Wenzel Wartenberg for implementing the algorithms described in this book. Many Figures are based on their results. Oleksandr Balabanov, Anshuman Dubey, Jan Meibohm, and in particular Johan Fries and Marina Rafajlovic contributed exam questions that became exercises in this book. Finally, I would like to express my gratitude to Stellan Östlund, for his encouragement and criticism. Last but not least, a large number of colleagues and students – past and present – pointed out misprints and errors, and suggested improvements. I thank them all.

The cover image shows an input pattern designed to maximise the activation of certain hidden neurons in a deep convolutional neural network [126,127]. See also page 165. Image by Hampus Linander. Reproduced with permission.

CONTENTS

Acknowledgements v

Contents vii

1 Introduction 1
  1.1 Neural networks 6
  1.2 McCulloch-Pitts neurons 6
  1.3 Activation functions 9
  1.4 Asynchronous updates 11
  1.5 Summary 11
  1.6 Further reading 11

I Hopfield networks 13

2 Deterministic Hopfield networks 15
  2.1 Pattern recognition 15
  2.2 Hopfield networks and Hebb's rule 16
  2.3 The cross-talk term 22
  2.4 One-step error probability 24
  2.5 Energy function 27
  2.6 Summary 30
  2.7 Exercises 31

3 Stochastic Hopfield networks 35
  3.1 Stochastic dynamics 35
  3.2 Order parameters 36
  3.3 Mean-field theory 38
  3.4 Critical storage capacity 42
  3.5 Beyond mean-field theory 49
  3.6 Correlated and non-random patterns 50
  3.7 Summary 51
  3.8 Further reading 51
  3.9 Exercises 52

4 The Boltzmann distribution 54
  4.1 Convergence of the stochastic dynamics 55
  4.2 Monte-Carlo simulation 57
  4.3 Simulated annealing 59
  4.4 Boltzmann machines 62
  4.5 Restricted Boltzmann machines 66
  4.6 Summary 71
  4.7 Further reading 71
  4.8 Exercises 72

II Supervised learning 77

5 Perceptrons 79
  5.1 A classification problem 81
  5.2 Iterative learning algorithm 84
  5.3 Gradient descent for linear units 85
  5.4 Classification capacity 88
  5.5 Multi-layer perceptrons 91
  5.6 Summary 95
  5.7 Further reading 96
  5.8 Exercises 96

6 Stochastic gradient descent 102
  6.1 Chain rule and error backpropagation 102
  6.2 Stochastic gradient-descent algorithm 105
  6.3 Preprocessing the input data 108
  6.4 Overfitting and cross validation 112
  6.5 Adaptation of the learning rate 115
  6.6 Summary 117
  6.7 Further reading 117
  6.8 Exercises 118

7 Deep learning 123
  7.1 How many hidden layers? 123
  7.2 Vanishing and exploding gradients 128
  7.3 Rectified linear units 135
  7.4 Residual networks 136
  7.5 Outputs and energy functions 139
  7.6 Regularisation 142
  7.7 Summary 149
  7.8 Further reading 149
  7.9 Exercises 150

8 Convolutional networks 152
  8.1 Convolution layers 153
  8.2 Pooling layers 155
  8.3 Learning to read handwritten digits 156
  8.4 Coping with deformations of the input distribution 159
  8.5 Deep learning for object recognition 160
  8.6 Summary 163
  8.7 Further reading 165
  8.8 Exercises 165

9 Supervised recurrent networks 168
  9.1 Recurrent backpropagation 170
  9.2 Backpropagation through time 173
  9.3 Vanishing gradients 178
  9.4 Recurrent networks for machine translation 180
  9.5 Reservoir computing 183
  9.6 Summary 186
  9.7 Further reading 186
  9.8 Exercises 187

III Learning without labels 189

10 Unsupervised learning 191
  10.1 Oja's rule 191
  10.2 Competitive learning 195
  10.3 Self-organising maps 197
  10.4 K-means clustering 203
  10.5 Radial basis functions 206
  10.6 Autoencoders 209
  10.7 Summary 213
  10.8 Further reading 214
  10.9 Exercises 215

11 Reinforcement learning 220
  11.1 Associative reward-penalty algorithm 223
  11.2 Temporal difference learning 226
  11.3 Q-learning 230
  11.4 Summary 237
  11.5 Further reading 237
  11.6 Exercises 238


1 Introduction

The term neural networks historically refers to networks of neurons in the mammalian brain. Neurons are its fundamental units of computation, and they are connected together in networks to process data. This can be a very complex task, and the dynamics of such neural networks in response to external stimuli is therefore often quite intricate. Inputs and outputs of each neuron vary as functions of time in the form of spike trains, but also the network itself changes over time: we learn and improve our data-processing capacities by establishing new connections between neurons.

Neural-network algorithms for machine learning are inspired by the architecture and the dynamics of networks of neurons in the brain. The algorithms use highly idealised neuron models. Nevertheless, the fundamental principle is the same: artificial neural networks learn by changing the connections between their neurons. Such networks can perform a multitude of information-processing tasks. Neural networks can for instance learn to recognise structures in a set of "training" data and, to some extent, generalise what they have learnt. A training set contains a list of input patterns together with a list of corresponding labels, or target values, that encode the properties of the input patterns the network is supposed to learn. Artificial neural networks can be trained to classify such data very accurately by adjusting the connection strengths between their neurons, and can learn to generalise the result to other data sets – provided that the new data is not too different from the training data. A prime example for a problem of this type is object recognition in images, for instance in the sequence of camera images taken by a self-driving car. Recent interest in machine learning with neural networks is driven in part by the success of neural networks in visual object recognition.

Another task at which neural networks excel is machine translation with dynamical, or recurrent, networks. Such networks take sentences as inputs. As one feeds word after word, the network outputs the words in the translated sentence. Recurrent networks can be efficiently trained on large training sets of input sentences and their translations. Google translate works in this way. Recurrent networks have also been used with considerable success to predict chaotic dynamics. These are all examples of supervised learning, where the networks are trained to associate certain targets, or labels, with each input.

Artificial neural networks are also good at analysing large sets of unlabeled, often high-dimensional data – where it may be difficult to determine a priori which questions are most relevant and rewarding to ask. Unsupervised-learning algorithms organise the unlabeled input data in different ways: they can for instance detect familiarity and similarity (clusters) of input patterns and other structures in the input data.

Figure 1.1: Logical OR function represented by three neurons. Neuron 3 fires actively if at least one of the neurons 1 and 2 is active. After Fig. 1b by McCulloch and Pitts [6].

Unsupervised-learning algorithms work well when there is redundancy in the input data, and they are particularly useful for large, high-dimensional data sets, where it may be a challenge to detect clusters or other data structures by inspection.

Many problems lie between these two extremes of supervised and unsupervised learning. Consider how an agent may learn to navigate a complex environment, in order to get from one location to another as quickly as possible, or expending as little energy as possible. The method of reinforcement learning allows the agent to do just that, by optimising its behaviour in response to environmental cues in the form of penalties and rewards. In short, the agent learns to act in such a way that it receives positive feedback (reward) more often than a penalty.

The tools for machine learning with neural networks were developed long ago, most of them during the second half of the last century. In 1943, McCulloch and Pitts [6] analysed how networks of neurons can process information. Using an abstract model for a neuron, they demonstrated how such units can be coupled together to represent logical functions (Figure 1.1). Their analysis and conclusions are formulated using Carnap's logical syntax [7], not in terms of algebraic equations as we are used to today. Nevertheless, their neuron model is essentially the binary threshold unit, closely related to the fundamental building block of most neural-network algorithms for machine learning to this date. In this book we therefore refer to this model as the McCulloch-Pitts neuron. The purpose of this early research on neural networks was to explain neuro-physiological mechanisms [8]. Perhaps the most significant advance was Hebb's learning principle, describing how neural networks learn by strengthening connections between neurons that are active simultaneously. The principle is described in Hebb's book The organization of behavior: A neuropsychological theory [9], published in 1949.

About ten years later, research in artificial neural networks had intensified, sparked by the influential work of Rosenblatt. In 1958 he formulated a learning rule for the McCulloch-Pitts neuron, related to Hebb's rule, and demonstrated that the rule converges to the correct solution for all problems this model can solve [10]. He coined the term perceptron for layered networks of McCulloch-Pitts neurons, and showed that such neural networks could in principle solve tasks that a single McCulloch-Pitts neuron could not. However, there was no general learning rule for perceptrons at the time. The work of Minsky and Papert [11] emphasised the geometric aspects of learning. This allowed them to prove which kinds of problems perceptrons could solve, and which not. In 1969 they summarised these results in their elegant book Perceptrons. An Introduction to Computational Geometry.

Perceptrons are classifiers that output a label for each input pattern. A perceptron represents a mathematical function, an input-output mapping. A breakthrough in perceptron learning was the paper by Rumelhart et al. [12]. The authors demonstrated in 1986 that perceptrons can be trained by gradient descent. This means that the connection strengths between the neurons are iteratively changed in small steps, to eventually minimise the mean-squared output error.
For a single McCulloch-Pitts neuron, this gives essentially Hebb's rule. The important point is that gradient descent makes it possible to efficiently train perceptrons with many layers (backpropagation for multi-layer perceptrons). A second contribution of Rumelhart et al. is the idea to use local feature maps for object recognition with neural networks. The corresponding mathematical operation is a convolution. Therefore such architectures are now known as convolutional networks.

The work of Hopfield popularised an entirely different approach, also based on Hebb's rule. In 1982, Hopfield analysed the properties of a dynamical, or recurrent, network that functions as a memory [13]: the dynamics is designed to find stored patterns by converging to a corresponding steady state. Such Hopfield networks were especially popular amongst physicists because there are close connections to the statistical physics of spin glasses that made it possible to derive a precise mathematical understanding of such artificial neural networks. Hopfield networks are the basis for important developments in computer science. More general recurrent networks, for example, are trained like perceptrons for language processing. Hopfield networks with hidden neurons, so-called Boltzmann machines [14], are generative models that allow one to sample from a distribution the neural network learned. The training algorithm for Boltzmann machines with many hidden layers [15], published in 1986, is one of the first algorithms for training networks with many layers, so-called deep networks.

An important problem in behavioural psychology is to understand how we learn from experience. One hypothesis is that desirable behaviours are reinforced by positive feedback. Around the same time as researchers began to analyse perceptrons, a different line of neural-network research emerged: to find learning rules that allow neural networks to learn by reinforcement. In many problems of this kind, the positive feedback or reward is not immediate but is received at some time in the future, as in a board game for example. Therefore it is necessary to understand how to estimate the future reward for a certain behaviour, and how to find strategies that optimise the future reward. Reinforcement learning [16] is designed for this purpose. In 1995, an early application of this method demonstrated how a neural network could learn to play the board game backgammon [17].

A related research field originated from the neuro-physiological question: how do we learn to map visual or sensory stimuli to spatio-temporal patterns of neural activity in the brain? In 1992 Kohonen described a self-organising map [18] that successfully explains how neurons might create meaningful geometric representations of inputs. At the same time, Kohonen's algorithm is one of the first methods for non-linear dimensionality reduction for large data sets.

There are many connections between neural-network algorithms for machine learning and methods used in mathematical statistics, such as for instance Markov-chain Monte-Carlo algorithms and simulated-annealing methods. Certain unsupervised learning algorithms are related to principal-component analysis, others to clustering algorithms such as K-means clustering. Supervised learning with deep networks is essentially regression, trying to fit an input-output function to the training data. In other words this is just function fitting – and usually with a very large number of fitting parameters.
Recent convolutional neural networks have millions of parameters. To determine so many parameters requires very large and accurate data sets. This makes it clear that neural networks are not a solution to everything. One of the difficult problems is to understand when machine learning with neural networks is called for, and when not. Therefore we need a detailed understanding of how the algorithms work, and in particular when and how they fail.

There were some early successes of machine learning with neural networks, but these methods were not widely used in the last century. During the past decade, by contrast, machine learning with neural networks has become increasingly successful and popular. For many applications, neural-network-based algorithms are now regarded as the method of choice, for example for predicting how proteins fold [19]. What caused this paradigm shift? After all, the methods are essentially those developed forty or more years ago. A reason for the new success is perhaps that industry, in acute need of machine-learning algorithms for large data sets, has invested time and effort into generating larger and more accurate training sets than previously available. Computer hardware has improved too, so that networks with many layers containing many neurons can now be efficiently trained, making the recent progress possible.

This book is based on notes for lectures on artificial neural networks I have given during the last fifteen years at Gothenburg University and Chalmers University of Technology in Gothenburg, Sweden. When I prepared these lectures, my primary source was Introduction to the theory of neural computation by Hertz, Krogh, and Palmer [1].

Figure 1.2: Neurons in the cerebral cortex, a part of the mammalian brain. Drawing by Santiago Ramón y Cajal, the Spanish neuroscientist who received the Nobel Prize in Physiology or Medicine in 1906 together with Camillo Golgi ‘in recognition of their work on the structure of the nervous system’ [20]. Courtesy of the Cajal Institute, ‘Cajal Legacy’, Spanish National Research Council (CSIC), Madrid, Spain.

The material is organised into three parts: Hopfield networks, supervised learning of labeled data, and learning for unlabeled data sets (unsupervised and reinforcement learning). One reason for starting with Hopfield networks is that there is an elegant mathematical theory that describes how these neural networks learn, making it possible to understand their strengths and weaknesses from first principles. This is not the case for most of the other algorithms discussed in this book. The analysis of Hopfield networks sets the scene for the later parts of the book. Part II describes supervised learning with multilayer perceptrons and convolutional neural networks, starting from the simple geometrical picture emphasised by Minsky and Papert, and leading to the recent successes of convolutional networks in object recognition, and trained recurrent networks in language processing. Part III explains what neural networks can learn about data that is not labeled, with particular emphasis on reinforcement learning. The overall goal is to explain the fundamental principles that allow neural networks to learn, emphasising ideas and concepts that are common to all three parts.

1.1 Neural networks

Different regions in the mammalian brain perform different tasks. The cerebral cortex is the outer layer of the mammalian brain. We can think of it as a thin sheet (about 2 to 5 mm thick) that folds upon itself to form a compact structure with a large surface area. The cortex is the largest and best developed part of the human brain. It contains large numbers of nerve cells, neurons. The human cerebral cortex contains about $10^{10}$ neurons. They are linked together by nerve strands (axons) that branch and end in synapses. These synapses are the connections to other neurons. The synapses connect to dendrites, branches extending from the neural cell body that are designed to receive input from other neurons in the form of electrical signals. A neuron in the human brain may have thousands of synaptic connections with other neurons. The resulting network of connected neurons in the cerebral cortex is responsible for processing of visual, audio, and sensory data.

Figure 1.2 shows neurons in the cerebral cortex. This drawing was made by Santiago Ramón y Cajal more than 100 years ago. Using a microscope, he studied the structure of neural networks in the brain and documented his observations by ink-on-paper drawings like the one reproduced in Figure 1.2. One can distinguish the cell bodies of the neural cells, their axons (f), and their dendrites. The axons of some neurons connect to the dendrites of other neurons, forming a neural network (see Ref. [21] for a slightly more detailed description of this drawing).

A schematic image of a neuron is drawn in Figure 1.3. Information is processed from left to right. On the left are the dendrites that receive signals and connect to the cell body of the neuron where the signal is processed. The right part of the Figure shows the axon, through which the output is sent to the dendrites of other neurons. Information is transmitted as an electrical signal. Figure 1.4 shows an example of the time series of the electric potential for a pyramidal neuron in fish [22]. The time series consists of an intermittent series of electrical-potential spikes. Quiescent periods without spikes occur when the neuron is inactive; during spike-rich periods we say that the neuron is active.

1.2 McCulloch-Pitts neurons

In artificial neural networks, the ways in which information is processed and signals are transferred are highly idealised. McCulloch and Pitts [6] modelled the neuron, the computational unit of the neural network, as a binary threshold unit.

Figure 1.3: Schematic image of a neuron. Dendrites receive input in the form of electrical signals, via synapses. The signals are processed in the cell body of the neuron. The cell nucleus is shown in white. The output travels from the neural cell body along the axon, which connects via synapses to other neurons.

Figure 1.4: Spike train in an electro-sensory pyramidal neuron of a fish. The time series is from Ref. [22]. It is reproduced by permission of the publisher. The labels were added.

It has only two possible outputs, or states: active or inactive. To compute the output, the unit sums the weighted inputs. If the sum exceeds a given threshold, the state of the neuron is said to be active, otherwise inactive. A slightly more general model than the original one is illustrated in Figure 1.5. The model performs repeated computations in discrete time steps $t = 0, 1, 2, 3, \ldots$. The state of neuron number $j$ at time step $t$ is denoted by
$$s_j(t) = \begin{cases} -1 & \text{inactive}, \\ +1 & \text{active}. \end{cases} \tag{1.1}$$

Given the states $s_j(t)$, neuron number $i$ computes

$$s_i(t+1) = \mathrm{sgn}\Big(\sum_{j=1}^{N} w_{ij}\, s_j(t) - \theta_i\Big) \equiv \mathrm{sgn}\big[b_i(t)\big]. \tag{1.2}$$

Figure 1.5: Schematic diagram of a McCulloch-Pitts neuron. The index of the neuron is $i$; it receives inputs from $N$ neurons. The strength of the connection from neuron $j$ to neuron $i$ is denoted by $w_{ij}$. The threshold value for neuron $i$ is denoted by $\theta_i$. The index $t = 0, 1, 2, 3, \ldots$ labels the discrete time sequence of computation steps, and $\mathrm{sgn}(b)$ stands for the signum function [Figure 1.6 and Equation (1.3)].

Here $\mathrm{sgn}(b)$ is the signum function (Figure 1.6):
$$\mathrm{sgn}(b) = \begin{cases} -1, & b < 0, \\ +1, & b \ge 0. \end{cases} \tag{1.3}$$
The argument of the signum function,

$$b_i(t) = \sum_{j=1}^{N} w_{ij}\, s_j(t) - \theta_i, \tag{1.4}$$
is called the local field. We see that the neuron performs a weighted average of the inputs $s_j(t)$. The parameters $w_{ij}$ are called weights. Here the first index, $i$, refers to the neuron that does the computation, and $j$ labels all neurons that connect to neuron $i$. In general, weights between different pairs of neurons assume different numerical values, reflecting different strengths of the synaptic couplings. Weights can be positive or negative, and we say that there is no connection when $w_{ij} = 0$. In this book we refer to the model described in Figure 1.5 as the McCulloch-Pitts neuron, although their original model had some additional constraints on the weights.¹ The threshold for neuron $i$ is denoted by $\theta_i$. Finally, the computation (1.2) is performed for all neurons $i$ in parallel, and the outputs $s_i$ are the inputs to all neurons at the next time step. Therefore the outputs have the time argument $t + 1$. These steps are repeated many times, resulting in time series of the activity levels of all neurons in the network.
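To make the update rule (1.2) concrete, here is a minimal sketch (in Python with NumPy) of one parallel update of a network of McCulloch-Pitts neurons. The weights, thresholds, and initial state are arbitrary illustrative values, not taken from the text.

```python
import numpy as np

def mcculloch_pitts_step(s, w, theta):
    """One parallel update of all McCulloch-Pitts neurons, Eq. (1.2).

    s     -- state vector s_j(t) with entries +1 (active) or -1 (inactive)
    w     -- weight matrix with elements w_ij
    theta -- threshold vector theta_i
    """
    b = w @ s - theta                   # local fields b_i(t), Eq. (1.4)
    return np.where(b >= 0, 1, -1)      # signum with the convention sgn(0) = +1, Eq. (1.3)

# illustrative example with N = 3 neurons and made-up weights
rng = np.random.default_rng(0)
N = 3
w = rng.normal(size=(N, N)) / N
theta = np.zeros(N)
s = np.array([1, -1, 1])
for t in range(5):
    s = mcculloch_pitts_step(s, w, theta)
    print(t + 1, s)
```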

¹ In the deep-learning literature [4], the thresholds are called biases, defined as the negative of $\theta_i$, with a plus sign in Equation (1.4). In this book we use the convention (1.4), with the minus sign.

Figure 1.6: Signum function [Equation (1.3)].

1.3 Activation functions

The McCulloch-Pitts model approximates the patterns of spiking activity in Figure 1.4 in terms of two states, $-1$ and $+1$, representing the inactive and active periods shown in the Figure. For many computation tasks this is sufficient, and for our purposes it does not matter that the dynamics of electrical signals in the cortex is quite different in detail. The aim after all is not to model the neural dynamics in the brain, but to construct computation models inspired by real neural dynamics. It will become apparent later that the simplest model described above must be generalised somewhat for certain tasks and questions. For example, the jump in the signum function at $b = 0$ may cause large fluctuations in the activity levels of a network of neurons, caused by infinitesimal changes of the local fields across $b = 0$. To dampen this effect, one allows the neuron to respond continuously to its inputs, replacing Eq. (1.2) by

$$s_i(t+1) = g\Big(\sum_{j} w_{ij}\, s_j(t) - \theta_i\Big). \tag{1.5}$$

Here $g(b)$ is a continuous activation function. It could just be a linear function, $g(b) \propto b$. But we shall see that many tasks require non-linear activation functions, such as $\tanh(b)$ (Figure 1.7). When the activation function is continuous, the neuron states assume continuous values too, not just the discrete values $-1$ and $+1$ given in Equation (1.1).

Alternatively one may use a piecewise linear activation function (Figure 1.8). This is motivated in part by the response curve of leaky integrate-and-fire neurons, a model for the relation between the electrical current $I$ through the cell membrane into the neuron cell, and the membrane potential $U$. The simplest models for the dynamics of the membrane potential represent the neuron as a capacitor. In the leaky integrate-and-fire neuron, leakage is added by a resistor $R$ in parallel with the capacitor $C$, so that
$$I = \frac{U}{R} + C\,\frac{\mathrm{d}U}{\mathrm{d}t}. \tag{1.6}$$

Figure 1.7: Continuous activation function $g(b) = \tanh(b)$.

For a constant current, the membrane potential grows from zero as a function of time, $U(t) = RI\,[1 - \exp(-t/\tau)]$, where $\tau = RC$ is the time constant of the model. One says that the neuron produces a spike when the membrane potential exceeds a critical value, $U_c$. Immediately after, the membrane potential is set to zero (and begins to grow again). In this model, the firing rate $f(I)$ is thus given by $t_c^{-1}$, where $t_c$ is the solution of $U(t) = U_c$. It follows that the firing rate exhibits a threshold behaviour. In other words, the system works like a rectifier:
$$f(I) = \begin{cases} 0 & \text{for } I \le U_c/R, \\[4pt] \Big[\tau \log\Big(\dfrac{RI}{RI - U_c}\Big)\Big]^{-1} & \text{for } I > U_c/R. \end{cases} \tag{1.7}$$
This response curve is illustrated in Figure 1.8(a). The main point is that there is a threshold below which the response is strictly zero. The response function looks qualitatively like the piecewise linear function

$$g(b) = \max\{0, b\}, \tag{1.8}$$

Figure 1.8: (a) Firing rate of a leaky integrate-and-fire neuron as a function of the electrical current $I$ through the cell membrane, Equation (1.7) for $\tau = 25$ and $U_c/R = 2$. (b) Piecewise linear activation function, $g(b) = \max\{0, b\}$.

shown in panel (b). Neurons with this activation function are called rectified linear units, and the activation function (1.8) is called the ReLU function.
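As a simple illustration, the sketch below evaluates the activation functions mentioned in this Section, together with the firing rate (1.7) of the leaky integrate-and-fire neuron. The parameter values $\tau = 25$ and $U_c/R = 2$ are those quoted in the caption of Figure 1.8; everything else is arbitrary.

```python
import numpy as np

def sgn(b):
    """Signum function, Eq. (1.3): -1 for b < 0, +1 for b >= 0."""
    return np.where(b < 0, -1.0, 1.0)

def relu(b):
    """Piecewise linear (ReLU) activation function, Eq. (1.8)."""
    return np.maximum(0.0, b)

def lif_firing_rate(I, tau=25.0, Uc_over_R=2.0):
    """Firing rate of the leaky integrate-and-fire neuron, Eq. (1.7)."""
    I = np.asarray(I, dtype=float)
    f = np.zeros_like(I)                      # zero below the threshold U_c/R
    above = I > Uc_over_R
    f[above] = 1.0 / (tau * np.log(I[above] / (I[above] - Uc_over_R)))
    return f

b = np.linspace(-3.0, 3.0, 7)
print(sgn(b))
print(np.tanh(b))                             # continuous activation, Figure 1.7
print(relu(b))
print(lif_firing_rate(np.linspace(0.0, 10.0, 6)))
```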

1.4 Asynchronous updates

Equations (1.2) and (1.5) are called synchronous update rules, because all neurons are updated in parallel: at time step $t$ all inputs $s_j(t)$ are stored. Then all neurons $i$ are simultaneously updated using the stored inputs. An alternative is to update only a single neuron per iteration, the one with index $m$ say:
$$s_i(t+1) = \begin{cases} g\big(\sum_j w_{mj}\, s_j(t) - \theta_m\big) & \text{for } i = m, \\ s_i(t) & \text{otherwise.} \end{cases} \tag{1.9}$$

This is called an asynchronous update rule. Different schemes for choosing neurons are used in asynchronous updating. One possibility is to arrange the neurons into a two-dimensional array and to update them one by one, in a certain order. In the typewriter scheme, for example, one updates the neurons in the top row of the array first, from left to right, then the second row from left to right, and so forth. A second possibility is to choose randomly which neuron to update.

If there are N neurons, then one synchronous step corresponds to N asynchronous steps, on average. This difference in time scales is not the only difference between synchronous and asynchronous updating. The asynchronous dynamics can be shown to converge to a definite state in certain cases, while the synchronous dynamics may fail to do so, resulting in periodic cycles that persist forever.
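The two update schemes are easy to compare in code. The following sketch implements a synchronous step and an asynchronous sweep (in typewriter order by default) for the deterministic rule with $g = \mathrm{sgn}$; the network parameters are placeholders.

```python
import numpy as np

def synchronous_step(s, w, theta):
    """Update all neurons in parallel, Eq. (1.2)."""
    return np.where(w @ s - theta >= 0, 1, -1)

def asynchronous_sweep(s, w, theta, order=None):
    """Update one neuron at a time, Eq. (1.9) with g = sgn.

    By default the neurons are visited in typewriter order 0, 1, ..., N-1;
    pass a permutation as `order` for random sequential updating.
    """
    s = s.copy()
    if order is None:
        order = range(len(s))
    for m in order:
        b_m = w[m] @ s - theta[m]        # local field computed from the current states
        s[m] = 1 if b_m >= 0 else -1
    return s

# example: a random order corresponding to N asynchronous steps
rng = np.random.default_rng(0)
N = 4
w = rng.normal(size=(N, N)) / N
s = asynchronous_sweep(np.ones(N, dtype=int), w, np.zeros(N), order=rng.permutation(N))
print(s)
```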

1.5 Summary

Artificial neural networks use a highly idealised model for the fundamental computation unit: the McCulloch-Pitts neuron (Figure 1.5) is a binary threshold unit, very similar to the model introduced originally by McCulloch and Pitts [6]. The units are linked together by weights $w_{ij}$, and each unit computes a weighted average of its inputs. The network performs these computations in sequence. Most neural-network algorithms are built using the model described in this Chapter.

1.6 Further reading

Two accounts of the history of artificial neural networks are especially recommended. First, the early history of the field is summarised in the Introduction to the second edition of Perceptrons. An introduction to computational geometry by Minsky and Papert [11], which came out in 1988. It also contains a concise bibliography of the important papers, with comments by Minsky and Papert. Second, in a short note, Kanal [23] reviews the work of Rosenblatt and puts it into context.

PART I HOPFIELD NETWORKS

Hopfield networks [13, 24] are artificial neural networks that can recognise or reconstruct images, for instance the binary images of digits shown in Figure 2.1.

The images are stored in the network by choosing the weights $w_{ij}$ according to Hebb's rule [9]. One feeds a distorted digit (Figure 2.2) to the network by assigning the initial states of its neurons to the bits of the distorted digit. The idea is that the neural-network dynamics converges to the closest stored digit. In this way the network can recognise the input as a distorted version of the correct pattern and retrieve the correct digit. Hopfield networks recognise patterns with many bits quite efficiently, and in the past such networks were used to perform pattern-recognition tasks. Today there are more efficient algorithms for this purpose (Chapter 8). Nevertheless, Hopfield networks exemplify fundamental principles of machine learning with neural networks.

For a start, most neural-network algorithms discussed in this book are built from similar building blocks and use learning rules related to Hebb's rule. Hopfield networks are examples of recurrent networks: their neurons are updated following a dynamical rule. Widely used algorithms for machine translation and time-series prediction are based on this principle.

Furthermore, restricted Boltzmann machines are closely related to Hopfield networks. These machines use hidden neurons to learn distributions of input patterns. This makes it possible to generate image textures and to complete partially obscured images [25]. Generalisations of these machines, deep-belief networks, are examples of the first deep network architectures for machine learning. Restricted Boltzmann machines have been developed into more efficient generative models, Helmholtz machines, to sample new patterns similar to those in a given data distribution. The training algorithm for recent generative models, variational autoencoders, is based on the same principles as the learning rule for Helmholtz machines.

The dynamics of Hopfield networks is closely related to stochastic Markov-chain Monte-Carlo algorithms used in a wide range of problems in the natural sciences. Hopfield networks highlight the role of stochasticity in neural-network dynamics. A certain degree of noise, not too much, can substantially improve the performance of the Hopfield network. In engineering problems it is usually better to avoid stochasticity, when it is due to multiplicative or additive noise that diminishes the performance of the system. In neural-network dynamics, by contrast, stochasticity is often helpful, because it allows the dynamics to explore a wider range of configurations or actions and thus helps it to converge to a better solution. In general it is challenging to analyse the stochastic dynamics of neural networks. But for the Hopfield network much is known. The reason is that Hopfield networks are closely related to stochastic systems studied in statistical physics, so-called spin glasses [26, 27].

Figure 2.1: Binary representation of the digits 0 to 4. Each pattern has 10 × 16 pixels. Adapted from Figure 14.17 in Ref. [2]. The slightly peculiar shapes help the Hopfield network to distinguish the patterns [28].

2 Deterministic Hopfield networks

2.1 Pattern recognition

As an example for a pattern-recognition task, consider $p$ images (patterns), each with $N$ bits. The patterns could for instance be the letters in the alphabet, or the digits shown in Figure 2.1. The different patterns are indexed by $\mu = 1, \ldots, p$. Here $p$ is the number of patterns ($p = 5$ in Figure 2.1). The bits of pattern $\mu$ are denoted by $x_i^{(\mu)}$. The index $i = 1, \ldots, N$ labels the different bits of a given pattern, where $N$ is the number of bits per pattern ($N = 160$ in Figure 2.1). The bits are binary: they can take only the values $-1$ and $+1$. To determine the generic properties of the algorithm, one often turns to random patterns where each bit $x_i^{(\mu)}$ is chosen randomly, taking either value with probability $\frac{1}{2}$. It is convenient to gather the bits of a pattern in a column vector

 (µ) x1 x (µ) x (µ)  2  . (2.1) =  .   .  (µ) xN In this book, vectors are written in bold math font. The task of the neural network is to recognise distorted patterns, to determine for instance that the pattern on the right in Figure 2.2 is a distorted version of the digit on the left in this Figure. To this end, one stores p patterns in the network and presents it with a distorted version of one of these patterns. The task of the network is to find the stored pattern that is most similar to the distorted one. The formulation of the problem requires to define how similar a distorted pattern (ν) x is to any of the stored patterns, x say. One can quantify the distance dν between 16 DETERMINISTIC HOPFIELDNETWORKS

Figure 2.2: Binary image ($N = 160$) of the digit 0, and a distorted version of the same image. There are $N = 160$ bits $x_i$, $i = 1, \ldots, N$; a filled pixel (■) stands for $x_i = +1$ while an empty pixel (□) denotes $x_i = -1$.

$$d_\nu = \frac{1}{4N} \sum_{i=1}^{N} \big(x_i - x_i^{(\nu)}\big)^2. \tag{2.2}$$
The prefactor is chosen so that the distance equals the fraction of bits by which two $\pm 1$-patterns differ. Note that the distance (2.2) does not refer to distortions by translations, rotations, or shearing. An improved measure of distance might take the minimum distance between the patterns subject to all possible translations, rotations, and so forth.
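As a quick check of Equation (2.2), the sketch below computes $d_\nu$ for two patterns and verifies that it equals the fraction of differing bits. The random patterns are generated only for illustration.

```python
import numpy as np

def distance(x, x_nu):
    """Distance (2.2) between two +/-1 patterns of N bits."""
    x, x_nu = np.asarray(x), np.asarray(x_nu)
    return np.sum((x - x_nu) ** 2) / (4 * len(x))

rng = np.random.default_rng(1)
x = rng.choice([-1, 1], size=160)
y = rng.choice([-1, 1], size=160)
print(distance(x, y), np.mean(x != y))   # the two numbers agree
```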

2.2 Hopfield networks and Hebb’s rule

Hopfield networks [13, 24] are networks of McCulloch-Pitts neurons designed to solve the pattern-recognition task described in the previous Section. The bits of the patterns to be recognised are encoded in a particular choice of weights called Hebb's rule. All possible states of the McCulloch-Pitts neurons in the network,
$$\boldsymbol{s} = \begin{bmatrix} s_1 \\ s_2 \\ \vdots \\ s_N \end{bmatrix}, \tag{2.3}$$
form the configuration space of the network. The components of the states $\boldsymbol{s}$ are updated either with the synchronous rule (1.2):

$$s_i(t+1) = \mathrm{sgn}\big(b_i(t)\big) \quad \text{with local field} \quad b_i(t) = \sum_{j=1}^{N} w_{ij}\, s_j(t) - \theta_i, \tag{2.4}$$
or with the asynchronous rule

$$s_i(t+1) = \begin{cases} \mathrm{sgn}\big(b_m(t)\big) & \text{for } i = m, \\ s_i(t) & \text{otherwise.} \end{cases} \tag{2.5}$$

To recognise a distorted pattern, one feeds its bits $x_i$ into the network, by assigning the initial states of the neurons to the pattern bits,

$$s_i(t=0) = x_i. \tag{2.6}$$

The idea is to choose a set of weights $w_{ij}$ so that the network dynamics (2.4) or (2.5) converges to the correct stored pattern. The choice of weights must depend on all $p$ patterns, $\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(p)}$. We say that these patterns are stored in the network by assigning appropriate weights. For example, if $\boldsymbol{x}$ is a distorted version of $\boldsymbol{x}^{(\nu)}$ (Figure 2.2), then we want the network to converge to this pattern:

$$\text{if } \boldsymbol{s}(t=0) = \boldsymbol{x} \approx \boldsymbol{x}^{(\nu)}, \quad \text{then } \boldsymbol{s}(t) \to \boldsymbol{x}^{(\nu)} \text{ as } t \to \infty. \tag{2.7}$$
Equation (2.7) means that the network corrects the errors in the distorted pattern $\boldsymbol{x}$, and in this case the stored pattern $\boldsymbol{x}^{(\nu)}$ is said to be an attractor of the network dynamics. But convergence is not guaranteed. If the initial number of errors is too large, the network may converge to another pattern, or to some other state that bears no or little relation to the stored patterns, or it may not converge at all. The region around pattern $\boldsymbol{x}^{(\nu)}$ in which all patterns $\boldsymbol{x}$ converge to $\boldsymbol{x}^{(\nu)}$ is called the region of attraction of $\boldsymbol{x}^{(\nu)}$. The size of this region depends in an intricate way upon the ensemble of stored patterns, and there is no general convergence proof.

Therefore we ask a simpler question first: if one feeds one of the undistorted patterns, for instance $\boldsymbol{x}^{(\nu)}$, does the network recognise it as one of the stored, undistorted ones? The network should not make any changes to $\boldsymbol{x}^{(\nu)}$ because all bits are correct:

$$\text{if } \boldsymbol{s}(t=0) = \boldsymbol{x}^{(\nu)}, \quad \text{then } \boldsymbol{s}(t) = \boldsymbol{x}^{(\nu)} \text{ for all } t = 0, 1, 2, \ldots. \tag{2.8}$$

How can we choose weights and thresholds to ensure that Equation (2.8) holds? Let us consider a simple case first, where there is only one pattern, $\boldsymbol{x}^{(1)}$, to recognise. In this case a suitable choice of weights and thresholds is

$$w_{ij} = \frac{1}{N}\, x_i^{(1)} x_j^{(1)} \quad \text{and} \quad \theta_i = 0. \tag{2.9}$$

This learning rule is reminiscent of a relation between activation patterns of neurons and their coupling, postulated by Hebb [9] more than 70 years ago: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." This is a mechanism for learning through establishing connections: the coupling between neurons tends to increase if they are active simultaneously. Equation (2.25) expresses an analogous principle. Together with Equation (2.7) it says that the coupling $w_{ij}$ between two neurons is positive if they are both active ($s_i = s_j = 1$), otherwise the coupling is negative. Therefore the rule (2.25) is called Hebb's rule. Hopfield networks are networks of McCulloch-Pitts neurons with weights determined by Hebb's rule.

Does a Hopfield network recognise the pattern $\boldsymbol{x}^{(1)}$ stored in this way? To check that the rule (2.9) does the trick, we feed the pattern to the network by assigning $s_j(t=0) = x_j^{(1)}$, and evaluate Equation (2.4):

$$\sum_{j=1}^{N} w_{ij}\, x_j^{(1)} = \frac{1}{N} \sum_{j=1}^{N} x_i^{(1)} x_j^{(1)} x_j^{(1)} = \frac{1}{N} \sum_{j=1}^{N} x_i^{(1)} = x_i^{(1)}. \tag{2.10}$$

The second equality follows because $x_j^{(1)}$ can only take the values $\pm 1$, so that $[x_j^{(1)}]^2 = 1$. Using $\mathrm{sgn}(x_i^{(1)}) = x_i^{(1)}$, we obtain

$$\mathrm{sgn}\Big(\sum_{j=1}^{N} w_{ij}\, x_j^{(1)}\Big) = x_i^{(1)}. \tag{2.11}$$

Comparing Equation (2.11) with the update rule (2.4) shows that the bits $x_j^{(1)}$ of the pattern $\boldsymbol{x}^{(1)}$ remain unchanged under the update, as required by Equation (2.8). The network recognises the pattern as a stored one, so Equation (2.9) does what we asked for. Note that we obtain the same result if we leave out the factor of $N^{-1}$ in Equation (2.9).

But does the network correct errors? In other words, is the pattern $\boldsymbol{x}^{(1)}$ an attractor [Eq. (2.7)]? This question cannot be answered in general. Yet in practice Hopfield networks work often quite well. It is a fundamental insight that neural networks may work well although it is impossible to strictly prove that their dynamics converges to the correct solution.

To illustrate the difficulties consider an example, a Hopfield network with $N = 4$ neurons (Figure 2.3). Store the pattern $\boldsymbol{x}^{(1)}$ shown in Figure 2.3 by assigning the weights $w_{ij}$ using Equation (2.9).

Figure 2.3: Hopfield network with $N = 4$ neurons. (a) Network layout. Neuron number $i$ is drawn as a circle labelled $i$. Arrows indicate symmetric connections. (b) Pattern $\boldsymbol{x}^{(1)}$.

Now feed a distorted pattern $\boldsymbol{x}$ to the network that has a non-zero distance to $\boldsymbol{x}^{(1)}$:

$$d_1 = \frac{1}{16} \sum_{i=1}^{4} \big(x_i - x_i^{(1)}\big)^2 = \frac{1}{4}. \tag{2.12}$$

To feed the pattern to the network, we set $s_j(t=0) = x_j$. Then we iterate the dynamics using the synchronous rule (2.4). Results for different distorted patterns are shown in Figure 2.4. We see that the first two distorted patterns converge to the stored pattern, cases (a) and (b). But the third distorted pattern does not [case (c)].

To understand this behaviour, we analyse the synchronous dynamics (2.4) using the weight matrix
$$\mathbb{W} = \frac{1}{N}\, \boldsymbol{x}^{(1)} \boldsymbol{x}^{(1)\mathsf{T}}. \tag{2.13}$$
Here $\boldsymbol{x}^{(1)\mathsf{T}}$ denotes the transpose of the column vector $\boldsymbol{x}^{(1)}$, so that $\boldsymbol{x}^{(1)\mathsf{T}}$ is a row vector. The standard rules for matrix multiplication apply also to column and row vectors, they are just $N \times 1$ and $1 \times N$ matrices. So the product on the r.h.s. of

Equation (2.13) is an $N \times N$ matrix. In the following, matrices with elements $A_{ij}$ or $B_{ij}$ are written as $\mathbb{A}$, $\mathbb{B}$, and so forth.

Figure 2.4: Reconstruction of a distorted pattern. Under synchronous updating (2.4) the first two distorted images (a) and (b) converge to the stored pattern $\boldsymbol{x}^{(1)}$, but pattern (c) does not.

Figure 2.5: Reproduced from xkcd.com/1838 under the creative commons attribution-noncommercial 2.5 license.

The product in Equation (2.13) is also referred to as an outer product. If we change the order of $\boldsymbol{x}^{(1)}$ and $\boldsymbol{x}^{(1)\mathsf{T}}$ in the product, we obtain a number instead:

$$\boldsymbol{x}^{(1)\mathsf{T}} \boldsymbol{x}^{(1)} = \sum_{j=1}^{N} \big[x_j^{(1)}\big]^2 = N. \tag{2.14}$$

The product (2.14) is called a scalar product. It is also denoted by $\boldsymbol{a} \cdot \boldsymbol{b} = \boldsymbol{a}^{\mathsf{T}} \boldsymbol{b}$, and it equals $|\boldsymbol{a}|\,|\boldsymbol{b}|\cos\varphi$, where $\varphi$ is the angle between the vectors $\boldsymbol{a}$ and $\boldsymbol{b}$, and $|\boldsymbol{a}|$ is the magnitude of $\boldsymbol{a}$. We use the same notation for multiplying a transposed vector with a matrix from the left: $\boldsymbol{x}^{\mathsf{T}} \mathbb{A} = \boldsymbol{x} \cdot \mathbb{A}$. An excellent source for those not familiar with these terms from Linear Algebra (Figure 2.5) is Chapter 6 of Mathematical methods of physics by Mathews and Walker [29]. Using Equation (2.14) we see that $\mathbb{W}$ projects onto the vector $\boldsymbol{x}^{(1)}$,

$$\mathbb{W} \boldsymbol{x}^{(1)} = \boldsymbol{x}^{(1)}. \tag{2.15}$$

In addition, it follows from Equation (2.14) that the matrix (2.13) is idempotent:

$$\mathbb{W}^t = \mathbb{W} \quad \text{for } t = 1, 2, 3, \ldots. \tag{2.16}$$

Equations (2.15) and (2.16), together with $\mathrm{sgn}(x_i^{(1)}) = x_i^{(1)}$, imply that the network recognises the pattern $\boldsymbol{x}^{(1)}$ as the stored one. This example illustrates the general proof, Equations (2.10) and (2.11).

Now consider the distorted pattern (a) in Figure 2.4. We feed this pattern to the network by assigning
$$\boldsymbol{s}(t=0) = [1, -1, -1, -1]^{\mathsf{T}}. \tag{2.17}$$

To compute one step in the synchronous dynamics (2.4) we simply apply $\mathbb{W}$ to $\boldsymbol{s}(t=0)$. This is done in two steps, using the outer-product form (2.13) of the weight matrix. We first multiply $\boldsymbol{s}(t=0)$ with $\boldsymbol{x}^{(1)\mathsf{T}}$ from the left,

$$\boldsymbol{x}^{(1)\mathsf{T}}\, \boldsymbol{s}(t=0) = 2, \tag{2.18}$$
and then we multiply this result with $\boldsymbol{x}^{(1)}$. We find:

$$\mathbb{W}\, \boldsymbol{s}(t=0) = \tfrac{1}{2}\, \boldsymbol{x}^{(1)}. \tag{2.19}$$

The signum of the $i$-th component of the vector $\mathbb{W}\boldsymbol{s}(t=0)$ yields $s_i(t=1)$:

$$s_i(t=1) = \mathrm{sgn}\Big(\sum_{j=1}^{N} w_{ij}\, s_j(t=0)\Big) = x_i^{(1)}. \tag{2.20}$$

This means that the state of the network converges to the stored pattern, in one synchronous update. Since $\mathbb{W}$ is idempotent, the network stays there: the pattern $\boldsymbol{x}^{(1)}$ is an attractor. Case (b) in Figure 2.4 works in a similar way.

Now look at case (c), where the network fails to converge to the stored pattern. We feed this pattern to the network by assigning $\boldsymbol{s}(t=0) = [-1, 1, -1, -1]^{\mathsf{T}}$. For one iteration of the synchronous dynamics we evaluate

$$\boldsymbol{x}^{(1)\mathsf{T}}\, \boldsymbol{s}(t=0) = -2. \tag{2.21}$$
It follows that
$$\mathbb{W}\, \boldsymbol{s}(t=0) = -\tfrac{1}{2}\, \boldsymbol{x}^{(1)}. \tag{2.22}$$

Using the update rule (2.4) we find

$$\boldsymbol{s}(t=1) = -\boldsymbol{x}^{(1)}. \tag{2.23}$$

$$\boldsymbol{s}(t) = -\boldsymbol{x}^{(1)} \quad \text{for } t \ge 1. \tag{2.24}$$
Thus the network shown in Figure 2.3 has two attractors, the pattern $\boldsymbol{x}^{(1)}$ as well as the inverted pattern $-\boldsymbol{x}^{(1)}$. As we shall see in Section 2.5, this is a general property of Hopfield networks: if $\boldsymbol{x}^{(1)}$ is an attractor, then the pattern $-\boldsymbol{x}^{(1)}$ is an attractor too. In the next Section we discuss what happens when more than one pattern is stored in the Hopfield network.
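The four-neuron example is straightforward to reproduce numerically. The sketch below stores a single ±1 pattern with Hebb's rule (2.9) and iterates the synchronous dynamics (2.4); the four-bit pattern is an arbitrary stand-in for the pattern of Figure 2.3, chosen here only for illustration.

```python
import numpy as np

def hebb_one_pattern(x):
    """Hebb's rule (2.9) for a single stored pattern: w_ij = x_i x_j / N."""
    return np.outer(x, x) / len(x)

def synchronous_step(s, w):
    """Synchronous update (2.4) with zero thresholds and sgn(0) = +1."""
    return np.where(w @ s >= 0, 1, -1)

x1 = np.array([1, -1, 1, -1])        # stand-in for the stored pattern x^(1)
W = hebb_one_pattern(x1)

s = np.array([-1, -1, 1, -1])        # distorted pattern: one bit flipped, d_1 = 1/4
for t in range(3):
    s = synchronous_step(s, W)
print(np.array_equal(s, x1))         # True: the stored pattern is retrieved

s = np.array([-1, 1, -1, -1])        # three bits flipped
print(np.array_equal(synchronous_step(s, W), -x1))  # True: converges to the inverted pattern
```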

2.3 The cross-talk term

When there are more patterns than just one, we need to generalise Equation (2.9). One possibility is to simply sum Equation (2.9) over the stored patterns [13]:
$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} x_i^{(\mu)} x_j^{(\mu)} \quad \text{and} \quad \theta_i = 0. \tag{2.25}$$
Equation (2.25) generalises Hebb's rule to $p$ patterns. Because of the sum over $\mu$, the relation to Hebb's learning hypothesis is less clear, but we nevertheless refer to Equation (2.25) as Hebb's rule. At any rate, we see that the weights are proportional to the second-order statistics of the pattern bits. It is plausible that a machine based on the rule (2.25) can recognise properties of the patterns $\boldsymbol{x}^{(\mu)}$ that are encoded in the two-point correlations of their bits.

Note that the weights are symmetric, $w_{ij} = w_{ji}$. Also, note that the prefactor $N^{-1}$ in Equation (2.25) is not important. It is chosen to simplify the large-$N$ analysis of the model (Chapter 3). An alternative version of Hebb's rule [2, 13] sets the diagonal weights to zero:

$$w_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} x_i^{(\mu)} x_j^{(\mu)} \ \text{ for } i \ne j, \qquad w_{ii} = 0, \quad \text{and} \quad \theta_i = 0. \tag{2.26}$$
In this Section we use the form (2.26) of Hebb's rule. If we assign the weights in this way, does the network recognise the stored patterns? The question is whether

$$\mathrm{sgn}\Big(\underbrace{\frac{1}{N}\sum_{j \ne i}\sum_{\mu} x_i^{(\mu)} x_j^{(\mu)} x_j^{(\nu)}}_{=\, b_i^{(\nu)}}\Big) \stackrel{?}{=} x_i^{(\nu)}. \tag{2.27}$$

To check whether Equation (2.27) holds or not, we repeat the calculation described in the previous Section. As a first step we evaluate the local field
$$b_i^{(\nu)} = \Big(1 - \frac{1}{N}\Big) x_i^{(\nu)} + \frac{1}{N}\sum_{j \ne i}\sum_{\mu \ne \nu} x_i^{(\mu)} x_j^{(\mu)} x_j^{(\nu)}. \tag{2.28}$$
Here we split the sum over the patterns into two contributions. The first term corresponds to $\mu = \nu$, where $\nu$ refers to the pattern that was fed to the network, the one that we want the network to recognise. The second term in Equation (2.28) contains the sum over the remaining patterns. For large $N$ we can approximate $(1 - \frac{1}{N}) \approx 1$. It follows that condition (2.27) is satisfied if the second term in (2.28) does not affect the sign of the r.h.s. of this Equation. This second term is called the cross-talk term.

Whether adding the cross-talk term to $x_i^{(\nu)}$ affects $\mathrm{sgn}(b_i^{(\nu)})$ or not depends on the stored patterns. Since the cross-talk term contains a sum over $\mu = 1, \ldots, p$, we expect that this term does not matter if $p$ is small enough, so that all $p$ stored patterns are recognised. Furthermore, by analogy with the example described in the previous Section, it is plausible that the stored patterns are then also attractors, so that slightly distorted patterns converge to the correct stored pattern.

For a more quantitative analysis of the effect of the cross-talk term we store patterns with random bits (random patterns). Different bits are assigned $\pm 1$ independently with equal probability:

$$\mathrm{Prob}\big(x_i^{(\nu)} = \pm 1\big) = \tfrac{1}{2}. \tag{2.29}$$
This means in particular that different patterns are uncorrelated, because their covariance vanishes:
$$\big\langle x_i^{(\mu)} x_j^{(\nu)} \big\rangle = \delta_{ij}\,\delta_{\mu\nu}. \tag{2.30}$$
Here $\langle \cdots \rangle$ denotes an average over many realisations of random patterns, and $\delta_{ij}$ is the Kronecker delta, equal to unity if $i = j$ but zero otherwise. Note that $\langle x_j^{(\mu)} \rangle = 0$. This follows from Equation (2.29).

Given an ensemble of random patterns, what is the probability that the cross-talk term changes $\mathrm{sgn}(b_i^{(\nu)})$? In other words, what is the probability that the network produces a wrong bit in one asynchronous update, if all bits were initially correct? The magnitude of the cross-talk term does not matter when it has the same sign as $x_i^{(\nu)}$. If it has a different sign, then the cross-talk term matters if its magnitude is larger than unity (the magnitude of $x_i^{(\nu)}$). To simplify the analysis, one defines
$$C_i^{(\nu)} \equiv -x_i^{(\nu)}\, \underbrace{\frac{1}{N}\sum_{j \ne i}\sum_{\mu \ne \nu} x_i^{(\mu)} x_j^{(\mu)} x_j^{(\nu)}}_{\text{cross-talk term}}. \tag{2.31}$$

If $C_i^{(\nu)} < 0$ then the cross-talk term has the same sign as $x_i^{(\nu)}$, so that the cross-talk term does not matter: adding this term does not change the sign of $x_i^{(\nu)}$. If $0 < C_i^{(\nu)} < 1$ it does not matter either; it matters only when $C_i^{(\nu)} > 1$. The network produces an error in bit $i$ of pattern $\nu$ if $C_i^{(\nu)} > 1$.

2.4 One-step error probability

The one-step error probability $P_{\text{error}}^{t=1}$ is defined as the probability that an error occurs in one attempt to update a bit, given that initially all bits were correct. Therefore $P_{\text{error}}^{t=1}$ is given by
$$P_{\text{error}}^{t=1} = \mathrm{Prob}\big(C_i^{(\nu)} > 1\big). \tag{2.32}$$
Since patterns and bits are identically distributed, $\mathrm{Prob}(C_i^{(\nu)} > 1)$ does not depend on $i$ or $\nu$. Therefore $P_{\text{error}}^{t=1}$ does not carry any indices.

How does $P_{\text{error}}^{t=1}$ depend on the parameters of the problem, $p$ and $N$? When both $p$ and $N$ are large, we can use the central-limit theorem [29, 30] to answer this question. Since different bits/patterns are independent, we can think of $C_i^{(\nu)}$ as a sum of independent random numbers $c_m$ that take the values $-1$ and $+1$ with equal probabilities,

$$C_i^{(\nu)} = -\frac{1}{N}\sum_{j \ne i}\sum_{\mu \ne \nu} x_i^{(\mu)} x_j^{(\mu)} x_j^{(\nu)} x_i^{(\nu)} = -\frac{1}{N}\sum_{m=1}^{(N-1)(p-1)} c_m. \tag{2.33}$$
There are $M = (N-1)(p-1)$ terms in the sum on the r.h.s. because terms with $\mu = \nu$ are excluded, and also those with $j = i$ [Equation (2.26)]. If we use the rule (2.25) instead, then there is a correction to Equation (2.33) from the diagonal weights. For $p \ll N$ this correction is small.

When $p$ and $N$ are large, the sum $\sum_{m=1}^{M} c_m$ contains a large number of independently identically distributed random numbers with mean zero and variance unity. It follows from the central-limit theorem [29, 30] that $\sum_{m=1}^{M} c_m$ is Gaussian distributed with mean zero and variance $M$.

Since the central-limit theorem plays an important role in the analysis of neural-network algorithms, it is worth discussing this theorem in a little more detail. To begin with, note that the sum $\sum_{m=1}^{M} c_m$ equals $2k - M$, where $k$ is the number of occurrences of $c_m = +1$ in the sum. Choosing $c_m$ randomly to equal either $-1$ or $+1$ is called a Bernoulli trial [30], and the probability $P_{k,M}$ of drawing $k$ times $+1$ and $M-k$ times $-1$ is given by the binomial distribution [30]. In our case the probability of $c_m = \pm 1$ equals $\frac{1}{2}$, so that
$$P_{k,M} = \binom{M}{k} \Big(\frac{1}{2}\Big)^{\!k} \Big(\frac{1}{2}\Big)^{\!M-k}. \tag{2.34}$$

Figure 2.6: Gaussian distribution of the quantity C defined in Equation (2.31).

Here $\binom{M}{k} = M!/[k!\,(M-k)!]$ denotes the number of ways in which $k$ occurrences of $+1$ can be distributed over $M$ places. We want to show that $P_{k,M}$ approaches a Gaussian distribution for large $M$, with mean zero and with variance $M$. Since the variance diverges as $M \to \infty$, it is convenient to use the variable $z = (2k - M)/\sqrt{M}$. The central-limit theorem implies that $z$ is Gaussian with mean zero and unit variance in the limit of large $M$. To prove that this is the case, we substitute $k = \frac{M}{2} + \frac{\sqrt{M}}{2}\, z$ into Equation (2.34) and take the limit of large $M$ using Stirling's approximation
$$n! \approx \mathrm{e}^{\,n \log n - n + \frac{1}{2}\log 2\pi n}. \tag{2.35}$$

1 n logn n+ 2 log2πn n! e − . (2.35) ≈ 1 Expanding Pk,M to leading order in M − assuming that z remains of order unity gives p 2 Pk,M = 2/(πM )exp( z /2). Now one changes variables from k to z . This stretches local neighbourhoods−dk to dz . Conservation of probability implies that P (z )dz = 1/2 2 P (k)dk. It follows that P (z ) = (pM /2)P (k), so that P (z ) = (2π)− exp( z /2). In other words, the distribution of z is Gaussian with zero mean and unit variance,− as we intended to show. (ν) Returning to Equation (2.33), we conclude that Ci is Gaussian distributed

$$P(C) = \big(2\pi\sigma_C^2\big)^{-1/2} \exp\big[-C^2/(2\sigma_C^2)\big], \tag{2.36}$$
with zero mean, as illustrated in Figure 2.6, and with variance
$$\sigma_C^2 = \frac{M}{N^2} \approx \frac{p}{N}. \tag{2.37}$$
Here we used $M \approx Np$ for large $N$ and $p$. Another way to compute this variance is to square Equation (2.33) and to average over random patterns:

$$\sigma_C^2 = \frac{1}{N^2}\Big\langle \Big(\sum_{m=1}^{M} c_m\Big)^{\!2}\Big\rangle = \frac{1}{N^2}\sum_{n=1}^{M}\sum_{m=1}^{M} \langle c_n c_m \rangle. \tag{2.38}$$

Figure 2.7: Dependence of the one-step error probability on the storage capacity α according to Equation (2.39).

Here $\langle \cdots \rangle$ denotes the average over random realisations of $c_m$. Since the random numbers $c_m$ are independent for different indices, and because $\langle c_m^2 \rangle = 1$, we have $\langle c_n c_m \rangle = \delta_{nm}$. So only the diagonal terms in the double sum contribute, summing to $M \approx Np$. This yields Equation (2.37).

To determine $P_{\text{error}}^{t=1}$ [Equation (2.32)] we must integrate the distribution of $C$ from 1 to $\infty$:
$$P_{\text{error}}^{t=1} = \frac{1}{\sqrt{2\pi\sigma_C^2}} \int_{1}^{\infty} \mathrm{d}C\, \mathrm{e}^{-\frac{C^2}{2\sigma_C^2}} = \frac{1}{2}\Big[1 - \mathrm{erf}\Big(\sqrt{\frac{N}{2p}}\,\Big)\Big]. \tag{2.39}$$
Here erf is the error function defined as [31]
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_{0}^{z} \mathrm{d}x\, \mathrm{e}^{-x^2}. \tag{2.40}$$

Since $\mathrm{erf}(z)$ increases monotonically as $z$ increases, we conclude that $P_{\text{error}}^{t=1}$ increases as $p$ increases, or as $N$ decreases. This is expected: it is more difficult for the network to distinguish stored patterns when there are more of them. On the other hand, it is easier to differentiate stored patterns if they have more bits. We also see that the one-step error probability depends on $p$ and $N$ only through the combination
$$\alpha \equiv \frac{p}{N}. \tag{2.41}$$
The parameter $\alpha$ is called the storage capacity of the network. Figure 2.7 shows how $P_{\text{error}}^{t=1}$ depends on the storage capacity. For $\alpha = 0.2$, for example, the one-step error probability is slightly larger than 1%.

In the derivation of Equation (2.39) we assumed that the stored patterns are random with independent bits. Realistic patterns are not random. We nevertheless expect that $P_{\text{error}}^{t=1}$ describes the typical one-step error probability of the Hopfield network when $p$ and $N$ are large. However, it is straightforward to construct counterexamples. Consider for example orthogonal patterns:

$$\boldsymbol{x}^{(\mu)} \cdot \boldsymbol{x}^{(\nu)} = 0 \quad \text{for } \mu \ne \nu. \tag{2.42}$$
The cross-talk term vanishes in this case, so that $P_{\text{error}}^{t=1} = 0$.

More importantly, the error probability defined in this Section refers only to the initial update, the first iteration. What happens in the next iteration, and after many iterations? Numerical experiments show that the error probability can be much higher in later iterations, because more errors tend to increase the probability of making another error. So the estimate $P_{\text{error}}^{t=1}$ is only a lower bound for the probability of observing errors in the long run.
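The estimate (2.39) is easy to test numerically. The sketch below stores $p$ random patterns with Hebb's rule (2.26), performs one synchronous update starting from each stored pattern, and compares the fraction of wrong bits with the prediction $\frac{1}{2}[1 - \mathrm{erf}(\sqrt{N/2p})]$. The values of $N$ and $p$ are arbitrary; they give $\alpha = 0.2$ as in the example above.

```python
import numpy as np
from math import erf, sqrt

def one_step_error_rate(N, p, seed=0):
    """Empirical one-step error probability for p random patterns of N bits."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(p, N))        # random patterns x^(mu)
    w = x.T @ x / N                              # Hebb's rule (2.25) ...
    np.fill_diagonal(w, 0.0)                     # ... with zero diagonal, Eq. (2.26)
    b = x @ w                                    # local fields b_i^(nu) for every stored pattern
    wrong = np.where(b >= 0, 1, -1) != x         # bits that come out wrong after one update
    return wrong.mean()

N, p = 200, 40                                   # storage capacity alpha = p/N = 0.2
empirical = one_step_error_rate(N, p)
theory = 0.5 * (1.0 - erf(sqrt(N / (2.0 * p))))  # Eq. (2.39)
print(empirical, theory)                         # both slightly larger than 1%, cf. Figure 2.7
```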

2.5 Energy function

Consider the long-time limit $t \to \infty$. Does the algorithm converge, as required by Equation (2.7)? This is perhaps the most important question in the analysis of neural-network algorithms, because an algorithm that does not converge to a meaningful solution is useless. The standard way of analysing convergence of neural-network algorithms is to define an energy function $H(\boldsymbol{s})$ that has a minimum at the desired solution, $\boldsymbol{s} = \boldsymbol{x}^{(\nu)}$ say. We monitor how the energy function changes as we iterate, and keep track of the smallest values of $H$ encountered, to find the minimum. If we store only one pattern, $p = 1$, then a suitable energy function is

N 1 X ‹2 H s x (1) . (2.43) = 2N i i − i =1

This function is minimal when $\boldsymbol{s} = \boldsymbol{x}^{(1)}$, i.e., when $s_i = x_i^{(1)}$ for all $i$. It is customary to insert the factor $1/(2N)$; this does not change the fact that $H$ is minimal at $\boldsymbol{s} = \boldsymbol{x}^{(1)}$. A crucial point is that the asynchronous McCulloch-Pitts dynamics (2.5) converges to the minimum [13]. This follows from the fact that $H$ cannot increase under (2.5).

To prove this important property, we begin by evaluating the expression on the r.h.s. of Equation (2.43):
$$H = -\frac{1}{2}\sum_{ij}\Big(\frac{1}{N}\, x_i^{(1)} x_j^{(1)}\Big)\, s_i s_j. \tag{2.44}$$
Using Hebb's rule (2.9) we find that the energy function (2.43) becomes
$$H = -\frac{1}{2}\sum_{ij} w_{ij}\, s_i s_j. \tag{2.45}$$

This function has the same form as the energy function (or Hamiltonian) for certain physical models of magnetic systems consisting of interacting spins [32], where the interaction energy between spins $s_i$ and $s_j$ is $-\frac{1}{2}(w_{ij} + w_{ji})\, s_i s_j$. Note that Hebb's rule (2.9) yields symmetric weights, $w_{ij} = w_{ji}$, and $w_{ii} > 0$. Setting the diagonal weights to zero does not change the fact that $H$ is minimal at $\boldsymbol{s} = \boldsymbol{x}^{(1)}$, because $s_i^2 = 1$. The diagonal weights just give a constant contribution to $H$, independent of $\boldsymbol{s}$.

The second step is to show that $H$ cannot increase under the asynchronous McCulloch-Pitts dynamics (2.5). In this case we say that the energy function is a Lyapunov function, or loss function. To demonstrate that the energy function is a Lyapunov function, choose a neuron $m$ and update it according to Equation (2.5).

We denote the updated state of neuron m by $s_m'$:

$s_m' = \mathrm{sgn}\Bigl(\sum_j w_{mj}\, s_j\Bigr)$.   (2.46)

All other neurons remain unchanged. There are two possibilities, either $s_m' = s_m$ or $s_m' = -s_m$. In the first case H remains unchanged, $H' = H$. Here $H'$ refers to the value of the energy function after the update (2.46). When $s_m' = -s_m$, by contrast, the energy function changes by the amount

$H' - H = -\frac{1}{2}\sum_{j \neq m}(w_{mj} + w_{jm})(s_m' s_j - s_m s_j) - \frac{1}{2} w_{mm}(s_m' s_m' - s_m s_m) = \sum_{j \neq m}(w_{mj} + w_{jm})\, s_m s_j$.   (2.47)

The sum goes over all neurons j that are connected to the neuron m, the one to be updated in Equation (2.46). Now if the weights are symmetric, $H' - H$ equals

$H' - H = 2\sum_{j \neq m} w_{mj}\, s_m s_j = 2\sum_{j} w_{mj}\, s_m s_j - 2 w_{mm}$.   (2.48)

Since the sign of $\sum_j w_{mj} s_j$ is that of $s_m' = -s_m$, and since $w_{mm} > 0$, it follows that

$H' - H < 0$.   (2.49)

In other words, the value of H must decrease when the state of neuron m changes, $s_m' \neq s_m$.¹ In summary, H either remains constant under the asynchronous McCulloch-Pitts dynamics ($s_m' = s_m$), or its value decreases ($s_m' \neq s_m$). Note that this does not hold for the synchronous dynamics (2.4), see Exercise 2.9.

¹The derivation outlined here did not use the specific form of Hebb's rule (2.9), only that the weights are symmetric, and that $w_{mm} > 0$. It is clear that the argument works also when $w_{mm} = 0$. However, it may not work when $w_{mm} < 0$. In this case it is still true that H assumes a minimum at $s = x^{(1)}$, but H can increase under the update rule, so that convergence is not guaranteed. We therefore require that the diagonal weights are not negative.

Figure 2.8: Minima of the energy function are attractors in configuration space, the space of all network states. Not all minima correspond to stored patterns ($x^{(\rm mix)}$ is a mixed state, see text), and stored patterns need not correspond to minima.

Since the energy function cannot increase under the asynchronous McCulloch-Pitts dynamics, it must converge to minima of the energy function. For the energy function (2.43) this implies that the dynamics must either converge to the stored pattern or to its inverse. Both are attractors.
We assumed the thresholds to vanish, but the proof also works when the thresholds are not zero, in this case for the energy function

$H = -\frac{1}{2}\sum_{ij} w_{ij}\, s_i s_j + \sum_i \theta_i\, s_i$   (2.50)

in conjunction with the update rule $s_m' = \mathrm{sgn}\bigl(\sum_j w_{mj} s_j - \theta_m\bigr)$.
Up to now we considered only one stored pattern, p = 1. If we store more than one pattern [Hebb's rule (2.25)], the proof that (2.45) cannot increase under the McCulloch-Pitts dynamics works in the same way, because no particular form of the weights $w_{ij}$ was assumed, only that they must be symmetric, and that the diagonal weights must not be negative. Therefore it follows in this case too that the minima of the energy function must correspond to attractors, as illustrated schematically in Figure 2.8. The configuration space of the network, corresponding to all possible choices of $s = [s_1, \ldots, s_N]^{\sf T}$, is drawn as a single axis, the x-axis. But when N is large, the configuration space is really very high dimensional.
However, some stored patterns may not be attractors when p > 1. This follows from our analysis of the cross-talk term in Section 2.2. If the cross-talk term causes errors for a certain stored pattern, then this pattern is not located at a minimum of the energy function. Another way to see this is to combine Equations (2.25) and

(2.45) to give:

$H = -\frac{1}{2N}\sum_{\mu=1}^{p}\Bigl(\sum_{i=1}^{N} s_i\, x_i^{(\mu)}\Bigr)^2$.   (2.51)

While the energy function defined in Equation (2.43) has a minimum at $x^{(1)}$, Equation (2.51) need not have a minimum at $x^{(1)}$ (or at any other stored pattern), because a maximal value of $\bigl(\sum_{i=1}^N s_i x_i^{(1)}\bigr)^2$ may be compensated by terms stemming from other patterns. This happens rarely when p is small (Section 2.2).
Conversely there may be minima that do not correspond to stored patterns. Such states are referred to as spurious states. The network may converge to spurious states. This is undesirable but inevitable. It occurs even when there is only one stored pattern, as we saw in Section 2.2: the McCulloch-Pitts dynamics may converge to the inverted pattern. This follows also from Equation (2.51): if $s = x^{(1)}$ is a local minimum of H, then so is $s = -x^{(1)}$. This is a consequence of the invariance of H under $s \to -s$. There are other types of spurious states besides inverted patterns. One example is mixed states, superpositions of an odd number 2n + 1 of patterns [1]. For n = 1, for example, the bits of a mixed state read:

$x_i^{(\rm mix)} = \mathrm{sgn}\bigl(\pm x_i^{(1)} \pm x_i^{(2)} \pm x_i^{(3)}\bigr)$.   (2.52)

The number of distinct mixed states increases as n increases. There are $2^{2n+1}\binom{p}{2n+1}$ mixed states that are superpositions of 2n + 1 out of p patterns, for n = 1, 2, ... (Exercise 2.4). Mixed states such as (2.52) are sometimes recognised by the network (Exercise 2.5), therefore it may happen that the network converges to these states. Finally, there are spurious states that are not related in any way to the stored patterns $x^{(\mu)}$. Such spin-glass states are discussed in detail in Refs. [27, 33, 34], and also in the book by Hertz, Krogh and Palmer [1].
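The Lyapunov property derived in this Section is easy to check numerically. The following minimal Python sketch (parameter values and variable names are illustrative, not from the book; sgn(0) is taken as +1) stores a few random patterns with Hebb's rule and verifies that the energy (2.45) never increases under asynchronous updates:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 5

# Store p random patterns with Hebb's rule; diagonal weights are positive.
x = rng.choice([-1, 1], size=(p, N))
w = (x.T @ x) / N

def energy(s, w):
    """Energy function (2.45)."""
    return -0.5 * s @ w @ s

# Asynchronous updates from a random initial state: H must never increase.
s = rng.choice([-1, 1], size=N)
H = energy(s, w)
for _ in range(10 * N):
    m = rng.integers(N)
    s[m] = 1 if w[m] @ s >= 0 else -1
    H_new = energy(s, w)
    assert H_new <= H + 1e-12   # Lyapunov property
    H = H_new
```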

2.6 Summary

Hopfield networks are networks of McCulloch-Pitts neurons that recognise, or retrieve, patterns (Algorithm 1). Their layout is defined by connection strengths, or weights, chosen according to Hebb's rule. The weights $w_{ij}$ are symmetric, and the network is in general fully connected. Hebb's rule ensures that stored patterns are recognised, at least most of the time if the number of patterns is not too large. Convergence of the McCulloch-Pitts dynamics is analysed in terms of an energy function, which cannot increase under this dynamics.
A single-step estimate for the error probability of the network dynamics was given in Section 2.2. If one iterates several steps, the error probability is generally much larger, but it is difficult to evaluate. For stochastic Hopfield networks the steady-state error probability can be estimated more easily, because the dynamics converges to a steady state.

Algorithm 1  Pattern recognition with a deterministic Hopfield network
  store patterns $x^{(\mu)}$ using Hebb's rule;
  feed distorted pattern x into the network by assigning s(t = 0) ← x;
  for t = 1, ..., T do
      choose a value of m and update $s_m(t) \leftarrow \mathrm{sgn}\bigl(\sum_{j=1}^N w_{mj}\, s_j(t-1)\bigr)$;
  end for
  read out pattern s(T);
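For concreteness, here is a minimal Python sketch of Algorithm 1; the helper names, the choice of T, and the convention sgn(0) = +1 are illustrative assumptions, not prescribed by the book.

```python
import numpy as np

def hebb_weights(patterns):
    """Hebb's rule: w_ij = (1/N) sum_mu x_i^mu x_j^mu."""
    p, N = patterns.shape
    return patterns.T @ patterns / N

def retrieve(weights, x_distorted, T=1000, seed=0):
    """Algorithm 1: asynchronous deterministic retrieval."""
    rng = np.random.default_rng(seed)
    s = x_distorted.copy()
    N = len(s)
    for _ in range(T):
        m = rng.integers(N)                      # pick a neuron at random
        s[m] = 1 if weights[m] @ s >= 0 else -1  # deterministic update (2.5)
    return s

# usage: store two random patterns and retrieve the first from a distorted copy
rng = np.random.default_rng(1)
x = rng.choice([-1, 1], size=(2, 50))
w = hebb_weights(x)
distorted = x[0].copy()
distorted[:5] *= -1                              # flip a few bits
print(np.array_equal(retrieve(w, distorted), x[0]))
```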

2.7 Exercises

2.1 Modified Hebb’s rule. Show that the modified Hebb’srule (2.26) satisfies Equa- tion (2.8) if we store only one pattern, p = 1.

2.2 Orthogonal patterns. For Hebb's rule (2.25), show that the cross-talk term vanishes for orthogonal patterns, so that $P_{\rm error}^{t=1} = 0$. For the modified Hebb's rule (2.26), show that the cross-talk term is non-zero for orthogonal patterns, but that it is negligible in the limit of large N.

2.3 Cross-talk term. Expression (2.33) for the cross-talk term was derived using the modified Hebb's rule, Equation (2.26). How does Equation (2.33) change if you use the rule (2.25) instead? Show that the distribution of $C_i^{(\nu)}$ then acquires a non-zero mean, obtain an estimate for this mean value, and compute the one-step error probability. Show that your result approaches (2.39) for small values of α. Explain why your result is different from (2.39) for large α.

2.4 Mixed states. Explain why there are no mixed states that are superpositions of an even number of stored patterns. Show that there are $2^{2n+1}\binom{p}{2n+1}$ mixed states that are superpositions of 2n + 1 out of p patterns, for n = 1, 2, ....

2.5 Recognising mixed states. Store p random patterns in a Hopfield network with N = 50 and 100 neurons using Hebb's rule (2.25). By computer simulation determine the probability that the network recognises bit $x_i^{(\rm mix)}$ of the mixed state $x^{(\rm mix)}$ with bits

$x_i^{(\rm mix)} = \mathrm{sgn}\bigl(x_i^{(1)} + x_i^{(2)} + x_i^{(3)}\bigr)$.   (2.53)

Plot your results as a function of α = p/N. For a theoretical analysis, consider the limit of small α. Neglecting cross-talk terms, determine whether the network recognises a mixed state, i.e. whether $\mathrm{sgn}\bigl(\frac{1}{N}\sum_{\mu=1}^{p}\sum_{j=1}^{N} x_i^{(\mu)} x_j^{(\mu)} x_j^{(\rm mix)}\bigr) = x_i^{(\rm mix)}$ holds. Hint: think of $\frac{1}{N}\sum_j x_j^{(\mu)} x_j^{(\rm mix)}$ as an average of $x_j^{(\mu)} x_j^{(\rm mix)}$ over random bits and evaluate this average. Then apply the signum function.

Figure 2.9: Two neurons with asymmetric connections (Exercise 2.6).

Figure 2.10: Heaviside function (Exercise 2.8).

2.6 Energy function. Figure 2.9 shows a network with two neurons with asymmetric weights, $w_{12} = -2$ and $w_{21} = 1$. Show that the energy function $H = -\frac{w_{12}+w_{21}}{2}\, s_1 s_2$ can increase under the asynchronous McCulloch-Pitts rule.

2.7 Higher-order Hopfield networks. Determine under which conditions the energy function $H = -\frac{1}{2}\sum_{ij} w_{ij}^{(2)} s_i s_j - \frac{1}{6}\sum_{ijk} w_{ijk}^{(3)} s_i s_j s_k$ is a Lyapunov function for the modified asynchronous McCulloch-Pitts dynamics: $s_m' = \mathrm{sgn}(b_m)$ for neuron m with $b_m = -\partial H/\partial s_m$.

2.8 Hebb’s rule and energy function for 0/1 units. Suppose that the state of a neu- ron takes the values 0 (inactive) and 1 (active). The corresponding asynchronous P update rule is nm0 = θH( j wm j n j µm ) with threshold µm . The activation function − θH(b ) is the Heaviside function , equal to 0 is b < 0 and equal to 1 if b 0 (Fig- ure 2.10). Write down Hebb’s rule for such 0/1 units and show that if one stores≥ only 1 P P one pattern, then this pattern is recognised. Show that H = 2 i j wi j ni n j + i µi ni cannot increase under the asynchronous update rule (it is assumed− that the weights are symmetric, and that wi i 0). See Ref. [13]. ≥ EXERCISES 33

Figure 2.11: The pattern $x^{(1)}$ has N = 4 bits, $x_1^{(1)} = 1$, and $x_i^{(1)} = -1$ for i = 2, 3, 4. Exercise 2.11.

2.9 Energy function and synchronous dynamics. Analyse how the energy function (2.45) changes under the synchronous dynamics (2.4). Show that the energy function can increase, even though the weights are symmetric and the diagonal weights are zero.

2.10 Continuous Hopfield network. Hopfield [35] also analysed a version of his model with continuous-time dynamics. Here we use $\tau \frac{\rm d}{{\rm d}t} n_i = -n_i + g\bigl(\sum_j w_{ij} n_j - \theta_i\bigr)$ with $g(b) = (1 + e^{-b})^{-1}$ (this dynamical equation is slightly different from the one used by Hopfield [35]). Show that the energy function $E = -\frac{1}{2}\sum_{ij} w_{ij} n_i n_j + \sum_i \theta_i n_i + \sum_i \int_0^{n_i} {\rm d}n\, g^{-1}(n)$ cannot increase under the network dynamics if the weights are symmetric. It is not necessary to assume that $w_{ii} \geq 0$.

2.11 Hopfield network with four neurons. The pattern shown in Fig. 2.11 is stored in a Hopfield network using Hebb's rule $w_{ij} = \frac{1}{N}\, x_i^{(1)} x_j^{(1)}$. There are $2^4$ four-bit patterns. Apply each of these to the Hopfield network, and perform one synchronous update. List the patterns you obtain and discuss your results.

2.12 Recognising letters with a Hopfield network. The five patterns in Figure 2.12 each have N = 32 bits. Store the patterns $x^{(1)}$ and $x^{(2)}$ in a Hopfield network using Hebb's rule $w_{ij} = \frac{1}{N}\sum_{\mu=1}^{2} x_i^{(\mu)} x_j^{(\mu)}$. Which of the patterns in Figure 2.12 remain unchanged after one synchronous update with $s_i' = \mathrm{sgn}\bigl(\sum_{j=1}^N w_{ij} s_j\bigr)$? Hint: read off $\sum_{j=1}^N x_j^{(\mu)} x_j^{(\nu)}$ from the Hamming distances between the patterns, equal to the number of bits by which the two patterns differ. Use this quantity to express the local fields $b_i^{(\mu)}$ as linear combinations of $x_i^{(1)}$ and $x_i^{(2)}$.

2.13 XOR function. The Boolean XOR function takes two binary inputs. For the inputs [-1, -1] and [1, 1] the function evaluates to -1, for the other two to +1. Try to encode the XOR function in a Hopfield network with three neurons by storing the patterns [-1, -1, -1], [1, 1, -1], [-1, 1, 1], and [1, -1, 1] using Hebb's rule. Test whether the patterns are recognised or not. Discuss your findings.

2.14 Distance as a measure of convergence. The distance $d = \frac{1}{4N}\sum_i (s_i - x_i^{(1)})^2$ [Equation (2.2)] has a minimum at $s = x^{(1)}$. How are d and H [Equation (2.43)] related? What is the advantage of using H instead of d as a measure of convergence?

Figure 2.12: Each of the five patterns consists of 32 bits $x_i^{(\mu)}$. A black pixel i in pattern µ corresponds to $x_i^{(\mu)} = 1$, a white one to $x_i^{(\mu)} = -1$. Exercise 2.12.

3 Stochastic Hopfield networks

Two related problems became apparent in the previous Chapter. First, the Hopfield dynamics may get stuck in spurious minima. In fact, if there is a local minimum downhill from a given initial state, between this state and the correct attractor, then the dynamics arrests in the local minimum, so that the algorithm fails to converge to the correct attractor. Second, the energy function usually is a strongly varying function over a high-dimensional configuration space. Therefore it is difficult to predict the first local minimum encountered by the down-hill dynamics of the network. Both problems are solved by introducing an element of stochasticity into the dynamics. This is a trick that works for many neural-network algorithms. In general, however, it is quite challenging to analyse the stochastic dynamics. For the Hopfield network, by contrast, much is known. The reason is that the stochastic Hopfield network is closely related to systems studied in statistical mechanics, so-called spin glasses. Like these systems – and like many other physical systems – the stochastic Hopfield network exhibits an order-disorder transition. This transition becomes sharp in the limit of a large number of neurons. This has important consequences. Suppose that the network produces satisfactory results for a given number of pat- terns with a certain number of bits. If one tries to store just one more pattern, the network may fail to recognise anything. The goal of this Chapter is to explain why this occurs, and how it can be avoided.

3.1 Stochastic dynamics

The asynchronous update rule (2.5) is called deterministic, because a given set of states $s_j$ determines the outcome of the update of neuron m. To introduce noise, one replaces the rule (2.5) by an asynchronous stochastic rule [36]:

$s_m' = \begin{cases} +1 & \text{with probability } p(b_m)\,, \\ -1 & \text{with probability } 1 - p(b_m)\,. \end{cases}$   (3.1a)

Here $b_m = \sum_j w_{mj} s_j - \theta_m$ is the local field, and the probability $p(b)$ is given by:

$p(b) = \frac{1}{1 + e^{-2\beta b}}$.   (3.1b)

The function p(b) is plotted in Figure 3.1. The parameter β is the noise parameter. When β is large, the noise level is small. As β tends to infinity, the function p(b) approaches zero if b is negative, and it tends to unity if b is positive. So for β → ∞,

Figure 3.1: Probability function (3.1b) used in the definition of the stochastic rule (3.1), plotted for β = 10 and β = 0.

the stochastic update rule (3.1) converges to the deterministic rule (2.5). In the opposite limit, when β = 0, the update probability p(b) simply equals 1/2. In this case $s_i$ is updated to -1 or +1 randomly, with equal probability. The dynamics does not depend upon the stored patterns contained in the local field $b_i$.
The idea is to run the network for a small but finite noise level. Then the dynamics is very similar to the deterministic Hopfield dynamics analysed in the previous Chapter. But the noise allows the system to escape spurious minima. However, since the dynamics is stochastic, we must rephrase the convergence criterion (2.7). This is discussed next.
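As an illustration, a minimal Python sketch of the stochastic rule (3.1) might look as follows. The function names and parameter values are illustrative assumptions, and the thresholds are set to zero as in Hebb's rule (2.26).

```python
import numpy as np

def stochastic_update(s, w, beta, rng):
    """One asynchronous stochastic update, Eq. (3.1), of a randomly chosen neuron."""
    m = rng.integers(len(s))
    b_m = w[m] @ s                          # local field (zero threshold)
    p = 1.0 / (1.0 + np.exp(-2.0 * beta * b_m))
    s[m] = 1 if rng.random() < p else -1
    return s

# usage: a few updates of a small network storing a single pattern
rng = np.random.default_rng(0)
N = 10
x = rng.choice([-1, 1], size=N)
w = np.outer(x, x) / N                      # Hebb's rule for one pattern
s = rng.choice([-1, 1], size=N)
for _ in range(100):
    s = stochastic_update(s, w, beta=2.0, rng=rng)
```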

3.2 Order parameters

If we feed one of the stored patterns, $x^{(1)}$ for example, then we want the stochastic dynamics to stay in the vicinity of $x^{(1)}$. This can only work if the noise is weak enough, and even then it is not guaranteed. At time step t, bit i is correct if $s_i(t)\, x_i^{(1)} = 1$. All bits are correct when $\sum_{i=1}^N s_i(t)\, x_i^{(1)} = N$; otherwise the sum takes a value smaller than N. One measures success by averaging $\frac{1}{N}\sum_{i=1}^N s_i(t)\, x_i^{(1)}$ over the asynchronous stochastic dynamics of the network from t = 0 to t = T, for given bits $x_i^{(\mu)}$:

T N 1 X 1 X ‹ m T s t x (µ) . (3.2a) µ( ) = T N i ( ) i t =1 i =1

Since we decided to feed pattern $x^{(1)}$ to the network, we have $m_1(t=0) = 1$ initially, and we want $m_1(t)$ to remain close to unity, so that the network recognises the pattern $x^{(1)}$. In practice, the quantity $\frac{1}{N}\sum_{i=1}^N s_i(t)\, x_i^{(1)}$ settles into a steady state, where it fluctuates around a mean value with a definite distribution that becomes independent of the iteration number t. If the network works well, the finite-time

Figure 3.2: Illustrates how the finite-time average $m_1(T)$ depends upon the total iteration time T. The light gray lines show results for $m_1(T)$ for different realisations of random patterns stored in the network, at a large but finite value of N. The black line is the average of $m_1(T)$ over the different realisations of random patterns.

average $m_1(T)$ converges to a value of order unity after an initial transient (Figure 3.2). One defines

$m_1 \equiv \lim_{T\to\infty} m_1(T) \equiv \frac{1}{N}\sum_{i=1}^{N} \langle s_i \rangle\, x_i^{(1)}$.   (3.2b)

This limit, $m_1$, is called the order parameter. Since there is noise, the order parameter $m_1$ is usually smaller than unity. The last equality in Equation (3.2b) defines the time average $\langle s_i \rangle$ over the stochastic network dynamics.
Figure 3.2 also illustrates a subtlety. For finite values of N, the order parameter $m_1$ depends upon the stored patterns. Different realisations $x^{(1)}, \ldots, x^{(p)}$ of random patterns yield different values of $m_1$. In the limit of $N \to \infty$ this problem does not occur: the order parameter $m_1$ becomes independent of the stored patterns. We say that the system is self averaging in this limit. When N is finite, however, this is not the case. To obtain a definite value for the order parameter, one usually averages $m_1$ over different realisations of random patterns stored in the network (thick solid line in Figure 3.2). The dashed line in Figure 3.2 shows $\langle m_1 \rangle$.
The other components, $m_\mu = \lim_{T\to\infty} m_\mu(T)$ for µ > 1, are expected to be small. This is certainly true for random patterns with many independent bits. Since the bits of the patterns $x^{(2)}$ to $x^{(p)}$ are independent from those of $x^{(1)}$, the individual terms in the sum over i in Equation (3.2) cancel approximately upon summation if $s_i(t) \approx x_i^{(1)}$. In summary, if we feed pattern $x^{(1)}$ and if the network works well, we expect in the limit of large N:

$m_\mu \approx \begin{cases} 1 & \text{if } \mu = 1\,, \\ 0 & \text{otherwise.} \end{cases}$   (3.3)

Whether this is the case or not depends on the values of p, N, and β. In the next Sections we determine how $m_1$ depends on these parameters.

3.3 Mean-field theory

The order parameter is defined as an average over the stochastic dynamics of the network in its steady state (Figure 3.2). It is a challenging task to compute this average because all neurons interact with each other in a nonlinear fashion. Consider neuron number i. The fate of $s_i$ is determined by its local field $b_i$, through Equation (3.1). The difficulty is that the local field in turn depends on the states $s_j$ of all other neurons in the network:¹

$b_i(t) = \sum_{j=1}^{N} w_{ij}\, s_j(t)$.   (3.4)

When N is large, we may assume that $b_i(t)$ remains essentially constant in the steady state, independent of t, because fluctuations of $s_j(t)$ average out when summing over j:

$b_i(t) = \langle b_i \rangle + \text{fluctuations}$.   (3.5)

Since $b_i(t)$ is given by a sum over many random numbers, we appeal to the central-limit theorem [29, 30] and argue that the fluctuations of $b_i(t)$ are of order $\sqrt{N}$. Since $\langle b_i \rangle \sim N$, we ignore the fluctuations in the limit of large N and write

$b_i(t) \approx \langle b_i \rangle = \sum_{j=1}^{N} w_{ij} \langle s_j \rangle = \frac{1}{N}\sum_\mu \sum_{j \neq i} x_i^{(\mu)} x_j^{(\mu)} \langle s_j \rangle$,   (3.6)

using Hebb's rule (2.26) for given patterns $x^{(\mu)}$. The time-averaged local field $\langle b_i \rangle$ is called the mean field. Theories that neglect the fluctuations in Equation (3.5) are called mean-field theories. They require a self-consistent solution, because the average $\langle s_j \rangle$ on the r.h.s. of Equation (3.6) depends on the mean field. Using the stochastic update rule (3.1) we find:

$\langle s_i \rangle = \mathrm{Prob}(s_i = +1) - \mathrm{Prob}(s_i = -1) = p(\langle b_i \rangle) - [1 - p(\langle b_i \rangle)] = \tanh(\beta \langle b_i \rangle)$.   (3.7)

Equations (3.6) and (3.7) yield a set of N non-linear self-consistent equations for $\langle s_i \rangle$,

$\langle s_i \rangle = \tanh(\beta \langle b_i \rangle)$ with $\langle b_i \rangle = \frac{1}{N}\sum_\mu \sum_{j \neq i} x_i^{(\mu)} x_j^{(\mu)} \langle s_j \rangle$.   (3.8)

Recall that the averages $\langle \cdots \rangle$ are time averages, evaluated for given patterns $x^{(\mu)}$.

¹We set the thresholds to zero, as assumed in Hebb's rule (2.26).

An equivalent yet slightly different derivation of the mean-field equations (3.8) is this: suppose we average $s_i$ over the dynamics (3.1) at fixed $s_j \neq s_i$, and then we average over the dynamics of the remaining neurons. This gives $\langle s_i \rangle = \langle \tanh(\beta b_i) \rangle$. Comparing with Equation (3.8), we see that the mean-field approximation corresponds to approximating $\langle \tanh(\beta b_i) \rangle \approx \tanh(\beta \langle b_i \rangle)$.
Now, in order to calculate the order parameters (3.9),

$m_\mu = \frac{1}{N}\sum_{j=1}^{N} \langle s_j \rangle\, x_j^{(\mu)}$,   (3.9)

we must solve the mean-field equations (3.8) to obtain the time averages $\langle s_i \rangle$ in Equation (3.9). To this end we express the mean field $\langle b_i \rangle$ in terms of the order parameters $m_\mu$:

$\langle b_i \rangle = \frac{1}{N}\sum_{\mu=1}^{p}\sum_{j \neq i} x_i^{(\mu)} x_j^{(\mu)} \langle s_j \rangle \approx \sum_{\mu=1}^{p} x_i^{(\mu)}\, m_\mu$.   (3.10)

The last equality is only approximate because the j-sum in the definition of $m_\mu$ contains the term j = i. Whether or not to include this term makes only a small difference in the limit of large N.
Let us first calculate $m_1$ assuming Equation (3.3), neglecting terms with µ ≠ 1 in Equation (3.10). To make sure that these small µ ≠ 1 terms do not add up to a substantial correction to the first term, we must assume that the storage capacity is small enough. For large values of N, the condition is [37]:

$\alpha = \frac{p}{N} \ll \frac{\log N}{N}$.   (3.11)

In this case it is sufficient to keep only the first term on the r.h.s. of Equation (3.10). Together with Equation (3.8), this approximation yields:

$\langle s_i \rangle = \tanh(\beta \langle b_i \rangle) \approx \tanh\bigl(\beta m_1 x_i^{(1)}\bigr)$.   (3.12)

Applying the definition (3.9) of the order parameter, one finds

$m_1 = \frac{1}{N}\sum_{i=1}^{N} \tanh\bigl(\beta m_1 x_i^{(1)}\bigr)\, x_i^{(1)}$.   (3.13)

Using that $\tanh(-z) = -\tanh(z)$ as well as the fact that the bits $x_i^{(\mu)}$ can only assume the values ±1, one obtains:

$m_1 = \tanh(\beta m_1)$.   (3.14)

Figure 3.3: Solutions of the mean-field equation (3.14), solid lines. The critical noise level is βc = 1. The dashed line corresponds to an unstable solution.

This is a self-consistent equation for $m_1$. For β → 0, it has the solution $m_1 = 0$. However, this is not the desired one, because $m_1 = 0$ means that $x^{(1)}$ is not recognised. For β → ∞, by contrast, there are three solutions, $m_1 = 0, \pm 1$. Figure 3.3 shows results of the numerical evaluation of Equation (3.14) for intermediate values of β. Below a critical noise level there are three solutions, namely for β larger than

βc = 1. (3.15)
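Equation (3.14) is easily solved numerically, reproducing the behaviour sketched in Figure 3.3. The following minimal Python sketch (function name and parameter values are illustrative assumptions) finds the non-zero branch by fixed-point iteration:

```python
import numpy as np

def order_parameter(beta, m0=0.99, tol=1e-12, max_iter=10_000):
    """Solve m = tanh(beta*m), Eq. (3.14), by fixed-point iteration.

    Starting close to m = 1 selects the stable non-zero branch,
    which exists for beta > beta_c = 1."""
    m = m0
    for _ in range(max_iter):
        m_new = np.tanh(beta * m)
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

for beta in [0.5, 1.2, 1.5, 2.0]:
    print(f"beta = {beta:.1f}:  m1 = {order_parameter(beta):.4f}")
```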

For $\beta > \beta_c$, the solution $m_1 = 0$ is unstable (this can be shown by computing the derivatives of the free energy of the Hopfield network [1]). Even if we were to start with an initial condition that corresponds to $m_1 = 0$, the network dynamics would not stay there. The other two solutions are stable: when the network is initialised close to $x^{(1)}$, then it converges to $m_1 = O(1)$.
The symmetry of the problem dictates that there must also be a solution with $-m_1$. This solution corresponds to the inverted pattern $-x^{(1)}$ (Section 2.5). If we start in the vicinity of $x^{(1)}$, then the network is unlikely to converge to $-x^{(1)}$, provided that N is large enough. The probability of the dynamical transition $x^{(1)} \to -x^{(1)}$ vanishes very rapidly as N increases and as the noise level decreases. If this transition were to happen in a simulation in this limit, the network would then stay near $-x^{(1)}$ for a very long time. Consider the limit where T tends to ∞ at a finite but large value of N. Then the network would (at a very small rate) jump back and forth between $x^{(1)}$ and $-x^{(1)}$, so that the order parameter would average to zero. This shows that the limits of large N and large T do not commute:

$\lim_{T\to\infty} \lim_{N\to\infty} m_1(T) \neq \lim_{N\to\infty} \lim_{T\to\infty} m_1(T)$.   (3.16)

In practice the interesting limit is the left one, that of a large network run for a time T much longer than the initial transient, but not infinite. This is precisely where the mean-field theory applies. It corresponds to taking the limit N → ∞ first, at finite but large T. This describes simulations where the transition $x^{(1)} \to -x^{(1)}$ does not occur.
In summary, Equation (3.14) predicts that the order parameter converges to a definite value, $m_1$, independent of the stored patterns in the limit N → ∞. When N is finite, the limiting value of the order parameter does depend on the stored patterns (Figure 3.2). In this case one averages also over different realisations of the stored patterns, as mentioned above. The value of this average, $\langle m_1 \rangle$, determines the average error probability $P_{\rm error}^{t=\infty}$ in the steady state, the average fraction of erroneous bits. The steady-state average number of correct bits is given by

$\Bigl\langle \frac{1}{2}\sum_{i=1}^{N}\bigl(1 + \langle s_i \rangle\, x_i^{(1)}\bigr) \Bigr\rangle = \frac{N}{2}\bigl(1 + \langle m_1 \rangle\bigr)$,   (3.17)

because $\frac{1}{2}\bigl(1 + s_i\, x_i^{(1)}\bigr) = 1$ if bit $x_i^{(1)}$ is correct, and equal to zero otherwise. The outer average is over different realisations of random patterns (the inner average is over the network dynamics). The second equality follows from Equation (3.2b). Using Equation (3.17) we deduce that

$P_{\rm error}^{t=\infty} = \frac{1}{2}\bigl(1 - \langle m_1 \rangle\bigr)$.   (3.18)

Since $m_1 \to 1$ as $\beta \to \infty$, the steady-state error probability tends to zero in this limit. This is expected, since the stored patterns $x^{(\mu)}$ are recognised for small enough values of α in the deterministic limit, when the cross-talk term is negligible. But note that the stochastic dynamics slows down as the noise level tends to zero. The lower the noise level, the longer the network remains stuck in local minima, so that it takes longer to reach the steady state, and to sample the steady-state statistics of H.
In the opposite limit, β → 0, the steady-state error probability tends to 1/2, because $m_1 \to 0$. In this noise-dominated limit the stochastic network ceases to function. If one were to assign N bits entirely randomly, then half of them would be correct, on average, $P_{\rm error}^{t=\infty} = \frac{1}{2}$.
It is important to note that noise can also help, because mixed states have lower critical noise levels than the stored patterns $x^{(\mu)}$. This can be seen as follows [1, 33]. To derive the above mean-field result we assumed $m_1 \approx 1$ and $m_\mu \approx 0$ for µ ≠ 1. Mixed states correspond to solutions where an odd number of components of m is non-zero, for example:

$m^{(\rm mix)} = [m, m, m, 0, \ldots, 0]^{\sf T}$.   (3.19)

Neglecting the cross-talk term, the mean-field equation reads

$\langle s_i \rangle = \tanh\Bigl(\beta \sum_{\mu=1}^{p} m_\mu^{(\rm mix)} x_i^{(\mu)}\Bigr)$.   (3.20)

In the limit of β → ∞, the averages $\langle s_i \rangle$ converge to the mixed states (2.52) when $m^{(\rm mix)}$ is given by Equation (3.19). Averaging over the bits of the random patterns one finds:

$m_\mu^{(\rm mix)} = \Bigl\langle x_i^{(\mu)} \tanh\Bigl(\beta \sum_{\nu=1}^{p} m_\nu^{(\rm mix)} x_i^{(\nu)}\Bigr) \Bigr\rangle$.   (3.21)

The numerical solution of Equation (3.21) shows that there is a non-zero solution for $\beta^{-1} < \beta_c^{-1} = 1$. Yet this solution is unstable for $0.46 < \beta^{-1} < 1$ [33]. In other words, the mixed states have a lower critical noise level than the stored patterns, equal to 0.46. For noise levels larger than that, but still smaller than unity, the network can retrieve the stored patterns, and it does not converge to mixed states.
However, these results were obtained assuming that only one (or a few) order parameters are not zero. This corresponds to the limit of α = p/N → 0, where the cross-talk term (Section 2.3) is negligible. The next Section describes a mean-field theory that remains valid for larger values of α.

3.4 Critical storage capacity

The analysis in the preceding Section replaced the sum (3.10) by its first term, $x_i^{(1)} m_1$. This can only work when p/N is small enough. Now we discuss how to treat the case when p/N is not small.
Note that the analysis in Section 2.2 did not assume that p/N is small, but it yielded only the one-step error probability $P_{\rm error}^{t=1}$, and we discussed the storage capacity α = p/N in relation to the one-step error probability. As the network dynamics is iterated, however, the number of errors tends to increase, at least when α is large enough so that the cross-talk term matters. Now we describe how to compute $P_{\rm error}^{t=\infty}$ for general values of the storage capacity α, in order to demonstrate how the errors multiply when α is larger, causing the network to fail.
As before, we store p patterns in the network using Hebb's rule (2.26) and feed pattern $x^{(1)}$ to the network. The aim is to determine the order parameter $m_1$ and the corresponding error probability in the steady state for $p \sim N$, so that α remains finite as N → ∞. In this case we can no longer approximate the sum in Equation (3.10) just by its first term, because the other terms for µ > 1 may sum up to a contribution that is of the same order as $m_1$. Instead we must evaluate all $m_\mu$ to compute the mean field $\langle b_i \rangle$.

The relevant calculation is summarised in Chapter 4 of Ref. [38]. It is also outlined in Section 2.5 of Hertz, Krogh and Palmer [1]. The remainder of this Section follows this outline quite closely. One starts by rewriting the mean-field equations (3.8) in terms of the order parameters mµ. Using

$\langle s_i \rangle = \tanh\Bigl(\beta \sum_\mu x_i^{(\mu)} m_\mu\Bigr)$   (3.22)

we find

$m_\nu = \frac{1}{N}\sum_i x_i^{(\nu)} \langle s_i \rangle = \frac{1}{N}\sum_i x_i^{(\nu)} \tanh\Bigl(\beta \sum_\mu x_i^{(\mu)} m_\mu\Bigr)$.   (3.23)

This coupled set of p non-linear equations is equivalent to the mean-field equations (3.8).
Now feed pattern $x^{(1)}$ to the network. We assume that the network stays close to the pattern $x^{(1)}$ in the steady state, so that $m_1$ remains of order unity. The other $m_\mu$ remain small. When p is large, however, we cannot simply approximate the sum over µ on the r.h.s. of Equation (3.23) by its first term only, because the sum of the remaining (small) terms might not be negligible. Therefore we need to estimate these terms, the other order parameters $m_\mu$ for µ ≠ 1.
The trick is to assume that the pattern bits are random, uncorrelated with mean zero [Equations (2.29) and (2.30)]. In this case the order parameters $m_\mu$, µ = 2, ..., p, become random numbers that fluctuate around zero with variance $\langle m_\mu^2 \rangle$ (this average is over random patterns). We use Equation (3.23) to compute the variance approximately. In the µ-sum on the r.h.s of Equation (3.23) we must treat the term µ = ν separately, because the index ν appears also on the l.h.s. of this equation. Also the term µ = 1 must be treated separately, as before, because µ = 1 is the index of the pattern that is fed to the network.
As a consequence, the calculations of $m_1$ and $m_\nu$ for ν ≠ 1 proceed slightly differently. We begin with the first case. Using that $x_i^{(\mu)} = \pm 1$, and that tanh(z) is an odd function, Equation (3.31) simplifies to:

$m_1 = \frac{1}{N}\sum_i \tanh\Bigl(\beta m_1 + \beta \sum_{\mu\neq 1} x_i^{(\mu)} x_i^{(1)} m_\mu\Bigr)$.   (3.24)

The next steps are similar to the analysis of the cross-talk term in Section 2.2. One assumes that the patterns are random, that their bits $x_i^{(\mu)} = \pm 1$ are independently and identically distributed. In the limit of large N and p, the sums in Equation (3.31) can then be estimated using the central-limit theorem [29, 30]. For random patterns, the variable

$z \equiv \sum_{\mu\neq 1} x_i^{(\mu)} x_i^{(1)} m_\mu$   (3.25)

is a sum of many independent, identically distributed random numbers with mean zero and finite variance. The variable z is therefore approximately Gaussian distributed, with mean zero. As a consequence, the distribution of z is entirely determined by its variance $\sigma_z^2$, and it is independent of i.
Returning to Equation (3.24), one approximates the sum $\frac{1}{N}\sum_i$ as an average over the Gaussian distributed variable z. This yields:

$m_1 = \int \frac{{\rm d}z}{\sqrt{2\pi\sigma_z^2}}\, e^{-z^2/(2\sigma_z^2)}\, \tanh\bigl(\beta m_1 + \beta z\bigr)$.   (3.26)

Equation (3.26) is the desired result, a self-consistent equation for $m_1$ replacing the mean-field equation (3.14).
In order to determine $m_1$, we need to estimate the variance $\sigma_z^2$ featuring in Equation (3.26). To this end, one squares Equation (3.25), resulting in a double sum over

pattern indices, and averages over random patterns. Since the bits $x_i^{(\mu)}$ and $x_i^{(\mu')}$ are independent when µ ≠ µ', only the diagonal in this double sum contributes to the average:

$\sigma_z^2 = \sum_{\mu\neq 1} \langle m_\mu^2 \rangle \approx p\, \langle m_\mu^2 \rangle$ for any µ ≠ 1.   (3.27)

Here we assumed that p is large, and approximated p - 1 ≈ p. To evaluate the variance further, it is necessary to estimate the remaining order parameters. One starts again from Equation (3.31) and writes for ν ≠ 1

$m_\nu = \frac{1}{N}\sum_i x_i^{(\nu)} \tanh\Bigl(\beta x_i^{(1)} m_1 + \beta x_i^{(\nu)} m_\nu + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} m_\mu\Bigr)$
$\phantom{m_\nu} = \frac{1}{N}\sum_i x_i^{(\nu)} x_i^{(1)} \tanh\Bigl(\underbrace{\beta m_1}_{①} + \underbrace{\beta x_i^{(1)} x_i^{(\nu)} m_\nu}_{②} + \underbrace{\beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu}_{③}\Bigr)$.   (3.28)

Consider the three terms in the argument of tanh(...). The term ① is of order unity, it is independent of N. The term ③ may be of the same order, because the sum over µ contains p ∼ N terms. The term ②, by contrast, is small for large values of N. Therefore it is a good approximation to Taylor-expand the argument of tanh(...):

$\tanh\bigl(① + ② + ③\bigr) \approx \tanh\bigl(① + ③\bigr) + ②\,\frac{\rm d}{{\rm d}x}\tanh(x)\Big|_{x = ① + ③} + \ldots$   (3.29)


Using $\frac{\rm d}{{\rm d}x}\tanh(x) = 1 - \tanh^2(x)$ one obtains

$m_\nu = \frac{1}{N}\sum_i x_i^{(\nu)} x_i^{(1)} \tanh\Bigl(\underbrace{\beta m_1}_{①} + \underbrace{\beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu}_{③}\Bigr)$
$\phantom{m_\nu =} + \frac{1}{N}\sum_i x_i^{(\nu)} x_i^{(1)}\, \underbrace{\beta x_i^{(1)} x_i^{(\nu)} m_\nu}_{②}\, \Bigl[1 - \tanh^2\Bigl(\beta m_1 + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu\Bigr)\Bigr]$.   (3.30)

Using the fact that $x_i^{(\mu)} = \pm 1$ and thus $[x_i^{(\mu)}]^2 = 1$, this expression simplifies:

$m_\nu = \frac{1}{N}\sum_i x_i^{(\nu)} x_i^{(1)} \tanh\Bigl(\beta m_1 + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu\Bigr)$
$\phantom{m_\nu =} + \beta m_\nu\, \frac{1}{N}\sum_i \Bigl[1 - \tanh^2\Bigl(\beta m_1 + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu\Bigr)\Bigr]$.   (3.31)

The goal is now to solve for $m_\nu$. Approximating the sum $\frac{1}{N}\sum_i$ in the second line as an average over the Gaussian distributed variable z gives:

$\beta m_\nu \int_{-\infty}^{\infty} \frac{{\rm d}z}{\sqrt{2\pi\sigma_z^2}}\, e^{-z^2/(2\sigma_z^2)} \bigl[1 - \tanh^2\bigl(\beta m_1 + \beta z\bigr)\bigr]$.   (3.32)

Defining the parameter q

$q \equiv \int_{-\infty}^{\infty} \frac{{\rm d}z}{\sqrt{2\pi\sigma_z^2}}\, e^{-z^2/(2\sigma_z^2)}\, \tanh^2\bigl(\beta m_1 + \beta z\bigr)$,   (3.33)

one writes Equation (3.32) as

$\beta m_\nu \Bigl[1 - \int_{-\infty}^{\infty} \frac{{\rm d}z}{\sqrt{2\pi\sigma_z^2}}\, e^{-z^2/(2\sigma_z^2)}\, \tanh^2\bigl(\beta m_1 + \beta z\bigr)\Bigr] \equiv \beta m_\nu (1 - q)$.   (3.34)

Returning to Equation (3.31), we see that it takes the form

$m_\nu = \frac{1}{N}\sum_i x_i^{(\nu)} x_i^{(1)} \tanh\Bigl(\beta m_1 + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu\Bigr) + (1 - q)\,\beta m_\nu$.   (3.35)

Solving for $m_\nu$ one finds for ν ≠ 1:

$m_\nu = \frac{\frac{1}{N}\sum_i x_i^{(\nu)} x_i^{(1)} \tanh\bigl(\beta m_1 + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu\bigr)}{1 - \beta(1 - q)}$.   (3.36)

This expression allows us to compute the variance $\sigma_z^2$, defined by Equation (3.27). Equation (3.36) shows that the average $\langle m_\nu^2 \rangle$ contains a double sum over the bit index i. Since the bits are independent, only the diagonal terms contribute, so that

$\langle m_\nu^2 \rangle \approx \frac{\frac{1}{N^2}\sum_i \tanh^2\bigl(\beta m_1 + \beta \sum_{\mu\neq 1,\,\mu\neq\nu} x_i^{(\mu)} x_i^{(1)} m_\mu\bigr)}{[1 - \beta(1 - q)]^2}$   (3.37)

for ν ≠ 1, but otherwise independent of ν. The numerator is just q/N, from Equation (3.33). So the variance evaluates to

$\sigma_z^2 = \frac{\alpha q}{[1 - \beta(1 - q)]^2}$.   (3.38)

In summary there are three coupled equations, for $m_1$, q, and $\sigma_z$, Equations (3.26), (3.34), and (3.38). They must be solved together to determine how $m_1$ depends on β and α.
To compare with the results described in Section 2.2, we must take the deterministic limit, β → ∞. In this limit, q approaches unity, yet β(1 - q) remains finite [1]. Setting q = 1 in Equation (3.38) but retaining β(1 - q) one finds:

$\sigma_z^2 = \frac{\alpha}{[1 - \beta(1 - q)]^2}$.   (3.39a)

The deterministic limits of Equations (3.34) and (3.26) become [1]:

$\beta(1 - q) = \sqrt{\frac{2}{\pi\sigma_z^2}}\, e^{-m_1^2/(2\sigma_z^2)}$,   (3.39b)

$m_1 = \mathrm{erf}\Bigl(\frac{m_1}{\sqrt{2\sigma_z^2}}\Bigr)$.   (3.39c)

Recall the definition (3.18) of the steady-state error probability. Inserting Equation (3.39c) for $m_1$ into this expression, we find in the deterministic limit:

$P_{\rm error}^{t=\infty} = \frac{1}{2}\Bigl[1 - \mathrm{erf}\Bigl(\frac{m_1}{\sqrt{2\sigma_z^2}}\Bigr)\Bigr]$.   (3.40)

Compare this with Equation (2.39) for the one-step error probability in the deterministic limit. That equation was derived for only one step of the network dynamics, while Equation (3.40) describes the limit of many steps, the long-time or steady-state limit.

Figure 3.4: Error probability as a function of the storage capacity α in the deterministic limit. The one-step error probability $P_{\rm error}^{t=1}$ [Equation (2.39)] is shown as a dashed line, the steady-state error probability $P_{\rm error}^{t=\infty}$ [Equation (3.40)] is shown as a solid line. In the hashed region error avalanches increase the error probability. After Figure 1 in Ref. [34].

Yet it turns out that Equation (3.40) reduces to (2.39) in the limit α → 0. To see this, one solves the set of Equations (3.39) by introducing the variable $y = m_1/\sqrt{2\sigma_z^2}$. One obtains the following one-dimensional equation for y [1, 34]:

$y\,\bigl(\sqrt{2\alpha} + (2/\sqrt{\pi})\, e^{-y^2}\bigr) = \mathrm{erf}(y)$.   (3.41)

The relevant solutions are those satisfying 0 ≤ erf(y) ≤ 1, because the order parameter is restricted to this range (transitions to $-m_1$ do not occur in the limit N → ∞). Figure 3.4 shows the steady-state error probability obtained from Equations (3.40) and (3.41). Also shown is the one-step error probability

$P_{\rm error}^{t=1} = \frac{1}{2}\Bigl[1 - \mathrm{erf}\Bigl(\frac{1}{\sqrt{2\alpha}}\Bigr)\Bigr]$

derived in Section 2.2. We see that $P_{\rm error}^{t=\infty}$ approaches $P_{\rm error}^{t=1}$ for small α. We conclude: in this limit, for small α, the error probability does not increase significantly as one iterates the network dynamics. Errors in earlier iterations have little effect on the probability that later errors occur.
The situation is different at larger values of α. In that case, $P_{\rm error}^{t=1}$ significantly underestimates the steady-state error probability. In the hashed region, errors in the dynamics increase the probability of errors in subsequent steps, giving rise to error avalanches. Figure 3.4 illustrates that the steady-state error probability tends to 1/2 as the parameter α increases beyond a critical value, $\alpha_c$. Equation (3.41) yields

$\alpha_c \approx 0.1379$   (3.42)

Figure 3.5: Phase diagram of the Hopfield network in the limit of large N (schematic). The region with $P_{\rm error}^{t=\infty} < \frac{1}{2}$ is the ordered phase, the region with $P_{\rm error}^{t=\infty} = \frac{1}{2}$ is the disordered phase. After Figure 2 in Ref. [34].

for the critical storage capacity $\alpha_c$. When $\alpha > \alpha_c$, the network produces just noise. When $\alpha < \alpha_c$, by contrast, the network works well; the smaller the storage capacity, the better.
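The value of $\alpha_c$ quoted in Equation (3.42) is easy to reproduce numerically (this is part of Exercise 3.3). A minimal Python sketch, with illustrative function names, uses the fact that the non-zero solution of Equation (3.41) disappears when $\sqrt{2\alpha}$ exceeds the maximum of $\mathrm{erf}(y)/y - (2/\sqrt{\pi})e^{-y^2}$:

```python
import numpy as np
from scipy.special import erf
from scipy.optimize import minimize_scalar

def sqrt_2alpha(y):
    """Solve Eq. (3.41) for sqrt(2*alpha) at given y > 0."""
    return erf(y) / y - (2.0 / np.sqrt(np.pi)) * np.exp(-y**2)

# The non-zero solution exists only while sqrt(2*alpha) lies below the
# maximum of this function; the maximum therefore determines alpha_c.
res = minimize_scalar(lambda y: -sqrt_2alpha(y), bounds=(0.1, 5.0), method="bounded")
alpha_c = 0.5 * sqrt_2alpha(res.x) ** 2
print(f"alpha_c = {alpha_c:.4f}")   # approximately 0.1379, Eq. (3.42)
```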

Figure 3.4 shows that the steady-state error probability changes very abruptly near αc . Suppose we store 137 patterns with 1000 bits in a Hopfield network. Figure 3.4 demonstrates that the network can retrieve the patterns with a comparatively small error probability. However, if we try to store one or two more patterns, the network fails to produce output meaningfully related to the stored patterns. This rapid change is an example of a phase transition. In many physical systems one observes similar transitions between ordered and disordered phases [32].

What happens at higher noise levels? The numerical solution of Equations (3.34), (3.26), and (3.38) shows that the critical storage capacity $\alpha_c$ decreases as the noise level increases (smaller values of β). This is shown schematically in Figure 3.5. Below the solid line the error probability is smaller than 1/2, so that the network operates reliably (although less so as one approaches the phase-transition boundary). Outside this region the error probability equals 1/2. In this region the network fails. In the limit of small α the critical noise level is $\beta_c = 1$. In this regime the network is described by the theory explained in Section 3.3, Equation (3.14).

Alternatively, these two different phases of the Hopfield network are characterised in terms of the order parameter $m_1$. We see that $m_1 \neq 0$ below the solid line, while $m_1 = 0$ above it.

3.5 Beyond mean-field theory

The theory summarised in this Chapter rests on a mean-field approximation for the local field, Equation (3.6). The main result is the phase diagram shown in Figure 3.5, derived in the limit N → ∞. For smaller values of N one expects the transition to be less sharp, so that $m_1$ is non-zero also for values of α larger than the critical storage capacity $\alpha_c$.
But even for large values of N, the question remains how accurate the mean-field theory really is. To answer this question, one uses a more accurate theory, based on the so-called replica trick. One starts from the steady-state distribution of s for fixed patterns $x^{(\mu)}$. In Chapter 4 we will see that the steady-state distribution for the McCulloch-Pitts dynamics is the Boltzmann distribution

$P_{\rm B}(s) = Z^{-1} e^{-\beta H(s)}$   (3.43)

(the proof in Chapter 4 assumes that the diagonal weights are set to zero). The normalisation factor Z is called the partition function

$Z = \sum_s e^{-\beta H(s)}$.   (3.44)

In order to compute the order parameter, one adds a threshold term to the energy function (2.45):

$H = -\frac{1}{2}\sum_{ij} w_{ij}\, s_i s_j + \sum_\mu \lambda_\mu \sum_i x_i^{(\mu)} s_i$.   (3.45)

Then the order parameter $m_\mu$ is obtained by taking a derivative w.r.t. $\lambda_\mu$:

$m_\mu = \Bigl\langle \frac{1}{N}\sum_i x_i^{(\mu)} \langle s_i \rangle \Bigr\rangle = -\frac{1}{N\beta}\, \frac{\partial}{\partial\lambda_\mu} \langle \log Z \rangle$.   (3.46)

The outer average is over different realisations of random patterns. The logarithm of Z is averaged using the replica trick. The idea is to represent the average of the logarithm as

$\langle \log Z \rangle = \lim_{n\to 0} \frac{1}{n}\bigl(\langle Z^n \rangle - 1\bigr)$.   (3.47)

The function $\langle Z^n \rangle$ looks like the partition function of n copies of the system, hence the name replica trick. If one assumes that all copies yield the same order parameter, one obtains the mean-field solution described in Section 3.4. If one allows different copies to have different order parameters (replica-symmetry breaking), one obtains a more accurate solution for the critical storage capacity [39],

$\alpha_c = 0.138187$.   (3.48)

The mean-field result (3.42) differs only slightly from Equation (3.48). The most precise Monte-Carlo simulations (Section 4.2) for finite values of N [40] yield upon extrapolation to N → ∞

$\alpha_c = 0.143 \pm 0.002$.   (3.49)

This is close to, yet significantly different from the best theoretical estimate, Equation (3.48), and also different from the mean-field result (3.42).
To put these results into context, note that mean-field theories tend to give much worse results for other systems. Usually, mean-field theories yield at best a qualitative description of a phase transition. For the Hopfield network, by contrast, the mean-field theory works very well because every neuron is connected with every other neuron. This helps to average out the fluctuations in Equation (3.6). In physical systems with local interactions, mean-field theories tend to work better in higher dimensions, because there are more neighbours to average over (Exercise 3.5).

3.6 Correlated and non-random patterns

In the two previous Sections we assumed that the stored patterns are random with independent, identically distributed bits. This allowed us to calculate the storage capacity of the Hopfield network using the central-limit theorem. The hope is that the result describes what happens for typical, non-random patterns, or for random patterns with correlated bits. Correlations affect the distribution of the cross-talk term, and thus the storage capacity of the Hopfield network. It has been argued that the storage capacity increases when the patterns are more strongly correlated, while others have claimed that the capacity decreases in this limit (see Ref. [41] for a discussion).
For a set of definite patterns (no randomness to average over), the situation seems to be even more challenging. Yet there is a way of modifying Hebb's rule to deal with this problem, at least when the patterns are linearly independent. The recipe is explained by Hertz, Krogh, and Palmer [1]. One simply incorporates the overlaps

$Q_{\mu\nu} = \frac{1}{N}\, x^{(\mu)} \cdot x^{(\nu)}$   (3.50)

into Hebb's rule. To this end one defines the p × p overlap matrix $\mathbb{Q}$ with elements $Q_{\mu\nu}$ and writes:

$w_{ij} = \frac{1}{N}\sum_{\mu\nu} x_i^{(\mu)} \bigl[\mathbb{Q}^{-1}\bigr]_{\mu\nu} x_j^{(\nu)}$.   (3.51)

For orthogonal patterns ($Q_{\mu\nu} = \delta_{\mu\nu}$), this modified Hebb's rule is identical to Equation (2.25). For non-orthogonal patterns, the rule (3.51) ensures that all patterns are recognised. Equation (3.51) requires that the matrix $\mathbb{Q}$ is invertible: its columns must be linearly independent (and this implies that the rows are linearly independent too). This limits the number of patterns one can store with the rule (3.51), because p > N implies linear dependence.
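As a concrete illustration, a minimal Python sketch of the rule (3.51), with illustrative function names, stores correlated but linearly independent random patterns and checks that each of them is a fixed point of one deterministic update (there is no cross-talk, cf. Exercise 3.4):

```python
import numpy as np

def hebb_with_overlaps(patterns):
    """Modified Hebb's rule (3.51): w = (1/N) x^T Q^{-1} x with Q from Eq. (3.50)."""
    p, N = patterns.shape
    Q = patterns @ patterns.T / N        # overlap matrix, Eq. (3.50)
    return patterns.T @ np.linalg.inv(Q) @ patterns / N

# usage: non-orthogonal but linearly independent random patterns
rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(10, 200))
w = hebb_with_overlaps(x)

# every stored pattern should be reproduced exactly by one deterministic update
s = np.where(w @ x.T >= 0, 1, -1).T
print(np.array_equal(s, x))              # expected: True
```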

For linearly independent patterns one can find the weights $w_{ij}$ iteratively, by successive improvement from an arbitrary starting point. We can say that the network learns the task through a sequence of weight changes. This is the idea used to solve classification tasks with perceptrons (Part II).

3.7 Summary

In this Chapter we analysed the stochastic dynamics of Hopfield networks. We asked under which circumstances the network dynamics can reliably retrieve stored patterns. For random patterns, the performance of the Hopfield network depends on the number of stored patterns, on the number of bits per pattern, and upon the noise level. The storage capacity α equals the ratio of the number of stored patterns to the number of bits per pattern. The network operates reliably when this ratio is small, and provided the noise level is not too large. A mean-field analysis of the limit N → ∞ shows that there is a phase transition in the parameter plane of the Hopfield network (Figure 3.5): when α exceeds the critical storage capacity $\alpha_c$, the network ceases to function.
Hopfield networks share many properties with the networks discussed later on in this book. The most important point is perhaps that introducing noise in the dynamics allows one to study the convergence and performance of the network: in the presence of noise there is a well-defined steady state that can be analysed. Without noise, in the deterministic limit, the network dynamics arrests in local minima of the energy function, and may not reach the stored patterns. Naturally the noise must be small enough for the network to function reliably.
Finally, the building blocks of Hopfield networks are McCulloch-Pitts neurons and Hebb's rule for the weights. Many of the algorithms discussed in the coming Chapters use these elements in some form.

3.8 Further reading

The statistical mechanics of Hopfield networks is explained in Introduction to the theory of neural computation by Hertz, Krogh, and Palmer [1]. Starting from the Boltzmann distribution, Chapter 10 in this book explains how to compute the order parameters, and how to evaluate the stability of the corresponding solutions.

Figure 3.6: The Ising model is a model for ferromagnetism. It describes N spins that can either point up (↑) or down (↓), arranged on a lattice (here shown in one spatial dimension), interacting with their nearest neighbours with interaction strength J, and subject to an external magnetic field h. The state of spin i is described by the variable $s_i$, with $s_i = 1$ for ↑ and $s_i = -1$ for ↓.

For more details on the replica trick, see the books by Müller, Reinhardt, and Strickland [37] and by Engel and van den Broeck [42], as well as the review article [43].

3.9 Exercises

3.1 Mixed states. Write a computer program that implements the stochastic dynamics of a Hopfield model. Compute the order parameter for mixed states that are superpositions of the bits of three stored patterns. Determine how it depends on the noise level for 0.5 ≤ β ≤ 2.5, small α, and large N. Solve the mean-field equation (3.21) numerically, and compare the results of this mean-field theory with those of your computer simulations. Repeat the exercise for mixed states that consist of superpositions of the bits of five stored patterns.

3.2 Deterministic limit. Derive the deterministic limit (3.39) of the three coupled Equations (3.34), (3.38), and (3.26) for $m_1$, q, and $\sigma_z$.

3.3 Phase diagram of the Hopfield network. Derive Equation (3.41) from Equation (3.39). Numerically solve (3.41) to find the critical storage capacity $\alpha_c$ in the deterministic limit. Quote your result with three-digit accuracy. To determine how the critical storage capacity depends on the noise level, numerically solve the three coupled Equations (3.26), (3.33), and (3.38). Compare your result with the schematic Figure 3.5.

3.4 Non-orthogonal patterns. Show that the rule (3.51) ensures that all patterns are recognised, for any set of non-orthogonal patterns that gives rise to an invertible matrix $\mathbb{Q}$. Demonstrate this by showing that the cross-talk term evaluates to zero, assuming that $\mathbb{Q}^{-1}$ exists.

3.5 Ising model. The Ising model is a model for ferromagnetism. N spins $s_i = \pm 1$ are arranged on a d-dimensional hypercubic lattice as shown in Figure 3.6. The energy function for the Ising model is $H = -J\sum_{i,\, j = {\rm nn}(i)} s_i s_j - h\sum_i s_i$. Here J is the ferromagnetic coupling between nearest-neighbour spins, h is an external magnetic field, and nn(i) denotes the nearest neighbours of site i on the lattice. In equilibrium at temperature T, the states are distributed according to the Boltzmann distribution with $\beta = 1/(k_{\rm B}T)$. Derive a mean-field approximation for the magnetisation of the system, $m = \frac{1}{N}\sum_i \langle s_i \rangle$, assuming that N is large enough that the contribution of the boundary spins can be neglected. Derive an expression for the critical temperature below which mean-field theory predicts that the system becomes ferromagnetic, m ≠ 0. Discuss how the critical temperature depends on the dimension d. Note: mean-field theory fails for the one-dimensional Ising model. Its predictions become more accurate as d increases.

3.6 Storage capacity. Derive the condition (3.11) that allows one to neglect the cross-talk terms in Equation (3.10).

4 The Boltzmann distribution

In Chapter 2 we saw that the deterministic dynamics (2.5) of Hopfield networks admits the Lyapunov function

$H = -\frac{1}{2}\sum_{ij} w_{ij}\, s_i s_j + \sum_i \theta_i\, s_i$,   (4.1)

if the weights $w_{ij}$ are symmetric, and $w_{ii} > 0$.¹ In this Chapter we show that the asynchronous stochastic McCulloch-Pitts dynamics (3.1) converges to a steady state where the state vector s follows the Boltzmann distribution

$P_{\rm B}(s) = Z^{-1} e^{-\beta H(s)}$ with normalisation $Z = \sum_s e^{-\beta H(s)}$.   (4.2)

The stochastic dynamics (3.1) is closely related to that of Markov-chain Monte-Carlo algorithms, designed to efficiently sample from the Boltzmann distribution. In this Chapter we also discuss how to solve optimisation tasks by Monte-Carlo simulation: one assigns a suitable energy H to each configuration s, so that the function H(s) has a global minimum for the optimal configuration $s_{\rm min}$. The stochastic dynamics finds low-energy configurations (but not necessarily $s_{\rm min}$), in particular if one iteratively decreases the noise level by increasing β (simulated annealing [44]).
Last but not least we look at Boltzmann machines [14, 15, 45–47], stochastic Hopfield networks with hidden neurons that are neither used for input nor for output. Boltzmann machines can be trained to learn the properties of a distribution $P_{\rm data}(x)$ of binary input patterns x. The idea is to iteratively change the weights in Equation (4.1) until the Boltzmann distribution represents the input distribution. This idea, to iterate the weights until the network learns the input distribution

$P_{\rm data}$, is used in a slightly different form in supervised learning (Part II). Boltzmann machines are closely related to Hopfield networks. Both models learn to represent two-point correlations $\langle x_i^{(\mu)} x_j^{(\mu)} \rangle$ of pattern bits.
When important information about the inputs is encoded in higher-order correlations, one can use hidden neurons to represent these correlations. Generally Boltzmann machines are hard to train, in particular if they have many hidden neurons. Restricted Boltzmann machines are neural networks with hidden neurons, but with fewer connections: only those between visible and hidden neurons are allowed. These neural networks can be fairly efficiently trained and can solve a number of different tasks. Apart from learning a distribution of input patterns, they can for instance be trained to recognise incomplete input patterns, and to classify inputs [25].

¹In this Chapter we set the diagonal weights to zero.

4.1 Convergence of the stochastic dynamics

We begin by showing that the stochastic dynamics (3.1) has a steady state where s is distributed according to the Boltzmann distribution (4.2). To this end, we consider an alternative yet equivalent formulation of the network dynamics. It consists of two steps. First, choose a neuron randomly, number m say. Second, update $s_m$ to $s_m' \neq s_m$ with probability

$\mathrm{Prob}(s_m \to s_m') = \frac{1}{1 + e^{\beta \Delta H_m}}$,   (4.3a)

with

$\Delta H_m = H(\ldots, s_m', \ldots) - H(\ldots, s_m, \ldots)$.   (4.3b)

To explore the relation between the stochastic rules (4.3) and (3.1), we use that

$\Delta H_m = -b_m\,(s_m' - s_m)$   (4.4)

with local field $b_m = \sum_j w_{mj} s_j - \theta_m$. To derive Equation (4.4), we must assume that the weights are symmetric, and that the diagonal weights vanish. The result is obtained with a calculation similar to the one leading to Equation (2.48), except that we have non-zero thresholds here.
Now we break the rule (4.3) up into different cases. The state of neuron m changes with probability

if $s_m = -1$, obtain $s_m' = +1$ with prob. $\frac{1}{1 + e^{-2\beta b_m}} = p(b_m)$,   (4.5a)

if $s_m = +1$, obtain $s_m' = -1$ with prob. $\frac{1}{1 + e^{2\beta b_m}} = 1 - p(b_m)$.   (4.5b)

In the second row we used that $1 - p(b) = 1 - \frac{1}{1 + e^{-2\beta b}} = \frac{e^{-2\beta b}}{1 + e^{-2\beta b}} = \frac{1}{1 + e^{2\beta b}}$. The state remains unchanged with probability:

if $s_m = -1$, obtain $s_m' = -1$ with prob. $1 - p(b_m)$,   (4.5c)

if $s_m = +1$, obtain $s_m' = +1$ with prob. $p(b_m)$.   (4.5d)

Comparing with Equation (3.1) we conclude that the two schemes (3.1) and (4.3) are equivalent under the assumptions made ($w_{ij} = w_{ji}$ and $w_{ii} = 0$). Note that Equation (4.3) is more general than the stochastic Hopfield dynamics, because it does not require the energy function to be of the form (4.1). In particular it is neither needed that the weights are symmetric, nor that the diagonal weights are non-negative. Equations (3.1) and (4.3) are not equivalent if these conditions are not satisfied (Exercise 4.1).

The rule (4.3) defines a Markov chain of states

$s_{t=0} \to s_{t=1} \to s_{t=2} \to \ldots$   (4.6)

As before, the index t counts the iteration steps. A Markov chain is a memoryless random sequence of states defined by transition probabilities $p(s'|s)$ from state s to s' [48]. The transition probability $p(s'|s)$ connects arbitrary states. One distinguishes between local moves where only one neuron may change, as above, and global moves where many neurons may change their states in a single step.

An update consists of two steps. First, a new state s' is suggested with probability $q(s'|s)$. Second, the new state s' is accepted with acceptance probability

$p_{\rm a}(s'|s) = \frac{1}{1 + e^{\beta \Delta H}}$ with $\Delta H = H(s') - H(s)$.   (4.7)

As a result, the transition probability is given by a product of two factors

$p(s'|s) = q(s'|s)\, p_{\rm a}(s'|s)$.   (4.8)

These steps are repeated many times, creating the sequence of states (4.6). The Markov chain defined by the transition probability (4.8) has the Boltzmann distribution (4.2) as a steady-state distribution if the detailed-balance condition is satisfied:

$p(s'|s)\, P_{\rm B}(s) = p(s|s')\, P_{\rm B}(s')$.   (4.9)

Note that this is a sufficient condition, not a necessary one [49]. There are Markov chains that do not satisfy detailed balance but still have a steady state (Exercise 4.4). Usually detailed balance implies not only that the Markov chain has $P_{\rm B}(s)$ as a steady-state distribution, but also that the distribution of states generated by the sequence (4.6) converges to $P_{\rm B}(s)$, see Ref. [48] for details.
To prove that the detailed-balance condition (4.9) holds for the transition probability (4.8), assume that a single neuron is picked randomly with uniform probability

$q = N^{-1}$,   (4.10)

where N is the number of neurons in the network. Since q does not depend on either s or s', the probability of suggesting a new state is clearly symmetric. Equations (4.2) and (4.7) then imply:

$\frac{q\, e^{-\beta H(s)}}{1 + e^{\beta[H(s') - H(s)]}} = \frac{q}{e^{\beta H(s')} + e^{\beta H(s)}} = \frac{q\, e^{-\beta H(s')}}{1 + e^{\beta[H(s) - H(s')]}}$.   (4.11)

This demonstrates that the Boltzmann distribution is a steady state of the Markov chain defined by (4.3). If the simulation converges to the steady state (as it usually does), then states visited by the Markov chain are distributed according to the Boltzmann distribution. This also means that the steady-state distribution for the Hopfield model is the Boltzmann distribution, as stated in Section 3.5.
It is important to stress that Equation (4.9) is a condition for the transition probability $p(s'|s) = q(s'|s)\, p_{\rm a}(s'|s)$, not just for the acceptance probability $p_{\rm a}(s'|s)$. For the local moves discussed above, q is a constant, so that $p(s'|s) \propto p_{\rm a}(s'|s)$. In this case it is sufficient to check the detailed-balance condition for the acceptance probability.

In general, and in particular for global moves, it is necessary to include q(s'|s) in the detailed-balance check [50].

4.2 Monte-Carlo simulation

The Markov chain described in the previous Section is the basis for the Markov-chain Monte-Carlo algorithm. This method is widely used in statistical physics and in machine learning. It is therefore important to understand the connections between the different formulations.

The Boltzmann distribution describes the probabilities of observing configurations of a large class of physical systems in their steady states [32]. The statistical mechanics of systems with energy function (also called Hamiltonian) H shows that their configurations are distributed according to the Boltzmann distribution in thermodynamic equilibrium at a given temperature T (in this context β^{−1} = k_B T, where k_B is the Boltzmann constant), and free from any other constraints. If we denote the configuration of a system by the vector s, then the Boltzmann distribution takes the form (4.2). The normalisation factor Z = Σ_s e^{−βH(s)} is also called the partition function. For systems with a large number of interacting degrees of freedom, the partition function can be very expensive to compute, because the sum over s contains many terms. Therefore, instead of computing the distribution directly one generates a Markov chain of states with a suitable transition probability, for instance (4.3). In practice one often uses a slightly different form of the transition probabilities (Metropolis algorithm [51]). Assuming that q is constant, one takes:

p(s'|s) = q e^{−βΔH}  when ΔH > 0,    and    p(s'|s) = q  when ΔH ≤ 0,    (4.12)

with ΔH = H(s') − H(s) as before. Equation (4.12) has the advantage that moves are accepted more frequently, compared with Equation (4.7). That the Metropolis rates obey the detailed-balance condition (4.9) can be seen as follows:

p(s'|s) P_B(s) = q Z^{−1} e^{−βH(s)} × { e^{−β[H(s')−H(s)]} if H(s') > H(s); 1 otherwise }
             = q Z^{−1} e^{−β max{H(s), H(s')}}    (4.13)
             = q Z^{−1} e^{−βH(s')} × { e^{−β[H(s)−H(s')]} if H(s) > H(s'); 1 otherwise }
             = p(s|s') P_B(s').

The Metropolis algorithm is summarised in Algorithm 2. It provides an elegant way of computing the average ⟨A⟩ of an observable A(s) over the Boltzmann distribution of s:

⟨A⟩ = Z^{−1} Σ_s A(s) e^{−βH(s)} ≈ (1/T) Σ_{t=1}^{T} A(s_t).    (4.14)

This particular way of evaluating the average ⟨A⟩ is a special case of the more general method of importance sampling [52]. The central-limit theorem implies that the error of this estimate for ⟨A⟩ decreases ∝ T^{−1/2} as T increases. The prefactor is determined by the correlations between subsequent terms in the sum: the states in the sequence (4.6) are correlated, in particular when the moves are local, because then subsequent configurations are similar. Generating many strongly correlated samples from a distribution is not a very efficient way of sampling this distribution. Sometimes it may therefore be more efficient to suggest global moves instead, in order to avoid that subsequent states in the Markov chain are similar. But it is not guaranteed that global moves lead to weaker correlations. For global moves, ΔH may be more likely to assume large positive values, so that fewer suggested moves are accepted. As a consequence the Markov chain may stay in certain states for a long time, increasing correlations in the sequence. Usually a compromise is most efficient: moves that are neither local nor global.

In summary, the convergence of Monte-Carlo sampling is quite slow. This motivated Sokal to begin his lecture notes on Monte-Carlo simulation with the warning [49]: Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse. Monte-Carlo algorithms are very widely used, and the original reference for the Metropolis algorithm [51] is generally considered one of the most significant scientific papers in computational physics. Sokal's point is of course that many problems cannot be solved in any other way, so that Monte-Carlo simulation is the only option. But we should be aware of the shortcomings of the method. The same caution applies more generally to the topic of this book, machine-learning algorithms with neural networks.

Algorithm 2 Metropolis algorithm for symmetric q(s'|s)
  initialise s = s_0;
  for t = 1,...,T do
    suggest a new state s' with probability q(s'|s);
    compute ΔH = H(s') − H(s);
    if ΔH ≤ 0 then
      accept the new state: s = s';
    else
      draw a random number r uniformly distributed in [0,1];
      if r < exp(−βΔH) then
        accept the new state: s = s';
      else
        reject s';
      end if
    end if
    sample s_t = s and A_t = A(s_t);
  end for
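As an illustration, here is a minimal sketch of Algorithm 2 in Python, applied to a Hopfield-type energy H(s) = −(1/2) s·Ws with symmetric weights and local single-spin flips; the function names, the example weights, and the parameter values are illustrative assumptions only.

import numpy as np

def metropolis(H, s0, propose, beta, T, rng):
    """Metropolis sampling (Algorithm 2) for a symmetric proposal distribution."""
    s = s0.copy()
    samples = []
    for _ in range(T):
        s_new = propose(s, rng)                  # suggest a new state
        dH = H(s_new) - H(s)
        if dH <= 0 or rng.random() < np.exp(-beta * dH):
            s = s_new                            # accept
        samples.append(s.copy())                 # sample s_t
    return samples

# Example: Hopfield-type energy with symmetric weights, local (single-spin) moves.
rng = np.random.default_rng(0)
N = 10
W = rng.normal(size=(N, N)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)

def H(s):                                        # H(s) = -1/2 s.W s
    return -0.5 * s @ W @ s

def flip_one(s, rng):                            # local move: flip one randomly chosen neuron
    s_new = s.copy()
    s_new[rng.integers(len(s))] *= -1
    return s_new

s0 = rng.choice([-1.0, 1.0], size=N)
chain = metropolis(H, s0, flip_one, beta=2.0, T=5000, rng=rng)
m_avg = np.mean([s.mean() for s in chain])       # estimate of <A> for A(s) = mean(s), Eq. (4.14)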

4.3 Simulated annealing

Combinatorial optimisation problems admit 2^k or k! configurations – too many to find the optimal one by complete enumeration when k is large. An alternative strategy is to assign an energy H(s) to each configuration s and to minimise H(s) by Monte-Carlo simulation, using the fact that rejection sampling with Equation (4.12) tends to decrease H when the temperature k_B T = β^{−1} is low (Figure 4.1).

Figure 4.1: Schematic. Simulated annealing (arrows) tends to reduce the energy function. Noise helps to avoid that the dynamics arrests in a local minimum.

A common strategy is to lower the temperature on the fly. In the beginning of the simulation, the temperature is high, so that the dynamics first explores the rough features of the energy landscape. When the temperature is lowered, the dynamics perceives finer and finer features of H(s). The hope is that it ends up in the global minimum H_min = H(s_min) at zero temperature, where P_B(s) = 0 when H(s) > H_min and P_B(s) > 0 only for H(s) = H_min. This method is called simulated annealing [44], see also Section 10.9 in Numerical Recipes [53]. Slowly lowering the temperature during the simulation mimics the slow cooling of a physical system. It passes through a sequence of quasi-equilibrium Boltzmann distributions with lower and lower temperatures, until the system finds the global minimum H_min.

For a number of combinatorial optimisation problems one can write down energy functions that have the same form as Equation (2.50) with symmetric weights [54]. Since s_i^2 = 1, one can always assume that the diagonal weights vanish, because they make only a constant contribution to H. In short, one can use the Hopfield dynamics (3.1) to minimise H. The travelling-salesman problem has been solved in this way [1,54], gradually reducing the noise level as one iterates the stochastic dynamics. It is by no means necessary to use a Hopfield model for this purpose. Instead we can just use the stochastic dynamics (4.3) or the Metropolis algorithm (4.12) to solve combinatorial optimisation problems by simulated annealing. Nevertheless, a crucial step is to find a suitable energy function.

As an example, consider the double-digest problem. It arose when sequencing the human genome [55,56]. The human genome sequence was first assembled by piecing together overlapping DNA segments in the right order, by making sure that overlapping segments share the same DNA sequence. To this end it is necessary to uniquely identify the DNA segments. The actual DNA sequence of a segment is a unique identifier. But it is sufficient and more efficient to identify a DNA segment by a fingerprint, for example the sequence of restriction sites. These are short subsequences (four or six base pairs long) that are recognised by enzymes that cut (digest) the DNA strand precisely at these sites. A DNA segment is identified by the types and locations of restriction sites that it contains, the so-called restriction map.

When a DNA segment is cut by two different enzymes one can experimentally determine the lengths of the resulting fragments. Is it possible to determine how the cuts were ordered in the DNA sequence of the segment from the fragment lengths, to find the restriction map? This is the double-digest problem [55]. In a double-digest experiment, a given DNA sequence is first digested by one enzyme (A, say). Assume that this results in n fragments with lengths a_i (i = 1,...,n). Second, the DNA sequence is digested by another enzyme, B. In this case m fragments are found, with lengths b_1, b_2,..., b_m. Third, the DNA sequence is digested with both enzymes A and B, yielding l fragments with lengths c_1,..., c_l, see Table 4.1 for examples.

L = 10000
  a = [5976, 1543, 1319, 1120, 42]
  b = [4513, 2823, 2057, 607]
  c = [4513, 1543, 1319, 1120, 607, 514, 342, 42]

L = 20000
  a = [8479, 4868, 3696, 2646, 169, 142]
  b = [11968, 5026, 1081, 1050, 691, 184]
  c = [8479, 4167, 2646, 1081, 881, 859, 701, 691, 184, 169, 142]

L = 40000
  a = [9979, 9348, 8022, 4020, 2693, 1892, 1714, 1371, 510, 451]
  b = [9492, 8453, 7749, 7365, 2292, 2180, 1023, 959, 278, 124, 85]
  c = [7042, 5608, 5464, 4371, 3884, 3121, 1901, 1768, 1590, 959, 899, 707, 702, 510, 451, 412, 278, 124, 124, 85]

Table 4.1: Example configurations for the double-digest problem [55] for three different chromosome lengths L. For each example, three ordered fragment sets are given, corresponding to the result of digestion with A, with B, and with both A and B.

The task is now to determine all possible orderings of the a- and b-cuts that result in l fragments with lengths c_1, c_2,..., c_l. Since the solutions of the double-digest problem are degenerate, an important question is to determine how many distinct solutions there are (Exercise 4.5). To write down an energy function, denote the ordered set of fragment lengths produced by digesting with enzyme A by a = {a_1,...,a_n}, where a_1 ≥ a_2 ≥ ... ≥ a_n ≥ 1. Similarly b = {b_1,...,b_m} (b_1 ≥ b_2 ≥ ... ≥ b_m ≥ 1) for the fragment lengths produced by enzyme B, and c = {c_1,...,c_l} (c_1 ≥ c_2 ≥ ... ≥ c_l ≥ 1) for the fragment lengths produced by digesting first with A and then with B. Permutations σ and µ of the sets a and b result in a set of c-fragments that we call ĉ(σ,µ). Solutions of the double-digest problem correspond to permutation pairs [σ,µ] that yield ĉ(σ,µ) = c. A suitable energy function is therefore

H(σ,µ) = Σ_j c_j^{−1} [c_j − ĉ_j(σ,µ)]^2,    (4.15)

and configuration space is the space of all permutation pairs s = [σ,µ]. Local moves in configuration space correspond to inversions of short subsequences of σ and/or µ. One can show that the corresponding q(s'|s) is symmetric (Exercise 4.5). As mentioned above, this is necessary for the stochastic dynamics to converge in its simplest form. For the simulation one uses Algorithm 2, choosing a larger temperature k_B T = β^{−1} to begin with, so that the stochastic dynamics explores the rough features of the energy landscape. As the simulation proceeds, the temperature is gradually reduced. This allows the dynamics to learn finer features of the landscape, as described above.
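As a rough illustration, the following Python sketch evaluates the energy (4.15) for a permutation pair and anneals it with the Metropolis rule of Algorithm 2. The helper functions, the penalty for mismatching fragment counts, and the linear cooling schedule are illustrative assumptions, not a prescription from the text.

import numpy as np

def c_hat(a_perm, b_perm):
    """Double-digest fragment lengths implied by ordered a- and b-cuts."""
    cuts_a = np.cumsum(a_perm)[:-1]              # internal cut positions of enzyme A
    cuts_b = np.cumsum(b_perm)[:-1]              # internal cut positions of enzyme B
    L = sum(a_perm)
    cuts = np.unique(np.concatenate(([0], cuts_a, cuts_b, [L])))
    return np.sort(np.diff(cuts))[::-1]          # ordered double-digest fragments

def energy(a_perm, b_perm, c):
    """Energy function (4.15); a large penalty is used if the fragment counts differ."""
    ch = c_hat(a_perm, b_perm)
    c_sorted = np.sort(np.asarray(c, dtype=float))[::-1]
    if len(ch) != len(c_sorted):
        return 1e9
    return float(np.sum((c_sorted - ch) ** 2 / c_sorted))

def invert_subsequence(perm, rng):
    """Local move: invert a randomly chosen subsequence of the permutation."""
    i, j = sorted(rng.integers(0, len(perm), size=2))
    new = perm.copy()
    new[i:j + 1] = new[i:j + 1][::-1]
    return new

def anneal(a, b, c, steps=20000, beta0=0.01, beta1=1.0, seed=0):
    rng = np.random.default_rng(seed)
    a, b = np.array(a), np.array(b)
    H = energy(a, b, c)
    for t in range(steps):
        beta = beta0 + (beta1 - beta0) * t / steps   # simple linear cooling schedule
        a_new = invert_subsequence(a, rng)
        b_new = invert_subsequence(b, rng)
        H_new = energy(a_new, b_new, c)
        dH = H_new - H
        if dH <= 0 or rng.random() < np.exp(-beta * dH):
            a, b, H = a_new, b_new, H_new
    return a, b, H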

Figure 4.2: Boltzmann machine with five neurons. All weights are symmetric, and the diagonal weights are set to zero. The states of the neurons are denoted by s_i = ±1. This neural network has no hidden units. It looks like a Hopfield network, but the weights are not given by Hebb's rule.

4.4 Boltzmann machines

Boltzmann machines are generalised Hopfield networks that can learn to approximate data distributions of binary input patterns. Boltzmann machines differ from Hopfield networks in two essential ways. First, instead of using Hebb's rule, the weights are adjusted until the Boltzmann machine approximates the data distribution precisely. The weights are iteratively refined to minimise the difference between the data distribution and the model (the Boltzmann distribution). Nevertheless, this procedure is closely related to Hebb's rule, as we shall see. Second, to represent higher-order correlations between bits of input patterns, Boltzmann machines employ hidden neurons.

We begin with Boltzmann machines without hidden neurons (Figure 4.2), because they are simpler to analyse. Then we discuss why hidden neurons are necessary to learn the properties of general input distributions P_data(x) of binary inputs x. The training algorithm for Boltzmann machines with hidden neurons is described in Section 4.5.

The goal of the training algorithm is to find weights so that the Boltzmann distribution

P_B(s = x) = Z^{−1} exp( (1/2) Σ_{i≠j} w_{ij} x_i x_j )    (4.16)

approximates the distribution P_data(x) as precisely as possible. Here and in the remainder of this Chapter we set β = 1. The input patterns have N binary bits

[Equation (2.1)] with values ±1. The weight matrix W is symmetric, w_{ij} = w_{ji}, and its diagonal elements are set to zero, w_{ii} = 0. In this Section we also set the thresholds to zero.

The Boltzmann machine is trained by iteratively adjusting the weights w_{ij}, using a sequence of input patterns x^{(µ)} (µ = 1,...,p), independently sampled from the data distribution P_data(x). The difference between the Boltzmann and the data distribution is minimised by maximising the likelihood ℒ = Π_{µ=1}^{p} P_B(s = x^{(µ)}) that the Boltzmann machine produces the sequence x^{(1)},...,x^{(p)} of input patterns. Any pattern may appear more than once in the sequence, with frequency proportional to P_data(x). Maximising ℒ therefore corresponds to approximating the data distribution as accurately as possible. Usually one maximises the logarithm of the likelihood, the log-likelihood function

log ℒ = log Π_{µ=1}^{p} P_B(s = x^{(µ)}) = Σ_{µ=1}^{p} log P_B(s = x^{(µ)}).    (4.17)

The logarithm is a monotonic function, so the log-likelihood has its maximum at the same weight values as the likelihood. Taking the logarithm simplifies the analysis of the learning algorithm, because log P_B(s = x^{(µ)}) is simply a quadratic function of the x_j^{(µ)}. Also, a learning algorithm based on the log-likelihood is usually more stable numerically. A different reasoning behind maximising the log-likelihood starts from the Kullback-Leibler divergence, defined as

D_KL = Σ_{µ=1}^{p} P_data(x^{(µ)}) log[ P_data(x^{(µ)}) / P_B(s = x^{(µ)}) ].    (4.18)

Terms in the sum with P_data(x^{(µ)}) = 0 are set to zero, and D_KL is defined to equal infinity when there are patterns for which P_B = 0 but P_data ≠ 0. The Kullback-Leibler divergence is a measure of the difference between the two distributions: D_KL is non-negative, and it assumes its global minimum D_KL = 0 for P_data(x^{(µ)}) = P_B(s = x^{(µ)}), see Exercise 4.6. We infer from Equation (4.18) that minimising D_KL corresponds to maximising log ℒ.

To find the global maximum of the log-likelihood, we use gradient ascent: one repeatedly changes the weights by adding increments

w'_{mn} = w_{mn} + δw_{mn}    with    δw_{mn} = η ∂log ℒ/∂w_{mn}.    (4.19)

The small parameter η > 0 is the learning rate. The gradient points in the steepest uphill direction of log ℒ. The idea is to take many small uphill steps until one hopefully

(but not necessarily) reaches the global maximum. Since the likelihood is a product of many possibly quite small factors, ℒ can become very small. This can lead to numerical instabilities. Maximising log ℒ instead of ℒ can be more stable because it yields an additional factor ℒ^{−1} in the gradient: ∂log ℒ/∂w_{mn} = ℒ^{−1} ∂ℒ/∂w_{mn}. To evaluate the gradient of log ℒ we start from Eq. (4.17):

log ℒ = Σ_{µ=1}^{p} [ −log Z + (1/2) Σ_{i≠j} w_{ij} x_i^{(µ)} x_j^{(µ)} ].    (4.20)

Here we used that the diagonal weights vanish. The first step is to evaluate the derivative of

log Z = log Σ_{s_1=±1,...,s_N=±1} exp( (1/2) Σ_{i≠j} w_{ij} s_i s_j ).    (4.21)

To compute ∂ logZ /∂ wmn one uses the chain rule together with

∂w_{ij}/∂w_{mn} = δ_{im} δ_{jn} + δ_{jm} δ_{in},    (4.22)

valid for symmetric weights and provided that i ≠ j and m ≠ n. In Equation (4.22), δ_{kl} is the Kronecker delta, δ_{kl} = 1 if k = l and zero otherwise (Chapter 2). In particular, δ_{im} δ_{jn} = 1 only if i = m and j = n. Otherwise the product of Kronecker deltas equals zero. Equation (4.22) is illustrated by the following story (a modification of a well-known maths joke): The linear function, x, and the constant function are going for a walk. When they suddenly see the derivative approaching, the constant function gets worried. "I'm not worried" says the function x confidently, "I'm not put to zero by the derivative." When the derivative comes closer, it says "Hi! I'm ∂/∂y. How are you?"

The moral is: since x and y are independent variables, ∂ x /∂ y = 0. Equation (4.22) reflects the same principle: the weights wi j and wmn are independent variables unless their indices agree. Equation (4.22) is valid for off-diagonal weights, and there are two terms on the r.h.s. because the weights are symmetric.

Returning to the derivative of logZ with respect to wmn , one finds using Equation (4.22):

∂log Z/∂w_{mn} = Σ_{s_1=±1,...,s_N=±1} s_m s_n P_B(s) ≡ ⟨s_m s_n⟩_model.    (4.23)

The last equality defines the two-point correlations of the model, ⟨s_m s_n⟩_model, computed using the steady-state distribution (4.16) of the Boltzmann machine. Evaluating the derivative of the second term in Equation (4.20) gives:

∂/∂w_{mn} [ (1/2) Σ_{i≠j} w_{ij} x_i^{(µ)} x_j^{(µ)} ] = x_m^{(µ)} x_n^{(µ)}.    (4.24)

In summary,

∂log ℒ/∂w_{mn} = Σ_{µ=1}^{p} [ x_m^{(µ)} x_n^{(µ)} − ⟨s_m s_n⟩_model ] = p [ ⟨x_m x_n⟩_data − ⟨s_m s_n⟩_model ].    (4.25)

Here ⟨x_m x_n⟩_data = p^{−1} Σ_{µ=1}^{p} x_m^{(µ)} x_n^{(µ)} is the two-point correlation of the input data. Using (4.19), the learning rule becomes:

δw_{mn} = η ( ⟨x_m x_n⟩_data − ⟨s_m s_n⟩_model ),    (4.26)

where we have dropped a factor of p which only affects the numerical value of the learning rate η. The weight increments are determined by the two-point pattern correlations, just like Hebb's rule (2.25). The first term on the r.h.s. of Eq. (4.26) has precisely the same form as Equation (2.25), a sum over two-point correlations of the input patterns. The second average is over the steady-state distribution (4.16) of the Boltzmann machine. The learning rule takes the form of the difference between two-point correlations because the task is to minimise the difference between two distributions. It is plausible that the learning rule may converge, because the weight increments vanish when the model correlations equal the data correlations.

The average ⟨s_m s_n⟩_model can be approximated by numerical simulation of the McCulloch-Pitts dynamics

s'_i = +1 with probability p(b_i),  and  s'_i = −1 with probability 1 − p(b_i),    (4.27)

with b_i = Σ_j w_{ij} s_j and p(b_i) = 1/(1 + e^{−2b_i}). One must iterate Equation (4.27) until the system has reached its steady state, long enough so that any initial transient becomes negligible. The training algorithm can be summarised as follows. One initialises all weights and computes ⟨x_m x_n⟩_data from the given sequence of input patterns. One estimates ⟨s_m s_n⟩_model by numerical simulation of the dynamics of the Boltzmann machine, and changes the weights using (4.26). One iterates this step, either with a sequence of new inputs, or with the same inputs but in permuted sequence.
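The following sketch illustrates one iteration of this procedure in Python for a Boltzmann machine without hidden neurons, with β = 1 and zero thresholds; the sampling length, burn-in, learning rate, and function names are illustrative choices rather than part of the text.

import numpy as np

def p(b):                                         # p(b) = 1/(1 + exp(-2b)) for beta = 1
    return 1.0 / (1.0 + np.exp(-2.0 * b))

def model_correlations(W, n_steps, rng):
    """Estimate <s_m s_n>_model by iterating the stochastic dynamics (4.27)."""
    N = W.shape[0]
    s = rng.choice([-1.0, 1.0], size=N)
    corr = np.zeros((N, N))
    burn_in = n_steps // 2                        # discard the initial transient
    for t in range(n_steps):
        i = rng.integers(N)                       # asynchronous update of one neuron
        s[i] = 1.0 if rng.random() < p(W[i] @ s) else -1.0
        if t >= burn_in:
            corr += np.outer(s, s)
    return corr / (n_steps - burn_in)

def train_step(W, X, eta, rng, n_steps=20000):
    """One weight update, Eq. (4.26): eta * (<x_m x_n>_data - <s_m s_n>_model)."""
    data_corr = X.T @ X / X.shape[0]              # <x_m x_n>_data over the patterns
    dW = eta * (data_corr - model_correlations(W, n_steps, rng))
    np.fill_diagonal(dW, 0.0)                     # keep the diagonal weights at zero
    return W + dW

# Example: the four XOR patterns discussed below, N = 3 bits.
X = np.array([[-1, -1, -1], [1, 1, -1], [-1, 1, 1], [1, -1, 1]], dtype=float)
rng = np.random.default_rng(0)
W = np.zeros((3, 3))
for _ in range(50):
    W = train_step(W, X, eta=0.05, rng=rng)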

In each iteration one must compute ⟨s_m s_n⟩_model again, because the weights have changed. This procedure is quite slow, because it usually takes long simulations to estimate ⟨s_m s_n⟩_model accurately, in each iteration of the learning algorithm.

There is a more fundamental problem. Like Hebb's rule, the learning rule (4.26) relies entirely upon two-point correlations of the input bits. This means that the Boltzmann machine cannot learn higher-order correlations between inputs. However, two-point correlations may not be sufficient to represent the information encoded in the input data. To illustrate this point, consider the Boolean XOR function (Exercise 2.13 and Chapter 5). It can be encoded in the four patterns [−1,−1,−1], [1,1,−1], [−1,1,1], and [1,−1,1]. The first two components represent the input to the XOR function. The third component represents the output, which depends on both input variables as prescribed by the XOR function. Let us define an input distribution that reflects these three-point correlations by assigning P_data = 1/4 to the four patterns, and setting P_data = 0 otherwise. A Boltzmann machine with three neurons cannot represent this input distribution, because there is no energy function of the form (4.1) that has four minima at these patterns. So the three-point correlations encoded in the four patterns cannot be represented in terms of a Boltzmann machine in its simplest form. Also Hopfield networks fail for the XOR function: the four states are not attractors of a Hopfield network with three neurons (Exercise 2.13). One could consider neural networks with third- or higher-order couplings [47],

H = −(1/2) Σ_{ij} w_{ij}^{(2)} s_i s_j − (1/6) Σ_{ijk} w_{ijk}^{(3)} s_i s_j s_k + ...    (4.28)

(Exercise 2.7). But the number of weights proliferates as the order increases, rendering the training very slow. An alternative is to use Boltzmann machines with hidden neurons that are neither input nor output units. The idea is that the hidden neurons can learn to represent such correlations [47]. The learning rule for Boltzmann machines with hidden neurons is very similar to Equation (4.26), but when the number of hidden neurons is large, the Boltzmann machine is very slow to train. It is more efficient to remove all weights between visible neurons, and between hidden neurons. This is described in the next Section.

4.5 Restricted Boltzmann machines

Restricted Boltzmann machines [57] consist of visible and hidden neurons arranged in an undirected bipartite graph (Figure 4.3): the only connections are between neurons of different kinds; there are no connections between visible neurons, and no connections between hidden neurons either.

Figure 4.3: Restricted Boltzmann machine with three visible neurons, v_j, and four hidden neurons, h_i.

So the energy function for a restricted Boltzmann machine with N visible neurons v_j and M hidden neurons h_i takes the form

H = −Σ_{i=1}^{M} Σ_{j=1}^{N} w_{ij} h_i v_j + Σ_{j=1}^{N} θ_j^{(v)} v_j + Σ_{i=1}^{M} θ_i^{(h)} h_i,    (4.29)

with thresholds θ_j^{(v)} and θ_i^{(h)}. The McCulloch-Pitts dynamics reads

h'_i = +1 with probability p(b_i^{(h)}),  −1 with probability 1 − p(b_i^{(h)}),   with   b_i^{(h)} = Σ_{j=1}^{N} w_{ij} v_j − θ_i^{(h)},    (4.30a)

and

v'_j = +1 with probability p(b_j^{(v)}),  −1 with probability 1 − p(b_j^{(v)}),   with   b_j^{(v)} = Σ_{i=1}^{M} h_i w_{ij} − θ_j^{(v)}.    (4.30b)

The diagonal weights are assumed to vanish, but the weight matrix is not required to be symmetric. Since most often M ≠ N, it is usually not even a square matrix (Exercise 4.9).
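A minimal Python sketch of this alternating dynamics, assuming β = 1 and ±1 neurons; the array shapes, parameter values, and function names are illustrative assumptions.

import numpy as np

def p(b):                                   # p(b) = 1/(1 + exp(-2b)) for beta = 1
    return 1.0 / (1.0 + np.exp(-2.0 * b))

def sample_hidden(W, theta_h, v, rng):
    """Update all hidden neurons in parallel, Eq. (4.30a)."""
    b_h = W @ v - theta_h                   # local fields of the hidden neurons
    return np.where(rng.random(len(b_h)) < p(b_h), 1.0, -1.0)

def sample_visible(W, theta_v, h, rng):
    """Update all visible neurons in parallel, Eq. (4.30b)."""
    b_v = h @ W - theta_v                   # local fields of the visible neurons
    return np.where(rng.random(len(b_v)) < p(b_v), 1.0, -1.0)

# Example: alternate hidden and visible updates, as in Figure 4.3 with M = 4, N = 3.
rng = np.random.default_rng(0)
M, N = 4, 3
W = rng.normal(scale=0.1, size=(M, N))
theta_h, theta_v = np.zeros(M), np.zeros(N)
v = rng.choice([-1.0, 1.0], size=N)
for _ in range(100):
    h = sample_hidden(W, theta_h, v, rng)
    v = sample_visible(W, theta_v, h, rng)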

The learning rule for the weights of the restricted Boltzmann machine is derived using gradient ascent on the log-likelihood for a single pattern x^{(µ)}:

log P(x^{(µ)}) = log Σ_{h_1=±1,...,h_M=±1} P_B(v = x^{(µ)}, h).    (4.31)

Proceeding as in the previous Section one finds:

δw_{mn} = η ( ⟨h_m x_n^{(µ)}⟩_data − ⟨h_m v_n⟩_model ).    (4.32)

The first average,

⟨h_m x_n^{(µ)}⟩_data = Σ_{h_1=±1,...,h_M=±1} h_m x_n^{(µ)} Π_{i=1}^{M} P(h_i | v = x^{(µ)}),    (4.33)

can be evaluated further, using the fact that there are no connections between the hidden units. Making use of the update rule (4.30a) we find

Σ_{h_m=±1} h_m P(h_m | v = x^{(µ)}) = p(b_m^{(h)}) − [1 − p(b_m^{(h)})] = tanh(b_m^{(h)}),    (4.34)

just like Equation (3.7). For the other sums in Equation (4.33) we use the normalisation condition 1 = Σ_{h_k=±1} P(h_k | v = x^{(µ)}) to obtain:

⟨h_m x_n^{(µ)}⟩_data = tanh(b_m^{(h)}) x_n^{(µ)} = tanh( Σ_{j=1}^{N} w_{mj} x_j^{(µ)} − θ_m^{(h)} ) x_n^{(µ)}.

The second average on the r.h.s. of Equation (4.32) simplifies to

⟨h_m v_n⟩_model = ⟨ tanh( Σ_{j=1}^{N} w_{mj} v_j − θ_m^{(h)} ) v_n ⟩_model.    (4.35)

The average ⟨···⟩_model is computed by Monte-Carlo sampling, using the McCulloch-Pitts dynamics (4.30) to generate the sequence

v_{t=0} → h_{t=0} → v_{t=1} → h_{t=1} → v_{t=2} → ···    (4.36)

In the limit t → ∞, the steady state of this sequence is distributed according to the model distribution, the Boltzmann distribution with energy function (4.29). In general only the asynchronous McCulloch-Pitts dynamics can be proven to converge (Sections 2.5 and 4.1). Here, however, the Markov chain can be generated more efficiently by updating all hidden neurons h_t at the same time, given v_t, because the components of h_t are independent of each other since there are no connections between them. In the same way the visible neurons v_t are updated in parallel. To speed up the computation further, one usually only iterates for a finite number of steps, up to t = k say, and initialises the chain with v_{t=0} = x^{(µ)}. After k steps one approximates

⟨ tanh( Σ_{j=1}^{N} w_{mj} v_j − θ_m^{(h)} ) v_n ⟩_model ≈ tanh( Σ_{j=1}^{N} w_{mj} v_{j,t=k} − θ_m^{(h)} ) v_{n,t=k}.    (4.37)

Figure 4.4: Pattern completion for the bars-and-stripes data set [47]. (a) All patterns in the 3×3 bars-and-stripes data set; ƒ corresponds to −1, „ to +1. (b) The three visible units [v_1, v_2, v_3] corresponding to the first row are clamped to [+1,−1,+1] and remain fixed to these values. The remaining units are initially set to 0 (gray bits); their states are allowed to change while sampling from the restricted Boltzmann machine. After a short transient of the McCulloch-Pitts dynamics, the pattern is correctly completed. Schematic, after Figure 7 in Ref. [25].

This algorithm is called the contrastive-divergence or CD-k algorithm (Algorithm 3). Since the average over the model distribution is approximated [Equation (4.37)], CD-k does not precisely correspond to gradient ascent. In summary,

δw_{mn} = η [ tanh( Σ_j w_{mj} v_{j,t=0} − θ_m^{(h)} ) v_{n,t=0} − tanh( Σ_j w_{mj} v_{j,t=k} − θ_m^{(h)} ) v_{n,t=k} ].    (4.38)

The analogous learning rules for the thresholds read:

δθ_n^{(v)} = −η ( v_{n,t=0} − v_{n,t=k} ),    (4.39a)
δθ_m^{(h)} = −η [ tanh( Σ_j w_{mj} v_{j,t=0} − θ_m^{(h)} ) − tanh( Σ_j w_{mj} v_{j,t=k} − θ_m^{(h)} ) ].    (4.39b)

The derivation of Equation (4.39) is left as an exercise (Exercise 4.10). Usually restricted Boltzmann machines have 0/1 neurons with state values 0 and 1 instead of −1 and 1. For 0/1 neurons, the CD-k algorithm is slightly different (Exercise 4.11).

Figure 4.4 illustrates how a restricted Boltzmann machine can learn to complete patterns, using the bars-and-stripes data set [25,47] as an example. To begin with, the restricted Boltzmann machine is trained using the CD-k algorithm. Then consider a partially obscured pattern.

Figure 4.5: Restricted-Boltzmann-machine learning for the XOR problem [panel (a)], see Section 4.4. Panel (b) shows numerical estimates of D_KL versus the number M of hidden neurons, in comparison with the upper bound (4.40). Schematic, based on simulations performed by Arvid Wenzel Wartenberg using the CD-k algorithm for k = 100, with learning rate η = 0.1, averaging over 500 realisations.

Assume for instance that only the upper row of bits is known, with v_1 = +1 („), v_2 = −1 (ƒ), and v_3 = +1 („). The remaining bits v_4,...,v_9 are obscured; their states are set to zero as shown in Figure 4.4(b). To complete the pattern, one samples from the Boltzmann distribution P_B(v_4,...,v_9 | v_1 = +1, v_2 = −1, v_3 = +1), keeping v_1 = +1, v_2 = −1, v_3 = +1 fixed (clamping these neurons), and iterates the McCulloch-Pitts dynamics for the remaining ones. Panel (b) shows how the machine outputs the correct completed pattern. This requires hidden neurons, because three-point correlations are needed to discriminate between bar and stripe patterns.

In general a restricted Boltzmann machine can approximate a distribution P_data of binary input data better with more hidden neurons. How many are needed [58,59]? The answer is not known in general, but it is plausible that M ∼ 2^N hidden neurons are sufficient, because each hidden neuron can encode one of the binary input patterns (winning neuron, Section 7.1). More precisely, it can be shown that M = 2^{N−1} − 1 hidden neurons are sufficient to reach arbitrarily small Kullback-Leibler divergence [60]. For binary data, an upper bound for the Kullback-Leibler divergence was derived in Refs. [61,62]:

D_KL ≤ log 2 [ N − ⌊log_2(M+1)⌋ − (M+1)/2^{⌊log_2(M+1)⌋} ]    for M < 2^{N−1} − 1,
D_KL = 0    for M ≥ 2^{N−1} − 1.    (4.40)

Here ⌊···⌋ denotes the integer part. Figure 4.5 illustrates this result. It demonstrates how well a restricted Boltzmann machine approximates the XOR distribution introduced in Section 4.4. The Figure shows how the Kullback-Leibler divergence depends on the number of hidden neurons (Exercise 4.7). In this example there are N = 3 inputs. We see that three hidden neurons are sufficient to allow the restricted Boltzmann machine to approximate the data distribution very precisely, consistent with Equation (4.40). In general, however, the CD-k algorithm is not guaranteed to converge to the optimal solution corresponding to the estimate (4.40).

Restricted Boltzmann machines are generative models: they can be used to sample from a distribution the machine has learned [25]. In this way, the machine can complete missing information, as illustrated in Figure 4.4. Restricted Boltzmann machines can also learn to classify patterns, by learning a distribution of binary inputs together with their labels. To this end one splits the visible neurons into input neurons and output neurons with labels or targets. This is a supervised-learning task, the subject of Part II. Recently, restricted Boltzmann machines were used to represent and analyse ground-state wave functions of quantum many-body systems [63].

4.6 Summary

This Chapter dealt with the Boltzmann distribution. Two main points are, first, that the stochastic McCulloch-Pitts dynamics (3.1) has the Boltzmann distribution as a steady state. Second, the update rule (3.1) is a special case of the Markov-chain Monte-Carlo algorithm, for Hopfield models with the energy function (2.45). Since this algorithm tends to decrease the energy function, it can be used to solve complex optimisation problems. In simulated annealing one gradually reduces the noise level as the simulation proceeds. This mimics the slow cooling of a physical system, an efficient way of bringing the system into its global optimum.

Boltzmann machines are generalisations of Hopfield networks that can learn distributions of binary data by iteratively changing the weights and thresholds until the corresponding Boltzmann distribution approximates the data distribution. The learning rule is derived using gradient ascent on a target function, in this case the log-likelihood. A related idea is used for training deep neural networks with stochastic gradient descent (Part II). Learning general input distributions of binary patterns requires hidden neurons; this is also a central topic of Part II. Since Boltzmann machines with many hidden neurons are hard to train, one removes connections that are not needed. Restricted Boltzmann machines have connections only between visible and hidden neurons.

4.7 Further reading

Older but still good references for Monte-Carlo methods in statistical physics are the book Monte Carlo methods in Statistical Physics edited by Binder [52], and Sokal's lecture notes [49]. Some historical notes are found in Ref. [64].

For a concise introduction to Boltzmann machines, refer to Information theory, inference and learning algorithms by MacKay [47], or to Machine learning: a probabilistic perspective by Murphy [65]. Ref. [66] is a more mathematical review of restricted Boltzmann machines. How many hidden neurons should one allow for in a restricted Boltzmann machine? Little is known apart from the upper bound (4.40) for the Kullback-Leibler divergence, and simulations [60] show that one can obtain very precise approximations of P_data with fewer hidden neurons than stipulated by Equation (4.40).

Deep-belief networks consist of layers of restricted Boltzmann machines [2]. Contrastive-divergence training for such deep architectures (networks with many layers) is one of the first examples of deep-learning algorithms [67] (Chapter 7). Helmholtz machines [68,69] are generalisations of Boltzmann machines designed as more efficient generative models. They consist of two networks, encoder and decoder, just like variational autoencoders (Section 10.6). The encoder (called recognition model in Ref. [69]) generates a compressed representation of the data distribution, and the decoder (generative model) generates patterns from the compressed representation.

4.8 Exercises

4.1 Asymmetric weights. Show that Equations (3.1) and (4.3) are not equivalent for the network shown in Figure 2.9. The reason is that the weights are not symmetric.

4.2 Stochastic dynamics with 0/1-neurons. Write down the equivalent of the stochastic dynamics (3.1) for 0/1-neurons with states n_j = 0 or 1. Write down the equivalent form (4.3) with energy function H = −(1/2) Σ_{ij} w_{ij} n_i n_j + Σ_i µ_i n_i. Why are the two formulations not equivalent if the weights are asymmetric, or if some w_{ii} are positive?

4.3 Metropolis algorithm. Use the Metropolis algorithm to generate a Markov chain that samples the exponential distribution P(x) = exp(−x).

4.4 Markov chain. Figure 4.6 illustrates the transition probabilities p(k|l) for a Markov chain on a state space with three states. Find the steady state of this Markov chain. Does this chain satisfy detailed balance?

4.5 Double-digest problem. Implement simulated annealing for the double-digest problems given in Table 4.1. Use the energy function (4.15). Configuration space is the space of all permutation pairs [σ,µ]. Local moves correspond to inversions of short subsequences of σ and/or µ. Check that the scheme of suggesting new states is symmetric. Using simulated annealing, determine the degeneracy of the solutions for the fragment sets shown in Table 4.1.

Figure 4.6: Markov chain with three states. The transition probability from state l to state k is denoted by p(k|l) (Exercise 4.4).

4.6 Kullback-Leibler divergence. Show that the Kullback-Leibler divergence D_KL, Equation (4.18), is non-negative, and that it assumes its global minimum D_KL = 0 at P_data(x^{(µ)}) = P_B(s = x^{(µ)}). Show that minimising D_KL is equivalent to maximising the log-likelihood (4.17).

4.7 XOR function. Program a restricted Boltzmann machine to learn the XOR function, by approximating the following data distribution over three-bit binary patterns: P_data = 1/4 for [−1,−1,−1], [1,1,−1], [−1,1,1], and [1,−1,1], and P_data = 0 otherwise. Plot the Kullback-Leibler divergence as a function of iteration number for different numbers of hidden neurons: 0, 2, 4, and 8.

4.8 Shifter ensemble. Explain why the shifter ensemble [15,47] cannot be approximated by a Boltzmann machine without hidden neurons.

4.9 McCulloch-Pitts dynamics for restricted Boltzmann machine. Write down the deterministic analogue of the update rule (4.30) and show that the energy function of the restricted Boltzmann machine cannot increase under this rule. Note that it is not required that the weight matrix is symmetric, or that the diagonal elements are non-positive.

4.10 Thresholds in restricted Boltzmann machines. Derive the learning rule (4.39) for the thresholds for a restricted Boltzmann machine.

4.11 Restricted Boltzmann machine with 0/1 neurons. Derive the contrastive divergence algorithm for training a restricted Boltzmann machine with 0/1 neurons.

4.12 Bars-and-stripes data set. Train a restricted Boltzmann machine with Algorithm 3 to learn the bars-and-stripes data set (Figure 4.4). After training, sample the model distribution for M = 2, 4, 8, 16, and 32 hidden neurons and use the numerical results to estimate the Kullback-Leibler divergence D_KL as a function of k. Compare with the theoretical upper bound (4.40).

Algorithm 3 contrastive divergence CD-k for ±1 neurons
  initialise weights and thresholds;
  for ν = 1,...,ν_max do
    sample p_0 patterns from the data distribution (p_0 ≤ p);
    for µ = 1,...,p_0 do
      initialise v(0) ← x^{(µ)};
      update all hidden neurons: b^{(h)}(0) ← W v(0) − θ^{(h)};
      for i = 1,...,M do
        h_i(0) ← +1 with probability p(b_i^{(h)}(0)), otherwise h_i(0) ← −1;
      end for
      for t = 1,...,k do
        update all visible neurons: b^{(v)}(t−1) ← h(t−1)·W − θ^{(v)};
        for j = 1,...,N do
          v_j(t) ← +1 with probability p(b_j^{(v)}(t−1)), otherwise v_j(t) ← −1;
        end for
        update all hidden neurons: b^{(h)}(t) ← W v(t) − θ^{(h)};
        for i = 1,...,M do
          h_i(t) ← +1 with probability p(b_i^{(h)}(t)), otherwise h_i(t) ← −1;
        end for
      end for
      compute weight and threshold increments:
        δw_{mn} ← η [ tanh(b_m^{(h)}(0)) v_n(0) − tanh(b_m^{(h)}(k)) v_n(k) ];
        δθ_n^{(v)} ← −η [ v_n(0) − v_n(k) ];
        δθ_m^{(h)} ← −η [ tanh(b_m^{(h)}(0)) − tanh(b_m^{(h)}(k)) ];
    end for
    for µ = 1,...,p_0 do
      adjust weights and thresholds;
    end for
  end for
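For concreteness, here is a minimal Python sketch of Algorithm 3 for ±1 neurons; the variable names, the batch handling, and the default parameters are illustrative choices rather than part of the algorithm as stated.

import numpy as np

def p(b):                                        # p(b) = 1/(1 + exp(-2b)), beta = 1
    return 1.0 / (1.0 + np.exp(-2.0 * b))

def sample_pm1(prob, rng):                       # draw +/-1 with the given probabilities
    return np.where(rng.random(prob.shape) < prob, 1.0, -1.0)

def cd_k(X, M, k=1, eta=0.1, n_epochs=100, seed=0):
    """Contrastive-divergence (CD-k) training of a restricted Boltzmann machine."""
    rng = np.random.default_rng(seed)
    p0, N = X.shape                              # p0 patterns of N visible bits (+/-1)
    W = rng.normal(scale=0.01, size=(M, N))
    th_v, th_h = np.zeros(N), np.zeros(M)
    for _ in range(n_epochs):
        dW = np.zeros_like(W)
        dth_v, dth_h = np.zeros(N), np.zeros(M)
        for x in X:
            v0 = x.copy()
            bh0 = W @ v0 - th_h                  # hidden fields at t = 0
            h = sample_pm1(p(bh0), rng)
            v, bh = v0, bh0
            for _ in range(k):                   # k steps of alternating updates
                v = sample_pm1(p(h @ W - th_v), rng)
                bh = W @ v - th_h
                h = sample_pm1(p(bh), rng)
            # increments, Eqs. (4.38) and (4.39)
            dW += eta * (np.outer(np.tanh(bh0), v0) - np.outer(np.tanh(bh), v))
            dth_v += -eta * (v0 - v)
            dth_h += -eta * (np.tanh(bh0) - np.tanh(bh))
        W, th_v, th_h = W + dW, th_v + dth_v, th_h + dth_h
    return W, th_v, th_h

# Example: learn the XOR distribution of Section 4.4 with M = 8 hidden neurons.
X = np.array([[-1, -1, -1], [1, 1, -1], [-1, 1, 1], [1, -1, 1]], dtype=float)
W, th_v, th_h = cd_k(X, M=8, k=10)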


PART II  SUPERVISED LEARNING

The Hopfield network described in Part I recognises patterns stored in the network using Hebb's rule. Its neurons act as inputs and outputs. After feeding a distorted pattern into the network, the network dynamics runs until it reaches a steady state which hopefully corresponds to the stored pattern closest to the distorted one. In this case, the network classifies the distorted pattern by associating it with the closest one amongst the stored patterns.

Part II describes supervised learning, a different way of solving classification tasks with neural networks using labeled data sets. The machine-learning repository [70] at the University of California Irvine contains a number of such data sets. An example is the iris data set, which lists certain properties of 150 iris plants. For each plant, four attributes are given (Figure 5.1): its sepal length, sepal width, petal length, and petal width. Each entry in the data set contains a label (or target) that says which class the plant belongs to: iris setosa, iris versicolor, or iris virginica. This data set was described by the geneticist R. A. Fisher [71]. The machine-learning task is to adjust weights and thresholds of a neural network so that it correctly determines the class of each plant from its attributes. To this end one uses a training set of labeled data. Each set of attributes is an input pattern to the network. The neural network is supposed to output the correct label (or target), in this case whether the plant is an iris setosa, iris versicolor, or iris virginica. One compares the network output with the corresponding target, for all input patterns in the training set, and changes the weights and thresholds until the network computes the correct output for each input pattern. The crucial question is whether the trained network can generalise: does it find the correct labels for an input pattern not contained in the training set?

The networks used for supervised learning are called perceptrons [10]. They consist of layers of McCulloch-Pitts neurons: a number of layers of hidden neurons, and an output layer. We briefly discussed the idea of hidden neurons in connection with restricted Boltzmann machines (Section 4.5), but perceptrons have different layouts, and they are trained in a different way. The layers are usually arranged from the left (input) to the right (output). All connections are one-way, from neurons in one layer to neurons in the layer immediately to the right. There are no connections between neurons in a given layer, or back to layers on the left. This arrangement ensures convergence of the training algorithm (stochastic gradient descent). During training with this algorithm, the network parameters are changed iteratively. In each step, an input is applied, and weights and thresholds of the network are updated to reduce the output error. Loosely speaking, each step corresponds to adding a little bit of Hebb's rule to the weights. This is repeated until the network classifies the training set correctly.

Stochastic gradient descent for multilayer perceptrons has received much attention recently, after it was realised that networks with many hidden layers can be trained to reliably recognise and classify image data (deep learning).

sepal length   sepal width   petal length   petal width   classification
6.3            2.5           5.0            1.9           virginica
5.1            3.5           1.4            0.2           setosa
5.5            2.6           4.4            1.2           versicolor
4.9            3.0           1.4            0.2           setosa
6.1            3.0           4.6            1.4           versicolor
6.5            3.0           5.2            2.0           virginica

Figure 5.1: Left: petals and sepals of the iris flower. Right: six entries of the iris data set [70]. All lengths in cm. The whole data set contains 150 entries.

Figure 5.2: Feed-forward network with one hidden layer. The input terminals are coloured black. We use the notation of Ref. [1]: W_{ij} for the weights connecting to the output neuron O_i (with threshold Θ_i), and w_{jk} for the weights connecting to the hidden neuron V_j (with threshold θ_j).

5 Perceptrons

In 1958 Rosenblatt [10] suggested connecting McCulloch-Pitts neurons into layered feed-forward networks to process information. He referred to these networks as perceptrons. The layout is illustrated in Figure 5.2. The leftmost layer consists of input terminals, drawn in black in Figure 5.2. To the right follow two layers of McCulloch-Pitts neurons. The rightmost layer consists of output neurons. The intermediate layer is a hidden layer; the states of its neurons are not read out. All connections are one-way: every neuron feeds forward, that is only to neurons in the layer immediately to the right. There are no connections within layers, no back connections, no connections that skip a layer. There are N input terminals. As in Part I, we denote the input patterns by

 (µ) x1 x (µ) x (µ)  2  . (5.1) =  .   .  (µ) xN

The index µ = 1,...,p labels the different input patterns. The hidden neurons compute

V_j = g(b_j)    with    b_j = Σ_k w_{jk} x_k − θ_j,    (5.2)

with weights w_{jk} and thresholds θ_j. The function g(b) is an activation function, and its argument is called the local field (Section 1.2). The output neurons of the network shown in Figure 5.2 perform the computation

O_i = g(B_i)    with    B_i = Σ_j W_{ij} V_j − Θ_i.    (5.3)

The index i = 1,...,M labels the output neurons, with weights W_{ij} and thresholds Θ_i. A classification problem is given by a training set of input patterns x^{(µ)} and the corresponding target vectors

 (µ) t1 t (µ) t (µ)  2  . (5.4) =  .   .  (µ) tM The idea is to choose all weights and thresholds so that the network produces the desired output: (µ) (µ) Oi = ti for all i and µ. (5.5) In the Hopfield networks described in PartI, the weights were assigned using Hebb’s rule (2.26). Perceptrons, by contrast, are trained by iteratively updating their weights and thresholds until Equation (5.5) is satisfied. This is achieved by repeatedly adding small multiples of Hebb’s rule to the weights (Section 5.2). An alternative approach is to define an energy function, a function of the weights of the network, that has a global minimum when Equation (5.5) is satisfied. The network is trained by taking small steps in weight space that reduce the energy function (gradient descent, Section 5.3). A CLASSIFICATION PROBLEM 81

Figure 5.3: Classification problem with two-dimensional real-valued inputs and targets equal to ±1. The gray solid line is the decision boundary. Legend: „ corresponds to t^{(µ)} = 1, and ƒ to t^{(µ)} = −1.

5.1 A classification problem

To illustrate how perceptrons can solve classification problems, consider the simple example shown in Figure 5.3. There are ten patterns, each with two real-valued components:

x^{(µ)} = [x_1^{(µ)}, x_2^{(µ)}]^T.    (5.6)

In Figure 5.3 the patterns are drawn as points in the x_1-x_2 plane, the input plane. There are two classes of patterns, with targets ±1:

t^{(µ)} = 1 for „  and  t^{(µ)} = −1 for ƒ.    (5.7)

A single neuron suffices to classify these patterns, a binary threshold unit with activation function g(b) = sgn(b), consistent with the possible target values. Since there is only one neuron, we can arrange the weights into a weight vector

w = [w_1, w_2]^T.    (5.8)

The network performs the computation

O = sgn(w_1 x_1 + w_2 x_2 − θ) = sgn(w·x − θ).    (5.9)

Here w·x = w_1 x_1 + w_2 x_2 is the scalar product between the vectors w and x (Chapter 2). This very simple example allows us to find a geometrical interpretation of the classification problem. We see in Figure 5.3 that the patterns fall into two clusters:

Figure 5.4: The perceptron classifies the patterns correctly for the weight vector w shown, orthogonal to the decision boundary (gray solid line). Legend: „ corresponds to t^{(µ)} = 1, and ƒ to t^{(µ)} = −1.

ƒ to the left and „ to the right. We can classify the patterns by drawing a line that separates the two clusters, so that everything to the right of the line has t = 1, while the patterns to the left of the line have t = −1. This line is called the decision boundary. To find the geometrical significance of Equation (5.9), let us put the threshold to zero for a moment, so that

O = sgn(w·x).    (5.10)

Then the classification problem takes the form

sgn( w·x^{(µ)} ) = t^{(µ)}.    (5.11)

To evaluate the scalar product, we write the vectors as

w = |w| [cos α, sin α]^T    and    x = |x| [cos β, sin β]^T.    (5.12)

Here |w| = (w_1^2 + w_2^2)^{1/2} denotes the norm of the vector w, and α and β are the angles of the vectors with the x_1-axis. Then w·x = |w||x| cos(α−β) = |w||x| cos φ, where φ is the angle between the two vectors. When φ is between −π/2 and π/2, the scalar product is positive, otherwise negative. As a consequence, the network classifies the patterns in Figure 5.3 correctly if the weight vector is orthogonal to the decision boundary drawn in Figure 5.4.

What is the role of the threshold θ? Equation (5.9) shows that the decision boundary is parameterised by w·x = θ, or

x_2 = −(w_1/w_2) x_1 + θ/w_2.    (5.13)

Figure 5.5: Decision boundaries without and with threshold.

Therefore the threshold determines the intersection of the decision boundary with the x_2-axis (equal to θ/w_2). This is illustrated in Figure 5.5. The decision boundary – the straight line orthogonal to w – should divide inputs with positive and negative targets. If such a line can be found, then the problem can be solved with a single neuron. We say that the problem is linearly separable. Conversely, if no such line exists, the problem is not linearly separable. This can occur only when p > N. Examples of problems that are linearly separable and not linearly separable are shown in Figure 5.6. Other examples are Boolean functions. A Boolean function takes N binary inputs and has one binary output. The Boolean AND function (two inputs) is illustrated in Figure 5.7. The value table of the function is shown on the left. The graphical representation is shown in the centre of the Figure (ƒ corresponds to t = −1 and „ to t = +1). Also shown is the decision boundary of a binary threshold unit and its weight vector w. It is important to note that the decision boundary is not unique, neither are the weight vector and threshold value that solve the problem. The norm of the weight vector, in particular, is arbitrary.

Figure 5.6: Linearly separable and non-separable data in two-dimensional input space.

x1  x2  t
0   0   -1
0   1   -1
1   0   -1
1   1   +1

Figure 5.7: Boolean AND function: value table (left) and geometrical representation in the input plane (right). Legend: „ corresponds to t^{(µ)} = 1, and ƒ to t^{(µ)} = −1.
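To make this concrete, here is one possible choice of weights and threshold (w_1 = w_2 = 1, θ = 3/2) that solves the AND problem with a binary threshold unit in Python; this particular choice is just an illustration, since the solution is not unique.

import numpy as np

def threshold_unit(x, w, theta):
    """Binary threshold unit, O = sgn(w.x - theta)."""
    return np.sign(w @ x - theta)

w, theta = np.array([1.0, 1.0]), 1.5            # one of many valid choices
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, int(threshold_unit(np.array([x1, x2], dtype=float), w, theta)))
# prints -1 for all inputs except (1, 1), which gives +1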

Figure 5.8 shows that the Boolean XOR function is not linearly separable [11]. There are 16 different Boolean functions of two variables. Only two are not linearly separable (Exercise 5.2), the XOR and the XNOR function. Up to now we discussed only a single neuron. If the classification problem requires several output neurons, each has its own weight vector w_i and threshold θ_i. We can group the weight vectors into a weight matrix W as in Part I, so that the row vectors w_i^T are the rows of the matrix W.

5.2 Iterative learning algorithm

In the previous Section we determined the weights and threshold for the Boolean AND function by inspection (Figure 5.7). Now we discuss an algorithm that finds the weights iteratively. It is illustrated in Figure 5.9.

x1  x2  t
0   0   -1
0   1   +1
1   0   +1
1   1   -1

Figure 5.8: The Boolean XOR function is not linearly separable. Legend: „ corresponds to t^{(µ)} = 1, and ƒ to t^{(µ)} = −1.

In panel (a) of Figure 5.9, the pattern x^{(8)} (with t^{(8)} = 1) is on the wrong side of the decision boundary. In order to correct this error, one turns the decision boundary anti-clockwise. To this end, one adds a small multiple of the pattern vector x^{(8)} to the weight vector

w' = w + δw    with    δw = η x^{(8)}.    (5.14)

The parameter η > 0 is called the learning rate. It must be small, so that the decision boundary is not rotated too far. The result is shown in panel (b). Panel (c) shows another case, where pattern x^{(4)} (t^{(4)} = −1) is on the wrong side of the decision boundary. In order to turn the decision boundary in the right way, anti-clockwise, one subtracts a small multiple of x^{(4)}:

w' = w + δw    with    δw = −η x^{(4)}.    (5.15)

These two learning rules combine to the learning rule of Rosenblatt [10]:

w' = w + δw^{(µ)}    with    δw^{(µ)} = η t^{(µ)} x^{(µ)}.    (5.16)

For more than one neuron, the rule reads

w'_{ij} = w_{ij} + δw_{ij}^{(µ)}    with    δw_{ij}^{(µ)} = η t_i^{(µ)} x_j^{(µ)}.    (5.17)

This rule is reminiscent of Hebb's rule (2.9), except that here inputs and outputs are associated with distinct units. Therefore we have t_i^{(µ)} x_j^{(µ)} instead of x_i^{(µ)} x_j^{(µ)}. One applies (5.17) iteratively for a sequence of randomly chosen patterns µ, until the problem is solved. This corresponds to adding a little bit of Hebb's rule in each iteration. To ensure that the algorithm stops when the problem is solved, one can use the learning rule

δw_{ij}^{(µ)} = η ( t_i^{(µ)} − O_i^{(µ)} ) x_j^{(µ)}.    (5.18)
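A minimal Python sketch of this iterative scheme for a single binary threshold unit with the learning rule (5.18). Absorbing the threshold into the weights by appending a constant input of −1, the stopping criterion, and the parameter values are illustrative implementation choices.

import numpy as np

def train_perceptron(X, t, eta=0.05, max_epochs=1000, seed=0):
    """Iterate the learning rule (5.18) for a single sgn-unit until all patterns are correct."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, -np.ones((len(X), 1))])   # last weight plays the role of the threshold
    w = rng.normal(scale=0.1, size=Xb.shape[1])
    for _ in range(max_epochs):
        for mu in rng.permutation(len(Xb)):      # randomly chosen patterns
            O = np.sign(w @ Xb[mu])
            w += eta * (t[mu] - O) * Xb[mu]      # delta w = eta (t - O) x, Eq. (5.18)
        if np.all(np.sign(Xb @ w) == t):         # stop once all patterns are classified
            break
    return w[:-1], w[-1]                         # weights and threshold

# Example: the Boolean AND function of Figure 5.7.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)
w, theta = train_perceptron(X, t)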

5.3 Gradient descent for linear units

In this Section, the learning algorithm (5.18) is derived in a different way, by minimising an energy function using gradient descent. This requires differentiation, therefore we must choose a differentiable activation function. The simplest choice is a linear activation function, g(b) = b. We set θ = 0, so that the network computes:

O_i^{(µ)} = Σ_k w_{ik} x_k^{(µ)}.    (5.19)

Figure 5.9: Illustration of the learning algorithm. In panel (a) the t = 1 pattern x^{(8)} is on the wrong side of the decision boundary (solid red line). To correct the error the weight vector must be rotated anti-clockwise [panel (b)]. In panel (c) the t = −1 pattern x^{(4)} is on the wrong side of the decision boundary. To correct the error the weight vector must be rotated anti-clockwise [panel (d)].

A neuron with a linear activation function is called a linear unit. The outputs O_i^{(µ)} assume continuous values, but not necessarily the target values t_i^{(µ)}. For linear units, the classification problem

O_i^{(µ)} = t_i^{(µ)}    for i = 1,...,N and µ = 1,...,p    (5.20)

has the formal solution

w_{ik} = (1/N) Σ_{µν} t_i^{(µ)} (Q^{−1})_{µν} x_k^{(ν)},    (5.21)

as can be verified by inserting Equation (5.21) into (5.19). Here Q is the overlap matrix with elements

Q_{µν} = (1/N) x^{(µ)}·x^{(ν)}    (5.22)

(Section 3.6). For the solution (5.21) to exist, the matrix Q must be invertible. As mentioned in Section 3.6, this requires that p ≤ N, because otherwise the input-pattern vectors are linearly dependent, and thus also the columns (and rows) of Q. If the matrix Q has linearly dependent columns or rows, it cannot be inverted. Let us assume that the input patterns are linearly independent, so that the solution (5.21) exists. In this case we can find the solution iteratively. To this end one defines the energy function

H = (1/2) Σ_{iµ} ( t_i^{(µ)} − O_i^{(µ)} )^2.    (5.23)

This function is non-negative, and it vanishes when all outputs equal the corresponding targets, for all patterns.

The energy function (5.23) is regarded as a function of the weights w_{ij}, unlike the energy function in Part I, which is a function of the state variables of the neurons. The goal is now to find weights that minimise H. If the input patterns are linearly independent, H vanishes at the global minimum, corresponding to the desired solution of the problem (Exercise 5.1). Let us use gradient descent to minimise H:

w'_{mn} = w_{mn} + δw_{mn}    with weight increments    δw_{mn} = −η ∂H/∂w_{mn},    (5.24)

with learning rate η > 0. This is analogous to Equation (4.19), apart from the minus sign. In Section 4.4 the goal was to maximise the target function; here we want to minimise H by taking many downhill steps in search of the global minimum. The derivatives in Equation (5.24) are evaluated with the chain rule, together with Equation (4.22), which takes the form

∂w_{ij}/∂w_{mn} = δ_{im} δ_{jn}    (5.25)

for asymmetric weights. This yields the weight increments

δw_{mn} = η Σ_µ ( t_m^{(µ)} − O_m^{(µ)} ) x_n^{(µ)}.    (5.26)

This learning rule is very similar to Equation (5.18). One difference is that Equation (5.26) contains a sum over all patterns. It is important to keep in mind also that the activation functions are different: while Equation (5.18) was derived for g(b) = sgn(b), the learning rule (5.26) was derived for g(b) = b. An advantage of the rule (5.26) is that it is derived from an energy function. This helps to analyse the convergence of the algorithm, as we have seen in Chapter 2.

Linear units [Equation (5.19)] are special. The Boolean AND problem (Figure 5.7) does not admit the solution (5.21), even though the problem is linearly separable. Since the pattern vectors x^{(µ)} are linearly dependent, the solution (5.21) does not exist. Shifting the patterns or introducing a threshold does not change this fact. In Section 5.5 we discuss how to solve problems that are not linearly separable using a hidden layer of neurons with non-linear activation functions. Note that introducing hidden layers with linear units does not help, because the resulting input-output mapping is still linear if all neurons have linear activation functions, so that only problems with p ≤ N can be solved. This is the main reason for using hidden layers with non-linear activation functions.

There are four points to keep in mind. First, hidden layers are required, because a single neuron with a continuous non-linear and monotonous activation function can only solve problems with linearly independent patterns (Exercise 5.11). Second, if the patterns are linearly independent, then we can use gradient descent to determine suitable weights (and thresholds) of linear units. Third, for gradient descent with non-linear units we must require that the activation function g(b) is differentiable, or at least piecewise differentiable. Fourth, in this case we calculate the gradients using the chain rule, resulting in factors of derivatives dg(b)/db. This is the origin of the vanishing-gradient problem (Chapter 7).
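A minimal Python sketch of the gradient-descent rule (5.26) for a single linear unit; the data, learning rate, and iteration count are illustrative assumptions.

import numpy as np

def train_linear_unit(X, t, eta=0.01, n_steps=2000):
    """Gradient descent on H = (1/2) sum (t - O)^2 for a linear unit, Eq. (5.26)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        O = X @ w                       # linear outputs, Eq. (5.19)
        w += eta * (t - O) @ X          # delta w = eta * sum_mu (t - O) x, Eq. (5.26)
    return w

# Example: p = 3 linearly independent patterns in N = 3 dimensions.
rng = np.random.default_rng(0)
X = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
t = np.array([1.0, -1.0, 1.0])
w = train_linear_unit(X, t)
print(np.allclose(X @ w, t, atol=1e-3))   # the outputs approach the targets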

5.4 Classification capacity

In Chapter 3 we analysed the storage capacity of Hopfield networks. The analogous question for the classification problem described in Section 5.1 is: how many patterns can a single neuron with activation function g(b) = sgn(b) classify? As in the case of Hopfield networks, one can find a general answer for random binary classification problems. Consider p points with coordinate vectors x^{(µ)} in N-dimensional input space, and assign random targets:

Figure 5.10: Left: 5 points in general position in the plane. Right: these points are not in general position because three points lie on a straight line.

Figure 5.11: Probability (5.29) of separability as a function of α = p/N for three different values of the dimension N of input space. Note the pronounced threshold near α = 2, for large values of N.

t^{(µ)} = +1 with probability 1/2,  and  t^{(µ)} = −1 with probability 1/2.    (5.27)

This random classification problem is homogeneously linearly separable if we can find an N-dimensional weight vector w, so that w·x = 0 is a valid decision boundary that goes through the origin:

w·x^{(µ)} > 0 if t^{(µ)} = 1,    and    w·x^{(µ)} < 0 if t^{(µ)} = −1.    (5.28)

So homogeneously linearly separable problems are binary classification problems that are linearly separable by a hyperplane that contains the origin. Problems with this property can be solved by a binary threshold unit with threshold θ = 0. Now assume that the points (including the origin) are in general position (Figure 5.10). In this case Cover's theorem [72] gives an expression for the probability that the random binary classification problem of p patterns in dimension N is homogeneously linearly separable:

Figure 5.12: The XOR problem can be solved by embedding it in a three-dimensional input space.

P(p,N) = 2^{1−p} Σ_{k=0}^{N−1} \binom{p−1}{k}    for p > N,    and    P(p,N) = 1 otherwise.    (5.29)

Here \binom{l}{k} = l!/[(l−k)! k!] are the binomial coefficients, for l ≥ k ≥ 0. Equation (5.29) is proven by recursion, starting from a set of p−1 points in general position. Assume that the number C(p−1,N) of homogeneously linearly separable classification problems given these points is known. After adding one more point, one can compute C(p,N) in terms of C(p−1,N), and recursion yields Equation (5.29). Figure 5.11 shows this result as a function of α = p/N for different values of N. For p ≤ N, any random classification problem is homogeneously linearly separable. In this case the pattern vectors are linearly independent, so that the problem can also be solved by a linear unit (Section 5.3). But a neuron with activation function sgn(b) can classify problems with more than N patterns. In the limit N → ∞, the function P(αN,N) approaches a step function θ_H(2−α) (Exercise 5.12). In this limit the maximal classification capacity is α_max = 2.

What is the expected classification capacity for finite values of N? To answer this question, consider a random sequence of patterns x_1, x_2,... and targets t_1, t_2,... and ask [72]: what is the distribution of the largest integer n so that the problem x_1, x_2,...,x_n is separable in dimension N, but x_1, x_2,...,x_n, x_{n+1} is not? P(n,N) is the probability that n patterns are linearly separable in N-dimensional input space. We can write P(n+1,N) = q(n+1|n) P(n,N), where q(n+1|n) is the conditional probability that n+1 patterns are linearly separable if the n patterns are. Then the probability that n+1 patterns are not separable (but n patterns are) reads [1 − q(n+1|n)] P(n,N) = P(n,N) − P(n+1,N). We can interpret the right-hand side of this Equation as a distribution p_n of the random variable n, the maximal number of separable patterns in dimension N:

$$p_n = P(n,N) - P(n+1,N) = \Big(\tfrac{1}{2}\Big)^{\!n}\binom{n-1}{N-1} \qquad\text{for } n = 0,1,2,\ldots.$$

It follows that the expected maximal number of separable patterns is

$$\langle n \rangle = \sum_{n=0}^{\infty} n\, p_n = 2N. \qquad (5.30)$$

So the expected classification capacity is twice the input dimension:

$$\langle \alpha_{\max} \rangle = 2. \qquad (5.31)$$

This quantifies the notion that it is easier to separate patterns in higher-dimensional input space. As an illustration, consider the XOR problem, which is not linearly separable in two-dimensional input space. The problem becomes separable when we embed the points in three-dimensional space, for instance by assigning $x_3 = 0$ to the $t = +1$ patterns and $x_3 = 1$ to the $t = -1$ patterns (Figure 5.12).
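The formulae above are easy to check numerically. The following sketch is our own illustration (not part of the book); the function names are ours. It evaluates Cover's probability (5.29) and the expected capacity (5.30).

```python
import math

def cover_probability(p, N):
    """Probability (5.29) that p random points in general position in
    N dimensions are homogeneously linearly separable."""
    if p <= N:
        return 1.0
    return 2.0 ** (1 - p) * sum(math.comb(p - 1, k) for k in range(N))

def expected_capacity(N, n_max=1000):
    """Expected maximal number of separable patterns, Equation (5.30):
    <n> = sum_n n * [P(n,N) - P(n+1,N)], which should equal 2N."""
    return sum(n * (cover_probability(n, N) - cover_probability(n + 1, N))
               for n in range(n_max))

if __name__ == "__main__":
    N = 25
    # P(alpha*N, N) drops sharply near alpha = 2 for large N (Figure 5.11)
    for alpha in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(f"alpha = {alpha:3.1f}:  P = {cover_probability(int(alpha * N), N):.3f}")
    print("expected capacity <n> =", expected_capacity(N), "   (2N =", 2 * N, ")")
```

Note that $P(2N, N) = \tfrac{1}{2}$ exactly, for any $N$, consistent with the threshold at $\alpha = 2$.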

5.5 Multi-layer perceptrons

In Sections 5.1 and 5.2 we discussed how to solve linearly separable problems [Figure 5.13(a)]. The aim of this Section is to show that non-separable problems like the one in Figure 5.13(b) can be solved by a perceptron with one hidden layer. A network that does the trick for the classification problem in Figure 5.13(b) is depicted in Figure 5.14. As in the previous Section, all neurons have the signum function as activation function, with possible outputs $\pm 1$:

$$V_j^{(\mu)} = \mathrm{sgn}\big(b_j^{(\mu)}\big) \quad\text{with}\quad b_j^{(\mu)} = \sum_k w_{jk}\, x_k^{(\mu)} - \theta_j,$$
$$O_1^{(\mu)} = \mathrm{sgn}\big(B_1^{(\mu)}\big) \quad\text{with}\quad B_1^{(\mu)} = \sum_j W_{1j}\, V_j^{(\mu)} - \Theta_1. \qquad (5.32)$$

Each of the three neurons in the hidden layer has its own decision boundary. The idea is to choose the weights $w_{jk}$ and the thresholds $\theta_j$ in such a way that the three decision boundaries partition the input plane into distinct regions, so that each region contains either only $t = 1$ patterns or only $t = -1$ patterns [3].

Figure 5.13: (a) Linearly separable problem. (b) Problems that are not linearly separable can be solved by a piecewise linear decision boundary. Legend: ■ corresponds to $t^{(\mu)} = 1$, and □ to $t^{(\mu)} = -1$.

How this construction works is shown in Figure 5.15. The left part of the Figure shows the three decision boundaries with their weight vectors, and how they divide the input plane into different regions which contain either only □ or only ■. Each region bears a three-digit code made out of the symbols $+$ and $-$. The codes are determined by the states of the hidden neurons. A $+$ sign in the $j$-th entry of the code means that $V_j = +1$; the region in question is on the weight-vector side of decision boundary $j$. A $-$ sign, by contrast, corresponds to $V_j = -1$; in this case the region is on the other side of the decision boundary, the one opposite the weight vector. The value table shows the targets associated with each region, together with

Figure 5.14: Hidden-layer perceptron to solve the problem shown in Figure 5.13(b).

 V1   V2   V3   target
 -    -    -     -1
 +    -    -     -1
 -    -    +     -1
 +    -    +     +1
 +    +    -     +1
 -    +    +     +1
 +    +    +     +1

Figure 5.15: Left: decision boundaries, regions, and the corresponding binary codes determined by the states of the hidden neurons. Legend: ■ corresponds to $t^{(\mu)} = 1$, and □ to $t^{(\mu)} = -1$. Right: encoding of the regions and corresponding targets. The region $-+-$ does not exist.

the code of the region.

The weights $W_{1j}$ and the threshold $\Theta_1$ of the output neuron are chosen so that it associates the correct target value with each region. A graphical representation of the output problem is shown in Figure 5.16. The problem is linearly separable. The following function computes the correct output for each region:

$$O_1^{(\mu)} = \mathrm{sgn}\big(V_1^{(\mu)} + V_2^{(\mu)} + V_3^{(\mu)}\big). \qquad (5.33)$$

This solves the binary classification problem described in Figure 5.16, but note that this solution is not unique. There is a range of different weights and thresholds that solve the problem, and there are other solutions based on different network layouts. Nevertheless, the solution illustrates how non-linearly separable classification problems can be solved by adding a hidden layer to the network layout. The neurons in the hidden layer define segments of a piecewise linear decision boundary. More neurons are needed if the decision boundary is very wiggly.

Figure 5.17 shows another example, how to solve the Boolean XOR problem with a perceptron that has two neurons in a hidden layer, with activation functions $\mathrm{sgn}(b)$, thresholds $\tfrac{1}{2}$ and $\tfrac{3}{2}$, and all weights equal to unity. The output neuron has weights $+1$ and $-1$ and threshold equal to unity:
$$O_1 = \mathrm{sgn}(V_1 - V_2 - 1). \qquad (5.34)$$

Minsky and Papert [11] proved in 1969 that all Boolean functions can be represented by multilayer perceptrons, but that at least one hidden neuron must be connected to all input terminals. This means that not all neurons in the network are locally connected (have only a few incoming weights). Since fully connected networks are much harder to train than locally connected ones, this was considered a shortcoming at the time. Now, almost 50 years later, the perspective has changed. Convolutional networks (Chapter 8) have only local connections to the inputs and can be trained to recognise objects in images with high accuracy.

In summary, perceptrons are trained on a training set $[\boldsymbol{x}^{(\mu)}, t^{(\mu)}]$, $\mu = 1,\ldots,p$, by moving the decision boundaries into the correct positions. This is achieved by repeatedly applying Hebb's rule to adjust all weights. This corresponds to using gradient-descent learning on the energy function (5.23). We have not discussed how to update the thresholds yet, but it is clear that they too can be updated with gradient-descent learning.

Figure 5.16: Graphical representation of the output problem for the classification problem shown in Figure 5.15.

 V1   V2    t
 -    -    -1
 +    -    +1
 +    +    -1

Figure 5.17: Boolean XOR function: geometrical representation, network layout, and value table for the output neuron. The region $-+$ does not exist. All neurons assume two possible states, $+1$ or $-1$. Legend for the geometrical representation: ■ corresponds to $t^{(\mu)} = 1$, and □ to $t^{(\mu)} = -1$.
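A quick numerical check of this construction (our own sketch, not part of the book), assuming the inputs take the values 0 and 1 as in the value table of the XOR problem: the snippet evaluates the network of Figure 5.17 with the stated weights and thresholds on all four inputs.

```python
import numpy as np

def sgn(b):
    # signum activation used throughout this Section (sgn(0) taken as +1 here)
    return np.where(b >= 0, 1, -1)

def xor_network(x1, x2):
    """Two hidden neurons with thresholds 1/2 and 3/2 and unit weights,
    output neuron with weights +1, -1 and threshold 1, Equation (5.34)."""
    V1 = sgn(x1 + x2 - 0.5)
    V2 = sgn(x1 + x2 - 1.5)
    return sgn(V1 - V2 - 1)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", int(xor_network(x1, x2)))
# prints -1 for (0,0) and (1,1), and +1 for (0,1) and (1,0)
```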

Figure 5.18: (a) Result of training the network on a training set. Legend: ■ corresponds to $t^{(\mu)} = 1$, and □ to $t^{(\mu)} = -1$. (b) Classification of a validation set. One pattern is wrongly classified.

Once all decision boundaries are in the right places we must ask: what happens if we apply the trained network to a new dataset? Does it classify the new inputs correctly? In other words, can the network generalise? An example is shown in Figure 5.18. Panel (a) shows the result of training the network on a training set. The decision boundary separates $t = 1$ patterns from $t = -1$ patterns, so that the network classifies all patterns in the training set correctly. In panel (b) the trained network is applied to patterns in a validation set. We see that most patterns are correctly classified, save for one error. This means that the energy function (5.23) is not exactly zero for the validation set. Nevertheless, the network does quite a good job. Usually it is not a good idea to try to precisely classify all patterns near the decision boundary, because real-world data sets are subject to noise. It is a futile effort to try to learn and predict noise.

5.6 Summary

Perceptrons are layered feed-forward networks that can learn to classify data in a training set $[\boldsymbol{x}^{(\mu)}, \boldsymbol{t}^{(\mu)}]$. For each input pattern $\boldsymbol{x}^{(\mu)}$, the network finds the correct target vector $\boldsymbol{t}^{(\mu)}$. We discussed the learning algorithm for a simple example: real-valued patterns with just two components, and one binary target. This allowed us to represent the classification problem graphically, and to see how linearly separable classification problems can be solved by a simple perceptron. There are three different ways of understanding how the perceptron learns. First, geometrically, the perceptron learns by moving decision boundaries into the correct locations. Second, this can be achieved by repeatedly adding a little bit of Hebb's rule. Third, this algorithm corresponds to gradient descent on the energy function (5.23). Cover's theorem quantifies the capacity of a simple perceptron to separate patterns with binary targets. Finally, we discussed how to solve non-linearly separable classification problems with perceptrons with a hidden layer.

5.7 Further reading

As mentioned in the Introduction, a short account of the history of perceptron research is the review by Kanal [23]. The remarkable book by Minsky and Papert explains the geometry of perceptron learning in great depth, and in a very elegant fashion. For a proof of Cover’s theorem see Ref. [73].

5.8 Exercises

5.1 Boolean AND problem. The Boolean AND problem (Figure 5.7) cannot be solved with a linear unit with weights $\boldsymbol{w}$ and threshold $\theta$. To show this, solve $\partial H/\partial\boldsymbol{w} = 0$ and $\partial H/\partial\theta = 0$ for $\boldsymbol{w}$ and $\theta$. Using the resulting weights and thresholds, demonstrate that $O^{(\mu)} \neq t^{(\mu)}$ for some $\mu$. Hint: express the linear system to solve in terms of $\langle\boldsymbol{x}\boldsymbol{x}^{\mathsf{T}}\rangle$, $\langle\boldsymbol{x}\rangle$, $\langle t\boldsymbol{x}\rangle$, and $\langle t\rangle$, where $\langle\cdots\rangle$ is an average over patterns. Relate your findings to Equation (5.21) by computing the Moore-Penrose inverse of $\mathbb{Q}$ using its singular-value decomposition (see Section 2.6 in Ref. [53]).

5.2 Boolean functions. How many Boolean functions with two-dimensional in- puts are there? How many of them are linearly separable? How many Boolean functions with three-dimensional inputs are there? How many of them are linearly separable? Solve the problem graphically, considering its point-group symmetries. See also Exercise 5.13.

5.3 Output problem for binary classification. The binary classification problem shown in Figure 5.15 can be solved with a network with one hidden layer and one output neuron. Figure 5.16 shows the problem that the output neuron has to solve. Show that such output problems are linearly separable if the decision boundaries corresponding to the hidden units partition the input plane into distinct regions that contain either only $t = 1$ or only $t = -1$ patterns.

5.4 Piecewise linear decision boundary. Find an alternative solution for the classification problem shown in Figure 5.15, where the weight vectors are chosen as depicted in Figure 5.19.

Figure 5.19: Alternative solution of the classification problem shown in Figure 5.15. Exercise 5.4.

5.5 Three-dimensional parity function. The three-dimensional parity function is illustrated in Figure 5.20. The input bits $x_k^{(\mu)}$ for $k = 1, 2, 3$ are either $+1$ or $-1$. The output $O^{(\mu)}$ of the network is $+1$ if there is an odd number of positive bits in $\boldsymbol{x}^{(\mu)}$, and $-1$ if the number of positive bits is even. One representation of this function uses a hidden layer with eight neurons. The state $V_j^{(\mu)}$ of hidden neuron $j$ is computed as

$$V_j^{(\mu)} = \begin{cases} 1 & \text{if } -\theta_j + \sum_k w_{jk}\, x_k^{(\mu)} > 0,\\[2pt] 0 & \text{if } -\theta_j + \sum_k w_{jk}\, x_k^{(\mu)} \leq 0. \end{cases} \qquad (5.35)$$

Weights and thresholds are given by $w_{jk} = x_k^{(j)}$ and $\theta_j = 2$ ($j = 1,\ldots,8$). The network output is computed as $O^{(\mu)} = \sum_j W_j V_j^{(\mu)}$ (linear unit). Determine the weights $W_j$.

5.6 Linearly inseparable problem. A classification problem is given in Figure 5.21. Inputs $\boldsymbol{x}^{(\mu)}$ inside the gray triangle have targets $t^{(\mu)} = 1$, inputs outside the triangle have targets $t^{(\mu)} = 0$. The problem can be solved by a perceptron with one hidden layer with three neurons, $V_j^{(\mu)} = \theta_H\big({-\theta_j} + \sum_{k=1}^{2} w_{jk}\, x_k^{(\mu)}\big)$ for $j = 1,2,3$. The network output is computed as $O^{(\mu)} = \theta_H\big({-\Theta} + \sum_{j=1}^{3} W_j V_j^{(\mu)}\big)$. Here $\theta_H(b)$ is the Heaviside function (Figure 2.10). Find weights $w_{jk}$, $W_j$ and thresholds $\theta_j$, $\Theta$ that solve the classification problem.

5.7 Perceptron with one hidden layer. A perceptron has one input layer, one layer of hidden neurons, and a single output neuron. It receives two-dimensional

Figure 5.20: Three-dimensional parity function, with targets $t^{(\mu)} = 1$ (■) and $t^{(\mu)} = -1$ (□). Exercise 5.5.

Figure 5.21: Classification problem. Exercise 5.6.

Figure 5.22: Left: input plane with decision boundaries of hidden neurons Vj (gray lines). The boundaries partition input space into nine regions labeled by the binary code V1V2V3V4. Right: same, but for a different labeling. Exercise 5.7.

input patterns $\boldsymbol{x}^{(\mu)} = [x_1^{(\mu)}, x_2^{(\mu)}]^{\mathsf{T}}$. They are mapped to four hidden neurons $V_j^{(\mu)}$ as

$$V_j^{(\mu)} = \begin{cases} 0 & \text{if } -\theta_j + \sum_k w_{jk}\, x_k^{(\mu)} \leq 0,\\[2pt] 1 & \text{if } -\theta_j + \sum_k w_{jk}\, x_k^{(\mu)} > 0. \end{cases} \qquad (5.36)$$

The network output is computed as

$$O^{(\mu)} = \begin{cases} 0 & \text{if } -\Theta + \sum_j W_j V_j^{(\mu)} \leq 0,\\[2pt] 1 & \text{if } -\Theta + \sum_j W_j V_j^{(\mu)} > 0, \end{cases} \qquad (5.37)$$

with $W_1 = W_3 = W_4 = 1$, $W_2 = -1$, and $\Theta = \tfrac{1}{2}$. Figure 5.22(left) shows how input space is mapped to the hidden neurons. Draw the decision boundary of the network.

Give values for $w_{jk}$ and $\theta_j$ that yield the pattern in Figure 5.22(left). Show that one cannot map the input space to the space of hidden neurons as in Figure 5.22(right).

5.8 Multilayer perceptron. A classification problem is shown in Figure 5.23. It can be solved by a multilayer perceptron with two inputs, three hidden neurons $V_j^{(\mu)} = \theta_H\big(\sum_{k=1}^{2} w_{jk}\, x_k^{(\mu)} - \theta_j\big)$, and one output $O^{(\mu)} = \theta_H\big(\sum_{j=1}^{3} W_j V_j^{(\mu)} - \Theta\big)$, where $\theta_H(b)$ is the Heaviside function (Figure 2.10). A possible solution is illustrated in Fig. 5.23. Compute weights $w_{jk}$ and thresholds $\theta_j$ of the hidden neurons that determine the three decision boundaries (gray lines). Draw a representation of the problem in the space with axes $V_1$, $V_2$, and $V_3$. Find output weights $W_j$ and threshold $\Theta$ that solve the problem.

5.9 Expected maximal number of separable patterns. Show that the sum in Equation (5.30) evaluates to $2N$.

 x1    x2     t
 0.1   0.95   0
 0.2   0.85   0
 0.2   0.9    0
 0.3   0.75   1
 0.4   0.65   1
 0.4   0.75   1
 0.6   0.45   0
 0.8   0.25   0
 0.1   0.65   1
 0.2   0.75   1
 0.7   0.2    1

Figure 5.23: Inputs and targets for a classification problem. The targets are either $t = 0$ (□) or $t = 1$ (■). The three decision boundaries (gray lines) illustrate a solution to the problem using a multilayer perceptron. Exercise 5.8.

Figure 5.24: Cover's theorem for $p = 3$ and $N = 2$. Examples of problems that are homogeneously linearly separable, (b) and (c), and of problems that are not homogeneously linearly separable, (a) and (d). Exercise 5.10.

 x1   x2   x3   t
 0    0    0    0
 0    0    1    1
 0    1    0    1
 1    0    0    1
 0    1    1    0
 1    0    1    0
 1    1    0    0
 1    1    1    1

Figure 5.25: Value table for a three-dimensional Boolean function. Exercise 5.14.

5.10 Cover's theorem. Prove that $P(3,2) = \tfrac{3}{4}$ by complete enumeration of all cases. Some cases (not all) are shown in Figure 5.24.

5.11 Non-linear activation function. Consider a single neuron with continuous, non-linear, and monotonically increasing activation function $g(b)$, and with $N$ inputs $x_1,\ldots,x_N$. Show that this neuron cannot solve binary classification problems $[\boldsymbol{x}^{(\mu)}, t^{(\mu)}]$ ($\mu = 1,\ldots,p$) if $p > N$.

5.12 Random classification problem. The probability $P(p,N)$ that a random binary classification problem with $p$ patterns in input dimension $N$ is homogeneously linearly separable is given in Equation (5.29). Show that [1]

$$P(p,N) \sim \tfrac{1}{2}\Big\{1 + \mathrm{erf}\Big[\sqrt{\tfrac{\alpha N}{2}}\,\Big(\tfrac{2}{\alpha} - 1\Big)\Big]\Big\} \qquad (5.38)$$

in the limit $N \to \infty$ at fixed $\alpha = p/N$.

5.13 Boolean functions with $n$-dimensional inputs. What is the number $\mathcal{N}_n$ of linearly separable Boolean functions with $n$-dimensional inputs? Write a computer program that attempts to solve $n$-dimensional Boolean functions using the learning rule (5.18) for a McCulloch-Pitts neuron with $g(b) = \mathrm{sgn}(b)$. Try the program on as many four- and five-dimensional Boolean functions as possible, to estimate $\mathcal{N}_4$ and $\mathcal{N}_5$. Hint: $\mathcal{N}_2 = 14$, $\mathcal{N}_3 = 104$, $\mathcal{N}_4 = 1882$, $\mathcal{N}_5 = 94572,\ldots$ (sequence A000609 in the online encyclopedia of integer sequences [74]).

5.14 Three-dimensional Boolean function. Figure 5.25 shows the value table for a three-dimensional Boolean function. Demonstrate that the function is not linearly separable by drawing it in three-dimensional input space. Construct a network with hidden layers that represents this function. Hint: one possibility is to wire together several two-dimensional XOR networks.

6 Stochastic gradient descent

In Chapter 5 we discussed how a hidden layer helps to classify problems that are not linearly separable. We explained how the decision boundary in Figure 5.15 is represented in terms of the weights and thresholds of the hidden neurons, and introduced a training algorithm based on gradient descent. In this Chapter, the training algorithm is discussed in more detail. Figure 5.2 shows the layout of the network to be trained. There are $p$ input patterns $\boldsymbol{x}^{(\mu)}$ with $N$ components each, as before. The output of the network has $M$ components:

$$\boldsymbol{O}^{(\mu)} = \begin{bmatrix} O_1^{(\mu)} \\ O_2^{(\mu)} \\ \vdots \\ O_M^{(\mu)} \end{bmatrix}, \qquad (6.1)$$

to be matched to the target vector $\boldsymbol{t}^{(\mu)}$. The activation functions $g(b)$ must be differentiable (or at least piecewise differentiable), but apart from that there is no need to specify them further at this point. The network shown in Figure 5.2 computes

$$V_j^{(\mu)} = g\big(b_j^{(\mu)}\big) \quad\text{with}\quad b_j^{(\mu)} = \sum_{k=1}^{N} w_{jk}\, x_k^{(\mu)} - \theta_j, \qquad (6.2a)$$
$$O_i^{(\mu)} = g\big(B_i^{(\mu)}\big) \quad\text{with}\quad B_i^{(\mu)} = \sum_{j} W_{ij}\, V_j^{(\mu)} - \Theta_i. \qquad (6.2b)$$

In other words, the outputs are computed in terms of nested activation functions.
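To make the nesting explicit, here is a minimal sketch of the forward pass (6.2) for one hidden layer (our own illustration, not code from the book; variable names mirror the notation of the text).

```python
import numpy as np

def forward(x, w, theta, W, Theta, g=np.tanh):
    """Forward pass of Equation (6.2): x -> V -> O for a single pattern.
    w has shape (hidden, N), W has shape (M, hidden)."""
    b = w @ x - theta          # local fields of the hidden layer
    V = g(b)                   # hidden states, Eq. (6.2a)
    B = W @ V - Theta          # local fields of the output layer
    O = g(B)                   # outputs, Eq. (6.2b)
    return V, O

# example with N = 3 inputs, 4 hidden neurons, M = 2 outputs
rng = np.random.default_rng(0)
w, theta = rng.normal(size=(4, 3)), np.zeros(4)
W, Theta = rng.normal(size=(2, 4)), np.zeros(2)
V, O = forward(rng.normal(size=3), w, theta, W, Theta)
print(O)
```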

6.1 Chain rule and error backpropagation

The network in Figure 5.2 is trained by gradient-descent learning in the same way as in Section 5.3. The weight increments are given by:

$$\delta W_{mn} = -\eta\,\frac{\partial H}{\partial W_{mn}} \qquad\text{and}\qquad \delta w_{mn} = -\eta\,\frac{\partial H}{\partial w_{mn}}, \qquad (6.3)$$

with energy function

$$H = \tfrac{1}{2}\sum_{\mu i}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)^2. \qquad (6.4)$$

The small parameter η > 0 in Equation (6.3) is the learning rate, as in Section 5.3. The derivatives of the energy function are evaluated with the chain rule. For the weights connecting to the output layer we apply the chain rule once

$$\frac{\partial H}{\partial W_{mn}} = -\sum_{\mu i}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)\,\frac{\partial O_i^{(\mu)}}{\partial W_{mn}}, \qquad (6.5a)$$

and then once more, using Equation (5.25):

$$\frac{\partial O_i^{(\mu)}}{\partial W_{mn}} = g'\big(B_i^{(\mu)}\big)\,\delta_{im}\, V_n^{(\mu)}. \qquad (6.5b)$$

Here $g'(B) = \mathrm{d}g/\mathrm{d}B$ is the derivative of the activation function with respect to the local field $B$, and $\delta_{im}$ is the Kronecker delta: $\delta_{im} = 1$ if $i = m$ and zero otherwise. An important point is that the states $V_j$ of the neurons in the hidden layer do not depend on $W_{mn}$, because these neurons do not have incoming connections with these weights, a consequence of the feed-forward layout of the network. In summary, we obtain for the increments of the weights connecting to the output layer:

$$\delta W_{mn} = -\eta\,\frac{\partial H}{\partial W_{mn}} = \eta\sum_{\mu=1}^{p}\big(t_m^{(\mu)} - O_m^{(\mu)}\big)\, g'\big(B_m^{(\mu)}\big)\, V_n^{(\mu)} \equiv \eta\sum_{\mu=1}^{p}\Delta_m^{(\mu)}\, V_n^{(\mu)}. \qquad (6.6a)$$

The quantity
$$\Delta_m^{(\mu)} = \big(t_m^{(\mu)} - O_m^{(\mu)}\big)\, g'\big(B_m^{(\mu)}\big) \qquad (6.6b)$$
is a weighted output error: it vanishes when $O_m^{(\mu)} = t_m^{(\mu)}$. The weights connecting to the hidden layer are adjusted in a similar fashion, by applying the chain rule four times:

$$\frac{\partial H}{\partial w_{mn}} = -\sum_{\mu i}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)\,\frac{\partial O_i^{(\mu)}}{\partial w_{mn}}, \qquad (6.7a)$$
$$\frac{\partial O_i^{(\mu)}}{\partial w_{mn}} = \sum_{l}\frac{\partial O_i^{(\mu)}}{\partial V_l^{(\mu)}}\,\frac{\partial V_l^{(\mu)}}{\partial w_{mn}}, \qquad (6.7b)$$
$$\frac{\partial O_i^{(\mu)}}{\partial V_l^{(\mu)}} = g'\big(B_i^{(\mu)}\big)\, W_{il}, \qquad (6.7c)$$
$$\frac{\partial V_l^{(\mu)}}{\partial w_{mn}} = g'\big(b_l^{(\mu)}\big)\,\delta_{lm}\, x_n^{(\mu)}. \qquad (6.7d)$$

Here we used Equation (5.25). With the definition of the output error, $\Delta_i^{(\mu)}$, Equation (6.3) yields:

$$\delta w_{mn} = \eta\sum_{\mu}\sum_{i}\Delta_i^{(\mu)}\, W_{im}\, g'\big(b_m^{(\mu)}\big)\, x_n^{(\mu)} \equiv \eta\sum_{\mu}\delta_m^{(\mu)}\, x_n^{(\mu)}. \qquad (6.8)$$

The last equality defines weighted errors,

$$\delta_m^{(\mu)} = \sum_{i}\Delta_i^{(\mu)}\, W_{im}\, g'\big(b_m^{(\mu)}\big), \qquad (6.9)$$

associated with the hidden layer. Note that the $\delta_m^{(\mu)}$ vanish when the output errors $\Delta_i^{(\mu)}$ are zero. Equation (6.9) shows that the errors are determined recursively. The neuron states are also updated recursively, Equation (6.2), but there is an important difference between Equations (6.9) and (6.2). The feed-forward structure of the layered network implies that the neurons are updated from left to right. Equation (6.9), by contrast, says that the errors are updated from the right to the left, from the output layer to the hidden layer. The term backpropagation refers to this difference: the neurons are updated forward, the errors are updated backwards. In terms of the errors $\Delta_m^{(\mu)}$ and $\delta_m^{(\mu)}$, the weight increments have the same form for both layers:

$$\delta W_{mn} = \eta\sum_{\mu=1}^{p}\Delta_m^{(\mu)}\, V_n^{(\mu)} \qquad\text{and}\qquad \delta w_{mn} = \eta\sum_{\mu=1}^{p}\delta_m^{(\mu)}\, x_n^{(\mu)}. \qquad (6.10)$$

The rule (6.10) is also called δ-rule [1]. The thresholds are adjusted in a similar way:

$$\delta\Theta_m = -\eta\,\frac{\partial H}{\partial\Theta_m} = -\eta\sum_{\mu=1}^{p}\big(t_m^{(\mu)} - O_m^{(\mu)}\big)\, g'\big(B_m^{(\mu)}\big) = -\eta\sum_{\mu=1}^{p}\Delta_m^{(\mu)}, \qquad (6.11a)$$

$$\delta\theta_m = -\eta\,\frac{\partial H}{\partial\theta_m} = -\eta\sum_{\mu=1}^{p}\sum_{i}\Delta_i^{(\mu)}\, W_{im}\, g'\big(b_m^{(\mu)}\big) = -\eta\sum_{\mu=1}^{p}\delta_m^{(\mu)}. \qquad (6.11b)$$

So, the general form for the threshold increments is analogous to Equation (6.10)

$$\delta\Theta_m = -\eta\sum_{\mu=1}^{p}\Delta_m^{(\mu)} \qquad\text{and}\qquad \delta\theta_m = -\eta\sum_{\mu=1}^{p}\delta_m^{(\mu)}, \qquad (6.12)$$

but without the state variables of the neurons (or the inputs). A way to remember the difference between Equations (6.10) and (6.12) is to note that the formula for the threshold increments looks like the one for the weight increments if one sets the state values of the neurons to $-1$. This follows from Equation (6.2).

The backpropagation rules (6.10) and (6.12) contain sums over patterns. This corresponds to feeding all patterns at the same time to compute the increments of weights and thresholds (batch training). Alternatively one may choose a single pattern, update the weights by backpropagation, and then continue to iterate these training steps many times (sequential training). One iteration corresponds to feeding a single pattern; $p$ iterations are called one epoch (in batch training, one iteration corresponds to one epoch). If one chooses the patterns randomly, then sequential training results in stochastic gradient descent:

$$\delta W_{mn} = \eta\,\Delta_m^{(\mu)}\, V_n^{(\mu)} \qquad\text{and}\qquad \delta w_{mn} = \eta\,\delta_m^{(\mu)}\, x_n^{(\mu)}, \qquad (6.13a)$$
$$\delta\Theta_m = -\eta\,\Delta_m^{(\mu)} \qquad\text{and}\qquad \delta\theta_m = -\eta\,\delta_m^{(\mu)}. \qquad (6.13b)$$

Since the sum over patterns is absent, the steps do not necessarily decrease the energy function. Their directions fluctuate, but the average weight increment (averaged over all patterns) points downhill. The result is a stochastic path through parameter space, less prone to getting stuck in local minima (but see Section 7.8).

6.2 Stochastic gradient-descent algorithm

The stochastic gradient-descent formulae of the previous Section apply to a network with one hidden layer. This Section describes the details of the stochastic-gradient algorithm for deep networks with many hidden layers. To this end we need to adapt our notation, as described in Figure 6.1. We label the layers by the index $\ell$. The layer of input terminals has label $\ell = 0$, while layer $\ell = L$ denotes the layer of output neurons. The state variables for the neurons in layer $\ell$ are $V_j^{(\ell)}$, the weights connecting into these neurons from the left are $w_{jk}^{(\ell)}$, and the errors associated with layer $\ell$ are denoted by $\delta_k^{(\ell)}$. In this notation, Equations (6.2) read:

$$V_j^{(\ell)} = g\Big(\sum_k w_{jk}^{(\ell)}\, V_k^{(\ell-1)} - \theta_j^{(\ell)}\Big). \qquad (6.14)$$

Repeating the steps outlined in the previous Section, we arrive at the update formulae

$$\delta w_{mn}^{(\ell)} = \eta\,\delta_m^{(\ell)}\, V_n^{(\ell-1)} \qquad\text{and}\qquad \delta\theta_m^{(\ell)} = -\eta\,\delta_m^{(\ell)}, \qquad (6.15)$$

with errors

$$\delta_j^{(\ell-1)} = \sum_i \big(t_i - V_i^{(L)}\big)\,\frac{\partial V_i^{(L)}}{\partial V_j^{(\ell-1)}}\; g'\big(b_j^{(\ell-1)}\big), \qquad (6.16)$$

Figure 6.1: Illustrates the notation used in Algorithm 4.

Algorithm 4: stochastic gradient descent
  initialise weights $w_{mn}^{(\ell)}$ to random numbers, thresholds to zero, $\theta_m^{(\ell)} = 0$;
  for $\nu = 1,\ldots,\nu_{\max}$ do
    choose a value of $\mu$ and apply pattern $\boldsymbol{x}^{(\mu)}$ to the input layer, $\boldsymbol{V}^{(0)} \leftarrow \boldsymbol{x}^{(\mu)}$;
    for $\ell = 1,\ldots,L$ do
      propagate forward: $V_j^{(\ell)} \leftarrow g\big(\sum_k w_{jk}^{(\ell)} V_k^{(\ell-1)} - \theta_j^{(\ell)}\big)$;
    end for
    compute errors for output layer: $\delta_i^{(L)} \leftarrow g'(b_i^{(L)})\,(t_i - V_i^{(L)})$;
    for $\ell = L,\ldots,2$ do
      propagate backward: $\delta_j^{(\ell-1)} \leftarrow \sum_i \delta_i^{(\ell)} w_{ij}^{(\ell)}\, g'(b_j^{(\ell-1)})$;
    end for
    for $\ell = 1,\ldots,L$ do
      change weights and thresholds: $w_{mn}^{(\ell)} \leftarrow w_{mn}^{(\ell)} + \eta\,\delta_m^{(\ell)} V_n^{(\ell-1)}$ and $\theta_m^{(\ell)} \leftarrow \theta_m^{(\ell)} - \eta\,\delta_m^{(\ell)}$;
    end for
  end for

where $b_j^{(\ell)} = \sum_k w_{jk}^{(\ell)} V_k^{(\ell-1)} - \theta_j^{(\ell)}$ is the local field of $V_j^{(\ell)}$. It involves the matrix-vector product between the weight matrix $\mathbb{W}^{(\ell)}$ and the vector $\boldsymbol{V}^{(\ell-1)}$. Evaluating the gradients $\partial V_i^{(L)}/\partial V_j^{(\ell-1)}$ with the chain rule, one obtains the recursion

$$\delta_j^{(\ell-1)} = \sum_i \delta_i^{(\ell)}\, w_{ij}^{(\ell)}\, g'\big(b_j^{(\ell-1)}\big), \qquad (6.17)$$

with initial condition $\delta_i^{(L)} = (t_i - V_i^{(L)})\, g'(b_i^{(L)})$. For one hidden layer, Equation (6.17) is equivalent to (6.9). The result of the recursion (6.17) is a vector $\boldsymbol{\delta}^{(\ell-1)}$ with components $\delta_j^{(\ell-1)}$, obtained by component-wise multiplication of $[\mathbb{W}^{(\ell)\mathsf{T}}\boldsymbol{\delta}^{(\ell)}]_j$ with $g'(b_j^{(\ell-1)})$. Component-wise multiplication of vectors is sometimes called the Schur or Hadamard product [75], denoted by $\boldsymbol{a}\odot\boldsymbol{b} = [a_1 b_1,\ldots,a_N b_N]^{\mathsf{T}}$. It does not have a geometric meaning like the scalar product or the cross product of vectors, and therefore there is little point in using it. Also, note that the vector $\boldsymbol{\delta}^{(\ell)}$ is multiplied by the transpose of the weight matrix, $\mathbb{W}^{(\ell)\mathsf{T}}$, rather than by the weight matrix itself. We return to this point in Section 9.1.

The stochastic-gradient algorithm is summarised in Algorithm 4. One feeds an input $\boldsymbol{x}^{(\nu)}$, updates the weights using (6.15), and iterates these steps until the energy function (5.23) is deemed sufficiently small. Note that the resulting weights and thresholds are not unique. In Figure 5.17 all weights for the Boolean XOR function are equal to $\pm 1$. But the training algorithm (6.10) corresponds to repeatedly adding weight increments. This may cause the weights to grow.

In practice, the stochastic gradient-descent dynamics may be too noisy. In this case it is better to average over a small number of randomly chosen patterns. Such a set is called a mini batch, of size $m_B$ say. In stochastic gradient descent with mini batches one replaces Equations (6.10) and (6.12) by

$$\delta W_{mn} = \eta\sum_{\mu=1}^{m_B}\Delta_m^{(\mu)}\, V_n^{(\mu)} \qquad\text{and}\qquad \delta\Theta_m = -\eta\sum_{\mu=1}^{m_B}\Delta_m^{(\mu)}, \qquad (6.18)$$
$$\delta w_{mn} = \eta\sum_{\mu=1}^{m_B}\delta_m^{(\mu)}\, x_n^{(\mu)} \qquad\text{and}\qquad \delta\theta_m = -\eta\sum_{\mu=1}^{m_B}\delta_m^{(\mu)}.$$

Sometimes the mini-batch rule is quoted with prefactors of $m_B^{-1}$ in front of the sums. The factors $m_B^{-1}$ can simply be absorbed into the learning rate, but when comparing learning rates for different implementations one needs to check whether or not there are factors of $m_B^{-1}$ in front of the sums in Equation (6.18).

How does one select which inputs to include in a mini batch? This is discussed below, in Section 6.3: at the beginning of each epoch, one randomly shuffles the sequence of the input patterns in the training set. Then the first mini batch contains patterns $\mu = 1,\ldots,m_B$, and so forth.
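A compact NumPy sketch of Algorithm 4 with single-pattern updates follows. It is our own illustration, not code from the book; the variable names, the tanh activation, and the training example are ours.

```python
import numpy as np

def train_sgd(x, t, layer_sizes, eta=0.1, n_iter=10_000, seed=0):
    """Stochastic gradient descent for a layered network (Algorithm 4),
    with tanh activations. x has shape (p, N), t has shape (p, M)."""
    rng = np.random.default_rng(seed)
    L = len(layer_sizes) - 1
    w = [rng.normal(0, 1 / np.sqrt(layer_sizes[l]),
                    size=(layer_sizes[l + 1], layer_sizes[l])) for l in range(L)]
    theta = [np.zeros(layer_sizes[l + 1]) for l in range(L)]
    g, g_prime = np.tanh, lambda b: 1 - np.tanh(b) ** 2       # Equation (6.20)

    for _ in range(n_iter):
        mu = rng.integers(len(x))                  # pick one pattern at random
        V, b = [x[mu]], []
        for l in range(L):                         # propagate forward, Eq. (6.14)
            b.append(w[l] @ V[-1] - theta[l])
            V.append(g(b[-1]))
        delta = (t[mu] - V[-1]) * g_prime(b[-1])   # output error
        for l in reversed(range(L)):               # propagate backward, Eq. (6.17)
            new_delta = (w[l].T @ delta) * g_prime(b[l - 1]) if l > 0 else None
            w[l] += eta * np.outer(delta, V[l])    # weight update, Eq. (6.15)
            theta[l] -= eta * delta                # threshold update, Eq. (6.15)
            delta = new_delta
    return w, theta

# example: learn XOR (targets +/- 1) with two hidden tanh neurons;
# this usually converges, but stochastic training can occasionally get stuck
x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([[-1.], [1.], [1.], [-1.]])
w, theta = train_sgd(x, t, [2, 2, 1])
```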

Figure 6.2: Saturation of the activation functions (6.19). The derivatives $g'(b) \equiv \frac{\mathrm{d}}{\mathrm{d}b}g(b)$ of both activation functions tend to zero for large values of $|b|$.

Common choices for the activation function $g(b)$ are the sigmoid function or tanh:
$$g(b) = \frac{1}{1 + e^{-b}} \equiv \sigma(b), \qquad (6.19a)$$
$$g(b) = \tanh(b). \qquad (6.19b)$$
In both cases, the derivatives can be expressed in terms of the function itself:
$$\frac{\mathrm{d}}{\mathrm{d}b}\sigma(b) = \sigma(b)\,[1 - \sigma(b)], \qquad \frac{\mathrm{d}}{\mathrm{d}b}\tanh(b) = 1 - \tanh^2(b). \qquad (6.20)$$
The second equality was used in Section 3.4. The following short-hand notation for the derivative of the activation function $g(b)$ is common: $g'(b) \equiv \frac{\mathrm{d}}{\mathrm{d}b}g(b)$.

As illustrated in Figure 6.2, the activation functions (6.19) saturate at large values of $|b|$: their derivatives $g'(b)$ tend to zero. Since the backpropagation rule (6.16) contains factors of $g'(b)$, this implies that the algorithm slows down if $|b|$ becomes too large. For this reason, the initial weights and thresholds should be chosen so that the local fields $b$ are not too large in magnitude, to avoid that $g'(b)$ becomes too small. A standard procedure is to take all weights to be initially randomly distributed, for example Gaussian with zero mean, and with a suitable variance. The performance of networks with many hidden layers (deep networks) can be sensitive to the initialisation of the weights (Section 7.2). It is sometimes argued that the initial values of the thresholds are not so critical. The idea is that they are learned more rapidly than the weights, at least initially, and a common choice is to initialise the thresholds to zero. Section 7.2 summarises a mean-field argument that comes to a different conclusion.
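A two-line check of Equation (6.20) and of the saturation shown in Figure 6.2 (our own snippet):

```python
import numpy as np

sigma = lambda b: 1 / (1 + np.exp(-b))

for b in (0.0, 2.0, 5.0, 10.0):
    # derivatives via Equation (6.20); they shrink rapidly as |b| grows
    print(f"b = {b:5.1f}   sigma' = {sigma(b)*(1-sigma(b)):.2e}   tanh' = {1-np.tanh(b)**2:.2e}")
```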

6.3 Preprocessing the input data

It can be useful to preprocess the input data, although any preprocessing may remove information from the data. Nevertheless, it is usually advisable to shift the

Figure 6.3: Shift and scale the input data to achieve zero mean and unit variance.

data so the mean of each component over all $p$ patterns vanishes:

$$\langle x_k \rangle = \frac{1}{p}\sum_{\mu=1}^{p} x_k^{(\mu)} = 0. \qquad (6.21)$$

There are several reasons for this. First, large mean values can cause steep cliffs in the energy function (Exercise 6.9) that are difficult to navigate with gradient descent. Different input-data variances in different directions have a similar effect. Therefore one scales the inputs so that the input-data distribution has the same variance in all directions (Figure 6.3), equal to unity for instance:

$$\sigma_k^2 = \frac{1}{p}\sum_{\mu=1}^{p}\big(x_k^{(\mu)} - \langle x_k\rangle\big)^2 = 1. \qquad (6.22)$$

Second, to avoid that the neurons connected to the inputs saturate, their local fields must not be too large (Section 6.2). If one initialises the weights to Gaussian random numbers with mean zero and unit variance, large activations are quite likely if the distribution of input patterns has a large mean or a large variance. Third, enforcing zero input mean by shifting the input data avoids that the weights of the neurons in the first hidden layer must decrease or increase together [76]. Equation (6.18) shows that the components of $\delta\boldsymbol{w}_m \propto \delta_m\boldsymbol{x}$ into hidden neuron $m$ are likely to have the same signs if the input data has a large mean. This makes it difficult for the network to learn to differentiate.

In summary, it is advisable to shift and scale the input-data distribution so that it has mean zero and unit variance, as illustrated in Figure 6.3. The same transformation (using the mean values and scaling factors determined for the training set) should be applied to any new data set that the network is supposed to classify after it has been trained on the training set.

Figure 6.4 shows a distribution of inputs that falls into two distinct clusters. The difference between the clusters is sometimes called covariate shift; here covariate is just another term for input. Imagine feeding first just inputs from one of the clusters to the network. It will learn local properties of the decision boundary, instead of its global features. Such global properties are learned more efficiently if the network is more frequently confronted with unfamiliar data. For sequential training (stochastic gradient descent) this is not a problem, because the sequence of input patterns presented to the network is random. However, if one trains with mini batches, the mini batches should contain randomly chosen patterns in order to avoid covariate shifts. To this end one randomly shuffles the sequence of the input patterns in the training set at the beginning of each epoch. It is also recommended [76] to observe the output errors during training. If the errors are similar for a number of subsequent learning steps, the corresponding inputs appear familiar to the network. Larger errors correspond to unfamiliar inputs, and Ref. [76] suggests to feed such inputs more often.

When the input data is very high dimensional, many input terminals are needed. This usually means that one should use many neurons in the hidden layers. This can be problematic because it increases the risk of overfitting the input data. To avoid this as far as possible, one can reduce the dimensionality of the input data by principal-component analysis. This method allows one to project high-dimensional data onto a lower-dimensional subspace (Figure 6.5). The data shown on the left of Figure 6.5 falls approximately onto a straight line, the principal direction $\boldsymbol{u}_1$. We see that the coordinate orthogonal to the principal direction is not useful in classifying the data. Consequently this coordinate can be disregarded, reducing the dimensionality of the data set.

Figure 6.4: When the input data falls into clusters as shown in this Figure, one should randomly pick data from either cluster. The decision boundary is shown as a solid gray line. It has different slopes for the two clusters.

Figure 6.5: Principal-component analysis (schematic). The data set on the left can be classified keeping only the principal component $\boldsymbol{u}_1$ of the data. This is not true for the data set on the right.

The idea of principal-component analysis is to rotate the basis in input space so that the variance of the data along the first axis of the new coordinate system, $\boldsymbol{u}_1$, is maximal. One keeps the input components corresponding to $\boldsymbol{u}_1$, neglecting those corresponding to $\boldsymbol{u}_2$ (Figure 6.5). To determine the maximal-variance direction, consider the data variance along a direction $\boldsymbol{v}$:
$$\sigma_v^2 = \langle(\boldsymbol{x}\cdot\boldsymbol{v})^2\rangle - \langle\boldsymbol{x}\cdot\boldsymbol{v}\rangle^2 = \boldsymbol{v}\cdot C\boldsymbol{v}. \qquad (6.23)$$
Here
$$C = \langle\delta\boldsymbol{x}\,\delta\boldsymbol{x}^{\mathsf{T}}\rangle \quad\text{with}\quad \delta\boldsymbol{x} = \boldsymbol{x} - \langle\boldsymbol{x}\rangle \qquad (6.24)$$
is the data covariance matrix. The variance $\sigma_v^2$ is maximal when $\boldsymbol{v}$ points in the direction of the leading eigenvector of the covariance matrix $C$. This can be seen as follows. The covariance matrix is symmetric, therefore its eigenvectors $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_N$ form an orthonormal basis of input space. This allows us to express the matrix $C$ as
$$C = \sum_{\alpha=1}^{N}\lambda_\alpha\,\boldsymbol{u}_\alpha\boldsymbol{u}_\alpha^{\mathsf{T}}. \qquad (6.25)$$

The eigenvalues $\lambda_\alpha$ are non-negative. This follows from Equation (6.24) and the eigenvalue equation $C\boldsymbol{u}_\alpha = \lambda_\alpha\boldsymbol{u}_\alpha$. We arrange the eigenvalues by magnitude, $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_N \geq 0$. Using Equation (6.25) we can write for the variance
$$\sigma_v^2 = \sum_{\alpha=1}^{N}\lambda_\alpha v_\alpha^2 \qquad (6.26)$$
with $v_\alpha = \boldsymbol{v}\cdot\boldsymbol{u}_\alpha$. We want to show that $\sigma_v^2$ is maximal for $\boldsymbol{v} = \pm\boldsymbol{u}_1$ subject to the constraint
$$\sum_{\alpha=1}^{N} v_\alpha^2 = 1. \qquad (6.27)$$

To ensure that this constraint is satisfied, one introduces a Lagrange multiplier λ (Exercises 6.10 and 6.11). The constraint (6.27) is multiplied with λ and added to the target function (6.26). The function to maximise reads

$$\mathcal{L} = \sum_{\alpha}\lambda_\alpha v_\alpha^2 - \lambda\Big(1 - \sum_{\alpha} v_\alpha^2\Big). \qquad (6.28)$$

Variation of $\mathcal{L}$ yields $v_\beta(\lambda_\beta + \lambda) = 0$. This means that all components $v_\beta$ must vanish, except one which must equal unity. The maximum of $\mathcal{L}$ is obtained for $\lambda = -\lambda_1$, where $\lambda_1$ is the maximal eigenvalue of $C$ with eigenvector $\boldsymbol{u}_1$. This shows that the variance $\sigma_v^2$ is maximised by the principal direction.

In more than two dimensions there is commonly more than one direction along which the data varies significantly. These $k$ principal directions correspond to the $k$ eigenvectors of $C$ with the largest eigenvalues. This can be shown recursively. One projects the data onto the subspace orthogonal to $\boldsymbol{u}_1$ by applying the projection matrix $P_1 = 1 - \boldsymbol{u}_1\boldsymbol{u}_1^{\mathsf{T}}$. Then one repeats the procedure outlined above, and finds that the projected data varies maximally along $\boldsymbol{u}_2$. Upon iteration, one obtains the $k$ principal directions $\boldsymbol{u}_1,\ldots,\boldsymbol{u}_k$. Often there is a gap between the $k$ largest eigenvalues and the small ones (all close to zero). Then one can safely project the data onto the subspace spanned by the $k$ principal directions. If there is no gap, then it is less clear what to do.

The data set shown on the right of Figure 6.5 illustrates another problem. This data set is much harder to classify if we use only the principal component alone. In this case we lose important information by projecting the data onto its principal component.
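The preprocessing steps of this Section are easy to express in NumPy. The sketch below is our own illustration (not code from the book): it standardises the inputs according to Equations (6.21) and (6.22), and computes the principal directions from the covariance matrix (6.24); the data in the example is synthetic.

```python
import numpy as np

def standardise(x):
    """Shift and scale each input component to zero mean and unit variance,
    Equations (6.21) and (6.22). x has shape (p, N)."""
    mean, std = x.mean(axis=0), x.std(axis=0)
    return (x - mean) / std, mean, std      # keep mean/std to transform new data

def principal_directions(x, k):
    """Return the k leading eigenvectors of the data covariance matrix (6.24)."""
    dx = x - x.mean(axis=0)
    C = dx.T @ dx / len(x)                   # covariance matrix, Eq. (6.24)
    eigvals, eigvecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue
    return eigvecs[:, order[:k]], eigvals[order]

# example: data scattered around a line, as in the left panel of Figure 6.5
rng = np.random.default_rng(1)
s = rng.normal(size=500)
x = np.column_stack([s, 0.5 * s + 0.05 * rng.normal(size=500)])
xs, mu, sd = standardise(x)                  # Eq. (6.21)/(6.22)
u, lam = principal_directions(xs, k=1)       # leading eigenvector of C
print("principal direction:", u.ravel(), "   eigenvalues:", lam)
```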

6.4 Overfitting and cross validation

The goal of supervised learning is to generalise from a training set to new data. Only general properties of the training set are of interest, not specific ones that are particular to the training set in question. A neural network with more neurons may classify the input data better, because it more accurately represents all specific features of the data. But those specific properties could look quite different in new data (Figure 6.6). As a consequence, we must look for a compromise between accurate classification of the training set and the ability of the network to generalise. The problem illustrated in Figure 6.6 is also referred to as overfitting: the network fits details that are too fine (for instance noise in the training set) and that have no general meaning. The tendency to overfit is larger for networks with more neurons.

One way of avoiding overfitting is to use cross validation and early stopping. One splits the data into two sets: a training set and a validation set. The idea is that these

Figure 6.6: Overfitting. Left: accurate representation of the decision boundary in the training set, for a network with a single hidden layer with 15 neurons. Right: this new data set differs from the first one just by a little bit of noise. The points in the vicinity of the decision boundary are not correctly classified. Legend: ■ corresponds to $t^{(\mu)} = 1$, and □ to $t^{(\mu)} = -1$.

sets share the general features to be learnt. But although training and validation sets are drawn from the same distribution, they may differ in details that are not of interest. While the network is trained on the training set, one monitors not only the energy function for the training set, but also the energy function evaluated on the validation data. As long as the network learns general features of the input distribution, both training and validation energies decrease. But when the network starts to learn specific features of the training set, the validation energy saturates, or may start to increase. At this point the training is stopped. The scheme is illustrated in Figure 6.7.

Often the possible state values of the output neurons are continuous while the targets assume only discrete values. In this case one may also monitor the classification error of the validation set. The definition of the classification error depends on the type of the classification problem. For one single output neuron with targets $t = 0/1$, the classification error is defined as
$$C = \frac{1}{p}\sum_{\mu=1}^{p}\Big| t^{(\mu)} - \theta_H\big(O^{(\mu)} - \tfrac{1}{2}\big)\Big|. \qquad (6.29a)$$

If, by contrast, the targets take the values $t = \pm 1$, then the classification error reads:
$$C = \frac{1}{2p}\sum_{\mu=1}^{p}\Big| t^{(\mu)} - \mathrm{sgn}\big(O^{(\mu)}\big)\Big|. \qquad (6.29b)$$

Figure 6.7: Progress of training and validation errors. The plot is schematic, and the data is smoothed. Based on simulations performed by Oleksandr Balabanov. Shown is the natural logarithm of the energy functions for the training set (solid line) and the validation set (dashed line) as a function of the number of training iterations. The training is stopped when the validation energy begins to increase.

As a third example, consider a classification problem where inputs must be classified into $M$ mutually exclusive classes, such as the MNIST data set of hand-written digits (Section 8.3), where $M = 10$. Another example is given in Table 6.1, with $M = 3$ classes. In both cases, one of the targets equals unity while all others equal zero. As a consequence, the targets sum to unity: $\sum_{i=1}^{M} t_i^{(\mu)} = 1$. Now assume that the network has sigmoid outputs, $O_i^{(\mu)} = \sigma(b_i^{(\mu)})$. To classify input $\boldsymbol{x}^{(\mu)}$ from the network outputs $O_i^{(\mu)}$ we define

$$y_i^{(\mu)} = \begin{cases} 1 & \text{if } O_i^{(\mu)} \text{ is the largest of all outputs } i = 1,\ldots,M,\\[2pt] 0 & \text{otherwise.} \end{cases} \qquad (6.30a)$$

Then the classification error can be computed as

$$C = \frac{1}{2p}\sum_{\mu=1}^{p}\sum_{i=1}^{M}\Big| t_i^{(\mu)} - y_i^{(\mu)}\Big|. \qquad (6.30b)$$

In all cases, the classification accuracy is defined as $(1 - C)\,100\%$; it is usually quoted in percent. The classification error determines the fraction of inputs that are classified wrongly. However, it contains less information than the energy function, which is in fact a mean-squared error of the outputs. This is illustrated in Table 6.1. All three inputs are classified correctly, but there is a substantial mean-squared error. This indicates that the classification is not very reliable.

 µ    output O^(µ)        target t^(µ)   classification   correct?
 1    [0.4, 0.5, 0.4]     [0, 1, 0]      versicolor       yes
 2    [0.4, 0.3, 0.5]     [0, 0, 1]      setosa           yes
 3    [0.6, 0.5, 0.4]     [1, 0, 0]      virginica        yes

Table 6.1: Illustrates the difference between energy function and classification error. The table shows network outputs for three different inputs from the iris data set, as well as the correct classifications. All inputs are classified correctly, but the difference between outputs and targets is substantial.
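A small sketch (ours, not from the book) of the classification error (6.30) for mutually exclusive classes, applied to the three outputs of Table 6.1:

```python
import numpy as np

def classification_error(O, t):
    """Classification error (6.30) for mutually exclusive classes.
    O and t have shape (p, M); each row of t is a one-hot target."""
    y = np.zeros_like(O)
    y[np.arange(len(O)), O.argmax(axis=1)] = 1     # winner-take-all, Eq. (6.30a)
    return np.abs(t - y).sum() / (2 * len(O))      # Eq. (6.30b)

O = np.array([[0.4, 0.5, 0.4], [0.4, 0.3, 0.5], [0.6, 0.5, 0.4]])   # Table 6.1
t = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
print("classification error:", classification_error(O, t))          # zero
print("energy (6.4):        ", 0.5 * ((t - O) ** 2).sum())          # substantial
```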

6.5 Adaptation of the learning rate

It is tempting to choose larger learning rates, because they enable the network to escape more efficiently from shallow minima. But this can lead to problems when the energy function varies rapidly, causing the training to fail. To avoid this, one uses an adaptive learning rule, such as:

$$\delta w_{mn}^{(t)} = -\eta\,\frac{\partial H}{\partial w_{mn}}\bigg|_{\{w_{ij}\}=\{w_{ij}^{(t)}\}} + \alpha\,\delta w_{mn}^{(t-1)}. \qquad (6.31)$$

Here $t = 1, 2, \ldots, T$ labels the iteration number. We see that the increment at step $t$ depends not only on the instantaneous gradient, but also on the weight change $\delta w_{mn}^{(t-1)}$ of the previous iteration. We say that the dynamics becomes inertial; the weights gain momentum. The parameter $\alpha \geq 0$ is called the momentum constant. It determines how strong the inertial effect is. We see that $\alpha = 0$ corresponds to the usual backpropagation rule. When $\alpha$ is positive, how does inertia change the learning process? Iterating Equation (6.31) yields

$$\delta w_{mn}^{(T)} = -\eta\sum_{t=0}^{T}\alpha^{T-t}\,\frac{\partial H}{\partial w_{mn}^{(t)}}. \qquad (6.32)$$

Here and in the following we use the short-hand notation

$$\frac{\partial H}{\partial w_{mn}^{(t)}} \equiv \frac{\partial H}{\partial w_{mn}}\bigg|_{\{w_{ij}\}=\{w_{ij}^{(t)}\}}.$$

Equation (6.32) shows that $\delta w_{mn}^{(T)}$ is a weighted average of the gradients encountered during training. Now assume that the training is stuck in a shallow minimum. Then the gradient $\partial H/\partial w_{mn}^{(t)}$ remains roughly constant through many time steps. To

Figure 6.8: (a) Momentum method (6.31). The gray arrow represents the increment $-\eta\,(\partial H/\partial w_{mn})\big|_{\{w_{ij}\}=\{w_{ij}^{(t)}\}}$. (b) Nesterov's accelerated-gradient method (6.35). The gray arrow represents $-\eta\,(\partial H/\partial w_{mn})\big|_{\{w_{ij}\}=\{w_{ij}^{(t)}+\alpha_{t-1}\delta w_{ij}^{(t-1)}\}}$. The location of $w^{(t+1)}$ (gray point) is closer to the minimum (black point) than in panel (a).

illustrate what happens, let us assume that $\partial H/\partial w_{mn}^{(t)} = \partial H/\partial w_{mn}^{(0)}$ for $t = 1,\ldots,T$. In this case we can write

$$\delta w_{mn}^{(T)} \approx -\eta\,\frac{\partial H}{\partial w_{mn}^{(0)}}\sum_{t=0}^{T}\alpha^{T-t} = -\eta\,\frac{\alpha^{T+1}-1}{\alpha-1}\,\frac{\partial H}{\partial w_{mn}^{(0)}}. \qquad (6.33)$$

In this situation, convergence is accelerated when $\alpha$ is close to unity. We also see that it is necessary that $\alpha < 1$ for the sum in Equation (6.33) to converge in the limit $T\to\infty$. The other limit to consider is that the gradient changes rapidly from iteration to iteration. How is the learning rule modified in this case? As an example, let us assume that the gradient remains of the same magnitude, but that its sign oscillates, $\partial H/\partial w_{mn}^{(t)} = (-1)^t\,\partial H/\partial w_{mn}^{(0)}$ for $t = 1,\ldots,T$. Inserting this into Equation (6.32), we obtain:
$$\delta w_{mn}^{(T)} \approx -\eta\,\frac{\partial H}{\partial w_{mn}^{(0)}}\sum_{t=0}^{T}(-1)^{t}\alpha^{T-t} = -\eta\,\frac{\alpha^{T+1}+(-1)^{T}}{\alpha+1}\,\frac{\partial H}{\partial w_{mn}^{(0)}}. \qquad (6.34)$$
Here the increments are much smaller compared with those in Equation (6.33). This shows that introducing inertia can substantially accelerate convergence without sacrificing accuracy. The disadvantage is, of course, that there is yet another parameter to choose, namely the momentum constant $\alpha$.

Nesterov's accelerated-gradient method [77] is another way of implementing momentum. The algorithm was developed for smooth optimisation problems, but it has been suggested to use the method when training deep neural networks with gradient descent [78]:
$$\delta w_{mn}^{(t)} = -\eta\,\frac{\partial H}{\partial w_{mn}}\bigg|_{\{w_{ij}\}=\{w_{ij}^{(t)}+\alpha_{t-1}\delta w_{ij}^{(t-1)}\}} + \alpha_{t-1}\,\delta w_{mn}^{(t-1)}. \qquad (6.35)$$
A suitable sequence of coefficients $\alpha_t$ is defined by recursion [78]. The coefficients $\alpha_t$ approach unity from below as $t$ increases.

Nesterov's accelerated-gradient method is more accurate than the simple momentum method, because the accelerated-gradient method evaluates the gradient at an extrapolated point, not at the initial point. Figure 6.8 illustrates a situation where Nesterov's method converges more rapidly. Nesterov's method is not much more difficult to implement than Equation (6.31), and it is not much more expensive in terms of computational cost.

There are other ways of adapting the learning rate during training, described in Section 4.10 of Haykin's book [2]. Finally, the learning rate need not be the same for all neurons. If the weights of neurons in different layers change at very different speeds (Section 7.2), it may be advantageous to define a layer-dependent learning rate $\eta_\ell$ that is larger for neurons with smaller gradients.
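A schematic comparison of plain gradient descent and the momentum rule (6.31) on a one-dimensional example (our own sketch; the quadratic energy function and all parameter values are just illustrative assumptions):

```python
import numpy as np

def gradient_descent(grad_H, w0, eta=0.1, alpha=0.0, n_steps=50):
    """Gradient descent with momentum, Equation (6.31).
    alpha = 0 recovers the plain backpropagation rule."""
    w, dw = w0, 0.0
    for _ in range(n_steps):
        dw = -eta * grad_H(w) + alpha * dw
        w += dw
    return w

grad_H = lambda w: 0.02 * w   # shallow minimum at w = 0: the gradient is small
for alpha in (0.0, 0.5, 0.9):
    w_final = gradient_descent(grad_H, 5.0, alpha=alpha)
    print(f"alpha = {alpha}:  w after 50 steps = {w_final:.3f}")
# for alpha close to unity the iterate ends up closer to the minimum,
# as suggested by Equation (6.33)
```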

6.6 Summary

Backpropagation is an efficient algorithm for stochastic gradient-descent on the energy function (6.4) in weight space, because it refers only to quantities that are local to the weight to be updated. Networks with many hidden neurons have many free parameters (their weights and thresholds). This increases the risk of overfitting, which reduces the power of the network to generalise. Deep networks with many hidden layers are particularly prone to overfitting (Chapter7). The tendency of networks to overfit can be reduced by cross validation (Section 6.4).

6.7 Further reading

The backpropagation algorithm is explained in Section 6.1 of Hertz, Krogh and Palmer [1], and in Chapter 4 of Haykin's book [2]. The paper [76] by LeCun et al. predates deep learning, but it is still a very nice collection of recipes for making backpropagation more efficient. One of the first papers on error backpropagation is the one by Rumelhart et al. [12] from 1986. The authors provide an elegant explanation and summary of the backpropagation algorithm. They also describe results of different numerical experiments, and one of them introduces convolutional networks (Chapter 8) to learn to tell the difference between the letters T and C (Figure 6.9).

Figure 6.9: Patterns detected by the convolutional network of Ref. [12]. After Fig. 13 in Ref. [12].

Figure 6.10: The principal direction of this data set is u 1.

6.8 Exercises

6.1 Covariance matrix. Show that the eigenvalues of the data covariance matrix C defined in Equation (6.24) are real and non-negative.

6.2 Principal-component analysis. Compute the data covariance matrix C for the example shown in Figure 6.10 and determine the principal direction. Determine the principal direction for the data shown in Figure 6.11.

6.3 Nesterov’s accelerated-gradient method. The version (6.35) of Nesterov’s al- gorithm is slightly different from the original formulation [77]. This point is dis- cussed in Ref. [78]. Show that both versions are equivalent.

6.4 Momentum. Section 6.5 describes how to speed up gradient descent by introducing momentum. To explain how this works, consider the one-dimensional energy function shown in Figure 6.12. Iterate Equation (6.31) for $\alpha = \tfrac{1}{2}$, and determine how many iteration steps it takes to get from $w_A$ to $w_B$. Compare with the corresponding result for $\alpha = 0$. Then consider what happens after $w_B$. How many steps does it take to reach $w_C$?

6.5 Backpropagation. Derive stochastic gradient-descent learning rules for the weights of a network with the layout shown in Figure 5.2. Assume that all activation functions are of sigmoid form, $\sigma(b) = 1/(1 + e^{-b})$, hidden thresholds are denoted by $\theta_j$, and those of the output neurons by $\Theta_i$. The energy function is (Section 7.5)

$H = -\sum_{i,\mu}\big[t_i^{(\mu)}\log O_i^{(\mu)} + (1 - t_i^{(\mu)})\log(1 - O_i^{(\mu)})\big]$, where log is the natural logarithm, and $t_i^{(\mu)} = 0/1$ are the targets.

6.6 Stochastic gradient descent. To train a multi-layer perceptron using stochastic gradient descent one needs update formulae for the weights and thresholds in the network. Derive these update formulae for sequential training using backpropagation for the network shown in Fig. 6.13. The weights for the first and second hidden layer, and for the output layer, are denoted by $w_{jk}^{(1)}$, $w_{mj}^{(2)}$, and $W_{im}$. The corresponding thresholds are denoted by $\theta_j^{(1)}$, $\theta_m^{(2)}$, and $\Theta_i$, and the activation function by $g(\cdots)$. The target value for input pattern $\boldsymbol{x}^{(\mu)}$ is $t_i^{(\mu)}$, and the pattern index $\mu$ ranges from 1 to $p$. The energy function is $H = \tfrac{1}{2}\sum_{i=1}^{M}\sum_{\mu=1}^{p}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)^2$.

6.7 Multi-layer perceptron. A perceptron has hidden layers $\ell = 1,\ldots,L-1$ and output layer $\ell = L$. Neuron $j$ in layer $\ell$ computes $V_j^{(\ell)} = g(b_j^{(\ell)})$ with $b_j^{(\ell)} = -\theta_j^{(\ell)} + \sum_k w_{jk}^{(\ell)} V_k^{(\ell-1)}$, where $w_{jk}^{(\ell)}$ are weights, $\theta_j^{(\ell)}$ are thresholds, $g(b)$ is the activation function, $V_i^{(L)} = O_i = g(b_i^{(L)})$, and $V_k^{(0)} = x_k$. Draw this network. Indicate where the elements $x_k$ and $O_i$ belong, as well as $b_j^{(\ell)}$, $V_j^{(\ell)}$, $w_{jk}^{(\ell)}$ and $\theta_j^{(\ell)}$ for $\ell = 0,\ldots,L$. Determine how the derivatives $\partial V_i^{(\ell)}/\partial w_{mn}^{(\ell')}$ depend upon the derivatives $\partial V_j^{(\ell-1)}/\partial w_{mn}^{(\ell')}$ for $\ell' < \ell$. Evaluate the derivative $\partial V_j^{(\ell)}/\partial w_{mn}^{(\ell')}$ for $\ell' = \ell$. Using gradient descent on the energy function $H = \tfrac{1}{2}\sum_{i\mu}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)^2$, find the learning rule for the weight $w_{mn}^{(L-2)}$ with learning rate $\eta$.

6.8 Error backpropagation. Derive Equation (6.16) and use this result to deduce the recursion (6.17).

6.9 Preprocessing input data. Find an example that shows how large input means may generate steep cliffs in the energy function (6.4).

6.10 Singular points of the Lagrangian $\mathcal{L}$. To use the method of Lagrange multipliers $\lambda$ for constrained minimisation, one forms the Lagrangian $\mathcal{L}$ [Equation (6.28)] by adding $\lambda$ times the constraint to the target function to be minimised. Then one searches for the singular points of $\mathcal{L}$, the points where the derivatives of $\mathcal{L}$ with respect to the parameters and $\lambda$ vanish. The method builds on the assumption that the singular points of $\mathcal{L}$ correspond to extrema of the target function. Construct a counterexample.

6.11 Lagrange multiplier. Find the global minimum of the function $f(x_1, x_2) = -x_1 x_2^2/2$ subject to the constraint $x_1^2 + x_2^2 = 1$ using a Lagrange multiplier $\lambda$. Explain the geometric picture underlying this method. Note that it works equally well if one replaces $\lambda$ with $-\lambda$.

Figure 6.11: Calculate the principal component of this data set. Exercise 6.2.

Figure 6.12: One-dimensional energy function used to illustrate how momentum accelerates gradient descent (Exercise 6.4).

Figure 6.13: Multi-layer perceptron with three input terminals, two hidden layers, and two output neurons (Exercise 6.6).

Figure 7.1: Images of iris flowers. From left to right: iris setosa (copyright T. Monto), iris versicolor (copyright R. A. Nonenmacher), and iris virginica (copyright A. Westermoreland). All images are copyrighted under the creative commons license.

Figure 7.2: Multi-layer perceptron for function approximation.

7 Deep learning

7.1 How many hidden layers?

In Chapter 5 we saw why it is sometimes necessary to have a hidden layer: in order to solve problems that are not linearly separable. Under which circumstances is one hidden layer sufficient? Are there problems that require more than one hidden layer? Even if not necessary, may additional hidden layers improve the performance of the network?

To understand how many hidden layers suffice, it is useful to view the classification problem as an approximation problem [79]. Consider the classification problem $[\boldsymbol{x}^{(\mu)}, t^{(\mu)}]$ for $\mu = 1,\ldots,p$. This problem defines a target function $t(\boldsymbol{x})$. Training a network to solve this task corresponds to approximating the target function $t(\boldsymbol{x})$ by the output function $O(\boldsymbol{x})$ of the network, from $N$-dimensional input space to

Figure 7.3: The neural-network output $O(x)$ approximates the target function $t(x)$.

one-dimensional output space. How many hidden layers are necessary or sufficient to approximate a given set of functions to a certain accuracy, by choosing weights and thresholds? The answer depends on the nature of the target function. Is it real-valued, perhaps continuous, or does it assume only discrete values?

We start with real-valued inputs and a single output [1, 80]. Consider the network drawn in Figure 7.2. The neurons in the hidden layers have sigmoid activation functions $\sigma(b) = (1 + e^{-b})^{-1}$. The output is a linear unit, with linear activation function $g(b) = b$. With two hidden layers one tries to approximate the function $t(x)$ by
$$O(\boldsymbol{x}) = \sum_m W_m\, g\Big(\sum_j w_{mj}^{(2)}\, g\Big(\sum_k w_{jk}^{(1)} x_k - \theta_j^{(1)}\Big) - \theta_m^{(2)}\Big) - \Theta. \qquad (7.1)$$

In the simplest case the inputs are one-dimensional (Figure 7.3). The training set consists of pairs $[x^{(\mu)}, t^{(\mu)}]$ that encode the target function $t(x)$. The task is then to

Figure 7.4: Basis function used to approximate a one-dimensional target function t (x ). HOW MANY HIDDEN LAYERS? 125

Figure 7.5: Two-dimensional basis functions. (a) To make a localised basis function with two inputs, one needs two hidden layers of neurons with sigmoid activation functions. One layer determines the lightly shaded cross in terms of a linear combination of four sigmoid outputs. The second layer localises the final output to the darker square [Equation (7.4)]. (b) Network layout for one basis function, after Fig. 8 in Ref. [80].

approximate $t(x)$ by the network output $O(x)$:

$$O(x) \approx t(x). \qquad (7.2)$$

To this end one uses linear combinations of the basis functions $\mathcal{B}(x)$ shown in Figure 7.4. Any reasonable real-valued function $t(x)$ can be approximated by sums of such basis functions, each suitably shifted and scaled. Furthermore, these basis functions can be expressed as scaled differences of sigmoid activation functions

$$\mathcal{B}(x) = W\big[\sigma(w^{(1)} x - \theta_1^{(1)}) - \sigma(w^{(1)} x - \theta_2^{(1)})\big]. \qquad (7.3)$$

Comparing with Equation (7.1) shows that one hidden layer is sufficient to construct the function $O(x)$ in this way. Now consider two-dimensional inputs. In this case, a suitable basis function is (Figure 7.5):

$$\mathcal{B}(\boldsymbol{x}) = W\,\sigma\Big(w^{(2)}\big[\sigma(w^{(1)} x_1) - \sigma(w^{(1)} x_1 - \theta_2^{(1)}) + \sigma(w^{(1)} x_2) - \sigma(w^{(1)} x_2 - \theta_4^{(1)})\big] - \theta^{(2)}\Big). \qquad (7.4)$$

So two hidden layers are sufficient for two input dimensions. For each basis function we require four neurons in the first hidden layer and one neuron in the second hidden layer. The construction is analogous for more than two input dimensions: for each basis function we need $2N$ neurons in the first layer and one neuron in the second layer. In conclusion, two hidden layers are sufficient to approximate a real-valued target function.

Yet it is not always necessary to use two layers for real-valued functions. For continuous functions, one hidden layer is sufficient. This is ensured by the universal approximation theorem [2]. It says that any continuous function can be approximated to arbitrary accuracy by a network with a single hidden layer, for sufficiently many neurons in the hidden layer.

In Chapter 5 we considered discrete Boolean functions. Any Boolean function with $N$-dimensional inputs can be represented by a network with one hidden layer, using $2^N$ neurons in the hidden layer. An example for such a network is discussed in Ref. [1]:

  inputs:                                   $x_k \in \{+1, -1\}$, $k = 1,\ldots,N$
  hidden neurons:                           $V_j$, $j = 0,\ldots,2^N - 1$
  activation function of hidden neurons:    $g(b) = \mathrm{sgn}(b)$
  activation function of output neuron:     $g(b) = \mathrm{sgn}(b)$            (7.5)

A difference compared with the Boolean-function representations in Section 5.5 is that here the inputs take the values $\pm 1$. The reason is that this simplifies the proof, which is by construction [1]. For each hidden neuron one assigns the weights as follows:
$$w_{jk} = \begin{cases} \delta & \text{if the } k\text{-th digit of the binary representation of } j \text{ is } 1,\\[2pt] -\delta & \text{otherwise,} \end{cases} \qquad (7.6)$$
with $\delta > 1$ (see below). The thresholds $\theta_j$ of all hidden neurons are the same, equal to $N(\delta - 1)$. The idea is that each input pattern turns on exactly one neuron in the hidden layer (the winning neuron). The weights feeding into the output neuron are determined as follows: if the output for the pattern represented by neuron $V_j$ is $+1$, let $W_{1j} = +1$, otherwise $W_{1j} = -1$. The threshold is set to $\Theta = \sum_j W_{1j}$.

To show how this construction works, consider the Boolean XOR function as an example. To confirm that only the corresponding winning neuron gives a positive signal, consider pattern $\boldsymbol{x}^{(1)} = [-1,-1]^{\mathsf{T}}$ for example. It corresponds to the first neuron in the hidden layer ($j = 0$). The local fields for this input pattern read:

$$\begin{aligned} b_0^{(1)} &= 2\delta - 2(\delta - 1) = 2, \\ b_1^{(1)} &= -2(\delta - 1) = 2 - 2\delta, \\ b_2^{(1)} &= -2(\delta - 1) = 2 - 2\delta, \\ b_3^{(1)} &= -2\delta - 2(\delta - 1) = 2 - 4\delta. \end{aligned} \qquad (7.7)$$

If we choose $\delta > 1$, then the first hidden neuron gives a positive output ($V_0 > 0$), while the other neurons produce negative outputs, $V_j < 0$ for $j = 1,2,3$. In conclusion, the first hidden neuron ($j = 0$) is the winning neuron for this pattern. Now consider


 x1   x2    t      j   digit 1   digit 2
 -1   -1   -1      0      0         0
 +1   -1   +1      1      0         1
 -1   +1   +1      2      1         0
 +1   +1   -1      3      1         1

Figure 7.6: Boolean XOR function. (a) Value table. (b) Network layout. For the weights feeding into the hidden layer, dashed lines correspond to $w_{jk} = -\delta$, solid lines to $w_{jk} = \delta$. For the weights feeding into the output neuron, dashed lines correspond to $W_{1j} = -\gamma$, and solid lines to $W_{1j} = \gamma$. Panel (c) summarises the binary representation of $j$ needed for the construction principle for the weights of the hidden layer [Equation (7.6)].

$\boldsymbol{x}^{(3)} = [-1,+1]^{\mathsf{T}}$. In this case
$$\begin{aligned} b_0^{(3)} &= -2(\delta - 1) = 2 - 2\delta, \\ b_1^{(3)} &= -2\delta - 2(\delta - 1) = 2 - 4\delta, \\ b_2^{(3)} &= 2\delta - 2(\delta - 1) = 2, \\ b_3^{(3)} &= -2(\delta - 1) = 2 - 2\delta. \end{aligned} \qquad (7.8)$$
Now the third hidden neuron gives a positive output, while the others yield negative values. It works in the same way for the other two patterns, $\boldsymbol{x}^{(2)}$ and $\boldsymbol{x}^{(4)}$. In summary, there is a unique winning neuron for each pattern.¹ Figure 7.7 shows how the four decision boundaries corresponding to $V_j$ partition the input plane. According to the scheme outlined above, the output neuron computes

$$
O_1 = \mathrm{sgn}(-V_0 + V_1 + V_2 - V_3) \tag{7.9}
$$

with $\Theta = \sum_j W_{1j} = 0$. For $\boldsymbol x^{(1)}$ and $\boldsymbol x^{(4)}$ we find the correct result $O_1 = -1$. The same is true for $\boldsymbol x^{(2)}$ and $\boldsymbol x^{(3)}$; in this case we obtain $O_1 = +1$. In summary, this example illustrates how an N-dimensional Boolean function is represented by a network with one hidden layer, with $2^N$ hidden neurons. The problem is of course that this network is expensive to train for large N because the number of hidden neurons is very large.
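The construction can be checked numerically. The following sketch (a minimal illustration of Equations (7.6)–(7.9); it is not the book's code, and the variable names, the choice $\delta = 2$, and the digit ordering are my own) builds the one-hidden-layer XOR network and verifies that each input pattern activates exactly one hidden neuron and that the output reproduces the targets.

```python
import numpy as np

N = 2
delta = 2.0                                   # any delta > 1 works for N = 2
patterns = np.array([[-1, -1], [+1, -1], [-1, +1], [+1, +1]])
targets = np.array([-1, +1, +1, -1])          # XOR in the +/-1 convention

# Hidden weights, Eq. (7.6): w_jk = +delta if digit k of j is 1, else -delta.
# Here digit k is taken as the k-th least significant bit of j, so that
# pattern mu is won by hidden neuron j = mu - 1 (as in the value table).
digits = np.array([[(j >> k) & 1 for k in range(N)] for j in range(2 ** N)])
w = np.where(digits == 1, delta, -delta)      # shape (2^N, N)
theta = N * (delta - 1)                       # common hidden threshold

W_out = targets.astype(float)                 # W_1j = target of pattern j+1
Theta = W_out.sum()                           # equals zero for XOR

for x, t in zip(patterns, targets):
    b = w @ x - theta                         # local fields of hidden neurons
    V = np.sign(b)                            # hidden outputs
    assert (V > 0).sum() == 1                 # exactly one winning neuron
    O = int(np.sign(W_out @ V - Theta))
    print(x, "winning neuron:", int(np.argmax(b)), "output:", O, "target:", t)
```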

There are more efficient layouts if one uses more than one hidden layer. As an example, consider the parity function for N binary inputs equal to 0 or 1. The

¹ That pattern $\mu = k$ gives the winning neuron $j = k-1$ is of no importance; it is just a consequence of how the patterns are ordered in the value table in Figure 7.6.

Figure 7.7: Shows how the XOR network depicted in Figure 7.6 partitions the input plane. Target values are encoded as in Figure 5.8: ƒ corresponds to $t = -1$ and „ to $t = +1$.

function measures the parity of the input sequence: it evaluates to unity if there is an odd number of ones in the input, otherwise to zero. A construction similar to the above yields a network layout with $2^N$ neurons in the hidden layer. If one instead wires together the XOR networks shown in Figure 5.17, one can solve the parity problem with O(N) neurons, as Figure 7.8 demonstrates. When N is a power of two, this network has $3(N-1)$ neurons. To see this, set the number of inputs to $N = 2^k$. Figure 7.8 shows that the number $\mathcal N_k$ of neurons satisfies the recursion $\mathcal N_{k+1} = 2\mathcal N_k + 3$ with $\mathcal N_1 = 3$. The solution of this recursion is $\mathcal N_k = 3(2^k-1)$.

This example also illustrates a second reason why it may be useful to have more than one hidden layer. To design a network for a certain task it is often convenient to build the network from building blocks. One wires them together, often in a hierarchical fashion. In Figure 7.8 there is only one building block, the XOR network from Figure 5.17. Another example is given by convolutional networks for image analysis (Chapter 8). Here the fundamental building blocks are so-called feature maps; they recognise different geometrical features of the image, such as edges or corners (Chapter 8).
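The counting argument can be illustrated with a few lines of code (my own sketch, not the book's; it assumes N is a power of two and that each XOR unit costs three neurons, two hidden plus one output, as in Figure 5.17):

```python
def parity_tree(bits):
    """Parity of a list of 0/1 bits, computed by pairwise XOR as in Figure 7.8.
    Returns the parity and the number of neurons the corresponding network uses."""
    neurons = 0
    layer = list(bits)
    while len(layer) > 1:
        nxt = []
        for a, b in zip(layer[0::2], layer[1::2]):
            nxt.append(a ^ b)        # one XOR unit per pair of inputs
            neurons += 3             # 2 hidden neurons + 1 output neuron
        layer = nxt
    return layer[0], neurons

bits = [1, 0, 1, 1, 0, 0, 1, 0]      # N = 8 inputs
parity, n_neurons = parity_tree(bits)
print(parity, n_neurons)             # parity 0, 3*(8-1) = 21 neurons
```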

7.2 Vanishing and exploding gradients

This Section describes an inherent instability in the training of deep networks with stochastic gradient descent, the vanishing- or exploding-gradient problem.

In Chapter 6 we saw that learning slows down when the factors $g'(b)$ in the recursion (6.17) become small. When the network has several hidden layers, like the one shown in Figure 7.9, the corresponding factors $g'(b)$ are multiplied, making the problem more severe. As a consequence, the neurons in hidden layers close to the input layer change only by small amounts, the smaller the more hidden layers the network has. This is referred to as the vanishing-gradient problem.

Figure 7.8: Solution of the parity problem for N-dimensional inputs. The network is built from XOR units (Figure 5.17). Each XOR unit has a hidden layer with two neurons. Only the states of the inputs and outputs of the XOR units are shown, not those of the hidden neurons. In total, the whole network has O(N) neurons.

Figure 7.9: Fully connected deep network with four hidden layers.

Figure 7.10 quantifies the problem. The Figure shows that the r.m.s. errors averaged over different realisations of random initial weights, $\delta^{(\ell)}_{\rm rms} \equiv \langle N^{-1}\sum_{j=1}^N [\delta_j^{(\ell)}]^2\rangle^{1/2}$, tend to be very small for the first 20 training epochs. To explain this phenomenon, consider the very simple example illustrated in Figure 7.11: a deep ‘network’ with only one neuron per layer (N = 1). Its output $V^{(L)}$ is given by nested activation functions

$$
V^{(L)} = g\Big(w^{(L)}\, g\big(w^{(L-1)} \cdots g\big(w^{(2)}\, g(w^{(1)} x - \theta^{(1)}) - \theta^{(2)}\big)\cdots - \theta^{(L-1)}\big) - \theta^{(L)}\Big). \tag{7.10}
$$

Let us compute the errors $\delta^{(\ell)}$ using Equation (6.16). The partial derivative in (6.16) is computed using the chain rule:

$$
\begin{aligned}
\frac{\partial V^{(L)}}{\partial V^{(L-1)}} &= g'(b^{(L)})\, w^{(L)},\\
\frac{\partial V^{(L)}}{\partial V^{(L-2)}} &= \frac{\partial V^{(L)}}{\partial V^{(L-1)}}\,\frac{\partial V^{(L-1)}}{\partial V^{(L-2)}} = g'(b^{(L)})\, w^{(L)}\, g'(b^{(L-1)})\, w^{(L-1)},\\
&\;\;\vdots
\end{aligned} \tag{7.11}
$$

where $b^{(k)} = w^{(k)} V^{(k-1)} - \theta^{(k)}$ is the local field for neuron $k$. This yields the following expression for $\partial V^{(L)}/\partial V^{(\ell)}$:

$$
\frac{\partial V^{(L)}}{\partial V^{(\ell)}} = \prod_{k=\ell+1}^{L} \big[g'(b^{(k)})\, w^{(k)}\big]. \tag{7.12}
$$

Inserting this expression into Equation (6.16), we find:

$$
\delta^{(\ell)} = \big[t - V^{(L)}(x)\big]\, g'(b^{(L)}) \prod_{k=\ell+1}^{L} \big[w^{(k)}\, g'(b^{(k-1)})\big]. \tag{7.13}
$$

Figure 7.10: Vanishing-gradient problem for a network with four fully connected (`) hidden layers. The Figure illustrates schematically how the r.m.s. error δrms in layer ` depends on the number of training epochs. During phase I, the vanishing- gradient problem is severe, during phase II the network starts to learn, phase III is the convergence phase where the errors decline. Schematic, based on simulations performed by Ludvig Storm, training a network with four hidden layers and N = 30 neurons per layer on the MNIST data set.

Figure 7.11: ‘Network’ illustrating the vanishing-gradient problem, with neurons V (`), weights w (`), and thresholds θ (`).

One can also obtain this result by applying the recursion from Algorithm 4, $\delta^{(\ell)} = \delta^{(\ell+1)} w^{(\ell+1)} g'(b^{(\ell)})$. Now consider the early stages of training. For the activation functions (6.19), the maximum of $g'(b)$ is $\tfrac12$ and $1$, respectively, and $g'(b)$ becomes exponentially small if $|b|$ is large. If one initialises the weights as described in Section 6.2, to Gaussian random variables with mean zero and variance $\sigma_w^2 = 1$ say, then the factors $w^{(k)} g'(b^{(k-1)})$ tend to be smaller than unity. In this case, Equation (7.12) implies that the error or gradient $\delta^{(\ell)}$ vanishes quickly as $\ell$ decreases. The reason is simply that the number of small factors in the product (7.12) increases when $\ell$ becomes smaller, and multiplying many small numbers gives a very small product. As a result, the training slows down. This is the vanishing-gradient problem (phase I in Figure 7.10).

What happens at later times? One might argue that the weights grow during training because one keeps adding small increments. If this happens, the problem may become worse still, because $g'(b)$ tends to zero exponentially as $|b|$ grows. This indicates that the network continues to learn slowly. For the particular example shown in Figure 7.10, the effect persists for about 20 epochs. Then the first layers begin to learn faster (phase II). There is to date no mathematical theory describing how this transition occurs. Much later in training, the errors decay as the learning converges (phase III in Figure 7.10). There is a second, equivalent, point of view: the learning is slow in a layer far from the output because the output is not very sensitive to the state of these neurons. The effect of a given neuron on the output is measured by Equation (7.12), which describes how the output of the network changes when changing the state of a neuron in a particular layer. At any rate, Equation (7.13) demonstrates that different layers of the network learn at different speeds, at least initially when the weights are still random. Suppose we try to combat the vanishing-gradient problem by increasing the weight variance $\sigma_w^2$. The problem is that this may cause the factors $w^{(k)} g'(b^{(k-1)})$ to become larger than unity. As a consequence, the gradients increase exponentially instead (exploding gradients). In conclusion, the training dynamics is fundamentally unstable. This is due to the multiplicative nature of the recursion for the errors. Taking the logarithm of the product in Equation (7.12) and assuming that the weights are independently distributed random numbers, the central-limit theorem (Chapter 2) implies that the distribution of $\log\delta^{(\ell)}$ is Gaussian. In other words, the distribution of the errors is lognormal, implying that very small and very large values of $\delta^{(\ell)}$ occur with high probability. In networks like the one shown in Figure 7.9 the principle is the same. Assume that all layers $\ell = 1,\ldots,L$ have the same number $N$ of neurons. Instead of multiplying numbers, one multiplies $N\times N$ matrices. The product (7.12) of random numbers becomes a product of random matrices. Using the chain rule we find:

$$
\frac{\partial V_i^{(L)}}{\partial V_j^{(\ell)}} = \sum_{m=1}^N \sum_{n=1}^N \cdots \sum_{p=1}^N \frac{\partial V_i^{(L)}}{\partial V_m^{(L-1)}}\,\frac{\partial V_m^{(L-1)}}{\partial V_n^{(L-2)}} \cdots \frac{\partial V_p^{(\ell+1)}}{\partial V_j^{(\ell)}}. \tag{7.14}
$$

With the update rule

$$
V_m^{(k)} = g\Big(\sum_{j=1}^N w_{mj}^{(k)} V_j^{(k-1)} - \theta_m^{(k)}\Big) \tag{7.15}
$$

we can evaluate each factor:

$$
\frac{\partial V_m^{(k)}}{\partial V_n^{(k-1)}} = g'(b_m^{(k)})\, w_{mn}^{(k)}. \tag{7.16}
$$

Substituting this result into Equation (7.14), we see that the partial derivatives $\partial V_i^{(L)}/\partial V_j^{(\ell)}$ can be computed in the form of a matrix product. The matrix $\mathbb J_{L-\ell}$ with

elements $[\mathbb J_{L-\ell}]_{ij} = \partial V_i^{(L)}/\partial V_j^{(\ell)}$ is given by:

$$
\mathbb J_{L-\ell} = \mathbb D^{(L)} \mathbb W^{(L)}\, \mathbb D^{(L-1)} \mathbb W^{(L-1)} \cdots \mathbb D^{(\ell+1)} \mathbb W^{(\ell+1)}. \tag{7.17}
$$

Here $\mathbb W^{(k)}$ is the matrix of weights feeding into layer $k$, and

$$
\mathbb D^{(k)} = \begin{bmatrix} g'(b_1^{(k)}) & & \\ & \ddots & \\ & & g'(b_N^{(k)}) \end{bmatrix}. \tag{7.18}
$$

The matrix product (7.17) determines the error dynamics, just like Equation (7.13):

$$
\boldsymbol\delta^{(\ell)} = \boldsymbol\delta^{(L)}\, \mathbb J_{L-\ell}. \tag{7.19}
$$

Does the magnitude $|\boldsymbol\delta^{(\ell)}|^2 = \boldsymbol\delta^{(\ell)}\boldsymbol\delta^{(\ell)\mathsf T}$ of the errors shrink or grow as they propagate through the layers? This is determined by the eigenvalues of the left Cauchy-Green matrix $\mathbb J_p\mathbb J_p^{\mathsf T}$, with $p = L-\ell$. This matrix is symmetric, and its eigenvalues are non-negative. Their square roots are called the singular values of $\mathbb J_p$:

$$
\Lambda_1^{(p)} \geq \Lambda_2^{(p)} \geq \cdots \geq \Lambda_N^{(p)} \geq 0. \tag{7.20}
$$

It is customary to sort the singular values by their magnitudes, as in Equation (7.20). When there are many layers, the number $p = L-\ell$ of factors in Equation (7.17) may become large. In this case, the maximal singular value either decreases or increases exponentially as a function of $p$ [81]. The corresponding rate

$$
\lambda_1 = \lim_{p\to\infty}\frac{1}{p}\log\Lambda_1^{(p)} \tag{7.21}
$$

is called the maximal Lyapunov exponent. A negative maximal Lyapunov exponent indicates that the errors vanish exponentially. The eigenvectors of $\mathbb J_p\mathbb J_p^{\mathsf T}$ are called backward Lyapunov vectors. They describe how the errors change as they propagate through the network. How small differences between the inputs change is determined by the forward Lyapunov vectors, the eigenvectors of $\mathbb J_p^{\mathsf T}\mathbb J_p$. Since $\mathbb J_p\mathbb J_p^{\mathsf T}$ and $\mathbb J_p^{\mathsf T}\mathbb J_p$ have the same eigenvalues, the rate of decay or increase of the magnitude of input differences is the same as that of the error magnitudes. The concept of a maximal Lyapunov exponent is borrowed from chaos theory [82–84], where $\lambda_1 > 0$ implies that small perturbations of the initial conditions grow exponentially as a function of time. The iterated map (7.10) is essentially a dynamical system. The transition in Figure 7.10 is triggered by a change of the Lyapunov exponent from negative values to $\lambda_1 \approx 0$ [85]. In summary, the unstable-gradient problem in deep networks is due to the fact that the maximal singular value $\Lambda_1^{(p)}$ either increases or decreases exponentially as one moves away from the output layer, depending on whether the maximal Lyapunov exponent is positive or negative.

Pennington et al. [86] suggested to combat the unstable-gradient problem by initialising weights and thresholds in such a way that the maximal Lyapunov exponent is close to zero, in order to make sure that the errors neither grow nor shrink exponentially. Consider the network shown in Figure 7.9, with $N$ neurons per hidden layer, and initialise the weights to independent Gaussian random numbers with mean zero and variance $\sigma_w^2$. The thresholds are initialised in the same way, with variance $\sigma_\theta^2$. In the limit $N\to\infty$ one can use a mean-field theory [86], just as in Chapter 3, to estimate the maximal Lyapunov exponent. Following Ref. [86], the first step is to compute how the errors propagate through the network. We assume uncorrelated random input patterns, Equation (2.29), and random weights with mean zero and variance $\langle w_{ij}w_{kl}\rangle = \sigma_w^2\delta_{ik}\delta_{jl}$. When $N\to\infty$, the errors are sums of many random numbers [Equation (6.17)]. Invoking the central-limit theorem (Chapter 2), one concludes that the errors are approximately Gaussian distributed, with mean zero and with variance

$$
\langle[\delta_j^{(\ell-1)}]^2\rangle = \sum_{i,k=1}^N \big\langle \delta_i^{(\ell)}\delta_k^{(\ell)}\, w_{ij}^{(\ell)} w_{kj}^{(\ell)}\, [g'(b_j^{(\ell-1)})]^2\big\rangle \approx \sigma_w^2 \sum_{i=1}^N \langle[\delta_i^{(\ell)}]^2\rangle\,\langle[g'(b_j^{(\ell-1)})]^2\rangle. \tag{7.22}
$$

The last approximation neglects possible correlations with the local fields. The variance $\langle[\delta_i^{(\ell)}]^2\rangle$ does not depend on $i$, so that the sum just gives a factor of $N$. The central-limit theorem ensures that the local fields $b_j^{(\ell)}$ are Gaussian distributed too in the limit $N\to\infty$, with mean zero. This allows us to estimate

$$
\langle[g'(b_j^{(\ell)})]^2\rangle \sim \int\mathrm dz_\ell\,\frac{e^{-z_\ell^2/(2\sigma_\ell^2)}}{\sqrt{2\pi\sigma_\ell^2}}\,[g'(z_\ell)]^2 \equiv F(\sigma_\ell), \tag{7.23}
$$

independent of $j$, and with variance

$$
\sigma_\ell^2 = \frac{1}{N}\sum_{j=1}^N \langle[b_j^{(\ell)}]^2\rangle. \tag{7.24}
$$

Equation (7.24) describes how the distribution of local fields narrows or broadens as one iterates. It was shown in Ref. [86] that $\sigma_\ell$ approaches a fixed point, $\sigma^* = \lim_{\ell\to\infty}\sigma_\ell$, under certain conditions on the variances of the weights and thresholds, $\sigma_w^2$ and $\sigma_\theta^2$. If $\sigma_\ell$ is well approximated by $\sigma^*$, Equation (7.22) simplifies to $\delta^{(\ell-1)}_{\rm rms} \approx \delta^{(\ell)}_{\rm rms}\sqrt{\sigma_w^2 N F(\sigma^*)}$. This results in a mean-field estimate of the maximal Lyapunov exponent,

$$
\lambda_1 \sim \log\big(\delta^{(\ell-1)}_{\rm rms}/\delta^{(\ell)}_{\rm rms}\big) \approx \tfrac{1}{2}\log\big[\sigma_w^2 N F(\sigma^*)\big]. \tag{7.25}
$$

The network parameters should be adjusted so that this exponent is as close to zero as possible. This means, in particular, that one should take

$$
\sigma_w^2 \propto N^{-1}, \tag{7.26}
$$

see also Refs. [87, 88]. But we must keep in mind that Equation (7.25) relies on taking the limit $N\to\infty$. It is expected that the assumptions underlying Equation (7.25) break down when $N$ is finite, causing the mean-field theory to fail. The tails of the error distribution, for example, are expected to become heavier as $N$ decreases. There are a number of other tricks that help to cope with unstable gradients in practice, to some extent at least. First, it is usually argued that it helps to use an activation function that does not saturate at large $b$, such as the ReLU function introduced in Section 7.3. Second, batch normalisation (Section 7.6.5) may help against the unstable-gradient problem. Third, introducing connections that skip layers (residual networks) can also reduce the unstable-gradient problem. This is discussed in Section 7.4. Finally, there is an important aspect of the problem that we did not discuss: unstable gradients limit the extent to which information can propagate through the network in a meaningful way. This is explained in Ref. [89]. In this Section we assumed all along that the weights are random numbers. When the network starts to learn, this is no longer the case. The question is how the singular values of $\mathbb J_p$ change when correlations between different factors in the product (7.17) develop.
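The exponential growth or decay of the maximal singular value is easy to observe numerically. The following sketch (my own illustration under the stated assumptions: tanh units, zero thresholds, Gaussian weights with variance $\sigma_w^2/N$) multiplies $p$ factors of the form $\mathbb D^{(k)}\mathbb W^{(k)}$ and estimates $\lambda_1$ from Equation (7.21):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 30, 50

def max_lyapunov(sigma_w, N, L, rng):
    """Estimate lambda_1 = (1/p) log Lambda_1 for the product (7.17)."""
    V = rng.normal(size=N)                    # input to the chain of layers
    J = np.eye(N)
    for _ in range(L):
        W = rng.normal(scale=sigma_w / np.sqrt(N), size=(N, N))
        b = W @ V                             # local fields (thresholds zero)
        V = np.tanh(b)
        D = np.diag(1.0 - np.tanh(b) ** 2)    # g'(b) for tanh units
        J = D @ W @ J                         # prepend the newest factor
    Lambda_1 = np.linalg.svd(J, compute_uv=False)[0]   # largest singular value
    return np.log(Lambda_1) / L

for sigma_w in (0.5, 1.0, 2.0):
    print(sigma_w, max_lyapunov(sigma_w, N, L, rng))
```

For small $\sigma_w$ the estimate is clearly negative (vanishing gradients), for large $\sigma_w$ it is positive (exploding gradients), consistent with the discussion above.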

7.3 Rectified linear units

Glorot et al. [90] suggested to use a piecewise linear activation function, the ReLU function $\max\{0,b\}$ (Chapter 1).² What is the point of using ReLU neurons? When training a deep network with ReLU activation functions, many of the hidden neurons produce output zero. This means that the network of active neurons (non-zero output) is sparsely connected. It is sometimes argued that sparse networks have desirable properties; at least sparse representations of a classification problem tend to be easier to learn because they are more likely to be linearly separable (Section 5.4). Figure 7.12 illustrates that for a given input pattern, only a certain fraction of hidden neurons is active. For these neurons the computation is linear, yet different input patterns give different sets of active neurons. The product in Equation (7.17) acquires a particularly simple structure: the matrices $\mathbb D^{(p)}$ are diagonal with 0/1

² Since the derivative of the ReLU function is discontinuous at $b = 0$, a common convention is to set the derivative to zero at $b = 0$.

Figure 7.12: Sparse network of active neurons with ReLU activation functions. The black paths correspond to active neurons with positive local fields.

entries. But while the weight matrices are independent initially, the $\mathbb D^{(p)}$-matrices are correlated: which elements vanish depends on the states of the neurons in the corresponding layer, which in turn depend on the weights to the right of $\mathbb D^{(p)}$ in the matrix product. A hidden layer with only one or very few active neurons might act as a bottleneck preventing efficient backpropagation of output errors, which could in principle slow down training. For the examples given in Ref. [90], this does not occur. To describe information propagation through a network with ReLU neurons, one should compute the probability that a given number of singular values of the matrix $\mathbb J_p$ vanish (Section 7.2). The ReLU function is unbounded for large positive local fields. Therefore, the vanishing-gradient problem (Section 7.2) is thought to be less severe in networks made of rectified linear units. However, since the ReLU function does not saturate, the weights tend to increase. Glorot et al. [90] suggested to use L1-regularisation (Section 7.6.1) to make sure that the weights do not grow. Finally, using ReLU functions instead of sigmoid functions speeds up the training, because the ReLU function has piecewise constant derivatives. Such function calls are faster to evaluate than non-linear activation functions.
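A quick numerical experiment illustrates the sparsity (my own sketch with hypothetical layer sizes, not from the book): propagate a random input through a few randomly initialised ReLU layers and record the fraction of active neurons per layer.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_layers = 100, 5

V = rng.normal(size=N)
for ell in range(1, n_layers + 1):
    W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
    b = W @ V                       # local fields (zero thresholds)
    V = np.maximum(0.0, b)          # ReLU: max{0, b}
    active = np.mean(V > 0)         # fraction of active neurons
    print(f"layer {ell}: {active:.2f} of the neurons are active")
```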

7.4 Residual networks

One way of reducing the vanishing-gradient problem is to introduce connections that skip layers [91]. Deep neural networks with connections that skip layers are called residual networks. Empirical evidence shows that residual networks are easier to train than

Figure 7.13: Schematic illustration of a network with skipping connections.

Figure 7.14: ‘Network’ with connections (gray arrows) that skip layers.

standard multilayer perceptrons. The likely reason is that the vanishing-gradient problem is less severe in this case, because error propagation in deep residual networks is determined by the matrix products with the smallest number of factors. This Section explains how to train residual networks. The layout is illustrated schematically in Figure 7.13. Black arrows stand for usual feed-forward connections, and the gray arrow indicates a connection that skips a layer. The notation in Figure 7.13 differs somewhat from that of Algorithm 4. The weights from layer $\ell-1$ to $\ell$ are denoted by $w_{jk}^{(\ell,\ell-1)}$, and those from layer $\ell-2$ to $\ell$ by $w_{ij}^{(\ell,\ell-2)}$ (gray arrow in Figure 7.13). Note that the superscripts are ordered in the same way as the subscripts: the right index refers to the layer on the left. Neuron $j$ in layer $\ell$ computes

$$
V_j^{(\ell)} = g\Big(\sum_k w_{jk}^{(\ell,\ell-1)} V_k^{(\ell-1)} - \theta_j^{(\ell)} + \sum_n w_{jn}^{(\ell,\ell-2)} V_n^{(\ell-2)}\Big). \tag{7.27}
$$

As usual, the argument of the activation function is the local field $b_j^{(\ell)}$. The weights of all connections are trained in the usual fashion, by stochastic gradient descent. To illustrate the structure of the resulting formulae, consider a ‘network’ with just one neuron per layer (Figure 7.14). We calculate the weight increments using Equations (6.15) and (6.16). The recursion (6.17) applies only to standard feed-forward networks without skipping connections. In order to determine how to update the weights for the network shown in Figure 7.14, we need to evaluate the gradients $\partial V^{(L)}/\partial V^{(\ell)}$. To begin with, consider the learning rule for $w^{(L,L-1)}$. Using Equations (6.15) and (6.16) one finds

$$
\delta w^{(L,L-1)} = \eta\,\delta^{(L)} V^{(L-1)} \quad\text{with}\quad \delta^{(L)} = \big(t - V^{(L)}\big)\,g'(b^{(L)}), \tag{7.28}
$$

as in Algorithm 4. In the same way one obtains

$$
\delta w^{(L,L-2)} = \eta\,\delta^{(L)} V^{(L-2)} \quad\text{with}\quad \delta^{(L)} = \big(t - V^{(L)}\big)\,g'(b^{(L)}). \tag{7.29}
$$

Now consider the learning rule for $w^{(L-1,L-2)}$. Using $\partial V^{(L)}/\partial V^{(L-1)} = g'(b^{(L)})\, w^{(L,L-1)}$ gives

$$
\delta w^{(L-1,L-2)} = \eta\,\delta^{(L-1)} V^{(L-2)} \quad\text{with}\quad \delta^{(L-1)} = \delta^{(L)}\, w^{(L,L-1)}\, g'(b^{(L-1)}). \tag{7.30}
$$

But the update for $w^{(L-2,L-3)}$ is different because the short cuts come into play. The extra connection from layer $L-2$ to $L$ gives rise to an extra term:

$$
\frac{\partial V^{(L)}}{\partial V^{(L-2)}} = \frac{\partial V^{(L)}}{\partial V^{(L-1)}}\,\frac{\partial V^{(L-1)}}{\partial V^{(L-2)}} + g'(b^{(L)})\, w^{(L,L-2)}. \tag{7.31}
$$

Evaluating the partial derivatives yields:

$$
\delta w^{(L-2,L-3)} = \eta\,\delta^{(L-2)} V^{(L-3)} \quad\text{with}\quad \delta^{(L-2)} = \delta^{(L-1)}\, w^{(L-1,L-2)}\, g'(b^{(L-2)}) + \delta^{(L)}\, w^{(L,L-2)}\, g'(b^{(L-2)}). \tag{7.32}
$$

Iterating further in this way, one finds the following error-backpropagation rule:

$$
\delta^{(\ell-1)} = \delta^{(\ell)}\, w^{(\ell,\ell-1)}\, g'(b^{(\ell-1)}) + \delta^{(\ell+1)}\, w^{(\ell+1,\ell-1)}\, g'(b^{(\ell-1)}) \tag{7.33}
$$

for $\ell = L, L-1,\ldots$. The first term is the same as in the error recursion in Algorithm 4. The second term is due to the skipping connections. These connections reduce the vanishing-gradient problem. To see this, note that we can write the error $\delta^{(\ell)}$ as

$$
\delta^{(\ell)} = \sum_{\ell_1,\ell_2,\ldots,\ell_n} \delta^{(L)}\, w^{(L,\ell_n)} g'(b^{(\ell_n)}) \cdots w^{(\ell_2,\ell_1)} g'(b^{(\ell_1)})\, w^{(\ell_1,\ell)} g'(b^{(\ell)}), \tag{7.34}
$$

where the sum is over all paths $L > \ell_n > \ell_{n-1} > \cdots > \ell_1 > \ell$ back through the network. The structure of the general formula, for networks with more than one neuron per layer, is analogous to Equation (7.34). According to Equation (7.34), the smallest errors, or gradients, in networks with many layers are dominated by the product corresponding to the path with the smallest number of steps (factors). Introducing connections that skip more than one layer tends to increase the small gradients, as Equation (7.34) shows; Equation (7.34) remains valid for this case too. Finally, the network described in Ref. [91] used unit weights for the skipping connections,

$$
V_j^{(\ell)} = g\Big(\sum_k w_{jk}^{(\ell,\ell-1)} V_k^{(\ell-1)} - \theta_j^{(\ell)} + V_j^{(\ell-2)}\Big), \tag{7.35}
$$

so that the hidden layer $V_k^{(\ell-1)}$ learns the difference between the input $V_j^{(\ell-2)}$ and the output $V_j^{(\ell)}$. Therefore such networks are called residual networks.
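For the one-neuron-per-layer ‘network’ of Figure 7.14, the forward pass (7.27) and the error recursion (7.33) can be written out directly. The sketch below is my own illustration (it assumes tanh units, zero thresholds, and randomly chosen weights `w1` for the feed-forward connections and `w2` for the skipping connections), not the book's code:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 10
w1 = rng.normal(size=L + 1)   # w1[l] = w^(l,l-1), feed-forward weights
w2 = rng.normal(size=L + 1)   # w2[l] = w^(l,l-2), skipping weights (l >= 2)
x, t = 0.5, 1.0

# forward pass, Eq. (7.27) with one neuron per layer and zero thresholds
V = np.zeros(L + 1)
b = np.zeros(L + 1)
V[0] = x
for l in range(1, L + 1):
    b[l] = w1[l] * V[l - 1] + (w2[l] * V[l - 2] if l >= 2 else 0.0)
    V[l] = np.tanh(b[l])

def gprime(bl):
    return 1.0 - np.tanh(bl) ** 2             # g'(b) for tanh units

# backward pass, Eq. (7.33):
# delta^(l-1) = [delta^(l) w^(l,l-1) + delta^(l+1) w^(l+1,l-1)] g'(b^(l-1))
delta = np.zeros(L + 2)                       # delta[L+1] stays zero
delta[L] = (t - V[L]) * gprime(b[L])
for l in range(L, 1, -1):
    skip = delta[l + 1] * w2[l + 1] if l + 1 <= L else 0.0
    delta[l - 1] = (delta[l] * w1[l] + skip) * gprime(b[l - 1])

print(delta[1:L + 1])                         # errors in layers 1,...,L
```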

7.5 Outputs and energy functions

Up to now we discussed networks that have the same activation functions for all neurons in all layers, either sigmoid or tanh activation functions (Equation 6.19), or ReLU functions (Sections 1.3 and 7.3). In practice one often uses neurons with a different activation function in the output layer, so-called softmax outputs:

$$
O_i = \frac{e^{\alpha b_i^{(L)}}}{\sum_{k=1}^{M} e^{\alpha b_k^{(L)}}}. \tag{7.36}
$$

Here $b_i^{(L)} = \sum_j w_{ij}^{(L)} V_j^{(L-1)} - \theta_i^{(L)}$ are the local fields in the output layer. In the limit $\alpha\to\infty$, we see that $O_i = \delta_{ii_0}$, where $i_0$ is the index of the winning output neuron, the one with the largest value of $b_i^{(L)}$ (Chapter 10). For $\alpha = 1$, Equation (7.36) is a soft version of this maximum criterion, hence the name softmax. We set $\alpha$ to unity from now on.

Two important properties of softmax outputs are, first, that $0 \leq O_i \leq 1$. Second, the values of the outputs sum to unity,

$$
\sum_{i=1}^{M} O_i = 1. \tag{7.37}
$$

Therefore the outputs of softmax units can be interpreted as probabilities. Consider classification problems where the inputs must be assigned to one of $M$ classes. In this case, the output $O_i^{(\mu)}$ of softmax unit $i$ is assumed to represent the probability that the input $\boldsymbol x^{(\mu)}$ is in class $i$ (in terms of the targets: $t_i^{(\mu)} = 1$ while $t_k^{(\mu)} = 0$ for $k\neq i$). If $O_i^{(\mu)}\approx 1$, we assume that the network is quite certain that input $\boldsymbol x^{(\mu)}$ is in class $i$. On the other hand, if all $O_k^{(\mu)}\approx M^{-1}$, we interpret the network output as uncertain. But note that neural networks may fail like humans sometimes do: their output can be very certain yet wrong (Section 8.6).
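In code, the softmax (7.36) is usually evaluated with the maximum of the local fields subtracted before exponentiating; this does not change the result but avoids numerical overflow. A minimal sketch (my own, not from the book):

```python
import numpy as np

def softmax(b, alpha=1.0):
    """Softmax outputs, Eq. (7.36); subtracting max(b) leaves the ratio unchanged."""
    z = np.exp(alpha * (b - np.max(b)))
    return z / z.sum()

b = np.array([1.0, 2.0, 0.5])        # local fields of the output layer
O = softmax(b)
print(O, O.sum())                    # outputs in [0,1], summing to unity
print(softmax(b, alpha=50.0))        # large alpha: winner takes (almost) all
```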

Softmax units are often used in conjunction with a different energy function,

$$
H = -\sum_{i\mu} t_i^{(\mu)}\log O_i^{(\mu)}. \tag{7.38}
$$

Here and in the following log stands for the natural logarithm. The function (7.38) (µ) (µ) is minimal when Oi = ti (Exercise 7.5). To find the correct formula for backprop- agation for (7.38), we need to evaluate

$$
\frac{\partial H}{\partial w_{mn}} = -\sum_{i\mu}\frac{t_i^{(\mu)}}{O_i^{(\mu)}}\,\frac{\partial O_i^{(\mu)}}{\partial w_{mn}}. \tag{7.39}
$$

Here the labels denoting the output layer were omitted, and in the following equations the index $\mu$ that refers to the input pattern is dropped as well. Using the identities

$$
\frac{\partial O_i}{\partial b_l} = O_i(\delta_{il} - O_l) \quad\text{and}\quad \frac{\partial b_l}{\partial w_{mn}} = \delta_{lm} V_n, \tag{7.40}
$$

one obtains

$$
\frac{\partial O_i}{\partial w_{mn}} = \sum_l \frac{\partial O_i}{\partial b_l}\,\frac{\partial b_l}{\partial w_{mn}} = O_i(\delta_{im} - O_m)\, V_n. \tag{7.41}
$$

So

$$
\delta w_{mn} = -\eta\frac{\partial H}{\partial w_{mn}} = \eta\sum_{i\mu} t_i^{(\mu)}\big(\delta_{im} - O_m^{(\mu)}\big)V_n^{(\mu)} = \eta\sum_{\mu}\big(t_m^{(\mu)} - O_m^{(\mu)}\big)V_n^{(\mu)}, \tag{7.42}
$$

since $\sum_{i=1}^{M} t_i^{(\mu)} = 1$ for the type of classification problem where each input belongs to precisely one class. The corresponding learning rule for the thresholds reads

$$
\delta\theta_m = -\eta\frac{\partial H}{\partial\theta_m} = -\eta\sum_{\mu}\big(t_m^{(\mu)} - O_m^{(\mu)}\big). \tag{7.43}
$$

Equations (7.42) and (7.43) highlight a further advantage of softmax output neurons (apart from the fact that they allow the output to be interpreted in terms of probabilities). The weight and threshold increments for the output layer derived in Section 6 [Equations (6.6a) and (6.11a)] contain factors of derivatives $g'(B_m^{(\mu)})$. As noted earlier, these derivatives tend to zero when the activation function saturates, slowing down the learning. But here the rate at which the neuron learns is simply proportional to the output error, $(t_m^{(\mu)} - O_m^{(\mu)})$, without any possibly small factor $g'(b)$. Softmax units are normally only used in the output layer, because the learning speedup is coupled to the use of the energy function (7.38), and because it is customary to avoid dependencies between the neurons within a hidden layer.
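The simple form of the learning rule (7.42) can be checked numerically: the gradient of the negative log-likelihood (7.38) w.r.t. an output weight reduces to the output error times the state of the neuron feeding into it. A sketch for a single pattern (my own, with hypothetical dimensions and variable names):

```python
import numpy as np

rng = np.random.default_rng(3)
M, n = 4, 6                              # number of outputs, neurons in layer L-1
W = rng.normal(size=(M, n))
theta = rng.normal(size=M)
V = rng.normal(size=n)                   # states of layer L-1 (held fixed)
t = np.zeros(M); t[2] = 1.0              # one-hot target

def outputs(W, theta):
    b = W @ V - theta
    z = np.exp(b - b.max())
    return z / z.sum()                   # softmax outputs, Eq. (7.36)

O = outputs(W, theta)
grad_analytic = -np.outer(t - O, V)      # dH/dw_mn = -(t_m - O_m) V_n, cf. Eq. (7.42)

# numerical check by central finite differences
eps = 1e-6
grad_numeric = np.zeros_like(W)
for m in range(M):
    for k in range(n):
        Wp = W.copy(); Wp[m, k] += eps
        Wm = W.copy(); Wm[m, k] -= eps
        Hp = -np.sum(t * np.log(outputs(Wp, theta)))
        Hm = -np.sum(t * np.log(outputs(Wm, theta)))
        grad_numeric[m, k] = (Hp - Hm) / (2 * eps)

print(np.max(np.abs(grad_analytic - grad_numeric)))   # tiny, of order 1e-9
```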

There is an alternative form of the energy function that is very similar to the above, but works with sigmoid activation functions and 0/1 targets. Instead of Equation (7.38) one chooses:

$$
H = -\sum_{i\mu}\Big[t_i^{(\mu)}\log O_i^{(\mu)} + \big(1 - t_i^{(\mu)}\big)\log\big(1 - O_i^{(\mu)}\big)\Big], \tag{7.44}
$$

with $O_i = \sigma(b_i)$, $i = 1,\ldots,M$, and where $\sigma$ denotes the sigmoid function (6.19a). To compute the weight increments, we apply the chain rule:

$$
\frac{\partial H}{\partial w_{mn}} = -\sum_{i\mu}\bigg[\frac{t_i^{(\mu)}}{O_i^{(\mu)}} - \frac{1 - t_i^{(\mu)}}{1 - O_i^{(\mu)}}\bigg]\frac{\partial O_i^{(\mu)}}{\partial w_{mn}} = -\sum_{i\mu}\frac{t_i^{(\mu)} - O_i^{(\mu)}}{O_i^{(\mu)}\big(1 - O_i^{(\mu)}\big)}\,\frac{\partial O_i^{(\mu)}}{\partial w_{mn}}. \tag{7.45}
$$

Using Equation (6.20) we obtain

$$
\delta w_{mn} = \eta\sum_{\mu}\big(t_m^{(\mu)} - O_m^{(\mu)}\big)V_n^{(\mu)}, \tag{7.46}
$$

identical to Equation (7.42). The thresholds are adjusted in an analogous fashion, Equation (7.43). But now the interpretation of the outputs is slightly different, since the values of the softmax units in the output layer sum to unity, while those with sigmoid activation functions do not. In either case one can use the definition (6.30) for the classification error. To conclude this Section, we briefly discuss the meaning of the energy functions (7.38) and (7.44). In Chapter 6 we saw how deep neural networks are trained to fit input-output functions (Section 7.1) by minimising the quadratic energy function (6.4). This is reminiscent of regression analysis in mathematical statistics, where the predictive accuracy of a model is improved by minimising the sum over the squared errors. Now consider the energy function (7.44) for a single sigmoid output with targets $t = 0$ and $t = 1$. In this case, the network output is interpreted as the probability $O^{(\mu)} = \mathrm{Prob}(t^{(\mu)} = 1\,|\,\boldsymbol x^{(\mu)})$ of observing $t^{(\mu)} = 1$. The corresponding likelihood is the joint probability of observing the outcomes $t^{(\mu)}$ for $p$ independent inputs $\boldsymbol x^{(\mu)}$:

$$
\mathcal L = \prod_{\mu=1}^{p}\big(O^{(\mu)}\big)^{t^{(\mu)}}\big(1 - O^{(\mu)}\big)^{1-t^{(\mu)}}, \tag{7.47}
$$

under the model determined by the weights and thresholds of the network. Minimising the negative log-likelihood $-\log\mathcal L$ (Section 4.4) corresponds to minimising (7.44). This is just binary logistic regression [92] to predict a binary outcome $t = 0$ or 1. The case $M > 1$ corresponds to a multivariate regression problem [92] with $M$ possibly correlated outcome variables $t_1,\ldots,t_M$.

When the targets describe $M$ mutually exclusive categorical outcomes, $t_i = 0,1$ with $\sum_{i=1}^M t_i = 1$, the softmax output $O_i$ is interpreted as the probability of observing $t_i = 1$. An example is the problem of classifying hand-written digits (Section 8.3). Training the network then corresponds to multinomial regression [92] with the log-likelihood (7.38). Note that Equation (7.44), for $M = 1$, is equivalent to (7.38) for $M = 2$, because $O_2 = 1 - O_1$ and $t_2 = 1 - t_1$. At any rate, these remarks motivate why the energy functions (7.38) and (7.44) are sometimes called log-likelihoods. Equation (7.44) is also referred to as the cross entropy, because it has the same form as the cross entropy [65] characterising the difference between two Bernoulli distributions: the network output $O_i$, and the target $t_i$.

7.6 Regularisation

Deeper networks have more neurons, so the problem of overfitting (Figure 6.6) tends to be more severe for deeper networks. Regularisation schemes limit the tendency to overfit. Apart from cross validation (Section 6.4), a number of other regularisation schemes have proved useful for deep networks: weight decay, pruning, drop out, expansion of the training set, and batch normalisation . This Section summarises the most important aspects of these methods.

7.6.1 Weight decay

Recall Figure 5.17, which shows a solution of the classification problem defined by the Boolean XOR function. In the solution illustrated in this Figure, all weights equal $\pm 1$, and also the thresholds are of order unity. If one uses the backpropagation algorithm to find a solution to this problem, one may find that the weights continue to grow during training. As mentioned above, this can be problematic because it may imply that the local fields become so large that the activation functions saturate. Then training slows down, as explained in Section 6.1. One solution to this problem is to reduce the weights by some factor during training, either at each iteration or at regular intervals, $w_{ij} \to (1-\varepsilon)w_{ij}$ for $0 < \varepsilon < 1$, or

$$
\delta w_{mn} = -\varepsilon w_{mn} \quad\text{for}\quad 0 < \varepsilon < 1. \tag{7.48}
$$

This weight decay can be obtained by adding a term to the energy function,

$$
H = \underbrace{\tfrac{1}{2}\sum_{i\mu}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)^2}_{\equiv H_0} + \frac{\gamma}{2}\sum_{ij} w_{ij}^2. \tag{7.49}
$$

Gradient descent on H gives:

$$
\delta w_{mn} = -\eta\frac{\partial H_0}{\partial w_{mn}} - \varepsilon w_{mn} \tag{7.50}
$$

with $\varepsilon = \eta\gamma$. One can add a corresponding term for the thresholds, but this is usually not necessary. The scheme summarised here is sometimes called L2-regularisation. An alternative scheme is L1-regularisation. It amounts to

$$
H = \tfrac{1}{2}\sum_{i\mu}\big(t_i^{(\mu)} - O_i^{(\mu)}\big)^2 + \frac{\gamma}{2}\sum_{ij}|w_{ij}|. \tag{7.51}
$$

This gives the learning rule

$$
\delta w_{mn} = -\eta\frac{\partial H_0}{\partial w_{mn}} - \varepsilon\,\mathrm{sgn}(w_{mn}). \tag{7.52}
$$

The discontinuity of the learning rule at $w_{mn} = 0$ is cured by defining $\mathrm{sgn}(0) = 0$. Comparing Equations (7.50) and (7.52), we see that L1-regularisation reduces small weights much more than L2-regularisation. We therefore expect the L1-scheme to set more weights to zero, compared with the L2-scheme. An alternative to these two methods is max-norm regularisation [93], where the weights are constrained to remain smaller than a given constant: $|w_{ij}| \leq c$. If $|w_{ij}|$ exceeds the positive constant $c$, then $w_{ij}$ is rescaled so that $|w_{ij}| = c$. These weight-decay schemes are referred to as regularisation schemes because they tend to help against overfitting. How does this work? Weight decay adds a constraint to the problem of minimising the energy function. The result is a compromise, depending upon the value of $\gamma$, between a small value of $H$ and small weight values. The idea is that a network with smaller weights is more robust to the effect of noise. When the weights are small, then small changes in some of the patterns do not give a substantially different training result. When the network has large weights, by contrast, it may happen that small changes in the input yield significant differences in the training result that are difficult to generalise.
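In code, the two schemes differ only in the penalty term added to the gradient-descent step. A minimal sketch (my own; `grad_H0` stands for the gradient of the unregularised energy $H_0$, and the parameter values are arbitrary):

```python
import numpy as np

def weight_decay_step(w, grad_H0, eta, gamma, scheme="L2"):
    """One gradient-descent step with weight decay, Eqs. (7.50) and (7.52)."""
    eps = eta * gamma
    if scheme == "L2":
        return w - eta * grad_H0 - eps * w              # shrink proportionally
    elif scheme == "L1":
        return w - eta * grad_H0 - eps * np.sign(w)     # shrink by a fixed amount
    raise ValueError(scheme)

w = np.array([0.01, -0.5, 2.0])
g = np.zeros_like(w)                  # pretend H0 is already at a minimum
print(weight_decay_step(w, g, eta=0.1, gamma=0.5, scheme="L2"))
print(weight_decay_step(w, g, eta=0.1, gamma=0.5, scheme="L1"))
```

The example shows the point made above: the L1 step pushes the small weight straight through zero (where it would be clamped), while the L2 step barely changes it.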

7.6.2 Pruning

The term pruning refers to removing unnecessary weights or neurons from the network, to improve its efficiency. The simplest approach is weight elimination by weight decay [94]. Weights that tend to remain very close to zero during training are removed by setting them to zero and not updating them anymore. Neurons that have zero weights for all incoming connections are effectively removed (pruned). Pruning is a regularisation method: by removing unnecessary weights, one reduces the risk of overfitting. As opposed to drop out, where hidden neurons are only temporarily ignored, pruning refers to permanently removing hidden neurons. The idea is to train a large network, and then to prune a large fraction of neurons to obtain a much smaller network. It is usually found that such pruned networks generalise much better than small networks that were trained without pruning. Up to 90% of the hidden neurons can be removed in some cases. In general, pruning is an excellent way to create efficient classifiers for real-time applications. An efficient pruning algorithm is based on the idea to remove weights in such a way that the effect upon the energy function is as small as possible [95]. The idea is to find the optimal weight, to remove it, and to change the other weights in such a way that the energy function increases as little as possible. The algorithm works as follows. Assume that the network was trained, so that it reached a (local) minimum of the energy function $H$. The expansion of $H$ around this minimum reads:

$$
H = H_{\min} + \tfrac{1}{2}\,\delta\boldsymbol w\cdot\mathbb M\,\delta\boldsymbol w + \text{higher orders in }\delta\boldsymbol w. \tag{7.53}
$$

The term linear in $\delta\boldsymbol w$ vanishes because we expand around a local minimum. The matrix $\mathbb M$ is the Hessian, the matrix of second derivatives of the energy function. For the next step it is convenient to adopt the following notation [95]. One groups all weights in the network into a long weight vector $\boldsymbol w$ (as opposed to grouping them into a weight matrix $\mathbb W$ as we did in Chapter 2). A particular component $w_q$ is extracted from the vector $\boldsymbol w$ as follows:

$$
w_q = \hat{\boldsymbol e}_q\cdot\boldsymbol w \quad\text{where}\quad \hat{\boldsymbol e}_q = [0,\ldots,0,\underbrace{1}_{\text{position }q},0,\ldots,0]^{\mathsf T}. \tag{7.54}
$$

Here $\hat{\boldsymbol e}_q$ is the Cartesian unit vector in the direction $q$, with components $[\hat{\boldsymbol e}_q]_j = \delta_{qj}$. In this notation, the elements of $\mathbb M$ are $M_{pq} = \partial^2 H/\partial w_p\partial w_q$. Now, eliminating the weight $w_q$ amounts to setting

$$
\delta w_q = -w_q. \tag{7.55}
$$

To minimise the damage to the network one wants to eliminate the weight that has the least effect upon $H$, changing the other weights at the same time so that $H$ increases as little as possible (Figure 7.15). This is achieved by minimising

$$
\min_q\;\min_{\delta\boldsymbol w}\big\{\tfrac{1}{2}\,\delta\boldsymbol w\cdot\mathbb M\,\delta\boldsymbol w\big\} \quad\text{subject to the constraint}\quad \hat{\boldsymbol e}_q\cdot\delta\boldsymbol w + w_q = 0. \tag{7.56}
$$

The constant term $H_{\min}$ was dropped because it does not matter. Now we first

Figure 7.15: Pruning algorithm (schematic). The minimum of $H$ is located at $[w_1,w_2]^{\mathsf T}$. The contours of the quadratic approximation to $H$ are represented as solid black lines. The weight change $\delta\boldsymbol w = [-w_1,0]^{\mathsf T}$ (gray arrow) leads to a smaller increase in $H$ than $\delta\boldsymbol w = [0,-w_2]^{\mathsf T}$. The black arrow represents the optimal $\delta\boldsymbol w^*_{q^*}$, which leads to an even smaller increase in $H$.

minimise $H$ w.r.t. $\delta\boldsymbol w$, for a given value of $q$. The linear constraint is incorporated using a Lagrange multiplier as in Section 6.3, to form the Lagrangian

$$
\mathcal L = \tfrac{1}{2}\,\delta\boldsymbol w\cdot\mathbb M\,\delta\boldsymbol w + \lambda\,(\hat{\boldsymbol e}_q\cdot\delta\boldsymbol w + w_q). \tag{7.57}
$$

A necessary condition for a minimum $(\delta\boldsymbol w,\lambda)$ satisfying the constraint is

$$
\frac{\partial\mathcal L}{\partial\,\delta\boldsymbol w} = \mathbb M\,\delta\boldsymbol w + \lambda\hat{\boldsymbol e}_q = 0 \quad\text{and}\quad \frac{\partial\mathcal L}{\partial\lambda} = \hat{\boldsymbol e}_q\cdot\delta\boldsymbol w + w_q = 0. \tag{7.58}
$$

We denote the solution of these Equations by $\delta\boldsymbol w^*$ and $\lambda^*$. It is obtained by solving the linear system

$$
\begin{bmatrix} \mathbb M & \hat{\boldsymbol e}_q \\ \hat{\boldsymbol e}_q^{\mathsf T} & 0 \end{bmatrix}
\begin{bmatrix} \delta\boldsymbol w^* \\ \lambda^* \end{bmatrix}
=
\begin{bmatrix} \boldsymbol 0 \\ -w_q \end{bmatrix}. \tag{7.59}
$$

If $\mathbb M$ is invertible, then the top rows of Eq. (7.59) give

$$
\delta\boldsymbol w^* = -\mathbb M^{-1}\hat{\boldsymbol e}_q\,\lambda^*. \tag{7.60}
$$

Inserting this result into $\hat{\boldsymbol e}_q^{\mathsf T}\delta\boldsymbol w^* + w_q = 0$, we find

$$
\delta\boldsymbol w^* = -\mathbb M^{-1}\hat{\boldsymbol e}_q\, w_q\,\big(\hat{\boldsymbol e}_q^{\mathsf T}\mathbb M^{-1}\hat{\boldsymbol e}_q\big)^{-1}
\quad\text{and}\quad
\lambda^* = w_q\,\big(\hat{\boldsymbol e}_q^{\mathsf T}\mathbb M^{-1}\hat{\boldsymbol e}_q\big)^{-1}. \tag{7.61}
$$

We see that $\hat{\boldsymbol e}_q\cdot\delta\boldsymbol w^* = -w_q$, so that the weight $w_q$ is eliminated. The other weights are also changed (black arrow in Figure 7.15). The final step is to find the optimal $q$ by minimising

$$
\mathcal L(\delta\boldsymbol w^*,\lambda^*;q) = \tfrac{1}{2}\, w_q^2\,\big(\hat{\boldsymbol e}_q^{\mathsf T}\mathbb M^{-1}\hat{\boldsymbol e}_q\big)^{-1}. \tag{7.62}
$$

The Hessian of the energy function is expensive to evaluate, and so is the inverse of this matrix. Usually one resorts to an approximate expression for $\mathbb M^{-1}$ [95]. One possibility is to set the off-diagonal elements of $\mathbb M$ to zero [96]. But in this case the other weights are not adjusted, because $\hat{\boldsymbol e}_{q'}\cdot\delta\boldsymbol w_q^* = 0$ for $q'\neq q$ if $\mathbb M$ is diagonal. In this case it is necessary to retrain the network after weight elimination. The algorithm is summarised in Algorithm 5. It succeeds better than elimination by weight decay in removing the unnecessary weights in the network [95]. Weight decay eliminates the smallest weights. One obtains weight elimination of the smallest weights by substituting $\mathbb M = \mathbb I$ in the algorithm described above [Equation (7.62)]. But small weights are often needed to achieve a small training error. So this is usually not a good approximation. To illustrate the effect of pruning for neural networks with hidden layers, consider the XOR function. Recall that it can be represented by a hidden layer with two neurons (Figure 5.17). For random initial weights, backpropagation takes a long time to find a valid solution, and networks with many more hidden neurons tend to perform much better [97]. The numerical experiments in Ref. [97] indicate that with two hidden neurons, only about 49% of the networks learned the task in 10 000 training steps of stochastic gradient descent, and networks with more neurons in the hidden layer learn more easily (98.5% for n = 10 hidden neurons). Their data also show that pruned networks, initially trained with n = 10 hidden neurons, still show excellent training success (83.3% if only n = 2 hidden neurons remain). The networks were pruned iteratively during training, removing the neurons with the largest average magnitude. After training, the weights and thresholds were reset to their initial values, the values before training began. One can draw three conclusions from the numerical experiments described in Ref. [97]. First, iterative pruning during training singles out neurons in the hidden layer that had initial weights and thresholds resulting in the correct decision boundaries (lottery-ticket effect [97]). Second, the pruned network with two hidden

Algorithm 5 pruning least important weight

1: train the network to reach $H_{\min}$;
2: compute $\mathbb M^{-1}$ approximately;
3: determine $q^*$ as the value of $q$ for which $\mathcal L(\delta\boldsymbol w^*,\lambda^*;q)$ is minimal;
4: if $\mathcal L(\delta\boldsymbol w^*,\lambda^*;q^*) \ll H_{\min}$ then
5:   adjust all weights using $\delta\boldsymbol w = -\mathbb M^{-1}\hat{\boldsymbol e}_{q^*} w_{q^*}\big(\hat{\boldsymbol e}_{q^*}^{\mathsf T}\mathbb M^{-1}\hat{\boldsymbol e}_{q^*}\big)^{-1}$;
6:   goto 2;
7: else end;
8: end if

neurons has much better training success than the network that was trained with only two hidden neurons. Third, despite pruning more than 50% of the hidden neurons, the network with n = 4 hidden neurons performs almost as well as the one with n = 10 hidden neurons (97.9% training success). When training deep networks it is common to start with many neurons in the hidden layers, and to prune up to 90% of them. This results in small trained networks that can classify efficiently and reliably.
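A compact sketch of the weight-elimination step described by Equations (7.61) and (7.62) is given below. It is my own illustration, not the book's code; it assumes that the Hessian $\mathbb M$ of the trained network is available and invertible, and it uses a random positive-definite matrix as a toy stand-in.

```python
import numpy as np

def prune_one_weight(w, M):
    """Remove the least important weight according to Eqs. (7.61)-(7.62).
    w: weight vector at the minimum of H; M: Hessian of H at that minimum.
    Returns the updated weight vector and the index of the removed weight."""
    Minv = np.linalg.inv(M)
    # cost of removing weight q: L(q) = w_q^2 / (2 [M^{-1}]_{qq}), Eq. (7.62)
    costs = 0.5 * w ** 2 / np.diag(Minv)
    q = int(np.argmin(costs))
    dw = -Minv[:, q] * w[q] / Minv[q, q]      # optimal weight change, Eq. (7.61)
    return w + dw, q

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 5))
M = A @ A.T + 5 * np.eye(5)                   # toy positive-definite Hessian
w = rng.normal(size=5)
w_new, q = prune_one_weight(w, M)
print("removed weight", q, "; new weights:", np.round(w_new, 3))
print("eliminated entry is zero:", abs(w_new[q]) < 1e-12)
```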

7.6.3 Drop out

In this regularisation scheme some neurons are ignored during training [93]. Usually this regularisation technique is applied to hidden neurons. In each step of the training algorithm (for each mini batch, or for each individual pattern) one ignores at random a fraction $q$ of neurons from each hidden layer, and updates the weights in the remaining, diluted network in the usual fashion. The weights coming into the dropped neurons are not updated, and as a consequence neither are their outputs. For the next step in the training algorithm, the removed neurons are put back, and another set of hidden neurons is removed. Once the training is completed, all hidden neurons are activated, but their outputs are multiplied by $q$. This method is motivated by noting that the performance of machine-learning algorithms is usually improved by combining the results of several learning attempts [93], for instance by separately training several networks with different layouts and averaging over their outputs. However, for deep networks this is computationally very expensive. Drop out is an attempt to achieve the same goal more efficiently. The idea is that drop out corresponds to effectively training a large number of different networks. If there are $k$ hidden neurons, then there are $2^k$ different combinations of neurons that are turned on or off. The hope is that the network learns more robust features of the input data in this way, and that this reduces overfitting. In practice the method is applied together with max-norm regularisation (Section 7.6.1).
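A minimal sketch of the procedure (my own, not the book's code; it uses ReLU hidden units for illustration, drops each hidden neuron independently with probability `p_drop` during a training step, and at prediction time scales the outputs by the probability that a neuron was active during training):

```python
import numpy as np

rng = np.random.default_rng(5)
p_drop = 0.5                         # fraction of hidden neurons ignored per step

def hidden_layer(V_in, W, theta, train=True):
    """Hidden layer with drop out."""
    V = np.maximum(0.0, W @ V_in - theta)
    if train:
        mask = rng.random(V.shape) >= p_drop   # keep a neuron with prob 1 - p_drop
        return V * mask                        # dropped neurons output zero
    # test time: all neurons active, outputs scaled by the retention probability
    return V * (1.0 - p_drop)

W = rng.normal(size=(8, 4))
theta = np.zeros(8)
x = rng.normal(size=4)
print(hidden_layer(x, W, theta, train=True))
print(hidden_layer(x, W, theta, train=False))
```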

7.6.4 Expanding the training set

If one trains a network with a fixed number of hidden neurons on larger training sets, one observes that the network generalises with higher accuracy (better classification success). The reason is that overfitting is reduced when the training set is larger. Thus, a way of avoiding overfitting is to expand or augment the training set. It is sometimes argued that the recent success of deep neural networks in image recognition and object recognition is in large part due to larger training sets. One example is ImageNet, a database of more than $10^7$ hand-classified images, sorted into more than 20 000 categories [98]. Naturally it is expensive to improve training sets in this way. Instead, one can expand a training set artificially. For digit recognition (Figure 2.1), for example, one can expand the training set by randomly shifting, rotating, and shearing the digits.

7.6.5 Batch normalisation

Batch normalisation [99] can significantly speed up the training of deep networks with backpropagation. The idea is to shift and normalise the input data for each hidden layer, not only for the input patterns (Section 6.3). This is done separately for each mini batch (Section 6.2), and for each component of the inputs into the layer (Algorithm 6). Denoting the states of the neurons feeding into the layer in question by $V_j^{(\mu)}$, $j = 1,\ldots,j_{\max}$, one calculates the average and variance over each mini batch

$$
\overline V_j = \frac{1}{m_B}\sum_{\mu=1}^{m_B} V_j^{(\mu)}
\quad\text{and}\quad
\sigma_B^2 = \frac{1}{m_B}\sum_{\mu=1}^{m_B}\big(V_j^{(\mu)} - \overline V_j\big)^2, \tag{7.63}
$$

subtracts the mean from the $V_j^{(\mu)}$, and divides by $\sqrt{\sigma_B^2 + \epsilon}$. The parameter $\epsilon > 0$ is added to the denominator to avoid division by zero when $\sigma_B^2$ evaluates to zero. There are two additional parameters in Algorithm 6, $\gamma_j$ and $\beta_j$. They are learnt by backpropagation, just like the weights and thresholds. In general the new parameters are allowed to differ from layer to layer, $\gamma_j^{(\ell)}$ and $\beta_j^{(\ell)}$. Batch normalisation was originally motivated by arguing that it reduces possible covariate shifts faced by hidden neurons in layer $\ell$: as the parameters of the neurons in the preceding layer $\ell-1$ change, their outputs shift, thus forcing the neurons in layer $\ell$ to adapt. However, in Ref. [100] it was argued that batch normalisation does not reduce the internal covariate shift, but that it speeds up the training by effectively smoothing the energy landscape. Batch normalisation helps to combat the vanishing-gradient problem because it prevents the local fields of hidden neurons from growing. This makes it possible to use sigmoid functions in deep networks, because the distribution of inputs remains normalised. It is sometimes argued that batch normalisation has a regularising effect, and it has been suggested [99] that batch normalisation can replace drop out (Section 7.6.3). It is also argued that batch normalisation may help the network to generalise better, in particular if each mini batch contains randomly picked inputs. Then batch normalisation corresponds to randomly transforming the inputs to each hidden neuron (by the randomly changing means and variances). This may help to make the learning more robust. There is no theory that proves either of these claims, but it is an empirical fact that batch normalisation often speeds up the training.

Algorithm 6 batch normalisation

for $j = 1,\ldots,j_{\max}$ do
    calculate mean $\overline V_j \leftarrow \frac{1}{m_B}\sum_{\mu=1}^{m_B} V_j^{(\mu)}$
    calculate variance $\sigma_B^2 \leftarrow \frac{1}{m_B}\sum_{\mu=1}^{m_B}\big(V_j^{(\mu)} - \overline V_j\big)^2$
    normalise $\hat V_j^{(\mu)} \leftarrow \big(V_j^{(\mu)} - \overline V_j\big)\big/\sqrt{\sigma_B^2 + \epsilon}$
    calculate outputs as: $g\big(\gamma_j \hat V_j^{(\mu)} + \beta_j\big)$
end for
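A direct translation of Algorithm 6 into code (my own sketch; in practice $\gamma_j$ and $\beta_j$ would be trained by backpropagation along with the weights and thresholds, and the activation $g(\cdot)$ is applied to the returned values):

```python
import numpy as np

def batch_normalise(V, gamma, beta, eps=1e-8):
    """Batch normalisation, Algorithm 6.
    V: array of shape (m_B, j_max) with the states feeding into the layer."""
    mean = V.mean(axis=0)                       # per-component mean over the batch
    var = V.var(axis=0)                         # per-component variance
    V_hat = (V - mean) / np.sqrt(var + eps)     # normalised inputs
    return gamma * V_hat + beta                 # rescaled and shifted (before g)

rng = np.random.default_rng(6)
V = rng.normal(loc=3.0, scale=2.0, size=(32, 5))        # mini batch, m_B = 32
out = batch_normalise(V, gamma=np.ones(5), beta=np.zeros(5))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))   # ~0 mean, ~1 std
```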

7.7 Summary

Neural networks with many layers of hidden neurons are called deep networks. Error backpropagation in deep networks suffers from the vanishing-gradient problem. It can be reduced by using ReLU units, by initialising the weights in certain ways, and with networks containing connections that skip layers. Yet vanishing or exploding gradients remain a fundamental difficulty, slowing learning down in the initial phase of training. Nevertheless, convolutional neural networks have become immensely successful in object recognition, outperforming other algorithms significantly. Since deep networks contain many free parameters, deep networks tend to overfit the training data. Apart from cross validation, there are other ways of regularising the problem: weight decay, drop out, pruning, and data-set augmentation, Section 7.6.

7.8 Further reading

Deep networks suffer from catastrophic forgetting: when we train a network on a new input distribution that is quite different from the one the network was originally trained on, then the network tends to forget what it learned initially. A good starting point for further reading is Ref. [101]. The stochastic-gradient descent algorithm (with or without mini batches) samples the input-data distribution uniformly randomly. As mentioned in Section 6.3, it may be advantageous to sample those inputs more frequently that initially cause larger output errors. More generally, the algorithm may use other criteria to choose certain input data more often, with the goal to speed up learning. It may even suggest how to augment a given training set most efficiently, by asking to specifically label certain types of input data (active learning) [102]. Another question concerns the structure of the energy landscape for multilayer perceptrons. It seems that local minima are perhaps less important for deep networks than for Hopfield networks, because their energy functions tend to have more saddle points than minima [103], just like Gaussian random functions [104]. A recent study explores the relation between the multilayer layout of the perceptron network and the properties of the energy landscape [105].

7.9 Exercises

7.1 Pruning. Show that the expression (7.61) for the weight increment δw ∗ min- imises the Lagrangian (7.57) subject to the constraint (7.55).

7.2 Decision boundaries for XOR problem. Figure 7.6 shows the layout of a net- work that solves the Boolean XOR problem. Draw the decision boundaries for the four hidden neurons in the input plane, and label the boundaries and the regions as in Figure 5.15.

7.3 Vanishing-gradient problem. Train the network shown in Figure 7.9 on the iris data set, available from the Machine learning repository of the University of California Irvine. Measure the effect of training upon the neurons in the different layers, by calculating the derivative of the energy function H w.r.t. the thresholds of the neurons in question.

7.4 Residual network. Derive Equation (7.34) for the error δ(`) in layer ` of the residual ‘network’ shown in Figure 7.14.

7.5 Log-likelihood. The log-likelihood function (7.38) is an energy function for softmax output neurons. Show that the function has a global minimum at $O_i^{(\mu)} = t_i^{(\mu)}$.

7.6 Cross-entropy function. The cross-entropy function (7.44) is an energy function for sigmoid output neurons. Write down a cross-entropy function for tanh output neurons, and show that it has a global minimum at $O^{(\mu)} = t^{(\mu)}$, where the function takes the value zero.

7.7 Softmax outputs. Consider a network with $L$ layers with softmax outputs $O_i^{(\mu)}$. Compute the derivative of $O_i^{(\mu)}$ with respect to the local field $b_m^{(L,\mu)}$ of output neuron $m$. The network is trained by gradient descent on the negative log-likelihood function $H = -\sum_{i\mu} t_i^{(\mu)}\log O_i^{(\mu)}$. The targets $t_i^{(\mu)}$ satisfy the constraint $\sum_i t_i^{(\mu)} = 1$ for all patterns $\mu$. Derive the stochastic gradient-descent learning rule for the weights $w_{mn}^{(L)}$ in layer $L$.

7.8 Generalised XOR function. The parity function can be viewed as a general- isation of the XOR function to N > 2 input dimensions, because it becomes the EXERCISES 151

XOR function for N = 2. Another way to generalise the XOR function to N > 2-dimensional inputs is to define a Boolean function that evaluates to unity if exactly one of its inputs equals unity, and to zero otherwise. Construct networks that represent this function, for N = 3 and N = 4.

Figure 8.1: (a) Feature map, kernel, and receptive field (schematic). A feature map (the 8×8 array of hidden neurons) is obtained by translating a kernel (filter), here with a 3×3 receptive field, over the input image, here a 10×10 array of pixels. (b) A convolution layer consists of a number of feature maps, each corresponding to a given kernel that detects a certain feature in parts of the input image.

8 Convolutional networks

Convolutional networks have been around since the 1980s. They became widely used after Krizhevsky et al. [106] won the ImageNet challenge (Section 8.5) with a convolutional net. One reason for the recent success of convolutional networks is that they have fewer connections than fully connected networks with the same number of neurons. This has two advantages. Firstly, such networks are obviously cheaper to train. Secondly, as pointed out above, reducing the number of connections regularises the network; it reduces the risk of overfitting. Convolutional neural networks are designed for object recognition and image classification. They take images as inputs (Figure 7.1), not just a list of attributes (Figure 5.1). Convolutional networks have important properties in common with networks of neurons in the visual cortex of the human brain [4]. First, there is a spatial array of input terminals. For image analysis this is the two-dimensional array of bits shown in Figure 8.1(a). Second, neurons are designed to detect local features of the image (such as edges or corners, for instance). The maps learned by such neurons, from inputs to output, are referred to as feature maps. Since these features occur in different parts of the image, one uses the same kernel (or filter) for different parts of the image, always with the same weights and thresholds. Since these kernels are local, and since they act in a translational-invariant way, the number of neurons from the two-dimensional input array is greatly reduced, compared with fully connected networks. Feature maps are obtained by convolution of the kernel with the input image. Therefore, layers consisting of a number of feature maps corresponding to different kernels are also referred to as convolution layers, Figure 8.1(b). Convolutional networks can have hierarchies of convolution layers. The idea is that the additional layers can learn more abstract features. Apart from feature maps, convolutional networks contain other types of layers. Pooling layers connect directly to the convolution layer(s); their task is to simplify the output of the convolution layers. Connected to the pooling layers, convolutional networks may also contain several fully connected layers.

8.1 Convolution layers

Figure 8.1(a) illustrates how a feature map is obtained by convolution of the input image with a kernel which reads a 3×3 part of the input image. In analogy with the terminology used in neuroscience, this area is called the local receptive field of the kernel. The outputs of the kernel from different parts of the input image make up the feature map, here an 8×8 array of hidden neurons: neuron $V_{11}$ connects to the 3×3 area in the upper left-hand corner of the input image. Neuron $V_{12}$ connects to a shifted area, as illustrated in Figure 8.1(a), and so forth. Since the input has 10×10 pixels, the dimension of the feature map is 8×8 in this example. The important point is that the neurons $V_{11}$ and $V_{12}$, and all other neurons in this convolution layer, share their weights and the threshold. In the example shown in Figure 8.1(a) there are thus only nine independent weights, and one threshold. Since the different hidden neurons share weights and thresholds, their computation rule takes the form of a discrete convolution:

$$
V_{ij} = g\Big(\sum_{p=1}^{3}\sum_{q=1}^{3} w_{pq}\, x_{p+i-1,\,q+j-1} - \theta\Big). \tag{8.1}
$$

In Figure 8.1(a) the local receptive field is shifted by one pixel at a time. Sometimes it is useful to use a different stride $(s_1,s_2)$, to shift the receptive field by $s_1$ pixels horizontally and by $s_2$ pixels vertically. Also, the local receptive regions need not have size 3×3. If we assume that their size is $Q\times P$, and that $s_1 = s_2 = s$, the rule (8.1) takes the form

$$
V_{ij} = g\Big(\sum_{p=1}^{P}\sum_{q=1}^{Q} w_{pq}\, x_{p+s(i-1),\,q+s(j-1)} - \theta\Big). \tag{8.2}
$$

Figure 8.2: Illustration of the summation in Equation (8.3). Each feature map has a receptive field of dimension $P\times Q\times R$. There are $K$ feature maps, each of dimension $I\times J$.

Figure 8.1(a) depicts a two-dimensional input array. For colour images there are usually three colour channels; in this case the input array is three-dimensional, and the input bits are labelled by three indices: two for position and the last one for colour, $x_{pqr}$. Usually one connects several feature maps with different kernels to the input layer, as shown in Figure 8.1(b). The different kernels detect different features of the input image, one detects edges for example, and another one detects corners, and so forth. To account for these extra dimensions, one groups weights (and thresholds) into higher-dimensional arrays (tensors). The convolution takes the form:

$$
V_{ijk} = g\Big(\sum_{p=1}^{P}\sum_{q=1}^{Q}\sum_{r=1}^{R} w_{pqrk}\, x_{p+s(i-1),\,q+s(j-1),\,r} - \theta_k\Big) \tag{8.3}
$$

(see Figure 8.2). All neurons in a given convolution layer have the same threshold. The software package TensorFlow [107] is designed to efficiently perform tensor operations as in Equation (8.3). If one couples several convolution layers together, the number of neurons in these layers decreases rapidly as one moves to the right. In this case one can pad the image (and the convolution layers) by adding rows and columns of bits set to zero. In Figure 8.1(a), for example, one obtains a convolution layer of the same dimension as the original image by adding one column each on the left-hand and right-hand sides of the image, as well as two rows, one at the bottom and one at the top. In general, the numbers of rows and columns need not be equal, so the amount of padding is specified by four numbers, $(p_1,p_2,p_3,p_4)$. Convolution layers are trained with backpropagation. Consider the simplest case, Equation (8.1). As usual, we use the chain rule to evaluate the gradients:

$$
\frac{\partial V_{ij}}{\partial w_{mn}} = g'(b_{ij})\,\frac{\partial b_{ij}}{\partial w_{mn}} \tag{8.4}
$$

Figure 8.3: Layout of a convolutional neural network for object recognition and image classification (schematic). The inputs are stored in a 10×10 array. They feed into a convolution layer with four different feature maps, with 3×3 kernels, stride (1,1), and zero padding. Each convolution layer feeds into its own max-pooling layer, with stride (2,2) and zero padding. Between these and the output layer are a couple of fully connected hidden layers.

with local field $b_{ij} = \sum_{pq} w_{pq}\, x_{p+i-1,\,q+j-1} - \theta$. The derivative of $b_{ij}$ is evaluated by applying rule (5.25):

$$
\frac{\partial b_{ij}}{\partial w_{mn}} = \sum_{pq}\delta_{mp}\,\delta_{nq}\, x_{p+i-1,\,q+j-1}. \tag{8.5}
$$

In this way one can train several stacked convolution layers too. It is important to keep track of the summation boundaries. To that end it helps to pad out the image and the convolution layers, so that the upper bounds remain the same in different layers. Details aside, the fundamental principle of feature maps is that the map is applied in the same form to different parts of the image (translational invariance). In this way, each weight in a given feature map is trained on different parts of the image. This effectively increases the training set for the feature map and combats overfitting.
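The discrete convolution (8.2) is only a few lines of code. The following sketch is my own illustration (single feature map, stride $s$, no padding, ReLU activation, zero-based array indices rather than the one-based indices of the equations):

```python
import numpy as np

def conv2d(x, w, theta, s=1):
    """Feature map from Eq. (8.2): V_ij = g(sum_pq w_pq x_{p+s(i-1), q+s(j-1)} - theta)."""
    P, Q = w.shape
    I = (x.shape[0] - P) // s + 1
    J = (x.shape[1] - Q) // s + 1
    V = np.zeros((I, J))
    for i in range(I):
        for j in range(J):
            patch = x[s * i: s * i + P, s * j: s * j + Q]
            V[i, j] = np.sum(w * patch) - theta     # local field b_ij
    return np.maximum(0.0, V)                       # g = ReLU, for illustration

rng = np.random.default_rng(7)
x = rng.random((10, 10))               # input image as in Figure 8.1(a)
w = rng.normal(size=(3, 3))            # 3x3 kernel: nine shared weights
print(conv2d(x, w, theta=0.0).shape)   # (8, 8) feature map
```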

8.2 Pooling layers

Pooling layers process the output of convolution layers. A neuron in a pooling layer takes the outputs of several neighbouring feature maps and summarises their outputs into a single number. Max-pooling units, for example, summarise the outputs of nearby feature maps (in a 2×2 square, for instance) by taking the maximum over the feature-map outputs. Instead, one may compute the root-mean square of the map values (L2-pooling). There are no weights or thresholds associated with the pooling layers. Other ways of pooling are discussed in Ref. [4]. Just as for convolution layers, we need to specify stride and padding for pooling layers.

Figure 8.4: Examples of digits from the MNIST data set of handwritten digits [108]. The images were produced using MATLAB. Copyright for the data set: Y. LeCun and C. Cortes.

Usually several feature maps are connected to the input. Pooling is performed separately on each of them. The network layout looks like the one shown schematically in Figure 8.3. In this Figure, the pooling layers feed into a number of fully connected hidden layers that connect to the output neurons. There are as many output neurons as there are classes to be recognised. This layout is similar to the layout used by Krizhevsky et al. [106] in the ImageNet challenge (see Section 8.5 below).
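Max-pooling is equally simple to write down. A sketch (my own; a 2×2 pooling window with stride 2 and no padding):

```python
import numpy as np

def max_pool(V, size=2, stride=2):
    """Summarise each size x size block of a feature map by its maximum."""
    I = (V.shape[0] - size) // stride + 1
    J = (V.shape[1] - size) // stride + 1
    out = np.zeros((I, J))
    for i in range(I):
        for j in range(J):
            out[i, j] = V[stride * i: stride * i + size,
                          stride * j: stride * j + size].max()
    return out

V = np.arange(16, dtype=float).reshape(4, 4)    # toy 4x4 feature map
print(max_pool(V))                               # [[ 5.  7.] [13. 15.]]
```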

8.3 Learning to read handwritten digits

Figure 8.4 shows patterns from the MNIST data set of handwritten digits [108]. The data set derives from a data set compiled by the National Institute of Standards and Technology (NIST), of digits handwritten by high-school students and employees of the United States Census Bureau. The data contains a training set of 60 000 images of digits with 28 × 28 pixels, and a test set of 10 000 digits. The images are grayscale with 8-bit resolution, so each pixel contains a value ranging from 0 to 255. The images in the database were preprocessed. The procedure is described on the MNIST home page. Each original binary image from the National Institute of Standards and Technology was represented as a 20 × 20 gray-scale image, preserving the aspect ratio of the digit. The resulting image was placed in a 28 × 28 image so that the centre-of-mass of the image coincided with its geometrical centre. These steps can make a crucial difference (Section 8.4).

The goal of this Section is to show how the principles described up to now allow neural networks to learn the MNIST data with low classification error. As described in Chapter 6, one divides the data set into a training set and a validation set, here with 50 000 digits and 10 000 digits, respectively. The validation set is used for cross validation. The test data set allows one to measure the classification error after training.

Figure 8.5: Energy functions for the MNIST training set (solid lines) and for the validation set (dashed lines), for a fully connected hidden layer with 30 neurons, and for a similar algorithm but with 100 neurons in the hidden layer. The data was smoothed and the plot is schematic. The x-axis shows iterations. One iteration corresponds to feeding one minibatch of patterns. One epoch consists of 50000/8192 ≈ 6 iterations. Schematic, based on simulations performed by Oleksandr Balabanov.

For this purpose one must use a data set that was not involved in the training. As described in Section 6.3, the inputs are preprocessed further by subtracting the mean image, averaged over the whole training set, from each input image [Equation (6.21)].

To find appropriate parameter values and network layouts is one of the main difficulties when training a neural network, and it usually requires a fair deal of experimenting. There are recipes for finding certain parameters [109], but the general approach is still trial and error [5]. Consider first a network with one hidden layer with ReLU activation functions (Section 7.3), and a softmax output layer (Section 7.5) with ten outputs O_i and energy function (7.38), so that output O_i is the probability that the pattern fed to the network falls into category i. The networks are trained with stochastic gradient descent with momentum, Equation (6.31). The learning rate is set to η = 0.001, and the momentum constant to α = 0.9. The mini-batch size [Equation (6.18)] equals 8192. Cross validation and early stopping are implemented as follows: during training, the algorithm keeps track of the smallest validation error observed so far. Training stops when the validation error has been larger than this minimum a specified number of times, equal to 5 in this case.

Figure 8.5 shows how the training and validation energies decrease during training, for networks with 30 and 100 hidden units. One epoch corresponds to applying p patterns, or p/m_B = 50000/8192 ≈ 6 iterations (Section 6.1). The energies are a little lower for the network with 100 hidden neurons. But one observes overfitting in both cases: after many training steps the validation energy is much higher than the training energy.
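The early-stopping criterion just described can be sketched as follows. This is a schematic outline, not the code used for Figure 8.5: train_one_epoch and validation_energy are placeholder functions, and the validation energy is here required to exceed its running minimum five times in a row before training stops.

def train_with_early_stopping(train_one_epoch, validation_energy, patience=5,
                              max_epochs=1000):
    """Stop training once the validation energy has been larger than its
    running minimum `patience` consecutive times (cf. the procedure in the text)."""
    best = float("inf")
    worse_count = 0
    for epoch in range(max_epochs):
        train_one_epoch()                     # one pass over the training set
        H_val = validation_energy()           # energy evaluated on the validation set
        if H_val < best:
            best = H_val
            worse_count = 0                   # new minimum: reset the counter
        else:
            worse_count += 1
            if worse_count >= patience:
                break                         # validation energy stopped improving
    return epoch, best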

Figure 8.6: Convolutional network that classifies the handwritten digits in the MNIST data set (schematic).

Early stopping caused the training of the larger network to abort after 135 epochs; this corresponds to 824 iterations. The resulting classification accuracy is about 97.2% for the network with 100 hidden neurons. It is difficult to increase the classification accuracy by adding more hidden layers, most likely because the network overfits the data (Section 6.4). This problem becomes more acute as one adds more hidden neurons. The tendency of the network to overfit is reduced by regularisation (Section 7.6). For the network with one hidden layer with 100 ReLU neurons, L2-regularisation improves the classification accuracy to almost 98%.

Convolutional networks can be optimised to yield higher classification accuracies than those quoted above. A convolutional network with one convolution layer with 20 feature maps, a max-pooling layer, and a fully connected hidden layer with 100 ReLU neurons, similar to the network shown schematically in Figure 8.6, gives a classification accuracy only slightly above 98% after training for 60 epochs. Adding a second convolution layer and batch normalisation (Section 7.6.5) gives a classification accuracy of 98.99% after 30 epochs (this layout is similar to a layout described in MathWorks [110]). The accuracy can be improved further by tuning parameters and network layout, and by using ensembles of convolutional neural networks [108]. The best classification accuracy found in this way is 99.77% [111].

Figure 8.7: Some hand-written digits from the MNIST test set, misclassified by a convolutional network that achieved an overall classification accuracy of 98%. Target (top right), network output (bottom right). Data from Oleksandr Balabanov.

Several of the MNIST digits are difficult to classify for humans too (Figure 8.7), so we conclude that convolutional networks really work very well. Yet the above examples also show that it takes much experimenting to find the right parameters and network layout, as well as long training times, to reach the best classification accuracies. It could be argued that one reaches a stage of diminishing returns as the classification error falls below a fraction of a percent.
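A network with the layout just described (one convolution layer with 20 feature maps, max-pooling, a fully connected layer with 100 ReLU neurons, and a softmax output layer) can be set up with a few lines of TensorFlow [107]. The sketch below illustrates this kind of layout; it is not the exact configuration used for the quoted accuracies, and in particular the 3 × 3 kernel size and the 2 × 2 pooling window are assumptions.

import tensorflow as tf

# One convolution layer with 20 feature maps, a max-pooling layer, one fully
# connected hidden layer with 100 ReLU neurons, and a softmax output layer.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(20, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
# Stochastic gradient descent with momentum, as in the text (eta = 0.001, alpha = 0.9).
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss="categorical_crossentropy",
)
model.summary()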

8.4 Coping with deformations of the input distribution

How well does an MNIST-trained convolutional network classify your own handwritten digits? Figure 8.8(a) shows examples of digits drawn by colleagues at the University of Gothenburg, preprocessed in the same way as the MNIST data. Using an MNIST-trained convolutional network on these digits yields a classification accuracy of about 90%, substantially lower than the accuracies quoted in the previous Section. A possible cause is that the digits in Figure 8.8(a) have a more slender stroke than those in Figure 8.4. It was suggested in Ref. [112] that differences in line thickness can confuse algorithms designed to read hand-written text [113]. There are different methods for normalising the line thickness of hand-written text. Applying the method proposed in Ref. [113] to our digits results in Figure 8.8(b). The algorithm has a free parameter, T, that specifies the resulting line thickness. In Figure 8.8(b) it was taken to be T = 10, close to the average line thickness of the MNIST digits, which is approximately T ≈ 9.7. If we run an MNIST-trained convolutional network on a data set of 60 digits with normalised line thickness, it fails on only two digits. This corresponds to a classification accuracy of roughly 97%, not so bad, but not as good as the best results in Section 8.3. Note, however, that this estimate of the classification accuracy is not very precise, because the test set had only 60 digits. To obtain a better estimate, more test digits are needed. A question is of course whether there are other significant differences between our non-MNIST hand-written digits and those in the MNIST data.

At any rate, the results of this Section raise a point of fundamental importance. We have seen that convolutional networks can be trained to represent a distribution of input patterns with very high accuracy.

Figure 8.8: (a) Non-MNIST hand-written digits, preprocessed like the MNIST digits. (b) The same digits, except that the thickness of the stroke was normalised (see text). Data from Oleksandr Balabanov.

But the network may not work as well on a data set with a slightly different input distribution, perhaps because the patterns were preprocessed differently, or because they were slightly deformed in other ways.

8.5 Deep learning for object recognition

Deep learning has become so popular in the last few years because deep convolutional networks are good at recognising objects in images. Figure 8.9 shows a frame from a movie taken by a data-collection vehicle. A convolutional network was trained to recognise objects, and to localise them in the image by means of bounding boxes around the objects. Convolutional networks excel at this task, as demonstrated by the ImageNet large-scale visual recognition challenge (ILSVRC) [114], a competition for object recognition and localisation in images, based upon the ImageNet database [98]. The challenge is based on a subset of ImageNet. The training set contains more than 10^6 images manually classified into one of 1000 classes. There are approximately 1000 images for each class. The validation set contains 50 000 images.

The ILSVRC challenge consists of several tasks. One task is image classification: to list the object classes found in the image.

Figure 8.9: Object recognition using a deep convolutional network. Shown is a frame from a movie recorded by a data-collection vehicle of the company Zenseact. The neural net recognises pedestrians, cars, and lorries, and localises them in the image by bounding boxes. Copyright © Zenseact AB 2020. Reproduced with permission.

A common measure of accuracy for this classification task is the so-called top-5 error. The algorithm lists the five object classes it identified with the highest softmax outputs. The result is considered correct if the annotated class is among these five. The error equals the fraction of incorrectly classified images. Why does one not simply judge whether the most probable class is the correct one? The reason is that the images in the ImageNet database are annotated by a single-class identifier, which is often not unique. The image in Figure 8.9, for example, shows not only a car but also trees, yet the image is annotated with the class label car. This is ambiguous. The ambiguity is significantly smaller if one considers the top five classes the algorithm gives, and checks whether the annotated class is among them.

The tasks in the ILSVRC challenge are significantly more difficult than the digit recognition described in Section 8.3. One reason is that the ImageNet classes are organised into a deep hierarchy of subclasses. This results in highly specific subclasses that can be very difficult to distinguish. The algorithm must be very sensitive to small differences between similar subclasses. We say that the algorithm must have high inter-class variability [117]. Different images in the same subclass, on the other hand, may look quite different. The algorithm should nevertheless recognise them as similar, belonging to the same class; the algorithm should have small intra-class variability [117].
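The top-5 error is straightforward to compute from the softmax outputs. A small NumPy sketch (array shapes and names are illustrative assumptions):

import numpy as np

def top5_error(outputs, labels):
    """Fraction of images whose annotated class is not among the five classes
    with the largest softmax outputs."""
    top5 = np.argsort(outputs, axis=1)[:, -5:]          # indices of the 5 largest outputs
    hits = np.array([label in row for row, label in zip(top5, labels)])
    return 1.0 - hits.mean()

# Example with random outputs over 1000 classes for 4 images.
outputs = np.random.rand(4, 1000)
labels = np.array([3, 141, 592, 653])
print(top5_error(outputs, labels))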

Figure 8.10: Smallest classification error for the ImageNet challenge [114]. The data up to 2014 comes from Ref. [114]. The data for 2015 comes from Ref. [91], for 2016 from Ref. [115], and for 2017 from Ref. [116]. From 2012 onwards the smallest error was achieved by convolutional neural networks. After Fig. 1.12 in Goodfellow et al. [4].

Since 2012, algorithms based on deep convolutional networks have won the ILSVRC challenge. Figure 8.10 shows that the error decreased significantly until 2017, the last year of the challenge in the form described above. We saw in previous Sections that deep networks are difficult to train. So how can these algorithms work so well? It is generally argued that the recent success of deep convolutional networks is mainly due to three factors.

First, there are now much larger and better annotated training sets available. ImageNet is an example. Excellent training data is now recognised as one of the most important factors. Companies developing software for self-driving cars and systems that help to avoid accidents recognise that good training sets are indispensable. At the same time, it is a challenge to create high-quality training data, because one must manually collect and annotate the data (Figure 8.11). This is costly, also because it is important to have as large data sets as possible, in order to reduce overfitting. In addition one must aim for a large variability in the collected data.

Second, the hardware is much better today. Deep networks are nowadays implemented on single or multiple GPUs. There are now even dedicated chips.

Third, improved regularisation techniques (Section 7.6) help to fight overfitting, and ReLU neurons (Section 7.3) render the networks less susceptible to the vanishing-gradient problem (Section 7.2).

The winning algorithm for 2012 was based on a network with five convolution layers and three fully connected layers, using drop out, ReLU units, and data-set augmentation [106]. The algorithm was implemented on GPU processors. The 2013

Figure 8.11: Reproduced from xkcd.com/1897 under the creative commons attribution-noncommercial 2.5 license.

ILSVRC challenge was also won by a convolutional network [118], with 22 layers. Nevertheless, this network has substantially fewer free parameters (weights and thresholds) than the 2012 network: 4 × 10^6 instead of 60 × 10^6. In 2015, the winning algorithm [91] had 152 layers. One significant new element in the layout was the idea to allow connections that skip layers (residual networks, Section 7.4). The best algorithms in 2016 [119] and 2017 [116] used ensembles of convolutional networks, where the classification is based on the ensemble average of the outputs.

8.6 Summary

Convolutional networks can be trained to recognise objects in images with high accuracy. An advantage of convolutional networks is that they have fewer weights than fully connected networks with the same number of neurons, and that the weights of a given feature map are trained on different parts of the input images, effectively increasing the size of the training set. This helps against overfitting. Another view is that the hidden neurons are forced to agree on a particular choice of weights: they must compromise. This yields a more robust training result.

It is sometimes stated that convolutional networks are now better than humans, in that they recognise objects with lower classification errors than humans do [120]. This and similar statements refer to an experiment showing that the human classification error in recognising objects in the ImageNet database is about 5.1% [121], worse than that of the most recent convolutional neural-network algorithms (Figure 8.10).

This notion is not unproblematic, for several reasons. To begin with, the article [120] refers to the 2015 ILSVRC competition, where the top scores were quite similar, and it has been debated whether interpreting the rules of the competition in different ways allowed competitors to gain an advantage. Second, and more importantly, it is clear that these algorithms learn in quite a different way from humans. The algorithms can detect local features, but since these convolutional networks rely on translational invariance, they do not easily understand global features, and can mistake a leopard-patterned sofa for a leopard [122]. It may help to include more leopard-patterned sofas in the training set, but the essential difficulty remains: translational invariance imposes constraints on what convolutional networks can learn [122]. More fundamentally, one may argue that humans learn differently, by abstraction instead of going through very large training sets.

We have also seen that convolutional networks are sensitive to small changes in the input data. Convolutional networks excel at learning the properties of a given input distribution, but they may have difficulties in recognising patterns sampled from a slightly different distribution, even if the two distributions appear very similar to the human eye. Note also that this problem cannot be solved by cross validation, because training and validation sets are drawn from the same input distribution; here we are concerned with what happens when the network is applied to an input distribution different from the one it was trained on. Here is another example illustrating this point: the authors of Ref. [123] trained a convolutional network on perturbed grayscale images from the ImageNet data base, adding a little bit of noise independently to each pixel before training. This network failed to recognise images that were weakly perturbed in a different way, by setting a small number of pixels to white or black. When we look at the images, we have no difficulties seeing through the noise.

Refs. [124, 125] illustrate intriguing failures of convolutional networks. Szegedy et al. [124] demonstrate that the way convolutional networks partition input space can lead to surprising results. The authors took an image that the network classifies correctly with high confidence, and perturbed it slightly. The perturbation was not random, but specifically designed to push the input pattern over a decision boundary. The difference between the original and perturbed images (adversarial images) is undetectable to the human eye, yet the network misclassifies the perturbed image with high confidence [124]. This indicates that decision boundaries are always close in high-dimensional input space. Figure 1 in Ref. [125] shows images that are completely unrecognisable to the human eye. Yet a convolutional network classifies these images with high confidence. This illustrates that there is no telling what a network may do if the input is far away from the training distribution. Unfortunately the network can sometimes be highly confident yet wrong.

Nevertheless, despite these problems, deep convolutional networks have enjoyed tremendous success in image classification during the past years. To conclude, convolutional networks are very good at recognising objects in images. But they do not understand what they see in the same way as humans. Also, the theory of deep learning has somewhat lagged behind the practical successes.
Some progress has been made in recent years, but there are many open questions.

8.7 Further reading

The online book of Nielsen [5] is an excellent introduction to convolutional neural networks, and guides the reader through all the steps required to program a convolutional network to recognise hand-written digits.

What do the hidden layers in a convolutional network actually compute? Feature maps that are directly coupled to the inputs detect local features, such as edges or corners. Yet it is unclear precisely how hidden convolutional layers help the network to learn. To which input features do the neurons of a certain hidden layer react most strongly? Input patterns chosen to maximise the outputs of neurons in a given layer [126, 127] reveal intricate geometric structures that defy straightforward interpretation. An example is shown on the cover image of this book, see also Exercise 8.7.

It has been suggested that more general models, normally used for natural-language processing, may outperform convolutional nets [128] in image-processing tasks when there is enough data. An advantage is that these models do not rely on translational invariance, unlike convolutional networks.

8.8 Exercises

8.1 Number of parameters of a convolutional network. A convolutional network has the following layout (Figure 8.12): an input layer of size 21 × 21 × 3; a convolutional layer with ReLU activations, with 16 kernels with local receptive fields of size 2 × 2, stride (1,1), and padding (0,0,0,0); a max-pooling layer with local receptive field of size 2 × 2, stride (2,2), and padding (0,0,0,0); a fully connected layer with 20 neurons with sigmoid activations; and a fully connected output layer with 10 neurons. In one or two sentences, explain the function of each of the layers. Enter the values of the parameters x_1, y_1, z_1, x_2, ..., y_5 into Figure 8.12 and determine the number of trainable parameters (weights and thresholds) for the connections into each layer of the network.

Figure 8.12: Layout of convolutional network for Exercise 8.1.

8.2 Convolutional network. Figure 8.3 shows a schematic layout of a convolutional net. Explain how a convolution layer works. In your discussion, refer to the terms convolution, colour channel, receptive field, feature map, stride, and explain the meaning of the parameters in the computation rule

V_{ij} = g\Big(\sum_{p=1}^{P}\sum_{q=1}^{Q} w_{pq}\, x_{p+s(i-1),\,q+s(j-1)} - \theta\Big).    (8.6)

Explain how a pooling layer works, and why it is useful.

8.3 Feature map. The two patterns shown in Figure 8.13(a) are processed by a very simple convolutional network that has one convolution layer with one single 3 × 3 kernel with ReLU neurons, zero threshold, weights as given in Figure 8.13(b), and stride (1,1). The resulting feature map is fed into a 3 × 3 max-pooling layer with stride (1,1). Finally there is a fully connected classification layer with two output neurons with Heaviside activation functions (binary threshold units). For both patterns determine the resulting feature map and the output of the max-pooling layer. Determine weights and thresholds of the classification layer that allow the network to classify the two patterns into different classes.

8.4 Distorted MNIST digits. Train a convolutional network on the MNIST data set. Distort the patterns in the test set by adding noise, in two different ways. First, choose q pixels randomly and make them black (or leave them black). Second, choose q pixels randomly and make them white (or leave them white). Vary q and investigate the performance of the convolutional network for both noisy test sets.

8.5 CIFAR-10 data set. Train a fully connected network with two hidden layers on the CIFAR-10 data set [129], and minimise the classification error by optimising the network parameters. Compare with the performance of an optimised convolutional network.

Figure 8.13: (a) Input patterns with 0/1 bits (□ corresponds to x_i = 0 and ■ to x_i = 1). (b) 3 × 3 kernel of a feature map. ReLU neurons, zero threshold, weights either 0 or 1 (□ corresponds to w = 0 and ■ to w = 1). (Exercise 8.3).

8.6 Bars-and-stripes data set. Construct a convolutional network to classify the patterns of the bars-and-stripes data set (Figure 4.4) into patterns with bars (black columns) and stripes (black rows). Use a convolution layer with at most four 2 × 2 kernels, one pooling layer, and one fully connected layer for classification. Give all parameters of the network (weights, thresholds, padding and stride where relevant).

8.7 Visualise activations of hidden neurons. Train a convolutional network on the CIFAR-10 data set [129], and for each feature map construct the input patterns that achieve maximal activation. Use gradient ascent to modify the values of the input pixels in order to find these patterns. See the cover image of this book, and Refs. [126, 127].

9 Supervised recurrent networks

The layout of the perceptrons analysed in the previous Chapters is special. All connections are one-way, and only to the layer to the right, so that the update rule for the i-th neuron in layer ℓ becomes

V_i^{(\ell)} = g\Big(\sum_j w_{ij}^{(\ell)} V_j^{(\ell-1)} - \theta_i^{(\ell)}\Big).    (9.1)

The backpropagation algorithm relies on this feed-forward layout. It means that the derivatives \partial V_j^{(\ell-1)}/\partial w_{mn}^{(\ell)} vanish. This ensures that the outputs are nested functions of the inputs, which in turn implies the simple iterative structure of the backpropagation algorithm (Chapter 6).

In some cases it is necessary or convenient to use networks that do not have this simple layout. The Hopfield networks discussed in Part I are examples where all connections are symmetric. More general networks may have a feed-forward layout with feedbacks, as shown in Figure 9.1. Such networks are called recurrent networks. There are many different ways in which the feedbacks can act: from the output layer to hidden neurons for example, or there could be connections between the neurons in a given layer. Neurons 3 and 4 in Figure 9.1 are output neurons; they are associated with targets just as in Chapters 5 to 7. The layout of recurrent networks is very general, but because of the feedbacks we must consider how such networks can be trained. Unlike multilayer perceptrons, which represent an input-to-output mapping in terms of nested activation functions, recurrent networks are used as dynamical networks, where the iteration index t replaces the layer index ℓ:

V_i(t) = g\Big(\sum_j w_{ij}^{(vv)} V_j(t-1) + \sum_k w_{ik}^{(vx)} x_k - \theta_i^{(v)}\Big) \quad \text{for } t = 1, 2, \ldots.    (9.2)

See Figure 9.1 for a definition of the different weights; the parameters \theta_i^{(v)} are thresholds. Equation (9.2) is analogous to the deterministic McCulloch-Pitts dynamics of Hopfield networks and Boltzmann machines [cf. Equation (1.5)]. As in the case of Hopfield networks (Exercise 2.10), one may also consider a continuous network dynamics:

\tau \frac{\mathrm{d}V_i}{\mathrm{d}t} = -V_i + g\Big(\sum_j w_{ij}^{(vv)} V_j(t) + \sum_k w_{ik}^{(vx)} x_k - \theta_i^{(v)}\Big),    (9.3)

with time constant τ. We shall see in a moment why it is convenient to assume that the dynamics is continuous in t, as in Equation (9.3).

Figure 9.1: Network with a feedback connection. Neurons 1 and 2 are hidden neurons. The weights from the input x_k to the neurons V_i are denoted by w_{ik}^{(vx)}; the weight from neuron V_j to neuron V_i is w_{ij}^{(vv)}. Neurons 3 and 4 are output neurons, with prescribed target values y_i. To avoid confusion with the iteration index t, the targets are denoted by y in this Chapter.

Recurrent networks can learn in different ways. One possibility is to use a training set of pairs (x^{(\mu)}, y^{(\mu)}) with \mu = 1, \ldots, p. To avoid confusion with the iteration index t, the targets are denoted by y in this Chapter. One feeds a pattern from this set and runs the dynamics (9.2) or (9.3) for the given x^{(\mu)} until it reaches a steady state V^* (if this does not happen, the training fails). Then one adjusts the weights by one gradient-descent step using the energy function

H = \tfrac{1}{2}\sum_k (E_k^*)^2 \quad \text{where} \quad E_k^* = \begin{cases} y_k^{(\mu)} - V_k^* & \text{if } V_k \text{ is an output neuron,} \\ 0 & \text{otherwise.} \end{cases}    (9.4)

The asterisk in this Equation indicates that all variables are evaluated in the steady state, at V = V^*. Then one feeds another pattern x^{(\mu)}, finds the steady state V^*, and adjusts the weights. Instead of defining the energy function in terms of the mean-squared output errors, one could also use the negative log-likelihood function (7.44). These steps are repeated until the steady-state outputs yield the correct targets for all input patterns. This is reminiscent of the algorithms discussed in Chapters 5 to 7.

Another possibility is that inputs and targets change as functions of time t while the network dynamics runs. This allows one to solve temporal association tasks. The network is trained on a set of input sequences x(t) and corresponding target sequences y(t). In this way, recurrent networks can translate written text or recognise speech. The network can be trained by unfolding its dynamics in time, as explained in Section 9.2 (backpropagation through time), although this algorithm suffers from the vanishing-gradient problem discussed in Chapter 7.
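To make the first training scheme concrete, the sketch below iterates the discrete dynamics (9.2) for a given input pattern until the state stops changing, returning the steady state V* needed for the gradient step. As emphasised above, convergence is not guaranteed; the function name, the tanh activation, and the convergence tolerance are illustrative assumptions.

import numpy as np

def relax_to_steady_state(w_vv, w_vx, theta, x, g=np.tanh,
                          tol=1e-8, max_steps=10_000):
    """Iterate V_i(t) = g(sum_j w_vv[i,j] V_j(t-1) + sum_k w_vx[i,k] x_k - theta_i),
    Equation (9.2), until the state stops changing (if it does)."""
    V = np.zeros(w_vv.shape[0])
    for _ in range(max_steps):
        V_new = g(w_vv @ V + w_vx @ x - theta)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new                     # steady state V*
        V = V_new
    raise RuntimeError("no steady state reached; training would fail here")

# Small random network with weak feedback weights, which typically converges.
rng = np.random.default_rng(0)
N, Nin = 4, 3
V_star = relax_to_steady_state(0.1 * rng.standard_normal((N, N)),
                               rng.standard_normal((N, Nin)),
                               np.zeros(N), rng.standard_normal(Nin))
print(V_star)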

9.1 Recurrent backpropagation

This Section summarises how to generalise Algorithm 4 to recurrent networks with feedback connections. Recall the recurrent network shown in Figure 9.1. The neurons V_i have smooth activation functions, and they are connected by weights w_{ij}^{(vv)}. Several neurons may be linked to inputs x_k^{(\mu)}, with weights w_{ik}^{(vx)}. Other neurons are output units with associated target values y_i^{(\mu)}. One takes the dynamics to be continuous in time, Equation (9.3), and assumes that V(t) runs into a steady state

V(t) \to V^* \quad \text{so that} \quad \frac{\mathrm{d}V_i^*}{\mathrm{d}t} = 0.    (9.5)

Equation (9.3) implies

V_i^* = g\Big(\sum_j w_{ij}^{(vv)} V_j^* + \sum_k w_{ik}^{(vx)} x_k - \theta_i^{(v)}\Big),    (9.6)

and it is assumed that V^* is a linearly stable steady state of the dynamics (9.3), so that small perturbations δV away from V^* decay with time. The synchronous discrete dynamics (9.2) can exhibit undesirable stable periodic solutions [130], as mentioned in Section 1.3. This is a reason for using the continuous dynamics (9.3), yet convergence to the steady state is not guaranteed in this case either. Equation (9.6) is a non-linear self-consistent condition for the components of

V^*, in general difficult to solve. However, if the steady state V^* is stable, we can use the dynamics (9.3) to automatically pick out the steady-state solution V^*. This solution depends on the pattern x^{(\mu)}, but in Equations (9.5) and (9.6), and also in the remainder of this Section, the superscript (μ) is left out. The goal is to find weights so that the outputs give the correct target values in the steady state. To this end one uses gradient descent on the energy function (9.4). Consider first how to adjust the weights w_{mn}^{(vv)}:

\delta w_{mn}^{(vv)} = -\eta \frac{\partial H}{\partial w_{mn}^{(vv)}} = \eta \sum_k E_k^*\, \frac{\partial V_k^*}{\partial w_{mn}^{(vv)}}.    (9.7)

One calculates the gradients of V^* by differentiating Equation (9.6):

\frac{\partial V_i^*}{\partial w_{mn}^{(vv)}} = g'(b_i^*)\, \frac{\partial b_i^*}{\partial w_{mn}^{(vv)}} = g'(b_i^*)\Big(\delta_{im} V_n^* + \sum_j w_{ij}^{(vv)} \frac{\partial V_j^*}{\partial w_{mn}^{(vv)}}\Big).    (9.8)

Here b_i^* = \sum_j w_{ij}^{(vv)} V_j^* + \sum_k w_{ik}^{(vx)} x_k - \theta_i^{(v)} is the local field in the steady state. Equation (9.8) is a self-consistent equation for the gradient, as opposed to the explicit equations we found in Chapter 6. The reason for the difference is that the recurrent network has feedbacks. Since Equation (9.8) is linear in the gradients, it can be solved by matrix inversion, at least formally. In terms of the matrix L with elements

L_{ij} = \delta_{ij} - g'(b_i^*)\, w_{ij}^{(vv)},    (9.9)

Equation (9.8) can be written as

\sum_j L_{ij}\, \frac{\partial V_j^*}{\partial w_{mn}^{(vv)}} = \delta_{im}\, g'(b_i^*)\, V_n^*.    (9.10)

If L is invertible, one applies \sum_i [L^{-1}]_{ki} to both sides. Using the fact that \sum_i [L^{-1}]_{ki} L_{ij} = \delta_{kj}, one finds:

\frac{\partial V_k^*}{\partial w_{mn}^{(vv)}} = [L^{-1}]_{km}\, g'(b_m^*)\, V_n^*.    (9.11)

Inserting this result into (9.7) one obtains:

\delta w_{mn}^{(vv)} = \eta \sum_k E_k^*\, [L^{-1}]_{km}\, g'(b_m^*)\, V_n^*.    (9.12)

This learning rule can be written in the form of the backpropagation rule (6.10) by introducing the error

\Delta_m^* = g'(b_m^*) \sum_k E_k^*\, [L^{-1}]_{km}.    (9.13)

Then the learning rule (9.12) takes the form

\delta w_{mn}^{(vv)} = \eta\, \Delta_m^*\, V_n^*.    (9.14)

If there are no recurrent connections, then L_{ij} = \delta_{ij}. In this case Equation (9.14) reduces to the standard expression (6.6b), Exercise 9.1. The learning rule for the weights w_{mn}^{(vx)} is derived in an analogous fashion, and the result is:

\delta w_{mn}^{(vx)} = \eta\, \Delta_m^*\, x_n.    (9.15)

The learning rules (9.14) and (9.15) are well defined only if the matrix L is invertible; otherwise the solution (9.11) does not exist. Also, matrix inversion is an expensive operation. As described in Chapter 5, one can try to avoid the problem by finding the inverse iteratively. The trick [1] is to write down a dynamical equation for \Delta_i that has a steady state at the solution of Equation (9.13):

\tau \frac{\mathrm{d}}{\mathrm{d}t}\Delta_j = -\Delta_j + g'(b_j^*)\, E_j^* + \sum_i \Delta_i\, w_{ij}^{(vv)}\, g'(b_j^*).    (9.16)

It is left as an exercise (Exercise 9.2) to verify that the dynamics (9.16) has a steady state satisfying Equation (9.13). Equation (9.16) is written in a form that stresses that (9.16) and (9.3) exhibit the same duality as Algorithm 4, between forward propagation of states of neurons and backpropagation of errors. The sum in Equation (9.16) has the same form as the recursion for the errors in Algorithm 4, except that there are no layer indices ℓ here.
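For small networks one can evaluate Δ* directly by solving the linear system, instead of iterating the dual dynamics (9.16). The following sketch implements one weight update according to Equations (9.9) and (9.13)-(9.15), assuming tanh activations and that every neuron carries a target (neurons without targets would simply contribute zero error); it also assumes that the forward dynamics converges within the fixed number of iterations.

import numpy as np

def recurrent_backprop_step(w_vv, w_vx, theta, x, y, eta=0.01):
    """One gradient step of recurrent backpropagation, Eqs. (9.9)-(9.15)."""
    g = np.tanh
    g_prime = lambda b: 1.0 - np.tanh(b) ** 2

    # Relax to the steady state V* (Equation (9.6)); convergence is assumed here.
    V = np.zeros(w_vv.shape[0])
    for _ in range(1000):
        V = g(w_vv @ V + w_vx @ x - theta)
    b = w_vv @ V + w_vx @ x - theta             # steady-state local fields b*
    E = y - V                                   # steady-state errors E*

    # L_ij = delta_ij - g'(b_i*) w_ij^(vv),  Equation (9.9).
    L = np.eye(len(V)) - np.diag(g_prime(b)) @ w_vv
    # Delta_m* = g'(b_m*) sum_k E_k* [L^{-1}]_{km},  Equation (9.13),
    # obtained by solving L^T u = E instead of inverting L.
    Delta = g_prime(b) * np.linalg.solve(L.T, E)

    dw_vv = eta * np.outer(Delta, V)            # Equation (9.14)
    dw_vx = eta * np.outer(Delta, x)            # Equation (9.15)
    return dw_vv, dw_vx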

Equation (9.16) admits the steady state (9.13). But does \Delta_i(t) converge to \Delta_i^*? For convergence it is necessary that the steady state is linearly stable. Whether or not this is the case is determined by linear stability analysis [84]. One asks: does a small deviation from the steady state increase or decrease under Equation (9.16)? In other words, if one writes

V(t) = V^* + \delta V(t) \quad \text{and} \quad \Delta(t) = \Delta^* + \delta\Delta(t),    (9.17)

do \delta V(t) and \delta\Delta(t) grow in magnitude? To answer this question, one inserts this ansatz into (9.3) and (9.16), and linearises:

\tau \frac{\mathrm{d}}{\mathrm{d}t}\,\delta V_i = -\delta V_i + g'(b_i^*) \sum_j w_{ij}^{(vv)}\, \delta V_j \approx -\sum_j L_{ij}\, \delta V_j,    (9.18a)

\tau \frac{\mathrm{d}}{\mathrm{d}t}\,\delta\Delta_j = -\delta\Delta_j + \sum_i \delta\Delta_i\, w_{ij}^{(vv)}\, g'(b_j^*) \approx -\sum_i \delta\Delta_i\, g'(b_i^*)\, L_{ij}/g'(b_j^*).    (9.18b)

Equation (9.18a) shows that whether or not the norm of \delta V(t) grows is determined by the eigenvalues of the matrix L. We say that V^* is a linearly stable steady state of Equation (9.3) if all eigenvalues of L have positive real parts. In this case |\delta V(t)| \to 0. If at least one eigenvalue of L has a negative real part, then |\delta V| grows; in this case we say that V^* is linearly unstable. Since the matrix with elements g'(b_i^*) L_{ij}/g'(b_j^*) has the same eigenvalues as L, \Delta^* is a stable steady state of (9.16) if V^* is a stable steady state of (9.3). If this is not the case, the algorithm does not converge.

In summary, recurrent backpropagation is analogous to backpropagation (Algorithm 4) for layered feed-forward networks, save for two differences. First, the non-linear network dynamics is no longer a simple input-to-output mapping with nested activation functions, but a non-linear dynamics that may (or may not) converge to a steady state. Second, the feedbacks give rise to linear self-consistent equations for the steady-state gradients \partial V_j^*/\partial w_{mn}, which can be viewed as steady-state conditions for a dual dynamics of the errors.

The main conclusion of this Section is that convergence of the training is not guaranteed if the network has feedback connections (for a layered feed-forward network without feedbacks, recurrent backpropagation simplifies to stochastic gradient descent, Algorithm 4, see Exercise 9.1). This explains why stochastic gradient descent is used mostly for multi-layer networks with feed-forward layout: it simply does not work very well for networks with feedbacks. However, it is possible to get rid of the feedbacks in recurrent networks by unfolding the dynamics in time. This is described in the next Section.

Figure 9.2: Left: recurrent network with one input terminal, one hidden neuron, and one output neuron. Right: the same network, but unfolded in time. The weights w^{(vv)} remain unchanged as drawn; also the weights w^{(vx)} and w^{(ov)} remain unchanged (not drawn). After Figs. 7 and 8 in Ref. [132].

9.2 Backpropagation through time

Recurrent networks can be used to learn sequential inputs, as in speech recognition and machine translation. The training set consists of time sequences [x(t), y(t)] of inputs and targets. The network is trained on the sequences and learns to predict the targets. In this context the layout differs from the one described in the previous Section. There are two main differences. Firstly, the inputs and targets depend on t, and one uses a discrete-time update rule. Secondly, separate output neurons O_i(t) are added to the layout. The update rule takes the form

V_i(t) = g\Big(\sum_j w_{ij}^{(vv)} V_j(t-1) + \sum_k w_{ik}^{(vx)} x_k(t) - \theta_i^{(v)}\Big),    (9.19a)

O_i(t) = g\Big(\sum_j w_{ij}^{(ov)} V_j(t) - \theta_i^{(o)}\Big).    (9.19b)

The activation function of the output neurons O_i can be different from that of the hidden neurons V_j. One possibility is to use the softmax function for the outputs [131, 132]. For the hidden units one often uses tanh activations.

To train recurrent networks with time-dependent inputs and with the dynamics (9.19), one uses backpropagation through time. The idea is to unfold the network in time to get rid of the feedbacks. The price paid is that one obtains large networks in this way, with as many copies of the original neurons as there are time steps. The procedure is illustrated in Figure 9.2 for a recurrent network with one hidden neuron, one input terminal, and one output neuron. The unfolded network has T inputs and outputs. It can be trained in the usual way with stochastic gradient descent. The errors are calculated using backpropagation as in Algorithm 4, but here the error is propagated back in time, not from layer to layer. The energy function is the squared error summed over all time steps,

H = \tfrac{1}{2}\sum_{t=1}^{T} E_t^2 \quad \text{with} \quad E_t = y_t - O_t.    (9.20)

One could use the negative log-likelihood function (7.38), but here we use the squared output-error function (9.20). There is only one hidden neuron in our example, and the inputs and outputs are also one-dimensional. Here and in the following we write the time argument as a subscript, O_t instead of O(t) and so forth, because there is no risk of confusion with additional subscripts. Consider first how to adjust the weight w^{(vv)}. Gradient descent (5.24) yields

\delta w^{(vv)} = \eta \sum_{t=1}^{T} E_t\, \frac{\partial O_t}{\partial w^{(vv)}} = \eta \sum_{t=1}^{T} \Delta_t\, w^{(ov)}\, \frac{\partial V_t}{\partial w^{(vv)}}.    (9.21a)

Here

\Delta_t = E_t\, g'(B_t)    (9.22)

is an output error, and B_t = w^{(ov)} V_t - \theta^{(o)} is the local field of the output neuron at time t [Equation (9.19)]. Equation (9.21a) is similar to the learning rule for recurrent backpropagation, Equations (9.7) and (9.8).

The derivative \partial V_t/\partial w^{(vv)} is evaluated with the chain rule, as usual. Using Equation (9.19a) yields the recursion

\frac{\partial V_t}{\partial w^{(vv)}} = g'(b_t)\Big(V_{t-1} + w^{(vv)}\, \frac{\partial V_{t-1}}{\partial w^{(vv)}}\Big)    (9.23)

for t \geq 1. Since \partial V_0/\partial w^{(vv)} = 0, Equation (9.23) implies:

\frac{\partial V_1}{\partial w^{(vv)}} = g'(b_1) V_0,
\frac{\partial V_2}{\partial w^{(vv)}} = g'(b_2) V_1 + g'(b_2) w^{(vv)} g'(b_1) V_0,
\frac{\partial V_3}{\partial w^{(vv)}} = g'(b_3) V_2 + g'(b_3) w^{(vv)} g'(b_2) V_1 + g'(b_3) w^{(vv)} g'(b_2) w^{(vv)} g'(b_1) V_0,
\vdots
\frac{\partial V_{T-1}}{\partial w^{(vv)}} = g'(b_{T-1}) V_{T-2} + g'(b_{T-1}) w^{(vv)} g'(b_{T-2}) V_{T-3} + \ldots,
\frac{\partial V_T}{\partial w^{(vv)}} = g'(b_T) V_{T-1} + g'(b_T) w^{(vv)} g'(b_{T-1}) V_{T-2} + \ldots.

Equation (9.21a) says that we must sum over t. Regrouping the terms in this sum yields:

\Delta_1 \frac{\partial V_1}{\partial w^{(vv)}} + \Delta_2 \frac{\partial V_2}{\partial w^{(vv)}} + \Delta_3 \frac{\partial V_3}{\partial w^{(vv)}} + \ldots
= [\Delta_1 g'(b_1) + \Delta_2 g'(b_2) w^{(vv)} g'(b_1) + \Delta_3 g'(b_3) w^{(vv)} g'(b_2) w^{(vv)} g'(b_1) + \ldots]\, V_0
+ [\Delta_2 g'(b_2) + \Delta_3 g'(b_3) w^{(vv)} g'(b_2) + \Delta_4 g'(b_4) w^{(vv)} g'(b_3) w^{(vv)} g'(b_2) + \ldots]\, V_1
+ [\Delta_3 g'(b_3) + \Delta_4 g'(b_4) w^{(vv)} g'(b_3) + \Delta_5 g'(b_5) w^{(vv)} g'(b_4) w^{(vv)} g'(b_3) + \ldots]\, V_2
\vdots
+ [\Delta_{T-1} g'(b_{T-1}) + \Delta_T g'(b_T) w^{(vv)} g'(b_{T-1})]\, V_{T-2}
+ [\Delta_T g'(b_T)]\, V_{T-1}.

To write the learning rule in the usual form, we define errors \delta_t recursively:

\delta_t = \begin{cases} \Delta_T\, w^{(ov)}\, g'(b_T) & \text{for } t = T, \\ \Delta_t\, w^{(ov)}\, g'(b_t) + \delta_{t+1}\, w^{(vv)}\, g'(b_t) & \text{for } 0 < t < T. \end{cases}    (9.24)

Then the learning rule takes the form

\delta w^{(vv)} = \eta \sum_{t=1}^{T} \delta_t\, V_{t-1},    (9.25)

just like Equation (6.9), or like the error recursion in Algorithm 4. The factor w^{(vv)} g'(b_{t-1}) in the recursion (9.24) gives rise to a product of many such factors in \delta_t when T is large, exactly as described in Section 7.2 for multilayer perceptrons. This means that the training of recurrent networks suffers from unstable gradients, as backpropagation of multilayer perceptrons does: if the factors |w^{(vv)} g'(b_p)| are smaller than unity, then the errors \delta_t become very small when t becomes small (vanishing-gradient problem). This means that the early states of the hidden neuron no longer contribute to the learning, causing the network to forget what it has learned about early inputs. When |w^{(vv)} g'(b_p)| > 1, on the other hand, exploding gradients make learning impossible. In summary, unstable gradients in recurrent neural networks occur much in the same way as in multilayer perceptrons. The resulting difficulties for training recurrent neural networks are discussed in more detail in Section 9.3, see also Ref. [133].

A slight variation of the above algorithm (truncated backpropagation through time) suffers less from the exploding-gradient problem. The idea is that the exploding gradients are tamed by truncating the memory. This is achieved by limiting the error propagation backwards in time: errors are computed back to T - \tau and not further, where \tau is the truncation time [2]. Naturally this implies that long-time correlations cannot be learnt.

The learning rules for the weights w^{(vx)} are obtained in a similar fashion. Equation (9.19a) yields the recursion

\frac{\partial V_t}{\partial w^{(vx)}} = g'(b_t)\Big(x_t + w^{(vv)}\, \frac{\partial V_{t-1}}{\partial w^{(vx)}}\Big).    (9.26)

This looks just like Equation (9.23), except that V_{t-1} is replaced by x_t. As a consequence we have

\delta w^{(vx)} = \eta \sum_{t=1}^{T} \delta_t\, x_t.    (9.27)

The learning rule for w^{(ov)} is simpler to derive. From Equation (9.19b) we find by differentiation w.r.t. w^{(ov)}:

\delta w^{(ov)} = \eta \sum_{t=1}^{T} E_t\, g'(B_t)\, V_t = \eta \sum_{t=1}^{T} \Delta_t\, V_t.    (9.28)

How are the thresholds \theta^{(v)} adjusted? Going through the above derivation we see that we must replace V_{t-1} in Equation (9.25) by -1. It works in the same way for the output threshold.

In order to keep the formulae simple, we derived the algorithm for a single hidden neuron, a single output neuron, and one-component inputs, so that we could leave out the indices referring to different hidden neurons, and to different input and output components. If we consider several hidden and output neurons and multi-dimensional inputs, the structure of the Equations remains exactly the same, except for a number of extra sums over those indices [Equation (9.29)]; the full procedure is summarised in Algorithm 7.

Algorithm 7: backpropagation through time
  initialise weights w_mn^{(vv)}, w_mn^{(vx)}, w_mn^{(ov)} and thresholds θ_m^{(v)}, θ_m^{(o)};
  for τ = 1, ..., τ_max do
    choose input sequence x(1), ..., x(T);
    initialise V_j(0) = 0;
    for t = 1, ..., T do
      propagate forward:
        b_i(t) ← Σ_j w_ij^{(vv)} V_j(t-1) + Σ_k w_ik^{(vx)} x_k(t) - θ_i^{(v)}  and  V_i(t) ← g[b_i(t)];
      compute outputs:
        B_i(t) ← Σ_j w_ij^{(ov)} V_j(t) - θ_i^{(o)}  and  O_i(t) ← g[B_i(t)];
    end for
    compute errors for t = T [targets y_i(T)]:
      Δ_i(T) ← [y_i(T) - O_i(T)] g'[B_i(T)]  and  δ_j(T) ← Σ_i Δ_i(T) w_ij^{(ov)} g'[b_j(T)];
    for t = T, ..., 2 do
      propagate backwards:
        Δ_i(t-1) ← [y_i(t-1) - O_i(t-1)] g'[B_i(t-1)]  and
        δ_j(t-1) ← Σ_i Δ_i(t-1) w_ij^{(ov)} g'[b_j(t-1)] + Σ_i δ_i(t) w_ij^{(vv)} g'[b_j(t-1)];
    end for
    δw_mn^{(vv)} = 0, δw_mn^{(vx)} = 0, δw_mn^{(ov)} = 0, δθ^{(v)} = 0, δθ^{(o)} = 0;
    for t = 1, ..., T do
      δw_mn^{(vv)} ← δw_mn^{(vv)} + η δ_m(t) V_n(t-1);
      δw_mn^{(vx)} ← δw_mn^{(vx)} + η δ_m(t) x_n(t);
      δw_mn^{(ov)} ← δw_mn^{(ov)} + η Δ_m(t) V_n(t);
      δθ_m^{(v)} ← δθ_m^{(v)} - η δ_m(t);
      δθ_m^{(o)} ← δθ_m^{(o)} - η Δ_m(t);
    end for
    adjust weights and thresholds: w_mn^{(vv)} ← w_mn^{(vv)} + δw_mn^{(vv)}, ...;
  end for

\delta w_{mn}^{(vv)} = \eta \sum_{t=1}^{T} \delta_m^{(t)}\, V_n^{(t-1)},    (9.29)

\delta_j^{(t)} = \begin{cases} \sum_i \Delta_i^{(T)}\, w_{ij}^{(ov)}\, g'(b_j^{(T)}) & \text{for } t = T, \\ \sum_i \Delta_i^{(t)}\, w_{ij}^{(ov)}\, g'(b_j^{(t)}) + \sum_i \delta_i^{(t+1)}\, w_{ij}^{(vv)}\, g'(b_j^{(t)}) & \text{for } 0 < t < T. \end{cases}

The second term in the recursion for \delta_j^{(t)} is analogous to the error recursion in Algorithm 4; the time index t here plays the role of the layer index ℓ in Algorithm 4. A difference is that the weights in Equation (9.29) are the same for all time steps. The algorithm is summarised in Algorithm 7.

In conclusion, we see that backpropagation through time for recurrent networks is similar to backpropagation for multilayer perceptrons. After the recurrent network is unfolded to get rid of the feedback connections, it can be trained by backpropagation, with the time index t taking the role of the layer index ℓ. Backpropagation through time is the standard approach for training recurrent networks, despite the fact that it suffers from the vanishing-gradient problem. The next Section describes how improvements to the layout make it possible to train recurrent networks more efficiently.
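For the scalar network of Figure 9.2, Algorithm 7 reduces to a few lines. The sketch below computes the weight increments (9.25), (9.27), and (9.28) for one input sequence, assuming tanh activations for both the hidden and the output neuron and zero thresholds; it is meant to illustrate the structure of the computation, not to be an efficient implementation.

import numpy as np

def bptt_increments(w_vv, w_vx, w_ov, x, y, eta=0.1):
    """Weight increments for the single-neuron network of Figure 9.2,
    computed with backpropagation through time, Eqs. (9.19)-(9.28)."""
    g = np.tanh
    g_prime = lambda b: 1.0 - np.tanh(b) ** 2
    T = len(x)

    # Forward pass: unfold the dynamics in time, with V_0 = 0.
    V, b_hidden, B_out = np.zeros(T + 1), np.zeros(T + 1), np.zeros(T + 1)
    for t in range(1, T + 1):
        b_hidden[t] = w_vv * V[t - 1] + w_vx * x[t - 1]
        V[t] = g(b_hidden[t])
        B_out[t] = w_ov * V[t]
    O = g(B_out[1:])

    # Output errors Delta_t (9.22) and the backward recursion (9.24) for delta_t.
    Delta = (y - O) * g_prime(B_out[1:])
    delta = np.zeros(T + 1)
    delta[T] = Delta[T - 1] * w_ov * g_prime(b_hidden[T])
    for t in range(T - 1, 0, -1):
        delta[t] = (Delta[t - 1] * w_ov + delta[t + 1] * w_vv) * g_prime(b_hidden[t])

    dw_vv = eta * np.sum(delta[1:] * V[:-1])             # Eq. (9.25)
    dw_vx = eta * np.sum(delta[1:] * x)                  # Eq. (9.27)
    dw_ov = eta * np.sum(Delta * V[1:])                  # Eq. (9.28)
    return dw_vv, dw_vx, dw_ov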

9.3 Vanishing gradients

Hochreiter and Schmidhuber [134] suggested replacing the hidden neurons of the recurrent network with computation units that are specially designed to reduce the vanishing-gradient problem. The method is referred to as long-short-term memory (LSTM). The basic ingredient is the same as in residual networks (Section 7.4): short cuts reduce the vanishing-gradient problem. For our purposes we can think of LSTMs as units that replace the hidden neurons. For a detailed description of LSTMs see Ref. [135].

Gated recurrent units [136] serve the same purpose as LSTMs, and they function in a similar way. It has been argued that LSTMs outperform gated recurrent units for certain tasks, but since gated recurrent units are simpler than LSTMs, the remainder of this Section focuses on them.

Figure 9.3: Gated recurrent unit. (a) The symbol refers to the standard recursion (9.19a) for the hidden variable, as in the right panel of Figure 9.2. (b) To combat the vanishing-gradient problem, the standard unit is replaced by a gated recurrent unit (9.30).

As illustrated in Figure 9.3, these units replace the hidden units of the recurrent network [with update rule (9.19a)] by a new rule:

z_m(t) = \sigma\Big(\sum_k w_{mk}^{(zx)} x_k(t) + \sum_j w_{mj}^{(zv)} V_j(t-1)\Big),    (9.30a)

r_n(t) = \sigma\Big(\sum_k w_{nk}^{(rx)} x_k(t) + \sum_j w_{nj}^{(rv)} V_j(t-1)\Big),    (9.30b)

h_i(t) = g\Big(\sum_k w_{ik}^{(hx)} x_k(t) + \sum_j w_{ij}^{(hv)} r_j(t)\, V_j(t-1)\Big),    (9.30c)

V_i(t) = [1 - z_i(t)]\, h_i(t) + z_i(t)\, V_i(t-1).    (9.30d)

The first two Equations are referred to as gates because they regulate how the values of the hidden state variables V_i are passed through the unit. Here σ(b) is the sigmoid function (6.19a). If z_m(t) = 0 for all m, and r_n(t) = 1 for all n, Equation (9.30) coincides with the standard update rule (9.19a), save for the thresholds, which were left out in Equation (9.30). As explained above, the resulting recurrent network suffers from the vanishing-gradient problem. This means that states in the past history, V(0), V(1), ..., have little effect upon the present state V(t) for t ≫ 1. The recurrent network forgets early inputs, so that it cannot learn from them. If by contrast z_m(t) = 1 for all m, then the input is passed right through the unit. Since \partial V_i(t)/\partial V_j(t-1) = \delta_{ij} in this case, the gradients do not decrease as the dynamics explores the history. However, since V(t-1) = V(t), the recurrent network merely reproduces previous states. This is analogous to skipping layers in a residual network, although comparison with Equation (7.35) reveals some differences in detail.

For the recurrent network to learn in a meaningful way from past inputs, the weights in Equation (9.30) (and the thresholds) are adjusted so that the gated recurrent unit operates between these two extreme limits. This is achieved by including the weights and thresholds of the gated recurrent unit in the gradient-descent minimisation of the energy function (9.20). The learning rules for the weights (and thresholds) are calculated in the same way as before. Using Equation (9.19b) one has:

\delta w_{mn}^{(ab)} = \eta \sum_{t=1}^{T} \sum_{ij} \Delta_i(t)\, w_{ij}^{(ov)}\, \frac{\partial V_j(t)}{\partial w_{mn}^{(ab)}},    (9.31)

where w^{(ab)} stands for w^{(zx)}, w^{(zv)}, w^{(rx)}, and so forth. The derivatives \partial V_j(t)/\partial w_{mn}^{(ab)} are evaluated using the chain rule and Equations (9.30).

It is instructive to inspect the values of z_i(t) and r_i(t) when the recurrent network operates after training. Suppose that a unit assumes small values of z_m(t) and r_n(t). This means that the update of the state variable V_i(t) is determined entirely by the instantaneous inputs x_k(t). Since the unit does not refer to the past history of the hidden-state variables, it truncates the dynamical memory. In the opposite limit, when z_i(t) ≈ r_i(t) ≈ 1, the unit can contribute to building up long-term dynamical memory. These arguments suggest that a unit with just one gate may achieve the same goal [137]:

z_m(t) = \sigma\Big(\sum_k w_{mk}^{(zx)} x_k(t) + \sum_j w_{mj}^{(zv)} V_j(t-1)\Big),    (9.32a)

h_i(t) = g\Big(\sum_k w_{ik}^{(hx)} x_k(t) + \sum_j w_{ij}^{(hv)} z_j(t)\, V_j(t-1)\Big),    (9.32b)

V_i(t) = [1 - z_i(t)]\, h_i(t) + z_i(t)\, V_i(t-1).    (9.32c)

This unit is easier to train because it has fewer parameters than the standard gated recurrent unit (9.30). Yet the additional parameters in Equation (9.30) may help to represent and exploit correlations on different time scales. LSTMs have even more parameters. How this tradeoff between ease of training and accurate representation of time correlations works out may well depend on the problem at hand. In the following Section we describe recurrent networks with LSTM units, following Refs. [131, 132].
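A forward pass through the gated recurrent unit (9.30) can be sketched as follows, with thresholds omitted as in Equation (9.30); the dictionary of weight matrices and their shapes are assumptions made for this illustration. Training these weights would proceed by backpropagation through time, as described above.

import numpy as np

def gru_step(V_prev, x, weights, g=np.tanh):
    """One update of the gated recurrent unit, Equation (9.30).
    `weights` is a dict holding the six weight matrices."""
    sigma = lambda b: 1.0 / (1.0 + np.exp(-b))                  # sigmoid gate activation
    z = sigma(weights["zx"] @ x + weights["zv"] @ V_prev)       # (9.30a)
    r = sigma(weights["rx"] @ x + weights["rv"] @ V_prev)       # (9.30b)
    h = g(weights["hx"] @ x + weights["hv"] @ (r * V_prev))     # (9.30c)
    return (1.0 - z) * h + z * V_prev                           # (9.30d)

# Random weights for a unit with 3 hidden components and 2 inputs.
rng = np.random.default_rng(1)
w = {k: rng.standard_normal((3, 2) if k.endswith("x") else (3, 3))
     for k in ["zx", "zv", "rx", "rv", "hx", "hv"]}
V = np.zeros(3)
for x in rng.standard_normal((5, 2)):      # a short input sequence
    V = gru_step(V, x, w)
print(V)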

9.4 Recurrent networks for machine translation

Recurrent networks are used for machine translation [131, 132]. How does this work? The networks are trained using backpropagation through time. The vanishing-gradient problem is dealt with by using LSTMs (Section 9.3).

How are the network inputs and outputs coded? For machine translation one represents all words in a given dictionary in terms of a code. The conceptually simplest code is one where 100... represents the first word in the dictionary, 010... the second word, and so forth.

Figure 9.4: Schematic illustration of an unfolded recurrent network for machine translation, after Refs. [131, 132]. The rectangular boxes represent the hidden states in the form of long-short-term memory (LSTM) units, see Section 9.3. Otherwise the network layout is similar to the one shown in Figure 9.2. Sutskever et al. [131] found that the network translates much better if the sentence is read in reverse order, from the end. The end-of-sentence tag marks the end of the input sentence; here it also marks the beginning of the output sentence. After Fig. 1 in Ref. [131].

The drawback of this scheme is that it does not account for the fact that two given words might be more or less closely related to each other. Other encoding schemes are described in Ref. [132]. Each input to the recurrent network is a vector with as many components as there are words in the dictionary. A sentence corresponds to a sequence x_1, x_2, ..., x_T. Each sentence ends with an end-of-sentence tag. Softmax outputs give the probability p(O_1, ..., O_{T'} | x_1, ..., x_T) of an output sequence conditional on the input sequence. The translated sentence is the one with the highest probability (it also contains the end-of-sentence tag). So both inputs and outputs are represented by high-dimensional vectors x_t and O_t.

What is the role of the hidden states, represented in terms of an LSTM? The network encodes the input sequence x_1, x_2, ..., x_T in these states. Upon encountering the end-of-sentence tag in the input sequence, the network outputs the first word of the translated sentence, using the information about the input sequence stored in V_T, as shown in Figure 9.4. The first output is fed into the next input, and the network continues to translate until it produces an end-of-sentence tag for the output sequence. In short, the network calculates the probabilities

p(O_1, \ldots, O_{T'} \mid x_1, \ldots, x_T) = \prod_{t=1}^{T'} p(O_t \mid O_1, \ldots, O_{t-1};\, x_1, \ldots, x_T),    (9.33)

where p(O_t | O_1, ..., O_{t-1}; x_1, ..., x_T) is the probability of the next word in the output sequence given the inputs and the output sequence up to O_{t-1} [138].

There is a large number of recent papers on machine translation with recurrent neural networks. Most studies are based on the training algorithm described in Section 9.2, backpropagation through time. The different approaches mainly differ in their network layouts. Google's machine translation system uses a deep network with layers of LSTMs [138]. Different hidden states are unfolded forward as well as backwards in time, as illustrated in Figure 9.5. In this Figure, the hidden states are represented by LSTMs. In the simplest case the hidden states are just encoded in hidden neurons, as in Figure 9.2 and Equation (9.19). If we represent the hidden states by neurons, as in Section 9.2, then the corresponding bidirectional network has the dynamics

V_i(t) = g\Big(\sum_j w_{ij}^{(vv)} V_j(t-1) + \sum_k w_{ik}^{(vx)} x_k(t) - \theta_i^{(v)}\Big),

U_i(t) = g\Big(\sum_j w_{ij}^{(uu)} U_j(t+1) + \sum_k w_{ik}^{(ux)} x_k(t) - \theta_i^{(u)}\Big),    (9.34)

O_i(t) = g\Big(\sum_j w_{ij}^{(ov)} V_j(t) + \sum_j w_{ij}^{(ou)} U_j(t) - \theta_i^{(o)}\Big).
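For scalar hidden states, the bidirectional dynamics (9.34) can be sketched as follows: one hidden variable is unfolded forward in time, the other backwards, and both feed into the output. The weight names and the tanh activations are assumptions made for this illustration.

import numpy as np

def bidirectional_outputs(x_seq, w, theta_v=0.0, theta_u=0.0, theta_o=0.0, g=np.tanh):
    """Outputs of the bidirectional network (9.34) with scalar hidden states:
    V runs forward in time, U runs backwards, and both feed the output."""
    T = len(x_seq)
    V, U = np.zeros(T + 1), np.zeros(T + 2)
    for t in range(1, T + 1):                       # forward unfolding
        V[t] = g(w["vv"] * V[t - 1] + w["vx"] * x_seq[t - 1] - theta_v)
    for t in range(T, 0, -1):                       # backward unfolding
        U[t] = g(w["uu"] * U[t + 1] + w["ux"] * x_seq[t - 1] - theta_u)
    return g(w["ov"] * V[1:T + 1] + w["ou"] * U[1:T + 1] - theta_o)

w = {"vv": 0.5, "vx": 1.0, "uu": 0.5, "ux": 1.0, "ov": 1.0, "ou": 1.0}
print(bidirectional_outputs(np.array([0.2, -0.1, 0.4]), w))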

Figure 9.5: Schematic illustration of a bidirectional recurrent network. The network consists of two hidden states that are unfolded in different ways. The hidden states are represented by LSTMs. After Fig. 12 in Ref. [132].

It is natural to use bidirectional networks for machine translation because correlations go either way in a sentence, forwards and backwards. In German, for example, the finite verb form is usually at the end of the sentence. Different schemes for scoring the accuracy of a translation are described by Lipton et al. [132]. One difficulty is that there are often several different valid translations of a given sentence, and the score must compare the machine translation with all of them. More recent papers on machine translation usually use the so-called BLEU score to evaluate the translation accuracy. The acronym stands for bilingual evaluation understudy. The scheme was proposed by Papineni et al. [139]. It is argued to evaluate the accuracy of a translation not too differently from humans.

9.5 Reservoir computing

An alternative to backpropagation through time for recurrent networks is reservoir computing [140]. This method has been used with success to predict chaotic dynamics [141, 142] and rare transitions in stochastic bi-stable systems [143].

Consider input data in the form of a time series x(0), ..., x(T-1) of N-dimensional vectors x(t), and a corresponding series of M-dimensional targets y(t). The goal is to train the recurrent network so that its outputs O(t) approximate the targets as precisely as possible, by minimising the energy function H = \tfrac{1}{2}\sum_{t=\tau}^{T-1}\sum_{i=1}^{M}[E_i(t)]^2, where E_i(t) = y_i(t) - O_i(t) is the output error, and τ represents an initial transient that is disregarded.

Figure 9.6 shows the layout for this task. There are N input terminals. They are connected with weights w_{jk}^{(in)} to a reservoir of hidden neurons with state variables r_j(t). The reservoir is linked to M linear output units O_i(t) with weights w_{ij}^{(out)}. The reservoir itself is a recurrent network with weights w_{ij}. The update rule is similar to Equation (9.19). There are many different versions that differ in detail [144]. One possibility is [143]

Figure 9.6: Reservoir computing (schematic). Not all connections are drawn. There can be connections from all inputs to all neurons in the reservoir (gray), and from all reservoir neurons to all output neurons.

r_i(t+1) = g\Big(\sum_j w_{ij}\, r_j(t) + \sum_{k=1}^{N} w_{ik}^{(in)} x_k(t)\Big),    (9.35a)

O_i(t+1) = \sum_j w_{ij}^{(out)}\, r_j(t+1),    (9.35b)

for t = 0, ..., T-1, with initial conditions r_j(0) = 0.

The main difference to the training algorithms described in the previous Sections of this Chapter is that the input weights w_{jk}^{(in)} and the reservoir weights w_{jk} are randomly initialised and kept constant. Only the output weights w_{jk}^{(out)} are trained by gradient descent. The idea is that the dynamics of a sufficiently large reservoir finds nonlinear, high-dimensional representations of the input data [140], not unlike sparse representations of binary classification problems embedded in a high-dimensional space (Section 5.4) that become linearly separable in this way. In addition, and this is a difference to the problem described in Section 5.4, the reservoir serves as a dynamical memory.

This requires that the reservoir states faithfully represent the input sequence: similar input sequences should yield similar reservoir activations, provided one iterates long enough. However, for random weights the recurrent reservoir dynamics can be chaotic [84]. In this case, the state of the reservoir after many iterations bears no relation to the input sequence. To avoid this, one requires that the reservoir dynamics is linearly stable. Linearising the reservoir dynamics (9.35a) gives

\delta r(t+1) = D(t+1)\,[W\, \delta r(t) + W^{(in)}\, \delta x(t)],    (9.36)

where D(t+1) is a diagonal matrix with entries D_{ii}(t+1) = g'[b_i(t+1)]. Whether or not δr grows is then determined by the eigenvalues of matrices of the form J_n^{\mathsf T} J_n, where J_n are matrix products D(n)W\, D(n-1)W \cdots D(1)W, as Equation (9.36) shows. The square roots of these eigenvalues are the singular values of the matrix J_n, denoted by \Lambda_1(n) \geq \Lambda_2(n) \geq \cdots, as in Section 7.2. At large times, the stability condition reads:

λ1 < 0, (9.37)

where \lambda_1 = \lim_{n\to\infty} n^{-1} \log \Lambda_1(n) is the maximal Lyapunov exponent. Sometimes the stability criterion is quoted in terms of the maximal eigenvalue of W. If one uses tanh activation functions and ensures that the local fields b_i(t) remain small, then the diagonal elements of D(t) remain close to unity. In this case the stability condition for the reservoir dynamics is given by the weight matrix W alone. But in general this is not the case. Note also that in general the singular values of W are different from its eigenvalues (Exercise 9.8).

For inputs with long time correlations, the reservoir must not decay too quickly, so that it can represent the dynamical correlations in the input sequence. This means that one should try to adjust Λ_1 to be as close to unity as possible. In summary, achieving a stable high-dimensional representation and reliable dynamic memory requires a delicate balance, in the same way as one must strike a balance between growing and vanishing gradients in backpropagation of multilayer perceptrons (Section 7.2).

In practice one draws the elements of W independently from a uniform distribution and rescales them in a suitable way [144]. There are many different recipes for how to set up a reservoir. One takes the weights to be uniformly distributed, or Gaussian, or assumes that w_{ij} = ±1 with equal probability. Usually one assumes that the reservoir is sparse, with only a small fraction of weights non-zero. The weight matrix W^{(in)} is usually taken to be a full matrix, and its elements are drawn from the same distribution as those of the reservoir. Lukosevicius [144] gives a practical overview over different schemes for setting up reservoir computers.

For time-series prediction, one trains the network on many input series x(0), ..., x(T-1) with targets y(t) = x(t). After training, one continues to iterate the network dynamics with inputs x(T+k) = O(T+k) to predict x(T+k+1), for k = 0, 1, 2, .... In order to represent complex spatio-temporal patterns, Pathak et al. [141] found it necessary to use multiple reservoirs. Lim et al. [143] use a multi-layer version of the reservoir dynamics, replacing Equation (9.35a) by a set of nested update rules.
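The following sketch sets up a sparse random reservoir, runs the dynamics (9.35a) on a toy input series, and fits the linear output weights. For brevity the output weights are fitted by least squares, which minimises the same quadratic energy function H in a single step; the text instead uses gradient descent, which converges to the same minimum. The reservoir size, the sparsity, and the rescaling of W (here to spectral radius 0.9) are illustrative choices.

import numpy as np

def run_reservoir(W, W_in, x_seq, g=np.tanh):
    """Reservoir states r(1), ..., r(T) from Equation (9.35a), with r(0) = 0."""
    T, N_res = len(x_seq), W.shape[0]
    r = np.zeros((T + 1, N_res))
    for t in range(T):
        r[t + 1] = g(W @ r[t] + W_in @ x_seq[t])
    return r[1:]

rng = np.random.default_rng(2)
N_res, N_in = 200, 1
# Sparse uniform reservoir weights (about 5% non-zero), rescaled towards stability.
W = rng.uniform(-1, 1, (N_res, N_res)) * (rng.random((N_res, N_res)) < 0.05)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-1, 1, (N_res, N_in))      # full input-weight matrix

# Toy input series; targets y(t) = x(t), as described in the text.
x = np.sin(0.1 * np.arange(500)).reshape(-1, 1)
r = run_reservoir(W, W_in, x[:-1])
tau = 50                                      # discard the initial transient
W_out, *_ = np.linalg.lstsq(r[tau:], x[1 + tau:], rcond=None)
print(np.abs(r[tau:] @ W_out - x[1 + tau:]).max())    # training error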

Tanaka et al. [145] describe different physical implementations of reservoir computers, based on electronic RC-circuits, optical cavities or resonators, spin-torque oscillators, or mechanical devices.
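As an illustration of Equations (9.35), the following Python sketch sets up a small reservoir computer for one-step-ahead prediction of a scalar time series. The reservoir size, sparsity, input scaling, and the toy input signal are arbitrary choices for this example, and the output weights are fitted by regularised least squares rather than by explicit gradient descent; both minimise the same quadratic output error.

import numpy as np

rng = np.random.default_rng(0)

N_in, N_res, T_train = 1, 200, 2000      # assumed sizes for this example

# Random input and reservoir weights, Equation (9.35a); only w_out is trained.
w_in = rng.uniform(-0.5, 0.5, size=(N_res, N_in))
W = rng.uniform(-1.0, 1.0, size=(N_res, N_res))
W[rng.random((N_res, N_res)) > 0.1] = 0.0          # sparse reservoir
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))    # common recipe: spectral radius < 1

def run_reservoir(inputs):
    """Iterate r(t+1) = tanh(W r(t) + w_in x(t)) with r(0) = 0, Equation (9.35a)."""
    r = np.zeros(N_res)
    states = []
    for x_t in inputs:
        r = np.tanh(W @ r + w_in @ np.atleast_1d(x_t))
        states.append(r.copy())
    return np.array(states)

# Toy scalar input series; the target at each step is the next input value.
x = np.sin(0.3 * np.arange(T_train + 1)) + 0.1 * rng.standard_normal(T_train + 1)
R = run_reservoir(x[:-1])                 # reservoir states r(1),...,r(T)
targets = x[1:]

# Output weights by regularised least squares (Equation (9.35b) is linear in w_out).
lam = 1e-6
w_out = np.linalg.solve(R.T @ R + lam * np.eye(N_res), R.T @ targets)
print("training error:", np.mean((R @ w_out - targets) ** 2))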

9.6 Summary

It is sometimes said that recurrent networks learn dynamical systems, while multilayer perceptrons learn input-output maps. This emphasises a difference in how these networks are usually used, but we should bear in mind that they are trained in similar ways, by backpropagation. Neither is it given that the tasks must differ: recurrent networks are also used to learn time-independent data. It is true, however, that tools from dynamical-systems theory help to analyse the dynamics of recurrent networks [133, 146].
Recurrent neural networks can be trained by stochastic gradient descent after unfolding the network in time, to get rid of feedback connections. This algorithm suffers from the vanishing-gradient problem. To overcome this difficulty, the hidden neurons in the recurrent network are replaced by composite units that are trained to sometimes act as residual connections, passing the signal right through, and sometimes as non-linear units that can learn correlations in a meaningful way. There are different versions, long short-term memory units and gated recurrent units. They all work in similar ways. Successful layouts for machine translation use deep bidirectional networks with layers of LSTMs.
An alternative scheme is reservoir computing, where a large reservoir of hidden neurons is used to represent correlations in the input data, and a set of linear output units is trained to learn the original sequence from such representations. The idea is that it is easier to learn intricate features of an input sequence from a high-dimensional, sparse representation of this input data.

9.7 Further reading

The training of recurrent networks is discussed in Chapter 15 of Ref. [2], see also Refs. [147, 148]. Recurrent backpropagation is described by Hertz, Krogh and Palmer [1], for a very similar network layout. How LSTMs combat the vanishing-gradient problem is explained in Ref. [135]. For a recent review of recurrent neural networks, see Ref. [132]. This webpage [149] gives a very enthusiastic overview of what recurrent networks can do. A more pessimistic view is expressed in this blog. For a review of reservoir computing, see Ref. [140].

Figure 9.7: Recurrent network with one input terminal x (t ), one hidden neuron V (t ), and one output neuron O(t ). Exercise 9.4.

9.8 Exercises

9.1 Recurrent backpropagation. Derive Eq. (9.15) for the weight increments δw_mn^(vx) in recurrent backpropagation. Show how the recurrent-backpropagation algorithm simplifies to Algorithm 4 for layered feed-forward networks when there are no feedbacks.

9.2 Steady state of error dynamics in recurrent backpropagation. Verify that the error dynamics (9.16) has a steady state satisfying Equation (9.13).

9.3 Learning rules for backpropagation through time. Derive the learning rules (9.27) and (9.28) from Equation (9.19).

9.4 Recurrent network. Figure 9.7 shows a simple recurrent network with one hidden neuron V(t), one input component x(t), and one output O(t). The network learns a time series of input-output pairs [x(t), y(t)] for t = 1,2,3,...,T. Here t is a discrete time index and y(t) is the target value at time t. The hidden unit is initialised to V(0) at t = 0. This network can be trained by backpropagation by unfolding it in time. Write down the dynamical rules for this network, the rules that determine V(t) in terms of V(t−1) and x(t), and O(t) in terms of V(t). Assume that both V(t) and O(t) have the same activation function g(b). Derive the learning rule for w^(ov) using gradient descent on the energy function H = ½ Σ_{t=1}^T E(t)² with E(t) = y(t) − O(t). Denote the learning rate by η.

9.5 Backpropagation through time. A recurrent network with two hidden neurons is shown in Figure 9.8. Write down the dynamical rules for this network. Assume that all neurons have the same activation function g(b). Draw the unfolded network. Derive the learning rules for the weights.

Figure 9.8: Recurrent network used in Exercise 9.5. After Fig. 3 in Ref. [132].

9.6 Backpropagation through time for thresholds. Derive learning rules for the thresholds θ_j^(v) and θ_i^(o) for the backpropagation-through-time algorithm.

9.7 Dual dynamics for recurrent backpropagation. Show that the dynamics (9.16) admits a steady state satisfying (9.13).

9.8 Eigenvalues and singular values. Compute the eigenvalues and the singular values of the matrices

$$A_1 = \begin{bmatrix} 1 & 2\\ 2 & 2\end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 1\\ -1 & 1\end{bmatrix}, \quad A_3 = \begin{bmatrix} 1 & 2\\ 0 & 2\end{bmatrix}. \qquad (9.38)$$

These examples illustrate that the singular values Λ_α of a symmetric matrix equal its eigenvalues λ_α. For a normal matrix, Λ_α = |λ_α|. In general, singular values and eigenvalues differ.

9.9 Time series prediction. Lorenz [150] used this system

$$\tfrac{\mathrm{d}}{\mathrm{d}t} x = -\sigma x + \sigma y, \qquad (9.39a)$$
$$\tfrac{\mathrm{d}}{\mathrm{d}t} y = -xz + rx - y, \qquad (9.39b)$$
$$\tfrac{\mathrm{d}}{\mathrm{d}t} z = xy - bz \qquad (9.39c)$$

of differential equations to model atmospheric convection. Time series generated by Equation (9.39) are chaotic and therefore difficult to predict. Train a reservoir computer on a data set of time series generated by numerical solution of Equation (9.39) for σ = 10, r = 28, and b = 8/3 (Exercise 3.2 in Nonlinear time series analysis by Kantz and Schreiber [151]). Evaluate how well the reservoir computer manages to predict the time series y(t). For background on time-series prediction, refer to Ref. [151].

PART III

LEARNING WITHOUT LABELS

Chapters 5 to 9 describe supervised learning of labeled data with neural networks. The network is trained to reproduce the correct labels (targets) for each input pattern. The analysis of unlabeled data requires different methods. Machine learning can be applied with success to large data sets of high-dimensional unlabeled data. The machine can for instance mark patterns that are typical for the given distribution, or detect outliers. Other tasks are to detect similarity, to find clusters in the data (Figure 10.1), and to determine non-linear, low-dimensional representations of high-dimensional data. More recently, such unsupervised learning algorithms have been used to generate synthetic data, patterns that resemble those in a certain data set. One possible application is data-set augmentation for supervised learning.

Learning without labels is called unsupervised learning, because there are no targets that tell the network whether it has learnt correctly or not. There is no obvious function to fit, or dynamics to learn. Instead the network organises the input data in relevant ways. This requires redundancy in the input data. It is sometimes said that unsupervised learning corresponds to learning without a teacher, implying that the network itself discovers suitable ways of organising the input data. This is inaccurate, because unsupervised networks operate with a pre-determined learning rule, like Hopfield networks.

Part III of this book is organised as follows. Chapter 10 describes unsupervised-learning algorithms, starting with unsupervised Hebbian learning to detect familiarity and similarity of input patterns (Sections 10.1 and 10.2). Related algorithms can be used to find low-dimensional non-linear projections of high-dimensional input data (self-organising maps, Section 10.3). In Section 10.4, these algorithms are compared and contrasted with a standard unsupervised clustering algorithm, K-means clustering. Section 10.5 introduces radial basis-function networks, which learn using a hybrid algorithm with supervised and unsupervised elements. Section 10.6 explains how to use layered feedforward networks for unsupervised learning.

Chapter 11 deals with learning tasks that lie in between supervised and unsupervised learning, problems where the machine receives partial feedback on its performance in the form of a penalty or a reward. Such tasks can be solved by reinforcement-learning algorithms that allow a neural network or more generally an agent to learn to reproduce outputs that tend to give positive rewards. Several algorithms for reinforcement learning are described, the associative reward-penalty algorithm (Section 11.1), temporal difference learning (Section 11.2), and Q-learning (Section 11.3). The Q-learning algorithm is illustrated by demonstrating how it allows two players to learn to compete in the board game tic-tac-toe.

Figure 10.1: Supervised learning finds decision boundaries for labeled data, like in the binary classification problem shown on the left. Unsupervised learning can find clusters in the input data (right).

Figure 10.2: Neural net for unsupervised Hebbian learning, with a single linear output unit that has weight vector w . The network output is denoted by y in this Chapter.

10 Unsupervised learning

10.1 Oja’s rule

A simple example of an unsupervised-learning algorithm uses a single McCulloch-Pitts neuron with linear activation function (Figure 10.2). The neuron computes¹ y = w·x with weight vector w = [w₁,...,w_N]^T. Now consider a distribution P_data(x) of input patterns x = [x₁,...,x_N]^T with continuous-valued components x_i. Patterns are drawn from this distribution at random and fed one after another to the net. For each pattern x, the weights w are adjusted as follows:

$$w' = w + \delta w \quad\text{with}\quad \delta w = \eta y\, x. \qquad (10.1)$$

This rule is also called Hebbian unsupervised learning rule, because it is reminiscent of Hebb's rule (Chapter 2). As usual, 0 < η ≪ 1 is the learning rate.

¹ In this Chapter we follow a common convention [1] and denote the output of unsupervised-learning algorithms by y.

What can this rule learn about the input distribution P_data(x)? Since we keep adding multiples of the pattern vectors x to the weights, the more often an input pattern occurs in the distribution P_data(x), the larger the magnitude |y| of the output becomes. So the most familiar pattern produces the largest output. In this way the network can detect how familiar certain input patterns are. A problem is that the components of the weight vector continue to grow as we keep adding. This means that the simple Hebbian learning rule (10.1) does not converge to a steady state. To analyse learning outcomes we want the learning to converge. This is achieved by adding a weight-decay term with coefficient proportional to y² to Equation (10.1):

$$\delta w = \eta y\,(x - y\, w). \qquad (10.2)$$

Making use of y = w·x = w^T x = x^T w, Equation (10.2) can be rewritten in the following form:

$$\delta w = \eta\,\big[x x^{\mathsf T} w - [w\cdot(x x^{\mathsf T})w]\,w\big]. \qquad (10.3)$$

This learning rule is called Oja's rule [152]. Equation (10.3) ensures that w remains normalised. To see why, consider an analogy: a vector q that obeys the differential equation

$$\tfrac{\mathrm{d}}{\mathrm{d}t} q = A(t)\, q. \qquad (10.4)$$

For a general matrix A(t), the norm |q| may increase or decrease. We can ensure that q remains normalised by adding a term to Equation (10.4):

$$\tfrac{\mathrm{d}}{\mathrm{d}t} w = A(t)\,w - [w\cdot A(t)\,w]\,w. \qquad (10.5)$$

The vector w turns in the same way as q, and if we set |w| = 1 initially, then w remains normalised, w = q/|q| (Exercise 10.1). Equation (10.5) describes the dynamics of the normalised orientation vector of a small rod in turbulence [153], where A(t) is the matrix of fluid-velocity gradients. Returning to Equation (10.2), we note that the dynamics of (10.2) and (10.5) is the same in the limit of small learning rates η. Therefore we conclude that w remains normalised under (10.3) when the learning rate is small enough.
Oja's algorithm is summarised in Algorithm 8. One draws a pattern x from the distribution P_data(x) of input patterns, applies it to the network, and updates the weights as prescribed in Equation (10.2). This is repeated many times. In the following we denote the average over T input patterns as ⟨···⟩ = (1/T) Σ_{t=1}^T ···.
While the rule (10.1) does not have a steady state, Oja's rule (10.3) does. For zero-mean input data, its steady state w* corresponds to the principal component of the input data, as illustrated in Figure 10.3. This can be seen by analysing the steady-state condition

$$0 = \langle \delta w\rangle_{w^*}. \qquad (10.6)$$

Algorithm 8 Oja's rule
  initialise weights randomly;
  for t = 1,...,T do
    draw an input pattern x from P_data(x);
    adjust all weights using δw = ηy(x − y w);
  end for
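As an illustration, the following Python sketch implements Algorithm 8 for an assumed two-dimensional zero-mean toy distribution (the learning rate and number of iterations are also arbitrary choices). Up to an overall sign and small fluctuations, the learned weight vector approaches the principal direction of the data, the leading eigenvector of the matrix ⟨x x^T⟩ introduced below.

import numpy as np

rng = np.random.default_rng(1)

# Zero-mean Gaussian toy data with anisotropic covariance (assumed distribution).
C_true = np.array([[3.0, 1.0], [1.0, 2.0]])
patterns = rng.multivariate_normal([0.0, 0.0], C_true, size=10000)

eta = 0.01
w = rng.standard_normal(2)

for x in patterns:                    # Algorithm 8
    y = w @ x
    w += eta * y * (x - y * w)        # Oja's rule, Equation (10.2)

# Compare with the leading eigenvector of <x x^T> (agreement up to sign).
vals, vecs = np.linalg.eigh(patterns.T @ patterns / len(patterns))
print("learned weight vector:", w / np.linalg.norm(w))
print("leading eigenvector:  ", vecs[:, -1])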

Figure 10.3: Oja’s rule finds the principal component of zero-mean data (schematic).

The initial weight vector is w₀, the steady-state weight vector is w*.

Here ⟨···⟩_{w*} is an average over iterations of the learning rule (10.3) at fixed w*, the steady state. Equation (10.6) says that the weight increments δw must average to zero in the steady state, to ensure that the weights neither grow nor decrease.

Equation (10.6) is a condition upon w*. Using the learning rule (10.3), it can be written as:

$$0 = C' w^* - (w^*\cdot C' w^*)\, w^* \quad\text{with}\quad C' = \langle x x^{\mathsf T}\rangle. \qquad (10.7)$$

Equation (10.7) shows that w* must be an eigenvector² of the matrix C′, normalised to unity, |w*| = 1. But which one?
We denote the eigenvectors and eigenvalues of C′ by u_α and λ_α, and investigate the stability of w* = u_α for different values of α by linear stability analysis, just as in Section 9.1. To this end, consider a small perturbation ε_t away from w* = u_α:

$$w_t = u_\alpha + \varepsilon_t. \qquad (10.8)$$

A difference to the analysis in Section 9.1 is that the dynamics is discrete in time.

The perturbation at the next time step, ε_{t+1}, is defined by w_{t+1} = u_α + ε_{t+1}. A second difference is that the weight increment depends on the randomly chosen input pattern. In order to determine the linear stability one should iterate and then linearise the random dynamics (10.3), to see whether ε_t grows or not. However, in the limit of small learning rate it is sufficient to average over x before iterating (Exercise 10.4).

² For zero-mean input data, C′ equals the data-covariance matrix, Equation (6.24).

Figure 10.4: Input data with non-zero mean. Algorithm 8 converges to u₁, but the principal direction is u₂.

To linear order in ε_t one finds:

$$\varepsilon_{t+1} \approx \varepsilon_t + \eta\big[C'\varepsilon_t - 2(\varepsilon_t\cdot C' u_\alpha)\,u_\alpha - (u_\alpha\cdot C' u_\alpha)\,\varepsilon_t\big] = M^{(\alpha)}\varepsilon_t, \qquad (10.9)$$

where the last equality sign defines the matrix M^(α). The steady state w* = u_α is linearly stable if all eigenvalues of M^(α) have real parts with magnitudes smaller than unity.³ To determine the eigenvalues of M^(α), we use the fact that M^(α) has the same eigenvectors as C′. Since C′ is symmetric, these eigenvectors form an orthonormal basis, u_α·u_β = δ_{αβ}. As a consequence, the eigenvalues of M^(α) are simply given by

$$\Lambda^{(\alpha)}_\beta = u_\beta\cdot M^{(\alpha)} u_\beta = 1 + \eta\big[(\lambda_\beta - \lambda_\alpha) - 2\lambda_\alpha\delta_{\alpha\beta}\big]. \qquad (10.10)$$

Since C′ is a positive-semidefinite matrix (its eigenvalues λ_α cannot be negative), Equation (10.10) shows that there are eigenvalues with |Λ_β^(α)| > 1 unless w* is the leading eigenvector of C′, the one corresponding to its largest eigenvalue. This means that Algorithm 8 finds the principal component of zero-mean data, and it also implies that the algorithm maximises ⟨y²⟩ over all w with |w| = 1, see Section 6.3. Note that ⟨y⟩ = 0 for zero-mean input data.
Now consider inputs with non-zero mean. In this case Algorithm 8 still finds the maximal eigenvalue direction of C′. But for inputs with non-zero mean, this direction is different from the maximal principal direction (Section 6.3). Figure 10.4 illustrates this difference. The Figure shows three data points in a two-dimensional input plane. The elements of C′ = ⟨x x^T⟩ are

$$C' = \frac{1}{3}\begin{bmatrix} 2 & 1\\ 1 & 2\end{bmatrix}, \qquad (10.11)$$

³ For time-continuous dynamics (Section 9.1), linear stability is ensured when all eigenvalues have negative real parts; for discrete dynamics their magnitudes must all be smaller than unity [84].

with eigenvalues and eigenvectors

$$\lambda_1 = 1,\;\; u_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1\\ 1\end{bmatrix} \quad\text{and}\quad \lambda_2 = \tfrac{1}{3},\;\; u_2 = \frac{1}{\sqrt 2}\begin{bmatrix} 1\\ -1\end{bmatrix}. \qquad (10.12)$$

So the maximal eigenvalue direction of C′ is u₁. To compute the principal direction of the data we must determine the data-covariance matrix C, Equation (6.24). Its maximal eigenvalue direction is u₂, and this is the maximal principal component of the data shown in Figure 10.4.
Oja's rule can be generalised to determine M principal components of zero-mean input data using M output neurons that compute y_i = w_i·x for i = 1,...,M:

$$\delta w_{ij} = \eta y_i\Big(x_j - \sum_{k=1}^{M} y_k w_{kj}\Big). \qquad (10.13)$$

This is called Oja’s M-rule. For M = 1 it simplifies to Oja’s rule.
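The difference between the maximal eigenvalue direction of C′ and the principal direction of the data-covariance matrix C can also be checked numerically. The sketch below uses an assumed Gaussian toy distribution with non-zero mean (not the three-point data set of Figure 10.4): the leading eigenvector of C′ is dominated by the mean of the data, whereas the leading eigenvector of C picks out the direction of largest variance.

import numpy as np

rng = np.random.default_rng(2)

# Toy data with non-zero mean along x1 and largest variance along x2 (assumed).
x = rng.multivariate_normal([3.0, 0.0], [[0.1, 0.0], [0.0, 1.0]], size=5000)

C_prime = x.T @ x / len(x)            # C' = <x x^T>
C = np.cov(x, rowvar=False)           # data-covariance matrix, Equation (6.24)

for name, matrix in [("C'", C_prime), ("C ", C)]:
    vals, vecs = np.linalg.eigh(matrix)
    print(name, "leading eigenvector:", vecs[:, -1])
# C' is dominated by the mean, so its leading eigenvector points roughly along
# [1, 0]; the principal direction of C is close to [0, 1].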

10.2 Competitive learning

Oja's M-rule (10.13) results in neurons that are activated simultaneously. Any input usually causes several outputs to assume non-zero values y_i ≠ 0 at the same time. In Sections 4.5 and 7.1 we encountered the notion of a winning neuron where the weights are trained in such a way that each pattern activates only a single neuron, and different patterns activate different winning neurons. This makes it possible to represent a distribution of input patterns with a neural network. Unsupervised learning algorithms can categorise or cluster input data in this way: similar inputs are classified to belong to the same category, and activate the same winning neuron. This is called competitive learning [1]. Figure 10.5(a) shows an example, input patterns on the unit circle that cluster into two distinct clusters.

The idea is to find weight vectors w_i that point in the direction of the clusters. To this end we take M linear output units i with weight vectors w_i, i = 1,...,M.

Algorithm 9 competitive learning (Figure 10.5)

  initialise weights to vectors with random angles and norm |w_i| = 1;
  for t = 1,...,T do
    draw a pattern x from P_data(x);

    find the winning neuron i₀ (smallest angle between w_{i₀} and x);

    adjust only the weight of the winning neuron, δw_{i₀} = η(x − w_{i₀});
  end for

Figure 10.5: Detection of clusters by unsupervised learning. (a) Distribution of input patterns on the unit circle and two unit-length weight vectors initialised to random angles. The winning neuron for pattern x is the one with weight vector w 2.

(b) Updating w′₂ = w₂ + δw moves this weight vector closer to x.

We feed a pattern x from the distribution P_data(x) and define the winning neuron i₀ as the one that has minimal angle between its weight and the pattern vector x. This is illustrated in Figure 10.5(b), where i₀ = 2. Then only this weight vector is updated by adding a little bit of the difference x − w_{i₀} between the pattern vector and the weight of the winning neuron. The other weights remain unchanged:

$$\delta w_i = \begin{cases} \eta(x - w_i) & \text{for } i = i_0(x, w_1, \ldots, w_M),\\ 0 & \text{otherwise.}\end{cases} \qquad (10.14)$$

In other words, only the winning neuron is updated, w′_{i₀} = w_{i₀} + δw_{i₀}. Equation (10.14) is called the competitive-learning rule.
The learning rule (10.14) has the following geometrical interpretation: the weight of the winning neuron is drawn towards the pattern x. Upon iterating (10.14), the weight vectors are drawn to clusters of inputs. If the input patterns are normalised as in Figure 10.5, the weights end up normalised on average, even though |w_{i₀}| = 1 does not imply that |w_{i₀} + δw_{i₀}| = 1, in general.
The algorithm for competitive learning is summarised in Algorithm 9. When weight and input vectors are normalised, then the winning neuron i₀ is the one with the largest scalar product w_i·x. For linear output units y_i = w_i·x (Figure 10.2) this is simply the unit with the largest output. Equivalently, the winning neuron is the one with the smallest distance |w_i − x|. Output units with w_i that are very far away from any pattern may never be updated (dead units). There are several strategies to avoid this problem [1]. One possibility is to initialise the weights to directions found in the inputs. How to choose the number of weight vectors is a matter of trial and error. Clearly it is better to start with too many rather than too few.

Figure 10.6: Principal-component analysis (Section 6.3) finds the linear principal direction (dashed line) of the data (■). A self-organising map can instead find the principal manifold (solid line), a non-linear approximation to the data.

Finally, consider the relation between the competitive-learning rule (10.14) and Oja's rule (10.13). If we define

$$y_i = \delta_{i i_0} = \begin{cases} 1 & \text{for } i = i_0,\\ 0 & \text{otherwise,}\end{cases} \qquad (10.15)$$

then the rule (10.14) can be written in the form of Oja's M-rule:

$$\delta w_{ij} = \eta y_i\Big(x_j - \sum_{k=1}^{M} y_k w_{kj}\Big). \qquad (10.16)$$

Equation (10.16) is reminiscent of Hebb's rule (Chapter 2) with weight decay.
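The following Python sketch implements Algorithm 9 for input patterns on the unit circle forming two clusters, as in Figure 10.5 (the cluster angles, learning rate, and number of patterns are assumptions for this example). Following the strategy mentioned above, the weight vectors are initialised to directions found in the inputs, which helps to avoid dead units.

import numpy as np

rng = np.random.default_rng(3)

# Two clusters of unit-length patterns (assumed toy data, cf. Figure 10.5).
angles = np.concatenate([rng.normal(0.5, 0.1, 500), rng.normal(2.5, 0.1, 500)])
rng.shuffle(angles)
patterns = np.column_stack([np.cos(angles), np.sin(angles)])

M, eta = 2, 0.05
w = patterns[rng.choice(len(patterns), size=M, replace=False)].copy()

for x in patterns:                    # Algorithm 9
    i0 = np.argmax(w @ x)             # winning neuron: largest scalar product w_i . x
    w[i0] += eta * (x - w[i0])        # update only the winning neuron, Equation (10.14)

print("weight vectors after training:\n", w)
print("cluster directions:\n",
      np.column_stack([np.cos([0.5, 2.5]), np.sin([0.5, 2.5])]))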

10.3 Self-organising maps

In order to analyse high-dimensional data it is often useful to map the high-dimensional input patterns to a low-dimensional output space, to obtain a low-dimensional representation of the input distribution. Principal-component analysis (Section 6.3) does just that. However, it does not necessarily preserve distance. To visualise clusters or other arrangements of the input patterns, similar patterns or patterns that are close in input space should be mapped to nearby points in output space, and patterns that are far apart should be mapped to outputs that are far from each other. Such maps are called semantic or topographic maps. Moreover, principal-component analysis is a linear method. As explained in Section 6.3, it projects the data to the space spanned by the leading eigenvectors of the correlation matrix. In many cases, however, the data may not lie in a linear subspace, as illustrated in Figure 10.6. In order to project the data onto the non-linear principal manifold (solid line), a non-linear map is needed.

Figure 10.7: Kohonen's self-organising map. If patterns x^(1) and x^(2) are close in input space, then the two patterns activate neighbouring winning neurons in the output array (with coordinates r = [r₁, r₂]^T). Often the dimension of the output array is much lower than that of input space.

In neuroscience, the term topographic map refers to the relation between the spatial arrangement of stimuli and the activation patterns in certain parts of the mammalian brain. Similar patterns of visual stimuli on the retina, for instance, activate close-by regions in the visual cortex [154]. Other cognitive stimuli, auditory and sensory, are mapped in analogous ways. The complex neural networks in the mammalian cortex host large numbers of such maps, arranged in a hierarchical fashion. They represent sensory stimuli in terms of spatially localised neural activation.
How did this complex structure arise? One possibility is that the mappings are coded in the genetic sequence, that the connections are hard wired, so to speak. However, it is observed that such maps can change over time [155], leading to the hypothesis that they are learned, and that our DNA merely encodes a set of fairly simple learning rules. This motivated Kohonen [155, 156] and others to propose and analyse learning rules for topographic maps. The term self-organising map [18, 157] emphasises that the mapping develops in response to the stimuli it maps, that it learns in an unsupervised fashion.

Figure 10.8: Nearest neighbours (•) and next-nearest neighbours (◦) to the neuron at the centre (■) of the output array.

Kohonen's model for a non-linear self-organising map relies on an ordered array of output neurons, as illustrated in Figure 10.7. The map learns to activate nearby output neurons for similar inputs. This is achieved using a competitive learning rule, similar to the learning rule (10.14) described in the previous Section. In order to represent the proximity or similarity of inputs, the rule is endowed with the notion of distance in the output array, by updating not only the winning neuron, but also those that are neighbours in the output array. To this end one replaces the competitive-learning rule (10.14) by

$$\delta w_i = \eta\, h(i, i_0)\,(x - w_i), \qquad (10.17)$$

where i₀ is the index of the winning neuron, the one with weight vector closest to the input x. The neighbourhood function h(i, i₀) depends on the distance of the neurons i and i₀ in the output array. The neighbourhood function has a maximum at i = i₀ and decreases as the distance between i and i₀ increases. One possibility is to assign decreasing values to h(i, i₀) for nearest neighbours, next-nearest neighbours, and so forth (Figure 10.8). Another possibility is to use a Gaussian function of the

Euclidean distance |r_i − r_{i₀}| in the output array [1]:

$$h(i, i_0) = \exp\Big(-\frac{1}{2\sigma^2}\,|r_i - r_{i_0}|^2\Big). \qquad (10.18)$$

Here r_i is the position of neuron i in the output array (Figure 10.7). Different normalisations of the Gaussian [2] can be subsumed in different learning rates. Kohonen's rule has two parameters: the learning rate η, and the width σ of the neighbourhood function. Usually one adjusts these parameters as the learning proceeds. Typically one begins with large values for η and σ (ordering phase), and then reduces these parameters as the elastic network evolves (convergence phase): quickly at first and then in smaller steps, until the algorithm converges [1, 2, 155].
According to Equations (10.17) and (10.18), similar patterns activate nearby neurons in output space, and their weight vectors change in similar ways.
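A minimal Python sketch of Kohonen's rule (10.17) with the Gaussian neighbourhood function (10.18): a one-dimensional output array of M neurons learns a uniform two-dimensional input distribution. The sizes and the exponential schedules for η and σ are arbitrary choices of the kind described above (large values during the ordering phase, small values during the convergence phase).

import numpy as np

rng = np.random.default_rng(4)

M = 50
r = np.arange(M)                              # positions r_i in the output array
w = rng.uniform(0, 1, size=(M, 2))            # weight vectors w_i

eta0, eta1, sigma0, sigma1, T = 0.5, 0.01, 10.0, 0.5, 20000
for t in range(T):
    eta = eta0 * (eta1 / eta0) ** (t / T)     # shrink eta and sigma during training
    sigma = sigma0 * (sigma1 / sigma0) ** (t / T)
    x = rng.uniform(0, 1, size=2)             # pattern from a uniform distribution
    i0 = np.argmin(np.linalg.norm(w - x, axis=1))        # winning neuron
    h = np.exp(-(r - r[i0]) ** 2 / (2 * sigma ** 2))     # neighbourhood function (10.18)
    w += eta * h[:, None] * (x - w)           # Kohonen's rule, Equation (10.17)

# Neighbouring output neurons should end up with nearby weight vectors.
print("mean distance between neighbouring weights:",
      np.mean(np.linalg.norm(np.diff(w, axis=0), axis=1)))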

Kohonen's rule drags the winning weight vector w_{i₀} towards x, just as the competitive learning rule (10.14), but it also drags the neighbouring weight vectors along. Figure 10.9 illustrates a geometrical interpretation of Kohonen's rule [18]. We can think of the weight vectors as pointing to the nodes of an elastic net that has the same layout as the output array. As one feeds patterns from the input distribution, the weights are updated, causing the nodes of the network to move. This changes the shape of the elastic network until it resembles the shape defined by the distribution of input patterns.
Figure 10.6 shows another example where the dimensionality of the output array (one-dimensional) is lower than that of the input space (two-dimensional). The algorithm finds a non-linear approximation to the data, the principal manifold. As opposed to the principal direction in principal-component analysis, the principal manifold need not be linear. Therefore it can approximate the data more precisely, leading to a smaller residual variance (Exercise 10.7).
In summary, Kohonen's algorithm learns by distributing the weight vectors of the output neurons to reflect the distribution of input patterns. In general this works well, but problems occur at the boundaries. Why this happens is quite clear (Figure 10.9): since the density of patterns outside the parallelogram is low, the elastic network cannot be drawn very close to the boundary.
To analyse how the boundaries affect learning for Kohonen's rule, consider the steady-state condition

$$\langle \delta w_i\rangle = \frac{\eta}{T}\sum_{t=1}^{T} h(i, i_0)\,\big[x(t) - w_i^*\big] = 0. \qquad (10.19)$$

This condition is more complicated than it looks at first sight, because i₀ depends on the weights and on the patterns, as mentioned above. The steady-state condition (10.19) is very difficult to analyse in general. One of the reasons is that global geometric information is difficult to learn. It is usually much easier to learn local structures. This is particularly true in the continuum limit where we can analyse local learning progress using Taylor expansions.
The analysis of condition (10.19) in the continuum limit is due to Ritter and Schulten [158], and it is described in detail by Hertz, Krogh, and Palmer [1]. One assumes that there is a very dense network of output neurons, so that one can approximate i → r, i₀ → r₀, w_i → w(r), h(i, i₀) → h(r − r₀(x)), and (1/T) Σ_t → ∫ dx P_data(x). In this continuum limit, Equation (10.19) reads

$$\int \mathrm{d}x\, P_{\rm data}(x)\, h\big(r - r_0(x)\big)\,\big[x - w^*(r)\big] = 0. \qquad (10.20)$$

This is a condition for the steady-state learning outcome, the function w ∗(r ). In the continuum limit the position r 0(x ) of the winning neuron in the output array for pattern x is given by

$$w^*(r_0) = x. \qquad (10.21)$$

Figure 10.9: Learning a distribution P_data(x) (gray) of two-dimensional real-valued inputs x with Kohonen's algorithm. Illustration of the dynamics of the self-organising map in terms of an elastic net. (a) Initial condition. (b) Intermediate stage. (c) In the steady state the elastic network resembles the shape defined by the input distribution P_data(x).

We use this to write Equation (10.20) as:

$$\int \mathrm{d}x\, P_{\rm data}(x)\, h\big(r - r_0(x)\big)\,\big[w^*(r_0(x)) - w^*(r)\big] = 0. \qquad (10.22)$$

Equation (10.21) defines a mapping r₀(x) from input space to output space, the self-organising map (Figure 10.7). Assuming that this mapping is one-to-one, we change integration variable from x to r₀:

$$\int \mathrm{d}r_0\, |\det J|\, Q(r_0)\, h(r - r_0)\,\big[w^*(r_0) - w^*(r)\big] = 0, \qquad (10.23)$$

where Q(r₀) ≡ P_data(x(r₀)), and where the determinant represents the volume element of the variable transformation. Using Equation (10.21), the Jacobian J of the transformation has elements

$$J_{ij} = \frac{\partial w_i(r_0)}{\partial r_j}. \qquad (10.24)$$

The neighbourhood function is sharply peaked at r = r₀, and this makes it possible to evaluate the steady-state condition (10.23) approximately, expanding the integrand in δr = r₀ − r, assuming that w*(r) is a smooth function. This is illustrated in Figure 10.10, for one-dimensional inputs and outputs. We consider this special case not only to simplify the notation, but also because it is one of the few cases that admit mathematical analysis (Exercise 10.9). Expanding w*(r + δr) as shown in Figure 10.10 yields

$$w^*(r + \delta r) - w^*(r) = \frac{\mathrm{d}}{\mathrm{d}r} w^*(r)\,\delta r + \tfrac{1}{2}\frac{\mathrm{d}^2}{\mathrm{d}r^2} w^*(r)\,\delta r^2 + \ldots\,. \qquad (10.25)$$

Figure 10.10: In order to find out how the steady-state map w*(r) varies near r (gray line), one expands w* in δr around r: w*(r + δr) = w*(r) + (dw*/dr) δr + ½ (d²w*/dr²) δr² + ....

The other factors in Equation (10.23) are expanded in a similar way:

$$J(r + \delta r) = \frac{\mathrm{d}w^*}{\mathrm{d}r} + \frac{\mathrm{d}^2 w^*}{\mathrm{d}r^2}\,\delta r + \ldots\,, \qquad (10.26a)$$
$$Q(r + \delta r) = P_{\rm data}(w^*) + \delta r\,\frac{\mathrm{d}w^*}{\mathrm{d}r}\frac{\mathrm{d}}{\mathrm{d}w} P_{\rm data}(w). \qquad (10.26b)$$

Inserting these expressions into Equation (10.23), discarding terms of order higher than δr², and changing the integration variable to δr, one finds

$$0 = w'\Big[\tfrac{3}{2}\, w''\, P_{\rm data}(w) + (w')^2\, \frac{\mathrm{d}}{\mathrm{d}w} P_{\rm data}(w)\Big]\int_{-\infty}^{\infty}\mathrm{d}\delta r\; \delta r^2\, h(\delta r), \qquad (10.27)$$

where we introduced the short-hand notation w′ = (d/dr) w*(r), and we used that the neighbourhood function (10.18) is symmetric, h(−δr) = h(δr). Since the integral in Equation (10.27) is non-zero, we must either have

$$w' = 0 \qquad\text{or}\qquad \tfrac{3}{2}\, w''\, P_{\rm data}(w) + (w')^2\, \frac{\mathrm{d}}{\mathrm{d}w} P_{\rm data}(w) = 0. \qquad (10.28)$$

The first solution can be excluded because it corresponds to a singular weight distribution that does not contain any geometrical information about the input distribution P_data. The second solution gives

$$\frac{w''}{w'} = -\frac{2}{3}\, \frac{w'\,\frac{\mathrm{d}}{\mathrm{d}w} P_{\rm data}(w)}{P_{\rm data}(w)}. \qquad (10.29)$$

In other words, (d/dr) log|w′| = −(2/3) (d/dr) log P_data(w), and this means that |w′| ∝ [P_data(w)]^(−2/3). The density of output weights can be computed as

$$\varrho(w) = \int \mathrm{d}r\; \delta[w - w^*(r)], \qquad (10.30)$$

where δ(w) is the Dirac δ-function [159]. Changing variables in the δ-function,

$$\delta[w - w^*(r)] = \sum_j \frac{1}{|w'|}\bigg|_{w = w^*(r_j)} \delta(r - r_j), \qquad (10.31)$$

and assuming that the function w*(r) is one-to-one, one finds

$$\varrho(w) = \frac{1}{|w'|} = [P_{\rm data}(w)]^{2/3}. \qquad (10.32)$$

This tells us that the self-organising map learns the input distribution in the following way: the distribution of output weights in the steady state reflects the distribution of input patterns. Equation (10.32) tells us that the two distributions are not equal (equality would have been a perfect outcome). The distribution of weights is instead proportional to [P_data(w)]^(2/3). Little is known in higher dimensions, but the general idea is that the elastic network has difficulties reaching the corners and edges of the domain where the input distribution is non-zero.
The output of a self-organising map can be interpreted in different ways. For low-dimensional inputs and outputs, one can simply plot the map w*(r), as in Figure 10.7. Dense regions of weights point to regions in input space with a high density of inputs. Often the output dimension is taken to be much lower than the dimension of input space. In this case the self-organising map performs non-linear dimensionality reduction, and it can be used to find clusters in high-dimensional input data [160]. The analysis proceeds in two steps. First, one runs Kohonen's algorithm until the map has converged to a steady state. Second, one feeds all inputs into the net, and for each input one determines the location of the winning neuron in the output array. The spatial activation patterns in the output array represent clusters of similar inputs. This is illustrated in Figure 10.11, which shows how a self-organising map represents handwritten digits from the MNIST data set. To reveal the semantic map, the Figure labels clusters of outputs that correspond to the same digits (as determined by the labels in the training set). We see that the self-organising map groups the same digits together, but it has some difficulty distinguishing the digits 3 and 8, and also 4 and 9.
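Equation (10.32) can be checked with a one-dimensional self-organising map. The sketch below assumes the input density P_data(x) = 2x on [0, 1]; apart from the boundary effects discussed above, the histogram of the steady-state weights should roughly follow [P_data(w)]^(2/3) rather than P_data(w) itself.

import numpy as np

rng = np.random.default_rng(5)

M = 100
r = np.arange(M)
w = np.sort(rng.uniform(0, 1, M))             # ordered initial weights

eta0, eta1, sigma0, sigma1, T = 0.1, 0.01, 5.0, 0.5, 100000
for t in range(T):
    eta = eta0 * (eta1 / eta0) ** (t / T)
    sigma = sigma0 * (sigma1 / sigma0) ** (t / T)
    x = np.sqrt(rng.uniform())                # sample from P_data(x) = 2x on [0, 1]
    i0 = np.argmin(np.abs(w - x))
    h = np.exp(-(r - r[i0]) ** 2 / (2 * sigma ** 2))
    w += eta * h * (x - w)

# Compare the empirical weight density with the prediction (5/3) w^(2/3),
# the normalised form of [P_data(w)]^(2/3).
hist, edges = np.histogram(w, bins=10, range=(0, 1), density=True)
centres = 0.5 * (edges[1:] + edges[:-1])
print(np.column_stack([centres, hist, (5 / 3) * centres ** (2 / 3)]))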

10.4 K-means clustering

Sections 10.2 and 10.3 described different ways of finding clusters in input data. In particular, it was shown how self-organising maps can find clusters in high-dimensional input data, and represent them in a low-dimensional, non-linear projection. K-means clustering [2] is an alternative unsupervised-learning algorithm for finding clusters in the input data.

Figure 10.11: Clustering of hand-written digits (MNIST data set) with a self-organising map with a 30 × 30 output array. In the shaded regions the outputs are quite certain: here the winning neurons are activated by the indicated digit in 80% of the cases. The white regions correspond to outputs where the majority digit appears in less than 80% of the cases, or to outputs that are never activated, or only once. Schematic, based on simulations performed by Juan Diego Arango.

Let us compare and contrast this algorithm with Kohonen's self-organising map. The goal is to cluster input patterns x^(µ), µ = 1,...,p into K clusters. Usually K is much smaller than the number of inputs, p, and than the input dimension N. A solution of this task is a mapping k(µ) that associates each input x^(µ) with one of the clusters k = 1,...,K. The function k(µ) is determined by minimising the energy function

$$H(w_1,\ldots,w_K) = \frac{1}{2}\sum_{k=1}^{K}\Big(\sum_{\mu | k(\mu)=k} |x^{(\mu)} - w_k|^2\Big). \qquad (10.33)$$

The second sum is over all values of µ that satisfy k(µ) = k. The vector w_k becomes the average of all pattern vectors in cluster k, and the expression in the parentheses is the variance associated with this cluster:

$$\sigma_k^2 = \sum_{\mu | k(\mu)=k} |x^{(\mu)} - w_k|^2. \qquad (10.34)$$

Figure 10.12: Schematic illustration of the K-means clustering algorithm with two weight vectors w₁ and w₂. The radii of the disks equal s₁ and s₂, Equation (10.34).

In other words, H measures the sum of the cluster variances σ_k². A solution to the clustering problem corresponds to a local minimum of H. To determine the cluster vectors w_k and the corresponding variances σ_k², one starts from an initial guess for k(µ). For each cluster, one adjusts w_k to minimise the cluster variance:

$$w_k = \mathop{\rm argmin}_{w}\sum_{\mu | k(\mu)=k} |x^{(\mu)} - w|^2. \qquad (10.35)$$

Given the vectors w k , one optimises the encoding function

$$k(\mu) = \mathop{\rm argmin}_{1\le k\le K} |x^{(\mu)} - w_k|^2. \qquad (10.36)$$

These steps are repeated until a satisfactory solution is found. The solution is not unique, usually the algorithm converges to a local minimum of H. In practice one should try different random initialisations to find the best local minimum.
All three algorithms, competitive learning, the self-organising map, and K-means clustering move weight vectors towards clusters in input space. A difference between the self-organising map and the other two algorithms is that the self-organising map uses a neighbourhood function (so that similar inputs activate close-by neurons in the output array), and updates their weight vectors in similar fashion. In this way, a self-organising map with a large output array can find a smooth parameterisation of the principal manifold. If we shrink the neighbourhood function to the centre point in Figure 10.8, all geometric information is lost, and the self-organising map becomes equivalent to competitive learning (Algorithm 9).

Figure 10.13: Linear separation of the XOR function by the non-linear mapping

(10.37). (a) In the input plane the problem is not linearly separable. (b) In the u₁-u₂ plane the problem is homogeneously linearly separable.

Essentially, competitive learning and K-means clustering are sequential and batch versions of the same algorithm [1]. So the self-organising map becomes equivalent to K-means clustering when the neighbourhood range tends to zero.
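A minimal Python sketch of K-means clustering, alternating the steps (10.35) and (10.36) until the assignment no longer changes. The toy data set of two Gaussian clusters and the choice K = 2 are assumptions for this example; in practice one restarts from several random initialisations, as noted above.

import numpy as np

rng = np.random.default_rng(6)

def k_means(x, K, n_iter=100):
    """Alternate between Equations (10.35) and (10.36)."""
    p = len(x)
    k_of_mu = rng.integers(0, K, size=p)                 # random initial assignment
    for _ in range(n_iter):
        # (10.35): each cluster vector w_k is the mean of its assigned patterns
        w = np.array([x[k_of_mu == k].mean(axis=0) if np.any(k_of_mu == k)
                      else x[rng.integers(p)] for k in range(K)])
        # (10.36): reassign each pattern to the closest cluster vector
        k_new = np.argmin(((x[:, None, :] - w[None, :, :]) ** 2).sum(-1), axis=1)
        if np.array_equal(k_new, k_of_mu):
            break
        k_of_mu = k_new
    return w, k_of_mu

x = np.vstack([rng.normal([0, 0], 0.3, size=(200, 2)),   # two assumed Gaussian clusters
               rng.normal([2, 2], 0.3, size=(200, 2))])
w, assignment = k_means(x, K=2)
print("cluster vectors:\n", w)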

10.5 Radial basis functions

Problems that are not linearly separable can be solved by perceptrons with hidden layers, as we saw in Chapter 5. Figure 5.13(b), for example, shows a piecewise linear decision boundary parameterised by hidden neurons.
Separability can also be achieved by a non-linear transformation of input space. Figure 10.13 shows how the XOR problem can be transformed into a linearly separable problem by the transformation

$$u_1(x) = (x_2 - x_1)^2 - \tfrac{1}{2} \qquad\text{and}\qquad u_2(x) = x_2. \qquad (10.37)$$

The Figure shows the non-separable problem in the x₁-x₂ plane, and in the new coordinates u₁ and u₂. Since the problem is homogeneously linearly separable in the u₁-u₂ plane, we can solve it by a single McCulloch-Pitts neuron with weights W and zero threshold, parameterising the decision boundary as W·u(x) = 0.
It is even better to map the patterns (non-linearly) to a space of higher dimension, because Cover's theorem says that it is easier to separate the patterns there (Section 5.4): consider a set u(x) = [u₁(x),...,u_m(x)]^T of m polynomial functions of finite order that embed N-dimensional input space in an m-dimensional space. Then the probability that a problem with p points x^(µ) in N-dimensional input space is separable by a polynomial decision boundary is given by P(p, m) [Equation (5.29)] [2, 72]. Note that this probability is independent of the dimension N of input space.

Figure 10.14: Radial basis-function network for N = 2 inputs and m = 4 radial basis functions (10.38). The output neuron has a linear activation function, weights W , and zero threshold.

The question is of course how to find the non-linear mapping u(x). One possibility is to use radial basis functions. The idea is to parameterise the functions u_j(x) in terms of weight vectors w_j, and to use an unsupervised-learning algorithm to find weights that separate the input data. A common choice [2] is to use radial basis functions of the form:

$$u_j(x) = \exp\Big(-\frac{1}{2 s_j^2}\,|x - w_j|^2\Big). \qquad (10.38)$$

Note that these functions are not of the finite-order polynomial form that was assumed above. So strictly speaking we cannot invoke Cover's theorem. In practice the mapping u_j(x) works nevertheless quite well. The parameters s_j parameterise the widths of the radial basis functions. In the simplest version of the algorithm they are set to unity. Hertz, Krogh, and Palmer [1] discuss radial basis-function networks with normalised radial basis functions

$$u_j(x) = \frac{\exp\big(-\frac{1}{2 s_j^2}|x - w_j|^2\big)}{\sum_{k=1}^{m}\exp\big(-\frac{1}{2 s_k^2}|x - w_k|^2\big)}. \qquad (10.39)$$

Other choices for radial basis functions are given by Haykin [2].
Figure 10.14 shows a radial basis-function network for N = 2 and m = 4. The four neurons in the hidden layer stand for the four radial basis functions (10.38) that map the inputs to four-dimensional u-space. The network looks like a perceptron (Chapter 5). But here the hidden layer works in a different way. Perceptrons have hidden McCulloch-Pitts neurons that compute non-local outputs σ(w_j·x − θ). The output of radial basis functions u_j(x), by contrast, is localised in input space [Figure 10.15 (left)]. We saw in Section 7.1 how to make localised basis functions out of

Figure 10.15: Comparison between radial-basis function network and perceptron. Left: the output of a radial basis function is localised in input space. Right: to achieve a localised output with sigmoid units one needs two hidden layers (Figure 7.5).

McCulloch-Pitts neurons with sigmoid activation functions σ(b), but one needs two hidden layers to do that [Figure 10.15 (right)].
Radial basis functions produce localised outputs with a single hidden layer; they divide up input space into localised regions, each corresponding to one radial basis function. Imagine for a moment that we have as many radial basis functions as input patterns. In this case we can simply take w_ν = x^(ν) for ν = 1,...,p. The linear output in Figure 10.14 computes O^(µ) = W·u(x^(µ)), and so the classification problem in u-space takes the form

$$\sum_{\mu=1}^{p} W_\mu U_{\mu\nu} = t^{(\nu)} \qquad (10.40)$$

with U_µν = u_ν(x^(µ)). If all patterns are pairwise different, x^(µ) ≠ x^(ν) for µ ≠ ν, then the matrix U is invertible [2]. In this case the solution of the classification problem reads

$$W_\mu = \sum_{\nu=1}^{p} t^{(\nu)}\,[U^{-1}]_{\nu\mu}, \qquad (10.41)$$

where U is the symmetric p × p matrix with elements U_µν.
In practice one can get away with fewer radial basis functions by choosing their weights to point in the directions of clusters of input data. To this end one uses unsupervised competitive learning (Algorithm 10), where the index j₀ of the winning neuron is defined to be the one with largest u_j. How are the widths s_j determined? The width s_j of radial basis function u_j(x) is taken to be equal to the minimum distance between w_j and the centres of the surrounding radial basis functions.

Algorithm 10 radial basis functions

  initialise the weights w_jk independently randomly from [−1, 1];
  set all widths to s_j = 0;
  for t = 1,...,T do
    feed randomly chosen pattern x^(µ);

    determine the winning neuron j₀: u_{j₀} ≥ u_j for all values of j;
    update widths: s_j = min_{j≠k} |w_j − w_k|;
    update only the winning neuron: δw_{j₀} = η(x^(µ) − w_{j₀});
  end for

Once weights and widths of the radial basis functions are found, the weights of the output neuron are determined by minimising

$$H = \frac{1}{2}\sum_\mu \big(t^{(\mu)} - O^{(\mu)}\big)^2 \qquad (10.42)$$

with respect to W. This works even if U is not invertible. An approximate solution can be obtained by stochastic gradient descent on H, keeping the parameters of the radial basis functions fixed. Cover's theorem indicates that the problem is more likely to be separable if the embedding dimension m is higher.
In summary, radial basis-function networks are similar to the perceptrons described in Chapters 5 to 7, in that they are feed-forward networks designed to solve classification problems. A fundamental difference is that the parameters of the radial basis functions are determined by unsupervised learning, whereas perceptrons are trained using supervised learning for all units. While McCulloch-Pitts neurons compute weights to minimise the deviation of their output from given targets, the radial basis functions compute weights by maximising u_j as a function of j. The algorithm for finding the weights of the radial basis functions is summarised in Algorithm 10. Further, as opposed to the deep networks from Chapter 7, radial basis-function networks have only one hidden layer, and a linear output neuron. Radial basis-function networks learn using a hybrid scheme: unsupervised learning for the parameters of the radial basis functions, and supervised learning for the weights of the output neuron.
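As an illustration, the following Python sketch solves the XOR problem of Exercise 10.10 with a radial basis-function network. For simplicity it places one basis function on each input pattern, with unit widths, so that the interpolation solution (10.41) applies; the alternative of gradient descent on H, Equation (10.42), is shown as well. The learning rate and the number of iterations are arbitrary choices.

import numpy as np

def rbf(x, w, s):
    """Radial basis functions, Equation (10.38); x: (p,N), w: (m,N), s: (m,)."""
    d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s[None, :] ** 2))

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)    # XOR inputs
t = np.array([-1.0, 1.0, 1.0, -1.0])                           # XOR targets

w = x.copy()                    # one radial basis function per pattern
s = np.ones(len(w))             # widths set to unity (simplest version)

U = rbf(x, w, s)                # U_{mu nu} = u_nu(x^(mu))
W = np.linalg.solve(U, t)       # exact interpolation, Equation (10.41)

# Alternative: gradient descent on H, Equation (10.42), with the basis fixed.
W_gd = np.zeros(len(w))
eta = 0.5
for _ in range(2000):
    W_gd += eta * U.T @ (t - U @ W_gd) / len(x)

print("outputs (exact solution):  ", np.sign(U @ W))
print("outputs (gradient descent):", np.sign(U @ W_gd))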

10.6 Autoencoders

Multi-layer perceptrons, layered feed-forward networks, were developed for supervised learning, as described in Part II. Such layouts can also be used for unsupervised learning. Examples are autoencoders and generative adversarial networks.
Autoencoders employ layered feed-forward networks for unsupervised learning of an unlabeled data set of input patterns, using the inputs as targets, t^(µ) = x^(µ).

Figure 10.16: Autoencoder (schematic). Both encoder and decoder consist of a number of fully connected or convolutional layers (depicted as squares). In the layout shown, the bottleneck consists of a layer with very few neurons. Sparse autoencoders have bottlenecks with many neurons, but only few are activated.

The layout is illustrated in Figure 10.16. The network consists of two main parts, an encoder (on the left), and a decoder (on the right). The encoder consists for instance of several fully connected or convolutional layers and maps the inputs to a bottleneck layer with a small number M of neurons, significantly less than the input dimension, M ≪ N. We denote their states by z_j. The encoder corresponds to a non-linear mapping z = f_e(x). The decoder maps the bottleneck (or latent) variables back to the inputs, x = f_d(z). One adjusts weights and thresholds by backpropagation until the network learns to approximate the inputs as

$$x = f_{\rm d}[f_{\rm e}(x)]. \qquad (10.43)$$

The energy function reads:

$$H = \frac{1}{2}\sum_\mu \big|x^{(\mu)} - f_{\rm d}[f_{\rm e}(x^{(\mu)})]\big|^2, \qquad (10.44)$$

where |x|² = x^T x. In other words, the autoencoder learns the identity function. But the point is that the identity is represented in terms of two non-linear functions, the encoder f_e and the decoder f_d. While the identity function is trivial, the encoding and decoding functions need not be. The bottleneck ensures that the network does not simply learn f_e(x) = f_d(x) = x.
The latent variables z may encode interesting properties of the input patterns. If the number of neurons is much smaller than the number of pattern bits, as indicated by the term bottleneck, the encoder is a low-dimensional (compressed) representation of the input data. In this way autoencoders can perform non-linear dimensionality reduction, like self-organising maps (Section 10.3). If both encoder and decoder are linear functions with zero thresholds, then H = ½ Σ_µ |x^(µ) − W_d W_e x^(µ)|². In this case, z₁(x),..., z_M(x) are simply the first M principal components of zero-mean input data [161] (Exercise 10.14).
Sparse autoencoders [162] have a large number of neurons in the bottleneck, possibly more than the number of pattern bits. But only a small number of bottleneck neurons are allowed to be active at the same time. The idea is that sparse representations of input data are more robust than dense ones, and generalise more reliably. At least high-dimensional but sparse representations of binary classification problems are more likely to be linearly separable (Section 5.4). There are different ways of enforcing sparsity, for instance using L1- or L2-regularisation (Section 7.6.1). An alternative [162] is to ensure that the average activation of each bottleneck neuron with sigmoid activation function,

$$a_j = \frac{1}{p}\sum_{\mu=1}^{p}\sigma\big(b_j^{(\mu)}\big), \qquad (10.45)$$

remains small by adding the term

$$\lambda \sum_j \Big[a \log\frac{a}{a_j} + (1-a)\log\frac{1-a}{1-a_j}\Big] \qquad (10.46)$$

to the energy function, with Lagrange multiplier λ (Section 6.3). This term penalises any deviation of a_j from a_j = a ≪ 1, where a > 0 is a sparsity parameter. Each term in the sum is non-negative and vanishes when a_j = a for all j, because each term can be interpreted as the Kullback-Leibler divergence (Section 4.4) between two

Bernoulli distributions with parameters a and a_j.
Variational autoencoders [163–165] have layouts similar to the one shown schematically in Figure 10.16, but their purpose is quite different. Variational autoencoders are generative models (Section 4.5): just like restricted Boltzmann machines (Section 4.5) they approximate a data distribution of inputs P_data(x), and allow to sample from it. As an example consider the MNIST data set of handwritten digits. The patterns define a data distribution that encodes the properties of the digits in terms of covariances and higher-order correlations. The question is how to generate new digits from this distribution, different from those in the data set, yet sharing their defining properties. In other words, how can a machine learn to generate images that look like handwritten digits?
The idea of variational autoencoders is to represent the data distribution in terms of a Gaussian distribution P_L(z) of latent variables z, using the fact that one can

approximate any given data distribution P_data(x) in terms of P_L(z) by a suitable non-linear transformation f(z). This transformation can be represented by a multilayer perceptron and learned by backpropagation. An essential difference to the algorithms described in Part II is that variational autoencoders learn probabilities rather than deterministic input-output mappings. In the simplest case, the network learns the mean and the variance of a Gaussian distribution.
Given the Gaussian distribution P_L(z) of the latent variables, the goal is to maximise the log-likelihood (Section 4.5):

$$\mathscr{L} = \log P(x) = \log \int \mathrm{d}z\, P(x|z)\, P_L(z). \qquad (10.47)$$

Here P(x|z) is the probability to generate x given z. This distribution is assumed to be Gaussian with mean µ_P(z) = f(z), and with correlation matrix C_P. The decoder represents P(x|z) in terms of a multilayer perceptron or a convolutional neural net. Weights and thresholds are determined to maximise ℒ by gradient ascent. To this end we must find an efficient way of computing ℒ and its gradients. One possibility is Monte-Carlo sampling (Section 4.2), but this is not very efficient because most values of z drawn from P_L(z) result in unlikely patterns x, with only negligible contributions to ℒ. To get around this problem, one needs to know which values of z are likely to produce a given pattern x. The idea is to learn a second approximate distribution Q(z|x) of z given x. We can think of Q(z|x) as an encoder. So P(x|z) corresponds to the decoder f_d discussed above, while Q(z|x) corresponds to f_e. An important difference is that P and Q are probabilities, not deterministic functions. To determine a good approximation Q(z|x), we minimise the difference between Q(z|x) and the unknown exact distribution P(z|x),

$$D_{\rm KL}[Q(z|x), P(z|x)] = \big\langle \log Q(z|x) - \log P(z|x)\big\rangle_Q. \qquad (10.48)$$

$$P(z|x) = P(x|z)\, P_L(z)/P(x). \qquad (10.49)$$

$$\mathscr{L} - D_{\rm KL}[Q(z|x), P(z|x)] = \big\langle \log P(x|z)\big\rangle_Q - D_{\rm KL}[Q(z|x), P_L(z)]. \qquad (10.50)$$

The second trick is to recognise that the l.h.s. of Equation (10.50) is a suitable target function to maximise. We want to maximise ℒ subject to the constraint that the unknown function Q(z|x) approximates the probability of z encoding the pattern x. Usually one takes Q(z|x) to be a Gaussian with mean µ_Q and correlation matrix C_Q.

The task is then to determine the parameters µ_P, C_P, µ_Q, and C_Q of the Gaussian distributions P(x|z) and Q(z|x). This is not as straightforward as it may seem, because the target function (10.50) involves an average over random latent variables. The stochastic-gradient algorithm cannot be used in the form described in Algorithm 4, because it is not designed to deal with stochastic output (Exercise 11.1).
The solution is to use stochastic backpropagation [165], a generalisation of Algorithm 4 for target functions of the form (10.50), expressed as an average over Gaussians. Stochastic backpropagation allows to learn the parameters of these Gaussians, using the relation [165]

$$\frac{\partial}{\partial w_{mn}}\big\langle F(z)\big\rangle_Q = \Big\langle b\cdot\frac{\partial \mu}{\partial w_{mn}}\Big\rangle_Q + \Big\langle \tfrac{1}{2}\,{\rm tr}\Big(A\,\frac{\partial C}{\partial w_{mn}}\Big)\Big\rangle_Q, \qquad (10.51)$$

where b and A are the gradient and the Hessian of the function F(z). When the correlation matrix C is constant, this rule is equivalent to standard backpropagation, Algorithm 4. A further difficulty is that Equation (10.51) requires the gradients and the Hessian of the target function. Suitable approximations are described in Ref. [165]. Once the parameters of the Gaussian distributions P and Q are determined, one can sample from the distribution P_L(z) and apply the decoder P(x|z).
Generative adversarial networks [166] are generative models based on learning rules similar to that described above for variational autoencoders, but there are some differences in detail. Generative adversarial networks consist of two multilayer perceptrons, a generator and a discriminator. The generative network produces new outputs from a given data distribution (fakes), and the task of the discriminator is to classify these outputs into two classes: real or fake data. Generator and discriminator are trained together. The weights of the generator are adjusted to maximise the classification error of the discriminator, while those of the discriminator are trained to minimise this error [167].
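Returning to the plain autoencoder of Equation (10.44), its connection to principal-component analysis can be checked numerically. The sketch below trains a linear autoencoder with zero thresholds by gradient descent on an assumed toy data set; the decoder weights W_d should end up spanning the leading principal subspace of the data (the bottleneck variables represent the first M principal components up to an invertible linear transformation).

import numpy as np

rng = np.random.default_rng(8)

# Zero-mean toy data with two dominant directions (assumed distribution).
N, M, p = 5, 2, 2000
scales = np.array([3.0, 2.0, 0.3, 0.3, 0.3])
Q0, _ = np.linalg.qr(rng.standard_normal((N, N)))      # random rotation
x = (rng.standard_normal((p, N)) * scales) @ Q0.T
x -= x.mean(axis=0)

We = 0.1 * rng.standard_normal((M, N))                 # linear encoder, z = We x
Wd = 0.1 * rng.standard_normal((N, M))                 # linear decoder, x' = Wd z

eta = 0.005
for _ in range(10000):                                 # gradient descent on Equation (10.44)
    z = x @ We.T
    err = x - z @ Wd.T
    Wd += eta * err.T @ z / p
    We += eta * (err @ Wd).T @ x / p

# The columns of Wd should span the leading principal subspace of the data.
vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
P_pca = vecs[:, -M:] @ vecs[:, -M:].T                  # projector onto top-M subspace
Q, _ = np.linalg.qr(Wd)
print("subspace mismatch (should be close to zero):",
      np.linalg.norm(P_pca - Q @ Q.T))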

10.7 Summary

The unsupervised-learning algorithms described in Sections 10.1 and 10.2 are based on Hebb's rule. These algorithms can learn different features of unlabeled input data: they can detect the familiarity of inputs, perform principal-component analysis, and identify clusters in the input data. Self-organising maps also rely on Hebb's rule. An important difference is that the outputs are arranged in an array, and that output neurons that are close-by in the output array are updated in similar ways. Self-organising maps can therefore represent topographic and semantic maps, where close-by or similar inputs are mapped to nearby outputs. When the dimension of the output array is much lower than the input dimension, self-organising maps perform non-linear dimensional reduction.
Radial basis-function networks are classifiers, just like multilayer perceptrons. Their output neurons are trained in the same way, using labeled input data. However, the decision boundaries of radial basis-function networks are polynomial functions (not just hyperplanes), and their parameters are determined by unsupervised learning. Autoencoders are multilayer perceptrons. They can learn to encode non-linear features of unlabeled input data by using the input patterns as targets.
Finally, generative adversarial networks do not require labeled inputs, so they can be considered unsupervised-learning machines. They are used to generate synthetic data in order to expand training sets for supervised learning, and pose an ethical dilemma because they can be used to generate deep fakes [168], manipulated videos where someone's facial expression and speech are replaced by another person's. The simple algorithms described in this Chapter provide a proof of concept: how machines can learn without labels.

10.8 Further reading

The primary source for Sections 10.1 and 10.2 is the book by Hertz, Krogh, and Palmer [1]. A good reference for self-organising maps is Kohonen's book [156]. Radial basis-function networks are discussed by Haykin in Chapter 5 of his book [2]. It has been argued that radial basis-function networks do not generalise as well as perceptrons do [169]. To solve this problem, Poggio and Girosi [170] suggested to determine the parameters w_j of the radial basis function by supervised learning, using stochastic gradient descent.
Autoencoders can generate non-linear, low-dimensional representations of an input distribution. The relation to principal component analysis is discussed in Refs. [161, 171]. The recommended introduction to variational autoencoders is the tutorial by Doersch [164]. He also mentions that the underlying mathematics for variational autoencoders is similar to that of Helmholtz machines (Section 4.7), although the two machines learn in quite different ways. Variational autoencoders are used for a number of different purposes. Ref. [172] suggests to employ a variational autoencoder for active learning (Section 7.8). The idea is to represent the input distribution in terms of lower-dimensional latent variables, and to use K-means clustering (Section 10.4) to identify groups of patterns that should be labeled. Variational autoencoders have also been used for outlier detection [173] and language generation [174].

10.9 Exercises

10.1 Continuous Oja's rule. Using the ansatz w = q/|q|, show that Equations (10.4) and (10.5) describe the same angular dynamics. The difference is just that w remains normalised to unity, whereas the norm of q may increase or decrease. See Ref. [153].

10.2 Data-covariance matrix. Determine the data-covariance matrix and the principal direction for the data shown in Figure 10.4.

10.3 Oja's rule. The aim of unsupervised learning is to construct a network that learns the properties of a distribution P_data(x) of input patterns x = [x₁,...,x_N]^T. Consider a network with one linear output that computes y = Σ_{j=1}^N w_j x_j. Show that Oja's learning rule δw_j = ηy(x_j − y w_j) has the stable steady state w* corresponding to the leading eigenvector of the matrix C′ with elements C′_ij = ⟨x_i x_j⟩. Here ⟨···⟩ denotes the average over P_data(x).

10.4 Linear stability analysis for Oja's rule. Iterate the stochastic dynamics (10.3) near a fixed point w*, linearise, and average the result over a random sequence of patterns x. Expand the result to leading order in the learning rate η to show that the linear stability of w* to this order is determined by Equation (10.9).

10.5 Competitive learning for binary patterns. A competitive learning rule for binary patterns with 0/1 bits reads δw_{ij} = η(x_j / \sum_{k=1}^N x_k − w_{ij}) for the winning neuron i = i_0, and δw_{ij} = 0 otherwise. Show that the steady-state weight vectors w_{i_0} have positive components and are normalised as \sum_{k=1}^N w_{i_0,k} = 1.

10.6 Self-organising map. Write a computer program that implements Kohonen's algorithm with a two-dimensional output array, to learn the properties of a two-dimensional input distribution that is uniform inside an equilateral triangle with sides of unit length, and zero outside. Hint: to generate this distribution, sample at least 1000 points uniformly distributed over the smallest square that contains the triangle, and then accept only points that fall inside the triangle. Increase the number of weights and study how the two-dimensional density of weights near the boundary depends on the distance from the boundary.
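The sampling step of the hint can be written in a few lines. The sketch below is not part of the exercise; it only implements the rejection sampling, assuming numpy and one particular placement of the triangle, with corners (0, 0), (1, 0), and (1/2, √3/2).

```python
import numpy as np

def sample_triangle(n_points, rng=np.random.default_rng()):
    """Rejection sampling: draw points uniformly in the unit square and keep
    those inside the equilateral triangle with corners (0,0), (1,0), (1/2, sqrt(3)/2)."""
    points = []
    while len(points) < n_points:
        x, y = rng.uniform(0.0, 1.0, size=2)
        # the point lies inside the triangle if it is below both slanted edges
        if y <= np.sqrt(3) * x and y <= np.sqrt(3) * (1.0 - x):
            points.append((x, y))
    return np.array(points)

data = sample_triangle(1000)   # input patterns for Kohonen's algorithm
```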

10.7 Principal manifolds. Create a data set like the one shown in Figure 10.6, using x_2 = x_1^2 + r, where r is a Gaussian random number with mean zero and variance σ_r^2 = 0.01. Determine the principal component (solid line) of the data set (Section 6.3). Use Kohonen's algorithm to find a better approximation to the data, the principal manifold (dashed line). For both cases determine the variance of data that remains unexplained.

10.8 Iris data set. Write a computer program that combines a two-dimensional self-organising map with a simple classifier to classify the Iris data set (Figure 5.1).

10.9 Steady state of two-dimensional Kohonen algorithm. Repeat the analysis of Equation (10.23) for a two-dimensional self-organising map. (a) Derive the equivalent of Equation (10.27) and determine a relation between the weight density ϱ and P_data, assuming that the data distribution factorises, P_data(w) = f(w_1)g(w_2) [158]. (b) Assume that w(r) = u + iv can be written as an analytic function of r = x + iy and derive a relation between ϱ and P_data.

10.10 Radial basis functions for XOR. Show that the two-dimensional Boolean XOR problem with 0/1 inputs can be solved using the two radial basis functions u_1(x^{(μ)}) = exp(−|x^{(μ)} − w_1|^2) and u_2(x^{(μ)}) = exp(−|x^{(μ)} − w_2|^2), with w_1 = [1,1]^T and w_2 = [0,0]^T. Draw the positions of the four input patterns in the transformed space with coordinates u_1 and u_2.
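A quick way to check the claim is to evaluate the two radial basis functions for the four XOR patterns numerically. This is only a sketch of that check, assuming numpy; the separating line mentioned in the final comment is one possible choice, not prescribed by the exercise.

```python
import numpy as np

w1, w2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])
patterns = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # input -> XOR target

for x, t in patterns.items():
    x = np.asarray(x, dtype=float)
    u1 = np.exp(-np.sum((x - w1) ** 2))    # u1(x) = exp(-|x - w1|^2)
    u2 = np.exp(-np.sum((x - w2) ** 2))    # u2(x) = exp(-|x - w2|^2)
    print(f"x = {x}, t = {t}, (u1, u2) = ({u1:.2f}, {u2:.2f})")

# Both patterns with t = 1 map to (0.37, 0.37), while the patterns with t = 0
# map to (0.14, 1.00) and (1.00, 0.14); a straight line such as u1 + u2 = 1
# separates the two classes in the transformed space.
```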

10.11 Radial basis functions. Table 10.1 describes a classification problem. Show that this problem can be solved as follows. Transform the inputs x to two-dimensional coordinates u1, u2 using radial basis functions:

u_1 = \exp(-\tfrac{1}{4}|x - w_1|^2), with w_1 = [-1, 1, 1]^T,    (10.52)
u_2 = \exp(-\tfrac{1}{4}|x - w_2|^2), with w_2 = [1, 1, -1]^T.    (10.53)

Plot the positions of the eight input patterns in the u_1-u_2-plane. Hint: to compute u_j use the following approximations: exp(−1) ≈ 0.37, exp(−2) ≈ 0.14, exp(−3) ≈ 0.05. The transformed data is used as input to a simple perceptron O^{(μ)} = sgn(\sum_{j=1}^2 W_j u_j^{(μ)} − Θ). Draw a decision boundary in the u_1-u_2-plane and determine the corresponding weight vector W, as well as the threshold Θ.
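The transformed coordinates can also be computed numerically instead of using the approximations from the hint. A minimal sketch, assuming numpy; it only prints the (u_1, u_2) pairs for the eight inputs of Table 10.1.

```python
import numpy as np
from itertools import product

w1 = np.array([-1.0, 1.0, 1.0])
w2 = np.array([1.0, 1.0, -1.0])

for x in product([-1, 1], repeat=3):        # the eight inputs of Table 10.1
    x = np.asarray(x, dtype=float)
    u1 = np.exp(-0.25 * np.sum((x - w1) ** 2))
    u2 = np.exp(-0.25 * np.sum((x - w2) ** 2))
    print(f"x = {x}, u1 = {u1:.2f}, u2 = {u2:.2f}")
```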

10.12 A two-dimensional binary classification problem. Figure 10.17 illustrates a binary classification problem defined on the square −1 ≤ x_1 ≤ 1 and −1 ≤ x_2 ≤ 1, with decision boundary x_2 = sin(2πx_1). Make your own input data set by distributing 1000 inputs in the two regions shown, half of them with target t = +1, the other half with t = −1. Find approximate decision boundaries using a radial basis-function network with m radial basis functions, for m = 5, 10, 20, and 100. Plot the decision boundaries in the input plane and determine the classification errors.
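One way to generate such a data set is sketched below (not part of the exercise text). It assumes numpy, samples the inputs uniformly in the square, and assigns t = +1 to points above the curve; which region carries which label, and the exact balance between the two classes, are assumptions.

```python
import numpy as np

rng = np.random.default_rng()
x = rng.uniform(-1.0, 1.0, size=(1000, 2))                   # 1000 inputs in the square
t = np.where(x[:, 1] > np.sin(2 * np.pi * x[:, 0]), 1, -1)   # label by the decision boundary
# By symmetry, roughly half of the points fall into each region.
```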

10.13 Autoencoder for MNIST. Train the autoencoder network shown in Figure 10.18 on the MNIST data set. Analyse which properties of the data set the latent variables z1 and z2 encode.

x_1   x_2   x_3    t
−1    −1    −1     1
−1    −1     1     1
−1     1    −1     1
−1     1     1    −1
 1    −1    −1     1
 1    −1     1     1
 1     1    −1    −1
 1     1     1     1

Table 10.1: Inputs and targets for Exercise 10.11.

Figure 10.17: Non-linear decision boundary (gray line) x_2 = sin(2πx_1) for a non-linearly separable binary classification problem defined on the square −1 ≤ x_1 ≤ 1 and −1 ≤ x_2 ≤ 1. Exercise 10.12.

10.14 Autoencoder with linear units. Figure 10.19 shows the layout of an autoencoder. All neurons are linear units. Assume that the input data has zero mean, so that all thresholds can be set to zero. Prove that the latent variables z = [z_1, z_2]^T are the top principal components of the input data after training the autoencoder with backpropagation. Repeat the proof for inputs with non-zero means.

Figure 10.18: Layout for an autoencoder to analyse the MNIST data set. The layers are fully connected; the numbers of neurons are given in the Figure. Exercise 10.13.

Figure 10.19: Layout of an autoencoder. The layers are fully connected. The bottleneck consists of two hidden neurons with states z1 and z2 (latent variables). Input and output dimensions are equal to N. Exercise 10.14.

11 Reinforcement learning

Supervised learning requires labeled data, where each input comes with a target the network is supposed to learn. Unsupervised learning, by contrast, does not require labeled data. Reinforcement learning lies between these extremes. The term reinforcement describes the principle of learning by means of a reward function. This function assigns penalties or rewards to the network output, depending on how the output relates to the learning goal. For a neural network with a vector of outputs, the reward function could be

r = \begin{cases} +1 & \text{(reward) if all outputs are correct}, \\ -1 & \text{(penalty) otherwise}. \end{cases}    (11.1)

The goal is to learn to produce outputs that receive a reward more frequently than those that trigger a penalty. We say that rewarded outputs are reinforced. The feedback may be random, given by a distribution initially unknown to the network.

The reward function reflects the learning goal. The training process as well as the learning outcome depend crucially on this function. Suppose one replaces the reward function (11.1) by the more lenient alternative: r = +1 if at least one output is correct, and r = −1 if all outputs are wrong. Naturally this leads to more errors, possibly not a good idea if the goal is to teach a robot to fly.

One distinguishes two different types of reinforcement problems, associative and non-associative tasks [175]. An example of a non-associative task is the N-armed bandit problem [16]. Imagine N slot machines with different reward distributions, initially unknown to the player. Given a finite amount of money, the question is in which order to play the machines so as to maximise the overall profit. The dilemma is whether to stick with a machine that yields a decent reward, or whether to try out other machines that may yield a low reward initially, but could give much higher rewards eventually (exploit-versus-explore dilemma). In this type of problem, the player receives only the reinforcement signal, no other inputs.

In associative tasks, by contrast, the agent receives inputs, or stimuli, and it should learn to associate with each stimulus the output that yields the highest reward. Such tasks occur for instance in behavioural psychology, where the problem is to discriminate between different stimuli, and to associate the right behaviour with each stimulus. In general, such associative tasks can be described as sequential decision processes

(Figure 11.1), where an agent explores a sequence of states s_0, s_1, s_2, ... through a sequence of actions a_0, a_1, a_2, .... Consider for instance a motile microorganism in the turbulent ocean that should swim to the water surface as quickly as possible [176]. It determines its state by observing the local environment. The microorganism might measure local strain and vorticity of the flow.

Figure 11.1: Sequential decision process (schematic). Adapted from Figure 3.1 in Ref. [16].

The environment provides a reinforcement signal (the distance to the surface for example), and the organism determines which action to take, given its state and the reinforcement signal. Should it turn, stop to swim, or accelerate? The organism learns to associate actions with certain states that maximise the reward. This sounds quite similar to associating optimal outputs with stimuli. A conceptual difference is that the action of the agent modifies the environment: its actions take it to a different place in the turbulent flow, with different vorticity and different strain. A second point is that the reward to an action may not be immediate. In this case the challenge is to credit actions that optimise the expected future reward, given the information collected so far. This is the credit-assignment problem [177].

There are two different kinds of associative tasks: continuous and episodic ones. In continuous tasks, the intertwined sequences of states and actions have no natural end, so that one must either terminate the sequence in an ad-hoc fashion or introduce a weighting factor to ensure that the expected future reward remains finite. In episodic tasks, by contrast, the learning is divided into episodes that terminate after a finite number of steps. An example is to learn a strategy for winning a board game. In this case, each episode corresponds to a round of the game, and the reward is incurred at the end of each episode. The number of steps per round, the episode length T, may vary from round to round. In order to estimate the expected reward one usually needs many episodes. A second, very simple example is the stimulus problem described above, where the states (stimuli) are independent of the actions. Each episode consists of only one step, so that T = 1. In response to a randomly chosen state s_0, the agent learns to perform the action a_0 that maximises the immediate reward. This can be achieved by the associative reward-penalty algorithm (Section 11.1). It uses stochastic neurons with weights that are trained by gradient ascent to maximise the expected immediate reward.

To estimate the expected future reward when T > 1, one must use a different method, usually temporal difference learning. It allows one to estimate the expected future reward, after T steps, by breaking up the learning into time steps t = 1, ..., T. The idea is that it is better to adjust the prediction of a future reward as one iterates, rather than waiting for T iterations before updating the prediction. In temporal difference learning one expresses the reward at time T in terms of differences at time steps t + 1 and t (telescoping sum) [178]. Temporal difference learning builds up a lookup table that summarises the best actions for each state, the Q-table, Q_t(s, a). Given N_s states s and N_a possible actions a, Q_t(s, a) is an N_s × N_a table. Its elements contain estimates of the expected future reward for each state-action pair.

To implement temporal difference learning, one must adopt a policy or strategy. It specifies for each state which action is taken. In total there are N_a^{N_s} possibilities of assigning actions to states. When there are many states and actions it quickly becomes impossible to determine the best strategy by simple sampling, because there are too many to consider.
The advantage of temporal difference learning and related algorithms is that they do not rely on the evaluation of all possible policies. Instead, the policy is updated given the instantaneous elements of the Q-table. There are different ways of deriving a policy from the Q-table. The greedy policy is a deterministic policy: it corresponds to choosing the action that corresponds to the largest Q-element in a given row of the Q-table, a = argmax_{a'} Q_t(s, a'). This policy maximises the current estimate of the future reward.

Stochastic policies are often better, in particular for non-stationary or stochastic environments, because they allow the agent to explore potentially better alternatives. Also, for a deterministic environment, a deterministic policy may lead to cycles. This can be avoided with a stochastic policy. One example of a stochastic policy is the ε-greedy policy. With probability 1 − ε, it chooses the greedy action a = argmax_{a'} Q_t(s, a'), but with a small probability ε it takes a suboptimal action. Another example is the softmax policy, where argmax is replaced by the softmax function (Section 7.5). The softmax policy can handle actions described in terms of continuous variables. In general the policy can change as the algorithm is iterated. A common choice is to reduce the parameter ε in the ε-greedy policy, so that the algorithm converges to the optimal deterministic policy.

Q-learning is an approximation to the temporal difference algorithm. In Q-learning, the Q-table is updated assuming that the agent always follows the greedy policy, even though it might actually follow a different policy. Q-learning allows agents to learn to play strategic games [179]. A simple example is the game of tic-tac-toe (Section 11.3). Games such as chess or go require keeping track of a very large number of states, so large that Q-learning in its simplest form becomes impractical. An alternative is to represent the state-action mapping by a deep neural network [180].
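The three policies described above can be summarised in a few lines. The sketch below is an illustration, not code from the book; each function acts on one row of the Q-table, and the inverse-temperature parameter beta of the softmax policy is an assumption.

```python
import numpy as np

rng = np.random.default_rng()

def greedy(q_row):
    """Deterministic greedy policy: take the action with the largest Q-value."""
    return int(np.argmax(q_row))

def epsilon_greedy(q_row, eps=0.1):
    """With probability eps take a random action, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return greedy(q_row)

def softmax(q_row, beta=1.0):
    """Stochastic policy: sample an action with probability proportional to exp(beta*Q)."""
    p = np.exp(beta * (q_row - np.max(q_row)))
    return int(rng.choice(len(q_row), p=p / p.sum()))
```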

11.1 Associative reward-penalty algorithm

The associative reward-penalty algorithm uses stochastic neurons that are trained to maximise the average immediate reward. In Chapters 5 to 9 the output neurons were deterministic functions of their inputs. For reinforcement learning, by contrast, it is better to use stochastic neurons. The idea is the same as in Chapters 3 and 4: stochastic neurons can explore a wider range of possible states, which may in the end lead to a better solution. The state y_i of neuron i is given by the stochastic update rule (3.1):

y_i = \begin{cases} +1 & \text{with probability } p(b_i), \\ -1 & \text{with probability } 1 - p(b_i), \end{cases}    (11.2)

where b_i = w_i · x is the local field (no thresholds), and p(b) = (1 + e^{−2βb})^{−1}. Recall that the parameter β^{−1} is the noise level. Since the outputs can assume only two values, y_i = ±1, Equation (11.2) describes a binary stochastic neuron.

To illustrate the associative reward-penalty algorithm for a single binary stochastic neuron, consider an agent experiencing different stimuli x drawn with equal probability from a distribution of inputs. Upon receiving stimulus x, the stochastic neuron outputs either y = 1 or y = −1. Given x and y, the environment provides a stochastic reward r(x, y) = ±1 drawn from a reward distribution p_reward(x, y):

r(x, y) = \begin{cases} +1 & \text{with probability } p_{\rm reward}(x, y), \\ -1 & \text{with probability } 1 - p_{\rm reward}(x, y). \end{cases}    (11.3)

The goal is to adjust the weights so that the neuron produces outputs that are rewarded with high probability. Figure 11.2(a) shows an example with just two stimuli, x_1 = [1,0]^T and x_2 = [1,1]^T. The numerical values of p_reward(x, y) indicate that the expected reward is maximised when the neuron outputs y = 1 in response to x_1, and y = −1 in response to x_2. Since x_1 and x_2 occur with equal probability, the maximal expected reward is

r_{\max} = \tfrac{1}{2}[\langle r(x_1,+1)\rangle_{\rm reward} + \langle r(x_2,-1)\rangle_{\rm reward}] = 0.1.    (11.4)

Here we used ⟨r(x, y)⟩_reward = p_reward(x, y) − [1 − p_reward(x, y)], and the numerical values for p_reward(x, y) given in Figure 11.2(a). Figure 11.2(b) shows the contingency space [181] of the problem, representing the inputs x in a plane with coordinates p_reward(x, +1) and p_reward(x, −1). It is easier to learn to associate the correct output with inputs that lie in the shaded regions where p_reward(x, +1) > 1/2 and p_reward(x, −1) < 1/2, or vice versa. In this case one can solve the problem by fixing y = +1 and sampling p_reward(x, +1) for all x. If p_reward(x, +1) > 1/2 then y = +1 is the optimal output for x, otherwise it is y = −1.

p_reward            y = −1    y = +1
x_1 = [1,0]^T        0.6       0.8
x_2 = [1,1]^T        0.3       0.1

Figure 11.2: Conditioning by reward [181]. A stochastic neuron responds to stimuli x_1 and x_2 with different outputs, y = ±1, and receives the reward (11.1): r = +1 with probability p_reward(x, y), and r = −1 with probability 1 − p_reward(x, y). The goal is to always respond with the output that maximises the expected reward. Table: reward distribution. Panel (a): contingency space [181] of the problem, representing each input x in a plane with coordinates p_reward(x, +1) and p_reward(x, −1). (b) Reward versus iteration number of the associative reward-penalty rule (11.10). Schematic, based on simulations by Phillip Graefensteiner, averaged over 100 independent realisations.

This strategy cannot be used outside the shaded region. For example, if both reward probabilities are larger than one half, it is necessary to sample both p_reward(x, −1) and p_reward(x, +1) sufficiently often in order to determine which one is larger; one must find the greater of two goods according to Barto [181]. This illustrates the fundamental dilemma of reinforcement learning: an output that appears at first to yield a high reward may not be the optimal one in the long run. To find the optimal output it is necessary to estimate both reward probabilities precisely. This means that one must try all possible outputs frequently, not only the one that appears to be optimal at the moment.

To derive a learning rule we need a cost function. One possibility is to use the average of the immediate reward for a given stimulus x,

\langle r \rangle = \sum_{y_1 = \pm 1, \ldots, y_M = \pm 1} \langle r(x, y) \rangle_{\rm reward}\, P(y|x),    (11.5)

averaged over the states of M outputs, and over the response of the environment,

determined by the reward distribution preward(x ,y ). It is assumed that the reward distribution is stationary. Furthermore,

P(y|x) = \prod_{i=1}^{M} \begin{cases} p(b_i) & \text{for } y_i = 1, \\ 1 - p(b_i) & \text{for } y_i = -1 \end{cases}    (11.6)

is the probability that the network produces the output y = [y_1, ..., y_M]^T given the local field b_i = Σ_j w_ij x_j. To find the maximum of ⟨r⟩ one uses gradient ascent on ⟨r⟩, analogous to maximising the log-likelihood for Boltzmann machines (Section 4.4), and to gradient descent on the energy function for perceptrons in supervised learning (Chapter 6). The gradient is computed by applying the chain rule, as usual. The calculation is similar to the one for Boltzmann machines (Chapter 4). After some algebra (Exercise 11.2) one finds for given x and y that the derivative of P(y|x) with respect to w_mn equals P(y|x) β [y_m − tanh(βb_m)] x_n. We conclude that

\frac{\partial \langle r \rangle}{\partial w_{mn}} = \beta \langle r(x, y)[y_m - \tanh(\beta b_m)] \rangle x_n    (11.7)

with b_m = Σ_j w_mj x_j, as before. The average is over the output of the network and over the reward distribution, just as in Equation (11.5). Now we seek a learning rule that increases the expected immediate reward ⟨r⟩.

In other words, we require that the weight increment δwmn is an unbiased estimator〈 〉 of the gradient of the expected immediate reward [175]: ∂ r δwmn = η 〈 〉 . (11.8) 〈 〉 ∂ wmn Comparison with Equation (11.7) leads to:

\delta w_{mn} = \alpha r [y_m - \tanh(\beta b_m)] x_n,    (11.9)

with α = ηβ. This learning rule belongs to a set of more general rules derived by Williams [175]. It is plausible that the rule (11.9) converges to a steady state, because the weight increments approach zero as the network learns to produce the output corresponding to max_y{p_reward(x, y)}, independent of y, so that y_m − tanh(βb_m) averages to zero. But there is no proof of convergence.

An alternative is the associative reward-penalty rule [181]:

\delta w_{mn} = \alpha \begin{cases} [y_m - \tanh(\beta b_m)]\, x_n & \text{for } r = +1, \\ \delta\, [-y_m + \tanh(\beta b_m)]\, x_n & \text{for } r = -1, \end{cases}    (11.10)

with 0 < δ ≤ 1. For r = +1, the learning rules (11.9) and (11.10) give the same weight increment, but for r = −1 the increments are different. With rule (11.10) the agent learns primarily from positive feedback. One advantage of this asymmetric rule is that it can be proven to converge in the limit δ → 0 [181]. In general, however, the convergence becomes quite slow when δ is small.

Figure 11.2(c) shows simulation results for the immediate reward, averaged over 100 independent realisations of the learning process, versus the iteration number of the rule (11.10). We see that the average immediate reward approaches a steady state. The steady-state average of the immediate reward is smaller than r_max = 0.1, but as expected it approaches r_max as δ decreases. The averaged learning curves still exhibit substantial fluctuations. They reflect significant variations within and between individual realisations. Furthermore, the convergence proof assumes that the input patterns are linearly independent. This means that the number of patterns cannot exceed the input dimension N. Associative reinforcement problems with linearly dependent inputs can be solved by embedding the input patterns in a higher-dimensional input space (Section 5.4).

The associative reward-penalty rule illustrates how an agent can use a reinforcement signal to maximise the expected immediate reward. The algorithm could for instance be a model for how an animal learns to respond in different ways to different stimuli. Yet, in many problems the reward is not immediate. When we play chess, for example, the reward comes at the end of the game: r = +1 if we won, r = −1 if we lost, and r = 0 if the game ended in a draw. More generally, an agent navigating a complex environment should not only consider which immediate reward a certain action gives, but also how this action affects possible future rewards. One way of estimating future rewards for such tasks is temporal difference learning, discussed next.
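A minimal sketch of one iteration of the associative reward-penalty algorithm for a single stochastic neuron, combining the stochastic update (11.2), the stochastic reward (11.3), and the rule (11.10). The reward probabilities are those of Figure 11.2; the values of α, δ, and β are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

# reward probabilities p_reward(x, y) from Figure 11.2, keyed by (stimulus index, output)
P_REWARD = {(0, -1): 0.6, (0, 1): 0.8,    # x_1 = [1, 0]
            (1, -1): 0.3, (1, 1): 0.1}    # x_2 = [1, 1]
STIMULI = np.array([[1.0, 0.0], [1.0, 1.0]])

def reward_penalty_step(w, alpha=0.05, delta=0.1, beta=2.0):
    """One iteration of the associative reward-penalty rule (11.10)."""
    k = rng.integers(2)                                 # stimuli occur with equal probability
    x = STIMULI[k]
    b = w @ x                                           # local field, Eq. (11.2)
    p = 1.0 / (1.0 + np.exp(-2.0 * beta * b))
    y = 1 if rng.random() < p else -1                   # stochastic output
    r = 1 if rng.random() < P_REWARD[(k, y)] else -1    # stochastic reward, Eq. (11.3)
    if r == 1:
        w = w + alpha * (y - np.tanh(beta * b)) * x
    else:
        w = w + alpha * delta * (-y + np.tanh(beta * b)) * x
    return w, r

w = rng.normal(size=2)
for _ in range(10000):
    w, r = reward_penalty_step(w)
```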

11.2 Temporal difference learning

How does temporal difference learning allow an agent to learn to optimise its expected future reward? For an episodic task, given an episode with T steps, the agent visits the finite sequence of states s_0, ..., s_{T−1}, and collects the rewards r_1, ..., r_T. The future reward is defined as

R_t = \sum_{\tau=t}^{T-1} r_{\tau+1}.    (11.11)

Continuous tasks, by contrast, do not have a defined end point. Since the sum in (11.11) might diverge as T → ∞, it is customary to introduce a weighting factor

0 ≤ γ ≤ 1 in the sum over rewards:

R_t = \sum_{\tau=t}^{\infty} \gamma^{\tau-t} r_{\tau+1}.    (11.12)

The weighting factor reduces the contribution of the far future to the estimate. Smaller values of γ give more weight to the immediate future, and the limit γ → 0+ corresponds to R_t = r_{t+1}. The sum in Equation (11.12) is called the future discounted reward.

We use a neural network with input s_t to estimate R_t. In general, the network output is a non-linear function of the inputs, parameterised by weights that could be arranged into several layers of hidden neurons (Part II). The simplest choice is to use a single linear unit, just as in Equation (5.19):

O(s_t) = w · s_t.    (11.13)

The components w_j of the weight vector w are determined so that the network output O(s_t) approximates R_t. This can be achieved by minimising the energy function

H = \tfrac{1}{2} \sum_{t=0}^{T-1} [R_t - O(s_t)]^2    (11.14)

using gradient descent. The corresponding learning rule reads:

\delta w_m = \alpha \sum_{t=0}^{T-1} [R_t - O(s_t)]\, \frac{\partial O}{\partial w_m}.    (11.15)

The idea of temporal difference learning [178] is to express the error R_t − O(s_t) as a sum of temporal differences:

R_t - O(s_t) = \sum_{\tau=t}^{T-1} [r_{\tau+1} + O(s_{\tau+1}) - O(s_\tau)],    (11.16)

where O(s_T) is defined to be zero, O(s_T) ≡ 0. Using the gradient-descent rule (11.15) one obtains

\delta w = \alpha \sum_{t=0}^{T-1} \sum_{\tau=t}^{T-1} [r_{\tau+1} + O(s_{\tau+1}) - O(s_\tau)]\, s_t.    (11.17)

The terms in this double sum can be summed in a different way, as illustrated in Figure 11.3:

\delta w = \alpha \sum_{\tau=0}^{T-1} \sum_{t=0}^{\tau} [r_{\tau+1} + O(s_{\tau+1}) - O(s_\tau)]\, s_t.    (11.18)

Figure 11.3: The double sum in Equation (11.17) extends over the terms indicated in black. The corresponding terms can be summed in two ways, as illustrated in the two panels, for T = 6.

Exchanging the summation variables and introducing a weighting factor 0 ≤ λ ≤ 1 gives:

\delta w = \alpha \sum_{t=0}^{T-1} [r_{t+1} + O(s_{t+1}) - O(s_t)] \sum_{\tau=0}^{t} \lambda^{t-\tau} s_\tau.    (11.19)

The purpose of the weighting factor is to reduce the weight of past states in the sum [182]. The corresponding single-step weight increment is:

\delta w_t = \alpha [r_{t+1} + O(s_{t+1}) - O(s_t)] \sum_{\tau=0}^{t} \lambda^{t-\tau} s_\tau.    (11.20)

This is the temporal difference learning rule, also called TD(λ) [182]. Temporal difference learning allows a machine to learn the board game backgammon [17, 183], using a deep layered network and backpropagation (Section 6.1) to determine the weights.
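The rule (11.20) translates directly into code. Below is a minimal sketch for a linear output O(s) = w · s, assuming numpy. It applies the single-step increment (11.20) after every step of an episode, whereas Equation (11.19) accumulates the increments over the whole episode; applying them immediately is a common simplification, assumed here.

```python
import numpy as np

def td_lambda_episode(states, rewards, w, alpha=0.1, lam=0.8):
    """TD(lambda) for one episode with a linear output O(s) = w·s, Eq. (11.20).

    states  : array of shape (T+1, N); states[T] is the terminal state, for
              which the output is defined to be zero
    rewards : array of shape (T,); rewards[t] corresponds to r_{t+1}
    """
    trace = np.zeros_like(w)                 # sum over tau of lambda^(t-tau) s_tau
    T = len(rewards)
    for t in range(T):
        trace = lam * trace + states[t]
        o_t = w @ states[t]
        o_next = 0.0 if t + 1 == T else w @ states[t + 1]
        delta = rewards[t] + o_next - o_t    # temporal difference
        w = w + alpha * delta * trace        # weight increment (11.20)
    return w
```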

The rule TD(0) is similar to the learning rule (6.6a) with target r_{t+1} + O(s_{t+1}). It allows one to learn one-step prediction of the time series. Using Equation (11.13), we see that the TD(0)-learning rule corresponds to the following learning rule for the output O:

O_{t+1}(s_t) = O_t(s_t) + \alpha [r_{t+1} + O_t(s_{t+1}) - O_t(s_t)].    (11.21)

The notation O_t emphasises that the output function is updated iteratively. The learning rule (11.21) applies to estimating the future reward (11.11) for episodic tasks. If the environment is stationary, one may average over many consecutive episodes, using the final weights from episode k as initial weight values for episode k + 1.

Figure 11.4: Sequence of states s_t in sequential reinforcement learning. The action a_t leads from s_t to s_{t+1}, where the agent receives reinforcement r_{t+1}. The Q-table with elements Q(s_t, a_t) estimates the future discounted reward.

For continuous tasks, the corresponding rule for estimating the future discounted reward (11.12) reads:

O_{t+1}(s_t) = O_t(s_t) + \alpha [r_{t+1} + \gamma O_t(s_{t+1}) - O_t(s_t)].    (11.22)

Returning to the problem outlined in the beginning of this Chapter, consider an agent exploring a complex environment. The task might be to get from location A to location B as quickly as possible, or expending as little energy as possible. At time t the agent is at position x_t with velocity v_t. These variables as well as the local state of the environment are summarised in the state vector s_t. Given s_t, the agent can act in certain ways: it might slow down, speed up, or turn. These possible actions are summarised in a vector a_t. At each time step, the agent takes the action a_t that optimises the expected future discounted reward (11.12), given its present state s_t. The estimated expected future reward for any state-action pair is summarised in a table: the Q-table with elements Q(s_t, a_t) is the analogue of O_t(s_t). Different rows of the Q-table correspond to different states, and different columns to different actions. The TD(0) rule for Q reads:

Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t [r_{t+1} + \gamma Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t)].    (11.23)

This algorithm is called SARSA, because one needs s_t, a_t, r_{t+1}, s_{t+1}, and a_{t+1} to update the Q-table (Figure 11.4). A difficulty with the rule (11.23) is that it depends not only on the present state-action pair [s_t, a_t], but also on the next action a_{t+1}, and thus indirectly upon the policy. Sometimes this is indicated by writing Q^π for the Q-table given policy π.
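As a sketch, one SARSA step (11.23) for a tabular Q stored in a dictionary might look as follows (illustrative code, not from the book):

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One SARSA step, Eq. (11.23): requires s, a, r, s_next and a_next.
    Q maps (state, action) pairs to estimates of the future discounted reward."""
    q_sa = Q.get((s, a), 0.0)
    q_next = Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = q_sa + alpha * (r + gamma * q_next - q_sa)
    return Q
```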

Algorithm 11 Q-learning for an episodic task with the ε-greedy policy
  initialise Q;
  for k = 1, ..., K do
    initialise s_0;
    for t = 0, ..., T_k − 1 do
      choose a_t from Q(s_t, a_t) according to the ε-greedy policy;
      compute s_{t+1} and record r_{t+1};
      update Q(s_t, a_t) ← Q(s_t, a_t) + α [r_{t+1} + max_a Q(s_{t+1}, a) − Q(s_t, a_t)];
    end for
  end for
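A minimal Python sketch of Algorithm 11, with tabular Q-values stored in a dictionary. The environment interface (env_reset and env_step) and all parameter values are assumptions made for illustration; the sketch zero-initialises the Q-table, whereas Section 11.3 recommends optimistic initialisation.

```python
import numpy as np
from collections import defaultdict

def q_learning(env_reset, env_step, actions, n_episodes,
               alpha=0.1, gamma=1.0, eps=0.1, rng=np.random.default_rng()):
    """Tabular Q-learning with the epsilon-greedy policy (Algorithm 11).

    env_reset() -> s0 and env_step(s, a) -> (s_next, r, done) describe the
    task; both are hypothetical callables supplied by the user."""
    Q = defaultdict(float)                        # Q[(s, a)], zero-initialised

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(n_episodes):
        s, done = env_reset(), False
        while not done:
            # epsilon-greedy choice of the next action
            a = actions[rng.integers(len(actions))] if rng.random() < eps else greedy(s)
            s_next, r, done = env_step(s, a)
            best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```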

11.3 Q-learning

The Q-learning rule [184] is an approximation to Eq. (11.23) that does not depend on a_{t+1}. Instead one assumes that the next action, a_{t+1}, is the optimal one:

Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t [r_{t+1} + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t)],    (11.24)

regardless of the policy that is currently followed. Although Equation (11.24) does not refer to any policy, the learning outcome nevertheless depends on it, because the policy determines the sequence [s_t, a_t]. For the greedy policy, Eq. (11.24) is equivalent to (11.23), but in general the two algorithms differ, and converge to different solutions. While the Q-table converges to the expected future reward of the greedy policy in Q-learning, for SARSA it converges to the expected future reward corresponding to the policy used in training. This can be an advantage when performance during training is important, for example when training an expensive robot that should not crash too often, or for a small bird that learns to fly by doing. Q-learning is simpler, and can be used for problems where the final strategy based on argmax_{a'} Q(s, a') matters, but where the reward during training is less important. Examples are board games where only the quality of the final strategy counts, not how often one loses during training. In summary, Q-learning is simpler; if one takes ε → 0 during training, it yields the optimal strategy just as SARSA does, but it may give lower rewards during training.

The Q-learning algorithm is summarised in Algorithm 11. Usually one sets the initial entries in the Q-table to large positive values (optimistic initialisation), because this prompts the agent to explore many different actions, at least in the beginning. If the agent is in state s_t, it chooses the action a_t from Q(s_t, a_t) according to the given policy. For the ε-greedy policy, for example, the agent picks a random action from the corresponding row of the Q-table with probability ε. With probability 1 − ε it

chooses the action a_t that yields the largest Q(s_t, a_t) given s_t.¹ The choice of action a_t determines the next state s_{t+1}, and this in turn allows one to update the Q-table: given the new state s_{t+1} resulting from the action a_t, one updates Q(s_t, a_t) using Equation (11.24). For episodic tasks one puts γ = 1 in (11.24), and one averages over many episodes, using the outcome Q_{T_k} from episode k as initial condition for the Q-table for episode k + 1. Each new episode can start with a new initial state s_0. It helps the exploration process if s_0 is one of the states that are rarely visited by the learning algorithm.

When the sequence s_0, s_1, s_2, ... is a Markov chain (Section 4.2), then the Q-learning algorithm can be shown [182, 185] to converge if one uses a time-dependent learning rate α_t that satisfies

\sum_{t=0}^{\infty} \alpha_t = \infty \quad \text{and} \quad \sum_{t=0}^{\infty} \alpha_t^2 < \infty.    (11.25)

Often Q-learning is implemented in combination with the ε-greedy policy. This policy shares an important property with the associative reward-penalty algorithm with stochastic neurons: stochasticity allows for a wider range of responses, some of which may turn out beneficial in the long run. When ε is very small, the agent picks the action that appears optimal. As a consequence, suboptimal Q-elements are sampled less frequently and are therefore subject to larger errors. Therefore it is advantageous to begin with a relatively large value of ε. It is customary to decrease ε as the algorithm is iterated, because this accelerates convergence to the greedy policy.

It is important to bear in mind that the learning outcome depends on the reward function, as mentioned above. In general it is a good idea to analyse how the optimal strategy changes as one varies the reward function. Sometimes we are faced with the inverse problem: consider how a microorganism swimming in the turbulent ocean responds to different stimuli. How was this behaviour shaped by genetic evolution? Which quantity was optimised? Is it most important to reduce the energy cost for propulsion? Or is it more important to avoid predation?

Another challenge is to determine suitable states and actions. An agent navigating a complex environment may have a continuous range of positions and velocities, and may experience continuous-valued signals from the environment. To represent the corresponding states in a Q-table it is necessary to discretise. To this end one must determine suitable ranges and resolutions of these variables, and for the actions. If there are too many states and actions, Q-learning becomes inefficient. This is referred to as the curse of dimensionality [16, 186].

¹ If several elements in the relevant row have the same maximal value, then any one of them is chosen with equal probability.

Let us see how Q-learning works for a very simple example, the associative task described in Fig. 11.2. One episode corresponds to computing the output of the neuron given its initial state s_0, so T = 1. There is no sequence of states, and the task is to estimate the immediate reward. In this case the learning rule (11.24) simplifies to

\delta Q(s, a) = \alpha [r(s, a) - Q(s, a)].    (11.26)

The term max_a Q(s_{t+1}, a) in Equation (11.24) does not appear in (11.26) since Q estimates the immediate reward. There are only two states in this problem, s = x_1 and s = x_2, and the possible actions are a = ±1. In other words, N_s = N_a = 2 in this case. In each round, one of the states is chosen randomly, with equal probability. The action is determined from the current estimate of the immediate reward as argmax_a Q(s, a) with probability 1 − ε, and uniformly randomly otherwise. These steps are iterated over many rounds (episodes), using the outcome of episode k as initial condition for episode k + 1. The rule (11.26) describes exponential relaxation to the target for small learning rate α, as we can see from the solution of the stochastic differential equation

\frac{\rm d}{{\rm d}t} Q(s, a) = \alpha f_\varepsilon(s, a) [r(s, a) - Q(s, a)],    (11.27)

where f_ε(s, a) is the stationary frequency with which the state-action pair [s, a] is visited using the ε-greedy policy:

f_\varepsilon(s, a) = \begin{cases} \frac{1}{N_s}\big(1 - \varepsilon + \frac{\varepsilon}{N_a}\big) & \text{if } a = \mathrm{argmax}_{a'} Q(s, a'), \\ \frac{\varepsilon}{N_s N_a} & \text{otherwise}. \end{cases}    (11.28)

The frequency is normalised to unity, 1 = Σ_{s,a} f_ε(s, a). Averaging the solution of Equation (11.27) with initial condition Q_0(s, a) = 1 over the reward distribution gives:

\langle Q_t(s, a) \rangle = \exp[-\alpha f_\varepsilon(s, a) t] + \alpha f_\varepsilon(s, a) \int_0^t {\rm d}t'\, \langle r(s, a) \rangle \exp[f_\varepsilon(s, a)\alpha(t'-t)].    (11.29)

For ε > 0, Q_t(s, a) converges on average to

\begin{pmatrix} Q^*(x_1,-1) & Q^*(x_1,+1) \\ Q^*(x_2,-1) & Q^*(x_2,+1) \end{pmatrix} = \begin{pmatrix} 0.2 & 0.6 \\ -0.4 & -0.8 \end{pmatrix}.    (11.30)

Here we used that ⟨r(x, y)⟩ = 2 p_reward(x, y) − 1, as well as the reward probabilities in Figure 11.2. Figure 11.5 illustrates how the rate of convergence depends on the value of the parameter ε.

Figure 11.5: Q-learning for the task described in Figure 11.2. (a) Entries of the Q-table versus the number of iterations of Equation (11.26) for α = 0.01 and ε = 1. (b) Same, but for ε = 0.05. Schematic, based on simulations by Navid Mousavi, averaged over 5000 independent realisations of the learning curve.

For ε = 1, all state-action pairs are visited and evaluated equally often, independently of the present elements of the Q-table. Equations (11.28) and (11.29) show that all elements of the Q-table converge to their steady-state values at the same rate, equal to α/4. For small values of ε, by contrast, the algorithm tends to take optimal actions, argmax_{a'} Q(s, a'). Therefore it finds the optimal elements of the Q-table more quickly, at the rate α/2. However, the other elements converge much more slowly. Initially, the theory (11.29) does not apply because optimal and suboptimal Q-elements in each row of the Q-table are not well separated. As a result the decay rates of all elements are similar at first. But once optimal and suboptimal elements are significantly different, the suboptimal ones decay at the rate εα/4, as predicted by the theory. This example illustrates the strength of Q-learning with the ε-greedy policy. For small values of ε, the algorithm tends to converge to the optimal strategy more quickly than a brute-force algorithm that visits every state-action pair equally often.

Equation (11.28) shows that this advantage becomes larger for larger values of N_a. The price one pays is that the suboptimal entries of the Q-table converge more slowly. The gain is even more significant for episodic tasks with T > 1 steps. In this case the learning rule for Q(s_t, a_t) depends on argmax_{a'} Q(s, a'). As mentioned above, there are N_a^{N_s} ways in which the largest elements can be distributed over the rows of the Q-table. To find the optimal deterministic strategy by complete enumeration, one must run a large number of episodes for each of the N_a^{N_s} possibilities, to find out which one gives the largest reward. This is impractical when either N_a or N_s or both become too large. Q-learning with a small value of ε, by contrast, tends to explore actions that appear to yield the largest expected future reward given the current estimates of the Q-values. Using experience in this way allows the algorithm to simultaneously improve its policy and the Q-values towards optimality. In this way,

Q-learning can find optimal, or at least good, strategies when complete enumeration of all possibilities fails.

A second example for Q-learning is illustrated in Figure 11.6, the board game tic-tac-toe. It is a very simple game where two players take turns in placing their pieces on a 3 × 3 board. The player who manages to first obtain three pieces in a row, column, or diagonal wins and receives the reward r = +1. A draw gives r = 0, and the player receives r = −1 when the round is lost. The goal is to win as often as possible, to maximise the expected future reward. However, there is a strategy for both players to ensure that they do not lose. If both players try to maximise their expected future reward, then they end up following this strategy. As a consequence, every game must end in a draw [187]. As a result, the game is quite boring. Nevertheless it is instructive to ask how the players can learn to find this strategy using Q-learning with the ε-greedy policy.

To this end we let two agents play many rounds against each other. The state space is the collection of all board configurations. Player × starts, and thus always sees a board with an even number of pieces, while the number of pieces is odd for player ◦. Since the players encounter different sets of states, each must keep track of their own Q-table. The task is episodic, and the number T of steps may vary from round to round. Feedback is only obtained at the end of each round. We use Equation (11.24) with a constant learning rate α. We can set γ = 1 since the number of steps in each round is finite.

The Q-table is a 2 × n table where each entry is a 3 × 3 array. Here n is the number of states the player encountered so far. The first row lists the states, each a 3 × 3 array with entries 1 (×), −1 (◦), or 0 (empty). The second row contains the Q-values. Given a certain state of the board, a player can play a piece on any empty field. The corresponding estimate of the expected future reward is stored in the corresponding 3 × 3 array in the second row. Since one cannot place a piece onto an occupied field, the corresponding entries in the Q-table are assigned NaN. During a round, the Q-tables of the players are updated in turns, always the one of the player who places a piece. After a round, the Q-tables of both players are updated.

During the first round, the elements of the Q-tables encountered are initialised to zero and remain zero. The first change to Q occurs in the last step of this round. If player × wins, for example, the element Q(s_{T−1}, a_{T−1}) corresponding to the state-action pair that led to the final state s'_{T−1} is updated for the winning player, and Q(s'_{T−2}, a'_{T−2}) is set to −1 for player ◦. Both players follow the ε-greedy policy. With probability 1 − ε they take the optimal move (if the maximal Q-element in the relevant row is degenerate, then one of the maximal elements is chosen randomly). With probability ε, a random action is chosen. As the players continue to play rounds against each other, the rewards spread to other elements of Q.

Figure 11.6: Tic-tac-toe. Two players, × and ◦, take turns placing a piece on an empty field of a 3 × 3 board. The goal is to be the first to complete a row, column, or diagonal consisting of three of one's own pieces. In the example shown, player × starts and ends up winning the game. The states encountered by player × are denoted by s_t, those encountered by player ◦ by s'_t. Their actions are denoted by a_t and a'_t.

Suppose that the state s_{T−1} is encountered once more, the one that allowed player × to win the first round with a_{T−1}. Then the term max_a Q(s_{T−1}, a) causes a Q-element for the previous state to change, the one from which s_{T−1} was reached the second time. However, as time goes on, this process slows down, because later updates are multiplied with higher powers of the learning rate α. Also, if the opponent lost in the previous round, it will try different actions that may block winning moves for the other player.

Figure 11.7 illustrates how the players learn, after playing many rounds against each other. Since both players try to maximise their expected future reward in the steady state of the Q-learning algorithm, all games end in a draw in this case. The corresponding Q-tables contain the strategies each player should adopt to maximise their reward. Suppose player ◦ places the first piece as shown below:

[Equation (11.31), not reproduced here: the boards during the game and the corresponding Q-tables of player ×. Entries close to 1.00 mark winning moves, negative entries mark losing moves, and occupied fields carry the entry NaN.]    (11.31)

How does this game continue? There are several different ways for player × to win.

Figure 11.7: Learning curves for two players learning to play tic-tac-toe with Q-learning and the ε-greedy policy. Shown are the frequencies that the game ends in a draw, that player × wins, and that player ◦ wins. Similar curves are obtained using a learning rate α = 0.1. The parameter ε was equal to unity for the first 10^4 rounds, and then decreased by a factor of 0.9 after each 100 rounds; each curve is averaged over a running window of 30 rounds. Schematic, based on simulations performed by Navid Mousavi.

The left Q-table in Equation (11.31) shows that one possibility is to place the piece in the top or bottom row, because this creates the opportunity of creating a bridge in the next move, a configuration that cannot be blocked by the opponent, allowing the player to win. The right Q-table shows that player × could still lose or end up with a draw if he makes the wrong move. The corresponding Q-entries have not quite converged to −1 and 0, respectively. Q-entries corresponding to suboptimal states are not estimated as precisely because they are visited less frequently. Here ε = 0.3 was chosen quite large. Smaller values of ε give even less accurate estimates for the suboptimal Q-elements compared with those in Equation (11.31), after training for the same number of rounds.

As pointed out above, the learning outcome depends on the reward function. If one increases the reward for winning, to r = +2 for instance, the optimal strategy appears to be to take turns in winning. The same learning outcome is expected if one imposes a penalty for a draw, r = +1 (win), r = −1 (draw, lose), Exercise 11.8. More examples of reinforcement-learning problems in robotics and in the natural sciences are described in Ref. [188].

The Q-learning algorithm described above is quite efficient when the number of states and actions is not too large. For very large Q-tables, the algorithm becomes quite slow. In this case it may be more efficient to replace the Q-table by an approximate Q-function that maps state-action pairs to expected future rewards. As explained in Section 11.2, one can use a neural network to represent the Q-function [180] (deep reinforcement learning). An application of this method is AlphaGo, a machine-learning algorithm that learnt to play the game of go [179]. The Q-function is represented in terms of a convolutional neural network. This makes it possible to use a variant of Q-learning despite the fact that the number of states is enormous.

In recent years many proof-of-principle studies have demonstrated the possibilities of reinforcement learning in a wide range of scientific problems. Recent advances in deep reinforcement learning hold promise for the future, for real-world control problems in the engineering sciences.

11.4 Summary

Reinforcement learning lies between unsupervised learning (Chapter 10) and supervised learning (Chapters 5 to 9). In reinforcement learning, there are no labeled data sets. Instead, the neural network or agent learns through feedback from the environment in the form of a reward or a penalty. The goal is to find a strategy that maximises the expected reward. Reinforcement learning is applied in a wide range of fields, from psychology to mechanical engineering, using a large variety of algorithms. The associative reward-penalty algorithm and many versions of temporal difference learning were originally formulated using neural networks. Q-learning is an approximation to temporal difference learning for sequential decision processes. In its simplest form it does not rely on neural networks. However, when the number of states and actions is large, this algorithm becomes slow. In this case it may be more efficient to approximate the Q-function by a neural net.

11.5 Further reading

The standard reference for reinforcement learning is the book by Sutton and Barto [16]. The original reference for the convergence of the Q-learning algorithm is Ref. [185]. A more mathematical introduction to reinforcement learning is given in Ref. [182]. Examples for reinforcement learning in statistical and non-linear physics are summarised in Ref. [188]. An open question is when and how symmetries can be exploited to simplify a reinforcement problem. For a small microorganism learning to navigate a turbulent flow, some aspects are discussed in Ref. [189], but little is known in general. Another open question concerns the convergence of the Q-learning algorithm. Convergence to the optimal policy is assured if the sequence of states is a Markov chain. However, most real-world problems are not Markovian, so that convergence is not guaranteed. The algorithm appears to perform well nevertheless (a recurring theme in this book), but it is an open question under which circumstances it may fail.

p_reward            y = −1    y = +1
x_1 = [0,0]^T        0.8       0.1
x_2 = [0,1]^T        0.1       0.8
x_3 = [1,0]^T        0.1       0.8
x_4 = [1,1]^T        0.8       0.1

Table 11.1: Stochastic XOR problem from Ref. [181]. Exercise 11.4.

11.6 Exercises

11.1 Binary stochastic neurons. Consider binary stochastic neurons y_i with update rule y_i = +1 with probability p(b_i) and y_i = −1 otherwise. Here b_i = Σ_j w_ij x_j, and p(b) = (1 + e^{−2βb})^{−1}. The parameter β^{−1} is the noise level, the x_j are inputs, and the w_ij are weights. The energy function reads H = ½ Σ_{iμ} (t_i^{(μ)} − y_i^{(μ)})^2, with targets t_i^{(μ)} = ±1. Stochastic neurons can be trained by gradient descent on the energy function H' = ½ Σ_{iμ} (t_i^{(μ)} − ⟨y_i^{(μ)}⟩)^2, defined in terms of the average outputs ⟨y_i^{(μ)}⟩. The error δ_m^{(μ)} is defined by δw_mn ≡ −η ∂H'/∂w_mn = η Σ_μ δ_m^{(μ)} x_n^{(μ)}. Show that δ_m^{(μ)} = (t_m^{(μ)} − ⟨y_m^{(μ)}⟩) β (1 − ⟨y_m^{(μ)}⟩^2). Show that this rule does not necessarily minimise H.

11.2 Gradient of average reward. Derive Equation (11.7). An outline of the derivation is given in Section 7.4 of Hertz, Krogh, and Palmer [1].

11.3 Klopf's self-interested neuron. Klopf's self-interested neuron [190] is a binary stochastic neuron with outputs 0 and 1. Derive a learning rule that is equivalent to Eq. (11.9).

11.4 Associative reward-penalty algorithm. Barto [181] explains how to solve association tasks with linearly dependent inputs using hidden neurons. One of his examples is the XOR problem, Table 11.1. Verify by numerical simulation that the task can be solved by a single stochastic neuron if you embed the input data in four-dimensional input space.

11.5 Three-armed bandit problem. Implement the Q-learning algorithm for the three-armed bandit problem with the reward distributions shown in Fig. 11.8. Analyse the convergence of Q-learning for different values of ε. Discuss the exploitation-exploration dilemma. Illustrate with results of your computer simulations.
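A minimal sketch for this exercise, assuming numpy. The reward distributions are those of Figure 11.8, and since the bandit has a single state the update reduces to the immediate-reward rule (11.26); the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
MEANS, SIGMAS = [7.5, 10.0, 15.0], [2.0, 1.0, 5.0]    # Figure 11.8

def bandit_q_learning(n_rounds=1000, alpha=0.1, eps=0.1):
    Q = np.zeros(3)                                    # one Q-value per slot machine
    for _ in range(n_rounds):
        arm = rng.integers(3) if rng.random() < eps else int(np.argmax(Q))
        r = rng.normal(MEANS[arm], SIGMAS[arm])        # Gaussian reward
        Q[arm] += alpha * (r - Q[arm])                 # cf. Eq. (11.26)
    return Q
```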

11.6 Psychology of rock-paper-scissors. Learn to exploit idiosyncrasies in the strategy of your opponent.

Figure 11.8: Three-armed bandit problem [16]. Three slot machines have the Gaussian reward distributions shown, with means and standard deviations μ_1 = 7.5, σ_1 = 2, μ_2 = 10, σ_2 = 1, and μ_3 = 15, σ_3 = 5.

Suppose that your opponent tends to repeat his action if he won the previous round [191], but changes his action if he lost. Let us say that your opponent randomly chooses between two strategies. With probability p he repeats his action if he won, but chooses one of the other two actions with equal probability if he lost. With probability 1 − p he picks one of the three actions: rock with probability 1 − 2q, and paper or scissors with probability q each (your opponent has a preference for rock, so q < 1/3); this is also the strategy for the first round. Implement the Q-learning algorithm to determine the optimal strategy as a function of q and p after N rounds.

11.7 Tic-tac-toe. Implement the Q-learning algorithm for two agents learning to play tic-tac-toe (Figure 11.6), with reward function r = +1 (win), r = 0 (draw), and r = −1 (lose). Crowley and Siegler [187] described how a perfect player should play to never lose. Their Table 1 summarises how to play given a certain configuration of the board. Determine the Q-table of a perfect player, to verify Table 1 of Crowley and Siegler.

11.8 Different reward function for tic-tac-toe. For the tic-tac-toe problem (Figure 11.6) investigate, with Q-learning, how the optimal strategy depends on the reward function. Determine the optimal strategy for r = +2 (win), r = 0 (draw), r = −1 (lose), and for r = +1 (win), r = −1 (draw, lose).

11.9 Connect four. Implement the Q-learning algorithm for two agents learning to play connect four (Figure 11.9) on a 6 × 6 board. Show that the second player can always achieve a draw against a perfect player [192].

11.10 Eat or save the chocolate. Suppose you get a piece of chocolate every morning.

Figure 11.9: Connect four is a game for two players who take turns dropping their pieces into one of k columns of height l (k = 7 and l = 6 in the Figure). The player first completing a horizontal, vertical, or diagonal row of four pieces wins. The red player started.

Either you save your chocolate for the next day, or you eat all of it during the day, including all pieces you may have saved from previous days. Each day, before you go to bed, you receive a reward: for each piece of chocolate you ate during the day you get +2. If you save the chocolate instead you get +1 for each piece of chocolate in stock. But your brother likes chocolate too, and he searches for your stock while you are asleep. Suppose he finds it with probability p, and eats all the chocolate. What is your strategy to optimise the future reward over N days?

Bibliography

[1] HERTZ, J, KROGH,A&PALMER,R 1991 Introduction to the Theory of Neural Computation. Addison-Wesley.

[2] HAYKIN,S 1999 Neural Networks: a comprehensive foundation, 2nd edn. New Jersey: Prentice Hall.

[3] HORNER,H, Neuronale Netze, www.tphys.uni-heidelberg.de/~horner, [Last accessed 8-November-2018].

[4] GOODFELLOW, I. J, BENGIO,Y&COURVILLE,A, Deep learning, www.deeplearningbook.org, [Last accessed 5-September-2018].

[5] NIELSEN, M, Neural networks and deep learning, http://neuralnetworksanddeeplearning.com, [Last accessed 13-August-2018].

[6] MCCULLOCH,W&PITTS,W 1943 A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115.

[7] CARNAP,R 1937 The logical syntax of language. London: K. Paul, Trench, Trubner & Co., Limited.

[8] MCCULLOCH,W&PITTS,W 1947 How we know universals: the perception of auditory and visual forms. Bull. Math. Biophys. 9, 127–147.

[9] HEBB, D. O 1949 The organization of behavior: A neuropsychological theory. New York: Wiley.

[10] ROSENBLATT,F 1958 The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Rev. 65, 386.

[11] MINSKY,M&PAPERT,S 1969 Perceptrons. An Introduction to Computational Geometry. MIT Press.

[12] RUMELHART, D. E, HINTON, G. E & WILLIAMS, R. J 1986 Learning internal representations by error propagation. In Parallel distributed processing: explorations in the microstructure of cognition (ed. D. E Rumelhart & J. L McClelland).

[13] HOPFIELD, J. J 1982 Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79 (8), 2554–2558.

[14] HINTON, G. E 2010 Boltzmann machines. In Encyclopedia of Machine Learning (ed. C Sammut & G. I Webb), pp. 132–136. Boston, MA: Springer US.

[15] HINTON, G. E & SEJNOWSKI, T. J 1986 Learning and Relearning in Boltzmann Machines, pp. 282–317. Cambridge, MA, USA: MIT Press.

[16] SUTTON, R. S & BARTO, A. G 2018 Reinforcement Learning: An Introduction, 2nd edn. The MIT Press.

[17] TESAURO,G 1995 Temporal difference learning and TD-Gammon. Communications of the ACM 38, 58–68.

[18] KOHONEN,T 1990 The self-organizing map. Proceedings of the IEEE 78, 1464– 1480.

[19] SENIOR, A. W, EVANS, R, JUMPER,J et al. 2020 Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710.

[20] The Nobel Prize in Physiology or Medicine 1906, www.nobelprize.org, [Last accessed 1-October-2020].

[21] NEWMAN, E. A, ARAQUE,A&DUBINSKY, J. M, ed. 2017 The beautiful brain. The drawings of Santiago Ramón y Cajal. New York: Abrams.

[22] GABBIANI,F&METZNER,W 1999 Encoding and processing of sensory information in neuronal spike trains. Journal of Experimental Biology 202 (10), 1267.

[23] KANAL,L 2001 Perceptrons. In International Encyclopedia of the Social & Behavioral Sciences (ed. N. J Smelser & P.B Baltes), pp. 11218 – 11221. Oxford: Pergamon.

[24] LITTLE,W 1974 The existence of persistent states in the brain. Mathematical Biosciences 19, 101 – 120.

[25] FISCHER,A&IGEL,C 2014 Training restricted Boltzmann machines: An introduction. Pattern Recognition 47 (1), 25–39.

[26] SHERRINGTON, D, Spin glasses: a perspective, arxiv.org/abs/cond-mat/0512425, [Last accessed 5-December-2020].

[27] SHERRINGTON,D&KIRKPATRICK,S 1975 Solvable model of a spin-glass. Phys. Rev. Lett. 35, 1792–1796.

[28] LIPPMANN,R 1987 An introduction to computing with neural nets. IEEE ASSP Magazine 4, 4–22.

[29] MATHEWS,J&WALKER, R. L 1964 Mathematical Methods of Physics. New York: W.A. Benjamin.

[30] FELLER,W 1968 An introduction to probability theory and its applications, 3rd edn. New York: John Wiley & Sons.

[31] WEISSTEIN, E. W, WolframMathWorld - a Wolfram web resource, mathworld.wolfram.com/Erf.html, [Last accessed 17-September-2019].

[32] KADANOFF, L. P, More is the same: phase transitions and mean field theories, arxiv.org/abs/0906.0653, [Last accessed 3-September-2020].

[33] AMIT, D. J, GUTFREUND,H&SOMPOLINSKY,H 1985 Spin-glass models of neural networks. Phys. Rev. A 32, 1007.

[34] AMIT, D. J & GUTFREUND,H 1987 Statistical mechanics of neural networks near saturation. Ann. Phys. 173, 30.

[35] HOPFIELD, J. J 1984 Neurons with graded response have collective computa- tional properties like those of two-state neurons. Proceedings of the National Academy of Sciences 81 (10), 3088–3092.

[36] HINTON, G. E & SEJNOWSKI, T. J 1983 Optimal perceptual inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 444–453.

[37] MÜLLER, B, REINHARDT,J&STRICKLAND, M. T 1999 Neural Networks: An Introduction. Heidelberg: Springer.

[38] GESZTI, T 1990 Physical models of neural networks. World Scientific.

[39] STEFFAN,H&KÜHN,R 1994 Replica symmetry breaking in attractor neural network models. Zeitschrift für Physik B Condensed Matter 95, 249–260.

[40] VOLK,D 1998 On the phase transition of Hopfield networks – another Monte Carlo study. Int. J. Mod. Phys. C 9, 693.

[41] LÖWE,M 1998 On the storage capacity of Hopfield models with correlated patterns. Ann. Prob. 8, 1216.

[42] ENGEL,A&VANDEN BROECK,C 2001 Statistical Mechanics of Learning. Cambridge University Press.

[43] WATKIN, T. L. H, RAU,A&BIEHL,M 1993 The statistical mechanics of learning a rule. Rev. Mod. Phys. 65, 499–556.

[44] KIRKPATRICK, S, GELATT, C. D & VECCHI, M. P 1983 Optimization by simulated annealing. Science 220, 671–680.

[45] HINTON, G. E, Boltzmann machine, www.scholarpedia.org/article/Boltzmann_machine, [Last accessed 21-September-2019].

[46] HINTON, G. E, A practical guide to training restricted Boltzmann machines, www.cs.toronto.edu/~hinton/absps/guideTR.pdf, [Last accessed 18-September-2019].

[47] MACKAY, D. J. C 2003 Information Theory, Inference and Learning Algorithms. New Jersey: Cambridge University Press.

[48] VAN KAMPEN, N. G 2007 Stochastic processes in physics and chemistry. North Holland.

[49] SOKAL,A 1997 Monte Carlo methods in statistical mechanics: Foundations and new algorithms. In Functional Integration: Basics and Applications (ed. C DeWitt-Morette, P Cartier & A Folacci), pp. 131–192. Boston, MA: Springer US.

[50] MEHLIG, B, HEERMANN, D. W & FORREST, B. M 1992 Hybrid Monte Carlo method for condensed-matter systems. Phys. Rev. B 45, 679–685.

[51] METROPOLIS, N, ROSENBLUTH, A. W, ROSENBLUTH, M. N, TELLER, M & TELLER, E 1953 Equation of state calculations by fast computing machines. Journal of Chemical Physics 21, 1087–1092.

[52] BINDER,K, ed. 1986 Monte-Carlo Methods in Statistical Physics, 2nd edn. Berlin: Springer.

[53] PRESS, W. H, TEUKOLSKY, S. A, VETTERLING, W. T & FLANNERY, W. P 1992 Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. New York: Cambridge University Press.

[54] HOPFIELD, J. J & TANK, D. W 1985 Neural computation of decisions in optimisation problems. Biol. Cybern. 52, 141.

[55] WATERMAN, M. S 1995 Introduction to Bioinformatics. Prentice Hall.

[56] LANDER, E, LINTON, L, BIRREN, B et al. 2001 Initial sequencing and analysis of the human genome. Nature 409, 860–921.

[57] SMOLENSKY,P 1987 Information Processing in Dynamical Systems: Founda- tions of Harmony Theory, pp. 194–281. MITP.

[58] LE ROUX,N&BENGIO,Y 2008 Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation 20, 1631–1649.

[59] LE ROUX,N&BENGIO,Y 2010 Deep belief networks are compact universal approximators. Neural Computation 22, 2192–2207.

[60] MONTÚFAR, G. F & AY,N 2011 Refinements of universal approximation re- sults for deep belief networks and restricted Boltzmann machines. Neural Computation 23, 1306–1319.

[61] MONTÚFAR, G. F,RAUH,J&AY,N 2011 Expressive power and approximation errors of restricted Boltzmann machines. In Advances in Neural Information Processing Systems (ed. J Shawe-Taylor, R Zemel, P Bartlett, F Pereira & K. Q Weinberger), , vol. 24, pp. 415–423.

[62] MONTÚFAR, G. F,RAUH,J&AY,N 2013 Maximal information divergence from statistical models defined by neural networks. In Geometric Science of Informa- tion (ed. F Nielsen & F Barbaresco), pp. 759–766. Berlin, Heidelberg: Springer Berlin Heidelberg.

[63] CARLEO,G&TROYER,M 2017 Solving the quantum many-body problem with artificial neural networks. Science 355 (6325), 602–606.

[64] GUBERNATIS, J. E 2005 Marshal Rosenbluth and the Metropolis algorithm. Physics of Plasmas 12, 057303.

[65] MURPHY, K. P 2012 Machine Learning: A Probabilistic Perspective. Cambridge, Massachusetts: MIT Press.

[66] FISCHER,A&IGEL,C 2012 An introduction to restricted Boltzmann machines. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (ed. L Alvarez, M Mejail, L Gomez & J Jacobo), pp. 14–36. Berlin, Heidelberg: Springer Berlin Heidelberg.

[67] BENGIO, Y 2009 Learning deep architectures for AI. Foundations and Trends in Machine Learning 2, 1–127.

[68] DAYAN, P, HINTON, G. E, NEAL, R. M & ZEMEL, R. S 1995 The Helmholtz machine. Neural Computation 7, 889–904.

[69] DAYAN,P&HINTON, G. E 1996 Varieties of Helmholtz machine. Neural Net- works 9, 1385–1403.

[70] DUA,D&GRAFF,C, UCI machine learning repository, archive.ics.uci.edu/ml, [Last accessed 18-August-2018].

[71] FISHER, R. A 1936 The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179.

[72] COVER, T. M 1965 Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Trans. on electronic computers p. 326.

[73] SOMPOLINSKY, H, Introduction: the perceptron, web.mit.edu, [Last accessed 9-October-2018].

[74] SLOANE, N. J. A, Online encyclopedia of integer sequences, oeis.org/A000609, [Last accessed 9-November-2020].

[75] GREUB, W 1981 Linear Algebra. New York: Springer.

[76] LECUN, Y, BOTTOU, L, ORR, G. B & MÜLLER, K.-R 1998 Efficient back prop. In Neural networks: tricks of the trade (ed. G. B Orr & K.-R Müller). Springer.

[77] NESTEROV, Y 1983 A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady 27, 372.

[78] SUTSKEVER,I 2013 Training recurrent neural networks. PhD thesis, University of Toronto.

[79] HORNIK, K, STINCHCOMBE,M&WHITE,H 1989 Neural networks are universal approximators. Neural Networks 2, 359.

[80] LAPEDES, A & FARBER, R 1988 How neural nets work. In Neural Information Processing Systems (ed. D Anderson), pp. 442–456. American Institute of Physics.

[81] CRISANTI, A, VULPIANI,A&PALADIN,G 1993 Products of random matrices in Statistical Physics. Berlin: Springer.

[82] CVITANOVIC, P, ARTUSO, G, MAINIERI, R, TANNER, G & VATTAY, G, Lyapunov exponents, chaosbook.org/chapters/Lyapunov.pdf, [Last accessed 30-September-2018].

[83] ECKMANN, J. P & RUELLE, D 1985 Ergodic theory of chaos and strange attractors. Rev. Mod. Phys. 57, 617–656.

[84] STROGATZ, S. H 2000 Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering. Westview Press.

[85] STORM, L, Unstable gradients in deep neural nets, MSc thesis, Chalmers University of Technology (2020).

[86] PENNINGTON, J, SCHOENHOLZ, S. S & GANGULI, S 2017 Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. In Advances in Neural Information Processing Systems (ed. I Guyon, U. V Luxburg, S Bengio, H Wallach, R Fergus, S Vishwanathan & R Garnett), vol. 30, pp. 4785–4795. Curran Associates, Inc.

[87] SUTSKEVER, I, MARTENS, J, DAHL,G&HINTON, G. E 2013 On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning - Volume 28, pp. III–1139–III– 1147.

[88] GLOROT,X&BENGIO,Y 2010 Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (ed. Y. W Teh & M Titter- ington), Proceedings of Machine Learning Research, vol. 9, pp. 249–256. Chia Laguna Resort, Sardinia, Italy: JMLR Workshop and Conference Proceedings.

[89] SCHOENHOLZ, S. S, GILMER, J, GANGULI,S&SOHL-DICKSTEIN,J, Deep infor- mation propagation, arxiv.org/abs/1611.01232, [Last accessed 5-December- 2020].

[90] GLOROT, X, BORDES,A&BENGIO,Y 2011 Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intel- ligence and Statistics (ed. G Gordon, D Dunson & M Dudík), Proceedings of Machine Learning Research, vol. 15, pp. 315–323. Fort Lauderdale, FL, USA: JMLR Workshop and Conference Proceedings.

[91] HE, K, ZHANG, X, REN, S & SUN, J 2016 Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.

[92] KLEINBAUM, D, KUPPER, L & NIZAM, A 2008 Applied regression analysis and other multivariable methods, 3rd edn. Belmont: Thomson Higher Education.

[93] SRIVASTAVA, N, HINTON, G. E, KRIZHEVSKY, A, SUTSKEVER,I&SALAKHUTDINOV, R 2014 Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (56), 1929–1958.

[94] HANSON,S&PRATT,L 1989 Comparing biases for minimal network construc- tion with back-propagation. In Advances in Neural Information Processing Systems (ed. D Touretzky), , vol. 1, pp. 177–185. Morgan-Kaufmann.

[95] HASSIBI,B&STORK,D 1993 Second order derivatives for network pruning: Optimal brain surgeon. In Advances in Neural Information Processing Systems (ed. S Hanson, J Cowan & C Giles), , vol. 5, pp. 164–171. Morgan-Kaufmann.

[96] LECUN, Y, DENKER,J&SOLLA,S 1990 Optimal brain damage. In Advances in Neural Information Processing Systems (ed. D Touretzky), , vol. 2, pp. 598–605. Morgan-Kaufmann.

[97] FRANKLE,J&CARBIN,M, The lottery ticket hypothesis: Finding small, trainable neural networks, arxiv.org/abs/1803.03635, [Last accessed 5-December-2020].

[98] DENG, J, DONG, W, SOCHER, R, LI, L. J, LI, K & LI, F.F 2009 ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. [Last accessed 3-September-2018].

[99] IOFFE, S & SZEGEDY, C, Batch normalization: Accelerating deep network training by reducing internal covariate shift, arxiv.org/abs/1502.03167, [Last accessed 5-December-2020].

[100] SANTURKAR, S, TSIPRAS, D, ILYAS, A & MADRY, A, How does batch normalization help optimization? (No, it is not about internal covariate shift), arxiv.org/abs/1805.11604, [Last accessed 5-December-2020].

[101] KIRKPATRICK, J, PASCANU, R, RABINOWITZ, N et al. 2017 Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114, 3521–3526.

[102] SETTLES, B, Active learning literature survey, burrsettles.com/pub/settles.activelearning.pdf, [Last accessed 5-December- 2020].

[103] CHOROMANSKA, A, HENAFF, M, MATHIEU, M, AROUS, G. B & LECUN, Y 2015 The loss surfaces of multilayer networks. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics (ed. G Lebanon & S. V. N Vishwanathan), Proceedings of Machine Learning Research, vol. 38, pp. 192–204. San Diego, California, USA: PMLR.

[104] FYODOROV, Y. V 2004 Complexity of random energy landscapes, glass transition, and absolute value of the spectral determinant of random matrices. Phys. Rev. Lett. 92, 240601.

[105] BECKER, S, ZHANG,Y&LEE, A. A 2020 Geometry of energy landscapes and the optimizability of deep neural networks. Phys. Rev. Lett. 124, 108301.

[106] KRIZHEVSKY, A, SUTSKEVER,I&HINTON, G. E 2012 ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (ed. F Pereira, C. J. C Burges, L Bottou & K. Q Weinberger), , vol. 25, pp. 1097–1105. Curran Associates, Inc.

[107] ABADI, M, AGARWAL, A, BARHAM,P et al., TensorFlow: Large-scale machine learning on heterogeneous systems, www.tensorflow.org, [Last accessed 3- September-2018].

[108] LECUN, Y, CORTES,C&BURGES, C. J, The MNIST database of handwritten digits, yann.lecun.com/exdb/mnist, [Last accessed 3-September-2018].

[109] SMITH, L. N 2017 Cyclical learning rates for training neural networks. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464– 472.

[110] Deep learning in MATLAB, se.mathworks.com, [Last accessed 14-January- 2020].

[111] CIREGAN, D, MEIER,U&SCHMIDHUBER,J 2012 Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3642–3649.

[112] PICASSO, J. P, Pre-processing before digit recognition for NN and CNN trained with MNIST dataset, stackoverflow.com, [Last accessed 26-September-2018].

[113] KOZIELSKI, M, FORSTER, J & NEY, H 2012 Moment-based image normalization for handwritten text recognition. In 2012 International Conference on Frontiers in Handwriting Recognition, pp. 256–261.

[114] RUSSAKOVSKY, O, DENG, J, SU, H, KRAUSE, J, SATHEESH, S, MA, S, HUANG, Z, KARPATHY, A, KHOSLA, A, BERNSTEIN, M, BERG, A. C & LI, F.F 2015 ImageNet large scale visual recognition challenge. International Journal of Computer Vision 115, 211–252.

[115] LI, F.F,JOHNSON,J&YEUNG,S, CNN architectures, http://cs231n.stanford.edu, [Last accessed 4-December-2020].

[116] HU, J, SHEN,L&SUN,G 2018 Squeeze-and-excitation networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132– 7141.

[117] SEIF,G, Deep learning for image recognition: why it’s challenging, where we’ve been, and what’s next, towardsdatascience.com, [Last accessed 26-September- 2018].

[118] SZEGEDY, C, WEI LIU, YANGQING JIA, SERMANET, P, REED, S, ANGUELOV, D, ERHAN, D, VANHOUCKE, V & RABINOVICH, A 2015 Going deeper with convolutions. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.

[119] ZENG, X, OUYANG, W, YAN,J et al. 2018 Crafting gbd-net for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (9), 2109– 2123.

[120] HERN,A, Computers now better than humans at recognising and sorting images, www.theguardian.com, [Last accessed 26-September-2018].

[121] KARPATHY,A, What I learned from competing against a convnet on imagenet, karpathy.github.io, [Last accessed 26-September-2018].

[122] KHURSHUDOV, A, Suddenly, a leopard print sofa appears, rocknrollnerd.github.io, [Last accessed 23-August-2018].

[123] GEIRHOS, R, MEDINA TEMME, C. R, RAUBER, J, SCHÜTT, H. H, BETHGE,M& WICHMANN, F.A 2018 Generalisation in humans and deep neural networks. In Advances in Neural Information Processing Systems (ed. S Bengio, H Wallach, H Larochelle, K Grauman, N Cesa-Bianchi & R Garnett), , vol. 31, pp. 7538– 7550. Curran Associates, Inc.

[124] SZEGEDY, C, ZAREMBA, W, SUTSKEVER, I, BRUNA, J, ERHAN, D, GOODFELLOW, I. J & FERGUS, R, Intriguing properties of neural networks, arxiv.org/abs/1312.6199, [Last accessed 5-December-2020].

[125] NGUYEN, A, YOSINSKI, J & CLUNE, J 2015 Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 427–436.

[126] YOSINSKI, J, CLUNE, J, NGUYEN, A, FUCHS,T&LIPSON,H, Understanding neural networks through deep visualization, arxiv.org/abs/1506.06579, [Last accessed 5-December-2020].

[127] GRAETZ, F. M, How to visualize convolutional features in 40 lines of code, https://towardsdatascience.com, [Last accessed 30-December-2020].

[128] DOSOVITSKIY, A, BEYER, L, KOLESNIKOV,A et al., An image is worth 16x16 words: Transformers for image recognition at scale, arxiv.org/abs/2010.11929, [Last accessed 5-December-2020].

[129] KRIZHEVSKY, A, Learning multiple layers of features from tiny images, www.cs.toronto.edu/~kriz, [Last accessed 1-November-2020].

[130] OTT,E 2002 Chaos in Dynamical Systems, 2nd edn. Cambridge University Press.

[131] SUTSKEVER, I, VINYALS,O&LE, Q. V 2014 Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (ed. Z Ghahramani, M Welling, C Cortes, N Lawrence & K. Q Weinberger), , vol. 27, pp. 3104–3112. Curran Associates, Inc.

[132] LIPTON, Z. C, BERKOWITZ,J&ELKAN,C, A critical review of recurrent neural networks for sequence learning, arxiv.org/abs/1506.00019, [Last accessed 5-December-2020].

[133] PASCANU, R, MIKOLOV, T & BENGIO, Y 2013 On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, pp. III–1310–III–1318.

[134] HOCHREITER,S&SCHMIDHUBER,J 1997 Long short-term memory. Neural Computation 9, 1735.

[135] OLAH,C, Understanding lstm networks, colah.github.io, [Online; accessed 30-September-2020].

[136] CHO, K, VAN MERRIËNBOER, B, GULCEHRE, C, BAHDANAU, D, BOUGARES, F, SCHWENK, H & BENGIO, Y 2014 Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734. Doha, Qatar: Association for Computational Linguistics.

[137] HECK, J. C & SALEM, F.M 2017 Simplified minimal gated unit variations for re- current neural networks. In 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1593–1596.

[138] WU, Y, SCHUSTER, M, CHEN, Z et al., Google's neural machine translation system: bridging the gap between human and machine translation, arxiv.org/abs/1609.08144, [Last accessed 5-December-2020].

[139] PAPINENI, K, ROUKOS, S, WARD,T&ZHU, W.-J 2002 BLEU: a method for auto- matic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, p. 311.

[140] LUKOSEVICIUS,M&JAEGER,H 2009 Reservoir computing approaches to re- current neural network training. Computer Science Review 3, 127.

[141] PATHAK, J, HUNT, B, GIRVAN, M, LU,Z&OTT,E 2018 Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Phys. Rev. Lett. 120, 024102.

[142] JAEGER,H&HAAS,H 2004 Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80.

[143] LIM, S. H, GIORGINI, L. T. T, MOON, W & WETTLAUFER, J. S, Predicting critical transitions in multiscale dynamical systems using reservoir computing, arxiv.org/abs/1908.03771, [Last accessed 5-December-2020].

[144] LUKOSEVICIUS,M 2012 A practical guide to applying echo state networks. In Neural Networks: Tricks of the Trade (ed. G Montavon, G Orr & K Müller). Berlin,Heidelberg: Springer.

[145] TANAKA, G, YAMANE, T, HÉROUX, J. B et al. 2019 Recent advances in physical reservoir computing: A review. Neural Networks 115, 100 – 123.

[146] DOYA,K 1993 Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks 1, 75.

[147] WILLIAMS, R. J & ZIPSER,D 1995 Gradient-based learning algorithms for re- current networks and their computational complexity. In Back-propagation: Theory, Architectures and Applications (ed. Y Chauvin & D. E Rumelhart), pp. 433–486. Hillsdale, NJ: Erlbaum.

[148] DOYA, K 1995 Recurrent networks: supervised learning. In The Handbook of Brain Theory and Neural Networks (ed. M. A Arbib), pp. 796–799. Cambridge MA: MIT Press.

[149] KARPATHY,A, The unreasonable effectiveness of recurrent neural networks, karpathy.github.io, [Online; accessed 4-October-2018].

[150] LORENZ, E. N 1963 Deterministic nonperiodic flow. Journal of the Atmospheric Sciences 20, 130–141.

[151] KANTZ,H&SCHREIBER,T, ed. 2004 Nonlinear Time Series Analysis. Cambridge: Cambridge University Press.

[152] OJA,E 1982 A simplified neuron model as a principal component analyzer. J. Math. Biol. 15, 267.

[153] WILKINSON, M, BEZUGLYY,V&MEHLIG,B 2009 Fingerprints of random flows? Phys. Fluids 21, 043304.

[154] WELIKY, M, BOSKING, W. H & FITZPATRICK, D 1996 A systematic map of direction preference in primary visual cortex. Nature 379, 725–728.

[155] KOHONEN,T 2013 Essentials of the self-organizing map. Neural Networks 37, 52 – 65.

[156] KOHONEN, T 1995 Self-Organizing Maps. Berlin: Springer.

[157] MARTIN,R&OBERMAYER,K 2009 Self-organizing maps. In Encyclopedia of Neuroscience (ed. L. R Squire), p. 551. Oxford: Academic Press.

[158] RITTER, H & SCHULTEN, K 1986 On the stationary state of Kohonen's self-organizing sensory mapping. Biological Cybernetics 54, 99–106.

[159] JACKSON, J. D 1999 Classical electrodynamics, 3rd edn. New York, NY: Wiley.

[160] SNYDER, W, NISSMAN, D, VAN DEN BOUT, D & BILBRO, G 1991 Kohonen networks and clustering: Comparative performance in color clustering. In Advances in Neural Information Processing Systems (ed. R. P Lippmann, J Moody & D Touretzky), vol. 3, pp. 984–990. Morgan-Kaufmann.

[161] BOURLARD,H&KAMP,Y 1988 Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics 59, 201.

[162] NG,A, Sparse autoencoder, web.stanford.edu/class/cs294a, [Online; accessed 13-October-2020].

[163] KINGMA, D. P & WELLING, M, Auto-encoding variational Bayes, arxiv.org/abs/1312.6114, [Last accessed 5-December-2020].

[164] DOERSCH,C, Tutorial on variational autoencoders, arxiv.org/abs/1606.05908, [Last accessed 5-December-2020].

[165] JIMENEZ REZENDE, D, MOHAMED, S & WIERSTRA, D 2014 Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning (ed. E. P Xing & T Jebara), Proceedings of Machine Learning Research, vol. 32, pp. 1278–1286. Beijing, China: PMLR.

[166] GOODFELLOW, I. J, POUGET-ABADIE, J, MIRZA, M, XU, B, WARDE-FARLEY, D, OZAIR, S, COURVILLE,A&BENGIO,Y 2014 Generative adversarial nets. In Advances in Neural Information Processing Systems (ed. Z Ghahramani, M Welling, C Cortes, N Lawrence & K. Q Weinberger), , vol. 27, pp. 2672–2680. Curran Associates, Inc.

[167] ROCCA, J, Understanding generative adversarial networks, towardsdatascience.com, [Last accessed 15-October-2020].

[168] SAMPLE, I, What are deepfakes – and how can you spot them?, theguardian.com, [Last accessed 30-September-2020].

[169] WETTSCHERECK,D&DIETTERICH,T 1992 Improving the performance of radial basis function networks by learning center locations. In Advances in Neural Information Processing Systems (ed. J Moody, S Hanson & R. P Lippmann), , vol. 4, pp. 1133–1140. Morgan-Kaufmann.

[170] POGGIO, T & GIROSI, F 1990 Networks for approximation and learning. Proceedings of the IEEE 78 (9), 1481–1497.

[171] BOURLARD,H, Auto-association by multilayer perceptrons and singular value decomposition, publications.idiap.ch/downloads/reports/2000/rr00-16.pdf, [Last accessed 16-October-2020].

[172] POURKAMALI-ANARAKI,F&WAKIN, M. B, The effectiveness of variational au- toencoders for active learning, arxiv:1911.07716, [Last accessed 5-December- 2020].

[173] EDUARDO, S, NAZABAL, A, WILLIAMS, C. K. I & SUTTON,C, Robust varia- tional autoencoders for outlier detection and repair of mixed-type data, arxiv.org/abs/1907.06671, [Last accessed 5-December-2020].

[174] LI, C, GAO, X, LI, Y, PENG, B, LI, X, ZHANG, Y & GAO, J, Optimus: Organizing sentences via pre-trained modeling of a latent space, arxiv.org/abs/2004.04092, [Last accessed 5-December-2020].

[175] WILLIAMS, R. J 1992 Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256.

[176] COLABRESE, S, GUSTAVSSON, K, CELANI,A&BIFERALE,L 2017 Flow navigation by smart microswimmers via reinforcement learning. Phys. Rev. Lett. 118, 158004.

[177] MINSKY,M 1961 Steps toward artificial intelligence. Proceedings of the IRE pp. 8–30.

[178] SUTTON, R. S 1988 Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44.

[179] SILVER, D, HUANG, A, MADDISON, C. J et al. 2016 Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489.

[180] MNIH, V, KAVUKCUOGLU, K, SILVER, D et al. 2015 Human-level control through deep reinforcement learning. Nature 518, 529–533.

[181] BARTO, A. G 1985 Learning by statistical cooperation of self-interested neuron-like computing elements. Hum. Neurobiol. 4, 229–56.

[182] SZEPESVARI, C 2010 Algorithms for reinforcement learning. In Synthesis Lectures on Artificial Intelligence and Machine Learning (ed. R. J Brachmann & T Dietterich). Morgan and Claypool Publishers.

[183] MCCLELLAND, J. L 2015 Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. New Jersey: Prentice Hall, [Online; accessed 19-November-2019].

[184] WATKINS, C. J. C. H 1989 Learning from delayed rewards. PhD thesis, University of Cambridge, [Online; accessed 25-December-2019].

[185] WATKINS, C. J. C. H & DAYAN,P 1992 Q-learning. Machine learning 8, 279–292.

[186] BELLMAN, R. E 1957 Dynamic Programming. Dover Publications.

[187] CROWLEY,K&SIEGLER, R. S 1993 Flexible strategy use in young children’s tic-tac-toe. Cognitive Science 17, 531–561.

[188] CICHOS, F, GUSTAVSSON, K, MEHLIG, B & VOLPE, G 2020 Machine learning for active matter. Nature Machine Intelligence 2, 94–103.

[189] QIU, J, MOUSAVI, N, GUSTAVSSON, K, XU, C, MEHLIG, B & ZHAO, L 2020 Navigation of a micro-swimmer in steady vortical flow: the importance of symmetries.

[190] KLOPF, A. H 1982 The Hedonistic Neuron: Theory of Memory, Learning and Intelligence. Taylor and Francis.

[191] MORGAN, J, How to win at rock-paper-scissors, bbc.com/news/science-environment-27228416, [Last accessed 7-September-2020].

[192] ALLIS, V, A knowledge-based approach of connect-four, Report IR-163, Faculty of Mathematics and Computer Science at the Vrije Universiteit Amsterdam.

Author index

Abadi, M. 154 Carnap, R. 2 Agarwal, A. 154 Celani, A. 220 Allis, V. 239 Chen, Z. 182 Amit, D. J. 30, 41, 42, 47, 48 Cho, K. 178 Anguelov, D. 163 Choromanska, A. 150 Arous, G. Ben 150 Cichos, F.236, 237 Artuso, G.R. 133 Ciregan, D. 159 Ay, N. 70, 72 Clune, J. 165, 167 Colabrese, S. 220 Bahdanau, D. 178 Cortes, C. 156, 159 Barham, P.154 Courville, A. v, 8, 152, 155, 162, 213 Barto, A. G. 4, 220, 221, 223–226, 231, Cover, T. M. 89, 90, 206 237–239 Crisanti, A. 133 Becker, S. 150 Crowley, K. 234, 239 Bellman, R. E. 231 Cvitanovic, P.133 Bengio, Y. v, 8, 72, 135, 136, 152, 155, 162, 176, 178, 186, 213 Dahl, G. 135 Berg, A. C. 160, 162 Dayan, P.72, 231, 237 Berkowitz, J. 173, 180–183, 186, 188 Deng, J. 147, 160, 162 Bernstein, M. 160, 162 Denker, J. 146 Bethge, M. 164 Dietterich, T. 214 Beyer, L. 165 Doersch, C. 211, 214 Dong, W. 147, 160 Bezuglyy, V. 192, 215 Dosovitskiy, A. 165 Biehl, M. 52 Doya, K. 186 Biferale, L. 220 Dua, D. 78, 79 Bilbro, G. 203 Birren, B. 60 Eckmann, J. P.133 Bordes, A. 135, 136 Eduardo, S. 214 Bosking, W. H. 198 Elkan, C. 173, 180–183, 186, 188 Bottou, L. 109, 110, 117 Engel, A. 52 Bougares, F.178 Erban, D. 164 Bourlard, H. 211, 214 Erhan, D. 163 Bruna, J. 164 Evans, R. 4 Burges, C. J.C. 156, 159 Farber, R. 124, 125 Carbin, M. 146 Feller, W. 24, 38, 43 Carleo, G. 71 Fergus, R. 164 258 BIBLIOGRAPHY

Fischer, A. 14, 54, 69, 71, 72 Henaff, M. 150 Fisher, R. A. 78 Hern, A. 163, 164 Fitzpatrick, D. 198 Héroux, J. B. 186 Flannery, W. P.60, 96 Hertz, J. v, 5, 30, 40, 41, 43, 46, 47, 50, 51, Forrest, B. M. 57 60, 79, 101, 104, 117, 124, 126, 171, 186, Forster, J. 159 191, 195, 196, 199, 200, 206, 207, 214, Frankle, J. 146 238 Fuchs, T. 165, 167 Hinton, G. E 152, 156, 162 Fyodorov, Y. V. 150 Hochreiter, S. 178 Hopfield, J. J. 3, 14, 16, 22, 27, 32, 60 Gabbiani, F.6, 7 Horner, H. v, 91 Ganguli, S. 134, 135 Hornik, K. 123 Gao, J. 214 Hu, J. 162, 163 Gao, X. 214 Huang, A. 222, 237 Geirhos, R. 164 Huang, Z. 160, 162 Gelatt, C. D. 54, 60 Hunt, B. 183, 185 Geszti, T. 43 Gilmer, J. 135 Igel, C. 14, 54, 69, 71, 72 Giorgini, L. T. T. 183–185 Ilyas, A. 148 Girosi, F.214 Ioffe, S. 148 Girvan, M. 183, 185 Glorot, X. 135, 136 Jackson, J. D. 203 Goodfellow, I. J. v, 8, 152, 155, 162, 164, Jaeger, H. 183, 184, 186 213 Jimenez Rezende, D. 211, 213 Graetz, F.M. 165, 167 Johnson, J. 162 Graff, C. 78, 79 Jumper, J. 4 Greub, W. 107 Kadanoff, L. P.28, 48, 57 Gubernatis, J. E. 71 Kamp, Y. 211, 214 Gulcehre, C. 178 Kanal, L.N. 12, 96 Gustavsson, K. 220, 236, 237 Karpathy, A. 160, 162, 163, 186 Gutfreund, H. 30, 41, 42, 47, 48 Kavukcuoglu, K. 222, 236 Haas, H. 183 Khosla, A. 160, 162 Hanson, S. 143 Khurshudov, A. 164 Hassibi, B. 144, 146 Kingma, D. P.211 Haykin, S. v, 15, 22, 72, 117, 126, 176, 186, Kirkpatrick, J. 149 199, 203, 206–208, 214 Kirkpatrick, S. 14, 30, 54, 60 He, K. 136, 139, 162, 163 Kleinbaum, D. 141, 142 Hebb, D. O. 2, 14, 18 Klopf, A. H. 238 Heck, J. C. 180 Kohonen, T. 4, 198–200, 214 Heermann, D. W. 57 Kolesnikov, A. 165 BIBLIOGRAPHY 259

Kozielski, M. 159 Martin, R. 198 Krause, J. 160, 162 Mathews, J. 20, 24, 38, 43, 212 Krizhevsky, A. 143, 147, 152, 156, 162, Mathieu, M. 150 166, 167 McClelland, J. L. 228 Krogh, A. v, 5, 30, 40, 41, 43, 46, 47, 50, 51, McCulloch, W.S. 2, 6, 11 60, 79, 101, 104, 117, 124, 126, 171, 186, Medina Temme, C. R. 164 191, 195, 196, 199, 200, 206, 207, 214, Mehlig, B. 57, 192, 215, 236, 237 238 Meier, U. 159 Kühn, R. 49 Metropolis, N. 57, 58 Kupper, L. 141, 142 Metzner, W. 6, 7 Mikolov, T. 176, 186 Lander, E. 60 Minsky, M. 3, 12, 84, 93, 221 Lapedes, A. 124, 125 Mirza, M. 213 Le, Q. V. 173, 180, 181 Mnih, V. 222, 236 Le Roux, N. 70 LeCun, Y. 109, 110, 117, 146, 150, 156, Mohamed, S. 211, 213 159 Montúfar, G. F.70, 72 Lee, A. A. 150 Moon, W. 183–185 Li, C. 214 Morgan, J. 239 Li, F.F.147, 160, 162 Mousavi, N. 237 Li, K. 147, 160 Müller, B. 39, 52 Li, L. J. 147, 160 Müller, K.-R. 109, 110, 117 Li, X. 214 Murphy, K. P.72, 142 Li, Y. 214 Nazabal, A. 214 Lim, S. H. 183–185 Neal, R. M. 72 Linton, L. 60 Nesterov, Y. 116, 118 Lippmann, R. 15 Lipson, H. 165, 167 Ney, H. 159 Lipton, Z. C. 173, 180–183, 186, 188 Ng, A. 211 Little, W.A. 14, 16 Nguyen, A. 165, 167 Lorenz, E. N. 188 Nielsen, M. v, 157, 165 Löwe, M. 50 Nissman, D. 203 Lu, Z. 183, 185 Nizam, A. 141, 142 Lukosevicius, M. 183–186 Obermayer, K. 198 Ma, S. 160, 162 Oja, E. 192 MacKay, D. J. C. 54, 66, 69, 71, 73 Olah, C. 178, 186 Maddison, C. J. 222, 237 Orr, G. B. 109, 110, 117 Madry, A. 148 Ott, E. 170, 183, 185 Mainieri, R. 133 Ouyang, W. 163 Martens, J. 135 Ozair, S. 213 260 BIBLIOGRAPHY

Paladin, G. 133 Sample, I. 214 Palmer, R.G. v, 5, 30, 40, 41, 43, 46, 47, 50, Santurkar, S. 148 51, 60, 79, 101, 104, 117, 124, 126, 171, Satheesh, S. 160, 162 186, 191, 195, 196, 199, 200, 206, 207, Schmidhuber, J. 178 214, 238 Schoenholz, S. S. 134, 135 Papert, S. 3, 12, 84, 93 Schulten, K. 200, 216 Papineni, K. 183 Schuster, M. 182 Pascanu, R. 149, 176, 186 Schütt, H. H. 164 Pathak, J. 183, 185 Schwenk, H. 178 Peng, B. 214 Seif, G. 161, 162 Pennington, J. 134 Sejnowski, T. J. 3, 35, 54, 73 Picasso, J. P.159 Senior, A. W. 4 Pitts, W. 2, 6, 11 Sermanet, P.163 Poggio, T. 214 Settles, Burr 149 Pouget-Abadie, J. 213 Shen, L. 162, 163 Pourkamali-Anaraki, F.214 Sherrington, D. 14, 30 Pratt, L. 143 Siegler, R. S. 234, 239 Press, W. H. 60, 96 Silver, D. 222, 236, 237 Sloane, N. J. A. 101 Qiu, J. 237 Smith, L. N. 157 Rabinovich, A. 163 Smolensky, P.66 Rabinowitz, N. 149 Snyder, W. 203 Rau, A. 52 Socher, R. 147, 160 Rauber, J. 164 Sohl-Dickstein, J. 135 Rauh, J. 70 Sokal, A. 56, 58, 71 Reed, S. 163 Solla, S. 146 Reinhardt, J. 39, 52 Sompolinsky, H. 30, 41, 42, 96 Ren, S. 136, 139, 162, 163 Srivastava, N. 143, 147 Ritter, H. 200, 216 Steffan, H. 49 Rocca, J. 213 Stinchcombe, M. 123 Rosenblatt, F.3, 78, 79, 85 Stork, D. 144, 146 Rosenbluth, A. W. 57, 58 Storm, L. 133 Rosenbluth, M. N. 57, 58 Strickland, M. T. 39, 52 Roukos, S. 183 Strogatz, S. H. 133, 172, 184, 194 Ruelle, D. 133 Su, H. 160, 162 Rumelhart, D. E. 3, 117, 118 Sun, G. 162, 163 Russakovsky, O. 160, 162 Sun, J. 136, 139, 162, 163 Sutskever, I. 117, 118, 135, 143, 147, 152, Salakhutdinov, R. 143, 147 156, 162, 164, 173, 180, 181 Salem, F.M. 180 Sutton, C. 214 BIBLIOGRAPHY 261

Sutton, R. S. 4, 220–222, 227, 231, 237, Watkin, T. L. H. 52 239 Watkins, C. J. C. H. 231, 237 Szegedy, C. 148, 164 Wei Liu 163 Szepesvari, C. 228, 231, 237 Weisstein, E. W. 26 Weliky, M. 198 Tanaka, G. 186 Welling, M. 211 Tank, D. W. 60 Wettlaufer, J. S. 183–185 Tanner, G. 133 Wettschereck, D. 214 Teller, E. 57, 58 White, H. 123 Teller, M. 57, 58 Wichmann, F.A. 164 Tesauro, G. 4, 228 Wierstra, D. 211, 213 Teukolsky, S. A. 60, 96 Wilkinson, M. 192, 215 Troyer, M. 71 Williams, C. K. I. 214 Tsipras, D. 148 Williams, R. J. 3, 117, 118, 186, 220, 225 Van den Bout, D. 203 Wu, Y. 182 Van den Broeck, C. 52 Xu, B. 213 Van Kampen, N. G. 56 Xu, C. 237 van Merriënboer, B. 178 Vanhoucke, V. 163 Yamane, T. 186 Vattay, G. 133 Yan, J. 163 Vecchi, M. P.54, 60 Yangqing Jia 163 Vetterling, W. T. 60, 96 Yeung, S. 162 Vinyals, O. 173, 180, 181 Yosinski, J. 165, 167 Volk, D 50 Volpe, G. 236, 237 Zaremba, W. 164 Vulpiani, A. 133 Zemel, R. S. 72 Zeng, X. 163 Wakin, M. B. 214 Zhang, X. 136, 139, 162, 163 Walker, R. L. 20, 24, 38, 43, 212 Zhang, Y. 150, 214 Ward, T. 183 Zhao, L. 237 Warde-Farley, D. 213 Zhu, W.-J. 183 Waterman, M. S. 60, 61 Zipser, D. 186


Subject index

acceptance probability, 56 gation action, 229 stochastic, see stochastic backpropa- greedy, 222 gation suboptimal, 222 through time, 169, 177, 173–178 activation function, 9, 32, 119, 173 bars and stripes data set, 69, 69, 73, 167 derivative of, 108 basis linear, 85, 124, 207 function, 125 piecewise linear,9, 10 orthonormal, 111, 194 ReLU, see ReLU function batch saturation, 108, 142 learning, 206 sigmoid, 108, 114, 124, 125, 136, 139, batch normalisation, 135, 142, 148, 148 141, 148, 150, 165, 179, 208, 211 batch training, 105 tanh, 108, 126, 139, 150, 173, 185 Bernoulli trial, 24 active learning, 149, 214 bias, see also threshold,8 adversarial images, 164 binary threshold unit,2, 7, 11, 81, 83, 89, agent, 220 166 annealing, simulated,4, 54, 60 binomial distribution, 24 annotation, 161, 162 bit, binary, 15 argmax, 222 Boltzmann distribution, 49, 54, 54, 56–58, association task, temporal, 169 60, 62, 68 associative reward penalty algorithm, 221, Boltzmann machine,3, 14, 54, 62–71, 168, 223–226, 231, 237 211 attraction, region of, 17 restricted, see restricted Boltzmann attractor, 17, 18, 21–23, 29, 35 machine autoencoder, 14, 210, 209–213, 216 Boolean function, 83, 84, 88, 107, 126, average 127, 142, 150, 216 time, see time average AND, 83 weighted,8 parity, see parity function XNOR, 84 backpropagation,3, 104, 102–105, 168, XOR, see XOR function 228 bottleneck, 136, 210, 211 recurrent, see recurrent backpropa- bridge, 236 264 SUBJECTINDEX catastrophic forgetting, 149 order parameter, 37, 40, 41 categorical outcome, 142 phase, 131, 199 Cauchy Green matrix, 133 proof, 17, 225, 226 central limit theorem, 24, 38, 43, 132, 134 Q learning, 230–232, 237, 238 chain rule, 103, 132, 225 rate, 232 chaos theory, 133 recurrent backpropagation, 172 CIFAR 10 data set, 166, 167 recurrent network, 170 classification SARSA, 230 accuracy, 114, 157–159 slow, 226, 233 error, see classification error spurious state, 30 problem, binary, 93, 96, 184, 191, 211, stochastic dynamics, 54, 61, 68 216 to attractor, 35 task, 51, 78, 123, 161 training, 78 classification error, 113, 113–115, 141, 156, convolution, 153 159, 162, 163, 166, 213, 216 convolution layer, 152, 153, 153 cluster,1, 190, 191, 195–197, 203–206, 208, convolutional network,3, 128, 152, 165, 213 237 colour channel, 154 cortex combinatorial optimisation, 59, 60 cerebral,5,6,9, 198 competitive learning, 196, 195–197, 199, visual, 198 200, 205, 208, 215 cost function, 224 complete enumeration, 59 covariance, 23 configuration space, 16, 29, 35, 61 covariance matrix, 111, 111, 118, 193, 195, connections, symmetric, see weight ma- 215 trix, symmetric covariate shift, 110, 148 contingency space, 223, 224 credit assignment problem, 221 continuum limit, 200, 200 cross entropy, 142, 150 convergence,2, 11, 14, 17, 17, 19, 27, 27, cross talk term, 23, 22–24, 27, 29, 31, 41– 28, 33 43, 50, 52, 61 accelerated, 116, 117 cross validation, 112–164 CD k, 71 curse of dimensionality, 231 criterion, 17, 36 cycle, 222 energy function, 27 gradient descent, 88 data set augmentation, see also training Hopfield network, 51 set, expansion of, 147, 149, 162, inverted pattern, 30 190 learning rule, 65 data, synthetic, 190 Markov chain, 56 decision boundary, 82, 81–85, 89, 91–164 mixed state, 30, 42 piecewise linear, 92 Monte Carlo sampling, 58 decision process, sequential, 220 SUBJECTINDEX 265 decoder, 210, 210–212 linearised, 185 deep fake, 214 stochastic, 35, 35–55, 60, 61 deep learning,8, 72, 79, 117, 123–151, synchronous, see update rule, syn- 160, 165 chronous deep network,3, 
108, 128, 130, 133, 135, unstable, 185 141, 142, 182, 222, 228 delta rule, 104 early stopping, 112 detailed balance, 56 eigenvalue, 111, 111, 112, 118, 133, 193– deterministic limit, 36, 41, 46, 47, 51, 52 195 difference, temporal, 227 maximal, 112, 194, 195 differential equation, stochastic, 232 non negative, 133 digest, 60 eigenvector, 111, 111, 112, 133, 193–195, dimensionality reduction,4, 203, 211 215 dimensionality, curse of, see curse of di- leading, 111, 197 mensionality elastic net, 200, 201 diminishing returns, 159 embedding, 90, 91, 206, 226, 238 direction dimension, 209 maximal eigenvalue, 195 encoder, 210, 210–212 distance, 196, 197, 208 energy function, 27–30, 54, 55, 57, 59–61, between patterns, 15, 16, 19, 33 80, 87, 88, 102, 109, 113–115, 139– Euclidean, see Euclidean distance 142, 157, 169, 170, 174, 183, 187, from the boundary, 215 204, 210, 211, 225, 227 Hamming, see Hamming distance cliff, 109, 119 in output array, 199, 199 invariance, 30 to surface, 221 energy landscape, 60, 62, 148–150 distribution enumeration, complete, 233 Bernoulli, 142 environment, 229 Boltzmann, see Boltzmann distribu- deterministic, 222 tion non-stationary, 222 log normal, 132 stochastic, 222 steady state, 49, 56, 65 episode, 221, 221, 226, 228, 231–233 DNA segment, 60 length, 221 DNA sequence, 60 epoch, 105, 107, 110, 130, 131, 157 double digest problem, 60 error, 104, 130–134, 137, 171, 172, 175 drop out, 142, 147–149, 162 avalanche, 47, 47 duality, 172 backpropagation, 104, 138, 172, 174, dynamical system, 186 176 dynamics classification, see classification error asynchronous, see update rule, asyn- distribution, 132, 135 chronous dynamics, 133 266 SUBJECTINDEX

function, 26 generalisation,1, 78, 143 output, see output error generative adversarial networks, 209 probability, see error probability generative model, 71, 72, 211, 213 squared, 141 GPU, 162 variance, 134 gradient error probability, 24–27, 41, 42, 46–48 accelerated, Nesterov, 116 Euclidean distance, 199 exploding, see exploding gradient evolution, genetic, 231 unstable, see unstable gradient exploding gradient, 128–135, 149, 176 vanishing, see vanishing gradient exploitation, 220 gradient ascent, 63, 221, 225 exploration, 220 gradient descent, 80, 85–88, 94, 102, 143, 180, 227 factor, weighting, 227, 228 stochastic, 78, 137 familiarity,1, 190, 192 feature map, 128, 152, 152, 153, 166 Hamiltonian, see also energy function, 28, feed forward network, 79, 103, 168 57 feedback,2, 168, 169, 170, 172, 174, 187, Hamming distance, 33 220, 226 Heaviside function, 32, 97, 99, 166 field Hebb’srule, 14, 16, 18, 18, 22, 66, 191, 192, local, see local field 197 receptive, see receptive field Helmholtz machine, 14, 72, 214 filter, 152 hidden layer,3, 78, 79, 88, 91–95, 103– fingerprint, 60 108, 123–128, 156–159, 206, 209– fluctuations, 226 213, 219, 227 free energy, 40 hidden neuron,3, 14, 54, 54, 62, 66, 70, function 78, 79, 93, 94, 102, 109, 117, 126– activation, see activation function 127, 135, 144, 146–148, 152, 153, approximation, 124 156–159, 168, 169, 173, 174, 176, basis, see basis function 178, 182, 183, 206, 227 Boolean, see Boolean function homogeneously linearly separable, 89 continuous, 126 Hopfield network,3, 18, 14–54, 62, 168 energy, see energy function human genome sequence, 60 loss, see loss function Lyapunov, see Lyapunov, function image classification, 160 neighbourhood, see neighbourhood imagenet, 147, 152, 156, 160–164 function importance sampling, 58 Q function, see Q learning inertia, 115 radial basis, see radial basis function initialisation ReLU, see ReLU function of weights, 185 optimistic of Q table, 230 gate, 179 input,1, 109 SUBJECTINDEX 267

array of input terminals, 152 K means clustering,4, 203 clamp, 69, 70 kernel, 152, 153 linearly separable, see linear separa- Kronecker delta, 23, 64, 103 bility Kullback Leibler divergence, 63, 211 pattern, see input pattern plane, 81 label, see also target,1, 71, 78, 78, 190, preprocessing, 108–112, 156 203 scaling, 109 Lagrange multiplier, 112, 145, 211 sequential, 173 Lagrangian, 119, 145 space, see input space singular point, 119 latent variable, 210–212, 214, 216, 219 terminal, see input terminal random, 213 input distribution, 54, 62, 66, 71, 113, 149, layer 164, 192, 197, 200–203, 214, 215 bottleneck, 210 input pattern,1,3, 14, 62, 63, 65, 78, 80, convolution, see convolution layer 87, 95, 102, 107, 126, 135, 169, fully connected, 131, 153, 155, 158 190, 209 hidden, see hidden layer binary, 54, 62, 63, 70 pooling, see pooling layer cluster, 204 learning distribution, 160, 191, 192, 195, 200, active, see active learning 203 approximation problem, 123, 126 encoding, 210 batch, see batch learning familiarity, 190 competitive, see competitive learn- high dimensional, 197 ing linearly dependent, 87 curve, 226, 233, 236 linearly independent, 226 deep, see deep learning linearly separable, see linear separa- episode, see episode bility goal, 220 normalisation, 148 rate, see learning rate normalised, 196 reinforcement, see reinforcement learn- random, 134 ing shuffle, 110 rule, see learning rule similarity, 190 supervised, see supervised learning input space, 83, 88–91, 99, 111, 164, 197, temporal difference, 221, 226 198, 200, 201, 203, 205–208, 226, unsupervised, see unsupervised learn- 238 ing input terminal, 79, 79, 80, 94, 105, 110, learning rate, 63, 65, 70, 85, 87, 103, 107, 122, 152, 173, 174, 183, 187 115, 117, 119, 157, 231, 235 instability, 128 learning rule,2,3, 14, 18, 65, 67, 85, 88, iris data set, 78, 216 101, 150, 171, 175, 187, 190, 193, 268 SUBJECTINDEX

198, 224, 226 semantic, 197 adaptive, 115 topographic, 197 competitive, see competitive learn- Markov chain,4, 56, 231, 237 ing Markov chain Monte Carlo,4, 14, 54, 57– Hebb’s, see Hebb’s rule 71 SARSA, 229 global move, 56, 58 TD(0), 228 local move, 56, 57, 58 TD(λ), 228 matrix likelihood, 63, 141 correlation, 197 linear separability, 83, 206 covariance, see covariance matrix local connection, 94 Hessian, 144, 146 local field, 8,9, 23, 33, 35, 36, 38, 49, 55, idempotent, 20, 21 80, 103, 107–109, 126, 130, 134, inverse, 87, 145, 171 136, 137, 139, 142, 155, 174, 223, multiplication, 19 225 notation, 16 time averaged, 38 overlap, see overlap matrix local minimum, 30, 35, 41, 51, 59, 105, projection, 20, 112 144, 149, 205 random, 132 log likelihood, 141, 169 transpose, 107 gradient, 64 weight, see weight, matrix long short term memory, 178, 181 McCulloch Pitts neuron,2,6–9, 11, 16, 18, lookup table, 222 51, 78, 79, 101 Lorenz system, 188 mean field, 38 loss function, see also energy function, 28 mean field approximation, see mean field LSTM, see long short term memory theory Lyapunov mean field equation, 39, 39, 40, 42, 43 exponent, 133, 134, 185 mean field theory, 38–42, 49, 50, 52, 134 function, 28, 54 membrane potential,9 vector, 133 memory, dynamical, 184 Metropolis algorithm, 57 machine learning repository, 78 microorganism, 231 machine translation,1, 173, 181 mini batch, 107, 107, 110, 147, 148 bilingual evaluation understudy, 183 minimum code, 180 local, see local minimum end of sentence tag, 181, 182 spurious, see spurious minimum sentence,1, 181–183 MNIST data set, 114, 131, 156, 156–160, map 166, 203, 204 non linear, 206, 207 momentum, 115, 116–118, 157 restriction, see restriction, map constant, 115, 116, 157 self organising, 198, 201 Monte Carlo, see Markov chain Monte SUBJECTINDEX 269

Carlo object recognition,1, 147, 149, 152, 155, multi layer perceptron, 149 160–163 observable, 58 N armed bandit problem, 220 Oja’s rule, 192, 191–195, 197, 215 neighbourhood function, 199, 199, 201, Oja’s M rule, 195 202, 205 order disorder transition, 35 network, 108, 128, 130, 133, 135, 141, 142 order parameter, 37, 37–43, 47–49, 51 convolutional, see convolutional net- mixed state, 41, 52 work ordering phase, 199 deep, see deep network output error,3, 78, 103, 104, 110, 114, 136, deep belief, 14 140, 149, 169, 174, 183 dynamics, 17, 37, 47, 55, 168, 172, output neuron, 71, 79, 80, 84, 93, 94, 96, 185 97, 105, 113, 122, 126, 127, 139, ensemble of networks, 159, 163 140, 150, 156, 166, 168, 169, 173, feed forward, see feed forward net- 174, 176, 184, 187, 195, 199, 200, work 207, 209, 213, 214 Hopfield, see Hopfield network output space, 197, 199, 201 layered,3, 79, 104, 228 output unit, see also output neuron, 66, recurrent, see recurrent network 170 residual, see residual network overfitting, 112, 112, 113, 117, 142–144, neuron,6–11 147, 152, 155, 157, 162 active,2,6, 135, 136 overlap matrix, 50, 87 axon,6,7 bottleneck, 211 padding, 154, 155 cell body,6,7 parity function, 127 dendrite,6,7 partition function, 49, 57 firing,2, 10 pattern,1, 14, 15 hidden, see hidden neuron bit, see bit inactive,6 correlations, 50, 54, 134 leaky integrate and fire,9 distorted, 15, 15, 17, 19, 21, 23 McCulloch Pitts, see McCulloch Pitts input, see input pattern neuron inverse, 29 pyramidal,6 inverted, 22, 30, 40 state,7 linearly dependent, see input pat- stochastic, 223, 223, 238 tern, linearly dependent, 51 synapse,6,7 linearly independent, see input pat- visible, 54 tern, linearly independent winning, see winning neuron linearly separable, see linear separa- noise level, 35, 36, 41, 42, 48, 54, 60, 223 bility critical, 40, 41 orthogonal, 27, 31, 50 270 SUBJECTINDEX

overlap, 50 Schur, 107 random, 15, 23, 37, 41–44, 49, 50, 134 projection, see also matrix, projection, recognition, 14–16, 31 112 retrieval, 14, 30 propulsion, 231 second order statistics, see also two protein folding,4 point correlations, 22 pruning, 142, 143, 143, 147, 149, 150 spatio temporal, 185 iterative, 146 stored, 14, 15, 17, 19, 21–23, 26, 29– psychology, behavioural, 220 31, 36, 37, 41, 42, 48, 51, 52 Q learning, 222, 230, 230–237 two point correlations, 22, 54, 65, 66 Q table, 222, 222, 229, 234 uncorrelated, 23 symmetry, 237 undistorted, 17 penalty, 220 radial basis function, 207, 206–209 perceptron,3, 78, 79, 168 normalised, 207 phase transition, 48, 51 rate policy, 222, 229 firing, 10 " greedy, 222, 230–234 learning, see learning rate deterministic, 222 of convergence, see convergence, rate greedy, 222, 230 receptive field, 152, 153, 153, 165 softmax, 222 rectified linear unit, 11, 135–136 stochastic, 222 recurrent backpropagation, 172, 170–173, pooling layer, 153, 155–156, 158, 165, 166 186 L2 pooling layer, 155 recurrent network,1, 14, 168, 168, 183 max pooling layer, 155 bidirectional, 183 predation, 231 redundancy, 190 prediction, one step, 228 regression principal component,4, 110–112, 118, logistic, binary, 141 192, 193, 195, 211, 213, 215 multinomial, 142 principal manifold, 197, 197, 200, 215 multivariate, 141 probability regularisation, 142, 142–143, 162 of acceptance, see acceptance proba- batch normalisation, see batch nor- bility malisation of transition, see transition probabil- drop out, see drop out ity expansion of training set, see training product set, expansion of Hadamard, see also product, Schur, L1 regularisation, 136, 143, 211 107 L2 regularisation, 143, 158, 211 matrix product, 185 max norm regularisation, 143, 147 outer, 20 pruning, see pruning scalar, see scalar product weight decay, see weight decay SUBJECTINDEX 271 reinforcement, 220 distribution, 220, 223, 224, 232 signal, 220, 221, 226 expected, 221 reinforcement learning,2,4, 190, 220 function, 220, 236 associative task, 220, 232 future, 221, 226 continuous task, 221, 226 immediate, 221, 223–225, 232 deep, 236 maximal, 223 episodic task, 221, 226 probability, 224, 224, 232 non associative task, 220 stochastic, 223 sequential, 229 robotics, 236 rejection sampling, 59 rule relaxation, exponential, 232 associative reward penalty, 225 ReLU, see rectified linear unit deterministic, 36 ReLU function, 11, 135–136, 162 Hebb’s, see Hebb’s rule replica, 49 learning, see learning rule symmetry, 49 McCulloch Pitts, see McCulloch Pitts symmetry, breaking of, 49 neuron trick, 49, 52 Oja’s, see Oja’s rule representation udpate, see update rule high dimensional, 184 nonlinear, 184 scalar product, 20, 81 sparse, 135, 184 self averaging, 37 reservoir sequence, memoryless, 56 multiple layers, 185 shifter ensemble, 73 multiple reservoirs, 185 shuffle inputs, 107 sparse, 185 similarity, 190 reservoir computing, 183, 183–186, 188 simulated annealing, 60 residual network, 139, 135–178 singular value, 185 response curve,9, 10 slowing down, 41 restricted Boltzmann machine, 54, 67, 66– softmax, 139, 173, 182 71 solution energy function, 67 degenerate, 61 restriction stable, 40 enzyme, 60, 61 unstable, 40 fragment, 60, 61 sparse, 184, 211 fragment length, 60, 61 speech recognition, 173 fragment set, 73 spike, 10 map, 60 spike train,1 site, 60 , 14, 35 reward, 220 spurious minimum, 35, 36 discounted, 227 spurious state, 30 272 SUBJECTINDEX stability, linear, 172, 185, 193, 215 232 state function, 123, 124 active,7 random, 
89 correlations, 58 vector, 80, 102 inactive,7 telescoping sum, 222 mixed, 29, 30, 41, 42, 52 temperature, 57, 59–62 neuron, see neuron, state tensor, 154 space, 72, 234 test data set, 156 spin glass, 30 test set, 156, 158, 166 state action pair, 229, 232 threshold, 8,8, 10, 17, 29, 32, 38, 49, 55, steady, see steady state 63, 67, 69, 71, 73, 78–80, 82, 82– value, 105 84, 88, 89, 93, 94, 104, 107, 126, vector, 16, 54, 229 131, 140, 141, 143, 153–155, 168, steady state, 31, 36, 38, 41–43, 51, 56, 78, 176, 179, 188, 206, 207, 210, 212, 169–171, 192, 193, 200, 203, 215, 216 233, 235 increments, 105 stimulus, 198, 220, 223, 224 initial, 108, 134, 146 cognitive, 198 threshold unit, see binary threshold unit sensory, 198 tic tac toe, 222, 234 visual, 198 time average, 36–39 stochastic backpropagation, 213 time constant, 168 stochastic gradient descent, 105, 110, 174 time correlation, 176, 180, 185 stochastic path, 105 time series,6–8, 183, 187 stochastic policy, 222 prediction of, 185, 228 storage capacity, 26, 42, 47 time series prediction, 188 critical, 48, 49, 51, 52 top 5 error, 161 strategy, 222, 235 training, see also learning, 54, 66, 72, 73, stride, 153, 155 95, 110, 113, 132, 143, 146–148, superposition, 30, 52 157, 180, 185, 220, 230, 236 supervised learning,1, 54, 78, 112, 168, algorithm, 62, 65, 78, 102, 107, 147, 190, 214, 220 182, 184 symmetry, 40, 49, 237 autoencoder, 218 point group, 96 batch, see batch training replica, see replica, symmetry deep networks, 71, 116, 135, 147, 148 system, dynamical, see dynamical sys- distribution, 164 tem early stages of, 131 energy, 113, 157, 158 target,1, 71, 78, 80, 81, 83, 87, 90, 92, 93, epoch, 130, 131 95, 113–115, 119, 139, 141, 142, error, 114, 146 168–170, 173, 183, 185, 187, 190, instability, 128, 132 SUBJECTINDEX 273

recurrent networks, 176, 178, 186 nested, 185 sequential, 105, 110, 119 stochastic, 35, 36, 223 set, see training set synchronous, 11, 16, 19, 29 slow down, 131, 142, 149 typewriter scheme, 11 speed up, 136 validation training set,1, 78, 80, 95, 107, 110, 112, cross, see cross validation 112, 113, 124, 155, 156, 162, 164, energy, 113, 114, 157 169, 172, 173, 203, 214 error, 114 expansion of, 142, 147, 147–148 set, 95, 112, 113, 114, 156, 157, 160, image net, 160 164 MNIST, 157 vanishing gradient, 88, 128, 131, 128–136, size of, 163 148, 149, 162, 170, 176, 178–180 transient, 37, 65, 183 variable, latent, see latent variable transition probability, 56, 56, 57, 72 vector translation invariance, 153, 155, 164 column, 15, 19 notation, 15 unbiased estimator, 225 row, 19 undirected bipartite graph, 66 target, see target, vector unfolding in time, 173, 182 transpose, 19 unit gated recurrent, 178, 179 weight, 8, 14, 17, 71, 78, 80, 84, 88, 93, 107, input, see input terminal 212 linear, 87, 87, 88, 90, 97, 124, 183, 186, asymmetric weights, 32, 88 191, 195, 196, 207, 218, 227 decay, 142–143, 146, 149 pooling, see pooling layer diagonal, 22, 24, 28, 29, 33, 49, 54, 55, rectified linear, see rectified linear unit 60, 62–64 ReLU, see ReLU function elimination, 143, 146 softmax, see softmax increment, see weight increment, 131 threshold, see binary threshold unit initial, 108, 134, 146 winning, see winning neuron matrix, 19, 21, 63, 67, 84, 107, 144, XOR, see XOR function 185 universal approximation theorem, 126 matrix, transpose, 107 unstable gradient, 176 off diagonal, 64 unsupervised learning,1, 190, 191–218, symmetric weights, 22, 28–30, 32, 33, 220 54, 55, 60, 64 update rule, 11, 29, 32, 68, 168, 179, 183 vector, 83 asynchronous, 11, 17, 35, 68 weight increment, 63, 65, 65, 102–105, continuous, 168 107, 141, 150, 193, 225, 226 deterministic, 35 weight increment[textbf, 87 discrete, 173 white noise, 164 274 SUBJECTINDEX winning neuron, 70, 126, 139, 195, 196, 196, 198, 208

XOR function, 33, 66, 70, 73, 84, 84, 90, 91, 93, 94, 107, 126–129, 142, 146 not linearly separable, 84 XOR problem, see XOR function SUBJECTINDEX 275

Congratulations, you have reached the end of this book. This is the end of the first episode, and you are rewarded with +1. Please multiply everything you have learned by α = 0.01 and add it to your knowledge. Then begin again from the first page to start a new episode.