Reinforcing Connectionism: Learning the Statistical Way
Peter Samuel Dayan

PhD
University of Edinburgh
1991

Declaration

I have composed this thesis by myself, and it contains original work of my own execution.

Edinburgh, 25th February, 1991

For my family: Hedi and Anthony, Zoë and Daniel

Abstract

Connectionism's main contribution to cognitive science will prove to be the renewed impetus it has imparted to learning. Learning can be integrated into the existing theoretical foundations of the subject, and the combination, statistical computational theories, provides a framework within which many connectionist mathematical mechanisms naturally fit. Examples from supervised and reinforcement learning demonstrate this.

Statistical computational theories already exist for certain associative matrix memories. This work is extended, allowing real-valued synapses and arbitrarily biased inputs. It shows that a covariance learning rule optimises the signal/noise ratio, a measure of the potential quality of the memory, and quantifies the performance penalty incurred by other rules. In particular, two that have been suggested as occurring naturally are shown to be asymptotically optimal in the limit of sparse coding. The mathematical model is justified in comparison with other treatments whose results differ.

Reinforcement comparison is a way of hastening the learning of reinforcement learning systems in statistical environments. Previous theoretical analysis has not distinguished between different comparison terms, even though, empirically, a covariance rule has been shown to be better than just a constant one. The workings of reinforcement comparison are investigated by a second-order analysis of the expected statistical performance of learning, and an alternative rule is proposed and empirically justified.

The existing proof that temporal difference prediction learning converges in the mean is extended from a special case involving adjacent time steps to the general case involving arbitrary ones. The interaction between the statistical mechanism of temporal difference and the linear representation is particularly stark. The performance of the method given a linearly dependent representation is also analysed.

The method of planning using temporal difference prediction had previously been applied to solve the navigation task of finding a goal in a grid. This is extended to compare the qualities of alternative representations of the environment and to accomplish simple latent learning when the goal is initially absent. Representations that are coarse-coded are shown to perform particularly well, and latent learning can be used to form them.

Acknowledgements

I owe an enormous debt to David Willshaw. His influence pervades every page, and his inspiration, tolerance, humour, perceptiveness, and guidance quite made my research, and made returning from exile worthwhile. David Wallace made the return possible, provided more than ample doses of enthusiasm and encouragement, and taught me the value of time. I am very grateful to him for his supervision and help.

My parents and the rest of my family offered their constant support and help, and carefully fostered my academic pretensions. If only it were medical...

Working in David Willshaw's group at the Centre for Cognitive Science has been enjoyable, educational and exciting in equal measure. I am very grateful to Marcus Frean for many fruitful discussions, for my vicarious exercise, and for healthy competition for our trusty steed.
Jay Buckingham, whom I also thank for interesting interactions, kept bonzo and its similarly elevated colleagues processing perfectly, and provided the means, if never the excuse, for learning. Kate Jeffrey and Toby Tyrrell have more lately been valuable colleagues and friends. Jean Wilson, Leslie Chapman and Rosanna Maccagnano firmly kept us all in check, and greatly enlivened the fourth floor. Both the Centre and the Department of Physics brooked my comings and goings and provided most opulent facilities.

Many people have read and laboured mightily to improve versions of this thesis. Many thanks to Nick Chater, Steve Finch, Carol Foster, Peter Hancock, Kate Jeffrey, Alex Lascarides, Jon Oberlander (in whose illustriousness as co-author I basked), Martin Simmen, Randall Stark, Rich Sutton, Toby Tyrrell, David Wallace, Chris Watkins, and David Willshaw for valiantly providing the irresistible force. The last additionally provided the resistible punctuation.

I have had innumerable discussions with all these people, and many other friends and colleagues - Robert Biegler, Mike Malloch, Richard Morris, Mike Oaksford, Steve Renals, Richard Rohwer, Tom Tollenaere, and all the participants in the Connectionist Workshop at Edinburgh; Andy Barto, under whose aegis, and ultimately care, I made a visit to Amherst which made a lasting impression, and also his group, particularly Vijaykumar Gullapalli and Satinder Singh; Geoff Goodhill, also for light, heavy, and Californian entertainment; and the students and faculty of the 1990 Connectionist Models Summer School, particularly Geoff Hinton and Terry Sejnowski.

Life in Edinburgh was made fondly memorable by Clare Anderson, Shari Cohn, Alan Simpson, Sally Wassell, Hon Yau, Sophie Cormack, who played the anti-heroine and made the large, stuffed, hate-objects, and especially Carol Foster, whose support and encouragement have meant a great deal.

I am very grateful to SERC for their financial assistance.

Contents

1 Connectionism in Context
   1.0 Summary
   1.1 Introduction
   1.2 Levelling the Field
   1.3 Statistical Computational Theories
   1.4 Introduction to the Thesis

2 Optimal Plasticity
   2.0 Summary
   2.1 Introduction
   2.2 The Model
   2.3 The Ugly, the Bad and the Good
   2.4 The Optimal and Sub-Optimal Rules
   2.5 Threshold Setting
   2.6 Physics Models
   2.7 Further Developments
   2.8 Conclusions

3 Reinforcement Comparison
   3.0 Summary
   3.1 Introduction
   3.2 Theory
   3.3 Results
   3.4 Further Developments
   3.5 Conclusions

4 Temporal Difference: TD(λ) for General λ
   4.0 Summary
   4.1 Introduction
   4.2 TD(λ)
   4.3 Q-Learning
   4.4 Further Developments
   4.5 Conclusions

5 Temporal Difference in Spatial Learning
   5.0 Summary
   5.1 Introduction
   5.2 Barto, Sutton and Watkins' Model
   5.3 Alternative Stimulus Representations
   5.4 Goal-Free Learning
   5.5 Further Developments
   5.6 Conclusions

6 Context in Connectionism
   6.0 Summary
   6.1 Introduction
   6.2 Context
   6.3 Knowledge Representation and Context
   6.4 Epilogue

Coda

A Parameters for Chapter 5

Bibliography

Today's theses wrap tomorrow's fish.

Chapter 1

Connectionism in Context

To compare AI as mechanised flight with neuroscience as natural aviation is to ignore hot air ballooning. But where, pray, might the hot air come from?

After J Oberlander

1.0 Summary

The vituperative disputes between connectionism and its more traditional alternatives rather confuse methodology with philosophy, explanation with replication, and theoretical neuroscience with practical computer science. These confusions can be untangled through the careful use of alternative levels of analysis, revealing the interest of connectionism to lie in the cornucopia of non-traditional mechanisms and representations that it offers, and in its regard for learning. Unfortunately, traditional delineations of levels are rather cavalier in the way they treat learning. This chapter argues that learning can be incorporated into such accounts through the mediation of statistical computational theories, and presents a summary of the rest of the thesis in this light.

1.1 Introduction

Many of those people who helped forge modern cognitive science out of behaviourism's sterile mindlessness see connectionism's reappearance as heralding a return to the bad old days. It seems to suffer from its products' apparently uninterpretable and unstructured internal representations, and its producers' seemingly empiricist leanings. The ensuing debate has something of the flavour of the old battles over the philosophical (il)legitimacy of artificial intelligence (AI), with proponents of connectionism talking glibly about systems that cannot be built, quite yet, and opponents, using arguments oft used against them, pointing to a few possible theoretical flaws and to many evident inadequacies of existing systems. A salient paradox is