Introduction to Radial Basis Function Networks
Mark J. L. Orr

Centre for Cognitive Science, University of Edinburgh,
2 Buccleuch Place, Edinburgh EH8 9LW, Scotland

April 1996

Abstract

This document is an introduction to radial basis function (RBF) networks, a type of artificial neural network for application to problems of supervised learning (e.g. regression, classification and time series prediction). It is now only available in PostScript (an older and now unsupported hypertext version may be available for a while longer). The document was first published in 1996 along with a package of Matlab functions implementing the methods described. In 1999 a new document, Recent Advances in Radial Basis Function Networks, became available with a second and improved version of the Matlab package.

mjo@anc.ed.ac.uk
www.anc.ed.ac.uk/mjo/papers/intro.ps
www.anc.ed.ac.uk/mjo/intro/intro.html
www.anc.ed.ac.uk/mjo/software/rbf.zip
www.anc.ed.ac.uk/mjo/papers/recad.ps
www.anc.ed.ac.uk/mjo/software/rbf2.zip

Contents

Introduction
Supervised Learning
    Nonparametric Regression
    Classification and Time Series Prediction
Linear Models
    Radial Functions
    Radial Basis Function Networks
Least Squares
    The Optimal Weight Vector
    The Projection Matrix
    Incremental Operations
    The Effective Number of Parameters
    Example
Model Selection Criteria
    Cross-Validation
    Generalised Cross-Validation
    Example
Ridge Regression
    Bias and Variance
    Optimising the Regularisation Parameter
    Local Ridge Regression
    Optimising the Regularisation Parameters
    Example
Forward Selection
    Orthogonal Least Squares
    Regularised Forward Selection
    Regularised Orthogonal Least Squares
    Example
Appendices
    Notational Conventions
    Useful Properties of Matrices
    Radial Basis Functions
    The Optimal Weight Vector
    The Variance Matrix
    The Projection Matrix
    Incremental Operations
        Adding a new basis function
        Removing an old basis function
        Adding a new training pattern
        Removing an old training pattern
    The Effective Number of Parameters
    Leave-one-out Cross-validation
    A Re-Estimation Formula for the Global Parameter
    Optimal Values for the Local Parameters
    Forward Selection
    Orthogonal Least Squares
    Regularised Forward Selection
    Regularised Orthogonal Least Squares

Introduction

This document is an introduction to linear neural networks, particularly radial basis function (RBF) networks. The approach described places an emphasis on retaining, as much as possible, the linear character of RBF networks, despite the fact that for good generalisation there has to be some kind of nonlinear optimisation. The two main advantages of this approach are keeping the mathematics simple (it is just linear algebra) and keeping the computations relatively cheap (there is no optimisation by general purpose gradient descent algorithms).

Linear models have long been studied in statistics, and the theory is applicable to RBF networks, which are just one particular type of linear model. However, the fashion for neural networks, which started in the mid-1980s, has given rise to new names for concepts already familiar to statisticians. Table 1 gives some examples. Such terms are used interchangeably in this document.

    statistics               neural networks
    ---------------------    ---------------------
    model                    network
    estimation               learning
    regression               supervised learning
    interpolation            generalisation
    observations             training set
    parameters               (synaptic) weights
    independent variables    inputs
    dependent variables      outputs
    ridge regression         weight decay

    Table 1: Equivalent terms in statistics and neural networks.

The document is structured as follows. We first outline supervised learning, the main application area for RBF networks, including the related areas of classification and time series prediction. We then describe linear models, including RBF networks. Least squares optimisation, including the effects of ridge regression, is then briefly reviewed, followed by model selection criteria. After that we cover ridge regression in more detail, and lastly we look at forward selection for building networks. Most of the mathematical details are put in an appendix. For alternative approaches see, for example, the work of Platt and associates and of Fritzke.
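To make concrete the sense in which the mathematics stays linear, here is a sketch, in generic linear-model notation, of the least squares solution developed later in the document and its appendix (the symbols m, h_j and H are standard usage for linear models, not quoted from the original):

\[
f(\mathbf{x}) \;=\; \sum_{j=1}^{m} w_j \, h_j(\mathbf{x}),
\qquad
\hat{\mathbf{w}} \;=\; \left( \mathbf{H}^{\top} \mathbf{H} \right)^{-1} \mathbf{H}^{\top} \hat{\mathbf{y}},
\]

where h_1, ..., h_m are fixed basis functions, H is the design matrix with elements H_{ij} = h_j(x_i), and ŷ is the vector of training set outputs. Once the basis functions are fixed, the weights minimising the sum of squared errors come from a single linear solve; no general purpose gradient descent is required.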
Supervised Learning

A ubiquitous problem in statistics, with applications in many areas, is to guess or estimate a function from some example input-output pairs, with little or no knowledge of the form of the function. So common is the problem that it has different names in different disciplines (e.g. nonparametric regression, function approximation, system identification, inductive learning).

In neural network parlance, the problem is called supervised learning. The function is learned from the examples which a teacher supplies. The set of examples, or training set, contains elements which consist of paired values of the independent (input) variable and the dependent (output) variable. For example, the independent variable in the functional relation

\[ y = f(\mathbf{x}) \]

is x (a vector) and the dependent variable is y (a scalar). The value of the variable y depends, through the function f, on each of the components of the vector variable

\[ \mathbf{x} = [ x_1 \;\; x_2 \;\; \ldots \;\; x_n ]^{\top} . \]

Note that we are using bold lower-case letters for vectors and italicised lower-case letters for scalars, including scalar valued functions like f (see the appendix on notational conventions).

The general case is where both the independent and dependent variables are vectors. This adds more mathematics but little extra insight to the special case of univariate output, so for simplicity we will confine our attention to the latter. Note, however, that multiple outputs can be treated in a special way in order to reduce redundancy.

The training set, in which there are p pairs (indexed by i running from 1 up to p), is represented by

\[ T = \left\{ (\mathbf{x}_i, \hat{y}_i) \right\}_{i=1}^{p} . \]

The reason for the hat over the letter y (another convention, indicating an estimate or uncertain value; see the appendix) is that the output values of the training set are usually assumed to be corrupted by noise. In other words, the correct value to pair with x_i, namely y_i, is unknown. The training set only specifies ŷ_i, which is equal to y_i plus a small amount of unknown noise.

In real applications the independent variable values in the training set are often also affected by noise. This type of noise is more difficult to model and we shall not attempt it. In any case, taking account of noise in the inputs is approximately equivalent to assuming noiseless inputs but with an increased amount of noise in the outputs.

Nonparametric Regression

There are two main subdivisions of regression problems in statistics: parametric and nonparametric. In parametric regression the form of the functional relationship between the dependent and independent variables is known, but may contain parameters whose values are unknown and capable of being estimated from the training set. For example, fitting a straight line,

\[ f(x) = a\,x + b , \]

to a bunch of points \(\{ (x_i, \hat{y}_i) \}_{i=1}^{p}\) (see figure 1) is parametric regression, because the functional form of the dependence of y on x is given, even though the values of a and b are not. Typically, in any given parametric problem, the free parameters (as well as the dependent and independent variables) have meaningful interpretations, like "initial water level" or "rate of flow".
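As a concrete illustration, this short Python sketch builds a noisy training set and estimates the two free parameters a and b by least squares. It is not part of the original document or its Matlab package, and the true parameter values, noise level and sample size are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training set T = {(x_i, y_hat_i)}: the observed outputs are the
# true values y_i = f(x_i) corrupted by additive noise (invented example).
p = 20
x = rng.uniform(0.0, 10.0, size=p)
y_hat = 2.5 * x + 1.0 + rng.normal(scale=1.0, size=p)

# Parametric regression: the model f(x) = a*x + b is fixed in advance;
# only the two parameters a and b are estimated from the training set.
A = np.column_stack([x, np.ones(p)])          # design matrix [x_i, 1]
(a, b), *_ = np.linalg.lstsq(A, y_hat, rcond=None)

print(f"estimated a = {a:.3f}, b = {b:.3f}")  # close to the true 2.5 and 1.0
```

Only two numbers are estimated here, and each has a direct interpretation (slope and intercept), which is exactly what makes the problem parametric.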
Figure 1: Fitting a straight line to a bunch of points is a kind of parametric regression where the form of the model is known.

The distinguishing feature of nonparametric regression is that there is no (or very little) a priori knowledge about the form of the true function which is being estimated. The function is still modelled using an equation containing free parameters, but in a way which allows the class of functions which the model can represent to be very broad. Typically this involves using many free parameters which have no physical meaning in relation to the problem. In parametric regression, in contrast, there is typically a small number of parameters, and often they have physical interpretations.

Neural networks, including radial basis function networks, are nonparametric models, and their weights (and other parameters) have no particular meaning in relation to the problems to which they are applied. Estimating values for the weights of a neural network, or the parameters of any nonparametric model, is never the primary goal in supervised learning. The primary goal is to estimate the underlying function, or at least to estimate its output at certain desired values of the input. On the other hand, the main goal of parametric regression can be, and often is, the estimation of the parameters themselves, because of their meaningful interpretations.
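For contrast with the straight line fit, the sketch below fits the same kind of noisy data with a nonparametric model: an RBF network with Gaussian basis functions centred on a subset of the training inputs. It anticipates the RBF models described later in this document; the target function, the basis function width r and the choice of centres are arbitrary assumptions made for the example, not the document's recipe for choosing them:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a function whose form is assumed unknown to the model.
p = 50
x = np.sort(rng.uniform(0.0, 10.0, size=p))
y_hat = np.sin(x) + rng.normal(scale=0.1, size=p)

# Nonparametric model: m Gaussian radial basis functions
#     h_j(x) = exp(-(x - c_j)^2 / r^2),
# centred on every fifth training input. The weights w_j have no
# physical meaning; they only give the model enough flexibility.
centres = x[::5]
r = 1.0
H = np.exp(-((x[:, None] - centres[None, :]) ** 2) / r**2)  # design matrix

# Training is still just linear least squares: w minimises ||H w - y_hat||^2.
w, *_ = np.linalg.lstsq(H, y_hat, rcond=None)

# The quantity of interest is the fitted function at new inputs,
# not the weight values themselves.
x_new = np.linspace(0.0, 10.0, 5)
H_new = np.exp(-((x_new[:, None] - centres[None, :]) ** 2) / r**2)
print(H_new @ w)  # approximates sin(x_new)
```

The network has many more parameters than the straight line, and none of them is individually interpretable; what is estimated is, in effect, the function itself.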