The Dynamic Pattern Selection Algorithm: Effective Training and Controlled Generalization of Neural Networks

Axel Roebel

To cite this version:

Axel Roebel. The Dynamic Pattern Selection Algorithm: Effective Training and Controlled Generalization of Backpropagation Neural Networks. [Research Report] Technical University of Berlin, Institut for Applied Computer Science. 1994. hal-02911738

HAL Id: hal-02911738 https://hal.archives-ouvertes.fr/hal-02911738 Submitted on 4 Aug 2020


The Dynamic Pattern Selection Algorithm

Effective Training and Controlled Generalization of Backpropagation Neural Networks

A. Röbel

Technische Universität Berlin
Institut für Angewandte Informatik, FG Informatik in Natur- und Ingenieurwissenschaften

March 1994

Abstract

In the following report the problem of selecting proper training sets for neural network time series prediction or function approximation is addressed. As a result of analyzing the relation between approximation and generalization, a new measure, the generalization factor, is introduced. Using this factor and cross validation, a new algorithm, the dynamic pattern selection, is developed.

Dynamically selecting the training patterns during training establishes the possibility of controlling the generalization properties of the neural net. As a consequence of the proposed selection criterion, the generalization error is limited to the training error. As an additional benefit, the practical problem of selecting a concise training set out of the known data is likewise solved.

By employing two time series prediction tasks, the results for dynamic pattern selection training and for fixed training sets are compared. The favorable properties of the dynamic pattern selection, namely lower computational expense and control of generalization, are demonstrated.

This report describes a revised version of the algorithm introduced in (Röbel).

Contents

Introduction
Approximation, Interpolation and Overfitting in the context of function theory
Choosing the training set
Online Cross Validation
Dynamic selection of training patterns
Experimental results
    Predicting the Henon model
    Predicting the Mackey-Glass model
Discussion
    Data requirements
    Noise
    Comparison with online training
Conclusion
Bibliography

Introduction

Since the formulation of the backpropagation algorithm by Rumelhart, Hinton and Williams, there has been a steadily growing interest in artificial neural networks. Due to some vague analogies between neural networks and biological nervous systems, it has been expected that successful applications of neural networks in fields like Classification, Pattern Recognition, Nonlinear Signal Processing or Control (all areas in which the known technical solutions remain far behind the performance of the biological systems) will be possible in the near future. Concerning the theoretical investigation of neural networks, there exist encouraging results supporting these expectations.

However, the experience concerning the practical generalization properties of neural networks has demonstrated that the widely used backpropagation algorithm does not always achieve the desired generalization precision. This is not surprising because, as a detailed analysis shows, the two tasks which should be solved during training, to represent and to generalize the training examples, are not well determined (Poggio and Girosi). Mathematically speaking, the conditions for good approximation and good interpolation are only partly related. As a matter of fact, the backpropagation algorithm only considers approximation errors, and an improved approximation will in general not be accompanied by a better interpolation. Therefore the interpolation obtained is strongly influenced by the random starting conditions of the optimization, and long training times often result in high quality approximations but insufficient interpolations. This widely known effect is often called overfitting.

There are two different strategies to prevent neural networks from overfitting. The first one is especially useful if the available data set is small. It is based on a heuristic argument which states that the simplest model will in general achieve the best generalization or interpolation. Following this argument, one may try to choose as simple a net structure as possible or state additional constraints on the weights to limit network complexity (Weigend et al.; Ji et al.). The latter are normally called Constraint Nets. However, due to the general foundation of this method, only weak heuristic arguments concerning the interpolation properties find their way into the optimization procedure. An adaptation to the special problem under investigation is only obtainable with great additional effort, and consequently better interpolation is restricted to special, well behaved problems. Moreover, the additional optimization expense leads to considerably increased training times.

If there are enough training samples, one may follow another strategy, which relies upon the chosen training set. If the training data is selected carefully, it will contain enough information to ensure that the optimal approximating network will have good interpolation properties too. Up to now it is a well known practice to achieve this by selecting very large training sets, which in general contain a lot of redundancy.

Following the latter strategy, a new method has been developed to ensure valid generalization. This algorithm, the dynamic pattern selection, is based on the batch training variant of the backpropagation algorithm and has been proven to be useful in applications with very large data sets. The training data is selected during the training phase, employing cross validation to revise the actual training set. The error of the net function is used to choose the pattern which should be added to the steadily growing training set (Röbel). The overhead for the selection procedure is small. Due to the initially small training set size, the dynamic pattern selection algorithm leads to more effective training than the standard algorithm. Practical experiments have shown that it outperforms current online training variants even in the case of very big and highly redundant data sets.

Plutowski and White have developed a similar algorithm which they call active selection of training sets. Their algorithm focuses mainly on reducing training set size without considering generalization effects, and it does not employ cross validation to continually assess the generalization obtained by the training set in use. In contrast to their algorithm, the dynamic pattern selection proposed here validates the training set by continually monitoring the generalization properties of the net.

In the following section the relations between approximation, interpolation and overfitting will be discussed against a background of function theory. Subsequently, the known heuristics concerning the number and distribution of training patterns are summarized, the current methods to choose the training sets for neural nets are described, and the dynamic pattern selection is established. Thereafter, two examples from the field of nonlinear signal processing are investigated to demonstrate the properties of the new algorithm. In the last section there will be a short discussion concerning data requirements, noise, and a comparison to online training methods.

The following explanations are based on the well known backpropagation algorithm as introduced by Rumelhart, Hinton and Williams. Descriptions of this algorithm are widespread in the literature and will not be repeated here.

Approximation, Interpolation and Overfitting in the context of function theory

There have been many publications proving that, under weak assumptions, simple feedforward networks with a finite number of neurons in one hidden layer are able to approximate arbitrarily closely all continuous mappings R^n -> R^m (Hecht-Nielsen; White). Concerning practical applications, however, these results are obviously of limited use, because they are not able to establish the required network complexity to achieve a certain approximation fidelity.

The conditions under which the stepwise improved approximation achieved with the backpropagation algorithm is accompanied by a decreasing interpolation error are not precisely known. To understand the basic relations, it is useful to analyze the optimization procedure against the background of function theory. The target function x -> y = f_t(x) is assumed to be smooth, that is, f_t is a member of C^∞, the set of functions with continuous derivatives of every order, and the domain X of f_t is assumed to be a compact manifold. In general there is only a limited set of members x ∈ X available for which the targets y = f_t(x) are known.

[Figure 1: A possible relation between the set of all representable net functions F_n, the sets of all approximating functions F_a(ε), and the sets of all interpolating functions F_i(ε) out of C^∞.]

All known pairs (x, y) form the set of available data

    D_a = { (x_0, y_0), (x_1, y_1), (x_2, y_2), ... }.

Given D_a, a neural network, and a real number ε ≥ 0, we may distinguish between three subsets of C^∞. First, there is the set of approximating functions F_a(ε), which is the set of functions f approximating the members of D_a to the given precision ε:

    sup_{x ∈ D_a} ||f_t(x) - f(x)|| ≤ ε.

Second, there is the set of interpolating functions F_i(ε) with distance

    sup_{x ∈ X} ||f_t(x) - f(x)|| ≤ ε

to f_t. The third set is the set of functions f_n representable by the neural net and is denoted as F_n. While the sets F_a(ε) depend on the set of available data D_a, the sets F_i(ε) are completely defined by the target function f_t. Using these terms, the target function may be specified as the single member of the set of interpolating functions F_i(0).

Figure 1 shows a possible relation between the three function sets defined above. Depicted is a special setting in that the target function f_t is a member of F_n, such that f_t might be represented by f_n without any error. The following statements do not rely on this and therefore remain valid in general.

Note that the backpropagation algorithm is generally used with a squared error function to measure the approximation quality. To compare approximation results achieved with different training sets, this measure has to be normalized using the number of elements contained in each training set. Employing this normalized squared error as a measure of distance in the two conditions above would result in more complicated relations between F_a(ε) and F_i(ε). Even then, however, the following statements remain valid in explaining the principal properties of the backpropagation algorithm.
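As an aside, a minimal sketch of this normalization (an illustration added here, not part of the report): the summed squared error is divided by the number of patterns, so that error values measured on training sets of different sizes become comparable.

```python
import numpy as np

def normalized_squared_error(targets, outputs):
    # Squared error normalized by the number of training patterns.
    targets, outputs = np.asarray(targets, float), np.asarray(outputs, float)
    return np.sum((targets - outputs) ** 2) / len(targets)
```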

With respect to the relation ⊆ it is possible to establish the ordering F_a(ε_i) ⊆ F_a(ε_j) on the set of all F_a(ε), with ε_i ≤ ε_j. A corresponding relation exists for the set of all F_i(ε). As mentioned above, the smallest set F_i(0) consists of one element only, in contrast to F_a(0), which may have infinite cardinality. Note that the set F_i(ε) is always a subset of F_a(ε). Concerning F_n and F_a(ε), however, it is impossible to find such a simple general relation. Presuming the network weights are bounded, a valid assumption for practical applications, there exists an ε_u such that F_n ⊆ F_a(ε) for every ε ≥ ε_u. On the other hand, there exists an ε_l such that F_n ∩ F_a(ε) is empty for all ε < ε_l. In the case of figure 1, for example, one finds ε_l = 0.

The objective of the backpropagation algorithm is to choose a network function f_n which is in F_a(ε_l).

The area in figure 1 marked by the dotted line gives the intersection F_an(ε_l) = F_a(ε_l) ∩ F_n, containing all solutions obtainable by gradient descent. It is not possible to ensure that gradient descent optimization will reach F_an(ε_l). Due to the specific error function, the constraints on the initial conditions and the selected training set, there might be no descending connection from the initial function to F_an(ε_l).

As a matter of fact, the generalization of the optimum set F_an(ε_l) is biased through the data contained in D_a. There exist data sets D_a for which the relation between F_a(ε) and F_i(ε), combined with the gradient descent procedure, will result in poor generalization behavior.

For many applications, especially for function approximation, it is sensible to demand that the generalization error be equal to or lower than the training error. Formally, f_n ∈ F_a(ε) should imply f_n ∈ F_i(ε). To be able to rank the generalization properties of f_n, it is sensible to define the generalization factor

    γ(f_n) = ε_i(f_n) / ε_a(f_n),

where ε_a(f_n) is the minimal ε such that f_n ∈ F_a(ε) and ε_i(f_n) is the minimal ε such that f_n ∈ F_i(ε). The generalization factor indicates the error made in optimizing on D_a instead of X. As a result we conclude that

    γ(f_n) ≤ 1

is a sensible condition for valid generalization.
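As an illustration, the following sketch estimates ε_a, ε_i and the resulting generalization factor for a toy one-dimensional fit. The choices are illustrative assumptions: the supremum over the domain X is approximated by a dense grid, and an ordinary polynomial fit stands in for the net function f_n.

```python
import numpy as np

# Illustrative only: estimate gamma = eps_i / eps_a for a toy 1-D problem.
def generalization_factor(f_target, f_net, x_train, x_domain):
    eps_a = np.max(np.abs(f_target(x_train) - f_net(x_train)))    # minimal eps with f_n in F_a(eps)
    eps_i = np.max(np.abs(f_target(x_domain) - f_net(x_domain)))  # minimal eps with f_n in F_i(eps)
    return eps_i / eps_a

rng = np.random.default_rng(0)
f_t = np.sin                                    # toy target function
x_train = rng.uniform(0.0, 2 * np.pi, 10)       # the available supporting points
x_domain = np.linspace(0.0, 2 * np.pi, 2000)    # dense stand-in for the compact domain X

coeffs = np.polyfit(x_train, f_t(x_train), deg=5)   # crude stand-in for the net function f_n
f_n = lambda x: np.polyval(coeffs, x)

print("generalization factor:", generalization_factor(f_t, f_n, x_train, x_domain))
```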

Choosing the training set

As a result of the previous section, it should be clear that the set D_a strongly influences the generalization properties of the solutions obtained by gradient descent. Clearly, a further selection of training data out of D_a will in general lead to an even worse situation: the decreasing number of supporting points leads to increasing sets F_a(ε) containing less information about the interesting sets F_i(ε).

Consequently, one might think it would be best to use all available data for training purposes. For many applications, however, this would be awkward due to the immense data sets available. In speech recognition or signal processing, D_a often contains many thousands of samples and a large amount of redundancy. Training on all of the samples can result in unnecessarily expensive computation. Moreover, the redundancy might not be equally distributed over the input space, thereby preventing optimal generalization properties. As a consequence, the question arises of how to select the proper training set to achieve optimal approximation and interpolation results.

Although a sufficiently dense distribution of training patterns on f_t is an important condition for successful training, there exist only some vague statements concerning this issue. Surely the suitable number of training patterns depends on the chosen network structure, the problem, and the required precision. The latter relation, often unjustifiably neglected, evidently stems from the fact that F_a has to contain more information about F_i to achieve an interpolation with higher precision.

A first hint towards the necessary number of training samples can be obtained by analyzing the number of free parameters of f_n, given by the number of network weights. Consequently, one of the first suppositions concerning the suitable number of training patterns, stated as a rule of thumb for simple linear networks by Widrow, demands that the number of training patterns should be ten times the number of free parameters. This rule has been utilized for nonlinear neural nets as well (Morgan and Boulard). Although there exists a unique solution for considerably fewer supporting points, the surplus of information will result in a lower generalization error.

The necessary number of training samples heavily depends on their distribution over the input set D_a. It is common to choose the training samples randomly out of D_a. This is thought to reproduce the density of the underlying distribution. If there is no further information available this may be sensible, but because the properties of f_t (maxima, minima, curvature) and of F_n are not involved, this will in general result in suboptimal training sets. One of the main advantages of random selection is its easy implementation. Moreover, the generalization properties of neural networks trained on randomly selected training sets may be investigated theoretically. In analyzing certain classes of networks and randomly selected training sets, Baum and Haussler, for example, obtained some coarse estimates of the relation between training set size and achievable generalization precision.

In practical experiments, considerably fewer training patterns than stated in the above mentioned investigations suffice to give good generalization (Morgan and Boulard). Aside from random selection, there are other approaches to obtain proper training sets by carefully choosing the training data out of the domain X. One might try, for example, to choose the distance between adjacent training patterns to be almost constant. This, however, depends on a meaningful method to measure distances in X. In signal processing applications it is sensible to choose the Euclidean distance. Some experimental results obtained by the author show that such an equal distance distribution of training samples leads to considerably better training and generalization properties than the random distribution. For other applications, however, it might be difficult to find a meaningful distance measure.

A more adept approach to selecting a suitable training set is to adapt the training set by dynamically selecting training patterns while training proceeds. Atlas, Cohn and Ladner have proposed an algorithm which, by investigation of the network state, decides which patterns are to be added to the training set. Although their algorithm shows better results than random selection of training sets, it requires expensive computations and is therefore difficult to use in practical applications.

As previously mentioned, another dynamic approach was established by Plutowski, Cottrell and White. They train with a specific training set until the error stalls and then search D_a for the element which possesses a gradient vector most similar to the average gradient of the entire set. This element is chosen to enlarge the training set. To prevent overfitting of the initially small training set, the initial network is rather small, with further hidden units added if the capabilities of the actual network to fit the growing training set are exhausted. This strategy leads to very small training sets. For high precision approximation, however, the selection of the proper training set is computationally very expensive, as the neural net is trained to the desired precision for all intermediate training sets.

Online Cross Validation

The dynamic selection of training patterns proposed in the following section uses a well known tool from the field of estimation theory called cross validation. Cross validation, described in detail by Stone, is often used to revise statistical models by applying them to test sets. In the field of neural computation, cross validation has been used to verify parameter settings (Finnoff et al.), the network structure, or the generalization properties. The last point is of great interest here and therefore will be explained further. As Hecht-Nielsen has proposed, the provided data set has to be divided into a training and a validation set, the latter not being used for training purposes. Applying the network function to the validation set, it is possible to estimate the generalization error of the net. This can already be done during the training phase. In the beginning of the optimization, the estimated generalization error will generally decrease with the training error. After some time the generalization error will reach a minimum and start to increase while the training error decreases further. This is interpreted as the beginning of overfitting, and Hecht-Nielsen suggests stopping training at this point.

Cross validation is a very flexible tool. Unfortunately, however, there exist two contrary demands concerning the necessary division of the data into training and validation sets. On the one hand, one would like to choose the test set as large as possible to achieve valid estimates of the generalization properties; on the other hand, all data contained in the test set cannot be used for training, and its information is lost to the training process. This is especially a problem in situations where the available amount of training data is very limited.
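A minimal sketch of this early-stopping use of cross validation follows; the network object and its train_one_epoch() and error() methods are placeholders, not an interface defined in the report. The data is split once into a training and a validation set, the validation error is monitored after every epoch, and the weights with the smallest validation error are kept.

```python
import copy
import numpy as np

def train_with_validation(net, data, n_epochs, val_fraction=0.2, seed=0):
    # Split the available data into a training and a validation set.
    idx = np.random.default_rng(seed).permutation(len(data))
    n_val = int(val_fraction * len(data))
    val_set = [data[i] for i in idx[:n_val]]
    train_set = [data[i] for i in idx[n_val:]]

    best_net, best_val_error = copy.deepcopy(net), float("inf")
    for _ in range(n_epochs):
        net.train_one_epoch(train_set)      # one backpropagation epoch (placeholder method)
        val_error = net.error(val_set)      # estimated generalization error (placeholder method)
        if val_error < best_val_error:      # keep the net with the smallest validation error
            best_val_error = val_error
            best_net = copy.deepcopy(net)
    return best_net, best_val_error
```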

Dynamic selection of training patterns

In the preceding discussion the principal relations between approximation and generalization have been clarified. The two results essential for the understanding of the proposed dynamic pattern selection are, in short form:

1. The number and the distribution of training patterns have an important influence on the resulting generalization properties of a neural network, but there exist only incomplete findings concerning practical solutions for selecting the training data.

2. Online cross validation is a useful tool to monitor the generalization properties of the network and can be applied during the training phase.

If one is willing to select the training patterns dynamically, two basic questions arise. Starting with an empty training set, the first question becomes: which pattern should be chosen? There are several possible answers. The easiest, excluding random selection, is to select the pattern which has the highest error contribution. Compared to other possibilities, for example the sophisticated ISB criterion proposed by Plutowski and White, the maximum error criterion is very easy to compute and has the advantage of being directly coupled to the generalization factor defined above. Therefore this criterion is used for the dynamic pattern selection algorithm (Röbel).

The second question, at which time the next pattern should be selected, turns out to be more tricky. There are two objectives: first, as stated by the condition derived above, the generalization factor ought to be less than one, and second, to prevent overfitting, the selection of new data should take place as early as necessary. The first objective may be achieved by estimating the generalization factor and inserting a new training pattern whenever it grows beyond one.

A number of experiments have shown that this straightforward strategy results in reasonable training sets, which in many cases lead to better results than comparably sized fixed training sets. However, when employing this criterion alone, the generalization factor tends to oscillate between one and a value considerably below.[1] Each selected pattern causes the generalization factor to decrease and reach a minimum. Then it slowly increases again, and due to the long time it takes the generalization factor to reach one, the selection of proper training sets for high precision training takes a long time. It would obviously be better to catch the generalization factor at its minimum and select a new training pattern just when it starts to increase. Following this, we have found the second objective to aim for: keep the generalization factor at its minimum.

[1] This is especially true for training to very small errors.

There is one problem left, which is to obtain useful estimates of the generalization factor and its tendency without extensive computational effort. We do not want to compute γ(f_n) for the whole data set D_a, but instead estimate the generalization factor by comparing the error function on the selected training set and a validation set. Following the method of cross validation described in the previous section, the available data D_a is divided into the subsets D_T and D_V. D_T contains all possible training patterns and is referred to as the training store; D_V, the validation store, contains all possible validation patterns. The actual training set D_t ⊆ D_T and the validation set D_v ⊆ D_V are selected from the respective stores.

The estimate of the generalization factor is obtained by selecting a random validation set D_v and computing

    γ_v = E(D_v) / E(D_t),

with E denoting the error function of the backpropagation optimization. To achieve comparable statistical properties of E(D_v) and E(D_t), one chooses |D_v| = |D_t|.[2] Having computed the generalization factor estimate, it now remains to use this value to estimate the tendency of the generalization factor. We compute the average ā and the standard deviation σ of the generalization factors from a fixed number M of preceding epochs.[3] To catch the increase of the generalization factor as early as possible, we choose the threshold for the generalization factor to be

    θ(n) = min( θ(n-1), ā(n) + σ(n) )

and select a new training pattern whenever

    γ_v(n) ≥ θ(n).

Here the argument n reflects the number of training epochs computed so far. Note that θ is monotonically decreasing. As the optimal threshold might increase, it is appropriate to allow a small increase of θ after the selection of a new training pattern. Therefore the threshold is re-initialized after each selection to

    θ(n) = min( 1, γ_v(n) )

and is fixed at this level for the number of epochs M used to calculate the statistical properties ā and σ. Note that, due to the selection criterion, in the case of a selection γ_v(n) is always above θ(n).

[2] |A| here denotes the cardinality of the set A.
[3] In the following experiments a fixed value of M is used.
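The selection-timing rule can be summarized in a few lines. The sketch below uses hypothetical names and an assumed value for M, and it simply freezes the threshold for M epochs after each selection, as described above.

```python
import numpy as np

class SelectionTimer:
    """Decides, once per epoch, whether a new training pattern should be selected."""

    def __init__(self, M=10):        # M: number of preceding epochs for the statistics (assumed value)
        self.M = M
        self.history = []            # recent generalization factor estimates gamma_v
        self.theta = 1.0             # initial threshold
        self.frozen = 0              # epochs for which theta stays fixed after a selection

    def step(self, error_train, error_val):
        gamma = error_val / error_train                   # gamma_v(n) = E(D_v) / E(D_t)
        self.history = (self.history + [gamma])[-self.M:]

        if self.frozen > 0:
            self.frozen -= 1
        elif len(self.history) == self.M:                 # theta(n) = min(theta(n-1), mean + std)
            self.theta = min(self.theta, float(np.mean(self.history) + np.std(self.history)))

        select = gamma >= self.theta                      # selection criterion gamma_v(n) >= theta(n)
        if select:                                        # re-initialize: theta(n) = min(1, gamma_v(n))
            self.theta = min(1.0, gamma)
            self.frozen = self.M
        return select
```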

Regarding the discussion of online cross validation above, there is a sensible extension of the algorithm. By calculating E(D_V) we are able to obtain a good, less fluctuating estimate of the generalization error of the actual net. This estimate may then be used to select the net which achieves the best generalization properties during training. Moreover, a further investigation of the relation between generalization error and training set size |D_t| helps to gain further insight into the reasons for bad training results.

At first, the growth of D_t will be accompanied by a decrease of the generalization error estimate. If f_t is not contained in F_n, or due to the actual state of the network is not achievable by gradient descent, there will be a certain minimal generalization error. After this limit has been reached, the further decreasing training error obtained by the backpropagation algorithm results in increasing generalization errors. The dynamic pattern selection algorithm prevents overfitting by frequently inserting new training patterns into D_t. As a result, the generalization error will fluctuate around the minimum value, with D_t slowly increasing. When this situation arises, the training process could be stopped, as the generalization will not improve further.

While this effect is due to the limited net complexity and may easily be prevented by choosing a larger network, there exists another situation, which results in a fast growth of D_t up to D_T accompanied by an increasing generalization error. This rapid growth is due to missing information in D_T compared to D_V. Therefore we are able to decide which generalization precision might be achieved with the available data D_a by monitoring the rate of selection and the generalization error.

Whenever the set of available data D_a is rather small, the partitioning of the data into training and validation sets can impede successful training. This is the case if the information contained in each of these sets is incomplete. For such situations there exists a modified version of the dynamic pattern selection algorithm. The modification consists in choosing both sets D_T and D_V to completely cover D_a. As a consequence, the assertions obtained by the validation tests are less reliable. However, the experiments which are partly presented in the following section showed that the selected training sets remain quite reasonable as long as they remain small. If D_t grows to more than half of D_a, one will in general not expect to obtain sound generalization estimates.

In the following, the dynamic pattern selection algorithm will be described more formally using a very general application example. The task to be learned consists in learning a target function f_t which is given, in accordance with the definition of D_a above, by a set of examples only. Formally, the error function E(D_a) has to be minimized, and the net function f_n ought to generalize to f_t outside the given examples.

The algorithm

Initialization:
The neural net weights are initialized with small random numbers, as is usual with the backpropagation algorithm. This initialization establishes a random net function f_n, which is used to select the member d_i = (x_i, y_i) out of D_T that shows the maximal error with respect to the error function E. The training set D_t is now set up containing just this maximal error element. The threshold θ of the generalization factor is set to one. The minimal generalization error E(D_V)_min is initialized using the random start error on D_V, which is E(D_V).

Training:
After each training epoch[4] one selects a random validation set D_v ⊆ D_V holding as many elements as D_t. Whenever

    E(D_v) ≥ θ(n) E(D_t),

D_t will be enlarged by adding d_i ∈ D_T \ D_t, the element from D_T which contributes most to E(D_T) and is not already a member of D_t. At each training epoch the threshold θ has to be updated as described by the threshold equations above. In the case that E(D_V) is smaller than E(D_V)_min, the actual net will be stored and E(D_V)_min will be updated. After checking the respective stopping criteria, training continues.

Break:
If the generalization error stalls although D_t is growing, or if any other stopping criterion matches, training is finished and the optimal net stored so far represents the result of the optimization.

[4] As long as the changes of the net function f_n remain small, it is possible to train several epochs without correcting the training set. Otherwise the adaptation of the training set stays too far behind the actual network state, and training and generalization error diverge.
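Putting the pieces together, a minimal sketch of the whole training loop might look as follows. The network object with its train_one_epoch(), error() and errors() methods is a placeholder, and SelectionTimer refers to the threshold logic sketched in the previous section; none of these names come from the report.

```python
import copy
import numpy as np

def dynamic_pattern_selection(net, D_T, D_V, n_epochs, timer, seed=0):
    rng = np.random.default_rng(seed)

    # Initialization: start with the single pattern of maximal error.
    errors_T = np.asarray(net.errors(D_T), dtype=float)   # per-pattern errors on the training store
    D_t = [int(np.argmax(errors_T))]                       # indices into D_T forming the training set
    best_net, best_E_V = copy.deepcopy(net), net.error(D_V)

    for _ in range(n_epochs):
        net.train_one_epoch([D_T[i] for i in D_t])         # one batch backpropagation epoch on D_t

        # Random validation set D_v with as many elements as D_t.
        D_v = [D_V[i] for i in rng.choice(len(D_V), size=min(len(D_t), len(D_V)), replace=False)]
        E_t = net.error([D_T[i] for i in D_t])
        E_v = net.error(D_v)

        # Enlarge D_t by the worst not-yet-selected pattern whenever the criterion fires.
        if timer.step(E_t, E_v) and len(D_t) < len(D_T):
            errors_T = np.asarray(net.errors(D_T), dtype=float)
            errors_T[D_t] = -np.inf                        # exclude patterns already in D_t
            D_t.append(int(np.argmax(errors_T)))

        # Cross validate on the full validation store and keep the best net.
        E_V = net.error(D_V)
        if E_V < best_E_V:
            best_E_V, best_net = E_V, copy.deepcopy(net)

    return best_net, D_t
```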

The procedure just described has several advantages:

- The training patterns will be inserted whenever and wherever the information contained in D_t fails to yield a regular convergence of the net function f_n to the target function f_t. The algorithm can be interpreted as a dynamic adaptation of F_a to F_i, whereby especially the critical regions are formed.

- The redundancy contained in D_t remains small, and the number of training patterns is related to the reached training and generalization errors.

- The interpolation is controlled using the full information contained in D_a without using all the data for training. Overfitting of the training patterns is suppressed.

- Analyzing the relation between |D_t| and |D_T| may give some hints about the redundancy contained in D_T and, moreover, about the validity of the achieved generalization.

- The slowly growing training set leads to a slowly increasing complexity of the trained task, which, as was shown by Jacobs, will often result in better training results.

Several additional remarks concerning the proposed algorithm are in order:

- Between two enlargements of D_t there should be at least one training cycle, to ensure that the additional information could have had some effect on the net function. This is ensured by the threshold re-initialization described above, which takes place after each training data selection. Because the data points in the neighbourhood of the latest added training pattern often show comparable errors, one might otherwise select a number of patterns in the same critical place. This would lead to groups of redundant patterns and thereby prevent a regular convergence.

- Because the estimation of the generalization error is a computationally expensive task for large sets D_V, and because the changes of the generalization error are usually very slow, especially at the end of the optimization procedure, one may choose to evaluate this estimate only every few training epochs. This will generally have a negligible effect on the achieved results.

- As the generalization is controlled by adapting the training set, this estimation of the generalization error can even be omitted altogether. The experiments have shown that the difference between the optimal and the final state of the optimization procedure is fairly small.

- The proposed algorithm results in a monotonically increasing training set D_t. There have been some experiments with decreasing training sets, removing the best pattern from D_t. In general this results in poorer performance. Even if the training process has adapted f_n to f_t in the neighbourhood of the selected patterns, it seems that they retain their importance in ensuring a regular interpolation.

Experimental results

In the following section the favorable properties of the dynamic pattern selection are demonstrated. The learning algorithm used here is an accelerated batch mode backpropagation algorithm with dynamically adapted learning rate and momentum (Salomon; Röbel). All neural nets are simple feedforward structures with sigmoidal activation functions in the hidden units and linear output units.

The tasks to solve stem from the field of nonlinear signal processing. Precisely speaking, two examples of continuous nonlinear system functions shall be represented by the neural nets. After successful training, the net function may be used as a predictor for the future behavior of the dynamical system. The underlying theory is beyond the scope of this article; a suitable introduction has been given by Lapedes and Farber (a; b).

Predicting the Henon model

In the first experiment the network is trained to predict the chaotic dynamics of the Henon model (Grassberger and Procaccia a). This two-dimensional model is given by the difference equations

    x_{n+1} = y_n + 1 - a x_n^2
    y_{n+1} = b x_n.

Choosing suitable values for the parameters a and b results in a solution showing chaotic behavior. A simple transformation of these equations gives the prediction function

    x_{n+2} = b x_n + 1 - a x_{n+1}^2,

which is to be approximated by the neural net.

The available data set

    D_a = { (x_n, x_{n+1}, x_{n+2}), (x_{n+1}, x_{n+2}, x_{n+3}), ... }

consists of vectors built from the solution of the difference equations for a fixed initialization (x_0, y_0); the first two components of each vector serve as inputs, the last as target. An additional independent validation set

    D_u = { (x_{n+333}, x_{n+334}, x_{n+335}), (x_{n+334}, x_{n+335}, x_{n+336}), ... }

has been built from the following vectors of the solution.
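For concreteness, the data generation can be sketched as follows; the parameter values a = 1.4, b = 0.3 and the initial condition are the standard chaotic choice and are assumptions, not values quoted from this report, as are the split sizes.

```python
import numpy as np

def henon_series(n_steps, a=1.4, b=0.3, x0=0.1, y0=0.1):
    """Iterate x_{n+1} = y_n + 1 - a x_n^2, y_{n+1} = b x_n."""
    x = np.empty(n_steps)
    xn, yn = x0, y0
    for n in range(n_steps):
        x[n] = xn
        xn, yn = yn + 1.0 - a * xn * xn, b * xn
    return x

def prediction_pairs(x):
    """Inputs (x_n, x_{n+1}) and targets x_{n+2} for the prediction task."""
    return np.stack([x[:-2], x[1:-1]], axis=1), x[2:]

x = henon_series(500)
D_a_inputs, D_a_targets = prediction_pairs(x[:335])   # available data (split size assumed)
D_u_inputs, D_u_targets = prediction_pairs(x[333:])   # independent validation set from the following vectors
```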

The neural net architecture used for this task is chosen to have an input layer with two units, one hidden layer with seven units, and an output layer with one unit. The initial weights are randomly chosen out of a small interval. It turns out that the prediction function is easily approximated by this fairly small neural net, and therefore this task is well suited to compare the training results of the dynamic pattern selection with the results obtained with fixed training sets of varying sizes. Moreover, by virtue of the low dimensionality of the problem, it is possible to get a visual impression of the distribution of the selected training samples on f_t.

The fixed training sets are subsets of D_a and are chosen to approximate, with varying density, a uniform covering of D_a. While in the case of fixed training sets each training run consists of a fixed number of epochs, the dynamic pattern selection runs were stopped after a smaller number of epochs; this results in similar average generalization errors. To obtain statistically valid assertions in comparing the results, a number of different initial weight settings have been used with each training algorithm.

The average training and generalization results are presented in table 1. The error bars are proportional to the variance of the results. In the upper part of this table the results using the different sized fixed training sets are shown; in the lower part one finds the results using both variants of the dynamic pattern selection. Here the first line represents the dynamic pattern selection with an independent validation set; for the training and validation repertoires the relations D_T = D_a and D_V = D_u are chosen. The second line shows the results for the modified version without a special validation set; in this case the data sets are chosen following the relation D_T = D_V = D_a. The number of training patterns in the fixed training sets and the average number of selected training patterns at the end of the training are listed in column one.

[Table 1: Predicting the Henon model. Comparison of the average training and generalization results using a neural net and different training sets. Columns: training set size, training error E(D_t), generalization error E(D_u), generalization factor γ_u, and number of forward/backward propagations; the upper rows list the fixed training sets, the last two rows the two variants of the dynamic pattern selection.]

Columns two and three contain the training and generalization results, measured by means of the normalized prediction error

    E(D) = sqrt( Σ_{x ∈ D} (f_t(x) - f_n(x))^2 / (|D| σ_D^2) ),

where σ_D represents the standard deviation of the target function measured on the data set D.
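In code, the normalized prediction error amounts to the root mean squared prediction error divided by the standard deviation of the targets on the same data set, as in the following sketch.

```python
import numpy as np

def normalized_prediction_error(targets, predictions):
    """E(D) = sqrt( sum (f_t(x) - f_n(x))^2 / (|D| * sigma_D^2) )."""
    targets = np.asarray(targets, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    rmse = np.sqrt(np.mean((targets - predictions) ** 2))
    return rmse / np.std(targets)
```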

All the errors contained in table 1 have been measured using the optimal weight set, which was determined during the training phase by cross validating the generalization error on D_V. Following the definition of the generalization factor, it has been estimated as

    γ_u = E(D_u) / E(D_t).

Column five shows the average computational expense needed to achieve the generalization results for the respective training set. It is estimated by the number of propagation passes through the net; here the simple approximation of equal computational effort for forward and backward propagations is made. The acceleration algorithm needs four additional forward propagations for each training pattern and each training epoch, so each training cycle consists of five forward and one backward propagation for each selected training pattern. For the dynamic selection there is an additional overhead of one forward propagation for each pattern in the randomly chosen validation set D_v. Moreover, each pattern selection results in one forward propagation for each member of D_T, in order to select the training pattern with the maximum error contribution.

The smallest of the fixed training sets contain just five and fifteen training patterns, and the net function f_n is obviously underdetermined. The examples are learned quickly and with high precision, but the generalization error minimum stalls at a very early state. All the other fixed training sets lead to comparable precisions. A remarkable result is the flat minimum of training and generalization error obtained with a training set of intermediate size. The increasing error for larger training sets is a result of poorly distributed training patterns: the full set D_T with all training patterns obviously does not provide an equally spaced distribution of the patterns, and the more patterns that have to be chosen out of this set, the harder it is to select a training set with a proper distribution in the input space.

Using dynamic pattern selection, fewer training epochs are sufficient to reach the same average generalization error as with the best fixed training set. Compared to the fixed training set with the minimal generalization error, the dynamic training sets contain a higher number of patterns. This is to ensure that the generalization factor stays below one, which is not achieved by any of the fixed training sets.

In comparing both variants of the dynamic selection algorithm, one finds that the training sets selected by the modified algorithm are significantly smaller. This is due to the less severe validation without an additional validation set. In the case of the Henon model the loss in model reliability does not affect the generalization properties, and therefore the training results are even improved. In general this is not the case; an example of different behavior will be found in the next experiment.

As mentioned earlier, all experiments are done with a fixed number of training cycles. Due to the varying training sets, this leads to varying computational costs for the different runs. For the fixed training sets the total number of forward and backward propagations increases with the set size from the order of thousands up to the order of millions. The total computational cost for the dynamic pattern selection is comparable to that of a medium sized fixed training set; however, compared to this fixed training set size, the dynamic pattern selection achieves considerably better generalization results. The smaller cost of the dynamic selection algorithm is a result of the initially small training sets, which compensate for the larger training sets in the final training phase.

In figure 2 the dependency between the generalization factor γ_u and the number of training epochs is shown. As can be seen, the number of training patterns considerably affects the generalization properties. As the fixed training sets are enlarged, the reliability of the generalization is increased, but with a higher number of training epochs and improved training error this reliability decreases. For very long training or small training sets, overfitting takes place and the generalization properties tend to become random.

The evolution of the generalization factor for the fixed training sets shows that there is a randomly varying training epoch beyond which the information contained in the training set does not ensure the desired generalization quality. This limit is moved towards smaller training errors if the training set is enlarged. Regarding the dynamic selection, it is shown that the generalization factor is controlled to stay well below one.[5]

[5] The initially higher variance of the generalization factor is a consequence of the learning rate adaptation. The learning rate is fairly high in the first few hundred epochs. This leads to a quickly varying generalization error, which nevertheless is controlled by inserting new training patterns.

[Figure 2: Generalization factor γ_u as a function of the number of training epochs for learning to predict the Henon model. The panels show three fixed training sets of different sizes and the dynamic selection.]

As was already mentioned, besides the size of the training set, the distribution of the training patterns has a considerable effect on the generalization results. A typical distribution obtained by the dynamic pattern selection is depicted in figure 3. The critical regions, such as maxima, minima, margins and parts of high curvature, are occupied. The distribution is not uniform but reflects the error distribution of the net function. This indicates the advantage of the dynamic selection in contrast to the fixed training sets, which must be selected based on general heuristics and without any knowledge of the intermediate network state.

[Figure 3: A typical distribution of training patterns on the prediction function of the Henon model, plotted over the inputs (x_n, x_{n+1}) and the output x_{n+2}. Shown are the available data and the training patterns selected by the dynamic pattern selection at the end of training.]

Predicting the Mackey-Glass model

After having shown the basic properties of the dynamic pattern selection by means of a fairly easy problem, the more complicated task of predicting the chaotic behavior of the Mackey-Glass model is investigated. Due to the time delay τ, the Mackey-Glass model

    dx(t)/dt = a x(t - τ) / (1 + x^10(t - τ)) - b x(t)

has infinite degrees of freedom. The stationary motion is, however, governed by a low-dimensional attractor (Farmer; Grassberger and Procaccia b). This is the reason why the Mackey-Glass model is often used as an emulation of real world systems, and predicting this model has been widely established as a kind of benchmark for testing predictors (Farmer and Sidorowich; Lapedes and Farber b; Crowder). As Lapedes and Farber (b) have shown, the prediction of the Mackey-Glass model with the chosen parameter set might be attained by using a neural net with six input units, two hidden layers with ten units each, and a linear output unit. This setting will be used here as a test for the dynamic pattern selection too.

Solving the differential equation has been done using a second order Runge-Kutta method, where the first steps of the solution were skipped to reach the steady state. The data set D_a is built from the following steps of the solution, with vectors v_i = (x_i, x_{i+6}, x_{i+12}, x_{i+18}, x_{i+24}, x_{i+30}, x_{i+36}). Following Lapedes and Farber, the vectors are constructed with a time delay of six steps, and the fixed training set contains vectors out of D_a which have been selected with a nearly uniform distribution in input space. The independent generalization test set contains vectors chosen randomly out of the following steps of the solution.
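The data generation can be sketched along the following lines; the parameter values (a = 0.2, b = 0.1, τ = 17), the step size, the length of the skipped transient and the constant initial history are common choices and are assumptions, not values quoted from this report.

```python
import numpy as np

def mackey_glass(n_steps, a=0.2, b=0.1, tau=17.0, dt=1.0, skip=1000):
    """Integrate dx/dt = a x(t-tau) / (1 + x(t-tau)^10) - b x(t) with a 2nd order Runge-Kutta scheme."""
    def deriv(x_t, x_delayed):
        return a * x_delayed / (1.0 + x_delayed ** 10) - b * x_t

    lag = int(round(tau / dt))
    x = np.full(skip + n_steps + lag, 1.2)          # constant history as initial condition (assumed)
    for n in range(lag, len(x) - 1):
        k1 = deriv(x[n], x[n - lag])
        x_mid = x[n] + 0.5 * dt * k1                # midpoint estimate
        x_del_mid = 0.5 * (x[n - lag] + x[n - lag + 1])
        x[n + 1] = x[n] + dt * deriv(x_mid, x_del_mid)
    return x[skip + lag:]                           # drop the transient towards the steady state

series = mackey_glass(2000)

# Prediction vectors with a time delay of six steps: inputs (x_i, x_{i+6}, ..., x_{i+30}), target x_{i+36}.
delays = np.arange(0, 37, 6)
vectors = np.stack([series[d: len(series) - 36 + d] for d in delays], axis=1)
inputs, targets = vectors[:, :6], vectors[:, 6]
```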

The fixed training set and each of the dynamic selection algorithms has been used for training with ten initial weight sets. Each run consists of a fixed number of cycles, and the average results are shown in table 2. The labeling is identical to that of table 1.

[Table 2: Predicting the Mackey-Glass model. Comparison of the average training and generalization results using a neural net and different training sets. Columns: training set size, training error E(D_t), generalization error E(D_u), generalization factor γ_u, and number of forward/backward propagations; the first row lists the fixed training set, the last two rows the two variants of the dynamic pattern selection.]

[Figure 4: Generalization factor γ_u as a function of the number of training epochs for learning to predict the Mackey-Glass model; left panel: fixed training set, right panel: dynamic selection.]

The fixed training sets give rise to a generalization factor considerably beyond one. As is depicted in figure 4, the generalization factor increases monotonically. In contrast to this, the generalization factor of the dynamic selection algorithm stays well below one, indicating a more reliable generalization, and the selected sets contain on average less than half the number of training patterns. With the generalization results being approximately equal, the computational expense of the dynamic selection algorithms is considerably lower. Note that there is no need for any initial investigation to determine the optimal training set.

Discussion

After having demonstrated the basic properties of the dynamic pattern selection, some further topics concerning possible extensions, data requirements and noise shall be discussed.

Obviously, the application of the dynamic pattern selection is limited to problems where generalization is possible; otherwise any selection of a training subset will fail to end up with a usable network performance. Up to now there have been no investigations with problems other than the approximation of continuous functions. To apply the dynamic pattern selection to classification problems with binary target values, the criterion for proper selection times probably has to be modified. In those cases it seems quite reasonable to expect the dynamic pattern selection to reveal clustering properties, and further investigation might lead to interesting results.

As a matter of fact, the dynamic pattern selection does not depend on a special error function. Therefore one might use the dynamic training sets in combination with other backpropagation extensions, for example constrained networks, thus getting the benefits of each of the methods.

Data requirements

As should be clear from the discussion above, an important precondition for the utilization of the dynamic pattern selection is a sufficient number of training patterns in the data set D_a. In fact, even with very small data sets the dynamic selection algorithm is a favorable choice, due to the increased control over the network results. Although the training set will cover the whole training store more or less quickly, one gains the possibility to estimate the achievable generalization by measuring the training error at the time when the training repertoire is exhausted. After all available training patterns have been selected, the different versions of the algorithm are equivalent to the standard backpropagation algorithm with or without cross validation. As a consequence of this limiting behavior, the dynamic pattern selection has the same data requirements as the standard backpropagation algorithm.

Noise

In the preceding discussion, one question has been left unasked that is of great interest for the practical virtue of the proposed algorithm: what happens if there is considerable noise in the data?

There is one important difference between dynamic and fixed training sets which might result in a somewhat higher sensitivity to noise in the dynamic case. If only some of the available training patterns are disturbed, the selection process will probably select most of these patterns, trying to improve the bad generalization results. Due to the concentrated training sets, the averaging between the selected samples is smaller, and consequently the training results are affected more by the additional noise than in the case of fixed training sets.

In many cases, however, the noise will be uniformly distributed over the data, and in these cases the dynamic training sets will yield averaging properties similar to the fixed sets. In many experiments the dynamic selection has been applied to training neural network predictors of real world signals which are disturbed by noise levels that are small compared to the training error (Röbel). In all cases the dynamic selection proved to be stable, with superior training results compared to the standard algorithm.

Comparison with online training

It has been argued that the dynamic pattern selection is well suited to processing very large data sets containing highly redundant data. In contrast to this, it is often assumed that in the case of redundant data the online training variant of backpropagation will yield superior results. Therefore we have compared the computational expenses for an up to date online training method, the Search-Then-Converge learning rate schedule as set forth by Darken and Moody, and the batch mode dynamic pattern selection algorithm. The neural networks have been trained to predict a real world piano signal consisting of a large number of samples, and the learning rate adaptation of the online training method has been optimized in a number of preceding runs. For the dynamic pattern selection, the automatic learning rate adaptation already mentioned in the previous sections has been employed.

Despite the preceding optimization necessary for the online training algorithm, and the fact that only a small number of patterns were selected by the dynamic selection, which results in a comparatively big overhead for the selection process, the overall expense for the batch mode dynamic pattern selection training has been considerably lower than the online training expense.[6]

As a consequence of this experiment, it follows that the proposed dynamic pattern selection combined with an accelerated batch mode training is the method of choice in all cases where the complete training set D_a is available at training time. Only in cases where the net has to be adapted to time varying situations during application should online training be used.

[6] I would like to thank Jens Ehrke for his support on the online training experiments.

Conclusion

Based on the fact that the generalization properties of neural networks are heavily determined by the training sets, an extension to the standard backpropagation algorithm, the dynamic pattern selection, has been introduced. The proposed algorithm has been tested on two problems from the area of nonlinear signal processing. Comparing the results to standard backpropagation training on optimized fixed training sets, it has been shown that the dynamic pattern selection algorithm achieves the same average generalization results with less computational expense and without any preceding investigation of the available data.

The dynamic pattern selection has especially proven to be useful for the very large and highly redundant data sets which are often used in signal processing applications. In these cases one should select a reasonable subset as the training and validation store; further selection will then be done automatically. As a special feature, the investigation of the dynamically selected training sets allows one to qualitatively estimate the data redundancy and, moreover, gives some hints on the reliability of the training results.

References

Atlas, L., Cohn, D., and Ladner, R. Training connectionist networks with queries and selective sampling. In D. Touretzky, editor, Advances in Neural Information Processing Systems (NIPS). Morgan Kaufmann.

Baum, E. and Haussler, D. What size net gives valid generalization? Neural Computation.

Crowder, R. Predicting the Mackey-Glass time series with cascade-correlation learning. In Proceedings of the Connectionist Models Summer School.

Darken, C. and Moody, J. Note on learning rate schedules for stochastic optimization. In R. Lippmann, J. E. Moody, and D. Touretzky, editors, Neural Information Processing Systems (NIPS). Morgan Kaufmann.

Farmer, J. D. and Sidorowich, J. J. Predicting chaotic dynamics. In Dynamic Patterns in Complex Systems. World Scientific.

Farmer, J. D. Chaotic attractors of an infinite-dimensional dynamical system. Physica D.

Finnoff, W., Hergert, F., and Zimmermann, H. Improving generalization performance by nonconvergent model selection methods. In I. Aleksander and J. Taylor, editors, Proceedings of the International Conference on Artificial Neural Networks (ICANN). Elsevier Science Publishers.

Grassberger, P. and Procaccia, I. (a) Estimation of the Kolmogorov entropy from a chaotic signal. Physical Review A.

Grassberger, P. and Procaccia, I. (b) Measuring the strangeness of strange attractors. Physica D.

Hecht-Nielsen, R. Theory of the backpropagation neural network. In Proceedings of the International Joint Conference on Neural Networks. IEEE TAB Neural Network Committee, Washington DC.

Hecht-Nielsen, R. Neurocomputing. Addison-Wesley Publishing Company.

Jacobs, R. A. Initial experiments on constructing domains of expertise and hierarchies in connectionist systems. In Proceedings of the Connectionist Models Summer School.

Ji, C., Snapp, R., and Psaltis, D. Generalizing smoothness constraints from discrete samples. Neural Computation.

Lapedes, A. and Farber, R. (a) How neural nets work. In Neural Information Processing Systems.

Lapedes, A. and Farber, R. (b) Nonlinear signal processing using neural networks: Prediction and system modelling. Technical Report, Los Alamos National Laboratory.

Morgan, N. and Boulard, H. Generalization and parameter estimation in feedforward nets: Some experiments. In D. Touretzky, editor, Advances in Neural Information Processing Systems (NIPS). Morgan Kaufmann.

Plutowski, M. and White, H. Selecting concise training sets from clean data. IEEE Transactions on Neural Networks.

Plutowski, M., Cottrell, G., and White, H. Learning Mackey-Glass from 25 examples, plus or minus 2. Preprint.

Poggio, T. and Girosi, F. Networks for approximation and learning. Proceedings of the IEEE.

Röbel, A. Dynamic selection of training patterns for neural networks: A new method to control the generalization. Technical Report, Technical University of Berlin. In German.

Röbel, A. Neural models of nonlinear dynamical systems and their application to musical signals. PhD thesis, Technical University of Berlin.

Rumelhart, D., Hinton, G., and Williams, R. Learning internal representations by error propagation. In D. Rumelhart and J. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press.

Salomon, R. Verbesserung konnektionistischer Lernverfahren, die nach der Gradientenmethode arbeiten. PhD thesis, Technische Universität Berlin. In German.

Stone, M. Cross-validatory choice and assessment of statistical predictions (with discussion). Journal of the Royal Statistical Society.

Weigend, A., Huberman, B., and Rumelhart, D. Predicting the future: a connectionist approach. International Journal of Neural Systems.

White, H. Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings. Neural Networks.

Widrow, B. ADALINE and MADALINE. In Proceedings of the IEEE 1st International Conference on Neural Networks.