Institut Für Neuroinformatik Ruhr-Universität Bochum
Internal Report

Object Recognition with a Sparse and Autonomously Learned Representation Based on Banana Wavelets

Norbert Krüger (1), Gabriele Peters (1), Christoph von der Malsburg (1, 2)

(1) Ruhr-Universität Bochum, Institut für Neuroinformatik, D Bochum, Germany
(2) University of Southern California, Dept. of Computer Science and Section for Neurobiology, Los Angeles, CA, USA

IRINI, Ruhr-Universität Bochum, Institut für Neuroinformatik, December, ISSN

Abstract

We introduce an object recognition system which is based on the well-known Elastic Graph Matching (EGM) but includes significant improvements compared to earlier versions. Our basic features are banana wavelets, which are generalized Gabor wavelets. In addition to the qualities frequency and orientation, banana wavelets have the attributes curvature and size. Banana wavelets can be metrically organized. A sparse and efficient representation of object classes is learned utilizing this metric organization. Learning is guided by a sensible amount of a priori knowledge in the form of basic principles. The learned representation is used for fast matching. A significant speed-up can be achieved by hierarchical processing of features. Furthermore, manual construction of ground truth is replaced by an automatic generation of suitable training examples using motor-controlled feedback. We motivate the biological plausibility of our approach by utilizing concepts like hierarchical processing or metrical organization of features inspired by brain research, and we criticize a too detailed modelling of biological processing.

Introduction

In this paper we describe a novel object recognition system in which representations of object classes can be learned automatically. The learned representations allow a fast and effective location and identification
of objects in complicated scenes.

Our object recognition system is based on three pillars. Firstly, our preprocessing is based on the idea of sparse coding. Secondly, effective learning is guided by a priori constraints covering fundamental structure of the visual world. Thirdly, we use Elastic Graph Matching (EGM) for the location and identification of objects.

A sparse representation can be defined as a coding of an object by a small number of binary features taken from a large feature space. A certain feature is only useful for coding a small subset of objects and is not applicable to most of the other objects. Sparse coding has biologically motivated advantages, such as minimizing wiring length for forming associations. Baum et al. point to the increase of associative memory capacity provided by a sparse code. Olshausen and Field argue that the retinal projection of the three-dimensional world has a sparse structure, and that a sparse code therefore meets the principle of redundancy reduction by reducing higher-order statistical correlations of the input. In addition to the reasons mentioned above, our matching algorithm achieves a significant speed-up by utilizing the fact that only a small number of features is required in our sparse representation of an object. For a more detailed discussion of sparse coding we refer to [ ].

Our representation of a certain view of an object class comprises only important features. These are extracted from different examples (see figure [ ], i-iv). Our learning algorithm relies centrally on a priori knowledge applied to the system in the form of general principles and mechanisms. Learning is inherently faced with the bias-variance dilemma. If the starting configuration of the system is very general, it can learn from and specialize to a wide variety of domains, but it will in general have to buy this advantage by having many internal degrees of freedom. This is a serious problem, since the number of examples needed to train a
system scales very badly with the system's size, quickly leading to totally unrealistic learning times; or else, with a limited set of training examples, the system will trivially adapt to their accidental peculiarities and will fail to generalize properly to new examples. This is the variance problem. On the other hand, if the initial system has few degrees of freedom, it may be able to learn efficiently, but unless the system is designed with much specific insight into the domain at hand (the solution we criticized above), there is great danger that the structural domain spanned by those degrees of freedom does not cover the given domain of application at all (the bias problem).

Supported by grants from the German Ministry for Science and Technology: INE (NEUROS) and MA (Electronic Eye).

Figure [ ] (columns a, b; rows i-v): i-iv) Different examples of cans and faces used for learning. v) The learned representations.

We propose that a priori knowledge is needed to overcome the bias-variance dilemma. The challenge here is to attain generality and to avoid the extreme of equipping the system with manually constructed, specific domain knowledge, such as geometry and physics in general or even the geometric and physical structure of objects themselves. We have formulated a number of a priori principles to reduce the dimension of the search space and to guide learning, i.e. to handle the variance problem. We assume that we can avoid the bias problem because of the general applicability of those principles. All these principles are concerned with the selection of important features from a predefined feature space (P, P, P) and the structure thereof (P). In [ ] and [ ] we have already made use of the following principles:

P Locality: Features referring to different locations are treated as independent.
P Invariance: Features are preferred which are invariant under a wide range of object transformations.
P Minimal Redundancy: Features should be selected for minimal redundancy of information.

Here we introduce a principle P
as an important additional constraint:

P Local Feature Assumption: Significant features of a local area of the two-dimensional projection of the visual world are localized curved lines.

We formalize P, the Local Feature Assumption, by extending the concept of Gabor wavelets (see e.g. [ ]) to banana wavelets (section [ ]). To the parameters frequency and orientation we add curvature and size (see figure [ ]). An object can be represented as a configuration of a few of these features (figure [ ], v); it can therefore be coded sparsely. The space of banana wavelet responses can be understood as a metric space, its metric representing the similarity of features. This metric is utilized for learning a representation of objects and for recognizing these objects during the matching procedure. The banana wavelet responses can be derived from Gabor wavelet responses by hierarchical processing, to gain speed and reduce memory requirements (see section [ ]). A set of examples of a certain view of an object class (figure [ ], i-iv) is used to learn a sparse representation (sections [ ] and [ ]) which contains only the important features, i.e. features which are robust against changes of background and illumination or slight variations in scale and orientation. This sparse representation allows objects to be located quickly and effectively (see section [ ]) using EGM.

Our system has certain analogies to the visual system of vertebrates. There is evidence for curvature-sensitive features processed in a hierarchical manner in early stages, sparse coding is discussed as a coding scheme used in the visual system, and metric organization of features seems to play an important role for information processing in the brain. Instead of modelling brain areas in detail, we aim to apply some basic concepts inspired by brain research, such as sparse coding, hierarchical processing, and metrical organization of features, in our artificial object recognition system. We think a system does not necessarily need to contain neurons or Hebbian plasticity to be called biologically motivated.
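
To make the four feature parameters named above (frequency, orientation, curvature, size) concrete, the following is a minimal sketch of a curved Gabor-like kernel. It is an illustration under stated assumptions, not the definition used in this report: the quadratic bending term, the Gaussian envelope scaling, the function name `banana_kernel`, and the parameters `half_width`, `sigma_x`, and `sigma_y` are all hypothetical choices made for this example.

```python
import numpy as np

def banana_kernel(freq, angle, curvature, size,
                  half_width=16, sigma_x=2.0, sigma_y=2.0):
    """Complex curved-Gabor kernel on a square grid (illustrative assumption)."""
    coords = np.arange(-half_width, half_width + 1, dtype=float)
    x, y = np.meshgrid(coords, coords)  # x varies along columns, y along rows

    # Rotate the coordinate frame by the wavelet orientation.
    x_rot = x * np.cos(angle) + y * np.sin(angle)
    y_rot = -x * np.sin(angle) + y * np.cos(angle)

    # Bend the carrier axis with a quadratic term (assumed form);
    # curvature = 0 leaves an ordinary, straight Gabor-type wavelet.
    x_bent = x_rot + curvature * y_rot ** 2

    # Gaussian envelope; `size` stretches it along the curved line.
    envelope = np.exp(-0.5 * freq ** 2 * ((x_bent / sigma_x) ** 2
                                          + (y_rot / (sigma_y * size)) ** 2))
    carrier = np.exp(1j * freq * x_bent)

    # Subtract an envelope-weighted constant so the kernel sums to zero
    # and therefore does not respond to uniform brightness.
    dc = (envelope * carrier).sum() / envelope.sum()
    return envelope * (carrier - dc)

# Usage: a straight wavelet and a curved one with otherwise equal parameters.
straight = banana_kernel(freq=0.5, angle=0.0, curvature=0.0, size=2.0)
curved = banana_kernel(freq=0.5, angle=0.0, curvature=0.1, size=2.0)
```

In this sketch, setting `curvature` to zero recovers a filter that differs from a plain Gabor wavelet only in its elongation, which mirrors the statement that banana wavelets generalize Gabor wavelets by two additional parameters.
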
Maybe we miss the important aspects of information processing in the brain by looking at too detailed a level. After all, humans did not build planes with feathers, but the observation of birds inspired the understanding of the basic principles of flying, which are used by any airplane. For a more detailed discussion of the analogy to biology we refer to [ ].

To enable both a rough understanding of the basic ideas of the approach and a detailed description of the algorithm, this paper can be read in two modes. For every subsection we first give a short summary and then a more detailed description beginning with the phrase "Formally speaking" or "More formally". The reader may skip the latter parts for a rough understanding or a first reading.

Figure [ ]: Relation between Gabor wavelets and banana wavelets. Left: four examples of Gabor wavelets, which differ in frequency and direction only. Right: examples of banana wavelets which are related to the Gabor wavelets on the left. Banana wavelets are described by two additional parameters, curvature and size. (Axis labels: frequency, direction, curvature, size.)

The Banana Space

In this section we describe our realization of principle P, the Local Feature Assumption: a feature generation based on banana wavelets and its metric organization in the banana space. P gives us a significant reduction of the search space. Instead of allowing, e.g., all linear filters as possible features, we restrict ourselves to a small subset. Considering the risk of a wrong feature