

Connection Science
Publication details, including instructions for authors and subscription information: http://www.informaworld.com/smpp/title~content=t713411269

Task-dependent evolution of modularity in neural networks
Michael Hüsken; Christian Igel; Marc Toussaint

Online publication date: 01 September 2002
To cite this article: Hüsken, Michael, Igel, Christian and Toussaint, Marc (2002) 'Task-dependent evolution of modularity in neural networks', Connection Science, 14:3, 219-229
DOI: 10.1080/09540090208559328
URL: http://dx.doi.org/10.1080/09540090208559328


Connection Science, Vol. 14, No. 3, 2002, 219-229
ISSN 0954-0091 print / ISSN 1360-0494 online
DOI: 10.1080/0954009021000047892
© 2002 Taylor & Francis Ltd, http://www.tandf.co.uk/journals

Task-dependent evolution of modularity in neural networks*

MICHAEL HÜSKEN, CHRISTIAN IGEL and MARC TOUSSAINT
Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany
tel: +49 234 32 25558; fax: +49 234 32 14209
email: {huesken, igel, mt}@neuroinformatik.ruhr-uni-bochum.de

*This article is a revised and extended version of the work presented in Hüsken et al. (2001).

Abstract. There exist many ideas and assumptions about the development and meaning of modularity in biological and in technical neural systems. We empirically study the evolution of connectionist models in the context of modular problems. For this purpose, we define quantitative measures for the degree of modularity and monitor them during evolutionary processes under different constraints. It turns out that the evolution of modular architectures depends crucially on the task: modularity is not favoured per se, but a selective pressure towards fast learning can induce it.

Keywords: modularity, neural networks, evolution, learning.

1. Introduction

The word 'module' is used in many different ways, and different concepts of modularity have been raised in biological as well as in technical contexts. A neuroscientist may define a module in an anatomical sense, referring to cells, columns, layers or regions of the vertebrate brain that are highly connected and specialised for a certain kind of information processing. In cognitive science, a module can be viewed as an encapsulated mental property constituting a specific behavioural capacity (Fodor 1983). In connectionist models, modularity is usually a structural notion, reflected in the graph underlying the architecture of a neural network (NN) (Elman et al. 1996). From a technical viewpoint, modularity appears as a fundamental design principle: a module is a specialised, only sparsely connected subsystem that processes information under specific constraints, and the division of labour between modules counterbalances the complexity of the whole system. Modularity has raised a lot of interest among researchers dealing with biological as well as technical systems, because it is widely regarded as important for the development of complex systems and for fast information processing.
In the domain of NNs, it is often presumed that there is a strong relation between the structure of a network and the tasks it can reliably solve: modular architectures can be combined from building blocks, they divide a complex task into smaller, self-contained subproblems, and they are claimed to be easier to design, to analyse and to extend than arbitrarily connected networks (Sharkey 1996, 1997). However, it is a non-trivial question under which circumstances modular NNs are actually favourable, and elaborate arguments in favour of modularity are often tested only on a single example.

A prominent example is the 'what' and 'where' problem (Rueckl et al. 1989), inspired by the hypothesis that the 'what' (object identification) and 'where' (object position) aspects of visual information are processed in separate cortical systems. Rueckl et al. (1989) observed in simulations that feedforward NNs with a modular hidden layer learn this task easier than fully connected, non-modular ones. Jordan and Jacobs (1995) proposed modular expert architectures for this kind of task-decomposition. Ferdinando et al. (2001) utilized evolutionary computation to construct modular networks for the 'what' and 'where' problem, and Bullinaria (2001) studied its solution under non-Lamarckian evolution. In the field of evolutionary computation, indirect (e.g. grammar-based) encoding schemes for the genotype have been developed that favour the creation of modular networks (Kitano 1990, Gruau 1995, Sendhoff 1998, Sendhoff and Kreutz 1999; see Yao 1999 for a survey), and Friedrich and Moraga (1996) evolved good building blocks for NN architectures in a hierarchical manner. In any case, modular NNs designed by evolutionary algorithms have often turned out to compete well with manually designed ones.

Still, the role of modularity raises several fundamental, often unanswered questions. Is the evolution of modular architectures a direct result of the demand for good performance, or rather a by-product of other constraints, for example a demand for fast learning? What kind of selective pressure and which inheritance scheme favour modular architectures? In this article, we want to quantify modularity and verify empirically under which circumstances modular architectures evolve.
Already Simon (1962) argued that the evolution of complex systems requires stable subsystems: hierarchically structured systems composed of several smaller, nearly independent parts evolve more reliably. In biological systems, the interplay of evolution and learning has also been claimed to play an important role for the emergence of structure (e.g. Nolfi and Floreano 1999). We believe that such general arguments have to be reconsidered quantitatively, and that the hypothesis that modularity is favoured per se has to be tested.

Our investigation is based on quantitative measures for the degree of modularity of NNs, which we define in the next section. In section 3, we describe the experimental framework: feedforward NNs are evolved for a separable classification task under different kinds of selective pressure (fast versus accurate learning) and different inheritance schemes (Darwinian versus Lamarckian). Section 4 presents and discusses the results, and the article ends with a conclusion and an outlook.

2. Measures of modularity

The concept of modularity in the context of NNs has been discussed from several points of view (see, for example, Snoad and Bossomaier 1995, Lam and Shin 1998). We concentrate on structural modularity: a feedforward NN is viewed as a directed graph, where the neurons correspond to the vertices and the weighted connections to the edges. This definition does not deal with functional modularity; for a more general mathematical treatment of modularity we refer to Toussaint (2002b).

We assume that the NN realizes a mapping from the input variables x_1, ..., x_n to the output variables y_1, ..., y_n'. We call this input-output mapping separable if the input variables can be divided into m disjoint subsets X_1, ..., X_m such that each output variable y_j depends only on the variables of one subset X_i(j), i(j) ∈ {1, ..., m}. A separable problem could be solved by m networks that do not interact in any way; a single NN solving the whole problem may likewise profit from an internal division into subsystems that interfere only sparsely. Neurons that receive information, either directly or indirectly, from input neurons of only one subset are called pure. This leads to an intuitive notion of a modular architecture: the more its hidden and output neurons are pure, the more modular the network.
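To make the notion of a separable problem concrete, the following minimal Python sketch constructs one; the two subfunctions (parity and majority) are hypothetical illustrations chosen by us, not the functions used in the experiments below:

    from itertools import product

    # Disjoint input subsets of a separable problem: output y_1 depends
    # only on inputs 0-2 (subset X_1), output y_2 only on inputs 3-5 (X_2).
    def y1(x):
        return (x[0] + x[1] + x[2]) % 2          # parity of subset X_1

    def y2(x):
        return int(x[3] + x[4] + x[5] >= 2)      # majority of subset X_2

    # All 2^6 = 64 Boolean input patterns with their two target outputs.
    patterns = [(x, (y1(x), y2(x))) for x in product((0, 1), repeat=6)]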
For technical purposes, it is convenient to express the degree of modularity of a given architecture by a single number. Let the input neurons be divided into the m subsets X_1, ..., X_m of a separable problem. To each neuron i we assign an m-tuple (d_i(1), ..., d_i(m)), where d_i(k) denotes the degree to which neuron i depends on the input subset X_k. The degrees are defined recursively. For an input neuron i belonging to subset X_k,

    d_i(l) = 1 for l = k and d_i(l) = 0 otherwise.                               (1)

For each hidden and output neuron i, the m-tuple is the average of the m-tuples of all neurons from which i receives connections, i.e. with P_i denoting the set of predecessors of neuron i,

    d_i(l) = \frac{1}{|P_i|} \sum_{j \in P_i} d_j(l) ,                           (2)

so that the components of every m-tuple sum to one. Here all connections are treated as equal and the weights are neglected. If the weights w_ij are taken into account, the plain average in equation (2) is replaced by the weighted average

    d_i(l) = \sum_{j \in P_i} |w_ij| d_j(l) / \sum_{j \in P_i} |w_ij| ,          (3)

which will be used below for a finer-grained measure.

A neuron i is completely pure if d_i(k) = 1 for exactly one k. We quantify the pureness of neuron i by the variance of its degrees,

    \sigma_i^2 = \frac{1}{m} \sum_{k=1}^{m} \left( d_i(k) - \frac{1}{m} \right)^2 ,   (4)

which becomes maximal if and only if the neuron is completely pure. Dividing by this maximum value \sigma_max^2 maps the pureness

    o_i = \sigma_i^2 / \sigma_max^2                                              (5)

into the interval [0, 1]. The modularity of the architecture, A(arch.), is defined as the average pureness of all N hidden and output neurons,

    A(arch.) = \frac{1}{N} \sum_i o_i ,                                          (6)

where the sum runs over these neurons. A(arch.) equals one for a network that is completely separated into m subnetworks and becomes small when the information from the different input subsets is strongly mixed.

Figure 1. Sample architecture used to illustrate the calculation of the modularity measure; for the depicted example network the degrees are 2-tuples (m = 2), and the measure yields A(arch.) = 0.668.
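The recursive computation of the degrees and of A(arch.) is straightforward to implement. The following Python sketch reflects our reading of equations (1), (2) and (4)-(6); the data structures (a predecessor list and a subset label per neuron) are our own choices:

    import numpy as np

    def degrees(preds, subset_of, n_neurons, m):
        # preds[i] lists the neurons feeding neuron i (neurons are assumed
        # to be indexed in topological order); subset_of[i] is the input
        # subset index of input neuron i, or None for hidden/output neurons.
        d = np.zeros((n_neurons, m))
        for i in range(n_neurons):
            if subset_of[i] is not None:
                d[i, subset_of[i]] = 1.0         # input neuron: unit tuple, eq. (1)
            else:
                d[i] = np.mean([d[j] for j in preds[i]], axis=0)   # eq. (2)
        return d

    def architecture_modularity(preds, subset_of, n_neurons, m):
        d = degrees(preds, subset_of, n_neurons, m)
        # maximum of the variance in eq. (4), reached by a completely pure neuron
        sigma_max = ((1 - 1/m) ** 2 + (m - 1) * (1/m) ** 2) / m
        pureness = [np.mean((d[i] - 1/m) ** 2) / sigma_max   # eqs (4) and (5)
                    for i in range(n_neurons) if subset_of[i] is None]
        return float(np.mean(pureness))                      # eq. (6)

    # Toy network: inputs 0, 1 (subset 0) and 2, 3 (subset 1);
    # hidden neurons 4, 5; output neurons 6, 7.
    preds = {4: [0, 1], 5: [1, 2, 3], 6: [4], 7: [4, 5]}
    subset_of = [0, 0, 1, 1, None, None, None, None]
    print(architecture_modularity(preds, subset_of, 8, 2))   # 0.556 for this toy graph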
The measure A(arch.) characterizes the architecture alone, neglecting the weights. Replacing the plain average of equation (2) by the weighted average of equation (3) yields a finer-grained version, the weight-modularity A(weight), computed from the weighted degrees via equations (4)-(6) in the same manner. In contrast to A(arch.), the weight-modularity changes during learning, and it can quantify the degree of modularity even of a fully connected architecture, for which the unweighted degrees, and hence A(arch.), are completely determined by the partitioning of the inputs.

To demonstrate that such a measure captures the intuitive concept of modularity, we monitored A(weight) while a fully connected feedforward NN learnt the 'what' and 'where' problem of Rueckl et al. (1989) by batch learning. If the network develops two more or less independent modules, one for the 'what' and one for the 'where' subproblem, this restructuring should be reflected in the weight-modularity, although no strict partitioning into separated subgraphs takes place.

Figure 2. Development of the weight-modularity A(weight) (bold solid line) during learning of the 'what' and 'where' problem, together with the mean squared errors of the 'what' subproblem (dotted line) and the 'where' subproblem (dash-dotted line); shown are the medians over the learning trials.

The development shown in figure 2 matches this expectation. The 'where' subproblem dominates the first phase of learning, and while it is learnt the weight-modularity decreases. When subsequently - after a restructuring of the weight configuration - the 'what' subproblem is learnt, the modularity increases strongly. This asymmetry between the two subproblems is in accordance with the findings of Rueckl et al. (1989). The division into modules may be regarded as emerging in a 'top-down' rather than a 'bottom-up' fashion: the separation is driven by the two output neurons, which are associated with the two subproblems by definition, and extends from there into the hidden layer.
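The weight-modularity differs from A(arch.) only in how the degree tuples are propagated. A minimal variant of the sketch above, under the assumption that each predecessor contributes in proportion to the absolute value of its connection weight (equation (3)):

    import numpy as np

    def weighted_degrees(preds, w, subset_of, n_neurons, m):
        # w[(i, j)] is the weight of the connection from neuron j to neuron i;
        # predecessors with (near-)zero weight contribute (almost) nothing.
        d = np.zeros((n_neurons, m))
        for i in range(n_neurons):
            if subset_of[i] is not None:
                d[i, subset_of[i]] = 1.0
            else:
                total = sum(abs(w[(i, j)]) for j in preds[i])
                d[i] = sum(abs(w[(i, j)]) * d[j] for j in preds[i]) / total
        return d

The pureness values and their average are then computed from these tuples exactly as in the calculation of A(arch.).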
3. Experimental framework

In this section, we present experiments designed to verify the hypothesis that a selective pressure towards fast learning induces modularity when the problem is separable. We compare the evolution of NNs under four scenarios, combining two fitness criteria (fast versus accurate learning) with two inheritance schemes (Darwinian versus Lamarckian); superscripts such as (fast) and (accurate) indicate the corresponding scenario.

3.1. Data

The task is a classification problem with n = 6 Boolean inputs and two outputs (m = 2). The input-output mapping is separable: each output is a Boolean function of its own subset of three input variables, with target values 0 and 1 corresponding to the class labels 'false' and 'true'. The data set contains all 2^6 = 64 possible input patterns. To ensure that fast learning, and not the storage of one particular solution, is selected for, the target mapping is exchanged in every generation for a completely different but related one.1 The two Boolean functions are chosen randomly, under restrictions that make the resulting tasks comparable in difficulty.

3.2. Neural networks and training

We evolve feedforward NNs with sigmoidal activation functions, n = 6 input neurons, at most 10 hidden neurons and two output neurons. The networks are trained by batch learning using iRprop+ (Igel and Hüsken 2003), a slightly modified version of the Rprop algorithm (Riedmiller and Braun 1993) and a powerful, fast gradient-based optimization method. Learning means minimization of the mean squared error E(mse) on the complete data set; in addition, we compute the classification error E(class), the number of incorrectly classified patterns.

The two fitness criteria are defined as follows. The fast learner has to solve the task in as few learning cycles as possible: its fitness is given by the learning time t(stop), the number of batch learning cycles until E(class) = 0; if patterns are still misclassified after the maximum of t(max) = 200 cycles, t(stop) is set to t(max). The accurate learner has to model the target mapping as accurately as possible: its fitness is the error E(mse) after t(max) = 200 learning cycles. In both cases the fitness is to be minimized. Experiments with different network sizes turned out to give qualitatively similar results.
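A heavily simplified Python sketch of the two fitness evaluations is given below. Plain batch gradient descent on a fixed, fully connected single-hidden-layer network stands in for iRprop+ and for the evolved, variable connectivity; the learning rate, the seed and the 0.5 classification threshold are our own assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def fitness(X, T, n_hidden=10, t_max=200, fast=True):
        # fast=True : learning time t(stop) until E(class) = 0 (capped at t_max)
        # fast=False: E(mse) after t_max batch learning cycles
        W1 = rng.uniform(-0.2, 0.2, (n_hidden, X.shape[1]))
        W2 = rng.uniform(-0.2, 0.2, (T.shape[1], n_hidden))
        eta = 0.5
        for t in range(1, t_max + 1):
            H = sigmoid(X @ W1.T)                   # hidden activations
            Y = sigmoid(H @ W2.T)                   # network outputs
            E = Y - T
            wrong = np.sum((Y > 0.5) != (T > 0.5))  # E(class)
            if fast and wrong == 0:
                return t                            # t(stop): task solved
            dY = E * Y * (1 - Y)                    # backpropagation for E(mse)
            dH = (dY @ W2) * H * (1 - H)
            W2 -= eta * dY.T @ H / len(X)
            W1 -= eta * dH.T @ X / len(X)
        return t_max if fast else float(np.mean(E ** 2))

    # Example with the 64 patterns from the sketch in section 2:
    # X = np.array([x for x, _ in patterns], float)
    # T = np.array([t for _, t in patterns], float)
    # print(fitness(X, T, fast=True))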
weight-modularity T(""") displays h are have takes a, first indeed the the of evolutionary = of of on of = succeeding values are evolution, parents 70 offspring general are randomly spread over spread randomly the denotes the lop8 interest, the errors slightly. generation, experimental results experimental the been fitness, = place as long as reinitialized is reinitialized advantageous for modular problems. cannot 200 cycles. 200 evolution learning time learning has the of the task deleted from number re-encoded after re-encoded learning after the increase of the development i.e. median because it because has vanished In only i.e. of the approximately 100 progress be relevant the per fluctuations means effect the cycle Task-dependent learning. the the p the and of At(weight) levelled next generation next the of randomly in generation interval = In the former prior of best the optimization (accurate optimization there of 10 flmax) when learning actually primary that architecture that the afterwards inserted afterwards and the mean squared mean the and modularity supports modularity fitness is the fast individuals p to observable. E(C1ass) for out Lamarckian are of the out = case, the [-0.2, results learning learner, 200 the all by still misclassified number is terms aim whole modularity increases of the the random has evolution in four faster learning cycles. 0.21. generations, 'bottom-up' the is are interval of every of the encoded are E("") in already in generation within in architecture h experimental scenarios. experimental 200 of the The selected every the trials, strongest and offspring In initialized connections generation each generation independent and optimization this case, [-0.2,0.2] previous only at The the move model the has in generation learning error, a the architecture-modularity flstop). patterns the interval out following or intuition stopped. operator random form of and the a smaller algorithms individual's or at of the more single the generation. respectively, (Darwinian E(msr) remains constant, The random. fast the the starts trials. left, [-0.2,0.2]. fitness is the is The that modular that position observations, employed classification offspring individual 6 learner) connections weights next the the for but E(mse). weights the from choice focus Figure weakest We genome The correct all not parent fitness In style) given and, four The find and 225 the the are the for by 40 on to of of is 3 Downloaded By: [Igel, Christian] At: 14:14 13 June 2008 to modularity 226 To summation high high inverse and be (a) understand (a) % /-. 2 . a - b 2 + o E discussed mean weight-modularity Figure Figure without reaching small reaching without modularity weight-modularity and accurate Lamarckian) after generation (fast problem generations), 10). Comparing architecture-modularity need 0.5 0.4 0.3 squared measures and 0 Darwinian) in 50 0 by It not 3(a) 4 equation observations is fast(#) shows later, yields having Figure be plausible A(Nch.-) 50 error clearly the true; Figure while generations generations give that Darwinian two fast two ihr architecture-modularity higher a 4. many 50), (4). minimize 0 150 100 0 150 100 than even exhibits Average the A(Weight) the a 3. that the In whereas more 1 Relevant terms of weights four experimental and E(mse). particular, the weight-modularity a network fast a learners, - efficiently high M. 
that the accurate learners 2, detailed the modularity in modularity than the learners reach learners let 200 since the Hiisken of classification 200 architecture-modularity us first 'cross-connections' (b) A("ch.) accurate learners reach with two we (b) - 2 . a + 5 + insight rhtcue determines architecture h 2 + 3 decrease fast learner's fast fast et weight-modularity see a 0.9 accurate learners (accurate Darwinian reconsider 0.8 the 0.7 1.0 the non-modular = al. 0 scenarios. learners 50 0 remains unchanged. fitness 1 that the that into the parent significantly implies (Wilcoxon -..- learning the fast(') fast(#) ---.----,--..-."*+--- 50 learning function. E(mw), population. generations the (p A(arCh.) generations Darwinian Darwinian A(weight) relation between etrtd change reiterated set 100. < relation between time architecture time 100 rank while 0.05 A(Nch.) to higher architecture- higher significantly zero. (p fl"@, = 150 sum after generation after - .-.....-...... the 1. < 150 h terms the also induces .--..--.- However, 0.05 in 0.05 In test,p fast figure 200 may this : . 200 learners the higher A(arch.) of sense, < reach most 3(b), 0.05 two the the of a Downloaded By: [Igel, Christian] At: 14:14 13 June 2008 The a suitable a adjustment of favour of the the reduction in optimal models can models the accurate Lamarckian, accurate after adjustment in adjustment towards architecture-modularity connections networks have architecture-modularity. inheritance. is values example this example learning adapt section may characteristics considered cost connections' Bullinaria However, In greater second leads The 5. modularity importance configuration. of criteria magnitude figure changed A modularity during learning modularity Regarding observation In We the fast Conclusion architecture fast task work function be the fine-tuning reaching this goal, addition, to the architecture the designed 0 first case first of Darwinian learners 3.1) than and architectural predisposition learners (learning a different and 4(b): demands of weight-modularity presented of in (2001), learning fast sigmoidal unbounded problems. over and over, at over, and over in counterbalance third in the the will of 1; higher), do with corresponds both the The of to NNs the other we and and ouaiy in modularity is our of second E("") the rely interfere the learn a number not degrees cases, not that the for longer learning longer period. performed the target measures proved algorithms classification maximally classification (iii) that outlook experiments aims problem and results on necessarily output but more particularly weights for have If so and to the learning dynamics learning does where appropriate scenarios. This weights can weights the allow at instead values The modular problems of that 3, use with the to do a modular architectures. cross-entropy cost function (with cost function cross-entropy high third are the modularity A(weight).2 same problem architectural imperfection architectural neurons instead Task-dependent better not the an of not and should lead a latter that common aspects common characterization the NNs. least learning. 
fine architecture-modularity qualitatively of 0.1 an is weight-modularity, in reach aim cases increase interfere of have modular, same experiments evolution increased different, absolute minimize order understanding to E("") adjustment and the effect to counterbalance a non-modular architecture a non-modular counterbalance architecture-modularity, whereas at for some for We learn the high a be a effect learning algorithm. than 0.9; high precise to Therefore, cost to nrdcd esrs for measures introduced results of of Summarizing, towards with depends is the and architecture-modularity, distinguish fast a weight of but related also when increases weight-modularity (ii) by of the the evolution use A(arch.). the to weight-modularity could also explain small function NNs. classification. linear of thus of the be of or learning, the cross-entropy adjustment E(""), of in same cannot of weights the learnt repeatedly, certain the learning E(mse). saturated output neurons task understanding values all induces lower the E("") only weakly only under even it The ones with rbestee 'undesired7 these problems becomes between different presented architecture as the (in development and by of A(arch.) be reproduced. the This explains what cannot the close need depending cost fast and the are However, accurate Darwinian and accurate Darwinian changed a target a of The more evolutionary pressure evolutionary target regression ones presented here. ones presented sense orders learning. weights, cost applied: (i) is function to the for more (the learning take maximal on problems; values pressure towards the one. accurate function. results values fast of and the and conditions. see the EcmX) h degree the if place. on of development important section the the problem it aims In figure In The and with magnitude 0.1 scheme maximally optimality particular is functional when the contrast, found 0 turn, accurate is stopped Hence, we find and and 1). weight reason weight orders robust 'cross- in target in 2 Ecrn") 4 at and 227 (a). our We the the the 0.9 an by in of to of Downloaded By: [Igel, Christian] At: 14:14 13 June 2008 modular for enforces results is the modular information modular then criterion modular information findings confirm inheritance a modular types should developed. We Acknowledgements accurately in a in accurately architectures 96 edof 1998, 1996, Sendhoff Sendhoff (e.g. Notes anonymous acknowledge 2 1 Bullinaria, References 228 Ferdinando, A. Ferdinando, Elman, Fodor, J. A., Fodor, J. task-dependent: From The that modularity able different that the that calculation cancelled small It modularity becomes modularity task ol like would recursive is vltoay loihs should algorithms evolutionary Edinburgh, networks. Stenning (eds) Rethinking Press). Press). Psychology of likely distinction these between the J. to show be absolute definition. a L., Bates, L., NNs NNs (with adapt J. problem's modularity problem's fast the developed broader tasks. weight out. that of 1983, A., may averages over averages that reviewers learner (Toussaint financial D., trained or are development to In that 2001, Simulating 2001, and an value. Scotland (Mahwah. Workshop: Innateness-A given the to be one be The a grammar R. French R. modularity increases Calabretta, efficient E. indeed Proceeding number configuration. Generally, we the perspective, without thank allow should appropriateness Modularity A., Johnson, A., In particular, In that processing. Such processing. 
weights, fast time) of support from for learning algorithm l osbe cases. possible all stronger 2002a). Evolution, Learning Evolution, the of beneficial address more be preferred if more learning criterion without and W. encodings, their R., of different, weight two is main Connectionist and of the the J. 'detected' o,f von and and kinds the M. Sougne (eds) Sougne our NJ: and modularity Mind: Twenty-third architecture-modularity; New helpful evolution differences between different inheritance schemes (Lamarckian, schemes inheritance different In future M. the Parisi, Darwinian, without H., fact Seelen inheritance) instead enforces instead inheritance) USA but efforts aim for of Kreutz of the Husken Kitano more An Karmiloff-Smith, it measures tasks very quickly very that with related problems, related modular architectures for specific, solving Thereby (Lawrence is will be D., Perspective BMBF, comments Essay of necessary and we use we for is Annual work, measures additional Proceedings the adjust fast 2001, Evolving 2001, oua neural modular specifically 1999) a also discussed also 1990, Development et priori on aybnfca discussions beneficial many NN's all at modular problems modular and operators gave e.g. would guide al. a the grant by Erlbaum Faculty the batch Conference correlations between correlations to the Gruau should on the weight weight information robust cope and A., efficiency, than support evolution NNs Development in of learning algorithm learning learning such weight LOKI Psychology particular Parisi, by the Sixth Neural Sixth the (Berlin: Springer), pp. with Associates), suggestions. designed oua acietrs for architectures modular 1995, systems. of evolved be or oa' common today's Hiisken inheritance the of learning becomes an the a other analyzed the Cognitive 01 method to D., undesirable connection undesirable class Friedrich but of cuae regression accurate development IB the theoretic, within the by classes In and Plunkett, and (Cambridge et (Cambridge MA: (Cambridge large of and inheritance). types pp. to Hiisken that al. 001 J. the hypotheses leads development problems, D. (2000); Moreover, since 146-151. a Computation and Computation find of (significantly) that the subproblems and (further) and C. systems with this increase short of Science and of Moore and Moore to modularity et problems, aspects the 253-262. encoding speculation modular it MA: MIT MA: al. and Moraga time. NNs. K., i.e. part is of gradient Society, for shown need neural with Our new that to 1996, MIT the The to we the are of of of be If K. a Downloaded By: [Igel, Christian] At: 14:14 13 June 2008 rerc, C. Friedrich, Jordan, Hiisken, Husken, Grau, gl C., Igel, Toussaint, Kitano, Nolfi, Sendhoff, Lam, Sharkey, A., Sharkey, Snoad, Riedmiller, Simon, Rueckl, J. Toussaint, M., Yao, Schwefel, Sendhoff, X., 1999, X., artificial Evolving C.-H., S., F., networks-a Spain, pp. vltoay Computation Evolutionary Press), Neurocomputing, and architectures Joint ht learn that Processing prediction. forward neural network. neural forward Information otcl iul systems? visual cortical Supercomputing Conference, Supercomputing Francisco, Handbook of Handbook 187-193. neural 106: Proceedings RPROP Complex HI, (IEEE (New 2755-2760. 171-186. N., H. M. H., M., and A. M., and 1995, H.-P., B. G., Cave, A., USA, M., B. and I., 467-482. M., vltoay optto Conference Computation Evolutionary and 1990, Designing 1997, Gayko, Conference York: 1996, A., Parisi, and Igel, pp. M., Press), pp. 
Husken, ytm in systems 1962, A., 2002a, 2002b, Automatic Bossomaier, algorithm. In and Braun, Systems, Shin, 1995, 951-956. (IEEE and Kreutz, and CA, USA 98-109. to and Jacobs, A., Modularity, In and On John 1998, Processing of C. quantitative D., The J. K. Brain of learn. In learn. Congress A neural model for multi-expert architectures. multi-expert for model neural A F. On Evolution the International Joint International the combining Management E. and oaa C., Moraga, R., M., 1999, artificial Press), January G., Wiley architecture on 259-266. 4: and Evolution model hoeia eouinr research. evolutionary theoretical definition Theory and 461476. H., Toussaint, (IEEE 2003, 1998, T., Neural Proceedings of the Proceedings of neural Learning 1995, Sendhoff, M., combining X. (Aachen: Shaker). on & 1993, Kosslyn, S. M., 1989, S. Kosslyn, M., Why Physical pp. cmuainl investigation. computational A 1995, MONSTER-the ghost in ghost 1995, MONSTER-the case and neural selection 2003,50: erl networks. neural artificial Sons). Formation Yao 1999, n Neural and miia evaluation Empirical Evolutionary and Neural and Modular Press), pp. 245-250. Task-dependent evolution of San Networks of networks 1996, ietaatv ehdfrfse akrpgto learning: backpropagation faster for method adaptive direct A of Optimum study. of Uncertainty modular neural networks. neural modular and and M., B. A., networks. Variable encoding Variable complexity. Diego, Review Structures-Optimization and neural networks. neural 105-123. and the An evolution. In D. 2001, and and Conference artificial 2000, Networks 586-591. E. D. Networks using (IJCNN evolutionary Seeking. B. IEEE CA, Computation E, hierarchical learning dynamics of Proceedings Task-dependent Fogel disability in 58 (3): In Optimization for Goodman Proceedings USA, (ACM USA, genetic algorithms genetic (GECCO-2001), Knowledge International Autonomous neural Sixth 2002), Sixth-Generation (Cambridge MA: (Cambridge (ECNN on of (eds) 3673-3677. are of Connection Neural the of method International modules nets. In (CEC 'what' modular of (ed.) Honolulu, IEEE neural improved the Journal of 2000) Proceedings Adaptive the Based Press and Connection Conference of Robots, Networks evolution the Late-breaking connection machine: modularity connection '99) IEEE, and In problem systems. to Symposium networks in Artificial San American Science, neural San Proceedings Systems with id od building-blocks good find a dual-tasking multilayer Washington of wee processed 'where' HI, MIT Computer Behavior, Rprop Conference 7 87 IEEE Francisco, Antonio, TX, Antonio, Cognitive (1): graph generation graph (IJCNN USA, classes-neural of of In Science, networks for time series time for networks on (9): Press), pp. 8 to (IPMU erl Structures Neural M. 89-113. the modularity Philosophical Neural (3): Press). Papers on erig algorithms. learning 1423-1447. decompose of A. Arbib Technology Series Technology (IEEE 3 1995 299-314. DC, Combinations of Combinations 2002), Neuroscience, (2): the 9 on A USA, CA, (1): '96), Networks, at 579-582. USA International USA, 151-183. Information ACM/IEEE by the Genetic Press), pp. Press), 3-10. Honolulu, in Granada, networks (ed.) separate tasks. Society, system. (IEEE neural Vol. feed- 229 San The the pp. for for In of 1: 1