


CROSSING IN COMPLEXITY: INTERDISCIPLINARY APPLICATION OF PHYSICS IN BIOLOGICAL AND SOCIAL SYSTEMS

IGNAZIO LICATA AND AMMAR SAKAJI EDITORS

Nova Science Publishers, Inc., New York

Copyright © 2010 by Nova Science Publishers, Inc.

All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175 Web Site: http://www.novapublishers.com

NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

Crossing in complexity : interdisciplinary application of physics in biological and social systems / editors, Ignazio Licata, Ammar Sakaji.
p. cm.
Includes index.
ISBN 978-1-61209-298-0 (eBook)
1. Protein folding--Mathematical models. 2. Biocomplexity--Mathematical models. 3. Laplace transformation. 4. theory.
I. Licata, Ignazio. II. Sakaji, Ammar.
QP551.C785 2009
572'.633--dc22
2010001045

Published by Nova Science Publishers, Inc. New York

CONTENTS

Foreword

Chapter 1 Living with Radical Uncertainty: The Exemplary Case of Folding Protein
Ignazio Licata

Chapter 2 The Limits of Atomism: The Bohm Way of a New Ontology
Ryo Morikawa

Chapter 3 Doing Mathematics about and with the Brain
Michael A. Arbib

Chapter 4 Physics of Life from First Principles
Michail Zak

Chapter 5 Theoretical Physics of DNA: New Ideas and Tendencies in the Modeling of the DNA Nonlinear Dynamics
L.V. Yakushevich

Chapter 6 Mathematical and Data Mining Contributions to Dynamics and Optimization of Gene-Environment Networks
Gerhard-Wilhelm Weber, Pakize Taylan, Başak Akteke-Öztürk and Ömür Uğur

Chapter 7 Folding Proteins: How to Set Up an Efficient Metrics for Dealing with Complex Systems
Alessandro Giuliani

Chapter 8 The (Unfortunate) Complexity of the Economy
J.P. Bouchaud

Chapter 9 Evolution of Norms in a Multi-Level Selection Model of Conflict and Cooperation
J.M. Pacheco, F.C. Santos and F.A.C.C. Chalub

Chapter 10 Dynamics of Coupled Players and the Evolution of Synchronous Cooperation: Dynamical Systems Games as General Frame for Systems Inter-Relationship
Eizo Akiyama

Chapter 11 Fractal Time, Observer Perspectives and Levels of Description in Nature
Susie Vrobel

Index

FOREWORD

Someone once said that Physics is what physicists do (late at night!). Those who happen to browse, or even read through, any Physical Review or the arXiv will find papers dealing with topics not usually related to the "traditional" field of Physics: DNA and protein folding, the organization of eco-systems, market fluctuations, web topology, language evolution, biological systems, and cognitive agents. Recently a new Physics of Emergence has been developing; it aims at investigating the interwoven hierarchies of the evolution of complex systems. The concept of emergence first appeared in the study of phase transitions and collective processes; it has since moved to new problems and suggested new conceptual and mathematical tools which are redefining the themes and the very style of Theoretical Physics. The old reductionist approach now goes side by side with a novel methodological sensibility in which knowledge is not an ascending theoretical chain starting from the analysis of "the World's bricks" and progressively including broader and broader scales. When dealing with complex systems it is necessary to adopt a complementary plurality of approaches, a dynamical usage of models connected to the multifarious features of the system under consideration. The great majority of interesting systems remain invisible when they are investigated according to the reductionist view and traditional formal methods. The key idea is that the more complex a system is, the more the perspectives from which to observe it multiply. Through such perspectives the system will show features and organization levels which cannot be neatly "separated" and solved by a single model based on the traditional good old "fundamental" equation of Theoretical Physics. The outcome is what we, quoting the famous movie, have called the "Physics of Sliding Doors". In arguing with the supporters of Determinism, William James prophetically wrote:

“the sting of the word "chance" seems to lie in the assumption that it means something (…) of an intrinsically irrational and preposterous sort. Now, chance means nothing of the kind. (…). All that its chance-character asserts about it is that there is something in it really of its own, something that is not the unconditional property of the whole. If the whole wants this property, the whole must wait till it can get it.” (The Dilemma of Determinism in Essays in Pragmatism).

Our aim is to introduce some particularly important topics in order to explore the complexity archipelago, promising a new interdisciplinary vision of the Physis.

I especially thank my friend and co-editor Prof. Ammar Sakaji: without his precious help this volume could not have been edited.

Ignazio Licata
ISEM, Institute for Scientific Methodology, Palermo

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 1-9 © 2010 Nova Science Publishers, Inc.

Chapter 1

LIVING WITH RADICAL UNCERTAINTY: THE EXEMPLARY CASE OF FOLDING PROTEIN

Ignazio Licata* ISEM, Institute for Scientific Methodology, Palermo

Abstract

Laplace's demon still exerts a strong influence on contemporary science, in spite of the fact that the outcomes of Mathematical Logic, the advent of Quantum Physics and, more recently, Complexity Science have pointed out the crucial role of uncertainty in our descriptions of the World. We focus here on the typical problem of protein folding as an example of uncertainty and radical emergence, and as a guide to the "simple" principles for studying complex systems.

Keywords: reductionism, emergence, folding protein, hydrophobicity, Recurrence Quantification Analysis.

1. “ZIP Filing the World”

A ghost roams science, a sort of hidden paradigm, never formally enunciated, which steers the way the whole of scientific activity is conceived. We mean the completely-computable-world idea, whose conceptual foundation lies in reductionism. If we had to trace the manifesto of such a conceptual tension, the famous excerpt from Laplace's Essai Philosophique sur les Probabilités (1814) is exemplary:

“We must therefore regard the present state of the universe as the effect of its preceding state and as the cause of the one which is to follow. An intelligence which in a single instant could know all the forces which animate the natural world, and the respective situations of all the beings that make it up, could, provided it was vast enough to make an analysis of all the data so supplied, be able to produce a single formula which specified all the movements in the universe, from those of the largest bodies to those of the lightest atom. For such an intelligence nothing would be uncertain, and the future, like the past, would be present before its eyes.”

* E-mail address: [email protected] 2 Ignazio Licata

Although the very nature of Laplace's observer – with its ability to measure the initial conditions of all the particles in the Universe, to insert them into the Newtonian equations and so to calculate any trajectory – patently appears extremely speculative, the interesting side is the implicit assumption that such a program is unfeasible in practice, and yet theoretically possible! In other words, there is no contradiction in thinking it out, and it is in consonance with our knowledge of the physical world. Still nowadays, contrary to what is often argued, neither Quantum Physics nor non-linear dynamics have swayed such a conception. In Quantum Mechanics, for example, the Schrödinger equation characterizes an evolutionary dynamics of U-type – according to the well-known Penrose classification – which is perfectly deterministic (besides, from the structural viewpoint, the key equation of non-relativistic quantum physics is similar to a diffusion equation and formally connected to the Hamilton-Jacobi equation), whereas the R processes – on which the probabilistic interpretation is centered and which are related to the collapse of the state vector – can always be regarded as the outcome of a "hidden determinism" limiting any attempt to read them classically, but with no radical incidence on the idea of a complete calculability of the World à la Laplace (Allori & Zanghì, 2004). The same goes for non-linear dynamics: sensitivity to initial conditions and long-term unpredictability do not remove local determinism; in fact they make chaotic dynamics the ideal sample of computational emergence, algorithmically compressible into a few simple formulas (see Gleick, 2008; Chaitin, 2007). It is not by chance that there is a close analogy between the halting problem in Turing's theory of computation and deterministic chaotic systems: in both, the final state is unpredictable, but it can be followed step-by-step. Even the structural instability studied by Pontryagin, Andronov and Peixoto in mathematics – whose physical equivalents are the dissipative systems of Haken and Prigogine, where, starting from a situation of instability, a system can choose among infinite equilibrium states – does not prevent a "global" forecast about the asymptotic state of the dynamical evolution (Thom, 1994; Prigogine & Nicolis, 1989; Haken, 2006).
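A one-line chaotic rule makes this point concrete. The sketch below (a minimal Python illustration added here, not part of the original text) iterates the logistic map, a dynamics algorithmically compressible into a single formula, from two initial conditions differing by 10⁻¹⁰; as with the halting problem, each step is trivial to follow, yet the long-term outcome is unpredictable:

```python
# Two trajectories of the logistic map x -> r*x*(1 - x), started 1e-10 apart.
r = 4.0                      # fully chaotic regime of the map
x, y = 0.2, 0.2 + 1e-10
for n in range(60):
    x, y = r * x * (1 - x), r * y * (1 - y)
    if n % 10 == 9:
        print(n + 1, abs(x - y))   # the separation roughly doubles each step
```

After about forty iterations the two orbits are macroscopically different, even though the "law" that generates them fits on one line.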

Although its Newtonian roots have grown weaker and weaker, the Laplace-inspired computability of the World has not yet been completely undermined by the advent of Quantum Physics and non-linear dynamics. The mythology of "Theories of Everything" is grounded on such a line of thought (Barrow, 2008), where the key idea is "zip-filing the World" into a fistful of formulas describing the fundamental interactions between a restricted group of "fundamental objects". Two of the main meanings of reductionism fall within such mythology. The more patent one is related to the crucial – nomological, we dare say – role of the "World's bricks"; the other and subtler one suggests that the World can be described by a theoretical chain of the kind $T_1 \prec T_2 \prec \dots \prec T_f$, where the $T_i$ are the description levels and the symbol "$\prec$" means "physically weaker than", so that each level can be derived from the "final theory" $T_f$. We find it interesting to point out that even in such a deeply changed cultural context, the idea of the ultimate computability of the World still largely characterizes the conception of scientific activity as well as the role of the scientific method, both regarded as an asymptotic approach to the "ultimate structure of reality". One of the principal consequences of such a way of thinking lies in considering uncertainty not only as bad, but as a worthless bad feature, a mere practical drag on knowledge, located just in some remote and radical quantum zone ruled by the Heisenberg Indetermination Principle. Any science where uncertainty could not be tamed so easily was downgraded to the status of "imperfect science", unfit for the Physics-oriented programme, as happened with Biology, Cognitive and Socio-Economic Sciences.

Philip Anderson, in his famous paper-manifesto "More is Different" (1972), radically and subtly criticized such a programme, a criticism later developed by Laughlin and Pines (Laughlin & Pines, 2000; Laughlin, 2006). The central idea is simple: the universality of collective behaviors – such as phase transitions – is compatible with the system's constituents, but not deducible from the properties of the elementary "bricks". Thus, reductionism simply does not work with such kinds of systems, which, after all, are the great majority of the interesting systems and stay in that zone called the "Middle Way" (Laughlin et al., 2000), located between Particle Physics and Cosmology. In "middle-way" systems, evolution is strongly connected to the dynamical coupling with the environment and depends on the structural history of the system. An "Everything Theory" for this kind of systems is impossible, just because every interaction between each single system and the environment at each instant should be taken into account. That is to say, the best dynamical description of the system is the system itself! Contrary to the Laplace hypothesis, the observer is "immersed in the World" and has to make its descriptive choices critically depending on its inter-relation with the system (Licata, 2008). So we have to come to terms with the idea that a mathematical model of a complex system is just a "picture" of a single side of the system, taken from one of the several possible perspectives on its organizational behaviors. Does this kind of dynamical usage of models fall under the scheme of the above-examined theoretical chain? The answer is no! Uncertainty and incommensurability between one description and another are here natural elements of the knowledge process and are placed in the interstices of any theoretical description. In fact, a model is not a mere description of a space-time range (for example, from the smallest to the biggest scale), but is fixed by the observer's aims and goals; thus models aimed at grasping a specific behavior of the system are intrinsically affected by uncertainty with respect to the other aspects, and they cannot easily be tuned within a single key (Minati, 2008; Licata, 2008). We will show here that protein folding is a perfect example of a "middle-way" process as well as a precious guide to the understanding of both complex systems and the radical elements of uncertainty connected to their description.

2. The Folding Problem

Proteins are linear heteropolymers made of a non-periodic chain of amino acids connected by covalent bonds. The vital task for the "life mechanism" is carried out by the folding process, when a protein in solution folds into a three-dimensional structure which must be self-consistent with the solvent. Proteins catalyze the chemical reactions necessary to life (enzymes), build the biological structures in the strict sense and act as antibodies. These macromolecules, at the edge between Chemistry and Biology, carry a "biological signature" containing all the significant elements of authentic biological complexity. The problem a protein has to solve is that of finding a soluble spatial configuration so as to carry out its functions without going out of the solution. In this condition, the protein can carry out its informational role, which depends on a great deal of boundary conditions related to the environment where it is immersed (solvent, presence of other molecules, pH, ionic strength and so on) (see Whitford, 2005; Giuliani and Zbilut, 2008). In spite of the many proposed classifications, there do not exist identical folding configurations; each one depends on the responses of the "actors into play" to the specific context (as for proteins, these actors are well known: hydrophobic bond, hydrogen bond, van der Waals forces, electrostatic interaction). Such singular coupling with the environment is a general feature of any complex system and is valid for a protein as well as for a social group or a factory. Uncertainty is the rule there. And it provides a "minimal" definition of a complex system: a complex system is a system showing locally an unpredictable behavior, which cannot be zip-filed into a single formal model. In highly logically open systems (which not only exchange matter/energy, but modify their arrangement too) it is impossible to distinguish the "in vitro" behavior (closed systems describable by a single formal model) from the "in vivo" behavior. From this follows what we call the first principle of complex systems, or logical openness: complex systems are open, context-sensitive systems. The final configuration of a crystal is unique, whereas the protein's dynamic path toward its folded three-dimensional state is quite "rugged", full of false minima and metastable states (see Figure 1).

Figure 1. Folding funnel in rugged landscape.

Such a feature too lends itself to a generalization valid for all complex systems and is strictly related to unpredictability and uncertainty. The second "simple" principle is the Principle of Indifference: a complex system exhibits several different behaviors which are equivalent from the energetic viewpoint and thus impossible to classify into a hierarchical order, not even a probabilistic one. The reason why we have underlined the last sentence is to point out that we are dealing with a more radical unpredictability than the quantum or non-linear ones, and at the same time its nature is closer to our everyday experience. It is a "sliding-doors" type of indifference: the system can choose different possible "histories", each one following a completely different "destiny".

In Quantum Physics, within given situations, it is possible to assign a probability weight, while in the non-linear dynamics of the simplest models everything depends in a continuous way on the initial conditions. In the cases described above, instead, not only is any a priori evaluation impossible, but the system's dynamic history is really far from following the "domestic" rules of differential calculus, to such an extent that we need a mathematics taking into account singularities, terminal points and environmental noise rather than trajectories (non-Lipschitz dynamics, see Zbilut, 2004). Thus follows the third principle, or "it is easier to observe it": the path that a system follows towards its final state is decisive in defining the state itself. Complex systems can only be narrated by "consequent stories", not by a priori predefined ones. It has to be pointed out how such statements, which appear so weird when related to the traditional problems of Physics – where boundary and initial conditions are very different from the "law" ruling the phenomenon – become absolutely obvious when we move to a different field such as cognitive processes. That is why both Artificial Intelligence and the search for "algorithmic laws of thinking" have failed (Licata, 2008). Thus, the fourth principle deals with the structure-function relationship, the inextricability of structure from a complex system's dynamics: a system is its own history!
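The Principle of Indifference can be given a toy illustration. The following sketch (a hypothetical Python example added here, not from the original chapter) builds a one-dimensional "rugged landscape" from random cosines and runs a naive greedy descent from many starting points; the descents terminate in many distinct minima of nearly equal energy, and which minimum is reached depends entirely on the path taken:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy rugged landscape: a sum of random cosines has many local minima
# at nearly the same depth (no hierarchical order among them).
freqs = rng.uniform(1, 8, 12)
phases = rng.uniform(0, 2 * np.pi, 12)
energy = lambda x: np.sum(np.cos(freqs * x + phases))

minima = set()
for start in rng.uniform(0, 10, 200):
    x, step = start, 0.01
    for _ in range(5000):                 # naive greedy descent
        if energy(x + step) < energy(x):
            x += step
        elif energy(x - step) < energy(x):
            x -= step
        else:
            break                         # a local minimum: no downhill move
    minima.add(round(x, 2))

# Many distinct end states, clustered within a narrow energy band:
print(len(minima), sorted(energy(m) for m in minima)[:5])
```

No single minimum is privileged; the "history" of the descent, not an a priori ranking, decides where the system ends up.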

3. Radical Emergence: Pyrococcus Furiosus

The general features of complex systems provide them with the "flexibility" and adaptedness enabling life and cognition, and show the deep link between unpredictability, uncertainty and emergence. That is the principle of "surprise": complex systems exhibit radical emergence properties. "Radical" here refers to the appearance of properties which cannot be deduced from a predefined model of the system, arising as responses to the specific situation of coupling with the environment. A Universe à la Laplace can be totally assimilated to a Turing machine, where the observer itself plays the role of an "event recorder": many things happen, but all of them fall under the "cosmic code" of fundamental laws. In complex systems, instead, it is the single process that counts, and how it occurs; "laws" are just the stable elements within a very tangled dynamic frame, where surprises not deducible from a single model emerge. The study of thermophilic proteins, able to live without denaturing at very high temperatures, is extremely interesting. That is the case of Pyrococcus furiosus, an endemic archaeon located in the Pozzuoli sulphur mines at about 90 °C (194 °F), a temperature usually causing mesophilic proteins to lose their tertiary structure. A purely physico-chemical analysis cannot find an explanation for the peculiarities of thermophilic proteins: as we have already said, the "bricks" are the same, and so are the forces into play. By studying "chimeras", artificial proteins made of both thermophilic and mesophilic sequences, it has been shown that thermophily is a global feature of a system: a protein is either thermophilic or not thermophilic, and its lifetime does not depend on how the sequences are recombined! Here is an exemplary case of an emergent property, compatible with structure, sequence, elements and forces into play, but not deducible from them all. New information about a system spurs us to use new methodological approaches and update the model, which, in turn, is a cognitive emergence caused by the phenomenon under consideration.


Figure 2. Model updating under emergence.

A significant clue can be found in a statistical analysis technique called Recurrence Quantification Analysis (RQA), developed by Webber and Zbilut in order to study complex systems such as biological or financial ones (Zbilut and Webber, 1992, 1994). As its name suggests, such analysis is based on recurrence graphs where the elements of a data sequence are plotted at the same point if they indicate a similar position in the phase space. Practically, if the x-axis value and the y-axis value are very close (in the two-dimensional case), they individuate the same point; an irremovable structural uncertainty is taken for granted, which is why RQA is ideal for studying such systems. Such phase portraits are really illuminating. In the case of proteins, the recurrence between adjacent sequences is studied; the outcome can be seen in Figure 3. The key element is the relation between hydrophilic and hydrophobic sequences, that is to say the "dialogue" between the protein and its solvent (water). In thermophilic proteins, the hydrophobicity patterns are widely distributed along lines parallel to the main diagonal, whereas they are more clustered in mesophilic ones. This directly depends on the Principle of Indifference – many energetically equivalent solutions – and has to be interpreted as a greater flexibility of the structure, the elements and the "architectural" project being the same! That also explains why there do not exist intermediate possibilities: under a given level of flexibility, the protein "collapses" and is unable to carry out its function. What kind of "explanation" is this? Not a deductive one, because there is neither a formal model nor an analysis of the constituents to reveal it, but rather a "global" evidence derived a posteriori ("it is easier to observe it") when we use methodological tools able to grasp these dynamical features of the system. That is the essential peculiarity of authentic emergent processes; they cannot be brought back to a specific local "cause", but rather to a set of collective conditions which allow them to occur. In the case of thermophily, such conditions can be found in the extension of the distribution of the possible configurations and in the speed of the transitions from one configuration to another as the response to the high thermal excitation of the external environment. All that can happen through a particular architectural expedient related to hydrophobic sequences, and it recalls the essential topics of spin glasses in Physics, where the collective characteristics of a system are deeply connected to the complex dynamic arrangement of many local equilibrium states.
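The core construction of a recurrence plot can be sketched in a few lines. The example below (a simplified, hypothetical Python illustration; the actual protein studies embed the hydrophobicity series in higher dimensions with delays before comparing points) builds the binary recurrence matrix of a one-dimensional series and computes the simplest RQA measure, the fraction of recurrent pairs:

```python
import numpy as np

def recurrence_matrix(series, radius):
    """R[i, j] = 1 when points i and j of the trajectory lie within `radius`."""
    s = np.asarray(series, dtype=float)
    dist = np.abs(s[:, None] - s[None, :])
    return (dist < radius).astype(int)

# Toy stand-in for a hydrophobicity profile: a periodic pattern plus noise
# produces long lines parallel to the main diagonal, which RQA quantifies;
# pure noise would instead give isolated, scattered recurrent points.
rng = np.random.default_rng(0)
profile = np.sin(0.3 * np.arange(200)) + 0.2 * rng.standard_normal(200)
R = recurrence_matrix(profile, radius=0.3)
print(R.mean())   # "percent recurrence", the most basic RQA measure
```

Measures such as the fraction of recurrent points falling on diagonal lines ("determinism") are what distinguish the widely distributed thermophilic patterns from the clustered mesophilic ones.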

From Giuliani et al., 2000.

Figure 3. Comparison of phase portraits by RQA between thermophilic and mesophilic proteins (left) and two chimeras (right).

4. Uncertainty’s Fecundity: A Systemic Conclusion

This short excursus into protein folding has led us to some key principles that can be applied to any complex system, from Biology to socio-economic systems. Why are they so simple? Should we not expect a complexity science as extremely complicated as the objects it studies? "Simplicity" depends on the fact that the more the collective behaviors matter, the less the detailed microscopic questions are important. Besides, it is a seeming simplicity, because it requires explanation styles and analysis methods completely different from those used to study simple systems, which can easily be "enclosed" within an analytic solution. From a systemic viewpoint, simple systems lend themselves to being described by a model in which it is always possible to individuate a distinct border between system and environment, easily schematizing the inter-relations into play. In the great majority of cases such a procedure is the equivalent of making a "toy model" and "sweeping complexity under the carpet". There does not exist a reassuring fixed "border" between a system and its environment, and even the identification of the elementary constituents changes in accordance with the collective dynamics. Reductionism is a strategy that can bear fruit and has productively led the style of scientific explanation for about three centuries, but it cannot be applied in every situation. Tackling complexity means accepting that radical uncertainty all of us experience in the plurality of the possible choices in our life, which is somewhat subtler and more pervasive than the quantum one fixed by the Heisenberg principle. We could be tempted to propose Laplace's hypothesis again: if such an intelligence, able to take into account each object in the Universe and each system-environment coupling, did exist, could it overcome uncertainty? We have suggested good reasons which make clear that assuming such a kind of observer does not make any sense. "We get no God's eye" is what Tibor Vámos (Vámos, 1991) efficaciously wrote; a post-Hegelian argumentation about an observer immersed in the World it is observing can be found in Breuer (Breuer, 1995). And yet the problem of Laplace's absolute observer can provide an important systemic cue. In nature there exist not only objects, but also the objects' behaviors, which cannot be observed when we focus only on the elementary constituents, even admitting that these can always be individuated without ambiguity. So the "universal" observer should be able to take into account not only every single behavior but also the myriads of collective behaviors in which an object can be involved at the same time! That is nothing but stating that the best "narration" of the World is the evolution of natural processes itself. Scientific observers, instead, are always situated, and uncertainty, as well as the limits of their models, is a strong spur for new explorations and perspectives.

Acknowledgments

The author owes a lot to Alessandro Giuliani for unveiling the fascinating world of proteins. This paper is dedicated to the memory of my friend Joe Zbilut (1948-2009).

References

Allori, V., Zanghì, N. (2004) What is Bohmian Mechanics, Int. Jour. of Theor. Phys., 43, 1743-1755.
Anderson, P.W. (1972) More is Different, Science, 177, 393-396.
Barrow, J.D. (2008) New Theories of Everything, Oxford Univ. Press.
Breuer, T. (1995) The impossibility of exact state self-measurements, Philosophy of Science, 62, 197-214.
Chaitin, G. (2007) Thinking About Gödel and Turing: Essays on Complexity, 1970-2007, World Scientific.
Giuliani, A., Benigni, R., Sirabella, P., Zbilut, J.P., Colosimo, A. (2000) Nonlinear methods in the analysis of protein sequences: a case study in rubredoxins, Biophysical Jour., 78, 136-148.
Giuliani, A., Zbilut, J. (2008) The Latent Order of Complexity, Nova Science Publ.
Gleick, J. (2008) Chaos: Making a New Science, Penguin.
Haken, H. (2006) Information and Self-Organization: A Macroscopic Approach to Complex Systems, Springer.
Laughlin, R. (2006) A Different Universe: Reinventing Physics from the Bottom Down, Basic Books.
Laughlin, R., Pines, D. (2000) The Theory of Everything, PNAS, 97, 1, 28-31.
Laughlin, R., Pines, D., Schmalian, J., Stojkovic, B.P., Wolynes, P. (2000) The Middle Way, PNAS, 97, 1, 32-37.
Licata, I. (2008) Logical Openness in Cognitive Models, Epistemologia, 31, 2, 177-192.
Licata, I. (2008) Vision as Adaptive Epistemology, http://arxiv.org/abs/0812.0115
Minati, G. (2008) New Approaches for Modelling Emergence of Collective Phenomena, Polimetrica International Scientific Publ.
Prigogine, I., Nicolis, G. (1989) Exploring Complexity: An Introduction, W.H. Freeman & Co.
Thom, R. (1994) Structural Stability and Morphogenesis, Westview Press.
Vámos, T. (1991) Computer Epistemology: A Treatise on the Feasibility of the Unfeasibility or Old Ideas Brewed New, World Scientific.
Webber Jr., C.L., Zbilut, J.P. (1994) Dynamical assessment of physiological systems and states using recurrence plot strategies, Journal of Applied Physiology, 76, 965-973.
Whitford, D. (2005) Proteins: Structure and Function, Wiley.
Zbilut, J. (2004) Unstable Singularities and Randomness, Elsevier.
Zbilut, J.P., Webber Jr., C.L. (1992) Embeddings and delays as derived from quantification of recurrence plots, Physics Letters A, 171, 199-203.

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 11-19 © 2010 Nova Science Publishers, Inc.

Chapter 2

THE LIMITS OF ATOMISM: THE BOHM WAY OF A NEW ONTOLOGY

Ryo Morikawa* Theoretical Physics Research Unit, Birkbeck College, University of London Malet Street, London WC1E 7HX, UK

Abstract

In this paper we survey the development of atomism and clarify that this leading principle of modern physics faces a limitation: a limit of ontology. We are unable to recognize a concrete ontology; we have only epistemology. Therefore, we discuss this issue from a philosophical viewpoint, referring to Cassirer's philosophy. These arguments will clarify the need for a new ontology able to support a consistent understanding from the microscopic to the macroscopic level. To this end we argue the case for the new ontology that was introduced by Bohm. We also present the mathematical formalism of cyclic ontology as a new ontology of process, and we show that this formalism recovers the Heisenberg equation as well as the Bohm equation.

Keywords: Atomism, Cyclic ontology, The implicate & explicate order, Bohm.

Brief History of Atomism

The primary ideas of atomism were developed by Demokritos and Leukippos, philosophers of the atomist school of pluralists in ancient Greece in the late fifth century BC. Demokritos and Leukippos thought that the whole of any given physical object consisted of atoms and void. According to the theory, these two aspects are never generated and never pass away; Demokritos and Leukippos thought they were the elements of all physical objects. However, their ideas on atomism were not developed further. Then,

* E-mail address: [email protected] 12 Ryo Morikawa replacing their idea, the „four elements theory‟ became the mainstream of the next generation of philosophers. Thales thought that water was the source of creation and named it „arkhe‟. There were other thinkers that thought fire was the „arkhe‟, air is „arkhe‟ or the Earth is the „arkhe‟. On the other hand, Empedocles thought all four of these were the „arkhe‟. He called these four elements „rizomata‟, that means the „root‟ or „foundation of all creation‟. Empedocles believed that the four rizomata constantly join together and rupture, thus creating all things in the universe. Also, he suggests that the total amount of these four rizomata will never increase or decrease like in the idea of atomism. This was later called the „four elements theory‟. These two ideas seem childish and poor to our eyes but the fundamental concepts are not different from our own atomism of modern physics, since they also search for the fundamental being of the universe. Ancient atomism calls this the „atom‟, and the four elements theory calls this the „rizomata‟. Modern physics calls this „elementary particles‟ though it is not considered to be elementary at present. The similar aspect in these three ideas is that all of them suggest that there is some kind of fundamental object. The „four elements theory‟ eventually succeeded the scholasticism that was established by clergyman and priests in Europe in the eleventh century. The ideas of modern atomism were developed by Lavoisier, Dalton and Boltzmann. This in turn led to modern particle physics.

Necessity of the New Ontology

At the end of the nineteenth century, we found that there are subsystems within the atom. That is, the atom is not the most fundamental layer: there is an internal structure within the atom, made up of electrons and nuclei. From this followed quantum theory and modern particle physics. However, we are then forced to ask: is the nucleus made from quarks? Moreover, is the quark made from strings? Nobody knows the answer to this! Let us assume, however, that the quark is an elementary object; then we are faced with a difficult question: what is a quark? Is this 'the fundamental and elementary substance'? We are faced with the same problems concerning strings too. If a quark (string) is a substantial object, then we must be able to create a macroscopic object by combining and uniting them. However, nobody knows the mechanism that would combine and unite quarks (strings). That is, we are unable to make a macroscopic object in this way. Moreover, we also know that the expression 'combining' (or 'uniting') quarks (or strings) is an improper use of the word. Quarks and strings are concepts that come from the simple and primitive ideas of atomism. Modern physics arrived at the concepts of quarks and strings from the ancient ideas of atomism, but it faces a serious difficulty. We do not have an ontological consistency between classical and modern matter, or between the microscopic object and the macroscopic object. We ought to call this a 'Limit of Atomism'. In order to bridge the two worlds properly we need a new ontology.

Philosophical Viewpoint from Substance to Function

It is obvious that there is a serious chasm between the two worlds, the microscopic and the macroscopic. This chasm can be attributed to a lack of ontology in modern physical theory. Here, we will discuss this from a philosophical viewpoint, referring to Cassirer's[1] philosophy. When we look at the developing history of scientific atomism we realize that it is a history of how we build an epistemological understanding of nature. Cassirer's point of view is that humans initially recognize the world using metaphors (or, we should simply say, fables). These metaphors are accomplished by using symbols. The world can be understood through the connections and relationships between these symbols. However, this is a comparatively primitive way of developing an understanding, since these symbols appear to have a mysterious and magical power. Conceptualization is achieved by the fixation of change, that is, of fluctuation. Therefore, when we use symbols for understanding the world there is a risk that our understanding will regress to its ancient and mythical stage every time; this stage is the initial fluctuation. For example, a number is a typical symbol. We know there is no essential difference between the numbers; there is no 'special number'. Historically, however, we can say that humans perceived numbers as dressed in some kind of mysterious robe. The irrational numbers are typical good examples of this: people thought such numbers had mythical and magical power. In the next step, humans use substance for their understanding of the world. These substances are also a type of symbol, but symbols that no longer have a magical tendency. However, from Cassirer's viewpoint this is still only on the way to a final form of epistemology. Cassirer claims that we do not need to assume a substance when we attempt to perceive nature. For him, substance is like Kant's 'Ding an Sich', and the 'Ding an Sich' was, in Cassirer's view, a mistake of Kant's[2]. Cassirer insists that we do not need substance for our epistemology; the important point is the relationship between phenomena¹. Cassirer claims that our method of understanding, that is, our epistemology, has changed from a substantial method to one of recognizing the relationship between phenomena. He describes this change as a change 'from substance-concept to function-concept', from Substanzbegriff to Funktionsbegriff. That is to say, the developed way of epistemology is to recognize the world by the functional relationships between phenomena; we do not have to assume any substance behind each phenomenon in the modern way of epistemology. This epistemological change in philosophy can be paralleled with the epistemology of science. It seems that there are no serious problems here, and that this change supports the standard interpretation of quantum theory. However, it also strongly underlines that modern physics has lost its ontological object. In fact, we are unable to find any ontological object or substance in quantum theory (within the microscopic world). There is no concept of ontology in standard quantum theory. Therefore, we can say that Cassirer's viewpoint is also correct in this case. It also follows the change of the epistemology of science, while it exposes

¹ This expression resembles Bohr's claim about his research attitude toward quantum mechanics. Bohr is very careful to avoid using the term 'substance' when he considers quantum physics and when he develops his theories in physics. This is very similar to his interpretation of quantum mechanics, which follows the Copenhagen interpretation. We are able to say that the Copenhagen interpretation abandons the idea of referring to the real objective world.

the fact that we have lost the idea of any form of substance making up the world. That is, we cannot develop any consistent understanding from the quantum to the classical world, from the microscopic to the macroscopic world. As we argued in the above paragraphs, we could not find any ontology in quantum theory. The main problem is that modern physics has lost ontology. Therefore, we are unable to explain the stability of an ordinary object, for example a desk, pen, or cup, using standard quantum theory. The several quantum paradoxes are caused by this lack of ontology. We will argue for a new ontology in order to construct a new way of thinking in the following sections. This new form of ontology can be called 'cyclic ontology'. And it helps us to build a consistent understanding of the world from the microscopic to the macroscopic level without taking the false steps of the several quantum paradoxes.

Process and the Cyclic Ontology

In the above arguments we saw that a few serious and fundamental difficulties arise due to the lack of ontology. It is said that physicists imagine a picture when they think about their problems in physics, but strictly speaking these images are fundamentally mistaken: all of these images are classical. That is, the mind of the physicist is also seriously split, and this is not beneficial to our thought either. We must bridge the chasm between these two worlds as soon as possible. What is the best way to do this? We now know this chasm is caused by the limitation of the ontology that we have, that is, atomism. Such being the case, we must look for a new ontology as an alternative way of thinking. Bohm[3] introduced the idea of process. The concept of process was mainly developed by Whitehead in the modern era; in the ancient era, Herakleitos also thought of this idea. Herakleitos said 'all things flow'. In Asia, however, this idea has not been treated as a particular piece of philosophy: it is considered 'commonsense', a part of daily human life in Japan, particular to the Japanese way of life. Bohm's idea is also similar to this way of thinking. In Japan, it is believed that all things in the universe are mutable, and this should not be limited to the islands of East Asia but is adaptable all over the world². Hojoki is a famous old essay in Japan (it was completed in 1212). The author of this essay was KamoNoCyomei. The beginning of this essay has a deep philosophical implication: KamoNoCyomei describes a river, saying that the river is a flow and not water; the flow is a change³. That is to say, water is not the essence of the river. He was not a philosopher, but he expressed the river very frankly. We ought to think that he could write this because he had no knowledge of western philosophy. Now, let us consider an object A (or phenomenon A) while referring to the beginning of this old Japanese essay. A will change to B. B will change to C. C will change to D……. Then D will return to being A sooner or later. Paper is made from wood. Wood grows on the

² For further information about the eastern (mostly old Japan's) natural thought, see 'Limit of the Cartesian Order' (reference number four), ANPA Cambridge (2003), Ryo Morikawa.
³ The beginning of this essay is: 'Though the river's current never fails, the water passing, moment by moment, is never the same. Where the current pools, bubbles form on the surface, bursting and disappearing as others rise to replace them, none lasting long. In this world, people and their dwelling places are like that, always changing.' Quotation from the website of Robert N Lawson, Washburn University, Kansas USA, http://www.washburn.edu/reference/bridge24/Hojoki.html.

Earth. Paper will return to the Earth sooner or later. Creation, in this way, is in a constant state of flux, and everything is constantly changing. Thus we are able to call this cyclic ontology[4]. A phenomenon comes into the world, but this phenomenon then disappears and changes back into the form of the phenomenon it was in the first place. We are unable to indicate any one particular phenomenon as fundamental in this cycle. The fundamental thing is not the phenomenon itself but the cycle. Therefore, we can say that the cycle is the most fundamental stage. Let us recall the idea that 'the river is a flow and not water; the flow is a change'. Change is the essential thing in our world. Now, we can depart from atomic ontology and come to terms with cyclic ontology[5].

Mathematical Formalism of Cyclic Ontology

It is possible to give cyclic ontology a mathematical form; we will see a brief explanation of this formalism in this section. The formalism was developed by R. Morikawa in 2003 from Bohm's idea[6]. Here we consider the implicate order and the explicate order introduced by Bohm[7]. Now, phenomenon A will appear in the explicate order from the implicate order. A will change its form in the explicate order (for example, A will change to B, B will change to C……) and it will return to the background, that is, the implicate order, sooner or later. Let us consider this movement⁴. It is a mapping from the implicate order to the explicate order and vice versa. For example, the wave function ψ will be mapped from the implicate order to the explicate order. Let us consider Green's function as a propagator. We can determine the form of ψ at the region y according to the sum of the contributions from {x}, where {x} means a set of points in a volume V at a given time t₁ (see Figure 1).

Figure 1. The contributions from the set of points {x} in the volume V at time t₁ determine the wave function at the point (y, t₂).

Thus:

$$\psi(y,t_2) = \int_V M(x,t_1;\,y,t_2)\,\psi(x,t_1)\,dx \qquad (1)$$

⁴ This movement was named 'holomovement' by Bohm.

where M(x,t₁; y,t₂) is Green's function. The wave function at all points of the volume V contributes to the wave function at y. We interpret ψ(x,t₁) as an order in the implicate order and ψ(y,t₂) as an order in the explicate order, so that ψ(y,t₂) is in the visible layer. According to Bohm, the implicate order is a sea of information. So we are able to interpret the appearing order as an accumulation of the information coming from the implicate order. This means that the information enfolds as a wave function ψ. In turn, ψ itself unfolds into a series of points on a later volume V′ (see Figure 2). We can say ψ is enfolded into the implicate order as well. That is, the accumulated information diffuses into the implicate order again. Or, we can say that the information from ψ makes an order in the implicate order too. So ψ(z,t′) can be interpreted as an accumulation from ψ(y,t₂).

Figure 2. The wave function at (y, t₂) unfolds into a series of points on a later volume V′.

$$\psi(z,t') = \int_{V'} M(y,t_2;\,z,t')\,\psi(y,t_2)\,dy \qquad (2)$$

This can also be described as an order in the explicate order enfolding (going back, returning) into the implicate order. This also demonstrates both a mapping from the implicate order to the explicate order as well as a mapping from the explicate order to the implicate order.
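To make the mapping of equations (1) and (2) concrete, here is a small discretized sketch (a hypothetical Python example added here, not part of the original chapter). On a finite grid the integral of equation (1) becomes a matrix product, and any unitary matrix can play the role of the Green's function M; the orthonormal DFT matrix is a convenient choice:

```python
import numpy as np

N = 64
M = np.fft.fft(np.eye(N), norm="ortho")   # a unitary N x N "propagator"

rng = np.random.default_rng(0)
psi_t1 = rng.standard_normal(N) + 1j * rng.standard_normal(N)
psi_t1 /= np.linalg.norm(psi_t1)          # the order at t1

psi_t2 = M @ psi_t1                       # unfolding: psi(y, t2) = sum_x M psi(x, t1)
print(np.linalg.norm(psi_t2))             # 1.0: unfolding preserves the norm

# Applying the inverse map re-enfolds the order, the analogue of equation (2):
print(np.linalg.norm(M.conj().T @ psi_t2 - psi_t1))   # ~0
```

The round trip (unfold, then enfold) returns the original order, which is the discrete counterpart of the cycle between the explicate and implicate orders.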

The Heisenberg Equation

We can deduce the Heisenberg equation of motion using the above idea. This means that we construct a consistent ontology and a consistent theory from the microscopic to the macroscopic world.

Now we will consider two successive orders, e(τ₁) and e(τ₂). We assume that e(τ₂) is a result of the unfolding movement. Then we describe the enfolding process as M₁ and the unfolding process as M₂. Order e(τ₁) is enfolded; thus we write e(τ₁)M₁. Order e(τ₂) is unfolded; thus we write M₂e(τ₂). Orders e(τ₁) and e(τ₂) are successive orders, so e(τ₁) is similar to e(τ₂); indeed, the difference between these two orders is infinitesimally small. Therefore we are able to equate the two expressions:

$$e(\tau_1)\,M_1 = M_2\,e(\tau_2) \qquad (3)$$

These two processes are the reverse side and the obverse side: if M₁ is the reverse side, then M₂ is the obverse side. One order, e(τ₁), generated from the implicate order into the explicate order, is equal to an infinitesimally close order, e(τ₂), which disappears from the explicate order into the implicate order. Equation (3) expresses this.

Let us now assume for simplicity M₁ = M₂ = M, where M = exp(−iHτ). We can then obtain

$$e(\tau_2) = M^{-1}\,e(\tau_1)\,M \qquad (4)$$

If τ is very small, then we can write

$$e(\tau_2) = (1 + iH\tau)\,e(\tau_1)\,(1 - iH\tau) \qquad (5)$$

so that, keeping terms to first order in τ, we obtain from equation (5)

$$e(\tau_2) - e(\tau_1) = i\tau\,[H,e] \qquad (6)$$

Therefore we obtain the Heisenberg equation,

$$\frac{de}{d\tau} = i\,[H,e] \qquad (7)$$

The notable point is that we can deduce the Heisenberg equation using cyclic ontology.
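The step from equation (4) to equation (7) can be checked numerically. The sketch below (a hypothetical Python/NumPy example added here, with ħ = 1, not part of the original chapter) conjugates a random Hermitian "order" e by M = exp(−iHτ) and compares the result with the first-order expression of equation (6):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

def random_hermitian(n):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2

H = random_hermitian(4)        # toy Hamiltonian (hbar = 1)
e = random_hermitian(4)        # the "order" e(tau1)
tau = 1e-4

M = expm(-1j * H * tau)        # unitary, so M^{-1} equals M's conjugate transpose
exact = M.conj().T @ e @ M     # e(tau2) = M^{-1} e(tau1) M, equation (4)
first_order = e + 1j * tau * (H @ e - e @ H)   # e + i*tau*[H, e], equation (6)

print(np.max(np.abs(exact - first_order)))    # O(tau^2), here ~1e-8
```

The residual shrinks quadratically as τ decreases, confirming that equation (7) is the τ → 0 limit of the enfolding-unfolding cycle.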

The Bohm Equation

Let us substitute e = |A⟩⟨B| into equation (7). Then we have,

$$\frac{d|A\rangle}{d\tau} = iH\,|A\rangle \qquad\text{and}\qquad \frac{d\langle B|}{d\tau} = -\,i\,\langle B|H \qquad (8)$$

Now we can get the two forms of the Schrödinger equation,

$$i\,\frac{\partial\psi}{\partial t} = H\psi \qquad\text{and}\qquad -\,i\,\frac{\partial\psi^{*}}{\partial t} = H\psi^{*} \qquad (9)$$

We consider a pure state, ρ = ψψ*, for simplicity. Then we will have

$$i\,\frac{\partial\rho}{\partial t} = \left(i\,\frac{\partial\psi}{\partial t}\right)\psi^{*} + \psi\left(i\,\frac{\partial\psi^{*}}{\partial t}\right) \qquad (10)$$

So we have,

$$i\,\frac{\partial\rho}{\partial t} + [\rho, H]_- = 0 \qquad (11)$$

Moreover, we can get

$$i\left(\frac{\partial\psi}{\partial t}\,\psi^{*} - \psi\,\frac{\partial\psi^{*}}{\partial t}\right) - [\rho, H]_+ = 0 \qquad (12)$$

from the Brown-Hiley[8] notation,

$$i\left(\frac{\partial\psi}{\partial t}\right)\psi^{*} - i\,\psi\left(\frac{\partial\psi^{*}}{\partial t}\right) \qquad (13)$$

Here [ρ, H]₊ is the anti-commutator, [ρ, H]₊ = ρH + Hρ. Sandwich equation (10) between ⟨r| and |r⟩ and use the form H = P²/2m + V for the result. Also, here we define the probability density P(r,t) = ψ*(r,t)ψ(r,t) = ⟨r|ρ|r⟩. Then we obtain,

$$\frac{\partial P}{\partial t} + \nabla\cdot j = 0 \qquad (14)$$

Here j is the probability current density. Moreover, we sandwich equation (12) between ⟨r| and |r⟩. Then, if we write the wave function in the polar form ψ = R exp(iS/ħ), we obtain,

$$P(r,t)\,\frac{\partial S(r,t)}{\partial t} + \frac{1}{2}\,\langle r|\,[\rho, H]_+\,|r\rangle = 0 \qquad (15)$$

Substituting into (15), we then get the quantum Hamilton-Jacobi equation:


$$\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} - \frac{\hbar^2}{2m}\,\frac{\nabla^2 R}{R} + V(r) = 0 \qquad (16)$$

The third term on the left side is the quantum potential that Bohm introduced in 1952.

Conclusion

We have seen that the fundamental equations, the Heisenberg equation of motion and the Bohm equation, can be deduced from the new ontology. Therefore, we are able to say that our new ontology is by no means ad hoc, because we do not have to choose between tools, classical or quantum mechanics. The phenomenon A that we were considering was neither a microscopic nor a macroscopic phenomenon; we find no difference (or chasm) between the two worlds. Now we have a new tool. The next step is to consider how the world can be seen when we use this ontology in all the fields of modern physics. Bohm's formalism of quantum mechanics leads to the same results as ordinary quantum theory. This means that we are able to see the world from a different point of view when we use the Bohm theory. Looking from a different view will be a powerful method for revising our view of the world. It provides the potential for us to find a new aspect of the world. This view will then help to create a new world.

References

[1] Ernst Cassirer: Substanzbegriff und Funktionsbegriff: Untersuchungen über die Grundfragen der Erkenntniskritik (1910); Philosophie der symbolischen Formen (1923-1929); and Das Erkenntnisproblem in der Philosophie und Wissenschaft der neueren Zeit (1906-1920, 1950).
[2] Ernst Cassirer: Kants Leben und Lehre (1918).
[3] David Bohm: Wholeness and the Implicate Order, Routledge (1982).
[4] Ryo Morikawa: Limit of the Cartesian Order, Proceedings of the 24th Annual International Meeting of the Alternative Natural Philosophy Association, Cambridge, pp. 49 ff. (2003); and Five Selected Papers of Basil J. Hiley, edited by Ryo Morikawa, Birkbeck College, University of London, pp. 109-114 and 116-119 (2003).
[5] Ryo Morikawa: ibid. [4].
[6] Ryo Morikawa: ibid. [4]; and David Bohm: ibid. [3].
[7] David Bohm: ibid. [3]; and David Bohm & Basil Hiley: The Undivided Universe: An Ontological Interpretation of Quantum Theory, Routledge (1993).
[8] B.J. Hiley and M.R. Brown: Schrödinger revisited: the role of Dirac's 'standard' ket in the algebraic approach, quant-ph/0005026.

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 21-55 © 2010 Nova Science Publishers, Inc.

Chapter 3

DOING MATHEMATICS ABOUT AND WITH THE BRAIN*

Michael A. Arbib** Computer Science Department and USC Brain Project University of Southern California, Los Angeles, CA, USA

Introduction

I was trained as a mathematician, but from an early stage my interest was attracted by the broader theme of "brains, machines, & mathematics" (Arbib, 1964, 1987), of trying to understand the similarities and differences between computing machines and the brain as seen from a mathematical perspective. I have thus worked actively both in the mathematical theory of computation (e.g., Arbib, 1969; Manes and Arbib, 1986) and in computational & cognitive neuroscience (e.g., Arbib 1972, 1979; Arbib et al., 1998). However, as the years have gone by, the proving of theorems has gradually receded from the foreground of my work, and the attempt to understand complex brain systems through computational modeling has come to the foreground. Nonetheless, my work has been informed by a variety of mathematical developments in the analysis of neural networks, and so here I will try to give you some sense of the many different challenges posed by the attempt to understand the brain, and in particular highlight a number of places where mathematics and the brain come together. It will be an ongoing theme that computational analysis and mathematics continue to challenge each other in this domain, as in many others. Often, a complex system does not reduce to easy formalization in a form for which the proving of theorems is possible, but it does yield itself to being simulated by a complex computer program, which can then be used to discern patterns of behavior which may then motivate the development of a mathematical abstraction which can help one gain perspective on the particularities of various simulations. Indeed, in time, the mathematics may allow one to prove the equivalence of different formalizations, allowing one to infer general properties of systems far more efficiently than would be possible by simulation with the original complicated computer program. I will thus try to give some sense of this give and take between modeling and mathematics as we try to understand the many different levels at which the brain can be analyzed.

I have mentioned the give and take between mathematics and computational modeling in neuroscience, but should stress that the bulk of work in neuroscience involves computers only to the extent that they provide useful tools to administer and keep track of stimuli, and to record the related responses. One result of this use of computers has been an increasing flood of data about the brain, and the response to this has been some preliminary work in Neuroinformatics, which for some people is restricted to the construction of databases for neural data, together with techniques for querying such databases and visualizing the results, but which to my mind should also include the tools of computational neuroscience (Arbib, 2001a,b). In fact, it is my thesis that in the future the most useful databases for neuroscience will be linked to models of a range of neural phenomena, and that the data will become understood to the extent that they are assimilated to these models, with the success of these models in making predictions justifying the model-based structure of the database. However, this is outside the scope of the present article.

* Portions of this article were delivered as the Rouse Ball Lecture in Mathematics for 2001 at Cambridge University, Cambridge, England on May 1, 2001. I express my deep thanks to Professor Herbert Huppert and his colleagues in the Faculty of Mathematics for their invitation and hospitality.
** E-mail address: [email protected] http://www-hbp.usc.edu/

Doing Mathematics about the Brain

Hodgkin and Huxley

The work of Hodgkin & Huxley (1952) is the most successful attempt to mathematically describe an aspect of neural function, and it also reflects a style which is dominant in neuroscience, in distinction perhaps to modern physics. I refer to the inductive approach, the view that the truth is to be found by gathering many, many facts to discover the patterns that emerge. This is in distinction to the view that the right mathematical framework can help us discover new realities, as was dramatically demonstrated by Dirac's discovery of the positron simply by noting that his equations for the electron had a symmetric solution with positive charge. In any case, the starting point for Hodgkin & Huxley was the observation that, like all other cells, the neuron has a difference in electrical potential across its bounding membrane. Moreover, it was known that if one sought to measure this potential along the axon (the output line) of a neuron, then it was relatively rare for an axon to behave like a cable in the sense of Lord Kelvin, with the potential dropping off exponentially with distance. Instead, the potential was active and would propagate without decrement along the axon so long as the membrane potential at the axon hillock passed some critical threshold. Hodgkin & Huxley sought to measure this propagation of the membrane potential in great detail, and for this they chose as their preparation the giant squid axon. (I must confess that I was most disappointed when I learned that this was the giant axon of a typical squid, rather than an axon from a giant squid.) This axon was large enough that they could insert an electrode down the middle of the cylinder and thus perform very delicate recordings of the membrane potential in a variety of conditions. Then they took this massive data and – using hand-operated mechanical calculators – laboriously cranked the numbers to fit the observed curves. But this was not induction in any pedestrian sense of the term. They sought not simply to fit the curves by just adding enough parameters – it was the Cambridge mathematician Littlewood (1953) who observed "Give me four parameters and I can draw you an elephant; give me five and I can tie a knot in his tail" – but rather to come up with differential equations that elegantly described these curves. The form of these equations lent itself to the interpretation of the structure of the membrane in terms of channels which could be triggered by an action potential to selectively gate the passage of sodium and potassium ions across the membrane. This mathematical description preceded a set of ongoing empirical discoveries which have shown that these channels are not mere mathematical descriptions but amazing macromolecules whose diversity continues to be explored. However, the story goes even further than that. Mathematicians have taken the four-variable Hodgkin-Huxley equations and shown that many of the properties of their solutions can be explored analytically or qualitatively by two-dimensional differential equations. With these models it becomes possible to understand the way in which parametric differences can yield a wonderful variety of dynamics, going from spontaneous spiking, to spiking which follows the input stimulus, to spiking which responds in isolated bursts.
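A classic example of such a two-variable reduction is the FitzHugh-Nagumo model. The sketch below (a hypothetical Python illustration with standard textbook parameter values, not drawn from Hodgkin & Huxley's own analysis) integrates it with the forward Euler method and counts spikes before and after a step current is switched on:

```python
# FitzHugh-Nagumo: a two-variable caricature of the four-variable
# Hodgkin-Huxley system (v ~ membrane potential, w ~ slow recovery).
def fhn_step(v, w, I_ext, dt, a=0.7, b=0.8, tau=12.5):
    dv = v - v**3 / 3 - w + I_ext
    dw = (v + a - b * w) / tau
    return v + dt * dv, w + dt * dw

dt, t_end = 0.01, 400.0
v, w = -1.2, -0.6          # near the resting state
spikes, prev_v = 0, v
for n in range(int(t_end / dt)):
    I_ext = 0.5 if n * dt > 100 else 0.0   # step current switched on at t = 100
    v, w = fhn_step(v, w, I_ext, dt)
    if prev_v < 1.0 <= v:                  # crude upward-crossing spike detector
        spikes += 1
    prev_v = v
print(spikes)   # > 0: repetitive firing once the current exceeds threshold
```

Varying the injected current and the parameters a, b, and tau moves the model between rest, single spikes, and repetitive firing, which is exactly the kind of qualitative analysis the two-dimensional reductions make tractable.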
At the same time, neurophysiologists with a variety of techniques, including the recently discovered patch clamp technique, have learned how to block different channels, and thus – as they extend the number of substances they find which can block some aspect of the membrane potential, while at the same time seeing differences in the effects of such blocking – they have extended the catalogue of channels, and these have yielded mathematical descriptions which allow us to understand a wide variety of properties that govern spike propagation. Going further, other neuroscientists have seen how these equations can be modified to explain the activity of synapses, showing different ways in which the electrical activity in one axon can, by the chemical mediation of the neurotransmitters, modify the activity of another. In a way, then, this classic work of 1952 by Hodgkin & Huxley – which led them to share the Nobel Prize in Physiology or Medicine for 1963 with John Eccles from Australia – remains the acme of mathematical description of a neural system and of the inductive approach in which equations are inferred from painstaking and massive data analysis. But what has been exciting about the Hodgkin-Huxley equations is their richness both in stimulating the discovery of ever more detailed structure of the neuron, and in stimulating mathematicians to develop the tools of qualitative description of nonlinear differential equations to understand the immense richness of dynamics that the axons and cell membranes of neurons can support. (Examples of such studies may be found in Koch and Segev, 1998.)
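
To make the earlier point about two-dimensional reductions concrete, here is a minimal sketch (in Python; the FitzHugh-Nagumo model is the classic two-variable reduction of the four-variable Hodgkin-Huxley equations, and the parameter values below are the conventional textbook ones, chosen for illustration rather than drawn from this chapter). Below a critical input current the voltage variable settles to rest; above it, the model spikes repeatedly.

    import numpy as np

    # FitzHugh-Nagumo: a two-variable qualitative reduction of Hodgkin-Huxley.
    #   dv/dt = v - v^3/3 - w + I    (fast, voltage-like variable)
    #   dw/dt = eps * (v + a - b*w)  (slow recovery variable)
    def fitzhugh_nagumo(I, a=0.7, b=0.8, eps=0.08, dt=0.01, steps=50_000):
        v, w = -1.2, -0.6
        trace = np.empty(steps)
        for t in range(steps):
            dv = v - v**3 / 3.0 - w + I
            dw = eps * (v + a - b * w)
            v += dt * dv
            w += dt * dw
            trace[t] = v
        return trace

    # Subthreshold input: rest. Suprathreshold input: repetitive spiking.
    for current in (0.0, 0.5):
        v = fitzhugh_nagumo(current)[-20_000:]   # discard the initial transient
        print(f"I = {current}: v stays in [{v.min():.2f}, {v.max():.2f}]")
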

Warren McCulloch

Warren McCulloch represents both the search for discrete rather than continuous representations of the brain, and also a romantic quest driven more by a theoretical perspective than by a specific set of data (Arbib, 2000). Although during his long career McCulloch made many contributions to neuroanatomy and related functional aspects of the brain, his career was really defined from his undergraduate days by a philosophical desire to understand the nature of human thought. He described his work as experimental epistemology, and one of the questions that drove him was expressed in the title of his paper “What is a number that a man may know it, and a man that he may know a number?” (1961). He was driven by his studies of Descartes, Leibnitz and Kant to try to understand the logic of the brain, and in a partnership with the brilliant but eccentric young mathematician Walter Pitts he came up with his answer in a 1943 paper, “A Logical Calculus of the Ideas Immanent in Nervous Activity” (McCulloch and Pitts, 1943). Here, they considered the time-line of the neuron to be discrete rather than continuous, with one tick of the clock for each refractory period (the time in which a neuron is unable to respond with a new spike after it has passed threshold and transmitted a spike down its axon). They considered the effects of the synapses to be either moving the neuron towards or away from threshold. They thus modeled each neuron as a threshold device, and it is clear that such devices can implement the logical primitives AND, OR, and NOT, and from these any Boolean function can be realized. Moreover, they were able to demonstrate (or at least to claim to demonstrate, since the Carnapian notation they used is almost inscrutable) that given the ability to form networks of their neurons that included loops, it was possible to realize the control function of any Turing machine. Turing (1936) had shown that any effectively computable function on the natural numbers could be computed by what is now called a Turing machine, equipped with a suitable control box. What McCulloch and Pitts had done was show that a network of neurons, plausibly abstracted to a discrete time-scale in the way I have just described, could provide the computational core of any operation that a human could carry out following well-defined procedures. In a sense, then, they had provided a mathematical counter-proof to the dualistic notion that mind was separate from brain and required the operation of some abstract soul to supplement the brain’s physical operations. From a philosophical point of view, then, this was an extremely important result. In addition, other workers including Donald Hebb (1949) and Frank Rosenblatt (1958) were able to devise ways in which a neural network coupled with receptors and effectors could modify the synaptic strengths of its connections through experience, thus learning to “perceive” regularities in its world, or coming to better and better classify patterns according to criteria set by a teacher. This, indeed, has provided the foundation for a vast set of applications of artificial neural networks – in other words, networks of McCulloch-Pitts neurons, or simple generalizations of McCulloch-Pitts neurons, equipped with some form of learning rule. This has also led to an elegant body of mathematics associating different learning rules with different styles of statistical inference (see, e.g., Bishop 1995 for a textbook account).
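
As a concrete illustration of the 1943 abstraction, here is a sketch of my own in Python (not, of course, the Carnapian notation of the original paper): a McCulloch-Pitts unit is just a threshold device over weighted binary inputs, and the logical primitives fall out immediately.

    def mp_neuron(weights, threshold):
        """A McCulloch-Pitts unit: fires (1) iff the weighted sum of its
        binary inputs reaches threshold."""
        return lambda *x: int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

    AND = mp_neuron((1, 1), 2)    # fires only if both inputs fire
    OR  = mp_neuron((1, 1), 1)    # fires if at least one input fires
    NOT = mp_neuron((-1,), 0)     # an inhibitory weight: fires iff the input is silent

    # From these primitives any Boolean function can be realized by a network;
    # XOR, famously not computable by a single threshold unit, needs two layers.
    XOR = lambda a, b: OR(AND(a, NOT(b)), AND(NOT(a), b))
    for a in (0, 1):
        for b in (0, 1):
            print(f"{a} {b} -> AND={AND(a, b)} OR={OR(a, b)} XOR={XOR(a, b)}")
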
Despite these important achievements, and even though Warren McCulloch was a seminal influence in my career, and much of my work in automata theory was an outgrowth of my understanding of the classic work of McCulloch & Pitts and its relation to the work of Turing, I must say that in the long run I have come to feel that the 1943 paper is not the path to understanding “What is a Man that he may know a number, and a number that a Man may know it”, because it postulated a direct linkage from the formulas of logic to the function of single neurons. I would now argue that the concepts in our "mental vocabulary" and the "logical connectives" which join them appear to be realized in the brain through patterns of activity distributed over many neurons, and indeed neurons in distinct parts of the brain. Thus, I see logic not as the natural expression of neuron-by-neuron operation, as in the 1943 vision, but rather as a cultural achievement in which we as humans have discovered new ideas, whether in language or mathematics or logic or music or the dance, finding ways of instructing successive generations so that their mechanisms for synaptic plasticity can create distributed computers for these operations, in a fashion in which nurture – in the guise of “self-organization” – is at least as important as nature. It is another early paper of Pitts and McCulloch that I believe holds a greater lesson for our understanding of the brain, a paper inspired more by group theory than by logic. In “How we know universals: the perception of auditory and visual forms”, the question was how we can tell that a square is a square, even though each time we see it, it is in a different position, or size, or orientation. Pitts and McCulloch (1947) suggested that the answer was to see geometric forms as invariants under groups of such transformations (one is reminded of the Erlangen program of Felix Klein) and came up with two models of neural networks which could implement this idea. The idea was that the visual input would be spread out into a map of different features, such as corners and edges. One form of the theory then showed how to analyze these features to find the appropriate group transformation that would reduce the pattern to a standard form; the other asked how to apply the elements of the group to the given sets of features to find a group invariant associated with the stimulus. In later work with the neurophysiologist Jerome Y. Lettvin and the Chilean neuroanatomist Humberto Maturana, they set out to determine whether the scheme was exemplified in the frog’s brain, and the result was a remarkable paper (Lettvin et al., 1959) called “What the Frog’s Eye Tells the Frog’s Brain”. Here they showed that the passage from the retina to the tectum, the key visual area of the frog’s midbrain, could indeed be characterized as the extraction of four types of local features from the visual input – features such as a small moving object, or a large dark moving object – and that these four “maps” of the world were sent back in registration to the tectum, with terminations at different layers. We may say that the input to the tectum can then be characterized as four retinotopic maps – where retinotopy, from the words retina and topos, indicates a map based on place on the retina.
Thus, while the search for group operators had failed, Pitts and McCulloch had succeeded on two fronts: the work vindicated some of the ideas that McCulloch had brought from his study of Kant to the nervous system; and they had demonstrated a species-specific recoding of the world as a registered set of retinotopic feature maps. Indeed, in the years since, the search for feature maps has dominated the study of sensory systems in the brain. Most notably, Hubel and Wiesel gained their Nobel Prize (shared with Roger Sperry for his study of the split brain) by their analysis of the way in which the mammalian retina (they studied first the cat, then the monkey) extracts features which essentially enhance contrast, but then transmits this through the thalamus to the visual cortex, where essential features such as edges separating one region of a figure from another are found (e.g., Hubel and Wiesel, 1962, 1977).

Brains, Machines and Mathematics

My first book, “Brains, Machines, and Mathematics”, was published in 1964. Sadly, Norbert Wiener, the founder of cybernetics and another of my mentors at MIT, died at about that time. (He had been my Ph.D. advisor until he left MIT for a year’s sabbatical.) This sad event, though, proved to be to my good fortune, since Wiener’s work had excited the public interest, and so his death brought cybernetics back into the news, one result being that my book received the lead review in the Scientific American, the reviewer being Jacob Bronowski. Bronowski said a number of nice things about the book which may have helped contribute to its excellent sales, but he made one observation which I did not appreciate (in either sense of the word) at the time: he claimed that the book did not present the right mathematics for the brain. Well, of course, that is easy to say. It is much harder to say what the right mathematics is for the brain than to proclaim its absence. Let me, then, advance our analysis of “The Many Mathematics of the Brain” by briefly summarizing what mathematics the book did present, and then briefly listing some of the areas which have since been the focus of mathematical attention in neuroscience.

The Automata-Theoretic Approach to Brain Theory

Perhaps the dominant part of the book was the attempt to relate Turing machines, finite automata, and neural networks. It thus looked at the brain in terms of computability at a time when complexity theory had not yet emerged. (Another member of McCulloch’s group at the time I did my Ph.D. at MIT was Manuel Blum, whose work provided one of the fundamental theorems that got this branch of computability theory started.) I also provided a simple proof of Gödel’s incompleteness theorem, and offered an analysis of why such a theorem did not support the philosophical weight that many had assigned to it when they claimed that this result proved that no machine could think. Simply put, Gödel’s incompleteness theorem shows that there are true facts about arithmetic (and thus mathematics generally) that cannot be derived from any given finite set of axioms – unless the axioms are inconsistent, in which case anything can be derived from the axioms, whether true or false. However, people who suggest that this places a limit on machines seem to mistake the nature of any machine that would ever be designed to emulate some aspect of human intelligence. I know of no human who never makes mistakes. Indeed, it is part of the human condition that we are continually learning from our experience, and even if most of our core beliefs remain unshaken, many of our approaches to solving the problems of the intellect or of day-to-day living change with the times. In short, then, any human-like machine, like any human, would involve constant updating of the information on which its decisions are based. It would be a learning machine, not a logical machine making inferences from a fixed set of axioms. This is a profound insight, but as I have already mentioned, I find the basic automata-theoretic approach to the brain of the original 1943 paper by McCulloch and Pitts to be misleading when we try to understand the brain. To this extent Bronowski was right. There were two other mathematical approaches in the book which I think Bronowski did not appreciate. One of these is control theory, the analysis of feedback, which was at the heart of Wiener’s cybernetics (Wiener, 1948). Here the idea is that an organism does not simply make an action and proceed as if the action has its intended effect; rather, it must continually test the world against its expectations, and adjust its actions accordingly to achieve some desired goal. This work has led to an immense mathematical structure within control theory, and has many ideas that are still relevant to the study of the brain, especially when we work backwards from biomechanical models of how we move and interact with the world, to think about how the spinal cord must control these actions through its own feedback loops, and how that control must be subject to the higher dictates of the midbrain and the forebrain with their access to the rich sensory inputs provided by the head, with its eyes, ears, nose, and tongue as well as its sense of balance.
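
The feedback idea can be stated in a few lines of code. The following toy sketch is my own illustration, not a model from the book: the "plant", gain, and disturbance values are invented numbers. It contrasts an organism that blindly issues the command appropriate to its goal with one that continually compares the goal against the sensed state of the world and corrects.

    # A toy first-order "plant" perturbed by a constant disturbance.
    def run(goal=1.0, gain=2.0, disturbance=-0.3, steps=200, feedback=True):
        x = 0.0                                        # state of the world
        for _ in range(steps):
            error = (goal - x) if feedback else goal   # sensed error vs. blind command
            command = gain * error
            x += 0.05 * (command + disturbance)        # the world responds, imperfectly
        return x

    print("with feedback:   ", round(run(feedback=True), 3))    # settles near the goal
    print("without feedback:", round(run(feedback=False), 3))   # drifts far past it

A pure proportional controller of this kind still leaves a small residual error in the face of a constant disturbance, which is one reason control theory developed integral and predictive terms; but even this caricature makes Wiener's point about testing expectations against the world.
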

Another topic of enduring value is the attempt to relate synaptic plasticity to overall function in response to experience. This is such a central topic in brain theory that I will say nothing more here, but instead discuss it more fully later. The final mathematical topic that was treated in my book was Shannon’s information theory. Recently, the relationship of statistical mechanics to neural networks has been revived and has had a great impact on the mathematical analysis of formal neural networks. Thus, I think I did somewhat better in foreseeing the future of the mathematics of the brain than Bronowski realized. At the same time, I can with hindsight see a number of achievements which go well beyond what my book had to say back then. The first is that I completely ignored the Hodgkin-Huxley equations and could not foresee the beautiful mathematics they were to inspire. Indeed, the dynamics of nonlinear systems has come to play an important part not only in exploring the implications of the Hodgkin-Huxley equations and their cousins, but also in studying the large-scale properties of neural networks, exploring how the limit properties of nonlinear dynamics may shed light on the equilibrium points, oscillations, and chaotic behavior of neural networks, both artificial and biological. Again, Rosenblatt's perceptron provides just one aspect of the possible approaches to synaptic modification, and the error-based style of synaptic change that it offers can be supplemented by the Hebbian style of unsupervised learning as well as by the general indications of success or failure provided by reinforcement learning (a code sketch of the first two styles follows below). These diverse approaches have led on the one hand to a biological quest to understand the mechanisms of synaptic plasticity, revealing that different parts of the brain do indeed have synapses with different types of plasticity; and on the other to mathematical efforts which relate different techniques of statistical inference to different synaptic rules, even to the point of making contact with differential geometry, where neural manifolds can be related to probabilities (Amari and Nagaoka, 2000). But perhaps the greatest lack of that view of the early 1960s was that it was a unilevel view – it focused on what could be done with a single network of simple neurons. This is indeed where much of the work on artificial neural networks remains today, but to truly understand the brain we must go both upwards, to understand the contributions made by the many distinct regions of the brain as they work together to produce the overall behavior or thought of the individual, and downwards, as I have already hinted in my discussion of Hodgkin and Huxley and of synaptic plasticity, to probe the mechanisms which underlie the operation of individual neurons, because as we approach this level of reduction, we chart rich properties of the individual neuron beyond the basic computing model offered by McCulloch and Pitts.
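
Here is the promised sketch of two of these styles of synaptic change as one-line update rules (my own toy illustration in Python; the data and learning rates are invented): the Hebbian rule strengthens co-active input-output pairs without a teacher, while the perceptron's error-correcting rule changes the weights only when the unit's response is wrong.

    import numpy as np

    rng = np.random.default_rng(0)

    def hebb_update(w, x, lr=0.01):
        y = w @ x                           # the unit's response
        return w + lr * y * x               # unsupervised: strengthen co-active pairs

    def perceptron_update(w, x, target, lr=0.1):
        y = float(w @ x > 0)                # thresholded response
        return w + lr * (target - y) * x    # error-based: update only when wrong

    # Error-correcting learning on a linearly separable toy problem.
    X = rng.normal(size=(200, 2))
    t = (X @ np.array([1.0, 1.0]) > 0).astype(float)   # labels set by a "teacher"
    w = np.zeros(2)
    for _ in range(10):                     # a few passes over the data
        for x, target in zip(X, t):
            w = perceptron_update(w, x, target)
    print("perceptron accuracy:", float(((X @ w > 0) == (t > 0)).mean()))
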

The Many Levels of Brain Analysis

Below the Neuron

To complement this analysis of how to move from single neurons up to the level of human thought and behavior, let me briefly remind you of the earlier discussion of the challenges of moving down from the neuron towards the Hodgkin-Huxley equations and the analysis of plasticity in individual synapses, on down to the channels which mediate the passage of ions and neurotransmitters, and the way in which the transmission properties of these channels may both mediate a wide variety of patterns of neural activity and also change over time to yield synaptic plasticity. As we proceed down, we may end up with equations which combine the differential equations of chemical kinetics, to explore local interactions around a given channel, with diffusion equations, which show how effects at different sites interact with each other. In particular, there has been much attention to compartmental models, in which, rather than analyzing the neuron at the level of individual channels, the attempt is to subdivide the neuron into a relatively small number of compartments, each of which is treated as isopotential, to economically capture the overall behavior of the neuron. One may use hundreds of compartments to capture subtle properties of an individual neuron, but great success has been gained by showing how even two or three compartments may yield insights into subtle properties of the neuron which cannot be captured with a single compartment. Illuminating examples come from the work of Traub and Miles (1991) on neurons in the hippocampus and from the work of Mainen and Sejnowski on pyramidal cells of cortex. I will not go into further detail here (again, see Koch and Segev, 1998), but just note again the important theme sounded before: namely, the use of both computer and mathematical analysis to validate simplifications which allow us to find the right projection of the complexity of units at one level of analysis to provide simplified but useful components for large-scale models at the next level of analysis.
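
The compartmental idea is easy to convey in code. In this sketch (a passive toy of my own; all parameter values are illustrative rather than fitted to any cell), the neuron is reduced to just two isopotential compartments, a soma and a dendrite, coupled by an axial conductance; even this caricature shows how current injected into the dendrite appears attenuated at the soma, something a single compartment cannot express.

    # Two passive isopotential compartments coupled by an axial conductance.
    C, g_leak, g_axial, E_rest = 1.0, 0.1, 0.05, -65.0   # capacitance, conductances, mV
    dt, steps = 0.1, 20_000
    v_soma = v_dend = E_rest
    for t in range(steps):
        I_inj = 1.0 if t * dt > 100.0 else 0.0     # current step into the dendrite
        I_ax = g_axial * (v_dend - v_soma)         # current flowing dendrite -> soma
        v_soma += (dt / C) * (g_leak * (E_rest - v_soma) + I_ax)
        v_dend += (dt / C) * (g_leak * (E_rest - v_dend) - I_ax + I_inj)
    print(f"steady state: dendrite {v_dend:.1f} mV, soma {v_soma:.1f} mV")
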

Population Coding and the Ergodic Hypothesis

The activity of a single cell involves a pattern of spiking from which it may be very hard indeed to infer what is relevant to the brain’s operation, and yet the envelope of activity of a whole population of neurons may provide a code that is quite clear to the observer. If we look at the firing of a single cell on repeated trials of the same situation, we may see huge statistical variability, and yet see a clearly interpretable signal when we form a histogram linking the activity over a number of such trials. Of course, the brain cannot react at one given time to the histogram over many trials of the cell's response. However, if there are many neurons with similar response characteristics, then their spatial summation – and that is what may affect other cells to which they are densely connected – may provide the same sort of information that the neurophysiologist can gain from time averages of a single cell’s activity. In short, the brain may well be understandable in many instances if we apply an ergodic hypothesis, to use a term from statistical mechanics: the idea that in a large enough population, averages over space and time may be interchangeable. Indeed, this hypothesis may account for the success of a number of models that people such as myself have constructed which ignore the fine details of neural activity – so visible when we looked at the effects of different channels on the performance of modified Hodgkin-Huxley equations – and yet remain effective. In such models, we do not look at the timing of pulses, but simply look at a moving average of the firing rate of neurons (Arbib, 2001b). What we get, then, is a model with far fewer neurons than the actual neural network in the brain, but one in which, one may hope, the activity of each formal neuron represents the spatial average of the activity of a large number of similar biological neurons. In this way, the simplified network, with the simplified coding hypothesis, can still provide an informative account of the way in which the brain relates sensory stimuli to motor behavior, and also provide insights into the statistics of the activity of given neurons even though it does not follow their spike-by-spike activation. Having said this, of course, one does note that there are activities in which alternating bursts, as distinct from regular variations in firing rate, are crucial, and one may seek to understand what mathematical approximations may be more appropriate in those cases, as a replacement for the average firing rate code, while avoiding all the details of axon-by-axon spike train analysis. For example, in the study of neural oscillators, we may in some cases replace a cell's firing by a simple phase variable.
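
The ergodic idea can be illustrated in a few lines (a toy of my own; the rate, population size, and durations are invented): for a population of independent cells firing at a common underlying rate, the time average of one cell's spikes and the population average at a single instant estimate the same number.

    import numpy as np

    rng = np.random.default_rng(1)

    rate, dt, T, n_cells = 40.0, 0.001, 5.0, 1000    # Hz, s, s, population size
    # Bernoulli (approximately Poisson) spike trains sharing one underlying rate.
    spikes = rng.random((n_cells, int(T / dt))) < rate * dt

    time_avg = spikes[0].sum() / T                   # one cell, averaged over time
    space_avg = spikes[:, 2500].mean() / dt          # one instant, averaged over cells
    print(f"time average of one cell:        {time_avg:.1f} Hz")
    print(f"population average, one instant: {space_avg:.1f} Hz")
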

Cooperative Computation in Neural Fields

In the spirit of approximating the brain by a set of layers in which each neuron is represented by its mean firing rate, rather than its spike-by-spike reaction, one can go even further and ignore the discrete nature of the network of neurons in the layer, and instead look at the “neural field”, in which the activity at each point of the continuous field might be seen as corresponding to the firing rate of nearby neurons in the biological network for which the field is the mathematical representation. In 1975, I had the pleasure of welcoming to the University of Massachusetts, where I then directed the Center for Systems Neuroscience, the Japanese mathematician Shun-Ichi Amari, who had long established himself as the leading mathematical analyst of neural networks in Japan. Indeed, he has gone on to pioneer the field of information geometry and its use to relate the synaptic pattern of neural networks to the neural manifold whose Riemannian structure reflects the probability distributions which it is the neural network’s task to estimate. However, during his stay with me in the mid-1970s, Amari developed the mathematical formulation in terms of neural fields for two models that my group had analyzed, and thus set the foundation for the mathematical theory of computation and cooperation in neural networks. The first problem went back to my interest in extending the work of Lettvin, Maturana, McCulloch & Pitts from “What the Frog’s Eye Tells the Frog’s Brain” to “What the Frog’s Eye Tells the Frog”. Here the emphasis was not on how visual input was transformed from retina to tectum, but rather on how this transformation served the behavior of the animal. This was indeed to lead to a long and fruitful collaboration with both theorists and experimentalists in the study of frog visuomotor coordination (see, e.g., Arbib and Ewert 1991), but my concern here is with the first model, which was built in collaboration with Rich Didday and which addressed the question of what happens when a frog is confronted with two flies. David Ingle had observed that normally the frog will snap at the more “salient” of the two flies, but that in some cases it will snap at the “average fly” (Didday, 1970, 1976). It is easy, from a conventional computer programming point of view, to write a program that will search a list of values to extract the maximum, and one could thus design a serial program to search positions on the tectum and return the position at which the activity was at maximum. One might even go further and, in the case of a close tie, report the midpoint of the line joining the two peaks of near-maximum activity. However, the question which Rich Didday and I set ourselves was to discover how to find the maximum using a process of distributed computation spread out over a network, without the sort of executive control appropriate to a serial computer. We came up with a design which, when implemented on a computer, did indeed function in a way that accorded with Ingle’s observations, and moreover relied on certain classes of cells that seemed close to some observed by Lettvin.

In the work with Amari (Amari and Arbib, 1977), it proved possible to analyze the network using the qualitative theory of differential equations, seeking criteria for the equilibria of the equations as a function of the input values under different settings of the parameters. We determined conditions under which, if the network converged, there would be at most a fixed number of surviving units, and such that those that survived the competition were indeed the ones receiving maximal input. Interestingly, however, we showed that this result held true only if the activity levels of the units had been equal at the beginning of the competition. If the initial activity had been biased towards one unit, then the overall system would exhibit hysteresis, and we devised special mechanisms whereby the model could respond to sudden changes in input, to give newly strong inputs an appropriate chance in the competition rather than being drowned out by the residue of prior activity. The other model we formalized had been developed with Parvati Dev (1975) to model stereopsis, the way in which the input to the two eyes can be combined to yield estimates of the depths of objects in the surrounding world. The catch is the so-called “correspondence problem” or “stimulus-matching problem”: how does the left eye “know” which feature seen by the right eye corresponds to the given feature that it is itself sensing in the environment? Amari and I were able to show that the Dev model could be recast as a network of Didday-like models, with competition to determine the appropriate depth in each visual direction modulated by the effects of activity in nearby directions, thus favoring the perception of the world as a set of surfaces.
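
The flavor of the Didday model, and of both results of the analysis just described, can be conveyed by a small simulation (my own sketch; the gains are illustrative choices, not the parameter ranges of the 1977 analysis): each unit excites itself and is inhibited by the total activity of the field, so differences are amplified until only the unit with the maximal input survives, and a biased initial state exhibits the hysteresis noted above.

    import numpy as np

    def compete(inputs, u0=None, steps=4000, dt=0.05, w_self=1.2, w_inhib=1.0):
        u = np.zeros_like(inputs) if u0 is None else np.array(u0, dtype=float)
        for _ in range(steps):
            a = np.maximum(u, 0.0)                                  # firing rates
            u += dt * (-u + w_self * a - w_inhib * a.sum() + inputs)
        return np.maximum(u, 0.0)

    prey = np.array([0.2, 0.9, 0.5, 1.0, 0.3])   # several "flies", one most salient
    print(np.round(compete(prey), 2))            # from rest: the maximal input wins
    biased = np.array([0.0, 2.0, 0.0, 0.0, 0.0]) # residue of prior activity on unit 1
    print(np.round(compete(prey, u0=biased), 2)) # hysteresis: the weaker unit wins
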

Regularization

A model similar to the Dev model was developed by Marr and Poggio (1977), who later went on to produce a more psychophysically realistic model (Marr and Poggio, 1979). Since then a great deal has been done to develop models which are increasingly subtle and provide increasing insight into the psychophysics of depth perception. However, the point of relevance to the present lecture is that Poggio and his colleagues were able to abstract from the type of model that Dev and Amari and I had developed, and from other work on optic flow, a general and interesting mathematical framework, namely that of regularization (Poggio, Torre and Koch, 1985). In stereo, a three-dimensional world is reduced to two-dimensional snapshots. In optic flow, a changing three-dimensional world is reduced to a discrete sequence of two-dimensional snapshots. In each case, there is no unique inverse, and instead the challenge is to determine which of the possible quasi-inverses is the appropriate one. The choice is to couple the projection criterion with some sort of regularization. One might, for example, choose as the winning candidate, from the set of possible three-dimensional sets of surfaces, the one which is smoothest. It proved possible not only to develop some interesting mathematical theory in this case, but also to show how the result could be reduced to networks of quasi-neural elements, which could easily be implemented using analog VLSI (Koch, 1989). Regularization theory has also played an important role in learning techniques, as well as in curve fitting in general and in a wide variety of problems in nonlinear analysis that fall well outside the scope of brain theory.
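
A minimal numerical sketch of the regularization idea (my own toy in Python, not the Poggio-Torre-Koch formulation): we observe a smooth function only at a few noisy points – an ill-posed inversion with infinitely many consistent answers – and select among them by penalizing roughness.

    import numpy as np

    rng = np.random.default_rng(2)
    n, m = 100, 15
    x = np.linspace(0.0, 1.0, n)
    idx = np.sort(rng.choice(n, size=m, replace=False))    # where we have data
    b = np.sin(2 * np.pi * x[idx]) + 0.05 * rng.normal(size=m)

    A = np.zeros((m, n)); A[np.arange(m), idx] = 1.0       # sampling operator
    D = np.diff(np.eye(n), n=2, axis=0)                    # roughness: 2nd differences
    lam = 1e-3                                             # regularization weight
    # Tikhonov regularization: minimize ||A f - b||^2 + lam * ||D f||^2.
    f = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ b)
    err = np.sqrt(np.mean((f - np.sin(2 * np.pi * x)) ** 2))
    print("rms reconstruction error:", round(float(err), 3))
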

Schema Theory

Let me turn now from the issue of the mathematical analysis of computation in layered fields with neural-like properties to another approach that came out of my work with Rich Didday, namely schema theory. In classic studies of pattern recognition, going back to the perceptron, the challenge had been to understand how it was that a particular pattern – such as the letter A vs. the letter B, or one face rather than another – might be discriminated by an appropriate output from a neural network whose synaptic weights had been adjusted by experience. However, consideration of the frog had led me to two further considerations. Firstly, the goal of vision in general is not so much to classify the input as to determine an appropriate course of action. Secondly, in general one is not confronted by a single object but by a scene, and the challenge is to determine which few of the objects in the scene are worthy of attention and then to respond properly to the opportunities that they afford through their relationships. In thinking about extending this from the level of the frog responding to a number of prey to a human interacting with the world, I was led to the slide-box metaphor (Arbib and Didday 1970; Arbib 1972), based on the way in which cartoonists could then produce the frames of their movies by the use of cels. The background of a scene could be drawn once and left invariant, perhaps for seconds. Trees and houses in the middle distance might be drawn once but moved from scene to scene as the viewpoint changed. Finally, for figures in the foreground, some features might be drawn on a single cel used for multiple frames, whereas others might be used for a single frame because – as for example in the case of mouth movements – they might represent rapidly changing parameters. My idea, then, was that the brain would not operate by taking a snapshot anew each fraction of a second and then completing a segmentation and detailed analysis, but would, rather, come up with a segmentation that could be linked to a “slide” that would represent a fraction of the scene, with much of ensuing perception consisting simply of updating the registration of the slide against different parts of the current scene, and of adjusting various parameters to better match the input. What made the idea even more interesting was that it was inherently action-oriented, requiring that each slide not simply provide a classification of an object, but also provide access to a set of actions appropriate to that object. In subsequent developments, I replaced the term “slide” by the term “schema” to emphasize a commonality of interest with the analysis of schemas by the noted Swiss developmental psychologist Jean Piaget (1971). Subsequent work came to pay further attention to the notion that one should not simply consider schemas on a region-by-region basis quasi-independently, but should rather stress that any portion of the image might on its own have multiple possible interpretations, so that one region could provide the context for the interpretation of another region. In other words, schemas could compete to provide the appropriate plan of action related to one region of the environment, while mutually coherent regions could have their schemas cooperate to produce a consistent overall interpretation.
I also took pains to emphasize that a given high-level schema could be decomposed into the interaction of a number of more refined schemas, with this process of decomposition proceeding until one could possibly map the activity of each schema onto a specific neural network of the brain. At the same time, I showed there was no guarantee that a schema defined by observing the behavior of an animal, or a schema decomposition based on analysis of a variety of behaviors, would actually survive the attempt to map it onto brain regions (for a recent review and more on the examples below, see Chapter 3 of Arbib et al., 1998). The canonical demonstration of this came with work on the toad by Peter Ewert. Briefly, if a frog or toad sees a small moving object it will snap at it, whereas if it sees a large moving object it will jump away from it. This might be explained by a simple hypothesis in which the perceptual schema for small prey triggers the snapping motor schema, whereas the perceptual schema for large moving objects triggers the motor schema for avoidance. That is certainly fine as a functional explanation of the behavior of the normal animal, but Ewert then explored the hypothesis that these schemas were neurally localized and that, in particular, the perceptual schema triggering avoidance was localized in a region of the brain called the pretectum. The simple model I have just described would predict that an animal with a lesion of the pretectum would continue to snap at small moving objects, but would not react at all in the presence of large moving objects. However, Ewert observed that, instead, while the lesioned toad would snap at small moving objects, it would also snap at large moving objects! We explained this by changing what triggers the motor schema for approach: we now have it triggered by a perceptual schema for all moving objects, not just for small moving objects. However, we still had the pretectum implement the perceptual schema for large moving objects, but now it not only excited the motor schema for avoidance, but also inhibited the motor schema for approach. This new schema assemblage not only predicts the behavior of the normal animal but also predicts the behavior of the animal with a pretectal lesion. This makes the point, then, that what at first sight may seem like a unitary schema to be implemented by a single brain region – in this case the perceptual schema for small moving objects – is in fact the reflection of the competition and cooperation between multiple brain regions. It also shows that even at the level of schemas, hypotheses can be formulated and tested against empirical data, whether from lesions, as in this case, or from other types of monitoring of neural activity in other cases. Given such a more refined hypothesis about the function of a particular region of the brain, one may then proceed to make a more detailed analysis of the fine data from neuroanatomy and neurophysiology to tease out the neural networks and analyze in more detail how the postulated schemas might indeed be implemented in the networks of the real brain, perhaps further refining the definition of the schemas in the process. Without going into further details here, let me simply say that schemas have been useful both in the neuroscience of vision and action and in higher cognitive functions like language and social behavior; and that they have also been useful in artificial intelligence in such areas as computer vision and robot control.
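
The schema assemblage just described is small enough to write down explicitly. In this toy encoding (mine; the all-or-none 0/1 coding is an illustrative simplification of the schemas' graded activity), the approach schema is triggered by a perceptual schema for all moving objects, while the pretectal schema for large moving objects both excites avoidance and inhibits approach; deleting the pretectum then reproduces Ewert's lesion result.

    def toad(stimulus, pretectum_intact=True):
        moving = 1                                    # any moving object is detected
        large = 1 if stimulus == "large" else 0
        pretectum = large if pretectum_intact else 0  # perceptual schema for large objects
        approach = moving and not pretectum           # excited by motion, inhibited by pretectum
        avoid = pretectum                             # excited by the pretectal schema
        return "snap" if approach else ("jump away" if avoid else "ignore")

    for lesioned in (False, True):
        for s in ("small", "large"):
            state = "lesioned" if lesioned else "intact"
            print(f"pretectum {state}, {s} object ->", toad(s, pretectum_intact=not lesioned))

Run as written, this prints snap / jump away for the intact animal and snap / snap for the lesioned one, which is exactly the dissociation the text uses to argue against a unitary schema in a single region.
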
There has even been a little related mathematical analysis, at a level which looks more like the theory of networks of interacting automata than anything else. However, I must confess to a two-fold failure here. The first is that the successes of schema theory have been piecemeal rather than systematic, with a number of important models using the vocabulary that we have developed to analyze competition and cooperation in schema networks, rather than a powerful mathematical methodology which can be plugged in to yield useful results in a multitude of applications. The other problem is that within experimental neuroscience itself, there has been in the past a tendency for people to focus their neurophysiological efforts on the activity of a single region of the brain in a limited set of related tasks. This has kept people from asking precisely the sort of question for which schema theory tries to find the appropriate methodology: namely, how a complex network of tasks is played out over a complex set of interacting neural networks when there is no obvious correspondence between subtasks and networks.

Synthetic PET

I am hopeful that recent developments in the technology of human brain imaging will eventually force cognitive neuroscientists to come to terms with the need for some version of schema theory, and that the result will be a development in both mathematical analysis and computational implementation far beyond what we have seen to date. Briefly, there is a belief that when a part of the brain is more active in one task than another, the regional cerebral blood flow (rCBF) will increase accordingly – though it must be noted that there is still some debate as to what it is about the activity of the region that is actually responsible for the increased blood flow. In our own development of something called Synthetic PET (Arbib et al., 1995), Amanda Bischoff, Andy Fagg, Scott Grafton and I have used the assumption that synaptic activity, rather than cell firing, is the best correlate for the blood flow to a region. Whatever the methodological and theoretical shortcomings, however, the fact is that with these methods – most notably positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) – it becomes possible to compare the activity of the brain in two different tasks, and to report which regions of the brain are more active in one task than in the other. The initial response to this work was what might be called “boxology”, since it was so easy to make the transition from saying “This is a part of the brain that is more active in task A than in task B” to saying “This is one of the parts of the brain that implements task A rather than task B”, overlooking the fact that a part of the brain might be active in both tasks. Schema theory, with its attempt to provide a high-level description of what each brain region does (in the context of a broader theory of how competition and cooperation may yield overall behaviors), may provide a better way of speaking about such data. At the same time, developments of the Synthetic PET methodology can allow us to build neural network models, informed by homologies between the human brain and those of other species, which can suggest what the fine-scale internal workings of each brain region are, and then use averages of detailed neural activity to derive predictions of brain imaging activity, so as to constrain these neural models against human data as well as the animal data on which the homologies were based. I predict that the result will be a greatly enhanced understanding of human brain activity, with computational neuroscience bridging from neural network modeling to schema-theoretic analysis as its foundation, through a continual confrontation of data from animal studies with data from the human.
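
A schematic sketch of the Synthetic PET assumption just stated (my own toy; the connectivity and task-dependent firing rates are invented numbers, not data or equations from the 1995 study): if the imaging signal of a region is taken to reflect its total synaptic input rather than its own firing, a simulated subtraction between two tasks can be read off a neural network model.

    import numpy as np

    rng = np.random.default_rng(3)
    n_regions = 4
    W = np.abs(rng.normal(size=(n_regions, n_regions)))   # |synaptic weight| from j to i
    rates = {"task A": rng.random(n_regions),             # modeled firing rates per task
             "task B": rng.random(n_regions)}

    # "rCBF" proxy: the summed synaptic activity arriving at each region.
    signal = {task: W @ r for task, r in rates.items()}
    print("A - B subtraction image:", np.round(signal["task A"] - signal["task B"], 2))
    # Note that a region strongly active in BOTH tasks largely vanishes from
    # the subtraction -- the trap of "boxology" discussed above.
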

Space, The Final Frontier

Changing mathematical formalizations of space have provided the grounding for much of mathematical physics. Euclidean geometry provided the framework for Newton’s theory of gravitation, and also provided the framework in which he and Leibnitz developed the differential and integral calculus. It was the mathematical question of whether or not one could prove the parallel postulate from the other axioms of Euclidean geometry that led to the discovery that it was indeed independent of the other axioms, with the definition of the very different non-Euclidean geometries of Riemann and Lobachevsky. The work of Minkowski then provided the appropriate metric for linking space and time in Einstein’s special relativity. Even more impressively, it was prior mathematical work in Riemannian differential geometry that allowed Einstein to go beyond the fixed geometry of the space-time of special relativity to the new space-time of general relativity, in which distortions of space due to the presence of masses gave the explanation of how gravity could act not through instantaneous action at a distance but through the propagation of distortions of space-time. Quantum mechanics helped us think of reality not only in terms of the movement of particles through space and time but in the dual terms of particles and of probability amplitudes seen as patterns in Hilbert space. And now in superstring theory and M-theory we see increasingly rich mathematical theories driving, and responding to, the attempt to unify all the physical forces, with the submersion of the four dimensions of space-time into a space of ten dimensions, more or less. Such work in physics has seen a rich interplay between developments in pure mathematics and developments in physical insight. The record to date in brain theory is far more modest. As yet, no major new mathematical theory has emerged to find a match of incredible predictive power with the brain. Nonetheless, study of the brain can pose new challenges to our understanding of space which already allow us to think about our interaction with the world in new ways, and which pose exciting challenges for the mathematics of the future. In trying to understand the brain, we can think of space both as an external reality in which our interaction with objects and other actors takes place, and as something that we discover through these interactions. One of my professors at Sydney University, T. G. Room, an algebraic geometer of the old school, wrote a fine book (Room, 1967) in which he gave a plausible account of how the axioms of Euclidean geometry might have been discovered by a farmer-geometer on the banks of the Nile, marking out his fields and observing various patterns of relationship as he sought to lay parallel furrows and estimate the relationship between different measurements he made of his land. What is interesting about this account is that, although in itself a mathematical exercise, it makes the point that space is not simply something “out there”, but also something whose structure can be discovered by our interactions with the world. A more exotic demonstration of this was given by the young Frenchman Jean Nicod (1970) in his book “Induction and Geometry”, in which he imagined the exotic geometries that would be created by the interaction of different creatures with their worlds.
One such example was to think of the geometry of a creature whose only activity was to move up and down a keyboard and whose only sensations were of the sounds thus produced. From a different point of view, von Uexküll imagined the life spaces of different creatures as differing depending upon the behaviors and perceptual systems at their disposal. Here I want to talk a little more about how space is represented in the brain. People often use the terms “frame of reference” or “coordinate system” to talk of neural codes, but I think this is misleading. I have already mentioned the retinotopy of the frog brain, where we see the world transduced into activity in a set of layers of neurons. In each layer, the activity of a neuron corresponds to the correlation of the world in the neighborhood of a corresponding point in space with some basic feature. We thus get a representation of the world not in terms of its three or four coordinates of x, y, z and possibly t, but rather as a discrete grid with, at each neuron, the representation of some feature of the light reflected from some corresponding direction in the world. What is intriguing, from the study of the frog onwards into more subtle representations, is that such activity has already evolved in terms of the animal’s repertoire. Thus, for example, features extracted by the retina of the frog can be correlated to a first approximation with the presence of the small moving objects that may serve as prey and the large moving objects that may indeed be predators. However, the frog has a lot more brain than just its retina, showing that it is no trivial property to go from such retinotopic feature maps to the determination of a course of action. In my own work, for example, we have looked at the interaction of representations of possible prey objects, possible predators, and possible barriers in the environment to determine a course of action which first requires the decision as to whether to feed upon a prey or escape an enemy, but must then further involve the determination of an appropriate trajectory for that behavior which will avoid a collision with any barriers (Arbib and House 1987; Corbacho and Arbib, 1995). In addition, the behavior will depend not only upon the locations of the various objects, and thus on their spatial relationships, but also upon the trajectories of those objects – for example, it is clearly a good idea to know in which direction your enemy is moving before you determine the appropriate direction of escape. Nowhere in the resultant models – and, I will be so bold as to say, in the actual brain of a living frog – do we find any place where the brain’s activity channels down to a neural encoding of three coordinates like x, y, and z. Instead, the problem is to go from the full complexity of a scene – of objects disposed and moving in space – to the spatiotemporal pattern of activity on both retinas, through various intermediate stages, to finally yield the appropriate temporal pattern of activation of all the muscles which control adaptive behavior. To this we add that these relationships can change with experience. Moreover, all these behaviors that I have described in the frog correspond to what in the human brain would be served by just the brain stem and spinal cord. So, one might ask what is the job of all that cerebral cortex which is so impressive a distinction of the mammals in general and of humans in particular.
I think the answer is that it gives us our ability to live in an extended present which includes reflection upon the past and planning for the future, considering many alternative courses of action and then laying our activity out in a way which may extend over months or even years to achieve our goals. Another thing I want to stress is that cues from many different modalities must be integrated. To take just one example, the owl can locate its prey in the dark through hearing, but it does so through a series of transformations that translate the difference in intensity and time of arrival of sounds from its prey at the two ears into a set of signals from the auditory system which play upon the visual map in the superior colliculus (the analog of the tectum we have talked about in the frog), thus integrating auditory information into a spatial framework. Even more impressively, such integration can occur in the somatosensory system, translating position on the body into visual coordinates, so that one may look at the position on one’s body which has become itchy. This transformation is particularly impressive because it must vary as the limbs change their position: the movement of the eyes and head is to the point on the arm or leg that is being itched, no matter how that arm or leg may have moved relative to the body. Amazingly, it has been shown in the frog that the body schema which enables the frog to adjust the motion of its foreleg to wipe an irritant off its body or hind limb can be implemented by the spinal cord – the frog with its brain separated from its spinal cord can still accurately wipe off irritants from body and limbs (Fukson et al., 1986). One more example, before I try to offer a summary of the view of space which the brain suggests. Consider learning to compensate for the effect of prisms when throwing a ball. A priori, one might expect that what is involved is that the part of the brain that transforms the visual input into an estimate of location in space – in this case, the position of the target – will adapt, eventually transforming the distorted input into a new veridical representation. If this were the case, then once one had learned to adjust this visual representation to throw effectively, all other skills should henceforth be unperturbed by the presence of the prisms. Surprisingly, this is not true. If one learns to throw accurately with the right hand, there will be almost no transfer to improve the accuracy of one’s throw with the left hand. Indeed, for many people, learning to throw underarm with one hand will do little to improve the accuracy of throwing overarm even with the same hand. In short, it seems that the cerebellum is not mediating a global recalibration of the visual representation of space; rather, each microzone of the cerebellum contributes to the calibration of a specific visuomotor transformation appropriate to a special class of actions, and it is only to the extent that one task shares microzones with another that there will be transfer of prism adaptation from the first task to the second (Arbib et al., 1995). Let me say one more thing about the notion of a coordinate frame. One cannot see in the brain the setting out of coordinates in a way that matches the x, y, z of Euclidean space, or even the local coordinates of some Riemannian manifold. However, one can experimentally show that the activity in a brain region is in a frame of reference in some more generalized sense.
For example, when we reach for an object, the actual commands to our muscles must bring the hand to a position relative to the shoulder that enables us to grasp the object. In this sense, one might say that the pattern of motor neuron firing which controls the lengths of the muscles provides a body-centered representation of the position of the object, even though we immediately note that this mapping, via the dynamics of the muscles, to an immense vector of motor neuron activity is very far from a simple x, y, z locating the object in Euclidean space relative to some reference point on the body. Similarly, when a pattern of activity in the world stimulates our retinas, we may view the activity on the retina as in some sense providing a two-dimensional projection of the three-dimensional world, expressible in retinal coordinates as a bundle of feature activities (remember that the cells of the retina, like cells elsewhere, may be differentially sensitive to different features of the external world, rather than simply relaying light intensity). In between, one might ask whether the firing of cells is in a retinal, a body-centered, or some other frame of reference. For example, if the firing of a cell in response to our sight of an object remains the same no matter how much we move our head or our eyes or our arms, then we might reasonably suggest that the representation is body-centered, especially if we can indeed correlate changes in neural firing with changes in the position of the object relative to the body. Conversely, if the object remains in one place relative to the body but the firing of the cells correlates with the position of the object on the retina, then we may speak of a retinocentric representation. Again, if the activity correlates with the position of the object relative to how the head is pointing, no matter how much the eyes move around, then we might speak of a head-centered representation. And so it goes. In each case, there is no implication that we have fixed a particular coordinate frame on a particular organ of the body and can then see neural firing as simply reading off the x, y, z coordinates, or the r, θ, φ coordinates, within that reference system. Rather, we have a feature map that represents the relationship between some feature array and the world around us, but does so in a way which matches the distribution of objects in the space around us to some distribution of activity across a whole series of neural arrays that does not encode space as such but rather encodes features of the world relevant to our interaction with it. The internal representation itself is not what is important; what is important is that the representation can appropriately mediate our actions, in such a way that we do indeed carry out the human extensions of the ability to catch our prey and avoid our enemies while not bumping into things too often.

Information Theory and Statistical Mechanics

Shannon’s information theory (Shannon, 1948) was designed to provide a theory of “reliable communication in the presence of noise”. It showed, quite remarkably, that if one is sending messages down a noisy transmission line – a channel – then the statistics of noise in the channel define a number called the capacity of the channel, and that so long as information is transmitted at any rate less than that capacity, the messages can be encoded in such a way as to bring the probability of error in the decoding and reception of these messages down to as small a number as desired. The price to be paid, it was shown, was in the length of the blocks that had to be coded, to yield enough cross-dependencies between the individual symbols in the message to allow errors in one part of the message to be compensated by dependencies on other parts. In other words, it was not necessary to keep sending the same message again and again and again to achieve a desired reliability, slowing down the message indefinitely to get very small error rates. In “Brains, Machines and Mathematics”, I presented a then recent study by other members of the McCulloch group, Shmuel Winograd and Jack Cowan (1963), who had applied these ideas to come up with a way of achieving neural networks which could function reliably in the presence of noise, an idea which had been introduced and enthusiastically studied by Warren McCulloch. Interestingly, Shannon’s classical study links information theory strongly to statistical mechanics, with the entropy of a gas being related to the information content of a statistical ensemble of messages. Recently, the relationship of statistical mechanics to neural networks has been revived and has had a great impact on the mathematical analysis of formal neural networks. The most famous example is that of the Hopfield network, although Hopfield’s basic result (Hopfield, 1982) can be seen as a special case of a result by Cohen and Grossberg (1983). Briefly, Hopfield showed that if one takes a McCulloch-Pitts style neural network and imposes the condition that the connections are symmetric – in other words, the synaptic weight from neuron i to neuron j must equal the synaptic weight from neuron j to neuron i – then one can associate an “energy function” with the neural network, with the property that if the network is constrained to have only one neuron, chosen at random, respond to its inputs at each timestep, then the neural network will necessarily either stay unchanged or make a transition which reduces the energy at each time step. I hasten to stress that this energy is not the energy of the neurons considered as physical systems composed of highly complex interacting molecules, but is rather a mathematical construct based on the informational interactions via the firing of these abstract neurons, inspired by an analogy with statistical mechanics. The basic model of a magnet in physics goes back to Ising (1925), who represented a magnet as a one-dimensional array of atomic magnets. He hoped to explain various phenomena about magnets, including hysteresis effects and the discovery by Pierre Curie that there is a critical temperature such that a magnet can maintain an imposed magnetization below this temperature but not above it. This is a dramatic example of a phase transition, akin to the sharp transition from solid to liquid that H2O exhibits at 0 °C, or the transition from liquid to gas that it exhibits at 100 °C.
Unfortunately, it can be shown that the one-dimensional model of Ising cannot support a phase transition, but Lars Onsager (1944), in work in mathematical physics that earned him the Nobel Prize, was able to show that a two-dimensional lattice patterned on the work of Ising could indeed exhibit phase transitions. Ferromagnetism was modeled by the fact that an atomic magnet would flip from the up state to the down state with a probability that increased to the extent that its neighbors were already in that state; a random fluctuation was added which increased with temperature. Hopfield had a tremendous impact upon the mathematical theory of neural networks, achieving success which was the envy of those in the field who felt they had already achieved important results, because he made clear to the physics community that there was a rich analogy between the down and up states of the spins in a “spin glass”, as the general class of such models was known, and the non-firing and firing states of a neuron. This has led to a very rich set of studies in which physicists have applied the tools of statistical mechanics to prove a wide range of results about, for example, the pattern recognition capabilities of neural networks in response to a randomly generated ensemble of patterns, showing that if the ensemble of patterns is indeed random there will be a critical level, a phase transition of sorts, at which the ability for pattern recognition breaks down as the patterns become so numerous that they interfere with each other (see Hertz et al., 1991, for an excellent textbook account). Hopfield’s theory is one recognition of the fact that nonlinear systems may exhibit multiple fixed-point equilibria. Pattern recognition can be modeled by setting the network into an initial state from which it will then move to equilibrium, and this equilibrium can be taken to encode the prototype pattern of which the pattern coded in the initial state was an exemplar. The convergence of the system guaranteed by Hopfield’s energy function can also be seen as allowing one to map optimization problems onto symmetric networks. Given an optimization problem, one finds a similar mathematical function which meets the constraints required for the mathematical form of the energy function of a neural network; one then reads off the connection weights and thresholds of the neurons of the network with that energy function, and can then simply set that network going to reach a minimum of the energy function. Unfortunately, the optimum to which the process of energy minimization takes the network may be a local optimum rather than a global optimum. This led Hinton, Sejnowski and Ackley (1984) to adapt the technique of simulated annealing to the design of such networks, providing, as the analog of temperature, a noise term which would allow the network to escape from local minima and move on towards a more global minimum. Eventually, this temperature had to be lowered so that once the system had approached the neighborhood of a global minimum it would stay there rather than “bouncing out”. It could be proved that such a system would converge to a global minimum, but the catch was that it could only do so over an inordinately protracted annealing schedule. I will not go into further details here, but simply note that this provided another linkage between the study of neural networks and the mathematics of statistical mechanics.
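
Hopfield's convergence argument is short enough to verify numerically. In this sketch (my own; I use random symmetric weights rather than weights storing particular patterns), one randomly chosen ±1 unit responds to its inputs at each timestep, and the energy is checked never to increase.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 20
    W = rng.normal(size=(n, n)); W = (W + W.T) / 2.0   # enforce w_ij = w_ji
    np.fill_diagonal(W, 0.0)
    s = rng.choice([-1, 1], size=n)                    # firing / non-firing as +-1

    def energy(state):
        return -0.5 * state @ W @ state

    E = energy(s)
    for _ in range(2000):
        i = rng.integers(n)                            # one neuron chosen at random
        s[i] = 1 if W[i] @ s > 0 else -1               # threshold response to its inputs
        assert energy(s) <= E + 1e-9                   # the energy never increases
        E = energy(s)
    print("settled at energy", round(float(E), 3))
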

Nonlinear Dynamic Systems

I’ve already mentioned the fascinating challenges for mathematics posed by abstracting from the details of the Hodgkin-Huxley equations to develop qualitative theories of how they may yield a wide variety of patterns: not only firing that tracks the intensity of some input variable, but also spontaneous oscillation (we speak of pacemaker cells) and bursting behavior, which can provide a very different type of code from the average firing rate code that has proved valuable in many studies of visual processing and cognitive function. Here, I want to briefly note some of the work that has looked at nonlinear dynamical systems at the level of neural networks, rather than at the level of neural compartments. However, I want for the moment to leave the world of statistics and focus on the fact that a nonlinear dynamical system may not necessarily settle into a fixed point, but may exhibit two other forms of equilibrium trajectory: either a cycle, spoken of as a limit cycle, in which case the system undergoes regular oscillations, or what has been called a "strange attractor", in which case the system exhibits chaotic behavior.

Much interest has been taken not only in oscillations in isolation but in the way in which oscillators can be coupled in pattern generation. For example, in the lamprey, a primitive form of fish, swimming is mediated by a traveling wave along the body, which serves to displace the water in such a way as to propel the fish forward. Intriguingly, this traveling wave can be exhibited in the activity of neurons in the isolated spinal cord kept alive in a dish, showing that there is a central pattern generator which can determine the basic rhythm without any sensory input (Wallen and Williams, 1984). Of course, movement in the real world requires a variety of further controls, mediated both by the sensory receptors in the animal’s body and by the long-distance senses in the head. Bridging between water and land, the salamander will exhibit a traveling wave along its body while swimming, and a standing wave (appropriately!) when walking on land. As we also know, most quadrupeds can exhibit a variety of different gaits, from walking to cantering to trotting to running, and so on. Thus, it becomes an interesting issue to understand how a chain of oscillators can be coupled in such a way as to exhibit these different phase relationships, which yield the overall control of the body’s movement, and a number of elegant theorems in nonlinear dynamics have addressed this issue; a sketch of such a chain is given below. Another body of study of the linkage of nonlinear oscillators has come from the suggestion (one which has not completely convinced me) that the way the brain solves the binding problem – of determining which activity in different parts of the brain should be bound together to represent features of a common object or segment of the scene – is by treating each feature-detecting neuron as an oscillator, and then having those neurons which represent the same object or segment fire in phase with each other. Whatever the final outcome of the neural synchronization model for binding, the fact remains that interesting theorems on the synchronization of coupled nonlinear oscillators have arisen from such questions (e.g., Collins and Stewart, 1993).
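To illustrate how coupling can lock a chain of oscillators into a traveling wave, here is a toy sketch of my own construction (it is not the Wallen-Williams model or any published central pattern generator; the frequencies, coupling strength and preferred lag are illustrative assumptions). Each segment is reduced to a single phase variable, and nearest-neighbor coupling prefers a fixed phase lag between adjacent segments.

```python
import numpy as np

def oscillator_chain(n=20, omega=2*np.pi, k=4.0, lag=0.3, dt=0.001, steps=50000):
    """n phase oscillators, each with intrinsic frequency omega, coupled so
    that each segment prefers to lead its caudal neighbour by `lag` radians.
    The attracting solution is a traveling wave: equal phase steps along the
    chain, with all segments locked to a common frequency."""
    rng = np.random.default_rng(1)
    theta = rng.uniform(0, 2 * np.pi, n)        # random initial phases
    for _ in range(steps):
        d = np.full(n, omega)
        d[1:]  += k * np.sin(theta[:-1] - theta[1:] - lag)   # pull toward rostral neighbour, minus lag
        d[:-1] += k * np.sin(theta[1:] - theta[:-1] + lag)   # pull toward caudal neighbour, plus lag
        theta += dt * d
    # phase differences between successive segments, mapped to (-pi, pi]
    return np.angle(np.exp(1j * (theta[:-1] - theta[1:])))

print(oscillator_chain())   # every entry converges to ~0.3: a uniform traveling wave
```

Changing the sign or magnitude of the preferred lag changes the direction and wavelength of the wave, which is the kind of parameter-dependent switching among phase relationships that the gait theorems address.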
Turning to strange attractors and chaos, it must be noted that while it is easy to show that certain very simple equations – whether in ecosystems or neural networks – can generate chaotic behavior (see the sketch below), it is still very difficult to prove unequivocally that a particular pattern of behavior exhibited by a neural network is indeed chaotic rather than random, and whether, if it were chaotic, there would be any functional significance to that fact. An intriguing claim for a possible role of chaos concerns the Mauthner cell of the fish, which is involved both in the animal’s hearing and in its escape behavior. It has been claimed that this neuron will exhibit basically rhythmic behavior during hearing, but will switch to chaotic behavior during escape, with the result that the direction of escape cannot be predicted by the predator, something of great survival value to the fish. However, the same survival value would result from amplifying the effect of noise during periods of escape, rather than from deterministic chaos (Faure and Korn, 1997). Others have suggested a role for chaos in the olfactory system and even in cognitive behavior, but much remains to be done before these claims are settled one way or another. In any case, it is clear that the study of nonlinear dynamical systems has provided a great deal of insight into the behavior of neurons, all the way from a patch of membrane described by Hodgkin-Huxley-like equations up to the behavior of very large networks – for example, in understanding how the neural networks of the hippocampus may in some circumstances exhibit an adaptive rhythm called the theta rhythm and yet in other circumstances exhibit the disturbing oscillations characteristic of an epileptic attack (Traub and Miles, 1991). The brain has thus come to be understood in many ways through the application of nonlinear systems theory, which has in turn provided the stimulus for new mathematical investigations.
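The ease of generating chaos deterministically can be shown in a few lines. The logistic map below is a standard textbook example (my choice of illustration, not drawn from the works cited here): a one-line deterministic rule whose orbits from nearly identical starting points diverge completely, which is exactly why telling chaos from amplified noise in recorded data is so hard.

```python
import numpy as np

def logistic_orbit(x0, r=4.0, n=60):
    """Iterate the deterministic map x -> r*x*(1-x); at r = 4 it is chaotic."""
    xs = np.empty(n)
    xs[0] = x0
    for i in range(1, n):
        xs[i] = r * xs[i - 1] * (1.0 - xs[i - 1])
    return xs

a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-12)     # perturb the 12th decimal place
print(np.abs(a - b)[::10])
# the separation grows roughly exponentially until it is of order one:
# sensitive dependence on initial conditions, the hallmark of chaos
```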

Bringing in Plasticity: Hebbian, Error-Based and Reinforcement

Simple Perceptrons are neural networks with no loops, in which, no matter how many layers of neurons there are, there is synaptic plasticity only in the inputs to the layer which provides the network’s output. It is then a basic theorem that a rule called the perceptron learning rule (Rosenblatt, 1958; sketched below) can find a setting of weights for a given simple Perceptron which will yield a desired input/output function specified by a teacher, so long as such a setting of weights exists. The set of functions for which such weights exist comprises the functions which are linearly separable with respect to the input features provided by the earlier layers of the neural network. This mathematical result was the stimulus for a variety of mathematical activity which had initially negative (Minsky and Papert, 1969) but eventually positive implications for the development of artificial neural networks as a parallel, adaptive computing technology, but also had much relevance for attempts to understand more broadly the different kinds of synaptic plasticity and their localization in different parts of the brain.

However, simple Perceptrons cannot learn general input-output functions, and it has subsequently been shown that it is enough to interpose one layer of adaptive neurons between the input and the output to be able to approximate arbitrary input-output functions, although the degree of approximation will depend upon the number of neurons in the so-called hidden layer. The work has been further generalized to the inclusion of loops in the network, with the subsequent learning of sequential input-output relations. However, in general, the learning rules here are not guaranteed to yield an optimal approximation. Rather, they use the general notion of gradient descent to yield the mathematical derivation of rules which have "a good chance" of driving the synaptic weights to at least a locally optimal approximation to the given training set. Much work has gone into showing how these approximation methods may be made more successful and more efficient, in many cases by the addition of noise to the training process, as in our earlier discussion of the Boltzmann machine and simulated annealing.
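Returning to the basic rule itself, here is a sketch in a standard formulation (the AND task and all constants are my illustrative choices): on each misclassified input the weight vector is nudged toward (or away from) that input, and the convergence theorem guarantees termination whenever a separating weight vector exists.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, eta=1.0):
    """Perceptron learning rule: on each error, move the weight vector toward
    (for targets +1) or away from (for targets -1) the misclassified input.
    Converges whenever the classes are linearly separable in feature space."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):                 # targets t in {-1, +1}
            if t * (w @ x + b) <= 0:           # misclassified (or on the boundary)
                w += eta * t * x
                b += eta * t
                errors += 1
        if errors == 0:                        # a full pass with no errors: done
            break
    return w, b

# AND is linearly separable, so the rule finds separating weights.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # matches y: [-1, -1, -1, 1]
```

Running the same code on XOR, by contrast, never terminates with zero errors, which is precisely the linear-separability limitation that Minsky and Papert analyzed.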

However, the perceptron learning rule was preceded by ten years by the learning rule presented by Donald Hebb (1949) in his book "The Organization of Behavior". Hebb's rule was not based on neurophysiological investigation, but was part of a neuropsychological theory saying, in effect, "If people learn in this sort of way, then this type of neural change might form a plausible substrate". His rule was that if particular inputs are active in firing a neuron, then their synaptic weights will be increased, thus making it more likely that the same pattern of input will fire the neuron on future occasions. A number of people looked at this both conceptually and mathematically and concluded that the original rule would not work, for the simple reason that any synapse might occasionally be engaged in the successful firing of the neuron, and thus all synapses would increase to a saturation level at which the neuron would respond to any stimulus whatsoever. Thus a variety of methods were introduced to lower synaptic weights that were not involved in a successful firing as well as to increase those that were involved (one such repair is sketched below). The method was further extended to have neurons recruit some other neurons to learn patterns similar to their own, yet block many other neurons from firing for similar patterns. The result was that each neuron would make a statistical estimate of some feature of the environmental flux, with high-frequency features capturing more neurons than low-frequency ones. Again, this type of work has spawned a great deal of mathematical analysis linking the statistics of the ensemble of inputs to a network to the statistical distribution of features which will eventually emerge in the activity of the individual neurons (again, see Bishop, 1995, for a review). The idea is that such unsupervised training, by making a statistical analysis of the most salient features of a given environment, can simplify subsequent pattern recognition by making it necessary to learn patterns only on a reduced set of important features, rather than on arbitrary patterns of illumination of the retina. We see here the circle being closed with the ideas that motivated Pitts and McCulloch in their 1947 paper, which appeared just slightly before Hebb’s book.

However, although we often think of Hebbian learning as being unsupervised, we can also think of it as being supervised if we provide it with a training input, much as Marr posited the climbing fiber to be for the Purkinje cells of the cerebellum (Marr, 1969). The teacher, by triggering the firing of the cell, can then guarantee that the pattern of activity on the other fibers will be modified according to the Hebbian rule, or one of its improved successors. Eventually, it was shown that synapses of certain cells in the hippocampus did indeed exhibit Hebbian-like synaptic plasticity (Bliss and Lømo, 1973), and since then much work has been done on establishing which variants of the Hebbian rule do indeed describe the variety of synaptic types in the brain. The basic distinction in learning rules made so far is that between supervised learning, in which the synaptic weights change to get the output of the neuron to better match the output specified by a teacher, and unsupervised learning, in which neurons extract statistical patterns.
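One widely studied repair of the saturation problem is Oja's rule, sketched below. (Oja's formulation is my choice of illustration and postdates the early work described above; it is not one of the specific methods the text refers to.) The Hebbian growth term is balanced by a decay proportional to the squared output, so the weight vector stays bounded and converges to the first principal component of the input ensemble: the neuron becomes exactly the kind of statistical feature detector just discussed.

```python
import numpy as np

def oja(X, eta=0.01, epochs=100, seed=1):
    """Hebb plus normalizing decay: dw = eta * y * (x - y*w), with y = w.x.
    Plain Hebb (dw = eta * y * x) would let every weight grow to saturation;
    the -y*y*w term keeps |w| near 1, and the rule converges to the first
    principal component of the inputs."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += eta * y * (x - y * w)
    return w / np.linalg.norm(w)

# correlated 2-D inputs whose principal axis is (1, 1)/sqrt(2)
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.stack([z + 0.1 * rng.normal(size=500),
              z + 0.1 * rng.normal(size=500)], axis=1)
print(oja(X))   # close to +/-(0.707, 0.707), the dominant feature direction
```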
A third broad type of learning is known as reinforcement learning (Sutton and Barto, 1981). This corresponds to the case in which the teacher who provides explicit feedback is replaced by a critic who simply indicates to the network whether its overall behavior was relatively successful (positive reinforcement) or relatively unsuccessful (negative reinforcement). Although learning cannot proceed as rapidly with reinforcement learning as with explicit instruction from a teacher, there are many regions of the brain – including at least the basal ganglia and certain of the networks of the cerebral cortex – in which reinforcement learning (in some cases with dopamine acting as the reinforcement signal) appears to be the best description. Again, a variety of mathematical theorems are available to look at the way in which different forms of reinforcement learning can express different types of statistical analysis; a sketch of the basic idea follows at the end of this section.

However, I now want to distinguish the statistical pattern of learning which serves, for example, to build up our basic recognition of objects and actions, as well as our improved skills, from a non-statistical type of learning, namely episodic memory, which records specific events. In the movie "Memento", the hero exhibits a very dramatic form of brain damage in which episodic memories can be formed only transiently but not transferred into long-term memory. He can remember events before the brain trauma that brought on the syndrome, but cannot remember anything that has happened since, apart from the current episode in his life. Thus, you might have a conversation with such a person and he would appear completely normal, yet if you were to leave the room for even a short while he would have no recollection of your earlier conversation. However, suppose that each day you were to return and play some game of skill with him. Each day he would have no recollection of having seen the game before and would play in response to your instructions. Yet, each day, despite having no recollection of having played the game before, his skill would increase – showing that he could learn new skills even though he could not lay down the memory of novel episodes.

This leads us to view the brain’s many regions in terms not only of processing different sensory information or controlling different effectors, or mediating different types of cognitive activity, but also as providing different learning skills to the mix: with the cerebellum perhaps most involved in the tuning and coordination of motor skills (though with some extensions to cognitive skills which abstract from motor skills); with the basal ganglia more involved in learning general strategies and plans through reinforcement learning; and with the cerebral cortex exhibiting Hebbian learning in at least the sensory periphery and reinforcement learning elsewhere (see the chapters on these different brain systems in Arbib et al., 1998). Thus, in analyzing learning we seek to understand both those parts of the brain which statistically build up pattern recognition or skilled behavior, and those parts of the brain which are more concerned with the extraction of information from particular episodes and interactions. We find that different parts of the brain have not only different patterns of input and output connections but also different cell types, whose activity could presumably be understood through compartmental modeling, and finally different mechanisms of synaptic plasticity.
An interesting mathematical challenge, then, is to relate these three factors – connectivity, cell morphology, and synaptic plasticity – to the type of statistical machine or episodic learning machine that each region constitutes as it contributes to the dynamic interaction of the brain’s many, many regions.
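Here is the promised sketch of reinforcement learning at its simplest, a toy of my own in the spirit of Sutton and Barto (1981); the task, payoffs and constants are illustrative assumptions. The critic returns only a scalar success signal, and action preferences are nudged up or down according to whether the outcome beat a running estimate of average reward.

```python
import numpy as np

def reinforcement_bandit(payoffs, trials=5000, eta=0.1, rng=None):
    """Learning from a critic rather than a teacher: after each action the
    environment returns only success (1) or failure (0). Preferences rise
    for actions that do better than the running baseline, fall otherwise."""
    rng = rng or np.random.default_rng(0)
    prefs = np.zeros(len(payoffs))
    baseline = 0.0
    for _ in range(trials):
        p = np.exp(prefs - prefs.max())
        p /= p.sum()                               # softmax action selection
        a = rng.choice(len(payoffs), p=p)
        r = float(rng.random() < payoffs[a])       # scalar reinforcement only
        prefs[a] += eta * (r - baseline)           # strengthen if better than expected
        baseline += 0.05 * (r - baseline)          # slowly track the average reward
    return prefs

prefs = reinforcement_bandit(np.array([0.2, 0.5, 0.8]))
print(prefs.argmax())   # the rule comes to favour the most rewarded action (index 2)
```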

Bridging the Levels: From Behavior to Neurochemistry

My own work has mainly proceeded from the neuron up, seeking ways to understand patterns of activity in large networks of neurons, and in interacting networks of such networks, while I have usually left to others the task of proceeding down into the depths of more detailed accounts of neural function. However, I want to give an example of one place in which I descended from the heights of systems neuroscience to ponder awhile the details of neurochemistry. In working with Nicolas Schweighofer, I attempted to model the role of the cerebellum in prism adaptation (Arbib et al., 1995): the task of adjusting the coordination between hand and eye so as to successfully throw a ball or dart at a target, even when the view of that target is displaced by prisms.

Our approach to the cerebellum was based in part on the Cambridge thesis of David Marr (1969), who had suggested the notion that the output cells of the cerebellum, the Purkinje cells, should be viewed as little Perceptrons: the synapses of the many parallel fibers entering a given Purkinje cell should be adjusted on the basis of a training signal provided by the single climbing fiber that ramifies over the surface of the Purkinje cell. Marr hypothesized that the coincidence of a climbing fiber signal with a pattern of activity on the parallel fibers should serve to strengthen the activated synapses. By contrast, Jim Albus (1971) had developed a similar model in which he postulated that such a coincidence would weaken the synapses, because he noted that the output of the Purkinje cells had been discovered by Masao Ito to have an inhibitory effect. Inspired by the Marr-Albus view, Ito then set to work to determine whether in fact coincident activation of a band of parallel fibers and a climbing fiber would weaken or strengthen the response of a Purkinje cell to parallel fiber stimulation. Eventually (e.g., Ito, 1982), he showed that the Albus view was correct, and that such activation yielded long-term depression (LTD) of synaptic efficacy. However, if one has only a lowering of synaptic efficacy, one runs the risk that all synapses will eventually decay to zero, and so Schweighofer and I, in our modeling, modified the formulation to have the effect of the climbing fiber predominantly take the form of LTD, while in the absence of such a training signal there would be slower long-term potentiation (LTP). In addition, we placed an emphasis, lacking in the original Marr and Albus studies, on the notion that the Purkinje cells were not coding for individual movements, but rather performed as parts of larger structures called microzones, which could modulate the parameters of various sensory-motor coordinations.

The details of the modeling are outside the focus of this article, but the point I want to make is the following. The classic learning models were based on the idea of coincidence of the training signal and the pattern. However, in considering the example of dart throwing, we had to note that the activity of Purkinje cells in modulating the motor pattern generator for the throw had to be followed by the actual time to move the arm as well as the time for the dart to fly to the target, and that the visual report of the error of the throw had then to be relayed back to the inferior olive, the source of the climbing fiber error signal – a process which in all would take about 200 milliseconds.
This, of course, ruled out the notion of simultaneity of the parallel fiber activity with the climbing fiber activity that evaluated it. We thus came up with a simple neurochemical model showing how it might be that parallel fiber activation of a synapse would yield a transient increase in a chemical signal called a second messenger, in such a way that it would peak at about 200 milliseconds, and that the change in efficacy of a synapse would be a function not of the immediate firing of a parallel fiber synapse, but rather of the eligibility of the synapse – the second messenger concentration in the synapse – at the time the climbing fiber had its effect upon the synapse (a sketch follows at the end of this section). In other words, this model demonstrated very strongly the way in which the study of the brain at the systems level could lead one to probe right down to synaptic details which require new neurochemical modeling for their analysis. Of course, this new modeling also raises new questions for experimentalists to address.

Having said this, I must note that this suggests that approximation must be an essential part of any mathematical attack on the brain. A human brain has perhaps 10^11 neurons, and a typical neuron may have 10^4 synapses – 10^15 synapses in all. If we were to model each synapse with the sort of complicated differential equations we used for our second messenger model, and if we were to model each patch of membrane with the four Hodgkin-Huxley equations appropriate for it, then even a single neuron would overwhelm our computing resources. Instead we must ask what form of compact representation of synaptic plasticity best represents the neurochemical activity within it – something that we did with a simple eligibility curve peaking at 200 milliseconds. One must then ask how the placement of tens of thousands of synapses on a cell – perhaps as many as 200,000 in the case of the Purkinje cell – can be analyzed in a way which allows a good description of the overall pattern of activation to yield insight into the overall response of the single cell; and, given this, we might seek in turn to see how relatively subtle representations of single cells might be further simplified when we wish to analyze the interactions of hundreds of thousands or even millions of such cells.
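Here is the promised sketch of the eligibility idea, in a deliberately simplified form: the alpha-function kernel and all constants below are my own illustrative stand-ins for the second-messenger kinetics we actually modeled. Each parallel-fiber spike leaves behind a trace that peaks about 200 ms later, and the climbing-fiber error signal depresses each synapse in proportion to that trace, not to the fiber's instantaneous activity.

```python
import numpy as np

def eligibility(t, t_peak=0.2):
    """Alpha-function stand-in for the second-messenger concentration left by
    a parallel-fibre spike; it peaks at t = t_peak (about 200 ms), value 1."""
    return np.where(t >= 0, (t / t_peak) * np.exp(1.0 - t / t_peak), 0.0)

def ltd_update(w, pf_spike_times, cf_time, eta=0.05):
    """The climbing-fibre error signal depresses each parallel-fibre synapse
    in proportion to its eligibility at the moment the climbing fibre fires,
    not in proportion to its activity at that moment."""
    elig = eligibility(cf_time - pf_spike_times)
    return w - eta * elig                          # LTD weighted by eligibility

w = np.ones(5)
pf_times = np.array([0.00, 0.05, 0.10, 0.20, 0.40])   # one recent spike per synapse
print(ltd_update(w, pf_times, cf_time=0.25))
# the synapse that fired ~200 ms before the error signal is depressed most;
# the synapse that fired after it (0.40 s) is untouched
```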

Self-Organization

I’ve already made it clear that a given network may change its connections through experience, and that the network may thus recognize particular patterns of activity in a way which reflects the training it has received. But synaptic plasticity plays a role not only in allowing the child or adult to learn, but also in the very wiring up of the nervous system. It is now well understood that even the overall pattern of connectivity, while resting on certain genetically specified biases, has most of its detail shaped by the way in which activity – whether internally generated or the result of external stimulation – impinges upon the neurons and their synapses.

Let me offer just one example. If we follow the retinal fibers back from the ganglion cells of the retina through the various layers of processing in the lateral geniculate nucleus (LGN) of the thalamus and on to the visual cortex, and trace the terminations of the LGN fibers in layer 4 of the cortex, we discover that they are arranged in alternating stripes of fibers coming from the left eye and the right eye (LeVay et al., 1975). It would seem that this distribution into alternating stripes – the so-called ocular dominance columns – is a specific genetically determined feature of the brain. However, Constantine-Paton and Law (1978) did a fascinating experiment with frogs that suggests the contrary. In humans, monkeys and cats, all of which have forward-facing eyes, the left half of the visual field projects via both eyes to the right brain and the right half of the visual field projects through both eyes to the left brain. In the frog, however, the fibers coming from the left eye are all directed to the right tectum and vice versa, so that in the recipient layers for these retinotectal fibers there is no pattern of ocular dominance columns, since there is only monocular input. However, Constantine-Paton and Law grafted a third eye into the forehead of the tadpole and let it grow as the frog matured. In this case the fibers from that third eye would go either to the left tectum or the right tectum, and there they would indeed form ocular dominance columns! In other words, we conclude that ocular dominance columns are not the result of explicit instructions to form one stripe here and one stripe there and so on, but rather reflect general patterns of interaction between fibers carrying similar or dissimilar signals. In other parts of the brain we may find segregations of different types of fibers or different types of cells arrayed not in terms of alternating stripes or alternating layers but in terms of blobs of one type dispersed in an overall matrix of cells of the other type. There has thus been a notable history of both computational modeling and mathematical analysis looking at the relationship between the genetically specified substrate and the activity that plays upon it in forming patterns of cellular connectivity.

Intriguingly, Turing – whose universal computing machine played such an important role in the development of the automata-theoretic approach to neural networks of McCulloch and Pitts – also wrote a seminal paper (Turing, 1952) which has made a fundamental contribution to the continuum description of neural pattern formation. In what I might call Turing’s stripe machine, he had a ring of cells in which two chemicals interacted according to the same reaction equation in each cell, while also diffusing to adjacent cells. Intuitively, one might expect that the diffusion would lead to a uniform concentration of each of the substances across all cells, but Turing was able to show that the ring would exhibit standing waves (a sketch is given at the end of this section). Indeed, he was asked whether his model could explain the stripes of the zebra and he replied "The stripes are easy; it’s the horse part that I have trouble with"! In any case, many modelers of embryological pattern formation have built variants of Turing’s reaction-diffusion theory, and many mathematicians have studied pattern formation by proving theorems about, for example, the way in which Turing-like processes in a two-dimensional sheet can, with different parameters, yield the stripes of the tiger or the spots of the leopard. It is an intriguing question, then, for the mathematician to understand to what extent it is possible to find a mathematical framework in which one can deepen one’s insight both into Turing-style pattern formation and into the type of activity-dependent formation of pattern in the developing nervous system.

Let me close with two empirical facts of some interest. One is that Mriganka Sur has shown that if fibers from the retina are blocked from reaching the visual cortex and instead routed to the auditory cortex, then, by presumably Hebbian mechanisms, they will indeed form orderly receptive fields in which cells of auditory cortex respond like the sort of edge detectors that Hubel and Wiesel found in the visual cortex. One wonders what the sound of an edge may be to a ferret upon which such an experiment has been conducted. Another study, by Michael Merzenich (Merzenich et al., 1991), looked at the effect on monkeys of binding two fingers together in such a way that they would always receive the same stimulation as the monkey manipulated objects. He mapped the receptive fields of the area of somatosensory cortex that received sensory stimuli from the hand of the monkey. Before the two fingers were bound together he found five distinct areas, with activity corresponding to touch on the four distinct fingers and the thumb; but after a period in which the monkey had had the two fingers bound together he found that the fields for those two fingers had fused into one large field responsive to stimulation of either finger alone – thus showing that the mechanisms of reorganization are present even in the adult monkey, and not simply in the developing infant.
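Here is the promised sketch of Turing's ring. It uses Gierer-Meinhardt-style activator-inhibitor kinetics as a stand-in for Turing's original linear system (my choice; all parameter values are illustrative): every cell runs identical reactions and only diffusion couples neighbors, yet because the inhibitor diffuses much faster than the activator, small random fluctuations grow into a stationary pattern of peaks and troughs rather than smoothing out.

```python
import numpy as np

def turing_ring(n=100, Du=1.0, Dv=20.0, dt=0.01, steps=20000, rng=None):
    """Ring of n cells, each with a self-enhancing activator u and a
    fast-diffusing inhibitor v; identical kinetics everywhere, coupled
    only by diffusion to the two neighbours. The homogeneous steady
    state (u, v) = (2, 4) is stable to uniform perturbations but
    unstable to a band of spatial modes: Turing's standing wave."""
    rng = rng or np.random.default_rng(0)
    u = 2.0 + 0.01 * rng.standard_normal(n)
    v = 4.0 + 0.01 * rng.standard_normal(n)
    for _ in range(steps):
        lap_u = np.roll(u, 1) - 2.0 * u + np.roll(u, -1)   # discrete Laplacian on the ring
        lap_v = np.roll(v, 1) - 2.0 * v + np.roll(v, -1)
        du = u * u / v - 0.5 * u + Du * lap_u              # activator: autocatalysis minus decay
        dv = u * u - v + Dv * lap_v                        # inhibitor: driven by activator
        u = u + dt * du
        v = v + dt * dv
    return u   # alternating high/low plateaus of activator around the ring

print(np.round(turing_ring(), 2))
```

The essential design choice, faithful to Turing's analysis, is the ratio of diffusion constants: with Dv comparable to Du the fluctuations die away to uniformity, while a fast inhibitor ("short-range activation, long-range inhibition") destabilizes a band of wavelengths and a periodic pattern emerges.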

Brain Evolution

There is a dramatic difference in brain size between chimpanzee and human, two primates of comparable body size. However, brains differ not only in size but also in the variety of brain regions they contain, and in the size and structure of those regions. For example, the portion of SI (the primary somatosensory cortex, which receives the basic input to cerebral cortex concerning the skin and muscles) devoted to input from the animal's forepaws is much larger in the raccoon, an animal known for its delicate manual exploration of objects, than in the coatimundi, a less dexterous but in other ways similar animal.

Krubitzer (1998) examines a variety of different patterns of neural architecture revealed in somatosensory cortex and visual cortex when stains show the locations of terminations of fibers from different regions of the sensory periphery. At first it might be thought that the genetic code of these creatures explicitly laid out the pattern of segregation of sensory inputs in these areas. In particular, the alternation of "ocular dominance columns", i.e., the alternating bands in which the terminations of thalamic inputs from the left and right eyes are arranged in the visual cortex of mammals with forward-facing eyes, once seemed a primary example of such genetic coding, presumably selected for as an aid to binocular vision. But too facile an adoption of this view is blocked by our earlier discussion of the results of Constantine-Paton and Law (1978). In light of that dramatic result, and a variety of modeling studies as well as further experiments, we now believe that general rules for the sorting of synaptic connections will automatically yield such patterning. However, the type of patterning can be influenced by parameters of the growth process to yield stripes or blobs or random interspersal; in the case of the frog, these parameters just happen to fall in the "stripe range" for tectum. Selection for such parameters of neural connection may play a role in brain evolution. Butler and Hodos (1996) analyze a variety of factors involved in the course of brain evolution among vertebrates. Among these are:

Formation of multiple new nuclei through elaboration or duplication;
Regionally specific increases in cell proliferation in different parts of the brain; and
Gain of some new connections and loss of some established connections.

These phenomena can be influenced by relatively simple mutational events, which can thus become established in a population as the result of random variation. Selective pressures determine whether the behavioral, phenotypic expressions of the central nervous system organization produced by these random mutations increase their proportional representation within the population and eventually become established as the normal condition. Kaas (1993) and Krubitzer (1998) suggest that cortical fields might be modified or added to during evolution by selection for the type of parameters which affect connections between different regions of the brain. New possibilities for fusion and segregation of neural projections arise when new afferents reach a brain region. In short, selection at the level of parameters of self-organization may be crucial in determining the resultant neural geometry.

From Brain to Mind: Psychology and Linguistics

We are coming close to the end of this review of the many mathematics that are used to describe the brain. My aim in this article has been to give a broad overview of a number of by now "classical" examples of the interaction of neuroscience and mathematics. A far broader and up-to-the-moment view of these interactions will be found in the hundreds of articles in the second edition of The Handbook of Brain Theory and Neural Networks, which I am editing for publication by The MIT Press in late 2002. I have given you some sense of the mathematics involved in vision and motor control, and in learning and development. I have even briefly given you some indication of how such insights may help us better understand the evolution of the brain from species to species. But this last topic brings us to the question of what it is that is special about the human brain and how it got that way. We turn from basic patterns of perception and action to a deeper understanding of planning and thought, of language and consciousness. Each of these topics would require a separate article or more in itself, so let me be brief and discuss the complementary approaches of rules and connectionism to modeling some aspects of human psychology and language, and then close with an evolutionary perspective on the human brain that leads us, perhaps, back to mathematics as a creative human activity. At the moment we have a patchwork of intriguing models of specific phenomena, rather than a deep mathematical framework for the study of these problems.

The discrete approach to the study of language may be traced back to the theory of computation that provided the framework for Turing machines, finite automata and McCulloch-Pitts neural networks. An important strand leading up to Turing’s work had been the attempt by Hilbert and others, including Frege and Russell, to provide a formalization of mathematical proof, stripped of its semantics and seen as a process of string manipulation. From this perspective, the axioms of a logical theory are just strings of symbols, and the rules of inference are procedures that take a certain number of available strings of symbols and form a new string. A proof is then simply a sequence of strings such that each string of the sequence is either an axiom or is obtainable from earlier strings of the sequence by application of the rules of inference. And then, finally, a theorem is whatever can be obtained as one of the strings of a proof. This formalist program was of course greatly affected by Gödel's incompleteness theorem, which proved that no finite set of axioms with well-defined rules of inference of this kind could yield as theorems all and only the truths of arithmetic. This is in itself a very exciting result in the domain of mathematical logic. But it also opened up the issue of looking at sets of strings of symbols as objects themselves worthy of mathematical study. In particular, one could restrict the set of axioms and the set of rules of inference and ask what sets of strings were so generated and what their formal properties were. It was the genius of Noam Chomsky (1956) to construct a hierarchy of languages in this sense, with the languages acceptable by finite-state machines (the regular languages) at one end, and the languages generable by Turing machines (the recursively enumerable sets) at the other.
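The flavour of this hierarchy can be conveyed with a toy example of my own (not Chomsky's): the language a^n b^n – n a's followed by exactly n b's – is context-free but provably not regular, because recognizing it requires an unbounded counter, and a finite-state machine, having only finitely many states, has no such counter. The few lines of Python below implement that counter.

```python
def accepts_anbn(s: str) -> bool:
    """Recognize the context-free language a^n b^n (n >= 0).
    A single unbounded counter suffices; a finite-state machine,
    with only finitely many states, provably cannot do this."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:           # an 'a' after a 'b': reject
                return False
            count += 1
        elif ch == 'b':
            seen_b = True
            count -= 1
            if count < 0:        # more b's than a's so far: reject
                return False
        else:
            return False         # the alphabet is {a, b}
    return count == 0            # accept only if the counts balance

print([accepts_anbn(s) for s in ["", "ab", "aabb", "aab", "ba"]])
# [True, True, True, False, False]
```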
In between, he defined the context-free and context-sensitive languages, and asked whether English – or any other natural human language – could be viewed as a formal language at one of these levels of the hierarchy. In the end his answer was to show that the context-free languages were not rich enough to provide the sort of strings of symbols that occur as sequences of words in the sentences of English, and over the years he has set himself the task of looking for a Universal Grammar, in the sense of a family of grammars obtainable by setting parameters in some master grammar, with the property that the languages so generated contain all and only those sets of strings which can serve as human languages. Chomsky's approach places the study of language in a very abstract setting – not in terms of its use for thought or communication, nor in terms of its roots in the biology of humans interacting with their worlds, but simply as an abstract faculty for telling of given strings whether or not they are well formed. I am deeply critical of this view of language, but my point here is simply to note that it is one way of placing the study of human languages within a discrete framework in which there are many interesting theorems that have been proved and remain to be proved in future (Chomsky, 1957, 1965, 1981, 1995).

Chomsky and his supporters claim that their attempt to characterize what it is that makes a set of strings eligible for consideration as a human language sheds light on the nature of the human brain. However, their use of the term "brain" is really empty. The extremely formalist approach is divorced from any knowledge of, or appreciation for, the actual structure of neural nets in general or of the many regions of the human brain and their interactions in particular. The currently popular connectionist approach to language, which I take up next, is at least a step in the direction of neural networks, but these are highly artificial neural networks rather than the neural networks of the human brain. By this I mean that each study of language within the connectionist framework starts with an artificial neural network of relatively simple and regular structure and then seeks to determine whether some interesting phenomenon in the use of language can be seen as a result of learning a corpus of language data. In this case, there is no attempt to identify the neural network with a particular region of the brain, or the artificial neurons which constitute that network with actual biological neurons. Rather, the suggestion is that if one is to provide a computational model of language, then the adaptive, highly distributed computing scheme offered by artificial neural networks provides far more psychological insight than can be obtained by construing the brain, most unnaturally, as a string-rewriting (or, more generally, tree-rewriting) system.

A simple example of the debate between the generative grammarians (as the followers of Chomsky are called) and the connectionists occurs in the study of the past tense of verbs. Many of the most frequently used verbs of English have an irregular past tense, while the less frequent verbs are more likely to have a regular past tense. The rule-based account of this phenomenon is simply to say that there is a list of exceptions to the regular past tense, and that one determines the past tense of a verb by seeing whether it is on this list of exceptions.
If so, one reads off the irregular past tense; if not, one forms the past tense by the regular method of adding "-ed". The catch with this is that if one is exposed to a nonsense verb which one has never met before, one will readily form the past tense, and if this new word is similar to an irregular verb, then one is more likely to give the parallel irregular past tense than the regular past tense – even though, of course, it could not appear on the list of exceptions. Human performance can be predicted by an alternative model, which is simply to form an artificial neural network whose input is the stem of a verb and whose output is the past tense of the verb. By training the network on a variety of verbs one will eventually bring it to the point where it will indeed tend to yield the regular past tense for many verbs and the correct irregular past tense for those irregular verbs to which it has been repeatedly exposed. The two points of interest here are the following. First, there is nothing in the neural network which can be seen as the explicit representation of a rule. Rather, it is simply that the pattern of synaptic weights is such that the input-output transformation will indeed add "-ed" for many input patterns. We say that the network does not implement a rule so much as it exhibits rule-like behavior. Second, as you might have expected, this manner of forming the past tense does indeed yield the psycholinguistically observed patterns of formation of irregular past tenses for many novel nonsense verbs (Daugherty & Seidenberg, 1992). In this case, after many papers back and forth, I would say that the connectionists have won the round from the formalists.

However, the connectionists have not won the war, because it is hard to imagine that the brain comes with separate boxes for each little package of grammatical knowledge, such as how to form past tenses. Thus, although the connectionists have been successful in providing artificial neural networks whose behavior provides us with substantial insights into psycholinguistic data, they as yet offer no systematic view of how it is that human languages have the structure they do. Moreover, these studies use artificial neural networks with no connection to data on the human brain. We know very well that lesions to certain parts of the human brain will often have considerable effects upon language, impairing comprehension or production, or yielding a variety of semantic or phonological anomalies. In the end, if our concern is genuinely with the brain, and not simply with the formal aspects of language, one wants to know how the cooperative computation of many brain regions, each composed of a bewildering but intriguing variety of neurons, can allow us not simply to make correct grammatical judgments but also to use language to describe the world around us, to ask questions, and to mediate the social interactions and intellectual quests and economic transactions that constitute our lives. In particular, we must find it unlikely that there was a sudden transition in evolution which magically grafted a Chomskian string-rewriting system atop a distributed network previously implicated in vision, action and memory. To the contrary, my colleague Giacomo Rizzolatti and I have recently advanced a hypothesis about how the neural mechanisms of language may have evolved from more basic mechanisms by which an animal can recognize another's actions (Rizzolatti and Arbib, 1998; Arbib, 2002).

The Brain Doing Mathematics

Whether or not you are a Platonist in your view of mathematics, you will agree that whatever mathematical system humans use must itself be represented in the brain, and so one can be intrigued by the question "What is the mathematics of the brain doing mathematics?" What happens when one thinks about category theory or algebraic topology? I am not going to develop this particular theme in any detail here, but its consideration immediately raises a number of questions. One of the most famous books about the process of mathematical creation is Jacques Hadamard’s "The Psychology of Invention in the Mathematical Field", and there he noted how much of mathematical creation does not take the form of the calculation of rigorous formulas, or the step-by-step extension of a proof of the kind that might be done today by a computer theorem-proving system. Instead, the process of mathematical creation may include musical analogies or pictures or sensory-motor experiences which suggest the relationships between different concepts. This has two interesting features. One is that the way in which we humans operate as mathematicians is not merely a reflection of the constraints of the particular mathematical system in which we work at a given time, but also reflects the underlying sensory-motor structures of our brains. Indeed, one of the strange things about being a mathematician is that one often achieves psychological certainty about a particular theorem – even if not all the parametric details – long before one can find a proof. Indeed, the formal structure of proofs often serves more as a way of verifying one’s intuitions and then communicating them to others than as a path to mathematical discovery.

The main lesson that I want to draw from this initial discussion is that the representation of mathematics in the brain will look little like a "proof engine". Rather, the neural representation of a proof engine will be a rather small part of that machinery which allows us to assimilate our mathematical understanding to our wealth of prior sensory-motor and social experience. Of course, this mathematical understanding will itself change the way in which our brain represents the world, so that for most mathematicians and physicists the notion of an infinite-dimensional space is quite "intuitive", and we can happily think of results in arbitrarily many finite dimensions as extending into Hilbert spaces or Banach spaces or some other abstraction which builds upon and then greatly enriches our sensory-motor experience.

For another example, concepts of Newtonian mechanics which were extremely hard to assimilate until well into the middle of the 20th century have become far more intuitive for those who have grown up on moon shots and orbiting satellites. The changing concepts of the world mastered by a few have led to new physical realities which in turn reshape our intuition and prepare us to better understand the new world in which we live, at the same time as we prepare ourselves to develop new scientific and technological insights. This is all very well. At some level of analysis our brains work in a way which mediates between our everyday physical and social interactions and an expanding universe of symbolic and technological insights. And yet there is a sense in which the brain itself, while changing even as you read my words, nonetheless remains the same. What, then, is the mathematics for describing the basic nature of the brain that underlies our ability to learn; what is the mathematics required to describe that process of learning; and to what extent can we find a level of description of the brain’s operation which can leave many of these details behind, to yield a level of abstraction appropriate for making contact with our language and our science and our mathematics?

Against Platonism

The Platonist view of mathematics is that mathematical objects exist in some abstract realm, awaiting their discovery by the appropriate community of mathematicians. However, my own experiences predispose me against this view. For a number of years in the 1970s, I worked with Ernie Manes to develop a categorical theory of systems, in which the input of a finite automaton or a dynamical system was represented not as a set or a linear space but as a functor which could transform the state space into a new space upon which the next-state function could operate. After some time, we turned our attention to applying similar categorical techniques to the development of a semantics for programming languages. Ernie recognized that for the set of program-building operations we were concerned with, a mathematical structure called an additive category provided the perfect framework – so long as the programs were non-deterministic. However, a great deal of computer science is of course concerned with deterministic programs. While visiting Montreal, I wrestled with the discrepancy between the intuition I had about programming in languages such as Pascal and the mathematical intuitions about additive categories I had gained from talking to Ernie Manes. In the end, I understood how to tighten part of the definition of an additive category to come up with the new notion of a partially-additive category. The initial formulation had some bugs in it, but the overall intuition was correct, and over a period of about a year we were able to come to a satisfactory axiomatization (Arbib and Manes, 1980) that allowed us to prove many theorems about program semantics in a way which, to our minds if not to the general approval of others, seemed to give us a much richer mathematical understanding of the then popular lattice-theoretic semantics which had been developed by Dana Scott. The general setting allowed us to view a number of the apparently ad hoc assumptions made in Scott’s theory as being particular consequences of our generalized insights when applied to the appropriate partially-additive categories of certain lattices.

Now a resolute Platonist might argue that the notion of a partially-additive category was there all along waiting to be discovered, but to me it seems to be a genuine invention, one that would not have been possible without the confluence of three different strands: (i) the development of category theory from the early work of Eilenberg and MacLane, which in due course yielded the particular concept of an additive category; (ii) the blending of my intuition about automata and systems with Ernie Manes’s understanding of the category-theoretic approach to algebraic structures, which led to our theory of machines in a category; and (iii) the confrontation of this body of knowledge with intuitions that were only made possible by the 20th-century development of explicit recursive descriptions of computational programming languages, and the emergence of a related body of work on their semantics, which was in turn rooted in a tradition in mathematical logic stretching back to the work of Frege at the end of the 19th century.
I may seem at this stage to be wandering very far from my stated topic of "The Many Mathematics of the Brain", but in fact I have circled back to a question of great current debate, and one in which, as noted above, I am myself actively engaged, namely the evolution of brain mechanisms for language (Rizzolatti and Arbib, 1998; Arbib, 2002). Thinking about this problem unites my earlier discussion of the way in which our mathematical activity is integrated with prior cognitive processes in our brains with the present theme of the historical invention of new mathematical structures.

If we look at languages as used by humans today, we can see that there is much in common between them. All have something like verbs to describe actions, and all have something like nouns to describe objects. Relations between actors and objects can be expressed in any language, although different languages use somewhat different grammatical tricks to do so. But the structure of languages can also vary in surprising ways. For almost all of us, the notion of an adjective seems like a totally crucial building block for language. Yet in some languages there are only a few words – like "light" and "dark", or "light" and "heavy" – that are treated in a fashion similar to the way in which English treats adjectives. Instead, in such languages, the information that we would convey by adjectives is conveyed by quite different structures, such as the equivalent of saying "This apple has a redness" (Dixon, 1997). In short, then, if linguists really try to understand the structure of the world’s languages, they will on the one hand be attracted by the search for universals, for properties of language that are exhibited by most if not all languages, while at the same time being intrigued by some major differences. If they have a cataloging state of mind, they will look for a descriptive framework which both captures the universals and catalogues the key differences, perhaps as a result coming up with several "parameters" whose values can specify the broad properties of the grammar of a specific language.

So far, so good, and Noam Chomsky achieved the status of perhaps the greatest linguist of the 20th century by his attempt to systematically create a framework in which a vast range of grammatical properties of language could be described. This aspect of his work is not without its critics, especially since the grammatical framework he offers has shifted dramatically on several occasions. But here I want to speak specifically of his claim that this "Universal Grammar", which is able to describe both the universals and the variations of the grammars of different languages, is not merely a creation of the linguist, but is actually embedded within the structure of the human brain, and is perhaps the essential element of what makes us human. The process by which a child learns the grammar of its language is, on this view, not really learning so much as "parameter setting": the child quickly recognizes, it is claimed, whether subjects do or do not precede verbs, for example, in the language that it hears, and so – "click" – another parameter is set, as the Universal Grammar embedded in the child’s brain becomes more and more specialized until it conforms with the grammar of the child’s language community.

On this account, the possibility of having a language with a very small stock of adjectives or an open-ended stock of adjectives was already anticipated in the evolution of the human brain, and it is this that provides a child’s ability to learn a language of either kind. To me, this seems absurd, and elsewhere I have argued at length against it. Instead, I would see language as something that has been discovered piece by piece, starting from basic forms of communication about a set of species-specific social occasions and interactions, and slowly yielding more or less coherent languages through a process of individual discovery, social diffusion and even the vulgar operation of fashion. These languages could themselves develop in ways which vary from community to community, just as we have seen Latin flower into the wide range of Romance languages that we know today. In this view, I would ask what it is that makes the brain "language-ready": what it is that provides us the ability to learn how to name things and actions and relationships and – in rare cases – to come up with new words for new concepts of our own. From this point of view, I would even argue that mathematics is in some sense a continuation of language, as some of us have learned to think thoughts that cannot be captured within the particular lexicon or semantics of our native languages, and so have learned, or even helped create, new notions with new expressive power. I think of the invention of matrix notation in the 19th century, or of set and arrow notation in the 20th, as greatly increasing our power to produce complex sentences.

References

Albus, J. S., 1971, A theory of cerebellar function, Mathematical Biosciences, 10:25-61.
Amari, S., and Arbib, M. A., 1977, Competition and cooperation in neural nets, in Systems Neuroscience (J. Metzler, Ed.), Academic Press, pp. 119-165.
Amari, S., and Nagaoka, H., 2000, Methods of Information Geometry, AMS and Oxford University Press, New York.
Arbib, M.A., 1964, Brains, Machines and Mathematics, McGraw-Hill: New York.
Arbib, M.A., 1969, Theories of Abstract Automata, Prentice-Hall: Englewood Cliffs, N.J.
Arbib, M.A., 1972, The Metaphorical Brain: An Introduction to Cybernetics as Artificial Intelligence and Brain Theory, Wiley-Interscience: New York.
Arbib, M.A., 1987, Brains, Machines and Mathematics, Second Edition, Springer-Verlag: New York.
Arbib, M.A., 1989, The Metaphorical Brain 2: Neural Networks and Beyond, Wiley-Interscience.
Arbib, M.A., 2000, Warren McCulloch's Search for the Logic of the Nervous System, Perspectives in Biology and Medicine, 43:193-216.
Arbib, M.A., 2001a, NeuroInformatics: The Issues, in Computing the Brain: A Guide to Neuroinformatics (M.A. Arbib and J.S. Grethe, Eds.), San Diego: Academic Press, pp. 3-28.
Arbib, M.A., 2001b, Modeling the Brain, in Computing the Brain: A Guide to Neuroinformatics (M.A. Arbib and J.S. Grethe, Eds.), San Diego: Academic Press, pp. 43-69.
Arbib, M.A., 2002, The Mirror System, Imitation, and the Evolution of Language, in Imitation in Animals and Artifacts (Chrystopher Nehaniv and Kerstin Dautenhahn, Eds.), The MIT Press, to appear.

Arbib, M.A., and Didday, R.L., 1971, The organization of action-oriented memory for a perceiving system. I. The basic model, J. Cybernet., 1:3-18.
Arbib, M.A., and House, D.H., 1987, Depth and Detours: An Essay on Visually-Guided Behavior, in Vision, Brain, and Cooperative Computation (M.A. Arbib and A.R. Hanson, Eds.), pp. 129-163, A Bradford Book/The MIT Press, Cambridge, MA.
Arbib, M.A., and Ewert, J.-P., Eds., 1991, Visual Structures and Integrated Functions, Research Notes in Neural Computing 3, Springer-Verlag.
Arbib, M.A., and Manes, E.G., 1980, Partially Additive Categories and Flow-Diagram Semantics, J. Algebra, 62:203-227.
Arbib, M.A., Bischoff, A., Fagg, A.H., and Grafton, S.T., 1994, Synthetic PET: Analyzing Large-Scale Properties of Neural Networks, Human Brain Mapping, 2:225-233.
Arbib, M.A., Érdi, P., and Szentágothai, J., 1998, Neural Organization: Structure, Function, and Dynamics, Cambridge, MA: The MIT Press.
Arbib, M.A., Schweighofer, N., and Thach, W.T., 1995, Modeling the Cerebellum: From Adaptation to Coordination, in Motor Control and Sensory-Motor Integration: Issues and Directions (D.J. Glencross and J.P. Piek, Eds.), Amsterdam: North-Holland Elsevier Science, pp. 11-36.
Bishop, C.M., 1995, Neural Networks for Pattern Recognition, Oxford University Press.
Bliss, T.V.P., and Lømo, T., 1973, Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path, J. Physiol., 232:331-356.
Butler, A.B., and Hodos, W., 1996, Comparative Vertebrate Neuroanatomy: Evolution and Adaptation, John Wiley & Sons, New York.
Chomsky, N., 1956, Three Models for the Description of Language, I.R.E. Transactions on Information Theory, IT-2:113-124.
Chomsky, N., 1957, Syntactic Structures, The Hague: Mouton.
Chomsky, N., 1965, Aspects of the Theory of Syntax, Cambridge, MA: The MIT Press.
Chomsky, N., 1981, Lectures on Government and Binding: The Pisa Lectures, Foris, Dordrecht.
Chomsky, N., 1995, The Minimalist Program, Cambridge, MA: The MIT Press.
Cohen, M., and Grossberg, S., 1983, Absolute stability of global pattern formation and parallel memory storage by competitive neural networks, IEEE Trans. on Systems, Man and Cybernetics, 13:815-826.
Collins, J.J., and Stewart, I.N., 1993, Coupled nonlinear oscillators and the symmetries of animal gaits, J. Nonlin. Sci., 3:349-392.
Constantine-Paton, M., and Law, M.I., 1978, Eye-specific termination bands in the tecta of three-eyed frogs, Science, 202:639-641.
Corbacho, F.J., and Arbib, M.A., 1995, Learning to Detour, Adaptive Behavior, 4:419-468.
Daugherty, K., and Seidenberg, M.S., 1992, Rules or connections? The past tense revisited, Proceedings of the 14th Annual Meeting of the Cognitive Science Society, Hillsdale, NJ: Erlbaum.
Dev, P., 1975, Perception of Depth Surfaces in Random-dot Stereograms: A Neural Model, Int. J. Man-Machine Studies, 7:511-528.
Didday, R.L., 1970, The Simulation and Modeling of Distributed Information Processing in the Frog Visual System, Ph.D. Thesis, Stanford University.

Didday, R.L., 1976, A model of visuomotor mechanisms in the frog optic tectum, Math. Biosci. 30: 169-180. Dixon, R.M.W., 1997, The Rise and Fall of Languages, Cambridge: Cambridge University Press. Faure, P. & Korn, H., 1997 A nonrandom dynamic component in the synaptic noise of a central neuron. Proc. Nat. Aca. Sc. USA, 94:6506-6511. Fukson, O.I., Berkinblit, M.B., and Feldman, A.G. (1986). The spinal frog takes into account the scheme of its body during the wiping reflex, Science, 209, 1261-1263. Hebb, D.O., 1949, The Organization of Behavior, John Wiley & Sons. Hertz, J., Krogh. A., and Palmer, R.G., 1991, Introduction to the Theory of Neural Computation, Santa Fe Institute Studies in the Sciences of Complexity, Addison-Wesley. Hinton, G.E., Sejnowski, T.J., and Ackley, D.H., 1984, Boltzmann Machines: Constraint Satisfaction Networks that Learn, Cognitive Science, 9:147-169. Hodgkin, A.L., and Huxley, A.F., 1952, A Quantitative Description of Membrane Current and its Application to Conduction and Excitation in Nerve, J.Physiol. London, 117:500- 544. Hopfield, J., 1982, Neural Networks and Physical Systems with Emergent Collective Computational Properties, Proc. Nat. Acad. Sci., USA, 79:2554-2558. Hubel, D.H. and T.N. Wiesel, 1977, Functional Architecture of Macaque Monkey Cortex, Proc. Royal Society of London B, 198:1-59. Hubel, D.H. and Wiesel, T.N., 1962, Receptive fields, binocular and functional architecture in the cat's visual cortex. J. Physiol., 160: 106-154. Ising, E., 1925, Z. Physik, 31:253. Ito, M., 1982, Mechanisms of motor learning. In: Competition and Cooperation in Neural Nets (Amari, S. and Arbib, M.A., Eds.), Lecture Notes in Biomathematics, Springer- Verlag. Kaas, J., 1993, Evolution of multiple areas and modules within the cortex. Perspectives on Developmental Neurobiology, 1:101-107. Koch, C., 1989, Seeing chips: Analog VLSI circuits for computer vision, Neural Computation, 1:184-200. Koch, C., and Segev, I., 1998, Methods in Neuronal Modeling: From Ions to Networks, 2nd Edition, Cambridge, MA: The MIT Press. Krubitzer, L., 1998, Constructing the Neocortex: influences on the pattern of organization in mammals, in Brain and Mind: Evolutionary Perspectives, (M.S. Gazzaniga and J. S. Altman, Eds.), Strasbourg: HFSP, pp.19-34. Lettvin, J. Y., Maturana, H., McCulloch, W. S. and Pitts, W. H., 1959, What the frog's eye tells the frog brain, Proc. IRE. 47: 1940-1951. LeVay, S., Hubel, D.H., and Wiesel T.N., 1975, The pattern of ocular dominance columns in macaque revealed by a reduced silver stain, J. Comp. Neurol.159:559-576. Littlewood, J.E., 1953, A Mathematician's Miscellany, London: Methuen and Co. Ltd. Manes, E. G., and Arbib, M.A., 1986, Algebraic Approaches to Program Semantics, Springer-Verlag. Marr, D. and Poggio, T., 1977, Cooperative computation of stereo disparity, Science 194:283- 287. Marr, D. and Poggio, T., 1979, A computational theory of human stereopsis, Proc. Roy. Soc. London Ser. B, 204:301-328. Doing Mathematics about and with the Brain 55

Marr, D., 1969, A theory of cerebellar cortex. J. Physiol (London) 202: 437. McCulloch, W.S., 1961, What is a number that a man may know it, and a man that he may know a number?; reprinted in W.S. McCulloch, 1965, Embodiments of Mind, Cambridge MA: The MIT Press. McCulloch, W.S., and Pitts, W.H., 1943, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5: 115-133; reprinted in W.S. McCulloch, 1965, Embodiments of Mind, Cambridge MA: The MIT Press. Merzenich, M.M., Recanzone, G.H., Jenkins, W.M., and Nudo, R.J., 1991, How the brain functionally rewires itself, , in Natural and Artificial Computation (M.A. Arbib and J.A. Robinson, Eds.), The MIT Press, pp.177-210. Minsky, M.L., and Papert, S., 1969, Perceptrons, An Essay in Computational Geometry, The MIT Press. Nicod, J, 1970, Geometry and Induction, (Jean Bell and Michael Woods, Translators), University of California Press, Berkeley, CA. Onsager, L., 1944, Phys. Rev., 65:117. Piaget, J., 1971, Biology and Knowledge, Edinburgh University Press. Pitts, W.H., and McCulloch, W.S., 1947, How we know universals, the perception of auditory and visual forms. Bull. Math. Biophys., 9:127-147. Poggio, T., Torre, V., and Koch, C., 1985, Computational vision and regularization theory, Nature, 317:314-319. Rizzolatti, G, and Arbib, M.A., 1998, Language Within Our Grasp, Trends in Neuroscience, 21(5):188-194. Room, T. G., 1967, A Background (Natural, Synthetic and Algebraic) to Geometry, Cambridge University Press, Cambridge. Rosenblatt, F., 1958, The perceptron: A probabilistic model for information storage and organization in the brain, Psychol. Rev., 65:386-408. Shannon, C.E., 1948, The mathematical theory of communication, Bell System Tech. J., 27:379-423,623-656. (Reprinted with an introductory essay by W. Weaver as C.E. Shannon and W. Weaver, 1949, Mathematical Theory of Communication, University of Illinois Press.) Sutton, R. S., and Barto, A. G., 1981, Toward a modern theory of adaptive networks: Expectation and prediction. Psychol. Rev. 88: 135-170. Traub, R.D. and Miles, R., 1991, Neuronal Networks of the Hippocampus, Cambridge Univ. Press. Turing, A. M., 1952, The chemical basis of morphogenesis, Phil. Trans. Roy. Soc. Lond., B237: 37-72. Turing, A.M., 1936, On Computable Numbers with an Application to the Entscheidungsproblem. Proc. London Math. Soc. Ser. 2, 42: 230-265. Wallen, P., Williams T.L., 1984, Fictive locomotion in the lamprey spinal cord in vitro compared with swimming in the intact and spinal animal, J. Physiol. 347:225-239. Wiener, N., 1948, Cybernetics: or Control and Communication in the Animal and the Machine, The Technology Press and Wiley (Second Edition, The MIT Press, 1961). Winograd, S., and Cowan, J.D., 1963, Reliable Computation in the Presence of Noise, The MIT Press.

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 57-132 © 2010 Nova Science Publishers, Inc.

Chapter 4

PHYSICS OF LIFE FROM FIRST PRINCIPLES

Michail Zak
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109

Abstract

The objective of this work is to extend the First Principles of Newtonian mechanics to include modeling of the behavior of Livings. One of the most fundamental problems associated with modeling life is to understand the mechanism of the progressive evolution of complexity typical for living systems. It has recently been recognized that the evolution of living systems is progressive in the sense that it is directed to the highest levels of complexity, if complexity is measured by an irreducible number of different parts that interact in a well-regulated fashion. Such a property is not consistent with the behavior of isolated Newtonian systems, which cannot increase their complexity without external forces. Indeed, the solutions to the models based upon dissipative Newtonian dynamics eventually approach attractors where the evolution stops, while these attractors dwell on subspaces of lower dimensionality, and therefore, of lower complexity. If thermal forces are added to mechanical ones, Newtonian dynamics is extended to Langevin dynamics combining both mechanical and thermodynamic effects; it is represented by stochastic differential equations that can be utilized for more advanced models in which randomness stands for the multi-choice patterns of behavior typical for living systems. However, even those models do not capture the main property of living systems, i.e. their ability to evolve towards an increase of complexity without external forces. Indeed, the Langevin dynamics is complemented by the corresponding diffusion equation that describes the evolution of the distribution of the probability density over the state variables; in the case of an isolated system, the entropy of the probability density cannot decrease, and that expresses the second law of thermodynamics. From the viewpoint of complexity, this means that the state variables of the underlying system eventually start behaving in a uniform fashion, with fewer distinguished features, i.e. with lower complexity. Reconciliation of the evolution of life with the second law of thermodynamics is the central problem addressed in this paper. It is solved via the introduction of the First Principle for modeling the behavior of living systems. The structure of the model is quantum-inspired: it acquires the topology of the Madelung equation in which the quantum potential is replaced with the information potential. As a result, the model captures the most fundamental property of life: the progressive evolution, i.e. the ability to evolve from disorder to order without any external interference. The mathematical structure of the model can be obtained from the Newtonian equations of motion (representing the motor dynamics) coupled with the corresponding Liouville equation (representing the mental dynamics) via information forces. The unlimited capacity for increase of complexity is provided by the interaction of the system with its mental images via chains of reflections: What do you think I think you think…? All these specific non-Newtonian properties equip the model with levels of complexity that match the complexity of life, and that makes the model applicable to the description of the behavior of ecological, social and economic systems.

“Life is to create order in the disordered environment against the second law of thermodynamics.”

E. Schrödinger, 1945

1. Introduction

It does not take much knowledge or experience to distinguish living matter from inanimate matter in day-to-day situations. Paradoxically, there is no formal definition of life that would be free of exceptions and counter-examples. There are at least two reasons for that. Firstly, many complex physical and chemical phenomena can mimic the signatures of life so closely that special methods are required to make the distinction. Secondly, extraterrestrial life, in principle, can be composed of components which are fundamentally different from those known on Earth. Therefore, the main objective of this paper is to formulate some invariants of life in terms of the phenomenology of behavior. Modeling of life can be performed on many different levels of description. While there is no universal agreement on the definition of life, scientists generally accept that the biological manifestation of life exhibits the following phenomena (Wikipedia):

Organization - Living things are composed of one or more cells, which are the basic units of life.

Metabolism - Metabolism produces energy by converting nonliving material into cellular components (synthesis) and decomposing organic matter (catabolism). Living things require energy to maintain internal organization (homeostasis) and to produce the other phenomena associated with life.

Growth - Growth results from a higher rate of synthesis than catabolism. A growing organism increases in size in all of its parts, rather than simply accumulating matter. The particular species begins to multiply and expand as the evolution continues to flourish.

Adaptation - Adaptation is the accommodation of a living organism to its environment. It is fundamental to the process of evolution and is determined by the organism's heredity as well as by the composition of metabolized substances and external factors present.

Response to stimuli - A response can take many forms, from the contraction of a unicellular organism when touched to complex reactions involving all the senses of higher animals. A response is often expressed by motion: the leaves of a plant turning toward the sun or an animal chasing its prey.

Reproduction - The division of one cell to form two new cells is reproduction. Usually the term is applied to the production of a new individual (asexually, from a single parent organism, or sexually, from at least two differing parent organisms), although strictly speaking it also describes the production of new cells in the process of growth.

In this paper, we will address only one aspect of life: a biosignature, i.e. the mechanical invariants of life, and in particular, the geometry and kinematics of the behavior of Livings, disregarding other aspects of life. By narrowing the problem in this way, we will be able to extend the mathematical formalism of physics' First Principles to include a description of the behavior of Livings. In order to illustrate the last statement, consider the following situation. Suppose that we are observing the trajectories of several particles: some of them physical (for instance, performing a Brownian motion), and others biological (for instance, bacteria), Figure 1. Is it possible, based only upon the kinematics of the observed trajectories, to find out which particle is alive? The test for the proposed model is to produce the correct answer.

Figure 1. Which particle is alive?

Thus, the objective of this paper is to introduce a dynamical formalism describing the behavior of Livings. All the previous attempts to develop models for so-called active systems (i.e., systems that possess a certain degree of autonomy from the environment that allows them to perform motions that are not directly controlled from outside) have been based upon the principles of Newtonian and statistical mechanics (A. S. Mikhailov, 1990). These models appear to be so general that they predict not only physical, but also some biological and economic, as well as social patterns of behavior, exploiting such fundamental properties of nonlinear dynamics as attractors. Notwithstanding the indisputable successes of that approach (neural networks, distributed active systems, etc.), there is still a fundamental limitation that characterizes these models on a dynamical level of description: they propose no difference between a solar system, a swarm of insects, and a stock market. Such a phenomenological reductionism is incompatible with the first principle of progressive biological evolution associated with Darwin (I. Prigogine, 1980; H. Haken, 1988). According to this principle, the evolution of living systems is directed toward the highest levels of complexity if the complexity is measured by an irreducible number of different parts which interact in a well-regulated fashion (although in some particular cases deviations from this general tendency are possible). At the same time, the solutions to the models based upon dissipative Newtonian dynamics eventually approach attractors where the evolution stops, while these attractors dwell on subspaces of lower dimensionality, and therefore, of lower complexity (until a "master" reprograms the model). Therefore, such models fail to provide an autonomous progressive evolution of living systems (i.e. evolution leading to an increase of complexity), Figure 2. Let us now extend the dynamical picture to include thermal forces. That will correspond to the stochastic extension of Newtonian models, while the Liouville equation will extend to the so-called Fokker-Planck equation that includes thermal force effects through the diffusion term. Actually, it is a well-established fact that the evolution of life has a diffusion-based stochastic nature as a result of the multi-choice character of the behavior of living systems. Such an extended thermodynamics-based approach is more relevant to modeling living systems, and therefore, the simplest living species must obey the second law of thermodynamics as physical particles do. However, then the evolution of living systems (during periods of their isolation) will be regressive, since their entropy will increase (I. Prigogine, 1961), Figure 3. As pointed out by R. Gordon (1999), a stochastic motion describing physical systems does not have a sense of direction, and therefore, it cannot describe a progressive evolution. As an escape from this paradox, Gordon proposed a concept of differentiating waves (represented by traveling waves of chemical concentration or mechanical deformation) which are asymmetric by their nature, and this asymmetry creates a sense of direction toward progressive evolution. Although the concept of differentiating waves itself seems convincing, it raises several questions to be answered: Who or what arranges the asymmetry of the differentiating waves in the "right" direction? How can this formalism be incorporated into statistical mechanics so as to provide progressive evolution without a violation of the second law of thermodynamics? Thus, although the stochastic extension of Newtonian models can be arranged in many different ways (for instance, via relaxation of the Lipschitz conditions (M. Zak, 1992), or by means of opening escape routes from the attractors), the progressive evolution of living systems cannot be provided.

Figure 2. Dissipative dynamics.

The limitations discussed above have been addressed in several publications in which the authors were seeking a "border line" between living and non-living systems. It is worth noticing that one of the "most obvious" distinctive properties of living systems, namely, their intentionality, can be formally disqualified by simple counter-examples; indeed, any mechanical (non-living) system has an "objective" to minimize action (the Hamilton principle), just as any isolated diffusion-based stochastic (non-living) system has an "objective" to maximize the entropy production ("the Jaynes principle," H. Haken, 1988). The departure from Newtonian models via the introduction of dynamics with expectations and feedback from the future has been proposed by B. Huberman and his associates (B. Huberman, 1988). However, despite the fact that the non-Newtonian nature of living systems was captured correctly in these works, there is no global analytical model that would unify the evolution of the agent's state variables and their probabilistic characteristics such as expectations, self-images, etc.

Figure 3. Dynamics with diffusion.

Remaining within the framework of dynamical formalism, and based only upon the kinematics of the particle, we will associate life with the following inequality, which must hold during at least some time interval:

\[
\frac{dH}{dt} < 0 \qquad (1)
\]
where

\[
H(t) = -\int_{-\infty}^{\infty} \rho(V,t)\,\ln \rho(V,t)\, dV \qquad (1a)
\]

Here H is the entropy of the particle, V is the particle velocity, and ρ is the probability density characterizing the velocity distribution. Obviously, the condition (1) is only sufficient, but not necessary, since even a living particle may choose not to exercise its privilege to decrease disorder. It seems unreasonable to introduce completely new principles for Livings' behavior, since Livings belong to the Newtonian world: they obey the First Principles of Newtonian mechanics, although these Principles are necessary but not sufficient: they should be complemented by additional statements linked to the Second Law of thermodynamics and enforcing Eq. (1). One of the main objectives of this paper is to extend the First Principles of classical physics to include phenomenological behavior of living systems, i.e. to develop a new mathematical formalism within the framework of classical dynamics that would allow one to capture the specific properties of natural or artificial living systems, such as formation of the collective mind based upon abstract images of the selves and non-selves, exploitation of this collective mind for communications and predictions of future expected characteristics of evolution, as well as for making decisions and implementing the corresponding corrections if the expected scenario is different from the originally planned one. The approach is based upon our previous publications (M. Zak, 1999a, 2003, 2004, 2005a, 2006a, 2007a, 2007b and 2007c) that postulate that even a primitive living species possesses additional non-Newtonian properties which are not included in the laws of Newtonian or statistical mechanics. These properties follow from a privileged ability of living systems to possess a self-image (a concept introduced in psychology) and to interact with it. The proposed mathematical formalism is quantum-inspired: it is based upon coupling the classical dynamical system representing the motor dynamics with the corresponding Liouville equation describing the evolution of initial uncertainties in terms of the probability density and representing the mental dynamics. (Compare with the Madelung equation that couples the Hamilton-Jacobi and Liouville equations via the quantum potential.) The coupling is implemented by information-based supervising forces that can be associated with self-awareness. These forces fundamentally change the pattern of the probability evolution and thereby lead to a major departure of the behavior of living systems from the patterns of both Newtonian and statistical mechanics. Further extension, analysis, interpretation, and application of this approach to complexity in Livings and emergent intelligence will be addressed in this paper. It should be stressed that the proposed model is supposed to capture the signature of life on the phenomenological level, i.e., based only upon behavior, and therefore it will not include the bio-chemical machinery of metabolism. Such a limitation will not prevent one from using this model for developing artificial living systems as well as for studying some general properties of the behavior of natural living systems. Although the proposed model is supposed to be applicable to both open and isolated autonomous systems, the attention will be concentrated upon the latter, since such properties of Livings as free will, prediction of the future, decision-making abilities, and especially the phenomenology of mind, become more transparent there.
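As an aside for readers who want to experiment, the criterion (1) suggests a simple numerical test for the puzzle of Figure 1. The sketch below is a minimal illustration rather than the author's algorithm: it estimates the entropy (1a) of the velocity distribution of an observed ensemble over successive time windows and flags a sustained decrease. The histogram estimator, the window stride, and the tolerance are illustrative assumptions.

import numpy as np

def velocity_entropy(v_samples, bins=50):
    """Histogram estimate of H = -sum p ln p for the velocity distribution (1a)."""
    counts, _ = np.histogram(v_samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def looks_alive(trajectories, dt=1.0, stride=100, tol=1e-2):
    """Flag an ensemble whose velocity-distribution entropy decreases over
    some interval, i.e. satisfies dH/dt < 0 as in inequality (1).
    trajectories: array (n_particles, n_steps) of observed positions."""
    v = np.diff(trajectories, axis=1) / dt            # finite-difference velocities
    H = np.array([velocity_entropy(v[:, t]) for t in range(0, v.shape[1], stride)])
    return bool(np.any(np.diff(H) < -tol))            # sustained decrease found?

For a diffusing physical particle the estimated entropy grows monotonically, so the flag stays False; only an ensemble whose density is being actively sharpened would trigger it.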
It should be emphasized that the objective of the proposed approach is not to outperform alternative approaches to each particular problem (such approaches not only exist, but may even be more advanced and efficient), but rather to develop a general strategy (by extending the First Principles of physics) that would be the starting point for any particular problem. The importance of the general strategy can be illustrated by the following example-puzzle: suppose that a picture is broken into many small pieces that are mixed up; in order to reconstruct this picture efficiently, one has to know how the picture should look; otherwise the problem becomes combinatorial and, practically, unsolvable. This puzzle is directly related to the top-down approach to the Physics of Life. The paper presents a review of the author's publications (M. Zak, 1999a, 2003, 2004, 2005a, 2006a, 2006b, and 2006c).

2. Dynamics with Liouville Feedback

A. Destabilizing Effect of Liouville Feedback

We will start with the derivation of an auxiliary result that illuminates the departure from Newtonian dynamics. For mathematical clarity, we will consider here a one-dimensional motion of a unit mass under the action of a force f depending upon the velocity v and time t:


\[
\dot{v} = f(v,t) \qquad (2)
\]

If initial conditions are not deterministic, and their probability density is given in the form

\[
\rho(V, t=0) = \rho_0(V), \qquad \int_{-\infty}^{\infty} \rho_0(V)\, dV = 1 \qquad (3)
\]
while ρ is a single-valued function, then the evolution of this density is expressed by the corresponding Liouville equation

\[
\frac{\partial \rho}{\partial t} + \frac{\partial}{\partial V}(\rho f) = 0 \qquad (4)
\]

The solution of this equation subject to the initial conditions and the normalization constraint (3) determines the probability density as a function of V and t: ρ(V,t). In order to deal with the constraint (3), let us integrate Eq. (4) over the whole space, assuming that ρ → 0 at |V| → ∞ and |f| < ∞. Then

\[
\frac{d}{dt}\int_{-\infty}^{\infty} \rho\, dV = -\int_{-\infty}^{\infty} \frac{\partial}{\partial V}(\rho f)\, dV = 0 \qquad (5)
\]

Hence, the constraint (3) is satisfied for t > 0 if it is satisfied for t = 0. Let us now specify the force f as a feedback from the Liouville equation

\[
f = \varphi[\rho(v,t)] \qquad (6)
\]
and analyze the motion after substituting the force (6) into Eq. (2):

\[
\dot{v} = \varphi[\rho(v,t)] \qquad (7)
\]

This is a fundamental step in our approach. Although the theory of ODEs does not impose any restrictions upon the force as a function of the space coordinates, Newtonian physics does: equations of motion are never coupled with the corresponding Liouville equation. Moreover, it can be shown that such a coupling leads to non-Newtonian properties of the underlying model. Indeed, substituting the force from Eq. (6) into Eq. (4), one arrives at the nonlinear equation for the evolution of the probability density


\[
\frac{\partial \rho}{\partial t} + \frac{\partial}{\partial V}\big\{\rho\,\varphi[\rho(V,t)]\big\} = 0 \qquad (8)
\]

Let us now demonstrate the destabilizing effect of the feedback (6). For that purpose, it should be noted that the derivative ∂ρ/∂v must change its sign at least once within the interval -∞ < v < ∞ in order to satisfy the normalization constraint (3). But since

\[
\frac{\partial \dot{v}}{\partial v} = \frac{d\varphi}{d\rho}\,\frac{\partial \rho}{\partial v} \qquad (9)
\]

there will be regions of v where the motion is unstable, and this instability generates randomness with a probability distribution guided by the Liouville equation (8). It should be noticed that the condition (9) may lead to exponential or polynomial growth of v (in the latter case the motion is called neutrally stable; however, as will be shown below, it causes the emergence of randomness as well if, prior to the polynomial growth, the Lipschitz condition is violated).

B. Emergence of Randomness

In order to illustrate mathematical aspects of the concepts of Liouville feedback, as well as associated with it instability and randomness let us take the feedback (6) in the form

\[
f = -\alpha^2 \frac{\partial}{\partial v} \ln \rho \qquad (10)
\]
to obtain the following equation of motion

\[
\dot{v} = -\alpha^2 \frac{\partial}{\partial v} \ln \rho(v,t) \qquad (11)
\]

This equation should be complemented by the corresponding Liouville equation (in this particular case, the Liouville equation takes the form of the Fokker-Planck equation)

\[
\frac{\partial \rho}{\partial t} = \alpha^2 \frac{\partial^2 \rho}{\partial V^2} \qquad (12)
\]

Here v stands for the particle velocity, and α² is the constant diffusion coefficient. The solution of Eq. (12) subject to the sharp (delta-function) initial condition is

\[
\rho(V,t) = \frac{1}{2\alpha\sqrt{\pi t}} \exp\left(-\frac{V^2}{4\alpha^2 t}\right) \qquad (13)
\]

Substituting this solution into Eq. (11) at V = v, one arrives at a differential equation with respect to v(t)

\[
\dot{v} = \frac{v}{2t} \qquad (14)
\]
and therefore,

\[
v = C\sqrt{t} \qquad (15)
\]
where C is an arbitrary constant. Since v = 0 at t = 0 for any value of C, the solution (15) is consistent with the sharp initial condition for the solution (13) of the corresponding Liouville equation (12). The solution (15) describes the simplest irreversible motion: it is characterized by a "beginning of time" where all the trajectories intersect (this results from the violation of the Lipschitz condition at t = 0, Figure 4), while the backward motion obtained by replacement of t with (-t) leads to imaginary values of velocities. One can notice that the probability density (13) possesses the same properties.

Figure 4. Stochastic process and probability density.

For a fixed C, the solution (15) is unstable since

\[
\frac{\partial \dot{v}}{\partial v} = \frac{1}{2t} > 0 \qquad (16)
\]
and therefore, an initial error always grows, generating randomness. Initially, at t = 0, this growth is of infinite rate, since the Lipschitz condition at this point is violated:


\[
\frac{\partial \dot{v}}{\partial v} \to \infty \quad \text{at} \quad t \to 0 \qquad (17)
\]

This type of instability has been introduced and analyzed in (Zak, M., 1992). Considering first Eq. (15) at fixed C as a sample of the underlying stochastic process (13), and then varying C, one arrives at the whole ensemble characterizing that process (see Figure 4). One can verify, as follows from Eq. (13) (Risken, 1989), that the expectation and the variance of this process are, respectively

\[
M = 0, \qquad D = 2\alpha^2 t \qquad (18)
\]

The same results follow from the ensemble (15) at -∞ < C < ∞. Indeed, the first equality in (18) results from the symmetry of the ensemble with respect to v = 0; the second one follows from the fact that

\[
D = M[v^2] = t\, M[C^2] = 2\alpha^2 t \qquad (19)
\]

It is interesting to notice that the stochastic process (15) is an alternative to the following Langevin equation (Risken, 1989)

\[
\dot{v} = \Gamma(t) \qquad (19a)
\]
that corresponds to the same Fokker-Planck equation (12). Here Γ(t) is the Langevin (random) force with zero mean and a constant strength matching the diffusion coefficient α². The results described in this sub-section can be generalized to the n-dimensional case (Zak, M., 2007b).
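The chain (13)-(18) can be checked numerically. The sketch below is a minimal illustration: it samples the random constant C of the family (15); reproducing the variance D = 2α²t of (18) requires M[C²] = 2α², and the Gaussian choice for the distribution of C is an assumption made here for concreteness, not something the text specifies.

import numpy as np

alpha = 0.5
t = np.linspace(1e-6, 2.0, 400)   # start just above t = 0, where the Lipschitz condition fails

# Each sample of the process is v = C*sqrt(t), with C frozen at the instability
# "explosion" at t = 0; matching D = 2*alpha**2*t requires <C**2> = 2*alpha**2.
rng = np.random.default_rng(1)
C = rng.normal(0.0, np.sqrt(2.0) * alpha, size=10_000)
v = C[:, None] * np.sqrt(t)[None, :]              # ensemble of smooth trajectories

print(v.mean(axis=0)[-1])                         # expectation M ~ 0, as in (18)
print(v.var(axis=0)[-1], 2 * alpha**2 * t[-1])    # variance D ~ 2*alpha^2*t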

C. Emergence of Entanglement

In order to introduce and illuminate a fundamentally new non-Newtonian phenomenon similar to quantum entanglement, let us assume that the function φ(ρ) in Eq. (6) is invertible, i.e. ρ = φ⁻¹(f). Then Eqs. (7) and (8) take the form, respectively

(7a)

(8a)


As follows from Eq. (7) with reference to the normalization constraint (3)

\[
\int_{-\infty}^{\infty} \varphi^{-1}[\dot{v}(v,t)]\, dv = 1 \qquad (20)
\]

It should be mentioned that non-Newtonian properties of solutions to Eq. (8a), such as shock waves in probability space, have been studied in (Zak, M., 2004, 2006c), Figure 24. Thus, the motions of the particles that emerged from the instability of Eq. (7a) are entangled: they must satisfy the global kinematical constraint (20). It is instructive to notice that quantum entanglement was discovered in a special class of quantum states that become entangled in the course of their simultaneous creation. A similar result can be obtained for the feedback (10). Indeed, let us rewrite Eq. (11) in the form

(21) where

(22)

Integrating Eq. (21) over the whole region -∞ < C < ∞, one arrives at the following global constraint in the form

(23)
that entangles the accelerations of different samples of the same stochastic process. The same result follows from the symmetry of the acceleration field plotted in Figure 5. It is important to notice that the Fokker-Planck equation (12) does not impose any constraints upon the corresponding Langevin equation (19a), i.e. upon the accelerations v̇. In order to demonstrate this, consider two particles driven by the same Langevin force Γ(t)

(24)

Obviously, the difference between the particles' accelerations, in general, is not zero: it represents a stationary stochastic process with zero mean and constant variance. That confirms the fact that entanglement is a fundamental non-Newtonian effect: it requires a feedback from the Liouville equation to the equations of motion. Indeed, unlike the Langevin equation, the solution to Eq. (11) has a well-organized structure: as follows from Eq. (15) and Figures 4 and 5, the initial "explosion" of instability driven by the violation of the Lipschitz condition at t = 0 distributes the motion over a family of smooth trajectories with the probability expressed by Eq. (22). Therefore, as follows from Eq. (23), the entanglement effect correlates different samples of the same stochastic process. As a result, each entangled particle can predict the motion of another entangled particle in a fully deterministic manner as soon as it detects the velocity or acceleration of that particle at least at one point; moreover, since Eqs. (11) and (12) are invariant with respect to the positions of the particles, the distance between these particles does not play any role in their entanglement.

Figure 5. Entangled accelerations.
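The deterministic-prediction property can be made concrete directly on the family (15): one detected velocity of a sample fixes its constant C, after which the entire trajectory of that sample follows with certainty. The toy sketch below illustrates this inversion; the function name is an illustrative choice.

import numpy as np

def predict_from_one_observation(v_obs, t_obs, t_query):
    """One detected velocity of a sample of the process v = C*sqrt(t) fixes its
    constant C, after which that sample's whole trajectory is deterministic."""
    C = v_obs / np.sqrt(t_obs)                 # invert Eq. (15)
    return C * np.sqrt(np.asarray(t_query, dtype=float))

# having detected v = 0.7 at t = 1.0 for one sample of the process:
print(predict_from_one_observation(0.7, 1.0, [0.25, 4.0, 9.0]))
# along any such sample, v * dv/dt = C**2 / 2 stays constant (cf. Section 5)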

It should be emphasized that the concept of a global constraint is one of the main attributes of Newtonian mechanics. It includes such idealizations as a rigid body, an incompressible fluid, an inextensible string and membrane, a non-slip rolling of a rigid ball over a rigid body, etc. All of these idealizations introduce geometrical or kinematical restrictions on the positions or velocities of particles and provide an "instantaneous" speed of propagation of disturbances. However, the global constraint

(25)
is fundamentally different from those listed above for two reasons. Firstly, this constraint is not an idealization, and therefore it cannot be removed by taking into account more subtle properties of matter such as elasticity, compressibility, etc. Secondly, it imposes restrictions not upon the positions or velocities of particles, but upon the probabilities of their positions or velocities, and that is where the entanglement comes from.

Continuing this brief review of global constraints, let us discuss the role of the reactions to these constraints, and in particular, let us find the analog of the reactions of global constraints in quantum mechanics. One should recall that in an incompressible fluid, the reaction to the global constraint \( \nabla \cdot \mathbf{v} \ge 0 \) (expressing non-negative divergence of the velocity \( \mathbf{v} \)) is a non-negative pressure \( p \ge 0 \); in inextensible flexible (one- or two-dimensional) bodies, the reaction to the global constraint \( g_{ij} \le g_{ij}^{0},\ i,j = 1,2 \) (expressing that the components of the metric tensor cannot exceed their initial values) is a non-negative stress tensor \( \sigma_{ij} \ge 0,\ i,j = 1,2 \). Turning to quantum mechanics and considering the uncertainty inequality

\[
\Delta x \, \Delta p \ge \frac{\hbar}{2} \qquad (26)
\]
in which Δx and Δp are the standard deviations of the coordinate and the impulse, respectively, as a global constraint, one arrives at the quantum potential \( -\frac{\hbar^{2}}{2m}\,\frac{\nabla^{2}\sqrt{\rho}}{\sqrt{\rho}} \) as a "reaction" to this constraint in the Madelung equations

\[
\frac{\partial \rho}{\partial t} + \nabla \cdot \left( \rho\, \frac{\nabla S}{m} \right) = 0 \qquad (27)
\]

\[
\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + U - \frac{\hbar^2}{2m}\,\frac{\nabla^2 \sqrt{\rho}}{\sqrt{\rho}} = 0 \qquad (28)
\]

Here √ρ and S are the components of the wave function ψ = √ρ e^{iS/ℏ}, and ℏ is the Planck constant divided by 2π. But since Eq. (27) is actually the Liouville equation, the quantum potential represents a Liouville feedback similar to those introduced above via Eqs. (6) and (10). Due to this topological similarity with quantum mechanics, the models that belong to the same class as those represented by the system (7), (8) are expected to display properties that are closer to quantum rather than to classical ones.
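For the reader's convenience, the origin of Eqs. (27)-(28) can be spelled out; this is the standard Madelung substitution, a textbook identity rather than part of the author's derivation, with U denoting an external potential:

\[
\psi = \sqrt{\rho}\, e^{iS/\hbar}, \qquad
i\hbar\, \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m}\, \nabla^2 \psi + U \psi
\]

Separating the imaginary and real parts of the Schrödinger equation under this substitution reproduces the continuity (Liouville) equation (27) and the Hamilton-Jacobi equation (28), respectively, in which the last term of (28) plays the role of the quantum potential.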

D. Summary

A new kind of dynamics that displays non-Newtonian properties, such as self-generated randomness and entanglement of different samples of the same stochastic process, has been introduced. These novel phenomena result from a feedback from the Liouville equation to the equation of motion that is similar (but not identical) to that in quantum mechanics.

3. From Disorder to Order

A. Information Potential

Before introducing the model of Livings, we have to discuss another non-trivial property of systems with the Liouville feedback. For that purpose, consider a Newtonian particle of mass comparable to the mass of the molecules. Such a particle is subjected to a fluctuating (thermal) force qL(t) called the Langevin force. The Brownian motion (in terms of the velocity v) of that particle is described by a stochastic differential equation

\[
\dot{v} = q\, L(t) \qquad (29)
\]

Here q = const is the strength of fluctuations. The probability density ρ(V,t) of the velocity distribution is described by the corresponding Fokker-Planck equation

\[
\frac{\partial \rho}{\partial t} = q^2 \frac{\partial^2 \rho}{\partial V^2} \qquad (30)
\]

The solution to this equation that starts with the sharp initial value at V = 0

\[
\rho(V,t) = \frac{1}{2q\sqrt{\pi t}} \exp\left(-\frac{V^2}{4q^2 t}\right) \qquad (31)
\]
demonstrates a monotonic increase of entropy as a measure of disorder. Another property of this solution is its irreversibility: replacement of t with (-t) leads to imaginary values of the density ρ. It should be emphasized that Eqs. (29) and (30) are not coupled; in other words, the particle "does not know" about its own probability distribution. This fact represents a general property of Newtonian dynamics: the equations of motion (for instance, the Hamilton-Jacobi equation) are not coupled with the corresponding Liouville equation, just as the Langevin equations are not coupled with the corresponding Fokker-Planck equation. However, in quantum mechanics, the Hamilton-Jacobi equation is coupled with the Liouville equation via the quantum potential, and that constitutes the fundamental difference between the Newtonian and quantum worlds. Following the quantum formalism, let us couple Eqs. (29) and (30). For that purpose, introduce a function

\[
\Pi(V,t) = -\ln \rho(V,t) \qquad (32)
\]

This function can be called "the information potential," since its expected value M[-ln ρ] is equal to the Shannon information capacity H. (However, this potential cannot be identified with a potential energy, since it depends upon velocities rather than upon positions.) The gradient of this potential, taken with the opposite sign, can represent an information-based force F = -grad Π per unit mass:

\[
F = -\frac{\partial \Pi}{\partial v} = \frac{\partial}{\partial v} \ln \rho \qquad (33)
\]

B. Negative Diffusion

Let us apply this force to the particle in Eq. (29) and present this equation in a dimensionless form, assuming that the parameter α absorbs the characteristic velocity and the characteristic time. If one chooses α = q², then the corresponding Liouville equation (which takes the form of the Fokker-Planck equation) will change from (30) to the following

\[
\frac{\partial \rho}{\partial t} = (q^2 - \alpha)\,\frac{\partial^2 \rho}{\partial V^2} = 0 \quad \text{at} \quad \alpha = q^2 \qquad (34)
\]

Thus, the information force stops the diffusion. However, the information force can be even more effective: it can reverse the diffusion process and push the probability density back to the sharp value in finite time. Indeed, suppose that in the information potential

(35)

Then the Fokker-Planck equation takes the form

(36)

Multiplying Eq. (36) by V², then integrating it with respect to V over the whole space,

one arrives at an ODE for the variance D(t):

(37)

Thus, as a result of negative diffusion, the variance D monotonically vanishes regardless of the initial value D(0). It is interesting to note that the time T of approaching D = 0 is finite:

(38)

This terminal effect is due to the violation of the Lipschitz condition (Zak, M., 1992) at D = 0:

(39)

Let us turn to a linear version of Eq. (36)

(40)
and discuss negative diffusion in more detail. As follows from the linear equivalent of Eq. (39),

(41)

Thus, eventually the variance becomes negative, and that disqualifies Eq. (40) from being meaningful. It has been shown (Zak, M., 2005) that the initial value problem for this equation is ill-posed: its solution is not differentiable at any point. (Such ill-posedness expresses the Hadamard instability studied in (Zak, M., 1994).) Therefore, negative diffusion must be nonlinear in order to protect the variance from becoming negative (see Figure 6). One possible realization of this condition is placing a terminal attractor (Zak, M., 1992) at D = 0, as was done in Eq. (36).

Figure 6. Negative diffusion.
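The qualitative difference between the nonlinear (terminal-attractor) and linear variance laws can be seen in a few lines of numerics. Since Eqs. (36)-(37) are not reproduced here, the sketch below assumes a representative terminal-attractor law dD/dt = -k D^(1/3), chosen only because it violates the Lipschitz condition at D = 0 as described above; the exponent, the coefficient k, and the Euler scheme are illustrative assumptions.

import numpy as np

def variance_history(D0, k=1.0, dt=1e-3, n=5000, terminal=True):
    """Integrate dD/dt = -k*D**(1/3) (an assumed terminal-attractor law)
    or the linear law dD/dt = -k, and record the variance history."""
    D, out = D0, []
    for _ in range(n):
        rate = -k * max(D, 0.0) ** (1.0 / 3.0) if terminal else -k
        D = D + dt * rate
        if terminal:
            D = max(D, 0.0)        # attractor at D = 0: variance cannot go negative
        out.append(D)
    return np.array(out)

D_term = variance_history(1.0, terminal=True)    # reaches D = 0 in finite time and stays
D_lin  = variance_history(1.0, terminal=False)   # crosses into D < 0: ill-posed regime
print(D_term[-1], D_lin[-1])

For the assumed law the collapse time is finite, T = (3/2) D(0)^(2/3)/k, in the spirit of (38), while the linear law drives the variance negative in finite time, which is exactly the pathology discussed above.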

It should be emphasized that negative diffusion represents a major departure from both Newtonian mechanics and classical thermodynamics by providing a progressive evolution of complexity against the Second Law of thermodynamics, Figure 7.

Figure 7. Living system: deviation from thermodynamics.

C. Drift

One notes that Eq. (36) is driftless, i.e. its solution preserves the initial mean value of the state variable. Obviously, a drift can easily be introduced via either Newtonian or information forces. In our approach we will follow the "Law of Parsimony," or "Occam's Razor": Pluralitas non est ponenda sine necessitate (plurality should not be posited without necessity), i.e. if a drift can be generated by classical forces, it is not necessary to duplicate it with information forces, since that would divert attention from a unique capability of information forces, namely, from their role in the progressive evolution of complexity.

D. Summary

A progressive evolution of complexity has been achieved via an information potential that implements the Liouville feedback and leads to a fundamentally new nonlinear phenomenon, negative diffusion, that evolves "against the Second Law of thermodynamics."

4. Model of Livings

Consider a system of n particles as intelligent agents that interact via the information potential in the fashion described above. Then, as a direct generalization of Eqs. (33) and (36), one obtains

(42)

where αij are functions of the correlation moments Dks

(43)

(44) and

(45)

Here δ is a positive constant that relates Newtonian and information forces. It is introduced in order to keep the functions αij dimensionless. The solution to Eqs. (42) and (45)

(46)
must satisfy the constraint that is an n-dimensional generalization of the constraint D ≥ 0, namely, non-negativity of the matrix |Dij|, i.e. non-negativity of all the left-corner determinants

(47)

B. Simplified Model

Since enforcement of the constraints (47) is impractical, we will propose a simpler model at the expense of its generality, assuming that the tensors αij and Dij are co-axial, and therefore the functions (43) can be reduced to the following

(48)

where αii and Dii are the eigenvalues of the corresponding tensors. Referring Eqs. (42) and (45) to the principal coordinates, one obtains

(49)

Since Eqs. (49) and (50) have a tensor structure, they can be rewritten in an arbitrary system of coordinates using the standard rules of tensor transformations.

(50)

Let us express Eq. (50) in terms of the correlation moments: multiplying it by Vi², then using partial integration, one arrives at an n-dimensional analog of Eq. (37)

(51)

The last step is to choose such a structure of the functions (48) that would enforce the constraints (47), i.e.

(52)

The simplest (but still sufficiently general) form of the functions (48) is a neural network with terminal attractors

(53)
that reduces Eqs. (51) to the following system (Zak, M., 2006a)

(54)

Here D0 is a constant scaling coefficient of the same dimensionality as the correlation coefficients Dii, and wij are dimensionless constants representing the identity of the system. Let us now analyze the effect of the terminal attractor and, turning to Eq. (54), start with the matrix of the derivatives ∂Ḋii/∂Dii. Its diagonal elements, i.e. the eigenvalues, become infinitely negative when the variances Dii vanish, since

(55)

while the remaining terms are bounded. Therefore, due to the terminal attractor, Eq. (54) linearized with respect to zero variances has infinitely negative characteristic roots, i.e. it is infinitely stable regardless of the parameters wij. Therefore, the principal variances cannot cross zero if their initial values are positive. This provides the well-posedness of the initial value problem.

C. Invariant Formulation

Eqs. (49) and (50) can be presented in the following invariant form

(56)

(57)

D. Variation Principle

Let us concentrate upon the capability of Livings to decrease entropy (see Eq. (1)) without external forces. This privileged motion is carried out through negative diffusion, which requires negativity of the eigenvalues of the tensor α in Eqs. (56) and (57). Redefining the concept of the information potential for the n-dimensional case (see Eq. (32))

\[
\Pi(V_1, \dots, V_n, t) = -\ln \rho(V_1, \dots, V_n, t) \qquad (32a)
\]
one can rewrite Eq. (56) as

(56a)

Then, along the trajectory defined by Eq. (56a) the following inequality holds

(58)

Thus, the privileged motion of Livings monotonically maximizes the information potential along a chosen trajectory. This means that if the initial density ρ₀ < 1, and therefore Π₀ > 0 at the chosen trajectory, the density would monotonically decrease along this trajectory, i.e. ρ̇ ≤ 0. Conversely, if ρ₀ > 1, and therefore Π₀ < 0, the density would monotonically increase, i.e. ρ̇ ≥ 0. Hence, the privileged motion of Livings driven by negative diffusion monotonically decreases the flatness of the initial density distribution. Obviously, the initial choice of the trajectory is determined by Eq. (57). But the most likely trajectory is the one that passes through the maxima of the density distributions. Since that trajectory also has maxima of the information potential (32a) (compared to other trajectories), one arrives at the following principle: the most likely privileged motion of Livings delivers the global maximum to the information potential (32a) (M. Zak, 2005a).

Remark

Strictly speaking, the formulated variation principle is necessary, but not sufficient, for the derivation of the governing equations (56) and (57): this principle should be considered only as a complement to the Hamiltonian principle taken in its non-variational formulation (since the information "potential" Π depends upon velocities rather than upon coordinates):

(58a)

Here S is the action, and the second term in (58a) is the elementary work of the non-conservative information force. It is obvious that the Hamiltonian principle in the form (58a) leads to the equation of motion (56), while the inequality (58) (along with the continuity equation (57)) specifies the information force.

E. Summary

A closed system of dynamical equations governing behavior of Livings has been introduced and discussed.

5. Interpretation of the Model

A. Mathematical Viewpoint

The model is represented by a system of nonlinear ODEs (56) and a nonlinear parabolic PDE (57) coupled in a master-slave fashion: Eq. (57) is to be solved independently, prior to solving Eq. (56). The coupling is implemented by a feedback that includes the first gradient of the probability density, and that converts the first-order PDE (the Liouville equation) into a second-order PDE (the Fokker-Planck equation). Its solution, in addition to positive diffusion, can display negative diffusion as well, and that is the major departure from the classical Fokker-Planck equation, Figure 7. It has been demonstrated that negative diffusion must be nonlinear, with an attractor at zero variances, to guarantee the well-posedness of the initial value problem, and that imposes additional constraints upon the mathematical structure of the model (see Eqs. (47) and (52)). The nonlinearity is generated by a feedback from the PDE (57) to the ODE (56) (the same feedback that is responsible for the parabolicity of the PDE (57)). As a result of the nonlinearity, the solutions to the PDE can have attractors (static, periodic, or chaotic) in probability space (see Eqs. (51)). The multi-attractor limit sets allow one to introduce an extension of neural nets that can converge to a prescribed type of stochastic process in the same way in which a regular neural net converges to a prescribed deterministic attractor.

The solution to the ODE (56) represents another major departure from classical ODEs: due to the violation of the Lipschitz conditions at states where the probability density has a sharp value, the solution loses its uniqueness and becomes random. However, this randomness is controlled by the PDE (57) in such a way that each random sample occurs with the corresponding probability (see Figure 4).
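For readers who want to see the master-slave structure operationally, here is a minimal numerical sketch. It is an illustration under stated assumptions, not the author's scheme: the density is evolved first on a velocity grid (with ordinary positive diffusion as a numerically safe stand-in for Eq. (57)), and one sample of the motor dynamics (56) is then driven by the information force proportional to the gradient of ln ρ.

import numpy as np

v = np.linspace(-5.0, 5.0, 401)
dv, dt, q2, alpha = v[1] - v[0], 1e-4, 1.0, 0.5

rho = np.exp(-v**2)
rho /= rho.sum() * dv                          # normalization constraint (3)
rho_hist = [rho]
for _ in range(2000):                          # master: evolve the density first
    rho = rho + dt * q2 * np.gradient(np.gradient(rho, dv), dv)
    rho = np.clip(rho, 1e-12, None)
    rho /= rho.sum() * dv
    rho_hist.append(rho)

x = 0.3                                        # slave: one sample of the motor dynamics
for n in range(2000):
    force = alpha * np.interp(x, v, np.gradient(np.log(rho_hist[n]), dv))
    x += dt * force                            # Liouville feedback acting on the motion
print(x)                                       # the sample drifts toward the density peak

The essential point is the order of operations: the density never sees the individual sample, while the sample is continuously steered by the density, which is exactly the one-way (master-slave) coupling described above.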

B. Physical Viewpoint

The model represents a fundamental departure from both Newtonian and statistical mechanics. In particular, negative diffusion cannot occur in isolated systems without the help of the Maxwell sorting demon, which is strictly forbidden in statistical mechanics. The only conclusion to be made is that the model is non-Newtonian, although it is fully consistent with the theory of differential equations and stochastic processes. Strictly speaking, it is a matter of definition whether the model represents an isolated or an open system, since the additional energy applied via the information potential is generated by the system "itself" out of components of the probability density. In terms of the topology of its dynamical structure, the proposed model links to quantum mechanics: if the information potential is replaced by the quantum potential, the model turns into the Madelung equations that are equivalent to the Schrödinger equation (Takabayasi, T., 1953), Figure 8. It should be noticed that the information potential is introduced via the dimensionless parameter α, which is equal to the rate of decrease of the disorder (-dD/dt) (see Eq. (51)), and a new physical parameter δ of dimension [δ] = m²/sec³ describing a new physical quantity that relates the rate of decrease of disorder to the specific (information-based) force. Formally, the parameter δ introduces the information potential in the same way in which the Planck constant ℏ introduces the quantum potential. The system of ODEs (56), characterized by the velocities vi, describes a mechanical motion of the system driven by information forces. Due to the specific properties of these forces discussed above, this motion acquires properties similar to those of quantum mechanics. These properties are discussed below.

Figure 8. Classical Physics, Quantum Physics and Physics of Life.

α. Superposition

In quantum mechanics, any observable quantity corresponds to an eigenstate of a Hermitian linear operator. The linear combination of two or more eigenstates results in a superposition of two or more values of the quantity. If the quantity is measured, the projection postulate states that the state will be randomly collapsed onto one of the values in the superposition (with a probability proportional to the square of the amplitude of that eigenstate in the linear combination). Let us compare the behavior of the model of Livings from that viewpoint. As follows from Eq. (15), all the particular solutions intersect at the same point v = 0 at t = 0, and that leads to non-uniqueness of the solution due to the violation of the Lipschitz condition (see Eq. (17)). Therefore, the same initial condition v = 0 at t = 0 yields an infinite number of different solutions forming the family (15); each solution of this family appears with a certain probability guided by the corresponding Fokker-Planck equation. For instance, in the case of Eq. (15), the "winner" solution is v ≡ 0, since it passes through the maxima of the probability density (13). However, with lower probabilities, other solutions of the family (15) can appear as well. Obviously, this is a non-classical effect. Qualitatively, this property is similar to those of quantum mechanics: the system keeps all the solutions simultaneously and displays each of them "by chance," while that chance is controlled by the evolution of the probability density (12). It should be emphasized that the choice of displaying a certain solution is made by the Livings model only once, at t = 0, i.e. when it departs from the deterministic to a random state; since then, it stays with this solution as long as the Liouville feedback is present.

β. Entanglement

Quantum entanglement is a phenomenon in which the quantum states of two or more objects have to be described with reference to each other, even though the individual objects may be spatially separated. This leads to correlations between observable physical properties of the systems. For example, it is possible to prepare two particles in a single quantum state such that when one is observed to be spin-up, the other one will always be observed to be spin-down, and vice versa, this despite the fact that it is impossible to predict, according to quantum mechanics, which set of measurements will be observed. As a result, measurements performed on one system seem to be instantaneously influencing other systems entangled with it. A qualitatively similar effect has been found in the proposed model of Livings (see Eqs. (21)-(23)), which demonstrates that different realizations of the motion that emerged from the instability of Eq. (15) are entangled: they must satisfy the global kinematical constraint (23). Therefore, as follows from Eq. (23), the entanglement effect correlates different samples of the same stochastic process. It is instructive to notice again that quantum entanglement was discovered in a special class of quantum states that become entangled in the course of their simultaneous creation.

γ. Decoherence

In quantum mechanics, decoherence is the process by which quantum systems in complex environments exhibit classical behavior. It occurs when a system interacts with its environment in such a way that different portions of its wavefunction can no longer interfere with each other. Qualitatively similar effects are displayed by the proposed model of Livings. In order to illustrate that, let us turn to Eqs. (11), (12), and notice that this system makes a choice of the particular solution only once, i.e. when it departs from the deterministic to a random state; since then, it stays with this solution as long as the Liouville feedback is present (α ≠ 0). However, as soon as this feedback disappears (α = 0), the system becomes classical, i.e. fully deterministic, while the deterministic solution is a continuation of the corresponding "chosen" random solution (see Figure 9).

Figure 9. Switching between hybrid superposition and classical states.

δ. Uncertainty Principle

In quantum physics, the Heisenberg uncertainty principle states that one cannot measure the values (with arbitrary precision) of certain conjugate quantities, which are pairs of observables of a single elementary particle. These pairs include the position and the momentum. A similar (but not identical) relationship follows from Eq. (15): v·v̇ = C²/2, i.e. the product of the velocity and the acceleration is constant along a fixed trajectory (indeed, v = C√t and v̇ = C/(2√t)). In particular, at t = 0, v and v̇ cannot be defined separately.

C. Biological Viewpoint

The proposed model illuminates the "border line" between living and non-living systems. The model introduces a biological particle that, in addition to Newtonian properties, possesses the ability to process information. The probability density can be associated with the self-image of the biological particle as a member of the class to which this particle belongs, while its ability to convert the density into the information force can be associated with self-awareness (both these concepts are adopted from psychology). Continuing this line of associations, the equation of motion (such as Eqs. (56)) can be identified with a motor dynamics, while the evolution of the density (see Eq. (57)) with a mental dynamics. Actually, the mental dynamics plays the role of the Maxwell sorting demon: it rearranges the probability distribution by creating the information potential and converting it into a force that is applied to the particle. One should notice that the mental dynamics describes the evolution of a whole class of state variables (differing from each other only by initial conditions), and that can be associated with the ability to generalize, which is a privilege of living systems. Continuing our biologically inspired interpretation, it should be recalled that the second law of thermodynamics states that the entropy of an isolated system can only increase, Figure 3. This law has a clear probabilistic interpretation: increase of entropy corresponds to the passage of the system from less probable to more probable states, while the highest probability of the most disordered state (that is, the state with the highest entropy) follows from a simple combinatorial analysis. However, this statement is correct only if there is no Maxwell's sorting demon, i.e., nobody inside the system is rearranging the probability distributions. But this is precisely what the Liouville feedback is doing: it takes the probability density from Equation (57), creates functionals and functions of this density, converts them into a force, and applies this force to the equation of motion (56). As already mentioned above, because of that property of the model, the evolution of the probability density becomes nonlinear, and the entropy may decrease "against the second law of thermodynamics," Figure 7. Obviously, the last statement should not be taken literally; indeed, the proposed model captures only those aspects of living systems that are associated with their behavior, and in particular with their motor-mental dynamics, since other properties are beyond the dynamical formalism. Therefore, such physiological processes as are needed for metabolism are not included in the model. That is why this model is in a formal disagreement with the second law of thermodynamics while the living systems are not. In order to further illustrate the connection between the life-nonlife discrimination and the second law of thermodynamics, consider a small physical particle in a state of random migration due to thermal energy, and compare its diffusion, i.e. physical random walk, with a biological random walk performed by a bacterium. The fundamental difference between these two types of motion (which may be indistinguishable in physical space) can be detected in probability space: the probability density evolution of the physical particle is always linear, and it has only one attractor: a stationary stochastic process where the motion is trapped.
On the contrary, a typical probability density evolution of a biological particle is nonlinear: it can have many different attractors, but eventually each attractor can be departed from without any "help" from outside. That is how H. Berg (1983) describes the random walk of an E. coli bacterium: "If a cell can diffuse this well by working at the limit imposed by rotational Brownian movement, why does it bother to tumble? The answer is that the tumble provides the cell with a mechanism for biasing its random walk. When it swims in a spatial gradient of a chemical attractant or repellent and it happens to run in a favorable direction, the probability of tumbling is reduced. As a result, favorable runs are extended, and the cell diffuses with drift." Berg argues that the cell analyzes its sensory cue and generates the bias internally, by changing the way in which it rotates its flagella. This description demonstrates that a bacterium actually interacts with the medium, i.e., it is not isolated, and that reconciles its behavior with the second law of thermodynamics. However, since these interactions are beyond the dynamical world, they are incorporated into the proposed model via the self-supervised forces that result from the interactions of a biological particle with "itself," and that formally "violates" the second law of thermodynamics. Thus, the proposed model offers a unified description of the progressive evolution of living systems. Based upon this model, one can formulate and implement the principle of maximum increase of complexity that governs the large-time-scale evolution of living systems. It should be noticed that at this stage, our interpretation is based upon a logical extension of the proposed mathematical formalism, and is not yet corroborated by experiments.
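Berg's description of the biased random walk is easy to mimic numerically. The sketch below is a toy illustration, with all parameter values chosen for clarity rather than fitted to E. coli: runs in the favorable direction tumble less often, and the walk acquires a drift up the gradient.

import numpy as np

rng = np.random.default_rng(0)

def run_and_tumble(n_steps=20000, step=0.01):
    """1-D biased random walk in an attractant gradient increasing with x:
    runs in the favorable (+x) direction tumble less often, per Berg's
    account. All parameter values are illustrative."""
    x, direction = 0.0, 1.0
    for _ in range(n_steps):
        x += step * direction
        p_tumble = 0.05 if direction > 0 else 0.10   # favorable runs are extended
        if rng.random() < p_tumble:
            direction = rng.choice([-1.0, 1.0])      # tumble: pick a new direction
    return x

# the ensemble mean is clearly positive: diffusion with drift up the gradient
print(np.mean([run_and_tumble() for _ in range(50)]))

Note that the bias here is injected by an explicit rule tied to the environment, which is precisely the "interaction with the medium" that keeps the real bacterium consistent with the second law.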

D. Psychological Viewpoint

The proposed model can be interpreted as representing interactions of the agent with the self-image and the images of other agents via the mechanisms of self-awareness. In order to associate these basic concepts of psychology with our mathematical formalism, we have to recall that living systems can be studied in many different spaces, such as physical (or geographical) space as well as abstract (or conceptual) spaces. The latter category includes, for instance, social class space, sociometric space, social distance space, semantic space, etc. Turning to our model, one can identify two spaces: the physical space x, t in which the agent state variables vi, xi evolve (see Eqs. (56)), and an abstract space in which the probability density of the agent's state variables evolves (see Eq. (57)). The connection between these spaces has already been described earlier: if Eqs. (56) are run many times starting with the same initial conditions, one will arrive at an ensemble of different random solutions, while Eq. (57) will show the probability for each of these solutions to appear. Thus, Eq. (57) describes the general picture of evolution of the communicating agents that does not depend upon particular initial conditions. Therefore, the solution to this equation can be interpreted as the evolution of the self- and non-self images of the agents that jointly constitute the collective mind in the probability space, Figure 10. Based upon that, one can propose the following interpretation of the model of communicating agents: considering the agents as intelligent subjects, one can identify Eqs. (56) as a model simulating their motor dynamics, i.e., actual motions in physical space, and Eq. (57) as the collective mind composed of the mental dynamics of the agents. Such an interpretation is evoked by the concept of reflection in psychology (V. Lefebvre, 2001). Reflection is traditionally understood as the human ability to take the position of an observer in relation to one's own thoughts. In other words, reflection is self-awareness via the interaction with the image of the self. Hence, in terms of the phenomenological formalism proposed above, a non-living system may possess the self-image, but it is not equipped with the self-awareness, and therefore this self-image is not in use. On the contrary, in living systems the self-awareness is represented by the information forces that send information from the self-image (57) to the motor dynamics (56). Due to this property, which is well pronounced in the proposed model, an intelligent agent can run its mental dynamics ahead of real time (since the mental dynamics is fully deterministic and does not depend explicitly upon the motor dynamics) and thereby predict future expected values of its state variables; then, by interacting with the self-image via the information forces, it can change the expectations if they are not consistent with the objective. Such a self-supervised dynamics provides a major advantage for the corresponding intelligent agents, and especially for biological species: due to the ability to predict the future, they are better equipped for dealing with uncertainties, and that improves their survivability. It should be emphasized that the proposed model, strictly speaking, does not discriminate between living systems of different kinds, in the sense that all of them are characterized by a self-awareness-based feedback from mental (57) to motor (56) dynamics.
However, in primitive living systems (such as bacteria or viruses), the self-awareness is reduced to its simplest form, the self-nonself discrimination; in other words, the difference between living systems is represented by the level of complexity of that feedback.

E. Neuro-Science Viewpoint

The proposed model represents a special type of neural net. Indeed, turning to Eqs. (54) and reinterpreting the principal correlation moments Dii as neurons' mean soma potentials, one arrives at a conventional neural net formalism.

α. Classical Neural Nets

We will start with a brief review of the classical version of this kind of dynamical model in order to outline some of its advantages and limitations. The standard form of recurrent neural networks (NN) is

(59)

where xi are state variables and wij are synaptic interconnections, or weights (associated with the NN topology). The system (59) is nonlinear and dissipative due to the sigmoid function tanh. The nonlinearity and dissipativity are necessary (but not sufficient) conditions for the system (59) to have attractors. The locations of the attractors and their basins in phase (or configuration) space are prescribed by an appropriate choice of the synaptic interconnections, which can be found by solving the inverse problem (followed by a stability analysis) or by learning (a dynamical relaxation procedure based upon iterative adjustments of wij as a result of comparison of the net output with known correct answers). In both cases, wij are constants, and that is the first limitation of recurrent NN. Indeed, although the NN architecture (59) is perfectly suitable for such tasks as optimization, pattern recognition, and associative memory, i.e., when a fixed topology is an advantage, it cannot be exploited for simulation of a complex dynamical behavior that is presumably comprised of a chain of self-organizing patterns (like, for instance, in a genome), since for that kind of task a variable topology is essential. However, there is no general analytical approach to the synthesis of such NN. And now we come to the second limitation of NN (59): their architecture does not have a tensor structure. Indeed, the state variables and the interconnections cannot be considered as a vector and a tensor, respectively, since their invariants are not preserved under linear transformations of the state variables. Obviously, the cause of that is the nonlinearity in the form of the sigmoid function. That is why the dynamical system (59) (even with a fixed topology) cannot be decoupled and written in a canonical form; as a result, the main mathematical tools for NN synthesis are based upon numerical runs.
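A minimal numerical sketch may help here. Eq. (59) itself is not reproduced in the text, so the right-hand side below assumes the common dissipative form matching the description (state decay plus tanh-weighted couplings); the weights and step size are invented.

```python
# Minimal sketch of the classical recurrent net of Eq. (59), assuming the
# common dissipative form  dx_i/dt = -x_i + sum_j w_ij * tanh(x_j).
import numpy as np

rng = np.random.default_rng(1)
n = 4
W = rng.normal(size=(n, n))
W = (W + W.T) / 2.0           # symmetric weights w_ij = w_ji: static attractors

def run(x, steps=5000, dt=0.01):
    for _ in range(steps):
        x = x + dt * (-x + W @ np.tanh(x))
    return x

# different initial conditions may settle into different static attractors,
# whose locations are fixed once and for all by the constant weights
for _ in range(3):
    print(np.round(run(rng.normal(scale=2.0, size=n)), 3))
```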


β. Mystery of Mirror Neuron

The analysis of Eq. (59) can be linked to the concept of a mirror neuron. The discovery of mirror neurons in the frontal lobes of macaques and their implications for human brain evolution is one of the most important findings of neuroscience in the last decade. Mirror neurons are active when the monkeys perform certain tasks, but they also fire when the monkeys watch someone else perform the same specific task. There is evidence that a similar observation/action matching system exists in humans. In the case of humans, this phenomenon represents the concept of imitation learning, and this faculty is at the basis of human culture. Hence, a mirror neuron representing an agent A can be activated by an expected (or observed) action of an agent B which may not be in direct contact with the agent A at all. Quoting the discoverer of the mirror neuron, Dr. Giacomo Rizzolatti: “the fundamental mechanism that allows us a direct grasp of the mind of others is not conceptual reasoning, but direct simulation of the observed event through the mirror mechanism.” In other words, we do not need to think and analyze; we know immediately the intentions of other people. In terms of the mathematical formalism, such a direct grasp of the mind of others represents a typical non-locality, i.e., an “invisible” influence at a distance that is similar to quantum entanglement.

γ. Mirror Neural Nets

In order to capture this fundamentally new phenomenon, we will apply the approximated version of the dynamical formalism developed in Section 3 and modify Eqs. (49) and (53). In the simplest case, when αij = const., the solution describing the transition from the initial (sharp) density to the current density in Eq. (50) is given by the Gaussian distribution (Risken, H., 1989),

(60)

Here the Green function in matrix notation is

(61)
where the matrix elements of a and G are given by aij and Gij, respectively. If αij are not constants, for instance, if they are defined by Eqs. (53), the solution (60) is valid only for small times. (Actually, such an approximation follows from the expansion of a probability density in a Gram-Charlier series, keeping only the first term of that series.) Nevertheless, for better physical interpretation, we will stay with this approximation in our further discussions. Substituting the solution (60) into Eq. (49), one obtains

(62)

Eqs. (62) are to be complemented with Eq. (54). (In both equations, the summation is with respect to the index j.) Formally, Eq. (54) represents a classical neural net with terminal attractors where the principal variances Dii play the role of state variables. Depending upon the chosen values of the constant synaptic interconnections wij, the variances Dii can converge to static, periodic or chaotic attractors. In particular, if wij = wji, the net has only static attractors. However, regardless of the synaptic interconnections, the state Dii = 0 (i = 1, 2, …, n) is always a terminal attractor that protects the variances from crossing zeros. Eqs. (62) represent a stochastic neural net driven by its own principal variances that, in turn, are governed by the neural net (54). The connection between the nets (54) and (62) is the following: if Eqs. (62) are run many times, one will arrive at an ensemble of different random solutions, while Eq. (54) will show the probability for each of these solutions to appear. Thus, Eq. (54) describes the general picture of evolution that does not depend upon particular initial conditions. Therefore, the solution to this equation can be interpreted as the evolution of the self- and non-self images of the neurons that jointly constitute the collective mind in the probability space. Let us recall that in classical neural nets (59), each neuron receives fully deterministic information about itself and other neurons, and as a result, the whole net eventually approaches an attractor. In contradistinction to that, in the neural net (62), each neuron receives the image of the self and the images of other neurons that are stored in the joint probability distribution of the collective mind, Figure 10. These images are not deterministic: they are characterized by uncertainties described by the probability distribution, while that distribution is skewed toward the expected value of the soma potential, with variances controlled by Eq. (54). In our view, such a performance can be associated with the mirror neurons. Indeed, as mentioned above, a mirror neuron fires both when performing an action and when observing the same action performed by another subject. The way in which this neuron works is the following. It is assumed that all the communicating neurons belong to the same class in the sense that they share the same general properties and habits. This means that although each neuron may not know the exact values of the soma potentials of the rest of the neurons, it nevertheless knows at least such characteristics as their initial values (to the accuracy of the initial joint probability density or, at least, initial expected values and initial variances). This preliminary experience allows a neuron to reconstruct the evolution of the expected values of the rest of the neurons using the collective mind as a knowledge base. Hence, the neuron representing an agent A can be activated by an expected action of an agent B that may not be in direct contact with the agent A at all, and that can be associated with the mirror properties of the neuron. The qualitative behavior of the solution to the mirror-neuron-based net (62) and (54) is illustrated in Figure 11. The collective properties of mirror neurons, i.e. the mirror neural nets, have a significant advantage over the regular neural nets: they possess a fundamentally new type of attractor, the stochastic attractor, which is a very powerful generalization tool.
Indeed, it includes a much broader class of motions than static or periodic attractors. In other words, it provides the highest level of abstraction. In addition, a stochastic attractor represents the most complex patterns of behavior if the mirror net describes a set of interacting agents. Indeed, consider a swarm of insects approaching some attracting pattern. If this pattern is represented by a static or periodic attractor, the motion of the swarm is locked into a rigid pattern of behavior that may decrease its survivability. On the contrary, if that pattern is represented by a stochastic attractor, the swarm still has a lot of freedom, and only the statistical invariants of the swarm motion are locked into a certain pattern of behavior, Figure 12. It should be emphasized that, due to the multi-attractor structure, the proposed model provides the following property: if the system starts from different initial conditions, it may be trapped in a different stochastic pattern. Such a property, in principle, cannot be provided by regular neural nets or cellular automata since they can have only one stochastic attractor, Figure 13.
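A toy two-level simulation can make the architecture tangible. In the sketch below the "mental" net is a stand-in for Eq. (54) (its tanh coupling and cube-root terminal-attractor term are assumptions), and the "motor" ensemble is a stand-in for Eqs. (62): many stochastic trajectories whose noise level is dictated by the current principal variances.

```python
# Toy two-level "mirror net": a deterministic net evolves the principal
# variances D_i (a stand-in for Eq. (54)), and an ensemble of stochastic
# trajectories is driven by those variances (a stand-in for Eqs. (62)).
# All forms and coefficients are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(2)
n, m, dt = 3, 2000, 0.01
W = 2.0 * (np.ones((n, n)) - np.eye(n))  # symmetric -> static attractor
D = np.full(n, 0.5)      # principal variances (state of the mental net)
X = np.zeros((m, n))     # ensemble of soma potentials (motor level)

for _ in range(4000):
    # mental net: tanh coupling plus a cube-root terminal attractor that
    # pins D = 0 and keeps the variances from crossing zero
    D = np.clip(D + dt * (np.tanh(W @ D) - np.cbrt(D)), 0.0, None)
    # motor ensemble: each member is pulled toward the ensemble mean (its
    # "image" in the collective mind), with noise intensity set by D
    X += dt * (X.mean(axis=0) - X) + rng.normal(size=(m, n)) * np.sqrt(D * dt)

print("variances of the mental net:", np.round(D, 2))
print("ensemble variances of the motor net:", np.round(X.var(axis=0), 2))
```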

Figure 10. Collective mind.

Figure 11. Evolution of mirror neuron and its self-image.

Figure 12. Mirror neural nets.


Figure 13. Multi-attractor structure of variance.

δ. Link to Quantum Entanglement

Continuing the discussion of a formal similarity between the concept of the mirror neuron and quantum entanglement, we will emphasize again that both of these phenomena result from the global constraint originated by the Liouville feedback, namely, the quantum potential (see Eqs. (27) and (28)) and the information potential (see Eqs. (56) and (57)). In both cases, the probability density ρ enters the equation of motion, imposing the global constraint upon the state variables via the normalization condition (see Eq. (23)). However, the difference between the status of these models should be emphasized. The model of quantum entanglement is well established (at least within the Schrödinger formalism), being corroborated by an enormous number of experiments. On the other hand, the model of mirror neuron nets has been proposed in this paper. It is based upon two principles that complement each other. The first principle requires the system's capability to decrease entropy by internal effort in order to be “alive”; the second principle is supposed to provide the global constraint upon the motions of mirror neurons. Both of these principles are implemented via a specially selected feedback from the Liouville equation. This feedback is different from those in quantum mechanics; however, the topologies of quantum and mirror neuron systems are similar. Now we are ready to address the fundamental question: how does Nature implement global constraints via probability? Does it mean that there exists some sort of “universal mind”? The point is that the concept of probability does not belong to the objective physical world; it rather belongs to a human interpretation of this world. Indeed, we can observe and measure a random process, but we cannot measure directly the associated probabilities: that would require special data processing, i.e. a “human touch”. As far as quantum mechanics is concerned, this question is still unanswered. However, for Livings, the global constraint via probability can be associated with the concept of the collective mind. Indeed, based upon the assumption that all Livings which belong to the same class possess the identical abstract image of the external world, and recalling that, in terms of the proposed formalism, such an image is represented by the joint probability, one concludes that in Livings the global constraint is implemented by the mechanism of “general knowledge” stored in the collective mind and delivered to the mirror neurons via the information forces. Several paradigms of self-organization (such as transmission of conditional information, decentralized coordination, cooperative computing, and competitive games in active systems) based upon entanglement effects have been proposed in (Zak, M., 2002a).

F. Social and Economic Viewpoint

One of the basic problems of social theory is to understand “how, with the richness of language and the diversity of artifacts, people can create a dazzlingly rich variety of new yet relatively stable social structures” (M. Arbib, 1986). Within the framework of the dynamical formalism, the proposed model provides some explanations of this puzzle. Indeed, social events are driven by two factors: the individual objectives and the social constraints. The first factor is captured by the motor dynamics (56), while the social constraint is created by the collective mind (57). A balance between these factors (expressed by stochastic attractors) leads to stable social structures, while an imbalance (expressed by stochastic repellers) causes sharp transitions from one social structure to another (revolutions) or wandering between different repellers (chaos, anarchy). For an artificial “society” of communicating agents, one can assign individual objectives for each agent as well as the collective constraints imposed upon them, and study the corresponding social events by analyzing the governing equations (56) and (57). However, the same strategy is too naïve to be applied to a human society. Indeed, most humans, as members of a society, do not have rational objectives: they are driven by emotions, inflated ambitions, envy, distorted self- and nonself images, etc. At least some of these concepts can be formalized and incorporated into the model. For instance, one can consider emotions to be proportional to the differences between the state variables v and their expectations

(63)

Eq. (63) easily discriminates positive and negative emotions. Many associated concepts (anger, depression, happiness, indifference, aggressiveness, and ambition) can be derived from this definition (possibly in combination with distorted self- and nonself images). But the most accurate characteristic of human nature was captured by cellular automata, where each agent copies the behaviors of his closest neighbors (which, in turn, copy their neighbors, etc.). As a result, the whole “society” spontaneously moves toward an unknown emerging “objective”. Although this global objective is uniquely defined by a local operator that determines how an agent processes the data coming from his neighbors, no explicit connection between this local operator and the corresponding global objective is known: only actual numerical runs can detect such a connection. Notwithstanding the ingenuity of this model, one can see its major limitation: the model is not equipped with a collective mind (or any other type of knowledge base), and therefore its usefulness is significantly diminished in the case of incomplete information. At the same time, the proposed model (56) and (57) can be easily transformed into cellular automata with a collective mind. In order to do that, one has to turn to Eqs. (56), replace the sigmoid function by a local operator, and the time derivative by a time difference; then the corresponding Fokker-Planck equation (57) reduces to its discrete version, that is, a Markov chain (Zak, M., 2000). On the conceptual level, the model remains the same as discussed in the previous sections. Figure 14 illustrates a possible approach to social dynamics based upon the proposed model; a toy sketch of the discrete version follows the figure.

Figure 14. Social dynamics.
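The sketch below replaces the sigmoid by a local majority rule on a ring and lets the global empirical distribution of states play the part of the collective mind; the topology, the bias weight, and the binary states are all invented for illustration.

```python
# Toy cellular automaton "with a collective mind", sketching the discrete
# version of Eqs. (56)-(57) suggested in the text: each agent copies a
# local majority of its neighbours, and a Markov-chain "collective mind"
# (the empirical distribution of states) biases every update.
import numpy as np

rng = np.random.default_rng(3)
n, steps = 100, 200
state = rng.integers(0, 2, size=n)          # binary opinions on a ring
bias_weight = 0.2                           # pull of the collective mind

for _ in range(steps):
    left, right = np.roll(state, 1), np.roll(state, -1)
    local = (left + state + right) >= 2     # local majority rule
    # collective mind: the global empirical frequency of state 1
    p1 = state.mean()
    # each agent follows its neighbours, but with probability
    # bias_weight it samples from the collective distribution instead
    use_mind = rng.random(n) < bias_weight
    sampled = rng.random(n) < p1
    state = np.where(use_mind, sampled, local).astype(int)

print("final fraction of 1s:", state.mean())
```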

G. Language Communications Viewpoint

Language represents the best example of a communication tool with incomplete information, since any message, in general, can be interpreted in many different ways depending upon the context, i.e., upon the global information shared by the sender and the receiver. Therefore, the proposed model is supposed to be relevant for some language-oriented interpretations. Indeed, turning to Eqs. (56), one can associate the weighted sum of the state variables with the individual interpretations of the collective message made by the agents. The sigmoid functions of these sums form the individual responses of the agents to this message. These responses are completed by the information forces that compensate for the lack of information in the message by exploiting the global shared information stored in the collective mind (see Eq. (57)). The agents' responses, converted into the new values of their state variables, are transformed into the next message using the same rules, etc. These rules, determined by the structure of Eqs. (56) and (57), can be associated with the grammar of the underlying language. In particular, they are responsible for the convergence to, or the divergence from, the expected objective. It should be noticed that the language structure of the proposed model is invariant with respect to semantics. Hence, in terms of the linguistics terminology that considers three universal structural levels: sound, meaning and grammatical arrangement (Yaguello, M., 1998), we are dealing here with the last one. In our opinion, the independence of the proposed model from semantics is an advantage rather than a limitation: it allows one to study invariant properties of the language evolution in the same way in which the Shannon information (which represents rather an information capacity) allows one to study the evolution of information regardless of a particular meaning of the transmitted messages.
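A minimal sketch of such an exchange, with everything assumed for illustration (tanh responses, random interpretation weights, and a fixed shared expectation standing in for the collective mind of Eq. (57)):

```python
# Toy message-passing loop: each agent interprets the broadcast state via
# a weighted sum ("grammar"), responds through tanh, and an information
# force nudges the response toward the expectation stored in the shared
# collective mind, compensating for the incompleteness of the message.
import numpy as np

rng = np.random.default_rng(4)
n = 5
W = rng.normal(scale=0.5, size=(n, n))  # interpretation weights (assumed)
shared = np.ones(n)                     # collective-mind expectation (assumed)
k = 0.3                                 # strength of the information force
v = rng.normal(size=n)                  # agents' state variables

for _ in range(50):
    message = W @ v                     # individual interpretations
    response = np.tanh(message)         # individual responses
    v = response + k * (shared - v)     # responses completed by the force

print("final agent states:", np.round(v, 3))
```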

Figure 15. Structure of language.

Let us now try to predict the evolution of language communications based upon the proposed model. As mentioned earlier, the evolution of living systems is always directed toward the increase of their complexity. In a human society, such a progressive evolution is effectively implemented by an increase of the number of reflections in a chain “What do you think I think you think, etc.” The society may be stratified into several levels, or “clubs”, so that inside each club the people share more and more global information. This means that the language communications between the members of the same club will be characterized by an increased capacity of the collective mind (see Eq. (57)) and a decreased amount of information transmitted by the messages (see Eqs. (56)). In the theoretical limit, these messages will degenerate into a string of symbols that can be easily decoded by the enormously large collective mind. The language communications across the stratified levels will evolve in a different way: as the different clubs drift apart, the collective mind capacity will be decreasing while the messages will become longer and longer. However, the process of diffusion between these two streams (not included in our model) is very likely, see Figure 15.

H. Summary

An interpretation of the proposed dynamical model of Livings has been introduced and discussed from the viewpoints of mathematics, physics, biology, neuroscience, etc.

6. Complexity for Survival of Livings

A. Measure of Survivability

In this sub-section, we will apply the proposed model to establish a connection between complexity and survivability of Livings. We will introduce, as a measure of survivability, the strength of the random force that, being applied to a particle, nullifies the inequality (2). For better physical interpretation, it will be more convenient to represent the inequality (2) in terms of the variance D

(64)
remembering that for a normal probability density distribution

(65)
while the normal density is the first term in the Gram-Charlier series for the representation of an arbitrary probability distribution. Thus, the ability to survive (in terms of preserving the property (2) under the action of a random force) can be achieved only with the help of increased complexity. However, physical complexity is irrelevant: no matter how complex the Newtonian or Langevin dynamics is, the second law of thermodynamics will convert the inequality (2) into the opposite one. The only complexity that counts is that associated with mental dynamics. Consequently, increase of complexity of mental dynamics, and therefore complexity of the information potential, is the only way to maximize the survivability of Livings. This conclusion will be reinforced by further evidence to be discussed in the following section.

B. Mental Complexity via Reflection of Information

In this sub-section, we will show that communication between living particles via information potential increases their survivability. For that purpose, consider a system of two living particles (or intelligent agents) that interact via the simplest linear information potential in the fashion described above. We will start with the case when each particle interacts only with its own image. The motor dynamics of the system is the following

(66)

(67)

Then the mental dynamics in terms of the variance is described by two uncoupled equations

(68)

(69)

The solutions to these equations, subject to zero initial conditions, asymptotically approach the stationary values q²/a1 and q²/a2, respectively, while

(70)

Thus, non-communicating agents (66) and (67) do not expose the property of living systems (64) and behave as a physical non-living system. Let us now increase the complexity by allowing the agents to communicate:

(71)


(72)

In this case, each agent interacts with the image of another agent. Then the mental dynamics is represented by two coupled equations

(73)

(74)

This system describes harmonic oscillations induced by a step-function force originating from noise of strength q², (Zak, M., 2007a). Obviously, there are periods when

(75)
i.e., the communicating agents expose the property of life (64). It should be emphasized that no isolated physical system can oscillate in probability space, since that is forbidden by the second law of thermodynamics, Figure 13. The oscillation of variances described by Eqs. (73) and (74) represents an exchange of information between the agents, and it can be interpreted as a special type of communication: conversation. During such a conversation, information is reflected from one agent to another, back and forth. Obviously, we are dealing here with Shannon information capacity that does not include semantics. This paradigm can be generalized to the case of n communicating agents

(76)

(77)

Since the matrix of coefficients in Eq. (77) is skew-symmetric, its characteristic roots are purely imaginary, and therefore the solution to Eq. (77) is a linear combination of weighted harmonic oscillations of different frequencies. The interpretation of this solution is similar to that of the case of two agents: it describes a conversation between n agents. Indeed, the information from the i-th agent is distributed among all the other agents, then reflected and distributed again, etc. Obviously, that solution possesses the property of life in the sense of the inequalities (75), Figure 16; a numerical sketch follows the figure.

Figure 16. Complexity and survivability: non-communicating agents pass from order to disorder; communicating agents pass from order to disorder to order, etc.
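The oscillation is easy to reproduce numerically. The sketch below assumes a skew-symmetric coupled form for the variance equations (cf. Eqs. (73)-(74) and (77)); the coupling constant and noise strength are invented, and this linear toy omits the terminal attractor that, in the full model, keeps the variances from crossing zero.

```python
# Sketch of the "conversation" between two communicating agents: coupled
# variance equations with an assumed skew-symmetric coupling, so that the
# solution oscillates and dD/dt is periodically negative (property (75)).
import numpy as np

q2, z, dt, steps = 1.0, 2.0, 0.001, 20000
A = np.array([[0.0, -z],
              [z,  0.0]])     # skew-symmetric coupling: purely imaginary roots
D = np.zeros(2)               # variances of the two agents
neg = 0
for _ in range(steps):
    dD = A @ D + q2           # noise q^2 acts as a step-function force
    neg += int(dD[0] < 0)     # periods where agent 1's variance decreases
    D += dt * dD

# An isolated physical system would have dD/dt >= 0 throughout; here the
# exchange of information makes the variance fall about half of the time.
print(f"fraction of time with dD1/dt < 0: {neg/steps:.2f}")
```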

C. Image Dynamics: What Do You Think I Think You Think…

In this sub-section we will discuss a special case when additional information is needed to compensate for the agents' incomplete information about each other. We will start with the simplest model of two interacting agents, assuming that each agent is represented by an inertialess classical point evolving in physical space. That allows us to consider positions x (instead of velocities v) as state variables. We will also assume that the next future position of each agent depends only upon its own present position and the present position of his opponent. Then their evolutionary model can be represented by the following system of differential equations

(78)

(79)

We will start with the assumption that these agents belong to the same class, and therefore they know the structure of the whole system (78), (79). However, each of the agents may not know the initial condition of the other one, and therefore he cannot calculate the current value of his opponent's state variable. As a result, the agents try to reconstruct these values using the images of their opponents. This process can be associated with the concept of reflection; in psychology, reflection is defined as the ability of a person to create self-nonself images and interact with them. Let us turn first to agent 1. In his view, the system (78), (79) looks as follows

(80)

(81)


where x11 is the self-image of agent 1, x21 is agent 1's image of agent 2, and x121 is agent 1's image of agent 2's image of agent 1. This system is not closed since it includes an additional 3-index variable x121. In order to find the corresponding equation for this variable, one has to rewrite Eqs. (80), (81) in the 3-index form. However, it is easily verifiable that such a form will include 4-index variables, etc., i.e. this chain of equations will never be closed. By interchanging the indices 1 and 2 in Eqs. (80) and (81), one arrives at the system describing the view of agent 2. The situation can be generalized from two- to n-dimensional systems. It is easy to calculate that the total number of equations for the m-th level of reflection, i.e. for the m-index variables, is

(82)

The chain of reflections illustrating the paradigm: “what do you think I think you think…” is presented in Figure 17.

Figure 17. Chain of reflections.

Thus, as follows from Eq. (82), the number of equations grows exponentially with the number of levels of reflection, and it grows linearly with the dimensionality n of the original system. It should be noticed that for each m-th level of reflection, the corresponding system of equations always includes (m+1)-index variables, and therefore it is always open. Hence, for any quantitative results, this system must be supplemented by a closure, i.e. by additional equations with respect to the extra variables. In order to illustrate how this can be done, let us first reduce Eqs. (78) and (79) to the linear form

(83)

(84)


Taking the position of agent 1, we can rewrite Eqs. (83) and (84) in the form of a chain of reflections describing interactions between his images:

(85)

Here agent 1 starts with Eq. (83) and transforms it into the first equation of the system (85) by replacing agent 2's unknown state variable x2 with the predicted value x12, which describes agent 2 as viewed by agent 1. In order to get the governing equation for the new variable x12, agent 1 transforms Eq. (84) into the second equation of the system (85) by replacing x2 with x12 and x1 with x121, which describes agent 1's view of agent 2's view of agent 1. This process is endless: each new equation includes a new variable that requires another governing equation, etc. Hence, for any quantitative results, this system must be supplemented by a closure, i.e. by additional equations with respect to the extra variables. However, before discussing the closure, we will simplify the indexing of the variables in Eqs. (85) in the following way:

(86)

Obviously, the variable yi describes the prediction after i reflections. Here we have assumed that agent 1 has complete information about himself, i.e. x11 = x1. Therefore, in our further discussions, any repeated indices (following one another) will be compressed into one index. In the new notations, Eqs. (85) can be presented in the more compact form

(87)

It should be noticed that a simple two-dimensional system (83), (84) gave rise to an n-dimensional system of the reflection chain (87). Although the system (87) is also linear, the structure of its coefficients is different; for instance, if the matrix of coefficients in Eqs. (83), (84) is symmetric, i.e. a12 = a21, the matrix of the coefficients in Eqs. (87) is not symmetric, and therefore the corresponding properties of symmetric systems, such as real characteristic roots, are not preserved. For the purpose of closure, it is reasonable to assume that after a certain level of reflection the image does not change significantly; for instance

(88)

This equation complements the system (87). So far we have been dealing with a physical (non-living) system. Let us now apply the same strategy to the mental dynamics (73), (74), presenting this system in the form of interaction of the first agent with its own images

(89)

… etc., or, adopting the notations (86)

(90)

The system (90) is closed, and the matrix of its coefficients has the following characteristic roots

(91)

Hence, the general solution of Eqs. (90) has the following structure


(92)

The arbitrary constants C1, C2, …, Cn are to be found from initial conditions, and they actually represent the degree of incompleteness of information that distinguishes the images from reality. These constants can be specified if the actual values of y2 are known at least at n different instants of time, to be compared with the corresponding images of y12. Obviously, the solution (92) satisfies the condition (64), since, due to the presence of harmonic oscillations, there are periods when

(93)

By interchanging indices 1 and 2 in Eqs. (89)-(93), one arrives at the dynamics of interaction of agent 2 with its images. It is worth emphasizing that in this section we are dealing with complex mental dynamics created by a single agent that communicates with its own images, images of images, etc. Thus, we come to the following conclusion: the survivability of Livings, i.e. their ability to preserve the inequality (2), is proportional to the reflection-based complexity of their mental dynamics, that is, to the complexity of the information potential.
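As a closing illustration of the reflection chain, the sketch below builds agent 1's closed chain (cf. Eqs. (85), (87), (88)) for an assumed stable linear system (83)-(84) and compares the image y2 with the true state x2 of the opponent; all coefficients, the closure level, and the initial guesses are invented for the example.

```python
# Sketch: agent 1's chain of reflections for the linear system (83)-(84),
# closed per Eq. (88) by y_{n+1} = y_n, integrated alongside the true
# dynamics to compare the predicted image y2 with the real x2.
import numpy as np

a = np.array([[-1.0, 0.5],
              [0.4, -1.0]])       # assumed coefficients a_ij
n, dt, steps = 6, 0.01, 300       # levels of reflection kept, time grid

x = np.array([1.0, -1.0])         # true coupled state (x1, x2)
y = np.full(n, 0.5)               # chain of images: imperfect initial guesses
y[0] = x[0]                       # agent 1 knows himself: x11 = x1

for _ in range(steps):
    x = x + dt * (a @ x)
    dy = np.empty(n)
    for i in range(n):
        # even levels use agent 1's equation (83), odd levels agent 2's (84)
        d, o = (a[0, 0], a[0, 1]) if i % 2 == 0 else (a[1, 1], a[1, 0])
        nxt = y[i + 1] if i + 1 < n else y[i]   # closure (88): y_{n+1} = y_n
        dy[i] = d * y[i] + o * nxt
    y = y + dt * dy

print("true x2:", round(x[1], 3), "| agent 1's image y2:", round(y[1], 3))
```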

D. Chain of Abstractions

In view of the importance of mental complexity for the survival of Livings, we will take a closer look at the cognitive aspects of the information potential. It should be recalled that classical methods of information processing are effective in a deterministic and repetitive world, but when faced with uncertainties and unpredictabilities they fail. At the same time, many natural and social phenomena exhibit some degree of regularity only on a higher level of abstraction, i.e. in terms of some invariants. Indeed, it is easier to predict the state of the solar system a billion years ahead than to predict the price of a stock of a single company tomorrow. In this sub-section we will discuss a new type of attractor, and an associated new chain of abstractions, provided by the complexity of mental dynamics.

α. Attractors in Motor Dynamics

We will start with neural nets that, in our terminology, represent motor dynamics without mental dynamics. The standard form of classical neural nets is given by Eqs. (59). Due to the sigmoid function tanh xi, the system (59) is nonlinear and dissipative, and these two conditions are necessary for the existence of several attractors that can be static, periodic, or chaotic; their type, location, and basins depend upon the choice of the synaptic weights. For instance, if these weights are symmetric

(94)

then the solution of Eqs. (59) can have only static attractors. As illustrated in Figure 18, static attractors perform generalization: they draw a general rule via abstraction, i.e. via removal of insignificant details.

Figure 18. Two levels of abstraction.

β. Attractors in Mental Dynamics

A significant expansion of the concept of an attractor, as well as of the associated generalization via abstraction, is provided by mental dynamics. We will start with the mental neural nets based upon mirror neurons discussed in Section 5. First, we will reformulate the motor dynamics (49) by changing the notation of the state variables from v to x, to be consistent with the notations of this section

(95)

Here, for further convenience, we have introduced new compressed notations

(96)

The corresponding mental dynamics in the new notations follows from Eq. (50)

(97)

In the same way, the mental (mirror) neural nets can be obtained from Eqs. (54)

(98)
where the state variables xii represent the variances of xi. Obviously, the discussion of the performance of the neural net (54) can be repeated for the performance of the neural net (98).

E. Hierarchy of Higher Mental Abstractions

Following the same pattern as that discussed in the previous sub-section, and keeping the same notations, one can introduce the next generation of mental neural nets, starting with the motor dynamics

(99)

Here, in addition to the original random state variables xi, new random variables are included in the structure of the information potential. They represent invariants (variances) of the original variables that are assumed to be random too, while their randomness is described by the secondary joint probability density ρ(x11, …, xnn). The corresponding Fokker-Planck equation governing the mental part of the neural net is

(100)

Then, following the same pattern as in Eqs. (95), (97), and (98), one obtains

(101)

(102)

(103)

Here Eqs. (101) and (103) describe the dynamics of the variances xii and of the variances of variances xiiii, respectively, while Eq. (102) governs the evolution of the secondary joint probability density ρ(x11, …, xnn). As follows from Eqs. (99)-(103), the only variables that have attractors are the variances of variances; these attractors are controlled by Eq. (103), which has the same structure as Eq. (98). The stationary values of these variables do not depend upon the initial conditions: they depend only upon the basins to which the initial conditions belong, and that specifies a particular attractor out of the whole set of possible attractors. On the contrary, no other variables have attractors, and their values depend upon the initial conditions. Thus, the attractors have broad membership in terms of the variables xiiii, and that represents a high level of generalization. At the same time, such “details” as the values of xi and xii at the attractors are not defined, being omitted as insignificant, and that represents a high level of abstraction. It should be noticed that the chain of abstractions was built upon principal variances only, while co-variances were not included. There are no obstacles to such an inclusion; however, the conditions for preserving the positivity of the tensors xij and xijkq are too cumbersome, while they do not bring any significant novelty into the cognitive aspects of the problem other than an increase in the number of attractors (see Eqs. (42), (43), and (45)). It is interesting to note that Eqs. (101) and (102) have the same structure as Eqs. (11) and (12), and therefore the velocities of the variances xii are entangled in the same way as the accelerations v are (see Eq. (23)). That means that the chain of abstractions considered above gives rise to an associated chain of entanglements of the variables xii, xiiii, etc.

F. Abstraction and Survivability

In this sub-section, we will demonstrate that each new level of abstraction in mental dynamics increases the survivability of the system in terms of its capability to increase order (see the condition (64)) regardless of the action of a random force. For easier physical interpretation, we will investigate a one-dimensional linear case, starting with Eq. (66). This equation describes an agent that interacts with its own image in the simplest linear form. As shown in the previous section, such an agent does not expose the property of living systems (64) and behaves as a physical non-living particle. Let us now introduce the second level of mental complexity, when the same agent interacts with its image and the image of this image. The dynamical model describing such an agent follows from a one-dimensional version of Eqs. (99)-(103) in which the neural net structure is replaced by a linear term and to which noise of strength q² is added

(104)

(105)


(106)

(107)

(108)

In order to prove our point, we have to show that there exists such a set of initial conditions that provides the inequality (64)

(109)
under the action of a random force of strength q². Let us concentrate on Eqs. (106) and (108) and choose the following initial conditions

(110)
where
(111)

Now as follows from Eqs. (106) and (108)

(112)

Therefore

(113)

Thus, an additional layer of mental complexity, which allows an agent to interact not only with its image but also with the image of that image, makes the agent capable of increasing order under the action of a random force, i.e. of increasing its survivability.

G. Activation of New Levels of Abstractions

A slight modification of the model of motor-mental dynamics discussed above leads to a new phenomenon: the capability to activate new levels of abstraction needed to preserve the inequality (64). The activation is triggered by the growth of variance caused by an applied random force. In order to demonstrate this, let us turn to Eq. (33) and rewrite it in the following form

(114)

Then Eqs. (36) and (37) are modified to

(115)

(116)
respectively. Here a new variable, which we denote ε, is defined by the following differential equation

(117)

One can verify that Eq. (117) implements the following logic:

(118)

Indeed, Eq. (117) has two static attractors: ε = 1 and ε = 0; when dD/dt > 0, the first attractor is stable; when dD/dt < 0, it becomes unstable, and the solution switches to the second one, which becomes stable. The transition time is finite since the Lipschitz condition at the attractors does not hold, and therefore the attractors are terminal (Zak, M., 2005a). Hence, when there is no random force applied, i.e. q = 0, the first level of abstraction does not need to be activated, since then dD/dt ≤ 0, and therefore ε is zero. However, when a random force is applied, i.e. q ≠ 0, the variance D starts growing, i.e. dD/dt > 0. Then the first level of abstraction becomes activated, ε switches to 1, and, according to Eq. (116), the growth of the dispersion is eliminated. If the first level of abstraction is not sufficient, the next levels of abstraction considered in the previous sub-sections can be activated in a similar way.
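A numerical sketch of this switching logic (the cube-root right-hand side standing in for Eq. (117), the gain c, and the variance dynamics are all assumed for illustration):

```python
# Sketch of the activation logic (118): the activation variable eps is
# driven toward 1 while the variance grows and toward 0 otherwise, via
# assumed cube-root (non-Lipschitz) terms, so each transition is finite-time.
import numpy as np

def step_eps(eps, dD, dt, c=5.0):
    """Terminal-attractor step: eps -> 1 if dD > 0, eps -> 0 otherwise."""
    if dD > 0:
        return min(1.0, eps + dt * c * np.cbrt(1.0 - eps))
    return max(0.0, eps - dt * c * np.cbrt(eps))

dt, q2, a = 0.001, 1.0, 2.0
D, eps = 0.0, 0.0
for _ in range(4000):
    # once the abstraction level is active (eps near 1), the information
    # force suppresses the noise-driven growth of the variance D
    dD = (1.0 - eps) * q2 - 2.0 * a * eps * D
    D += dt * dD
    eps = step_eps(eps, dD, dt)

# without activation D would keep growing linearly (D ~ q2 * t = 4 here);
# with it, eps switches on and the growth of D is arrested early
print(f"eps = {eps:.2f}, D = {D:.3f}")
```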

H. Summary

A connection between the survivability of Livings and the complexity of their behavior has been established. New physical paradigms, namely exchange of information via reflections and a chain of abstractions, explaining and describing the progressive evolution of complexity in living (active) systems, have been introduced. A biological origin of these paradigms is associated with the recently discovered mirror neuron, which is able to learn by imitation. As a result, an active element possesses self-nonself images and interacts with them, creating the world of mental dynamics. Three fundamental types of complexity of mental dynamics that contribute to survivability have been identified.

7. Intelligence in Livings

A. Definition and General Remarks

The concept of intelligence has many different definitions depending upon the context in which it is used. In this paper we link intelligence to the proposed model of Livings, namely, associating it with the capability of Livings to increase their survivability via successful interaction with their collective mind, i.e. with the self-image and the images of others. For an illustration of this definition, one can turn to sub-section G of Section 6, where the preservation of survival (in terms of the inequality (64)) is achieved due to the activation of a new level of abstraction (see Eqs. (114)-(118)).

B. Intelligent Control in Livings

In this sub-section we will discuss the ability of Livings to control their own behavior using the information force as an actuator. This dynamical paradigm links to reflective control performed by living systems to change their behavior by “internal effort” with the purpose of increasing survivability. We will modify the formalism of the model of Livings accordingly, making it subject to intelligent control. We will start with the linear version of the Langevin equations known as the Ornstein-Uhlenbeck system

(119)
subject to initial conditions presented in the form

(120)

(121)

(122)

Our attention will be concentrated on the control of uncertainties represented by non-sharp initial conditions, the Langevin forces Li(t), and errors in the values of the parameters ai. All these uncertainties will be measured by the expected deviations of the state variables from their mean values (i.e., variances)

(123)

Now we will introduce the following information-based control

(124)

The associated Liouville-Fokker-Planck equation describing evolution of the probability density ρ reads

(125)
while

(126)

Multiplying Eq. (125) by vi, then using partial integration, one obtains for the expectations

(127)

Similarly one obtains for variances

(128)

Thus, as follows from Eqs. (125) and (128), the control force αij does not affect the expected values of the state variables: it affects only the variances. Before formulating the control strategy, we will discuss the solution to Eq. (125). In the simplest case, when αij = const., the solution describing the transition from the initial density (126) to the current density is given by the following equation (compare with Eq. (60))

(129)


Here the Green function in matrix notation is expressed by Eq. (61). Substituting the solution (129) into Eq.(124), one obtains the Langevin equation in the form

(130)

If the control forces αij are not constants, for instance, if they depend upon variances

(131)
the solution (129), as well as the simplified form (130) of the original Langevin equation, is valid only for small times. Nevertheless, for better physical interpretation, we will stay with this approximation in our future discussions. First we have to justify the nonlinear form of the control forces (131). For that purpose, let us turn to Eq. (128). As follows from this equation, the variances Dij are not “protected” from crossing zeros and becoming negative, which would not have any physical meaning. To be more precise, the non-negativity of all the variances, regardless of possible coordinate transformations, can be guaranteed only by the non-negativity of the matrix |D|, which, in turn, requires the non-negativity of all its left-corner determinants

(132)

In terms of the Fokker-Planck equation (125), negative variances would lead to negative diffusion, which is associated with the ill-posedness of initial-value problems for parabolic PDEs. Mathematical aspects of that phenomenon are discussed in (Zak, M., 2005a). In order to enforce the inequalities (132), we will specify the control forces (131) as linear functions of the variances with one nonlinear term as a terminal attractor

(133)

Then, as follows from Eq.(128), for small variances

(134)

But the strength-of-noise matrix qij is always non-negative, and therefore, the constraints (132) are satisfied. The role of the terminal attractor will be discussed below. Now the original dynamical system (124) with intelligent control (133) reads

(135)

or, applying the approximation (129)

(136)
where αij is expressed by Eq. (133). Eqs. (135) and (136) must be complemented by the additional controller that implements the dynamics of the variances

(137)

As follows from Eq. (137), the dynamics of the controller depends only upon the design parameters, but not on the state variables {v} of the underlying dynamical system, and therefore the control strategy can be developed in advance. This strategy may include classical lead-lag compensation for the optimization of transient response or large steady-state errors, while the role of the classical compensation system is played by the controlling dynamics (137), in which the design parameters {b} can be modified for an appropriate change of root locus, noise suppression, etc. (Figure 19). However, in contradistinction to classical control, here the compensation system is composed of the statistical invariants produced by the underlying dynamical system “itself” via the corresponding Liouville or Fokker-Planck equations.

Figure 19. Block-diagram of intelligent control.

Let us now analyze the effect of the terminal attractor and, turning to Eq. (133), start with the matrix [∂(dDij/dt)/∂Dlk]. Its diagonal elements become infinitely negative when the variances vanish

(138)

while the remaining elements ∂(dDij/dt)/∂Dlk at ij ≠ lk are bounded (compare with Eqs. (54) and (55)). Therefore, due to the terminal attractor, the controller (137) linearized with respect to zero variances has infinitely negative characteristic roots, i.e. it is infinitely stable regardless of the parameters {a} of the original dynamical system as well as of the chosen parameters {b} of the controller. This effect has been exploited in (Zak, M., 2006a) for the suppression of chaos. Another effect of the terminal attractor is the minimization of residual noise. A simplified example of such a minimization has been discussed in Section 2 (see Eqs. (33)-(37)), where, for the purpose of illustration, it was assumed that the noise strength q is known in advance, and that allowed us to eliminate the noise completely. In reality, the noise strength is not known exactly, and it can only be minimized using the proposed controller. Let us turn to Eq. (137) at equilibrium, i.e. when dDij/dt = 0, and find the ratio of the residual noise and the variance

(139)

As follows from Eq. (137), without the terminal attractor, i.e. when cij = 0, this ratio is finite; but with the terminal attractor it is unbounded. This means that the terminal attractor almost completely suppresses noise, (see Figure 20).

Figure 20. Effect of terminal attractor.
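The noise-suppression effect is easy to check in a scalar sketch; the linear feedback gains a, b, the terminal-attractor gain c, and the cube-root form below are assumptions in the spirit of Eqs. (133) and (137), not the exact controller.

```python
# Scalar sketch of the variance controller: linear feedback plus a
# terminal-attractor term -c * D**(1/3). Near D = 0 the linearized slope
# ~ -c / (3 * D**(2/3)) diverges, so the equilibrium is "infinitely stable"
# and the residual variance collapses far below the uncontrolled level.
import numpy as np

a, b, q2, dt = 1.0, 1.0, 0.05, 1e-4

def dD(D, c):
    return -2.0 * (a + b) * D + q2 - c * np.cbrt(D)

for c in (0.0, 4.0):          # without / with the terminal attractor
    D = 0.5
    for _ in range(200000):
        D = max(0.0, D + dt * dD(D, c))
    print(f"c = {c}: residual variance D = {D:.6f}")
```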

It should be recalled that a non-living system may possess a self-image, but it is not equipped with self-awareness, and therefore this self-image is not in use. On the contrary, in living systems the self-awareness is represented by the expectation-based forces that send information from the self-image to the motor dynamics, and these forces can be exploited by Livings as actuators implementing control. Due to this property, which is well pronounced in the proposed model, an intelligent agent can run its mental dynamics ahead of real time (since the mental dynamics is fully deterministic and does not depend explicitly upon the motor dynamics) and thereby predict the future expected values of its state variables; then, by interacting with the self-image via the supervising forces, it can change the expectations if they are not consistent with the objective. Such a self-supervised, or intelligent, control provides a major advantage for the corresponding intelligent agents, and especially for biological species: due to the ability to predict the future, they are better equipped for dealing with uncertainties, and that improves their survivability.

C. Modeling Common Sense

α. General Remarks

Human common sense has always been a mystery for physicists and an obstacle for artificial intelligence. It is well understood that human behavior, and in particular the decision making process, is governed by feedbacks from the external world, and this part of the problem has been successfully simulated in the most sophisticated way by control systems. However, in addition to that, when the external world does not provide sufficient information, a human turns for “advice” to his experience and intuition, and that is associated with common sense. The simplest representation of human experience is by if-then rules. However, in real-world situations the number of rules grows exponentially with the dimensionality of the external factors, and implementation of this strategy is hardly realistic. One of the ways to fight such a combinatorial explosion is to represent rules in a more abstract and more generalized form by removing insignificant details and making the rules more inclusive. This procedure is usually accompanied by coercion, i.e. by reducing uncertainties about the world via forcing it into a known state regardless of the initial state. Indeed, many natural and social phenomena exhibit some degree of regularity only on a higher level of abstraction, i.e. in terms of some invariants. Within the mathematical formalism of the model of mental dynamics, this means that only variables of the highest level of abstraction are capable of classifying changes in the external world and sending them to the corresponding attractor. Based upon that, the model reacts to these changes, and that can be associated with a common-sense-based decision. Collaboration, competition and games are examples of those types of human activities that require common sense for predicting the intentions of a collaborator or a competitor in order to make a correct decision when the objective information for a rational decision is incomplete.

β. The Model

In this paper, by common sense we understand a feedback from the self-image (a concept adapted from psychology), and based upon that, we propose a physical model of common sense in connection with the decision making process. This model grew out of the model of Livings proposed and discussed in the previous sections. Previously, a similar model in a quantum implementation was introduced and discussed in (Zak, M., 2000b). We will start with the one-dimensional case based upon the augmented version of Eq. (11)


(140)

(141)
where

(142)

Eq. (142) has two equilibrium points: y = 0 and y = 1. At both these points, the Lipschitz condition does not hold since

(143)
and therefore

(144)

Hence, y = 1 is a terminal attractor, and y = 0 is a terminal repeller (Zak, M., 1989). Regardless of this “abnormality”, the closed-form solution is easily obtained by separation of variables

(145)

However, this “abnormality” becomes crucial for providing a finite time T of transition from the repeller y = 0 to the attractor y = 1

(146)

It should be recalled that in the classical theory of ODEs, when the Lipschitz condition is preserved, an attractor is always approached asymptotically, and the transition period is theoretically unbounded. This limitation is removed here due to the special form of the governing equation (142). Qualitatively, the result (146) holds even if a = a(t). Then the solution to Eq. (142) is

(147)

Similar results can be obtained for a < 0: then the attractor becomes a repeller and vice versa. When the function a(t) changes its sign sequentially, the solution switches from 0 to 1 and back, respectively, while the transition time is always finite. Selecting, for instance,

(148)
one arrives at periodic switches (with the frequency ω) from a decision mode to a passive (pre-decision) mode and back, since, in the context of this sub-section, the states y = 0 and y = 1 can be identified with the passive mode and the decision mode, respectively. As shown in Figure 21, after each switch to the decision mode, the system may select different solutions from the same family (15), i.e. a different decision, so that the entire solution will include jumps from one branch of the family (15) to another. In order to preserve the autonomy of the dynamical model (141), (142), one can present Eq. (148) as a limit-cycle solution to the following autonomous non-linear ODE

(149)

Figure 21. Switching between pre-decision and decision modes.
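Since Eqs. (140)-(149) are not reproduced in this excerpt, the switching mechanism can still be illustrated numerically. The minimal sketch below assumes a generic terminal-dynamics right-hand side, y' = a·y^(1/3)·(1-y)^(1/3), which shares the stated properties of Eq. (142): equilibria at y = 0 and y = 1, a violated Lipschitz condition at both, and a finite transition time; the periodic control a(t) plays the role of Eq. (148).

```python
import numpy as np

# A minimal sketch of terminal dynamics with a sign-alternating control a(t).
# The right-hand side y' = a * y**(1/3) * (1 - y)**(1/3) is an assumed
# illustrative form (Eq. (142) is not reproduced in the text); any
# non-Lipschitz factor vanishing at y = 0 and y = 1 behaves similarly.

def cbrt(x):
    return np.sign(x) * np.abs(x) ** (1.0 / 3.0)

def simulate(T=20.0, dt=1e-3, omega=1.0, amp=1.0, eps=1e-9):
    n = int(T / dt)
    t = np.linspace(0.0, T, n)
    y = np.empty(n)
    y[0] = eps                              # start near the repeller y = 0
    for k in range(n - 1):
        a = amp * np.cos(omega * t[k])      # periodic control, cf. Eq. (148)
        dy = a * cbrt(y[k]) * cbrt(1.0 - y[k])
        # keep an infinitesimal offset so the repeller can be escaped,
        # mirroring the instability-based randomization described above
        y[k + 1] = min(max(y[k] + dt * dy, eps), 1.0 - eps)
    return t, y

t, y = simulate()
# y(t) switches between 0 (pre-decision mode) and 1 (decision mode) in
# finite time, qualitatively reproducing Figure 21.
print(np.round(y[::2000], 3))
```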

The periodic solution (148) to this oscillator is usually associated with modeling brain rhythms, while the frequency ω is found to be a monotonic function of the stimulus intensity. The model we propose for common sense simulation is based upon the augmented n-dimensional version of Eqs. (140) and (141)

(150)

(151)

Here

(152)

where v̄i is the expected value of vi, y is defined by Eqs. (142) and (149), while the functions f and β provide non-zero mean and non-zero drift of the underlying stochastic process. The solution to Eq. (151) has the same fundamental properties as the solution to its simplified one-dimensional version (12). Indeed, starting with the initial conditions

(153) and looking first for the solution within an infinitesimal initial time interval, one reduces Eq. (151) to the form

(154) since

(155) and therefore all the drift terms can be ignored within this interval. The solution to Eq. (154) is a multi-dimensional version of Eq. (13). Substitution of this solution into Eqs. (150) yields

(156)

since the functions fi are considered to be bounded. The solution to Eqs. (156) is of the form (13)

(157) and this can be illustrated by the same Figure 4, with the only difference that here the solution is valid only within an infinitesimal initial time interval. But the most important phenomenon, the phase transition, occurs exactly in this interval: due to the violation of the Lipschitz condition, the solution splits into an ensemble of different samples, while the probability that the solution represents a certain sample is controlled by the density following from Eq. (154). Thus, the solution to Eqs. (150) and (151) within the infinitesimal initial interval captures the instability-based randomization effect.

γ. Decision Making Process

Decision making is the cognitive process of selecting a course of action from among multiple alternatives. We will distinguish two types of decision making processes. The first type is associated with the concept of a rational agent; the models of this type are largely quantitative and are based on the assumptions of rationality and near-perfect knowledge. They are composed of the agent’s beliefs formed on the basis of evidence, followed by construction and maximization of a utility function. The main limitation of these models is their exponential complexity: on the level of belief nets, the exponential complexity is caused by the fact that encoding a joint probability as a function of n propositional variables requires a table with 2^n entries (Pearl, J., 1986); complexity of the same order occurs in rule-based decision trees. The second type of decision making process is based upon psychological models; these models concentrate on psychological and cognitive aspects such as motivation, need reduction, and common sense. They are qualitative rather than quantitative and build on sociological factors like cultural influences, personal experience, etc. The model proposed in this paper is better structured for the second type of decision making process, which exploits the interaction between motor and mental dynamics in which the mental dynamics plays the role of a knowledge base replacing unavailable external information. In order to demonstrate the proposed approach to decision making, we will augment the model (150), (151) by two additional devices. The first one, a sensor device, measures the difference between the actual and expected values of the state variables

(158)

The second one, the logical device, is based upon Eq. (142) with

(159) where A > 0 is a sensitivity threshold specified for the decision making process. We will start the analysis of the model performance from its pre-decision mode, when the mismatch (158) is small, and therefore y = 0 and

(160)

(161)

In this mode, the motor dynamics (150) and the mental dynamics (151) are not coupled. The performance starts with the mental dynamics, which runs ahead of actual time, calculates expected values of the state variables as well as the mismatch (158) at the end of a selected lead time T, and sends this information to the sensor device (159). As soon as this mismatch becomes large, it changes the sign of the control parameter a (see Eqs. (142) and (159)), the value of y changes from zero to one, and the system (160), (161) switches to the decision mode, taking the form (150), (151). After this switch, the deterministic trajectory defined by the solution to Eq. (160) splits into a family of solutions to the coupled system (150), (151) (see Eq. (157) and Figure 21). But, as discussed above, the system selects only one trajectory of this family at random, with the probability defined by the information forces (152). Thus, the topology of the proposed model demonstrates how a decision is made, while the structure of the information forces defines what decision is made. This structure will be discussed in the following sub-section.

δ. Decision via Choice of Attractors

Let us return to the system (150), (151) in the decision mode

(162)

(163)

and assume that the functions in Eq. (152) depend only upon the first two invariants of the probability density ρ, namely, upon the means v̄i and the principal variances Dii

(164)

As follows from Eqs. (150) and (151), the means and the variances must satisfy the following ODEs (compare with Eqs. (127) and (128))

(165)

(166)


Thus, although the state variables vi are statistically independent, i.e.

Dij = 0 if i ≠ j, the time evolution of their statistical invariants is coupled via Eqs. (165) and (166). It should be noticed that Eqs. (165) and (166) can be considered as an ODE-based approximation to the mental dynamics (163). Next we will introduce the following structure into the functions (164)

(167)

(168)

where αij, βij, wij, and uij are constants, and rewrite the system (165), (166) as follows

(169)

(170)

We will start our analysis with Eqs. (169). First of all, in this particular setting, these equations do not depend upon Eqs. (170). Moreover, each equation in the system (169) is coupled to the other equations of the same system only via the signs of the sums Σj wij v̄j, which include contributions from all the equations. Indeed, if Σj wij v̄j > 0, the i-th equation has the terminal attractors

(171) and terminal repellers

(172) where m = n/2, k = (n-1)/2 if n is even, and m = (n-1)/2, k = n/2 if n is odd. The n/2 attractors (171) can be pre-stored using the n² weights wij. These attractors are the limit values of the means of the stochastic process that occurs as a result of the switch from the pre-decision to the decision making mode, and these attractors are approached in finite time (see Eq. (146)). Turning to Eqs. (170) and noticing that they have the same structure as Eqs. (169), one concludes that the principal variances Dii of the same stochastic process approach their limiting values

(173)

that can be pre-stored using the weights uij. Thus, when the system switches to the decision making mode, it chooses a sample of the corresponding stochastic process, and this sample represents the decision. The choice of the sample is controlled by the normal probability density with means and principal variances approaching their limit values (171) and (173). The choice of a particular attractor out of the sets (171) and (173) depends upon the initial values of the means and variances at the end of the pre-decision period: if these values fall into the basin of attraction of a certain attractor, they will eventually approach that attractor (Figure 22).

Figure 22. Basins of attraction separated by repellers.
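The mechanism of Figure 22 can be imitated with a one-dimensional toy model. Since Eqs. (169)-(172) are not reproduced in this excerpt, the sketch below assumes a generic terminal-attractor equation dm/dt = -γ sin^(1/3)(πm), whose equilibria alternate between terminal attractors (even integers) and terminal repellers (odd integers); the initial value of the mean at the end of the pre-decision period then selects the attractor of its basin.

```python
import numpy as np

# Hedged sketch of "decision via choice of attractors": the explicit form of
# Eqs. (169)-(172) is assumed, not quoted.  Equilibria of the cube-root sine
# alternate between attractors and repellers, as in Figure 22, and each
# attractor is reached in finite time because the Lipschitz condition fails.

def cbrt(x):
    return np.sign(x) * np.abs(x) ** (1.0 / 3.0)

def settle(m0, gamma=1.0, dt=1e-3, steps=20000):
    m = m0
    for _ in range(steps):
        m += dt * (-gamma * cbrt(np.sin(np.pi * m)))
    return m

# Initial means (end of the pre-decision period) select the decision:
for m0 in (0.3, 0.9, 1.1, 2.7, -0.8):
    print(m0, "->", round(settle(m0), 3))
```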

ε. Decision via Phase Transition

In the previous sub-section, we considered the case when the pre- and post-decision states are not totally disconnected: they belong to the same basin of attraction. However, in general, a decision may require a fundamental change of the dynamical structure, and that can be achieved via a phase transition. The distinguishing characteristic of a phase transition is an abrupt change in one or more physical properties of a dynamical system. In physics, a phase transition is the transformation of a thermodynamic system from one phase to another (evaporation, boiling, melting, freezing, sublimation, etc.). In engineering, such transitions occur because of a change of system configuration (a new load, a newly opened valve, a new switch or logic device in control systems with variable structure, etc.). In living systems, dynamics includes the concept of “discrete events”, i.e. special critical states that give rise to branching solutions, or to transitions from deterministic to random states and vice versa (for instance, life/non-life transitions). The first mathematical indication of a phase transition is discontinuities in some of the state variables as functions of time. In the classical approach, these discontinuities are incorporated through a set of inequalities imposing unilateral constraints upon the state variables and the corresponding parameters of the governing differential equations. The most severe changes in dynamics are caused by vanishing coefficients at the highest space or time derivatives. The best example of this is the transition from the Navier-Stokes to the Euler equations in fluid mechanics, where, due to vanishing viscosity, the tangential velocity on the rigid boundary jumps from zero (the non-slip condition) to a finite value (the slip condition). In this sub-section, we will modify the proposed model by introducing phase transitions that are accompanied by finite jumps of the state variables. Let us modify Eq. (150) in the following way

(174)

where yi is defined as follows

(175)

and assume that the n-th mismatch becomes large; that changes the sign of the control parameter an, the value of yn changes from zero to one, and the system (160), (161) switches to the decision mode in the form

(176)

(177)

(178)

Since Eq. (177) is stationary, one of the state variables, for instance xn, can be expressed via the rest of the variables. Hence, formally the decision making mode is described by Eqs. (162)-(170), but the dimensionality of the system drops from n to n-1; obviously, the functions in these equations, now depending upon (v1,…,vn-1) and (D11,…,Dn-1,n-1), become different from their n-dimensional origins, and therefore the locations of the attractors (171) and (173) become different as well, and this difference may be significant. The number of decision making modes associated with phase transitions is equal to the number of possible sub-spaces, and it grows exponentially with a linear growth of the original dimensionality of the model.

D. Emergent Intelligence

In many cases, in order to simulate the ability of living systems to make decisions, it is more convenient to introduce a Boolean model of motion, or Boolean net. A Boolean net consists of N vertices that are characterized by Boolean variables taking the values 0 or 1. Each vertex receives input from other vertices and updates its value according to a certain rule. Such a model can be associated, via a master-slave relationship, with the continuous behavior of the model of Livings discussed above. Indeed, let us turn to Eqs. (56), (57) and introduce the following Boolean functions:

(179)

(180)

The conditions (179), (180) extract topological invariants of the solution to Eqs. (56), (57), disregarding all the metrical properties. In terms of logic, the Yi can be identified with the statement variables; then P describes the Boolean truth function as a set of compound statements in these variables, and it can be represented in disjunctive normal form as well as in the form of the corresponding logic circuit. It should be noticed that changes in the statements, as well as in the structure of the truth function, are driven by the solution to the original model (56), (57). Therefore, for non-living systems, when no feedback information potential Π is applied, in accordance with the second law of thermodynamics the truth function P (as well as its master counterpart ρ) will eventually approach zero, i.e. the lowest level of logical complexity. On the contrary, as demonstrated above, living systems can depart from any initial distribution toward a decrease of entropy; it means that the corresponding truth function P can depart from zero toward higher logical complexity without external interactions. The logic constraints Eqs. (179), (180) can be represented by terminal dynamics (Zak, M., 1992) via relaxing the Lipschitz conditions:

(181)

(182) where


Indeed, Eq. (181) has two static attractors: Y = 0 and Y = 1; when v > 0, the first attractor is stable; when v < 0, it becomes unstable, and the solution switches to the second one, which becomes stable. The transition time is finite since the Lipschitz condition at the attractors does not hold, and therefore the attractors are terminal. The same is true for Eq. (182). In both cases the transition time can be controlled by the constants k and m to guarantee that the signs of v(t) and ρ(t) do not change during the transition period. Thus, any dynamical process described by the system (56), (57) can be “translated” into a temporal sequence of logical statements via the Boolean dynamics (181), (182), while the complexity of the truth function can increase spontaneously, and that represents emergent intelligence. One can introduce two (or several) systems of the type (56), (57) with respect to variables vi and v*i which are coupled only via the Boolean dynamics, for instance

(183)

In this case, the dynamical systems will interact through a logic-based “conversation”. It should be noticed that, in general, decisions made by a living system may affect its behavior, i.e. the Boolean functions Y and P can enter Eqs. (56), (57) as feedback forces. That would couple Eqs. (56), (57) and Eqs. (181), (182), thereby introducing the next level of complexity into the living system behavior. However, that case deserves a special analysis; here we will confine ourselves to a trivial example. For that purpose, consider the simplest case of Eqs. (56), (57), (181), and (182) and couple these equations as follows

(184)

(185) where

(186)

(187)

(188)

Here τ is a constant time delay. For the initial conditions

v = 0 at t = 0 (and for t < 0), Y(0) = 0, P(0) = 1, the solution is approximated by the following expressions

(189)

During the initial period, v(t - τ) < 1, Y = 0, and P = 1. After that, the solution switches to β = 1, Y = 1, and P = 0. According to the new solution, x(t) starts decreasing and ρ(t) starts increasing. When v(t - τ) < 1 and ρ(t) > 1/2, the solution switches back to the first pattern, and so on. Thus, the solution is represented by periodic switches from the first pattern to the second one and back (Figure 23). From the logical viewpoint, the dynamics simulates the operation of negation: P(0) = 1, P(1) = 0.

Figure 23. Emergent intelligence.
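Since Eqs. (184)-(189) are not reproduced in this excerpt, the following toy simulation keeps only their qualitative logic: the drive grows v while Y = 0 (P = 1), the switch is triggered by the delayed value v(t - τ), and v decays after the switch, so the pair (Y, P) flips periodically, imitating the operation of negation shown in Figure 23.

```python
import numpy as np

# Toy reconstruction of the "emergent negation" example; the drive beta and
# the switching thresholds are assumptions standing in for Eqs. (184)-(188).

dt, tau, T = 0.01, 1.0, 40.0
n = int(T / dt)
lag = int(tau / dt)
v = np.zeros(n)
Y = np.zeros(n, dtype=int)
for k in range(n - 1):
    v_delayed = v[k - lag] if k >= lag else 0.0
    if Y[k] == 0 and v_delayed > 1.0:      # switch to the second pattern
        Y[k + 1] = 1
    elif Y[k] == 1 and v_delayed < 1.0:    # switch back to the first one
        Y[k + 1] = 0
    else:
        Y[k + 1] = Y[k]
    beta = 1.0 if Y[k + 1] == 0 else -1.0  # grow while Y = 0, decay while Y = 1
    v[k + 1] = v[k] + dt * beta
P = 1 - Y                                   # the truth function: negation
print(Y[::500])
print(P[::500])
```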

E. Summary

Intelligence in Livings is linked to the proposed model by associating it with the capability of Livings to increase their survivability via successful interaction with their collective mind, i.e. with the self-image and the images of others. The concept is illustrated by intelligent control, common-sense-based decision making, and Boolean-net-based emergent intelligence.

8. Data-Driven Model Discovery

The models of Livings introduced and discussed above are based upon an extension of the First Principles of Newtonian mechanics that defines the main dynamical topology of these models. However, the First Principles are insufficient for detecting specific properties of a particular living system represented by the parameters wij as well as by the universal constant ζ: these parameters must be found from experiments. Hence, we arrive at the following inverse problem: given experimental data describing the performance of a living system, discover the underlying dynamical model, i.e. find its parameters. The model will be sought in the form of Eqs. (42) and (45)

(190)

(191)

where the αij are functions of the correlation moments Dks

(192)

With reference to Eq. (192), Eq. (191) can be replaced by its simplified version (compare to Eq. (51))

(193)

Then the inverse problem is reduced to finding the best-fit weights wijks, cij, and the constant ζ. It should be noticed that all the sought parameters enter Eq. (193) linearly. We will assume that the experimental data are available as time series for the state variables in the form

(194)

Here each function at a fixed Ci describes a sample of the stochastic process associated with the variable vi, while the family of these curves at Ci = 1, 2, …, mi approximates the whole i-th ensemble (see Figures 4 and 9). Omitting the details of extracting the correlation moments

(195) from the functions (194), we assume that these moments as well as their time derivatives are reconstructed in the form of time series. Then, substituting Dij and Ḋij into Eq. (193) for the times t1, …, tq, one arrives at a linear system of algebraic equations with respect to the constant parameters, which, for brevity, can be denoted and enumerated as Wi


(196) where m is the number of the parameters defining the model, and

(197) are the coefficients of the parameters Wi and the free term, respectively. Introducing the values of

Ai and B at the points t = tj, j = 1, …, q, and denoting them as Aij and Bj, one obtains a linear system of n²q algebraic equations

(198) with respect to m unknown parameters. It is reasonable to assume that

(199) so the system becomes overdetermined. The best-fit solution is found via the pseudo-inverse of the matrix

(200)

Here

(201)

As soon as the parameters W are found, the model is fully reconstructed.
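A minimal numerical sketch of the best-fit step (198)-(201) is given below. The coefficient matrix here is synthetic: in the actual procedure its rows are built from the reconstructed correlation moments and their time derivatives evaluated at t1, …, tq, which are not reproduced in this excerpt.

```python
import numpy as np

# Overdetermined linear system A W = B solved by pseudo-inversion (Eq. (200)).

rng = np.random.default_rng(0)
m, rows = 6, 40                    # m unknown parameters, rows >> m (Eq. (199))
A = rng.normal(size=(rows, m))     # stand-in for the coefficients Aij
W_true = rng.normal(size=m)
B = A @ W_true + 0.01 * rng.normal(size=rows)   # noisy free terms Bj

W_fit = np.linalg.pinv(A) @ B      # best-fit parameters
# equivalently: W_fit, *rest = np.linalg.lstsq(A, B, rcond=None)
print(np.round(W_true, 3))
print(np.round(W_fit, 3))
```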

Remark

It is assumed that the correlation moments found from the experiment must automatically satisfy the inequalities (47) (by the definition of variances), and therefore, the enforcement of these inequalities is not needed.

9. Discussion and Conclusion

A. General Remarks

We will start this section with a discussion of the test question posed in the Introduction: suppose that we are observing the trajectories of several particles, some of them physical (for instance, performing a Brownian motion), and others biological (for instance, bacteria), Figure 1. Is it possible, based only upon the kinematics of the observed trajectories, to find out which particle is alive? Now we are in a better position to answer this question. First of all, represent the observed trajectories in the form of time series. Then, using the proposed methodology for data-driven model reconstruction introduced in Section 8, find the mental dynamics for both particles in the form (193). Finally, calculate the entropy evolution

(202)

Now the sufficient condition for the particle to be “alive” is the inequality

(203) that holds during at least some time interval (Figure 3). (Although this condition is sufficient, it is not necessary, since even a living particle may choose not to exercise its privilege to decrease disorder.) It should be noticed that the condition (203) is applicable not only to natural, but to artificial living systems as well. Indeed, the system (190)-(191) can be simulated by analog devices (such as VLSI chips (Mead, C., 1989)) or quantum neural nets (Zak, M., 1999) that will capture the property (203), and that justifies the phenomenological approach to the modeling of living systems. The condition (203) needs to be clarified from the viewpoint of the second law of thermodynamics. Formally it contradicts this law; however, this contradiction is only apparent. Indeed, we are dealing here with an idealized phenomenological model which does not include such bio-chemical processes as metabolism, breathing, food consumption, etc. Therefore, the concept of an open or an isolated system becomes subject to interpretation: even if the phenomenological model of a living system is isolated, the underlying living system is open. From the biological viewpoint, the existence of the information potential that couples mental and motor dynamics is based upon the assumption that a living system possesses self-image and self-awareness. (Indeed, even such a primitive living system as a virus can discriminate the self from non-selves.) The concepts of self-image and self-awareness can be linked to the recently discovered “mirror” properties of neurons, according to which a neuron representing an agent A can be activated by observing the action of an agent B that may not be in direct contact with the agent A at all. Due to these privileged properties, living systems are better equipped for dealing with future uncertainties, since their present motion is “correlated with the future” in terms of the probability invariants. Such a remarkable property, which increases survivability, could be acquired accidentally and then strengthened in the process of natural selection. The ability of living systems to decrease their entropy by internal effort allows one to make a connection between survivability and complexity and to introduce a model of emergent intelligence that is represented by intelligent control and common-sense decision making. In this connection, it is interesting to pose the following problem. What is a more effective way for Livings to promote Life: through simple multiplication, i.e. through an increase of the number of “primitives” n, or through individual self-perfection, i.e. through an increase of the number m of levels of reflection (“What do you think I think you think…”)? The solution to this problem may have fundamental social, economical and geo-political interpretations. But the answer immediately follows from Eq. (82), demonstrating that the complexity grows exponentially with the number of levels of reflection m, but only linearly with the dimensionality n of the original system. Thus, in contradistinction to Darwinism, a more effective way for Livings to promote Life is through higher individual complexity (due to mutually beneficial interactions) rather than through a simple multiplication of “primitives”. This statement can be associated with the recent consensus among biologists that symbiosis, or collaboration of Livings, is an even more powerful factor in their progressive evolution than natural selection.
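The test (202)-(203) is straightforward to apply to data. The sketch below estimates the density of an ensemble of trajectories by a histogram at each time slice, computes S(t) = -Σ p ln p, and looks for an interval with dS/dt < 0; the ensemble itself is synthetic (first diffusing, then contracting) and serves only to exercise the test.

```python
import numpy as np

# Entropy evolution of an ensemble: a sustained decrease of S(t) is the
# sufficient "aliveness" signature (203).  The contracting phase below is a
# synthetic stand-in for a trajectory family produced by a living system.

rng = np.random.default_rng(1)
samples, steps = 2000, 100
v = rng.normal(0.0, 1.0, samples)
S = []
for k in range(steps):
    if k < steps // 2:
        v = v + 0.1 * rng.normal(size=samples)   # disorder grows (physical)
    else:
        v = 0.97 * v                             # disorder shrinks ("alive")
    p, _ = np.histogram(v, bins=50, range=(-10.0, 10.0))
    p = p[p > 0] / samples
    S.append(-float(np.sum(p * np.log(p))))
print("S at mid:", round(S[49], 3), "  S at end:", round(S[-1], 3))
print("condition (203) holds on the second half:", S[-1] < S[49])
```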
Before summarizing the results of our discussion, we would like to bring up the concept of time perception. Time perception by a living system is not necessarily objective: it can differ from actual physical time depending upon the total life time of a particular living system, as well as upon the complexity and novelty of an observed event. In general, one can approximate the subjective time as

(204)

In the simplest case, Eq. (204) can be reduced to the Logtime hypothesis used in psychology (the Weber-Fechner law)

(205)

However, our point here is not to discuss different modifications of Eq. (204), but rather to emphasize that any time perceptions that are different from physical time lead to different motions of the living systems in actual physical space due to coupling of motor and mental dynamics through the information potential

(206)

B. Model Extension

The proposed model of Livings is based upon a particular type of Liouville feedback implemented via the information potential Eq. (206). However, as shown in (Zak, M., 2004, 2005b, 2006a, 2006c, 2007), there are other types of feedback that lead to different models useful for information processing. For instance, if the information force is chosen as

(207)

(that includes the information force in the second term), then the corresponding Liouville equation can describe shock waves, solitons and chaos in probability space, Figure 24 (Zak, M., 2004). Let us concentrate upon a particular case of Eq. (207) considered in Section 2 (see Eqs. (7a) and (8a)). The solution of Eq. (8a), subject to the initial conditions and the normalization constraint

(208) is given in the following implicit form (Whitham, G., 1974)

(209)

Figure 24. Shock wave and soliton in probability space.

This solution describes the propagation of the initial distribution of the density ρ0(V) with a speed that is proportional to the values of this density, i.e. the higher values of ρ propagate faster than the lower ones. As a result, any “compressive” part of the wave, where the propagation velocity is a decreasing function of V, ultimately “breaks” to give a triple-valued (but still continuous) solution for ρ(V, t). Eventually, this process leads to the formation of strong discontinuities that are related to propagating jumps of the probability density. In the theory of nonlinear waves, this phenomenon is known as the formation of a shock wave, Figure 24. Thus, as follows from the solution (209), a single-valued continuous probability density spontaneously transforms into a triple-valued, and then into a discontinuous, distribution. In the aerodynamical application of Eq. (8a), when ρ stands for the gas density, these phenomena are eliminated through a model correction: in the small neighborhood of shocks, the gas viscosity cannot be ignored, and the model must include the term describing dissipation of mechanical energy. The corrected model is represented by the Burgers equation

(210)

This equation has a continuous single-valued solution (no matter how small the viscosity is), and that provides a perfect explanation of the abnormal behavior of the solution to Eq. (8a). A similar correction can be applied to the case when ρ stands for the probability density if one includes Langevin forces L(t) into Eq. (7a)

(211)

Then the corresponding Fokker-Planck equation takes the form (210). It is reasonable to assume that small random forces of strength << 1 are always present, and that protects the mathematical model Eqs. (7a) and (8a) from singularities and multi-valuedness in the same way as it does in the case of aerodynamics. It is interesting to notice that Eq. (210) can be obtained from Eq. (211) in which the random force is replaced by an additional Liouville feedback

(211a)
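The effect of the correction (210) is easy to reproduce numerically. The sketch below integrates the Burgers equation with a Lax-Friedrichs step plus explicit diffusion; the initial profile and parameter values are illustrative only. With the small viscosity, the compressive hump steepens into a smoothed shock instead of becoming multi-valued.

```python
import numpy as np

# Burgers equation rho_t + rho * rho_z = nu * rho_zz on a periodic grid.

nx, L, nu = 400, 10.0, 0.05
dx = L / nx
dt = 0.2 * dx                       # satisfies the CFL bound for these speeds
x = np.linspace(0.0, L, nx, endpoint=False)
rho = 1.0 + 0.5 * np.exp(-(x - 3.0) ** 2)   # "compressive" initial hump

def step(r):
    rp, rm = np.roll(r, -1), np.roll(r, 1)
    f = 0.5 * r ** 2                         # Burgers flux
    fp, fm = np.roll(f, -1), np.roll(f, 1)
    advect = 0.5 * (rp + rm) - 0.5 * (dt / dx) * (fp - fm)  # Lax-Friedrichs
    return advect + nu * (dt / dx ** 2) * (rp - 2.0 * r + rm)

for _ in range(2000):
    rho = step(rho)
# The steepest gradient saturates at a finite value set by nu; as nu -> 0 it
# grows without bound, signaling the shock of the inviscid equation (8a).
print(round(float(np.max(np.abs(np.gradient(rho, dx)))), 3))
```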

As noticed above, the phenomenological criterion of life is the ability to decrease entropy by internal effort. This ability is provided by the feedback implemented in Eqs. (7a) and (8a). Indeed, starting with Eq. (1a) and invoking Eq. (8a), one obtains

(212)

Obviously, the presence of small diffusion does not change the inequality (212) during a certain period of time. (However, eventually, for large times, diffusion takes over, and the inequality (212) is reversed.) As shown in (Zak, M., 2006c), the model Eqs. (210) and (211a) exhibits the same mechanism of emergence of randomness as that described by Eqs. (14) and (15). If the information force has the following integral form

(213) leading to the governing equations

(214)

(215) then the solution to the corresponding Liouville equation (215) converges to an arbitrarily prescribed probability density ρ*(V). This result can be applied to one of the oldest (and still unsolved) problems in optimization theory: find the global maximum of a multi-dimensional function. Almost all optimization problems, one way or another, can be reduced to this particular one. However, even under severe restrictions imposed upon the function to be maximized (such as the existence of second derivatives), the classical methods cannot overcome the problem of local maxima. The situation becomes even worse if the function is not differentiable, since then the concept of gradient ascent cannot be applied. The idea of the quantum-inspired algorithm based upon the Liouville feedback Eq. (213) is very simple: represent the positive function of (v1, v2, …, vn) to be maximized as the probability density ρ*(v1, v2, …, vn) to which the solution of the Liouville equation is attracted. Then the larger values of this function will have a higher probability of appearing as a result of running Eq. (214). If the prescribed probability density is chosen as a power law

ρ*(V) = Γ((ν+1)/2) / [√(νπ) Γ(ν/2)] (1 + V²/ν)^(-(ν+1)/2),

then the system (214), (215) simulates the underlying dynamics that leads to the corresponding power-law statistics. Such a system can be applied to the analysis and prediction of physical and social catastrophes (Zak, M., 2007).
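A minimal sketch of this search idea: normalize the positive function into a density ρ*, draw samples from it, and keep the best one; larger function values are visited with higher probability. Grid-based inverse-CDF sampling below stands in for actually running the dynamics (214) until its density converges to ρ*; the multimodal target function is an arbitrary illustration.

```python
import numpy as np

# Sampling-based global search: the maximizer of psi is the mode of rho*.

def psi(v):                                   # illustrative multimodal target
    return np.exp(-0.5 * (v - 2.0) ** 2) + 2.0 * np.exp(-0.5 * (v + 3.0) ** 2)

grid = np.linspace(-10.0, 10.0, 4001)
p = psi(grid)
p /= p.sum()                                  # discrete stand-in for rho*(V)
cdf = np.cumsum(p)

rng = np.random.default_rng(2)
samples = grid[np.searchsorted(cdf, rng.random(5000))]
best = samples[np.argmax(psi(samples))]
print("estimated global maximizer:", round(float(best), 2))   # near v = -3
```

Finally, the introduction of the terminal Liouville feedback (Zak, M., 2004, 2005b, 2006a)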

(216) applied to a system of ODE

(217) and leading to the governing equations

(218)

(219) has been motivated by a fundamental limitation of the theory of ODEs: it does not discriminate between stable and unstable motions in advance, and therefore an additional stability analysis is required. However, such an analysis is not constructive: in case of instability, it does not suggest any model modifications to efficiently describe post-instability motions. The most important type of instability that necessitates a post-instability description is associated with positive Lyapunov exponents leading to exponential growth of small errors in the initial conditions (chaos, turbulence). The approach proposed in (Zak, M., 2005b) is based upon the removal of positive Lyapunov exponents by introducing a special Liouville feedback represented by terminal attractors and implemented by the force Eq. (216). The role of this feedback is to suppress the divergence of the trajectories corresponding to initial conditions that are different from the prescribed ones, without affecting the “target” trajectory that starts with the prescribed initial conditions. Since the terminal attractors include the expected values of the state variables as new unknowns (see Eqs. (218)), the corresponding Liouville equation should be invoked for the closure (see Eq. (219)). This equation is different from its classical version by additional nonlinear sinks of the probability represented by terminal attractors. The forces (216) possess the following properties: firstly, they vanish at vi = v̄i, and therefore they do not affect the target trajectory vi = v̄i(t); secondly, their derivatives become unbounded at the target trajectory:

(220) and that makes the target trajectory infinitely stable, thereby suppressing chaos. The same property has been exploited for the representation of chaotic systems via stochastic invariants (Zak, M., 2005b), Figure 25. Such a representation was linked to the stabilization principle formulated in (Zak, M., 1994) for the closure in turbulence.

Figure 25. Divergence of trajectories.
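The stabilizing role of the feedback (216) can be illustrated numerically. The exact form of the force is not reproduced in this excerpt; the sketch below only uses its two stated properties, vanishing on the target trajectory and having an unbounded derivative there, which a cube-root pull toward the target provides. The Lorenz system and the gain are illustrative choices.

```python
import numpy as np

# Suppressing divergence of a perturbed chaotic trajectory with a terminal
# (cube-root, non-Lipschitz) feedback toward the target trajectory.

def lorenz(s):
    x, y, z = s
    return np.array([10.0 * (y - x), x * (28.0 - z) - y, x * y - (8.0 / 3.0) * z])

def cbrt(x):
    return np.sign(x) * np.abs(x) ** (1.0 / 3.0)

dt, steps, gain = 1e-3, 30000, 5.0
target = np.array([1.0, 1.0, 20.0])   # trajectory with prescribed initial data
probe = target + 1e-3                 # slightly perturbed copy
for _ in range(steps):
    target = target + dt * lorenz(target)
    probe = probe + dt * (lorenz(probe) + gain * cbrt(target - probe))
print("final separation:", float(np.linalg.norm(target - probe)))
# With gain = 0 the separation grows exponentially (positive Lyapunov
# exponent); with the terminal feedback it stays pinned near zero.
```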

So far, the extension of the model of Livings discussed above was implemented through different Liouville feedbacks. Now we will return to the original feedback, i.e. to the gradient of the information potential Eq. (32), while extending the model through a departure from its autonomy. The justification for considering non-autonomous living systems is based upon the following argument: in many cases, the performance of Livings includes so-called “discrete events”, i.e. special critical states that give rise to branching solutions, or to bifurcations. To attain this property, such a system must contain a “clock”, a dynamical device that generates a global rhythm. During the first part of the “clock’s” period, a critical point is stable, and therefore it attracts the solution; during the second half of this period, the “clock” destabilizes the critical point, and the solution escapes it in one of several possible directions. Thus, driven by alternating stabilities and instabilities, such a system exhibits random-walk-like behavior. In order to illustrate this, start with the following system (compare to Eq. (11))

(221) where β, γ, and ω are constants. Then one arrives at the following ODE with respect to the square root of the variance D (compare to Eq. (37))

(222)

This equation was studied in (Zak, M., 1990, 2005a). It has been shown that at the equilibrium points

(223) the Lipschitz condition is violated, and the solution represents a symmetric unrestricted random walk on the points (223). The probability that the solution approaches a point y after n steps is

(224)

Here the binomial coefficient should be interpreted as 0 whenever m is not an integer in the interval [0, n], and n is the total number of steps. Obviously, the variance D = y² performs an asymmetric random walk restricted by the condition that D is non-negative, while Eq. (224) is to be replaced by the following

(225)
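The probabilities (224)-(225) are elementary to tabulate. In the sketch below the binomial coefficient is read as 0 whenever m is not an integer in [0, n], as stated above; the restricted walk for D is obtained by folding the mass that would cross D = 0 back onto the non-negative side (a standard reflection construction, assumed here since Eq. (225) is not reproduced).

```python
from math import comb

# Symmetric unrestricted walk (224) and a reflected, non-negative walk for D.

def walk_prob(y, n):
    m = (n + y) / 2
    if m != int(m) or not (0 <= m <= n):
        return 0.0
    return comb(n, int(m)) / 2 ** n

n = 6
print([round(walk_prob(y, n), 4) for y in range(-n, n + 1)])

def restricted_prob(y, n):
    # fold the negative half onto y >= 0 (reflection at the origin)
    return walk_prob(y, n) + (walk_prob(-y, n) if y > 0 else 0.0)

print([round(restricted_prob(y, n), 4) for y in range(0, n + 1)])
```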


Thus, the probability density of the velocity v described by the ODE (221) jumps randomly from flatter to sharper distributions and vice versa, so that increases and decreases of the entropy randomly alternate, and that makes the system “alive”. Finally, the model (56), (57) can be further generalized by the introduction of delayed or advanced time

(226)

(227)

When τ > 0, the dynamics is driven by a feedback from the past (memories); when τ < 0, the dynamics is driven by a feedback from the future (predictions). Although little is known about the structure of the solution to Eqs. (226), (227) (including existence, uniqueness, stability, etc.), the usefulness of such a model extension is obvious for inverse problems, when the solution is given and the model parameters are to be determined (see Section 8, Eqs. (190)-(201)).

C. Summary

Thus, we have introduced the First Principle for modeling the behavior of living systems. The structure of the model is quantum-inspired: it is obtained from quantum mechanics by replacing the quantum potential with the information potential, Figure 8. As a result, the model captures the most fundamental property of life: progressive evolution, i.e. the ability to evolve from disorder to order without any external interference. The mathematical structure of the model can be obtained from the Newtonian equations of motion coupled with the corresponding Liouville equation via information forces. The unlimited capacity for increase of complexity is provided by the interaction of the system with its images via chains of reflections: what do you think I think you think…. All these specific non-Newtonian properties equip the model with levels of complexity that match the complexity of life, and that makes the model applicable for the description of the behavior of ecological, social and economic systems.

Acknowledgment

The research described in this paper was performed at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.


References

Arbib, M., and Hesse, M., 1986, The Construction of Reality, Cambridge University Press.
Berg, H., 1983, Random Walks in Biology, Princeton University Press, New Jersey.
Gordon, R., 1999, The Hierarchical Genome and Differential Waves, World Scientific.
Haken, H., 1988, Information and Self-organization, Springer, N.Y.
Huberman, B., 1988, The Ecology of Computation, North-Holland, Amsterdam.
Lefebvre, V., 2001, Algebra of Conscience, Kluwer Acad. Publ.
Mead, C., 1989, Analog VLSI and Neural Systems, Addison-Wesley.
Mikhailov, A., 1990, Foundations of Synergetics, Springer, N.Y.
Risken, H., 1989, The Fokker-Planck Equation, Springer, N.Y.
Pearl, J., 1986, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo.
Prigogine, I., 1961, Thermodynamics of Irreversible Processes, John Wiley, N.Y.
Prigogine, I., 1980, From Being to Becoming, Freeman and Co., San Francisco.
Whitham, G., 1974, Linear and Nonlinear Waves, Wiley-Interscience.
Yaguello, M., 1998, Language through the Looking Glass, Oxford University Press.
Zak, M., 1989, Terminal attractors for associative memory in neural networks, Physics Letters A, 133, No. 1-2, 18-22.
Zak, M., 1990, Creative dynamics approach to neural intelligence, Biological Cybernetics, 64, 15-23.
Zak, M., 1992, Terminal model of Newtonian dynamics, Int. J. of Theor. Phys., 32, 159-190.
Zak, M., 1994, Postinstability models in dynamics, Int. J. of Theor. Phys., 33, No. 11, 2215-2280.
Zak, M., 1999a, Physical invariants of biosignature, Phys. Letters A, 255, 110-118.
Zak, M., 1999b, Quantum analog computing, Chaos, Solitons & Fractals, 10, No. 10, 1583-1620.
Zak, M., 2000a, Dynamics of intelligent systems, Int. J. of Theor. Phys., 39, No. 8, 2107-2140.
Zak, M., 2000b, Quantum decision-maker, Information Sciences, 128, 199-215.
Zak, M., 2002a, Entanglement-based self-organization, Chaos, Solitons & Fractals, 14, 745-758.
Zak, M., 2002b, Quantum evolution as a nonlinear Markov process, Foundations of Physics Letters, 15, No. 3, 229-243.
Zak, M., 2003, From collective mind to communications, Complex Systems, 14, 335-361.
Zak, M., 2004, Self-supervised dynamical systems, Chaos, Solitons & Fractals, 19, 645-666.
Zak, M., 2005a, From reversible thermodynamics to life, Chaos, Solitons & Fractals, 1019-1033.
Zak, M., 2005b, Stochastic representation of chaos using terminal attractors, Chaos, Solitons & Fractals, 24, 863-868.
Zak, M., 2006a, Expectation-based intelligent control, Chaos, Solitons & Fractals, 28, No. 5, 616-626.
Zak, M., 2007a, Complexity for survival of Livings, Chaos, Solitons & Fractals, 32, No. 3, 1154-1167.

Zak, M., 2007b, From quantum entanglement to mirror neuron, Chaos, Solitons & Fractals, 34, 344-359.
Zak, M., 2007c, Quantum-classical hybrid for computing and simulations, Mathematical Methods, Physical Models and Simulation in Science & Technology (in press).

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 133-149 © 2010 Nova Science Publishers, Inc.

Chapter 5

THEORETICAL PHYSICS OF DNA: NEW IDEAS AND TENDENCIES IN THE MODELING OF THE DNA NONLINEAR DYNAMICS

L.V. Yakushevich Institute of Cell Biophysics of the Russian Academy of Sciences Pushchino, Moscow region, 142290 Russia

Abstract

Theoretical studies of DNA nonlinear dynamics, successfully started with the model of Englander [1] in 1980, were intensively developed at the end of the 20th century. Most of the proposed models and the results obtained have been summarized in the reviews [2-5] and books [6-9]. And what happened after that? What new ideas, results and tendencies can one observe in this field of science now? Here we describe some of them.

1. Introduction

Modeling of DNA nonlinear dynamics is one of the most promising fields of modern theoretical physics. It was started in 1980, when Englander et al. proposed the first nonlinear model Hamiltonian of DNA [1]. The approach was developed in the works of Yomosa [10-11], Takeno and Homma [12-13], Krumhansl et al. [14-15], Fedyanin et al. [16-18], Yakushevich et al. [19-22], Zhang [23], Prohofsky [24], Christiansen and Muto [25-27], van Zandt [28], Peyrard et al. [29-30], Dauxois et al. [31], Gaeta [32-33], Salerno [34], Bogolubskaya and Bogolubsky [35], Hai [36], Gonzalez and Martin-Landrove [37], Volkov [38], Barbi et al. [39-40], Campa [41], Kovaleva et al. [42] and Sanches et al. [43], where Englander’s model has been improved by modifying the model Hamiltonian, defining the dynamical parameters more accurately, finding new soliton-like solutions, considering the statistical properties of DNA solitons, and calculating correlation functions. Most of the results obtained at the end of the 20th century have been summarized in the reviews of Scott [2], Zhou and Zhang [3], Yakushevich [4] and Gaeta et al. [5], in the collection of lectures given by participants of the International workshop in Les Houches (France, 1994) [6], in selected paragraphs of the monographs of Davydov [7] and Yakushevich [8], and in the book of Yakushevich [9]. And what happened after that? What new ideas, results and tendencies can one observe in this field of science now? In this paper we describe some of them.

2. New Stimuli for Nonlinear Modeling of DNA

At the end of the previous century, the main stimulus for the nonlinear modeling of DNA was to find a possible relation between the functional and physical (and especially dynamical) properties of DNA. In the case of success, investigators hoped to obtain a new effective instrument for the interpretation of the DNA code. This line in nonlinear DNA science was started by Salerno [34, 44] and is now continued in the works of Sanches et al. [45-46]. In this century, however, another stimulus has appeared. Now investigators hope to apply the physical (and especially dynamical) properties of DNA in nanotechnology. They consider DNA an ideal instrument for constructing nanodevices. An important contribution to the appearance and development of this line of DNA science was made by experiments on charge transfer in DNA and by experiments on single DNA molecule manipulations.

2.1. Charge-Transfer in DNA

Electronic excitations and the motion of electronic charges are well known to play a significant role in a wide range of macromolecules of biological interest [47]. The double helical DNA has in its core a stacked array of base pairs. The bases possess an aromatic π-system in contact with those of the neighboring residues, and these linked π-systems represent a unique system which could serve as a wire to convey electrons through the DNA. In spite of this, for a long time many scientists believed that DNA molecules, like proteins, were insulators and could not facilitate long-range charge transfer. There were also intermediate opinions whereby DNA might serve as a semiconductor, relaying a charge only in certain situations. Only at the end of the 1980s and in the 1990s did the Barton group at the California Institute of Technology publish a series of papers in Science [48-49] reporting that, in the DNA assemblies they constructed, damage can be promoted at a site some distance away from the site where a radical is injected into the DNA base pair stack. Barton believed that this damage was promoted through electron transfer mediated by the DNA double helix. Since that time, many other experimental works on charge transport along the DNA double helix have been published [51-56], and many various models have been proposed to describe charge transfer and charge transport along the DNA double helix. The models of simple tunneling [56-57], semiconducting energy gap [58-60], polaron hopping [61], and fluctuation-limited transport [62] are among them. However, none of these models is good enough, and the problem of theoretical modeling of the process still remains unsolved.

2.2. Manipulations with Single DNA Molecule

Single molecule manipulations are now becoming almost routine thanks to the remarkable progress of experimental tools. Investigators are able to unfold a protein by pulling, denature DNA by torsion, measure the elasticity of a single molecule or the torque of a molecular motor, and investigate the microscopic mechanics of protein-DNA interactions or the disruption of the double helix [63]. Micromanipulation experiments on proteins and on nucleic acids are based on magnetic beads, optical tweezers, micro-needles, biomembrane force probes and atomic force spectroscopy. They allow measurements of forces in the range from the “thermal” (fN) up to the rupture of covalent bonds (nN), and are based on the control of subnanometer displacements. So, because manipulations with a single DNA molecule, as well as charge-transfer experiments, are now considered important factors required for the construction of nanodevices, both are of great interest for physicists dealing with the nonlinear modeling of DNA.

3. Two Tendencies in Theoretical Studies of Nonlinear DNA Dynamics

In the theoretical modeling of nonlinear DNA dynamics one can observe two tendencies. The first is associated with the improvement of theoretical models to make them more realistic. The other is the opposite: investigators simplify models in order to obtain reliable and clear solutions and to apply them to a wider range of problems of DNA science.

3.1. Tendency to Complicate DNA Models by Including Additional Details of the DNA Structure and Interactions

The tendency to improve existing nonlinear models of internal DNA dynamics looks rather natural. The improvement is usually accompanied by the inclusion of more and more details of the DNA structure and motions that were earlier omitted. Below we present three examples which illustrate this tendency.

3.1.a. Inclusion of the Details on Differences in Mass of Bases in Watson-Crick Pairs

In most nonlinear DNA models, the difference in mass of the bases in Watson-Crick pairs is neglected, and symmetry of the two strands with respect to the common DNA axis is assumed. However, even in homogeneous DNA this symmetry is absent, because the difference in mass between adenine and thymine in A-T base pairs, and between guanine and cytosine in G-C base pairs, is rather substantial and should be taken into account. The difference has recently been taken into account in [22], where the proposed model Hamiltonian had the following form

H = T^h + V^h_|| + V^h_⊥. (1)

Here the kinetic energy (T^h), the energy of interactions along the chains (V^h_||) and the energy of interactions between bases in pairs (V^h_⊥) are determined by the formulas

T^h = ∑n {(m1r1²/2)(dϕn,1/dt)² + (m2r2²/2)(dϕn,2/dt)²}, (2)

V^h_|| = ∑n {K1r1²(ϕn,1 - ϕn-1,1)² + K2r2²(ϕn,2 - ϕn-1,2)²}, (3)

V^h_⊥ = ∑n k1-2 {r1(r1 + r2)(1 - cosϕn,1) + r2(r1 + r2)(1 - cosϕn,2) - r1r2[1 - cos(ϕn,1 - ϕn,2)]}, (4)

where ϕn,i is the angular displacement of the nth base of the ith chain from its equilibrium position; ri is the distance between the center of mass of the ith base and the nearest sugar-phosphate chain; a is the distance between neighboring bases along the chains; mi is the mass of the bases of the ith chain; Ki is the coupling constant along the sugar-phosphate chain; k1-2 is the force constant that characterizes interactions between bases in pairs; n = 1, 2, …, N; i = 1, 2. The model Hamiltonian (1) takes into account the difference in mass of the bases in pairs as well as the difference in distance between the centers of mass of the bases and the nearest sugar-phosphate chain. This model is named the asymmetrical model. The dynamical equations corresponding to Hamiltonian (1) then have the form

m1r1²(d²ϕn,1/dt²) = K1r1²[(ϕn-1,1 - ϕn,1) - (ϕn,1 - ϕn+1,1)] - k1-2[r1(r1 + r2)sinϕn,1 - r1r2 sin(ϕn,1 - ϕn,2)], (5)

m2r2²(d²ϕn,2/dt²) = K2r2²[(ϕn-1,2 - ϕn,2) - (ϕn,2 - ϕn+1,2)] - k1-2[r2(r1 + r2)sinϕn,2 - r1r2 sin(ϕn,2 - ϕn,1)]. (6)

Soliton-like solutions of equations (5)-(6) are presented in Figure 1. They were obtained numerically with the help of the variational technique [64], and close results have recently been obtained by analytical methods in [65]. The first type, corresponding to the topological charge q = (1, 0), describes a kink moving along the first nucleotide chain and a small perturbation accompanying it and moving along the second chain. The second type, corresponding to the topological charge q = (0, 1), describes a kink moving along the second nucleotide chain and a small perturbation accompanying it and moving along the first chain. The third type, corresponding to the topological charge q = (1, 1), describes two kinks. The first kink moves along the first nucleotide chain, and the second one moves along the second chain; also, the first and second kinks are somewhat shifted with respect to each other.

Figure 1. Three types of soliton-like solutions obtained in the framework of asymmetrical model.
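The three solution types can be reproduced by direct integration of the discrete equations (5)-(6). The sketch below launches a q = (1, 0) kink; the parameter values are illustrative placeholders rather than fitted DNA constants.

```python
import numpy as np

# Leapfrog-style integration of the asymmetrical model (5)-(6) with free ends.

N, dt = 200, 0.02
m1, m2, r1, r2 = 1.0, 0.8, 1.0, 0.9
K1, K2, k12 = 5.0, 5.0, 1.0

x = np.arange(N)
phi1 = 4.0 * np.arctan(np.exp((x - N / 2) / 5.0))   # kink-like initial profile
phi2 = np.zeros(N)
v1, v2 = np.zeros(N), np.zeros(N)

def lap(p):
    out = np.zeros_like(p)
    out[1:-1] = p[:-2] - 2.0 * p[1:-1] + p[2:]
    return out

def accel(ps, po, m, rs, ro, K):
    pair = k12 * (rs * (rs + ro) * np.sin(ps) - rs * ro * np.sin(ps - po))
    return (K * rs ** 2 * lap(ps) - pair) / (m * rs ** 2)

for _ in range(2000):
    v1 += dt * accel(phi1, phi2, m1, r1, r2, K1)
    v2 += dt * accel(phi2, phi1, m2, r2, r1, K2)
    phi1 += dt * v1
    phi2 += dt * v2

# phi1 keeps its 2*pi step (the kink on the first strand); phi2 carries only
# the small accompanying perturbation, as in the first panel of Figure 1.
print(round(float(phi1[-1] - phi1[0]), 2), round(float(np.max(np.abs(phi2))), 3))
```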

3.1.b. Inclusion of the Details on Differences in Frequencies of Base Rotational Oscillations in Phase and Out of Phase

Modern methods of quantum chemistry permit the calculation of the low-frequency spectrum of DNA and, in particular, of the frequencies of base rotations in phase and out of phase, w1 and w2. These two frequencies can be used to improve the asymmetrical model (1) by including an additional term [42]

Vadd = ∑n k1-2 {(1/4)[1 - (w2/w1)](r1 + r2)²[1 - cos(ϕn,1 - ϕn,2)]}. (7)

As a result, instead of three types of soliton-like solutions (Figure 1), the improved asymmetrical model admits four types of soliton-like solutions (Figure 2). The first solution, with topological charge q = (1, 0), has the form of a smooth step and describes a kink moving along the first nucleotide chain. It is accompanied by a small-amplitude deformation moving along the second chain. The second solution, with topological charge q = (0, 1), has the form of a smooth step and describes a kink moving along the second nucleotide chain. It is accompanied by a small-amplitude deformation moving along the first chain. The third solution, with topological charge q = (1, 1), has the form of a step with respect to each component, and the steps are shifted in relation to each other. It was shown that such a solution represents the bound state of two topological solitons with charges q1 = (1, 0) and q2 = (0, 1). There are two equivalent states of this soliton: the “left” state ql = (1, 1)l and the “right” state qr = (1, 1)r, where the soliton of charge q1 is located to the left or to the right of the soliton of charge q2. The fourth solution, with the charge q = (1, -1), also has the form of a step with respect to each component, but the steps are oriented in different directions. This soliton also has two states: left and right ones.

Figure 2. Four types of soliton-like solutions obtained in the framework of the improved asymmetrical model.

3.1.c. Inclusion of the Details on Interactions between Bases in Watson-Crick Pairs

Usually interactions between successive nucleotides (shown as blue or red disks in Figure 3) are modeled as a harmonic potential, while the interactions between bases in Watson-Crick pairs are modeled by a potential which, albeit harmonic in the distance, becomes anharmonic when described in terms of rotational angles. More precisely, with L the distance between the points B1 and B2 (see Figure 4) of the disks representing nucleotides, and L0 the distance between the points B1 and B2 in the equilibrium configuration, the intrapair potential is

V^h_⊥ = (k/2)(L - L0)². (8)

Figure 3. DNA model. Nucleotides are presented as identical disks of radius r, which can rotate around their centers.

Figure 4. Further details of the base pair modeling. The points B1 and B2 are shown. The disks’ centers lie at a distance A = (r + L0/2) from the double helix axis.

Usually investigators use the approximation L0 = 0, which leads to a number of computational simplifications. However, as pointed out by Gonzalez and Martin-Landrove [37], L0 = 0 is a singular case, in that the description one thus obtains is not structurally stable: as soon as we consider L0 ≠ 0, certain qualitative features of the model dynamics are changed. In the paper of Gaeta [66], the model is analyzed beyond this approximation. As a result, it was shown that in this case soliton solutions are still present. The qualitative form of the soliton solutions is little changed; as for their width, it is very moderately increased (see Figure 5), still remaining of the same order of magnitude.

Figure 5. The (0,1) soliton without approximation (solid line) and with contact approximation (dotted line).

3.2. Tendency to Simplify DNA Models Up to the Model of Englander

Another tendency in the nonlinear modeling of internal DNA dynamics consists in simplifying existing models in order to be able to consider a wider range of nonlinear problems involving internal DNA dynamics. We present below several examples which illustrate this tendency. In all of them, the problems considered are solved in the framework of Englander’s model [1], where the dynamical equation has the form of the sine-Gordon equation, which in turn is a simplified version of the model system (5)-(6). Indeed, if we assume that the rotational motions of the bases in each of the DNA chains are independent, equations (5)-(6) can be transformed into two independent discrete sine-Gordon equations

m1r1²(d²ϕn,1/dt²) = K1r1²(ϕn-1,1 - 2ϕn,1 + ϕn+1,1) - k1-2 r1² sinϕn,1, (9)

m2r2²(d²ϕn,2/dt²) = K2r2²(ϕn-1,2 - 2ϕn,2 + ϕn+1,2) - k1-2 r2² sinϕn,2, (10)

where equation (9) describes the rotational dynamics of the bases in the first chain, the bases in the second chain being assumed immovable, and equation (10) describes the rotational dynamics of the bases in the second chain, the bases in the first chain being assumed immovable.

3.2a. Englander’s Model Applied to Study Effects of Dissipation

To solve the problem of the influence of dissipation on the velocity of a local conformational distortion moving along the DNA, investigators often use the continuum version of one of equations (9)-(10) as the basic (unperturbed) equation

Iφtt - K’a²φzz + V sinφ = 0, (11)

and then they add the term

-βφt, (12) to describe the effect of dissipation. So the whole model equation has the form

Iφtt - K’a²φzz + V sinφ = -βφt. (13)

Here K’ = K1r1², V = k1-2 r1², and β is a coefficient of dissipation. In [67], equation (13) was solved analytically by the energetic method [68], and a formula for the kink velocity υ(t) was obtained

υ(t) = υ0 γ0 exp(-βt/I) [1 + (υ0γ0/C0)² exp(-2βt/I)]^(-1/2), (14)

where C0 = (K’a²/I)^(1/2), γ0 = [1 - (υ0/C0)²]^(-1/2), and υ0 is the initial kink velocity. Results of calculations of the velocities of kinks moving along different types of homogeneous polynucleotide chains are shown in Figure 6.

Figure 6. Decrease of kink velocity due to effects of dissipation. Calculations were made for four cases of homogeneous polynucleotide chains: the chain consisting of only adenine bases (red line), the chain consisting of only thymine bases (blue line), the chain consisting of only guanine bases (green line) and the chain consisting of only cytosine bases (black line). The model values υ0 = 189 m/s, β = 4.25 × 10^-34 J·s were used, and the values of the parameters I, K’ and V for different types of polynucleotide chains were taken from [69].
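Formula (14) can be evaluated directly. In the sketch below, υ0 and β follow the caption of Figure 6, while I, K’ and V are illustrative placeholder magnitudes: the fitted per-chain values are taken from [69] in the text and are not reproduced in this excerpt.

```python
import numpy as np

# Kink slow-down under dissipation, Eq. (14), with assumed SI magnitudes.

I, Kp, a, V = 6.0e-44, 2.0e-18, 3.4e-10, 1.5e-20   # placeholders, not from [69]
beta, v0 = 4.25e-34, 189.0

C0 = np.sqrt(Kp * a ** 2 / I)                 # limiting (sound) speed
g0 = 1.0 / np.sqrt(1.0 - (v0 / C0) ** 2)

def v_kink(t):
    e = np.exp(-beta * t / I)
    return v0 * g0 * e / np.sqrt(1.0 + (v0 * g0 / C0) ** 2 * e ** 2)

for t in (0.0, 5e-11, 1e-10, 3e-10):
    print(f"t = {t:.1e} s   v = {v_kink(t):7.2f} m/s")
```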

3.2.b. Englander’s Model Applied to Study Effects of External Field

To take into account the effects of an external field, for example a constant torque F0, the basic sine-Gordon equation (11) is usually augmented by the term

F0. (15)

So, the model equation takes the form

Iφtt - K’a²φzz + V sinφ = F0. (16)

Equation (16) has been solved analytically by the energetic method [68], which made it possible to obtain the kink velocity υ(t)

υ(t) = ± [F0πC0t/(4I^(1/2)V^(1/2)) + γ0υ0] {1 + [F0πC0t/(4I^(1/2)V^(1/2)) + γ0υ0]²}^(-1/2). (17)

Results of calculations of the velocities of kinks moving along different types of homogeneous polynucleotide chains are shown in Figure 7.

Figure 7. Increase of kink velocities due to the action of the constant torque F0 = 3.12 × 10^-22 J. Calculations were made for four different types of homogeneous polynucleotide chains: the chain consisting of only adenine bases (red line), of only thymine bases (blue line), of only guanine bases (green line) and of only cytosine bases (black line). The model value υ0 = 189 m/s was used, and the values of the parameters I, K’ and V for different types of polynucleotide chains were taken from [69].

3.2.c. Englander’s Model Applied to Study the Balance between the Action of Dissipation and External Fields

As follows from Figure 6 and Figure 7, the effects of dissipation and of an external field lead to opposite results. The first leads to a decreasing kink velocity, the second to an increasing one. The energetic method permits finding the condition under which these two effects become balanced

υ0 = C0 [1 + (16Vβ²)/(π²IF0²)]^(-1/2). (18)

If the parameters υ0 and β are fixed, it is easy to find the value of the critical external torque F0^crit which is necessary to ensure the movement of the kink with constant velocity

F0^crit = (4βγ0υ0/π)(V/Ka²)^(1/2). (19)
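For completeness, a one-line evaluation of the balance condition (19), with the same illustrative placeholder magnitudes as in the previous sketch:

```python
from math import pi, sqrt

# Critical torque (19) balancing dissipation against the external field.

def f0_crit(beta, v0, V, K, a, I):
    C0 = sqrt(K * a ** 2 / I)
    g0 = 1.0 / sqrt(1.0 - (v0 / C0) ** 2)
    return (4.0 * beta * g0 * v0 / pi) * sqrt(V / (K * a ** 2))

print(f0_crit(beta=4.25e-34, v0=189.0, V=1.5e-20, K=2.0e-18, a=3.4e-10, I=6.0e-44))
```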

3.2.d. Englander’s Model Applied to Study DNA Kink Propagation through the Boundary between Two Homogeneous Regions

To model inhomogeneous DNA having a boundary between two homogeneous regions, one region consisting of only i-i’ base pairs and the other region consisting of only j-j’ base pairs (i, i’, j, j’ = A, T, C, G),

it is enough to modify the basic equation (11) in the following way [70]

Ii[1 + Θ(z-z*)ΔI^(i,j)]φtt - Kia²[1 + Θ(z-z*)ΔK^(i,j)]φzz + Vi[1 + Θ(z-z*)ΔV^(i,j)]sinφ = 0. (20)

Here it is assumed that the boundary between the regions is placed at the point z*. The coefficients ΔI^(i,j), ΔK^(i,j), ΔV^(i,j) and the function Θ(z-z*) are determined by the formulas

ΔI^(i,j) = (Ij - Ii)/Ii; ΔK^(i,j) = (Kj - Ki)/Ki; ΔV^(i,j) = (Vj - Vi)/Vi. (21)

⎧ 0, if z ≤ z*; Θ(z-z*) = ⎨ (22) ⎩ 1, if z > z*. 144 L.V. Yakushevich

Solutions of equation (20) were considered analytically and numerically in [70]. And different kink behavior was found. To illustrate this we present below results of numerical calculations made for three different types of the boundary.

1. AG boundary. In this case i = A, j = G, and the sine-Gordon kink propagates through the following double chain

Results of calculations are shown in Figure 8. They indicate that kink velocity decreases when crossing the boundary.

2. AT boundary. In this case i = A, j = T, and the sine-Gordon kink propagates through the following double polynucleotide chain

Results of calculations are shown in Figure 9. According to them kink velocity increases when crossing the boundary.

3. TG boundary. In this case i = T, j = G, and the sine-Gordon kink propagates through the double polynucleotide chain of the following type

And the results of calculations shown in Figure 10, indicate that the kink reflects from the boundary. Theoretical Physics of DNA 145

4. Conclusive Remarks

New ideas and tendencies in theoretical physics of DNA have been briefly described above. We showed that nonlinear line of DNA science continues to develop with increasing intensity, and this success is tightly connected with new experimental data on charge-transfer in DNA and on single molecule manipulations, which gave a new impulse for studying physical (and especially nonlinear dynamical) properties of DNA. As a result, many scientists prefer now to model synthetic DNA sequences instead of the real ones, and this gives raise a new tendency: to simplify the basic dynamical models of DNA up to Englander’s one. Surely these new features of DNA science does not decrease an importance of “old” line of DNA science associated with finding the relation between functional and physical (and especially dynamical) properties of DNA. This direction remains urgent because of its relation with the problem of interpretation of DNA code, which is far from its finishing.

Figure 8. Propagation through AG boundary.

Figure 9. Propagation through AT boundary. 146 L.V. Yakushevich

Figure 10. Reflection from TG boundary.

Dedication

Dedicated to the memory of Al Scott, eminent scientist, teacher and person.

References

[1] Englander S.W., Kallenbach N.R., Heeger A.J., Krumhansl J.A. and Litwin A. Nature of the open state in long polynucleotide double helices: possibility of soliton excitations. Proc. Natl. Acad. Sci. USA, 77, 7222-7226 (1980). [2] Scott A.C. Solitons in biological molecules. Comments Mol.Cell. Biol.3, 5-57 (1985). [3] Zhou G.-F. and Zhang Ch.-T. A short review on the nonlinear motion in DNA. Phys. Scripta, 43, 347-352 (1991). [4] Yakushevich L.V. Nonlinear dynamics of biopolymers: theoretical models, experimental data. Quart. Rev. Biophys. 26, 201-223 (1993). [5] Gaeta G., Reiss C., Peyrard M. and Dauxois T. Simple models of nonlinear DNA dynamics. Rev. Nuovo Cimento, 17, 1-48 (1994). [6] Nonlinear Excitations in Biomolecules. M. Peyrard (ed.) Springer, Berlin (1995). [7] Davydov A.S. Solitons in Bioenergetics. Naukova Dumka, Kiev (1986). [8] Yakushevich L.V. Methods of Theoretical Physics and Their Applications to Biopolymer Sciences. Nova Science Publishers, New York (1996). [9] Yakushevich L.V. Nonlinear Physics of DNA (second, revised version). Wiley, Weinheim, 2004. [10] Yomosa S. Soliton excitations in deoxyribonucleic acid (DNA) double helices. Phys. Rev. A-27, 2120-2125 (1983). [11] Yomosa S. Solitary excitations in deoxyribonucleic acid (DNA) double helices. Phys. Rev. A-30, 474-480 (1984.) [12] Takeno S. and Homma S. Topological solitons and modulated structure of bases in DNA double helices. Prog. Theor. Phys. 70, 308-311 (1983). Theoretical Physics of DNA 147

[13] Homma S. and Takeno S. A coupled base-rotator model for structure and dynamics of DNA. Prog. Theor. Phys. 72, 679-693 (1984). [14] Krumhansl J.A. and Alexander D.M. Nonlinear dynamics and conformational excitations in biomolecular materials. In: Structure and Dynamics: Nucleic Acids and Proteins. E. Clementi and R.H. Sarma (eds), Adenine Press, New York (1983), pp. 61-80. [15] Krumhansl J.A., Wysin G.M., Alexander D.M., Garcia A., Lomdahl P.S. and Layne S.P. Further theoretical studies of nonlinear conformational motions in double-helix DNA. In: Structure and Motion: Membranes, Nucleic Acids and Proteins. E. Clementi, G. Corongiu, M.H. Sarma and R.H. Sarma (eds), Adenine Press, New York (1985), pp. 407-415. [16] Fedyanin V.K. and Yakushevich L.V. Scattering of neutrons and light by DNA solitons. Stud. biophys. 103, 171-178 (1984). [17] Fedyanin V.K., Gochev I. and Lisy V. Nonlinear dynamics of bases in continual model of DNA double helices. Stud. biophys. 116, 59-64 (1986). [18] Fedyanin V.K. and Lisy V. Soliton conformational excitations in DNA. Stud. biophys. 116, 65-71 (1986). [19] Yakushevich L.V. The effects of damping, external fields and inhomogeneity on the nonlinear dynamics of biopolymers. Stud. biophys. 121, 201-207 (1987). [20] Yakushevich L.V. Nonlinear DNA dynamics: a new model. Phys. Lett. A-136, 413-417 (1989). [21] Yakushevich L.V. Investigation of a system of nonlinear equations simulating DNA torsional dynamics. Stud. biophys. 140, 163-170 (1991). [22] Yakushevich L.V., Savin A.V. and Manevitch L.I. On the internal dynamics of topological solitons in DNA. Phys. Rev. E-66, 016614-29 (2002). [23] Zhang Ch.-T. Soliton excitations in deoxyribonucleic acid (DNA) double helices. Phys. Rev. A-35, 886-891 (1987). [24] Prohofsky E.W. Solitons hiding in DNA and their possible significance in RNA transcription. Phys. Rev. A-38, 1538-1541 (1988). [25] Muto V., Holding J., Christiansen P.L. and Scott A.C. Solitons in DNA. J. Biomol. Struct. Dyn. 5, 873-894 (1988). [26] Muto V., Scott A.S. and Christiansen P.L. Thermally generated solitons in a Toda lattice model of DNA. Phys. Lett. A-136, 33-36 (1989). [27] Muto V., Lomdahl P.S. and Christiansen P.L. Two-dimensional discrete model for DNA dynamics: longitudinal and denaturation. Phys. Rev. A-42, 7452- 7458 (1990). [28] Van Zandt L.L. DNA soliton realistic parameters. Phys. Rev. A-40, 6134-6137 (1989). [29] Peyrard M. and Bishop A.R. Statistical mechanics of a nonlinear model for DNA denaturation. Phys. Rev. Lett. 62, 2755-2758 (1989). [30] Dauxois T., Peyrard M. and Willis C.R. Localized breather-like solutions in a discrete Klein-Gordon model and application to DNA. Physica D-57, ,267-282 (1992). [31] Dauxois T. Dynamics of breathers modes in a nonlinear helicoidal model of DNA. Phys. Lett. A-159, 390-395 (1991). [32] Gaeta G. On a model of DNA torsion dynamics. Phys. Lett. A-143, 227-232 (1990). [33] Gaeta G. Solitons in planar and helicoidal Yakushevich model of DNA dynamics. Phys. Lett. A-168, 383-389 (1992). 148 L.V. Yakushevich

[34] Salerno M. Discrete model for DNA-promotor dynamics. Phys. Rev. A-44, 5292-5297 (1991). [35] Bogolubskaya A.A. and Bogolubsky I.L. Two-component localized solutions in a nonlinear DNA model. Phys. Lett. A-192, 239-246 (1994). [36] Hai W. Kink couples in deoxyribonucleic acid (DNA) double helices. Phys. Lett. A-186, 309-316 (1994). [37] Gonzalez J.A. and Martin-Landrove M. Solitons in a nonlinear DNA model. Phys. Lett. A-191, 409-415 (1994). [38] Volkov S.N. Conformational transition. Dynamics and mechanism of long-range effects in DNA. J. Theor. Biol. 143, 485-496 (1990). [39] Barbi M., Cocco S. and Peyrard M. Helicoidal model of DNA opening. Phys. Lett. A- 253, 358-369 (1999). [40] Barbi M., Cocco S., Peyrard M. and Ruffo S. A twist opening model for DNA. J. Biol. Phys. 24, 97-114 (1999). [41] Campa A. Buble propagation in a helicoidal molecular chain. Phys. Rev. E-63, 021901- 10 (1999). [42] Kovaleva N.A., Savin A.V., Manevitch L.I., Kabanov A.V., Komarov V.M., Yakushevich L.V. Njpological solitons in heterogeneous DNA molecule, Polymer Sci., A-48, 278-293 (2006). [43] Dominguez-Adame F., Sanchez A., Kivshar Yu.S. Soliton pinning by long-range order in aperiodic systems. Phys, Rev. E, 52, R2183-R2186 (1995). [44] Salermo M, Kivshar Yu. DNA promoters and nonlinear dynamics. Phys. Lett. A,-193, 263-266 (1994). [45] Guenda S, Sanchez A. Disorder and fluctuations in nonlinear excitations in DNA. Fluctuation and Noise Lett. 4, L491-L504 (2004). [46] Guenda S, Sanchez A. Nonlinear excitations in DNA: A periodic models versus actual genome sequences. Phys. Rev. E-70, 051903-1 – 051903-8 (2004). [47] Eley D.D. and Leslie R.B., Nature, (London) 197, 898-899 (1963). [48] Barton J.K. Metals and DNA: Molecular Left-Handed Complements. Science, 233, 727 –734 (1986). [49] Murphy C.J., Arkin M.R., Jenkins Y., Ghatlia N.D., Bossmann S., Turro N.J. and Barton J.K. Long Range Photoinduced Electron Transfer through a DNA Helix. Science 262, 1025 –1029 (1993). [50] Dandliker P.J., Holmlin R.E., and Barton J.K. Oxidative Thymine Dimer Repair in the DNA Helix. Scienc,e 275, 1465 –1468 (1997). [51] Tran P., Alavi B. and Gruner G. Charge transport along the λ-DNA double helix. Phys. Rev. Lett. 85, 1564-1567 (2000). [52] Jerome D. and Schulz H.J. Organic conductors and superconductors. Adv. Phys. 31,299- 400, (1982). [53] Devreux F., Nechtschein M. and Grüner G. Charge Transport in the Organic Conductor Qn (TCNQ)2. Phys. Rev. Lett. 45, 53. (1980). [54] Mihaly, G.; Said, G.; Gruner, G. And Kertesz, M. 2-3 Benzacridinium (TCNQ)2: A Small Band Gap Semiconductor. Solid State Comm. 21, 1115-1118 (1977). [55] Ye Y.J., Chen R.S., Martinez A., Otto P. and Ladik J., Calculation of Hopping Conductivity in Aperiodic Nucleotide Base Stacks. Solid State Commun. 112 139-144 (1999). Theoretical Physics of DNA 149

[56] Jortner J., Bixon M., Langenbacher Th. and Michel-Beyerle M.E. Charge transfer and transport in DNA. Proc. Natl. Acad. Sci. USA 95, 12759–12765 (1998). [57] Hopfield J.J. Electron transfer between biological molecules by thermally activated tunneling. Proc. Natl. Acad. Sci. USA 71, 3640-3644 (1974). [58] Giamarchi T. Mott transition in one dimension. Physica (Amsterdam) 230 B, 975-980 (1997). [59] Gruner G. Density Waves in Solids. Addison-Wesley Publishing Co., Reading (1994). [60] Bixon M. and Jortner J. Energetic Control and Kinetics of Hole Migration in DNA. J. Phys. Chem. 104, 3906-3913 (2000). [61] Henderson, P.T., Jones, D.M., Kan, Y., Hampikian, Schuster G.B. Long-Distance Charge Transport in Duplex DNA: The Phonon Assisted Polaron-Like Hopping Mechanism. Proc. Natl. Acad. Sci. USA, 96, 8353-8358. (1999). [62] Bruinsma R., Gruner G., DOrsogna M.R., Rudnik J. Fluctuation-facilitated charge migration along DNA. Phys. Rev. Lett. 85, 4393–4396 (2000). [63] Single molecule studies: from the experiments to their analysis. M. Peyrard (ed.) Abstracts of CECAM Workshop, 24-26 September 2001, Lyon, France (2001). [64] Christiansen P.L., Zolotaryuk A.V. and Savin A.V. Solitons in an isolated helix chain. Phys. Rev. E-56, 877-889 (1997). [65] Zakiryanov F.K., Yulmuhametov L.R., Shamsutdinov M.A. Solitons in the double DNA chain. In: Structural and dynamical effects in the ordered matter. Ufa:RIC Bushkirskii State Iniversity (2006) pp. 101-106. [66] Gaeta G. Solitons in the Yakushevich model of DNA beyond the contact approximation. Phys. Rev. E-74, 021921-1 – 021921-9 (2006) [67] Yakushevich L.V, Krasnobaeva L.A. Effects of Dissipation and external fields on the Dynamics of conformational distorsions in DNA. Biophysics 52, 237-243 (2007). [68] McLaughlin D.W., Scott A.C. Perturbation analysis of fluxon dynamics. Phys. Rev. 18, 1652-16-80 (1978). [69] Yakushevich L.V., Krasnobaeva L.A., Shapovalov A.V., Quintero N.R. One- and two- soliton solutions of the sine-Gordon equation as applied to DNA. Biophysics, 50, 404- 409 (2005). [70] Yakushevich L.V., Savin L.V., Manevitch L.I.. Propagation of nonlinear conformational waves in DNA: overcoming the boundary between two homogeneous overcoming the boundary between two homogeneous. In: Mathematics. Computer. Education. Vol. 12 (2005), pp. 865-876.

In: Crossing in Complexity ISBN 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 151-180 c 2010 Nova Science Publishers, Inc.

Chapter 6

MATHEMATICAL AND DATA MINING CONTRIBUTIONSTO DYNAMICSAND OPTIMIZATION OF GENE-ENVIRONMENT NETWORKS

Gerhard–Wilhelm Webera∗, Pakize Taylanb†, Bas¸ak Akteke-Ozt¨ urk¨ a, and Om¨ ur¨ Ugur˘ a aInstitute of Applied Mathematics, Middle East Technical University, Ankara, Turkey bDepartment of Mathematics, Dicle University, Diyarbakır, Turkey

Abstract

This paper further introduces continuous optimization into the fields of computa- tional biology and environmental protection which belong to the most challenging and emerging areas of science. It refines earlier ones of our models on gene-environment patterns by the use of optimization theory. We emphasize that it bases on and presents work done in [61, 66]. Furthermore, our paper tries to detect and overcome some struc- tural frontiers of our methods applied to the recently introduced gene-environment networks. Based on the experimental data, we investigate the ordinary differential equations having nonlinearities on the right-hand side and a generalized treatment of the absolute shift term which represents the environmental effects. The genetic process is studied by a time-discretization, in particular, Runge-Kutta type discretization. The possibility of detecting stability and instability regions is being shown by a utilization of the combinatorial algorithm of Brayton and Tong which is based on the orbits of polyhedra. The time-continuous and discrete systems can be represented by means of matrices allowing biological implications, they encode and are motivated by our gene-environment networks. A specific contribution of this paper consists in a care- ful but rigorous integration of the environment into modeling and dynamics, and in further new sights. Relations to parameter estimation within modeling, especially, by using optimization, are indicated, and future research is addressed, especially towards

∗E-mail address: [email protected] †E-mail address: [email protected] 152 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al.

the use of stochastic differential equations. This practically motivated and theoreti- cally elaborated work is devoted for a contribution to better health care, progress in medicine, a better education and more healthy living conditions recommended.

1. Introduction

Biology, biotechnology, medicine and environmental engineering in these years are be- coming closer areas and they use applied mathematics, data mining and operational re- search. Most tasks lie in the wide fields of process optimization and prediction. In this paper, we focus on the modeling and forecasting of genetic processes with a first inclusion of the environment and its effects on life in its various forms. This paper bases on earlier studies done, especially, on [61, 66], and it firstly introduces a modeling of stochastic dif- ferential equations into our studies. Here, we especially address parameter estimation and the use of optimization theory.

1.1. Analyzing and Optimizing in Biotechnology In biotechnology and computational biology, the metabolism of the cell can be opti- mized by several ways, ranging from simply adjusting the organism’s environment to ge- netic modification of the organism. Prequisite is and intense analysis and correct modeling of the metabolism of interest for which a large number of analytical tools and modeling techniques exist. Suppose, a cell is in a certain metabolic state at time point tk. The gene-expression pat- tern at that particular time point can thus be defined as a vector Ek. The elements of this vector represent the transcript concentrations of individual genes. Our modeling approach described in Section 2. is based on these vectors. After modeling gene-expression data obtained from finitely many experiments, the stability analysis of this model described in Subsection 3.3. will give us a measure for the quality of the model. If we have a stable model for the expression of all genes, it is possible to analyze the behavior of subsets of these genes. We show a way to find rarified and stable networks by reducing (top-down approach) or increasing (bottom-up approach) the number of genes iteratively. It is expected that there are several such stable subnetworks from the previous knowledge. Hence, bottom-up approach is more appropriate with using different but well selected genes. Still, random subtraction during the top-down approach is unbiased and does not depend on prior knowledge. Either way we will get a candidate-gene list. To learn more about photobiological hydrogen production [66], the approach will be based on the identification of all genes in this list for which we enter hypothesis driven experimental research. We expect to increase the yields of photobiological hydrogen production by directed manipulation of either the candidate gene itself, its product or its regulation. We are still at the very beginning of a long-term project. But the application of methods derived from operational research (OR) to biotechnological as well as medical problems will certainly lead to new insights in the near future. In the following sections, we are concerned with both modeling of gene-expression data and the subsequent stability analysis. Mathematical and Data Mining Contributions... 153 1.2. Stability in Genetics and Approach to Matrices

In biology and statistics, tables and table works are important for recording processes and relationships. Our approach to understand and predict gene-expression patterns is an OR one and given by means of matrices. Matrices represent linear mappings. In this paper, matrices will encode genetic and, potentially, environmental effects. These effects can appear in higher potency by forming products of matrices. Their products, consisting of any finitely many factors, will stand for genetic processes where the possible asymptotic tendencies, i.e., stability or instability, are interesting for us. This paper is a contribution to mathematical stability analysis applied to gene-exp- ression patterns, based on an improved modeling compared with former approaches [24, 25, 56, 57, 69, 70]. Furthermore, we suggest an explicit inclusion of environmental factors into the genetic context.

1.3. Extracting Genetic Networks from Gene-Expression Data

DNA-microarray technology is widely used to monitor the expression values for large numbers of genes [12]. To clarify the precise connections of genetic network is a research problem which can be treated by mathematical modeling. It is a graph which consists of nodes for genes and of edges with weights for the influence that genes exercise on other genes. These influences between genes are aimed to be predicted and found. Such networks are constructed and analyzed by specifically developed mathematical methods. Here, we refine the model derived from differential equations by generalizing an additive shift term while referring to an extended space of model functions. For some investigations on gene and related networks in our line of research we refer to [1, 13, 17, 24, 26, 30, 41, 49, 69].

2. Modeling Gene-Expression Data

For modeling the gene-expression data, there are well-known approaches like Bayesian networks, Boolean networks, models derived from ordinary or piecewise linear differential equations, hybrid systems. There are advantages and disadvantages of these methods in terms of their goodness of data fit, computation time, capturing dynamics well, stability and other qualitative or quantitative aspects. It is quite common to use differential equations for modeling formalisms in mathemat- ical biology. First of all, modeling of regulatory interactions by them can provide a more accurate understanding of the physical systems. Secondly, there are well developed ap- proaches like dynamical systems theory to analyze such models. Thirdly, concerning that biological systems are being developed in continuous time, we prefer to use systems of dif- ferential equations which may incorporate instantaneous changes on their right-hand sides caused by thresholds of expression levels traversed. Subsection 2.4. discusses such an affine addition to the the right-hand side with its effects and biological meanings. 154 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al. 2.1. Modeling Gene Networks with Ordinary Differential Equations By the following ordinary differential equations (ODEs), we represent the relation be- tween variables of gene networks

E˙ i = fi(E)(i = 1, 2, . . . , n),

T where E = (E1,E2,...,En) (E = E(t), t ∈ I) is the vector of positive concentrations n of proteins, mRNAs, or small components. The functions fi : R → R are nonlinear and n is the number of genes. Here, the time interval I = (a, b) ⊆ R includes finitely many sample times tκ; it can be of short-, middle- or long-term. For the stability analysis given in this paper, the long-term case with b = ∞ is most important. Chen et al. [13] first proposed to use differential equation or dynamical system model consisting of mRNA and protein concentrations in the form of E˙ = ME, where M is a constant matrix and the vector E comprises the expression level of individual genes. Later on, this linear model on mRNA data of Bacillus subtilis was used by De Hoon and Imoto [40] to estimate M with maximum likelihood estimation method. In 2001, a more flexible model was proposed by Sakamoto and Iba [53]: E˙ i = fi(E1,E2,...,En), with fi being T functions of E = (E1,E2,...,En) determined by genetic programming and least-squares methods. At the beginning, we specified the models described above by regarding the model E˙ = M(E)E, in which the matrix M, usually a constant matrix, depends on E [24]. The multiplicative model form of the right-hand side M(E)E is a major candidate for the applicability of our stability analysis. This analysis and the underlying least-squares optimization problem on finding an approximate model were explained in [24, 25]; we restricted the solution space by assuming that the number of regulating factors for each gene is bounded (cf. Section 3.6.). In the following, we will study this model closer and extend it by including the environment.

2.2. Our Model and Its Possible Extensions Before coming to the discretization step and subsequent stability analysis, we present our recent model. In particular, we introduce an additional vector C(E) to describe the environmental state at a given time point. Here, C(E) can be adapted to any system under investigation. The advantage of this additional vector is the inclusion of real-time data for predicting the future development of the current state and its growth. No need to say that the factor can only be employed after gene expression has been successfully modeled and when some effects of growth conditions on gene expression in the organism under consideration are known or supposed.

2.2.1. A Quasi-Linear, Multiplicative Model

Let the n-column vector E = E(t) consist of gene-expression data at different times t. We denote the given finite set of experimental results as E¯0, E¯1,..., E¯l−1, where each n E¯κ ∈ R corresponds to the gene profile taken at an increasing sequence of sample times t¯κ. Mathematical and Data Mining Contributions... 155

Gebert et al. [30] refined the time-continuous model first formulated by Chen et al. [13] by taking into account that the interaction between variables is nonlinear but the number of associated regulating influences is bounded. This model was represented by the multiplica- tive nonlinear or, as we say, quasi-linear form of a continuous equation

(CE) E˙ = M(E)E.

Here, we refer to corresponding initial values E(t0) = E0, mostly E0 = E¯0. Note that (CE) is multiplicative with respect to E, and autonomous. This implies that trajectories do not cross themselves. The matrix M(E) is defined component-wise by a family of any class of functions including unknown parameters.

2.3. Gene Regulation — An Example The form of the matrix M(E) may be derived from the following system of differential equations [30, 66]:

ji ki ˙ + − Ei = ci − δiEi + reg fi,j + reg fi,k (i = 1, 2, . . . , n). j X=1 Xk=1 Here, the parameters ci ≥ 0 and δi ≥ 0 are coefficients or rates representing basic synthesis or basic degradation, and the sums correspond to activation and inhibition by other network + − components, respectively. These activation and inhibition functions reg fi,j and reg fi,k have been shown to possess a sigmoid shape [68]. The resulting matrix has the entries

mii−1 ci Ei mii := − δi + αii mii mii (i = 1, 2, . . . , n), and Ei Ei + θii mij −1 Ej mij = αij mij mij (i, j = 1, 2, . . . , n; i 6= j) Ej + θij

+ with αij ∈ R and θij, mij ∈ R0 . All parameters are collected in a vector y and can be estimated by data from DNA-microarray experiments. These representations of mii and mij reveal the discontinuities of poles. Taking into account that the expression levels are lying in certain bounded windows and the matrix coefficients cannot be unbounded, we can stay away from such singularities. Genes which are not expressed could be excluded from our study in advance. Some first discussions of the functional nature of M(E) and of its entries were undertaken in [24, 25]. T As an easy example of a (2 × 1)-vector E = (E1,E2) , e.g., the matrix M(E) could be [24, 25]

2 a1E1 + a2E1E2 a3E2 cos(E1) + a4 Ma1,a2,a3,a4 := 2 . a5,a6,a7,a8 a cos(E ) + a E a E + a E  5 2 6 1 7 1 8 2  Here, we have eight parameters in total and we note that the polynomial, trigonometric, but otherwise also exponential, etc., entries represent the growth, cyclicity or other kinds of changes in the concentrations. 156 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al.

Consequently, two different stages of problem enter our consideration concerning the parametrized entries of the matrices M(E). Firstly, the optimization problem of discrete (least-squares) approximation which can be written as

l−1 ˙ 2 minimize kMy(E¯κ)E¯κ − E¯κk , y 2 κ=0 X ˙ the vectors E¯κ being of the form of difference quotients based, however, on the experimen- tal data E¯κ and the times h¯κ = t¯κ+1 − t¯κ between the sampling [24, 26]. We are referring to the Euclidean norm k·k2. Here, the least-squares methods of linear and nonlinear regres- sion are used to estimate the vector y of a first part of the parameters to fit the set of given experimental data and to characterize the statistical properties of estimates. For a closer look on the parameter estimation and statistical learning we refer to [6, 35]. Secondly, we investigate which components of the remaining parameter vector, say x, have a stable, and which ones have an unstable influence on the dynamics. For a closer presentation of this bilevel problem from parametric optimization, we refer to Subsection 3.6. and [24, 44]. A very important feature of (CE) to which we shall always return in this paper is the following: The system (CE) allows a time-discretization such that the dynamics is given by a step-wise matrix multiplication. This recursive property is a key advantage of the form (CE) for an algorithmic stability analysis.

2.3.1. Nonlinear Model with Quadratic or Higher Degree Polynomials An extension to (CE) was considered by Yılmaz [69] and Yılmaz et al. [70] by proposing a polynomial form of E˙ = f(E), T n where f = (f1, f2, . . . , fn) is a n-tuple of nonlinear functions depending on E ∈ R . To be more specific, for representing the influence of gene i on j the authors considered the quadratic (constant, linear) functions

2 fj,i(x) := aj,ix + bj,ix + cj,i, where x = Ei denotes the concentration of gene i and aj,i, bj,i, cj,i ∈ R are the correspond- ing coefficients of this dependence. The functions fj,i are the ith additive terms in the jth coordinate function fj of the mapping f from the right-hand side. Note that in comparison to the model (CE) with its multiplicative form M(E)E, now the vector C ∈ Rn coming from the absolute effects cj,i means a parametrically enrichment, a constant additive “shift” on the right-hand side: E˙ = M(E)E + C. In [69, 70], the least-squares approximation errors of linear and nonlinear, mainly quadra- tic, models are compared: the quadratic model is understood to be better with respect to both the goodness of data fit and, after time-discretization, the goodness of future state prediction. In case of cubic or even higher-degree polynomials, the accuracy increases, but unfortunately, so does the numerical instability as well as the statistical variance. For this classical trade-off between accuracy and complexity in the theory of inverse problems we refer to [6]. Mathematical and Data Mining Contributions... 157 2.4. The Extended Model The model extended by Yılmaz et al. [69, 70] allows the nonlinear interactions and uses affine linear terms as shifts. Unfortunately, the matrix-multiplicative form of the recursive iteration mentioned in [25] is lost by these shift terms, at the first glance. However, we shall return to the form (CE), realized in higher dimensions. Now, let us make the following affine addition [66]: (ACE) E˙ = M(E)E + C(E). The additional column vector C(E) can represent the environmental perturbations and pro- vide better least-squares approximations, which has in a first step been guaranteed by re- garding the basic case where C(E) is constant, i.e., C(E) ≡ C. In contrast to M(E)E, the second term (shift) C(E) does not need to reveal E as a factor, but exponential or trigono- metric. In case where M(E) and C(E) are polynomials, component-wisely understood, M(E)E may have a higher degree than C(E). On the other hand, the term C(E) may also be understood as noise caused by the environmental factors. Relatively, this additive term, C(E), is considered to be “small” when compared to the factorized part M(E)E. Mathematically, one speaks of a normal form when finding an additive decomposition of the right-hand side where the terms are ordered according to their degrees. In fact, envi- ronmental effects such as emissions, poison in water or food, dangerous drugs, etc., are screened to form the right-hand side of the differential equation. For a wider mathematical discussion of such a kind of unfolding in terms of singularity theory, differential equations, catastrophe theory and optimization theory, we suggest [5, 10, 44]; for the background of generalized additive models in statistical learning, we refer to [35]. Although the matrix-multiplicative recursive iteration idea allowed by a time-discreti- zation of (CE) with its multiplicative form M(E)E [25] is lost by adding shift terms, bio- logically it is important to include the environmental changes: both in short term and long term. T Thus, for instance, Eˇ(t) = (Eˇ1(t), Eˇ2(t),..., Eˇm(t)) is a specific m-vector rep- resenting m environmental factors that might affect the gene-expression levels and their variation. 
We shall pay attention to the case where Eˇ is constant, i.e., coinciding with an initial value, or piecewise constant; but our general modeling will be wide enough to al- low larger types of variability in time. Herewith, Eˇ appears as a portion of the solution trajectories, together with the gene expression levels. Some of the components Eˇi may represent short-term factors affecting and others may be regarded as long-term factors. Of course, the weights of the effect of the jth environmental factor Eˇj on the gene-expression data Ei should also be represented by the system. This could be achieved by introducing the weight matrix Mˇ (E). Hence, our approach in overcoming the more complex form of (ACE) algorithmically is that C(E) can be written as

C(E) = Mˇ (E)E,ˇ

where c11(E) ··· c1m(E) ˇ . .. . M(E) =  . . .  c (E) ··· c (E)  n1 nm    158 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al. is an (n × m)-matrix. Here, cij represents the effect of the jth environment factor on the ith gene. This approach was for the first time done by Tas¸tan [56] and Tas¸tan et al. [57], but in the case where n = m and Mˇ (E) was the diagonal matrix Mˇ (E) := diag(C) T with C := (c1, c2, . . . , cn) . However, in this paper we allow m environmental factors to affect the n number of genes, and hence, the (n × m)-matrix Mˇ (E) may be regarded as a m ˇ gene-environment matrix. Moreover, in our proposed generalized model, j=1 cij(E)Ej should be understood as the total effect of the overall environmental factors on the gene P concentration Ei. On the other hand, the matrix Mˇ (E) comprises the dependencies of the gene-expression data and its environment. Herewith, the gene-environment system (ACE) is equivalent to E˙ = M(E)E + Mˇ (E)E.ˇ Now, introducing the (n + m)-vector and the (n + m) × (n + m)-matrix

E M(E) Mˇ (E) E := and M(E) := Eˇ 0 0     so that we end up with the following form of an extended initial value problem

E (CE) E˙ = M(E)E, E = E(t ) = 0 , ext 0 0 Eˇ  0  where Eˇ0 is the initial effect of the environmental factors on E. We emphasize that with ˙ (CE)ext we have reconstructed the multiplicative form of (CE) E = M(E)E, with an inclusion of the environmental items now. If the jth environmental factor Eˇj is considered to affect some gene-expression data, say the ith, then initially the jth component of Eˇ0 is considered to be 1, otherwise 0. Notice that the weight of such an interaction is determined by the entry cij of the matrix Mˇ (E). Hence, it is also possible to regard the components of Eˇ as “switches” controlling the effects of the environmental factors: the 1 in the jth component of Eˇ0 means that the jth environmental factor is “switched on” to affect the gene-network, while 0 means that it is “switched off”. Theoretically, the initial state Eˇ0 can be any other vector. For an arbitrarily given initial state, however, the matrix Mˇ (E) can be altered (dividing the jth column of Mˇ (E) by the jth component of Eˇ0, provided that it is nonzero) in order to normalize the given initial state to the constant vector Eˇ0 containing 1’s and 0’s. In fact, by the corresponding initial value Eˇ(t0) = Eˇ0, we see that the naturally ˙ time-dependent variable Eˇ is constant (i.e., Eˇ = 0) and, hence, identically Eˇ ≡ Eˇ0. This is because we do not include the “dynamics” of the environment directly. However, this is not a leak in our model, but a hybridizing: effects of the dynamics of the environment can be regarded as an accompanying system of equations to switch “on” or “off” the corresponding environmental factor. If, for instance, some certain thresholds are prescribed in order to take into account the changes in the dynamics of the environmental factors, then the gene expression represented by (CE)ext undergoes a sudden change of the state at the time when a certain quantity passes any of the thresholds: the coefficient matrix M(E) of system (CE)ext changes. The final value of the system is, then, taken as an initial condition for the newly born system. A new state (trajectory) of the dynamics starts; this leads us to the study of “hybrid systems” (cf. Section 3.4.) or a very challenging theory of “impulsive differential equations” and discontinuous dynamical systems. We refer to [4, 2, 35, 3, 27, 48] and Mathematical and Data Mining Contributions... 159 references therein for interested readers. In either case, because the discontinuities are on the derivatives, the trajectories of our model for the gene-expression data are continuous but have a nonsmooth nature. On the other hand, by means of turning from dimension n to dimension d = n + m, we have m further coordinates unfolding the effects of the environmental factors on the gene- expression data. Hence, in this paper, we (without further drawbacks) combine and benefit from both the affine term structure to model gene-expression patterns for more accurate least-squares approximations and more precise future predictions, and the time-continuous iterative matrix multiplication approach. For this purpose we work in the higher dimension d = n + m.

3. Time-Discretization and the Stability Analysis

Discretization concerns the process of approximating continuous models and equations by their discrete counterparts. A numerical solution generated by simulating the behavior of a system governed by ODEs, initiated at t0 with given initial value E0, is an approximation to the solution at a discrete set of points. We follow trajectories with approximate solution values. Hence, choosing a suitable numerical method applied on the time-continuous model is an extremely important task. Euler’s method, the simplest case of time-discretization, has been used for gene-expression patterns, but we know that it is slow and inaccurate. For some further information we refer to [19]. Thus, on our way of using more refined and convincing techniques, we use a Runge-Kutta discretization method.

3.1. Runge-Kutta Method While solving ODEs numerically, we are faced with two kinds of errors, namely, the rounding error as a result of finite precision of floating-point arithmetic and, secondly, the truncation error associated with the method used. For example, in Euler’s method the trun- cation error is by far larger because the curve E(t) is approximated by a straight-line be- tween the end-points tk and tk+1 of time intervals. In addition, Euler’s method evaluates derivatives at the beginning of the interval, i.e., at tk which makes the method asymmetric with respect to the beginning and the end of the interval. Hence, more symmetric integra- tion methods like Runge-Kutta (RK) which takes into account the interval midpoints, can be applied on the system (CE)ext. Runge-Kutta methods have the advantage of stability which is closer to the stability of the given time-continuous model. RK methods use the information at time tk only, which makes them self-starting at the beginning of the integration and also lets methods become easy to program. This accounts in part for their popularity [39]. A central idea of applying RK methods to model of gene-expression patterns was first introduced by Ergenc¸and Weber [20]. Here, we illustrate the application of a particular RK method, called Heun’s method. Heun’s method is a modified version of Euler’s method, more illustrative, explicit and the simplest case of the Runge-Kutta approach. In our ex- tended model space, it is formulated as follows: h E = E + k (k + k ), k+1 k 2 1 2 160 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al. where

k1 = M(Ek)Ek, and

k2 = M(Ek + hkk1)(Ek + hkk1).

More explicitly, we write

h h E = E + k M(E )E + k M(E + h M(E )E )(E + h M(E )E ), k+1 k 2 k k 2 k k k k k k k k h h = I + k M(E ) + k M(E + h M(E )E )(I + h M(E )) E . 2 k 2 k k k k k k k   Defining

h h M := I + k M(E ) + k M(E + h M(E )E )(I + h M(E )), k 2 k 2 k k k k k k we get the following time-discrete equation

(DE)ext Ek+1 = MkEk.

Thus, we iteratively approximate the next state from the previous one. We note that since the experimental results are represented as E¯0, E¯1,..., E¯l−1, we can represent the approxi- mations by El, E2,..., El−1 in the following way: setting E0 = E¯0, the kth approximation is calculated as b b b b Ek = Mk−1(Mk−2 ··· (M1(M0E¯0))) (k ∈ N0).

Formula (DE)extbis the key for understanding how we obtain our gene-environment k network from the time-discrete dynamics. In fact, if mij is the entry of Mk related with the k ith row and the jth column, then mij is the coefficient of proportionality (i.e., multiplied by Ej) so that the ith gene (or environmental factor) becomes changed by the jth gene (or environmental factor), in the step from time-point k to time-point k + 1. While the genes and environmental factors are represented by the nodes (vertices) of our (dynamical) network, the interactions between them appear to be the edges, weighted with those coefficients. To the analysis of these networks, the wealth of discrete algorithm applies both, statically and dynamically. The subjects of this analysis are connectedness, clusters, shortest paths, subnetworks, and so on. According to our modeling, we do not include any effects which genetical information might examine on the environment. Having such a multiplicative formula for prediction has a great analytical and numerical advantage also. Now, according to our motivations of stability analysis given in Subsec- tion 1.2., these iterative matrix multiplications in front of the given initial state E¯0 force us to consider the stability and boundedness of the solution. Thus, we investigate the questions concerning how products of matrices Mk look like. What is the structure of the product and what does the block structure say about boundedness or unboundedness of the products of finitely many matrices? Mathematical and Data Mining Contributions... 161 3.2. Algebra of Matrix Products Let us remember that the matrix in the time-continuous model has the canonical form M(E) Mˇ (E) M(E) = , 0 0   where M(E) is an (n × n)- while Mˇ (E) is an (n × m)-matrix. These matrices help us for representing the relations between the genes and the environ- mental factors, and for understanding the structure of gene and gene-environment networks. To be more precise, the matrices Mk which will come from those matrices M(E) will be the basis of our networks. The product of two matrices having that canonical block form is again a matrix with the same structure, because for any X,Y ∈ Rn it holds:

M(X) Mˇ (X) M(Y ) Mˇ (Y ) M(X)M(Y ) M(X)Mˇ (Y ) = 0 0 0 0 0 0       ˇ := M(X,Y ) M(X,Y ) . " 0 0 # f f Matrix multiplication is not performed in the case of the time-continuous model, but we try to understand whether our matrices Mk and their products in the time-discrete iterative system have some “canonical” block form or not. After some notation, simplification and by definition of Mk, we find that

2 hk M(E ) Mˇ (E ) A A h B B M = I + k k + k , k 2 0 0 0 0 2 0 0     e e where I = Id ((d × d)-unit matrix) with d = n + m, and

A := M Ek + hk M(Ek)Ek + Mˇ (Ek)Eˇk , ˇ ˇ ˇ A := M Ek + hk M(Ek)Ek + M(Ek)Ek , ˇ ˇ B := M Ek + hk M(Ek)Ek + M(Ek)Ek M(Ek), e ˇ ˇ ˇ B := M Ek + hk M(Ek)Ek + M(Ek)Ek M(Ek).   We conclude thate Mk has also its final canonical block form \ \ˇ M(Ek) M(Ek) . " 0 Im # Here, one of our main questions concerns iterative multiplication of matrices having the same form with model Mk. In the following section, for our stability analysis we have to study these matrices Mk in detail. Is the form of the product of two and, by induction, finitely many matrices Mk? By using A, B, C, D to represent the corresponding block matrices, we calculate: b b b b A B C D AC AD + B K L = =: . 0 I 0 I 0 I 0 I  m   m   m   m  b b b b b b b b b b b 162 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al.

Consequently, we observe that any finite product of matrices in the extended space preserves the same structure as a single matrix Mk. In fact, multiplying any canonical matrix Mk by a T ˇT T T ˇT T vector (E , E0 ) reproduces a vector (E , E0 ) of the same type. For this reason, there is no restriction if we focus our attention on the first n coordinates of the vectors and on the first n rows of our matrices. Herewith,e we pay attention to the south-east block Im, reflecting that necessarily Eˇ ≡ Eˇ0. By linear algebra it is easy to see that the matrices

K L and K 0 I  m  b b have the same eigenvalues if we disregard the ones comingb from the (m × m)-unit matrix. In fact, the additional eigenvalue 1 has its algebraic multiplicity equal to its geometrical multiplicity as it is being requested for any eigenvalue λ with |λ| = 1 to ensure stabil- ity [32]. This enables us for doing a similar, n-dimensional stability analysis as it has been performed for (CE) in [25]. From some point of view, stability is a condition on a well behaving of dynamical systems under initial perturbations around equilibrium points, including also parametric variations, e.g., of entries in model matrices and vectors. This can be thought as a char- acterization of environmental changes (perturbation) given to the system, of disease or the treatment of the cell by some medicine or radiation. Since gene-expression values lie in a bounded region, if not modeled or scaled differently, unstable solutions indicate an unsat- isfactory data fit such that the statistical learning has to go on [35]. Figure 1 depicts the real data connected by linear interpolation (cf. the left side) and how solutions of a model behave (cf. the right side): On the one hand, based on given data for 8 genes sampled at 7 time points, we may expect regular as well as irregular (unbounded) behaviors. On the other hand, by a learning process combined with a stable model, the solutions behave controlled and within natural bounds of expression levels.

Figure 1. Real data [55] and model stability [56, 57].

We start with a mathematical definition of stability of a time-continuous system:

Definition 1. A point E∗ ∈ Rd is called an equilibrium point of system

(S) E˙ = f(t, E), Mathematical and Data Mining Contributions... 163 where (t, E) ∈ R × Rd if f(t, E∗) = 0 for all t ∈ R. An equilibrium E∗ of (S) is called stable (in the Lyapunov sense) if for every ε > 0 there exists δ = δ(ε) > 0 such that at time t = t0 it satisfies kE(t) − E∗k ≤ δ,

and for all t > t0 it holds kE(t) − E∗k < ε.

A common method for demonstration of stability is to find a Lyapunov function for the considered system. However, the problem of finding a such a suitable function arises because there is no general rule for establishing such functions [9]. Therefore, an algorith- mic method which studies stability and introduces Lyapunov functions in the time-discrete case has first been introduced by Brayton and Tong [9], where connection between stability of time-continuous and time-discrete systems has been studied to some extent. Herewith, in our paper, besides the analytical side of our research, we prepare an algorithm and our insight in its working. The algorithmic theory is in detail explained in [7, 9] for the case of Euler discretization, and in [20, 56, 57], corresponding with our extended model and Runge-Kutta discretization used. In any of these cases, the procedure bases on the study of a sequence of polyhedra by which we observe the virtue on matrices applied, i.e., stability or instability of the dynamics to become detected. Stability of our time-continuous model (CE)ext describing gene-expression profiles is strongly related with the stability of the time-discrete system (DE)ext obtained by Euler’s method, as the following statement shows.

Theorem 1. Let the map E 7→ M(E) be Lipschitzian. If the Eulerian time-discrete system,

d Ek+1 = [I + hkM(Ek)]Ek (k ∈ N0, E0 ∈ R ), with some appropriate hmax > 0 being given, is stable for all hk ∈ [0, hmax], then the time-continuous system E˙ = M(E)E is also stable.

Proof. See [9].

Based on our knowledge that the Runge-Kutta discretization methods, explicit ones like Heun’s method in this paper, or implicit like Trapezoidal or implicit Euler’s, allow a stability behavior that is more closer to the given autonomous system than the one which Euler’s discretization provides. However, it is important to note that even if the discretization is implicit, the resulting discrete dynamical system has to be explicit to apply our stability analysis by multiplying a finite set of matrices. In this respect, Theorem 1 implies the following result from [66] which is quite important in our stability analysis.

Corollary 1. Let the map E 7→ M(E) be Lipschitzian. If the Runge-Kutta time-discrete d system of Heun’s type Ek+1 = MkEk (k ∈ N0), E0 ∈ R , as in (DE)ext, some appropriate hmax > 0 being given, is stable for all values hk ∈ [0, hmax], then the time-continuous system E˙ = M(E)E is also stable.

Proof. See [66]. 164 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al.

In [9], it is demonstrated how the stability of a time-discrete model, being (DE)ext here, can be investigated by the stability of some finite set of matrices M = {M0, M1,..., Ml−1}. We note that this set is derived by us with discretely approximat- ing the range d M(E, h) | E ∈ R , h ∈ [0, hmax] , n o where M(E, h) := I + hM(E) and hmax > 0 chosen, under extremal (worst-case) consid- erations on finitely many matrices multiplied. However, since we are using RK method, in our case, M(E, h) takes the form: h h M(E, h) := I + M(E) + M(E + hM(E)E)(I + hM(E)). 2 2 In fact, we are discretizing the function M(E, h) in a way that the values of the implied matrix entries are taken at their maximal or minimal values, and h is (by hmax) taken extremal as well. When iteratively applying the resulting entire matrices to polyhedral neighborhoods of the origin 0d, then we represent and understand the worst-case growth behavior of any finite matrix multiplication, i.e., whether instability is holding.

3.3. Stability Analysis of a Set of Matrices We like to perform a stability analysis in order to (a) validate our gene-expression model and (b) analyze the genetic network involved in photobiological hydrogen production by it- eratively changing the number of participating genes. Thus, after modeling gene-expression data, the stability analysis explained below means a crucial part of our research. Let M = {M0, M1,..., Ml−1} be a set of given real matrices. There should not be any confusion with the usage of the notation Mk used in Section 3. for the kth iterate of the time- discrete dynamics. We will consider the larger multiplicative semigroup M′ containing all finite products of matrices generated by M. In other words,

k ′ ls M := Ms : Ms ∈ M, ls ∈ N, s ∈ {1, 2, . . . , k}, (s=1 Y k Ms 6= Ms+1 ∀s ≤ k − 1, k ∈ N, ls = p, p ∈ N . s=1 ) X Since our dynamical analysis bases on the linear algebra of matrices, especially, on the spectral study of eigenvalues, we have to locate our study over the complex numbers rather than the reals.

d Definition 2. The finite set M is stable if for every neighborhood of the origin 0d, U ⊆ C , ′ there exists another neighborhood of the origin 0d, U˜ such that, for each M ∈ M it holds: MU˜ ⊆ U. Brayton and Tong [9] proved that M is stable if and only if B∗ is bounded, where

∞ ∞ ∗ j ′ B := Bj, with B := conv M ′ B and k ≡ k − 1 (mod ℓ), k  k k−1 j j [=0 [=0   Mathematical and Data Mining Contributions... 165

B0 being some initial neighborhood of the origin 0d. We note that in the argument of the convex hull operator conv, two kinds of variations are combined: products (powers) of a matrix to any finite order and the “turnaround” modulus given by mod (modulo).

3.3.1. On the Algorithm The algorithm of Brayton and Tong focuses on the special combinatorial structure of polyhedral sets Bk and, hence, it has many advantages. We can analyze a set of matrices, derived as explained above, and we can decide for which combination of these matrices the underlying dynamical system is stable or not. As the matrices represent biological, further ecological and medical information (which are affected by errors, estimation, environmen- tal factors, etc.), this approach can support work in these areas very comfortably. This feature we can employ to examine, e.g., genetic subnetworks participating in photobiolog- ical hydrogen production and, more generally, to better understand the gene-environment networks and subnetworks. The question of stability is answered by automatically gener- ated Lyapunov functions [9, 25]. There are further procedures possible. However, with the proposed algorithm of Brayton and Tong [9], implemented by Pickl, Tas¸tan and Weber, the boundary manifold between stability and instability regions can be analyzed in detail. The great advantage of the algorithmic attempt of Brayton and Tong lies in the fact that the (numerically) determined boundary between stability and instability regions can be captured as precisely as it is should be. However, there is still an open discussion about how fine this discretization should be made. What are the determinants: the experimental design in the lab, the quality of the measurement devices or, e.g., the environmental influences assessed in the model? For that reason, the procedure can be adjusted and used to develop a qualitative analysis of the real-world situation, its model and forecast.

3.4. Modeling Gene Regulatory Networks with Piecewise Linear Differential Equations The mixed continuous-discrete model introduced in [30] implies the most relevant regu- lating interactions in a cell, and it is a complementary approach to the one introduced in the previous section. Collecting the concentrations of mRNA molecules for n individual genes at time t in the vector E(t), the authors present a complete description of the dynamics by a hybrid system. Collecting the concentrations of mRNA molecules for n individual genes at time t in the vector E(t), a complete description of the dynamics is given by ˙ E(t) = Ms(t)E(t) + Cs(t), where s(t) := F (Q(E(t))),

Q(E(t)) = (Q1(E(t)),...,Qn(E(t))), (HS) 0 for Ei(t) < θi,1 1 for θi,1 ≤ Ei(t) < θi,2 Qi(E(t)) :=  .  .  di for θi,di ≤ Ei(t), with θ < θ < ··· < θ , M being an (n × n)-matrix and k ∈ Rn. Here, i,1 i,2 i,di s(t)  s(t) Q : Rn → Nn implies the information about the thresholds; F (Q(E)) indicates in which 166 Gerhard–Wilhelm Weber, Pakize Taylan, Bas¸ak Akteke-Ozt¨ urk¨ et al. part of the state space the system is located at the state E and which matrix M and vector C have to be chosen to specify the system in a way most convincingly approximating the given data. The function F : Nn → N has to be injective, such that a different pair (M,C) is used whenever a threshold is exceeded. This kind of modeling with a continuous and a discrete part is called a hybrid system. Sometimes, based on a suitable method of time-discretization or statistical learning, a time-discrete model is preferred rather than a one with differential equations:

E(k + 1) = Ms(k)E(k) + ks(k),

s(k) = F (Q(E(k − 1))) (k ∈ N0).

For mathematical modeling, i.e., to state our time-continuous (or -discrete) system, again we have to reconstruct the parameters from the experimental data. In the time-continuous model (HS), all values θi,j, all entries of Ms(t) and Cs(t) and, in addition, the functions F (Q(E)) and Qi(E)(i = 1, 2, . . . , n) have to be determined. There are two steps on how the parameter estimation is performed [30]:

(i) calculation of the matrices and vectors describing the system in between thresholds, and then,

(ii) estimation of the thresholds.

According to [30], for practical reasons, assuming that we already know all thresholds which partition the state space into several cuboids, the problem of extracting Ms(t) and Cs(t) for a given cuboid is considered. Since inside of such a cuboid the equation (HS) simplifies to a system of ordinary linear differential equations, we can give analytical so- lutions for the corresponding parts of the state space. Now, for a certain cuboid Q∗, we can formulate our problem to find the parameters for the corresponding constant matrix M ∗ and the constant vector C∗ as an optimization problem: Minimization of the quadratic error between our E¯˙ and the right-hand side of the differential equations evaluated at the ¯ ∗ ∗ finitely many measurement points Eκα ∈ Q (α = 0, 1, . . . , l − 1) lying in the considered cuboid Q∗: l∗−1 ∗ ¯ ∗ ¯˙ 2 (LS) min kM Eκα + C − Eκα k (m∗ ),(c∗) ij i α=0 X ∗ ∗ ∗ ∗ with M = (mij)i,j=1,2,...,n and C = (ci )i=1,2,...,n. We briefly note that a unifying d = n+m-dimensional notation in the line of our paper could be used now and that we are then, minimizing with respect to α = M∗, which comprises M ∗ and C∗. Here, again, difference ¯˙ quotients Eκα serve for an approximation of the changes of concentration. Problem (LS) can be canonically treated by building the partial derivatives with respect to the unknown parameters and equating them to zero. Then, one has to solve the resulting equations, which ∗ ∗ are linear in the unknown parameters mij and ci , e.g., by Gaussian algorithm. Similarly expressed genes probably have similar functions also and their connections to other genes in the network should be the same. Two genes with almost the same ex- pression profile, even when being unrelated, do not give any information about whether their role in the network is different from each other or similar. Moreover, one can imply Mathematical and Data Mining Contributions... 167 biological a priori knowledge into the model, e.g., about degradation rates or about the net- work structure. In many cases, at least a lower bound γi,min for the degradation of gene i = 1, 2, . . . , n is known [30]. These constraints can be represented in the form of new parameters. We introduce a Boolean matrix Y = (yji)j,i=1,2,...,n by

y_{ji} := \begin{cases} 1, & \text{if gene } i \text{ regulates gene } j, \\ 0, & \text{otherwise}, \end{cases}

so that \sum_{j=1}^{n} y_{ji} is the number of genes which are regulated by gene i. Herewith, we obtain the following constrained minimization problem:

\min_{(m^*_{ij}), (c^*_i)} \sum_{\alpha=0}^{l^*-1} \left\| M^* \bar{E}_{\kappa_\alpha} + C^* - \dot{\bar{E}}_{\kappa_\alpha} \right\|^2

subject to

\sum_{j=1}^{n} y_{ji} \le l_i \quad (i = 1, 2, \ldots, n),
m_{ii} \ge \gamma_{i,\min} \quad (i = 1, 2, \ldots, n),
m_{ij} \ge 0 \quad (i \ne j),

which consists of a quadratic objective function with linear constraints. For this problem, numerical optimization methods can be applied. To determine all the thresholds \theta_{i,j}, Akaike's Information Criterion [35] can be used as one of the methods. For further information we refer to [30] and, especially concerning the parameter estimation in the case of the above time-discrete system, to [2, 3, 27, 48].
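Within a fixed cuboid, the unconstrained problem (LS) is an ordinary linear least-squares problem that can be solved column-wise; the following minimal sketch uses synthetic data (the "true" M^*, C^* and the measurement points are invented for illustration). The constraints stated above would turn this into a quadratic program, to which standard QP solvers apply.

    import numpy as np

    rng = np.random.default_rng(0)
    n, l = 3, 40                        # genes, measurement points in the cuboid
    M_true = rng.normal(size=(n, n))
    C_true = rng.normal(size=n)

    E = rng.uniform(0.0, 1.0, size=(l, n))     # measured levels inside Q*
    Edot = E @ M_true.T + C_true + 0.01 * rng.normal(size=(l, n))  # diff. quotients

    # Stack [E | 1]; then the block [M* | C*] solves an ordinary linear
    # least-squares problem (equivalent to the normal equations of (LS)).
    A = np.hstack([E, np.ones((l, 1))])
    sol, *_ = np.linalg.lstsq(A, Edot, rcond=None)
    M_hat, C_hat = sol[:n].T, sol[n]
    print(np.max(np.abs(M_hat - M_true)), np.max(np.abs(C_hat - C_true)))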

3.5. On Additive Models, Spline Regression and Stochastic Differential Equations

3.5.1. Introduction

Models obtained by spline approximation are among the most refined ones. The papers [58, 59] refer to generalized additive models from statistical learning [35]. These models can be employed in many areas of prediction, e.g., in our domains of genetics and the environment, the latter also including financial risk management and areas of chemistry and environmental protection. The parameter estimation is done with the backfitting (or Gauss-Seidel) algorithm, modified by minimizing an objective function which consists of the least-squares term plus a penalty term. The penalty expresses our wish to control the energy of curvature, which could easily make the estimation very unstable, whereas robustness is requested. Instead of the classical approach of separation of variables, [58] presents a new approach in which the input data are grouped and density and variation indices are studied for each group (cluster in an interval). Distinguishing between the clusters ordered by time intervals makes the approach applicable to consecutive series of DNA microarray experiments and to repeated environmental observations. As an alternative to penalty methods, with their need to adjust the penalty parameter regularly, [58, 59] newly propose the elegant model framework of conic quadratic programming and interior point methods [47]. Recently, a contribution [60] was given on generalized partial linear models with B-splines. Furthermore, a corresponding spline and optimization framework for the regression and classification method MARS [35] is in preparation by us, too.

At the end of this subsection, we pay special attention to modeling with stochastic differential equations, which can be employed in all areas of life and environment, including the economy and the financial sector, and to the corresponding parameter estimation.

3.5.2. Classical Additive Models

Regression models, especially linear ones, are very important in many applied areas. However, the traditional linear models often fail in real life, since many effects are generally nonlinear. To characterize these effects, flexible statistical methods like nonparametric regression must be used [21]. However, if the number of independent variables in the model is large, many forms of nonparametric regression do not perform well. It is also difficult to interpret nonparametric regression based on smoothing spline estimates. To overcome these difficulties, Stone [54] proposed additive models. They estimate an additive approximation of the multivariate regression function. Here, the estimation of the individual terms explains how the dependent variable changes with the corresponding independent variables, and we can examine the effect of each predictor separately in the absence of interactions. We refer to [35] for the basic elements of the theory of additive models. Suppose we have N observations of a response (or dependent) variable Y, denoted by y = (y_1, y_2, \ldots, y_N), measured at N design vectors x_i = (x_{i1}, x_{i2}, \ldots, x_{im}). The points x_i may be chosen in advance, or may themselves be measurements of random variables X_j (j = 1, 2, \ldots, m), or both. We note that the letter X could widely be replaced by the symbols E and \bar{E}, as we used them in the previous sections; but to avoid misunderstandings related to the use of E as the expected value, we write X. Now, the additive model is defined by

Y = \beta_0 + \sum_{j=1}^{m} f_j(X_j) + \epsilon,

where the error \epsilon is independent of the variables X_j, with E(\epsilon) = 0 and Var(\epsilon) = \sigma^2. Here, the f_j are unknown, arbitrary univariate functions. They are mostly considered to be splines, i.e., piecewise polynomial, since, e.g., polynomials themselves have a too strong or too early asymptotic behavior and are therefore not always satisfactory for data fitting. We denote the estimates by \hat{f}_j. The standard convention is to assume E(f_j(X_j)) = 0, since otherwise there would be a free constant in each of the functions [36].

3.5.3. Estimation Equations for Additive Models

Additive models have a strong motivation as a useful data analytic tool. Each function is estimated by an algorithm proposed by Friedman and Stuetzle [23], called the backfitting (or Gauss-Seidel) algorithm. As the estimator for \beta_0, the mean of the response variable Y is used: \hat{\beta}_0 = E(Y). This procedure depends on the partial residual against X_j:

r_j := Y - \beta_0 - \sum_{k \ne j} f_k(X_k),

and consists of estimating each smooth function by holding all the other ones fixed [38]. Then, E(r_j \mid X_j) = f_j(X_j), and this minimizes E\left( Y - \beta_0 - \sum_{j=1}^{m} f_j(X_j) \right)^2.

In a framework of cycling from one iteration to the next, this means the following [37]:

initialization: \hat{\beta}_0 = E(Y), \hat{f}_j(\cdot) \equiv 0 \; (j = 1, \ldots, m), p = 0;

iterate: p \leftarrow p + 1; for j = 1, \ldots, m, do:

r_j = Y - \hat{\beta}_0 - \sum_{k \ne j} \hat{f}_k(X_k),
\hat{f}_j(X_j) = E(r_j \mid X_j);

until: RSS = E\left( Y - \hat{\beta}_0 - \sum_{k=1}^{m} \hat{f}_k(X_k) \right)^2 lies below some threshold.

To prove its convergence, [11] used the normal equations for an arbitrary solution \tilde{f} to reduce the problem to the solution of a corresponding homogeneous system. This algorithm was modified in [58] with the additional aim of keeping the oscillation of the splines, and by this the numerical instability of the approximation, low [37]. Each iteration of the modified backfitting (or Gauss-Seidel) algorithm includes an additional penalized curvature term.
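A minimal backfitting sketch for the additive model Y = \beta_0 + \sum_j f_j(X_j) + \epsilon may make the cycle above concrete. The conditional expectation E(r_j | X_j) is estimated by a simple running-mean smoother, chosen only to keep the example self-contained; the papers discussed in this section use (penalized) splines instead.

    import numpy as np

    rng = np.random.default_rng(1)
    N, m = 500, 2
    X = rng.uniform(-2, 2, size=(N, m))
    y = 1.0 + np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=N)

    def smooth(x, r, k=30):
        """Running-mean estimate of E(r | x), a stand-in for a spline smoother."""
        order = np.argsort(x)
        out = np.empty_like(r)
        for pos, i in enumerate(order):
            lo, hi = max(0, pos - k // 2), min(len(x), pos + k // 2)
            out[i] = r[order[lo:hi]].mean()
        return out

    beta0 = y.mean()                     # beta0_hat = E(Y)
    f = np.zeros((N, m))                 # f_j(.) = 0 initially
    for p in range(20):                  # cycle
        for j in range(m):
            r = y - beta0 - f.sum(axis=1) + f[:, j]   # partial residual r_j
            f[:, j] = smooth(X[:, j], r)              # f_j <- E(r_j | X_j)
            f[:, j] -= f[:, j].mean()                 # centre the estimate
    print(np.mean((y - beta0 - f.sum(axis=1)) ** 2))  # RSS per observation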

3.5.4. Penalized Regression Problems and Inverse Problems

In Subsection 2.4. and the Subsubsections 3.5.1., 3.5.2. and 3.5.3., we discussed normal forms and generalized additive models. Such forms and models can also be found in the representation of error terms. For example, the expected prediction error can be displayed as the sum of squared bias, variance and the irreducible error [35]. Another example is given by the regularization of discrete ill-posed problems: there, we are concerned with the trade-off between accuracy and stability of the inverse problem. We can optimize one of the targets subject to the other one being bounded by a prescribed tolerance, \epsilon or \delta, respectively, or, as a third alternative, we minimize the sum of the least-squares error and the squared norm of the unknown parameter vector multiplied by a penalty parameter \alpha^2. The penalized term could also be the squared norm of the unknown vector multiplied by a matrix L which represents an operator on first or second order derivatives or, more precisely, the corresponding difference quotients at the discretization time points. For the resolution of this trade-off, so-called L-curves and filter factors serve, e.g., by using the MATLAB regularization toolbox [34]. By them, the right parameter \alpha^2 can be determined, and an appropriate dampening of those terms in the representation of the computed estimate (by singular value decomposition) can be found where almost vanishing singular values are the reason for instability [6]. In our research [59, 58, 60], we have such penalty terms as well, given by integrated second order derivatives; we discretize them, leading to the use of a squared norm of an unknown vector multiplied by a matrix L. Actually, there, we have several or, iteratively, successive penalty parameters, but it could also be a uniform one for all scalar terms, or a group-wise one for several terms penalized jointly. Furthermore, we also study the minimization of the residual sum of squares subject to that discretized penalty term being bounded from above by a tolerance. In this sense, we have found another example where the theory and methods of so-called Tikhonov regularization and the regularization toolbox are applicable, and these important techniques yield a method for determining the penalty parameter considered. This was previously an obstacle to the applicability of Gauss-Seidel type methods for the regression problem with penalization. In [59, 58, 60], we could overcome it by turning to conic quadratic programming (CQP). Herewith, CQP and Tikhonov regularization, modern methods from continuous optimization and the theory of inverse problems, respectively, became our new proposals for treating the regression problem from statistical learning. We point out that in certain applications of spline regression it is more meaningful to take into account first order partial derivatives in addition to the second order ones in the penalty term. For example, in the methodology MARS for regression and, in particular, classification [22, 35], we may prefer not too small areas (classes); therefore, we look for "flat" base spline functions. For this purpose, a penalization of large absolute values of first order derivatives, together with the goal of small squared errors, serves.
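The trade-off just described can be written as \min_\beta \|A\beta - y\|^2 + \alpha^2 \|L\beta\|^2, with L a discretized second-derivative operator. A minimal sketch (all data synthetic) solves this by stacking, which is equivalent to the Tikhonov normal equations, and prints the residual norm and the penalty seminorm for a few values of \alpha, i.e., points on the L-curve:

    import numpy as np

    rng = np.random.default_rng(2)
    N, d = 60, 30
    A = rng.normal(size=(N, d))
    y = A @ np.sin(np.linspace(0, 3, d)) + 0.05 * rng.normal(size=N)

    # L: discrete second-order derivative (curvature) operator, shape (d-2, d).
    L = np.diff(np.eye(d), n=2, axis=0)

    def tikhonov(alpha):
        """Solve min ||A b - y||^2 + alpha^2 ||L b||^2 via a stacked LS system."""
        A_aug = np.vstack([A, alpha * L])
        y_aug = np.concatenate([y, np.zeros(L.shape[0])])
        return np.linalg.lstsq(A_aug, y_aug, rcond=None)[0]

    for alpha in (0.01, 0.1, 1.0):
        b = tikhonov(alpha)
        print(alpha, np.linalg.norm(A @ b - y), np.linalg.norm(L @ b))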

3.5.5. Regression Problems and Flows

In Subsubsection 3.5.4., we mentioned forms and models for errors and targets, connected by a penalty parameter or vector; all of them were parametrized. Whenever any of these forms or models is regarded as an objective function to be minimized with respect to the parameter vector, optimization methods can be applied. One analytical class of such tools are systems of differential equations, among them gradient flows and Newton flows [45]. Here, we are especially interested in the stationary points, i.e., in those parameter constellations where the right-hand side of the system vanishes or, at least, becomes norm minimal. If, besides the unknown parameters, we had a state variable, such as one consisting of the genetic or environmental expression levels, then the parameter trajectories could become control variables. For an idea about a class of optimal control problems with a multivalued right-hand side governing an objective functional of additive model form, we refer to the paper [16], which also contains a stability analysis.
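As a small illustration of the flow viewpoint, the gradient flow \dot{\theta} = -\nabla f(\theta) of a least-squares objective, discretized by the explicit Euler scheme, is plain gradient descent; its stationary points are exactly the parameter constellations where the right-hand side vanishes. Data and step size below are illustrative choices.

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.normal(size=(50, 5))
    y = A @ np.ones(5) + 0.1 * rng.normal(size=50)

    def grad(theta):                    # gradient of f(theta) = ||A theta - y||^2 / 2
        return A.T @ (A @ theta - y)

    theta, h = np.zeros(5), 1e-3        # Euler step for the flow theta' = -grad f
    for _ in range(20000):
        theta = theta - h * grad(theta)
    print(np.linalg.norm(grad(theta)))  # ~ 0 at a stationary point of the flow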

3.5.6. Stochastic Differential Equations

Introduction to Stochastic Differential Equations. Many phenomena in nature, technology and economy, especially in the financial sector, are modeled by means of deterministic differential equations with an initial vector x_0 \in \mathbb{R}^n:

\dot{x} = a(x, t), \quad x(0) = x_0,

where \dot{x} = dx/dt. But this type of modeling ignores stochastic fluctuations and is not appropriate for describing various states of nature or, e.g., stock prices. To take stochastic movements into account, stochastic differential equations (SDEs) are used. They arise in the modeling of many phenomena, such as random dynamics in the physical, biological, engineering or social sciences and in economics. Solutions of these equations are often diffusion processes and, hence, they are connected to the subject of partial differential equations. We shall seek a solution by an additive approximation, well known in statistics, using spline functions. Typically, a stochastic differential equation with initial condition is given by

\dot{X} = a(X, t) + b(X, t)\,\delta_t \quad (\text{for almost every } t \in [0, \infty)), \qquad X(0) = x_0,

where a is the deterministic part, b\,\delta_t is the stochastic part, and \delta_t denotes a generalized stochastic process [46]. An example of a generalized stochastic process is white noise. For a generalized stochastic process, derivatives of any order can be defined. Suppose that W_t is a generalized version of a Wiener process, which is used to model the motion of stock prices that instantaneously respond to the numerous pieces of incoming information. We note the discontinuity and the much higher nonsmoothness of the solutions here, compared with those of the piecewise defined hybrid systems from Section 3.4.. A Wiener process is a time-continuous process (0 \le t \le T) with the property W_t \sim N(0, t); in the generalized sense, W_t can be differentiated. White noise \delta_t is then defined as the generalized derivative of a Wiener process; conversely, a Wiener process can be obtained by a smoothing of white noise. If we replace \delta_t\,dt by dW_t, then the equation can be rewritten as the Itô stochastic differential equation

dXt = a (Xt, t) dt + b (Xt, t) dWt,

where a(X_t, t) and b(X_t, t) are the drift and the diffusion terms, respectively, and X_t is the solution sought on the basis of the experimental data. We want to simulate the values of X_t, since we do not know its distribution. For this reason, we simulate a discretized version of the SDE.

Discretization of SDEs. There are a number of discretization schemes available; we choose the Milstein scheme. Here, we write an approximation for X_t as

\bar{X}(t_{j+1}) = \bar{X}(t_j) + a(\bar{X}(t_j), t_j) \cdot (t_{j+1} - t_j) + b(\bar{X}(t_j), t_j) \cdot (W(t_{j+1}) - W(t_j)) + \tfrac{1}{2}\, b b'(\bar{X}(t_j), t_j) \cdot \left( [W(t_{j+1}) - W(t_j)]^2 - (t_{j+1} - t_j) \right).

Then,
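A minimal simulation of this scheme for the standard textbook example of a geometric Brownian motion, dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, may be helpful; the drift a, the diffusion b and all parameter values below are illustrative choices, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(4)
    mu, sigma = 0.05, 0.2                    # illustrative drift/diffusion parameters
    a  = lambda x, t: mu * x                 # drift a(X, t)
    b  = lambda x, t: sigma * x              # diffusion b(X, t)
    db = lambda x, t: sigma                  # b'(X, t), derivative of b w.r.t. X

    T, N, X = 1.0, 250, 1.0
    h = T / N
    for j in range(N):
        t = j * h
        dW = np.sqrt(h) * rng.standard_normal()           # Delta W_j = Z_j sqrt(h)
        X = (X + a(X, t) * h + b(X, t) * dW
               + 0.5 * b(X, t) * db(X, t) * (dW**2 - h))  # Milstein correction
    print(X)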

\dot{\bar{X}}_j = a(\bar{X}(t_j), t_j) + b(\bar{X}(t_j), t_j) \cdot \frac{\Delta W_j}{\bar{h}_j} + \tfrac{1}{2}\, b b'(\bar{X}(t_j), t_j) \cdot \left( \frac{(\Delta W_j)^2}{\bar{h}_j} - 1 \right),

where the prime denotes the derivative, and the vector \dot{\bar{X}}_j comprises difference quotients based on the jth experimental data \bar{X}_j and on the step length \bar{h}_j := \bar{t}_{j+1} - \bar{t}_j between neighboring sampling times. We use the following notation, where \bar{X}_j approximates \bar{X}(\bar{t}_j) and, e.g. (cf. [24]),

\dot{\bar{X}}_j := \begin{cases} \dfrac{\bar{X}_{j+1} - \bar{X}_j}{\bar{h}_j}, & \text{if } j = 0, 1, 2, \ldots, N-1, \\[2mm] \dfrac{\bar{X}_N - \bar{X}_{N-1}}{\bar{h}_{N-1}}, & \text{if } j = N. \end{cases}

Since W_t \sim N(0, t), the increments \Delta W_j are independent on non-overlapping intervals, and moreover Var(\Delta \bar{W}_j) = \Delta \bar{t}_j; hence, the normally distributed increments can be simulated with the help of standard normally distributed random numbers Z_j. Herewith, we obtain a discrete model for a Wiener process,

\Delta \bar{W}_j = Z_j \sqrt{\Delta \bar{t}_j} \quad (Z_j \sim N(0, 1)) \;\; \forall j.

If we use this value in our discretized equation, we obtain

\dot{\bar{X}}_j = a(\bar{X}_j, \bar{t}_j) + b(\bar{X}_j, \bar{t}_j) \cdot \frac{Z_j}{\sqrt{\bar{h}_j}} + \tfrac{1}{2}\, b b'(\bar{X}_j, \bar{t}_j) \cdot (Z_j^2 - 1).

For simplicity, we write this equation as

\dot{\bar{X}}_j = \bar{G}_j + \bar{H}_j c + \bar{H}_j \bar{H}'_j d,

where c := Z_j / \sqrt{\bar{h}_j}, d := \tfrac{1}{2}(Z_j^2 - 1), \bar{G}_j := a(\bar{X}_j, \bar{t}_j) and \bar{H}_j := b(\bar{X}_j, \bar{t}_j).

Parameter Estimation for SDEs. To estimate the unknown functions \bar{G}_j and \bar{H}_j, we consider the following least-squares optimization problem:

\min_{y} \sum_{j=1}^{N} \left\| \dot{\bar{X}}_j - \bar{G}_j - \bar{H}_j c - \bar{H}_j \bar{H}'_j d \right\|_2^2 .

Here, y is a vector which comprises all the parameters in the Milstein model. We know that data coming, e.g., from the stock market show high variation. Indeed, investors may temporarily pull financial prices away from their long-term trend level; overreactions may occur, so that excessive optimism drives prices unduly high or excessive pessimism drives prices unduly low. New theoretical and empirical arguments can affect share prices, which can fall dramatically even though, to this day, it may be impossible to fix a definite cause: a thorough search may fail to detect any specific or unexpected development that accounts for a crash. Many studies have shown a marked tendency of the stock market to trend over time periods of weeks or longer, sometimes the market reacts irrationally to economic news even if that news has no real effect on the fundamental value of the securities themselves, and so on. Hence, we must use a parameter estimation method which diminishes this high variation and gives a smoother approximation to the data. Splines are flexible, and they allow us to avoid the large oscillations observed for high-degree polynomial approximation. We recall that these functions can be described as linear combinations of basis splines and approximate the data (\bar{X}_j, \bar{t}_j) smoothly. Therefore, we use a spline approximation for each of the functions \bar{G}_j = a(\bar{X}_j, \bar{t}_j) and \bar{H}_j = b(\bar{X}_j, \bar{t}_j), characterized by a separation of variables

(coordinates). This means, e.g.,

\bar{G}_j = a(\bar{X}_j, \bar{t}_j) = \beta_0^{1j} + \sum_{p=1}^{2} f_p(\bar{Z}_p) = \beta_0^{1j} + \sum_{p=1}^{2} \sum_{l=1}^{d_p} \theta_l^p h_l^p(\bar{Z}_p),

\bar{H}_j = b(\bar{X}_j, \bar{t}_j) = \beta_0^{2j} + \sum_{r=1}^{2} f_r(\bar{Z}_r) = \beta_0^{2j} + \sum_{r=1}^{2} \sum_{s=1}^{d_r} \theta_s^r h_s^r(\bar{Z}_r),

where (\bar{Z}_1, \bar{Z}_2) = (\bar{X}_j, \bar{t}_j). If we denote the kth order base spline by h_{\eta,k}, a piecewise polynomial with knots, say t_\eta, then a great benefit of using the base splines is provided by the following recursive algorithm:

h_{\eta,1}(t) := \begin{cases} 1, & t_\eta \le t < t_{\eta+1}, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)

h_{\eta,k}(t) = \frac{t - t_\eta}{t_{\eta+k} - t_\eta}\, h_{\eta,k-1}(t) + \frac{t_{\eta+k+1} - t}{t_{\eta+k+1} - t_{\eta+1}}\, h_{\eta+1,k-1}(t), \qquad k \ge 2.
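The recursion translates directly into code; the following sketch implements it exactly as printed, with the indexing used in the text and an equally spaced knot vector invented for illustration.

    import numpy as np

    knots = np.arange(10.0)    # knots t_0, ..., t_9 (illustrative, equally spaced)

    def h(eta, k, t):
        """Base spline h_{eta,k}(t), evaluated via the recursion of the text."""
        if k == 1:
            return 1.0 if knots[eta] <= t < knots[eta + 1] else 0.0
        left  = (t - knots[eta]) / (knots[eta + k] - knots[eta])
        right = (knots[eta + k + 1] - t) / (knots[eta + k + 1] - knots[eta + 1])
        return left * h(eta, k - 1, t) + right * h(eta + 1, k - 1, t)

    print([round(h(2, 3, t), 4) for t in np.linspace(2.0, 6.0, 5)])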

3.6. Related Topics and Future Projects

In this paper, we concentrate on the dynamical aspect of gene patterns. This can be regarded as a dual problem of the underlying primal problem of parameter estimation. The parameter sets of both problems are complementary to each other, and both problems constitute a bilevel problem [67]. Indeed, having learned the system of differential equations (the model) from the experimental and measurement data by least-squares estimation, the stability study of the dynamics can be regarded as a test of the goodness of the data fitting for the model. We emphasize that in Section 3. the 0 matrix partitions can be filled, thereby representing influences exercised by the genes (the biology) on the environment, or among environmental items. An example of the latter interaction is presented in [65], where the items consist of the CO2 emission reduction and, on the other side, the financial means and the technological level of the countries participating in the Kyoto Protocol process. In future studies, our new work from Subsubsection 3.5.6. could be incorporated into the study made in [65]. With those developed matrices, the entire matrix calculus becomes much richer; the present paper serves to establish the foundations. This matrix algebra and arithmetic is important in mathematical modeling and in our theory of dynamical systems as well. For finding the model, we turn our least-squares problem, where the unknown parameters are entries of matrices and vectors, into a classical linear least-squares problem, where the parameters are ordered in a vector only. The system matrix then takes the form of a block matrix. From the numerical viewpoint, any such block structure needs to be acknowledged and exploited. As a special case, the matrices may even be sparse. We note that the chip, matrix-like form of the DNA microarrays could give rise to a study of a matrix differential equation instead of our system of ordinary differential equations. Concerning further matrix and discretization concepts, we refer to [15, 33].

Let us point out that our model is a piecewise linear one. Indeed, whenever a certain threshold is traversed, the parameter constellation changes instantaneously to a new affinely linear model setting. By this, a mixed continuous-discrete character of nature becomes integrated. Such a hybrid system requires not only an estimation of the unknown parameters, but also an estimation of the thresholds, based on the given data. Furthermore, since gene or gene-environment networks are very large and often algorithmically impracticable, we rarefy them by imposing appropriate bounds on the outdegrees of the nodes. This has to be done very carefully, and together with the practitioners [24, 67, 66]. Research based on this paper, done by us and our colleagues, includes the aspect of errors and uncertainty, which is a characteristic property of genetic and environmental measurements, in the assessment of the interrelations between the expression levels and the environment [62, 65]. For that reason, intervals are used; by them, least-squares estimation turns into a Chebychev approximation or, equivalently formulated, into a problem from generalized semi-infinite programming [62, 65, 67, 66].

4. Conclusion

In this study, from the viewpoints of statistical learning and dynamical systems theory, we improved the mathematical model and its stability analysis, making the models more realistic, especially by emphasizing the environment, approximative and better prepared for stability analysis. Thus, our study may support these techniques in order to achieve new insights by means of mathematical modeling, dynamical systems, optimization and combinatorial algorithms. This paper focused very much on a motivation and an analytical preparation of our algorithm. Special attention was paid to the relations to inverse problems, especially Tikhonov regularization, and to modeling opportunities with stochastic differential equations. In the vast area of computational biology and medicine, gene-expression data are currently the most practicable source for obtaining a holistic picture of the metabolic state of an organism; both the proteome and the metabolome are not yet efficiently and completely accessible. Here, we described the application of dynamical systems theory to the analysis of gene-expression data from hydrogen-producing microorganisms. The ultimate goal is to optimize cell metabolism for higher hydrogen yields by means of optimized growth conditions and directed mutagenesis. After modeling the gene-expression data from different time points of the experimental setups and subsequent discretization, we apply a stability analysis in order to test the quality of the modeling. Provided the model is adequate, we obtain stable genetic subnetworks that yield candidate genes for the optimization of the biosystem. In our recent and future research, built on this paper, we further analyze the role of the environment in its widest sense, and we include the fact that modern experimental technology and information processing are affected by errors and ambivalences. Our approach is wide enough to include and integrate various gene-environment interactions and uncertainties, and to provide a stability theory with different types of perturbations.

References

[1] Ahuja, R.K., Magnanti, T.L., and Orlin, J.B., Network Flows: Theory, Algorithms and Applications, Prentice Hall, N.J., 1993.

[2] Akçay, D., Inference of Switching Networks by Using a Piecewise Linear Formulation, Institute of Applied Mathematics, METU, MSc thesis, 2005.

[3] Akhmet, M.U., Gebert, J., Öktem, H., Pickl, S.W., and Weber, G.W., An improved algorithm for analytical modeling and anticipation of gene expression patterns, Journal of Computational Technologies 10, 4 (2005) 3–20.

[4] Akhmet, M.U., Kirane, M., Tleubergerova, M.A., and Weber, G.W., Control and optimal response problems for quasilinear impulse integrodifferential equations, in: the feature cluster Advances of Continuous Optimization of European Journal of Operational Research 169, 3 (2005) 1128–1147.

[5] Amann, H., Gewöhnliche Differentialgleichungen, Walter de Gruyter, Berlin, New York, 1983.

[6] Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press, 2004.

[7] Bechmann, E., Analyse und Konstruktion eines Algorithmus zur Untersuchung der Stabilität dynamischer Systeme, Mathematisches Institut der Universität zu Köln, Diploma thesis, 2005.

[8] Benemann, J., Hydrogen Biotechnology: Progress and prospects, Nat. Biotechnol. 14, 9 (1996) 1101–1103.

[9] Brayton, R.K., and Tong, C.H., Stability of dynamical systems: A constructive approach, IEEE Transactions on Circuits and Systems 26, 4 (1979) 224–234.

[10] Bröcker, Th., and Lander, L., Differentiable Germs and Catastrophes, London Math. Soc. Lect. Note Series 17, Cambridge University Press, 1975.

[11] Buja, A., Hastie, T., and Tibshirani, R., Linear smoothers and additive models, The Ann. Stat. 17, 2 (1989) 453–510.

[12] Carbayo, M.S., Bornman, W., and Cardo, C.C., DNA Microchips: technical and practical considerations, Current Organic Chemistry 4, 9 (2000) 945–971.

[13] Chen, T., He, H.L., and Church, G.M., Modeling gene expression with differential equations, in: Proc. Pacific Symposium on Biocomputing (1999) 29–40.

[14] Combettes, P.L., and Pennanen, T., Proximal methods for cohypomonotone operators, SIAM J. Control Optim. 43, 2 (2004) 731–742.

[15] Dekker, K., and Verwer, J.G., Stability of Runge-Kutta Methods for Stiff Nonlinear Differential Equations, North-Holland, 1984.

[16] Denkowski, Z., and Migórski, St., On sensitivity of optimal solutions to control problems for hyperbolic hemivariational inequalities, in: Control and Boundary Analysis, Chapman & Hall / CRC, Lecture Notes in Pure and Applied Mathematics 240 (2005) 145–156.

[17] DeRisi, J., Iyer, V., and Brown, P., Exploring the metabolic and genetic control of gene expression on a genomic scale, Science 278 (1997) 680–686.

[18] Djordjevic, M., A biophysical approach to transcription factor binding site discovery, Genome Res. 13 (2003) 2381–2390.

[19] Dubois, D.M., and Kalisz, E., Precision and stability of Euler, Runge-Kutta and incursive algorithm for the harmonic oscillator, International Journal of Computing Anticipatory Systems 14 (2004) 21–36.

[20] Ergenç, T., and Weber, G.W., Modeling and prediction of gene-expression patterns reconsidered with Runge-Kutta discretization, special issue at the occasion of the seventieth birthday of Prof. Dr. Karl Roesner, TU Darmstadt, Journal of Computational Technologies 9, 6 (2004) 40–48.

[21] Fox, J., Nonparametric regression, Appendix to an R and S-Plus Companion to Applied Regression, Sage Publications, 2002.

[22] Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (March 1991) 1–67.

[23] Friedman, J.H., and Stuetzle, W., Projection pursuit regression, J. Amer. Statist. Assoc. 76 (1981) 817–823.

[24] Gebert, J., Lätsch, M., Pickl, S.W., Weber, G.W., and Wünschiers, R., Genetic networks and anticipation of gene expression patterns, in: Computing Anticipatory Systems: CASYS'03 – Sixth International Conference, AIP Conference Proceedings 718 (2004) 474–485.

[25] Gebert, J., Lätsch, M., Pickl, S.W., Weber, G.W., and Wünschiers, R., An algorithm to analyze stability of gene-expression patterns, to appear in: special issue Discrete Mathematics and Data Mining II of Discrete Applied Mathematics, Boros, E., Hammer, P.L., and Kogan, A., guest editors (2005) 144–145.

[26] Gebert, J., Lätsch, M., Quek, E.M.P., and Weber, G.W., Analyzing and optimizing genetic network structure via path-finding, Journal of Computational Technologies 9, 3 (2004) 3–12.

[27] Gebert, J., Öktem, H., Pickl, S.W., Radde, N., Weber, G.W., and Yılmaz, F.B., Inference of gene expression patterns by using a hybrid system formulation – an algorithmic approach to local state transition matrices, in: Anticipative and Predictive Models in Systems Science I, Lasker, G.E., and Dubois, D.M., eds., IIAS (International Institute for Advanced Studies) in Windsor, Ontario (2004) 63–66.

[28] Gebert, J., and Radde, N., A new approach for modeling procaryotic biochemical networks with differential equations, CASYS'05, Seventh International Conference on Computing Anticipatory Systems (Liege, Belgium, August, 2005), Dubois, D., ed., AIP Conference Proceedings (Melville, New York) 839 (2006) 526–533.

[29] Gebert, J., Radde, N., Faigle, U., Strösser, J., and Burkovski, A., Modeling and simulation of nitrogen regulation in Corynebacterium glutamicum, preprint, Center for Applied Computer Science, University of Cologne, Germany, 2006.

[30] Gebert, J., Radde, N., and Weber, G.W., Modeling gene regulatory networks with piecewise linear differential equations, to appear in the special issue (feature cluster) Challenges in Continuous Optimization in Theory and Applications of European Journal of Operational Research 181, 3 (2007) 1148–1165.

[31] Gerland, U., et al., Physical constraints and functional characteristics of transcription factor-DNA interaction, Proc. Natl. Acad. Sci. USA 99 (2002) 12015–12020.

[32] Guckenheimer, J., and Holmes, P., Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields, Springer, 1997.

[33] Hairer, E., Nørsett, S., and Wanner, G., Solving Ordinary Differential Equations, I. Nonstiff Problems, Springer, 1993.

[34] Hansen, P.Ch., Regularization Tools Version 3.1 (for Matlab Version 6.0), http://www2.imm.dtu.dk/~pch/Regutools/

[35] Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning – Data Mining, Inference and Prediction, Springer Series in Statistics, 2001.

[36] Hastie, T., and Tibshirani, R., Generalized Additive Models, New York, Chapman and Hall, 1990.

[37] Hastie, T., and Tibshirani, R., Generalized additive models, Statist. Science 1, 3 (1986) 297–310.

[38] Hastie, T., and Tibshirani, R., Generalized additive models: some applications, J. Amer. Statist. Assoc. 82, 398 (1987) 371–386.

[39] Heath, M., Scientific Computing: An Introductory Survey, McGraw-Hill, 2002.

[40] Hoon, M.D., Imoto, S., Kobayashi, K., Ogasawara, N., and Miyano, S., Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations, in: Proc. Pacific Symposium on Biocomputing (2003) 17–28.

[41] Huang, S., Gene expression profiling, genetic networks and cellular states: an integrating concept for tumorigenesis and drug discovery, J. Mol. Med. 77 (1999) 469–480.

[42] Ideker, T.E., Thorsson, V., and Karp, R.M., Discovery of regulatory interaction through perturbation: inference and experimental design, Pac. Symp. Biocomput. 5 (2000) 302–313.

[43] Jacob, F., and Monod, J., Genetic regulatory mechanisms in the synthesis of proteins, J. Mol. Biol. 3 (1961) 318–356.

[44] Jongen, H.Th., and Weber, G.W., On parametric nonlinear programming, Annals of Operations Research 27 (1990) 253–284.

[45] Jongen, H.Th., Jonker, P., and Twilt, F., Nonlinear Optimization in Finite Dimensions – Morse Theory, Chebyshev Approximation, Transversality, Flows, Parametric Aspects, Nonconvex Optimization and its Applications 47, Kluwer Academic Publishers, Boston, 2000.

[46] Kloeden, P.E., Platen, E., and Schurz, H., Numerical Solution of SDE Through Computer Experiments, Springer Verlag, New York, 1994.

[47] Nemirovski, A., Five Lectures on Modern Convex Optimization, C.O.R.E. Summer School on Modern Convex Optimization, August 26–30, 2002; http://iew3.technion.ac.il/Labs/Opt/opt/LN/Final.pdf.

[48] Öktem, H., A survey on piecewise-linear models of regulatory dynamical systems, Nonlinear Analysis 63 (2005) 336–349.

[49] Özcan, S., Yıldırım, V., Kaya, L., Becher, D., Hecker, M., and Özçengiz, G., Phanerochaete chrysosporium proteome and a large scale study of heavy metal response, in: HIBIT – Proceedings of International Symposium on Health Informatics and Bioinformatics, Turkey '05 (Antalya, Turkey, November 2005) 108–114.

[50] Radde, N., Gebert, J., and Forst, Ch.V., Systematic component selection for gene-network refinement, Bioinformatics 22, 21 (2006) 2674–2680.

[51] Radde, N., and Kaderali, L., Inference of an oscillation model for the yeast cell cycle, preprint, Center of Applied Computer Science, University of Cologne, Germany, 2006.

[52] Rhoades, B.E., Quadratic optimization of fixed-points for a family of nonexpansive mappings in Hilbert space, Fixed Point Theory and Applications 2 (2004) 135–147.

[53] Sakamoto, E., and Iba, H., Inferring a system of differential equations for a gene regulatory network by using genetic programming, in: Proc. Congress on Evolutionary Computation (2001) 720–726.

[54] Stone, C.J., Additive regression and other nonparametric models, The Annals of Statistics 13, 2 (1985) 689–705.

[55] Stanford MicroArray Database, http://genome-www5.stanford.edu/ .

[56] Taştan, M., Analysis and Prediction of Gene Expression Patterns by Dynamical Systems, and by a Combinatorial Algorithm, Institute of Applied Mathematics, METU, MSc Thesis, 2005.

[57] Taştan, M., Ergenç, T., Pickl, S.W., and Weber, G.-W., Stability analysis of gene expression patterns by dynamical systems and a combinatorial algorithm, in: HIBIT – Proceedings of International Symposium on Health Informatics and Bioinformatics, Turkey '05 (Antalya, Turkey, November 2005) 67–75.

[58] Taylan, P., and Weber, G.-W., New approaches to regression in financial mathematics by additive models, to appear in Journal of Computational Technologies (2007).

[59] Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, to appear in the special issue of Optimization at the occasion of the 5th Ballarat Workshop on Global and Non-Smooth Optimization: Theory, Methods and Applications, November 28–30, 2006.

[60] Taylan, P., Weber, G.-W., and Nuray Urgan, N., On the parameter estimation for generalized partial linear models with B-splines and continuous optimization, preprint no. 71, Institute of Applied Mathematics, METU, 2007, submitted to Australian and New Zealand Journal of Statistics, 2007.

[61] Uğur, Ö., Pickl, S.W., Weber, G.-W., and Wünschiers, R., An algorithmic approach to analyze genetic networks and biological energy production: an introduction and contribution where OR meets biology, submitted for publication in the special issue of Optimization at the occasion of the 5th Ballarat Workshop on Global and Non-Smooth Optimization: Theory, Methods and Applications, November 28–30, 2006.

[62] Uğur, Ö., and Weber, G.-W., Optimization and dynamics of gene-environment networks with intervals, to appear in the special issue of Journal of Industrial and Management Optimization (JIMO) at the occasion of the 5th Ballarat Workshop on Global and Non-Smooth Optimization: Theory, Methods and Applications, November 28–30, 2006.

[63] Verne, J., L'Île mystérieuse. L'Abandonné, Pierre-Jules Hetzel, Paris, France, 1875.

[64] Weber, G.-W., Generalized Semi-Infinite Optimization and Related Topics, Heldermann publishing house, Research and Exposition in Mathematics 29, Lemgo, Hofmann, K.H., and Wille, R., eds., 2003.

[65] Weber, G.-W., Alparslan-Gök, S.Z., and Söyler, B., A new mathematical approach in environmental and life sciences: gene-environment networks and their dynamics, preprint no. 69 at Institute of Applied Mathematics, METU, 2006, invited paper submitted to Environmental Modeling & Assessment.

[66] Weber, G.-W., Tezel, A., Taylan, P., Söyler, A., and Çetin, M., Mathematical contributions to dynamics and optimization of gene-environment networks, to appear in the special issue of Optimization in honour of the 60th birthday of Prof. Dr. H.Th. Jongen (2006).

[67] Weber, G.-W., Uğur, Ö., Taylan, P., and Tezel, A., On optimization, dynamics and uncertainty: a tutorial for gene-environment networks, preprint no. 67 at Institute of Applied Mathematics, METU, 2006, submitted to the special issue of Discrete Applied Mathematics "Networks in Computational Biology".

[68] Yagil, G., and Yagil, E., On the relation between effector concentration and the rate of induced enzyme synthesis, Biophysical Journal 11 (1971) 11–27.

[69] Yılmaz, F.B., A Mathematical Modeling and Approximation of Gene Expression Patterns by Linear and Quadratic Regulatory Relations and Analysis of Gene Networks, Institute of Applied Mathematics, METU, MSc Thesis, 2004.

[70] Yılmaz, F.B., Öktem, H., and Weber, G.-W., Mathematical modeling and approximation of gene expression patterns and gene networks, in: Operations Research Proceedings, at the occasion of International Conference on Operations Research, Tilburg, The Netherlands, September 2004, Fleuren, F., den Hertog, D., and Kort, P., eds. (2005) 280–287.

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 181-190 © 2010 Nova Science Publishers, Inc.

Chapter 7

FOLDING PROTEINS: HOW TO SET UP AN EFFICIENT METRICS FOR DEALING WITH COMPLEX SYSTEMS

Alessandro Giuliani*

Environment and Health Dept., Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Roma, Italy

* E-mail address: [email protected]

Abstract

Protein folding, the process allowing a monodimensional string of aminoacids to acquire its characteristic shape in solution, is where complexity starts, as clearly stated in a famous paper entitled 'Proteins: where physics of simplicity and complexity meet' by Hans Frauenfelder and Peter Wolynes [1]. The start of complexity implies coupling a thorough and accurate knowledge of the 'first principles' and potentials (hydrophobic interactions, hydrogen bonding, size constraints etc.) acting at the microscopic level with the substantially empirical (and very inaccurate) predictions of the actual structure of proteins in solution. Along the pilgrimage to the 'translation key' from protein sequence to structure, scientists of different cultures have met and exchanged ideas and, as often happens to pilgrims, even the nature of the goal changed along the way. This is a tale from a section of this path (still very far from completed) in which some peculiarities of the network-based formalization of protein sequence and structure are presented as an example of a possible way to generate an efficient metrics to study phenomena in which many different actors interact in a complex way.

Introduction

Proteins live in a twilight zone between chemistry and biology: while clearly they are not living entities, it takes only a dozen of them (plus a relatively short nucleic acid molecule) to build up a virus particle [2]. They are clearly identified as organic polymers whose constructive principle in chemical terms is very straightforward: eliminate a water molecule between the basic (-NH2) and acidic (-COOH) ends of two neighbouring aminoacids and a

peptide bond is created; iterate this reaction and a polymer chain called a protein comes into existence. These polymers are made up of 20 basic monomers (natural aminoacids) that in turn form chains of various lengths (the great majority of known proteins range from a minimum of 15 aminoacid residues to a maximum of 1200, even if some monsters of 10000 residues are known). The disposition of residues along the chain is basically random, with only a weak (albeit significant) autocorrelation in the hydrophobicity character of residues along the sequence [2], [3], [4]. What differentiates proteins from artificial, man-made polymers is that, when in solution, proteins tend to fold so as to acquire a well defined 3D structure, while man-made repetitive polymers are not water soluble and create the large, unstructured matrices we call plastic [2]. It is important to consider the fact that a protein molecule is huge (proteins are called macromolecules) with respect to classical molecular objects, thus the basic problem a protein has to solve is 'how can I stay in the water without precipitating?'. This is a very difficult chemico-physical problem; in fact, the great majority of random aminoacidic polymers do precipitate in a water environment, and natural proteins too have the tendency to aggregate and precipitate following specific mutations. Scientists know very well the solvation forces that come into play when a given protein sequence is put into an aqueous solvent (for the sake of simplicity we can skip the complication of proteins living in lipid environments or at the interface between lipid and aqueous phases): first of all, the so-called 'hydrophobic interaction', i.e., the gain of entropy of solvent water molecules deriving from putting the hydrophobic residues inside the protein structure, away from water, while leaving hydrophilic residues exposed. Other important players are the capability of forming hydrogen bonds between exposed residues and water, the possibility of generating covalent bonding in the form of disulphide links between different residues of the protein, and the minimization of steric hindrance [5]. This, at least in principle, allows for a clear definition of the sequence/structure puzzle: the 3D structure of a protein (which we can measure by means of both X-ray crystallography and NMR measurements) is the consequence of the need of the polymer to acquire a conformation that allows for its optimal solvation (energy minimization); the task of predicting what structure (if any) a given sequence will acquire in solution should thus come up directly from applying the correct interaction potentials to the different residues. From a mathematical point of view, the problem is clearly understandable as the mapping between a monodimensional string corresponding to the aminoacid sequence and a three-dimensional space in which each aminoacid residue is coded by its three-dimensional coordinates.
This relatively well posed problem is still far from a complete and general solution (even if many local solutions have been reached) after more than thirty years of massive attacks by legions of scientists coming from all fields of science, from mathematics to biology, passing by chemistry and physics. This paper's intention is to give some rapid snapshots from this long pilgrimage that could suggest to non-expert readers some of the common difficulties (but also the great rewards) encountered when tackling intrinsically complex problems. I will concentrate on very basic formalization problems without delving into subtleties, so as to give results that, hopefully, will be of interest to non-specialists. By no means should this be considered an exhaustive treatment of a huge scientific field, for which we refer the interested reader to some basic literature [6-10].

I was forgetting to say that so many scientists got involved in the protein folding affair not only for the intellectual attraction of exploring 'where complexity starts' [1], but for the very practical reason that the biological activity (and thus variation of it) of protein molecules derives from their structure, and being able to predict the biological consequences of a modification in the sequence (as a consequence of a mutation or of a deliberate human intervention) is of utmost importance for both industrial (drug development) and knowledge-oriented (which mutations are potentially harmful) purposes.

Results and Discussion: A Tale of Formalization

The aminoacid residues forming the sequence (or primary structure) of a protein are not simply logical entities; they are molecules, chemico-physical entities. As told in the introduction, the sequence/structure puzzle is mainly a solvation business, thus we can profitably transform the monodimensional literal string, made of the first letter(s) of the consecutive aminoacid residue names, into a numerical string reporting the relative hydrophobicity of each residue. The choice of hydrophobicity is particularly cogent given the role the hydrophobic interaction plays in the solvation energy of proteins. A minimal sketch of this coding is given below; Figure 1 reports this kind of formalization.
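The following sketch performs the sequence-to-series coding using the Kyte-Doolittle hydrophobicity scale, one common choice; the chapter does not specify which scale underlies Figure 1.

    # Kyte-Doolittle hydrophobicity scale (higher = more hydrophobic).
    KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
          'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
          'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
          'Y': -1.3, 'V': 4.2}

    def hydrophobicity_series(sequence):
        """Turn a one-letter aminoacid string into a numeric 'time series'."""
        return [KD[res] for res in sequence]

    print(hydrophobicity_series("MVLSPADKTNVKAAW"))  # start of haemoglobin alpha chain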

Figure 1. The role the hydrophobic interaction plays in the solvation energy of proteins.

The figure reports the sequence of haemoglobin that, through the agency of the coding by a hydrophobicity scale (lower values = hydrophilic residues, higher values = hydrophobic residues), becomes a numeric series. The numeric series is mathematically identical to a time series and thus can be treated with a lot of mathematical and statistical instruments apt for time series analysis [2]. Clearly, no formalization is neutral: if we consider a protein sequence as a series of hydrophobicity values, we implicitly orient our sequence/structure mapping toward a major role of hydrophobicity. Clearly we can use a lot of different codings taking into account not only hydrophobicity but also electronic properties of residues, their size, volume and so forth; in any case we are obliged to accept a general idea of what the protein folding driving forces are. Let's shift now to the formalization of the other 'horn' of the problem, i.e., the 3D structure.

Figure 2.

What seems to be the most natural representation of each residue in space (the three coordinates of the residue, taking as reference the centre of mass of the protein), and the way proteins are effectively represented in the structural data bases (e.g., http://www.rcsb.org/pdb/home/home.do ), was demonstrated to be impractical for model building. Scientists preferred to shift to an 'intrinsic' geometry in which, instead of putting the protein into an extrinsic, pre-existing Euclidean space, the spatial locations of the single residues of the protein are expressed in terms of an adjacency matrix in which each residue is coded in terms of its contact (a pixel becomes black in the graph, a value becomes 1 in the matrix) or absence of contact (a pixel remains blank, a value of zero in the matrix) with the other residues of the protein, after the elimination of the trivial contacts (self contacts or contacts with residues less than three residues apart in the sequence, for which there is no room for not being in contact). Figure 2 reports the basics of this formalization. In the top of the figure, the two proteins 2WRP-R and 1TNF-A are reproduced in the usual way, like ropes (whose linear order coincides with the primary structure) that fold on themselves in space. The protein on the left is a so-called only-alpha protein; this has to do with a typical spiral arrangement of the relative positions of neighbouring residues that forms the so-called alpha-helix. The protein on the right is instead a beta-sheet protein, characterized by a local arrangement of nearby residues resembling a sheet that is folded like those paper fans we used to build in elementary school: one residue up, one down, one up, one down... The bottom part of the figure reports the adjacency matrix formalization of the figures above; the alpha-carbon of the residue (the carbon atom nearest to the functional group) is taken as reference, and its contacts with the other residues are indicated by a pixel. A contact is scored whenever two residues are at a mutual distance of less than 6 angstroms, a choice dictated by the limit of the van der Waals radius of the residues. In any case, this choice allowed one to recover, in terms of different patterns in the adjacency matrix, what was already known about fold classes of proteins from the entire all-atom coordinate information [11]. This implies that contact matrices represent a very useful simplification of the formalization problem, allowing for an invariant, scale independent representation of 3D structures in which proteins correspond to simple networks whose nodes are the residues and whose edges mark the presence of a physical contact between the nodes. How does this network formalization link with the time series-like formalization of the primary structure? One obvious link is the sketching of lattice models: the adjacency matrix is represented as a bi-dimensional lattice whose elements correspond to the observed contacts; each contact is in turn scored by a statistical (or otherwise determined) potential measuring its energy contribution, based on the coding of the residue in terms of a specific physico-chemical property. This representation can be the basis of a non-linear optimisation problem aimed at finding the 'most favourite' folding [12] as a most favourite 'path' in the lattice.
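A sketch of the contact-matrix construction described above, with the 6 angstrom cutoff and the exclusion of trivial contacts (residues less than three positions apart in the sequence); the alpha-carbon coordinates would normally be read from a PDB file, and the toy coordinates below are illustrative only.

    import numpy as np

    def contact_map(ca_coords, cutoff=6.0, min_separation=3):
        """Adjacency matrix: 1 if two alpha-carbons are closer than `cutoff`
        angstroms and at least `min_separation` apart along the chain."""
        n = len(ca_coords)
        d = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
        adj = (d < cutoff).astype(int)
        idx = np.arange(n)
        adj[np.abs(idx[:, None] - idx[None, :]) < min_separation] = 0  # trivial contacts
        return adj

    # Toy example: 10 residues placed on a helix-like curve (illustrative only).
    t = np.linspace(0, 3 * np.pi, 10)
    coords = np.stack([2.3 * np.cos(t), 2.3 * np.sin(t), 1.5 * t / np.pi], axis=1)
    print(contact_map(coords))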
Figure 3 clarifies this point: the hydrophobicity of the single residues is taken as the potential, and an 'energy contribution' is assigned to each interaction based on the principle that the smaller the difference in hydrophobicity between the residues, the more favoured their interaction (again, trivial contacts are not considered). This kind of approach, pioneered by Ken Dill [12], was successful in the limited realm of very short peptides (10 to 20 residues) but was not able to predict sequence/structure relations in real, much bigger, proteins. What soon became evident was the need to take into consideration some general topological constraints acting on the protein considered as a network. In other words, it was not sufficient that a given arrangement of the chain give rise to an energetically 'admitted' or 'favoured' arrangement (by the way, it was soon discovered that a protein allows for thousands of 'energetically admitted' but very different structures); there were some 'network topological structures' that were favoured for different, not immediately evident and apparently purely topological, reasons [13]. The discovery of 'cavities' inside protein globular shapes was a by-product of this attention to the topology of contacts; these cavities were in turn demonstrated to play a very important role in protein physiology [14].

Figure 3.

From another point of view, the 'adjacency matrix' formalization (theoretical physicists will probably note more than an analogy with the so-called S-matrix formalism in quantum physics) was demonstrated useful even for highlighting some hidden regularities in the apparently random distribution of the residues along the chain. By simply computing the Euclidean distance, in terms of hydrophobicity profile, between subsequent short (from 3 to 6 consecutive residues) patterns of hydrophobicity distribution along the chain, we can easily generate 'adjacency' matrices where the adjacency is no longer a contact in the classical 3D space but identifies two patches of the sequence sharing the same (or a very similar) hydrophobicity profile. In Figure 4 this procedure is reported for the P53 protein: in the bottom panel the hydrophobicity profile is reported; in the top panel this hydrophobicity profile is transformed into a binary matrix (recurrence plot) [2] in which the residues having a similar hydrophobicity along the chain (or, better, the 4-residue-long segments starting from a given residue) are marked with a black dot, in analogy with the 3D contact matrix. This kind of formalization allowed one to detect local regularities (portions of the sequence in which hydrophobicity displays similar patterns of distribution) that were invisible at the general autocorrelation scale of the entire protein.
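A sketch of the recurrence-plot construction just described: 4-residue hydrophobicity windows are compared by Euclidean distance, and a dot is scored whenever two windows lie within a similarity radius (the radius value is an illustrative choice).

    import numpy as np

    def recurrence_plot(series, window=4, radius=1.0):
        """Binary matrix marking pairs of similarly hydrophobic sequence patches."""
        s = np.asarray(series, dtype=float)
        n = len(s) - window + 1
        patches = np.stack([s[i:i + window] for i in range(n)])   # embedded vectors
        d = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
        return (d < radius).astype(int)

    series = [1.8, -3.5, 4.5, -0.4, 1.9, -3.5, 4.2, -0.7, 2.5, -3.9, 3.8, -0.8]
    print(recurrence_plot(series))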

Figure 4.

These regularities (accumulation of many dots in a narrow region) were demonstrated to be linked to the portions of the molecule more prone to undergo interactions with other proteins [15-17] and allowed the discovery that some biological properties of proteins are not mediated by structure but only by different sequence patterns in the presence of an identical 3D structure. This went together with the discovery of the so-called 'natively unfolded proteins', i.e., proteins that exert their physiological role without ever acquiring a well defined 3D structure [15-20], which in many cases correspond to very periodic and repetitive patches along the sequence (the so-called low-complexity regions) [17]. Not only are there proteins that are globally unfolded, but in a lot of proteins there are short regions of structural disorder inside an otherwise ordered structure, and these disordered regions are in many cases the regions exerting the most important biological role [21]. A still more 'revolutionary' and surprising finding based on the consideration of the autocorrelation structure along protein sequences was the discovery of the possibility of synthesizing small (8 to 15 residues) peptides that bind to large proteins (500-1000 residues) by 'downsizing' to the small peptide scale the periodicities, in terms of hydrophobicity, present in the large protein system that acts as a receptor of the peptide [22], [23].

Figure 5 explains this point.

Figure 5.

In the top left panel, the hydrophobicity distribution of a globular protein of more than 1000 residues is reported. In the top right panel, the frequency spectrum relative to that distribution, after some mathematical filtering (see [23]), is shown. In the bottom panel, a 14-residue peptide is represented by both its hydrophobicity distribution (left) and the relative spectrum (right). The peptide was designed by the authors to share the same two 'frequency' peaks as the globular protein. The peptides designed with this principle of mimicking the hydrophobicity distribution peaks were experimentally demonstrated to effectively bind to the corresponding proteins, while this result is practically impossible to achieve (at least without making millions of trials) with a random residue assortment. This result (together with somewhat similar results pointing to a sort of 'stochastic resonance' effect in protein hydrophobicity sequences [24]), if confirmed, will force the science community to change the time-honoured paradigm representing macromolecules as balls-and-sticks structural models and to shift towards a 'global wavelike' vision in which the interactions need not have a specific location but are in some sense 'delocalized' over the entire system. At the end of the day, this could completely change the relevance we attach to the definition of a specific 3D structure of a protein and even provoke important changes in our ideas of how drugs exert their biological action. It is worth noting how the choice of an efficient metrics based on mutual comparisons between residues (both in 3D and in chemico-physical properties space) allowed a lot of otherwise hidden regularities to come out. This can be considered a general lesson when dealing with complex systems: the key is in many

cases not in the search for subtle, all-embracing laws of nature but in the ability to describe the systems with an efficient metrics.

Conclusion

At the end of this incredibly rapid (and very narrowly focused) tour of the status of the protein folding problem, the take-home message is that when we set our feet on a path, we cannot be completely sure of whom we will meet on our way. During the pilgrimage we could even discover that the end of our journey is completely different from what we expected, or that intermediate stops were more interesting than the final one; the important thing is to continue to walk. The relevance of the protein folding journey was, in my opinion, in the clarity of the initial goal, so that many scientists with different backgrounds could participate and join the march. Moreover, the presence of thousands of protein sequences and structures freely downloadable from the world wide web allowed the pilgrims to enter the path for free, without the need of complicated instruments and experimental skills. As on any real pilgrimage, the basic lesson the participants learned was a lesson of humility (things were not as easy as expected); the most superb pilgrims abandoned the path (the IBM 'Blue Gene' project of a supercomputer able to definitively solve the sequence/structure puzzle by pure brute force was abandoned), while the other participants received unexpected gifts linked to the appearance of natural mysteries they had never imagined at the beginning. Walking side-by-side along the same path, pilgrims of different scientific backgrounds learned to appreciate each other's work, so breaking initial caste separations. This is more than enough to say we can be happy with what the protein folding problem gave us.

Acknowledgments

Ignazio Licata is gratefully acknowledged for his suggestions.

References

[1] Frauenfelder H., Wolynes P. (1994) Phys. Today (47): 58.
[2] Giuliani A., Benigni R., Zbilut JP., Webber CL., Sirabella P., and Colosimo A. (2002) Chem. Rev. (102): 1471.
[3] Weiss O., and Herzel H. (1998) J. Theor. Biol. (190): 341.
[4] Irback A. and Sandelin E. (2000) Biophys. J. (79): 2252.
[5] Finkelstein AV. and Galzitskaya OV. (2004) Physics of Life Rev. (1): 23.
[6] Ellis RJ. and Hartl FU (1999) Curr. Opinion Struct. Biol. (9): 102.
[7] Ivankov D. and Finkelstein AV. (2004) Proc. Natl. Acad. Sci. USA (101): 8942.
[8] Dobson CM (2003) Nature Rev. Drug Discovery (2): 155.
[9] Taylor WR, May AC, Brown NP, Aszodi A. (2001) Rep. Prog. Phys. (64): 517.
[10] Pande VS, Grosberg AY and Tanaka T. (2000) Rev. of Mod. Phys. (72): 259.

[11] Webber C.L., Giuliani A., Zbilut J.P. and Colosimo A. (2001) Proteins (44): 292.
[12] Dill K.A. (1990) Biochemistry (29): 7133.
[13] Halle B. (2002) Proc. Natl. Acad. Sci. USA (99): 1274.
[14] Hubbard S.J. and Argos P. (1996) J. Mol. Biol. (261): 289.
[15] Dunker K.A., Brown C.J., Lawson D., Iakoucheva L.M. and Obradovic Z. (2002) Biochemistry (41): 6573.
[16] Romero P., Obradovic Z., Li X., Garner E.C., Brown C.J. and Dunker K.A. (2001) Proteins (42): 38.
[17] Uversky V.N. (2002) Eur. J. Biochem. (269): 2.
[18] Uversky V.N. (2002) Protein Sci. (11): 739.
[19] Fink A.L. (2005) Curr. Opinion Struct. Biol. (15): 35.
[20] Uversky V.N., Gillespie J.R. and Fink A.L. (2000) Proteins (41): 415.
[21] Dunker K.A., Cortese M.S., Romero P., Iakoucheva L.M. and Uversky V.N. (2005) FEBS J. (272): 5129.
[22] Mandell A.J., Selz K. and Shlesinger M.F. (1997) Proc. Natl. Acad. Sci. USA (94): 13576.
[23] Selz K., Samoylova T.I., Samoylov A.M., Vodyanoy V.J. and Mandell A.J. (2007) Biopolymers (85): 38.
[24] Zbilut J.P., Scheibel T., Huemmerich D., Colafranceschi M. and Giuliani A. (2005) Phys. Lett. A (346): 33.

In: Crossing in Complexity ISBN 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 191-197 © 2010 Nova Science Publishers, Inc.

Chapter 8

THE (UNFORTUNATE) COMPLEXITY OF THE ECONOMY

J.P. Bouchaud∗ Science & Finance, Capital Fund Management, 6 Bd Haussmann, 75009 Paris, France

Abstract

This article is a follow-up of a short essay that appeared in Nature 455, 1181 (2008). It has become increasingly clear that the erratic dynamics of markets is mostly endogenous and not due to the rational processing of exogenous news. I elaborate on the idea that spin-glass-type problems, where the combination of competition and heterogeneities generically leads to long epochs of stasis interrupted by crises and hypersensitivity to small changes of the environment, could be metaphors for the complexity of economic systems. I argue that the most valuable contribution of physics to economics might end up being of a methodological nature, and that simple models from physics and agent-based numerical simulations, although highly stylized, are more realistic than the traditional models of economics that assume rational agents with infinite foresight and infinite computing abilities.

The current direful crisis puts classical economic thinking under huge pressure. In theory, deregulated markets should be efficient: rational agents quickly correct any mispricing or forecasting error, prices faithfully reflect the underlying reality and ensure the optimal allocation of resources. These “equilibrated” markets should be stable: crises can only be triggered by acute exogenous disturbances, such as hurricanes, earthquakes or political upheavals, but certainly not precipitated by the market itself. This is in stark contrast with most financial crashes, including the latest one. The theory of economic equilibrium and rational expectations, as formalized since the 50’s and 60’s, has deeply influenced scores of decision-makers high up in government agencies and financial institutions. Some of them are now “in a state of shocked disbelief”, as Alan Greenspan himself declared when he recently admitted that he had put too much faith in the self-correcting power of free markets and had failed to anticipate the self-destructive power of wanton mortgage lending. Economic theories turn out to have a significant impact on our everyday life. The last twenty years of deregulation were prompted by the argument that constraints of all kinds prevent markets from reaching their supposedly perfect equilibrium, efficient state. The theory of Rational Expectations has now permeated International Political Economics, Sociology, Law, etc.1 Unfortunately, nothing is more dangerous than dogmas donned with scientific feathers. The present crisis might offer an excellent occasion for a paradigm change, already called for in the past by economists such as John Maynard Keynes, Alan Kirman or Steve Keen. They have forcefully highlighted the shortcomings and contradictions of classical economics, but progress has been slow. Of course, it is all easier said than done, and the task looks so formidable that some economists argue that it is better to stick with the implausible but well-corseted theory of perfectly rational agents than to venture into modelling the infinite number of ways agents can be irrational.

So where should one start? What should be taught to students in order to foster, in the long run, a better grasp of the complexity of economic systems? Can physics really contribute to the much-awaited paradigm shift? After twenty years or so of “econophysics”2 and around 1000 papers in the arXiv, it is perhaps useful to give a personal bird’s-eye view of what has been achieved in that direction. Econophysics is in fact, at this moment in time, a misnomer, since most of its scope concerns financial markets. To some economists, finance is of relatively minor importance, and any contribution, even a significant one, can only have a limited impact on economics at large. I personally strongly disagree with this viewpoint: the recent events confirm that financial market hiccups can cripple the entire economy. From a more conceptual point of view, financial markets represent an ideal laboratory for testing several fundamental concepts of economics, for example: Is the price really such that supply matches demand? Or: Are price moves primarily due to news? The terabytes of data spat out every day by financial markets allow one (in fact compel one) to compare theories with observations in detail, and the answers to both questions above seem to be clear no’s.3 This proliferation of data should soon concern other spheres of economics and social science: credit cards and e-commerce will allow one to monitor consumption in real time and to test theories of consumer behaviour in great detail.4 So we must get prepared to deal with huge amounts of data, and learn to scrutinize them with as little prejudice as possible, while still asking relevant questions, starting from the most obvious ones – those that need nearly no statistical test at all because the answers are clear to the naked eye – and only then delving into more sophisticated ones. The very choice of the relevant questions is often sheer serendipity: more of an art than a science. That intuition, it seems to me, is well nurtured by an education in the natural sciences, where the emphasis is put on mechanisms and analogies, rather than on axioms and theorem proving.

∗E-mail address: [email protected]

1There are rational expectation theories of marriage, drug addiction or obesity!
2It seems adequate to define the first econophysics event as the Santa Fe conference in 1987, although the first scientific papers were written in the mid-nineties, and the name econophysics was coined by Gene Stanley in 1995.
3On these points, see Joulin et al. (2008) for the minor role news seem to play in explaining large price jumps, and Bouchaud et al. (2008) for an extensive review of the inadequacy of the idea that supply and demand are cleared instantaneously in financial markets.
4For an interesting work in that direction, see Sornette et al. (2004).

Faced with a mess of facts to explain, Feynman advocated that one should choose one of them and try one’s best to understand it in depth, with the hope that the emerging theory is powerful enough to explain many more observations. In the case of financial markets, physicists have been immediately intrigued by a number of phenomena described by power laws. For example, the distributions of price changes, of company sizes, and of individual wealth all have a power-law tail, to a large extent universal. The activity and volatility of markets have power-law correlations in time, reflecting their intermittent nature, obvious to the naked eye: quiescent periods are intertwined with bursts of activity, on all time scales. Power laws leave most economists unruffled (isn’t it, after all, just another fitting function?), but immediately send physicists’ imaginations churning. The reason is that many complex physical systems display very similar intermittent dynamics: velocity fluctuations in turbulent flows, avalanche dynamics in random magnets under a slowly varying external field, the teetering progression of cracks in a slowly strained disordered material, etc. The interesting point about these examples is that while the exogenous driving force is regular and steady, the resulting endogenous dynamics is complex and jittery. In these cases, the non-trivial (physicists say critical) nature of the dynamics comes from collective effects: individual components have a relatively simple behaviour, but interactions lead to new, emergent phenomena. The whole is fundamentally different from any of its elementary sub-parts. Since this intermittent behaviour appears to be generic for physical systems with both heterogeneities and interactions, it is tempting to think that the dynamics of financial markets, and more generally of economic systems, reflects the same underlying mechanisms.

Several economically inspired models have been shown to exhibit these critical features. One is a transposition of the Random Field Ising Model (RFIM) to describe situations where there is a conflict between personal opinions, public information and social pressure.5 Imagine a collection of traders all having different a priori opinions, say optimistic (buy) or pessimistic (sell). Traders are influenced by some slowly varying global factors, for example interest rates, inflation, earnings, dividend forecasts, etc. One assumes no shocks whatsoever in the dynamics of these exogenous factors, but posits that each trader is also influenced by the opinion of the majority. He conforms to it if the strength of his a priori opinion is weaker than his herding tendency.

5See Sethna et al. (2001) for a general review of this model, and Galam & Moscovici (1991) and Michard & Bouchaud (2005) for some applications to economics and social science.
If all agents made up their minds in isolation (zero herding tendency), then the aggregate opinion would faithfully track the external influences and, by assumption, evolve smoothly. But, surprisingly, if the herding tendency exceeds some finite threshold, the evolution of the aggregate opinion jumps discontinuously from optimistic to pessimistic as global factors deteriorate only slowly and smoothly. Furthermore, some hysteresis appears. Much like supersaturated vapour refusing to turn into liquid, optimism is self-consistently maintained. In order to trigger the crash, global factors have to degrade far beyond the point where pessimism should prevail. On the way back, these factors must improve far beyond the crash tipping point for global optimism to be reinstated, again somewhat abruptly.
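A minimal numerical sketch of this mechanism may help fix ideas (this is our own illustration, not the author’s code; the herding strength J = 2 and the Gaussian a priori opinions are arbitrary choices). Sweeping the global factor F down and then back up exhibits both the discontinuous jump and the hysteresis just described:

    import random

    random.seed(1)
    N = 2_000
    J = 2.0                                      # herding tendency (illustrative value)
    h = [random.gauss(0, 1) for _ in range(N)]   # heterogeneous a priori opinions

    def settle(F, m):
        # Iterate opinions s_i = sign(h_i + F + J*m) to a self-consistent state.
        for _ in range(200):
            m = sum(1 if hi + F + J * m > 0 else -1 for hi in h) / N
        return m

    m = 1.0  # start from collective optimism
    for F in [1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5, 1.0]:  # down, then up
        m = settle(F, m)
        print(f"global factor F={F:+.1f} -> aggregate opinion m={m:+.2f}")

At F = 0 the aggregate opinion is strongly positive on the way down but strongly negative on the way back up: each mood is self-sustained until the global factors move far past the tipping point.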

Although the model is highly simplified, it is hard not to see some resemblance with all bubbles in financial history. The progressive reckoning of the amount of leverage used by banks to pile up bad debt should have led to a self-correcting, soft landing of global markets – as the efficient market theory would predict. Instead, collective euphoria screens out all bad omens until it becomes totally unsustainable. Any small, anecdotal event or insignificant piece of news is then enough to spark the meltdown.

The above framework also illustrates in a vivid way the breakdown of a cornerstone of classical economics, stigmatized in Alan Kirman’s essay “Whom or what does the representative individual represent?”. Much as in statistical physics or materials science, one of the main theoretical challenges in economics is the micro/macro link. How does one infer the aggregate behaviour (for example, the aggregate demand) from the behaviour of individual elements? Representative agent theory amounts to replacing an ensemble of heterogeneous, interacting agents by a unique representative one – but in the RFIM this is simply impossible: the behaviour of the crowd is fundamentally different from that of any single individual.

Minority Games define another, much richer family of models in which agents learn to compete for scarce resources.6 A crucial aspect here is that the decisions of these agents impact the market: the price does not evolve exogenously but moves as a result of these decisions. A remarkable result is the existence, within this framework, of a genuine phase transition as the number of speculators increases, between a predictable market, where agents can eke out some profit from their strategies, and an over-crowded market, where these profits vanish or become too risky. Around the critical point, where predictability disappears and efficiency sets in, intermittent power-law phenomena emerge, akin to those observed on real stock markets. The cute point of this analysis is that there is a well-grounded mechanism keeping the market in the vicinity of the critical point:7 fewer agents means more profit opportunities, which attracts more agents; more agents means no profit opportunities, so that frustrated agents leave the market.

There are other examples in physics and computer science where competition and heterogeneities lead to interesting phenomena that could be metaphors of the complexity of economic systems: spin glasses (within which spins interact randomly with one another), molecular glasses, protein folding, Boolean satisfiability problems, etc. In these problems, the energy (or cost function) that must be minimized is an incredibly complicated function of the N degrees of freedom (the spins, the positions of the atoms of the protein, the Boolean variables). Generically, this function is found to display an exponential number (in N) of local minima. The absolute best one is (a) extremely hard to find: the best algorithms to find it take a time exponential in N; (b) only marginally better than the next best one; and (c) extremely fragile to a change of the parameters of the problem: the best minimum can easily swap over to become the second best, or even abruptly cease to be a minimum. Physical systems with these “rugged” energy landscapes display very characteristic phenomena, extensively studied in the last twenty years, both experimentally and theoretically.8
The dynamics is extremely slow, as the system is lost amidst all these local minima; equilibrium is never reached in practice; and there is intermittent sensitivity to small changes of the environment.

6For a review and references, see Challet, Marsili and Zhang (2005).
7A mechanism called Self-Organized Criticality by Per Bak (1996), which is assumed to take place in other situations potentially relevant to economics, for example the evolution and extinction of species, which might have an analogue in the evolution and extinction of companies.
8For a review, see A. P. Young (1998).
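The exponential proliferation of local minima invoked above is easy to exhibit numerically. The following sketch (our own illustration, using a small spin glass with random pairwise couplings in the Sherrington-Kirkpatrick spirit; the model and size are arbitrary choices) exhaustively counts the configurations that no single spin flip can improve:

    import itertools, random

    random.seed(0)
    N = 12  # number of binary degrees of freedom (kept small for exhaustive search)

    # Random couplings between all pairs of spins: a toy "rugged landscape".
    J = {(i, j): random.gauss(0, 1) for i in range(N) for j in range(i + 1, N)}

    def energy(s):
        return -sum(J[i, j] * s[i] * s[j] for (i, j) in J)

    def is_local_minimum(s):
        # Local minimum: no single spin flip lowers the energy.
        e = energy(s)
        for k in range(N):
            t = list(s)
            t[k] = -t[k]
            if energy(t) < e:
                return False
        return True

    count = sum(is_local_minimum(s) for s in itertools.product((-1, 1), repeat=N))
    print(f"{count} single-flip local minima among {2**N} configurations")

Repeating the count for increasing N shows the number of such minima growing roughly exponentially, which is precisely what makes the global optimum so hard to find and so fragile to parameter changes.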

There is no reason to believe that the dynamics of economic systems, also governed by competition and heterogeneities, should behave very differently – at least beyond a certain level of complexity and interdependency.9 If true, this would entail a major change of paradigm:

• First, even if an equilibrium state exists in theory, it may be totally irrelevant in practice, because the equilibration time is far too long. As Keynes noted, in the long run we are all dead. The convergence of economic systems towards their Garden of Eden might be hobbled not by regulations but by their own intrinsic complexity. One can in fact imagine situations where regulation could nudge free, competitive markets closer to an efficient state, which they would never reach otherwise.

• Second, complex economic systems should be inherently fragile to small perturbations, and should generically evolve in an intermittent way, with a succession of rather stable epochs punctuated by rapid, unpredictable changes – again, even when the exogenous drive is smooth and steady. No big news is needed to make markets lurch wildly, in agreement with recent empirical observations (see Joulin et al. 2008). Within this metaphor of markets, competition and complexity could be the essential cause of their endogenous instability.10

The above models tell interesting stories but are clearly highly stylized and aim to be inspiring rather than convincing. Still, they seem quite a bit more realistic than the traditional models of economics that assume rational agents with infinite foresight and infinite computing abilities. Such simplifying caricatures are often made for the sake of analytical tractability, but many of the above results can in fact be established analytically, using statistical mechanics tools developed in the last thirty years to deal with disordered systems. One of the most remarkable breakthroughs is the correct formulation of a mean-field approximation to deal with interactions in heterogeneous systems. Whereas the simple Curie-Weiss mean-field approximation for homogeneous systems is well known and accounts for interesting collective effects,11 its heterogeneous counterpart is far subtler and has only been worked out in detail in the last few years.12 It is a safe bet to predict that this powerful analytical tool will find many natural applications in economics and the social sciences in the years to come.

As models become more realistic and hone in on details, analytics often has to give way to numerical simulations. This situation is now well accepted in physics, where numerical experimentation has gained a respectable status, bestowing on us a “telescope of the mind” (as beautifully coined by Mark Buchanan), multiplying human powers of analysis and insight just as a telescope does our powers of vision. Sadly, many economists are still reluctant to recognize that the numerical investigation of a model, although very far from theorem proving, is a valid way to do science. Yet it is a useful compass for venturing into the wilderness of irrational agent models: try this behavioural rule and see what comes out; explore another assumption; iterate; explore.

9The idea that spin-glass theory might be relevant to economics was originally suggested by Phil Anderson during the Santa Fe meeting The Economy as a Complex Evolving System (1988).
10In a recent beautiful paper, Marsili (2008) has shown how the increase of derivative products could drive the system close to an instability point, using concepts and methods quite similar to those of the Minority Game.
11The Curie-Weiss mean-field theory was first used in an economic context by Brock and Durlauf (2001).
12For a thorough review, see Mézard & Montanari (2009).

It is actually surprising how easily these numerical experiments allow one to qualify an agent-based model as potentially realistic (in which case one should dwell further) or completely off the mark.13 What makes this expeditious diagnosis possible is the fact that, for large systems, details do not matter much – only a few microscopic features end up surviving at the macro scale. This is a well-known story in physics: the structure of the Navier-Stokes equation for macroscopic fluid flow, for example, is independent of all molecular details. The present research agenda is therefore to identify the features that explain financial markets and economic systems as we know them. This is of course still very much an open problem, and simulations will play a central role. The main bet of econophysics is that competition and heterogeneity, as described above, should be the marrow ingredients of the final theory. A slew of other empirical results, useful analytical methods and numerical tricks have been established in the 15 active years of econophysics, which I have no space to review here.14 But in my opinion the most valuable contribution of physics to economics will end up being of a methodological nature. Physics has its own way of constructing models of reality, based on a subtle mixture of intuition, analogies and mathematical spin, in which the ill-defined concept of plausibility can be more relevant than the accuracy of the prediction. Kepler’s ellipses and Newton’s gravitation were more plausible than Ptolemy’s epicycles, even though the latter theory, after centuries of fixes and stitches, was initially more accurate at describing observations. When Phil Anderson first heard about the theory of Rational Expectations at the famous 1987 Santa Fe meeting, his befuddled reaction was: “You guys really believe that?” He would probably have fallen from his chair had he heard Milton Friedman’s complacent viewpoint on theoretical economics: “In general, the more significant the theory, the more unrealistic the assumptions.” Physicists definitely want to know what an equation means in intuitive terms, and believe that assumptions ought to be both plausible and compatible with observations. This is probably the most urgently needed paradigm shift in economics.

13For reviews, see Goldstone & Janssen (2005) and Lux (2008).
14Let me quote in particular models of wealth distribution, market microstructure and impact of trades, exactly solvable stochastic volatility models, path integrals, multifractal random walks and random matrix theory. For more exhaustive reviews, see (among others): Bouchaud & Potters (2004), Yakovenko (2007), Lux (2008).

Editor’s Note: This article first appeared in Physics World, April 2009, pp. 28-32, and is republished here with their kind permission (www.physicsworld.com).

References

P. W. Anderson, K. Arrow, D. Pines, The Economy as an Evolving Complex System, Addison-Wesley, 1988.
P. Bak, How Nature Works: The Science of Self-Organized Criticality, New York: Copernicus, 1996.
J.-P. Bouchaud and M. Potters, Theory of Financial Risks and Derivative Pricing, Cambridge University Press, 2003.


J.-P. Bouchaud, J. D. Farmer, F. Lillo, How Markets Slowly Digest Changes in Supply and Demand, in: Handbook of Financial Markets: Dynamics and Evolution, North-Holland, Elsevier, 2009.
W. Brock, S. Durlauf, Discrete Choice with Social Interactions, Review of Economic Studies, 68, 235 (2001).
M. Buchanan, This Economy Does Not Compute, New York Times, Oct. 1st 2008.
D. Challet, M. Marsili, Y.-C. Zhang, Minority Games, Oxford University Press, 2005.
J. D. Farmer, J. Geanakoplos, The Virtues and Vices of Equilibrium and the Future of Financial Economics, e-print arXiv:0803.2996 (2008).
S. Galam, S. Moscovici, Towards a Theory of Collective Phenomena: Consensus and Attitude Changes in Groups, Euro. J. Social Psy. 21, 49 (1991).
R. Goldstone, M. Janssen, Computational Models of Collective Behaviour, Trends in Cognitive Science 9, 424 (2005).
A. Joulin, A. Lefevre, D. Grunberg, and J.-P. Bouchaud, Stock Price Jumps: News and Volume Play a Minor Role, arXiv:0803.1769; Wilmott Magazine, November 2008.
S. Keen, Debunking Economics, Pluto Press, 2000.
A. Kirman, What or Whom Does the Representative Individual Represent?, Journal of Economic Perspectives 6, 117 (1992).
T. Lux, Applications of Statistical Physics in Finance and Economics, to appear in Handbook of Research on Complexity (2008).
R. N. Mantegna, H. E. Stanley, An Introduction to Econophysics: Correlations and Complexity in Finance, Cambridge University Press, 1999.
M. Mézard, A. Montanari, Information, Physics & Computation, Oxford University Press, 2009.
Q. Michard, J.-P. Bouchaud, Theory of Collective Opinion Shifts: From Smooth Trends to Abrupt Swings, Eur. J. Phys. B 47, 151 (2005).
M. Marsili, Eroding Market Stability by Proliferation of Financial Instruments (2008): http://ssrn.com/abstract=1305174
J. Sethna, K. Dahmen, C. Myers, Crackling Noise, Nature, 410, 242 (2001).
D. Sornette, F. Deschâtres, T. Gilbert, Y. Ageon, Endogenous vs Exogenous Shocks in Complex Systems: An Empirical Test Using Book Sales Ranking, Phys. Rev. Lett. 93, 228701 (2004).
V. Yakovenko, Statistical Mechanics Approach to Econophysics, arXiv:0709.3662 (2007), to appear in Encyclopedia of Complexity and System Science.
A. P. Young, Spin Glasses and Random Fields, World Scientific, 1997.

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 199-212 © 2010 Nova Science Publishers, Inc.

Chapter 9

EVOLUTION OF NORMS IN A MULTI-LEVEL SELECTION MODEL OF CONFLICT AND COOPERATION

J.M. Pacheco1, F.C. Santos2 and F.A.C.C. Chalub3 1 Centro de Física Teórica e Computacional & Departamento de Física da Faculdade de Ciências, P-1649-003 Lisboa Codex, Portugal 2 IRIDIA, CoDE, Université Libre de Bruxelles, Av. F. Roosevelt, 50, CP 194/6, 1050 Brussels, Belgium 3 Departamento de Matemática da Universidade Nova de Lisboa and Centro de Matemática e Aplicações, Quinta da Torre 2829-516, Caparica, Portugal

Abstract

We investigate the evolution of social norms in a game-theoretical model of multi-level selection and mutation. Cooperation is modelled at the lower level of selection by means of a social dilemma in the context of indirect reciprocity, whereas at the higher level of selection conflict is introduced via different mechanisms. The model allows the emergence of norms requiring high levels of cognition. Results show that natural selection and mutation lead to the emergence of a robust yet simple social norm, which we call stern-judging. Stern-judging is compatible with the expectations anthropologists have regarding Pleistocene hunter-gatherer communities. Perhaps surprisingly, it also fits very well with recent studies of the behaviour of reputation-based e-trading. Under stern-judging, helping a good individual or refusing help to a bad individual leads to a good reputation, whereas refusing help to a good individual or helping a bad one leads to a bad reputation. The lack of ambiguity of stern-judging, where implacable punishment is compensated by prompt forgiving, supports the idea that simplicity is often associated with evolutionary success.

Introduction

Natural selection is conventionally assumed to favour the strong and selfish who maximize their own resources at the expense of others. But many biological systems, and especially human societies, show persistent patterns of altruistic, cooperative interactions [1], forming large social groups in which cooperation among non-kin is widespread. Therefore, one may naturally wonder: How can natural selection promote unselfish behaviour?

The Mathematics of Give and Take

The problem may be mathematically formulated in the context of evolutionary game theory [2]. Following the work of Hamilton, Trivers, and Wilson [3-5], an act is altruistic if it confers a benefit b to another individual in spite of accruing a cost c to the altruist (where it is assumed, as usual, that b>c). In this context, several mechanisms have been invoked to explain the evolution of altruism, but only recently has an evolutionary model of indirect reciprocity been developed, by Nowak and Sigmund [6]. According to Alexander [7], indirect reciprocity presumably provides the mechanism which distinguishes us, humans, from all other living species on Earth. Moreover, as recently argued in [8], “indirect reciprocity may have provided the selective challenge driving the cerebral expansion in human evolution”. Unlike direct reciprocity, which reflects the common principle “I scratch your back and you scratch mine”, indirect reciprocity conveys the motto “I scratch your back and someone else will scratch mine”. We may assume an underlying mechanism of reputation through which an individual, by providing help to another, increases her reputation in such a way that it becomes more likely for others to help her in turn, boosting cooperation. As became clear in the model developed by Nowak and Sigmund [6], the rule defining the conditions under which the reputation of an individual will change, depending on her action towards a third party and that third party’s reputation, constitutes the norm of the society. Nowak and Sigmund showed that cooperation under indirect reciprocity is feasible when the ruling norm is “image score”, whereby an individual always increases her reputation by helping another individual. Obviously, image score is but one example of a myriad of possible norms, some of which may ultimately promote cooperation, unlike others. The model developed by Nowak and Sigmund [6], despite dealing with a single norm, had an inherent level of strategic complexity which precluded its extension towards an exhaustive study of norms per se. On the other hand, the model provided clear-cut links between norms of cooperation and social norms studied long before in the context of economics [9], also associated with community enforcement [10]. Before the development of the simpler models we shall address in the following, the consensus from both economics and evolutionary biology was that reputation-based cooperation must be associated with norms requiring high levels of cognition [8].
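To make these mechanics concrete, here is a minimal sketch (our own illustration, not the authors’ code; the population size, payoff values and error rate are arbitrary) of the donation game under the image-score norm, with discriminators who help only recipients in good standing:

    import random

    random.seed(2)
    N, b, c = 100, 3.0, 1.0         # population size, benefit and cost (b > c)
    mu = 0.01                       # small probability of misjudging an action
    good = [True] * N               # binary image scores
    payoff = [0.0] * N

    for _ in range(50_000):
        donor, recipient = random.sample(range(N), 2)
        helps = good[recipient]     # discriminator: help only GOOD recipients
        if helps:
            payoff[donor] -= c
            payoff[recipient] += b
        new_image = helps           # image score: helping -> GOOD, refusing -> BAD
        if random.random() < mu:    # occasional assessment error
            new_image = not new_image
        good[donor] = new_image

    print(f"fraction GOOD: {sum(good) / N:.2f}, total payoff: {sum(payoff):.0f}")

Note the weakness discussed later in this chapter: under image scoring, a discriminator who justifiably refuses a BAD recipient is nevertheless assigned a BAD image, which is precisely what second-order norms such as stern-judging repair.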

From the Pleistocene to the Internet

Anthropologists have discussed for a long time the features and limitations related to the social structure of hunter-gatherers during the Pleistocene [11]. Such egalitarian, small communities must have been under the influence of simple norms, more complex norms being associated with the emergence of societies at a larger scale [12, 13]. Consequently, it has remained unclear to what extent indirect reciprocity may provide a mechanism to explain reputation-based cooperation in more primitive societies.

More recently, and in the midst of the information age, studies have shown that relatively low levels of cognition (associated with simple norms) seem to be the rule in modern means of economic exchange such as e-trade, which also rely on reputation-based mechanisms of cooperation [14-16]. Indeed, anonymous one-shot interactions between individuals loosely connected and geographically dispersed usually dominate e-trade, raising issues of trust-building and moral hazard [17]. Reputation in e-trade is introduced via a feedback mechanism which announces the ratings of sellers. Despite the success and high levels of cooperation observed in e-trade, it has been found [14] that publicizing a detailed account of the seller’s feedback history does not improve cooperation, as compared to publicizing only the seller’s most recent rating. In other words, practice shows that simple reputation-based mechanisms are capable of promoting high levels of cooperation. In view of the previous discussion, it is hard to explain the success of e-trade on the basis of the results obtained so far for reputation-based cooperation in the context of indirect reciprocity.

A World in Black and White

Considerable insight into the nature of social norms became possible after some major simplifications were introduced into the original model of Nowak and Sigmund [6]. Ohtsuki and Iwasa [18] and Brandt and Sigmund [19] developed, simultaneously and independently, a model of binary assessment in a world in black and white, in which reputations can take only one of two values – GOOD or BAD. This model has influenced many studies since then [8, 17-23]. In the original work of Ohtsuki and Iwasa, an exhaustive search was made in infinite populations evolving under the same social norm, spanning all possible social norms in such a world in black and white. In their model, the norm constitutes a rule defining the new reputation of a focal individual A, given her action towards another individual B, A’s reputation and B’s reputation. These three factors contributing to define the new reputation of individual A define so-called third-order norms.
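In such a binary world, a third-order norm is simply an 8-entry truth table, i.e., an 8-bit string. The sketch below (our own illustration, with an arbitrary bit ordering that need not coincide with the convention of [18]) makes plain why an exhaustive search over all such norms is feasible:

    GOOD, BAD, HELP, REFUSE = 1, 0, 1, 0

    def assess(norm, action, rep_recipient, rep_donor):
        """Donor's new reputation under an 8-bit third-order norm."""
        index = action * 4 + rep_recipient * 2 + rep_donor
        return norm[index]

    # All third-order norms: 2**(2**3) = 256 possible 8-bit tables.
    all_norms = [tuple((n >> k) & 1 for k in range(8)) for n in range(256)]
    print(len(all_norms), "third-order norms in a world in black and white")

A second-order norm is recovered whenever the table is insensitive to rep_donor, and a first-order norm whenever it depends on the donor’s action alone.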

The Leading Eight

In this context, eight norms were found to be particularly efficient in promoting cooperation. These so-called leading eight are depicted in Figure 1. It is noteworthy that the image-score norm proposed by Nowak and Sigmund [6] is not part of the leading eight. On the other hand, this study addresses the stability of norms in infinite populations in which ALL individuals adopt the same strategy, information is public, no errors take place, and stability is studied against invasion by individuals adopting a different norm and strategy. Despite the great insight provided by this study, many of the simplifications adopted with the aim of obtaining analytical solutions raise questions concerning the connection of these results to real-world situations. Indeed, while it is reasonable to assume that a given population evolves under a common social norm, it is hard to imagine that all individuals adopt exactly the same strategy and that the society is free from errors, both in what concerns decisions and in what concerns the spread of information. How do norms evolve in populations where such errors co-exist with diversity in individual strategies?


Figure 1. Norm complexity and the Leading Eight Norms. The higher the order (and complexity) of a norm, the more “inner” layers it acquires. The outer layer stipulates the donor’s new reputation based on the 3 different reputation/action combinations aligned radially layer by layer: inwards, the first layer identifies the action of the donor; the second identifies the reputation of the recipient; the third, the reputation of the donor. The 3 white “slices” can be associated with either GOOD (green) or BAD (grey) reputations. Consequently, we have 2^3 = 8 norms – the leading eight found by Ohtsuki and Iwasa. Note that, in this convention, second-order norms exhibit a mirror symmetry with respect to the equatorial plane (disregarding the innermost layer). As a result, only two second-order norms can be part of the leading eight.

A Model of Conflict and Cooperation

Let us consider a world in black and white consisting of a set of tribes, such that each tribe lives under the influence of a single norm, common to all individuals (see Figure 2). Each individual engages once in the indirect reciprocity game with every other tribe inhabitant. Her action as a donor will depend on her individual strategy, which dictates whether she will provide help or refuse to do so depending on her own and the recipient’s reputations. In the indirect reciprocity game, any two players are supposed to interact at most once with each other, one in the role of a potential donor, the other as a potential receiver of help. Each player can experience many rounds, but never with the same partner twice, direct retaliation being impossible. By helping another individual, a given player may increase (or not) her reputation, which may change the pre-disposition of others to help her in future interactions. However, her new reputation depends on the social norm used by her peers to assess her action as a donor. Reputations are public: this means that the result of every interaction is made available to everyone through the “indirect observation model” introduced in [18] (see also [21]). This allows any individual to know the current status of a co-player without observing all of her past interactions. On the other hand, this requires a way to spread the information (even with errors) to the entire population: language seems to be an important promoter of cooperation [24], although recent mechanisms of reputation spreading rely on electronic databases (e.g., in e-trade, where the reputation of sellers is centralized). Since reputations are either GOOD or BAD, there are 2^4 = 16 possible strategies, encoded as shown in Table 1 and listed in more detail in Table 2, together with names known from previous studies. On the other hand, the number of possible norms depends on their associated order. The simplest are the so-called first-order norms, in which all that matters is the action taken by the donor. In second-order norms, the reputation of one of the players (donor or recipient) also contributes to deciding the new reputation of the donor. And so on, in increasing layers of complexity (and associated requirements on the cognitive capacities of individuals), as shown in Figure 1, which illustrates the features of third-order norms such as those we employ here. All individuals in a tribe share the same norm, which in turn raises the question of how each inhabitant acquired it. We shall not explore this issue here.

Figure 2. Norm evolution under conflict and cooperation. Each palette represents a tribe in which inhabitants (coloured dots) employ different strategies (different colours) to play the indirect reciprocity game. Each tribe is influenced by a single social norm (common background colour), which may be different in different tribes. All individuals in each tribe undergo pairwise rounds of the game (lower level of selection, level 1 in figure), whereas all tribes also engage in pairwise conflicts (higher level of selection, level 2 in figure). As a result of the conflicts between tribes, norms evolve, whereas evolution inside each tribe selects the distribution of strategies which best adapt to the ruling social norm in each tribe.

Table 1. Bit-encoding of individual strategies

donor’s reputation   recipient’s reputation   donor’s action
GOOD                 GOOD                     Y / N
GOOD                 BAD                      Y / N
BAD                  GOOD                     Y / N
BAD                  BAD                      Y / N

Each individual has a strategy encoded as a four-bit string (Y=1 and N=0). For each combination pair of donor and recipient reputations, the strategy prescribes the individual’s action. There are a total of 2^4 = 16 strategies, identified in Table 2.
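As a concrete reading of Tables 1 and 2, the following sketch (our own illustration; only a few of the 16 rows of Table 2 are spelled out) encodes a strategy as a 4-bit lookup indexed by the reputation pair in the order GG, GB, BG, BB:

    # Strategy bits in the column order of Table 2: GG, GB, BG, BB (1 = Y, 0 = N).
    STRATEGIES = {
        "ALLD": (0, 0, 0, 0),   # never help
        "SELF": (0, 0, 1, 1),   # help only when the donor herself is BAD
        "CO":   (1, 0, 1, 0),   # help GOOD recipients only
        "OR":   (1, 0, 1, 1),
        "ALLC": (1, 1, 1, 1),   # always help
    }

    def action(strategy, rep_donor, rep_recipient):
        """Donor's decision (1 = help) given the two public reputations (1 = GOOD)."""
        index = (1 - rep_donor) * 2 + (1 - rep_recipient)  # GG->0, GB->1, BG->2, BB->3
        return strategy[index]

    print(action(STRATEGIES["CO"], 1, 1))   # CO helps a GOOD recipient: prints 1
    print(action(STRATEGIES["CO"], 1, 0))   # ...but not a BAD one: prints 0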

It is likely that a common norm contributes to the overall cohesiveness and identity of a tribe. For a norm of order n there are 2^(2^n) possible norms, each associated with a binary string of length 2^n. We consider third-order norms (8-bit strings, Figure 1): in assessing a donor’s new reputation, the observer has to make a contextual judgment involving the donor’s action, as well as her own and the recipient’s reputations scored in the previous action. We introduce the following evolutionary dynamics in each tribe: during one generation all individuals interact once with each other via the indirect reciprocity game. When individuals “reproduce” they replace their strategy by that of another individual from the same tribe, chosen proportionally to her accumulated payoff [19]. The most successful individuals in each tribe thus have a higher reproductive success. This indirect reciprocity game provides the basis for the cooperation dilemma that each individual faces in each tribe. Since different tribes are “under the influence” of different norms, the overall fitness of each tribe will vary from tribe to tribe, as will the plethora of successful strategies which thrive in each tribe (Figure 2). This describes individual selection in each tribe (Level 1 in Figure 2). Tribes engage in pairwise conflicts with a small probability, associated with selection between tribes. After each conflict, the norm of the defeated tribe will change towards the norm of the victorious tribe, as detailed in the METHODS section (Level 2 in Figure 2). We consider different forms of conflict between tribes, which reflect different types of inter-tribe selection mechanisms: group selection [7, 13, 25-29] based on the average global payoff of each tribe (involving different selection processes and intensities – imitation dynamics and a Moran-like process), as well as selection resulting from inter-tribe conflicts modeled in terms of games (the display game of war of attrition, and an extended Hawk-Dove game [20, 22]). We perform extensive computer simulations of the evolutionary dynamics of sets of 64 tribes, each with 64 inhabitants. Once a stationary regime is reached, we collect information for subsequent statistical analysis (cf. METHODS). We compute the frequency of occurrence of bits 1 and 0 in each of the 8 bit locations. A bit is said to fixate if its frequency of occurrence equals or exceeds 98%. Otherwise, no fixation occurs, which we denote by “X” instead of “1” or “0”. We analyze 500 simulations for the same value of b, subsequently computing the frequencies f1, f0 and fX of the outcomes “1”, “0” and “X”, respectively. If f1 > f0 + fX the final bit is 1; if f0 > f1 + fX the final bit is 0; otherwise we assume it is indeterminate, and denote it by “•”. It is noteworthy that our bit-by-bit selection/transmission procedure, though artificial, provides a simple means of mimicking biological evolution, where genes are interconnected by complex networks and yet evolve independently.
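The bit-by-bit statistics just described reduce to two small decision rules, sketched below (our own illustration; the names fixated_bit and final_bit are ours):

    def fixated_bit(freq_of_ones, threshold=0.98):
        """Within one simulation: '1' or '0' if the bit fixated, else 'X'."""
        if freq_of_ones >= threshold:
            return "1"
        if freq_of_ones <= 1 - threshold:
            return "0"
        return "X"

    def final_bit(outcomes):
        """Across the 500 simulations: the majority rule of the text."""
        f1 = outcomes.count("1") / len(outcomes)
        f0 = outcomes.count("0") / len(outcomes)
        fX = outcomes.count("X") / len(outcomes)
        if f1 > f0 + fX:
            return "1"
        if f0 > f1 + fX:
            return "0"
        return "•"

    print(final_bit(["1"] * 400 + ["0"] * 60 + ["X"] * 40))   # prints 1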

Table 2. Different individual strategies in the indirect reciprocity game

Strategy Name   GG   GB   BG   BB
ALLD            N    N    N    N
1               N    N    N    Y
AND             N    N    Y    N
SELF            N    N    Y    Y
4               N    Y    N    N
5               N    Y    N    Y
6               N    Y    Y    N
7               N    Y    Y    Y
8               Y    N    N    N
9               Y    N    N    Y
CO              Y    N    Y    N
OR              Y    N    Y    Y
12              Y    Y    N    N
13              Y    Y    N    Y
14              Y    Y    Y    N
ALLC            Y    Y    Y    Y

We identify the different strategies and how they determine the action of a donor (N=no, do not provide help; Y=yes, provide help), given the reputation pair donor/recipient. Whereas some of these strategies have assumed well-known designations in the literature, others remain named by their numeric order.

Results

The results, for different values of b, are given in Table 3, showing that a unique, ubiquitous social norm emerges from these extensive numerical simulations. This norm is of second order, which means that all that matters is the action of the donor and the reputation of the receiver. It is depicted in Figure 3.

Table 3. Emerging social norm

b   Imitation dynamics   Moran       War of attrition   Pairwise comparison   Hawk-Dove
2   1001 1001            1•01 1001   1001 1001          •••• ••••             1001 1001
3   1001 1001            1001 1001   1001 1001          1001 1001             1001 1001

For each value of the benefit b (c=1), each column displays the eight-bit norm emerging from the analysis of 500 simulations employing the inter-tribe conflict method indicated in the column header. Irrespective of the type of conflict, the norm which emerges is always compatible with stern-judging.

206 J.M. Pacheco, F.C. Santos and F.A.C.C. Chalub

In other words, even when individuals are equipped with higher cognitive capacities, they rely on a simple norm as a key to evolutionary success. In a nutshell, “helping a good individual or refusing help to a bad individual leads to a good reputation, whereas refusing help to a good individual or helping a bad one leads to a bad reputation”. Moreover, we find that the final norm is independent of the specifics of the second-level selection mechanism, i.e., different types of conflict will alter the rate of convergence, but not the equilibrium state. In this sense, we conjecture that more realistic procedures will lead to the same dominant norm.
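The quoted rule fits in a single line of code. The sketch below (our own illustration) makes explicit that the donor’s previous reputation plays no role, which is what makes stern-judging a second-order norm:

    def stern_judging(helped, recipient_is_good):
        """Donor's new reputation: GOOD iff she helped a GOOD recipient
        or refused a BAD one; her own previous reputation is irrelevant."""
        return helped == recipient_is_good

    for helped in (True, False):
        for recipient_is_good in (True, False):
            verdict = "GOOD" if stern_judging(helped, recipient_is_good) else "BAD"
            print(f"helped={helped!s:5}  recipient GOOD={recipient_is_good!s:5}  ->  {verdict}")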

Figure 3. Stern-judging. Out of the 2^8 = 256 possible norms, the highly symmetric, second-order norm shown as the outer layer emerges as the most successful norm. Indeed, stern-judging renders the inner layer (donor reputation) irrelevant in determining the new reputation of the donor. This can be trivially confirmed by the symmetry of the figure with respect to the equatorial plane (not taking the inner layer into account, of course).

Prompt Forgiving and Implacable Punishment

The success and simplicity of this norm rely on its never being morally dubious: for each type of encounter, there is one GOOD move and a BAD one. Moreover, it is always possible for anyone to be promoted to the best standard possible in a single move. Conversely, one bad move will be readily punished [30, 31] with the reduction of the player’s score. This prompt forgiving and implacable punishment leads us to call this norm stern-judging [22]. Long before the work of Nowak and Sigmund [6], several social norms had been proposed as a means to promote (economic) cooperation. Notable examples are the standing norm, proposed by Sugden [9], and the norm proposed by Kandori [10] as a means to allow community enforcement of cooperation. When translated into the present formulation, standing constitutes a third-order norm, whereas a fixed-order reduction of the social norm proposed by Kandori (of variable order, dependent on the benefit-to-cost ratio of cooperation) would correspond to stern-judging. Indeed, in the context of community enforcement, one can restate stern-judging as: “Help good people and refuse help otherwise, and we shall be nice to you; otherwise, you will be punished”. It is therefore most interesting that the exhaustive search carried out by Ohtsuki and Iwasa [18, 21] in the space of up to third-order norms found that these two previously proposed norms were part of the so-called leading-eight norms of cooperation. On the other hand, image-score, the norm emerging from the work of Nowak and Sigmund [6], which has the attractive feature of being a simple, second-order norm (similarly to stern-judging), does not belong to the leading eight. Indeed, the features of image-scoring have been carefully studied in comparison to standing [32-34], showing that standing performs better than image-scoring, mostly in the presence of errors [19].

The Emergence of Good and Evil

Although we used the words GOOD and BAD to describe the possible tags, these are somewhat unfortunate choices, as they can easily lead to the impression that good and bad are previously defined concepts. It is important to notice, therefore, that nothing would change if we had used 1 and 0 instead. The central idea of these works is to show that morality (i.e., the means to distinguish between good and bad behaviours) can evolve as a consequence of economic success. In our case, such economic success results from cooperation within groups and conflict among groups. Once a leading norm emerges, 1 and 0 can be correctly interpreted as good and bad, respectively. Moreover, as shown in a series of previous works, our economic behaviour is not necessarily payoff-maximizing: in some simple games, like the “ultimatum” or “dictator” games, the typical human behaviour is off the Nash equilibrium [35, 36]. In some cases we violate economic principles necessary for proving market equilibrium, generating economic inefficiency [37, 38]. One possible explanation relies on the existence of a certain inner morality which limits our ambition of being maximizers at all times. For a comprehensive explanation of the role of ethics in economics, see [37, 39]. The question of how a morality which is not optimal from the economic point of view can evolve (which is the case of stern-judging, which ranks second in the work of Ohtsuki and Iwasa, where norm selection is not involved) remains unanswered in general. But, as many have pointed out before (see [7] and references therein), group selection plays an important role, and norms that are successful during “peace” time are not necessarily stable when a conflict takes place. This means that under stress (a between-group conflict, natural catastrophes or epidemic outbreaks, for example) characteristics other than economic efficiency (e.g., the existence of a social security network) may be fundamental.

208 J.M. Pacheco, F.C. Santos and F.A.C.C. Chalub

Figure 4. Cooperation under a selected social norm. We depict the three popular norms (besides stern-judging, pictured in Figure 3) the performance of which we analysed. Stern-judging, simple-standing and image-scoring are symmetric with respect to the equatorial plane, and as such are second-order norms. As for standing, it clearly breaks this symmetry, constituting a third-order norm. In the lower panel, we plot the ratio between the average payoff attained by each tribe under the influence of a single, fixed norm and the maximum value possible, given the population size (64), the benefit from cooperation (b) and the cost of cooperation (c=1).

Discussion

Among the leading-eight norms discovered by Ohtsuki and Iwasa [18, 21], only stern-judging [8] and the so-called simple-standing [23] constitute second-order norms. Our present results clearly indicate that stern-judging is favoured compared to all other norms. Nonetheless, in line with the model considered here, the performance of each of these norms may be evaluated by investigating how each norm performs individually, taking into account all 16 strategies simultaneously. Such a comparison is shown in Figure 4. The results show that the overall performance of stern-judging is better than that of the other norms over a wide range of values of the benefit b. Furthermore, both standing and simple-standing perform very similarly, again pointing out that reputation-based cooperation can successfully be established without resorting to higher-order (more sophisticated) norms. Finally, image-scoring performs considerably worse than all the other norms, a feature already addressed before [32-34]. Within the space of second-order norms, similar conclusions have been reached recently by Ohtsuki and Iwasa [23]. Clearly, stern-judging’s simplicity and robustness to errors may contribute to its evolutionary success, since other well-performing strategies may succumb to the invasion of individuals from other tribes who bring along strategies which may affect the overall performance of a given tribe. In this sense, robustness plays a key role when evolutionary success is at stake. We believe that stern-judging is the most robust norm promoting cooperation. The present results correlate well with the recent findings in e-trade, where simple reputation-based mechanisms ensure high levels of cooperation. Indeed, stern-judging involves a straightforward and unambiguous reputation assessment, decisions of the donor being contingent only on the previous reputation of the receiver. We argue that the absence of constraining environments acting upon potential customers in e-trade, for whom the decision to buy or not to buy is free from further ado, facilitates the adoption of a stern-judging assessment rule. Indeed, recent experiments [40] have shown that humans are very sensitive to the presence of subtle psychologically constraining cues, their generosity depending strongly on the presence or absence of such cues. Furthermore, under simple unambiguous norms humans may escape the additional costs of conscious deliberation [41].

Methods

We considered sets of 64 tribes, each tribe with 64 inhabitants. Each individual engages in a single round of the following indirect reciprocity game [8] with every other tribe inhabitant, assuming with equal probability the role of donor or recipient. The donor decides whether (YES or NO) she provides help to the recipient, following her individual strategy encoded as a 4-bit string [18-20]. If YES, her payoff decreases by 1, while the recipient’s payoff increases by b>1. If NO, the payoffs remain unchanged (following common practice [6, 17, 19, 20, 32], we increase the payoff of every interacting player by 1 in every round to avoid negative payoffs). This action will be witnessed by a third-party individual who, based on the tribe’s social norm, will ascribe (subject to a small error probability μ_a = 0.001) a new reputation to the donor, which we assume to spread efficiently, without errors, to the rest of the individuals in that tribe [18-20]. Moreover, individuals may fail to do what their strategy compels them to do, with a small execution error probability μ_e = 0.001. After all interactions take place, one generation has passed, simultaneously for all tribes. Individual strategies in each tribe replicate to the next generation in the following way: for every individual A in the population we select an individual B proportionally to fitness (including A) [19]. The strategy of B replaces that of A, apart from bit mutations occurring with a small probability μ_s = 0.01 (a sketch of this within-tribe generation step is given below). Subsequently, with probability p_CONFLICT = 0.01, all pairs of tribes may engage in a conflict, in which each tribe acts as an individual unit.
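The following sketch (our own illustration, reusing the action and assess helpers from the earlier snippets and omitting the strategy-mutation step μ_s for brevity) shows one within-tribe generation under this scheme:

    import itertools, random

    def one_generation(strategies, reputations, norm, b=3.0, mu_a=0.001, mu_e=0.001):
        n = len(strategies)
        payoff = [0.0] * n
        for i, j in itertools.combinations(range(n), 2):      # each pair meets once
            d, r = (i, j) if random.random() < 0.5 else (j, i)  # random roles
            payoff[d] += 1.0                  # the +1 per round that avoids
            payoff[r] += 1.0                  # negative payoffs
            act = action(strategies[d], reputations[d], reputations[r])
            if random.random() < mu_e:        # execution error
                act = 1 - act
            if act == 1:                      # help: cost c = 1, benefit b
                payoff[d] -= 1.0
                payoff[r] += b
            new_rep = assess(norm, act, reputations[r], reputations[d])
            if random.random() < mu_a:        # assessment error
                new_rep = 1 - new_rep
            reputations[d] = new_rep          # public, spread without further errors
        # Reproduction: everyone copies a strategy drawn proportionally to payoff.
        new_strategies = [strategies[random.choices(range(n), weights=payoff)[0]]
                          for _ in range(n)]
        return new_strategies, reputations, payoff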

Different types of conflict between tribes have been considered:

1) Imitation Selection: We compare the average payoffs Π_A and Π_B of the two conflicting tribes A and B, the winner being the tribe with the highest score.
2) Moran Process: In this case, the selection method between tribes mimics that used between individuals in each tribe: one tribe B is chosen at random, and its norm is replaced by that of another tribe A chosen proportionally to fitness.

3) War of attrition: We choose at random two tribes A and B with average payoffs Π_A and Π_B. We assume that each tribe can display for a time which is larger the larger its average payoff. To this end we draw two random numbers R_A and R_B, each following an exponential probability distribution, exp(-t/Π_A)/Π_A and exp(-t/Π_B)/Π_B, respectively. The larger of the two numbers identifies the winning tribe.
4) Pairwise comparison: We choose at random two tribes A and B, with average payoffs Π_A and Π_B, respectively; the norm of tribe B then replaces that of A with probability

p = 1 / (1 + e^(-β(Π_B - Π_A))),

whereas the inverse process occurs with probability 1 - p (a minimal sketch of this update rule is given after this list). In physics this function corresponds to the well-known Fermi distribution function, in which the inverse temperature β determines the sharpness of the transition from p ≈ 0, whenever Π_B < Π_A, to p ≈ 1, whenever Π_A < Π_B. Indeed, in the limit β → ∞ we obtain imitation dynamics (strong selection), whereas whenever β → 0, B replaces A with the same probability that A replaces B (p = ½, neutral drift).
5) Extended Hawk-Dove Game: This method of tribal conflict was developed in Ref. [20] and is based on an extended Hawk-Dove game introduced in Ref. [42]; full details are provided in Ref. [20]. Similarly to the other types of conflict, we choose at

random two tribes A and B, with average payoffs Π_A and Π_B, to engage in a conflict. For each tribe there are two possible strategies, HAWK and DOVE, as described in Ref. [20].
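As promised above, here is a minimal sketch of the pairwise comparison rule (our own illustration; the value of β is an arbitrary choice):

    import math, random

    def pairwise_comparison(payoff_A, payoff_B, beta=1.0):
        """Norm of B replaces that of A with the Fermi probability
        p = 1/(1 + exp(-beta*(payoff_B - payoff_A))); the inverse
        replacement occurs with probability 1 - p."""
        p = 1.0 / (1.0 + math.exp(-beta * (payoff_B - payoff_A)))
        return "B" if random.random() < p else "A"

    # beta -> infinity approaches imitation dynamics; beta = 0 gives p = 1/2.
    wins = sum(pairwise_comparison(10.0, 12.0, beta=2.0) == "B" for _ in range(10_000))
    print(f"B wins {wins / 10_000:.2%} of the conflicts")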

As a result of the inter-tribe conflict, the norm of the losing tribe (B) is shifted in the direction of the victor’s norm (A). Convergence of such a non-linear evolutionary process dictates a smooth norm crossover. Hence, each bit of norm A replaces the corresponding bit of norm B with probability

p = η Π_A / (η Π_A + (1 - η) Π_B),

which ensures good convergence whenever η ≤ 0.2, independently of the type of conflict (a bit-mutation probability μ_N = 0.0001 has been used). Furthermore, a small fraction of the population of tribe A replaces a corresponding random fraction of tribe B: each individual of tribe A replaces a corresponding individual of tribe B with a probability p_migration = 0.005. Indeed, if no migration takes place, a tribe’s population may get trapped in less cooperative strategies, compromising the global convergence of the evolutionary process [27]. Each simulation runs for 9000 generations, starting from randomly assigned strategies and norms, in order to let the system reach a stationary situation, typically characterized by all tribes having maximized their average payoff for a given benefit b > c = 1. The subsequent 1000 generations are then used to collect information on the strategies used in each tribe and on the norms ruling the tribes in the stationary regime. We ran 500 evolutions for each value of b, subsequently performing a statistical analysis of the bits which encode each norm, as detailed before. In our simulations, we adopted the following values: η = 0.1, μ_N = 0.0001, μ_s = 0.01, μ_a = μ_e = 0.001. The benefit b varied from b = 2 to b = 36. Each individual in each tribe has a strategy (chosen randomly at the start) encoded as a four-bit string, which determines the individual’s action (N=no, do not provide help; Y=yes, provide help) as a donor, knowing her own and the recipient’s reputation, as detailed in Table 1. This results in a total of 16 strategies, ranging from unconditional defection (ALLD) to unconditional cooperation (ALLC), as detailed in Table 2. The results presented are quite robust to variations of the different mutation rates introduced above, as well as to variations of population size and number of tribes. Furthermore, reducing the fixation threshold from 98% to 95% does not introduce any changes in the results shown. Finally, in Figure 4, we ran 500 simulations for each tribe with 64 inhabitants, and used the last 1000 generations from a total of 10000.

Acknowledgments

JMP would like to thank Yoh Iwasa and Hisashi Ohtsuki for helpful discussions. JMP and FACCC acknowledge support from FCT, Portugal. FCS acknowledges the support of COMP2SYS, a Marie Curie Early Stage Training Site, funded by the EC through the HRM activity.

References

[1] Smith, J. M. & Szathmáry, E. (1995) The Major Transitions in Evolution, (Freeman, Oxford).
[2] Smith, J. M. (1982) Evolution and the Theory of Games, (Cambridge University Press, Cambridge, UK).
[3] Hamilton, W. D. (1996) Narrow Roads of Gene Land Vol. 1, (Freeman, New York).
[4] Trivers, R. (1985) Social Evolution, (Benjamin Cummings, Menlo Park).
[5] Wilson, E. O. (1975) Sociobiology, (Harvard Univ. Press, Cambridge, Massachusetts).
[6] Nowak, M. A. & Sigmund, K. (1998) Nature, 393, 573-7.
[7] Alexander, R. D. (1987) The Biology of Moral Systems, (Aldine de Gruyter, New York).
[8] Nowak, M. A. & Sigmund, K. (2005) Nature, 437, 1291-8.
[9] Sugden, R. (1986) The Economics of Rights, Co-operation and Welfare, (Basil Blackwell, Oxford, UK).
[10] Kandori, M. (1992) The Review of Economic Studies, 59, 63-80.
[11] Boehm, C. (1999) Hierarchy in the Forest, (Harvard University Press, Cambridge, MA, USA).
[12] Dunbar, R. (1996) Grooming, Gossip, and the Evolution of Language, (Harvard University Press, Cambridge, MA, USA).
[13] Mackie, J. L. (1995) in Issues in Evolutionary Ethics, ed. Thompson, P. (State University of New York Press, NY), pp. 165-177.

[14] Dellarocas, C. (2003) MIT Sloan School of Management working paper, 4297-03.
[15] Bolton, G. E., Katok, E. & Ockenfels, A. (2004) Manage. Sci. 50, 1587-1602.
[16] Keser, C. (2002) IBM-Watson Research Center, CIRANO working paper, 2002s-75k.
[17] Brandt, H. & Sigmund, K. (2005) Proc Natl Acad Sci U S A, 102, 2666-70.
[18] Ohtsuki, H. & Iwasa, Y. (2004) J Theor Biol. 231, 107-20.
[19] Brandt, H. & Sigmund, K. (2004) J Theor Biol. 231, 475-86.
[20] Chalub, F. A. C. C., Santos, F. C. & Pacheco, J. M. (2006) J Theor Biol. 241, 233-240.
[21] Ohtsuki, H. & Iwasa, Y. (2006) J Theor Biol. 239, 435-44.
[22] Pacheco, J. M., Santos, F. C. & Chalub, F. A. C. C. (2006) PLoS Computational Biology, 2, e178.
[23] Ohtsuki, H. & Iwasa, Y. (2007) Journal of Theoretical Biology, 244, 518-531.
[24] Brinck, I. & Gärdenfors, P. (2003) Mind & Language, 18, 484-501.
[25] Bowles, S. & Gintis, H. (2004) Theor Popul Biol. 65, 17-28.
[26] Bowles, S., Choi, J. K. & Hopfensitz, A. (2003) J Theor Biol. 223, 135-47.
[27] Boyd, R. & Richerson, P. J. (1985) Culture and the Evolutionary Process, (University of Chicago Press, Chicago).
[28] Boyd, R. & Richerson, P. J. (1990) J Theor Biol. 145, 331-42.
[29] Boyd, R., Gintis, H., Bowles, S. & Richerson, P. J. (2003) Proc Natl Acad Sci U S A, 100, 3531-5.
[30] de Quervain, D. J., Fischbacher, U., Treyer, V., Schellhammer, M., Schnyder, U., Buck, A. & Fehr, E. (2004) Science, 305, 1254-8.
[31] Gintis, H. (2003) J Theor Biol. 220, 407-18.
[32] Leimar, O. & Hammerstein, P. (2001) Proc Biol Sci. 268, 745-53.
[33] Panchanathan, K. & Boyd, R. (2003) J Theor Biol. 224, 115-26.
[34] Panchanathan, K. & Boyd, R. (2004) Nature, 432, 499-502.
[35] Hoffman, E., McCabe, K., Shachat, K. & Smith, V. (1994) Games and Econ. Behav. 7, 346-380.
[36] Sigmund, K., Fehr, E. & Nowak, M. A. (2002) Sci. Am. 286, 82-87.
[37] Sen, A. (1987) On Ethics and Economics, (Blackwell Publishers).
[38] Sen, A. (1993) Econometrica, 61, 495-521.
[39] Collard, D. (1978) Altruism and Economy: A Study in Non-Selfish Economics, (Martin Robertson).
[40] Haley, K. J. & Fessler, D. M. T. (2005) Evolution and Human Behaviour, 26, 245-256.
[41] Dijksterhuis, A., Bos, M. W., Nordgren, L. F. & van Baaren, R. B. (2006) Science, 311, 1005-7.
[42] Crowley, P. H. (2000) J Theor Biol. 204, 543-63.

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 213-227 © 2010 Nova Science Publishers, Inc.

Chapter 10

DYNAMICS OF COUPLED PLAYERS AND THE EVOLUTION OF SYNCHRONOUS COOPERATION — DYNAMICAL SYSTEMS GAMES AS GENERAL FRAME FOR SYSTEMS INTER-RELATIONSHIP

Eizo Akiyama∗ Graduate School of Systems and Information Engineering, University of Tsukuba, Japan

Abstract

This paper investigates how players in competition for dynamical resources self-organize and develop synchronous behaviors to increase their collective profit. For this purpose, we first introduce the framework of the "dynamical systems game" [2], in which players are described as dynamical systems that autonomously adjust their strategy parameters through mutual interactions. Next, we briefly review some of the results found in evolutionary simulations of an application model of the dynamical systems game [3]. The results will be analyzed from the viewpoint of the "dynamics of coupled players." We discuss how the following two key elements affect the development of dynamical cooperation rules under social dilemma: (1) formation of synchronous behaviors among interacting players; (2) evolution of strategies to change the coupling (interaction) strength among players.

1. Evolution of Cooperative Behaviors

When physicists say "cooperative behavior," it usually means a synchronous behavior of interacting multiple units (e.g., synchronous behavior of neurons). On the other hand, when game theorists say "cooperative behavior," it usually means a behavior which enhances the collective profit of a group of players (e.g., altruistic behavior). In fact, the two seem to be related to each other in real societies because, for example, mutual cooperation in the sense of game theory often requires coordinated behaviors of the players.

∗E-mail address: [email protected]

Synchronous behaviors that arise from interacting units have been known since Huygens discovered that pendulum clocks hung on the same wall started to synchronize [13]. Synchronous phenomena have been widely discovered in ecosystems and physical systems, especially since the 20th century [27, 8, 29, 9, 10]. Recently, theoretical models of coupled non-linear oscillators have been presented and investigated in the field of physics, such as [11, 20].1 However, we need revisions of these theoretical models when we discuss cooperative behaviors in societies. This is because the elements in these models are basically physical elements, not decision makers that modify their decision-making mechanisms to adapt to their environment (evolution or learning). On the other hand, there are game-theoretical models of dynamical cooperation among players, such as [19, 28], where variants of the common-pool resource dilemma are presented. These works discuss Nash equilibria in which players differentiate their roles with time by cyclic coordination [19, 28]. Equilibrium approaches cannot, however, investigate how such dynamical cooperation is formed through the interaction of players. We discuss in this paper how the evolution of strategies affects the synchronization of players' behaviors, and how the synchronization of behaviors is related to the evolution of cooperation in the sense of game theory. (From here on, we use the term 'cooperation' basically in the sense of game theory, in order to avoid confusion.) We first present the framework we call the dynamical systems game (DS game), presented in [2], in which players are described as autonomous dynamical systems that are coupled with each other via environmental variables. Using this framework, we can investigate the emergence and development of synchronous cooperation. We also introduce an example DS game, the lumberjack's dilemma game, which describes a social dilemma situation where players consume a common resource whose amount dynamically changes. Next, we quickly review some of the results about the 2-person lumberjack's dilemma game found in [3]. Last, we investigate the mechanism behind the evolutionary phenomena in [3] from the viewpoint of the "dynamics of coupled players."

2. Description of Games as Dynamical Systems

If two or more decision-makers trying to obtain optimum results interact with each other, the result for each one will depend in general not merely upon his own actions but on those of the others as well. In the theory of games, von Neumann and Morgenstern had insight into the fact that this problem is not an ordinary maximum problem and that the formalization of this type of problem should take an algebraic form such as a matrix or a tree. They succeeded in characterizing the problem as one in which every individual "can determine the variables which describe his own actions but not those of the others," while, in addition, "those alien variables cannot, from his point of view, be described by statistical assumptions." [21] However, game theory is not congenial to problems involving dynamical phenomena with multiple decision makers, due to the static nature of the algebraic formulation employed in conventional game theory. There are mainly two issues that we would like to

1Furthermore, coupled chaotic maps placed on a lattice are known to show quite rich spatio-temporal behaviors [17, 18].

consider here. The first one regards the effect that a player's actions can have on the game environment. The actions selected by any one player will certainly have an effect on the actions of others. In reality, however, it is also possible that a player's actions can affect the actual game environment itself. Through this influence, the actual game in which the player is involved can also change. Then, through such changes in the game environment, the payoffs for a player's actions may also be changed. In addition to questions involving the effect of a player's action on the game environment, we wish to consider the issue of the connection between a player's payoff function and that player's 'state'. (We use the word 'state' here to mean any general internal properties of a player that may change, the actual condition of the player or the internal model of the outside world that the player has.) For example, consider a player participating in a contest repeatedly with the same opponents in a game environment that does not change with time. In this case, will the utility of the player's possible actions always continue to be the same? Won't the player's assessment of his possible actions vary in accordance with changes in his internal state? Further, we would like to touch here upon the fundamental viewpoint of traditional game theory with regard to the above-mentioned situation. In traditional game theory, such a situation is sometimes represented by one (large) game. That is, from the present to the future, all possible actions of all players at all points in time are taken into account. Thus all possible bifurcation patterns of the game are derived, with this situation as a whole depicted as one huge game tree. In this way, we can project the course of time into a static game and analyze its solution in the form of a game tree or a game matrix. Strategy here means 'the action plan' for all points in time, and the analysis of a rational solution for a game is possible only when we know all the possibilities about all players' actions from the past to the future. However, we actually do not, or cannot, make our decisions in this way. On the other hand, there has been another direction in formalizing games. Rashevsky [23] and Rapoport [22] tried in 1947 to formalize games as dynamical systems. This direction received little attention after their works because of the prevalence of von Neumann's algebraically formulated game theory, but the idea of "games as dynamical systems" has recently been revived. For example, Rössler considered them using an abstract model of multiply-linked coupled autonomous optimizers [24, 25, 26]. These have recently been developed by Ikegami and Taiji [14, 15] and also by Aguirre [1]. As for models of social situations, [7] and [16] discuss the coordination problem using models that involve a dynamic payoff function. Furthermore, [6] investigates business cycles from the viewpoint of phase-locking, and [12] analyzes the effect of players' network structure on oscillatory phenomena in economic activities. In particular, Akiyama and Kaneko presented a framework called the "dynamical systems game" in [2] to deal with the above situation. In the dynamical systems game (DS game), the game situation is represented not by the payoff structure (e.g., a bi-matrix) but by the coupling of dynamical systems that autonomously adapt to their environment. A schematic figure of the dynamical systems game (DS game) is given in Fig. 1. In a DS game, players live in a certain game environment and always have several possible actions they can take. The game dynamics, g, are composed of the following three system-defining rules: 1. The state of the players' surroundings (which we call the game environment), x, and

Figure 1. Schematic figure of game dynamics g in the dynamical systems game: Player i has a decision-making function, f^i, that uses the state of the game environment, x, and the players' states, y, as input information and determines the player's action, a^i, as an output. a^i then affects x. The function f changes in response to evolution or learning and maximizes (or minimizes) a kind of optimality functional.

the states of all the players, y, change according to a natural law.

2. Players make decisions according to their own decision-making mechanisms, f, by referring to both the states of the players' surroundings and of all the players (including oneself).

3. Changes in the game environment and players' actions affect the states of the players.

Players repeatedly carry out actions, and in the process, the system evolves according to these general rules. Using the formulation described by these rules, instead of one based on algebraic payoff matrices, a DS game model can explicitly describe not only game-like interactions but also the dynamics observed in the game environment and among the players. The total game dynamics are described by the map g, and the players' decision-making, f, is embedded into g. The dynamics are expressed either in discrete-time fashion, as iterated 'mappings,' or in continuous-time fashion, by differential equations. The function f changes in response to evolution or learning and maximizes (or minimizes) a kind of optimality functional. (See [2] for more detail.) Thus, the dynamics in game-like interactions can be better represented by the dynamical systems game than by the models of classical game theory using algebraic forms. In particular, investigating the synchronization of players' behaviors requires a model based not on algebraic forms but on dynamical systems.
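Read as a discrete-time system, the three rules compose into the single map g. The following minimal Python sketch shows this composition; the names are illustrative, not taken from [2]:

    def g_step(x, y, natural_law, decisions, effect):
        # One application of the game dynamics g, composed of the three rules:
        x1, y1 = natural_law(x, y)              # rule 1: natural law acts on x, y
        a = [f_i(x1, y1) for f_i in decisions]  # rule 2: a^i = f^i(x', y')
        return effect(x1, y1, a)                # rule 3: actions feed back on x, y

Iterating g_step generates the trajectory of the game; evolution or learning then operates on the decision functions themselves.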

3. The Lumberjack’s Dilemma Game

In this paper, we present the application of the dynamical systems game to the issue of social dilemma, and see how players form cooperation rules to consume the common resource, and also how they develop those rules through mutual interactions. Let us consider the following story of what we call the lumberjack's dilemma (LD) game:

There is a wooded hill and N lumberjacks on it. The lumberjacks cut trees for a living. They can maximize their collective profit if they cooperate in waiting until the trees have grown fully before felling them, and share the profits. However, any lumberjack who fells a tree earlier takes the entire profit on that tree. Thus, each lumberjack can maximize his personal profit by cutting trees earlier. If all the lumberjacks do this, however, the hill will become barren and there will eventually be no profit. This situation represents a dilemma. The LD game is defined as a T-round game where the state of the game changes with time (round) through players' decisions. We set T = 400 for the computer simulations in this paper. Also, we focus on the case where the number of trees on the hill is one. (See [3] for the multiple-tree case.) Let x(t) be the tree size at round t and y^i(t) be the state of player i. "State" here can be the player's monetary state, nutritious state, etc. We assume that y^i(t) is the payoff for the player at round t. The payoff of the player in the LD game is defined as the averaged payoff over the T rounds. Each player i has her decision-making function (mechanism) f^i and decides her action at round t, a^i(t), using (x(t), y(t)) as the inputs for f^i. Here y(t) is the vector representing all players' states y^1(t), y^2(t), . . . , y^N(t). In the computer experiment, f^i is implemented in a very simple form as the following rule: a^i = 0 (don't cut the tree) if ηx + (θ, y) + ξ is negative, and a^i = 1 (cut the tree) otherwise. Note that η ∈ R, θ ∈ R^N and ξ ∈ R are strategy parameters.2 The game state changes from (x(t), y(t)) to (x(t+1), y(t+1)) in one round of the LD game through three processes: (1) the change by the effect of the natural law, (2) players' decision making, and (3) the change by the effect of players' decisions. (1) The change by the effect of the natural law includes (a) the growth of the tree and (b) the decrease of the players' states (e.g., a player who doesn't do anything becomes hungry, expends money). The "natural law" produces the time evolution that is irrelevant to players' decision making. By the natural law, the state of the game shifts from (x(t), y(t)) to (x(t)', y(t)'). (2) Referring to (x(t)', y(t)'), each player decides her action a^i(t) based on her decision-making function f^i, described above. An action a^i(t) is either "wait (do not cut the tree)" or "cut the tree." (3) The tree cut by players becomes shorter. The height of the tree becomes (1/3)^n of its size when cut by n players. The lumber cut out from the tree is divided equally among the players who cut the tree. The state of each player who cut the tree increases by her share of the lumber. Thus, the state of the game shifts from (x(t)', y(t)') to (x(t+1), y(t+1)). The LD game provides an example of a social dilemma in which maximization of personal profit and maximization of the collective profit of all players conflict with each other. In other words, it can be represented in the form of an (n-person) Prisoners' Dilemma if we project it onto the space of static games. However, there are several important differences between the LD game and the n-person Prisoners' Dilemma. First, the dynamics of the

2The results of simulations that use a two-dimensional form of f^i are not essentially different, except that the simulation takes longer.

size of the trees are expressed explicitly in this LD game. Also, the yield from one tree, and thus a lumberjack's profit upon cutting it, differs according to the time at which it is cut. These profits have a continuous distribution, because the yield of a tree takes continuous values, as described below. Finally, a lumberjack's decision today can affect the future game environment due to the dynamic nature (i.e., the growth) of trees. For the natural law of tree growth, we use two kinds of maps to define how the size of the tree increases:

1. x(t)' = u(x) = 0.7x^3 − 2.4x^2 + 2.7x (Convex map)

2. x(t)' = u(x) = min(1.5x, 1.0) (Piecewise linear map)

The important point here is not the detailed implementation of u(x) but the fact that we use two different types of maps (a convex map and a linear map) that share the following nature. Both the convex-map tree and the linear-map tree grow rapidly in the earlier rounds and eventually stop growing at the size of 1.0, unless they are cut. Consequently, both the convex-map and the linear-map LD games can represent the dilemma; that is, players can maximize their collective profit if they wait for the tree to grow enough before cutting it, while each of them can increase her personal profit if she cuts the tree earlier than the others.
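A minimal sketch of one LD round under these definitions may help. The growth maps and the decision rule are taken from the text; the numerical decay of the players' states under the natural law is an assumption, since the chapter does not specify its form.

    import numpy as np

    def grow_convex(x):
        # Convex map: rapid early growth, saturating near 1.0
        return 0.7 * x**3 - 2.4 * x**2 + 2.7 * x

    def grow_linear(x):
        # (Piecewise) linear map
        return min(1.5 * x, 1.0)

    def ld_round(x, y, strategies, grow=grow_convex, decay=0.9):
        # One round of the N-person LD game; `decay` (the per-round decrease
        # of player states under the natural law) is an assumed value.
        x = grow(x)                                          # (1) the tree grows
        y = decay * y                                        #     and states decrease
        a = np.array([0 if eta * x + theta @ y + xi < 0 else 1
                      for (eta, theta, xi) in strategies])   # (2) decisions a^i
        n = int(a.sum())
        if n > 0:                                            # (3) n cutters shrink
            lumber = x * (1.0 - 3.0 ** -n)                   #     the tree to (1/3)^n
            x *= 3.0 ** -n                                   #     of its size and
            y = y + a * (lumber / n)                         #     share the lumber
        return x, y

Here `strategies` is a list of (η, θ, ξ) tuples, one per player, with θ a length-N numpy array; iterating ld_round for T = 400 rounds and averaging each y^i gives the players' payoffs.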

4. Review of Akiyama & Kaneko 2002

Akiyama and Kaneko [3] conducted evolutionary and ecological LD games on a computer. In this section, we briefly review some of the results found in [3] about the evolution of cooperation where players compete for a dynamic resource. The results will be analyzed from the viewpoint of the "dynamics of coupled players" in later sections. In the computer simulations, each player has her decision-making function to decide her action. Players who have the same decision-making function are said to form a species. Suppose that there are S species of lumberjacks in the game world. Lumberjack species are distinguished by their decision-making function, f. Each generation in this game world has the following three phases. First, we conduct r combinations of a random matching tournament. In each combination, in the case of an N-person game, we randomly select N lumberjacks from among the S species. These N lumberjacks play one LD game of T iterated rounds. Several lumberjacks of the same species can be chosen to participate. After the r combinations of these random matching tournaments, the average score for each species over all the LD games is calculated and called the fitness of the species. Second, in a process of selection, the k lowest-ranking species are eliminated from the game world. Third, the remaining S − k species move on to the next generation; k mutants are generated from the remaining S − k species, and these k new species replace the k eliminated ones. The same procedure is repeated in the next generation. In the initial generation, S randomly produced strategies are used as the S species. In the simulation results presented below, the parameters are set as follows: N = 2, S = 10, r = 60, T = 400, k = 3.
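One generation of this procedure can be sketched as follows, assuming two hypothetical helpers: play_ld_game (plays one T-round LD game and returns each participant's average payoff) and mutate (perturbs a strategy's parameters). Both names are illustrative, not from [3].

    import numpy as np

    def run_generation(species, rng, N=2, r=60, T=400, k=3):
        # One generation: r random-matching tournaments, then selection.
        totals, counts = np.zeros(len(species)), np.zeros(len(species))
        for _ in range(r):
            picks = rng.integers(0, len(species), size=N)  # same species may repeat
            scores = play_ld_game([species[i] for i in picks], T)  # hypothetical
            for i, s in zip(picks, scores):
                totals[i] += s
                counts[i] += 1
        fitness = totals / np.maximum(counts, 1)   # average score = species fitness
        order = np.argsort(fitness)
        survivors = [species[i] for i in order[k:]]          # drop the k lowest
        parents = rng.integers(0, len(survivors), size=k)
        return survivors + [mutate(survivors[p], rng) for p in parents]  # hypothetical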

[Figure: fitness of the fittest species vs. generation (Convex 2-person 1-tree, generations 900–1800); the stepped plateaus are labelled A–D.]

Figure 2. The fitness of the fittest species in each generation: The horizontal axis shows the generation. One sees that the fitness changes rather stepwise over generations. The periods of generations corresponding to the stepped plateaus in this figure, on each of which the fitness changes little, will be called epochs A, B, C and D.

Let us introduce a typical simulation result of the two-person convex-map LD game. Figure 2 shows a snapshot of the simulation result in which the fitness of the fittest species (the species whose fitness is the highest) is plotted as a function of generation (from the 900th to the 1800th generation). In the early generations of this game, evolution goes toward competitive tree-cutting societies and the hill becomes barren. As a result, the fitness value falls at first. However, as in Fig. 2, the fitness value begins to rise step-by-step from about the 900th generation. At each epoch A, B, C and D with stepped plateaus, one type of cooperative game dynamics specific to that epoch arises, which is observed in more than half of the combinations of the random matching tournament. We name these game dynamics type A, B, C and D, respectively. Fig. 3 shows the game dynamics of types A, B and C. For example, the two competing lumberjacks in epoch A are entrained into synchronous (in-phase) period-5 action dynamics, "wait, cut, wait, cut and wait," which enables them to collect the lumber while allowing the tree to grow appropriately. This cooperation rule spreads through the population of epoch A and makes the society cooperative. As generations pass, the dominant cooperation rules shift to type B and then to type C. As shown in Fig. 3, type B dynamics are a sequence with a period of 3 actions, "wait, cut, and wait." Type C dynamics have a period of 4, "wait, wait, wait, and cut." The most prominent feature of type C is that the action sequence is performed out of phase (anti-phase), and the two players are entrained to alternately cut and leave the tree to exploit the resource effectively. Thus, rules of cooperation are formed and shift with generations in spite of the dilemma underlying our LD game.

[Figure: panel A (400 rounds; round-average tree size = 0.12); panel B (players (1) ID 00000FC1 and (2) ID 00000FCB, in-phase, rounds 0–80; average tree size = 0.27); panel C (anti-phase; average tree size = 0.45).]

Figure 3. Transition of dominant cooperation rules: The top figure shows the dominant cooperation rule in epoch A of Fig. 2, the middle one epoch B, and the bottom one epoch C. In each figure, the horizontal axis shows the round. Of the total of 400 rounds, the first 80 are shown, and a section of several rounds of actions is enlarged in each figure. A white tile represents the action of "cutting," while a black tile represents "waiting." In the top figure (epoch A), the players assume an action cycle with a period of 5, "wait, cut, wait, cut and wait," simultaneously (in-phase); the middle figure shows the period-3 action dynamics dominant in epoch B, "wait, cut and wait." The bottom figure shows the period-4 dynamics in epoch C, "cut, wait, wait and wait," where both players cut the tree alternately (anti-phase).

The development of cooperation rules shown above cannot be observed in linear LD games. It is specific to convex LD games, in which the growth of the tree gradually saturates. See [3] for other results on the evolutionary phenomena in LD games, such as the phenomena in later generations and the results for the 3-person case and for the 2- or 3-tree cases. What we should note here is that describing and analyzing the phenomena shown above, in which players form and develop cooperation rules through synchronous behaviors to manage the dynamics of the resource, is possible only with games as dynamical systems, not with the models of classical game theory.

5. Dynamics of Coupled Players

[Figure: player Me's decision-making function f^Me receives the tree size x (weighted by eη) and the opponent's state y^Opp (weighted by qθ^Opp); its decision input is η x + θ^Opp y^Opp + θ^Me y^Me + ξ.]

Figure 4. Schematic figure of the coupling of players: Players are coupled through the input information to their decision-making functions. The strategy parameter η represents the weight for the input information of the tree size, which is the state of a non-decision-making object. θ^Opp represents the weight for the input information of the other's state, which is the state of a decision-maker. Both η and θ^Opp define the coupling of the focal player with her environment.

The lumberjack's dilemma game in this paper is described as a dynamical systems game [2], which is formalized as a system of coupled dynamical systems, each of which autonomously adapts to its environment. In other words, players are dynamical systems with adaptation mechanisms. In the lumberjack's dilemma game, players are coupled through the input information of their decision-making functions. There are two kinds of couplings between the players. First, they are indirectly coupled via the tree; that is, a player's action affects the tree size, which will subsequently affect the other player's decision. Second, they are directly coupled by referring to each other's state; that is, a player's decision is affected by the state of the other player. The strengths of the two couplings correspond to the strategy parameters η and θ^Opp, respectively, as shown in Fig. 4. In models of coupled oscillators, which are the usual models of synchronization in physics, the coupling strength is usually given in advance. On the other hand, in the computer simulation of the lumberjack's dilemma game in the previous section, the coupling parameters of each player, η and θ, autonomously change over generations in the players' pursuit of higher fitness. Let us analyze how the coupling strength of players affects the game dynamics. We sample the strategy parameters η, θ, ξ of the two dominant species in epoch C. When two players of these two species, who will be referred to as "player 1" and "player 2" respectively, play the lumberjack's dilemma game, their action dynamics will be attracted to the anti-phase cooperation pattern shown at the bottom of Fig. 3. Next, we modify the strategy parameters of both players such that η → eη and θ^Opp → qθ^Opp; that is, the reference to the tree size (x) is multiplied by e and that to the other player's state is multiplied by q. The same action dynamics as seen in the bottom figure of Fig. 3 should be observed if e = 1 and q = 1. We will change e and q in the following subsections to see the effect of the coupling strength. For example, e = 0 and q = 0 means there is no connection between the two players. (They independently decide their actions without referring to any information except their own states y^Me.) e = 1 and q = 0 means each of the players refers only to the tree size for decision making, so there is no informational coupling with the other player in the decision-making function. We say that "e controls the environmental coupling" and "q controls the player's coupling".
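The modified decision rule can be written compactly. The scaling of η by e and of θ^Opp by q follows the text and Fig. 4, while the function name and signature are illustrative:

    def decide(x, y_me, y_opp, eta, theta_me, theta_opp, xi, e=1.0, q=1.0):
        # Fig. 4's decision input with tunable couplings: e scales the
        # environmental coupling (weight eta on the tree size x); q scales
        # the player's coupling (weight theta_opp on the opponent's state).
        # e = q = 1 reproduces the original strategy; e = q = 0 leaves only
        # the reference to one's own state.
        s = (e * eta) * x + (q * theta_opp) * y_opp + theta_me * y_me + xi
        return 0 if s < 0 else 1  # 0: wait, 1: cut

Scanning e over [0, 3] with q = 0 while iterating the game dynamics and recording the asymptotic states reproduces the kind of attractor plot shown below in Fig. 6.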

5.1. The Effect of the Environmental Coupling

Figure 5. The effect of the environmental coupling: In both panels, the horizontal axis shows the round (0–100) while the vertical axis shows the states of both players; the solid line corresponds to the state of player 1 and the dotted line to that of player 2. The couplings of the players are set to e = 0 and q = 0 in (a), and e = 1 and q = 0 in (b). Note that in both panels there is no player's coupling (q = 0). The difference between (a) and (b) shows the effect of the environmental coupling on the dynamics of the players' states.

To investigate the effect of the environmental coupling, let us first consider the situation where both the environmental and the player's coupling are removed (e = q = 0), and next strengthen the environmental coupling by changing e from 0 to 1 while leaving the player's coupling removed (q = 0). As seen in Fig. 5-(a), the two players' states behave independently when e = q = 0: player 1's state converges to zero, while player 2's state is attracted to oscillatory dynamics between around 0.1 and 0.8. What happens here is that player 1 does not cut the tree at all and that player 2 sometimes recovers her state by cutting the tree every time her state gets very low. When the environmental coupling is introduced (e = 1), both players' states are attracted to oscillatory dynamics around higher values (between around 0.3 and 1.1) after the transient rounds (Fig. 5-(b)). The reference to the tree size enables both players to cut the tree when it becomes too large, which makes the dynamics of both players' states rather similar at a higher level. In this case, the dynamics of the tree size works as a signal for both players' decision making.3


Figure 6. The effect of the environmental coupling — the attractor of each player's state: (a) The attractor of the state dynamics of player 1, plotted as a function of e. (b) That of player 2. In both figures, q is set to 0 and e (the horizontal axis) varies from 0 to 3.

We have seen that the attractor of the state dynamics in Fig. 5 is a fixed point at zero when the strength of the environmental coupling is e = 0, and a periodic motion when e = 1. Let us investigate how the attractor of the state dynamics changes with e. Figure 6 shows the attractors of the dynamics of both players' states for e ∈ [0, 3]. The attractor of player 1's state dynamics changes from the fixed point to periodic motions around e = 0.23. In Figs. 6-(a) and -(b), a similar tendency appears for the two players in how the attractors change as the environmental coupling is strengthened (once e exceeds 0.23). As e becomes larger, the attractors of both players' states gradually increase in value, but eventually they decrease.

5.2. The Effect of the Player's Coupling

Next, let us see the effect of the player's coupling in addition to the environmental coupling. We start from the situation where the environmental coupling is the same as in the original strategies (e = 1) and the player's coupling is removed (q = 0). We next strengthen the player's coupling by changing q from 0 to 1 while keeping e = 1. Figs. 7-(a) and -(b) show how the two players' states behave when (a) e = 1, q = 0 and (b) e = 1, q = 1. Note that Fig. 5-(b) and Fig. 7-(a) are the same. When there is no player's coupling (Fig. 7-(a)), the two players' state dynamics oscillate at a similar level, but they are not completely synchronized. When the player's coupling is introduced (Fig. 7-(b)), on the other hand, the two players' state dynamics show anti-phase synchronization. The player's coupling enables the players to coordinate their actions and to cut the tree alternately at the attractor of the dynamics, following a rule such as "I will cut the tree if the state of the opponent becomes too large, but I can wait if it is too small." That is, each other's state works as a signal for the temporal differentiation of their roles and helps the cooperative dynamics with anti-phase synchronization to be realized.

3On the other hand, the dynamics does not change from Fig. 5-(a) when strengthening the player's coupling by changing q from 0 to 1 while leaving the environmental coupling removed (e = 0).

Figure 7. The effect of the player's coupling: In both panels, the horizontal axis shows the round (0–100) while the vertical axis shows the states of both players; the solid line corresponds to the state of player 1 and the dotted line to that of player 2. The couplings of the players are set to e = 1 and q = 0 in (a), and e = 1 and q = 1 in (b). Note that in both panels the environmental coupling is always present (e = 1). The difference between (a) and (b) shows the effect of the player's coupling on the dynamics of the players' states.

5.3. Strategy Space for Productive Dynamics

Finally, let us see how large the region of strategy space is that makes the game dynamics profitable for both players. As we have seen above, changing the coupling strength between the players gives rise to qualitative changes in the game dynamics. It should also change the profits the players can attain from the game dynamics.


Figure 8. The effect of the environmental coupling on the average score: The average state of player 1 over 400 rounds is plotted as a function of e (q = 0). In (a), the range of e is [0, 3]. In (b), the range [0, 0.5] in (a) is enlarged.

Figure 8 shows how the average state of player 1 changes with the environmental coupling, e, when q = 0. As Fig. 8-(a) shows, both too small and too large values of e keep the average state around zero. A moderate increase of e from zero can recover the average state. Note, however, that the recovery of the average state is not gradual; rather, there is a phase transition of the average state around e = 0.23, as in Fig. 8-(b). We should note that the range of e in which player 1 can make some profit (enjoys a non-zero state) is very limited in the strategy space: the range lies between around 0.23 and 2.18. In order for players to achieve cooperative and profitable dynamics, the evolution of strategies first of all has to find this limited range of the strategy space.

6. Summary

In this paper, we have quickly reviewed some of the results in [3] about the formation and development of cooperative behaviors by players who are competing for a dynamic resource, and investigated these evolutionary phenomena from the viewpoint of the "dynamics of coupled players" in addition to the viewpoint of the "evolution of cooperation in games." In a game like the lumberjack's dilemma game, which involves a dynamical resource with continuous value (e.g., the continuous size of the tree), there can be a continuum of behavioral patterns and state dynamics, and there is no single action labeled as 'cooperation' as in the prisoner's dilemma game. In order for players to "cooperate" in such an environment, there should be some dynamics in which players can coordinate their behaviors and in which players can increase their collective profits. The former corresponds to 'cooperation' in the sense of physics and the latter to 'cooperation' in the sense of game theory. As stated in Section 3., both the convex and the linear LD games share a common feature: both can represent a social dilemma situation. However, the development of cooperation rules as in Fig. 3 cannot be observed in linear LD games, while it can be observed in convex LD games. The only difference between the two maps lies in the nature of the dynamics; that is, the tree growth gradually saturates in the convex map, which makes possible the stable synchronization of players' behaviors within some limited range of the strategy space. This implies that the evolution of cooperation rules requires, in the first place, the possibility of cooperative behaviors in the sense of physics, which makes behavioral coordination easier. Synchronous cooperation in games can be discussed only in models which take the viewpoint that players' decision-making functions are a source of dynamics, and that such cooperation can be realized when players, as dynamical systems, are coupled [26]. In this sense, we need the formalization of games as coupled dynamical systems (players), rather than the algebraic form of classical game theory, when considering synchronous cooperation. In particular, the coupling of players via the reference to the other's state (the player's coupling) makes the mutual coordination of both players' internal states possible, and makes the anti-phase cooperation rules stable. Finally, the "evolution" of strategies through players' interactions can work as a mechanism to search autonomously for cooperative behavior in the sense of game theory among the cooperative game dynamics in the sense of physics. The formation and transition of cooperation rules in the LD game simulation became possible through the combination of these two key elements.

Acknowledgement

This research is partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 2006–2008, 18700220, and Grant-in-Aid for Scientific Research (S), 2005–2009, 17103002.

References

[1] Aguirre J., D’Ovidio F., & Sanjuan M.A., Controlling chaotic transients: Yorke’s game of survival, Phys. Rev. E 69, 016203 (2004)

[2] Akiyama, E. & Kaneko, K. Dynamical systems game theory and dynamics of games, Physica D 147, 221–258 (2000).

[3] Akiyama, E. & Kaneko, K. Dynamical systems game theory II — A new approach to the problem of the social dilemma, Physica D 167, 36–71 (2002).

[4] Axelrod, R. The Evolution of Cooperation, Basic Books, New York, (1984).

[5] Axelrod, R. & Dion D., The further evolution of cooperation, Science (Washington, D. C.) 242, 1385–1390, (1988).

[6] Brenner, T., Weidlich, W. & Witt, U. International co-movements of business cycles in a ‘phase-locking’ model, Metroeconomica 53, 113–138 (2002).

[7] Brenner, T. & Witt, U. Melioration learning in games with constant and frequency-dependent pay-offs, Journal of Economic Behavior and Organization, 2003.

[8] J. Buck, Synchronous rhythmic flashing of fireflies, Quart. Rev. Biol. 63(3), 265–289 (1988).

[9] Chung, J.S., Lee, K.H., & Stroud, D. Dynamical properties of superconducting arrays, Phys. Rev. B 40, 6570 (1989).

[10] Dente, G.C., Moeller, C.E. and Durkin, P.S. Coupled oscillators at a distance: Applications to coupled semiconductor lasers, IEEE J. Quantum Electron. QE-26, 1014–1022 (1990).

[11] Kuramoto, Y. Chemical Oscillation, Waves, and Turbulence, Springer, Berlin, 1984.

[12] Helbing, D., Lammer, S., Witt, U. & Brenner, T., Network-induced oscillatory behavior in material flow networks and irregular business cycles, Physical Review E 70, 056118 (2004).

[13] C. Huygens, Horologium Oscillatorium (Apud F. Muguet, Parisiis, France, 1673); The Pendulum Clock, Iowa State University Press, Ames, 1986.

[14] Ikegami, T. & Taiji, M., Imitation and Cooperation in Coupled Dynamical Recognizers, Advances in Artificial Life (5th European Conference, ECAL'99: Lausanne, Switzerland, September 1999, Proceedings), eds. D. Floreano, J. Nicoud & F. Mondada, Springer, 1999.

[15] Taiji, M. and T. Ikegami, Dynamics of internal models in game players, Physica D: Nonlinear Phenomena 134 (2), 253–266 (1999).

[16] Joosten, R., Brenner, T., & Witt, U., Games with frequency-dependent stage payoffs, International Journal of Game Theory, 2003.

[17] K. Kaneko (Ed.), Chaos Focus Issue on Coupled Map Lattices, Chaos 2, 279–407 (1992).

[18] K. Kaneko (Ed.), Theory and Applications of Coupled Map Lattices, Wiley, New York, 1993.

[19] Levhari, D., Mirman, L.D., The great fish war: an example using a dynamic Cournot-Nash solution, Bell Journal of Economics 11(1), 322–334 (1980).

[20] Mirollo, R.E. & Strogatz, S.H. Synchronization of pulse-coupled biological oscillators, SIAM J. Appl. Math. 50, 1645–1662 (1990).

[21] von Neumann, J. & Morgenstern, O., Theory of Games and Economic Behavior, Princeton University Press, Princeton, 1944.

[22] Rapoport, A., Mathematical theory of motivation interaction of two individuals, Bull. Math. Biophys. 9, 17–27 (1947).

[23] Rashevsky, N., A problem in mathematical biophysics of interactions of two or more individuals which may be of interest in mathematical sociology, Bull. Math. Biophys. 9, 9–15 (1947).

[24] Rössler, O. E., Adequate locomotion strategies for an abstract organism in an abstract environment: a relational approach to brain function, in Physics and Mathematics of the Nervous System, Springer Lecture Notes in Biomathematics 4, 342–369 (1974), eds. Conrad, M., Güttinger, W. & Dal Cin, M.

[25] Rössler, O. E., Chaos in Coupled Optimizers, Annals of the New York Academy of Sciences 504, 229–240 (1987).

[26] Rössler, O. E., Fraiberg-Lenneberg Speech, Chaos, Solitons & Fractals 4(1), 125–131 (1994), Elsevier Science Ltd.

[27] Smith, H.M., Synchronous flashing of fireflies, Science 82, 151–152 (1935).

[28] Sorger, G., Markov-perfect Nash equilibria in a class of resource games, Economic Theory, Springer, 11(1), 79–100 (1997).

[29] Walker, T.J. Synchronous rhythmic flashing of fireflies, Science, 166(3907), 891 (1969).

In: Crossing in Complexity ISBN: 978-1-61668-037-4 Editors: I. Licata and A. Sakaji, pp. 229-254 © 2010 Nova Science Publishers, Inc.

Chapter 11

FRACTAL TIME, OBSERVER PERSPECTIVES AND LEVELS OF DESCRIPTION IN NATURE

Susie Vrobel* The Institute for Fractal Research Goethestrasse 66, 34119 Kassel, Germany

Abstract

This paper reviews various approaches to modelling reality by differentiating the notions of time which underlie those models. Basic notions of time presupposed in physical theories are briefly described and analyzed in terms of the levels of description taken into account, the interfacial cut assumed between the observer and the rest of the world, the resulting observer perspectives and the extent to which these notions are based on temporal natural constraints. Notions of time in physical theories are secondary constructs, derived from our primary experiences of time. Therefore, we must regard our theories as anthropocentric – derived from abstractions and metaphors resulting from our embodied cognition. Theories based on the notion of fractal time and fractal space-time are generalizations or alternative descriptions which allow for a more differentiated modelling of reality. The resulting temporal observer perspectives allow for further differentiation. The notion of fractal time logically precedes that of fractal space-time, as it is based on the primary experiences of time: succession, simultaneity, duration and an extended Now. Against this background, the internal differentiation of the observer and his degree of both conscious and unconscious contextualization turn out to be vital ingredients in our reality generation game. I am fully aware of the fact that the selection of concepts presented here is neither complete nor unbiased and is coloured by my own temporal observer perspective.

Introduction

"The best material model of a cat is another, preferably the same cat."1 Most of us would probably admit that Norbert Wiener’s beautiful definition can’t be beat. However, limited

* E-mail address: [email protected] www.if-online.org
1 Rosenblueth & Wiener 1945, p. 316

time and resources, as well as our inbuilt desire for complexity reduction, lead us to build models to describe the physical reality we are embedded in by abstracting, i.e., by focussing on certain aspects and disregarding others. This process is steered by observer-participants whose perspectives tend to differ and whose physical make-up acts as a constraint on cognition. 2nd-order cybernetics takes this fact into account by regarding the observer as a cybernetic system in its own right. 3rd-order cybernetics then contextualizes the observer-participant ontologies of the 2nd-order domain: the subjectivity of the observer-participant is described as the result of an embedding process. Contextualization of observer-participant ontologies takes account of both the system-context interface and the observer-involvement interface.2 We thus identify the steersman who creates scientific theories as an embedded observer-participant. This has not always been the case, however, although the role of the observer seems to have become more and more influential with changing paradigms. The Newtonian paradigm, which was based on a reductionist, linear notion of a container-like time and space, did not allow for observer frames. These were introduced with Einstein's relative concept of time. Quantum mechanics suggested that there are no independent observers, but only observer-participants who generate reality by taking a measurement, which induces the collapse of the wave function. 2nd-order cybernetics, synergetics and Prigogine's chronobiology recognized the need to assign different internal times to nested systems, an approach which was integrated into chaos theory and theories of complexity. Here, the phenomenon of emergence also called for a differentiation between levels of description (LODs), in terms of interfacial cuts as well as in terms of degrees and modes of involvement. In synergetics, order parameters which enslave lower-level phenomena revealed the need to distinguish between LODs in order to describe the emergent circular causality.3 Endophysics emphasized the need to take account of the observer's internal organization, introducing a micro-relativity.4 The internal complexity of the observer is a constraint on our generation of reality and on the perspective from which we construct models. Embodied cognition implies that we have to take account both of the observer's internal differentiation and of the degree to which he is embedded into a context, i.e., his degree of involvement and participation. In order to understand the relationship between the system processes and those of the context, we need to describe interfacing in detail by simultaneously considering all process levels.5 The concept of Fractal Time renders such a simultaneous description possible. It promises to be a useful approach to modelling reality, as it is based on a scale-relativity which takes account of the nested LODs considered, the degree of the observer's contextualization and the structure of the resulting observer perspectives.
Notions of fractal time are extensions, generalizations and/or re-interpretations of the concepts of time underlying classical physics, the theory of relativity, quantum theory and the notion of an internal time developed by the Brussels School.6 The need to regard physical theories as secondary concepts of the observer’s primary experience of time, as presented by Pöppel’s arguments, leads to the question as to how an

2 van Nieuwenhuijze 2000. 3 Haken 1995. 4 Rössler 1998. 5 van Nieuwenhuijze 2000. 6 Prigogine 1985.

observer perspective is generated.7 My Theory of Fractal Time defines temporal observer perspectives in terms of tdepth, tlength and tdensity.8 tdepth, the depth of time, is the number of compatible temporal intervals on more than one LOD: it defines simultaneity. tlength, the length of time, defines succession as the number of incompatible temporal intervals on one

LOD. tdensity, the density of time, is measured as the fractal dimension of a temporal interval, thus relating tdepth and tlength. My approach takes account of the observer's primary experience of time and the need to describe how an observer perspective emerges, by recognizing that embodied cognition should be taken as a starting point for dealing with constraints on modelling reality. The temporal observer perspective is a result of the observer's degree of both conscious and unconscious contextualization. A short introduction to fundamental concepts of time underlying physical theories is followed by a brief excursion into LODs and endo-/exo-observer perspectives. An overview of the observer's primary experience of time describes the notions of simultaneity, succession, the Now and duration. As physical theories are secondary constructs of those four primary experiences, definitions of observer perspectives should be based on those experiences. My Theory of Fractal Time, whose concepts are based on these primary experiences of time, is presented as an epistemological prerequisite for models of reality, including theories of fractal space-time. The last part of this review looks at the idea of taking embodied cognition as a starting point for dealing with constraints on modelling reality: the fractal temporal observer perspective is presented as the result of the observer's degree of conscious and unconscious contextualization and as a prerequisite for notions of fractal space-time.

Notions of Time

Notions of time differ with respect to the ontological status assigned to time and the epistemological assumptions made. They also vary in the way possible resulting temporal relations are described. Questioning the reality of time is a fertile starting point, as it leads us to make explicit our assumptions about the observer-world relationship. So, is time real? That's a toughie. If we assume that we may describe something as real if it exists independently of an observer, our approach is not a scientific one, but one based on faith. Therefore, we are limited to an inter-subjective generation of inter-objective notions of time, which we derive from conceptual metaphors. As most of these metaphors are grounded in our sensory-motor system,9 embodied cognition should be taken as a starting point when we contemplate basic notions such as time, on which our physical theories are based. Subjective time is just as real to the individual observer as any inter-objective concept of time. A phenomenological approach takes account of both by describing the world as an interface reality.10 However, this does not imply that time is a purely subjective phenomenon generated by observers. But although the observer does not generate time, he may modify its structure, which may be

7 Pöppel 1989. 8 Vrobel 1998. 9 Lakoff and Núñez 2000. 10 Rössler 1995, 1996, 1998.

understood as an interference pattern resulting from the observer's embedding in a context. This idea is discussed below. In order to solve the question as to whether the structure of time is a given ordering principle or one generated by individual observers, we have to look at our options in approaching this question. Our access to the world is gained through empirical knowledge; this fact makes it difficult to address the notion of time, because we are always already embedded in the subject matter we want to define. It is logically not conceivable that we jump out of our embedding temporal structure and describe it from an outside, a-temporal perspective. We are creatures who live in time, with embodied minds whose internal processes are of a temporal nature. Therefore, a non-circular definition of time is not possible. Starting from our own empirical knowledge, we differentiate between familiar aspects of time, such as earlier-, later-, and between-relations (succession), phenomena we perceive as being temporally compatible (simultaneity), the extension of a temporal interval (duration), and the flow of time from the past via the present into the future, which culminates in our consciousness of the present (the Now). By describing change in terms of earlier-, later-, and between-relations, we refer to the time of physics, t, an inter-objective notion of time. This description reduces time to a parameter and thus lacks vital concepts which are essential to the observer's reality generation: the notions of simultaneity, duration and the Now. The Now does not have a counterpart in the time of physics. However, it is a significant concept, as, epistemologically speaking, it is our only access to the world. In fact, the Now is all we have. Even when we remember the past and anticipate future events, we do this in the present, in our Now. Our subjective experience of time also includes the flow of time from the past through the present into the future. Events appear to be, first, in the future, then present, and, finally, in the past. This notion of time passing finds no counterpart in the time of physics either. The notions of time we presuppose act as constraints on the development and evaluation of a scientific theory. Therefore, theories which include temporal notions such as the Now will differ from those which fail to include a temporal observer perspective. Newton's absolute time does not allow for a privileged observer perspective defined by an individual Now, let alone for an internal differentiation of the observer. Both time and space are portrayed as container-like structures.11 Einstein dismissed Newton's container spacetime12 and defined simultaneity and duration in terms of the observer's event horizon. His Special and General Theories of Relativity are based on a concept of time which cannot be conceived of independently of the observer's position. As the speed of light acts as a constraint, the concepts of simultaneity and duration are dependent on the observer frame: there is no absolute reference system. The relativity of simultaneity and time dilation are examples of the observer-centeredness which constrains statements about temporal relations. Einstein does not deny the existence of the Now, but, by

11 "Absolute, true and mathematical time, of itself, and from its own nature, flows equably without relation to anything external, and by another name is called duration. (...) For times and spaces are, as it were, the places as well of themselves as of all other things. All things are placed in time as to order of succession; and in space as to order of situation." (Newton 1687/1962, p.6) 12 "The idea of the independent existence of space and time can be expressed drastically in this way: If matter were to disappear, space and time alone would remain behind (as a kind of stage for physical happening)." (Einstein 1920/1988, p. 144) Fractal Time, Observer Perspectives and Levels of Description in Nature 233 differentiating between subjective and objective time, he assigns the Now a place which only exists in our consciousness. To him, subjective time is a manifestation of our consciousness in action and has no physical counterpart. His introduction of inertial systems and observer frames did allow for the notion of an observer perspective. This perspective, however, was not based on an internal differentiation of the observer, but merely on his relative position in space-time and the resulting event horizon. Quantum Theory recognized that the observer-participants actively take part in the outcome of measurement results. The notion of an observer-participant who takes a measurement and thus causes the collapse of the wave function, allows for a differentiation of individual observer perspectives and a "private" Now. Wheeler defines the role of the observer as that of an active, selecting questioner.13 He sees reality as generated by our questions and the resulting selection process therefore as subjective. The subject plays a constituting role in the generation of reality. The measuring process correlates with the transition from the possible to the actual. The present, our Now, within which the decision on the way of measuring and the measurement itself is taken, may be interpreted as an indivisible whole consisting of particles and the experimental set-up. This interval of duration is indivisible, as the mere question about individual processes within that interval is meaningless. Most interpretations of Quantum Theory allow for the notion of an observer perspective which entails the generation of a Now, but regard the observer’s internal differentiation as being irrelevant to the experimental set-up. It is only releveant insofar as the question posed is the result of a subjective decision-making process. Everett’s interpretation, which suggests understanding the uncollapsed state vector as an objective description of the world (Many World Theory), would allow for as many simultaneous observer perspectives and Nows as there are possible interfacial cuts and measuring results. Prigogine introduces, in addition to the astronomical time t, an internal time T, which denotes the internal age of a system. However, this internal time T, which results from the number of transformations within a system, presupposes a framework time t, as the processes which generate T take place in time t. The baker transformation, for example, presupposes a framework time t, within which a succession of foldings can be imagined to take place. Prigogine did not develop his internal time T to generate an observer perspective. Rather, he conceived of it as an operator, as in quantum mechanics. 
(As in the baker transformation, an "average" time is defined which corresponds to the superposition of the individual partitions.) However, T may well be interpreted from within the system T describes, provided an embedded observer-participant is assumed to make that interpretation and thus form an internal temporal observer perspective. T is non-local, i.e., it depends on the global topology of a system and may be used to describe the internal temporal differentiation of any system undergoing change. But even if we know the internal age of a system, we cannot associate it with a local trajectory. The individual local times (the partitions in the baker transformation) which result in a global internal time may or may not be fairly close to the average global time.
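To make the distinction between the framework time t and the internal age T concrete, the following Python sketch may help. It is an illustration only: Prigogine's T is an operator, and the folding counter below is merely a toy stand-in; all names are mine. The baker transformation is iterated in steps of the framework time t, and the number of foldings a point has undergone plays the role of an internal age.

def baker(x, y):
    """One folding of the unit square: stretch in x, squeeze and stack in y."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, (y + 1) / 2

point = (0.3, 0.6)
internal_age = 0                      # toy "T": number of foldings so far
for t in range(1, 6):                 # t: the presupposed framework time
    point = baker(*point)
    internal_age += 1
    print(f"t = {t}, T = {internal_age}, point = ({point[0]:.4f}, {point[1]:.4f})")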

13 "We used to think that the world exists 'out there' independent of us, we the observer safely hidden behind a one- foot thick slab of plate glass, (...), not getting involved, only observing. However, we’ve concluded in the meantime that that isn’t the way the world works. (...) What we thought was there is not there until we ask a question." (Wheeler 1994, pp. 15/16). 234 Susie Vrobel time may or may not be fairly close to the average global time. Prigogine and Stengers present a transparent example in chronogeography.14 Prigogine interprets sensitive dependence on initial conditions (SDIC) as an infinite entropy barrier, which results in irreversibility. SDIC was discovered by Edward Lorentz in the early 1960s15. But neither Lorentz nor the pioneers of Chaos Theory, who thrived on this notion, did explicitly define a new concept of time. However, as many strange attractors happen to display a self-similar stucture, Chaos Theory actually describes temporal fractal structures in phase space. This fractality, though, is a virtual one and does not allow for the description of both succession and simultaneity, as the fractal pattern of the attractor which finally emerges no longer contains the information of the successive vectors from which it evolved. Succession is lost in a phase space portrait. (In contrast to the phase space portrait representation, the notion of fractal time introduced in the second part of this paper takes account of both succession and simultaneity).

Levels of Description: The Endo- and Exo-Perspective

One way of comparing different approaches to modelling reality consists of considering the number of LODs taken into account by a model. The notion of LODs was discussed in detail by Hofstadter16 in the context of holism vs. reductionism, tangled hierarchies and the inviolate level. For the sake of brevity, I shall not go into detail on his notions of LODs. In the present context, it shall suffice to point out that LODs come in two disguises we are not necessarily aware of: as abstractions within a fixed, coherent perspective and/or as interfacial cuts between the observer and the rest of the world (the endo- and exo-perspective). Descartes' interfacial cut between res cogitans and res extensa created two LODs on which physical theories thrived for centuries. However, recent advances in cognitive science17 suggest that the Cartesian Cut is not helpful, as it disregards the fact that the mind is embodied, i.e., that most of our cognitive performances are limited by perceptual constraints and conceptual metaphors, which are grounded to a large extent in our sensory-motor system. All abstractions we perform are constrained by such conceptual metaphors.18 Therefore, taking the Cartesian Cut between res cogitans and res extensa seriously prevents us from accessing the foundations of our cognitive performances. The Newtonian paradigm is based on the exo-perspective. Like Laplace's demon, an observer is considered to have, theoretically, at his disposal all the information necessary to compute and predict the future development of a system. The act of observation, including the observer's perspective and internal organization, is regarded as having no impact on the measured results. Einstein introduced observer frames, defined their boundaries, and thus introduced the observer's relative position as a vital notion which determines his perspective and measuring results.

14 "When we look at the structure of a town, or of a landscape, we see temporal elements interacting and coexisting. Brasilia and Pompeii would correspond to a well-defined internal age, somewhat like one of the basic partitions in the baker transformation. On the contrary, modern Rome, whose buildings originated in quite different periods, would correspond to an average time exactly as an arbitrary partition may be decomposed into elements corresponding to different internal times." (Prigogine & Stengers 1985, pp. 272-273) 15 Lorentz 1963. 16 Hofstadter 1980. 17 e.g. Storch et al 2006. 18 Lakoff & Núñez 2000. Fractal Time, Observer Perspectives and Levels of Description in Nature 235 boundaries, and thus introduced the observer’s relative position as a vital notion which determines his perspective and measuring results. However, he did not take account of either the observer’s internal differentiation nor the interference the act of observing implies. Quantum mechanics took this step but did not consider the observer’s internal make-up when setting the Heisenberg Cut. Prigogine introduced the internal time T (as an operator), the internal age of a system, which differs from the astronomical time t. This made it possible to describe the internal differentiation of a system undergoing change. Although the internal time T depends on the global topology of the system, it is also an exo-physical model, as it does not describe T from the perspective of an embedded extended observer. Thus, the above models set differing interfacial cuts, none of which explicity take account of the impact of the observer’s internal differentiation on the measuring result. Rössler’s microconstructivism states that we have to take into account the microscopic movement within the observer when we model reality.19 The embedded observer is confined to an endo-perspective, as he has no access to the thing-in-itself: All the endo-observer may talk about is the world as it appears on his interface – a purely phenomenological account. The exo-perspective remains inaccessible to mortals – it is an idealization which implies the idea of a bodyless super-observer such as Laplace’s Demon. Rössler’s differentiation between the endo- and exo-perspective reveals yet another Gödel limit embedded observers face: our Now manifests itself as our interface reality, a very private event horizon, generated by the microscopic movements within the observer. The more an observer is aware of his internal structure and dynamics, the more differentiated is his Now, his temporal interface. And the Now is our only access to the world. The setting of the interfacial cut determines the notion of time we presuppose when modelling reality. However, there is a limit to our potential awareness of not only the microscopic processes within the observer, but also to his cognitive structures. Most of our thinking is inaccessible to conscious introspection, as we cannot monitor it on low-level thought processes which shape our observer perspectives.20

Simultaneity, Succession, Duration and the Now: Physical Theories are Secondary Constructs of Our Primary Experiences of Time

The concepts of time presented above differ markedly in the significance they assign to the observer's internal differentiation and perspective. However, according to Pöppel, they are all secondary constructs, derivatives of our primary experience of time.21 In order to provide a basis for our primary experience of time, natural scientists come up with natural laws whose implicit concept of time is compatible with our perspective and our description of Nature. This implicit concept of time must, however, be appreciated as a derivative: Pöppel's argument is essentially that we, as human beings, must consider a priori the performances carried out by our brains when considering all the theories and concepts we have generated.

19 Rössler 1995, 1998 and personal communication.
20 Metzinger suggests that this inaccessibility is a result of the fact that the perspective-generating mechanisms are transparent to us because they occur on time scales which are far too fast to be accessed consciously. Metzinger 2006.
21 Pöppel 1989, 2000.

Therefore, all physical theories are necessarily anthropocentric.22 Hence, he concludes that all physical concepts of time are secondary constructs we have developed, as we can only approach time and duration through the filtering processes which result from the limitations of our perceptual apparatus and the integrative performances carried out by our brains. Pöppel describes his approach as neuroscientific, taking the subjective experience of time as a starting point. In this context, he differentiates between simultaneity, succession, the Now and duration. These constitute our primary experience of time, which, he suggests, precedes the physical and semantic concepts of time.

A. Simultaneity

Our senses both render possible and limit our experience of simultaneity. The differing ways of experiencing simultaneity correlate with the various sensory perceptions. We perceive signals we hear as non-simultaneous if they are separated by an interval of approximately 6 milliseconds. Below this threshold, signals are perceived as being simultaneous. Our visual sense shows a different sensitivity to simultaneity: here, impressions which are separated by an interval of 20 to 30 milliseconds are experienced as non-simultaneous. If that separating interval is shorter, impressions are perceived as simultaneous. Our tactile sensitivity lies between the auditory and visual thresholds for simultaneity, at roughly 10 milliseconds. That perceptions arriving via different senses can be simultaneous means that they are temporally nested, as the acoustic and optical spaces are not congruent in terms of their smallest perceivable temporal units. Our brains make up for this incongruence and integrate acoustic and visual inputs into a gestalt. Pöppel defines a simultaneity horizon, which lies at a radius of 10–12 meters from the observer (this distance varies slightly for different observers). This is the distance at which auditory and visual signals are perceived as being simultaneous. He suggests that this horizon is probably also a constraint on our world view. (Note that, at this distance, the non-congruence of auditory and visual signals does not result from the difference between the speeds of light and sound – it is a constraint inherent in our perceptual apparatus.)23
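The modality-specific thresholds quoted above can be read as a simple decision rule. The following Python sketch is purely illustrative (the threshold values are those given in the text; the function and dictionary names are mine): it classifies a pair of stimuli as perceptually simultaneous or successive per modality, making explicit that the sensory "spaces" are not congruent.

FUSION_THRESHOLD_MS = {
    "auditory": 6,    # ~6 ms: heard as simultaneous below this separation
    "tactile": 10,    # between the auditory and visual thresholds
    "visual": 25,     # ~20-30 ms; a midpoint value is used here
}

def perceived_as_simultaneous(modality, separation_ms):
    """True if two stimuli fall below the modality's fusion threshold."""
    return separation_ms < FUSION_THRESHOLD_MS[modality]

# A 15 ms gap is successive for hearing but simultaneous for vision,
# illustrating that the senses' smallest temporal units are not congruent.
for modality in FUSION_THRESHOLD_MS:
    print(modality, perceived_as_simultaneous(modality, 15.0))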

B. Succession

The experience of succession is a little trickier, as individual events need to be identified before they may be put into some successive order. Events must be separated by at least 30-40 milliseconds to be perceived as being successive. To explain this phenomenon, a qualitative jump is necessary, which connects the processing of a stimulus by our sensory organs to the processing which goes on in our brains – the point where we knock sense into what we have heard or seen.

22 For physical concepts of time, this entails that "... the search for the conditions of rendering possible any experience of time in the real world is determined by mechanisms of our brains, which condition our experience of time. It is not possible to conceive of a theoretical concept of time in physics (e.g. Newton's 'absolute' time) which pretends to exceed experience of time. Therefore, I suggest to regard the physical concepts of time (be it the Newtonian one, that of Einstein or Prigogine) as a secondary construct derived from our primary experience of time." [my translation, Pöppel 1989, p. 380]
23 Pöppel 2000, pp. 38-42.

C. Duration and the Now

The experience of the Now is based on yet another performance carried out by our brains, namely integration. Impressions are experienced as present when our brain assembles them into perception gestalts. Pöppel illustrates this idea with an example from language24 which is reminiscent of Husserl's example of hearing a succession of musical notes as a tune (Husserl's example is described below in the context of the notion of fractal time). Pöppel's definition of something being present is based on a clustering of perception-related experiences which are based on meaningfulness, i.e., perceptual gestalts are constructed by our brains. From this, we may conclude that the duration of the individual present depends on the mental capability of the person who experiences an event. The richer and more differentiated the language, the more complex are the perceptual gestalts this person may construct and, as a result, the more extended that person's present may be. In order to render possible the experience of duration, two components are necessary: firstly, the "identification and integration of perceptual gestalts"25 and, secondly, memory, by means of which time may be skipped and even be overcome through reflection, by providing past experiences to our reflective consciousness.26 Pöppel's rod for measuring duration is situated within ourselves: the smallest frequencies define the shortest temporal extension of events which may be integrated into perceptional gestalts – a subtle performance carried out by our brains, of which we are unaware. Gestalts are indivisible constructs per definitionem. However, if they display internal differentiation, and therefore temporal extension, they define a certain duration, albeit an indivisible one. The notion of an indivisible Now was introduced by Henri Bergson, half a century before the term gestalt was coined. Bergson's term durée denotes a non-divisible whole.27 The Bergsonian concept of durée does not accept succession within the non-divisible whole of the duration, although it does contain the past.28

24 "Successive events are perceived as being present only up to a certain limit. An example from language: the word "now" is is made up of successive phonetic events. But when I hear the word "now" now, I perceive the whole word now and not a succession of individual phonetic entities. This indicates another performance of brains at work, namely the integration of temporally separated events in perception gestalts, which, in each case, are present, i.e. constitute the Now. The upper limit to this integration of perception-related experience ranges between two and four seconds. (...) What we experience as being present is not a point without extension on the time axis of classical physics but meaningful events which have been integrated into gestalts." Pöppel 1989, p. 372 (my translation) 25 Pöppel 1989, p. 374 26 Pöppel 1989, p. 375 27 "Let us therefore rather imagine the image of an infinitely small elastic band, contracted, if it were possible, into a mathematical point. We slowly start stretching it, so that the point turns into a line which grows continuously. Let us focus our attention not on the line qua line, but onto the action of pulling it. Notice that this action is indivisible, given that it would, were an interruption to be inserted, become two actions instead of one and that each of these actions is then the indivisible one in question. We can then say that it is not the moving action itself which is ever divisible, but the static line, which the action leaves under it as a trail in space." Bergson 1909, p. 8 (my translation). 28 "The internal duration is the continuous life of a recollection which extends the past into the present, so that the present may clearly contain the perpetually expanding image of the past (...).Without this continuing existence 238 Susie Vrobel

Below, the term Now is used in Husserl's sense, insofar as it implies the properties of an extended present which exhibits deep nesting of protentions and retentions. The notions of the Prime and time condensation, which will be introduced in my Theory of Fractal Time, are based on the Bergsonian concept of duration, which defines a present which implies the past and is, at the same time, indivisible. (Fractal time not only nests past events into the present but also includes the notion of anticipation, which adds the idea of the future being embedded into the Now.) The four elementary experiences of time briefly described above constitute our primary experience of time. Our experience of time thus turns out to be something given and becomes the starting point of philosophical questions concerning the conditions which render experience of time possible. These conditions are generated in different ways, depending on the underlying belief system. Pöppel showed that both physical and semantic approaches, and, therefore, physical theories, are anthropocentric and thus derivatives of our subjective experience of time.29 The same is true for mathematics: Lakoff & Núñez described how mathematical concepts are derived from embodied cognition.30 The idea that physics should be based on neuropsychology is also supported by Fidelman's work.31

Fractal Time

Any physical theory should thus take account of the fact that its assumptions and observations are secondary constructs based on our primary experience of simultaneity, succession, the Now and duration. Below, my fractal notion of time is introduced, which is based on our experience of time, in particular on our experience of duration and the Now. It defines the primary experiences of time as a prerequisite for modelling reality. Simultaneity is generated by our brains by integrating events which are connected by during-relations, and we generate succession by integrating incompatible perceptional gestalts, i.e., events which cannot be connected by during-relations, on specific LODs. Against the background of my notion of fractal time, perceptional gestalts may be defined as temporal natural constraints. Most existing notions of fractal time really define concepts of a fractal space-time, which differ with respect to the way LODs are taken into account and interfacial cuts are set. In his pioneering work on fractals, Mandelbrot coined the terms fractal and fractal time and introduced a differentiation resulting from taking into account nested LODs: temporal intervals within a time series as defined by their degree of resolution.32 He thus provided a way of measuring and comparing time series with respect to their temporal density. This he achieved by assigning a fractal dimension to spatio-temporal structures. The arbitrary choice of the measuring rod applied by an observer determines the outcome of the measurement, which depends on how much detail is taken into account. He made explicit that the role of the observer is crucial to the extent that he defines the measuring rod and the LODs taken into account when observing or measuring a time series.

29 "As our brain has only one perspective on the world (whose extent we cannot possibly imagine) as a result of its evolutionary history, all physical theories are necessarily a view through only one (namely our) window to the world. As a result, physical theories are necessarily anthropocentric." Pöppel 1989, p. 380 (my translation)
30 Lakoff & Núñez 2000.
31 Fidelman 2002, 2004a,b,c and personal communication.
32 Mandelbrot 1982.

However, Mandelbrot's observer perspective is an external one, as there is no internal observer, let alone constraints on such an observer's perception, so our primary experiences of time do not play a role. Some theories of fractal space-time which have been developed in recent years do take account of our primary experience of time. They are intriguing as they define fractal space-times which render possible essentially new approaches and ways of modelling reality. Nottale's Theory of Scale Relativity, which extends Einstein's principle of relativity to scale transformations, assumes a continuous but non-differentiable fractal space-time whose geometry is resolution-dependent.33 He actually applies the notion of fractality to the geometry of space-time itself. Space-time resolutions are inherent to the physical description and therefore of universal status. Thus, Nottale's approach, which aims for a fully scale-relativistic physics, describes a differentiation of LODs which both implies and exceeds differentiations within the measuring chain. The principle of scale co-variance requires that the equations of physics be invariant under scale transformations. Nottale's theory allows for far-reaching conclusions in almost any branch of science, be it biological, inorganic or economic phenomena. He makes the appealing suggestion that evolutionary analogies between phylogeny and ontogeny (cell differentiation, tissue and organ building, etc.) may be manifestations of an underlying memory phenomenon which expresses itself on each level of description, at each scale of organization.34 Together with Timar35, Nottale focuses on temporal structures from the endo-perspective, which takes account of the fundamentally scale-relativistic character of our perception of time.36 Their approach includes an extension of Boscovich's invariance principle to scale-invariance and may be applied, among other fields, to the description of certain mental disorders as a temporal incommensurability between observers as well as between an observer and his environment.37 El Naschie's Cantorian space-time is a fractal structure inherent in space-time itself.38 It is based on a Peano-like Cantorian geometry, which has also been suggested by Ord (a Peano-Moore curve).39 Among many other physical phenomena, Young's double-slit experiment may be re-interpreted by the Cantorian space-time model by assuming that the interference pattern is a result of the geodesic waves of space-time itself. El Naschie's E-infinity, a discrete hierarchical fractal space-time with infinite dimensions, allows for the description of both relativity and quantum particle physics.
Dubois' anticipatory systems contain a model of themselves and are thus, in essence, fractal structures.40 The present state of a system is calculated not only on the basis of its past but also depends on its future (or potentially future) states. Dubois differentiates between weak and strong anticipation. Weak anticipation, which is based on prediction, is generated from an exo-perspective, whereas strong anticipation, which implies that the anticipatory faculties are inherent in the system itself, is the result of computing from an endo-perspective. While exo-anticipation is generated when a system models a time series which describes external systems, endo-anticipation is based on the system's eigendynamics – it is a property embedded in the describing system itself. Examples of strong anticipation are Dubois' notions of incursion and hyperincursion, which define a computational method that takes into account the future states of a system in order to calculate its present state. Dubois' approach is an example of the idea that any complex system which is capable of reflecting on and steering its own behaviour must be able to generate a model of itself and map this model onto itself.
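As a minimal sketch of incursion, consider the incursive Pearl-Verhulst map discussed by Dubois, as far as I read it (the parameter values and names below are illustrative assumptions): the next state x(t+1) appears on both sides of x(t+1) = a·x(t)·(1 − x(t+1)) and is obtained by solving that equation algebraically, so the "future" state enters its own computation.

def incursive_step(x, a):
    """Solve x_next = a * x * (1 - x_next) algebraically for x_next."""
    return a * x / (1 + a * x)

x = 0.1
for t in range(1, 8):
    x = incursive_step(x, a=3.0)
    print(f"t = {t}, x = {x:.4f}")   # converges towards (a - 1) / a = 2/3

Unlike the ordinary recursive logistic map, this incursive form converges to a fixed point even for parameter values at which the recursive map behaves chaotically, which is one way of seeing how taking the future state into account stabilizes the present one.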

33 Nottale 2001.
34 Nottale et al 2002.
35 Timar 2003.
36 Nottale 2002.
37 personal communication.
38 El Naschie 1995, 2004.
39 Ord 1983.
40 Dubois 1988, 2001.

The fractal approach has also been applied as a method of analyzing time series. Flicker-noise spectroscopy (FNS), a method developed by S.F. Timashev to measure time series on several LODs simultaneously, employs the idea of using a fractal measuring chain in order to recognize fractal structures in a chaotic signal.41 The theories and methods described above all imply the idea of scaling structures, either as a property of the fabric of our universe, a generation mechanism or a fractal measuring method. These approaches open up fundamentally new ways of interpreting physical phenomena and have produced a wealth of concepts which explain the underlying structure of the manifestations of reality in a new way. But to link these approaches to our primary experiences of time, we have to fall back on notions conceived of nearly 70 years ago. The first person to develop a theory of fractal time which takes account of our primary experiences of time was the German phenomenologist Edmund Husserl.42 Although he did not use the term fractal, he implicitly introduced this notion half a century before Mandelbrot's seminal book appeared. Although Husserl failed in his attempt to show that time is generated by the subject (which was the purpose of his treatise on the phenomenology of the inner consciousness of time), he implicitly described the notion of fractal time by introducing a nested model of the Now. His model is based on the observer's perceptions and the resulting temporal observer perspective. Husserl's notion of the Now is based on the modes of empirical knowledge: retention, consciousness of the present, and protention. As the potential cumulation point of all retentions and protentions, the consciousness of the present represents future events by anticipating them and past events by reflecting them, in a modified way, in the Now. A present which hosts both retention and protention must exhibit extension. Husserl shows that it is necessary to assume the concepts of retention and protention in order to understand how we perceive change, exemplified by our ability to perceive not only a series of isolated notes, but a tune.43 When we listen to a tune, we hear a succession of musical notes. But why do we not perceive simply a succession of unrelated notes – why are we able to hear a tune? We connect the note we have just heard with the present one and with the tone we anticipate will follow it, all within our extended Now. By repeatedly remembering a tone (retention) and anticipating the next tone (protention), we generate a nested temporal pattern within the Now.
Thus, the observer creates a simultaneity of retention, the consciousness of the present and protention. This simultaneity shapes our nested Now.
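A toy model may make the nesting explicit. The following sketch is my illustration, not Husserl's formalism, and all names in it are assumptions: each Now retains the previous Now nested inside it and protends the next note, so hearing a tune builds a nesting cascade rather than a string of isolated notes.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Now:
    present: str                       # the note sounding now
    retention: Optional["Now"] = None  # the previous Now, carried along
    protention: Optional[str] = None   # the anticipated next note

def depth(now):
    """How many earlier Nows are nested inside this one."""
    return 0 if now.retention is None else 1 + depth(now.retention)

def hear(tune):
    now = None
    for i, note in enumerate(tune):
        now = Now(present=note, retention=now,
                  protention=tune[i + 1] if i + 1 < len(tune) else None)
        print(f"Now: {note}, nesting depth: {depth(now)}, protention: {now.protention}")

hear(["C", "E", "G", "C'"])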

41 The primary idea of FNS is to disclose information hidden in the correlation links which are present in a sequence of various irregularities (spikes, jumps, discontinuities in derivatives of different order) that occur in the measured dynamic variables at all levels of the spatiotemporal hierarchy of the systems under study. Timashev 2006.
42 Husserl 1928.
43 Husserl 1928.

If we had no memory of the preceding note and no anticipation of the next one, we would only perceive a succession of isolated, uncorrelated notes. However, as we are able to perceive a tune and not just a succession of isolated notes, we must assume the Now to be extended and to provide for both succession and simultaneity. Reiterated nestings of successive, overlapping and simultaneous events within the Now generate a nested, fractal structure. This is true for any perception of change: we have to assume a nested structure of the Now to explain our ability to perceive a tune or any other time series as a meaningful entity. I therefore assumed a nested structure of the Now which contains overlapping events and which adds the dimension of simultaneity to that of succession. Without simultaneity, no before-and-after relations, no correlated succession, would be conceivable, as simultaneity creates the framework time which connects otherwise isolated events into a before-and-after relation. Husserl failed to show that time is created by the subject.44 However, his nested model of the Now remains convincing. Below, I shall describe how the observer, although he does not generate time, does, through his choice of nestings, generate the structure of the Now. In my Theory of Fractal Time, I have adopted Husserl's notion of a nested Now.45 The concepts of duration, succession and simultaneity are based on our primary experiences of time. One aim of my theory was to provide a means of quantifying the internal structure of the observer's interface by differentiating between the length of time, t_length, the depth of time, t_depth, and the density of time, t_density. These concepts allow us to describe a nested temporal perspective and the structure of our temporal observer interfaces.

t_length is the number of incompatible events in a time series, i.e., events which cannot be expressed in terms of during-relations (simultaneity). t_length defines the temporal dimension of succession for individual LODs.

t_depth is the number of compatible events in a time series, i.e., events which can be expressed in terms of during-relations. t_depth defines the temporal dimension of simultaneity and provides the framework time which allows us to structure events in t_length on individual LODs.46

t_density is the fractal dimension of a time series. It describes the relation between compatible and incompatible, i.e., successive and simultaneous, events: the density of time.

N.B.: t_depth logically precedes t_length, as there is no succession without simultaneity.
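One possible operationalization of these definitions reads as follows. This is a minimal sketch of my own reading, not part of the theory's formal apparatus, under the assumption that the during-relation can be modelled as interval containment: events nested in other events count towards t_depth, the remaining, merely successive events count towards t_length.

from itertools import combinations

def during(a, b):
    """True if interval a lies within interval b (a 'during' b)."""
    return b[0] <= a[0] and a[1] <= b[1] and a != b

def t_depth_t_length(events):
    compatible = {e for a, b in combinations(events, 2)
                  for e in (a, b) if during(a, b) or during(b, a)}
    t_depth = len(compatible)            # simultaneity: nested events
    t_length = len(events) - t_depth     # succession: unnested events
    return t_depth, t_length

# Three nested Nows plus two merely successive events:
events = [(0, 12), (2, 8), (3, 5), (20, 24), (30, 31)]
print(t_depth_t_length(events))          # -> (3, 2)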

As an example of a fractal time series, consider any multi-layered signal. To illustrate the idea with audio signals, imagine the frequency ratios of musical notes which are played simultaneously.

44 Bieri shows that Husserl's approach is contradictory, being based, on the one hand, on the timeless character of the subject but, on the other hand, describing reflexion in the consciousness of the present as a succession: "One will not be able to avoid interpreting this 'succession' as a real time structure. This is because it is phenomenologically inconceivable that a formally possible thought of a consciousness first constructs a succession and then places itself into that very succession and only in doing so manages a temporal presentation of its data." (Bieri 1972, p. 197, my translation)
45 Vrobel 1998, 2004, 2006a.
46 There are many ways of determining fractal dimensions, among them the Hausdorff dimension, Mandelbrot's self-similarity dimension and Barnsley's box-counting method. The latter provides the most general approach, as even a plane-filling structure may be described in terms of a fractal. Barnsley shifts the notion of fractality into the observer perspective. This allows a description of a fractal perspective as the result of re-iterated Nows nested by the observer. (Mandelbrot 1982, Barnsley 1988)

The least complex frequency ratio between two musical notes is 2:1, which defines the interval between them as an octave. The note A played on an oboe, for example, has a frequency of 440 Hz. The next higher A played on this instrument has a frequency of 880 Hz, and so on. The nested overtones generate a cascade of embeddings, whose structures are translatable into each other, as the overtones are integer multiples of the fundamental frequency.47 It is possible to provide a translation between the individual nested LODs if the pattern displays a self-similar structure, i.e., if the embedded LODs host structures which are identical to those of the embedding LODs, albeit of different extension in t_length.48 The resulting commensurability gives rise to our notion of consonance.49 In the musical example of nested overtones, consonance is created by overlapping frequencies which are easily translatable into each other in terms of t_length and t_depth. The notions of t_length and t_depth suffice to define a temporal observer perspective. t_density is implicit in the idea of a temporal perspective, as it describes the relation between the number of nestings and the successive events on their respective LODs. These two temporal dimensions (t_length and t_depth) must be assumed in order to explain our perception of a multi-layered signal.50 The Newtonian metric of time may be defined as a special case of fractal time metrics.
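The translatability of the nested overtone LODs can be illustrated numerically. The sketch below is illustrative only (the 440 Hz fundamental is the one from the example above; the names are mine): each LOD is an octave above the previous one, and the periods on the different LODs are commensurable by integer factors, which is what makes a translation between them possible.

FUNDAMENTAL_HZ = 440.0  # the oboe's A from the example above

def period_ms(freq_hz):
    """Period of a pure tone in milliseconds."""
    return 1000.0 / freq_hz

lods = {n: FUNDAMENTAL_HZ * 2**n for n in range(4)}  # 440, 880, 1760, 3520 Hz

for n, freq in lods.items():
    ratio = period_ms(FUNDAMENTAL_HZ) / period_ms(freq)  # translation factor
    print(f"LOD {n}: {freq:7.1f} Hz, period {period_ms(freq):.3f} ms, "
          f"t_length scaling vs LOD 0: {ratio:.0f}:1")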

Figure 1. Triadic Koch island.

47 Vrobel 2006e.
48 This assumes that the observer's internal make-up is capable of differentiating between these LODs.
49 "The idea of consonance is ultimately grounded in the notion of commensurability, an essential in Greek mathematics. We recognise consonance when we perceive a certain number of vibrations of one frequency exactly matching a certain number of another frequency." (Fauvel et al 2003)
50 such as the Risset scale (cf. Vrobel 2006d)

Imagine a fractal clock (as exemplified by the triadic Koch island, see Figure 1) which is run by an infinite number of pointers attached to the perimeter of the triadic Koch island, with all pointers ticking away simultaneously, each at its own speed. This fractal clock ticks away just like any ordinary clock, except that there is an infinite number of pointers instead of just two (or three). The infinitely nested structure of the triadic Koch curve exhibits an infinite number of intervals, which the pointers of a fractal clock have to tick away. While pointer no. 1 ticks only three times (per lap), pointer no. 2 ticks twelve times, pointer no. 3 forty-eight times, and so on, ad infinitum (see Figure 2).

t_depth: 3 ticks during 12 ticks during 48 ticks during 192 ticks during 768 ticks during ...

Figure 2.

If projected onto a one-dimensional straight line, the infinitely nested structure of the triadic Koch curve forms a continuum, and thereby a Newtonian metric: the set of points generated in this way is the set of rational numbers. Therefore, the Newtonian metric may be defined in terms of fractal time, as the t_length of the nesting level ∞, i.e., t_depth = ∞.
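Assuming the tick counts follow the Koch construction shown in Figure 2 (each iteration replaces every side of the island with four segments), the pointers' ticks per lap can be generated as follows; the limit case of infinite nesting level then corresponds to the Newtonian continuum.

def ticks(level):
    """Number of intervals pointer `level` ticks away per lap: 3 * 4**level."""
    return 3 * 4**level

print([ticks(n) for n in range(5)])  # [3, 12, 48, 192, 768], as in Figure 2

# Projected onto a line, the nesting levels refine one another without
# bound: as the level goes to infinity the tick points become dense, and
# the Newtonian continuum appears as the limit case t_depth = infinity.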

Temporal Natural Constraints

The choice of the Koch curve as an example of a fractal clock serves only the didactic purpose of illustrating the nested structure of this fractal, be it spatial or temporal. Scaling behaviour in nature is usually limited by upper and lower boundaries between which a fractal and possibly self-similar domain manifests itself.51 I have introduced the concept of temporal natural constraints (TNCs), with the Prime being the smallest interval in a temporal nesting cascade. The Prime is defined as an interval which cannot host further nestings: it is a nesting constraint. It defines the smallest interval within a nesting cascade, a gestalt which is indivisible in the Bergsonian sense.52 No limit to more extended, embedding length scales is assumed, as a phenomenological model of nested Nows describes individual observer perspectives whose depth is determined by the observer's embedding performances. For as long as the observer is capable of embedding new Nows, the nesting cascade has no upper limit.

51 Cramer points out that objects we define as fractals in Nature, such as coastlines, deltas, ferns, etc., exhibit only a limited scale-invariance: "The concept of the fractal dimension and self-similarity is, to begin with, a mathematical one. For real physical and chemical objects, diffusion curves, surfaces of crystals or proteins, self-similarity will never be fully realized for all scales of length. There is an upper and a lower limit for it." Cramer 1988, p. 172 (my translation). Cf. also: Olsen et al 1987. Nottale equates such an impassable lower length-scale with the Planck scale and the upper one with the cosmic length scale which is related to the cosmological constant. (Nottale 2001)
52 Cf. Bergson 1909.

If the structure of the Prime, which recurs on all LODs within a nesting cascade, is set as a constant, a scale relativity emerges: time is "bent" with respect to the Prime Structure Constant (PSC).53 The distortion of the PSC allows us to formulate internal relations within a nested structure and thus translate between LODs. The "bending" of time in relation to the Prime requires congruence on all or several LODs. If such congruence is achieved, say, by an observer who registers the PSC only and disregards the lengths of the intervals that structure "covers", condensation occurs.54 Condensation is a property generated by congruent nestings. It can be measured by the quantities of condensation velocity v(c) and condensation acceleration a(c). The basic quantities for the determination of v(c) and a(c) are t_depth and t_length. The quotient of t_length of LOD 1 and t_length of LOD 2 equals the condensation velocity v(c) for LOD 2 ⊂ LOD 1 (provided the units of both LODs can be converted to one another).55 For scale-invariant structures such as the Koch curve, v(c) is identical with the scaling factor s. Thus, for Figure 3,56 the condensation velocity v(c) for the Koch curve is LOD 1 (4/3) / LOD 2 (4/9) = 3; LOD 2 (4/9) / LOD 3 (4/27) = 3; etc. The condensation acceleration is constant: a(c) = 1.

Figure 3.
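The arithmetic of the preceding paragraph can be checked directly. A minimal sketch (variable names are mine; reading a(c) as the ratio of successive condensation velocities is an assumption on my part): t_length per LOD of the Koch curve is 4/3, 4/9, 4/27, ..., and v(c) is the quotient of successive t_lengths.

from fractions import Fraction

t_length = [Fraction(4, 3**n) for n in range(1, 5)]      # LODs 1..4

# v(c) for LOD n+1 nested in LOD n: quotient of successive t_lengths.
v_c = [t_length[i] / t_length[i + 1] for i in range(len(t_length) - 1)]
print([int(v) for v in v_c])   # [3, 3, 3] -> the scaling factor s

# One reading of a(c): ratio of successive condensation velocities.
a_c = [v_c[i + 1] / v_c[i] for i in range(len(v_c) - 1)]
print([int(a) for a in a_c])   # [1, 1] -> a(c) = 1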

There are TNCs on possible embeddings, generated both by physical structures of the outside world and by the observer's internal differentiation. As Pöppel showed, some of these constraints on our experience of time and duration result from the limitations of our perceptual apparatus. TNCs may be selection effects. Limitations to the structurability of time by the observer are very likely a success story, as they render possible communication, travel, trade and science.

53 Vrobel 2005a, 2006e.
54 Vrobel 1998, 2000.
55 ⊂ denotes "nested in".
56 Vrobel 1998, p. 45.

Without such constraints, no boundaries could have been set up within which limited entities may be defined. Within the framework of my notion of fractal time, LODs and primary experiences of time may be conceived of as TNCs. More complex structures such as metaphors and gestalts also act as TNCs, as they are pre-attentively generated indivisible wholes and thus not accessible to introspective analysis.57

Observer Perspectives: The Fractal Temporal Interface

The terms succession and simultaneity, duration and the Now are not absolute concepts but relative notions which presuppose an observer perspective, as the truth values of statements like "A happens before B", "A is simultaneous with B" or "A covers the same temporal interval as B" are context-dependent, i.e., they only make sense if defined for a particular observer perspective. Therefore, the temporal nesting of events shifts the property of fractality (simultaneity) to the eye of the beholder, the observer-participant. An observer constructs a perspective by generating depth. This is true for both spatial and temporal perspectives. By representing objects at different distances from the observer as being of different sizes, he generates an invisible reference frame. For temporal intervals, the temporal perspective is generated by the nesting of multi-layered signals and the reiterated retention and protention performances the observer carries out. The notion of perspective presupposes simultaneity of different subsets. No perspective would arise if we observed the contents of different subsets successively, as no t_depth would be generated.58 With this in mind, observer types may be described. A non-fractal observer perceives only isolated notes in a tune or isolated events in a time series, and is thus only able to observe successive events. He cannot generate t_depth, as simultaneity and memory formation would be unknown to him. Therefore, he is not capable of generating a Temporal Fractal Perspective through reiterated nesting performances.59 A non-fractal observer lives in an eternal succession of unconnected Nows, in which no learning or reflection can take place. A fractal observer, in contrast, is capable of perceiving events on a number of LODs, and is therefore able to generate a Temporal Fractal Perspective. A fractal observer experiences succession and simultaneity of events directly, in real time. Unless an observer is impaired, e.g., by a neurodegenerative disease, he generates both succession and simultaneity. But although most of us are fractal observers, our awareness of our nested perspective is limited. The suspicion that there are both nested and non-nested perspectives dawns on us when we face situations in which our expectations are not met or in which we stumble over linguistic curiosities. One such curiosity is the name of the yesterday-today-tomorrow shrub. The colour of its blossoms changes within days from deep violet-blue to a light blue and finally to white. However, as this slow fading of the colour is staggered, the shrub displays blossoms of all three colours for a longish period of time. Whoever coined the name yesterday-today-tomorrow shrub seems to have observed it during this period of time on at least two LODs: on both that of the plant as a whole and that of its individual blossoms.

57 Vrobel 2006f.
58 Vrobel 2006d.
59 Vrobel 2005b.

It was an observer with a fractal temporal interface, generated by nested Nows. A non-fractal observer would not come up with a name like yesterday-today-tomorrow shrub, as he would have looked at the individual successive colour change of the blossoms on one LOD only. In order to refer to the past, the present and the future in one Now, it takes a nested interface which provides for both simultaneity and succession. Although our Temporal Fractal Perspective is invisible, it is possible to deduce its existence from examples such as the name of a shrub. Simultaneity is a dimension generated by nesting temporally compatible events, with every embedding performance adding a level of description. The term simultaneity assumes an interfacial cut between an observer and the rest of the world. Simultaneity between events may be seen as the result of successful integration efforts carried out by the observer's brain. In general, fractal observers may be described as agents whose ability to contextualize determines the outcome of their measurements. This also affects our notions of the passage of time and entropy. I have suggested that the arrow of time we perceive (as the passage from the past to the present and into the future) cannot be explained by an increase in entropy, as the latter is an LOD-dependent measure, which depends on the observer's position, his available time window for carrying out a measurement and his interfacial cut. For an observer situated inside a universe which consists of alternatingly nested ice cubes and hot water bottles, entropy may increase or decrease, depending on his position. As this observer cannot determine whether the outermost layer is an ice cube or a hot water bottle, he cannot conclude whether entropy will eventually increase or decrease (assuming such an outer layer exists). All he could use as a measure of change are progress units, which would increase in any case, describing a private arrow of time which does not necessarily correspond to the notion of entropy.60 Against the background of my Theory of Fractal Time, the arrow of time runs from the inside towards the outside of a nesting cascade of Nows (with, in each case, the current Now hosting all previous Nows) rather than from the past via the present into the future. Condensation is not affected by the arrow of time, as it does not imply the notion of succession. It may be experienced by an observer if his internal differentiation matches that of incoming signals from the outside world on nested LODs. This matching of internal and external LODs must occur between scale-invariant nestings in order to render possible translation via a known scaling factor. In nature, scale-invariance usually occurs in the shape of a statistical self-similarity, such as 1/f noise. Exact scale-invariance is an artefact which must be generated both for the observer and his embedding context. For such exact scale-invariance on both the observer's side of the interfacial cut and that of the structure of the embedding context, non-temporal cognition of temporal structures is conceivable.
However, statistical correlations such as 1/f noise, which abound in nature, both within our brains and bodies and in the outside world, also generate scaling patterns.61 (It should be noted that the patterns we describe as being part of the outside world are really inter-subjectively established mental constructs. However, as the naive realism implied in a clear division between inside and outside matches our metaphors, I refer to these concepts for didactic purposes. But the reader should keep in mind that the outside world is a mental construct generated by the observer.) In a condensation scenario, time is "bent" with respect to the PSC.

60 Vrobel 1996, 2006b.
61 van Orden et al 2005.

This allows the observer to catch a glimpse beyond his immediate present: as a result of the structural congruence of his nested LODs, his Now is extended. This glimpse involves no duration, i.e., no extension in t_length – its only extension is in the dimension of t_depth. This reduction of time to simultaneity within the observer's temporal interface renders possible non-temporal access to cognition of temporal structures (for scale-invariant nested structures).62 The complexity an observer's temporal interface thus displays is a relative measure which results from the interaction of that observer with his context. The structure of the interfacial cut may be measured in terms of interface complexity, which relates internal and external scale-invariances in terms of the number of simultaneous one-to-one mappings.63 The higher the number of simultaneous matchings on the respective internal and external LODs, the less complex the interfacial structure. The scalability of observation is based on pattern recognition of the shared process dynamics of the object, the interface and its context.

Participation: Temporal Embedding

An important change in perspective from classical science to relativity was the inclusion of the boundary in the shape of the event horizon. Quantum theory defined this boundary as an interface in which and through which the processes of a system are linked with those of its context. As the linking of processes requires a matching of their metrics and dynamics, their internal temporal structures need to be made explicit. Van Nieuwenhuijze notes that the fractal perspective implies that we not only include the interface in our description of how an object is linked to its context, but also regard the interface itself as being extended.64 Internal and external process dynamics are linked within the interface, whose extension may be described in terms of t_length, t_depth and t_density. Together, these define the dynamics of temporal distortions, i.e., dilation and contraction. The notion of fractal time allows us to render explicit the relation between an object and its context by zooming in on the boundary, the interface where they connect. An analysis of the temporal organization of the observer-participant and his context reveals how the part relates to the whole. There is a direct relationship between the way we define an object and our degree of participation. The latter determines the position of the interfacial cut. Our selection of a specific LOD (or, in the fractal perspective, a number of LODs) determines the processing level(s) on which we distinguish an object from its context. This we tend to do on the basis of our own internal process dynamics, which determines the degree of observer involvement and observer attachment. Observation thus requires process coupling. Van Nieuwenhuijze's approach also takes account of our primary experiences of time. His model conceives of reality as an interference pattern which results from the observer's interaction with the context he is embedded in. This requires differentiating between nested LODs within the observer which correspond to possible ways of interacting with our context. He renders explicit the observer's involvement and states that we need to include this factor as part of our formulations of observation. Every scientific formula thereby needs to contain a component which specifies our degree of involvement.

62 Vrobel 1995.
63 Vrobel 2006c.
64 van Nieuwenhuijze 1998, 2000, and personal communication.

Our capacity for relating to embedding LODs determines the way we make sense of the world and may interact with our surroundings. Conversely, it determines the way we function.65 Therefore, the development of viable physical theories is determined by our capacity to generate a nesting cascade of successful contextualizations and to be aware of our degree of involvement. The need to distinguish between observer types in terms of their internal differentiation and the position of their interfacial cuts results from the degree of contextualization an embedded observer generates. In the next section, I shall describe the idea of an extended observer to illustrate the arbitrariness of the setting of the interfacial cut. Identifying misattributions is a non-trivial task, as individual or collective observer perspectives can only be judged from an exo-perspective, an idealized position and degree of participation. The notion of sheer simultaneity turns the observer into a mere participant, who, in a state of total immersion, loses the capacity to observe.

TNCs Revisited: Embodiment

The gestalts and metaphors we use in constructing our models of reality are complex structures. However, we do not perceive them as such, as they are part of the invisible reference frame within which we perform logical inferences. In fact, they are the indivisible building blocks of our observer perspectives. As TNCs, primary experiences of time belong to this domain. Our interpretations of background/foreground (or the trajector/landmark relation66) also create abstractions and metaphors on which we base our models. These interpretations result directly from the limitations of our embodied cognition. Differing degrees of nesting in an observer perspective, which manifest themselves as perceptions of simultaneous contrasts or the failure to contextualize, also form TNCs.67 The observer types introduced in this review (fractal and non-fractal) are defined by "given" constraints, i.e., limitations we are not aware of.68 We will probably never be able to shed light on the preconditions which form our observer frames. However, we may realize how vulnerable our own observer perspectives are if we consider observers with unusual interfacial cuts, i.e., observers whose ability to distinguish what is part of them and what belongs to the rest of the world appears to be significantly distorted from the norm. In his Berkeley lecture, Metzinger described such extreme observer types.69 On the one hand, there are individuals who deny that parts of their bodies belong to them. For example, a person may claim that one of his legs does not belong to him, although he can clearly see that it is attached to his body. This condition is well documented under the term "neglect".70 On the other hand, there are observers who assign to themselves parts of what most other individuals would describe as belonging to the outside world. They set an interfacial cut which leads them to believe that they are the whole world in the sense that all events are controlled by their volitional acts. Metzinger describes a patient who spends all day staring out of the window, making the sun move across the sky.

65 van Nieuwenhuijze 2000.
66 Lakoff & Núñez 2000.
67 Dakin et al 2005, De Grave 2006, Tschacher 2006.
68 Vrobel 2006f.
69 Metzinger 2006.
70 Metzinger 2006.

This may sound like an extreme case, but then many managers and politicians also delude themselves into believing they are in control of their organizations or their economies. There are less extreme and more familiar examples of extended observers. Think of how men perceive their cars as physical extensions of their bodies: a man in a Ferrari acts differently from one in a VW Beetle. Observer extensions are all around us. They range from clothes, glasses, hearing aids, cyber-goggles, fierce dogs and guns to microscopes, telescopes, gravitational lenses and more complex selection effects such as social and linguistic conventions.71 The extended observer is an important concept, as our perception and description of reality varies significantly, depending on where we set the interfacial cut. The rubber hand illusion illustrates this convincingly:72 a subject is sitting at a table, with his left arm resting on it. His hand is screened off from view and a rubber hand is placed in front of him. The subject is asked to stare at the rubber hand while both the rubber hand and his own hand resting on the table are stroked with paintbrushes simultaneously. The visual feedback links the visual and tactile perceptions (neurons that fire together wire together), so that after a while, subjects reported feeling their own hand being stroked even when only the rubber hand was touched by the paintbrush: the intermodal matching sufficed for self-attribution. This conditioning effect changed the subjects' perspective by shifting the location of the interfacial cut between the observer and the rest of the world. In this context, Metzinger defines the concept of mineness: all representational states which are embedded into the currently active self-model gain the additional higher-order property of phenomenal mineness.73 Whether or not the observer's perspective generated between mineness and non-mineness (or self and non-self) is based on a misattribution is not always easy to reveal. Some misattributions may become hardwired as a result of continuous positive feedback. This may result in rigid observer perspectives which manifest themselves as educated incapacity, religious fanaticism, paranoid tendencies and scientific paradigms. The setting of the interfacial cut generates our observer frame and determines the way we interpret experiences. Simultaneity generates gestalts which may become hard-wired as the result of a conditioning effect. Sheer simultaneity creates a super-gestalt, with the observer incorporating the whole world. In this case, the observer perspective is lost and replaced by mere participation.74 These gestalts are the TNCs which shape our temporal observer perspectives, within the framework of which we try to generate a consistent interface reality for ourselves and construct scientific theories. Although it is a start, the fractal observer perspectives described above are only partly subject to introspective analysis, e.g., in terms of observer embeddedness, the setting of the interfacial cut and the complexity of the observer's Now. This is also true of the temporal perspectives of the scientists whose theories generate the prevailing paradigm.

71 Vrobel 2006b.
72 Botvinick & Cohen 1998.
73 Metzinger 2003, 2006.
74 Vrobel 2007.

Conclusion

The notion of fractal time described in this paper takes account of our primary experiences of time. An endophysical perspective such as the one implied in my Theory of Fractal Time takes account of the incompatible temporal extensions of succession and simultaneity and defines observer types in terms of their temporal fractal observer perspective. In order to account for our primary perceptions of time, the Now must be assumed to be extended and to display a nested structure. The ontological implications of the concepts of time presented in this review differ with respect to the observer perspective and the attainability of endo- and exo-perspectives. A strong case can be made in favour of a fractal model of time, as it is a generalization which includes both the Newtonian continuum and relativity as special cases. The notions of time implied in these theories, as well as those assumed in quantum theory and Prigogine's internal time T, are secondary constructs which are based on our embodied primary experience of time. My Theory of Fractal Time describes some of the invisible constraints which govern our observation.75 The observer position, the number of LODs taken into account and the setting of the interfacial cut all determine the outcome of a measurement (an observation). In general, it may be stated that any attempt to analyze a time series, in fact, any modelling of reality, must take account of the constraints on our embodied cognition which come in the shape of LODs, such as interfacial cuts, our internal differentiation and our degree of conscious and unconscious contextualization, which generate our temporal observer perspectives. Therefore, the notion of time introduced in my Theory of Fractal Time may be seen as a prerequisite for theories of fractal space-time.76 The structures of time and space are mental constructs.77 They emerge as the result of the circular causation generated by embodied interaction and cognition.78 Our theories are thus anthropocentric and are limited by constraints resulting from our cognition, which has evolved to optimize the survival of our bodies, not to catch a glimpse of the Platonic realm of ideas. As we have no access to the thing-in-itself, the noumenon, a phenomenological description of the world, based on the gestalts and metaphors our embodied cognition has generated, is all we can build our models on: interface reality. The rest is faith and speculation. However, there is no need to despair. If we agree that our models of reality are limited by TNCs resulting from embodied cognition and that our inter-subjectively generated reality does not describe the noumenon, we may happily continue to indulge in naive realism. As Metzinger pointed out, naive realism is an extremely useful perspective for our phenomenal selves to deal with the world.79 After all, it is the result of a long selection process. And as long as it works, we may as well stick with it. We may never be able to access the preconditions of perception and processing, let alone access the noumenon.

75 Vrobel 1999.
76 "More generally speaking one could say that a 'fractal time' (cf. [Vrobel 1995]) serves as an explanation of fractal space-time."
77 Lakoff and Núñez 2000, Storch et al 2006, van Nieuwenhuijze 1998.
78 Storch et al 2006.
79 "... naive realism has been a functionally adequate assumption for us, as we only needed to represent the fact 'there’s a wolf there' (...) not 'there’s an active wolf representation in my brain now'." (Metzinger 2006)

We may never be able to access the preconditions of perception and processing, let alone access the noumenon. But as long as we are aware that interface reality is all we may talk about, we may as well enjoy the limitations of our condition humaine:

"I am plagued by doubts. What if everything is an illusion and nothing exists? In that case, I definitely overpaid for my carpet."80

References

Allen, W. (1986): Without Feathers. Sphere Books, London, p. 6.
Barnsley, M. (1988): Fractals Everywhere. Academic Press, pp. 176ff.
Bergson, H. (1909): Einführung in die Metaphysik. Diederichs, Jena.
Bieri, P. (1972): Zeit und Zeiterfahrung. Suhrkamp, Frankfurt.
Botvinick, M. & J. Cohen (1998): Rubber Hands 'Feel' Touch That Eyes See. In: Nature, Vol. 391, p. 756.
Cramer, F. (1988): Chaos und Ordnung. DVA, Stuttgart.
Dakin, S. et al (2005): Weak suppression of visual context in chronic schizophrenia. In: Current Biology, 15, pp. R822-R824.
De Grave, D. (2006): The implosion of reality. Schizophrenia, the anterior cingulate cortex and anticipation. In: CASYS ’05 – Seventh International Conference on Computing Anticipatory Systems. Conference Proceedings, HEC-Lug, Liège, Belgium.
Dubois, D. M. (1998): Introduction to Computing Anticipatory Systems. In: International Journal of Computing Anticipatory Systems, Vol. 2. Edited by Daniel M. Dubois, pp. 3-14.
Dubois, D. (2001): Incursive and Hyperincursive Systems, Fractal Machine and Anticipatory Logic. CASYS 2000. AIP Conference Proceedings 573, pp. 437-451.
Einstein, A. (1988): Relativity – The Special and the General Theory. Methuen, London (this translation was first published in 1920 by Methuen & Co. Ltd).
El Naschie, M.S. (1995): Young Double-Slit Experiment, Heisenberg Uncertainty Principle and Correlation in Cantorian Space-Time. In: Quantum Mechanics, Diffusion and Chaotic Fractals. Edited by Mohammed S. el Naschie, Otto E. Rössler & Ilya Prigogine. Pergamon, Elsevier Science, pp. 93-100.
El Naschie, M.S. (2004): A Review of E-Infinity Theory and the Mass Spectrum of High Energy Particle Physics. In: Chaos, Solitons and Fractals, Vol. 19, No. 1, Elsevier Science, pp. 209-236.
Everett III, H. (1957): Relative State Formulation of Quantum Mechanics. In: Reviews of Modern Physics, Vol. 29, No. 3, pp. 454-462.
Fauvel, J. et al (2003): Music and Mathematics – From Pythagoras to Fractals. Oxford University Press, p. 27.
Fidelman, U. (2002): Kant, the Two Stages of Visual Search, Quantum Mechanics and Antimatter. In: International Journal of Computing Anticipatory Systems, 13, pp. 274-289.
Fidelman, U. (2004a): Cognitive and Neuropsychological Basis of Quantum Mechanics. Part I: Quantum Particles as Kantian Ideas. In: Kybernetes, 33 (8), pp. 1247-1257.


Fidelman, U. (2004b): Cognitive and Neuropsychological Basis of Quantum Mechanics. Part II: Quantum Mechanical Behaviour of Macroscopic Objects. In: Kybernetes, 33 (9/10), pp. 1463-1471.
Fidelman, U. (2004c): Cognitive and Neuropsychological Basis of Quantum Mechanics. Part III: Antimatter: Preattentional Macroscopic-Like Behaviour of Microscopic Particles. In: Kybernetes, 34 (5), pp. 694-703.
Haken, H. (1995): Erfolgsgeheimnisse der Natur. Synergetik: Die Lehre vom Zusammenwirken. Rowohlt, Reinbek.
Hofstadter, D.R. (1980): Gödel, Escher, Bach: An Eternal Golden Braid. Penguin, p. 686.
Husserl, E. (1928/1980): Vorlesungen zur Phänomenologie des inneren Zeitbewußtseins. First published in 1928. Niemeyer.
Lakoff, G. & R. E. Núñez (2000): Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being. Basic Books.
Lorenz, E.N. (1963): Deterministic Nonperiodic Flow. In: Journal of the Atmospheric Sciences, 20, pp. 130-141.
Mandelbrot, B. B. (1982): The Fractal Geometry of Nature. W.H. Freeman, San Francisco.
Metzinger, T. (2003): Being No One. MIT Press, Cambridge, MA.
Metzinger, T. (2006): Being No One: Consciousness, the Phenomenal Self and the First-Person Perspective. Foerster Lectures on the Immortality of the Soul, The UC Berkeley Graduate Council. http://video.google.com/videoplay?docid=-3658963188758918426&q=neuroscience
Newton, I. (1962): Mathematical Principles of Natural Philosophy. Translated by A. Motte, revised and annotated by F. Cajori. University of California Press, Berkeley, Vol. 1, p. 6. (First published in 1687.)
Nottale, L. (2001): Scale Relativity, Fractal Space-Time and Morphogenesis of Structures. In: Sciences of the Interface. Edited by H.H. Diebner, T. Druckrey, P. Weibel. Genista, Tübingen, pp. 38-51.
Nottale, L. (2002): Relativité, être et ne pas être. In: Penser les limites. Écrits en l’honneur d’André Green. Delachaux et Niestlé, Paris, p. 157.
Nottale, L. et al (2002): Développement Humain et Loi Log-Périodique (by Roland Cash, Jean Chaline, Laurent Nottale, Pierre Grou). In: C.R. Biologies, 325. Académie des Sciences / Éditions Scientifiques et Médicales Elsevier SAS, pp. 585-590.
Olsen, L.F., H. Degn, A.V. Holden (1987): Chaos in Biological Systems. Plenum Press (NATO ASI Series).
Ord, G.N. (1983): Fractal Space-Time: A Geometric Analogue of Relativistic Quantum Mechanics. In: Journal of Physics A: Mathematical and General, Vol. 16, The Institute of Physics, pp. 1869-1884.
Pöppel, E. (1989): Erlebte Zeit und die Zeit überhaupt: Ein Versuch der Integration. In: Die Zeit. Dauer und Augenblick. Edited by H. Gumin, H. Meier. Piper, München.
Pöppel, E. (2000): Grenzen des Bewußtseins – Wie kommen wir zur Zeit, und wie entsteht die Wirklichkeit? Insel Taschenbuch, Frankfurt.
Prigogine, I. & I. Stengers (1985): Order out of Chaos. Fontana, London.
Prigogine, I. (1985): Vom Sein zum Werden. Piper, München.
Rosenblueth, A. & N. Wiener (1945): The Role of Models in Science. In: Philosophy of Science, 12, pp. 316-322.

Rössler, O.E. (1995): Intra-Observer Chaos: Hidden Root of Quantum Mechanics? In: Quantum Mechanics, Diffusion and Chaotic Fractals. Edited by Mohammed S. el Naschie, Otto E. Rössler & Ilya Prigogine. Pergamon, Elsevier Science, pp. 105-112.
Rössler, O.E. (1996): Relative-State Theory: Four New Aspects. In: Chaos, Solitons and Fractals, Vol. 7, No. 6, Elsevier Science, pp. 845-852.
Rössler, O.E. (1998): Endophysics. World Scientific, Singapore.
Storch, M. et al (2006): Embodiment. Die Wechselwirkung von Körper und Psyche verstehen und nutzen. Huber, Bern.
Timar, P. (2003): La transitionnalité de l’espace-temps psychique. In: Psychanalyse et Sciences de la Complexité. Société Psychanalytique de Paris, Science et Culture. http://www.spp.asso.fr/Main/PsychanalyseCulture/SciencesDeLaComplexite/Items/3.htm
Timashev, S.F. (2006): Flicker Noise Spectroscopy and its Application: Information Hidden in Chaotic Signals (Review). In: Russian Journal of Electrochemistry, Vol. 42, No. 5, MAIK Nauka/Interperiodica, Russia, pp. 424-466.
Tschacher, W. (2006): Reduced Perception of the Motion-Induced Blindness Illusion in Schizophrenia. In: Schizophrenia Research, 81, Elsevier, pp. 261-267.
Van Nieuwenhuijze, O. (1998): Integral Health Care. Dissertation draft, unpublished.
Van Nieuwenhuijze, O. (2000): Option, Choices, Doubts and Decisions (Precisioning the Pivot Point of Power). In: CASYS ’00 – Fourth International Conference on Computing Anticipatory Systems. Conference Proceedings, HEC-Lug, Liège.
Van Orden, G.C. et al (2005): Human Cognition and 1/f Scaling. In: Journal of Experimental Psychology: General, 134, American Psychological Association, pp. 117-123.
Varela, F.J. (1979): Principles of Biological Autonomy. North Holland.
Vrobel, S. (1995): Fraktale Zeit. Draft dissertation, unpublished.
Vrobel, S. (1996): Ice Cubes And Hot Water Bottles. In: Fractals. An Interdisciplinary Journal on the Complex Geometry of Nature, Vol. 5, No. 1. World Scientific, Singapore, pp. 145-151.
Vrobel, S. (1998): Fractal Time. The Institute for Advanced Interdisciplinary Research, Houston.
Vrobel, S. (1999): Fractal Time and the Gift of Natural Constraints. In: Tempos in Science and Nature: Structures, Relations, Complexity. Annals of the New York Academy of Sciences, Vol. 879, pp. 172-179.
Vrobel, S. (2000): How to Make Nature Blush: On the Construction of a Fractal Temporal Interface. In: Stochastics and Chaotic Dynamics in the Lakes: STOCHAOS. Edited by D.S. Broomhead, E.A. Luchinskaya, P.V.E. McClintock and T. Mullin. American Institute of Physics, New York, pp. 557-561.
Vrobel, S. (2004): Fractal Time and Nested Detectors. In: Proceedings of the First IMA Conference on Fractal Geometry: Mathematical Techniques, Algorithms and Applications. De Montfort University, Leicester, pp. 173-188.
Vrobel, S. (2005a): Reality Generation. In: Complexity in the living – a problem-oriented approach. Edited by R. Benigni, A. Colosimo, A. Giuliani, P. Sirabella, J.P. Zbilut. Rapporti Istisan, Rome, pp. 60-77.
Vrobel, S. (2005b): The Nested Structure of the Now. Paper presented at the Ninth Annual Conference of the Consciousness and Experiential Psychology Section of the British Psychological Society: Reconstructing Consciousness, Mind and Being. University of Oxford.

Vrobel, S. (2006a): A Nested Detector with a Fractal Temporal Interface. In: CASYS ’05 – Seventh International Conference on Computing Anticipatory Systems. Conference Proceedings, HEC-Lug, Liège.
Vrobel, S. (2006b): A Description of Entropy as a Level-Bound Quantity. In: CASYS ’05 – Seventh International Conference on Computing Anticipatory Systems. Conference Proceedings, HEC-Lug, Liège.
Vrobel, S. (2006c): Nesting Performances Generate Simultaneity: Towards a Definition of Interface Complexity. In: Cybernetics and Systems, Vol. 2. Edited by Robert Trappl. Austrian Society for Cybernetic Studies, Vienna, pp. 375-380.
Vrobel, S. (2006d): Fractal Time in Cognitive Processes. Paper presented at the 2nd International Nonlinear Science Conference, Heraklion.
Vrobel, S. (2006e): Temporal Observer Perspectives. In: SCTPLS Newsletter, Vol. 14, No. 1. Society for Chaos Theory in Psychology & Life Sciences, October 2006.
Vrobel, S. (2006f): Simultaneity and Contextualization: The Now’s Fractal Event Horizon. Talk held at the 13th Herbstakademie: Cognition and Embodiment. Ascona, Switzerland.
Vrobel, S. (2007, forthcoming): Sheer Simultaneity: Fractal Time Condensation (invited paper). Talk held at Intersymp 2006, Baden-Baden. The International Institute for Advanced Studies in Systems Research and Cybernetics (IIAS) Conference Proceedings.
Wheeler, J.A. (1994): Time Today. In: Physical Origins of Time Asymmetry. Edited by J.J. Halliwell, J. Pérez-Mercader, W.H. Zurek. Cambridge University Press, Cambridge.
