
Here we put on record contributions originally submitted for a workshop, "Gene Concepts in Development and Evolution II," held in the fall of 1996. In the meantime, the main body of the workshop material has undergone changes and matured into a volume due to appear under a similar title from Cambridge University Press in the coming year. In this preprint we include commentaries on the original contributions, made during the workshop or thereafter, that will not be part of the published volume.

Peter J. Beurton, Raphael Falk, Hans-Jörg Rheinberger

GENE CONCEPTS IN DEVELOPMENT AND EVOLUTION

GENE CONCEPTS: FRAGMENTS FROM THE PERSPECTIVE OF MOLECULAR BIOLOGY Hans-Jörg Rheinberger

Comments by Michel Morange

THE DEVELOPMENTAL GENE CONCEPT: HISTORY AND LIMITS Michel Morange

Comments by Scott F. Gilbert

DECODING THE GENETIC PROGRAM Evelyn Fox Keller

Comments by Thomas Fogle

THE DISSOLUTION OF CODING IN Thomas Fogle

Comments by Fred Gifford

A UNIFIED VIEW OF THE GENE, OR HOW TO OVERCOME REDUCTIONISM Peter John Beurton

Comments by James R. Griesemer

Comments by Thomas Fogle

Addresses

GENE CONCEPTS: FRAGMENTS FROM THE PERSPECTIVE OF MOLECULAR BIOLOGY

Hans-Jörg Rheinberger

"[It is] the vague, the unknown that moves the world" (Bernard 1954, p. 26).

ABSTRACT

The paper is divided into three parts. In the first part, I argue for an epistemology of the imprecise and try to characterize the historical and disciplinary trajectory of gene representations as the trajectory of an exemplary boundary object. In the second part, I follow the apparently simple solution to the gene problem to which early molecular biology gave rise, and I retrace some of the steps and events through which the later development of molecular biology came to explode this simple notion. The last part derives some conjectures from this story and seizes upon the notion of "integron" developed by François Jacob in order to establish a symmetrical perspective on genomes and phenomes.

1. INTRODUCTION

In what follows I intend to cast some light on the changing epistemic and experimental dispositions through which molecular biology came to deal with genes. The paper is meant neither as a systematic assessment nor as a critique of the way molecular biology has appropriated this concept. Nor will I be able to retrace the history of the gene as an object of experimentation in molecular biology in all its complexity. My concern in this overview is to point out, in a loose and associative fashion, some questions that I think will have to be addressed if we wish to understand where the second half of the twentieth century has taken us with respect to that unit which Herman J. Muller, on the occasion of the fiftieth anniversary of the rediscovery of Mendel's pea work, described in the following words:

The real core of gene theory still appears to lie in the deep unknown. That is, we have as yet no actual knowledge of the mechanism underlying that unique property which makes a gene a gene - its ability to cause the synthesis of another structure like itself, in which even the mutations of the original gene are copied. ... We do not know of such things yet in chemistry (Muller 1951, pp. 95-96).

These remarkable sentences were written in 1950. What has molecular biology taught us since then about that "unique property which makes a gene a gene"? No geneticist would repeat Muller's chemical ignoramus today. Yet we will see that, instead of solving the riddle of the gene and rescuing it once and for all from the "deep unknown," molecular biology has managed to redefine its properties and its boundaries, and it has continued to change our conception of this strange entity repeatedly and almost beyond recognition.

2. EPISTEMOLOGY: FLUCTUATING OBJECTS AND IMPRECISE CONCEPTS

If there are concepts endowed with organizing power in a research field, they are embedded in experimental operations. The practices in which the sciences are grounded engender epistemic objects, epistemic things as I call them, as targets of research. Despite their vagueness, these entities move the world of science. As a rule, disciplines become organized around one or a few of these "boundary objects" that underlie the conceptual translations between different domains (Star and Griesemer 1988). For a long time, in physics, such an object has been the atom; in chemistry, the molecule; in classical genetics, it became the gene. It is the historically changing set of epistemic practices that gives contours to these objects. According to received accounts, which I need not question here in depth, the boundary object of classical genetics has worked as a formal unit: that which, in an ever more sophisticated context of breeding experiments, accounts for the appearance or disappearance of certain characters that can be traced through subsequent generations. Accordingly, what made classical genetics different from nineteenth-century inquiries into heredity is that its practice allowed it to combine the notion of character discreteness, rooted in the Darwinian and early De Vriesian traditions, with Weismann's distinction between germ plasm and body substance. The result, read back into Mendel's experiments, was a deliberate distinction between genetic units and unit characters; taken in their entirety, between genotype and phenotype, respectively.

This is a reminder, and trivial to that extent. Let me now give an equivalent caricature of what molecular genetics has contributed to the field. At the beginning, molecular genetics, with its set of biochemical practices and genetic manipulations, was characterized by a switch from higher plants and animals to bacteria and phages as model organisms. First, it transformed its boundary object, the gene, into a material, physico-chemical entity. Second, it made this object a unit endowed with informational qualities. The first transformation provided a solution to the problem that classical genetics had with the stability of its units. The answer was: Genes consist of metastable macromolecules of the nucleic acid type. The second transformation provided a solution to the problem that classical genetics had with its units' mode of reproduction, and with the connection between genotype and phenotype. The answer was: Nucleic acid sequences, and DNA in particular, can be replicated specifically and faithfully by virtue of the stereochemical properties of their building blocks. In addition, DNA stretches specify traits by virtue of the ordered sequence of nucleotides they contain, which is translated, with the help of a complex cytoplasmic machinery, into a corresponding sequence of amino acids that yields structural proteins or enzymes catalyzing all sorts of metabolic reactions.

What I would like to stress against this excessively schematic and simple-looking outline, so suggestive in its clarity and distinctness after the event, is that the fruitfulness of boundary objects in research does not depend on whether they can be given a precise and codified meaning from the outset. Stated otherwise, it is not necessary, indeed it can be rather counterproductive, to try to sharpen the conceptual boundaries of vaguely bounded research objects while they are in operation. As long as the objects of research are in flux, the corresponding concepts must remain in flux, too (Elkana 1970). Boundary objects require boundary concepts. The fruitfulness of such concepts depends on their operational potential. "All definitions of the gene require operational criteria" (Portin 1993, p. 174). It is these criteria that make them work as definitions, if at all.

The spectacular rise of molecular biology has come about without a comprehensive, exact, and rigid definition of what a gene is. As I will trace in the historical section of this essay, this claim can be substantiated for both aspects distinguishing the gene concept of molecular biology from that of classical genetics: the aspect of representing a material entity, and that of being a carrier of information (Sarkar 1996). The meaning of both of these notions has remained fuzzy and tied to the experimental spaces that the new biology was going to explore, from the identification of DNA as the hereditary material in bacteria in 1944 to the genome sequencing projects of the late 1980s. I am even inclined to postulate that attempts at precise definitions have generally worked as epistemological obstacles (Bachelard 1957), or as theoretical artifacts at best, such as the early efforts to provide a purely quantitative definition of biological information in the sense of information theory (Kay, forthcoming). Just to give one more example: On the basis of his mapping experiments, Seymour Benzer tried in 1955 to clarify the messy field and to dissolve the gene into three units: of hereditary expression or function, of recombination, and of mutation. He called these units "cistron," "recon," and "muton," respectively (Benzer 1955; Holmes, this volume). No doubt, the distinctions were theoretically justified. They relied on the most advanced experiments in phage genetics. A cistron was analyzable into recons, a recon into mutons, presumably ending with the simplest possible unit, a DNA base pair. But despite their clarity, for the variegated community of molecular experimentalists, these distinctions proved to be too restrictive, partially redundant, and prone to conflation. They did not catch on in the long run.

I contend, in contrast to other authors of this volume, that it is not the task of the epistemologist either to critique or to try to specify vague concepts in the hope of helping scientists to clarify their convoluted minds and to do better science with them. There is an urgent need, however, to understand how and why fuzzy concepts work in science (Moles 1995). Instead of trying to codify precision of meaning, we need an epistemology of the vague and the exuberant. Boundary objects and boundary concepts operate on, and derive their power from, a peculiar epistemic tension: To be tools of research, they must reach out into the realm of what we do not yet know. I would like to characterize this tension as "contained excess." François Jacob, in a similar context, has spoken of a "game of the possible" (Jacob 1981). If we look at contemporary textbooks written by researchers from the forefront of molecular biology, we find very sloppy definitions of the "gene," if we find definitions at all. Quite obviously, molecular genetics is not held together by such a definition. Let us take this as a lesson about the dynamics of science instead of accusing the actors of being careless in defining the entities they work with. It is revealing and intriguing indeed that, e.g., in the glossary appended to a recent book on the Secrets of the Gene written by the French molecular biologist and former director of the Pasteur Institute, François Gros, there is not even an entry for the term "gene" (Gros 1991). There is an entry, however, for the term "genome," to which I will come back. Nor is this a new feature of recent literature. In the glossary to Leslie Dunn's 1965 classic A Short History of Genetics, we find, at the end of the entry "gene," the following caveat: "At present, discussions of properties to be explained are more useful than attempts at rigid definition" (Dunn 1991, p. 234).

Do molecular biologists need a unified and generalized gene concept? As mentioned, if we screen the pertinent literature, there appears to be no singular, unique, and rigidly determined usage of the term. What we find is context-dependence (Fogle, this volume). Despite its unifying appeal, molecular biology is made up of many different contexts, in terms of disciplinary contributions, of experimental systems, and of genome reading conditions. Boundary objects and related concepts work because they are malleable and can be adapted to the varying needs of these different contexts. But in order to grant a certain coherence to the field, these translations must remain reversible. We may say that two strategies can be found at work here. One consists in trying to give boundary objects sharper contours according to specific experimental contexts. Paradoxically, but as a rule, this leads to the elimination from discourse of the objects so defined. To give an example: The attempt to use ribosomal RNA as a template for bacterial protein synthesis led Marshall Nirenberg and Heinrich Matthaei to the characterization of a non-ribosomal messenger RNA and, consequently, of a non-template ribosomal RNA (Rheinberger 1997, chapter 13). In lucky cases such as this one, such shifts in the "reference potential" of scientific expressions (Kitcher 1982) go along with the emergence of unprecedented new objects that are at least as vague as those from which the search began. The other strategy consists in immunizing the conceptual framework against the fluctuations that result from the first strategy. In the case mentioned above, microsomal templates remained an option for all those whose work was based on eukaryotic cells. Maintaining such conceptual indeterminacy over a period of time, however, requires a certain level of imprecision, and thus of fuzzy boundaries.

Molecular biology is a hybrid science combining experimental systems from biophysics, biochemistry, and genetics, and it uses widely different model organisms in its search for biological function at the molecular level. Not surprisingly, it presents itself as conceptually hybrid as well, which is not to say that it lacks consistency. Precisely as a result of its translational power, the discourse of molecular biology pervades contemporary biology as a whole, including evolution. The message I would like to disseminate is that we have to learn more about such hybrid consistencies: How they come about, how they work, and how they evolve.

Let me give a few examples of fragmented perspectives on genes, both enabled and constrained by experimental systems situated in these different domains. For a biophysicist working with a crystalline DNA fiber and an x-ray apparatus, a gene might be sufficiently characterized by a particular conformation of a double helix. If asked, he or she might define a gene in terms of the atomic coordinates of the nucleic acid bases. For a biochemist working with isolated DNA fragments in the test tube, genes might be sufficiently defined as nucleotide polymers exhibiting certain stereochemical features and recurrent sequence patterns. The biochemist can reasonably try to give a macromolecular definition of the gene, based on the unique chemical features of DNA. For a molecular geneticist, genes might be defined as informational elements of DNA that eventually give rise to specific functional or structural products: Transfer RNAs, ribosomal RNAs, enzymes, and proteins destined to serve other purposes. Molecular geneticists will certainly insist on considering these issues in terms of replication, transcription, and translation, and will require examination of the products of hereditary units when speaking of genes. For those interested in evolution, genes might be the products of mutated, reshuffled, duplicated, transposed, and rearranged bits of DNA within a complex chromosomal environment that has evolved through differential reproduction, selection, or other evolutionary mechanisms (Beurton, this volume). Therefore, evolutionists will rely on concepts such as transmission, lineage, and historical contingency. For developmental geneticists, genes might be sufficiently described, on the one hand, as hierarchically ordered switches that, when turned on or off, induce differentiation, and on the other hand, as patches of instructions that are realized in synchrony through the action of these switches (Gilbert and Morange, this volume). Thus, developmental biologists are likely to refer to the regulatory aspects of genetic circuitry when defining a gene or a larger transcriptional unit such as an operon. We could go on and add more items to the list.


Is it necessary or even desirable to have a unified concept of the gene in order to tie all these disciplinary specializations together and to develop them in a coordinated fashion? From a historical perspective, it can be stated that, obviously, this has not been the case in the half century since molecular biology came into existence. Quite frankly, I do not think that such a concept would have helped the development of the field in appreciable ways; further, I contend that an attempt to forge one today would produce nothing more than an exercise in rhetoric. The coherence of molecular biology is not tied to an axiomatic structure or an algorithm; it is embedded in a complex set of experimental systems, each with its generic epistemic practices, that have evolved over time and that have constrained earlier interpretations as well as allowed new ambiguities to arise. Genes as we now know them are boundary objects par excellence, crafted less by any theory than by the practices and instruments that helped to create the new biology.

3. HISTORY: EARLY RIPPLES AND RECENT WAVES¹

As mentioned at the beginning, the development of gene concepts looks rather straightforward from a sufficiently distant historical vantage point, from which surprising convergences can be observed post hoc. It is arguable, however, whether we should reconstruct the history of genetics as a history of gene concepts. My contention is that such a reconstruction would amount to a fallacious epistemological artifact if it were not tied to a history of the usages of these concepts. The following staccato account cannot do full justice to this claim. It inevitably cuts short laborious and errant experimental explorations that proceeded from widely different starting points and sometimes were not even motivated by genetic questions at the outset (for a more detailed account, see Morange 1994).

At the beginnings of molecular genetics, as distinguished from classical genetics, stands the search for a materially grounded, molecular constitution of the units of heredity. In the late 1920s, H. J. Muller as well as Lewis Stadler had shown that X-rays can be used to mutate genes (Muller 1927, Stadler 1928). But for the majority of the classical geneticists involved in extended crossing and breeding experiments, at least in the first three decades of this century, the material constitution of the gene was a problem that needed no answer within the realm of their own experimental regimes. For Max Delbrück, a physicist who turned to biology in the 1930s, however, this became the central target to be addressed. As far as his reasoning went, genes could be assumed to be autocatalytic proteins that had the capacity to become permanently altered through physical intervention. After some early attempts at determining the gene as an elementary physical unit (Timofeeff-Ressovsky, Zimmer, and Delbrück 1935), he set out to investigate phages, which he imagined as being the smallest naturally occurring equivalents of genes (Delbrück 1942). It is one of the many ironies of the history of molecular biology that phage as an experimental system became as formal as the systems of classical genetics. What physics contributed was a set of rigorous measurement techniques. Only after a long detour, and in conjunction with other developments, did phage research finally come to contribute to the physical elucidation of the gene.

¹ This heading has been inspired by the title of a review by Paul Zamecnik on "Protein synthesis -- early waves and recent ripples" (Zamecnik 1976).

A milestone in the transition from classical genetics to molecular genetics was reached with George Beadle and Edward Tatum's one gene-one enzyme hypothesis. It emerged from the hybrid discipline of biochemical genetics based on Neurospora crassa as a model organism (Beadle 1945). The one gene-one enzyme hypothesis is quite different from the one gene-one character relation of early classical genetics (Schwarz, this volume). As a rule, enzymes intervene in the metabolic production of characters, or phenes. One particular character can depend on a whole cascade of enzymatic reactions, something envisaged by Alfred Kühn as "gene action chains" acting on "substrate chains" (Kühn 1941). One mutable entity, in turn, can affect many characters if the respective enzyme (or gene product) acts at the basis of metabolic bifurcations, thus explaining the long-observed phenomenon of pleiotropy. Still, neither Beadle and Tatum nor Kühn, in approaching genes by tracing their biochemical products, came anywhere near throwing light on the physical agency of the genes they were investigating.

The new genetics had to be confronted with Oswald Avery's transformation experiments with different types and strains of Pneumococcus (Avery, MacLeod, and McCarty 1944), as well as with the chemist Erwin Chargaff's analysis of the specific base composition of deoxyribonucleic acid toward the end of the 1940s, before genes started to be addressed as made of DNA instead of as autocatalytic proteins (Vischer, Zamenhof, and Chargaff 1949). The one gene-one enzyme hypothesis now took the form of one segment of chromosomal DNA-one protein enzyme.

At this point, determining the molecular structure of DNA became a desirable, if not urgent, task. I have to cut short here the long biochemical and physical story of a molecule known to be a constituent of the nucleus since the last decades of the nineteenth century (Portugal and Cohen 1977). Until the end of the 1940s, however, this story had hardly been connected to the discourse of genetics, if at all. It belonged instead to structural chemistry and was part of textile fiber research. When James Watson and Francis Crick, together with Maurice Wilkins and Rosalind Franklin, presented their evidence for a double helix model of DNA in 1953, a straightforward mode of replication for a specified class of biological macromolecules could be envisaged. With that, the riddle of autocatalysis, that is, the replication of genes, could be conceived in terms of base pairing via hydrogen bonds, a stereochemical feature that became known as base complementarity (Olby 1994).

Heterocatalysis, on the other hand, that is, the mechanism by which genes give rise to their products, would occupy molecular biologists, geneticists, and especially biochemists for the next twenty years. Another strand of inquiry, until then completely disconnected from genetics, namely the biochemical analysis of how proteins are fabricated in the cell, began to be woven into the texture of molecular biology. Despite many efforts to solve the "problem of coding" theoretically (Sarkar 1996), it was largely in the context of biochemical experimental systems that, in the course of the 1950s, the notion of genetic "information" and genetic information transfer took on a generic biological meaning, resulting in what Francis Crick eventually codified as the central dogma of molecular biology: DNA is transcribed into messenger RNA, and messenger RNA is translated into protein. This "flow of information" was envisaged as being irreversible.

Around 1960, the principle of colinearity between a particular sequence of DNA and a corresponding sequence of amino acids had gained plausibility and became crucial for defining what a gene is on the molecular level: A finite, linear sequence of nucleotides that carries the instruction for a corresponding, finite, and linear amino acid sequence of a polypeptide. The contiguous array of nucleotides was envisaged to be translated according to a nucleotide letter code. The demonstration of a strict colinearity between gene (polynucleotide) and gene product (polypeptide) (Yanofsky et al. 1964) was indeed essential for establishing the experimental regime that finally, around 1966, led to the completion of the genetic code. Without the postulate of colinearity, "code" would have remained an empty concept. Colinearity became a notion of operational power, if only temporarily, until the unexpected complexity of the eukaryotic genome came to be recognized.

With Marshall Nirenberg and Heinrich Matthaei's finding, in 1961, that a uniform stretch of nucleic acid consisting of uridine residues can be translated into a uniform stretch of protein consisting of phenylalanine residues in a test-tube protein synthesis system, the new molecular genetics had reached the heyday of its precocious simplicity (Nirenberg and Matthaei 1961). The experimental strategies of molecular biology appeared to have come to full fruition. The gene had first become a physico-chemical molecule, and subsequently a carrier of sequence information. During that same year, however, matters had already started to appear more intricate and more entangled. François Jacob and Jacques Monod presented their operon model for the regulation of the production of a few sugar-metabolizing enzymes (Jacob and Monod 1961). The news was that genes come in two classes, one structural, the other regulatory, and that operator regions had to be envisaged on parts of the chromosomal DNA that did not code for polypeptides but were nevertheless essential for the regulation of gene expression.

Since then, these "non-coding" but specific regulatory DNA elements have proliferated almost beyond enumeration. There are promoter and terminator sequences; upstream and downstream activating elements in upstream and downstream transcribed but untranslated regions; there are leader sequences; externally and internally transcribed spacers before, between, and after structural genes; there are interspersed repetitive elements and tandemly repeated sequences such as satellites, LINEs (long interspersed nuclear elements), and SINEs (short interspersed nuclear elements) of various classes and sizes, whose function, given all their bewildering details, is still far from being understood (Fischer 1995). Are these batteries of elements to be counted as part and parcel of genes, or not?

On the level of posttranscriptional modification, the picture has become equally diffuse, messy, and complicated. It was soon realized that DNA transcripts such as transfer RNA and ribosomal RNA had to be trimmed and matured in a complex enzymatic manner to become functional molecules, and that messenger RNAs of eukaryotes underwent extensive posttranscriptional modification at both their 5'-ends (capping) and their 3'-ends (polyadenylation) before being ready to enter the translation machinery. In the late 1970s, to the surprise of everybody, molecular biologists had to acquaint themselves with the idea that eukaryotic genes were composed of modules, and that, after transcription, "introns" were cut out and "exons" spliced together in order to yield a functional message. This was one of the first major scientific offshoots of recombinant DNA technology. With that, the colinearity postulate that had been so crucial for the early experimental history of the code receded into the background. What was a gene? Was it the contiguous DNA stretch from which the primary transcript was derived, or was it the spliced messenger, which sometimes comprised as little as ten percent or less of the primary transcript? Since the late 1970s, we have become familiar with various kinds of RNA splicing: Autocatalytic self-splicing; alternative splicing of one single transcript to yield different messages; and even trans-splicing of different primary transcripts to yield one hybrid message. The egg-laying hormone of Aplysia, for example, is telling in this respect: One and the same stretch of DNA finally gives rise to eleven protein products involved in the reproductive behavior of this sea slug (Gros 1991, pp. 492-499).

Let me mention here one last mechanism that has recently been found to operate on the level of RNA transcripts. It is called "messenger RNA editing" (for a review, see Adler and Hajduk 1994). In this case, which has meanwhile turned out to be more than an exotic curiosity of some trypanosomes, the original transcript is not merely cut and pasted, but its nucleotide sequence is systematically altered after transcription. The nucleotide replacement happens before translation starts, and is mediated by various guide RNAs and enzymes that excise old and insert new nucleotides in order to yield a product that is no longer complementary to the DNA stretch from which it was originally derived. What is a gene?

The trouble with the gene in molecular biology continues on the level of translation. There are findings such as translational starts at different start codons on one and the same messenger RNA. There are instances of "obligatory" frameshifting within a given message, without which a nonfunctional polypeptide would result. There is posttranslational protein modification, such as the removal of amino acids from both the amino terminus and the carboxy terminus of the translated polypeptide. There is an observation, made within the last few years, of what is now being called "protein splicing": Portions of the original translation product have to be catalytically or autocatalytically cleaved and joined together in a new order before yielding a functional protein (Cooper and Stevens 1995). And finally, the news of the year from the translational field is that a ribosome can manage to translate one single polypeptide by accommodating two different messenger RNAs: thus a case of "trans translation," to use Raymond Gesteland's term (Atkins and Gesteland 1996).

So, how shall we define the gene? Its autocatalytic property has been relegated to the DNA at large. Which properties define the heterocatalytic entities? Which sequence elements are to be included, and which excluded? On the middle ground of transcription, ambiguities multiply. On the other side of the big divide between putative genes and correlated phenes, that is, the proteins, the ambiguities do not come to an end either, as we have seen. There seems to be an end, however, in that the final protein products should function after all. François Gros, after a long life in molecular biology, has come to the rather paradoxical and heretical-sounding conclusion that the gene is specified, if at all, by "the products that result from its activity," that is, the RNA molecules and proteins to which it gives rise (Gros 1991, p. 297). But is such a retrodefinition satisfactory? Fogle (this volume) argues to the contrary. Furthermore, what are we going to do with all the nonfunctional products caused by DNA mutation and by transcriptional and translational errors?

Let me come to one last point. With eukaryotic model organisms moving center stage, the genome as a whole has, over the years, assumed a more and more flexible and dynamic configuration. Not only have the mobile genetic elements that Barbara McClintock characterized some fifty years ago in maize gained currency in the form of transposons that regularly and irregularly become excised and inserted all over the genome. There are also other forms of gene shuffling at the DNA level. A gigantic amount of somatic gene tinkering and DNA splicing is involved in organizing the immune response, that is, in giving rise to the production of potentially millions of different antibodies. No genome would be large enough to cope with this task if the parcelling of genes and the permutation of their parts had not been invented. Gene families contain silenced genes (sometimes called pseudogenes); we find jumping genes; and we find polymorphism on the DNA level, that is, multiple genes and isoforms. In short, there is a whole battery of mechanisms and entities constituting what could be called hereditary "respiration," or breathing.

Molecular biologists have scarcely scratched the surface and have not even started to understand this flexible genetic apparatus in terms of evolution and in terms of development. In order to arrive at an adult stage or to give rise to viable offspring, these gene-phene or phene-gene complexes have to be reproduced in their proper context: their genomic as well as their cellular and intercellular environment. The development of molecular biology itself, an enterprise so often described as an ultimately reductionistic conquest, has made it impossible to think of the gene any longer as a continuous piece of DNA matter colinear with a piece of protein matter, defined by a ready-made instruction laid down in its nucleotide sequence, and contained between a precise start and end point. Today it has become more reasonable, and it might even turn out to be sufficient, to speak of genomes, or at least of "genetic material" (Kitcher 1982, p. 357), instead of genes, in the developmental as well as in the evolutionary dimension. It has become evident that the genome is a dynamic body of ancestrally tinkered pieces and forms of iteration. Genome sequencing combined with intelligent comparison may bring out more of this structure.

If there is a chance to understand evolution beyond the classical synthesis, it is from the perspective of this dynamic configuration. The purported elementary building block of this complex machinery, the simple gene, is not the prime element of the genetic process, but merely one component in a huge arsenal of DNA tinkering. Likewise, the comparatively simple arrangement of genes in the bacterial genome may not reflect a primitive simplicity, but may rather be the byproduct of some billion years of streamlining. We have come a long way with molecular biology from genes to genomes. There is still a longer way to go from genomes to organisms, one that will require the efforts of a new generation of molecular developmental biologists, and the path from there to populations and communities, and vice versa, will not be shorter and will be left to yet another generation.

The foundering of "The Gene" seems to be well under way (Burian 1985; Falk 1984, 1986; Carlson 1991; Fischer 1995). Recently, Jürgen Brosius and Stephen Jay Gould have come up with a new terminology. They propose to abandon the gene concept altogether and to call "nuon" any segment of DNA that has a recognizable structure and/or function (such as a coding segment, a repetitive element, or a regulatory element). By duplication, amplification, recombination, retroposition, and similar mechanisms, nuons can give rise to "potonuons," that is, entities potentially recruitable as new nuons. These in turn may be transformed either into "naptonuons," that is, elements that dissipate their non-adaptive former information without acquiring new information, or else into "xaptonuons," that is, elements exapted to a new function (Brosius and Gould 1993). Tempting as this evolutionary genome terminology may be, its operationality remains to be tested. One of its drawbacks may be that it remains entirely restricted to the level of adaptation at the DNA.

4. INTEGRONS, AND SOME CONJECTURES

We are farther than ever from the gene as a simple and single bit of DNA carrying the information for a simple and single, colinear polypeptide chain. There is one thing, however, that appears to have survived all the recent turmoil around the foundering of the gene ("le gène éclaté," the exploded gene; Gros 1991, chapter 7), and it conveys a certain legitimacy to the unbroken gene talk of genetic engineering, with its naturalistic and often deterministic overtones. This is the so-called central dogma of molecular biology as defined by Francis Crick some forty years ago. The postulate claims that the flow of genetic information is strictly unidirectional and irreversible; it goes from DNA to protein and not the other way round. Whatever may happen along the way, at the DNA level, at the RNA level, or at the protein level, "once 'information' has passed into protein it cannot get out again" (Crick 1958, p. 153). Although there is a complicated machinery of enzymes and DNA-binding proteins capable of replicating, recombining, and manipulating DNA, and despite considerable evidence for directed mutation, there is no hard evidence for an environmentally or somatically guided, "intelligent" alteration of, and retroaction on, the genetic material. Despite epigenetic inheritance mechanisms that can carry patterns of gene activation and repression, such as methylation patterns, into the next generation(s) (Jablonka and Lamb 1995; Russo, Martienssen, and Riggs 1996), a functionally improved protein that could bring about a corresponding sequence change in its DNA region has so far not been found. Although this Lamarckian dream has never ceased to be dreamed, to this day nobody has had an inkling of how such a mechanism could work in a biochemically specifiable manner. There appears to have evolved no protein code for producing sequence-specific strings of DNA.

But this hard core of Weismann's legacy notwithstanding, the time has come to treat the genome and the phenome as reciprocally important partners from the perspective of an integrated ensemble. With Evelyn Fox Keller, we can state that "owing in large part to the reemergence of , molecular biology may well be said to have 'discovered the organism'" again. But we should also take seriously her reminder that "the subject of the new biology, however whole and embodied, is only a distant relative to the organism that had occupied an earlier generation of embryologists" (Keller 1995, p. 117). It has become a body pervaded by a whole set of linguistic metaphors, above all: information, code, signalling, and communication.

I propose to return to a suggestion that François Jacob made some 25 years ago and to speak of "integrons" instead of genes and gene products (Jacob 1970). Think of a piece of DNA that gives rise to a DNA-polymerizing enzyme, and forget for a moment that this already involves a complex transcription and translation machinery; the polymerase in turn gives rise to more DNA of the same kind. Any mutation in this DNA that raises the efficiency of the product (the polymerase making more DNA) will lead to the selection of this basic "integron" among its competitors under conditions in which space and resources are limited. Given such a "hypercycle" (Eigen and Schuster 1979), the information said to be contained in the gene takes on meaning. In its simplest form, this meaning is confined to the capacity of proliferation. Even in this simple case, more is involved than just information in the quantitative sense that information theory gives to this notion. Whatever escape route we may try, genetic information cannot be dissociated from biological meaning. It has to make "sense" in terms of a biological function. Even when rejecting the strictly hierarchical aspects of Michael Polanyi's theory of boundary conditions, we can retain his claim that there is a "semantic relation" between the genetic level and the organismic level, in that the latter conveys "meaning" to the former (Polanyi 1969, p. 236). In assuming such a relation, however, we are in the realm of the symbolic. After all, organisms may be viewed as symbolic machines, even if their only basic signifying activity is to further signify, that is, to reproduce. Just as there is no text without context, "a molecule does not become a message because of any particular shape or structure. [A] molecule becomes a message only in the context of a larger system of physical constraints which I have called a 'language'" (Pattee 1969). But let us remain cautious.
Biologists have always borrowed widely from other fields in shaping their vocabulary. Ultimately, however, the objects of their study have also always forced them to transcend analogies. Neither natural languages nor technical information-processing systems are isomorphic with the generation and transmission of "information" in living systems. In the end, it is these differences that count. Let us assume that meaning is something transiently generated at the intersection between different chains of signifiers. If we look for meaning in the organism, we must look not at its genes, but at the multiple interfaces between the genome and the body (Griesemer, in press).

In order to avoid the ambivalence of talking about information, some prefer to speak of instruction ("Bauanleitung," building instructions, in the words of Manfred Eigen; Eigen 1987). But in using the performative idiom, one has to be careful not to equate instruction with the notions of a "blueprint" or a genetic "program." For, as Jacob reminds us, the "program" of a living being always also needs the products of its execution in order to be read and executed (Jacob 1970, p. 318). Henri Atlan, therefore, is inclined to reverse the metaphor and to see the whole metabolic machinery of the organism as constituting a "program," and the genes as "data" fed into the machinery (Atlan and Koppel 1990). Where such a suggestion might lead is open to debate. More is open to debate. The biosphere as a whole might be conceived as an ensemble of biosemiotic regimes on many different levels, from biochemical signalling to human and machinic communication (Eder and Rembold 1992). In this perspective, the hereditary process might finally be viewed as just one (although a very basic) form of, to turn around the title of J. L. Austin's classic, "How to Do Words with Things."

You may expect me to come up with a nice solution to this meandering story of the gene at the end. As far as the scientific story goes, there is none. As to an epistemological take-home lesson, I have one. Alas, it is a very disappointing message for non-deconstructivists. Paraphrasing the virologist André Lwoff, the early mentor of François Jacob at the Pasteur Institute in Paris and teacher of Gunther Stent, I might say: A gene is a gene is a gene.1 This is a strong claim (see also Kitcher 1992). Taken seriously, it means that in science every presumed referent is turned into a future signifier.

The gene has been a powerful epistemic entity in the history of heredity, with all the vagueness that is characteristic of such entities. It is tempting to generalize this statement and to assume that fruitful scientific concepts are bound to be polysemic. I will resist this temptation and assure my critics that precision continues to be a value in science. But precision itself has historically changing boundaries. Assessing what it means to be fuzzy, instead of eliminating vagueness altogether and implementing precision, has become a major concern in fields such as AI research. Lotfi Zadeh claims that

there is a rapidly growing interest in inexact reasoning and processing of knowledge that is imprecise, incomplete, or not totally reliable. And it is in this connection that it will become more and more widely recognized that classical logical systems are inadequate for dealing with uncertainty and that something like fuzzy logic is needed for that purpose (Zadeh 1987, p. 27).

On a methodological level, and in contrast to the technical solution that the syntax of fuzzy logic has to offer, the question arises whether, in order to understand conceptual tinkering in research, we need more rigid metaconcepts than the first-order concepts that we, as epistemologists, analyze. I am inclined to deny this. Why, after all, should historians and epistemologists be less imprecise, less operational, and less opportunistic than scientists? But let our narratives not be less coherent than our protagonists' stories either! Let us only be aware that "there is an incompatibility between precision and complexity. As the complexity of a system increases, our ability to make precise and yet non-trivial assertions about its behavior diminishes" (Zadeh 1987, p. 23). The history of the gene could contribute to the exploration of this epistemological principle of uncertainty and show how to manage complexity, not by striving for a "Theory of Everything," but by looking for patterns of "moderate compressibility" (Coveney and Highfield 1995, p. 39).

1 André Lwoff concluded his Marjorie Stephenson Memorial Lecture in April 1957 with, as he put it, the "prosy, coarse and vulgar" statement: "Viruses should be considered as viruses because viruses are viruses" (Lwoff 1957, p. 252).

REFERENCES

Adler, Brian K., and Stephen L. Hajduk. 1994. Mechanism and origins of RNA editing. Current Opinion in Genetics and Development 4: 316-322.

Atkins, John F., and Raymond F. Gesteland. 1996. A case for trans translation. Nature 379: 769-771.

Atlan, Henri, and Moshe Koppel. 1990. The cellular computer DNA: Program or data. Bulletin of Mathematical Biology 52: 335-348.

Avery, Oswald T., Colin M. MacLeod, and Maclyn McCarty. 1944. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Journal of Experimental Medicine 79: 137-158.

Bachelard, Gaston. 1957. La formation de l'esprit scientifique. Paris: Vrin.

Beadle, George W. 1945. The genetic control of biochemical reactions. The Harvey Lectures 40: 179-194.

Benzer, Seymour. 1955. Fine structure of a genetic region in bacteriophage. Proceedings of the National Academy of Sciences of the United States of America 41: 344-354.

Bernard, Claude. 1954. Philosophie. Manuscrit inédit. Edited by J. Chevalier. Paris: Editions Hatier-Boivin.

Brosius, Jürgen, and Stephen Jay Gould. 1993. Molecular constructivity. Nature 365: 102.

Burian, Richard. 1985. On conceptual change in biology: The case of the gene. In Evolution at a Crossroads: The New Biology and the New Philosophy of Science, edited by D. J. Depew and B. H. Weber. Cambridge, MA: MIT Press.

Carlson, Elof A. 1991. Defining the gene: An evolving concept. American Journal of Human Genetics 49: 475-487.

Cooper, Antony A., and Tom H. Stevens. 1995. Protein splicing: Self-splicing of genetically mobile elements at the protein level. Trends in Biochemical Sciences 20: 351-356.

Coveney, Peter, and Roger Highfield. 1995. Frontiers of Complexity: The Search for Order in a Chaotic World. London: Faber and Faber.


Crick, Francis H. C. 1958. On protein synthesis. Symposia of the Society for Experimental Biology London 12: 138-163.

Delbrück, Max. 1942. Bacterial viruses (bacteriophages). Advances in Enzymology 2: 1-32.

Dunn, Leslie C. 1991 (1965). A Short History of Genetics. Ames, Iowa: Iowa State University Press.

Eder, J., and H. Rembold. 1992. Biosemiotics - a paradigm of biology. Die Naturwissenschaften 79: 60-67.

Eigen, Manfred, and Peter Schuster. 1979. The Hypercycle: A Principle of Natural Self-Organization. Heidelberg: Springer.

Eigen, Manfred. 1987. Stufen zum Leben. München: Piper.

Elkana, Yehuda. 1970. Helmholtz' 'Kraft': An illustration of concepts in flux. Historical Studies in the Physical Sciences 2: 263-298.

Falk, Raphael. 1984. The gene in search of an identity. Human Genetics 68: 195-204.

Falk, Raphael. 1986. What is a gene? Studies in the History and Philosophy of Science 17: 133-173.

Fischer, Ernst Peter. 1995. How many genes has a human being? The analytical limits of a complex concept. In The Human Genome, edited by E. P. Fischer and S. Klose. München: Piper.

Griesemer, James R. In press. The informational gene and the substantial body: On the generalization of evolutionary theory by abstraction. In Varieties of Idealization, edited by N. Cartwright and M. Jones. Amsterdam: Editions Rodopi.

Gros, François. 1991. Les secrets du gène. Paris: Editions Odile Jacob.

Jablonka, Eva, and Marion J. Lamb. 1995. Epigenetic Inheritance and Evolution: The Lamarckian Dimension. Oxford: Oxford University Press.

Jacob, François, and Jacques Monod. 1961. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology 3: 318-356.

Jacob, François. 1970. La logique du vivant. Paris: Gallimard.

Jacob, François. 1981. Le jeu des possibles. Paris: Fayard.

Kay, Lily E. Forthcoming. Who Wrote the Book of Life? A History of the Genetic Code. Chicago: Press.

Keller, Evelyn F. 1995. Refiguring Life: Metaphors of Twentieth-Century Biology. New York: Columbia University Press.

Kitcher, Philip. 1982. Genes. The British Journal for the Philosophy of Science 33: 337-359.


Kitcher, Philip. 1992. Gene: Current usages. In Keywords in Evolutionary Biology, edited by E. F. Keller and E. A. Lloyd. Cambridge, MA: Harvard University Press.

Kühn, Alfred. 1941. Über eine Genwirkkette der Pigmentbildung bei Insekten. Nachrichten der Akademie der Wissenschaften in Göttingen, Mathematisch-Physikalische Klasse: 231-261.

Lwoff, André. 1957. The concept of virus. Journal of General Microbiology 17: 239-253.

Moles, Abraham A. 1995. Les sciences de l'imprécis. Paris: Seuil.

Morange, Michel. 1994. Histoire de la biologie moléculaire. Paris: Editions la Découverte.

Muller, Hermann J. 1927. Artificial transmutation of the gene. Science 66: 84-87.

Muller, Hermann J. 1951. The development of the gene theory. In Genetics in the 20th Century, edited by L. C. Dunn. New York: Macmillan.

Nirenberg, Marshall W., and J. Heinrich Matthaei. 1961. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the United States of America 47: 1588-1602.

Olby, Robert. 1994. The Path to the Double Helix. New York: Dover.

Pattee, Howard. 1969. How does a molecule become a message? Developmental Biology Supplement 3: 1-16.

Polanyi, Michael. 1969. Life's irreducible structure. In Knowing and Being: Essays by Michael Polanyi, edited by M. Grene. Chicago: University of Chicago Press.

Portin, Petter. 1993. The concept of the gene: Short history and present status. The Quarterly Review of Biology 68: 173-223.

Portugal, Franklin H., and Jack S. Cohen. 1977. A Century of DNA. Cambridge, Mass.: MIT Press.

Rheinberger, Hans-Jörg. 1997. Toward a History of Epistemic Things: Synthesizing Proteins in the Test Tube. Stanford: Stanford University Press.

Russo, Vincenzo E. A., Robert A. Martienssen, and Arthur D. Riggs. 1996. Epigenetic Mechanisms of Gene Regulation. New York: Cold Spring Harbor Laboratory Press.

Sarkar, Sahotra. 1996. Biological information: A skeptical look at some central dogmas of molecular biology. In The Philosophy and History of Molecular Biology: New Perspectives, edited by S. Sarkar. Dordrecht: Kluwer Academic Publishers.

Stadler, Lewis J. 1928. Genetic effects of x-rays in maize. Proceedings of the National Academy of Sciences of the United States of America 14: 69-75.


Star, Susan Leigh, and James R. Griesemer. 1988. Institutional ecology, 'translations' and boundary objects: Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907-39. Social Studies of Science 19: 387-420.

Timofeeff-Ressovsky, Nikolaj, Karl Zimmer, and Max Delbrück. 1935. Über die Natur der Genmutation und der Genstruktur. Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen, Mathematisch-physikalische Klasse, Fachgruppe VI, Nachrichten aus der Biologie, N.F. 1: 190-245.

Vischer, Ernst, Stephen Zamenhof, and Erwin Chargaff. 1949. Microbial nucleic acids: The desoxypentose nucleic acids of avian tubercle bacilli and yeast. Journal of Biological Chemistry 177: 429-438.

Yanofsky, Charles, B. C. Carlton, J. R. Guest, D. R. Helinski, and U. Henning. 1964. On the colinearity of gene structure and protein structure. Proceedings of the National Academy of Sciences of the United States of America 51: 266-272.

Zadeh, Lotfi A. 1987. Coping with the imprecision of the real world. In Fuzzy Sets and Applications: Selected Papers by Lotfi A. Zadeh, edited by R. R. Yager, S. Ovchinnikov, R. M. Tong, and H. T. Nguyen. New York: John Wiley.

Zamecnik, Paul C. 1976. Protein synthesis: Early waves and recent ripples. In Reflections in Biochemistry, edited by A. Kornberg, B. L. Horecker, L. Cornudella, and J. Oró. New York: Pergamon Press.

Comments by Michel Morange

At the beginning of his text, Hans-Jörg remarks that his only concern is to point out some questions. In fact, H.-J. raises many questions, only some of which I will deal with here. I would like to divide this discussion into three parts: 1) H.-J. Rheinberger's historical account of the way the gene concept was transformed by Molecular Biology; 2) the present status of the gene concept, with an analysis of the relationships between genes, genomes, and organisms; and 3) the epistemological consequences of these analyses: what general lessons can we learn from the specific study of the gene concept? These three parts do not completely match the presentation adopted by H.-J., but I felt that organizing my comments in this way would clarify the discussion.

I

The first part is itself subdivided into two periods: from the birth of Molecular Biology to 1960, and from 1960 to the present. During the first period, Molecular Biology answered two important questions that had been neglected by "classical genetics": the physico-chemical nature of genes and their mechanisms of action. Molecular biologists showed that genes were at the same time material entities (made of DNA) and carriers of information. This history has already been extensively examined, and it is obviously not the most original part of H.-J.'s contribution. I agree with him on his choice of the major steps in this history. I simply have some doubts that, for Max Delbrück, the central target to be addressed was the molecular constitution of the units of heredity. My feeling is that Max Delbrück's main concern was the search for the physical principles explaining the two main characteristics of genes, mutability and self-replication, rather than the molecular constitution of the gene itself (Kay 1985; Fischer and Lipson 1988). This explains why phage studies became as formal as classical genetics: this evolution is neither a paradox nor an irony of the history of Molecular Biology, but was somehow already contained in Max Delbrück's initial project.

However, I totally agree with H.-J. that the difficulties appeared with the work of Jacob and Monod on the operon in the early sixties. The problem did not lie in the distinction between regulatory and structural genes (both code for proteins), but in the association of genes in an operon and, above all, in the existence of regulatory, non-coding DNA sequences, the first to be characterized being the operator. Studies of higher organisms have since revealed the importance and complexity of these non-coding regulatory sequences. Their existence makes the definition of gene boundaries very difficult. This first attack on the simplicity of the gene concept did not remain isolated and, as Thomas Fogle also does (Fogle, this volume), H.-J. provides us with a clear, complete, and up-to-date description of all the observations that loosen the definition of what a gene is: the existence of post-transcriptional RNA modifications, the discovery of genes-in-pieces, the occurrence of alternative RNA splicing, which generates two or more different protein products from a single gene, and the phenomenon of editing. A similar loosening of the relationship between genes and proteins has also been demonstrated at the translational level, for instance with the quite recent discoveries of protein splicing and trans translation.

All these discoveries, which make the question "what is a gene?" increasingly difficult to answer, are nicely described by H.-J. Rheinberger. My only concern is their "real" importance. In principle, the characterization of even a single DNA fragment that is read in two different frames and gives rise to two different proteins is sufficient to show that the definition of a gene as a fragment of DNA coding for a protein is no longer valid. In practice, however, these different discoveries do not carry the same weight. The abundance and diversity of regulatory sequences and the fragmentation of genes are general phenomena. Alternative splicing is a more limited phenomenon, exploited by Nature to generate, from one single gene, proteins with slightly different but related properties. Trans splicing and editing are limited to some organisms and genes. Their existence does not prevent molecular biologists from neglecting them in their daily work when they interpret their data.

II

This point leads directly to the second part of my comments on H.-J. Rheinberger's contribution. Must this "shake-up" of the gene concept lead to a reevaluation of the importance of the genome as a whole, and to a reequilibration of the relationships between genes, genomes, and organisms?

H.-J. describes the experiments that demonstrated the flexibility of the genome, in particular the evidence for the existence of mobile genetic elements. As in the previous case, I am in total agreement with H.-J.'s description of these mechanisms, but I remain doubtful about their importance. H.-J. tells us that "a gigantic amount of somatic gene tinkering and DNA splicing is involved in organizing the immune response." Gigantic is the appropriate adjective if one refers to the number of sequences generated by this system. But such a mechanism remains limited to the genes coding for immunoglobulins and immune cell receptors.

More generally, I remain doubtful whether the genome is really more than genes, and whether changes in its organization during evolution will reveal more than the study of "isolated genes."


H.-J. says that "genome sequencing combined with intelligent comparison will bring out more of this (the genome) structure." As he probably did, I eagerly looked at the recently obtained first complete genomic sequences, bacterial as well as eukaryotic (yeast). These works were probably not "intelligent" enough, because so far they have revealed nothing significant concerning a supragenic order (Fleischmann et al. 1995; Goffeau et al. 1996). As H.-J. tells us, there is probably a very long way to go from genes to genomes.

What about the next step, from genomes to organisms? H.-J. shows that there was a progressive reequilibration between genes, genomes, and organisms. As already discussed by Evelyn Fox Keller (1995), this reequilibration resulted from the increasing interest of molecular biologists in unraveling the complexity of developmental mechanisms. Will this reequilibration reach the point where, as Henri Atlan proposes, the cell, the fertilized egg, will be the program, and the genes only the data fed into the machinery? H.-J. does not adopt this position. He argues that linguistic and informational metaphors have to be used very carefully: information in living beings has meaning only in the larger context of the physical constraints of the organism.

Whatever the reequilibration between genes, genomes, and organisms ultimately turns out to be, H.-J. shows that Weismann's legacy still remains a hard core of present-day biology, actualized as it was in the central dogma of Molecular Biology. Information goes from genes to proteins, not in the reverse direction. No "intelligent" alterations of genes and genomes have so far been described. There is no doubt that Weismann's legacy leaves an asymmetry in the relationships between genes and genomes on one side and the organism on the other. However, there are interesting recent results, not discussed by H.-J., which show that organisms are able to adapt their mutation rates and to reorganize their genomes in response to environmental conditions (Keller 1992; Shapiro 1995). Living beings seem able to control the strength of evolution, not its direction.

III

In the third part of this discussion, I would like to focus on the gene concept and on what the analysis of this concept tells us about Science. My guess is that this part of H.-J.'s contribution is the most original, the one which deserves the most extensive discussion.

In a very vivid and well-informed manner, H.-J. demonstrates that the definition of what a gene is differs among researchers working in different areas of molecular biology. Since a gene is an epistemic object, embedded in the different practices of biologists and molecular biologists, its definition is as diverse as these practices. In developmental biology, for instance, the regulatory aspects of genetic circuitry are emphasized. I think that most of us will agree with the description of the gene concept as a loose, dirty concept, a "concept in flux," as Yehuda Elkana called it, a boundary object able to circulate between the different communities of biologists. My only question concerning H.-J.'s analysis, which is marginal to the discussion of the gene concept, is whether molecular biology is a hybrid discipline. Historically, it was. Today, I doubt its hybrid nature, even if the researchers who call themselves molecular biologists use different techniques and have different interests.

But the most original contribution of H.-J. is not his diagnosis of the gene as a boundary object, a loose concept, but the way he considers the "solution" of the present crisis and its interest for science studies in general. H.-J.'s opinion is very clear: any attempt to clarify the gene concept would have been, and would still be, useless in the best case and, in the worst, an obstacle to past and present progress. Paraphrasing André Lwoff, H.-J. says that the only good definition of a gene is that a gene is ... a gene. H.-J. shows that the conceptual clarifications introduced by S. Benzer in the fifties had no real impact, at least on the language of biologists. I fear, as H.-J. does, that the new terminology introduced by Jürgen Brosius and Stephen Jay Gould to characterize the behavior of DNA fragments during evolution (the nuon, the potonuon, the naptonuon, and the xaptonuon) will meet a similarly unfavorable fate (Brosius and Gould 1992).

I agree with H.-J. that we must renounce a normative epistemology that gives advice to scientists or criticizes their way of doing things. What has to be done, in Science as well as in Epistemology or History of Science, is not to clean up the dirty concepts, but to understand why such fuzzy concepts work so efficiently: in what respect they make it possible to contain what H.-J. calls the "excess" of Nature, and in what respect they help to counter its underdetermination. This problem is probably central to our gene project. It raises in my mind three different questions and/or comments that I would like to submit to H.-J. for consideration.

The first concerns the interest of epistemic objects such as genes. They have no well-defined contours, because they are embedded in different experimental practices. However, I am not convinced that all definitions of a gene refer to operational criteria. Genes are more than the simple sum of experimental practices. What is this "more"? What role does it play in the development of knowledge?

My second point concerns the cases (rare or not; I have no present estimate) in which scientists, or scientists and philosophers, agree to give a precise definition of these dirty concepts. I can mention the case of the adaptive enzymes: in 1952, Sol Spiegelman, J. Monod, and the most influential researchers in this field agreed to substitute the term "inducible enzyme" for the ambiguous "adaptive enzyme" (Cohn et al. 1953). I also want to mention a more recent example: molecular and developmental biologists agreed on a restrictive use of the concept of homology (Reeck et al. 1987). Why was such an agreement on a restrictive use of the concept of homology possible in this case, while it has so far been impossible for the concept of the gene? Where does the difference lie?

My last comment concerns the only statement in H.-J.'s contribution with which I truly disagree. Since scientific concepts are, and have to be, vague, or at least fuzzy, in order to be productive, must we adapt our logical systems to this fuzziness, as H.-J. suggests? Must we adopt a fuzzy logic and use inexact reasoning? I am convinced that the fuzziness of concepts is a necessary consequence of the lack of adaptation of our logic to the "excess" of Nature. We must accept it as a limitation, and to some extent suffer from it, but not change our way of doing things and Science. To support this last claim, and to bring these comments to a conclusion, I would like to quote Jean Jaurès, the French socialist politician killed while trying to prevent the declaration of the First World War. This quotation was placed under the title of Le Populaire, a French socialist newspaper: "Aller à l'idéal et comprendre le réel" (To aim at the ideal, but to understand what reality is). After the discussion which took place during this meeting, I now have some doubts about the reality of ... reality. Nevertheless, I think that we must fight to keep scientific concepts as "clean" as possible, but that we must also realize that it is sometimes better to renounce this aim, and that, in any case, it will always remain an impossible task.

REFERENCES

Brosius, J., and Gould, S. J. 1992. On "genomenclature": A comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proc. Natl. Acad. Sci. USA 89:10706-10710.


Cohn, M., Monod, J., Pollock, M. R., Spiegelman, S., and Stanier, R. Y. 1953. Terminology of enzyme formation. Nature 172:1096-1097.

Fischer, E. P., and Lipson, C. 1988. Thinking about Science: Max Delbrück and the origins of molecular biology. New York: W. W. Norton.

Fleischmann, R. D., et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512.

Fogle, T. The dissolution of protein coding genes in molecular biology. This volume.

Goffeau, A., et al. 1996. Life with 6000 genes. Science 274:546-567.

Kay, L. E. 1985. Conceptual models and analytical tools: the biology of physicist Max Delbrück. J. Hist. Biol. 18:207-246.

Keller, E. F. 1992. Between language and science: the question of directed mutation in molecular genetics. Perspect. Biol. Med. 35:293-306.

Keller, E. F. 1995. Refiguring Life: metaphors of 20th century biology. New York: Columbia University Press.

Reeck, G. R., et al. 1987. "Homology" in proteins and nucleic acids: a terminology muddle and a way out of it. Cell 50:667.

Shapiro, J. A. 1995. Adaptive mutation: Who's really in the garden? Science 268:373-374.

THE DEVELOPMENTAL GENE CONCEPT: HISTORY AND LIMITS

Michel Morange

The concept of the developmental gene occupies a central position in contemporary biological research, at the crossroads between developmental and evolutionary biology. In this brief text, I would like to outline the major steps which led to the formation of this concept and to emphasize some of the difficulties encountered by biologists in its present-day use. Today, all biologists agree that a limited number of genes, highly conserved during evolution, play a major role in the control of development. However, the terminology has not yet stabilized: biologists speak of master genes, switch genes, selector genes, control genes, master control genes, key regulatory genes, genes controlling development, developmental genes, developmental regulatory genes, developmental control genes or even morphology sculptors. The history of these different names and their usage would be worthy of a study in itself: the choice of one expression or another is a subtle way to acknowledge a debt towards one or another of the traditions which led to the formation of the developmental gene concept.

I. HISTORICAL FORMATION OF THE CONCEPT

The separation which occurred at the beginning of the 20th century between the science of heredity, genetics, and embryology has been well documented (Allen 1978). The scientific and sociological reasons behind this separation have been analyzed (Sapp 1983, 1987; Gilbert in press). This separation was, however, specific to American genetics; in Germany, the two disciplines remained tightly linked (Harwood 1993). The first attempts to close this rift were made during the thirties. After the publication of Embryology and Genetics (Morgan 1934), in which the "Pope of Genetics", Thomas H. Morgan, advocated that these two disciplines join their efforts but did not really show how to do so, studies were undertaken to bridge the gap (Gilbert 1988, 1991a). Some of these studies were intended to characterize the role of genes in development as a way to uncover the normal physiological function of genes. Such were the studies of Richard Goldschmidt (1938; Allen 1974) and of George Beadle (Kohler 1991; Burian, Gayon and Zallen 1991). The idea of a specific developmental function of genes, distinct from their normal physiological function, was absent from the writings of these authors: in fact, these works led to the characterization of the one gene-one enzyme relationship, a major step for the young discipline of molecular biology, which nonetheless did not provide any immediate clue to the role of genes in development.

The studies performed by Paul Chesley (1935) and Salomé Gluecksohn-Schoenheimer (1938, 1940; Gilbert 1991a) on the brachyury mutation (and later by S. G.-S. on the kinky mutation (1949)) attempted to correlate the alterations of development observed in these mutants with the mechanisms revealed by the German school of embryology headed by Hans Spemann (1938; Saha 1991). The continuous isolation by Calvin B. Bridges, but also by many other geneticists (Bridges 1917; Li 1927; Balkaschina 1929; Bridges and Dobzhansky 1933; Waddington 1939), of mutations deeply affecting the structure of Drosophila and its development, the work of Walter Landauer on the creeper factor in fowl (1944), and the studies of D.F. Poulson (1937) systematically correlating chromosomal modifications with alterations in the development of Drosophila, demonstrated the early involvement of genes in development. These studies were fundamental in showing that genes actively participate in the first steps of embryogenesis, and in overturning the ideas of the most conservative embryologists, who thought that genes were responsible only for the determination of the most superficial characteristics of the organism. During these years (1915-1950), the mutations that would play a major role in the later history of developmental biology, and that affect what would later be considered developmental genes, were described. However, the characterization of these genes did not immediately lead to the idea that a specific class of genes was involved in development. Some genes (those affected by these mutations) were considered to play definite roles in the embryonic developmental process, whereas others were thought to affect more superficial characteristics. Yet there was a continuum between the two categories of genes, and the mechanism of action of the different genes was thought to follow the same rules.

The work initiated by C.B. Bridges and pursued by Edward B. Lewis in 1940 on the bithorax mutations led geneticists a step further (Lewis 1992), though without enabling the formation of the developmental gene concept. The careful description and analysis of these different mutations showed that some of them were pseudoallelic and did not obey the rules of independent recombination. To explain this pseudoallelism, and to correlate it with the macroscopic modifications affecting what became the bithorax complex, Ed Lewis (1951) suggested the existence of different but related genes, resulting from the duplication of an ancestral gene: the action of these genes controlled the nature of the segments of the larva or adult. It was, however, wrongly deduced from these observations that the different pseudoallelic genes controlled a series of sequentially related biochemical reactions. The association of the bithorax complex with a metabolic pathway was abandoned in the sixties in favor of an interpretation of the data within the framework of the operon model (Lewis 1963). Only in 1978 did Ed Lewis propose a model (Lewis 1978) that established a correlation between the organization of the genetic material and the construction of the organism. The later success of the studies on the homeotic genes, with their characterization and the discovery of similarly organized genes in higher organisms, must not lead to a retrospective interpretation of these works. The article published in Nature in 1978 was difficult to read, and contained an incorrect estimate of the number of genes involved. But most of all, this article made no allusion whatsoever to a possible generalization of the results: the existence of the bithorax complex was linked with the segmented nature of insects (and arthropods); it was not a general solution for developmental biology.

It is obvious, however, that the studies on the bithorax complex emphasized the importance of some genes in controlling the formation of living organisms. The idea that development is controlled by a subfamily of genes can be traced back to the work of François Jacob and Jacques Monod on the regulatory mechanisms operating in micro-organisms (Jacob and Monod 1961; Monod and Jacob 1961; Brenner et al 1990). In fact, by the sixties it was widely accepted that these models, which the Pasteurian molecular biologists had very rapidly extended to higher organisms (Jacob and Monod 1963), provided a solution to the problem raised as early as 1934 by T.H. Morgan: the necessary differential control of gene activity during development and in the different tissues of the adult organism. As early as 1962, the British embryologist Conrad H. Waddington recognized the value of these models for explaining the control of the activity of nuclear genes by signals coming from the cytoplasm, the only way to reconcile the roles of the cytoplasm (emphasized by embryologists) and of the nucleus (favored by geneticists) in the development of the organism (Waddington 1962; Gilbert 1991b).

However, the most important contribution of the models of F. Jacob and J. Monod lay elsewhere: these models distinguished two different kinds of genes in the genome, the structural genes coding for proteins and enzymes, and the regulatory genes coding for repressors, whose only function was to control the activity of structural genes (Jacob and Monod 1959). F. Jacob and J. Monod introduced a hierarchy into the genome (Morange 1990; Thieffry 1996): they suggested that, to understand development, one had to characterize the network of regulatory genes controlling this development, not the details of the functioning of structural genes. It was also implicit in the models elaborated by F. Jacob and J. Monod that modifications of regulatory genes, and of regulatory networks, played central roles in the evolution of living beings. This idea was made explicit only later by F. Jacob (1977, 1981) and was reminiscent of the distinction between micromutations and macromutations advocated by the German geneticist Richard Goldschmidt (1940). The idea of a link between evolution and the modification of regulatory genes was pursued by Mary-Claire King and Allan Wilson (Wilson, Maxson and Sarich 1974; King and Wilson 1975; Wilson, Carlson and White 1977) and, later, by Rudolf Raff and Thomas Kaufman (1983). Like R. Goldschmidt before them, but in contrast to F. Jacob, A. Wilson and M.-C. King correlated the modifications in the developmental regulatory systems with gene and chromosome rearrangements, making this model "heterodox" and unacceptable to most molecular biologists (Wilson, Sarich and Maxson 1974; Wilson et al 1975). Further development of this work was limited by the complete absence of molecular tools allowing the isolation and characterization of these regulatory genes. This probably explains the caution of F. Jacob, as well as his personal reluctance to revive a ghost (the macromutations) and to confront the neo-Darwinists directly (Morange in press).

One of the geneticists who best understood the value of this distinction between structural and regulatory genes was Antonio Garcia-Bellido (Garcia-Bellido, Ripoll and Morata 1973; Garcia-Bellido 1975; Garcia-Bellido 1977; Garcia-Bellido, Lawrence and Morata 1979; Morange 1996). The existence of regulatory genes offered a simple way to explain the puzzling observations on the transdetermination of the cells of the imaginal disks, discovered by Ernst Hadorn (Hadorn 1968) and studied by A. Garcia-Bellido during his stay in Hadorn's laboratory in Zurich. From these data, Stuart A. Kauffman (1973) proposed a genetic model with bistable control circuits. In 1975, after a stay in California with Ed Lewis, A. Garcia-Bellido elaborated a sophisticated and complete model of the genetics of development of Drosophila (Garcia-Bellido 1975, 1977). One of the main components of this model was the existence of a family of genes, the selector genes, which controlled the formation of compartments during Drosophila development. Homeotic genes were selector genes, as were certain other genes. The complexity of the model elaborated by A. Garcia-Bellido, the far from simple way in which it was presented, and its "theoretical value" in the absence of any structural data on the selector genes meant that, despite its publicization by Peter Lawrence and Francis Crick (Crick and Lawrence 1975; Morata and Lawrence 1977), this model had already been forgotten when the tools of genetic engineering allowed the isolation of selector genes (and in particular of the homeotic genes) and revealed their extraordinary conservation during the evolution of living beings. This should not mask the fact that the models of A. Garcia-Bellido offered the first "genetic framework for Drosophila development" (Baker 1978) and paved the way for the later work of Walter Gehring, Christiane Nüsslein-Volhard and Eric Wieschaus on the characterization of the early developmental genes (Nüsslein-Volhard and Wieschaus 1980; Keller 1996).

It is often considered that the importance of C. Nüsslein-Volhard's work derived from her characterization of the maternal genes affecting early development, which would have constituted a sort of revenge of the cytoplasm on the nucleus (Keller 1995). In fact, the Nobel prize was awarded in 1995 for the work on the characterization and classification of the genes responsible for the segmentation of the Drosophila embryo, performed by C. Nüsslein-Volhard and E. Wieschaus when they were at EMBL in Heidelberg. Most of these genes are expressed after fertilization, from the zygotic genome, not in the oocyte. The importance of this work lay in the methodology used, saturation mutagenesis, which allowed these authors to demonstrate that the number of genes responsible for segmentation was limited to a few dozen: early Drosophila development was controlled by a limited number of genes.

In the late seventies and early eighties, all the studies converged to suggest that development was due to the action of a limited number of genes controlling the activity of a battery of structural genes. Does this mean that the first molecular characterization of a master gene in 1984 was simply a verification of this well-accepted model? Such a presentation of the data is illusory, because the main discovery responsible for the rapid formation and acceptance of the developmental gene concept was the demonstration that these genes had been highly conserved during evolution. This conservation was totally unexpected and was never mentioned in earlier articles published by developmental biologists: the development of the fly Drosophila seemed to be totally unrelated to the development of a vertebrate. What was sought instead was the conservation of a logic of development, the structure of a developmental program, and not the conservation of the genes active in this developmental process. This becomes obvious when one considers the animals chosen in the sixties as model systems to study development. Caenorhabditis elegans, adopted by Sydney Brenner, is distantly related to vertebrates and mammals, and its pathway of development is totally different. In the sixties and even later, this was not considered a handicap (Brenner 1984).

The existence of a structural element conserved between different genes involved in Drosophila development was revealed by two different groups in 1984 (McGinnis et al 1984a and b; Scott and Weiner 1984; McGinnis 1994). This conservation was not a surprise, since Ed Lewis had already suggested that the different homeotic genes had formed by successive duplications of an ancestral gene. The demonstration that this conserved motif was found in distantly related organisms - vertebrates such as Xenopus, mouse and humans - was more surprising (McGinnis et al 1984b and c; Carrasco et al 1984). It is possible that the first experiments were performed as negative controls, since the experimenters initially associated the presence of this structural element with the segmental nature of insects and arthropods. That same year, it was discovered that the conserved motif corresponded to a DNA-binding structure, allowing the proteins in which it was located to regulate the expression of other genes by binding to their promoters (Shepherd et al 1984; Laughon and Scott 1984).

It was rapidly shown that this high level of conservation was not limited to the homeotic genes. Other motifs, related but not identical to the homeobox found in homeotic genes, were discovered in other Drosophila developmental genes and shown to have also been conserved during evolution (De Robertis 1994). These included the otx genes, which have a fundamental role in the formation of the head (Finkelstein and Boncinelli 1994; Acampora et al 1995; Matsuo et al 1995; Ang et al 1996), the pax genes (Gruss and Walther 1992), the Lim genes (Shawlot and Behringer 1995), and many other genes. The conservation was not limited to the DNA-binding motif, although it was often higher in this motif than in the surrounding parts of the gene. The conservation also extended to the functions of these genes, which can be successfully exchanged between different organisms such as, for instance, Drosophila and mammals (Malicki, Schughart and McGinnis 1990; McGinnis, Kuziora and McGinnis 1990). Other developmental genes unrelated to the homeobox-containing genes were also characterized and shown to have been conserved during evolution. These genes coded for molecules participating in cell-to-cell communication, such as secreted proteins (hedgehog in Drosophila and the related hedgehog genes in mice, wingless in Drosophila and the corresponding wnt genes in mammals, members of the transforming growth factor families such as decapentaplegic in Drosophila and its homolog BMP-4 in Xenopus) or cell membrane receptors (such as Notch). Perhaps even more important than the conservation of developmental genes between very different organisms such as Drosophila and mouse is the conservation of entire pathways or networks of regulatory genes between these organisms (Artavanis-Tsakonas, Matsuno and Fortini 1995; Goodrich et al 1996; Gaunt 1997). The importance of these genes in development has been confirmed in different experimental systems by the correlation between their expression and the inductive processes described more than 50 years ago by Hans Spemann and his co-workers (Spemann 1938; Saha 1991). This continuity is exemplified by the fact that the first Nobel prize awarded for studies in embryology since H. Mangold and H. Spemann was given in 1995 for work on developmental genes, and in particular homeotic genes.

Ironically, however, history seems to stutter. The first studies of Eddy M. De Robertis suggested that the homeobox-containing gene goosecoid played a central role in the functions of Spemann's organizer (Cho et al 1991). Further studies revealed the involvement of many other proteins and genes, and confirmed the complexity of the inductive processes and the division of Spemann's organizer into independently regulated organizing centers (De Robertis 1995).

However, despite these difficulties, the important result was that, in under five years, a large family of genes coding for proteins essential for the development of Drosophila was also discovered in other organisms and shown to be expressed at specific positions and times during early embryogenesis.


II. THE SIGNIFICANCE OF THE DEVELOPMENTAL GENE CONCEPT

Today, biologists widely share the conviction that they have identified at least some of the genes involved in the control of embryogenesis. Even if different names are used to designate these genes, biologists agree on their central significance and on the fact that their characterization is the best way to unravel the complexity of embryogenesis in all animals.

As we saw previously, the characterization of the developmental genes was the result of a long process, from the initial demonstration of the involvement of genes in development to the isolation, among the thousands of genes harbored in an organism, of a small group of master genes. But, most of all, the importance attributed to the developmental genes resulted from the unexpected, unprecedented and unexplained conservation of these genes. The discovery of developmental genes is clearly a good example of a discovery due not only to a new technology but also to a historical event which cannot be explained by the "strategies" of the participants. As nicely described by Scott Gilbert, John Opitz and Rudolf Raff (1996), the recent rediscovery of morphological areas of biology which had been marginalized by the development of genetics and by the Modern Synthesis of genetics and evolution was important, but was more a consequence of the interest in developmental genes than a cause: it would probably not have been sufficient to give rise to the new developmental biology of today if the extraordinary structural and functional conservation of developmental genes had not convinced biologists of their importance.

Yet the conservation of these developmental genes during evolution, which is certainly the best argument in favor of their primordial role in the development of living organisms, raises a paradox: the same functionally equivalent genes control the development of very different organisms, which are built following different pathways and have different plans of organization (Kenyon 1994).

We will mainly focus our discussion on the homeobox-containing genes of the homeotic complexes: they were discovered first, they are among the best conserved developmental genes, and their organization in the genome has also been conserved. Today, more information has been collected on them than on any other developmental genes. The paradox described above explains the rapid succession of models proposed to interpret the role of these genes in mammals. They are gathered in four homeotic gene complexes, resulting from a two-fold duplication of the complexes present in Drosophila. The first model proposed that these genes had the same functions as in Drosophila and were involved in the segmentation of the mammalian embryo, more precisely in the determination of the nature of its segments (Dressler and Gruss 1988). This model was supported by some knock-out (K.O.) experiments in mice showing that the inactivation of these genes led to homeotic transformations (Le Mouellic, Lallemand and Brûlet 1992). This first model assumed that segmentation in vertebrates is homologous to segmentation in insects and obeys the same mechanisms, a hypothesis which is not unanimously accepted (Hogan, Holland and Schofield 1985). Moreover, homeotic gene complexes were discovered in nonsegmented animals such as C. elegans (Kenyon and Wang 1991). It was therefore proposed that homeotic genes played a role only in the antero-posterior patterning of organisms. The observation that homeotic genes are also expressed in the limbs, and are essential for their formation, led to the elaboration of more sophisticated models in which homeotic genes were responsible for regional specification and/or axial patterning, constituting a positional code used during development (Kessel and Gruss 1991; Hunt and Krumlauf 1992; McGinnis and Krumlauf 1992; Wolpert 1995): homeotic genes might simply be generators of spatial diversity. Recently, these models have been criticized in favor of a new view in which homeotic genes carry the temporal information controlling cell proliferation and the consequent building of the organism (Dollé et al 1993; Duboule 1995).

The same difficulties are met by developmental biologists when they try to correlate modifications in homeotic genes with the morphological modifications which occurred during evolution, for instance during the evolution of vertebrates. Preliminary observations were very attractive: modifications of homeotic gene expression led, in some cases, to atavistic transformations, that is, to the reappearance of structures which had disappeared in the course of evolution (Kessel, Balling and Gruss 1990; Lufkin et al 1992). Since homeotic genes and their organization have been extensively conserved during vertebrate evolution, such transformations can only result from a change either in the regulation of homeotic gene expression or in the nature of the genes regulated by these homeotic genes. Therefore, the important modifications that occurred during evolution, such as the transition from fins to limbs, are explained by the alteration of genes located upstream or downstream of homeotic genes (Sordino, van der Hoeven and Duboule 1995; Carroll 1995), leaving the latter a poorly defined role in development, a kind of empty framework.

Regardless of which interpretation is ultimately adopted to explain the role of the homeotic genes, two hard facts remain. The first is the role of these genes, revealed by the (often) drastic effects resulting from their inactivation. Rather surprisingly, however, few spontaneous mutations in the homeotic genes have ever been described in either humans or mice (Muragaki et al 1996; Mortlock, Post and Innis 1996), whereas many mutations have been discovered in other developmental genes, such as the pax genes. The second is the conservation of the structural organization of the homeotic gene complex in the genome during evolution. From Drosophila to man, the position of a gene inside the complex(es) has remained linked with its level and timing of expression in the embryo. The organization of the homeotic genes violates the particulate, "classical" conception of genes. This supra-genetic organization of homeotic genes might play a role in the control of homeotic gene function. Denis Duboule has related this structural constraint to the embryological constraint which makes the development of all vertebrate embryos similar after the onset of the gastrulation phase (Duboule 1994). Are the structural constraints of development located in the structure of the genome, in the linear nature of the DNA molecule? As before, it is difficult not to acknowledge that the work of developmental biologists, their construction of models and theories, is constrained not only by the techniques at their disposal and the conceptual tools they use but, most of all, by the surprising results they obtained.

The conservation of developmental genes has revealed a hidden conservation of body plans and structures between distantly related living beings. Geoffroy Saint-Hilaire was probably right and Cuvier wrong: the bodies of insects and vertebrates are built according to similar instructions and differ only by an inversion of the dorso-ventral body axis (Arendt and Nübler-Jung 1994; Jones and Smith 1995; De Robertis and Sasai 1996). Similarly, the development of the highly diverse arthropod limbs is underlain by common genetic mechanisms (Panganiban et al 1996). The analysis of the genes involved in the construction of the eyes reveals a commonality (and relationship) of structures that biologists had not expected. It is generally accepted that eyes have been invented many times during evolution (Salvini-Plawen and Mayr 1977). These eyes can adopt similar structures by convergence (as in the cephalopod and vertebrate eyes). On the other hand, the compound eye of insects is constructed according to principles different from those involved in the construction of vertebrate or cephalopod eyes. Despite these well-established facts, similar, related "master" genes seem to be involved in the construction of these different types of eyes. Mutation of one of these genes, belonging to the pax gene superfamily, leads to abnormalities in the formation of eyes in Drosophila (eyeless), humans (aniridia) and mice (small eye) (Quiring et al 1994). Overexpression and ectopic expression of this gene lead to the formation of ectopic eyes in Drosophila (Halder, Callaerts and Gehring 1995a). In this latter experiment, the Drosophila gene can be replaced by the mouse gene, resulting in the formation of ectopic ... Drosophila eyes.
A possible "rational" explanation for the conserved function of this gene is that it was initially essential for the formation of an ancestral sensory organ, and was only later in evolution recruited as the master gene for eye development (Halder, Callaerts and Gehring 1995b; Deutsch and Le Guyader 1995).

The developmental genes are the tools with which living beings have tinkered during evolution. The characterization of structural proteins in distantly related organisms had already shown that the same protein components have been used many times during evolution, for different purposes. For instance, proteins with enzymatic functions have been recruited to form the crystallins which give the eye its transparency (Piatigorsky 1992). The same intracellular signalling pathways are used in C. elegans to regulate the formation of the vulva, in Drosophila to regulate the formation of the photoreceptors, and in mammals to control the rate of cell division (Egan et al 1993). In other cases, the same protein has been used several times to fulfil the same purpose: the protein rhodopsin is responsible for light collection in primitive bacteria as well as in the highly organized vertebrate eye, as if, once this molecule was invented, it was used over and over, each time it was needed. The conservation of the developmental genes is perhaps no more significant than the conservation of these protein components: some specific characteristics of these genes were invented once and used repeatedly during evolution to face similar constraints (Dickinson 1995). In agreement with this hypothesis, it is already known that some of the developmental genes are used at many places and times in an embryo to regulate functionally similar processes: this is the case, for instance, for sonic hedgehog or even, as we saw previously, for the homeotic genes.

It is tempting to correlate this conservation of developmental genes with the recent results on the formation of metazoans. It has been known since Darwin that the formation of metazoans probably occurred very rapidly during the early Cambrian (Darwin 1859). The characterization of well-preserved faunas of this period has confirmed the Cambrian explosion and narrowed it in time (Morris and Whittington 1979; Gould 1989). Its existence is supported by some (Philippe, Chenuil and Adoutte 1994) - but not all (Wray, Levinton and Shapiro 1996) - recent phylogenetic reconstructions. The studies of these faunas have shown that the diversity (disparity) resulting from the Cambrian explosion was greater than the diversity retained by later evolution. The diversity of the living beings formed during these crucial times perhaps resulted from the rapid acquisition by the first metazoans of the molecular tools, developmental genes, that allow a multicellular organism to build different and complex structures. Animals, but not green plants (whose developmental genes controlling homeotic transformations are of a different nature; Weigel and Meyerowitz 1994), would be defined by the existence of these highly conserved developmental genes, which would constitute what J. Slack calls the zootype (Slack, Holland and Graham 1993).

III. PHILOSOPHICAL MEANING OF THE DEVELOPMENTAL GENE CONCEPT

The significance of the discovery of developmental genes remains puzzling. If the developmental genes really are the master genes responsible for the formation of body plans and organs, we can reasonably hope that their characterization will tell us something about the logic driving the formation of the multicellular organisms we know today.

But it is also possible that the developmental genes are only the molecular components of the "instructions" used to build the organisms: their study will not reveal any principle of construction. The higher organisms would be no more contained, programmed and summarized in these different instructions than a bird made out of folded paper is contained in the different instructions that one has to follow to build it (L. Wolpert, personal communication). Such a "solution" to the problem of embryogenesis probably requires attributing a behavioral autonomy to the different cells of an organism. The building of an organism would then be similar to the assembly of a nest by a wasp colony: it results only from the responses of individuals to local configurations and is written nowhere, in particular not in the individual wasps (Theraulaz and Bonabeau 1995). It remains to be seen whether such a molecular description of development will be considered a satisfactory explanation of the structure adopted by organisms.

Moreover, the very existence of a family of developmental genes appears more and more dubious: if a gene codes for a protein component involved in the regulation of cell division by growth factors, but which is also essential for the formation of the vulva in C. elegans or of the photoreceptors in Drosophila, can we call this gene a developmental gene? How should we regard a gene involved in the control of a most central metabolic process, the regulation of glycogen metabolism, which also plays a major role in defining the primary morphogenetic axis of the embryo (He et al 1995; Pierce and Kimelman 1995; Welsh, Wilson and Proud 1996)? As we saw, the value of the developmental gene concept is already fading in favor of the developmental gene pathway concept. Will this new vision better accommodate the wealth of data provided by the tools of molecular biology? Perhaps the repeated use of a developmental gene pathway during evolution is simply a higher form of tinkering (Jacob 1977), and no more a mark of homology than the conservation of individual genes is (Gaunt 1997).

REFERENCES

Acampora, D., Mazan, S., Lallemand, Y., Avantaggiato, V., Maury, M., Simeone, A., and Brûlet, P. 1995. Forebrain and midbrain regions are deleted in Otx2-/- mutants due to a defective anterior neuroectoderm specification during gastrulation. Development 121:3279-3290.

Allen, G. E. 1974. Opposition to the Mendelian-chromosome theory: the physiological and developmental genetics of Richard Goldschmidt. J. Hist. Biol. 7:49-92.

Allen, G. E. 1978. : The man and his science. Princeton: Princeton University Press, chapter IX.

Ang, S.-L., Jin, O., Rhinn, M., Daigle, N., Stevenson, L., and Rossant, J. 1996. A targeted mouse Otx2 mutation leads to severe defects in gastrulation and formation of axial mesoderm and to of rostral brain. Development 122:243-252.

Arendt, D., and Nübler-Jung, K. 1994. Inversion of dorsoventral axis? Nature 371:26.

Artavanis-Tsakonas, S., Matsuno, K. and Fortini, M. E. 1995. Notch Signaling. Science 268:225-232.

Baker, W. K. 1978. A genetic framework for Drosophila development. Annu. Rev. Genet. 12:451-470.

Balkaschina, E. L. 1929. Ein Fall der Erbhomöosis (die Genovariation "Aristopedia") bei Drosophila melanogaster. Wilhelm Roux's Arch. Entwicklungsmech. 115:448-463.

Brenner, S. 1984. Nematode research. TIBS 9:172.

Brenner, S., Dove, W., Herskowitz, I., and Thomas, R. 1990. Genes and Development: molecular and logical themes. Genetics 126:479-486.

Bridges, C. B. 1917. Deficiency. Genetics 2:445-465.

Bridges, C. B., and Dobzhansky, Th. 1933. The mutant "" in Drosophila melanogaster - a case of hereditary homoösis. Wilhelm Roux's Arch. Entwicklungsmech. 127:575-590.

Burian, R. M., Gayon, J., and Zallen, D. T. 1991. Boris Ephrussi and the synthesis of genetics and embryology. In Developmental Biology, op. cit., pp. 207-227.

Carrasco, A. E., McGinnis, W., Gehring, W. J., and De Robertis, E. M. 1984. Cloning of an X. laevis gene expressed during early embryogenesis coding for a peptide region homologous to Drosophila homeotic genes. Cell 37:409-414.

Carroll, S. B. 1995. Homeotic genes and the evolution of arthropods and chordates. Nature 376:479-485.

Chesley, P. 1935. Development of the short-tailed mutant in the house mouse. J. Exp. Zoöl. 70:429-459.

Cho, K. W. Y., Blumberg, B., Steinbeisser, H., and De Robertis, E. M. 1991. Molecular nature of Spemann's organizer: the role of the Xenopus Homeobox gene goosecoid. Cell 67:1111-1120.

Crick, F. H. C., and Lawrence, P. A. 1975. Compartments and polyclones in insect development. Science 189:340-347.

Darwin, C. R. 1859. by means of natural selection. London: Murray.

De Robertis, E. M. 1994. The homeobox in cell differentiation and evolution. In Guidebook to the Homeobox Genes, ed. Denis Duboule, pp. 13-23. Oxford: Oxford University Press.

De Robertis, E. M. 1995. Dismantling the organizer. Nature 374:407-408.

De Robertis, E. M., and Sasai, Y. 1996. A common plan for dorsoventral patterning in Bilateria. Nature 380:37-40.

Deutsch, J., and Le Guyader, H. 1995. Le fond de l'oeil: l'oeil de la drosophile est-il homologue de celui de la souris? Médecines/Sciences 11:1447-1452.

Dickinson, W. J. 1995. Molecules and morphology: where's the homology? TIG 11:119-121.

Dollé, P., Dierich, A., LeMeur, M., Schimmang, T., Schuhbaur, B., Chambon, P., and Duboule, D. 1993. Disruption of the Hoxd-13 gene induces localized leading to mice with neotenic limbs. Cell 75:431-441.

Dressler, G. R., and Gruss, P. 1988. Do multigene families regulate vertebrate development? TIG 4:214-219.

Duboule, D. 1994. Temporal colinearity and the phylotypic progression: a basis for the stability of a vertebrate Bauplan and the evolution of morphologies through heterochrony. Development Supplement:135-142.

Duboule, D. 1995. Vertebrate Hox genes and proliferation: an alternative pathway to ? Current Opinion in Genetics and Development 5:525-528.

Egan, S. E., Giddings, B. W., Brooks, M. W., Buday, L., Sizeland, A. M., and Weinberg, R. A. 1993. Association of Sos Ras exchange protein with Grb2 is implicated in tyrosine kinase signal transduction and transformation. Nature 363:45-51.

Finkelstein, R., and Boncinelli, E. 1994. From fly head to mammalian forebrain: the story of otd and Otx. TIG 10:310-315.

Garcia-Bellido, A., Ripoll, P., and Morata, G. 1973. Developmental compartmentalisation of the wing disk of Drosophila. Nature New Biol. 245:251-253.

Garcia-Bellido, A. 1975. Genetic control of wing disc development in Drosophila. In Cell Patterning, CIBA Found. Symposium 29, pp. 161-182. Amsterdam: Associated Scientific Publishers.

Garcia-Bellido, A. 1977. Homeotic and atavic mutations in insects. Amer. Zool. 17:613-629.

Garcia-Bellido, A., Lawrence, P. A., and Morata, G. 1979. Compartments in animal development. Sci. Am. 241:90-98.

Gaunt, S. J. 1997. Chick limbs, fly wings and homology at the fringe. Nature 386:324-325.

Gilbert, S. F. 1988. Cellular politics: Ernest Everett Just, Richard B. Goldschmidt and the attempt to reconcile embryology and genetics. In The American Development of Biology, eds. R. Rainger, K. R. Benson and J. Maienschein, pp. 311-346. Philadelphia: University of Pennsylvania Press.

Gilbert, S. F. 1991a. Induction and the origins of developmental genetics. In Developmental Biology, vol. 7, ed. S. F. Gilbert, pp. 181-206. New York: Plenum Press.

Gilbert, S. F. 1991b. Commentary: cytoplasmic action in development. The Quarterly Review of Biology 66:309-316.

Gilbert, S. F., Opitz, J. M., and Raff, R. A. 1996. Resynthesizing evolutionary and developmental biology. Dev. Biol. 173:357-372.

Gilbert, S. F. Bearing crosses: the historiography of genetics and embryology. In press.

Gluecksohn-Schoenheimer, S. 1938. The development of two tailless mutants in the house mouse. Genetics 23:573-584.

Gluecksohn-Schoenheimer, S. 1940. The effect of an early lethal (t°) in the house mouse. Genetics 25:391-400.

Gluecksohn-Schoenheimer, S. 1949. The effects of a lethal mutation responsible for duplications and twinning in mouse embryos. J. Exp. Zool. 110:47-76.

Goldschmidt, R. B. 1938. Physiological Genetics. New York: McGraw Hill.

Goldschmidt, R. 1940. The Material Basis of Evolution. New Haven: Press.

Goodrich, L. V., Johnson, R. L., Milenkovic, L., McMahon, J. A., and Scott, M. P. 1996. Conservation of the hedgehog/patched signaling pathway from flies to mice: induction of a mouse patched gene by Hedgehog. Genes Dev. 10:301-312.

Gould, S. J. 1989. Wonderful life: the Burgess shale and the nature of History. New York: Norton.

Gruss, P., and Walther, C. 1992. Pax in development. Cell 69:719-722.

Hadorn, E. 1968. Transdetermination in cells. Sci. Am. 219:110-120.

Halder, G., Callaerts, P., and Gehring, W. J. 1995a. Induction of ectopic eyes by targeted expression of the eyeless gene in Drosophila. Science 267:1788-1792.

Halder, G., Callaerts, P., and Gehring, W. J. 1995b. New perspectives on eye evolution. Current Opinion in Genetics and Development 5:602-609.

Harwood, J. 1993. Styles of scientific thought: the German genetics community, 1910-1933. Chicago: The University of Chicago Press.

He, X., Saint-Jeannet, J.-P., Woodgett, J. R., Varmus, H. E., and Dawid, I. B. 1995. Glycogen synthase kinase 3 and dorsoventral patterning in Xenopus embryos. Nature 374:617-622.

Hogan, B., Holland, P., and Schofield, P. 1985. How is the mouse segmented? TIBS 10:67-74.

Hunt, P., and Krumlauf, R. 1992. Hox codes and positional specification in vertebrate embryonic axes. Annu. Rev. Cell. Biol. 8:227-256.

Jacob, F., and Monod, J. 1959. Gènes de structure et gènes de régulation dans la biosynthèse des protéines. C. R. Acad. Sci. Paris 249:1282-1284.

Jacob, F., and Monod, J. 1961. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3:318-356.

Jacob, F., and Monod, J. 1963. Genetic repression, allosteric inhibition and cellular differentiation. In Cytodifferentiation and Macromolecular Synthesis, pp. 30-64. New York: Academic Press.

Jacob, F. 1977. Evolution and tinkering. Science 196:1161-1166.

Jacob, F. 1981. Le jeu des possibles. Paris: Fayard.

Jones, C. M., and Smith, J. C. 1995. Revolving vertebrates. Current Biology 5:574-576.

Kauffman, S. A. 1973. Control circuits for determination and transdetermination. Science 181:310-318.

Keller, E. F. 1995. Refiguring Life: metaphors of 20th century biology. New York: Columbia University Press.

Keller, E. F. 1996. Drosophila embryos as transitional objects: the work of Donald Poulson and Christiane Nüsslein-Volhard. HSPS 26:313-346.

Kenyon, C., and Wang, B. 1991. A cluster of Antennapedia-class homeobox genes in a nonsegmented animal. Science 253:516-517.

Kenyon, C. 1994. If birds can fly, why can't we? Homeotic genes and evolution. Cell 78:175-180.

Kessel, M., Balling, R., and Gruss, P. 1990. Variations of cervical vertebrae after expression of a Hox-1.1 transgene in mice. Cell 61:301-308.

Kessel, M., and Gruss, P. 1991. Homeotic transformations of murine vertebrae and concomitant alteration of Hox codes induced by retinoic acid. Cell 67:89-104.

King, M.-C., and Wilson, A. C. 1975. Evolution at two levels in humans and chimpanzees. Science 188:107-116.

Kohler, R. E. 1991. Systems of Production: Drosophila, Neurospora and Biochemical Genetics. HSPS 22:87-130.

Landauer, W. 1944. Length of survival of homozygous creeper fowl embryos. Science 100:553-554.

Laughon, A., and Scott, M. P. 1984. Sequence of a Drosophila segmentation gene: protein structure homology with DNA-binding proteins. Nature 310:25-31.

Le Mouellic, H., Lallemand, Y., and Brûlet, P. 1992. Homeosis in the mouse induced by a null mutation in the Hox-3.1 gene. Cell 69:251-264.

Lewis, E. B. 1951. Pseudoallelism and gene evolution. Cold Spring Harbor Symposium on Quantitative Biology 16:159-174.

Lewis, E. B. 1963. Genes and developmental pathways. Am. Zool. 3:33-56.

Lewis, E. B. 1978. A gene complex controlling segmentation in Drosophila. Nature 276:565-570.

Lewis, E. B. 1992. Clusters of master control genes regulate the development of higher organisms. J. Am. Med. Assoc. 267:1524-1531.

Li, J.-C. 1927. The effect of chromosome aberrations on development in Drosophila melanogaster. Genetics 12:1-58.

Lufkin, T., Mark, M., Hart, C. P., Dollé, P., LeMeur, M., and Chambon, P. 1992. Homeotic transformation of the occipital bones of the skull by ectopic expression of a homeobox gene. Nature 359:835-841.

Malicki, J., Schughart, K., and McGinnis, W. 1990. Mouse Hox-2.2 specifies thoracic segmental identity in Drosophila embryos and larvae. Cell 63:961-967.

Matsuo, I., Kuratani, S., Kimura, C., Takeda, N., and Aizawa, S. 1995. Mouse otx2 functions in the formation and patterning of rostral head. Genes and Dev. 9:2646-2658.

McGinnis, W., Levine, M. S., Hafen, E., Kuroiwa, A., and Gehring, W. J. 1984a. A conserved DNA sequence in homoeotic genes of the Drosophila Antennapedia and bithorax complexes. Nature 308:428-433.

McGinnis, W., Garber, R. L., Wirz, J., Kuroiwa, A., and Gehring, W. J. 1984b. A homologous protein-coding sequence in Drosophila homeotic genes and its conservation in other metazoans. Cell 37:403-408.

McGinnis, W., Hart, C. P., Gehring, W. J., and Ruddle, F. H. 1984c. Molecular cloning and chromosome mapping of a mouse DNA sequence homologous to homeotic genes of Drosophila. Cell 38:675-680.

McGinnis, N., Kuziora, M. A., and McGinnis, W. 1990. Human Hox-4.2 and Drosophila Deformed encode similar regulatory specificities in Drosophila embryos and larvae. Cell 63:969-976.

McGinnis, W., and Krumlauf, R. 1992. Homeobox genes and axial patterning. Cell 68:283-302.

McGinnis, W. 1994. A century of homeosis, a decade of . Genetics 137:607-611.

Monod, J., and Jacob, F. 1961. Teleonomic mechanisms in cellular metabolism, growth and differentiation. Cold Spring Harbor Symposium on Quantitative Biology 26:389-401.

Morange, M. 1990. Le concept de gène régulateur. In Histoire de la génétique: pratiques, techniques, théories, pp. 271-291. Paris: ARPEM et Sciences en situation.

Morange, M. 1996. Construction of the developmental gene concept. The crucial years: 1960-1980. Biol. Zent. bl. 115:132-138.

Morange, M. History of Molecular Biology, chapter 21. Cambridge (Mass.): Harvard University Press. In press.

Morata, G., and Lawrence, P. A. 1977. Homeotic genes, compartments and cell determination in Drosophila. Nature 265:211-216.

Morgan, T. H. 1934. Embryology and Genetics. New York: Columbia University Press.

Morris, S. C., and Whittington, H. B. July 1979. The animals of the Burgess shale. Sci. Am. 241:110-120.

Mortlock, D. P., Post, L. C., and Innis, J. W. 1996. The molecular basis of hypodactyly (Hd): a deletion in Hoxa13 leads to arrest of digital arch formation. Nature Genetics 13:284-289.

Muragaki, Y., Mundlos, S., Upton, J., and Olsen, B. R. 1996. Altered growth and branching patterns in synpolydactyly caused by mutations in HOXD13. Science 272:548-551.

Nüsslein-Volhard, C., and Wieschaus, E. 1980. Mutations affecting segment number and polarity. Nature 287:795-801.

Panganiban, G., Sebring, A., Nagy, L., and Carroll, S. 1996. The development of crustacean limbs and the evolution of arthropods. Science 270:1363-1366.

Philippe, H., Chenuil, A., and Adoutte, A. 1994. Can the Cambrian explosion be inferred through molecular phylogeny? Development Supplement:15-25.

Piatigorsky, J. 1992. Lens crystallins: innovation associated with changes in gene regulation. J. Biol. Chem. 267:4277-4280.

Pierce, S. B., and Kimelman, D. 1995. Regulation of Spemann organizer formation by the intracellular kinase Xgsk-3. Development 121:755-765.

Poulson, D. F. 1937. Chromosomal deficiencies and the embryonic development of Drosophila melanogaster. Proc. Natl. Acad. Sci. USA 23:133-137.

Quiring, R., Walldorf, U., Kloter, U., and Gehring, W. J. 1994. Homology of the eyeless gene of Drosophila to the Small eye gene in mice and aniridia in humans. Science 265:785-789.

Raff, R. A., and Kaufman, T. C. 1983. Embryos, genes and evolution: the developmental-genetic basis of evolutionary change. New York: Macmillan.

Saha, M. 1991. Spemann seen through a lens. In Developmental Biology, op. cit., pp. 91-108.

Salvini-Plawen, L. v., and Mayr, E. 1977. On the evolution of photoreceptors and eyes. Evol. Biol. 10:207-263.

Sapp, J. 1983. The struggle for authority in the field of heredity, 1900-1932: New perspectives on the rise of genetics. J. Hist. Biol., 16:311-342.

Sapp, J. 1987. Beyond the gene: inheritance and the struggle for authority in genetics. Oxford: Oxford University Press.

Scott, M. P., and Weiner, A. J. 1984. Structural relationships among genes that control development: sequence homology between the Antennapedia, Ultrabithorax, and fushi tarazu loci of Drosophila. Proc. Natl. Acad. Sci. USA 81:4115-4119.

Shawlot, W., and Behringer, R. 1995. Requirement for Lim1 in head-organizer function. Nature 374:425-430.

Shepherd, J. C. W., McGinnis, W., Carrasco, A. E., De Robertis, E. M., and Gehring, W. J. 1984. Fly and frog homoeo domains show homologies with yeast mating type regulatory proteins. Nature 310:70-71.

Slack, J. M. W., Holland, P. W. H., and Graham, C. F. 1993. The zootype and the phylotypic stage. Nature 361:490-492.

Sordino, P., van der Hoeven, F., and Duboule, D. 1995. expression in teleost fins and the origin of vertebrate digits. Nature 375:678-681.

Spemann, H. 1938. Embryonic Development and Induction. New Haven: Yale University Press.

Theraulaz, G., and Bonabeau, E. 1995. Coordination in distributed building. Science 269:686-688.

Thieffry, D. L. 1996. E. coli is a model system to study cell differentiation. Hist. Phil. Life Sci., in press.

Waddington, C. H. 1939. Preliminary notes on the development of the wings in normal and mutant strains of Drosophila. Proc. Natl. Acad. Sci. USA 25:299-307.

Waddington, C. H. 1962. New Patterns in Genetics and Development. New York: Columbia University Press.

Weigel, D., and Meyerowitz, E. M. 1994. The ABCs of floral homeotic genes. Cell 78:203-209.

Welsh, G. I., Wilson, C., and Proud, C. G. 1996. GSK3: a shaggy frog story. Trends Cell Biol. 6:274-279.

Wilson, A. C., Maxson, L. R., and Sarich, V. M. 1974. Two types of molecular evolution. Evidence from studies of interspecific hybridization. Proc. Natl. Acad. Sci. USA 71:2843-2847.

Wilson, A. C., Sarich, V. M., and Maxson, L. R. 1974. The importance of gene rearrangement in evolution: evidence from studies on rates of chromosomal, protein, and anatomical evolution. Proc. Natl. Acad. Sci. USA 71:3028-3030.

Wilson, A. C., Bush, G. L., Case, S. M., and King, M.-C. 1975. Social structuring of mammalian populations and rate of chromosomal evolution. Proc. Natl. Acad. Sci. USA 72:5061-5065.

Wilson, A. C., Carlson, S. S., and White, T. J. 1977. Biochemical evolution. Annu. Rev. Biochem. 46:573-639.

Wolpert, L. 1995. Bet on positional information. Nature 373:112.

Wray, G. A., Levinton, J. S., and Shapiro, L. H. 1996. Molecular evidence for deep precambrian divergences among metazoan phyla. Science 274:568-573.

ACKNOWLEDGEMENTS:

We are indebted to Rosemary Sousa Yeh, François Jacob, André Adoutte and Hervé Le Guyader for critical reading of the manuscript. We thank Scott Gilbert for providing bibliographic references and interesting suggestions.

Comments by Scott F. Gilbert

Thank you for the privilege of responding to Dr. Morange's paper on the developmental gene concept. Michel has arranged his paper to show why, during the 1960s and 1970s, it became necessary to postulate developmental genes as distinct from structural genes. This dichotomy, however, may be breaking down, and he leads us to a bifurcation--either the developmental genes are really the master genes of body formation and, once invented, have been conserved across the phyla; or they are merely members of a toolbox with which evolution can reshuffle the specification of body parts. They are either master regulators or regulated tools. I hope to show (a) that these points of view are not mutually exclusive, and (b) that both are necessary if animals arose by descent with modification through embryos that must form epigenetically in each generation.

I agree with much of what Michel has written here, and I also have the advantage of having seen a preprint of the excellent companion article that will be published in Biologisches Zentralblatt. However, Michel has focused on only one set of postulated regulators--the homeotic (Hom-C/Hox) genes. One cannot generalize about regulatory genes from only one case, and a highly specialized case at that. Just as Drosophila is a very derived insect, so research on Hox genes was a very special case of developmental genetics. I would claim that it is the exceptional case, not the usual one. Most developmental genetics research sought the causes of cell differentiation. This meant asking why a given cell became a muscle cell and not a fat cell, a skin cell and not a neuron, a blood cell and not a lymphocyte. The research on Drosophila , segmentation, and homeotic genes was not in this category. It sought the means by which parasegments, segments, and compartments were specified. Each of these parasegments had nerves, blood cells, hypodermis, etc., but arranged differently. The homeotic genes did not concern cell differentiation; they regulated segment identity. So Drosophila research was studying a higher level of development than most developmental geneticists, who were studying cell differentiation1 (see Emmons, 1996).

I would like to take a step back and return, with Dr. Morange, to the Jacob and Monod operon. One of the things the operon model did was to inaugurate an incredible research program--the search for the eukaryotic operon. Sol Spiegelman and others had convinced biologists that differentiation was nothing but changes in protein synthesis, and the operon gave the first testable model of how different cells made different proteins. (Lewis [1963] used the operon model very differently--as a mechanism for sequential gene activation.) Britten and Davidson's 1969 developmental operon hypothesis became one of the most cited papers in all of biology, and it predicted regulatory elements (sensors) in the DNA and diffusible regulators that would bind to them. This model would explain not only differential protein synthesis but also coordinated protein synthesis. This program to find the eukaryotic operon can be divided into two main fronts: first, the search for the eukaryotic promoter, and second, the search for eukaryotic regulatory proteins. Both branches would be remarkably successful, and their first success was the discovery of MyoD--the "master regulatory gene" of muscle development.

1 Christiane Nüsslein-Volhard's address is worth noting: the Friedrich-Miescher Laboratorium on Spemannstraße. It combines the institutional authority of DNA (the lab named for Miescher is part of the Max-Planck Institute) and the epigenesis of Spemann. The perfect place for the molecular biology of development!

The discovery of MyoD, unlike the discovery of the homeotic genes, was not a surprise. However, it did originate from a relatively unappreciated source--somatic cell genetics. This was one of Boris Ephrussi's brainchildren, and it flourished during the early 1970s, meeting an untimely demise by the 1980s (see Burian et al., 1991). Its technique was to fuse different types of cells together and examine the resulting state of differentiation. Most cell types (liver, neurons, melanocytes) lost their differentiated phenotypes when fused with other cells, so a diffusible negative regulator was hypothesized to account for the disappearance of their specific differentiated functions (see Davidson, Ephrussi, and Yamamoto, 1968). If the loss of particular chromosomes correlated with the state of differentiation, regulatory elements were postulated to exist on those chromosomes (hepatic aminotransferase inducibility--Weiss and Chaplain, 1971; kidney-specific esterase--Klebe et al., 1970). The exception to this rule was the skeletal muscle cell. First, it retained its differentiated state in culture better than other cells; second, proliferating myoblasts that had not yet made contractile proteins retained this commitment in culture (Konigsberg, 1963); third, in fusions with other cells, Blau, Wright, and others (mentioned in Pinney, 1990) found that not only did the muscle cells retain their differentiated state, but they could cause the nucleus of the other cell type to make muscle-specific proteins. There appeared, then, to be a positive regulator of muscle gene transcription. This fit with other experiments showing coordinate transcriptional control of muscle gene expression (Devlin and Emerson, 1978; 1979).

In 1984, Konieczny and Emerson showed that the mouse embryonic cell line C3H10T1/2 (generally called the "T-one halfs") could be converted into stable proliferating myogenic, chondrogenic, or adipogenic cell lines following treatment with 5-azacytidine (5-azaC), which inhibits DNA methylation. They predicted that the high frequency of myogenic conversion resulted from the activation of one or a very few regulatory loci. In 1986, Emerson's laboratory and Weintraub's laboratory both reported that transfecting C3H10T1/2 cells with cDNA from either cultured muscle cells or 5-azaC-treated C3H10T1/2 cells would transform the cells into myocytes (Konieczny et al., 1986; Lassar et al., 1986). The Weintraub group (Davis et al., 1987; Weintraub et al., 1989) made cDNA copies of the mRNA, cloned them, and transfected the clones individually into C3H10T1/2 cells. One of these clones, MyoD, was found to convert the C3H10T1/2 cells solely into myoblasts, and at high frequency. Moreover, it converted freshly cultured endodermal gut cells, ectodermal neurons, and other cells as well into skeletal muscle.

MyoD turned out to be a muscle-cell-specific transcription factor. It controls cell determination and differentiation. It binds to regions of the DNA that precede several muscle-specific protein-encoding genes. It also binds to its own promoter to maintain its own transcription, and it binds to the promoters or enhancers of other muscle-specific transcription factors to activate them as well. (The E-box was discovered through a collaboration between Weintraub's laboratory and 's group, 1989.)

This research program was occurring at the same time as the Drosophila research program. Charles Emerson is a well-known muscle developmental biologist; Hal Weintraub, until his recent death, was a major investigator of chromatin structure and transcription factors. Notice that nowhere in this research program was a mutant used.1 This research program was strictly epigenetic. It was based on phenotypic analysis--the appearance of muscle contractile proteins. When recombinant DNA became available, it was used to see if the cloned gene encoded a protein capable of changing the phenotype of the cell. Also, although other organisms were found to make MyoD as a muscle-specific transcription factor, those data did not play any major role in forming the research program or (at least initially) in strengthening it.

So now we have two contemporaneous examples, Hox genes and MyoD. Are they "master regulators"? Yes--Hox genes can convert a haltere segment into a wing or an antenna into a leg; MyoD can convert a neuron into a muscle if need be. Are they regulated themselves? Yes--Hox genes such as abd-A are regulated to make the parapods in the abdominal segments of caterpillars (Carroll et al., 1995); Hox genes such as Ubx are regulated temporally to distinguish the third thoracic from the first abdominal segment (Castelli-Gair and Akam, 1995); and the Abd-B-like Hox gene mab-5 in C. elegans is regulated by the cell lineage in which the gene is expressed. In this last case, cell lineage plays a greater role than cell position in determining the gene's expression (Salser and Kenyon, 1996). Similarly, Emerson claims that MyoD is a "master control gene" and also the most tightly controlled gene in the genome.

So here is a new principle of development: all "master control genes" are under masterful regulation. There can be no top of the hierarchy in a life cycle. MyoD is such a powerful protein that the cell must control it at all ectopic times and places so that it is not expressed in the wrong cell or at the wrong time. If it is on even in small amounts, that cell will become muscle. So MyoD is regulated at the levels of transcription and RNA processing, and by two post-translational regulators (see Gilbert, 1997). Governors govern the governor. Regulators must be regulated by factors that are themselves both regulated and regulators. Moreover, MyoD regulation works within a field--the limb field or the somite field--because the regulators are soluble proteins coming from outside the cell: BMP-4, Wnt-4, FGFs (Vaidya et al., 1989; Li et al., 1992; Kopan et al., 1994). The basic state of MyoD and the Hox genes is to be inhibited, like so much in developmental biology. Activation is the inhibition of the inhibitor; suppression is the inhibition of the inhibitor of the inhibitor.

So these developmental genes have to be both regulators and regulatees. The things they regulate and the things that regulate them are part of a pathway. In the end, it is not the conservation of the gene that is important, but the conservation of the developmental pathways that include it. The use of a gene can depend on its context. In one cell, enolase is a glycolytic enzyme, while in the lens cell it is a structural crystallin. The GSK-3 gene can play a role in the Wnt pathway for fly segmentation or frog neural axis formation, or it can help regulate glycogen metabolism. Beta-catenin can hold cells together as part of the desmosome or it can be a developmentally critical transcription factor (Piatigorsky and Wistow, 1991; He et al., 1995; Schneider et al., 1996). This is to be expected from our knowledge of evolution. As Jacob (1977) noted, nature should use what it has before inventing something new. Proteins have multiple sites. The fact that a gene can be used for different purposes within the body should be troubling only to those trying to name the gene.

1 It would have been extremely difficult to discover MyoD by mutational analysis. There is overlapping redundancy among the myogenic bHLH transcription factors, and Myf-5 can compensate for the absence of MyoD (Rudnicki et al., 1993; Wang et al., 1996).

Michel Morange has pointed out a paradox that forces us to acknowledge that evolution is not rational, and that neat essentialist catalogings of nature into types of genes and types of proteins have as little to do with nature as the essentialist types of animals did prior to Darwin.

REFERENCES

Britten, R. J. and Davidson, E. H. 1969. Gene regulation for higher cells: A theory. Science 165: 349-357.

Burian, R., Gayon, J. and Zallen, D. T. 1991. Boris Ephrussi and the synthesis of genetics and embryology. In S. Gilbert (ed.), A Conceptual History of Modern Embryology. Plenum, New York, pp. 207-227.

Carroll, S. B., Weatherbee, S. D., and Langeland, J. A. 1995. Homeotic genes and the regulation and evolution of insect wing number. Nature 375: 58-61.

Castelli-Gair, J. and Akam, M. 1995. How the Hox gene Ultrabithorax specifies two different segments: the significance of spatial and temporal regulation within metameres. Development 121: 2973-2982.

Davidson, R. L., Ephrussi, B., and Yamamoto, K. 1968. Regulation of melanin synthesis in mammalian cells as studied by somatic cell hybridization I. Evidence for negative control. J. Cell Physiol. 72: 115-127.

Davis, R. L., Weintraub, H., and Lassar, A. B. 1987. Expression of a single transfected cDNA converts fibroblasts into myoblasts. Cell 51: 987-1000.

Devlin, R. B. and Emerson, C. P. Jr. 1978. Coordinate regulation of contractile protein synthesis during myoblast differentiation. Cell 13: 599-611.

Devlin, R. B. and Emerson, C. P. Jr. 1979. Coordinate regulation of contractile protein mRNAs during myoblast differentiation. Devel. Biol. 69: 202-216.

Emmons, S. W. 1996. Simple worms, complex genes. Nature 382: 301-302.

Gilbert, S. F. 1997. Developmental Biology. Fifth edition. Sinauer Associates, Sunderland.

He, X., Saint-Jeannet, J.-P., Woodgett, J. R., Varmus, H. E., and Dawid, I. B. 1995. Glycogen synthase kinase-3 and dorsoventral patterning in Xenopus embryos. Nature 374: 617-622.

Jacob, F. 1977. Evolution and tinkering. Science 196: 1161-1166.

Klebe, R. J., Chen, T. R., and Ruddle, F. H. 1970. Mapping of a human genetic regulator element by somatic cell genetic analysis. Proc. Natl. Acad. Sci. USA 66: 1220-1227.

Konieczny, S. F. and Emerson, C. P. Jr. 1984. 5-Azacytidine induction of stable mesodermal stem cell lineages from 10T1/2 cells: Evidence for regulatory genes controlling determination. Cell 38: 791-800.


Konieczny, S. F., Baldwin, A. S., and Emerson, C. P. Jr. 1986. Myogenic determination and differentiation of 10T1/2 cell lineages: Evidence for a single genetic regulatory system. Mol. Cell Biol. 29: 21-34.

Konigsberg, I. R. 1963. Clonal analysis of myogenesis. Science 140: 1273-1284.

Kopan, R., Nye, J. S., and Weintraub, H. 1994. The intracellular domain of mouse Notch: a constitutively activated repressor of myogenesis directed at the basic helix-loop-helix region of MyoD. Development 120: 2421-2430.

Lassar, A. B., Paterson, B. M. and Weintraub, H. 1986. Transfection of a DNA locus that mediates the conversion of 10T1/2 fibroblasts into myoblasts. Cell 47: 649-656.

Lewis, E. B. 1963. Genes and developmental pathways. Amer. Zool. 3: 33-56.

Li, L., Zhou, J., Guy, J., Heller-Harrison, R., Czech, M. P. and Olson, E. N. 1992. FGF inactivates myogenic helix-loop-helix proteins through phosphorylation of a conserved protein kinase C site in their DNA-binding domains. Cell 71: 1181-1194.

Morange, M. 1996. Construction of the developmental gene concept. The crucial years: 1960-1980. Biol. Zentralbl. 115: 132-138.

Piatigorsky, J. and Wistow, G. 1991. The recruitment of crystallins: New functions precede gene duplication. Science 252: 1078-1079.

Pinney, D. F., de la Brousse, F. C., and Emerson, C. P. Jr. 1990. Molecular genetic basis of skeletal myogenic lineage determination and differentiation. In A. Mahowald (ed.), Genetics of Pattern and Growth Control. Wiley-Liss, pp. 65-89.

Rudnicki, M. A., Schnegelsberg, P. N. J., Stead, R. H., Braun, T., Arnold, H.-H. and Jaenisch, R. 1993. MyoD or Myf-5 is required in a functionally redundant manner for the formation of skeletal muscle. Cell 75: 1351-1359.

Salser, S. J. and Kenyon, C. 1996. A C. elegans Hox gene switches on, off, and on again to regulate proliferation, differentiation, and [...]. Development 122: 1651-1661.

Schneider, S., Steinbeisser, H., Warga, R. M., and Hausen, P. 1996. β-catenin translocation into nuclei demarcates the dorsalizing centers in frog and fish embryos. Mech. Devel. 57: 191-198.

Vaidya, T. B., Rhodes, S. J., Taparowsky, E. J. and Konieczny, S. F. 1989. Fibroblast growth factor and transforming growth factor-β repress transcription of the myogenic regulatory gene MyoD1. Mol. Cell Biol. 9: 3576-3579.

Wang, Y., Schnegelsberg, P. N. J., Dausman, J., and Jaenisch, R. 1996. Functional redundancy of the muscle-specific transcription factors Myf5 and myogenin. Nature 379: 823-826.

Weintraub, H., Tapscott, S. J., Davis, R. L., Thayer, M. J., Adam, M. A., Lassar, A. B. and Miller, D. 1989. Activation of muscle-specific genes in pigment, nerve, fat, liver, and fibroblast cell lines by forced expression of MyoD. Proc. Natl. Acad. Sci. USA 86: 5434-5438.


Weiss, M. C. and Chaplain, M. 1971. Expression of differentiated function in hepatoma cell hybrids III. Reappearance of tyrosine aminotransferase inducibility after loss of chromosomes. Proc. Natl. Acad. Sci. USA 68: 3026-3031.

DECODING THE GENETIC PROGRAM (or, some circular logic in the logic of circularity)

Evelyn Fox Keller

Not long ago, I gave a lecture on the continuing conceptual gaps between genetics and development, in which I suggested that we still have no adequate theory to explain the emergence of phenotype from genotype, i.e., no adequate theory for developmental biology. Whereupon a prominent biologist in the audience, unmistakably angry, stood up to say, "We most certainly do -- it is called 'Development'!" We laugh, of course. But if he had said "it is called the genetic program," we would not regard his response as quite so much of a joke. Quite simply, the concept of a developmental program written in the genome -- i.e., of the genetic program -- has come to be widely regarded as a fundamental explanatory concept in developmental biology. Indeed, it is a mainstay of the molecular biology of development, referred to ubiquitously in both the popular and the scientific literature. My fear, however, is that should we ask, "What exactly is a genetic program?", we would soon find ourselves in much the same boat as if we had taken "development" as our explanation, confusing 'explanation' with that which is to be explained.

So far as I have been able to determine, the term "genetic program" appears neither in indexes nor in dictionaries, nor as a keyword in library databases. Indeed, it seems never to be defined. We have no trouble finding definitions for the individual components -- since 1953, we have come to understand the term 'genetic' as referring to the sequence of nucleotide bases, and even without the help of computer science, Webster defines "program" as "a plan of procedure." But taken as a composite, the term seems somehow not to require definition, as if its meaning were self-evident. Indeed, the meaning of the composite term does not depend on definition, but rather on tacit assertion: the very juxtaposition of the two terms 'genetic' and 'program' brings with it the particular presumption that the 'program' or 'plan of procedure' for development is itself written in the sequence of nucleotide bases. Is this presumption correct? Certainly, it is widely (almost universally) taken for granted, but I want to argue that, unfortunately, it must be said to be at best misleading, and at worst simply false: to the extent that we may speak at all of a developmental program, or of a set of instructions for development, current research obliges us to acknowledge that these 'instructions' are not written into the DNA itself (or at least, are not all written in the DNA), but rather are distributed throughout the fertilized egg. To be sure, the informational content of the DNA is essential -- without it development (life itself) cannot proceed -- but for many developmental processes, it is far more appropriate to refer to this informational content as data than as program (Atlan and Koppel, 1990). Indeed, I want to suggest that the notion of genetic program depends on a fundamental category error.

Development results from the temporally and spatially specific activation of particular genes, which, in turn, depends on a vastly complex network of interacting components including not only the 'hereditary codescript' of the DNA, but also a densely interconnected cellular machinery made up of proteins and RNA molecules.1 Necessarily, each of these systems functions in relation to the others alternately as data and as program. If development cannot proceed without the "blueprint" of genetic memory, neither can it proceed without the "programs" embodied in cellular structures, the elements of which are, to be sure, fixed by genetic memory, but the assembly of which is dictated by cellular memory.2 As Lewontin has put it,

The linear sequence of nucleotides in DNA is used by the machinery of the cell to determine what sequence of amino acids is to be built into a protein, and to determine when and where the protein is to be made. But the proteins of the cell are made by other proteins, and without that protein-forming machinery nothing can be made. There is an appearance here of infinite regress..., but this appearance is an artifact of another error of vulgar biology, that it is only the genes that are passed from parent to offspring. In fact, an egg, before fertilization, contains a complete apparatus of production deposited there in the course of its cellular development. We inherit not only genes made of DNA but an intricate structure of cellular machinery made up of proteins. (Lewontin, 1992, p. 33)

Of course, none of this is news, and it hardly depends on the extraordinary techniques now available for molecular analysis. Yet, surprisingly, it is only within the last decade or two that the developmental and evolutionary implications of so-called "maternal effects" have begun to be appreciated. Current research is now providing us with the kind of detail about the mechanisms involved in the processing of genetic data that makes the errors of what Lewontin calls "vulgar biology" manifest. Yet, even when elaborated by the kind of detail we now have available, such facts are still not sufficient to dislodge the confidence that many distinguished biologists continue to have in both the meaning and the explanatory force of the genetic program. The question I want therefore to ask is twofold: first, when and how did the presumption built into the very term 'genetic program' come to seem so self-evident? And second, what grants it its apparent explanatory force, even in the face of such obvious caveats as those above?

1 For a related but more general critique of the very concept of a "program" for development, see Stent (1985); see also Newman (1988); Oyama (1989); Moss (1992); de Chadarevian (1994); Gaudillière (1994); Perlman (1996).
2 A vivid demonstration of this interdependency was provided in the 1950s and 1960s with the development of techniques for interspecific nuclear transplantation. Such hybrids almost always fail to develop past gastrulation, and in the rare cases when they do, the resultant embryo exhibits characteristics intermediate between the two parental species. This dependency of genomic function on cytoplasmic structure follows as well from the asymmetric outcomes of reciprocal crosses demonstrated in earlier studies of interspecific hybrids (Markert and Ursprung, 1971, pp. 135-7).


Let me begin with the first question.

François Jacob may have been the first to use the concept; certainly, he contributed crucially to popularizing it. In The Logic of Life, published in 1970, Jacob describes the organism as "the realization of a programme prescribed by its heredity" (p. 2), claiming that "when heredity is described as a coded programme in a sequence of chemical radicals, the paradox [of development] disappears" (p. 4). For Jacob, the genetic program, written in the alphabet of nucleotides, is what is responsible for the apparent purposiveness of biological development; it and it alone gives rise to "the order of biological order" (p. 8). He refers to the oft-quoted characterization of teleology as a mistress biologists "could not do without, but did not care to be seen with in public," and writes, "The concept of programme has made an honest woman of teleology" (pp. 8-9). Although Jacob does not exactly define the term, he notes that "The programme is a model borrowed from electronic computers. It equates the genetic material of an egg with the magnetic tape of a computer" (p. 9).

However, equating the genetic material of an egg with the magnetic tape of a computer does not in itself entitle us to regard that material as encoding a 'program'; it might just as well be thought of as encoding 'data' to be processed by a cellular 'program'. Or by a program residing in the machinery of transcription and translation complexes. Or by extra-nucleic chromatin structures in the nucleus. Computers have provided a rich source of metaphors for molecular biology, but they cannot by themselves be held responsible for the notion of 'genetic program'. Indeed, by 1970, other, quite different, uses of the program metaphor for biological development were already conspicuous. One such use was in the notion of a "developmental program" -- a term that surfaced repeatedly through the 1960s, and one that stands in notable contrast to that of a "genetic program".

Let me give an example of this alternative use. In 1965 a young graduate student steeped in information theory and cybernetics teamed up with the developmental biologist Lewis Wolpert to argue for a direct analogy not between computer programs and the genome, but between computer programs and the egg:

if the genes are analogous with the sub-routine, by specifying how particular proteins are to be made..., then the cytoplasm might be analogous to the main programme specifying the nature and sequence of operations, combined with the numbers specifying the particular form in which these events are to manifest themselves.... In this kind of system, instructions do not exist at particular localized sites, but the system acts as a dynamic whole. (Apter and Wolpert, 1965, p. 257)

The following year, Apter published a book-length elaboration of the argument (based on his doctoral dissertation in psychology) under the title Cybernetics and Development (1966). This work may well have been the first comprehensive application of automata theory to biological development, and, ironically perhaps, it can be seen as an important precursor to current work on genetic algorithms and Artificial Life (of which more later). But it was not unique; during the 1960s, a number of developmental biologists attempted to employ ideas from cybernetics to illuminate development, and almost all shared Apter's starting assumptions (see Keller, 1995, Chap. 3 for examples) -- i.e., they located the program (or "instructions") for development in the cell as a whole.

The difference in where the program is said to be located is crucial, for it bears precisely on the controversy over the adequacy of genes to account for development that had been raging among biologists since the beginning of the century (see Keller, 1995, Chap. 1). By the beginning of the 1960s, this debate had subsided, largely as a result of the eclipse of embryology as a discipline during the 1940s and 1950s. Genetics had triumphed, and after the identification of DNA as the genetic material, the successes of molecular biology had vastly consolidated that triumph. Yet the problems of development, still unresolved, lay dormant. Molecular biology had revealed a stunningly simple mechanism for the transmission and translation of genetic information, but, at least until 1960, it had been able to offer no account of developmental regulation, of how different genes come to be activated at different times and different places in the developing embryo.

James Bonner, professor of biology at Caltech (and brother of John Tyler Bonner, a developmental biologist at Princeton), in an early attempt to bring molecular biology to bear on development, put the problem well. Granting that "the picture of life given to us by molecular biology ... applies to cells of all creatures," he goes on to observe that this picture:

is a description of the manner in which all cells are similar. But higher creatures, such as people and pea plants, possess different kinds of cell. The time has come for us to find out what molecular biology can tell us about why different cells in the same body are different from one another, and how such differences arise. (Bonner, 1965, p. v)

Bonner's own work was on the biochemistry and physiology of regulation in plants, at an institution well known for its importance in the birth of molecular biology (see, e.g., Kay, 1992). In this work, published in 1965, he too, like Apter and a number of others of that period, employs the conceptual apparatus of automata theory to deal with the problem of developmental regulation. But unlike them, he locates the 'program' not in the cell as a whole, but in the chromosomes, and more specifically in the genome. Indeed, he begins with the by-then standard credo of molecular biology, asserting that "We know that ... the directions for all cell life [are] written in the DNA of their chromosomes" (p. v). Why? An obvious answer is suggested by his location. Unlike Apter and unlike other developmental biologists of the time, Bonner was situated at a major thoroughfare for molecular biologists, and it is hard to imagine that he was uninfluenced by the enthusiasm of his colleagues at Caltech. In any case, Bonner's struggle to reconcile the conceptual demands posed by the problems of developmental regulation with the received wisdom among molecular biologists is at the very least instructive, especially given its location in time, and I suggest it is worth examining in some detail for the insight it offers into our question of how the presumption of a "genetic program" came -- in fact, over the course of that very decade -- to seem self-evident. In short, I want to take Bonner as representative of a generation of careful thinkers about an extremely difficult problem who opted for this (in retrospect, inadequate) conceptual shortcut.

From molecular biology, Bonner inherited a language encoding a number of critical if tacit presuppositions. That language shapes his efforts in decisive ways. Its principal keywords -- prior to the term 'genetic program' -- are "information," "instruction," and "code." Summarizing the then-current understanding of transcription and translation, he writes:

Enzyme synthesis is therefore an information-requiring task and ... the essential information-containing component is the long punched tape which contains, in coded form, the instructions concerning which amino acid molecule to put next to which in order to produce a particular enzyme. (Bonner, 1965, p. 3)

To his credit, Bonner recognizes that pre-molecular genetics had already provided language to much the same effect: in lieu of 'information' and 'code', geneticists had used the term "specify", and in lieu of "instruct", the term "supervise" comes ready to hand:

We have in a sense known this since 1941 when Beadle and Tatum showed that for each enzyme ... there is a gene ... which specifies that enzyme. Or, to put it the other way round, that the function of each gene is to supervise the production of a particular kind of enzyme. (Bonner, 1965, p. 4, my italics)

Although Bonner fails to note the conceptual work performed in the easy transition from 'information' to 'instruction', or from 'specify' to 'supervise', he does clearly recognize that, to date, only the composition of the protein had been accounted for, and not the regulation of its production required for the formation of specialized cells; i.e., cell differentiation remained unexplained. As he wrote,

Each kind of specialized cell of the higher organism contains its characteristic enzymes but each produces only a portion of all the enzymes for which its genomal DNA contains information. (Bonner, 1965, p. 6)

But he continues:

Clearly then, the nucleus contains some further mechanism which determines in which cells and at which times during development each gene is to be active and produce its characteristic messenger RNA, and in which cells each gene is to be inactive, to be repressed. (Bonner, 1965, p. 6)


Two important moves have been made here. On the one hand, Bonner argues that something other than the information for protein synthesis encoded in the DNA is required to explain cell differentiation (and this is his main point), but on the way to making this point, he has placed this "further mechanism" in the nucleus, with nothing more by way of argument or evidence than his "Clearly then". Why does such an inference follow? And why does it follow "clearly"? Perhaps the next paragraph will help:

The egg is activated by fertilization... As division proceeds cells begin to differ from one another and to acquire the characteristics of specialized cells of the adult creature. There is then within the nucleus some kind of programme which determines the property [sic] sequenced repression and derepression of genes and which brings about orderly development. (Bonner, 1965, p. 6)

Here, the required 'further mechanism' is explicitly called a "program", and once again it is located in the nucleus. But this time around, a clue to the reasoning behind the inference has been provided in the first sentence, "The egg is activated by fertilization." This is how I believe the (largely tacit) reasoning goes: if the egg is "activated by fertilization", the implication is that it is entirely inactive prior to fertilization. What does fertilization provide? The entrance of the sperm, of course, and unlike the egg, the sperm has almost no cytoplasm: it can be thought of as pure nucleus. Ergo, the active component must reside in the nucleus and not in the cytoplasm. Today, the supposition of an inactive cytoplasm would be challenged, but in Bonner's time it would have been taken for granted as a carryover from what I have called "the discourse of gene action" of classical genetics (Keller, 1995). And even then, it might have been challenged had it been made explicit, but as an implicit assumption encoded in the language of "activation", it would almost certainly have gone unnoticed by Bonner's readers, just as it did by Bonner himself.

Bonner then goes on to ask the obvious questions:

What is the mechanism of gene repression and derepression which makes possible development? Of what does the programme consist and where does it live? (Bonner, 1965, p. 6)

And answers them as best he can:

We can say that the programme which sequences gene activity must itself be a part of the genetic information since the course of development and the final form are heritable. Further than this we cannot go by classical approaches to differentiation. (Bonner, 1965, p. 6)

In these few sentences, Bonner has completed the line of argument leading him to the conclusion that the program must be part of the genetic information, i.e., to the "genetic program". And again, we can try to unpack his reasoning. Why does the heritability of the course of development and the final form imply that the program must be part of the genetic information? Because -- and only because -- of the unspoken assumption that it is only the genetic material that is inherited. The obvious fact -- that the reproductive process passes on (or transmits) not only the genes but also the cytoplasm (the latter through the egg, in sexually reproducing organisms) -- is not mentioned. But even if it were, this fact would almost certainly be regarded as irrelevant, simply because of the prior assumption that the cytoplasm contains no active components. The conviction that the cytoplasm could neither carry nor transmit effective traces of intergenerational memory had been a mainstay of genetics for so long that it had become part of the 'memory' of that discipline, working silently but effectively to shape the very logic of inference employed by geneticists.

Yet another ellipsis becomes evident (now, even to the author) as Bonner attempts to integrate his own work on the role of histones in genetic regulation. Not all copies of a gene (or a genome) are in fact the same: because of the presence of proteins in the nucleus capable of binding to the DNA, "in the higher creature, if it is to be a proper higher creature, one and the same gene must possess different attributes, different attitudes, in different cells" (p. 102). The difference is a function of the histones. How can we reconcile this fact with the notion of a "genetic program"? There is one simple way, and Bonner takes it -- namely, to elide the distinction between genome and chromosome. The "genetic program" is saved (for this discussion) by just a slight shift in reference: now it refers to a program built into the chromosomal structure -- i.e., into the complex of genes and histones, where that complex is itself here referred to as the 'genome'.

But the most conspicuous inadequacy of locating the developmental program in the genetic information becomes evident in the chapter in which Bonner attempts to sketch out an actual computer program for development; it is called "Switching Networks for Developmental Processes." Here, the author undertakes to reframe what is known about the induction of developmental pathways in terms of a "master program", proposing to "consider the concept of the life cycle as made up of a master programme constituted in turn of a set of subprogrammes or subroutines" (p. 134). Each subroutine specifies a particular task to be performed -- for a plant, his list includes: cell life, embryonic development, how to be a seed, bud development, leaf development, stem development, root development, reproductive development. Within each of these subroutines is a list of cellular instructions or commands, such as "divide tangentially with growth"; "divide transversely with growth"; "grow without dividing"; "test for size or cell number"; etc. (p. 137). He then asks the obvious next question: "[H]ow might these subroutines be related to one another? Exactly how are they to be wired together to constitute a whole programme?" (p. 135). Yet nowhere in the text is this question answered. We might say, fortunately not, for if it had been, the answer would necessarily have undermined Bonner's core assumption. To see this, two points emerging from the discussion need to be underscored. First, the list of subroutines, although laid out in a linear sequence -- as if following from an initial 'master program' -- actually constitutes a circle, as indeed it must if it is to describe a life cycle. The 'master program' is in fact nothing but this composite set of programs, wired together in a structure exhibiting the characteristic cybernetic logic of "circular causality".

The second point bears on Bonner's earlier question, "Of what does the programme consist and where does it live?" The first physical structures that were built to embody the logic of computer programs were built out of electrical networks1 (hence the term "switching networks"), and this is Bonner's frame of reference. As he writes, "That the logic of development is based upon [a developmental switching] network, there can be no doubt." (p. 148) But what would serve as the biological analogue of an electric (or electronic) switching network? How are the instructions specified in the subroutines that comprise the life cycle actually embodied? Given the dependence of development on the regulated activation of particular genes, Bonner reasonably enough calls the developmental switching network a "genetic switching network". But this not only fails to answer our question; it actually obfuscates it. The clear implication is that such a network is constituted of genes (and only genes), but in fact many other kinds of entities also figure in this network, all playing critical roles in the control of genetic activity. Bonner himself writes of the roles played by histones, hormones, and RNA molecules; today, the list has expanded considerably to include enzymatic networks, metabolic networks, transcription complexes, signal transduction pathways, etc., with many of these additional factors embodying their own "switches". We could of course still refer to this extraordinarily complex set of interacting controlling factors as a 'genetic switching network' -- insofar, i.e., as the regulation of gene activation remains central to development -- but only if we can manage to avoid the implication (an implication tantamount to a category error) that the network is embodied in and by the genes themselves.

Indeed, it is this "category error" which confounds the very notion of a 'genetic program'. If we were now to ask Bonner's question, "Of what does the programme consist and where does it live?", we would have to say, just as Apter intuited long ago, that it consists not of particular gene entities, and lives not in the genome itself, but consists of and lives in the cellular machinery integrated into a dynamic whole. In current lingo, it would seem more reasonable to describe the fertilized egg as a massively parallel processor in which 'programs' (or networks) are distributed throughout the cell.2 The roles of 'data' and 'program' here are relative, for what counts as 'data' for one 'program' is often the output of a second 'program', and the output of the first is 'data' for yet another 'program', or even for the very 'program' that provided its own initial 'data'. Thus, for some developmental stages, the DNA might be seen as encoding 'programs' or switches which process the data provided by gradients of transcription activators; alternatively, one might say that DNA sequences provide data for the machinery of transcription activation (some of which is inherited directly from the cytoplasm of the egg). In later developmental stages, the products of transcription serve as data for splicing machines, translation machines, etc. In turn, the output of these processes makes up the very machinery or programs needed to process the data in the first place. Sometimes this exchange of data and programs can be represented sequentially, sometimes as occurring simultaneously.

1 In modern computers such networks are electronic.
2 Supplementing Lenny Moss's observation that a genetic program is "an object nowhere to be found" (1992, p. 335), I would propose the developmental program as an entity that is everywhere to be found.

When, in the mid-1960s, Bonner, Apter, and others were attempting to represent development in the language of computer programs, automata theory was in its infancy, and cybernetics was at the height of its popularity. During the 1970s and 1980s, these efforts lay forgotten: cybernetics had lost its appeal to computer scientists and biologists alike, and molecular biologists found they had no need of such models. The mere notion of a 'genetic program' sufficed by itself to guide their research. Today, however, stimulated in large part by the construction of hard-wired parallel processors, the project to simulate biological development on the computer has returned in full force, and in some places has become a flourishing industry. It goes by various names -- Artificial Life, adaptive complexity, or genetic algorithms. But what is a genetic algorithm? Like Bonner's subroutines, it is "a sequence of computational operations needed to solve a problem" (e.g., Emmeche, 1994). And once again, we need to ask: why 'genetic'? Furthermore, not only are the individual algorithms referred to as 'genetic', but "In the fields of genetic algorithms and artificial evolution, the [full] representation scheme is often called a 'genome' or 'genotype'" (Fleischer, 1995, p. 1). And, in an account of the sciences of complexity written for the lay reader, Mitchell Waldrop quotes Chris Langton, the founder of Alife, as saying:

[Y]ou can think of the genotype as a collection of little computer programs executing in parallel, one program per gene. When activated, each of these programs enters into the logical fray by competing and cooperating with all the other active programs. And collectively, these interacting programs carry out an overall computation that is the phenotype: the structure that unfolds during an organism's development. (in Waldrop, 1992, p. 194)

Workers in the field well understand, and if pressed would readily acknowledge, that the biological analogues of these computer programs are not in fact "genes" (at least as the term is used in biology), but complex biochemical structures or networks made up of proteins, RNA molecules, and metabolites, which often, although certainly not always, execute their tasks in interaction with particular stretches of DNA.1 Alife's "genome" typically consists of instructions such as "reproduce", "edit", "transport" or "metabolize", and the biological instantiation

1 Executing a task means processing data provided both by the DNA and by the products of other programs, i.e., by information given in nucleotide sequences, chromosomal structure, gradients of proteins and RNA molecules, the structure of protein complexes, etc.

57 Evelyn Fox Keller of these algorithms is found not in the nucleotide sequences of DNA, but in specific kinds of cellular machinery such as transcription complexes, spliceosomes, and metabolic networks. Why then are they called 'genetic', and why is the full representation called a "genome"? I suggest that the primary justification for such terminology is merely that it follows so readily from the usage the term 'genetic program' had already acquired in genetics.

In short, words have a history, and their usage depends on this history, as does their meaning. But history does not fix meaning; rather, it builds into words a kind of memory. In the field of genetic programming, "genes" have come to refer not to particular sequences of DNA, but to the computer programs required to execute particular tasks (as Langton puts it, "one program per gene"); yet, at the same time, the history of the term guarantees that the word "gene" continues to carry its original meaning, even as used by computer scientists. And perhaps most importantly, that earlier meaning remains available for deployment whenever it seems convenient to do so.

Much the same can be said for the use of the terms "gene" and "genetic programs" by geneticists. I have taken some time to examine Bonner's argument for 'genetic programs', not because his book played a major role in establishing the centrality of this notion in biological discourse, but rather because of the critical moment in time at which it was written and because of the relative accessibility of the key kinds of slippage on which his argument depends. The very first use of the term 'program' that I have been able to find in the molecular biology literature had appeared only four years earlier.1 In 1961, Jacob and Monod published a review of their immensely influential work on a genetic mechanism for enzymatic adaptation in E. coli. This was the first mechanism for regulation to be identified, and even though it pertained only to the regulation of protein synthesis in a single-celled organism that does not undergo developmental differentiation, it was (with the explicit encouragement of the authors) widely acclaimed as a "resolution" of the paradox of differentiation that had for so long divided embryology from genetics (Monod and Jacob, 1961, p. 397). In this work, Monod and Jacob had found a genetic structure that could indeed be characterized as a molecular 'switch', triggered by the presence or absence of the product of a particular gene, and they rapidly assimilated all possible regulatory mechanisms to this model. The introduction of the term "program" appears in their concluding sentence:

1 Simultaneously, and probably independently, Ernst Mayr introduced the notion of "program" in his 1961 article "Cause and Effect in Biology" (Mayr, 1961, adapted from a lecture given at MIT on Feb. 1, 1961). There he wrote, "The complete individualistic and yet also species-specific DNA code of every zygote (fertilized egg cell), which controls the development of the central and peripheral nervous system... is the program for the behavior computer of this individual" (p. 1504).


The discovery of regulator and operator genes, and of repressive regulation of the activity of structural genes, reveals that the genome contains not only a series of blue-prints, but a coordinated program of protein synthesis and the means of controlling its execution. (Jacob and Monod, 1961, p. 354)

Three decades later, Sydney Brenner refers in rather scathing terms to the belief "that all development could be reduced to [the operon] paradigm", that "It was simply a matter of turning on the right genes in the right places at the right times". As he puts it, "Of course, while absolutely true this is also absolutely vacuous. The paradigm does not tell us how to make a mouse but only how to make a switch." (Brenner et al., 1990, p. 485)1 And even in the first flush of enthusiasm, not everyone was persuaded of the adequacy of this particular regulatory mechanism to explain development.2 Lewis Wolpert, e.g., wrote in 1969: "Dealing as it does with intracellular regulatory phenomena, it is not directly relevant to problems where the cellular bases of the phenomena are far from clear." (1969, pp. 2-3) In those days, Wolpert seemed certain that an understanding of development required a focus not simply on genetic information but also on cellular mechanisms. But by the mid-1970s, even Wolpert had been converted to the notion of a 'genetic program' (see, e.g., Wolpert and Lewis, 1975).

What carried the day? Certainly not more information about actual developmental processes. Rather, I suggest, it was the consonance of this formulation with the prior history of genetic discourse, fortified both by the rhetorical links forged with the new science of computers and by frequent reassertion by figures of authority. In this, Jacob's The Logic of Life was of key importance. To provide support for the concept of "genetic program," Jacob invokes the authority of both Schroedinger and Wiener, managing the transition from past to future metaphors with finesse. "According to Wiener, there is no obstacle to using a metaphor 'in which the organism is seen as a message'" (pp. 251-2). And two pages later,

According to Schroedinger, the chromosomes contain in some kind of code-script the entire pattern of the individual's future development and of its functioning in the mature state... The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power -- or, to use another simile, they are architect's plan and builder's craft all in one. (Jacob, 1976, p. 254)

1 As Soraya de Chadarevian points out (1994), Brenner had taken a critical stance toward the use of the operon model for development as early as 1974 (see his comments in Brenner, 1974).

2 Or even of the appropriateness of the nomenclature. Waddington, e.g., noted not only that it "seems too early to decide whether all systems controlling gene-action systems have as their last link an influence which impinges on the gene itself," but also redescribed this system as "genotropic" rather than "genetic" in order "to indicate the site of action of the substances they are interested in." (1962, p. 23)


His direct inference is that "The order of a living organism therefore is based on the structure of a large molecule. ...Heredity functions like the memory of a computer." (p. 254) For Jacob, as indeed for anyone steeped in the metaphors of pre-molecular genetics, it is a short step from Schroedinger's metaphor of "law-code and executive power" to the new metaphors of "information and instruction" (or of "codes and programs"). He concludes:

Everything then leads one to regard the sequence contained in genetic material as a series of instructions specifying molecular structures, and hence the properties of the cell; to consider the plan of an organism as a message transmitted from generation to generation; to see the combinative system of the four chemical radicals as a system of numeration to the base four. In short, everything urges one to compare the logic of heredity to that of a computer. Rarely has a model suggested by a particular epoch proved to be more faithful. (Jacob, 1976, pp. 264-5)

Yet even Jacob recognizes a problem with the model. Towards the end of this treatise, designed to demonstrate that the DNA contains the program needed "to direct the synthesis of proteins and guide their organization" (p. 275), he confronts the problem head on. The genetic program is, he acknowledges, a rather peculiar kind of program: it is "a program which needs the products of its reading and execution to be read and executed." Indeed, he admits that "the genetic message can do nothing by itself.... Only the bacterium, the intact cell, can grow and reproduce, because only the cell possesses both the programme and the directions for use, the plans and the means of carrying them out." (p. 278) And with even greater clarity, he writes in the final chapter,

The only elements that can interpret the genetic message are the products of the message itself. The genetic text makes sense only for the structures it has itself determined. There is thus no longer a cause for reproduction, simply a cycle of events in which the role of each constituent is dependent on the others. (Jacob, 1976, p. 297)

Jacob's integrity, it would seem, has brought him to an impasse. Must he, with this recognition, give up on the picture of development as a linear process, beginning with fertilization and culminating in the mature organism, with the program for that process encoded in the genome? Well, no. His explicit acknowledgement of the circularity of the developmental process, here at the end of his text, is promptly transformed into that proverbial "but" which inevitably anticipates the "nevertheless" that follows: this seemingly conclusive acknowledgement immediately segues into a reassertion of the primacy of the genetic program, a program made special by the very fact that it is written in the linear text of the genome:

Organization could reproduce itself and living organisms emerge, because the complexity of structures in space happened to be generated by the simplicity of a linear combinative system ... (Jacob, 1976, p. 297, my italics)


In the end, Jacob's acknowledgement of a problem that turns out not to be one, by anticipating the inevitable objection, serves only to strengthen the force of the 'genetic program' as the fundamental explanatory concept; indeed, so strong is the concept that (at least by implication) it can even 'explain' its own rebuttal. By turning the logic itself into a circle, Jacob is able to invoke a linear causal scheme to explain a cycle of events that has no causal starting point.

Coda

Words matter. They shape the ways we think, and how we think shapes the ways we act. In particular, the use of the term 'genetic' to describe developmental instructions (or programs) encourages even the most careful of readers (as well as writers) to believe that only the DNA matters, and to lose sight of the fact that, if that term is to have any applicability at all, it should refer primarily to the entities upon which instructions directly or indirectly act, and not to those of which the instructions are constituted. The necessary dependency of genes on their cellular context, not simply for nutrients but for causal agency, is all too easily forgotten: in laboratory practice, in medical counseling, and in popular culture. To illustrate, let me close with a recent quote on the future prospects of molecular biology from Harvey Lodish at MIT:

It will [soon] be possible, by sequencing important regions of the mother's DNA, to infer important properties of the egg from which the person develops.... [The information that results] will be transferred to a supercomputer, together with information about the environment... The output will be a color movie in which the embryo develops into a fetus, is born, and then grows into an adult, explicitly depicting body size and shape and hair, skin, and eye color. Eventually the DNA sequence base will be expanded to cover genes important for traits such as speech and musical ability; the mother will be able to hear the embryo -- as an adult -- speak or sing. (Lodish, 1995)

REFERENCES

Apter, M. J. 1966. Cybernetics and Development. Oxford: Pergamon Press.

Apter, M. J. and Wolpert, L. 1965. Cybernetics and Development. J. Theor. Biol. 8:244-257.

Atlan, H. and Koppel, M. 1990. The Cellular Computer DNA: Program or Data. Bulletin of Math. Biol. 52(3):335-348.

Bonner, J. 1965. The Molecular Biology of Development. Oxford: OUP.

Brenner, S. 1974. New Directions in Molecular Biology. Nature 248:785-787.

Brenner, S., Dove, W., Herskowitz, I., and Thomas, R. 1990. Genes and Development: Molecular and Logical Themes. Genetics 126:479-486.

de Chadarevian, S. 1994. Development, Programs and Computers: Work on the Worm (1963-1988). Studies in History and Philosophy of Biological and Biomedical Sciences 29:81-105.


Emmeche, C. 1994. The Garden in the Machine. Princeton: Princeton University Press.

Fleischer, K. 1995. A Multiple-Mechanism Developmental Model for Defining Self-Organizing Geometric Structures. Ph.D. Thesis, CIT.

Gaudilliere, J.-P. Regulation of Information: The Rhetoric and Practice of Molecular Biology in and out of the Pasteur Institute. Unpubl. mss.

Jacob, F. 1976. The Logic of Life. N.Y.: Vanguard. (Orig. publ. in French, Editions Gallimard, 1970.)

Jacob, F. and Monod, J. 1961. Genetic Regulatory Mechanisms in the Synthesis of Proteins. Journal of Molecular Biology 3:318-56.

Kay, L. 1992. The Molecular Vision of Life. Oxford: OUP.

Keller, E. F. 1995. Refiguring Life. New York: Columbia Univ. Press.

Lewontin, R. 1992. The Dream of the Human Genome. New York Review of Books. May 28, pp. 31-40.

Lodish, H. 1995. Science 267:1609.

Markert, C. L. and Ursprung, H. 1971. Developmental Genetics. Englewood Cliffs, N.J.: Prentice-Hall.

Mayr, E. 1961. Cause and Effect in Biology. Science 134:1501-1506.

Monod, J. and Jacob, F. 1961. General Conclusions: Teleonomic Mechanisms in Cellular Metabolism, Growth, and Differentiation. Cold Spring Harbor Symposia on Quantitative Biology 26:389-401.

Moss, L. 1992. A Kernel of Truth? On the Reality of the Genetic Program. PSA: Proceedings of the Biennial Meeting of the Philosophy of Science Association 1:335-48.

Newman, S. A. 1988. Idealist Biology. Perspectives in Biology and Medicine 31(3):353-68.

Oyama, S. 1989. Ontogeny and the Central Dogma: Do We Need the Concept of Genetic Programming in Order to Have an Evolutionary Perspective? In Gunnar, M. R. and Thelen, E. (eds.) Systems and Development. Hillsdale, NJ: Lawrence Erlbaum Assoc. Pp. 1-34.

Perlman, R. L. 1996. Targeted Gene Disruptions Provide New Insights into Regulatory Mechanisms in Developing Organisms. Unpubl. mss.

Stent, Gunther S. 1985. Hermeneutics and the Analysis of Complex Biological Systems. In D. J. Depew and B. Weber (eds.) Evolution at a Crossroads. Cambridge, MA: MIT Press. Pp. 209-225

Waldrop, M. M. 1992. Complexity. N.Y.: Simon and Schuster.


Wolpert, L. 1968. Positional Information and the Spatial Pattern of Cellular Differentiation. Journal of Theoretical Biology 25:1-48.

Wolpert, L. 1969. Positional Information and the Spatial Pattern of Cellular Differentiation. Journal of Theoretical Biology 25:1-48.

Wolpert, L. and Lewis, J. H. 1975. Towards a Theory of Development. Federation Proceedings 34:14-20.

Comments by Thomas Fogle

Evelyn wishes to address two related questions. First, "When and how did the presumption built into the very term genetic program come to seem so self-evident?" Second, "What grants it its apparent explanatory force...?" She draws on the work of Atlan and Koppel, who point out that for many biological processes DNA can be considered as data, to show why the notion of a genetic program is inadequate. Through an analysis of the rhetoric of James Bonner and Francois Jacob, she unpacks inconsistencies residing in the central claims for a biological program. The conclusion that implicitly flows from the language of "programs" in genetics is that cells are directed by instructions from DNA. The model of a genetic program located in DNA, with its presumption of primacy, is ill suited to address the dynamics of cellular interactions. Evelyn effectively demonstrates the unsuccessful struggle by Bonner and Jacob to reconcile these difficulties adequately. She sees the "explanatory force" of a genetic program as misguided and certainly not self-evident. There is a considerable undercurrent of associated meanings to many terms in genetics, and it matters that we focus attention on what they mean and what they imply. The other question Evelyn addresses, why genetic programs seem so "self-evident", requires a broader vision. Tracking the origin and development of the concept of a biological "program" in genetics is an important task. The conceptual framework promoted by geneticists to abstract biological details into a coherent picture of living systems has had a powerful influence on social and political issues throughout this century. Evelyn wishes to remind us of the "dependency of genes on cellular context." The success of practitioners of molecular genetics in addressing problems of biological or biomedical interest reinforces the "genetic program" as the operating guide to understanding the process of development.
The quote from Harvey Lodish is a remarkable statement of faith in the ability of reductive science to reassemble empirical evidence into a completely comprehensive vision of the biology of living systems. When I came across that statement in Science, I was taken aback as well, but not at all surprised, for many biologists attempt to knit empirical outcomes together rhetorically to form a whole. Metaphors such as "genetic program" or "genetic blueprint" reflect this thinking. No self-respecting geneticist, such as Bonner or Jacob or Lodish, claims that DNA alone is required for a causal explanation of cellular processes. They simply leave the rest of the protoplasm in the background, a supporting character in the living drama of functionally integrated systems. Center stage is the protagonist, the DNA, manipulating and controlling cellular events.

But why is it this way? Why not promote the genome as an integrated, networked system of mutual causation between the genotype and cellular architecture? Why has the genetic system acquired such primacy? Within the genetics community, the linearity of causality was not always promoted in this fashion. I would like to step back historically and lay out a few additional points that relate to Evelyn's question about when this linear mode of thinking became self-evident.

Genetics and embryology quickly went their separate ways in the decades following the rediscovery of Mendel's work. Genetics focused on the search for genic units, their function, and, ultimately, their material construction. Embryology was influenced by the importance of cellular induction, cellular interactions, and the developmental fates of tissues and organ systems. Although it is true that the discipline of embryology had faded considerably by the 1950s, its intellectual successors, the developmental biologists, such as Paul Weiss and others, located causality simultaneously at many levels of biological organization. But their discipline, and their influence on elaborating a general model for development, did not cross-fertilize to a great extent with geneticists working in the 1950s and early 1960s. Each group extended the thrust of its own intellectual heritage, and the two heritages had begun to split many years before. The context for the tradition among geneticists, the ones responsible for the popularization of the genetic program, requires closer scrutiny.

Since its beginnings, genetics has always been very atomistic, but its interpretation of the causal contribution of genes to development evolved into a surprisingly balanced perspective. In the years immediately prior to the Watson and Crick model, the chromosome was seen as a fixed entity that changed only when something abnormal occurred, such as a gene mutation or chromosome rearrangement. The one-gene one-enzyme model provided a glimpse into the potential biochemistry of gene action. Genes individually expressed their products, and regulation of the genes themselves operated through the protoplasm back to the individual genes. The prevailing hypothesis was that the cellular environment acted on the activity of genes, rather than differential activation at different stages of differentiation. For example, Snyder in 1947 states:

...we may... assume that initial differences in the protoplasm of the egg (which are known to exist) may affect the activity of the genes, which in turn affect the protoplasm, thus initiating and perpetuating a series of reciprocal reactions by means of which gradual differentiation occurs. (p. 357)

Here we see a cycle of cause and effect involving both the gene and its cellular system, one that places genetic information in balance. With little molecular detail available to articulate a developmental mechanism, genes do not stand out in stark relief as the protagonist. The deterministic role of genes in producing traits may have been taken for granted by many, but their control, regulation, and biological context at the cellular level were viewed as interactive.

But things would soon change. In 1954, one year after the publication of the double helix model, Francis Crick published a paper in Scientific American that signals a shift in the balance toward DNA as causally primary. In a passage tucked into a detailed explanation of the chemistry of DNA, Crick rhetorically asks how DNA can exert its hereditary influence. His response is remarkable: "A genetic material must carry out two jobs: duplicate itself and control the development of the rest of the cell in a specific way." (p. 60) For Crick, the delicate causal circle is already beginning to look directional. He chooses to use the term "control", almost anticipating the Central Dogma model of information flow that would come a few years later. Two sentences later he explains that he suspects the DNA sequence "acts as a kind of genetic code." He then attempts to convey the magnitude of information packed in DNA as follows:

If we imagine that the pairs of bases correspond to the dots and dashes of the Morse code, there is enough DNA in a single cell of the human body to encode about 1,000 large textbooks. (p. 60)


Taken by itself, this sentence might be read simply as a statement of scale: DNA has the potential to support a great deal of information in its structure. But read together with Crick's previous statements that DNA "controls" development and that the code lies within DNA in a simple structure of dots and dashes, it suggests that the book is a metaphor for instructions, not just information. Crick had already outlined the framework for a genetic program by 1954.

Several years later, at a conference in London in 1957, Crick presents his views on the importance of genetic information, again hypothesizing a model: the Central Dogma. The take-home message is that genetic information flows from DNA to proteins but not in the reverse direction. Francois Jacob, who attended the talk, claims in his autobiography that this point made a strong impression on him. A year later, he would make a startling proposal of his own. While mulling over his research data on genetic regulation in bacteria and their viruses, he had a sudden moment of realization. The interpretation of his empirical data, which had proved refractory until then, became clear: the control of the phage and bacterial systems under his investigation could both be explained by regulation of the DNA itself. That DNA could have sites of self-regulation was an unusual proposal, because the genome was presumed to be static.

When the Jacob and Monod model of the lactose operon was published in 1961, its implications for gene regulation spread quickly through the genetics community. Part of what made it appealing was the presentation of the model as a "switching" diagram that resembled a diagram for an electrical circuit. The simplicity and clarity of the system made it easy to generalize. By the mid-1960s, a number of attempts had been made to model specific genetic systems. For example, the alpha and beta hemoglobin loci in eukaryotes were proposed to undergo coordinate regulation in a manner analogous to the lac operon. The operon was, in effect, the only detailed model of regulation at the level of gene action, and was therefore ripe for speculation about the general mechanism for gene regulation.

The genetic program metaphor took root along with the model, and by 1965 the genetics community, not just those steeped in molecular biology like Bonner, gave it considerable credence. One can see the seeds of transition in genetics texts published at this time. Srb, Owen, and Edgar's General Genetics was in its second edition in 1965 and had retained a very classical style from the first edition, published in 1952. In the introduction to the chapter on the role of genes in development, one can see vestiges of the pre-molecular style of thinking:

From the viewpoint of gene action in development, epigenesis must mean that different genes take effect at different times and places, and that their effects must depend in part on what has gone before. It all begins with the egg. (p. 353)

Here, selective gene expression is advanced, a step forward from the pre-molecular era. The sense of flow between the DNA and the cell system is maintained. But twelve pages later lies a subsection on the operon model that concludes:

The operon model suggests a way in which much of the orderly sequence of development, the 'turning on and off' of genes in space and time, could in fact be encoded in the genome, much like a computer program. (p. 365)

Toward the end of the 1960s, additional molecular details were in place that seemed to demand accounting in any gene regulatory scheme. The entire story of the colinearity of genes and proteins, and of the deciphering of the genetic code, had been worked out. This collection of molecular developments boosted the primacy of the gene and, with it, the need for a gene regulatory scheme to tie it all together. The genes of eukaryotes, unlike those of prokaryotes, are dispersed in the genome. To incorporate these unique features, Britten and Davidson (1969) proposed a speculative circuit diagram of their own, intended to outline a mode of regulation for eukaryotes that would parallel what the operon accomplished for prokaryotes. The web of possible interactions was noticeably greater than for the operon, fitting the assumption that the eukaryotic program was more complex.

The Britten and Davidson model was widely cited and, in conjunction with the growth of empirical studies in gene expression, gave renewed legitimacy to the genetic program metaphor. Perhaps this is where Evelyn and I differ. I see the progression of intellectual achievement from the start of the molecular era as building, contributing to, and steadily reinforcing the conceptualization of a program into the present day. Once the precursor for the genetic program was set by the Central Dogma, namely that information moves out from the DNA, the formal concept of a genetic program was an easy conceptual jump. The idea was reinforced widely because it fit the times, not because a few central players repeatedly claimed it to be true. Evelyn seems to be saying that Jacob himself is one of the central reasons that the program metaphor has such deep roots and long legs into modern times. I contend that little would have changed had Jacob never written The Logic of Life in 1970. His publication in 1961 had already planted the seed for the program metaphor; there was little need to water it nine years later, when it had already fully flowered.

As an interesting sidelight, Barbara McClintock instantly recognized the parallels between her own work with corn, already published in detail some years earlier, and the work on the bacterial operon. She even published a paper on the comparison in 1961, repeatedly describing each example as a "control system" and never using the word "program". She does note, however, that some basic means of control similar to those in corn and bacteria is likely to operate in higher organisms. One might justly ask why the operon story was overwhelmingly cited in the 1960s and beyond as the exemplar for how genetic regulation might occur in higher organisms while McClintock's system was not. There may be a number of answers to that question, but I suspect that part of the answer is that the operon system came at the right time, following on the heels of Crick's emphasis on information flowing out from DNA. And it probably should not be overlooked that the powerful imagery of the operon's circuit diagram grabbed considerable attention.

Surely one of the tasks that needs ongoing attention is the meaningful articulation of the role of genetic information in biological systems. Evelyn's work gives us a rationale for continuing that discussion and a clear articulation of the difficulties that plague attempts to provide a causal claim for the priority of genes. What is particularly valuable is Evelyn's demonstration that the program metaphor prevails despite the offhand caveats of geneticists that, of course, the cellular system matters too. As she states, words matter. And there is an urgent need to challenge some of the words bandied about these days.


REFERENCES

Britten, R. J. and Davidson, E. H. 1969. Gene Regulation for Higher Cells: A Theory. Science 165:349-357.

Crick, F. H. C. 1954. The Structure of the Hereditary Material. Scientific American 191(4):54-61.

Jacob, F. 1988. The Statue Within. English translation. Basic Books, Inc.: New York.

McClintock, B. 1961. Some parallels between gene control systems in maize and in bacteria. American Naturalist 95:265-277.

Srb, A. M., Owen, R. D. and Edgar, R. S. 1965. General Genetics. 2nd ed. San Francisco: W. H. Freeman and Company.


THE DISSOLUTION OF PROTEIN CODING GENES IN MOLECULAR BIOLOGY

Thomas Fogle

INTRODUCTION

The gene concept undergoes continuous transformation to accommodate novel structures and modes of action. A little more than a decade after the rediscovery of Mendel's work in 1900, new analytical strategies emerged for mapping loci in a linear array on a chromosome. During the 1940s, the one gene-one enzyme model connected genes to a specific cellular product, the precursor to the science of molecular genetics. In the years that followed, the gene underwent further change. First, the double helix model of DNA made famous by Watson and Crick revealed the physical structure for particulate inheritance. Later efforts clarified the biochemistry of gene expression. Today, in the era of genomic sequencing and intense effort to identify sites of expression, it seems self-evident that the element of heredity sought is a gene. Ironically, the sharper resolving power of modern investigative tools makes it less clear what, exactly, is meant by a molecular gene.

The legacies of particulate inheritance, of localization through mapping, and of the Central Dogma shape current perceptions of the gene. Although the empirical details are much elaborated today, molecular genes retain an imprint from the past. In a previous paper (Fogle 1990) I analyzed the difficulty with continued attempts to bridge the gap between the Mendelian gene as a "unit of inheritance" and molecular genetics. Textbook-style definitions strain to find coherence when they incorporate language from both eras. Generic definitions, and hence what I termed "generic" genes, lack internal consistency.

Here, I view the problem through a different lens. The identification of a molecular gene does not stem from definitions; it is a methodological process. Genes are recognized by formally or informally comparing elements of structure, expression, and function to those previously documented. The properties and physical elements of the molecular gene concept have broad social acceptance. For example, detection of an RNA product serves as strong evidence that a site of transcription, a gene, acts to generate the RNA. An RNA product is one of a collection of consensus features commonly found among well-described genes.

The criteria necessary to anoint new genes require research programs to adopt a community structure that places value on particular chemical states, events, and conditions while accepting considerable flexibility in how to apply them. Flexibility is essential because the large (and growing) array of molecular conjunctions prevents a strict application of rules for the molecular characterization of a gene. The need to bring a set of empirical results in line with other claims for genes forces research programs to emphasize different features in different situations or for different purposes. Molecular genes, then, are best understood as a general pattern of biochemical architecture and process at regions that are actively transcribed, the product of ongoing consensus building for the molecular gene concept in the face of rapidly changing empirical evidence. Hence, I term this shared interpretation the "consensus" gene.

At present, there is strong momentum to absorb new molecular revelations into the consensus gene rather than to develop a more fine-grained description of molecular parts and processes. The problem is analogous to that of deciding when a related group of organisms should be clustered into one taxonomic group or splintered into several. The outcome, sometimes contentious, rests on the analysis of shared characters in relation to established taxa. The desired outcome is a widely accepted taxonomic solution that efficiently characterizes the biology of that group. In taxonomy, lumping different elements into a single taxon may impede deeper biological and evolutionary insights. Similarly, forcing diverse molecular phenomena into a single Procrustean bed, the gene, implies that genes have a universal construction. The gene as a molecular vehicle for causation is therefore an ambiguous referent. I explore the difficulties arising from the embrace of the consensus gene and discuss the heuristic limitations of the gene concept.

WHAT IS A MOLECULAR GENE?

My objective is to critically examine the contemporary view of the molecular gene. Molecular genes act through the production of RNA products, which may or may not be translated into polypeptides. Function and structure are inseparable. Even when genes are identified strictly from physical readouts of DNA sequences, functional significance is inferred by analogy to more fully characterized loci that have similar organizational motifs. For example, detection of a common promoter sequence known as a TATA box, a binding site for the proteins that recruit the enzyme necessary to initiate transcription, signifies a site of expression. By inference, the neighboring DNA sequence harbors the potential to code for one or more functional products. Hence, the TATA box is a structural component, a consensus feature, contained within a gene. A consensus sequence, in the parlance of molecular biology, is one that is highly conserved in evolution. Here, I use the term to mean one that is conserved in methodological renditions of molecular evidence.

It is important to point out that the consensus gene is an abstraction from molecular detail, a socially generated model of what a gene is supposed to be, formed through the expected parts and processes that empiricists associate with it. In fact, genes are identified by seeking a fit, using the empirical evidence at hand, against the backdrop of an idealized construct. The consensus gene is the claim for a gene made by appeal to a stereotypical type that has broad acceptance. The process allows different entities with shared properties to each be a gene.

A consensus gene, in its stereotypical format, is a localized segment of DNA that forms the transcribed region. Additional nucleotide strings (elements) may reside externally or internally with respect to the site of transcription. Elements are essential DNA domains for gene activation and regulation. Among other roles, such domains bind RNA polymerase, the enzyme that copies one of the two strands of DNA to form a complementary sequence of RNA. Eukaryotic cells can process the newly formed messenger RNA by cutting out internal sections known as introns. Most eukaryotic genes have introns, sometimes several dozen. Coding regions, termed exons, are spliced into a contiguous piece of mature RNA that is ready for translation into a polypeptide at a ribosome. Bordering the coding region, or open reading frame (ORF), of the mature RNA are an untranslated leader sequence at one end and a trailer sequence at the other. Start and stop codons flank the coding message. Many genes undergo alternative splicing, uniting combinatorial subsets of exons. Alternatively spliced messenger RNAs can encode different protein products.
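The combinatorial logic of alternative splicing described above can be sketched in a few lines of Python. This is a toy model resting on stated assumptions rather than a description of any real locus: the exon labels are hypothetical, the first and last exons are assumed always to be retained, and every ordered subset of the internal exons is treated as a possible mature RNA, whereas real splicing regulation permits only some combinations.

```python
from itertools import combinations


def splice_variants(exons):
    """Toy model of cassette-exon alternative splicing.

    Assumes the first and last exons are always retained and that any
    subset of the internal exons may be included, kept in genomic order.
    """
    first, *internal, last = exons
    variants = []
    for k in range(len(internal) + 1):
        # combinations() preserves input order, so exon order is kept
        for chosen in combinations(internal, k):
            variants.append((first, *chosen, last))
    return variants


if __name__ == "__main__":
    # Hypothetical four-exon locus in which E2 and E3 are optional
    for isoform in splice_variants(["E1", "E2", "E3", "E4"]):
        print("-".join(isoform))
```

With n optional internal exons the sketch yields 2^n mature transcripts, all from one transcribed region; the four-exon example prints E1-E4, E1-E2-E4, E1-E3-E4 and E1-E2-E3-E4.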

At first glance, it would seem easy to define a gene from structural and physiological properties. The consensus gene, a summary of the cellular route to expression, implies a high degree of uniformity among genes. However, no one description embodies all the molecular genes claimed by empiricists (see also Kitcher 1982; Falk 1986; Fogle 1990; Carlson 1991; Portin 1993). Therefore, it is impossible to retreat to abstraction about genes without masking the diversity within. The consensus gene is a framework, not a full elaboration of detail. To what extent does an outline of its principal components and interactions generalize?

An overview of problems with molecular characterization follows. I will show that the consensus mode of molecular biology struggles uncomfortably to unite disparate phenomena under one banner, the gene.

GENES AND THEIR PRODUCTS

Many molecular genes code for more than one polypeptide. One way this can occur is if the gene has sliding edges.

Some genes have two or more staggered promoter sites. Alternative promoters give rise to polypeptides that differ at their beginning but share the same end. The human dystrophin gene has at least seven promoters; each regulates expression in a tissue-specific manner, leading to production of polypeptides that vary markedly in size. Each variant promoter generates a unique product of expression. Each localizes and acts through the usual transcription and translation steps. The many products stem from one gene, not from a set of different genes that share many parts.

The relative location of the variant region does not change the outcome. The calcitonin-CGRP locus in mammals provides an example of multiple domains positioned at the other end of the locus. Alternative polyadenylation signals, short sequences of nucleotides that specify the position of the end of the transcript, give rise to multiple coding sequences. Like the dystrophin gene described above, the calcitonin-CGRP gene is a site of multiple gene products. In deference to the Mendelian tradition, there is strong resistance among geneticists to subdividing a region into multiple genes when the variant products share some functional relatedness and occupy a single site. Functional evaluation and localization are strongly embedded consensus features.

Despite the two different routes to multiple polypeptide products at one locus, the result is the same: strong emphasis on the localization of a transcription site allows fuzzy borders that alter internal coding sequences to be tolerated.

In addition to sliding edges on the transcript, multiple products can be formed by alternatively spliced messenger RNA molecules. Many examples of combinatorial splicing among subsets of exons are known (see Hodges and Bernstein 1994). The splicing pattern generates multiple polypeptides. Even when alternatively spliced products are functionally very different, the locus retains its status as one gene.

Despite their differences, these examples share much of their biochemistry. The relationship between DNA coding and polypeptide formation occurs through a recognizable and common set of events. The continuity of pattern and mode binds the production of many products under one linguistic construct, the gene. The embedded familiarity reinforces the central framework of the consensus gene.

The biochemistry of inside out genes (Tycowski, Shu and Steitz 1996) is remarkably unusual, subverting the basic tenet of the consensus gene that spliced exons contain coded information and introns are non-functional. The transcript of the U22 snoRNA host gene is processed to remove the introns (nine in the human form and ten in the mouse form) and splice the exons into a segment of mature RNA. Eight of the introns are functional RNA constituents of the nucleolus, a nuclear structure that participates in the assembly of the ribosome. The spliced RNA has no protein coding regions. Unlike in all other genes studied to date, the processed introns are functional and the spliced exons are not. The structure of the gene is otherwise unremarkable, having a promoter similar to those of ribosomal-protein genes. Inside out genes share some features both with transcription sites whose RNA products are not translated and with protein coding genes, whose transcripts are translated.

The inside out gene retains nearly all the structural and biochemical events of protein coding genes, except that its transcript does not enter into translation and it produces many types of functional RNA. It localizes and it produces functional products, but it is a different kind of gene, recognized through consensus features. The uniqueness of inside out genes widens the biochemical modes of expression attributed to the molecular gene. As the consensus gene accommodates new molecular events for protein coding genes, the biology of the gene becomes more obscure.

SOLUTIONS TO THE ONE LOCUS – MULTIPLE PRODUCT DILEMMA?

The molecular revelations from multiple products and biochemical novelties suggest two alternative solutions: either enlarge the constellation of biochemistry that belongs within a gene concept or propose narrower guidelines for genic ascription. Even prior to the discovery of inside out genes, there was no agreement in the literature on whether multiple functional products from a localized segment of DNA should be considered more than one gene. Lewin (1995 and earlier editions) argued that we can reverse the usual statement "one gene-one polypeptide" to "one polypeptide-one gene". He is emphatic in stating that these are 'overlapping' or 'alternative' genes.

Lewin's claim is a re-evaluation of the meaning of the gene, yet he is uncommitted to pursuing its implications or upsetting the current paradigm. The implications for the molecular genetics community are substantial. Taken at face value, Lewin's proposal would require a revision of the nomenclature system for thousands of loci as a consequence of his call for a more refined relationship between functionality and a locus. It would also profoundly influence estimates of gene number for humans and most other eukaryotes. Lewin discusses neither the methodological nor the ontological consequences. He is clearly ill at ease with the consensus gene, which readily accepts multiple products. I suspect that he is applying a bandage to a problem, one that he considers worthy of further reflection but does not take too seriously.

A more widely held perspective is that polypeptide "isoforms" originate from a single gene (for example, Strachan and Read 1996). For those cases in which polypeptides are very different, an indicator of functional divergence, some authors recommend subdividing a locus into separate genes (Alberts et al 1994). How different do the polypeptides have to be to split the locus into more than one gene? Molecular biologists do not quantitatively evaluate polypeptide divergence for this purpose. Like Lewin's call for gene splitting of alternatively spliced RNA products, the recommendation to discriminate types using the polypeptide and/or function is an ad hoc solution to situations that do not fit a one gene-one product model. The solution is offered more as a helpful suggestion than as a committed proposal for changing the gene concept.

Defining genes by working backwards from the polypeptide is a slippery venture. Many polypeptides undergo post-translational modifications into a functional form. Conventionally, genic identity correlates with the primary product of translation; post-translational changes in structure are secondary effects of cytoplasmic interactions with the polypeptide. If function becomes a dominant criterion for the task of mapping the locus, as Strachan and Read recommend, then translation is no longer critical as a boundary condition. This is not the intended consequence of the proposal. Their hope is to clarify the parameters for a gene. Instead, they expand the possible interpretations of the site of the gene locus. As we will see, the consensus gene is their salvation. By advancing the importance of function but imposing it as a tool for evaluating loci only when needed, they can effectively sidestep the problems that result if one hardens the rules and applies them to every case.

A variety of post-translational modifications have been documented. After translation, some polypeptides, particularly neuropeptides or hormones, are subdivided by proteolytic cleavage. These polyproteins are consistently regarded as products of one gene, whether they are cleaved into identical or divergent forms. For example, the locus for alpha factor, which regulates mating behavior in yeast (Fuller, Brake and Thorner 1986), yields a translated polypeptide that is clipped into four identical peptides. In contrast, a locus in silkworms produces five functionally distinct products (a diapause hormone, pheromone biosynthesis activating neuropeptide, and three other neuropeptides) cleaved from a 192-amino-acid precursor (Xu et al 1995). Each is an independent functional unit.

Cleavage of polyprotein precursors can form either identical polypeptide subunits or divergent forms. What makes function an important criterion for evaluating polypeptides formed through alternative splicing, but not those formed by post-translational modification? The consensus gene model explains why. Each case is a different and overlapping cluster of consensus components. Multiple polypeptides formed through either pre- or post-translational means derive from a common transcriptional and translational apparatus. Alternative splicing takes place after transcription but prior to translation, two tightly entrenched processes for the protein coding model of the gene. Function is the only means available for evaluating the number of genes. In contrast, post-translationally formed polyproteins lie beyond the physiological boundary of gene-associated biochemistry. Function fills a pragmatic need, to aid in genic identity. For polyproteins, function is of little importance because translation is a boundary condition that makes functional distinctions unnecessary. Both lumpers and splitters of gene loci draw the same sharp line in the sand: polypeptides formed directly from translation are qualitatively different from polypeptides that undergo post-translational modifications. Both views cling tightly to biochemical mechanisms to locate the gene. Function is not a universally important criterion; it gains importance against the backdrop of other elements of consensus (structural and/or biochemical).

The comparison between the loci encoding polyproteins and alternatively spliced products appears to generate a set of parameters for interpreting well-characterized sites of the consensus gene. Three properties with variable weight designate a molecular gene: localization to a transcript-generating segment of DNA; a physiological boundary that separates pre-translational (alternative splicing) from post-translational (polyprotein cleavage) activity; and an investigator-based assessment of functional divergence among products. But even this vague triangulation is insufficient. Translation does not cleanly divide the sources of variation into pre- and post-translational ones.

A many-to-one relationship between the molecular phenotype and the locus can also arise during translation. The mammalian locus governing S-adenosylmethionine decarboxylase (AdoMetDC) has two ORFs. The short one, located within the leader sequence of the larger AdoMetDC coding region, codes for a hexapeptide (Hill and Morris 1993). The hexapeptide down-regulates AdoMetDC translation in a tissue-specific fashion. The investigators avoid confusion about which is the 'real' gene by subordinating the smaller ORF as a regulatory element of AdoMetDC. Once again, the chosen rhetoric is consistent with the consensus view that readily accommodates novelties of form and process in the gene concept. In this instance it also forces the investigators to choose referents for the gene. That is, the AdoMetDC coding region could be semantically repositioned as the trailer sequence for the production of the hexapeptide. Hill and Morris select the AdoMetDC coding region as the gene because they place greater functional importance upon it.

Both polypeptides are primary products of translation and could be viewed as separate genes. The investigators seem to unpack the consensus gene as follows. The two polypeptide products originate from a common locus and a common transcript that, until translation, represents a common biochemistry of structure and action. The multiple polypeptide products, through their combined presence, effect one functional end. Regulatory regions exert a localized effect on translation. The larger polypeptide, a decarboxylase enzyme, is accorded the role of principal gene product through its effect on other aspects of cellular physiology.


In this instance, the genic claim is, in a classic sense, a unit of function. The mechanics of transcription and translation are sufficiently similar to those of other genes to lend additional support to the case. By appealing to the consensus gene model, Hill and Morris reconfigure the coding loci for two polypeptides into one gene. Many other similar cases exist.

The assignment of one gene for the CCAAT/enhancer binding protein (C/EBP) of vertebrates entails an even more careful choreography of semantics. One messenger RNA contains three ORFs. The long coding region (C/EBP) has an 18-nucleotide ORF located in its leader sequence. C/EBP has two translational start sites, used in a fixed ratio governed by the smaller ORF (Calkhoven et al 1994). When Calkhoven et al discuss the nucleotide sequence specific to one of the two overlapping ORFs, they choose the term "cistron", a unit of function. This allows them to avoid attaching a gene label to the coding regions of the three ORFs, thereby eliminating the need to state whether they are working with three genes or one. Their strategy is a scheme to fit the conditions to a consensus model. The outcome is to stuff the locus under an already bulging genic tent. Once again, function, against the backdrop of a common molecular biology, provides a serviceable means to that end. The interpretive rendering by Calkhoven et al and others demonstrates how context, a normative mode for the consensus gene, affects what is reported to be a gene, and how difficult it would be to develop an internally consistent systematic taxonomy for genes.

In this section I have attempted to show how multiple products from one locus seem to conspire to force arbitrary decisions about whether one or more genes are represented at a locus. The real problem is that there has been a steady creep of new genetic twists that must either be accepted as part of the structure and biology of the gene or abandoned in favor of an alternate description of molecular events. The reluctance to abandon the molecular gene, and instead to work around problems as they arise, erodes coherence. One might ask, when told of a newly discovered molecular gene: which one? The one that produces a single product? Multiple products? Multiple products that have very different functions, or functional isoforms? Multiple products formed during transcription, or processing, or translation?

GENES AND CODING

The translational assembly interprets the genetic code. After transcriptional processing removes introns and splices exons, messenger RNA is read in tandem triplets called codons. Each codon specifies an amino acid in the growing polypeptide. For some messenger RNAs there are other mechanisms of readout (Gesteland, Weiss and Atkins 1992). In some cases, the ribosomal assembly skips a base, shifting the reading frame. Or it jumps a stretch of as many as 50 nucleotides before continuing to read the RNA sequence. In other instances, the meaning of the code changes so that, say, a stop codon is read as an amino acid. A particular physiological state, not just the transcript itself, causes the translational change. Either form of recoding partially shifts informational specificity for the product into the cytoplasmic space, removed from its usual habitat in the nucleotide sequence of DNA.
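The triplet readout and the effect of skipping a base can be illustrated with a short Python sketch. It is a simplification made under explicit assumptions: the message is hypothetical, codons are only segmented and not translated into amino acids, and a frameshift or ribosomal hop is modeled simply as bypassing bases in the string being read, not as the physiological mechanism behind it. The function names are illustrative only.

```python
def codons(mrna, start=0):
    """Split an mRNA string into consecutive triplets beginning at `start`,
    dropping any incomplete codon at the end."""
    return [mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)]


def read_with_skip(mrna, at, n=1):
    """Toy model of recoding: read triplets as usual, but bypass `n` bases
    beginning at index `at`. n=1 mimics a +1 frameshift; a larger n mimics
    a ribosomal 'hop' over a stretch of the message."""
    return codons(mrna[:at] + mrna[at + n:])


if __name__ == "__main__":
    message = "AUGUUUCGAUAA"  # hypothetical message
    print(codons(message))             # ['AUG', 'UUU', 'CGA', 'UAA']
    print(read_with_skip(message, 6))  # ['AUG', 'UUU', 'GAU']
```

Downstream of the skipped base, every triplet differs from the unshifted reading although the message itself is unchanged; passing a larger `n` models a hop over a longer stretch.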

The translational machinery is often likened to a computer reading software. The metaphor is an ungainly one, with only superficial similarities. The DNA holds a master copy of information, which is copied and sent to the site of translation. The messenger RNA threads the sequence through the ribosome, which reads nucleotides in consecutive blocks of three, analogous to a stream of computer code read by hardware. With cellular recoding, the cytoplasm is rewriting the software program of the DNA. The cellular architecture itself contains an information coding ability that becomes apparent during translation.

The coding regions are not the only portion of the transcript that directs the form and function of the translated product. Leader and trailer sequences, which border the ORF of a transcript, are crucial in recruiting the RNA for translation (Sonenberg 1994) and also regulate the activation or rate of translation. Human and rat cells have an insulin-like growth factor gene (IGF-II) that produces two mature transcripts with an identical coding region and trailer sequence but leader sequences of different lengths (Nielsen and Christiansen 1995). The shorter transcript participates in protein synthesis, while the longer form complexes with protein to become a ribonucleoprotein particle. One functional product is translated; the other is not. The determinant is positioned outside the coding region, quite unlike most other genes. Alternative leader sequences, rather than the ORF, dictate the functional fate. The IGF-II locus produces qualitatively different functional products, an RNA and a polypeptide, that have a common transcript and DNA locus. For the IGF-II locus, function plays no part in genic enumeration.

The IGF-II locus demonstrates a different set of problems but a similar solution to identifying a gene. Radically different end products, polypeptide and RNA, necessitate that other physicochemical aspects compensate to weld them together as components of one gene. Moreover, the site controlling these products lies in the transcript outside the coding sequence. Both deviations from the standard molecular story occur at translation. Other consensus features are elevated in importance to overlook these shortcomings: localization and transcription take prominence. Except for the influence of the leader sequence on translation, the physical structure of the locus and the biochemistry of expression are unremarkable. IGF-II can be a normative gene by ignoring conflicts with the standard outline of translation. Distinctive features of IGF-II biology are hidden, blended into a consensus gene. Through abstraction, much has been lost. When told that IGF-II is a gene, one would have no way of knowing how peculiar it is.


The cellular biochemistry of gene structure and expression consists of a set of contingent statements substantially larger than molecular biologists such as Lewin, Strachan and Read, or Alberts et al seem to admit. Gene-splitters run the risk of having to map to a separate gene any post-transcriptional modification of RNA that alters the polypeptide product, any novel variation of translation, and any post-translational chemical modification of a polypeptide. Equally difficult to justify are conservative renditions of gene enumeration that de-emphasize function and read different physiological constructs of RNA or polypeptides as members of one gene. Either way, there is a price to pay. In trying to unambiguously resolve the properties of genes, we are caught in a tangle of inconsistencies. Practitioners of molecular research adapt to genetic quirks by picking and choosing among constraints for a partial fit to a consensus gene. The genes of research programs, as opposed to generic descriptions in texts, form a continuum of material forms and processes. There are no discrete functional packets or molecular mechanisms in the protoplasm to serve as guides for delimiting a gene.

It follows from this unsettling outcome that a molecular gene cannot be demarcated without at once specifying the temporal and spatial cyto-complex of the system. Accordingly, the dynamics of the system, more than just the sequence and a vague notion of function, characterize a locus. The route geneticists choose to move past this roadblock to gene enumeration, as we have seen, is to craft ad hoc solutions to subsets of problems (such as Lewin's one polypeptide-one gene proposal). In this way flexibility is maintained, and genic definitions should be read only as statements about common gene patterns, e.g., most genes have introns that are non-functional and most have exons that are functional.

One purpose of a flexible gene concept is to link molecular phenomena and Mendelian genes. Mendelian analysis does not depend on knowledge from molecular biology. The molecular biology of Mendelian units of inheritance reveals that Mendelian genes consist of multiple kinds of molecular genes. The gene model of the Mendelian era imagined genes as beads on a string. The beads coded for different phenotypes but looked the same, like a string of pearls on a necklace. Molecular analysis showed not only that the beads vary in size, shape and style, but also that they are sometimes hard to recognize at all.

Next we turn to the DNA segment, the physical locale for the gene itself.

GENES AND THEIR BORDERS

Many strategies exist for locating the residence for a gene. Figuring out the property lines is much harder.


In some cases, the introns removed in one splicing configuration are retained in the mature RNA of another. A non-coding intron becomes a coding exon, preempting any simple means of crafting a molecular referent for the physical structure of the gene through exons alone. The presence of sliding edges can also complicate the location of gene borders. Finally, the presence on the transcript of critical regulatory and functional elements that are not part of the polypeptide coding sequence suggests that the edges of the transcript should serve as landmarks for the beginning and end of the gene. Anything more refined creates conflicting rules for assigning gene edges. However, the transcript itself is regulated by neighboring domains that are arguably part of the physical locale of the gene.

By the 1970s, a locus of expression had been subdivided into a region of transcription that was "the molecular gene" and ancillary domains (or elements) that regulate expression. Beads were no longer a useful metaphor. The gene came to be seen as a DNA sequence whose edges were those of the transcript. New technologies developed since the 1970s led to the discovery of genic structures composed of interdependent DNA domains that spread through the zone of transcription and beyond. Moreover, genes were found that lie nested within the borders of other genes (usually in an intron) or overlap them on the opposite strand.

The consensus model and the "beads" of Mendelian genetics both symbolize units. The Mendelian version represents a more literal form of the unit, while the molecular version is orchestrated by choosing, on a case-by-case basis, those aspects of the consensus model that best fit. Function, for example, can be either a crucial component or irrelevant to establishing a gene. One might hope that the DNA segment, rather than expression products, would provide an easier handle for sorting out genes as units of structure. However, recent findings on the contributions of DNA elements are difficult to reconcile with a simple localization of the gene on DNA.

Many types of domains have been described (for example, silencers and enhancers); each is a short length of DNA that affects the timing or rate of transcription. Their position and number per gene are highly variable. The activation of protein coding genes is a stepwise series of interactions between proteins and multiple DNA elements upstream of the start of the transcribed region. An assembly of more than a dozen globular proteins attracts RNA polymerase to the promoter to initiate transcription. The interplay of multiple DNA elements and multiple proteins is a key regulatory mechanism for gene expression. Therefore, many recent descriptions of the molecular gene include DNA elements within the borders of the gene, even at the expense of clarity about limits and boundaries. Lodish et al (1995) state that a gene is the "entire DNA sequence necessary for the synthesis of a functional polypeptide or RNA molecule." Similarly, Alberts et al (1994) consider the gene to include the "entire functional unit, encompassing coding DNA sequences, noncoding regulatory DNA sequences, and introns." Note the juxtaposition of the Mendelian language of "functional unit" and the molecular language of "DNA sequences". The consensus gene is a struggle to hold on to the past while representing the present. Methodologically, domains are treated in much the same way as other aspects of the consensus model. Despite clear proposals either for or against the inclusion of elements as part of the gene, they are included or excluded as needed to justify the case.

A gene concept that includes all DNA domains connected to the function of the molecular gene is not meant to be taken literally. Surely the inclusiveness is not intended to mean that every gene encompasses the substantial fraction of the genome responsible for producing the transcriptional and translational apparatus. The claim being made is much more restricted and localized: the molecular gene produces a transcript together with other regional domains. Empirical limits complicate the proposal. The individual or synergistic effects of elements on expression, like a rheostat dialed to the lowest active setting, can produce transcriptional effects barely above a detectable level.

We are left with a sketchy framework for determining when a DNA segment is part of a gene. Elements are often judged to be part of a gene if they affect expression and act in a local manner. If one steps back and looks more broadly at a larger swath of the genome, the problems associated with elements become more evident.

The sharing of regulatory elements contributes to the problem of finding physical borders for genes. The beta globin gene cluster in humans produces five related polypeptides that form part of the hemoglobin protein. The locus control region (LCR), located upstream of all of the genes, regulates their expression in a developmental-stage-specific manner (Wood 1996). The LCR orchestrates both the timing and the rate of transcriptional activation. Embryonic tissue produces high levels of epsilon globin and low levels of beta, A-gamma, and G-gamma chains. Fetal cells have large quantities of A-gamma and G-gamma globin and small quantities of beta globin. By adulthood, a small amount of delta globin can be detected and beta globin production predominates. A genic model that includes regulatory sequences cannot exclude the LCR from its structure. It is clear from the literature, however, that the LCR is excluded. The LCR is presented as a separate domain, neither a component of any molecular gene nor a gene itself. For multiple local transcripts, like those of the beta globin cluster, regulatory elements are given responsibility for the functional coordination of globin production, separate in kind from any gene. Separating the LCR from the gene more accurately conveys the functional relationships among the domains comprising the cluster. It also contradicts definitions that embed domains within genes and reveals that the physical dimensions of genes are in dispute.

The various ways in which loci yield multiple products lead to disagreement about the number of units of activity at a locus, while DNA elements are inconsistently placed inside the physical borders of the gene. Neither the edges of the gene, its relationship to function, nor its biochemistry of expression is a constant that can aid the formulation of a finely characterized molecular gene. That genes do localize is an important part of the genic claim.

GENES AND THE LOCUS

Most expression systems operate in a local manner: a segment of DNA is transcribed into an RNA copy destined for translation into a polypeptide. Yet even this highly schematic picture does not always hold.

In trypanosomes (protozoan parasites) and in nematode worms, expression commonly requires trans-splicing a short and a long RNA transcribed from different regions of the genome. The spliced leader is at most a few dozen nucleotides long and is not part of the coding region for a polypeptide. Maroney et al (1995) find that trans-splicing in nematodes is essential for "translational efficiency," subordinating the smaller entity as a contributor to the effectiveness of the larger, coding RNA. The smaller locus forming the trans-spliced transcript does not have a protein-coding function, although this is not unusual: many "genes" are transcribed into RNAs that are never translated. Loci forming the leader have the hallmarks of gene structure and expression without the title; they are treated as a buttress for the integrity of their larger RNA companion.

Bacteria use an unusual system for degrading abnormal polypeptides when the end of the coding sequence is missing from the transcript. Two transcripts code for one translated product. The second RNA, transcribed elsewhere in the genome, joins the ribosomal complex and remains unbound to the first RNA, which contains the incomplete message. It attaches a linker amino acid and proceeds to code for an additional 10 amino acids from its own nucleotide sequence (Keiler, Waller and Sauer 1996). The 11-amino-acid tag signals the cell to dispose of the polypeptide. The authors consider each locus, independently transcribed yet jointly contributing to the translation of a single polypeptide, to be a separate gene.

The organizational and functional construction of the bacterial and trypanosome loci does not reveal why the former should be a two-gene system and the latter a one-gene system. Both trans-spliced RNA and the bacterial degradation system bring products from two loci together into the translational machinery. The bacterial degradation system and the trypanosome coding loci each contain a structural and a regulatory domain on the transcript. Each could be interpreted as one gene or two. For the bacterial degradation loci, the transcript is in two pieces, both of which are translated into one polypeptide. Either the single polypeptide product or the two transcripts could be used as the referent for genic identity. The same view holds for the trypanosome system, which fuses transcripts from two expressed loci.

Loci are not evaluated by appealing to codified rules. Experimenters address a specific scenario against the backdrop of a crudely outlined consensus gene. The conclusions drawn are not always consistent. The genic systems of bacteria and trypanosomes lead to opposite interpretations. For the former, the presentation feels more coherent when two loci are treated as two genes; for the latter, two loci represent one gene and a regulator.

Opposite conclusions are reached to give explanatory flow to the empirical evidence. Trans-spliced sections take on a parochial role as regulators of translation; the larger transcript of the two becomes the central object of inquiry, promoting its functional importance. The degradation system, on the other hand, serves a global function for the cell by purging abnormal polypeptides. Therefore both the RNA that marks the polypeptide for degradation and the RNA whose product needs to be degraded are treated as functionally independent, the products of two genes. Both mechanisms regulate aspects of the cellular system, but the loci serve different rhetorical ends.

GENES AND PSEUDOGENES

Gene ascriptions are a case-by-case interpretation of the consensus gene. The perceived significance of a locus to the cell can be critical to this process. When function is unknown, molecular biologists sometimes infer a contribution to the cellular system from contextual cues. Functional effect operates through expression, the formation of a transcript. Therefore the ability to be transcribed is a crucial property of a gene.

Pseudogenes are categorized separately from true genes. Alu sequences, for example, are short pseudogenes found in high copy number in the human genome. They have the signature of genes: many have one or more ORFs, and they can be transcribed into small amounts of RNA. The transcripts do not undergo translation. Alu structure, localization in DNA, and capacity for expression into RNA are consistent with the genic properties outlined earlier. What critically distinguishes pseudogenes from true genes is their low level of transcription. Real genes function, and a failure to transcribe at a normal level signifies a lack of potential for a functional effect on cellular activity.

The line between what is and what is not a gene can be thin. Pseudogenes point to a minimum level of expression as a necessary but not sufficient condition. In practice, there is an investigator-driven interpretation of functional impact through a measure of the expression level. Very low levels of transcription equate to low functional effect and a failure to fit the consensus gene model.

GENES AND GENE STATES

The chemical structure of DNA can act like a toggle switch, alternating between two states that influence gene expression. Methyl groups added to the cytosine bases of a string of nucleotides can activate or repress transcription. In some cases the methylation pattern, known as imprinting, is preferentially inherited through one sex, resulting in the maternal or paternal expression of specific loci. In others, the pattern of methylation changes between tissues or stages of development.

Methylation usually represses transcription. Bartolomei et al (1993) report a domain of methylation surrounding the transcribed region and promoter of the H19 gene, beyond which they did not detect repression. Imprinting of the H19 gene functions like a regulatory element, a domain often positioned within a gene. Methylation patterns differ from regulatory elements in two ways: they are not sequence-specific sites, and they can spread over the entire face of the locus, as in the case of the H19 region.

Two types of DNA loci are present: a locus of methylation and a locus of base sequence. Methylation is not included within the conventional genic architecture, which associates expression only with sequence variation of nucleotide strings. The temporal shifts between methylated and unmethylated states add a new dimension to the structural gene.

Methylation is not part of the gene concept, at least not in any usual sense. Temporal stability of DNA is strongly associated with hereditary factors. Conceivably, methylation could be absorbed into the consensus gene as another novelty from the expanding base of molecular knowledge. This seems misguided. A thoroughly elaborated set of physical domains and biochemical contingencies for genic specification is unlikely to be realistic. It seems much more likely that molecular genes will dissolve into an emphasis on configurations of domains as the operational units for structure, biochemical action, and functional relevance.

DNA action and function become meaningful in the context of a cellular system. Coding information in the DNA is necessary but insufficient for the operation of living systems. The mutual dependency of DNA and protoplasmic interactions bedevils a simplistic labeling scheme for expressed segments of hereditary information. The more molecular biology is unpacked, the greater the need to acknowledge the mutuality of the component parts, forcing arbitrary choices about the physical edges or the physiological properties of the gene. The consensus gene model presently in vogue performs an end run around mutual dependency by embracing a loose and changing confederation of properties to locate hereditary units. As a result, the research enterprise can successfully search for genes so long as there is no demand for a rigorous underpinning for their specification.

SEARCHING FOR GENES

Much molecular genetics research focuses on individual loci and does not depend on tightly matching them to a universal construct. Liberal applications of the gene concept weld research programs into a community dedicated to a common mission. For this purpose, the molecular gene is a useful instrumental tool.

Vague notions about genes do, however, have consequences. Talk of genes plays a major role in the intellectual advancement of evolutionary biology and organismal development. If loci depend on context for their structural and functional evaluation, then it is unclear how a fully realized, or at least richly detailed, theoretical presentation would be possible using genes as an explanatory device.

Coding information acts within a co-dependent cellular setting: localized sites of expression interact with other DNA domains and are contingent upon genomic composition. Here, the term genome means more than the collective set of molecular genes of the organism; it refers to the rich tapestry of DNA domains that weave a pattern of expression. Genetic information is layered within ordered, structured chromosomes. Genomic analysis is expanding rapidly and will unveil integration among domains positioned far apart. From this wider vantage point on the entirety of genetic information in an organism, domains of action and regulation connect locally and distantly positioned DNA loci into a functional network.

There are many examples, too numerous to document here, that demonstrate positional and contextual integration of function at all levels of genomic organization: coordinated regulation of gene families, loops of chromatin that regulate clusters of genes, distinctive sequence patterns within chromosome banding patterns, and regional functions at the tips and centromeres of whole chromosomes. How these levels of organization cooperatively orchestrate information has yet to be explored, largely because of experimental limitations. Cellular activity requires this hierarchy of genomic information in addition to each locally acting hereditary site (a gene, however defined). This is more than just a problem for the molecular taxonomy of genes; sites of interaction interpenetrate the genome at many levels, many of which are left on the sidelines when gene number is equated with genomes and genomes with information content.

A gene is a coarse parameter for genomic analysis. It may be possible to count the number of ORFs, the number of alternatively spliced products, or the number of loci producing primary transcripts, but it will not be possible to conduct an accurate gene count. The molecular gene in vogue today relies on a consensus framework, a highly plastic abstraction poorly suited for detailed genomic analysis of higher organisms.

The goal of the Human Genome Project is to find the 60,000-80,000 genes, a number based on three methods of estimation. Each method plucks some parameter from the consensus gene as a tool for estimation. But because the consensus gene is a fluid concept, the derived values are themselves a crude statement about the genome. With about 1% of the total DNA sequenced at the time of this writing, one can extrapolate from an average density of one gene per 20,000 bases to arrive at an estimate for the total number of such sites (70,000 genes). A second measure assesses the number of kinds of expressed RNAs (cloned as complementary DNA, or cDNA, copies) in different tissues to determine gene number (65,000 genes). Since no one tissue expresses all genes and, as we have seen, alternative splicing and other mechanisms can produce more RNAs than there are local expression sites, this too is a rough estimate. The third method relies on counting CpG islands: regions, often surrounding promoters, that have a higher density of adjacent C and G bases than the rest of the genome. Slightly more than half of all expressed loci have CpG islands. The counting method offers no indication of the number of gene products or their function. The resulting estimate for the number of genes (80,000) is consistent with the other methods. Collectively, the three methods suggest that somewhere between 65,000 and 80,000 loci in the human genome fit the loose standards of the consensus gene.

Reporting gene counts, particularly for the human genome, is more than an empirical exercise. It is intended as a measure of the information content essential for normal function. The thinking goes like this: genes are functional units; thousands of functional units are present; the expression of the phenotype is significantly affected by these thousands of units of genetic activity; the set of these genes tightly mirrors what is meant by a genetic contribution to the phenotype.

The failure to prescribe universal genic borders or events of expression calls into question the significance of gene counts for higher organisms. What new meaning would result from discovering that there are twice as many genes as thought, or half as many? On the other hand, knowing how many domains of a particular type are present (e.g., how many CpG islands) might be helpful as an indication of the cell-wide importance of a specific mode of molecular interaction. Mosaic architecture and activity among claimed genes greatly limit meaningful inference about information content, molecular activity, and functional effect. Suppose, for the moment, that a complete human DNA sequence were available. It would be possible to scan the genome through a computer search for the number of copies of particular domains and collections of domains, some of which might match those DNA strings currently recognized as genes. It would require many hair-splitting choices to reach a precise value for the number of genes using the consensus gene as a guide. And to what end? The real advantage of detailed genomic sequencing will be to make sense of the functional contribution of combinations of domains, not to label lots of loci as genes. Genes are a low resolution description of common and overlapping sets of consensus features. As valuable as these loci can be for reductionist evaluations of the genetic contribution to a trait, they limit the potential to integrate large-scale complexity within living systems. Genomic analysis will lead to further insight about the distribution of expression sites, their relationship to non-coding chromatin, the effects of chromosome architecture on RNA activity, and much more. With a genome-wide perspective, the intact organization takes on a larger and more important biological role than do genes alone.

The gene concept receives heavy attention in molecular genetics. The analysis of genomes, the next horizon in genetic research, will seek functional interconnections for the entire DNA readout irrespective of size, location, or match to a consensus gene model. "Gene" could become a quaint term of the past (at least in molecular biology circles) replaced by language that more accurately conveys relationships among domains contributing to phenotypic effects.

There is already an explosion of terminology to bypass shortcomings of the consensus gene. A locus subject to alternative splicing, for example, has been termed a "complex transcription unit". The LCR and globin genes have been described as a "holocomplex." Fogle (1990) proposed that a focus on domain sets for active transcription (DSATs) in their various configurations would be a more refined means to represent regions of expression. DSATs are an assemblage of the structural building blocks that could be used to construct a taxonomy for expression loci.

In a similar vein, Portin (1993) suggested a classification scheme of nine genic subtypes. Both proposals step away from the one-term-fits-all model pervasive today. Neither is a solution for sorting out the proper locale in the cytoplasm to map function to DNA domains, also a critical problem for the consensus gene model. Nor does either method prescribe a set of precise rules for delimiting the physical borders of genes. This is due to the inseparable intertwining of the genotype and the biochemical phenotype in causation. But that problem will always exist and should not deter attempts to seek a closer match between empirical outcomes and the explanatory units necessary to communicate them effectively.

The consensus gene, an eclectic mix of parts and biochemical operations, unites research programs under one umbrella. It simultaneously generates the false impression that the molecular gene is a coherent unit of expression in cellular systems. Genomic referencing, the development of systemic relationships among DNA domains, will integrate molecular genetics into biology more fully than the molecular gene alone. Advances in understanding the regulation and expression of DNA, and the current interest in large-scale sequencing, will necessarily supplant much of the attention currently bestowed on molecular genes.

REFERENCES

Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., and Watson, J. D. 1994. Molecular Biology of the Cell. New York: Garland.

Bartolomei, M. S., Webber, A. L., Brunkow, M. E., and Tilghman, S. M. 1993. Epigenetic mechanisms underlying the imprinting of the mouse H19 gene. Genes and Development 7:1663-1673.

Calkhoven, C. F., Bouwman, P. R., Snippe, L., and Ab, G. 1994. Translation start site multiplicity of the CCAAT/enhancer binding protein alpha mRNA is dictated by a small 5' open reading frame. Nucleic Acids Research 22:5540-5547.

Carlson, E. A. 1991. Defining the gene: An evolving concept. American Journal of Human Genetics 49:475-487.

Falk, R. 1986. What is a gene? Studies in the History and Philosophy of Science 17:133-173.

Fogle, T. 1990. Are genes units of inheritance? Biology and Philosophy 5:349-371.

Fuller, R., Brake, A., and Thorner, J. 1986. The Saccharomyces cerevisiae KEX2 gene, required for processing prepro-α-factor, encodes a calcium-dependent endopeptidase that cleaves after lys-arg and arg-arg sequences. In Leive, L. (ed.) Microbiology. Washington: American Society for Microbiology. Pp. 273-278.

Gesteland, R. F., Weiss, R. B., and Atkins, J. F. 1992. Recoding: Reprogrammed genetic decoding. Science 257:1640-1641.

Hill, J. R., and Morris, D. R. 1993. Cell-specific translational regulation of S-adenosylmethionine decarboxylase mRNA. Dependence on translation and coding capacity of the cis-acting upstream open reading frame. Journal of Biological Chemistry 268:726-731.

Hodges, D., and Bernstein, S. I. 1994. Genetic and biochemical analysis of alternative splicing. Advances in Genetics 31:207-281.

Keiler, K. C., Waller, P. R. H., and Sauer, R. T. 1996. Role of a peptide tagging system in degradation of proteins synthesized from damaged messenger RNA. Science 271:990-993.

Kitcher, P. 1982. Genes. The British Journal for the Philosophy of Science 33:337-359.

Lewin, B. 1995. Genes V. New York: Oxford University Press.

Lodish, H., Baltimore, D., Berk, A., Zipursky, S. L., Matsudaira, P., and Darnell, J. 1995. Molecular Cell Biology. New York: W. H. Freeman.

Maroney, P. A., Danker, J. A., Darzynkiewicz, E., Laneve, R., and Nilsen, T. 1995. Most mRNAs in the nematode Ascaris lumbricoides are trans-spliced: a role for spliced leader addition in translational efficiency. RNA 1:714-723.

Nielsen, F. C., and Christiansen, J. 1995. Posttranscriptional regulation of insulin-like growth factor II mRNA. Scandinavian Journal of Clinical and Laboratory Investigation, Supplement 22:37-46.

Portin, P. 1993. The concept of the gene: Short history and present status. The Quarterly Review of Biology 68:173-223.

Sonenberg, N. 1994. mRNA translation: Influence of the 5' and 3' untranslated regions. Current Opinion in Genetics and Development 4:310-315.

Strachan, T., and Read, A. P. 1996. Human Molecular Genetics. New York: Wiley-Liss.

Tycowski, K. T., Shu, M.-D., and Steitz, J. A. 1996. A mammalian gene with introns instead of exons generating stable RNA products. Nature 379:464-466.

Wood, W. G. 1996. The complexities of globin gene regulation. Trends in Genetics 12:204-206.

Xu, W. H., Sato, Y., Ikeda, M., and Yamashita, O. 1995. Molecular characterization of the gene encoding the precursor protein of diapause hormone and pheromone biosynthesis activating neuropeptide (DH-PBAN) of the silkworm, Bombyx mori, and its distribution in some insects. Biochimica et Biophysica Acta 1261:83-89.

Comments by Fred Gifford

INTRODUCTION

Fogle argues that the great complexity of biological detail that we find in the genetic material, and the various, rather arbitrary ways in which scientists have come to deal with these facts, suggest that we should give up the gene concept at the molecular level.

Molecular biologists have used a general framework of what a gene is -- what Fogle calls the "consensus gene" -- that is based on protein-coding genes but lacks an explicit definition, and they have used it to conceptualize new discoveries. But the recent discoveries in molecular biology do not fit this model well, so continuing to think and act as if they do is inappropriate.

There are a number of different sorts of conclusions that might be drawn from such a situation. Certainly Fogle is saying that the traditional concept no longer holds up and that the intellectually honest and ultimately more fruitful response is to give up on the idea that there is some one thing that genes are. And he says that the gene turns out not to be a useful level of analysis. He does not couch his conclusion in terminology such as instrumentalism or metaphysical pluralism. His central concern is a methodological thesis -- a thesis about the direction of research.

Some of the claims with which he concludes his article are that "Genes are a low resolution description of common and overlapping sets of consensus features" and that "'Gene' could become a quaint term of the past (at least in molecular biology circles) replaced by language that more accurately conveys relationships among domains contributing to phenotypic effects." I believe he means his conclusion to be normative as well as descriptive. It is not simply a prediction (though it is that) but a normative thesis, I presume, about what it would be reasonable for investigators to do. Fogle gives a large amount of evidence for this thesis about genes and geneticists: cases where applying our normal definitions doesn't get the right (or a clear or consistent) answer, and cases where these difficulties are papered over by the investigators.

There are many more cases than I can begin to examine here, and I find the examples and much of the reasoning compelling. Thus I will simply describe a few of the cases and then move on to some more general comments. I will try to suggest various things we need to think about further as we consider how to interpret these examples, what general conclusions should be drawn from them, and what some responses might be.

FOGLE'S EXAMPLES

It is impressive and revealing that the complications in characterizing the molecular gene are of so many different types and arise at so many levels or points along the causal process: features at the DNA level, features concerning transcription, features concerning translation, etc. I will discuss just a few of them here.

Some of the facts blur or complicate the borders of genes. These include cases of overlapping genes, sometimes on opposite strands, as well as genes located inside the introns of another gene, and the sharing of regulatory elements.

Some of the cases involve a given segment of DNA coding not for one protein but for several different ones, depending on the circumstances. Coding can start or stop at different places, depending on, e.g., different promoter sites, and there is also alternative splicing of the RNAs. The latter can result in different combinations of exons, and indeed something's being an intron or exon is not a permanent or context-independent fact about a given stretch of DNA. Note that in this case, it's precisely the same location or stretch of coding DNA that has different polypeptide effects.

There are cases of cellular recoding -- both cases of alternative mechanisms for read-out (skipping one or a large number of bases and thus altering the reading frame), and cases where the meaning of the code changes. One feature of such cases is that informational specificity for the product is shifted outside the coding region and sometimes out of the genome and into the cytoplasm. Another is simply the difficulty of applying the view that a gene is a coding region for a given protein.

Such cases as these deviate from a sort of standard picture that we (and scientists) have of how genes work, or what genes are, and they deviate in so many different ways that it appears they cannot be put under a common view in a clear way.

Another of Fogle's claims is that the practitioners clearly feel uncomfortable with the situation, and he gives as evidence for this various ad hoc maneuvers that they engage in when confronted with hard cases. For example, as a case of translation "producing a many-to-one relationship between the molecular phenotype and the locus", he cites the example of AdoMetDC, where there are two different open reading frames, one inside the leader sequence of the other. One of them is chosen as "the gene"-- apparently the investigators feel compelled to answer the question -- but the grounds are ultimately arbitrary (a matter of which is viewed as having the more central function). Practitioners interpret the cases in such a way as to reduce the degree of conceptual oddity in the particular case (p. 75).

In another example, Lewin, facing the problem that a given localized segment of DNA can make different products in different contexts, takes an explicit position that we should just give up the one-gene-one-polypeptide view and simply say that these are different but overlapping genes (Lewin 1995). Significantly, while making this change in our concepts, Lewin does not pursue the consequences of such a move; he ignores the fact that it would wreak havoc with our nomenclature system and with our estimates of gene number (p. 73). Fogle cites this as evidence that scientists in this field do not concern themselves with generating an overall conceptually clear system, but merely concern themselves locally.

These few examples will have to suffice here to represent the broader story.

WHAT DO THE EXAMPLES SHOW?

Fogle's wide variety of examples would appear to be devastating to the gene concept. It does seem naive and dogmatic to assume that there is some one gene concept that will make sense of all this. One can certainly understand the idea that, while we may have assumed that there were simply genes out there to be found, the more closely we look, the less the old picture seems to apply. Thus we may be led to the view that we should discard the (molecular) gene concept, and say that there are no genes, only genetic material with a variety of patterns.

But let us look more carefully at how the argument is supposed to go and what some responses might be.

One source of hesitation arises from Rheinberger's suggestion that it is acceptable, and indeed even a virtue, for our scientific concepts to remain vague and fuzzy, and thus that we need not try to come up with a precise definition for our concepts (Rheinberger, this volume). Perhaps it is quite appropriate that we have generic, open definitions.

Further, even if we agree that increased fuzziness increases the prima facie reasonableness of abandoning a concept, it is important to see that the argument is incomplete. For to say that it counts as some reason (ceteris paribus) doesn't tell us how strong a reason it is, and whether it would be enough to outweigh what might be reasons on the other side.

This raises certain questions. First, how much difficulty, or how many conceptual problems, are needed to show that we should give up on the gene concept (or stop using the term, or decide that genes don't exist)? We might even frame this as a question about the criteria for determining the point at which we should give up the gene concept. Even if we don't put the question this way, we are led to see the importance of looking more carefully at how the argument is to go. There are a number of things to tease apart. One thing we need to consider further is the "consensus gene" idea, so as to clarify the nature of the problem.

Presumably the reasonableness of no longer relying on the molecular gene concept will be in part a function of there being some alternative, one that claims to solve the problems better. This leads us to the question of what the alternative is. What can or should we do instead, or what is it proposed that we do, and would that really be a preferable situation? Fogle describes two possibilities. One is to move to the entire genome, whether by engaging in "genomic referencing" ("the development of systemic relationships among DNA domains") (p. 87) or by something else. The other involves developing a taxonomy of more specific types of entities, something he had proposed earlier (Fogle 1990) and Portin had suggested (Portin 1993).

THE CONSENSUS GENE

One thing we need to do is elaborate on the consensus gene concept and on how it fits into the story Fogle tells about our present awkward situation.

Rather than having some definition (a set of necessary and sufficient conditions) for what a gene is -- for what they mean when they use the term 'gene' -- molecular biologists have in their minds (and embodied in their practice) something less strict than this. Fogle calls it the "consensus gene". He explains this by citing a number of statements describing consensus features of "protein-coding genes":

First, he says that "A consensus gene, in its stereotypical format, is a localized segment of DNA that forms the transcribed region." He then adds, "Additional nucleotide strings (elements) may reside externally or internally." (p. 71) Part of the description then concerns how transcription and translation occur. Indeed, Fogle goes on to describe the consensus gene as including the whole process by which genes are expressed.

This is said to be a "collection of consensus features" and "a methodological model". We (or molecular biologists) determine whether something is a gene by comparing certain features of that entity (elements of structure, expression and function) with the features that make up the "consensus gene".

So Fogle's claim is this: there is no clear and accepted criterion or definition of what counts as a gene. This is presumably because there couldn't be a definition that would in fact account for all the kinds of cases we are, as a matter of empirical fact, encountering as we learn more about the genetic material and the wide variety of complicated and integrated ways in which it is organized. The sorts of cases Fogle discusses appear to violate some of the properties of the traditional concept of the gene -- and perhaps of any single such concept.

Note that there are a number of features that make a single definition impossible, or that in any case conflict with at least some of the features held by biologists:

There are tensions or conceptual puzzles that arise from conflicts between different components of the consensus gene. These show that you cannot have a straightforward definition according to which entities count as genes by virtue of having all these properties. For instance, there are tensions between functional and structural ways of construing or individuating genes. There is also context-sensitivity, integration of different parts of the genome, and a lack of sharp distinctions between cases, all of which make it hard to apply definitions. Apparently there are only patterns, not universal generalizations.

Then there are the ways these facts are responded to by investigators. The response apparently has to involve some sort of pattern-matching or analogical reasoning. It is done on an ad hoc, case by case or local basis. Rather than appeal to some well-worked out, coherent model that applies in all cases, practitioners reason by analogy to other cases -- so that it's analogical rather than deductive (or principle-based) reasoning.

So Fogle argues that when scientists are confronted with a new system, they are required to make (somewhat arbitrary) choices about whether X is a gene, or which entity in the system is the gene. Since they cannot simply follow some clear explicit definition, they are forced to do something like reasoning by analogy from other cases that have already been labeled genes. This reasoning requires choices about what is more similar to the cases they have in mind, and these can only be local decisions that cannot be (or at least are not) made in an objective and precise manner.

One implication is that researchers from different fields might make these judgments differently. First, those with different backgrounds may see the similarity relationships differently. And second, those from different subfields will plausibly have different exemplar cases in mind.

It is important to stress that the decisions to be made require "judgment" rather than just rote application of an explicit definition. Fogle makes a claim at one point (p. 75) that is particularly useful in helping us see why: "Three properties with variable weight designate a molecular gene: localization to a transcript-generating segment of DNA, physiological boundaries located at pre-translational (alternative splicing) compared to post-translational (polyprotein cleavage) activity, and an investigator-based assessment of functional divergence among products."

Each of these three properties may be a bit imprecise, and so require some judgment. Significantly, the third one is directly dependent on the judgment of the investigator. More importantly, perhaps, there is apparently no way to specify how the three should be weighted, so, if these properties pull in different directions, different investigators may weight them differently due to their disciplinary backgrounds or other factors.
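The weighting problem can be made concrete with a toy sketch. It is not Fogle's procedure, and it is not a method any investigator is known to use. We simply assume that each of the three properties is scored on a hypothetical 0-to-1 scale according to how strongly it supports calling a candidate segment one gene. We further assume that an investigator accepts the label when the weighted average reaches an arbitrary threshold of 0.5. The class and function names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class CandidateSegment:
    """Hypothetical 0-1 scores: how strongly each of Fogle's three properties
    supports treating this DNA segment as a single gene."""
    transcript_localization: float  # localized to one transcript-generating segment
    boundary_placement: float       # where processing boundaries fall (pre- vs post-translational)
    functional_assessment: float    # investigator's judgment about divergence among products

def called_a_gene(seg: CandidateSegment,
                  weights: tuple[float, float, float],
                  threshold: float = 0.5) -> bool:
    """Weighted average of the three scores, compared against an arbitrary threshold."""
    scores = (seg.transcript_localization, seg.boundary_placement, seg.functional_assessment)
    weighted = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return weighted >= threshold

segment = CandidateSegment(transcript_localization=1.0,
                           boundary_placement=0.2,
                           functional_assessment=0.3)

structure_first = (3.0, 1.0, 1.0)  # investigator who stresses DNA localization
function_first = (1.0, 1.0, 3.0)   # investigator who stresses the functional assessment

print(called_a_gene(segment, structure_first))  # True  (weighted score 0.7)
print(called_a_gene(segment, function_first))   # False (weighted score 0.42)
```

The hypothetical segment is clearly localized to one transcript but scores poorly on the other two properties. It is called a gene under one weighting and not under the other, and nothing in the scores themselves settles which weighting is correct.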

A feature of the consensus gene that arises out of this is that it is social. Different members of the scientific community in fact have different such concepts; hence they cannot -- or at least have not -- come to agreement about just what it means.

Now, this feature might actually be seen as a way in which the consensus gene plays a positive role. The consensus gene binds different research programs together under one umbrella. On the face of it, it would seem to allow communication and cooperation in circumstances where scientists can't be expected to come to explicit agreement about the definition.

On the other hand, it might be said to have some bad effects as well. For instance, communication problems can arise when people only think they understand each other, so that they are living under a myth. Presumably Fogle takes these costs to outweigh the benefits. A full account would need to make this case. Without denying that Fogle's thesis is plausible, I think that this would be a sizable task. The matter is rather complex, itself requiring judgment (as well as empirical data) about the positive and negative influences the consensus gene has on communication and on the ability to make scientific progress. But in any case, this helps us see how to analyze the issue. (Note that, while Fogle does not elaborate on this here, the positive and negative effects at issue concern communication both (a) between different scientists and (b) between scientists and the public.)

SOME RESPONSES

With this in mind, let me consider further whether the facts of messiness, exceptions, and the lack of any way to unify all the phenomena under one concept require us to give up the gene concept, or make doing so our best option.

There are a variety of possible responses to the complicated situation Fogle illustrates, responses that might suggest that we do not need to take the step of abandoning the gene concept.

First, it might be asked whether this is really so different a situation from several others in biology. One obvious case (and one mentioned briefly by Fogle) is the species concept. The question of whether two organisms count as members of the same species is sometimes a difficult and controversial one. Different methodologists (with different explanatory and other scientific interests) have different views about what the criterion should be, such that there are disagreements over whether we should use, for instance, the biological species concept, the evolutionary species concept, or the ecological species concept (Kitcher 1984). Some might argue that species are not real, that the concept should be viewed instrumentally, but this is hardly a typical view. And there are not many who say that we should stop using the concept of species and simply describe the phenotypic variation that exists in other ways.

Of course, ultimately we would have to examine carefully various questions concerning any such examples: First, is holding on to the term or concept in those other cases in fact the right strategy? Or, for example, should the term 'species' be given up? Second, are the complications, or divergences in meaning, in fact as great in these other cases? They surely do seem quite severe in the case of 'species'. Or are there perhaps other sorts of differences? For example, are there different sorts of things at stake in those other cases?

A different sort of response to Fogle's challenge would be the following. Schaffner (1986, 1993) provides an account of theory structure in the biomedical sciences that might be of use here, for it suggests that this messier state is the norm and represents a perfectly good sort of theory structure. According to Schaffner, theories in the biomedical sciences, such as theories of the immune response or the operon theory of gene regulation, tend to be what he calls "theories of the middle range". Unlike certain of the general and mathematical theories in the physical sciences (or population genetics or the theory of the genetic code), which appear to have universal laws, they are best understood as "an overlapping series of interlevel temporal models of limited scope". They are typically presented via a description of a given model or system, a characterization of a set of causal entities that undergo a causal process. Further, this prototype or "exemplar" (sometimes corresponding to the "wild-type") stands at the center of a set of analogous models that vary from one another in certain features, each such model having a fairly narrow scope of direct application. This differs from the typical or traditional view of what a theory is (a view based on physical science), with a set of universal laws from which all else can be derived, but the claim is that this more complicated picture is in fact the norm in biology (and medicine), and that it does not represent a worrisome situation.


One of Schaffner's central examples of exemplars and variants is the operon model of gene regulation, which makes it seem likely that his account will be relevant to our purposes.

This may lend further credence to the idea that the messy situation concerning application of the gene concept is not unique. Of course, that by itself does not show that the situation is legitimate, or that it shouldn't be altered or worried about. And there might be some important differences between Schaffner's cases and the gene case. These things would need to be explored more fully -- perhaps by examining how explanation and methodology proceed within a framework such as the one Schaffner describes.

It does seem that it would show us that we have less reason to worry about practitioners' reasoning by analogy. That may be the way much scientific reasoning proceeds in any case, rather than a symptom of a dysfunctional situation.

Another way to address the normative issue would be along the lines of Rheinberger's claim that there are indeed positive virtues of having such a fuzzy concept. Indeed, it may be that Rheinberger's and Schaffner's accounts, both referring to sophisticated reasoning tools such as fuzzy logic, would dovetail.

ALTERNATIVES

With this in mind, let us consider briefly the question of what is to be done instead. As mentioned earlier, Fogle describes two kinds of proposals: moving all the way to the genome, and having a larger taxonomy of entities. With respect to the latter, he notes that over the course of evolution natural selection has created "a variety of stable formats to package and process hereditary information."

In response to this latter proposal, a variety of questions would have to be addressed. First, how many such entities will there be? Is there some clear objective way to do it? Or, putting it more practically, will we be able to come to agreement about this (any better than we can come up with a single definition for 'gene')?

Relatedly, might we not have basically the same situation afterward -- on a smaller scale? There may still be embarrassing questions about which cases really fit together as of the same type. The classes might well still be spread out and fuzzy. Inherent biological variation is of course a fact of, as it were, life. That might suggest that rather than assume that we're at the wrong level of description when there are all these exceptions, etc., perhaps we should accept or construct a different model of generalization and theory, allowing that there are genes, or these paradigm cases of genes, and then there are all these variants and complications.

I suggest that all of these things need to be explored more fully in order for us to see better what the implications are of Fogle's examples.

Note that taking seriously the overlapping-models picture of biological theories does not specifically argue against Fogle's (specific and comparative) argument that it will be more fruitful to take the "genomics" perspective. It really addresses the claim that the present situation is unacceptable because it is extremely messy.


Also, the overlapping models picture does not justify all of the behavior by scientists under the influence of the consensus gene concept -- the behavior Fogle criticizes. For instance, it would not condone the view that one had to choose one segment over another as "the gene" in the AdoMetDC case. What is legitimated is the theory structure based on exemplars and analogical reasoning.

REFERENCES

Fogle, T. 1990. Are genes units of inheritance? Biology and Philosophy 5: 349-371.

Kitcher, P. 1984. Species. Philosophy of Science 51: 308-333.

Lewin, B. 1995. Genes V. New York: Oxford University Press.

Portin, P. 1993. The concept of the gene: Short history and present status. The Quarterly Review of Biology 68: 173-223.

Schaffner, K. 1986. Exemplar reasoning about biological models and diseases: a relation between the philosophy of medicine and philosophy of science. The Journal of Medicine and Philosophy 11: 63-80.

Schaffner, K. 1993. Discovery and Explanation in Biology and Medicine. Chicago: University of Chicago Press.


A UNIFIED VIEW OF THE GENE, OR HOW TO OVERCOME REDUCTIONISM

Peter John Beurton

ABSTRACT

The conceptual history of the gene, beginning with its inference from the observation of Mendelian regularities in inheritance early this century and reaching an apex with the unravelling of its biochemical structure by Watson and Crick in 1953, seemed for a long time a standard example of successful reduction in the life sciences. However, since the 1960s new findings have begun to turn the gene into an increasingly problematic entity. As a result of the discovery of overlapping genes, introns and exons, alternative splicing, and the like, a conceptual crisis about the gene has arisen during the last twenty years. What originally looked like a particulate gene may be more or less scattered across parts of the genome and be temporarily "gathered together" by the "clever" genome. Because the genome seems to define the gene in various ways during development, genes are frequently treated as anonymous stretches of DNA. The experimenter, depending on how he chooses to manipulate the genome, makes use of these stretches in various ways and calls them genes according to his own ends. This has led to the widespread opinion that the gene is devoid of special reality or is just a word.

I contest this view and, against the present mainstream of thought, argue for a unified gene concept. I define the gene as the genetic underpinning of the smallest possible difference in adaptation which selection can detect. Differences in adaptation among individuals, by directing natural selection towards the genetic underpinning of such differences, are seen as instrumental in the formation of genes. Within this scheme, the empirical evidence pointing towards the disintegration of the gene may really turn out to be evidence in favor of the evolutionary stability of the gene. I interpret genes as having a history of coming-into-being by evolutionary mechanisms at work in populations, and this should also empty the problem of reductionism of all content.

1. THE RISE OF REDUCTIONISM

Most people consider matter as something fundamentally particulate. This is an ancient notion and has its origins with Democritus. Matter is taken to consist of "last particles", the unpacking of which will lead to the disclosure of the innermost secrets of this world. Though today such a body of thought is often criticized as reductionistic, over long periods of time it proved successful as a strategy in the sciences. When, in the second half of the last century, people became inspired by the prospect of empirically verifying such entities in the life sciences, they coined terms like "gemmules," "anlagen," "pangenes," or "determinants." The so-called rediscovery of Mendel's laws at the beginning of this century was taken by many as ultimate proof of the existence of such particles. They were inferred from the observation of independent assortment of characters, or particulate inheritance. Johannsen (1909) proposed the word gene as a new name. From then on it seemed possible to rest content with the knowledge that the Mendelian recurrence of characters was sufficient proof that some anlagen - genes - must be responsible for those characters. Johannsen and a large community of embryologists conceived of "genes" rather in terms of a holistically designed hereditary potential which was somehow secreted by the whole organism. Even so, the idea that all things are in a deep and important sense particulate was so pervasive and overriding that it gradually took hold of large parts of the scientific community.

Between 1910 and 1920 these developments received overwhelming support from the unravelling of the mechanics of heredity by Morgan and his group (Sturtevant, 1915). Nobody yet knew anything about the internal make-up and structure of genes when Muller (1922), theoretically the most gifted of Morgan's collaborators, came forth with two desiderata genes would have to satisfy to function as "ultra-microscopic particles" in the evolutionary context. First, a gene must be autocatalytic, that is, it must possess a structure which induces the production of copies of itself. Second, it must also be autocatalytic in the more refined sense that it produces copies of whatever mutations occur inside itself. A third characteristic occasionally mentioned by Muller was the gene's heterocatalytic potential of giving rise to all those products other than itself which materialize in the developing organism. Muller envisaged genes as compact little entities which, in virtue of these threefold capacities, would provide the essential key to understanding the nature of life. Among the most important biochemical contributions towards understanding the fine structure of those "ultra-microscopic particles" were the one gene-one enzyme hypothesis of the early '40s (Beadle and Tatum, 1941), the discovery of DNA as the sole material of which genes are composed (Avery et al., 1944), and finally the disclosure of gene structure by Watson and Crick (Watson and Crick, 1953). Genes were shown to be contiguous stretches of double-stranded polynucleotides forming an intertwined double-helical structure. This structure immediately suggested a mechanism of autocatalysis: the two strands separate, and each then functions as a template for the assembly of a new strand through complementary base pairing. Hence mutations (irregular substitutions, insertions, or losses of nucleotides) would also continue to be copied, thus satisfying Muller's second desideratum as well. By the early '60s the colinear relationship between such a DNA segment and the encoded polypeptide, as well as the mRNA that mediated this process, had been essentially elucidated by Nirenberg and Matthaei (1961), thus also providing insight into the heterocatalytic functioning of DNA.
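Muller's first two desiderata, and the way the double helix satisfies them, can be sketched as simple string operations. This is a deliberately crude toy model. Strands are plain strings, pairing is A-T and G-C, and strand orientation (5' to 3') is ignored. The function names and sequences are hypothetical. The sketch shows only that template copying reproduces whatever sequence the parent carries, including a substitution.

```python
# Watson-Crick pairing: A with T, G with C.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Base-by-base complement of a strand (orientation ignored in this toy model)."""
    return strand.translate(PAIRING)

def replicate(duplex: tuple[str, str]) -> list[tuple[str, str]]:
    """Separate the two strands and use each as a template for a new partner strand."""
    top, bottom = duplex
    return [(top, complement(top)),        # old top strand + newly made partner
            (complement(bottom), bottom)]  # newly made partner + old bottom strand

original = ("ATGGCC", complement("ATGGCC"))
mutant = ("ATGTCC", complement("ATGTCC"))  # hypothetical G->T substitution on both strands

print(replicate(original))
print(replicate(mutant))
```

Both daughters of the original duplex are identical to it, and both daughters of the mutant duplex carry the substitution. That is Muller's second desideratum in miniature. A change present on only one strand (a mismatch) would instead pass to just one daughter; the sketch does not model repair.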


Muller was enthusiastic about the vindication of his speculations: "Here at last was the down-to-earth, detailed chemical structure of the gene material..." and "it is evident what main features of this conformation give the material its virtually unique and truly fateful three faculties - those of 'reproducing itself' and its mutants, and of influencing other materials - the three faculties which, when in combination underlie the possibility of all biological evolution." (Muller, 1966, pp. 504 and 505). With this reduction of the major aspects of life to one kind of chemical substance, DNA, molecular genetics had reached its "heyday of precocious simplicity" (Rheinberger, this volume).

The case of the gene is often treated as a standard case of successful reduction. Indeed, why criticize reductionistic strategies if they lead to such wonderful results? However, the issue of reductionism is not whether genes are real things. It rather pertains to a particular way of viewing the relation between underlying particles (genes in the present case) and the objects (organisms) which are, in some sense, composed of these particles. Reductionist theories hold that organisms are totally determined by their genes below, without any back-flow. This demands some explanation. The strongest impetus for reductionism came from population genetics as it was invoked as the neo-Darwinian explanatory model for evolution in the 1920s and '30s. This was the time when mathematical calculations made it obvious that genes of only a very minute degree of superiority may spread by natural selection even through a large population within no more than a few hundred generations (Fisher, 1930). Such an insight made it overwhelmingly plausible that phenotypes played no role of their own in the evolutionary context. The evolution of species then consisted in nothing but an ever-repeated process of species-wide replacement of "not-quite-so-good" by slightly "better" genes, each adding an independent and minute evolutionary increment of its own to the whole process. Fisher, in fact, spoke of genes sweeping across the species population. I will frequently return to this dynamic (eventually giving it a different interpretation) and will call it the horizontal dimension of natural selection in populations.
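The arithmetic behind this claim can be checked with the simplest textbook model, which is our simplification rather than Fisher's own treatment. We assume a haploid population with no drift or mutation, and an allele with relative fitness 1 + s. Under the recursion p' = p(1 + s)/(1 + sp), the allele's odds p/(1 - p) are multiplied by exactly 1 + s each generation. The time to rise from frequency p0 to p1 is therefore the logarithm of the odds ratio divided by ln(1 + s). The function names are hypothetical.

```python
import math

def generations_to_spread(s: float, p0: float, p1: float) -> float:
    """Generations for an allele of relative fitness 1 + s to go from frequency p0 to p1.

    Deterministic haploid selection: the odds p / (1 - p) are multiplied by (1 + s)
    every generation, so the elapsed time is a ratio of logarithms.
    """
    def odds(p: float) -> float:
        return p / (1.0 - p)
    return math.log(odds(p1) / odds(p0)) / math.log1p(s)

def simulate_generations(s: float, p0: float, p1: float) -> int:
    """Iterate p' = p(1 + s) / (1 + s p) until p reaches p1; a check on the closed form."""
    p, t = p0, 0
    while p < p1:
        p = p * (1 + s) / (1 + s * p)
        t += 1
    return t

for s in (0.05, 0.01):
    print(s, round(generations_to_spread(s, 0.01, 0.99)), simulate_generations(s, 0.01, 0.99))
```

Under these assumptions, an allele with a 5 per cent advantage rises from 1 to 99 per cent frequency in roughly 190 generations. One with a 1 per cent advantage takes a little over 900. The loop and the closed form agree to within one generation, and because ln(1 + s) is close to s for small s, the time grows roughly in proportion to 1/s.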

Of course everybody knew that each individual organism passed through a complex life cycle and responded actively in unique ways to different situations in accordance with its more immediate needs. Nevertheless, in terms of evolution, whatever organisms did during their lifetime now seemed unimportant because it could not be inherited or, if heritable, was seen simply as an expression of what was there in the genes anyway. Also, there was a sense in which population geneticists knew that genes did not possess rigid selective values like inborn properties. Rather, a gene's future depended on how it interacted with many other genes and, ultimately, with the whole genetic and physiological background of all the individuals it came to inhabit. Genes may be positively selected in one genetic background but selected against in another. Hence it was possible to imagine how different kinds of harmonious, interactive genetic systems were built up by natural selection in which any one gene acquired a particular selective value only as part of the whole (Wright, 1931; Mayr, 1963). But the Fisherian perspective was permeated by a self-reinforcing argument that made its proponents feel quite immune to such reasoning. One can take for granted that there are always some genes which ultimately spread species-wide, because this seems the natural inference to be drawn from the fact of wholesale species change through time. Moreover, all those genes which finally make their way through the species population must have enjoyed an above-average selective value from the outset, for otherwise they would have disappeared early on. These assumptions, however, make it inevitable to conclude that gene interactions always cancel each other's effects in the long run. The net result is that some genes ultimately prevail as a consequence of their internal goodness in the first place, that is, independently of interaction. This is why Fisher felt himself in a strong position when totally discounting interaction as an evolutionarily important factor and relying solely on genes' additive effects for explaining the evolutionary process, though he did not ignore the bearing of interaction on a gene's selective value in any one situation. For the purposes of this paper I will call this argument Fisher's cogency argument.

As a consequence, just as evolution became defined in terms of changes in gene frequency, phenotypes were reduced to the passive expression of gene functions. This was the mode of evolutionary thought that gave rise to and accompanied Muller's quest for a gene structure that could satisfy the totality of auto- and heterocatalytic functions all in one. Another important reinforcement of this kind of evolutionary thinking that may be mentioned at this point was the central dogma of molecular biology, which stated that the transfer of genetic information is from nucleic acid to protein, but never the other way around. This dogma was formulated by Crick (1958) during his search for the heterocatalytic capacities of DNA. Neo-Darwinian evolutionary thinking and the discoveries of molecular biology went hand-in-glove.

Fisher's philosophy has been carried on, for example, by G. C. Williams. Though admitting that "it is [obviously] unrealistic to believe that a gene actually exists in its own world with no complications other than abstract selection coefficients and mutation rates," he sees grounds to continue like this:

No matter how functionally dependent a gene may be, and no matter how complicated its interactions with other genes and environmental factors, it must always be true that a given gene substitution will have an arithmetic mean effect on fitness in any population. One allele can always be regarded as having a certain selection coefficient relative to another at the same locus at any given point in time. Such coefficients are numbers that can be treated algebraically, and conclusions inferred for one locus can be iterated over all loci. Adaptation can thus be attributed to the effect of selection acting independently at each locus. (Williams, 1966, pp. 56-57)
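Williams' step from interaction to a single selection coefficient per allele can be illustrated with a hypothetical two-locus example. We assume haploid genotypes and linkage equilibrium, so an allele at the first locus meets each background at that background's population frequency. Allele A is favored on background B and disfavored on background b; all fitness values are invented. Averaged over backgrounds, A still has one marginal fitness and hence one selection coefficient relative to a, as Williams says. That number, however, depends on the current frequency of B.

```python
# Hypothetical haploid fitnesses for the four two-locus genotypes
# (allele at locus 1, allele at locus 2).
FITNESS = {
    ("A", "B"): 1.10,  # A helps on background B
    ("A", "b"): 0.90,  # ... but hurts on background b
    ("a", "B"): 1.00,
    ("a", "b"): 1.00,
}

def marginal_fitness(allele: str, freq_B: float) -> float:
    """Mean fitness of a locus-1 allele, averaged over locus-2 backgrounds
    (assumes linkage equilibrium)."""
    return freq_B * FITNESS[(allele, "B")] + (1.0 - freq_B) * FITNESS[(allele, "b")]

def selection_coefficient(freq_B: float) -> float:
    """Selection coefficient of A relative to a: w_A / w_a - 1."""
    return marginal_fitness("A", freq_B) / marginal_fitness("a", freq_B) - 1.0

for freq_B in (0.8, 0.5, 0.2):
    print(freq_B, round(selection_coefficient(freq_B), 3))
```

With B at frequency 0.8 this gives A a selection coefficient of about +0.06; with B at 0.2 it gives about -0.06. The mean effect exists at every moment, as Williams argues, but it changes sign as the genetic background changes. This is the context-dependence described above for interacting genes.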


This is Fisher's cogency argument, and this is what we should probably consider the respectable problem of reductionism. Richard Dawkins has become known as probably the most radical modern proponent of the reductionistic view of population genetics. He is often viewed as the person who has carried reductionism to the limits of absurdity. His catchword for the self-contained gene is "the selfish gene" (Dawkins, 1976). By using this term Dawkins wants to make it uncompromisingly clear that genes never reproduce "in the interest" of organisms and their features, but solely "in the interest" of themselves. Evolution, according to Dawkins, is a process in which alternative alleles battle with each other to maximize their own reproductive success. Genes are seen as replicators whose phenotypic effects render them successful at propagating themselves, while phenotypes are nothing but survival machines for genes; "a monkey is a machine which preserves genes up trees, a fish is a machine which preserves genes in the water" (Dawkins, 1976, p. 22). Dawkins' ruthlessly reductionistic position has met with severe criticism ever since he published his first major book on the subject in 1976. At best, he is supposed to have produced a caricature of (Gould, 1997). Nevertheless, I think it is fair to say that nobody has yet contradicted him on his own grounds. Though Fisher would not have used Dawkins' terminology, most of Dawkins' reasoning is implicit in Fisher's philosophy of additive genes. Dawkins is then no less respectable than Fisher.

It seems, then, as though population genetics per se is a reductionistic enterprise. Phenotypes are there for genes to compete, but in terms of evolution lead no life of their own. As Samuel Butler said, the hen is an egg's way of making more eggs. Surely something is wrong, but what? I want to suggest in this paper (notwithstanding the merits of numerous interactionist models that have been developed in the wake of during the last ten or fifteen years) that we should not only look at the interactions between ready-made genes but take a new look at the gene itself. In fact, it has been known already for a number of decades that "the gene" is no longer what it once was thought to be. Placed in proper perspective, this may provide a point of take-off towards a not quite conventional solution to the problem of reductionism in population genetics.

2. THE DISINTEGRATION OF THE GENE

Since the beginning of the 1960s new developments in molecular biology have gradually begun to call into question the triumphant story of the gene referred to at the beginning of this paper. At first nearly imperceptible, these developments finally led to a situation that may be described as the far-reaching disintegration of the gene. A somewhat premature indication that the gene concept per se could not do justice to the newly discovered molecular biological phenomena came in 1955. In that year Benzer, inspired by his successful analysis of the fine structure of a single gene in a bacteriophage of E. coli, proposed the terms "muton," "recon," and "cistron" to distinguish between the molecular biological units of mutation, recombination and function (Benzer, 1955; see Holmes, this volume).

Something of a watershed came with Jacob and Monod's (1961) model of gene regulation. Because the classical gene was inferred from the Mendelian trait it gave rise to, there always seemed to be an element of direct correspondence between genes and traits. But according to Jacob and Monod the general condition is one of "regulator genes" which control the expression of "structural genes." In one large class of cases they do so by giving rise to a protein that binds to a structural gene's controlling site (the operator), thereby regulating this gene's rate of transcription. Is one then well advised to treat such a regulator sequence as a gene in its own right? Because it produces a protein, it is said to have produced a phenotypic element. This terminology may hold as long as the protein can be assumed to be a placeholder for any one of those traits observed by Mendel or Johannsen. But if a regulator gene's "trait" is simply the effect it exerts on the expression of another gene's trait, it would probably be more appropriate to call it part of this other gene in the first place. However, the operator region of a structural gene often controls the transcription not of one but of several adjacent structural genes. Jacob and Monod called such an inclusive unit of structural genes the "operon." Hence, in terms of transcription, it is rather the whole operon that should be viewed as the equivalent of "the gene". Unfortunately, this will not do either, because in terms of translation these structural segments continue to behave as individual genes, each giving rise to a different protein. What, then, is the gene?
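The counting problem can be put into a toy model of a negatively regulated operon. The gene names follow the classic lac operon (operator lacO, structural genes lacZ, lacY and lacA), but the model is a hypothetical simplification, and the class and function names are invented. The repressor either occupies the operator or it does not. An unrepressed operon yields a single mRNA spanning all structural genes, and each structural gene on that mRNA yields its own protein.

```python
from dataclasses import dataclass

@dataclass
class Operon:
    """Toy operon: one operator controlling several adjacent structural genes."""
    operator: str
    structural_genes: list[str]

def transcribe(operon: Operon, repressor_bound: bool) -> list[list[str]]:
    """No mRNA if the repressor occupies the operator; otherwise one
    polycistronic mRNA covering every structural gene."""
    if repressor_bound:
        return []
    return [list(operon.structural_genes)]

def translate(mrnas: list[list[str]]) -> list[str]:
    """Each structural gene (cistron) on each mRNA gives its own protein product."""
    return [f"{gene} product" for mrna in mrnas for gene in mrna]

lac = Operon(operator="lacO", structural_genes=["lacZ", "lacY", "lacA"])

mrnas = transcribe(lac, repressor_bound=False)
print(len(mrnas), "transcript(s)")                  # unit of transcription: the operon
print(len(translate(mrnas)), "protein product(s)")  # units of translation: the structural genes
print(transcribe(lac, repressor_bound=True))        # repressed: no transcript at all
```

When the repressor is absent the operon yields one transcript but three proteins. Counting units by transcription therefore gives one "gene", while counting by translation gives three, which is the ambiguity described above.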

This is a retrospective view; at the time, the importance of the empirical discoveries about the regulation of gene activity far outweighed such uncertainties on the conceptual side. True, it became common practice among molecular biologists to substitute for the term "gene" phrases like "regulatory sequences that control expression" or "genetic elements that control coding sequences" (see the list of expressions collected by Falk, 1986, pp. 165-166). This, however, was probably not so much to avoid nagging conceptual issues as simply because such phrases are, at least at first sight, more informative to the molecular biologist. Jacob and Monod's model of gene regulation was of tremendous influence over the next decade. Accordingly, when repetitive DNA was discovered (which may consist of tens, hundreds, or even many thousands of repeats), the hypothesis was advanced in the wake of Jacob and Monod that variations in such sequences could allow the evolutionary exploration of new regulatory systems without destroying the old (Britten and Davidson, 1969, 1971). But, equally important from the present perspective, repetitive DNA delivered an independent argument for loosening the ties between underlying genetic structures and their expressions. For obvious reasons highly repetitive sequences tend to be non-coding, and so they hardly fulfil the criteria necessary for calling them genes. The classical gene from Muller through Watson and Crick had always been a discrete and coherent particle, identified by its location and function, that served as a unit of (mutable) autocatalysis as well as heterocatalysis. In contrast, repetitive DNA looked like junk arising from an unchecked surplus of copying activity in some parts of the hereditary machine (Doolittle and Sapienza, 1980). Another class of genes that may be mentioned here is pseudogenes (Jacq et al., 1977). Pseudogenes resemble active genes but are rendered nonfunctional, for instance, by mutations that affect transcription or translation. They may arise, for example, from unsuccessful gene duplication, either as a result of unequal crossing over or by retroposition, that is, reverse transcription of RNA intermediates into DNA.

Another discovery of the 1970s, with a still more obvious bearing on the conceptual problems of the gene, was that of overlapping genes (Barell et al., 1976). Two reading frames may overlap so that, in the area of overlap, a single DNA sequence gives rise to parts of two proteins. Instead of constituting a single gene, such a DNA sequence suddenly turns out to serve two different genes! But what, then, is a gene? Carrying speculation a little further, a gene begins to look like a dynamical function temporarily conferred on stretches of DNA by reading frames (and the mechanisms standing behind them) for the purpose of producing a protein. At any rate, no particular DNA sequence any longer seems, in any sense, uniquely responsible for a particular trait.

But even when genic properties are rigidly tied to DNA, a gene may lack a constant location. There are genetic elements (transposons) that move around actively in the genome, modifying the expression of adjacent genes. This discovery, made by Barbara McClintock in maize in the 1940s (McClintock, 1951), was so far ahead of its time that it met with considerable scepticism and was long treated as a curiosity. But during the '60s and '70s it was found that chromosomes may be littered with mobile elements and that such elements are widespread in both prokaryotes and eukaryotes.

The most important discovery, however, made simultaneously by numerous research teams during the second half of the '70s, was that of "genes-in-pieces". Rather than forming one contiguous stretch of coding DNA, a gene may consist of a mosaic of coding and noncoding sequences called "exons" and "introns", respectively (W. Gilbert, 1978). The primary transcript of such a gene goes through a process of "RNA splicing" in which the introns are excised and discarded while the remaining segments comprising the exons are spliced together, yielding a "mature" mRNA. Genes-in-pieces seem to be the rule rather than the exception in eukaryotes, and their discovery has led to a completely new understanding of the evolutionary dynamics of genes. It was fairly obvious from the beginning that a mutation affecting a splice site could radically change the pattern of such a gene; in addition, it was hypothesized that through unequal crossing over such genes could exchange exons in the course of phylogeny.


These ideas immediately found support in arguments from protein structure. Polypeptide chains often consist of two or more compactly folding globular units of semi-autonomous function called domains. Blake (1978) hypothesized that domains or subdomains of proteins correspond to exonic units of genes; notably, a few years earlier, cases of striking homology between domains of different proteins had become known (Rossmann et al., 1974). By the mid '80s the first instance had been recorded of the same exon occurring in different genes and thus coding for the same functional element in different proteins (Gilbert, 1985). Finally, it was found that exon shuffling is a process widely distributed among vertebrates. The important general insight was that exon shuffling during phylogeny could explosively increase the amount of protein diversity.

But if introns are defined by their silence, should they be called part of such a gene-in-pieces in the first place? Indeed, some authors restrict the term "gene" to all and only those DNA sequences which are, by some standard, functionally involved in synthesizing a gene product. Carlson (1991), for instance, proposes the "informational gene" as the more appropriate term because (unlike the term "cistron") this gene concept does not require physical integrity of the sequence in question; introns then cease to pose a problem because they can be excluded from the informational gene. Hence a view emerges of the nonlocal gene: a gene consists of a short series of isolated subunits dispersed across some stretch of DNA. The population geneticist Gale holds, in contrast, that introns "even if functionless, must be included, since a mutation upsetting splicing will destroy gene function" (Gale, 1990, p. 8). Carlson, too, adds on second thought: "They [the introns] can, of course, be decisive in providing mutations that can affect the way the exons are assembled" (Carlson, 1991, p. 478). The question then becomes: what are the standards by which such a non-coding DNA sequence may count as involved in gene function?

We have already met this situation with regulator genes. Carlson opts for putting operators, promoters, and upstream and downstream regulators into a separate bag, calling them "'accessory sequences' for gene processing," which is something slightly less than parts of genes, "because they are universal features of all (or many) genes and are not unique features of each gene" (Carlson, 1991). I do not quite see the point: may they not affect a single gene's function through a very specific mutation (as with introns)? Singer and Berg (1991), in contrast, call for the inclusion of all elements which influence the regulation of transcription, and so arrive at the following dilemma:

These regions are so varied in their structure, position, and function as to defy a simple inclusive name. Among them are enhancers and silencers, sequences that influence transcription initiation from a distance irrespective of their orientation relative to the transcription start site. (Singer and Berg, 1991, pp. 461-462)


Fogle concludes for similar reasons:

...it is as if the influence of the distal elements fades into the genetic horizon as one searches farther and farther upstream of the gene [and] a completely inclusive model is vacuous. (Fogle, 1990, p. 360)

It seems, then, as though what is called a single gene may not only occupy patchwise bits of DNA but also be scattered across a large part of the genome. W. Gilbert, again, speculated that gene splicing need not be one hundred per cent efficient, so that a single transcription unit may deliver a new product alongside the old one. The implications are similar: variations in the genetic material may lead to a situation in which "the extra material is scattered in the genome, to be called into action at any time" (Gilbert, 1978, p. 501). Genes somehow seem to fade away into the genome.

Gilbert's speculation has been confirmed and substantiated by the discovery of "alternative splicing" (for a review, see Smith et al., 1989). Alternative splicing is the most dramatic discovery to date in the context of genes-in-pieces. Not only do genes come in pieces, but during development alternative samples of exons from any one gene's primary RNA transcript may be pieced together to form a mature mRNA. A single gene, then, may yield varying assemblages of exons, or different protein isoforms, to meet the demands of the developmental stage in question. This condition is comparable in kind only with that of overlapping genes. In the case of most of the other difficulties, it was always possible to hold with some confidence that a gene, whatever it was and however fuzzy in appearance, was something underlying a transmissible trait, or else could not properly be called a gene. But overlapping genes and alternative splicing flatly contradict any one-to-one correspondence between a gene as part of the germ line and its presumed product (a polypeptide, a protein, or simply a trait).

Here are two characteristic summaries of evidence on the fluidity and ephemerality of the gene:

[The gene] is neither discrete - there are overlapping genes, nor continuous - there are introns within genes, nor does it have a constant location - there are transposons, nor a clearcut function - there are pseudogenes, not even constant sequences - there are 'consensus' sequences, nor definite borderlines - there are variable sequences both "upstream" and "downstream". (Falk, 1986, p. 169)

There are no universal borders or discrete functional packets in the protoplasm to serve as guides for delimiting a gene. ... a molecular gene lacks demarcation without at once specifying the temporal and spatial cyto-complex of the system. (Fogle, this workshop)

The situation is somewhat reminiscent of the theory of atoms after the turn of the century: the more molecular biologists learn about the structure and functioning of the gene, the less they know what a gene really is. What consequences shall we draw from this impressive scenario portrayed by Falk and Fogle? Of course, we must not be too narrow-minded: "It is important that geneticists recognize the many levels at which genes can be perceived, but it is not helpful to select one of these levels and arbitrarily designate that as the universal definition of a gene." (Carlson, 1991, p. 478). But Carlson's counsel may be taken as advocating a pluralistic approach to such problems. Pluralism, however, often dissolves basic problems without solving them. If primary gene transcripts are assembled into new functional units prior to translation, if, moreover, this reassembly may differ according to developmental stage, and if, finally, regulator sequences determine which segments are to be processed as a single gene in the first place, then the term "gene" becomes meaningless independently of the organizing activities carried out by the genome's regulatory apparatus. A gene may be scattered across the genome and then be "gathered together" by what may be called "the clever genome". The colinearity thesis is no longer helpful in defining what a gene is. Causation seems to run the other way around: a gene is a genome's way of making a protein, or the clever genome temporarily brings genes into being to lever itself into the next generation. The important question then becomes: how did the secret of the gene get hidden away in the genome?

These more recent findings of developmental genetics have prompted the view that the gene (the generic gene) is just a word. Genes, on this view, are really anonymous stretches of DNA which the experimenter, depending on how he chooses to manipulate the genome, makes use of in various ways and calls genes, that is, according to his own ends. There seems to be an element of primacy, or rather generality, about DNA strings. DNA strings seem more real, while genes are seen as more instrumental or operational and thus as lying closer to the epistemological end of our inquiry into nature. But if genes prove to be unreal, the whole problem of reductionism, which became so pressing in the population-genetic context, collapses: for how could an unreal entity turn selfish? However, this is not the kind of answer I am seeking.

3. A PROPOSAL FOR A NEW UNITARY CONCEPT OF THE GENE

In what follows I will argue, in spite of seeming evidence to the contrary, for a unitary concept of the gene. I will try to outline the story of genes in a setting which accords with the more recent findings of molecular biology without calling the reality of genes, or of the generic gene, into question. Indeed, I will do my best to put the recent molecular biological findings, which seem to furnish an argument in favor of the disintegration of genes, to the service of an argument in favor of their unity. However, in view of the widespread sentiment among molecular biologists that "the gene" is just a word, and that the search for "the gene" may even hamper molecular biological investigation (Rheinberger, this volume), I will adopt the procedure of not speaking positively of genes until I have shown how nature, in and of herself, turns DNA strings into discrete and well-established entities which deserve that name, that is, irrespective of any experimentalist's needs and the like.


In pursuing this strategy, I am free from the outset to talk of DNA strings, though not yet of genes, without begging the question. I am also free to talk of the differential reproduction of individuals due to underlying genetic differences. What, then, are these differences, if not differences of genes? They are caused, for instance, by mutations (minute irregular upsettings of nucleotide sequences) and, more significantly, by the reshuffling of chromosomes (e.g., by unequal crossing over and other rearrangements). Crossing over, for instance, does not respect the integrity of genes - or rather, imagine an initial situation in which crossing over finds no genes to disrespect. Both mutations and the shuffling of chromosomes may thus already be part of my discourse.

We have, then, an array of genetic variations inside the genome, but not of genes. How does this variation affect the individual during ontogeny, that is, in the context of developmental genetics? Owing to historical circumstances, scholars long focused their attention on mutations with clearcut somatic effects. The study of such mutations established the universality of Mendelian heredity, and in the evolutionary context these mutations were the most suitable for studying the effects of natural selection. But it is unclear to what extent this is the usual condition. According to the neutral theory of evolution (Kimura, 1968, 1983), mutations without sufficient somatic expression to be detected by selection might be the rule rather than the exception. The issue is still unsettled, and Maynard Smith's opinion still seems reasonable, that "there are features of the data which are consistent with the neutral theory, but nothing that compels one to accept it" (Maynard Smith, 1983, p. 713). At any rate, Kimura's emphasis was not on neutrality per se, but rather on an equivalence of function among alleles under certain conditions, while a change in conditions might well induce selection. According to Kimura (1983, pp. 271 and 307), mutations frequently have a latent potential for coming under the regime of selection. They may drift in and out of selective contexts.

Because this tendency of mutations towards neutrality might play an important role in evolution, the notion that a mutation by its very nature has a straightforward effect which can be detected by selection (like causing one of the alternative characters observed by Mendel or Johannsen) might be an unwarranted assumption. We can, however, probably assume the following scenario. Being neutral one by one, such mutations will accumulate across the generations until genomes are inhabited by all sorts of different neutral mutations. Though these variations are individually subliminal in their effects, it can nevertheless be assumed that, after sufficient accumulation and through their joint action, the genetic variations spread across any one individual's genome will eventually materialize into some net difference in development and in performance among individuals across their whole life cycle, sufficient for selection to detect. This is a minute difference in adaptation among individuals. It is reasonable to assume that in the beginning this difference is, in terms of phenotypic expression, very diffuse. It may be as diffuse as the scattering of these genetic variations across the genome. It will amount to some slight difference in overall performance among individuals rather than becoming manifest in an alternative expression of some well-demarcated trait. The emphasis at present, then, is on differences among individuals, not on traits of individuals (we will soon see how these two things hang together). As a net difference, however, it is sufficiently distinct for selection to detect and to qualify as a difference in adaptation; this is the important point. Note that we are already talking here in terms of adaptation and a genetic basis of adaptation, though not yet of genes.

This was the effect of genetic variations in the developmental context and its history. When we ask questions about selection, however, we are immediately confronted with the evolutionary context. Selection, by discriminating among the above-mentioned differences in adaptedness, causes differential reproduction. Differential reproduction of what? This has always been a somewhat tricky question (e.g., Keller, 1987). We often talk loosely of the differential reproduction of individuals. But in sexually reproducing organisms the genome is reshuffled in every generation. Individuals do not literally reproduce themselves; rather, two individuals of opposite sex always give rise to a new, unique individual. It is therefore more to the point to talk of an individual's differential reproductive contribution to the gene pool (or pool of DNA strings) of the generations to come. This, however, leaves unanswered our quest for the unit of differential reproduction. On the other hand, neither single nucleotides nor those rearrangements called mutations qualify as such units if we take for granted that their effects are too small to be detected by selection. Is it not true, though, that nucleotide upsettings have, at least in some cases, a sufficiently distinct phenotypic expression? Yes, when they qualify as an alteration of a gene's properties. Genes are the smallest units of selection, not mutations. Of genes, however, I must not speak yet.

I am, however, free to say that differential reproduction is a statistical bias among such arrays of alternative DNA variations as, through their joint action, produce a difference among individuals large enough for selection to detect. This, I claim, is the unit of differential reproduction we are seeking. Surely these variations, too, being spread across the whole genome, are subject to constant shuffling in the sexual process. May this not obliterate such a selective bias? Not really; all we need to assume for the formation of such a bias across the generations is that it works in favor of all those variations which, in very many different combinations, happen to support some such adaptation more often than other genetic variations do. This being so, we can also say that such a minute adaptive advantage will direct selection in favor of its own genetic underpinnings and cause discrimination across the generations among, say, tens of thousands of alternative variations scattered across whole DNA strings.
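The claim that reshuffling does not obliterate a selective bias among many individually slight variations can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not in the text and are not the author's model: haploid genomes with a fixed number of loci, a purely additive fitness contribution per variant (the hypothetical parameter `effect`), parents sampled in proportion to fitness, and free recombination, so that every offspring draws each locus at random from one of its two parents. All function names and parameter values are hypothetical.

```python
import random


def allele_frequencies(pop):
    """Frequency of the '1' variant at each locus of a haploid population."""
    n = len(pop)
    return [sum(genome[i] for genome in pop) / n for i in range(len(pop[0]))]


def simulate(n_ind=400, n_loci=40, effect=0.02, p0=0.1, generations=50, seed=1):
    """Toy haploid model (illustrative assumption, not the essay's own model).

    Each locus may carry a '1' variant that adds a tiny amount `effect` to
    fitness. Parents are sampled in proportion to fitness, and each offspring
    takes every locus from one of its two parents at random (free
    recombination), so genomes are reshuffled every generation.
    Returns one list of per-locus variant frequencies per generation."""
    rng = random.Random(seed)
    pop = [[1 if rng.random() < p0 else 0 for _ in range(n_loci)]
           for _ in range(n_ind)]
    history = [allele_frequencies(pop)]
    for _ in range(generations):
        # fitness depends only on the joint count of variants in a genome
        weights = [1.0 + effect * sum(genome) for genome in pop]
        parents = rng.choices(pop, weights=weights, k=2 * n_ind)
        # pair consecutive parents; each locus comes from either parent
        pop = [
            [a[i] if rng.random() < 0.5 else b[i] for i in range(n_loci)]
            for a, b in zip(parents[0::2], parents[1::2])
        ]
        history.append(allele_frequencies(pop))
    return history


if __name__ == "__main__":
    hist = simulate()
    mean_start = sum(hist[0]) / len(hist[0])
    mean_end = sum(hist[-1]) / len(hist[-1])
    print(f"mean variant frequency: {mean_start:.3f} -> {mean_end:.3f}")
```

Although no particular combination of variants survives the shuffling, the favoured variants tend to rise in frequency together, because each of them, in whatever genome it happens to sit, contributes to the fitness difference that selection detects.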


I suggest that such an array of nonlocalized DNA variations, whose reproduction comes to be controlled by some overall difference large enough for selection to detect, begins to qualify as a gene. This, I suggest, is the generic gene which has constantly been bedevilling us. Hence a gene need not be located in any one place. All those DNA variations which, due to this common guidance (because they all share in the production of one and the same difference), spread at the same rate in a population qualify as a single gene irrespective of location. It is this sameness of reproductive rate by which these DNA variations begin to meet the standard of being one single gene. Such a gene lacks physical discreteness but nevertheless gains a certain discreteness as the smallest unit of selection.
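The criterion that variants spreading "at the same rate" form one gene can be operationalized in a simple way. The sketch below is illustrative only and rests on assumptions beyond the text: a variant's reproductive rate is estimated as the average per-generation change in the log-odds of its population frequency (under simple haploid selection, logit p_t = logit p_0 + t ln(1+s)), and variants are grouped greedily when their rates agree within a hypothetical tolerance `tol`. The function names, the `trajectory` helper, and the variant labels are hypothetical, and frequencies must lie strictly between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a frequency p, assumed to satisfy 0 < p < 1."""
    return math.log(p / (1.0 - p))


def reproductive_rate(trajectory):
    """Average per-generation change in log-odds of a variant's frequency.

    Under simple haploid selection logit(p_t) = logit(p_0) + t * ln(1 + s),
    so this quantity estimates ln(1 + s)."""
    generations = len(trajectory) - 1
    return (logit(trajectory[-1]) - logit(trajectory[0])) / generations


def group_by_rate(trajectories, tol=0.005):
    """Group variant labels whose rates lie within `tol` of the first
    (lowest-rate) member of their group, after sorting by rate."""
    rated = sorted((reproductive_rate(t), label)
                   for label, t in trajectories.items())
    groups = []  # list of (rate of first member, [labels])
    for rate, label in rated:
        if groups and abs(rate - groups[-1][0]) <= tol:
            groups[-1][1].append(label)
        else:
            groups.append((rate, [label]))
    return [sorted(labels) for _, labels in groups]


def trajectory(p0, r, generations=30):
    """Hypothetical frequency path whose log-odds grow by r per generation."""
    return [p0 * math.exp(r * t) / (1 - p0 + p0 * math.exp(r * t))
            for t in range(generations + 1)]


if __name__ == "__main__":
    variants = {
        "site_A": trajectory(0.05, 0.020),
        "site_B": trajectory(0.20, 0.020),
        "site_C": trajectory(0.10, 0.021),
        "site_D": trajectory(0.10, 0.000),
    }
    print(group_by_rate(variants))
```

Here `site_A`, `site_B`, and `site_C` start at different frequencies but end up in one group because their log-odds grow at (nearly) the same rate, while the neutral `site_D` stands alone; location and starting frequency play no role in the grouping.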

Now if there is constant selection for such an array of DNA variations and if, moreover, there is continual crossing over and thus redistribution of DNA segments in the genome, then for reasons of economy those variations might gradually come to occupy a single location and acquire physical integrity, which is then an additional outcome of selection. I propose the following terminological distinction: the gene as a particle presupposes the existence of the gene as an entity, or unit of selection, which may or may not possess physical integrity. The physical integrity of the gene is, then, the result of the contiguous relocation of all those bits of DNA throughout the genome which happen to support some one minute adaptive difference among individuals. Further sophistications might be added, such as initiators and terminators, although from the generic point of view this need not be so. While in the developmental context an initiator and a terminator may define a gene, in the context of evolution in populations, selection, when "gathering together" a gene, may also bring such additional sophistications into being.

I am using the term "unit of selection" in a special sense. The claim is not simply that the gene is a unit encountered by natural selection, but rather that it is generated by natural selection from a background of never-ceasing variation contained in the genome. Besides neutral mutations, we need only think of those processes listed above as seeming evidence of the disintegrating gene. For instance, unsuccessful gene duplications by retroposition have been said to "keep the genome in flux ... they can be considered a shotgun approach of nature wherein the majority of these genetic elements are inactive and left to rot in the genomic soil. Nevertheless, some seeds will integrate near a fertile genomic environment giving rise ... to new genes or gene domains..." (Brosius, 1991, p. 753). It follows from these and similar processes that "[i]n principle, any string of nucleotides in a genome may be recruited as part of a novel coding region or regulatory element." (Brosius and Gould, 1992, p. 10708). I employ these statements here to support the idea that natural selection compounds genes by "gathering together" all those genetic variations from throughout the genome which support some one adaptive difference among individuals more effectively than alternative variations do. Now, behind natural selection stands differential reproduction; or, more precisely, natural selection is differential reproduction initiated by individuals. The more ultimate source of the coming into being of such a localized gene is, then, this difference among individuals in adaptive performance and reproductive contribution to future generations. Such adaptive differences, then, impose, in a process of downward causation, those distinctions within genomes which we call genes. The term "downward causation", originally introduced by Campbell (1974), acquires in the present context a far more incisive meaning than previously suggested. Of course, such an adaptive difference possesses its own genetic underpinning in the first place. But this does not make the argument circular, for the very reason that such a genetic underpinning is originally scattered across the whole genome. To avoid a tautology, it is most important to realize that this adaptive difference is in its original state genomic, not genic. To say that such an adaptive difference imposes those structures called genes is to say that overall genomic differences between individuals cause, via their phenotypic effects and the good offices of natural selection, the coming into being of genes inside individuals in a process of genomic individuation.

Difficulties in identifying genes or their locations have frequently led to the assertion that a gene is simply "what makes a difference" between any two individuals. I turn this around and say that a physical difference among organisms, when perpetuated through populations, is what makes a gene. Schrödinger (1944) was struck by the stability of genes because he saw them as instrumental to the stability of organisms. It is, however, the other way around: I take the differential reproduction of organisms to be the most important source of the coming into being of stable units called genes.

As a corollary, such a difference between individuals, when in the populational context it directs selection towards its own genetic underpinnings and ultimately produces a particulate gene, may also harden and become very distinct, because in the ontogenetic context it increasingly becomes an expression of that gene. It may take on the character of a well-demarcated trait occupying a particular locality defined by the spatio-temporal dimensions of organismic architecture. Such a trait, in the sense in which I am using the term here, is then the individualized outcome of some difference in overall performance among individuals (and may or may not also be well demarcated in terms of human perception). A difference in adaptive performance may then induce a hardening, or individuation, both of the underlying genomic difference into a gene and of the adaptive difference into a trait. Or, more to the point, selection involves from the very beginning the compounding of a minute developmental pathway (a minute ontogenetic trajectory). Such a minute pathway is then not the result of selection for the expression of an underlying gene, but the result of a long-selected, and therefore distinct, minute difference in ontogeny among individuals; figuratively speaking, this pathway turns at one end into a gene and at the other into a trait. The intriguing point, when viewed from this evolutionary perspective, is that none of the three components, the gene, the pathway, or the trait, comes first.


This pathway actually is the gene-and-its-trait, both being imposed in the evolutionary context by a difference among individuals in adaptive performance. The gene, though appearing in the developmental context as the cause of a minute developmental pathway, may also be a product of such a pathway in the evolutionary context. Genes may then be seen as the products of this kind of interaction between the development of individuals and selection in populations, or between ontogeny and phylogeny.

Moreover, I assume that such a minute genetic pathway might be extremely important for the "proper" functioning of mutations. We have seen how genetic variations may be compounded into genes as a result of natural selection. But genes, once they have come into being, may in turn have a profound effect on the status of newly occurring mutations. In the absence of genes, organisms would be extremely messy things. In such a state there is no reason to assume that a microscopic upsetting in the sequence of a few nucleotides causes, in and of itself, a well-defined macroscopic trait on the individual's surface. The metaphor of the blind watchmaker would be totally misleading in this context; such an upsetting would not even deserve the name of a mutation. But once a gene has fully come into existence, together with its pathway and well-defined trait, such a minor upsetting in nucleotide sequence will affect a long-selected, highly condensed, minute developmental pathway, inclusive of all the delicate machinery necessary for the precise execution of its functions. From then on the watchmaker analogy applies. Only to the extent that a gene causes a well-defined trait may a mutation of that gene cause a clearcut alteration of that trait. A slight upsetting in nucleotide sequence begins to qualify as a "proper" mutation only after a long history of gene evolution in which such a developmental pathway begins to function also as an amplifier of the effects of that upsetting. According to the view presented here, neither genes nor mutations "just happen"; rather, both are products of a long history of evolution.

What is the bearing of the gene concept proposed here on the more recent findings of molecular biology? Sociobiology has introduced such suggestive terms as the "battle between the sexes", "parent-offspring conflict" and even "intragenomic conflict". Whatever the merits of those terms, in the present context we may say that the various kinds of differences among individuals that exist in a population may, in the process of being turned into genic differences, or minute ontogenetic pathways, inside individuals, compete for partly the same arrays of DNA variations and may finally reach a compromise. The outcome would be overlapping genes. Different developmental stages may do the same; the outcome would be alternative splicing. Would it then be too sweeping to claim that the assumed cleverness of the genome is simply a brute recapitulation of phylogeny by ontogeny? Genes are not created by the genome; rather, the genome is to some extent a condensed history of what has been going on in populations. Thus the genome may recapitulate the partitioning of DNA strings into genes in various ways. This, then, would be an answer to the secret of how the gene got into the regulatory apparatus in the first place. And this, I would claim, is how the various forms of genes observed by the molecular biologist begin to make sense, or rather to make one common sense.

4. IMPLICATIONS FOR REDUCTIONISM AND THE AUTONOMY OF BIOLOGY

Genes are only in a very limited sense the "last particles" of which individuals and populations are composed. In another and more comprehensive sense, it is the other way around: genes are the product of the evolutionary dynamics in populations and are brought into being by downward causation. Downward causation may be viewed as the counterpart to the central dogma of molecular biology, which has long accompanied reductionistic evolutionary explanations. "Counterpart" is not quite the right word, because the two processes differ not only in direction but also very much in kind. The central dogma is molecular biological and was derived by Crick partly on the basis of stereochemical arguments (see Crick's retrospect, 1970). Downward causation, in contrast, is neither molecular biological nor physiological or the like, but an across-population process. It is a genuinely population-level process, not reducible to activities going on inside individuals taken one by one, but nevertheless taking place inside individuals-hanging-together in the evolutionary context of populations. Only in the context of the selective spreading of genetic material in populations does the justification arise to speak of adaptive differences materializing into genes rather than the other way around. This horizontal dimension in populations is crucial for understanding downward causation and how populations bring genes into being.

This view of the horizontal dimension of selection also suggests an answer to the question asked earlier: by which standards do a gene's accessory sequences count as involved in gene function and thus as part of the gene? An accessory sequence is part and parcel of the gene in question as long as genetic variation in this sequence affects the gene's phenotypic expression in such a way that the sequence forms part of the same unit of selection. Looking at single individuals, functional interdependencies fade away into the genomic horizon, and there is simply no yardstick by which to define what belongs functionally to a single gene. However, the spreading effects of selection induce in their own right clearcut distinctions within genomes in terms of units of selection, or of function, that is, of genes. Let us have another look at Jacob and Monod's regulatory model. How does this model relate to the gene concept presented in this paper? Imagine a mutation occurring, for example, in the regulator sequence with the effect of increasing the rate of transcription initiation. This may lead to the synthesis of a larger quantity of one of the proteins in question, which in turn causes some diffuse difference in adaptive performance among individuals large enough for selection to detect. This adaptive difference then directs selection toward its own genetic underpinning. Because the mutation occurred in the regulator sequence, the whole regulator circuit, including the operon (together with the operator), will be affected as the unit of selection. But perhaps some other DNA variations, nearby or far away in the genome, also happen to contribute to the more efficient working of the mutant circuit as compared with the original, and therefore acquire the same specific reproductive rate. This whole unit, as defined by a specific reproductive rate, then qualifies as the gene, and the regulatory model building on this unit therefore as one of gene action.

Genes are the most elementary particles, or downward extensions, of life; they are life's atoms and qualify as such only by a specific reproductive rate. But might not something smaller than a gene serve as such a unit? The answer is that the gene, as the smallest unit of selection, is a self-defining object. Because it includes as many bits of DNA as are necessary to provide a genetic underpinning for a difference detectable by natural selection, no half or quarter gene can take its place; or, when one does, it is by definition a full gene. Mutations, of course, may radically affect a phenotype and differential reproduction, but they are not units of selection. (Mutations do not compete in populations for a site in genes; genes, however, may be said to compete in some sense in populations for a locus.) Accepting that natural selection is blind to anything below the gene, and accepting, moreover, Dobzhansky's (1973) dictum that nothing in biology makes sense except in the light of evolution, we conclude that mutations, which involve a reconfiguration among molecules too small for selection to detect, make sense in the eyes of natural selection solely as properties of genes. Remember that a mutation qualifies as a "proper" mutation only after a long history of evolution of the gene, that is, only when it begins to qualify as a genic property. Downward causation in the realm of biology, then, truly comes to an end at the level of genes and their properties, and this also ensures the autonomy of biology.

A contrasting view has been defended by Kenneth Schaffner for more than 20 years. This is his claim with respect to the Watson and Crick model of the gene:

The helices are held together by weak hydrogen bonds which are explicable by quantum mechanics, and the purines [and] pyrimidines - the nucleotide bases - are fully and completely characterizable in terms of physics and chemistry. (Schaffner, 1974, p. 128)

and more generally:

...however, ...since biological systems are thought by molecular biologists to be nothing but chemical systems, in the long run detailed investigations of such systems will be in full accord with the dictates suggested by the general reduction model. (Schaffner, 1974, p. 139)

This opinion arises because Schaffner, like most molecular biologists, looks only at processes going on inside single individuals, not at individuals-hanging-together in populations, and therefore sees only DNA and the like. My claim that genes come into being only in the context of the horizontal dimension in populations is, in turn, equivalent to saying that no piece of DNA possesses in and of itself the capacity to form a gene. A piece of DNA, being inherently chemical, may take on the function of a gene, and thus turn biological, by imposition from above, that is, by coming to qualify as a unit of selection. This is no inborn characteristic of DNA; rather, it arises from downward causation. Because Schaffner does not draw the distinction between DNA as a chemical agent and genes made of DNA, he is not saying anything definitely wrong. Put another way, however I comment, I will be talking past him. Either he has in mind only DNA as a chemical agent, in which case successful reduction is a triviality; or, should he really have genes in mind, he would be a thoroughgoing reductionist. But, I repeat, the distinction is not contained in his argument.

Finally, I return to Fisher's cogency argument, mentioned at the beginning, which seemed to be the ultimate source of all reductionism in population genetics. Sewall Wright's (1931, 1932, 1982) way of overcoming Fisher's cogency argument was to assume that a gene's selective value, owing to nonadaptive factors in local populations such as genetic drift, need not be determinate from the outset but, as it were, emerges from interactive processes in and between local demes throughout the species population. The way out suggested in this paper is more radical: not only are the selective values of genes emergent properties, but the genes themselves emerge from interactive processes in populations. Once it can be shown that genes are products of populations, reductionism is evidently emptied of all content.

Fisher's gene spreading served as the classical version of this horizontal dimension in populations. But Fisher admitted only genes as causal factors in the evolutionary context, not phenotypes, and so he could never have conceived of a process in which genes emerge from downward causation, even if he had wanted to. With Dawkins' (1976, 1982) reductionism the situation is more complex. To see why, we must look at how Dawkins defines the selfish gene, because this definition contains, at least tentatively, an important innovation. While Fisher takes genes for granted in his population genetic foundation of the evolutionary process, Dawkins takes Fisher's foundation of evolution for granted and infers from this context what it takes to be a gene. Taking Fisher's cogency argument as a premise (that the spreading of self-contained genes is the essence of the evolutionary process), he draws the conclusion that a specific rate of differential reproduction in fact defines when a stretch of DNA counts as a gene. He says, for instance, that we should look for "pieces of chromosome of indeterminate length which become more or less numerous than alternatives of exactly the same length" (Dawkins, 1982, p. 90). This is a version of the gene definition given by G. C. Williams: "In evolutionary theory, a gene could be defined as any hereditary information for which there is a favorable or unfavorable selection bias equal to several or many times its rate of endogenous change" (Williams, 1966, p. 25). It does not matter which particular piece of DNA undergoes differential reproduction; whichever does is by definition a gene. The interesting point in the present context is that such a view can be interpreted as a feedback from the horizontal dimension in populations to how Dawkins (and G. C. Williams) want genes to be understood. Does this include the coming into being of genes by downward causation? The question seems justified because, for Dawkins too, this horizontal dimension is sufficiently overriding that molecular biological cues are at best of secondary importance for what counts as a gene. Dawkins argues that such a gene "is just a length of chromosome, not physically differentiated from the rest of the chromosome in any way" (Dawkins, 1976, p. 30). It may cut across what the molecular biologist conventionally calls a gene. For instance, it may begin in the middle of one cistron and end in the middle of another. Nevertheless, this particular stretch of chromosome will actually qualify as a gene as soon as it undergoes differential reproduction (Dawkins, 1982, pp. 87-88).

Unfortunately, this insight is immediately blurred by the selfishness which he ascribes to the gene. Dawkins, of course, never saw genes as products of phenotypic effects, but solely as replicators which, by the goodness of their effects, lever themselves into future generations. I assume that his very reason for defining genes in terms of their changing frequencies was to safeguard his idea of the selfish gene from the danger of being contradicted by empirical fact. Had he resorted to cues from a gene's internal structure, it would always have been possible to invoke some empirical circumstances in which such a gene's reproductive rate is causally determined by factors other than itself (by local interaction, linkage disequilibrium, group selection, etc.). And by redefining the gene as that stretch of DNA which actually does change in frequency, he turns any potential criticism into a defence of the selfish gene. Dawkins, then, introduces the population genetic dimension only to uphold "selfishness" as the gene's distinctive property. Though his populational dimension is suggestive of how genes are brought into being, and though he uses this dimension as a device for defining what a gene is, he continues to view genes as the sole cause of their own change in frequency. This rules out any possibility of conceiving of downward causation and reunites Dawkins with Fisher. Genes are then unmoved movers; genes are things that "just are", he says (Dawkins, 1976, p. 25), and he thus portrays them as deeply ahistorical objects.

But neither has anybody else shown that genes have a deep history of coming into being in populations. May his critics, then, not inadvertently share ground with him in ways that outweigh their criticism? [...] emerge because Dawkins holds up a mirror more consistently than others. Be this as it may, if it can be shown that the real source of reductionism in evolutionary theory is that it has gone unnoticed how genes are brought into being through populations, then the implementation of such a program would turn Dawkins' view of the gene upside down, or perhaps the right way up.


CONCLUSIONS

What, then, is a gene? Arrays of accumulated genetic variations inside genomes, caused by repeated shuffling of DNA strings, make for some single difference in adaptedness during development. Various such overall differences among many individuals in turn induce selective discrimination, over the generations, within their genetic underpinnings, and such smallest units of selection we call genes. We need molecular biology not only for knowing a gene's internal structure, but also for being able to appreciate the holistic, or across-population, dimension of the gene. Matter is as much coherent and continuous as it is particulate. Thus the evolutionary dynamics of populations leads to the compartmentation of DNA strings into genes. This is a nonpluralistic yet, hopefully, comprehensive view of the gene. We do, then, have "last organic particles" called genes; yet there is something behind them: the whole world of individuals-interacting-in-populations. Genes are not the product of human abstraction; they become real in a process of material condensation taking place in populations of evolving organisms. And this also provides an answer to the long-standing problem of reductionism in population genetics.

REFERENCES

Avery, O. T., MacLeod, C. M., and McCarty, M. 1944. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Journal of Experimental Medicine 79 137-158

Barrell, B. G., Air, G. M., and Hutchison, C. A. 1976. Overlapping genes in bacteriophage φX174. Nature 264 34-41

Beadle, G. W. and Tatum, E. L. 1941. Genetic control of biochemical reactions in Neurospora. Proceedings of the National Academy of Sciences USA 27 499-506

Benzer, S. 1955. Fine structure of a genetic region in bacteriophage. Proceedings of the National Academy of Sciences USA 41 344-354

Blake, C. C. F. 1978. Do genes-in-pieces imply proteins-in-pieces? Nature 273 267

Britten, R. J. and Davidson, E. H. 1969. Gene regulation for higher cells: a theory. Science 165 349-357

Britten, R. J. and Davidson, E. H. 1971. Repetitive and non-repetitive DNA sequences: a speculation on the origin of evolutionary novelty. Quarterly Review of Biology 46 111-133

Brosius, J. 1991. Retroposons - seeds of evolution. Science 251 753


Brosius, J., and Gould, S. J. 1992. On "genomenclature": A comprehensive (and respectful) taxonomy for pseudogenes and other "junk DNA". Proceedings of the National Academy of Sciences USA 89 10706-10710

Campbell, D. T. 1974. "Downward causation" in hierarchically organized biological systems. In: Ayala, F. J. and Dobzhansky, T. (eds.), Studies in the Philosophy of Biology. London. Pp. 179-186

Carlson, E. A. 1991. Defining the gene: an evolving concept. American Journal of Human Genetics 49 475-487

Crick, F. H. C. 1958. On protein synthesis. Symposia of the Society for Experimental Biology 12 138-167

Crick, F. H. C. 1970. Central dogma of molecular biology. Nature 227 561-563

Dawkins, R. 1976. The Selfish Gene. Oxford: Oxford University Press

Dawkins, R. 1982. The Extended Phenotype. Oxford: W.H. Freeman and Company

Dobzhansky, T. 1973. Nothing in biology makes sense except in the light of evolution. American Biology Teacher 35 125-129

Doolittle, W. F. and Sapienza, C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature 284 601-603

Falk, R. 1986. What is a gene? Studies in History and Philosophy of Science 17 133-173

Fisher, R. A. 1930. The Genetical Theory of Natural Selection. Oxford: Clarendon

Fogle, T. 1990. Are genes the units of inheritance? Biology and Philosophy 5 349-371

Gale, J. S. 1990. Theoretical Population Genetics. London: Unwin Hyman

Gilbert, S. F., Opitz, J. M. and Raff, R. A. 1996. Resynthesizing evolutionary and developmental biology. Developmental Biology 173 357-372

Gilbert, W. 1978. Why genes in pieces? Nature 271 501

Gilbert, W. 1985. Genes-in-pieces revisited. Science 228 823-824

Jacob, F. and Monod, J. 1961. On the regulation of gene activity. Cold Spring Harbor Symposia on Quantitative Biology 26 193-211

Jacq, C., Miller, J. R., and Brownlee, G. G. 1977. A pseudogene structure in 5S DNA of Xenopus laevis. Cell 12 109-120

Johannsen, W. 1909. Elemente der exakten Erblichkeitslehre. Jena: Gustav Fischer

Keller, E.F. 1987. Reproduction and the central project in evolutionary theory. Biology and Philosophy 2 383-396


Kimura, M. 1968. Evolutionary rate at the molecular level. Nature 217 624-626

Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge: Cambridge UP

Maynard Smith, J. 1983. Staying neutral on evolution. Nature 306 713-714

Mayr, E. 1963. Animal Species and Evolution. Cambridge (Mass.): Belknap Press of Harvard University Press

McClintock, B. 1951. Chromosome organization and genic expression. Cold Spring Harbor Symposia on Quantitative Biology 16 13-47

Muller, H. J. 1922. Variation due to change in the individual gene. American Naturalist 56 32-50

Muller, H. J. 1966. The genetic material as the initiator and the organizing basis of life. American Naturalist 100 493-517

Nirenberg, M. W., Matthaei, J. H. 1961. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences of the United States of America 47 1588-1594

Rossmann, M. G., Moras, D., and Olsen, K.W. 1974. Chemical and biological evolution of a nucleotide-binding protein. Nature 250 194-199

Schaffner, K. 1974. The peripherality of reductionism in the development of molecular biology. Journal of the History of Biology 7 111-139

Schrödinger, E. 1944. What is life? Cambridge: Cambridge UP

Singer, M. and Berg, P. 1991. Genes and Genomes: A Changing Perspective. Mill Valley, CA: University Science Books

Smith, C. W. J., Patton, J. G., and Nadal-Ginard, B. 1989. Alternative splicing in the control of gene expression. Annual Review of Genetics 23 527-577

Sturtevant, A. H. 1915. The behavior of chromosomes as studied through linkage. Zeitschrift für induktive Abstammungs- und Vererbungslehre 13 234-287

Watson, J. D. and Crick, F. H. C. 1953. The molecular structure of nucleic acids. Nature 171 737-738

Williams, G. C. 1966. Adaptation and Natural Selection. Princeton: Princeton University Press

Wright, S. 1931. Evolution in Mendelian populations. Genetics 16 97-159

Wright, S. 1932. The roles of mutation, inbreeding, crossbreeding and selection in evolution. Proceedings of the Sixth International Congress of Genetics 1 356-366


Wright, S. 1982. The shifting balance theory and macroevolution. Annual Review of Genetics 16 1-19

Comments by James R. Griesemer

Beurton's goal is to resolve problems of genetic reductionism with a novel characterization of the gene. The problems arise from the disunification of the gene concept due to the many disintegrating discoveries of molecular and developmental genetics. His view of the gene is in line with, but more extreme than, those of others who define the gene in terms of the evolutionary process (Williams, Dawkins, Hull). The greatest novelty lies in Beurton's view that genes are not just "there" to be units of selection (or not), depending on whether they are germ-line (can be passed to offspring) and active (can influence their probability of being copied), as Dawkins (1982, p. 83) suggests. Instead, genes are actually generated by selection: "an array of nonlocalized DNA variations, whose reproduction comes to be controlled by some such overall difference large enough for selection to detect, begins to qualify as a gene" (p. 109). Selection acts as a force of Campbellian "downward causation," pulling together bits of genome that collectively confer a slight overall adaptive advantage on the organism whose genome it is. Prior to the operation of selection in a population there are neither genes nor traits. Adaptive genetic differences "materialize" into genes, which must be seen as products of evolution by selection rather than as primary causes. Downward causation converts genomic differences into genic ones. Reductionism is side-stepped as an ahistorical, structural solution to a historical, processual problem. The gene is reunified by rejecting a pluralism of structural gene concepts in favor of a process-based unity.

I concur with Beurton's adoption of a process-based view of the gene and his pursuit of unity without reductionism of the old structuralist sort. While I admire the philosophical cleverness of his solution, there are difficulties. I agree that genes are products of evolution rather than unmoved movers, but I do not see how his concept of the selective gene captures the phenomena geneticists are concerned with, and I am skeptical of the concept's coherence. Beurton wants to have his structuralist cake and eat it too: his gene is defined in terms of process but, once generated, it seems to have the qualities of a classical gene. In my view, selection is too restrictive a process to be the conceptual basis of the gene. Since other combinations of evolutionary processes can do to molecular bits of genome what selection does, whether selection generates (organizes) genes should be counted an empirical question.

As a species of evolutionary gene, Beurton's selective gene depends on variation: no variation, no gene. Genes come and go as variation comes and goes. David Hull (1988) has taught us not to like this sort of ontological wandering in his criticism of theories of units of selection in which selection wanders up and down levels of the biological hierarchy as variation comes and goes. The same dislike should apply to Beurton's gene.

Moreover, Beurton's selective gene concept is more restrictive than the evolutionary gene because variation must contribute to an overall fitness difference for selective genes to exist. Thus, even if there is differential reproduction in a population due to mutation and drift, this does not qualify the genomic bits determining the differential for gene status. Only selection can make a gene, in Beurton's view, so there cannot be neutral genes. Kimura's neutral theory of molecular evolution poses no problem, Beurton believes, because he does not view mutations per se as events happening to genes until after selection has created genes out of accumulated, mutated genomic elements that do contribute fitness differences. Beurton assumes mutations are individually mostly neutral (undetectable by selection where s < 1/Ne) but that their accumulation in the genome may nevertheless yield some net fitness effect which selection can work on to begin to create a gene (p. 108). Presumably this collective non-neutrality can only arise out of neutrality by gene (sic) interaction, but Beurton contrasts his theory with interactionist models (p. 101), so it is not clear how he thinks this gene generation process will work. Beurton's view assumes genetic gradualism: out of neutral or nearly neutral mutations, small overall fitness differences arise that are selected; the genomic bits determining these differences are gradually accumulated to become a gene when they are relocated in the genome as a hedge against recombination.
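The neutrality threshold invoked here can be made concrete with Kimura's diffusion approximation for the probability that a single new mutant eventually reaches fixation. The Python sketch below is illustrative only and rests on stated assumptions: a randomly mating diploid population, semidominant selection (mutant heterozygote fitness 1 + s, homozygote 1 + 2s), census size N, effective size Ne, and a starting frequency of 1/(2N). The function names are hypothetical. The cutoff |s| < 1/Ne follows the commentary's rule of thumb; other authors place it at 1/(2Ne) or similar.

```python
import math


def fixation_probability(s: float, n_e: int, n: int) -> float:
    """Kimura's diffusion approximation for the fixation probability of a
    single new mutant with selection coefficient s (assumed semidominant).

    u = (1 - exp(-4 Ne s p)) / (1 - exp(-4 Ne s)),  with p = 1 / (2N).
    Very large negative values of 4*Ne*s will overflow in this sketch.
    """
    p = 1.0 / (2 * n)
    if s == 0:
        # Strictly neutral: the fixation probability equals the initial frequency.
        return p
    # expm1(x) = exp(x) - 1, so (1 - exp(-a)) == -expm1(-a); the signs cancel
    # in the ratio, and expm1 stays accurate when 4*Ne*s is tiny.
    return math.expm1(-4 * n_e * s * p) / math.expm1(-4 * n_e * s)


def is_effectively_neutral(s: float, n_e: int) -> bool:
    """Rule of thumb taken from the commentary: selection is 'undetectable'
    when |s| < 1/Ne (the exact cutoff varies between authors)."""
    return abs(s) < 1.0 / n_e


if __name__ == "__main__":
    N = NE = 1000  # hypothetical population; census size equal to effective size
    neutral_u = 1.0 / (2 * N)
    for s in (0.0, 1e-5, 1e-3, 1e-2, -1e-3):
        u = fixation_probability(s, NE, N)
        print(f"s={s:+.0e}  effectively neutral={is_effectively_neutral(s, NE)}  "
              f"u={u:.3g}  u relative to neutral={u / neutral_u:.3g}")
```

With these hypothetical values, a mutant with s = 1e-5 fixes at almost exactly the neutral rate, whereas one with s = 1e-2 fixes roughly forty times more often; this is the sense in which selection "detects" the latter but not the former.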

Beurton's theory relies on a mistaken understanding of the neutral theory. Kimura does not claim that most mutations are neutral, or even that there is a tendency for this, but only that most of the mutations one observes segregating in any given snapshot investigation will be neutral, because mutations with selective advantages or disadvantages are quickly swept to fixation (or, if disadvantageous, to loss) by selection compared with neutral variation subject only to stochastic forces. There is no basis for the gradualist assumption built into Beurton's view. Single mutations can produce large adaptive differences (there are many known cases of point mutations that do this). If the rejoinder is that this is only because selection has already created many genes in which such mutations can appear, then Beurton's theory properly applies only to the origin of life, before genome organization took place – a significant restriction.

Moreover, forces other than selection can cause selective genes to come into and go out of existence, so even if the gene is defined Beurton's way, its ontology cannot be fully understood in terms of selection alone. Mutation and migration can resurrect genes by reintroducing adaptive variation into populations that have gone to selective fixation. At fixation there is no adaptive variation, so selective genes go out of existence, unless Beurton wants to have his structuralist cake, too, and claim that the genomic bits organized by selection in the past still count as genes after selection has ceased (they are still organized, after all). But if he does, then his genes face all the conceptual problems posed by the disintegrating arguments of molecular and developmental genetics, because they can only be analyzed as structural genes when we lack evidence of the historical selection forces that created them. The problem is worse still: if genes are defined as processual entities that exist only as, or while, the process operates, they cannot also be structures that survive the process. On the other hand, if their persistence does not depend on selection, then it is not obvious that their origination must either. Since selection is fuelled by variation and variation can run out as the result of its operation, selective genes are "dissipative structures" that must be continuously fuelled by variation or they cease to exist. Otherwise, it is hard to see how Beurton's concept is to be distinguished from the old-fashioned and problematic sort.

Perhaps such weirdness should be tolerated – no one said genes had to behave like ordinary macroscopic objects, even dissipative objects like flames and streams. Still, Beurton pays insufficient attention to the trait created along with his genes, overall or net fitness, which is quite peculiar. Moreover, the success of his gene concept depends on the empirical adequacy of the underlying models of the evolutionary process. If such models are incompatible with one another, then the gene concept is incompletely specified without also specifying an underlying model. In standard models, fitness is a special property that does not behave as ordinary traits do. If evolution were Fisherian, then selective genes would exist only while there is additive variance in fitness. When there are no adaptive differences, there is no selection. But mass selection uses up this additive variance. Hence selective genes go out of existence at Fisher's long-run equilibrium. At equilibrium there will still be "genetic" variance in fitness, but selection will produce no response because all the variance in fitness will be non-additive. The Fisherian world creates the process-structure dilemma for Beurton.
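The Fisherian point in this paragraph, that mass selection uses up additive variance in fitness while non-additive variance can persist at equilibrium, can be illustrated with a textbook one-locus, two-allele model. The Python sketch below is a toy under stated assumptions (random mating, constant genotype fitnesses, no mutation or drift); it is not Fisher's own multilocus formulation, and the fitness values and function names are hypothetical. Additive variance is computed as 2pqα², where α = p(w11 - w12) + q(w12 - w22) is the average effect of an allele substitution; the non-additive part is the total genotypic variance in fitness minus that term.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Constant fitnesses of the three genotypes at one diploid locus."""
    w11: float  # A1A1
    w12: float  # A1A2
    w22: float  # A2A2


def mean_fitness(loc: Locus, p: float) -> float:
    """Population mean fitness under Hardy-Weinberg genotype frequencies."""
    q = 1.0 - p
    return p * p * loc.w11 + 2 * p * q * loc.w12 + q * q * loc.w22


def variance_components(loc: Locus, p: float) -> tuple[float, float]:
    """Return (additive, non-additive) genetic variance in fitness at frequency p of A1."""
    q = 1.0 - p
    w_bar = mean_fitness(loc, p)
    total = (p * p * (loc.w11 - w_bar) ** 2
             + 2 * p * q * (loc.w12 - w_bar) ** 2
             + q * q * (loc.w22 - w_bar) ** 2)
    # Average effect of substituting A1 for A2.
    alpha = p * (loc.w11 - loc.w12) + q * (loc.w12 - loc.w22)
    additive = 2 * p * q * alpha ** 2
    return additive, total - additive


def next_p(loc: Locus, p: float) -> float:
    """Frequency of A1 after one generation of viability selection."""
    q = 1.0 - p
    return p * (p * loc.w11 + q * loc.w12) / mean_fitness(loc, p)


def run(loc: Locus, p: float, generations: int) -> tuple[float, float, float]:
    """Iterate selection and report (p, additive, non-additive) at the end."""
    for _ in range(generations):
        p = next_p(loc, p)
    additive, non_additive = variance_components(loc, p)
    return p, additive, non_additive


if __name__ == "__main__":
    cases = {
        "additive": Locus(1.10, 1.05, 1.00),      # no dominance
        "overdominant": Locus(0.90, 1.00, 0.80),  # heterozygote fittest
    }
    for name, loc in cases.items():
        start = variance_components(loc, 0.5)
        p, v_a, v_na = run(loc, 0.5, 2000)
        print(f"{name}: start V_A={start[0]:.2e}, V_nonadd={start[1]:.2e}; "
              f"after 2000 generations p={p:.4f}, V_A={v_a:.2e}, V_nonadd={v_na:.2e}")
```

In the additive case the additive variance is positive at the start and falls to essentially zero as A1 approaches fixation. In the overdominant case p settles near 2/3, the additive variance vanishes, and a non-additive (dominance) variance of about 0.0044 remains, so there is still genetic variance in fitness but no response to selection.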

If evolution is not Fisherian, but depends instead on a different interplay of evolutionary forces, then it is not clear how Beurton's gene should be understood. In Wright's shifting balance process, for example, evolution involves a balancing of drift, migration, and selection. Drift operating in relatively small populations allows exploration of a neighborhood of adaptive peaks. Selection within demes drives them up peaks to produce locally favorable gene combinations. Migration among demes shifts the balance of gene combinations in the direction of the highest peak explored by the collection of demes connected by migration. Statistical interaction effects in the metapopulation can contribute to main effects (additive variance) in subpopulations, due to drift or inbreeding, allowing for a "conversion" of non-additive genetic variance in fitness into additive, selectable variation in fitness (Wade 1992).

Selection within demes eliminates additive variance in within-deme fitness, but drift converts non-additive between-deme variance in fitness into new, selectable additive variance. When variance in fitness is stratified into different levels, when and where are Beurton's genes genes? Are there genes in the subpopulations generated by local mass selection? Are there genes only in the metapopulation, because a component of the variation that local selection operates on is determined by conversion of metapopulation interaction effects into local main effects? The conversion of non-additive to additive variance is a good technical illustration of "downward causation," but it also causes headaches in determining the location and individuation of genes if genes are defined in terms of process rather than structure, yet structures can persist beyond the process. In the Wrightean world, variance available to selection is created by a combination of evolutionary processes working in concert with selection, not operating prior to it (as in Beurton's adaptive-difference-out-of-neutral-mutation model). If the world is Wrightean rather than Fisherian, it would seem that the gene is generated from drift, selection, and migration together, not from selection alone. A gene concept defined in terms of only one of them is an idealization that cannot be used to explain evolutionary dynamics in a Wrightean world. By relativizing the gene to a particular evolutionary process, Beurton leaves unclear what the relationship is between his idealization and other relativized, ideal genes that might be constructed, because he has not endorsed a view of evolution that gives logical priority to one of its component processes over others. My own view is that Beurton chose the wrong process: inheritance, not selection, is primary, because all evolutionary processes are inheritance processes, but not all evolutionary processes are selection processes (Griesemer, this workshop).

Despite these criticisms, I think the process view holds promise and that Beurton has led us into interesting new territory, but not because the disintegrated gene can be reunified by selection theory. Rather, I think the gene has faced, and always will face, conceptual disintegration because it is best understood in terms of process, and that the interesting phenomenon is the variety of resistance to disintegration. The gene is a boundary object for the biological sciences at the intersection of many processes studied by different scientific specialties; its relativization to any one of them forms only a transient concept of interest to a few specialists (Rheinberger, this workshop; Star and Griesemer 1989). The problem for biology is to cope with the transience and instability of its concepts. The problem for science studies is to understand how biology can do it.


REFERENCES

Dawkins, R. 1982. The Extended Phenotype. New York: Oxford University Press.

Hull, D. 1988. Science as a Process. Chicago: University of Chicago Press.

Star, S. and Griesemer, J. 1989. Institutional ecology, 'translations,' and boundary objects: amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907-1939. Social Studies of Science 19: 387-420.

Wade, M. 1992. Sewall Wright: gene interaction and the shifting balance theory. In Oxford Surveys in Evolutionary Biology, eds. J. Antonovics and D. Futuyma, pp. 35-62. New York: Oxford University Press.

Comments by Thomas Fogle

Beurton reformulates the gene concept by reversing the bottom-up view of molecular genetics to argue that a top-down perspective is essential for explaining the many configurations of DNA domains. In his model, genes are those physical entities recruited through selection to increase in frequency in the population. Genes, he claims, are the smallest units of selection.

The ontology of a top-down gene is difficult to illuminate as a material structure. A gene becomes real through selection of DNA that is polymorphic in the population. An adaptive DNA sequence spreads through the population at the expense of some other sequence. An advantage of Beurton's proposal is that one is truly freed to think of genes as composed of collections of DNA strings that may or may not be contiguous. Whereas the bottom-up view must wrestle with whether to include regulatory sequences or trans-spliced products within the gene, no such problem exists for the evolutionary gene. The "clever" genome can recruit whatever is needed for adaptation. In other words, Beurton removes the requirement that a good gene must be a localizable unit.

But what does Beurton's gene look like? And how does it square with the gene of molecular and developmental biology, which places a premium on pinning down the nucleotide specification of a locus? In other words, do the top and the bottom meet? One way to help unpack Beurton's evolutionary gene is through a hypothetical situation. In this way, I hope to find a starting point for a bridge connecting the very abstract notion of a gene defined through selection and the materialist enterprise of molecular genetics.

Imagine a population of biologically identical beings on another planet. Such individuals generate descendants by passing on a cell-like entity, formed from the union of gametes, containing information-bearing molecules with the capacity to re-form their species type. We are assuming no mutations of the heritable material and a very deterministic mode of development in which the genetic constitution specifies the organism. Of course, this begs the question of where this heritable material came from, since we are assuming no evolutionary history. This impossibility is what sets the stage for why the alien creatures inform us about Beurton's gene concept.

To effect a more direct analogue to earth-like species, let us assume that the more-making substance transmitted from generation to generation is DNA. All DNA from the aliens is identical and faithfully transmitted to produce an identical state among progeny.

So do our imaginary creatures have genes? According to the evolutionary gene model, the answer is no. There is no variation in the population and no means to individuate features of the phenotype that distinguish one member of the population from another. If one were to sequence the DNA, a sample from any one individual would be the same as a sample pooled from all individuals. The genome is presumed to have sections of DNA that cause developmental changes through molecular events. Their effects on the phenotype might be enlightening about the physiology of the organism; however, they cannot be used to explain how the species came to be. They are interesting bits of DNA, uninformative as to the history of the species.

One might be inclined to use the term gene to describe the bits of DNA recruited for various tasks essential to the living system. The bits do, after all, have an information content and direct production of the adult phenotype. However, one then encounters the paradox alluded to above: there is no way to talk about where the bits came from. There is no variation in a populational sample of DNA. These creatures have a heritage (genetically uniform ancestors in the population), but they have no genetic history. One could talk about regions of DNA as a tool for explaining developmental events while leaving their origin unexplained. The bits function for life support and cannot be assigned an adaptive role. Adaptation is a relative term: one organism must be better adapted than another, and no such relation exists here. To call our bits of DNA "genes", we must be willing to give up any notion that we can explain how they came to be, even though they affect the phenotype. We have a biology of expression without a biology of evolution. Where does this leave us?

To contrast the top-down evolutionary gene with a bottom-up molecular gene, let us induce a mutation, in a single member of the alien population, in one of the bits that has physiological importance. For the first time there is inherited variation within the species. If the mutated region is favored by selection, it will increase in relative frequency compared with the original state. Here, I am assuming that the mutation is not neutral and that it spreads through the population by way of the reproductive process.
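The population-genetic step in this thought experiment, a non-neutral mutation rising in frequency under selection, can be made concrete with the standard textbook recurrence for haploid selection, p' = p(1 + s) / (1 + ps), where p is the mutant's frequency and s its selective advantage. The sketch below is an illustration added under stated assumptions (discrete generations, an infinitely large population with no drift, no further mutation, and a hypothetical value of s); it is not part of Beurton's argument or of this commentary's.

```python
def haploid_selection(p0: float, s: float, generations: int) -> list[float]:
    """Frequency trajectory of a mutant bit under simple haploid selection.

    Illustrative assumptions (not from the text): discrete, non-overlapping
    generations; an infinite population (no drift); no further mutation;
    relative fitness 1 + s for the mutant versus 1 for the original state.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        mean_fitness = 1 + p * s          # average fitness of the population
        p = p * (1 + s) / mean_fitness    # mutant's share of the next generation
        freqs.append(p)
    return freqs


# Hypothetical case: one mutant among 1,000 aliens with a 5% advantage.
traj = haploid_selection(p0=0.001, s=0.05, generations=300)
print(f"start: {traj[0]:.3f}, after 300 generations: {traj[-1]:.4f}")
```

Setting s = 0 (a neutral mutation) leaves the frequency at its starting value in this deterministic model, which is why the argument must assume the mutation is not neutral for the new bit to become an evolutionary gene.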

We have now witnessed the emergence of a region of the genome that is both an evolutionary gene and a molecular gene. The newly formed bit has an evolutionary history and affects the phenotype. Two classes of DNA bits now reside in the genome. The bit that mutated became an evolutionary/molecular gene. All other bits, which did not mutate and are monomorphic, are molecular genes only. The distinction is an artificial one, because real organisms, those on earth, have bits of interesting DNA that affect the phenotype, and all of them have a history. At any moment in time, those bits may or may not be evolutionary genes, depending on whether the population varies at that site, but every molecular gene has at some point in the past been part of an evolutionary gene. Therefore, and this is the main point, Beurton's gene is ontologically prior to the bits that we acknowledge provide explanatory value for physiological processes in the present day. These are bits of DNA essential for development that must have passed through an evolutionary window and are, or were, bits in an evolutionary gene.

To round out the contrast, we need to be aware that we still lack an explanation for how a bit of DNA originates. Our example presumed that the mutation was placed in a pre-existing bit of DNA. This is important because it is the link connecting Beurton's proposal with the molecular gene. The genomes of earth organisms contain domains of functional significance such as open reading frames, TATA boxes, and other regulatory sites. All are sites of interaction between the domain and the cellular system. TATA boxes are sites where the transcription-initiation complex, including RNA polymerase, assembles; enhancers are sites for binding transcription factors; the site of RNA production is a location for complementary polymerization, and so forth. Domains of DNA are potential sites where information content has an effect on the molecular state of the cellular system. Secondary effects of the interaction of domains with the system, such as the cellular translation of mRNA, can induce, in very indirect ways, changes in the organismal phenotype. This bottom-up view moves through interactors, sites where DNA strings interact with the surrounding protoplasm.

Likewise, from the top-down view, the evolutionary gene picks out DNA bits, not just mutated nucleotides, because selection acts through the physiology of the organism. Beurton's gene depends on the effects of DNA in a cellular system to give selection something to act on. Beurton, too, has an intervening level that must be accounted for. His mutated bit of DNA must also be an interactor in order for selection to occur. A base change in the DNA can be selected only as part of a nucleotide string, because it is the nucleotides collectively that interact with their intracellular surroundings.

Not all interactors of molecular genes would be interactors of evolutionary genes. The latter must be polymorphic in the population as a condition for their being genes. The mutated DNA of the alien creatures in my example had interactors in common with both an evolutionary gene and a molecular gene. All other bits in the alien genome had interactors that were part of molecular genes only. Keep in mind that this is so only because our focus is on present-day bits of DNA. All monomorphic molecular genes in earth organisms have been part of evolutionary genes at some point in the past. The paradox of having bits without a genetic history does not exist for real organisms.

The reverse is true as well: evolutionary genes may have interactors that are not part of molecular genes. Telomeres of eukaryotes are the non-sticky ends of chromosomes, composed of specific DNA sequences that promote chromosome integrity. In Beurton's scheme, telomeric DNA, selected because it prevents chromosomes from fusing randomly, would have been a gene at some point in the evolutionary past and, if there is variability in it that is selected in a current population, would still be a gene. In molecular-genetic terms, these are features of chromatin structure, not genes.

Further exploration of the intervening level, the interactors, which overlap between these two perspectives on the gene concept but do not map one-to-one onto each other, might provide insight into their shared relationships. For example, the interactor proposal suggests that the first genes became genes because the hereditary molecule had a polymeric site of interaction with the primordial soup. Beurton's gene comes into play through mutation and positive selection of a polymeric string that engages in interaction with its surroundings. The molecular gene is the result, or product, of the interaction promoting survival. Adaptation through selection is the agent enabling the survival of the molecular gene and its physiological effect in descendants. Although many interactions might arise between polymeric strings of one sort or another and the primordial soup, those that matter are those that lead to evolutionary survival. Therefore, the evolutionary gene establishes what becomes a molecular gene; that is, it is ontologically prior. In contrast, the bottom-up approach offers hope for explaining why a structure influences the system within which it is found. Unencumbered by the question of whether there is evidence for selection, one can methodologically pursue questions about organizational features and interactions.

In my efforts to evaluate the molecular gene (see my paper), I find enormous problems with attempts to unify the myriad DNA structures into a coherent gene concept. My solution is to forgo the use of the term gene as a blanket covering the diversity of molecular configurations. At some point there is greater clarity in talking about domains and their collective effects on producing an expressed product than in talking about the thousands of molecular "genes" in a genome. Beurton abandons hope that a more refined ascription for molecular biology might be fruitful and solves the problem from a different angle, by unifying the myriad DNA structures into an evolutionary gene concept. His gene, however, leaves molecular biologists without a tractable criterion for calling something a gene. One would need to test each bit of DNA empirically in a populational context to determine whether it is both polymorphic and selected. This may be a lot to ask, for it places strictures even on bits inherited in a Mendelian fashion, as they must be shown to have a selected difference. Beurton is motivated by a theoretical aim: to help explain how the genome came to be. His proposal is not intended to help molecular geneticists codify their bits.

The implications of Beurton's proposal are important. He gives us a fresh way to envision the gene by explicitly placing it in a historical context. By doing so, there may be hope for enlarging, or at least clarifying, a gene concept that has splintered into sometimes disparate meanings. Perhaps further exploration at the intersection of the top and the bottom, at the level of the interactors, might deepen insight in this regard.


ADDRESSES

DR. PETER JOHN BEURTON MAX-PLANCK-INSTITUT FÜR WISSENSCHAFTSGESCHICHTE WILHELMSTRASSE 44 10117 BERLIN GERMANY

PROFESSOR RAPHAEL FALK DEPARTMENT OF GENETICS THE HEBREW UNIVERSITY OF JERUSALEM 91904 ISRAEL

PROFESSOR THOMAS FOGLE DEPARTMENT OF BIOLOGY SAINT MARY'S COLLEGE SOUTH BEND, IN 46556 USA

PROFESSOR FRED GIFFORD MICHIGAN STATE UNIVERSITY DEPARTMENT OF PHILOSOPHY EAST LANSING, MI 48824 USA

PROFESSOR SCOTT F. GILBERT SWARTHMORE COLLEGE DEPARTMENT OF BIOLOGY 500 COLLEGE AVENUE SWARTHMORE, PA 19081-1397 USA

PROFESSOR JAMES R. GRIESEMER UNIVERSITY OF CALIFORNIA AT DAVIS DEPARTMENT OF PHILOSOPHY DAVIS, CA 95616 USA

PROFESSOR EVELYN FOX KELLER MASSACHUSETTS INSTITUTE OF TECHNOLOGY E51-263 CAMBRIDGE, MA 02139 USA

PROFESSOR MICHEL MORANGE ECOLE NORMALE SUPERIEURE DÉPARTEMENT DE BIOLOGIE UNITÉ DE GÉNÉTIQUE MOLÉCULAIRE 46 RUE D'ULM F-75230 PARIS CEDEX 05 FRANCE

PROFESSOR HANS-JÖRG RHEINBERGER MAX-PLANCK-INSTITUT FÜR WISSENSCHAFTSGESCHICHTE WILHELMSTRASSE 44 10117 BERLIN GERMANY