Intro Riedl AnalogyNK Details Empirical Conclusion References

The Role of Gene Origin in the Evolution of Evolvability

July 24, 2015, EvoEvo Workshop, ECAL 2015, University of York, UK

Lee Altenberg
The Konrad Lorenz Institute for Evolution and Cognition Research, Klosterneuburg, Austria
http://dynamics.org/Altenberg/

Lee Altenberg | The Genome-As-Population | 1/60

Basic Problem of Evolvability

Q. How do you randomly alter a complex organism and improve its function with non-vanishing magnitude at non-vanishing probability?

A. By diverse mechanisms that focus variation in directions with adaptive opportunity while suppressing variation in harmful directions.


New theory and some empirical studies reveal that the organism's production of variation resembles ‘viscoelasticity’ (suppression of deleterious mutation production).

[Diagram labels: LONG-TERM SELECTION; MUTATIONAL PRESSURE]


Classical quantitative and models precluded such ‘viscoelasticity’

[Diagram labels: LONG-TERM SELECTION; NO MUTATIONAL PRESSURE; X; ROBUSTNESS]


Variation can proceed along neutral networks of genotypes to find adaptive variation (Wagner, 2008)

[Diagram labels: ADAPTIVE VARIATION; STABILIZING SELECTION; NEUTRAL NETWORKS]


Varying selection can wear ‘trails’ in the genotype-phenotype map that enhance re-evolvability: Meyers et al. (2005)

Kashtan and Alon (2005)

Crombach and Hogeweg (2007, 2008)

Draghi and Wagner (2008)

Needed is a comprehensive understanding of how the variational properties of organisms are shaped in evolution.

Levinton (1988, p. 494) Genetics, Paleontology, and Macroevolution

“Evolutionary biologists have been mainly concerned with the fate of variability in populations, not the generation of variability. . . . The genetic and epigenetic factors that generate variability have received relatively little attention. This could stem from the dominance of population genetic thinking, or it may be due to a general ignorance of the mechanistic connections between the genes and the phenotype. Whatever the reason, the time has come to reemphasize the study of the origin of variation.”


Classical and “Extended” Population Genetics

1 The fate of variation: The main concern of classical population genetics

2 The generation of variation: A central concern of the “extended evolutionary synthesis” (Pigliucci and Müller, 2010)

3 Theoretical approach to 2: Examine the fate of variation for the generation of variation.


A fundamental quantity in the “generation of variability”:

The Distribution of Fitness Effects of Mutation

[Figure: PAPERS PER YEAR WITH "distribution of fitness effects" (y-axis, 0 to 200) by YEAR (x-axis, 1985 to 2015). Source: Google Scholar.]

Distribution of Fitness Effects of Mutation

Recent talk by Jeffrey D. Jensen: “On quantifying the distribution of fitness effects of new mutations: an experimental approach”.

[Figure: distribution of fitness effects of mutation in yeast Hsp90, annotated MUTATIONAL ROBUSTNESS, GENETIC LOAD, and EVOLVABILITY]

Bank, C., Hietpas, R. T., Jensen, J. D., & Bolon, D. N. (2015). A systematic survey of an intragenic epistatic landscape. Molecular Biology and Evolution, 32(1), 229-238.

Distribution of Fitness Effects of Mutation

[Figure: distribution of fitness effects of mutation in E. coli TEM-1 β-lactamase, annotated MUTATIONAL ROBUSTNESS, GENETIC LOAD, and EVOLVABILITY. Panels A and B plot Number of alleles against Fitness effect (log10 w, from -3 to 0), split into synonymous, missense, and nonsense mutations.]

Fig. 1. Distribution of gene fitness effects (DFE) of mutations in TEM-1. (A) The DFE of point mutations (i.e. 1-bp changes in the gene). (B) The DFE of all possible codon substitutions (i.e. all 1-, 2- and 3-base changes in the 287 codons of TEM-1). Gene fitness values for conferring ampicillin resistance are presented on a log scale with 0 corresponding to the fitness of TEM-1. The contributions of synonymous (red), missense (grey), and nonsense (blue) mutations to the DFE are indicated. Gene fitness as a function of codon substitution is provided as Data S1.

Firnberg, E., Labonte, J. W., Gray, J. J., & Ostermeier, M. (2014). A comprehensive, high-resolution map of a gene's fitness landscape. Molecular Biology and Evolution, msu081.


Do distributions of fitness effects evolve?

Yes, all the time. Described as early as Fisher (1930): Fisher’s geometrical argument.

But in Fisher’s model, evolvability always goes down with . How can evolvability ever evolve to increase?

The smaller the effect of a mutation, the more likely it is to be beneficial.

Riedl (1977) A systems-analytical approach to macroevolutionary phenomena. Q. Rev. Biol. 52: 351–370.

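Evolution of evolvability and mutational robustness through new genes.

Riedl considers the following situation, where adaptation requires simultaneous changes in two genes:

[Diagram: genotypes a b, A b, a B, and A B placed on a FITNESS axis, with transitions labeled µ (one gene changes) and µ² (both genes change)]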


Riedl (1977), cont’d.

“What would happen if independent genetic units, the structural results of which have become functionally dependent, were also to become epigenetically dependent, for example, by adopting a superimposed genetic unit upon which both are dependent, as in the case of two structural genes dependent on an operator gene?”

[Diagram: the same genotypes a b, A b, a B, and A B on a FITNESS axis, now with a superimposed operator gene (OP+ / op-); transitions labeled µ and µ²]


Riedl (1977), cont’d.

[Diagram: genotypes a b, A b, a B, and A B on a FITNESS axis with a superimposed operator gene (OP+ / op-); transitions labeled µ and µ²]

A. “The mutation of only one genetic unit, the operator, will result in the change of both. Then the chance of a successful alteration . . . would increase as much as a millionfold.”
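Riedl's "millionfold" can be checked with one line of arithmetic. The sketch below assumes a per-gene mutation rate of µ = 10⁻⁶, a value chosen here only for illustration (the slide gives no number), and compares the chance of changing both genes independently (µ²) with the chance of changing the single operator (µ).

```python
def riedl_gain(mu):
    """Ratio of the operator route (one mutation, rate mu)
    to the two-gene route (two independent mutations, rate mu**2)."""
    p_two_genes = mu ** 2   # A and B must both mutate
    p_operator = mu         # one operator mutation changes both
    return p_operator / p_two_genes

# Hypothetical rate, not taken from the slides.
mu = 1e-6
print(f"gain = {riedl_gain(mu):.3g}")  # roughly a millionfold for this mu
```

The gain is simply 1/µ, so the size of the advantage depends entirely on the assumed mutation rate.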


Riedl (1977), concluded

“Such adaptive advantages . . . are so tremendous that the invention of a superimposed genetic unit must be expected, even if it would be a millionfold or trillionfold more unlikely than every other alteration within the genome.”

“. . . The chances of successful adaptation increase if the genetic units, by insertion of superimposed genes, copy the functional dependencies of those phene structures for which they code.”

A radical idea: variation production carries records of the history of adaptation.

Riedl’s mechanism arises from new gene creation.


Is new gene creation no different from allelic substitution?

Reviewer for Evolution: “The subject of new genes and their role in the evolution is, indeed, interesting. The interest, however, is in the mechanisms by which they are introduced rather than in what happens to them after the introduction. The moment a new gene is introduced, it becomes totally equivalent from the population genetics point of view to any other gene in the genome. A new gene can be regarded as a mutation from a preexisting ‘zero’ allele in the respective locus.”

What did the reviewer miss?


Unique Effects of Gene Duplication:

When new genes are added to the genome, three things are changed:

1 The number of degrees of freedom for genetic variation is increased

2 The probability of allelic variation in that gene family is increased

3 The probability of subsequent gene duplications of that gene family is increased.

[Figure: three copies of the TEM-1 DFE histograms (Firnberg et al., 2014, Fig. 1), labeled FIXED DUPLICATION, SUBSEQUENT DUPLICATION, GENETIC SEQUENCE, and SUBSEQUENT ALLELIC VARIATION]


New dimensions of variation added by new genes

Much thought has gone to the causes of gene duplication. But what are the cumulative consequences of gene duplication? Processes that shape genome growth shape genetic variation production.

Yang et al. (2003) Organismal complexity, protein complexity, and gene duplicability


Gene “Survivability”

Gene “survivability” is the language of natural selection. But Yang et al. (2003) are not talking about organismal fitness. By “gene survivability” they mean long-term preservation of the gene in the genome. Differences in gene “survivability” therefore amount to a kind of natural selection in the genome-as-population.

Suppose we pursue this analogy in full?


Population Analogous Processes

Eukaryotic species have relatively closed genomes. This renders the genome analogous to a population of organisms:

1 gene duplication → reproduction

2 gene deletion or pseudogenization → death

3 de novo gene creation, horizontal gene transfer → immigration

4 maintenance of gene properties over time → heritability


A Novel Level of Darwinian Processes: Genome-as-Population

Lewontin’s (1970) sufficient elements for Darwinian evolution:

1 Heritable

2 Variation in

3 Fitness (viability and fecundity)

Are these elements all present in the genome-as-population?


A Novel Level of Darwinian Processes: Genome-as-Population

If there are

1 gene properties that produce differences in gene survivability and duplicability, and

2 these gene properties persist over time and over gene duplications (heritability),

3 then the genome can become populated by genes with higher gene duplicability and survivability.

That is, evolvability can evolve.
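The three conditions above can be illustrated with a toy birth-death process over genes. This is a minimal sketch under assumptions that are not in the slides: each gene carries one number in [0, 1) called its duplicability, copies inherit it exactly, and every fixed duplication is balanced by the loss of one randomly chosen gene so that genome size stays constant.

```python
import random

def simulate_genome(n_genes=200, steps=20000, seed=0):
    """Toy genome-as-population: returns (initial mean, final mean) duplicability."""
    rng = random.Random(seed)
    # Hypothetical heritable trait: probability that a chosen gene duplicates.
    genome = [rng.random() for _ in range(n_genes)]
    start_mean = sum(genome) / len(genome)
    for _ in range(steps):
        d = genome[int(rng.random() * len(genome))]   # pick a gene at random
        if rng.random() < d:                          # "reproduction": duplication
            genome.append(d)                          # the copy inherits d (heritability)
            # "death": one random gene is deleted or pseudogenized
            genome.pop(int(rng.random() * len(genome)))
    return start_mean, sum(genome) / len(genome)

start, end = simulate_genome()
print(f"mean duplicability: start {start:.3f}, end {end:.3f}")
```

With perfect heritability the mean duplicability of the genes in the genome rises over time; weakening the inheritance step (for example, by redrawing the copy's trait at random) removes the effect, which is the role heritability plays in the argument.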


An analogy for selection: Vulnerabilities of WWII bombers

Q. WHERE SHOULD STRATEGIC ARMOR BE ADDED?

[Figure: DISTRIBUTION OF HOLES ON RETURNING R.A.F. BOMBERS, WWII]

An analogy, cont’d

WALD, ABRAHAM. 1943. "A METHOD OF ESTIMATING PLANE VULNERABILITY BASED ON DAMAGE OF SURVIVORS"

[Scanned page 78 of the memorandum:]

PART VIII
VULNERABILITY OF A PLANE TO DIFFERENT TYPES OF GUNS [1]

In part V we discussed the case where the plane is subdivided into several equi-vulnerability areas (parts) and we dealt with the problem of determining the vulnerability of each of these parts. It was pointed out in part VII that the method described in part V can be applied to the more general problem of estimating the probability q(i, j) that a plane will survive a hit on part i caused by a bullet fired from gun j. However, this method is based on the assumption that the value of γ(i, j) is known, where γ(i, j) is the conditional probability that part i is hit by gun j knowing that a hit has been scored. In practice it may be difficult to determine the value of γ(i, j) since the proportions in which the different guns are used by the enemy may be unknown. On the other hand, it seems likely that frequently we shall be able to estimate the conditional probability γ(i | j) that part i is hit knowing that a hit has been scored by gun j. The purpose of this memorandum is to investigate the question whether q(i, j) can be estimated from the data assuming that merely the quantities γ(i | j) are known a priori.

In what follows we shall restrict ourselves to the case of independence, i.e., it will be assumed that the probability of surviving a hit does not depend on the non-destructive hits already received. Let δ(i, j) be the conditional probability that part i is hit by gun j knowing that a hit has been scored and the plane survived the hit. Furthermore, let q be the probability that the plane survives a hit (not knowing which part was hit and which gun scored the hit). Then, similar to equation 82, we shall have

(101) [equation not legible in the source]

Let q(j) be the probability that the plane will survive a hit by gun j (not knowing the part hit). Then obviously

(102) [equation not legible in the source]

Let δ(i | j) be the conditional probability that part i is hit by gun j knowing that a hit has been scored by gun j and the plane survived the hit. Clearly

[1] This part of "A Method of Estimating Plane Vulnerability Based on Damage of Survivors" was published as SRG memo 126 and AMP memo 76.8.

An analogy, cont’d

A. ADD ARMOR WHERE THERE AREN'T ANY HOLES!

ABRAHAM WALD


An analogy, cont’d

The sampled planes were all very special—they were the ones that survived. In the same way, the genes functioning in the genome are very special—they are the ones that survived.
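The survivorship effect behind Wald's answer is easy to reproduce. The sketch below is illustrative only: the part names and the per-hit lethality values are invented, and each plane is assumed to take exactly one hit, spread uniformly over the parts.

```python
import random

def simulate_bombers(n_planes=20000, seed=0):
    """Count hits per part on all planes and on the planes that return."""
    rng = random.Random(seed)
    # Hypothetical probability that a single hit on this part downs the plane.
    lethality = {"engine": 0.6, "cockpit": 0.4, "fuselage": 0.05, "wings": 0.05}
    parts = list(lethality)
    hits_all = dict.fromkeys(parts, 0)
    hits_returned = dict.fromkeys(parts, 0)
    for _ in range(n_planes):
        part = parts[int(rng.random() * len(parts))]  # hits are uniform over parts
        hits_all[part] += 1
        if rng.random() >= lethality[part]:           # the plane survives this hit
            hits_returned[part] += 1
    return hits_all, hits_returned

hits_all, hits_returned = simulate_bombers()
for part in hits_all:
    print(part, hits_all[part], hits_returned[part])
```

Every part is hit about equally often, yet the returning planes show few holes in the most lethal parts. The same conditioning applies to genes: those observed in the genome are the ones whose effects did not remove them.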


WHAT ARE THE "MISSING HOLES" AMONG NEW GENES?

DISTURBING FUNCTIONS UNDER STABILIZING SELECTION,

e.g. DOSAGE


"MISSING HOLES" AMONG NEW GENES

NEUTRAL ADDITIONS THAT DRIFT TO EXTINCTION

OR ARE HIT BY LOSS-OF-FUNCTION MUTATIONS, PSEUDOGENIZATION, OR DELETION


NEW GENES THAT "MAKE IT"
• IMMEDIATE EFFECT IS ADVANTAGEOUS OR NEUTRAL AND GOES TO FIXATION
• SUBSEQUENT MUTATIONS GIVE
  • NEOFUNCTIONALIZATION,
  • SUBFUNCTIONALIZATION, OR
  • ESCAPE FROM ADAPTIVE CONFLICT


An Illustrative Model

To illustrate how selection on de novo gene origins can shape the genotype-phenotype map, I will use the NK landscape model of Kauffman and Levin (1987).


NK Model (Kauffman and Levin, 1987) for Genome Growth

[Diagram: GENOTYPE-FUNCTION MAP linking the genes of the GENOME x, including a NEW GENE, to the FUNCTIONS $\phi_1, \phi_2, \phi_3, \ldots, \phi_f$]

$w(x) = \frac{1}{f} \sum_{i=1}^{f} \phi_i(x)$


A Model of De Novo Gene Origin

GENOME GROWTH ALGORITHM:

ADD A NEW GENE TO THE GENOME

OBTAIN ITS FUNCTIONAL EFFECTS RANDOMLY FROM A GIVEN DISTRIBUTION

CONSTRUCTIONAL SELECTION:
IF NEW GENE PRODUCES A FITNESS DECREASE: REJECT IT
IF NEW GENE PRODUCES A FITNESS INCREASE: KEEP IT

Assumptions:
1 Strong selection
2 Weak mutation
3 Rare gene creation

ADAPT THE GENOME THROUGH ALLELIC SUBSTITUTION UNTIL IT IS AT A FITNESS PEAK
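The genome growth loop can be sketched in a few dozen lines. This is a simplified illustration, not the author's implementation: genes are binary, each new gene's pleiotropy k is drawn uniformly from 1..f, each fitness component φ_i is an independent uniform random number for every configuration of the genes mapped to function i, and the names `make_phi`, `climb`, and `grow_genome` are made up for this example.

```python
import random

def make_phi(rng):
    """Lazily filled table of random fitness components phi_i in [0, 1)."""
    table = {}
    def phi(i, gene_ids, alleles):
        key = (i, gene_ids, alleles)
        if key not in table:
            table[key] = rng.random()   # drawn once, then fixed
        return table[key]
    return phi

def fitness(genes, f, phi):
    """w(x) = (1/f) * sum_i phi_i(x); phi_i depends only on genes mapped to function i."""
    total = 0.0
    for i in range(f):
        linked = [g for g in genes if i in g["targets"]]
        total += phi(i, tuple(g["id"] for g in linked),
                     tuple(g["allele"] for g in linked))
    return total / f

def climb(genes, f, phi):
    """Allelic substitution by single flips until no flip raises fitness."""
    w = fitness(genes, f, phi)
    improved = True
    while improved:
        improved = False
        for g in genes:
            g["allele"] ^= 1
            w_new = fitness(genes, f, phi)
            if w_new > w:
                w, improved = w_new, True
            else:
                g["allele"] ^= 1        # undo a flip that does not help
    return w

def grow_genome(f=16, attempts=300, seed=1):
    rng = random.Random(seed)
    phi = make_phi(rng)
    genes = []
    w = fitness(genes, f, phi)
    proposed, kept, history = [], [], [w]
    for gene_id in range(attempts):
        k = 1 + int(rng.random() * f)   # pleiotropy: number of functions affected
        order = sorted(range(f), key=lambda _: rng.random())
        gene = {"id": gene_id, "allele": int(rng.random() < 0.5),
                "targets": frozenset(order[:k])}
        proposed.append(k)
        genes.append(gene)
        if fitness(genes, f, phi) > w:  # constructional selection: keep it
            kept.append(k)
            w = climb(genes, f, phi)    # adapt to a fitness peak
            history.append(w)
        else:
            genes.pop()                 # reject it
    return history, proposed, kept

history, proposed, kept = grow_genome()
print("mean pleiotropy proposed:", sum(proposed) / len(proposed))
print("mean pleiotropy kept:", sum(kept) / len(kept))
```

Comparing the pleiotropy of kept genes with that of all proposed genes, and of early kept genes with later ones, is one way to look for the filtering the slides describe; the outcome depends on the assumed distributions and the seed.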


NK Model for De Novo Genes

[Figure: two panels plotting FUNCTIONS (y-axis, 0 to 30) against GENE NUMBER (x-axis, 0 to 30). One panel: WITH CONSTRUCTIONAL SELECTION (SURVIVING DE NOVO GENES). Other panel: WITHOUT CONSTRUCTIONAL SELECTION (RANDOM DE NOVO GENES).]

The "missing holes": New genes are filtered out to target only phenotypes with adaptive opportunity, i.e. “genes as followers”.

NK Model for De Novo Genes

[Figure: AVERAGE $k_n$ (y-axis, 0 to 20) against GENE NUMBER, n (x-axis, 0 to 30)]


NK Model for De Novo Genes

[Figure: two 3D histograms of the DISTRIBUTION of PLEIOTROPY by GENE NUMBER. One panel: WITH CONSTRUCTIONAL SELECTION (SURVIVING DE NOVO GENES), annotated LATER GENES HAVE LOWER PLEIOTROPY. Other panel: WITHOUT CONSTRUCTIONAL SELECTION (RANDOM DE NOVO GENES).]


De novo genes:

1 If selection has driven them to fixation in the population, they should be enriched for some properties of gene survivability (in the simulation, enrichment is for low pleiotropy)

2 But de novo genes have not been through iterations of gene duplication

3 They should therefore differ from highly duplicated genes in having

1 lower duplicability and

2 lower heritability of their survivability in time.


Key Idea #1: Common Conditioned Histories

The mere existence of a gene tells us something about its history. The functioning genes in the genome generally share a common conditioned history:

1 When created, their effects were neutral or advantageous

2 The new genes grew in frequency or became fixed in the population [*a step skipped in whole genome duplication]

3 They came to be preserved by selection before they could be deleted or pseudogenized.


Key Idea #1

Being neutral or advantageous when created (distribution of fitness effects of gene creation) in turn depends upon the gene’s map to the phenotype, i.e. its spectrum of phenotypic effects:

1 varying traits with adaptive opportunity, while

2 leaving alone functions under stabilizing selection,

i.e. a kind of modularity.

In a nutshell: genes are born modular, or not born at all.


Key Idea #2: The present connects to the past

The spectrum of phenotypic effects is preserved to varying degrees over time.

Those highly preserved over long periods:
• role in regulatory and interaction networks and development
• peptide secondary structure

Less preserved over long periods:
• Primary amino acid sequences
• Primary nucleotide sequences

Distribution of fitness effects of mutation or duplication: This question is almost entirely unexplored theoretically and empirically.


Heritability between Duplications

Phylogenetic relationships of the Arabidopsis peroxidases (Tognolli et al., 2002); the duplicates all have peroxidase function.


Singh et al. (2014) Human Dominant Disease Genes Are Enriched in Paralogs Originating from Whole Genome Duplication

Singh et al. (2014) show that conditioning on the gene origin (whole genome duplication (WGD) about 500 mya, with or without subsequent small-scale duplication (SSD)) alters the production of dominant vs. recessive disease mutations.

[Figure: page 2 of Singh et al. (2014), PLOS Computational Biology 10(7): e1003754, showing its Figures 1 and 2]

Figure 1. Distributions of WGD, SSD, and singletons in (A) the whole human genome, (B) monogenic disease (MD) genes [1], (C) recessive MD genes, and (D) dominant MD genes. (***) corresponds to highly significant deviations (p < 10^-6, FE test) and (**) to significant deviations (p < 10^-3, FE test) from the references in (A). Note that recessive MD genes (C) do not show any significant deviations in WGD, SSD, or singleton contents (p > 0.3, FE test), although taking into account the age of SSD duplicates reveals a relative lack of recent SSD genes in MD genes (see text). doi:10.1371/journal.pcbi.1003754.g001

Figure 2. Distributions of WGD, SSD, and singletons for human orthologs of mouse genes (A) tested for essentiality in mouse [13], (B) found to be essential in mouse, and (C and D) after removing dominant disease genes, oncogenes, and genes with dominant negative mutations or autoinhibitory folds [5]. (*) corresponds to small deviations (10^-3 < p < 0.05, FE test) from the references in (A). Note that human orthologs of essential genes in mouse do not show any significant deviations in WGD, SSD, or singleton contents (p > 0.05, FE test) once dominant disease genes, oncogenes, and genes with dominant negative mutations or autoinhibitory folds have been removed. Yet, taking into account the age of SSD duplicates reveals a relative lack of recent SSD genes in essential genes (see text). doi:10.1371/journal.pcbi.1003754.g002

Thought Experiment

A Thought Experiment:

1 Suppose that different genes have different gene duplicabilities.

2 Suppose also that gene duplicability is preserved after gene duplication.

3 Then: the genome will come to be populated by genes with high gene duplicability.


Gene duplication ‘tournament’

The differential expansion of the genome toward genes more likely to give rise to other genes.


Some equations for this thought experiment

The duplicability of genetic elements of type $i$ is:

$$d_i := \mathrm{Rate}[i \text{ duplicates}] \times \mathrm{Prob}[i \text{ fixes}] \times \mathrm{Prob}[i \text{ maintained}]$$

Mobile genetic elements have high values of Rate[i duplicates] but low values of Prob[i fixes] and Prob[i maintained].

Genes in large gene families and regulatory elements have high values of Prob[i fixes] × Prob[i maintained].
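The product that defines duplicability can be made concrete with a minimal sketch. The class name `ElementType`, its field names, and every numeric value below are hypothetical; the numbers were chosen only to mirror the qualitative contrast between mobile elements and gene-family members, and are not estimates from any dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementType:
    """A class of genetic element with the three components of duplicability."""
    name: str
    rate_duplicates: float   # Rate[i duplicates], per unit time (hypothetical)
    prob_fixes: float        # Prob[i fixes] (hypothetical)
    prob_maintained: float   # Prob[i maintained] (hypothetical)

    @property
    def duplicability(self) -> float:
        # d_i := Rate[i duplicates] x Prob[i fixes] x Prob[i maintained]
        return self.rate_duplicates * self.prob_fixes * self.prob_maintained


# Made-up values: frequent duplication but rare fixation and maintenance,
# versus rare duplication but frequent fixation and maintenance.
mobile_element = ElementType("mobile element", 1e-2, 1e-4, 1e-2)
family_gene = ElementType("gene in a large family", 1e-4, 1e-1, 5e-1)

for element in (mobile_element, family_gene):
    print(f"{element.name}: d_i = {element.duplicability:.2e}")
```

With these illustrative numbers the gene-family member has the larger product even though it duplicates far less often, which is the kind of trade-off the three-factor definition is meant to expose.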

Fundamental Implication

Prob[i fixes] increases with the upper tail of the distribution of fitness effects (DFE) of duplication.

Prob[i maintained] increases with the lower tail of the DFE of deletion or pseudogenization.

Because bigger upper tails of the DFE of duplication confer high gene duplicability, the genome can grow bigger upper tails of the DFE of duplication through the addition of new genes. That is, the genome can grow in evolvability through the addition of new genes.

Thought Experiment Model, cont’d

INCREASE OF DUPLICABILITY WITH GENOME GROWTH

$$n_i(t) = e^{d_i t}\, e^{-d_i^2 / \left(2\,(0.002)^2\right)}, \qquad t \in \{0, 240, 480, 720, 960, 1200\}$$

[Figure: relative frequency of genes (vertical axis) versus duplicability $d_i$ = Rate[duplication] × Prob[fixation] × Prob[maintenance] (horizontal axis, 0.002 to 0.012), one curve per time point; curves are labeled with genome sizes of 700, 3,700, 8,600, and 25,000 genes.]
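The growth in genome size implied by this model can be sketched numerically. The sketch assumes that duplicability ranges continuously over $d \ge 0$, so that the initial distribution is a half-Gaussian of width 0.002, and that the genome starts with 700 genes at $t = 0$. Both are assumptions read off the figure labels, not statements from the slide. The function names `relative_growth` and `mean_duplicability` are hypothetical.

```python
import math

SIGMA = 0.002          # width of the initial distribution, from the slide's formula
INITIAL_GENES = 700    # assumed genome size at t = 0 (smallest label in the figure)


def relative_growth(t: float, sigma: float = SIGMA) -> float:
    """N(t) / N(0) for n(d, t) = exp(d t) exp(-d^2 / (2 sigma^2)), integrated over d >= 0."""
    s = sigma * t
    # Completing the square gives N(t)/N(0) = exp(s^2 / 2) * (1 + erf(s / sqrt(2))).
    return math.exp(0.5 * s * s) * (1.0 + math.erf(s / math.sqrt(2.0)))


def mean_duplicability(t: float, sigma: float = SIGMA) -> float:
    """Mean of d under the same density, obtained as d/dt of log N(t)."""
    s = sigma * t
    density = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
    return sigma * (s + density / cdf)


for t in (0, 240, 480, 720, 960, 1200):
    genes = INITIAL_GENES * relative_growth(t)
    print(f"t = {t:4d}  genes ~ {genes:8.0f}  mean d = {mean_duplicability(t):.5f}")
```

Under these assumptions the computed genome sizes at t = 720, 960, and 1200 come out close to the 3,700, 8,600, and 25,000 genes labeled in the figure, and the mean duplicability rises as the genome grows.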

Thought Experiment Model, concluded

Result (Fisher’s Fundamental Theorem of Natural Selection applied to genome growth)

Assuming that gene duplicability $d_i$ is perfectly transmitted between gene duplications, the genomic rate of production of genes that go to fixation and are maintained,

$$\bar{d}(t) = \sum_{i \in G} d_i \, \frac{n_i(t)}{N(t)},$$

increases at rate

$$\frac{d}{dt}\,\bar{d}(t) = \mathrm{Var}(d_i) \ge 0.$$
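A quick numerical check of this result is sketched below. It assumes perfect transmission, so each type grows as $n_i(t) = n_i(0)\,e^{d_i t}$. The three gene types, their counts, and their duplicabilities are made up for illustration.

```python
import math


def abundances(n0, d, t):
    """n_i(t) = n_i(0) * exp(d_i * t): each type grows at its own duplicability."""
    return [n * math.exp(di * t) for n, di in zip(n0, d)]


def mean_and_variance(n, d):
    """Abundance-weighted mean and variance of duplicability across the genome."""
    total = sum(n)
    mean = sum(ni * di for ni, di in zip(n, d)) / total
    var = sum(ni * (di - mean) ** 2 for ni, di in zip(n, d)) / total
    return mean, var


# Hypothetical genome: three gene types with made-up counts and duplicabilities.
d = [0.002, 0.005, 0.010]
n0 = [500.0, 150.0, 50.0]

t, h = 300.0, 1e-3
mean_before, _ = mean_and_variance(abundances(n0, d, t - h), d)
mean_after, _ = mean_and_variance(abundances(n0, d, t + h), d)
_, var_now = mean_and_variance(abundances(n0, d, t), d)

# Central finite difference of the mean duplicability at time t.
numerical_rate = (mean_after - mean_before) / (2 * h)
print(f"d/dt mean duplicability ~ {numerical_rate:.6e}")
print(f"Var(d_i)                = {var_now:.6e}")
```

The two printed numbers should agree to within finite-difference error, and both are non-negative because a variance cannot be negative.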

Arbitrary heritability of duplicability

Result (Price’s Equation in genome expansion)

For a gene of type $j$, let

$d_j$ be $j$’s probability of being stably incorporated in the genome, while

$\xi_j$ be $j$’s offspring’s probability of being stably incorporated in the genome:

$$\xi_j = \sum_{i \in G} d_i \, T(i \leftarrow j).$$

The rate of change in the average $d_i$ of the genome is

$$\frac{d}{dt}\,\bar{d}(t) = \alpha \left\{ \mathrm{Cov}(\xi, d) + \left[\bar{\xi}(t) - \bar{d}(t)\right] \bar{d}(t) \right\},$$

where

$$\bar{\xi}(t) = \sum_{i \in G} \xi_i \, p_i(t), \qquad \mathrm{Cov}(\xi, d) = \sum_{i \in G} \xi_i \, d_i \, p_i(t) - \bar{\xi}(t)\, \bar{d}(t).$$
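The Price-equation form can be checked numerically under explicit assumptions. The sketch assumes the dynamics $dn_i/dt = \alpha \sum_j T(i \leftarrow j)\, d_j\, n_j$, takes $p_i = n_i/N$, and reads $\alpha$ as a rate constant that multiplies both terms. None of these is spelled out on the slide. The transmission matrix `T` and all numbers are hypothetical.

```python
ALPHA = 1.0  # assumed rate constant for duplication attempts

d = [0.002, 0.005, 0.010]   # hypothetical duplicabilities
# T[i][j] = T(i <- j): probability that a copy made from type j is of type i.
# Each column sums to 1.
T = [
    [0.90, 0.10, 0.05],
    [0.08, 0.80, 0.15],
    [0.02, 0.10, 0.80],
]
n = [500.0, 150.0, 50.0]    # hypothetical gene counts


def growth(counts):
    """Assumed dynamics: dn_i/dt = alpha * sum_j T(i <- j) d_j n_j."""
    size = len(counts)
    return [ALPHA * sum(T[i][j] * d[j] * counts[j] for j in range(size))
            for i in range(size)]


def mean_d(counts):
    """Abundance-weighted mean duplicability."""
    return sum(di * ni for di, ni in zip(d, counts)) / sum(counts)


# Right-hand side: alpha * (Cov(xi, d) + [xi_bar - d_bar] * d_bar).
p = [ni / sum(n) for ni in n]
xi = [sum(d[i] * T[i][j] for i in range(len(d))) for j in range(len(d))]
xi_bar = sum(x * pi for x, pi in zip(xi, p))
d_bar = mean_d(n)
cov = sum(x * di * pi for x, di, pi in zip(xi, d, p)) - xi_bar * d_bar
price_rate = ALPHA * (cov + (xi_bar - d_bar) * d_bar)

# Left-hand side: change in mean duplicability over one small Euler step.
h = 1e-3
n_next = [ni + h * gi for ni, gi in zip(n, growth(n))]
numerical_rate = (mean_d(n_next) - d_bar) / h

print(f"numerical d/dt mean d = {numerical_rate:.6e}")
print(f"Price equation value  = {price_rate:.6e}")
```

Setting `T` to the identity matrix makes $\xi_i = d_i$, and the expression then reduces to $\alpha\,\mathrm{Var}(d_i)$, the perfect-transmission case.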

This was the correlation between the upper tails of A and B.

What about between A and C?

[Diagram: panels marked A, B, and C, with the labels FIXED DUPLICATION, SUBSEQUENT DUPLICATION, GENETIC SEQUENCE, and SUBSEQUENT ALLELIC VARIATION. Each panel is illustrated with histograms of number of alleles versus fitness effect (log10 w), taken from a figure at http://mbe.oxfordjournals.org/ with the caption below.]

Fig. 1. Distribution of gene fitness effects (DFE) of mutations in TEM-1. (A) The DFE of point mutations (i.e. 1-bp changes in the gene). (B) The DFE of all possible codon substitutions (i.e. all 1-, 2- and 3- base changes in the 287 codons of TEM-1). Gene fitness values for conferring ampicillin resistance are presented on a log scale with 0 corresponding to the fitness of TEM-1. The contributions of synonymous (red), missense (grey), and nonsense (blue) mutations to the DFE are indicated. Gene fitness as a function of codon substitution is provided as Data S1.

Traits that are Correlated with Gene Duplicability

Robertson’s Secondary Theorem of Natural Selection for Gene Duplicability:

1 Suppose that different genes have different gene duplicability.

2 Suppose that there are other properties that covary with gene duplicability.

3 Then these covarying properties in the genome will increase with addition of new genes.
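The three steps above can be sketched numerically. The sketch assumes perfect transmission of both duplicability and the covarying property, so that $n_i(t) = n_i(0)\,e^{d_i t}$ and the genomic mean of the property $z$ changes at rate $\mathrm{Cov}(z, d)$. The property `z` and all numbers are hypothetical.

```python
import math

d = [0.002, 0.005, 0.010]    # hypothetical duplicabilities
z = [3.0, 4.0, 9.0]          # hypothetical property that covaries with d
n0 = [500.0, 150.0, 50.0]    # hypothetical initial gene counts


def frequencies(t):
    """Genomic frequencies p_i(t) when each type grows as exp(d_i * t)."""
    n = [ni * math.exp(di * t) for ni, di in zip(n0, d)]
    total = sum(n)
    return [ni / total for ni in n]


def mean(values, p):
    """Frequency-weighted mean."""
    return sum(v * pi for v, pi in zip(values, p))


def covariance(x, y, p):
    """Frequency-weighted covariance: E[xy] - E[x]E[y]."""
    return mean([xi * yi for xi, yi in zip(x, y)], p) - mean(x, p) * mean(y, p)


t, h = 300.0, 1e-3
# Central finite difference of the genomic mean of z at time t.
numerical_rate = (mean(z, frequencies(t + h)) - mean(z, frequencies(t - h))) / (2 * h)
predicted_rate = covariance(z, d, frequencies(t))

print(f"numerical d/dt mean z = {numerical_rate:.6e}")
print(f"Cov(z, d)             = {predicted_rate:.6e}")
```

Because `z` was chosen to rise with `d`, the covariance is positive and the genomic mean of `z` increases. A negatively covarying property would decline instead.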


Idea in a Nutshell-1st & 2nd

So, on the level of genome-as-population, we have

1 “Fisher’s Fundamental Theorem of Natural Selection”,

2 The Price Equation, and

3 “Robertson’s Secondary Theorem of Natural Selection”,

where gene duplicability takes on the role of fitness.

Heritability of Duplicability Can Evolve

Suppose two genes both have high gene duplicability, but one has higher heritability of the duplicability from one gene duplication to the next.

Then the gene with higher heritability of duplicability will predominate.

This is the Reduction Principle (Feldman, 1972; Altenberg, 2012) on the level of genome-as-population.
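A toy illustration of this claim follows; it is not the formal result of Feldman or Altenberg. It assumes two lineages that start from one highly duplicable gene each. A successful duplication of a high copy yields another high copy with probability `fidelity` and a low-duplicability copy otherwise, and low copies only produce low copies. The constants `D_HIGH` and `D_LOW` and both fidelity values are hypothetical.

```python
import math

D_HIGH = 0.010   # hypothetical duplicability of the original, highly duplicable form
D_LOW = 0.002    # hypothetical duplicability of copies that lost the property


def lineage_size(t: float, fidelity: float) -> float:
    """Copies descended from one high-duplicability gene after time t.

    Assumed toy dynamics, with f = fidelity:
        dH/dt = f * D_HIGH * H
        dL/dt = (1 - f) * D_HIGH * H + D_LOW * L,   H(0) = 1, L(0) = 0
    The closed form below requires f * D_HIGH != D_LOW.
    """
    r = fidelity * D_HIGH
    high = math.exp(r * t)
    low = (1.0 - fidelity) * D_HIGH * (math.exp(r * t) - math.exp(D_LOW * t)) / (r - D_LOW)
    return high + low


def share_of_faithful(t: float, f_faithful: float = 0.99, f_leaky: float = 0.80) -> float:
    """Fraction of all copies that descend from the more faithfully transmitting gene."""
    a = lineage_size(t, f_faithful)
    b = lineage_size(t, f_leaky)
    return a / (a + b)


for t in (0, 250, 500, 1000):
    print(f"t = {t:4d}  share of high-heritability lineage = {share_of_faithful(t):.3f}")
```

In this sketch both lineages start with equal shares and the same initial duplicability. The lineage that transmits its duplicability more faithfully takes up a growing fraction of the copies over time.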

The Empirical Research Program

Population Genetics of the “Genome-as-population”:

Recall the components of gene duplicability:

$$d_i := \mathrm{Rate}[i \text{ duplicates}] \times \mathrm{Prob}[i \text{ fixes}] \times \mathrm{Prob}[i \text{ maintained}].$$

For each possible event that adds or removes genetic material $i$ in the genome, quantify:

1 The rate at which it occurs
2 The probability that it goes to fixation in the population
3 The probability that it becomes stably maintained by selection
4 Properties that are correlated with each of the above:
   1 Effects of dosage change
   2 Positions within gene interaction networks
   3 Effects on phenotypes under directional selection
   4 Effects on phenotypes under stabilizing selection

The Empirical Research Program, cont’d

Quantify the heritability of these properties, i.e., the preservation of these properties:

1 Over time
2 From one duplication to the next
3 With primary sequence evolution of the element itself
4 With primary sequence evolution of other epistatic elements
5 With changes in the environment.

The Empirical Research Program, cont’d

Parts of this program are already underway:

Quantifying properties correlated with gene duplicability:
- Connectedness of the protein in the protein interaction network (Guo et al., 2014)
- Dosage effects of gene duplication (Qian and Zhang, 2008)
- Complexity of the protein (Yang et al., 2003)
- Complexity of the organism (Yang et al., 2003)
- Production of dominant deleterious allelic variation (Singh et al., 2014)

In silico studies:
- Fischer, Bernard, Beslon, and Knibbe (2014), A Model for Genome Size Evolution

The Empirical Research Program, cont’d

What is missing from the research program so far:

1 Quantification of the heritability of properties on the level of genome-as-population

2 Examining the consequences of differential gene duplicability for the variational properties of the genome

3 Widespread appreciation that all the diverse events that add and remove genetic material in the genome are part of a single theoretical framework.

NUMBER OF PAPERS WITH:

[Venn diagram of Google Scholar hit counts (2015-4) for three phrases: "evolution of evolvability", "distribution of fitness effects", and "gene duplicability". Counts shown in the diagram: 1510, 37, 2190, 1, 3, 2, 361. One paper is highlighted: Papp, B., Notebaart, R. A., and Pál, C. 2011. Systems-biology approaches for predicting genomic evolution. Nature Reviews Genetics.]

Conclusion

The only ways this mechanism would not have left a mark on evolvability and the genotype-phenotype map are:

1 There are no differences in the duplicability of new genes, or

2 The erasure of the variation in duplicability happens uniformly and rapidly among all genes.

Neither condition is tenable empirically. Therefore, we expect that the conditions of gene origin have shaped the genotype-phenotype map and evolvability. The extent of this shaping remains an open empirical question.

Acknowledgements

This research is supported by: The Konrad Lorenz Institute for Evolution and Cognition Research, Klosterneuburg, Austria

The Mathematical Biosciences Institute, Columbus, Ohio, under their grant from the National Science Foundation

Further reading: Altenberg (1995) Genome growth and the evolution of the genotype-phenotype map.

Thank you for your attention!

References

Altenberg, L. 1995. Genome growth and the evolution of the genotype-phenotype map. In Banzhaf, W. and Eeckman, F. H., editors, Evolution and Biocomputation: Computational Models of Evolution, volume 899 of Lecture Notes in Computer Science, pages 205–259. Springer-Verlag, Berlin.

Altenberg, L. 2012. Resolvent positive linear operators exhibit the reduction phenomenon. Proceedings of the National Academy of Sciences U.S.A., 109(10):3705–3710.

Crombach, A. and Hogeweg, P. 2007. Chromosome rearrangements and the evolution of genome structuring and adaptability. Molecular Biology and Evolution, 24:1130–1139.

Crombach, A. and Hogeweg, P. 2008. Evolution of evolvability in gene regulatory networks. PLoS Computational Biology, 4(7):e1000112.

Draghi, J. and Wagner, G. P. 2008. Evolution of evolvability in a developmental model. Evolution, 62(2):301–315.

Feldman, M. W. 1972. Selection for linkage modification: I. Random mating populations. Theoretical Population Biology, 3:324–346.

Fischer, S., Bernard, S., Beslon, G., and Knibbe, C. 2014. A model for genome size evolution. Bulletin of Mathematical Biology, 76(9):2249–2291.

Fisher, R. A. 1930. The Genetical Theory of Natural Selection. Clarendon Press, Oxford.

Guo, Z., Jiang, W., Lages, N., Borcherds, W., and Wang, D. 2014. Relationship between gene duplicability and diversifiability in the topology of biochemical networks. BMC Genomics, 15(1):577.

Kashtan, N. and Alon, U. 2005. Spontaneous evolution of modularity and network motifs. Proceedings of the National Academy of Sciences U.S.A., 102:13773–13778.

Kauffman, S. A. and Levin, S. 1987. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128:11–45.

Levinton, J. 1988. Genetics, Paleontology, and Macroevolution. Cambridge University Press, Cambridge.

Meyers, L. A., Ancel, F. D., and Lachmann, M. 2005. Evolution of genetic potential. PLoS Computational Biology, 1:236–243.

Pigliucci, M. and Müller, G. B., editors. 2010. Evolution, the Extended Synthesis. MIT Press, Cambridge, MA.

Qian, W. and Zhang, J. 2008. Gene dosage and gene duplicability. Genetics, 179(4):2319–2324.

Riedl, R. J. 1977. A systems-analytical approach to macroevolutionary phenomena. Quarterly Review of Biology, 52:351–370.

Singh, P. P., Affeldt, S., Malaguti, G., and Isambert, H. 2014. Human dominant disease genes are enriched in paralogs originating from whole genome duplication. PLoS Computational Biology, 10(7):e1003754.

Wagner, A. 2008. Robustness and evolvability: a paradox resolved. Proceedings of the Royal Society B: Biological Sciences, 275(1630):91–100.

Yang, J., Lusk, R., and Li, W.-H. 2003. Organismal complexity, protein complexity, and gene duplicability. Proceedings of the National Academy of Sciences U.S.A., 100(26):15661–15665.